Re: Message in Error log and AR Server timeout

2012-03-05 Thread strauss
I just finished applying the javascript to my three mid-tiers, and yes, it 
meant that my app admin had to rename about a dozen support staff who had 
mixed-case login names and inform them of the change.  I did this primarily 
because of the problem we have seen with mobile devices that _attempt_ to use 
the mid-tier and capitalize the first letter of the login name: our LDAP will 
authenticate them if the name is otherwise correct, at which point the mid-tier 
caches a bad login name.  Never mind that the mid-tier is useless on a mobile 
device; the damage to the mid-tier cache has been done.  We might have 
considered keeping those names in mixed case if we were working against Active 
Directory (which is mixed case), but our primary authentication system is 
Identity Manager, which contains ~300,000 lower-case login names.
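A toy model of the mismatch described above may help. This is not AREA, LDAP, or AR System code: the account lists and function names are hypothetical, and the only behaviour modeled is the one stated in this thread, a case-insensitive directory check followed by an exact-match User record lookup.

```javascript
// Toy model of the case mismatch; NOT AREA, LDAP, or AR System code.
// Hypothetical directory (matched case-insensitively) and User table (matched exactly).
const directoryAccounts = ["jsmith", "mjones"];
const userTable = new Set(["jsmith", "mjones"]);

// LDAP-style check: compares login names without regard to case.
function directoryAccepts(login) {
  const wanted = login.toLowerCase();
  return directoryAccounts.some((name) => name.toLowerCase() === wanted);
}

// AR System-style check, as described in this thread: the name must match exactly.
function userRecordExists(login) {
  return userTable.has(login);
}

function classifyLogin(login) {
  const authenticated = directoryAccepts(login);
  const known = userRecordExists(login);
  if (authenticated && !known) return "authenticated-but-no-user-record";
  if (authenticated) return "ok";
  return "rejected";
}

console.log(classifyLogin("jsmith")); // prints: ok
console.log(classifyLogin("Jsmith")); // prints: authenticated-but-no-user-record
```

The second case is the one that leaves a bad name behind: the directory says yes, but there is no User record spelled that way.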

This change (forcing everything to lower case) will not prevent another 
situation: when a support staff member departs, our practice is to remove them 
from support status, which removes their User record and login name.  The AR 
Server then gets thread deaths when the mid-tier tries to prefetch for the 
previously valid login name that no longer exists.  I got eight server 
terminations (below) tonight from one SP1 mid-tier without doing anything - I 
didn't restart it - so it was probably an interval-based prefetch check.

Sun Mar 04 18:29:09 2012: AR System server terminated when a signal/exception was received by the server (ARNOTE  20)

   Thread Id: 3848
   Version: 7.6.04 SP1 HotFix 01 201107051610 Jul  5 2011 17:17:48
   ServerName: itsm76
   Database: SQL -- SQL Server
   Hardware: x86_64
   OS: Windows Server 2003
   RPC Id: 235871
   RPC Call: 11 (GLS)
   RPC Queue: 390620
   Client: User removed username from Mid-tier (protocol 18) at IP address xxx.xxx.xxx.xxx
   Logging On: User
   Code: c005
   Operation: read
   Access Addr:
   Stack Begin:
   Stack End
Sun Mar 04 18:29:08 2012  390620 : AR System server terminated when a signal/exception was received by the server (ARNOTE 20)
Sun Mar 04 18:29:08 2012 0xc005
Sun Mar 04 18:29:08 2012  390620 : AR System server terminated -- fatal error occurred in ARSERVER (ARNOTE 21)
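To see which login names are behind crashes like these, one option is to pull the Client: line out of each ARNOTE 20 block and count crashes per user. This is a minimal sketch, assuming each crash block in your error log carries a single-line Client: entry shaped like the ones quoted in this thread; the function name is hypothetical and the sample text is copied from the excerpts here.

```javascript
// Sketch: count crash blocks per client login name in an AR System error log.
// Assumes each crash block has one line shaped like
//   "Client: User <name> from <client type> (protocol N) at IP address ..."
// as in the excerpts in this thread. Check against your own log first.
function countCrashesByUser(logText) {
  const counts = new Map();
  // Lazy groups stop at the first " from " and at " (protocol", so multi-word
  // names such as a redacted "removed username" are captured whole.
  const pattern = /Client: User (.+?) from (.+?) \(protocol \d+\)/g;
  for (const match of logText.matchAll(pattern)) {
    const key = `${match[1]} via ${match[2]}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const sample = `
   RPC Call: 33 (GLG)
   Client: User z28801 from Mid-tier (protocol 18) at IP address 10.226.138.68
   RPC Call: 36 (GSI)
   Client: User Demo from SIM Publishing Server (protocol 12) at IP address 10.226.138.72
   Client: User z28801 from Mid-tier (protocol 18) at IP address 10.226.138.68
`;

console.log(countCrashesByUser(sample));
```

Names that show up here but have no exactly matching User record are the candidates for the cache cleanup described further down.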

I got 8 more server thread terminations on a different account tonight while 
upgrading one of the mid-tiers to SP3 for a live test by some of my production 
users.  These were for one of the users my app admin had already changed from 
mixed case to all lower case: the mid-tier tried to prefetch for the cached, 
mixed-case version, could no longer find it, and exploded.
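The mid-tier does not expose its cache for this kind of check, so the following is only a hypothetical sketch of the comparison an administrator could do by hand: take login names the mid-tier is known to have used (for example, from the Client: lines in the error log) and flag any that no longer match a User record exactly, either because the account was removed or because it was renamed to a different case. The function name and both input lists are invented for illustration.

```javascript
// Hypothetical audit: which remembered login names no longer match a User record exactly?
// Input lists are made-up examples; gather real ones from your own logs and User form.
function findStaleLogins(rememberedLogins, currentUserLogins) {
  const current = new Set(currentUserLogins);
  // Lowercase index of current names, used only to explain *why* a name is stale.
  const lowerToCurrent = new Map(currentUserLogins.map((n) => [n.toLowerCase(), n]));
  const report = [];
  for (const login of new Set(rememberedLogins)) {
    if (current.has(login)) continue; // exact match: still valid
    const caseVariant = lowerToCurrent.get(login.toLowerCase());
    report.push({
      login,
      reason: caseVariant ? `case differs from "${caseVariant}"` : "no User record",
    });
  }
  return report;
}

const remembered = ["jsmith", "BDoe", "departed1"];
const users = ["jsmith", "bdoe"];
// Expect BDoe (renamed to lower case) and departed1 (User record removed).
console.log(findStaleLogins(remembered, users));
```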

I did learn the most effective trick for eliminating a poisoned cache - where 
the mid-tier remembers a login name in the wrong case or that no longer exists:
 
1. Add another SERVER NAME in AR Server Settings, then change all four of the 
General Settings servers to that new one, go back and Delete the original 
server from AR Server Settings, and Flush the Cache.
2. Repeat the exact same steps to restore the mid-tier to the original 
(production) AR Server name, flushing the cache again.
3. Set the server to PRE-LOAD = Yes and restart Tomcat; after the preload, set 
it back to OFF.

The mid-tier is now cleaned of the improper login names that would kill threads 
on the AR Server - for now.  I have done this successfully on two of my 
mid-tiers so far.  It must be done during a scheduled maintenance window, 
because it definitely knocks the mid-tier out of action for at least half an 
hour.

I will be watching the system for any new thread terminations from the SP3 
mid-tier to see if that SP has fixed the problem with prefetching.  I suspect 
that it can still generate errors, especially since the AR Server is still SP1, 
but maybe we will get lucky.

Christopher Strauss, Ph.D.
Call Tracking Administration Manager
University of North Texas Computing  IT Center
http://itsm.unt.edu/


Re: Message in Error log and AR Server timeout

2012-03-05 Thread strauss
I stand corrected.  The Migrator login timeout problem is STILL not fixed in 
7.6.04 SP3, in spite of the fact that they got me a hotfix in November 2011.  
Installing SP3 over a system with the hotfix in place reintroduces the error.  
Considering that I first reported the problem in mid-February 2011, and they 
had a YEAR to figure it out and get the fix into SP3, their track record on 
fixing broken stuff is just plain pitiful.  I checked the SW bug, and it is in 
fact still open, Tentatively Targeted for a Future Patch Release.  I 
suspect that hell will freeze over first.

Christopher Strauss, Ph.D.
Call Tracking Administration Manager
University of North Texas Computing  IT Center
http://itsm.unt.edu/

___
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
attend wwrug12 www.wwrug12.com ARSList: Where the Answers Are


Re: Message in Error log and AR Server timeout

2012-03-04 Thread h@rry
Thanks everyone for your inputs..

@strauss, we are also working with the support team on a plan to upgrade to ARS 
SP3, but this may take time, as we don't want to jump to it straight away. 
Please do share your findings once you have tested.

@LJ, our Authentication Chaining Mode is set to 'AREA - ARS'. We also tried 
changing it to 'Off', but then the behavior turned erratic, with even some 
valid users (who have entries in both Remedy and LDAP) getting an 
authentication failed message. The other related settings we have made: on the 
EA tab we unchecked both 'Authenticate Unregistered Users' and 'Cross Reference 
Blank Password', and on the Configuration tab we unchecked 'Allow Guest Users'.

@Bill, we also have some user IDs in mixed case. As I understand it, the 
javascript you mentioned converts mixed-case user IDs to lower case regardless 
of how they are stored on the People/User form. Is that right?
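On the question above: a toLowerCase() call on the login field does not look at the People/User form at all, so an account stored with capitals can no longer log in through that page, which is why strauss had a dozen accounts renamed first. A hypothetical pre-check, assuming you have exported the User form login names into a list (the export is not shown, and the function name is invented): flag names that are not already lower case, and names that would collide once lowercased.

```javascript
// Hypothetical pre-deployment check for a lowercase-only login page.
// Input: login names exported from the User form.
function auditLoginCase(loginNames) {
  // Accounts that a lowercasing login page would lock out until renamed.
  const mixedCase = loginNames.filter((name) => name !== name.toLowerCase());
  // Group names by their lowercase form to spot accounts that would collide.
  const byLower = new Map();
  for (const name of loginNames) {
    const key = name.toLowerCase();
    byLower.set(key, [...(byLower.get(key) || []), name]);
  }
  const collisions = [...byLower.values()].filter((group) => group.length > 1);
  return { mixedCase, collisions };
}

const result = auditLoginCase(["jsmith", "BDoe", "Demo", "JSmith"]);
console.log(result.mixedCase); // [ 'BDoe', 'Demo', 'JSmith' ]
console.log(result.collisions); // [ [ 'jsmith', 'JSmith' ] ]
```

Accounts in the collision list cannot simply be renamed to the lowercase form, since that name is already taken.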



Re: Message in Error log and AR Server timeout

2012-03-02 Thread Bill Clary
We had problems with mixed-case logins in the Mid Tier, and we fixed it by 
adding javascript to the login page.

We added it to the login field itself, so all logins are submitted in lower 
case. Below is the line from our login page:

<input name="username" maxlength="254" id="username-id" value="" onChange="javascript:this.value=this.value.toLowerCase();" class="loginfield" size="30" type="text"></td>


So far this has worked great on our 7.5 version of Remedy.


Bill Clary
Bright House Networks
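Bill's inline onChange handler lowercases the value when the field changes. A belt-and-braces variant also lowercases it again in a submit handler, in case the change event has not fired by the time the form is sent. This is a sketch under assumptions: the field id username-id is taken from Bill's snippet, the function names are hypothetical, and the trim() step is an added assumption you may not want.

```javascript
// Hypothetical sketch: lowercase the login name on change AND just before submit.
// "username-id" comes from the snippet above; trim() is an added assumption.
function normalizeLogin(value) {
  return String(value).trim().toLowerCase();
}

function attachLowercaseLogin(form, field) {
  const fix = () => { field.value = normalizeLogin(field.value); };
  field.addEventListener("change", fix);
  // The submit event fires before the browser builds the form data.
  form.addEventListener("submit", fix);
}

// Wire it up only when running in a browser page that has the field.
if (typeof document !== "undefined") {
  const field = document.getElementById("username-id");
  if (field && field.form) {
    attachLowercaseLogin(field.form, field);
  }
}
```

One caveat: if a login page submits the form from script with form.submit(), the submit event does not fire, so only the change handler would apply.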



Re: Message in Error log and AR Server timeout

2012-03-02 Thread LJ LongWing
Harry,
We experienced this when we changed the 'Authentication Chaining Mode' to 
something other than 'Off'. When we changed it back to 'Off', we were still 
able to use the AREA and local...

-Original Message-
From: Action Request System discussion list(ARSList) 
[mailto:arslist@ARSLIST.ORG] On Behalf Of h@rry
Sent: Thursday, March 01, 2012 8:16 PM
To: arslist@ARSLIST.ORG
Subject: Message in Error log and AR Server timeout

Hello Experts,

We are on ARS and ITSM 7.6.04 and are facing an issue where the content below 
appears in the error log, after which the application goes into a timeout and 
users are unable to access Remedy; the only solution is to restart the 
application. This does not happen at any fixed interval: it occurs sometimes 
when most of our users are logged in and sometimes when only 1 or 2 of them 
are. The only common factor we could find across these messages is that the 
user name shown does not exist in Remedy but does exist in AD (we have the LDAP 
integration in place). However, there were instances when the timeout occurred 
and the user name in the log was an active Remedy user, and in one instance the 
user name shown was Demo.
We have raised a ticket with support, and looking at it from the 
LDAP/authentication side, they say this is a defect that has been fixed in ARS 
7.6.04 SP2.  But we are not sure whether this content in the error log occurs 
only because of the authentication-related issue support mentioned, or whether 
there could be some other reason.

Though we have raised a case with support, I wanted to know whether anyone else 
has faced a similar issue, whether upgrading to SP2 is the solution, and 
whether this issue is related to authentication at all. Any help would be 
great.

The error log content below is for a user who does not exist in Remedy but does 
exist in AD. This did not lead to a timeout, but the application was slow 
(almost inaccessible) for around 30 minutes.

Thu Mar 01 18:31:47 2012: AR System server terminated when a signal/exception was received by the server (ARNOTE  20)

   Thread Id: 5728
   Version: 7.6.04 Build 002 201101141059 Jan 14 2011 11:55:43
   ServerName: RMDAPD01
   Database: SQL -- SQL Server
   Hardware: x86_64
   OS: Windows 6.1
   RPC Id: 18539
   RPC Call: 33 (GLG)
   RPC Queue: 390620
   Client: User z28801 from Mid-tier (protocol 18) at IP address 10.226.138.68
   Logging On:
   Code: c005
   Operation: read
   Access Addr:
   Stack Begin:
   Stack End
Thu Mar 01 18:31:47 2012  390620 : AR System server terminated when a signal/exception was received by the server (ARNOTE 20)
Thu Mar 01 18:31:47 2012 0xc005
Thu Mar 01 18:31:47 2012  390620 : AR System server terminated -- fatal error occurred in ARSERVER (ARNOTE 21)


The error log content below is for a user who exists in Remedy, in a case that 
led to a timeout and a subsequent restart of the application.

Fri Mar 02 09:38:59 2012: AR System server terminated when a signal/exception was received by the server (ARNOTE  20)

   Thread Id: 3576
   Version: 7.6.04 Build 002 201101141059 Jan 14 2011 11:55:43
   ServerName: RMDAPD01
   Database: SQL -- SQL Server
   Hardware: x86_64
   OS: Windows 6.1
   RPC Id: 49046
   RPC Call: 36 (GSI)
   RPC Queue: 390620
   Client: User Demo from SIM Publishing Server (protocol 12) at IP address 10.226.138.72
   Logging On:
   Code: c005
   Operation: read
   Access Addr: 000C
   Stack Begin:
   Stack End
Fri Mar 02 09:38:59 2012  390620 : AR System server terminated when a signal/exception was received by the server (ARNOTE 20)
Fri Mar 02 09:38:59 2012 0xc005
Fri Mar 02 09:38:59 2012  390620 : AR System server terminated -- fatal error occurred in ARSERVER (ARNOTE 21)
Fri Mar 02 09:42:04 2012  AssignEng : Timeout during database update -- the operation has been accepted by the server and will usually complete successfully (RMDAPD01)  ARERR - 92
Fri Mar 02 09:42:05 2012  AssignEng : AR System Application server terminated -- fatal error encountered (ARAPPNOTE 4501)
Fri Mar 02 09:43:37 2012 : Action Request System(R) Server x64 Version 7.6.04 Build 002 201101141059
(c) Copyright 1991-2010 BMC Software, Inc.
Fri Mar 02 09:51:26 2012  AssignEng : Timeout during database update -- the operation has been accepted by the server and will usually complete successfully (RMDAPD01)  ARERR - 92
Fri Mar 02 09:51:26 2012  AssignEng : AR System Application server terminated -- fatal error encountered (ARAPPNOTE 4501)
Fri Mar 02 09:53:26 2012  AssignEng : Timeout during database update -- the operation has been accepted by the server and will usually complete successfully (RMDAPD01)  ARERR - 92
Fri Mar 02 09:53:26 2012  AssignEng : AR System Application server terminated -- fatal error encountered (ARAPPNOTE 4501)
Fri Mar 02 09:54:26 2012 : Action Request System(R) Server x64 Version

Re: Message in Error log and AR Server timeout

2012-03-01 Thread Tauf Chowdhury
Harry,
If you're running an unpatched 7604 instance then you are probably
experiencing common memory leak issues with that build. Unfortunately,
the errors you see may be the result of the memory leak and not the
cause. In order to capture it, you would need to run logging and hope
to catch the offending code. It may just be easier to apply the SP2
patch.


Re: Message in Error log and AR Server timeout

2012-03-01 Thread strauss
We reported this problem with 7.6.04.01 last summer and have been living
with it ever since.  The mid-tier caches incorrect login names, or login
names in the wrong case, that are passed to AREA LDAP.  This happens
especially when people try to use the mid-tier from a mobile device that
capitalizes their login: AREA lets them in, but they do not exist in the
User table in that form (Jsmith versus jsmith), and the mid-tier then kills
threads on the AR Server.  These incorrect or invalid account names are
cached by the mid-tier, especially the ones that non-case-sensitive LDAP
will authenticate but case-sensitive ARS would not have.  At the next
prefetch, the mid-tier throws them at the AR Server without passwords, the
server in turn throws them at AREA, and you get three or four thread
crashes per account.  I have seen the production mid-tier unusable for 30
minutes after a restart (a different mid-tier that was already connected
was unaffected) as thread after thread crashed on the AR Server - exactly
the same problem you experienced.  Real high quality code, here; I don't
think BMC ever tested the ehcache prefetching on systems with AREA
authentication enabled.

We also saw the same thread deaths when someone used an iPad to hit our
Kinetic portal with (invariably) a capitalized version of their login name:
AREA would let them in, then Kinetic would halt them when it could not
match their credentials (Jsmith != jsmith) and could not pull their
customer data.  We finally killed that off with javascript in the Kinetic
login page that forced the login name to lower case; we may have to do
exactly the same thing on the mid-tier login.jsp pages (anyone have a
working snippet of code for that??).

I am testing now to see if this is fixed in SP3; we identified so many
fatal installer problems in SP2 (ARS, mid-tier, Atrium, ITSM - against
ITSM Suite servers upgraded from 7.1/7.0.02) last November that I have
refused to use it for ANYTHING ever since.  This defect is definitely NOT
fixed if you only update the mid-tier to SP3 while the AR Server is still
on SP1; I tried that first.  Only today did I get a full ARS SP3-over-SP1
install completed on a clone of production, with a local and a remote SP3
mid-tier, so I should be able to tell tomorrow whether they really fixed it.
If not, I'll give up and start forcing the login to lower case on each
mid-tier.  That will keep Demo and similar mixed-case system accounts out
too, but that's what the User Tool is still essential for.

SP3 does fix the date-range Crystal report problem we reported last summer
(it must be SP3 on BOTH the AR Server and the mid-tier), as well as the
Firefox Work Info text box defect we reported at the same time, so _some_
of the issues that have caused us problems on production for the last 7
months have been fixed.  I think they fixed the Migrator login timeout
problem introduced in 7.6.04.00 and retained in .01 and .02 (they finally
gave me a post-SP2 hotfix that worked), but I have to test that as well.

I'll post an update tomorrow.

Christopher Strauss, Ph.D.
Call Tracking Administration Manager
University of North Texas Computing  IT Center
http://itsm.unt.edu/




