Re: Message in Error log and AR Server timeout
I just finished applying the JavaScript to my three mid-tiers. Yes, it meant that my app admin had to rename about a dozen support staff who had mixed-case login names and tell them about the change. I did this primarily because of the problem we have seen with mobile devices that _attempt_ to use the mid-tier. They capitalize the first letter of the login name, and our LDAP authenticates the user anyway if the name is otherwise correct. At that point the mid-tier caches a bad login name. Never mind that the mid-tier is useless on a mobile device; the damage to the mid-tier cache has already been done. We might have considered keeping mixed-case names if we were authenticating against Active Directory (which is mixed case), but our primary authentication system is Identity Manager, which contains ~300,000 lowercase login names. Forcing everything to lowercase will not prevent a second situation. When a support staff member departs, our practice is to remove their support status, which removes their User record and login name. The AR Server then gets thread deaths when the mid-tier tries to prefetch for the previously valid login name that no longer exists. Tonight I got eight server terminations (below) from one SP1 mid-tier without doing anything. I didn't restart it, so it was probably an interval-based prefetch check.
Sun Mar 04 18:29:09 2012: AR System server terminated when a signal/exception was received by the server (ARNOTE 20)
Thread Id: 3848
Version: 7.6.04 SP1 HotFix 01 201107051610 Jul 5 2011 17:17:48
ServerName: itsm76
Database: SQL -- SQL Server
Hardware: x86_64
OS: Windows Server 2003
RPC Id: 235871
RPC Call: 11 (GLS)
RPC Queue: 390620
Client: User removed username from Mid-tier (protocol 18) at IP address xxx.xxx.xxx.xxx
Logging On: User
Code: c005
Operation: read
Access Addr:
Stack Begin:
Stack End
Sun Mar 04 18:29:08 2012 390620 : AR System server terminated when a signal/exception was received by the server (ARNOTE 20)
Sun Mar 04 18:29:08 2012 0xc005
Sun Mar 04 18:29:08 2012 390620 : AR System server terminated -- fatal error occurred in ARSERVER (ARNOTE 21)

I got eight more server thread terminations on a different account tonight while upgrading one of the mid-tiers to SP3 for a live test by some of my production users. These were for one of the users my app admin had already changed from mixed case to all lowercase. The mid-tier tried to prefetch for the cached, mixed-case version, could no longer find it, and exploded. I did learn the most effective trick for eliminating a poisoned cache, where the mid-tier remembers a login name in the wrong case or one that no longer exists:

1. Add another SERVER NAME in AR Server Settings, then change all four of the General Settings servers to that new one. Go back and delete the original server from AR Server Settings, and Flush the Cache.
2. Repeat the exact same steps to restore the mid-tier to the original (production) AR Server name, flushing the cache again.
3. Set the server to PRE-LOAD = Yes and restart Tomcat; after the preload, set it back to OFF.

The mid-tier is now cleaned of the improper login names that kill threads on the AR Server, for now. I have done this successfully on two of my mid-tiers so far.
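Counting terminations per login name, as in the eight-per-account tallies above, can be scripted against a saved copy of the error log. The sketch below is hypothetical. `parseTerminations` and `countTerminationsByUser` are illustrative names. The regular expression is written against the excerpts quoted in this thread only, not against a documented BMC log format. It expects a `Thread Id:` field, then `RPC Call: <n> (<name>)`, then `Client: User <name> from <source> (protocol <n>)`. It accepts the fields either on separate lines or run together on one line. It also assumes every dump contains a Client field; a dump without one would be paired with the next dump's client.

```javascript
// Hypothetical helper: tally ARNOTE 20 thread-termination dumps by client
// login name. The pattern is based on the log excerpts in this thread only.
const DUMP_PATTERN =
  /Thread Id:\s*(\d+)[\s\S]*?RPC Call:\s*(\d+)\s*\((\w+)\)[\s\S]*?Client: User (.+?) from (.+?) \(protocol (\d+)\)/g;

// Extract one record per termination dump.
function parseTerminations(logText) {
  const dumps = [];
  for (const m of logText.matchAll(DUMP_PATTERN)) {
    dumps.push({
      threadId: Number(m[1]),
      rpcCall: Number(m[2]),
      rpcName: m[3], // e.g. GLS, GLG, GSI
      user: m[4], // login name as the server received it
      source: m[5], // e.g. "Mid-tier", "SIM Publishing Server"
      protocol: Number(m[6]),
    });
  }
  return dumps;
}

// Count dumps per login name, preserving first-seen order.
function countTerminationsByUser(logText) {
  const counts = new Map();
  for (const { user } of parseTerminations(logText)) {
    counts.set(user, (counts.get(user) ?? 0) + 1);
  }
  return counts;
}

// Usage with fields taken from the two dumps in Harry's log.
const sample = `Thread Id: 5728
RPC Id: 18539
RPC Call: 33 (GLG)
Client: User z28801 from Mid-tier (protocol 18) at IP address 10.226.138.68
Thread Id: 3576
RPC Id: 49046
RPC Call: 36 (GSI)
Client: User Demo from SIM Publishing Server (protocol 12) at IP address 10.226.138.72`;
console.log(JSON.stringify(Object.fromEntries(countTerminationsByUser(sample))));
// {"z28801":1,"Demo":1}
```

A login name that keeps reappearing in these counts after a cache flush is a candidate for the renaming or cleanup steps described above.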
The cleanup must be done during a scheduled maintenance window, because it definitely knocks the mid-tier out of action for at least half an hour. I will be watching the system for any new thread terminations from the SP3 mid-tier to see whether that service pack has fixed the prefetching problem. I suspect it can still generate errors, especially since the AR Server is still on SP1, but maybe we will get lucky.
Christopher Strauss, Ph.D.
Call Tracking Administration Manager
University of North Texas Computing IT Center
http://itsm.unt.edu/
-Original Message-
From: Action Request System discussion list(ARSList) [mailto:arslist@ARSLIST.ORG] On Behalf Of h@rry
Sent: Monday, March 05, 2012 1:53 AM
To: arslist@ARSLIST.ORG
Subject: Re: Message in Error log and AR Server timeout
Re: Message in Error log and AR Server timeout
I stand corrected. The Migrator login timeout problem is STILL not fixed in 7.6.04 SP3, even though they got me a hotfix for it in November 2011. Installing SP3 over a system with the hotfix in place reintroduces the error. I first reported the problem in mid-February 2011, and they had a YEAR to figure it out and get the fix into SP3. Considering that, their track record on fixing broken stuff is just plain pitiful. I checked the SW bug, and it is in fact still open, Tentatively Targeted for a Future Patch Release. I suspect that hell will freeze over first.
Christopher Strauss, Ph.D.
Call Tracking Administration Manager
University of North Texas Computing IT Center
http://itsm.unt.edu/
-Original Message-
From: Action Request System discussion list(ARSList) [mailto:arslist@ARSLIST.ORG] On Behalf Of strauss
Sent: Thursday, March 01, 2012 11:19 PM
To: arslist@ARSLIST.ORG
Subject: Re: Message in Error log and AR Server timeout
___
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
attend wwrug12 www.wwrug12.com
ARSList: Where the Answers Are
Re: Message in Error log and AR Server timeout
Thanks, everyone, for your input. @strauss, we are also working with the support team on a plan to upgrade to ARS SP3. This may take time, though, as we don't want to jump to it straight away. Do share your findings once you test. @LJ, our authentication chaining mode is set to 'AREA - ARS'. We tried changing it to 'Off', but the behavior turned weird: even some valid users (with entries in both Remedy and LDAP) got an authentication-failed message. We have made other related settings as well. On the EA tab we unchecked both 'Authenticate Unregistered users' and 'Cross reference blank password'. On the Configuration tab we unchecked 'Allow Guest Users'. @Bill, we also have some user IDs in mixed case. As I understand it, the JavaScript you mentioned converts mixed-case user IDs to lowercase regardless of their case in the People/User form. Is that true?
Re: Message in Error log and AR Server timeout
We had problems with mixed-case logins in the Mid Tier, and we fixed them by adding JavaScript to the login page. We added it to the login field itself, so all logins are submitted in lowercase. Below is the line from our login page:
<input name="username" maxlength="254" id="username-id" value="" onChange="this.value=this.value.toLowerCase();" class="loginfield" size="30" type="text">
So far this has worked great on our 7.5 version of Remedy.
Bill Clary
Bright House Networks
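Bill's onChange line lowercases the name whenever the field's value changes. One possible variant is sketched below. It assumes the mid-tier login page keeps the `username-id` field from his snippet inside a normal `<form>`. Instead of relying on the change event, it rewrites the value in a `submit` listener, so whatever is in the field at submission time is what gets normalized. It also sets the `autocapitalize="off"` attribute, which asks mobile keyboards such as the one in iOS Safari not to capitalize the first letter in the first place. The function names `normalizeLogin` and `installLowercaseOnSubmit` are hypothetical. Two caveats apply. First, a page script that calls `form.submit()` directly does not fire `submit` listeners, so keeping the onChange attribute as well is the safer choice. Second, like Bill's line, this lowercases unconditionally, regardless of how the name is stored in the People/User form. Mixed-case accounts such as Demo therefore can no longer log in through that page.

```javascript
// Hypothetical sketch, not BMC code: normalize the login name at submit time.
// Assumes the login field has id "username-id", as in the snippet above.

// Same transformation as the onChange line: unconditional lowercasing.
function normalizeLogin(name) {
  return String(name).toLowerCase();
}

// Returns true if the handler was installed, false if the field or its form
// is not on this page.
function installLowercaseOnSubmit(doc) {
  const field = doc.getElementById("username-id");
  if (!field || !field.form) return false;
  // Hint to mobile keyboards (e.g. iOS Safari) not to capitalize the first
  // letter; browsers that do not support the attribute ignore it.
  field.setAttribute("autocapitalize", "off");
  field.form.addEventListener("submit", () => {
    field.value = normalizeLogin(field.value);
  });
  return true;
}

// Wire up only in a browser; run after the login field exists in the DOM,
// for example from a script placed at the end of the page.
if (typeof document !== "undefined") {
  installLowercaseOnSubmit(document);
}
```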
Re: Message in Error log and AR Server timeout
Harry, we experienced this when we changed the 'Authentication Chaining Mode' to something other than 'Off'. When we changed it back to 'Off', we were still able to use the AREA and local...
-Original Message-
From: Action Request System discussion list(ARSList) [mailto:arslist@ARSLIST.ORG] On Behalf Of h@rry
Sent: Thursday, March 01, 2012 8:16 PM
To: arslist@ARSLIST.ORG
Subject: Message in Error log and AR Server timeout
Hello Experts,
We are on ARS and ITSM 7.6.04 and are facing an issue where the content below appears in the error log. After that, the application times out and users are unable to access Remedy; the only solution is to restart the application. It does not happen at any fixed interval. Sometimes it occurs when most of our users are logged in, and sometimes when only one or two of them are. The only similarity we could find across all of these messages is that the user name in them does not exist in Remedy but does exist in AD (we have the LDAP integration in place). However, there were instances when the timeout occurred and the user name in the log was an active user in Remedy, and in one instance the user name was Demo. We have raised a ticket with support. Looking at the LDAP/authentication side of it, they say this is a defect that has been fixed in ARS 7.6.04 SP2. But we are not sure whether this error-log content occurs only because of the authentication-related issue, as support says, or whether there could be another reason. Though we have raised a case with support, I wanted to know whether anyone else has faced a similar issue. I would also like to know whether upgrading to SP2 is the solution, and whether this issue is related to authentication at all. It would be great if anyone could help with this.
The error-log content below is for a user who does not exist in Remedy but does exist in AD.
This did not lead to a timeout, but the application was slow (almost inaccessible) for around 30 minutes.

Thu Mar 01 18:31:47 2012: AR System server terminated when a signal/exception was received by the server (ARNOTE 20)
Thread Id: 5728
Version: 7.6.04 Build 002 201101141059 Jan 14 2011 11:55:43
ServerName: RMDAPD01
Database: SQL -- SQL Server
Hardware: x86_64
OS: Windows 6.1
RPC Id: 18539
RPC Call: 33 (GLG)
RPC Queue: 390620
Client: User z28801 from Mid-tier (protocol 18) at IP address 10.226.138.68
Logging On:
Code: c005
Operation: read
Access Addr:
Stack Begin:
Stack End
Thu Mar 01 18:31:47 2012 390620 : AR System server terminated when a signal/exception was received by the server (ARNOTE 20)
Thu Mar 01 18:31:47 2012 0xc005
Thu Mar 01 18:31:47 2012 390620 : AR System server terminated -- fatal error occurred in ARSERVER (ARNOTE 21)

The error-log content below is for a user who exists in Remedy; this time it led to a timeout and a subsequent restart of the application.

Fri Mar 02 09:38:59 2012: AR System server terminated when a signal/exception was received by the server (ARNOTE 20)
Thread Id: 3576
Version: 7.6.04 Build 002 201101141059 Jan 14 2011 11:55:43
ServerName: RMDAPD01
Database: SQL -- SQL Server
Hardware: x86_64
OS: Windows 6.1
RPC Id: 49046
RPC Call: 36 (GSI)
RPC Queue: 390620
Client: User Demo from SIM Publishing Server (protocol 12) at IP address 10.226.138.72
Logging On:
Code: c005
Operation: read
Access Addr: 000C
Stack Begin:
Stack End
Fri Mar 02 09:38:59 2012 390620 : AR System server terminated when a signal/exception was received by the server (ARNOTE 20)
Fri Mar 02 09:38:59 2012 0xc005
Fri Mar 02 09:38:59 2012 390620 : AR System server terminated -- fatal error occurred in ARSERVER (ARNOTE 21)
Fri Mar 02 09:42:04 2012 AssignEng : Timeout during database update -- the operation has been accepted by the server and will usually complete successfully (RMDAPD01) ARERR - 92
Fri Mar 02 09:42:05 2012 AssignEng : AR System Application server terminated -- fatal error encountered (ARAPPNOTE 4501)
Fri Mar 02 09:43:37 2012 : Action Request System(R) Server x64 Version 7.6.04 Build 002 201101141059 (c) Copyright 1991-2010 BMC Software, Inc.
Fri Mar 02 09:51:26 2012 AssignEng : Timeout during database update -- the operation has been accepted by the server and will usually complete successfully (RMDAPD01) ARERR - 92
Fri Mar 02 09:51:26 2012 AssignEng : AR System Application server terminated -- fatal error encountered (ARAPPNOTE 4501)
Fri Mar 02 09:53:26 2012 AssignEng : Timeout during database update -- the operation has been accepted by the server and will usually complete successfully (RMDAPD01) ARERR - 92
Fri Mar 02 09:53:26 2012 AssignEng : AR System Application server terminated -- fatal error encountered (ARAPPNOTE 4501)
Fri Mar 02 09:54:26 2012 : Action Request System(R) Server x64 Version
Re: Message in Error log and AR Server timeout
Harry, if you're running an unpatched 7.6.04 instance, then you are probably experiencing the memory leak issues common to that build. Unfortunately, the errors you see may be the result of the memory leak and not its cause. To capture it, you would need to run logging and hope to catch the offending code. It may just be easier to apply the SP2 patch.
On Mar 1, 2012, at 10:16 PM, h@rry hari...@yahoo.com wrote:
Re: Message in Error log and AR Server timeout
We reported this problem with 7.6.04.01 last summer and have been living with it ever since. The mid-tier caches incorrect login names, or login names in the wrong case, that are passed to AREA LDAP. This happens especially when people try to use the mid-tier from a mobile device that capitalizes their login. It then kills threads on the AR Server when AREA lets them in but they do not exist in the User table in that form (Jsmith versus jsmith). These incorrect or invalid account names are cached by the mid-tier, especially the ones that case-insensitive LDAP will authenticate but case-sensitive ARS would not have. At the end of the next prefetch, the mid-tier throws them at the AR Server without passwords, and the AR Server in turn throws them at AREA. You get three or four thread crashes per account. I have seen the production mid-tier unusable for 30 minutes after a restart as thread after thread crashed on the AR Server, which is exactly the same problem you experienced. A different mid-tier that was already connected was unaffected. Real high quality code, here; I don't think BMC ever tested the ehcache prefetching on systems with AREA authentication enabled. We also saw the same thread deaths when someone used an iPad to hit our Kinetic portal. The login name was invariably a capitalized version of theirs, so AREA would let them in. Kinetic would then halt them when it could not match their credentials (Jsmith != jsmith) and could not pull their customer data. We finally killed that off with JavaScript in the Kinetic login page that forces the login name to lowercase. We may have to do exactly the same thing on the mid-tier login.jsp pages (does anyone have a working snippet of code for that?). I am testing now to see whether this is fixed in SP3. Last November we identified so many fatal installer problems in SP2 (ARS, mid-tier, Atrium, ITSM, against ITSM Suite servers upgraded from 7.1/7.0.02) that I have refused to use it for ANYTHING ever since.
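The failure described above comes down to two lookups that disagree about case. The toy model below illustrates it and is not BMC code. The hypothetical `directory` array and `ldapAuthenticates` function stand in for the case-insensitive LDAP source behind AREA. `userForm` and `userExists` stand in for the case-sensitive AR System User form. `classifyCachedLogins` sorts a list of cached login names into the groups discussed in this thread: exact matches, names that differ only in case (Jsmith versus jsmith), and names with no User record at all, such as an AD-only user or a departed staff member. The same check shows why forcing logins to lowercase keeps a mixed-case account such as Demo out. Apart from jsmith and Demo, the login names are made up.

```javascript
// Toy model, not BMC code: a case-insensitive directory versus a
// case-sensitive User form. All data below is hypothetical.
const directory = ["jsmith", "adoe"]; // stands in for LDAP behind AREA
const userForm = new Set(["jsmith", "Demo"]); // stands in for AR System User names

// Directory match ignores case, so "Jsmith" authenticates.
function ldapAuthenticates(login) {
  const wanted = login.toLowerCase();
  return directory.some((name) => name.toLowerCase() === wanted);
}

// User form match is exact, so "Jsmith" does not match "jsmith".
function userExists(login) {
  return userForm.has(login);
}

// Group cached login names the way the AR Server would see them on prefetch.
function classifyCachedLogins(cached, users) {
  const byLower = new Map();
  for (const u of users) byLower.set(u.toLowerCase(), u);
  const result = { ok: [], caseMismatch: [], missing: [] };
  for (const login of cached) {
    if (users.has(login)) result.ok.push(login);
    else if (byLower.has(login.toLowerCase())) result.caseMismatch.push(login);
    else result.missing.push(login);
  }
  return result;
}

console.log(ldapAuthenticates("Jsmith"), userExists("Jsmith")); // true false
// "demo" is what a forced-lowercase login page would send for Demo.
console.log(JSON.stringify(classifyCachedLogins(["jsmith", "Jsmith", "demo", "adoe"], userForm)));
// {"ok":["jsmith"],"caseMismatch":["Jsmith","demo"],"missing":["adoe"]}
```

In this model, lowercasing at the login page removes the caseMismatch group for lowercase accounts, but it cannot help the missing group. That matches the departed-staff case, which still needs a cache cleanup.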
This defect is definitely NOT fixed if you update only the mid-tier to SP3 while the AR Server is still on SP1; I tried that first. Just today I got a full ARS SP3-over-SP1 install completed on a clone of production, with a local and a remote SP3 mid-tier. By tomorrow I should be able to tell whether they really fixed it. If not, I'll give up and start forcing the login to lowercase on each mid-tier. That will keep Demo and similar mixed-case system accounts out too, but that's what the User Tool is still essential for. SP3 does fix the date-range Crystal report problem we reported last summer (SP3 must be on BOTH the AR Server and the mid-tier). It also fixes the Firefox Work Info text box defect we reported at the same time. So _some_ of the issues that have caused us problems in production for the last 7 months have been fixed. I think they fixed the Migrator login timeout problem introduced in 7.6.04.00 and retained in .01 and .02 (they finally gave me a post-SP2 hotfix that worked), but I have to test that as well. I'll post an update tomorrow.
Christopher Strauss, Ph.D.
Call Tracking Administration Manager
University of North Texas Computing IT Center
http://itsm.unt.edu/
On 3/1/12 9:16 PM, h@rry hari...@yahoo.com wrote: