Eric,

We had a similar symptom many years back. There were 3 DNS servers configured for our AR System server. Over time, the first and second were retired, but the DNS configuration was not updated. So for every DNS lookup, the system had to wait for the first and second servers to time out before trying the third, and if the third was busy, everything just went to sleep while waiting for a response. We updated our DNS configuration and hosts file, and everything returned to normal.
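If you want to check for the stale-nameserver situation described above, a quick script can read the nameserver list and time one query against each entry, in the order a resolver normally tries them. This is a minimal Python 3.6+ sketch, not a BMC or Solaris tool. It assumes the resolver configuration lives in /etc/resolv.conf and that a reply to a single UDP query on port 53 is enough to tell a live server from a retired one. The helper names (`parse_nameservers`, `build_query`, `probe`), the query name, and the 2-second timeout are hypothetical choices for illustration. It does not check the hosts file.

```python
import socket
import struct
import time

RESOLV_CONF = "/etc/resolv.conf"  # assumption: standard resolver config location


def parse_nameservers(text):
    """Return nameserver addresses in the order they appear in resolv.conf."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        # Comment lines ("#..." or ";...") never have "nameserver" as first field.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(name, qid):
    """Build a minimal DNS query packet (type A, class IN, recursion desired)."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server, name="localhost", timeout=2.0):
    """Send one UDP query to `server`; return the response time in seconds, or None."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    qid = 0x4E52  # arbitrary query ID, checked against the reply
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_query(name, qid), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers socket.timeout and unreachable networks
            return None
        if reply[:2] != struct.pack("!H", qid):
            return None
        return time.monotonic() - start


def main(path=RESOLV_CONF, timeout=2.0):
    try:
        with open(path) as conf:
            servers = parse_nameservers(conf.read())
    except OSError as err:
        print(f"cannot read {path}: {err}")
        return
    for position, server in enumerate(servers, start=1):
        elapsed = probe(server, timeout=timeout)
        status = "NO RESPONSE" if elapsed is None else f"{elapsed * 1000:.1f} ms"
        print(f"#{position} {server}: {status}")


if __name__ == "__main__":
    main()
```

A "NO RESPONSE" entry ahead of a working server is the pattern Dennis describes: every lookup pays that server's full timeout before falling through to the next one.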
I suppose there might also be other resources besides DNS servers that could cause the same symptom. Our network guys sniffed the network to see what we were waiting on.

HTH,
Dennis

"ZHANG, ERIC L" <ezh...@entergy.com>
Sent by: "Action Request System discussion list(ARSList)" <arslist@ARSLIST.ORG>
01/20/2011 03:10 PM
Please respond to arslist@ARSLIST.ORG

To: arslist@ARSLIST.ORG
cc:
Subject: Strange ARS Timeout Problem

Hi Listers,

We are experiencing intermittent timeouts with ARS. The AR System returns to normal on its own after about 5 minutes, without any intervention. All users get timeouts (or an hourglass), but armonitor.log shows no process being restarted.

These are the messages in arerror.log:

Tue Jan 18 12:09:24 2011 Dispatch : Timeout during data retrieval due to busy server -- retry the operation (server_name) ARERR - 93
Tue Jan 18 12:10:04 2011 Approve : Timeout during database query -- consider using more specific search criteria to narrow the results, and retry the operation (ARERR 94)

The API log shows a 5-minute gap:

<API > <TID: 0000000004> <RPC ID: 0000000000> <Queue: Admin > <Client-RPC: 999999 > <USER: Remedy Application Service > /* Tue Jan 18 2011 12:06:16.2224 */-GLEWF OK
<API > <TID: 0000000004> <RPC ID: 0000000000> <Queue: Admin > <Client-RPC: 999999 > <USER: Remedy Application Service > /* Tue Jan 18 2011 12:11:16.0001 */+GLEWF ARGetListEntryWithFields -- schema OBJSTR:Class from Unidentified Client (protocol 12) at IP address

Our DBA was monitoring the database at the time and found little activity. The only activity in the SQL log during the timeout was for user AR_ESCALATOR, which means escalations were still running; the escalation log confirms this. While this happens, CPU and RAM utilization drop dramatically to their lowest levels on both the ARS server and the database server. There have been no application changes in the last couple of months.
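To find every occurrence of a gap like the one above rather than spotting it by eye, one could scan the API log for long silences between consecutive timestamps. This is a minimal Python sketch, assuming each API log line carries a `/* Tue Jan 18 2011 12:06:16.2224 */` style timestamp as in the excerpt, and treating the digits after the dot as a fraction of a second (an assumption about the format). The function names and the 60-second default threshold are hypothetical. Because the gap is measured across all threads, a hit means no API call was logged by any thread during that window.

```python
import re
import sys
from datetime import datetime, timedelta

# Matches the timestamp comment in API log lines, e.g. /* Tue Jan 18 2011 12:06:16.2224 */
STAMP = re.compile(r"/\* (\w{3} \w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2})\.(\d+) \*/")


def parse_stamp(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    match = STAMP.search(line)
    if match is None:
        return None
    # Collapse repeated spaces (e.g. a space-padded day) before parsing.
    whole = datetime.strptime(" ".join(match.group(1).split()), "%a %b %d %Y %H:%M:%S")
    # Assumption: the digits after the dot are a decimal fraction of a second.
    return whole + timedelta(seconds=float("0." + match.group(2)))


def find_gaps(lines, threshold_seconds=60.0):
    """Return (previous_line, next_line, gap_seconds) for every quiet period."""
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # skip continuation lines without a timestamp
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if gap >= threshold_seconds:
                gaps.append((prev_line, line, gap))
        prev_time, prev_line = stamp, line
    return gaps


def main(argv):
    if len(argv) < 2:
        print("usage: api_gaps.py API_LOG [THRESHOLD_SECONDS]")
        return
    threshold = float(argv[2]) if len(argv) > 2 else 60.0
    with open(argv[1], errors="replace") as log:
        for before, after, gap in find_gaps(log, threshold):
            print(f"{gap:.1f}s gap:\n  {before.rstrip()}\n  {after.rstrip()}")


if __name__ == "__main__":
    main(sys.argv)
```

Run against the two lines quoted above, it reports a single gap of about 299.8 seconds. Lining the gap windows up with the escalation and SQL logs can show whether only AR_ESCALATOR stayed active every time.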
The problem started about two weeks ago. It can occur up to 3 times a day, and sometimes the system runs fine for days without it occurring.

Our configuration/environment:
ARS: 7.1 patch 7
ITSM: 7.0.03 patch 9
SLM: 7.1 patch 2
SRM: 2.2 patch 4
Midtier: 7.6.03
ARS Server: Solaris 10 (16 GB of physical memory, 18 GB of swap, 8 CPUs); dedicated to ARServer, ITSM, SLM, and SRM
Midtier Server: Windows Server 2003 SP2 (2 CPUs, 2 GB of RAM); used only by customers to submit service requests
Database: Oracle 10gR2 (remote)

The following are the thread settings in ar.conf:
Private-RPC-Socket: 390601 2 6
Private-RPC-Socket: 390603 2 2
Private-RPC-Socket: 390620 16 24 (FAST)
Private-RPC-Socket: 390626 8 16
Private-RPC-Socket: 390627 2 12
Private-RPC-Socket: 390635 24 30 (LIST)
Private-RPC-Socket: 390680 24 24
Private-RPC-Socket: 390693 2 4
Private-RPC-Socket: 390698 2 4

We have about 300 concurrent Remedy users during peak hours. ARServer runs as a non-root process. The number of open file descriptors for arserverd (~700) was well below the ulimit of 3072. The FAST and LIST threads never reached their maximums.

I have an open ticket with BMC Support but thought I might get a solution more quickly from the ARSlist.

Thanks,
Eric

_attend WWRUG11 www.wwrug.com ARSlist: "Where the Answers Are"_
_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
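The thread settings in Eric's post appear to follow the pattern `Private-RPC-Socket: <RPC program number> <min threads> <max threads>`, with "(FAST)" and "(LIST)" added as his own annotations. A quick tally of those lines shows how many threads the listed queues can start and grow to, which is useful context next to the ~300 concurrent users. This is a minimal Python sketch under that format assumption; `AR_CONF_SAMPLE`, `thread_settings`, and `totals` are hypothetical names, and only lines matching this pattern are counted.

```python
import re

# Hypothetical input: the Private-RPC-Socket lines quoted in the post.
# The "(FAST)"/"(LIST)" labels are the poster's annotations and are ignored.
AR_CONF_SAMPLE = """\
Private-RPC-Socket: 390601 2 6
Private-RPC-Socket: 390603 2 2
Private-RPC-Socket: 390620 16 24 (FAST)
Private-RPC-Socket: 390626 8 16
Private-RPC-Socket: 390627 2 12
Private-RPC-Socket: 390635 24 30 (LIST)
Private-RPC-Socket: 390680 24 24
Private-RPC-Socket: 390693 2 4
Private-RPC-Socket: 390698 2 4
"""

# Assumed format: RPC program number, minimum threads, maximum threads.
LINE = re.compile(r"^\s*Private-RPC-Socket:\s*(\d+)\s+(\d+)\s+(\d+)")


def thread_settings(text):
    """Map RPC program number -> (min_threads, max_threads)."""
    settings = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            rpc, low, high = (int(g) for g in match.groups())
            settings[rpc] = (low, high)
    return settings


def totals(settings):
    """Return (sum of minimum threads, sum of maximum threads)."""
    return (sum(low for low, _ in settings.values()),
            sum(high for _, high in settings.values()))


if __name__ == "__main__":
    settings = thread_settings(AR_CONF_SAMPLE)
    for rpc, (low, high) in sorted(settings.items()):
        print(f"{rpc}: min {low}, max {high}")
    low, high = totals(settings)
    print(f"total: min {low}, max {high} threads across the listed queues")
```

For the settings above, the tally comes to 82 threads at minimum and 122 at maximum. Since the FAST and LIST queues never reached their maximums, thread exhaustion looks unlikely, which fits the idea that the server was waiting on something external.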