Re: Strange ARS Timeout Problem

Axton Thu, 27 Jan 2011 14:33:17 -0800

I'm not following how you got to broken db connections.  If arerror.log does
not show the sql connection dropped, it didn't.  Oracle connections are
stateful, meaning that if that link drops, that session is dead.  If
arerror.log doesn't indicate broken sessions to the db, chances are things
are good there.
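
For what it's worth, the gap check on the API log can be automated rather than eyeballed. Below is a minimal Python sketch, assuming the timestamp layout shown in the API log excerpt quoted further down (/* Tue Jan 18 2011 12:06:16.2224 */). The function name find_gaps and the 60-second default threshold are hypothetical choices for illustration, not anything shipped with AR System, and other log types may use a different timestamp layout.

```python
import re
from datetime import datetime

# Matches the timestamp comment seen in the quoted API log lines, e.g.
# /* Tue Jan 18 2011 12:06:16.2224 */
STAMP = re.compile(r"/\* (\w{3} \w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}\.\d+) \*/")


def find_gaps(lines, threshold_seconds=60.0):
    """Return (previous, current, gap_seconds) for each pause above the threshold.

    Hypothetical helper: lines without a recognizable timestamp are skipped.
    """
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.search(line)
        if not match:
            continue
        current = datetime.strptime(match.group(1), "%a %b %d %Y %H:%M:%S.%f")
        if previous is not None:
            gap = (current - previous).total_seconds()
            if gap > threshold_seconds:
                gaps.append((previous, current, gap))
        previous = current
    return gaps
```

Passing an open file handle for the API log as lines should list every quiet period. Running the same check on the SQL and escalation logs (adjusting the pattern if their timestamps differ) would show whether those logs were quiet over the same window.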


On Thu, Jan 27, 2011 at 4:26 PM, ZHANG, ERIC L <ezh...@entergy.com> wrote:

>
> Good idea.  I just put a cron job on the ars server that runs traceroute
> <db_server> every minute and appends the output to an output file. Waiting
> for the next timeout.
>
>
>
> -----Original Message-----
> *From:* LJ LongWing [mailto:lj.longw...@gmail.com]
> *Sent:* Thursday, January 27, 2011 9:18 AM
>
> *Subject:* Re: Strange ARS Timeout Problem
>
>
>
> Ok... I just completely re-read the original post. All indications save one
> are that during that 5-minute interval the application server lost
> connectivity with the DB server. The only exception appears to be the
> escalation thread, which continued processing during that 5-minute window.
> So what I would do is set up a cron job to run every 30 seconds or every
> minute, something along those lines, that issues a traceroute between your
> Remedy server and your DB server. My primary thought is that you are losing
> network connectivity, even though the escalation server is still working.
> It's at least something you can try and report back.
>
>
>
> *From:* Action Request System discussion list(ARSList) [mailto:
> arslist@ARSLIST.ORG] *On Behalf Of *ZHANG, ERIC L
> *Sent:* Wednesday, January 26, 2011 7:19 PM
>
> *To:* arslist@ARSLIST.ORG
>
> *Subject:* Re: Strange ARS Timeout Problem
>
>
>
>
> Yes, I did initial log analysis. As I said in the original posting, there
> was a 5-minute gap in the api log, while the sql log and escalation log
> showed no gap, waiting, error, or long operation. All the sql queries in
> the sql log were for user AR_ESCALATOR.
>
>
>
>
>
> -----Original Message-----
> *From:* Axton [mailto:axton.gr...@gmail.com]
> *Sent:* Wednesday, January 26, 2011 8:18 AM
> *Subject:* Re: Strange ARS Timeout Problem
>
>
>
> What do the logs say?  I haven't seen that you've done analysis with
> the logs.  Is there a gap in time in the logs (indicating the server was not
> doing anything)?  Is there a gap in time in the logs (indicating a long
> operation was running)?
>
> On Tue, Jan 25, 2011 at 5:49 PM, ZHANG, ERIC L <ezh...@entergy.com> wrote:
>
>
> We have sent BMC tech support all the logs, including api, filter, sql,
> escalation, thread, plug-in, and arfork, and even pstack output taken
> during the hang, and so far they haven't been able to identify the cause
> of the problem.
>
>
>
> -----Original Message-----
> *From:* Axton [mailto:axton.gr...@gmail.com]
> *Sent:* Monday, January 24, 2011 5:45 PM
> *Subject:* Re: Strange ARS Timeout Problem
>
>
>
> Try to get the api, filter, and sql logs leading up to the point where
> it started hanging.  Those are your best indicator.  Also check the
> arerror.log for crashes.
>
>
>
> There are things that can cause behavior like this that the logs will
> indicate.  For example, try creating a computed group during production
> operations, or importing a deployable application.
>
> On Thu, Jan 20, 2011 at 3:10 PM, ZHANG, ERIC L <ezh...@entergy.com> wrote:
>
>
> Hi Listers.
>
>
>
> We are experiencing intermittent timeouts with ARS. Without me doing
> anything, the AR System returns to normal after about 5 minutes. All
> users get timeouts (or an hourglass), but armonitor.log shows no process
> being restarted.
>
>
>
> This is the message showing in arerror.log:
>
>
>
> Tue Jan 18 12:09:24 2011  Dispatch : Timeout during data retrieval due to
> busy server -- retry the operation (server_name)  ARERR - 93
>
> Tue Jan 18 12:10:04 2011  Approve : Timeout during database query --
> consider using more specific search criteria to narrow the results, and
> retry the operation (ARERR 94)
>
>
>
> In the API log, it shows a 5-minute gap:
>
>
>
> <API > <TID: 0000000004> <RPC ID: 0000000000> <Queue: Admin     >
> <Client-RPC: 999999   > <USER: Remedy Application Service
> > /* Tue Jan 18 2011 12:06:16.2224 */-GLEWF            OK
>
> <API > <TID: 0000000004> <RPC ID: 0000000000> <Queue: Admin     >
> <Client-RPC: 999999   > <USER: Remedy Application Service
> > /* Tue Jan 18 2011 12:11:16.0001 */+GLEWF  ARGetListEntryWithFields --
> schema OBJSTR:Class from Unidentified Client (protocol 12) at IP address
>
>
>
> Our DBA was monitoring the database at the time and found little activity
> in the database. The activity shown in the SQL log during the timeout was
> all for user AR_ESCALATOR, which means escalations were still running at
> the time. This can also be verified from the escalation log.
>
>
>
> When this occurs, CPU and RAM utilization drop dramatically to their
> lowest levels on both the ARS server and the database server. There was
> no application change in the last couple of months. The problem started
> about two weeks ago. It can occur 3 times a day, and sometimes the system
> runs fine for days without it occurring.
>
>
>
> Our configuration/environment:
>
>
>
> ARS: 7.1 patch 7
>
> ITSM: 7.0.03 patch 9
>
> SLM: 7.1 patch 2
>
> SRM: 2.2 patch 4
>
> Midtier: 7.6.03
>
>
>
> ARS Server: Solaris 10 (16 GB of Physical Memory, 18 GB of SWAP, 8 CPUs) –
> Dedicated to ARServer, ITSM, SLM, and SRM.
>
> Midtier Server: Windows Server 2003 SP2 (2 CPUs, 2 GB of RAM) – Used only
> by customers to submit service requests.
>
> Database: Oracle: 10gR2 (remote)
>
>
>
> The following are the thread settings in ar.conf:
>
>
>
> Private-RPC-Socket:  390601   2   6
>
> Private-RPC-Socket:  390603   2   2
>
> Private-RPC-Socket:  390620  16  24  (FAST)
>
> Private-RPC-Socket:  390626   8  16
>
> Private-RPC-Socket:  390627   2  12
>
> Private-RPC-Socket:  390635  24  30  (LIST)
>
> Private-RPC-Socket:  390680  24  24
>
> Private-RPC-Socket:  390693   2   4
>
> Private-RPC-Socket:  390698   2   4
>
>
>
> We have about 300 concurrent Remedy users during peak hours. ARServer
> is running as a non-root process. The number of open file descriptors for
> arserverd (~700) was well below the ulimit of 3072.  The FAST and LIST
> threads never reached their maximums.
>
>
>
> I have an open ticket with BMC Support but thought I might get a solution
> quicker from the Arslist here.
>
>
>
> Thanks,
>
> Eric
>
>
>
> _attend WWRUG11 www.wwrug.com ARSlist: "Where the Answers Are"_
>
>

_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org