This one was a proverbial humdinger to fix.  It's an interesting
problem, though, so I'll walk back through it.
 
Server: ARS 7.1 on Solaris with Incident Management 7.03, SLM 7.1, and
the normal ancillary apps such as AE, EE, MT, etc.
4 MT servers on Solaris with WebSphere/IBM HTTP Server
 
The original problem: Users were experiencing a "Unique index violation"
error when saving an incident.  The only failure noted in any of the
logs, client- or server-side, was an API -CE entry that said "Fail".
 
Lesson one: The one I have to re-learn every couple of years - always
ask your users, "Were there any other error messages you received?"
 
Users will always tell you the LAST error message they got, not
necessarily the first.  In this case the answer was "Yes - I got an RPC
timeout message first".
 
I didn't get that piece of information until day 3 of this problem.
 
Lesson two: Sometimes an error is correct.
 
In my case the unique index error WAS correct.  The users were hitting
"Save" and then getting an RPC timeout after two minutes or so of not
receiving a response from the AR Server.  In the meantime the AR Server
was receiving the Incident and saving it.  The user - who had only seen
an RPC timeout message - was still in "new" mode in the system and hit
Save again, which correctly generated the unique index violation
message.  The InstanceId (field 179) would cause this error - and so
would the Incident Number.  Both were duplicated in the second save
attempt, and Oracle said "NO."
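
This is the classic timeout-on-create race.  If you want to picture the
guard a client would need, here's a minimal sketch in Python - the
create_incident()/find_incident_by_key() calls and the client-generated
key are hypothetical stand-ins for illustration, not the AR System API:

# Hypothetical client-side guard against the timeout-on-create race.
# create_incident(), find_incident_by_key(), and client_key are
# illustrative names, not real AR System calls.
def save_with_timeout_guard(client, incident, client_key):
    try:
        return client.create_incident(incident, key=client_key)
    except TimeoutError:
        # A timeout only means *we* never heard back; the server may have
        # committed the record anyway (exactly what happened here).
        existing = client.find_incident_by_key(client_key)
        if existing is not None:
            return existing  # the first save landed - do not resubmit
        return client.create_incident(incident, key=client_key)

Without a guard like that, the second Save is a brand-new create and the
unique index does its job.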
 
Lesson three:  If you are using Oracle remotely with Remedy, it is
imperative that you switch to the in-row CLOB storage option.
 
Once I figured out #2 above, I had to deal with the performance issue.
We had previously had major performance problems on our QA server.  The
symptoms were a <long string of expletives goes here> to deal with - all
you saw in any of the log files was a huge gap while Oracle was doing
transactions.  Literally nothing was going on.
 
When we built our PRD system we switched to the in-row option immediately
after installing the AR Server - then we installed the apps.  The apps
were nice enough to "correct" our setting back to out-of-row, and we
didn't notice until this happened.  We ended up doing an emergency
restructuring of the database once I was able to show, again from the
logs, that there were long, long gaps (like 10 seconds total per
transaction) in the Submit/Modify processes.
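
If you want to see whether an install (or an app upgrade) has quietly
flipped your CLOBs back to out-of-row, the Oracle data dictionary will
tell you.  A rough sketch, assuming cx_Oracle and a connection as the AR
System schema owner (the credentials/DSN are placeholders) - it only
reports the out-of-row columns and prints candidate ALTER statements; do
the actual restructuring per BMC's guidance, in a maintenance window, and
rebuild any indexes the MOVE leaves unusable:

import cx_Oracle

# Placeholders - point this at your own AR System schema owner.
conn = cx_Oracle.connect("aradmin", "password", "dbhost/ARS")
cur = conn.cursor()

# USER_LOBS.IN_ROW = 'NO' means the CLOB is stored out-of-row, so every
# read of that column is an extra round trip - painful with a remote DB.
cur.execute("""
    SELECT table_name, column_name
      FROM user_lobs
     WHERE in_row = 'NO'
     ORDER BY table_name, column_name
""")

for table, column in cur:
    print(f"ALTER TABLE {table} MOVE LOB ({column}) "
          f"STORE AS (ENABLE STORAGE IN ROW);")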
 
Lesson four: All of that improved the problem but still didn't fix it.
After much more log reading, and after noticing very small gaps (< 50 ms)
between API transactions, I upped the fast/list queue thread counts by
25%, to 16 each.  We run 3,000 incidents a day through this system with
several hundred users.
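
If you want a quick way to spot gaps like that without scrolling through
thousands of log lines, something like this works - a minimal sketch; the
timestamp regex/format is an assumption, so adjust it to whatever your
API log lines actually look like:

import re
from datetime import datetime

# Assumed timestamp format - change the regex and strptime format string
# to match your own AR API log before trusting the output.
TS = re.compile(r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3})")

def report_gaps(path, threshold_ms=50):
    """Print log entries preceded by a gap longer than threshold_ms."""
    prev = None
    with open(path) as log:
        for line in log:
            m = TS.search(line)
            if not m:
                continue
            ts = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S.%f")
            if prev is not None:
                gap_ms = (ts - prev).total_seconds() * 1000
                if gap_ms > threshold_ms:
                    print(f"{gap_ms:8.1f} ms gap before: {line.rstrip()}")
            prev = ts

report_gaps("arapi.log")   # hypothetical log file name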
 
Voila.  Problem fixed.
 
Must............enjoy..................weekend.....................
 
William Rentfrow, Principal Consultant
[EMAIL PROTECTED]
C 701-306-6157
O 952-432-0227
 
 
