John,

You say you captured an SQL log. What about a combined API/Filter/SQL log? I would be interested to see whether anything else is going on while the CPU is busy. 100% CPU is abnormal but not unheard of by any means; whether it is appropriate depends on what your system is doing. Because your system is 100% custom, you could have accidentally put something into some sort of loop. What are your stack and max filter settings?
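If it helps, here is a rough sketch of one way to look for a loop in a captured log: count which entries repeat most often. It assumes each log line starts with angle-bracket tags followed by a /* timestamp */ comment and then the statement text; that layout is an assumption, not something confirmed in this thread, and the function name top_repeats is made up for illustration. Adjust the patterns to whatever your log actually looks like.

```python
import re
from collections import Counter

# Assumed line layout (not confirmed by this thread):
#   <SQL > <TID: ...> ... /* timestamp */ statement text
_TAGS = re.compile(r"^(?:<[^>]*>\s*)+")   # leading <...> tags
_COMMENT = re.compile(r"/\*.*?\*/")       # /* timestamp */ comments


def top_repeats(lines, n=10):
    """Return the n most frequent entries after stripping tags and timestamps."""
    counts = Counter()
    for line in lines:
        text = _TAGS.sub("", line.strip())
        text = _COMMENT.sub("", text).strip()
        if text:
            counts[text] += 1
    return counts.most_common(n)
```

Passing an open file handle for arsql.log (or a combined log) as `lines` returns (entry, count) pairs. An entry that shows up thousands of times within a minute would be a reasonable place to start looking, though a high count alone does not prove a loop.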
-----Original Message-----
From: Action Request System discussion list(ARSList) [mailto:arslist@ARSLIST.ORG] On Behalf Of Reiser, John J
Sent: Monday, June 27, 2011 3:27 PM
To: arslist@ARSLIST.ORG
Subject: arserver.exe is consuming 100% cpu - possible DB corruption? (Long Post)

Hello Listers,

ARS 7.6.03
MS 2003 Enterprise
MS SQL 2005 (remote)
Total home grown system. No OOTB modules.

I have a real stumper here. It even has BMC scratching their heads. I have a production system that is experiencing CPU overload: arserver.exe runs up to 99% in the process list and sits there. The ARSystem server is a virtual machine.

We thought maybe it was an MS "Patch Tuesday" issue, so we removed the 10 most recent MS patches one at a time and restarted the machine each time. The problem still appears after the arserver service starts, sometimes immediately, and sometimes it will sit for 1 to 20 minutes before it starts to hog the CPUs.

To eliminate any other OS and file system issues, we grabbed a two-week-old backup image of the server and restored it. The system came back OK for a short while and then started to lock up the CPU again.

Working with BMC, I turned the logs on and restarted. We saw the system jump to 100% within a minute and captured a 10MB arsql.log file. I can force the overload at any time by firing filter workflow with a notification action in it. I disabled this one filter, but the system still loaded up. I added a Filter at execution order 0 whose only action was a Goto 1000, to jump past all Filter actions that fire on a change of the Status field in question. Still no joy. I've disabled every piece of Notify workflow. That worked the best and kept the system alive for longer stretches, but we can't run a system that way.

I've come to the realization that there may be corrupted information in the DB object tables, and I wanted to get some feedback. Using rrrChive I can pull a copy of every form's data since, say, two weeks ago, and then have the DBA restore the entire system from that date.
After the restore I would use rrrChive to reload the two weeks of data ('Modified date' > "06/11/2011") and hope for the best. Any workflow that was changed in the last two weeks is negligible and could be recreated or updated as needed.

Do you think this is a viable solution?

When I asked the BMC tech whether I could dump the T, H & B tables, restore the DB, and reload the T, H & B tables, he reminded me that arschema and the other meta tables would probably be out of sync. That's when I thought of using rrrChive.

Sorry to be so long-winded, but I need to get this back online. BMC can't find anything in the logs, and I don't want to lose the tickets we've taken in the last week.

---
John J. Reiser
Remedy Developer/Administrator
Senior Software Development Analyst
Lockheed Martin - MS2

The star that burns twice as bright burns half as long.
Pay close attention and be illuminated by its brilliance.
- paraphrased by me

_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
attend wwrug11 www.wwrug.com ARSList: "Where the Answers Are"