PS. The 91 is a red herring. It's the Sig 11 (SEGV) you need to worry about. The 91 is another process not being able to communicate with the arserverd process. Cheers Ben
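If the truss log is long, the Sig 11 records can be pulled out with a few lines of script. What follows is only a minimal sketch in Python: it assumes each truss record sits on its own line and looks like the fault, siginfo, and signal lines quoted in the original post. The function name `summarize_truss` and the regular expressions are made up for illustration and are not part of truss or ARS.

```python
import re

# truss -f prints "pid/lwp:"; the output quoted in this thread shows only
# "/11:", i.e. LWP 11, so the pid part is treated as optional (assumption).
PREFIX = r"^(?:\d+)?/(?P<lwp>\d+):\s+"

FAULT_RE = re.compile(
    PREFIX + r"Incurred fault #(?P<num>\d+), (?P<name>\w+)\s+%pc = (?P<pc>0x[0-9A-Fa-f]+)"
)
SIGINFO_RE = re.compile(
    PREFIX + r"siginfo: (?P<sig>SIG\w+) (?P<code>\w+) addr=(?P<addr>0x[0-9A-Fa-f]+)"
)
SIGNAL_RE = re.compile(PREFIX + r"Received signal #(?P<num>\d+), (?P<sig>SIG\w+)")

PATTERNS = (("fault", FAULT_RE), ("siginfo", SIGINFO_RE), ("signal", SIGNAL_RE))


def summarize_truss(lines):
    """Return the fault/siginfo/signal records found in truss output, in order."""
    events = []
    for raw in lines:
        line = raw.strip()
        for kind, pattern in PATTERNS:
            match = pattern.match(line)
            if match:
                record = {"kind": kind}
                record.update(match.groupdict())  # values stay as strings
                events.append(record)
                break
    return events


# Sample records copied from the truss output in the original post.
SAMPLE = [
    r'/11: read(54, "\0F7\0\006\0\0\0\0\01017".., 2064) = 247',
    "/11: Incurred fault #6, FLTBOUNDS %pc = 0xFE6A3558",
    "/11: siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C",
    "/11: Received signal #11, SIGSEGV [caught]",
    "/11: siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C",
]

if __name__ == "__main__":
    for event in summarize_truss(SAMPLE):
        print(event)
```

In the records this returns, `pc` is the program counter at the time of the fault and `addr` is the address the thread tried to touch. SEGV_MAPERR means that address is not mapped into the process at all, which fits the description of a segmentation violation in the message below.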
_____
From: Ben Chernys [mailto:ben.cher...@softwaretoolhouse.com]
Sent: November 13, 2009 8:42 PM
To: 'arslist@ARSLIST.ORG'
Subject: RE: Prod server down - services will not stay up

The signal 11 is bad code - simple as that. It's a "segmentation violation", which means that the server (arserverd) attempted to read or write an address not allocated to its virtual address space. It can also be caused by a double free, or by two pointers to one block that has already been freed. In any event, you cannot fix this without the ARS source code, which I expect you would find hard to get.

That being said, the easiest way to determine (and then circumvent) these types of things is to turn on SQL logging on the server before the system starts (through the ar.conf file). The exact settings are in the configuring ARS guide. Then, when the blow-up happens, see what the server was attempting to do. You can usually spot possible internal database inconsistencies (in the ARS meta-data) this way and then repair them manually through SQL before the ARS start-up.

Additionally, there may be patches available that address the problem.

Cheers
Ben Chernys

_____
From: Action Request System discussion list(ARSList) [mailto:arsl...@arslist.org] On Behalf Of Susan Palmer
Sent: November 13, 2009 8:30 PM
To: arslist@ARSLIST.ORG
Subject: Prod server down - services will not stay up

Help !!

Working with support but could use anyone else's input. I'm at WWRUG so it's somewhat limiting.

We did a truss log and when the services drop (arerror 91) we see the following:

167
/11: read(54, "\0FE\0\006\0\0\0\0\01017".., 2064) = 254
/11: write(54, "\0A1\0\006\0\0\0\0\003 ^".., 161) = 161
/11: read(54, "\0F7\0\006\0\0\0\0\01017".., 2064) = 247
/11: Incurred fault #6, FLTBOUNDS %pc = 0xFE6A3558
/11: siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C
/11: Received signal #11, SIGSEGV [caught]
/11: siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C

The services do restart automatically, so armonitor is doing its job.
We've commented out everything from armonitor but the arserverd command. We stay up for between 2 and 10 minutes and then wham, we're down again. Obviously this just started this morning.

unix sun solaris 10
oracle 10g
ars 7.0.1P2

They did expand the database size last night, if that has any bearing. But we can connect to the database successfully when ar is down. Nothing else helpful in arerror.log, only the 91 error.

I'm at the Hardrock hotel, call room 30601 if you have questions or can help!

Thanks,
Susan

_Platinum Sponsor: rmisoluti...@verizon.net ARSlist: "Where the Answers Are"_