Re: [AOLSERVER] Diagnosing an AOLserver performance problem
Janine Sisk wrote:
> Unfortunately I have no idea; they process their own stats and so far have not shared (I have asked but they keep forgetting).
> ...
> With just those changes I'm still seeing the nsd process consume 50%+ of the CPU from time to time but it looks like it's happening for a shorter period of time most of the time, so that is at least an improvement.

In my experience, you can obtain much more improvement by looking for bottlenecks in the application than by tuning a few parameters in the AOLserver configuration. So far, I have not seen any case of "slow" behavior like the one you are describing that was attributable to AOLserver. Your problem is most likely more an "application performance problem" than an "AOLserver performance problem".

If an application has been running for several years, it is not unlikely that some tables have grown quite large, such that missing indices and sequential scans have become a problem over the years. Furthermore, many OpenACS users don't care much about the performance of scheduled jobs, but these can lead to exactly the behavior you are observing.

I would recommend monitoring the behavior more closely, e.g. by using the xotcl request monitor and the AOLserver statistics:
http://aolserver.cvs.sourceforge.net/viewvc/aolserver/aolserver/examples/config/stats.tcl?view=markup

Since you seem to run quite an old version of OpenACS, another option I would consider is to upgrade OpenACS. We have fixed many performance problems in the base framework over the last few years.

-gustaf neumann

--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.
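One way to act on the point about scheduled jobs is to log how long each job takes, so that slow ones show up in the server log. The following is a minimal sketch in plain Tcl (8.5 or later assumed). The proc name `timed_job` and the 500 ms threshold are invented for illustration and are not part of AOLserver or OpenACS; `ns_log` is used only when it is available, otherwise the message goes to stdout.

```tcl
# Hypothetical helper: run a script and report its wall-clock duration.
# "timed_job" and the default threshold are illustrative names/values.
proc timed_job {label script {warn_ms 500}} {
    set start [clock milliseconds]
    # Run the job in the caller's scope; capture result and return options.
    catch {uplevel 1 $script} result options
    set elapsed [expr {[clock milliseconds] - $start}]

    # Slow jobs are logged at Warning level, the rest at Notice.
    set level [expr {$elapsed >= $warn_ms ? "Warning" : "Notice"}]
    set msg "job '$label' took $elapsed ms"
    if {[llength [info commands ns_log]]} {
        ns_log $level $msg
    } else {
        puts "$level: $msg"
    }
    # Pass the result (or the error) through unchanged.
    return -options $options $result
}

# Usage: wrap the body of a scheduled proc.
timed_job "nightly-cleanup" {
    after 20   ;# stand-in for real work
}
```

The wrapper only measures elapsed time; finding out why a job is slow (for example a sequential scan on a large table) still needs the database's own tools.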
[AOLSERVER] Question on two AOLserver tickets
Hi,

I'm trying to debug an AOLserver crash and the point of crash seems to be AppendConn in Ns_GetProcInfo... I will post the stack trace after just for reference.

Looking through the ticket tracker on AOLserver, I found two tickets of particular interest:

http://dev.aolserver.com/trac/ticket/325
--> My question with this ticket is: was it ever resolved?

and the second ticket:

http://dev.aolserver.com/trac/ticket/152
--> This problem should only happen if the command ns_server was called in a multi-threaded environment, right?

Here is the call stack trace I'm working with. I'm more interested in Ticket #325 as it may be related to my problem.

- Call Stack Trace -
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)

kpedbg_dmp_stack()+ call B5B81884 B45FFB74 ? 0 ? 219
kpeDbgCrash()+72 call B5B75E14 0 ? 5 ? 0 ? 80BD810 ? B45FFC08 ? B45FFBF0 ?
kpeDbgSignalHandler call B5B867B4 0 ? 5 ? B72A331C ? 2 ? 4 ? ()+107 5F ? 4 ? B4600C5D ?
skgesig_sigactionHa call B45FFC54 ? B739FFE0 ? ndler()+214
gsignal()+71 signal 6 ? B460110C ? B460118C ?
abort()+265 call gsignal()6 ? B460152C ? 0 ? B7FC1E84 ? B4601550 ? B4601564 ?
NsBlockSignals() call B7F749F0 3 ? B7FB9ED5 ? B ? 30 ? 46 ? B7F565F0 ?
B7FC2420 call B ? 33 ? 0 ? 7B ? 7B ? C ?
AppendConn()+117 call B7F74E20 B4601AE8 ? C ? 51C5 ? 0 ? 1 ? B7E46FF4 ?
NsConnArgProc()+61 call AppendConn() B4601AE8 ? 80B0C1C ? B7FB51A2 ? ? 228E24D8 ? 0 ?
Ns_GetProcInfo()+16 call B4601AE8 ? CD298C0 ? 1 B4601A28 ? B7F33C33 ? B4DF4EA1 ? B7E46BA0 ?
ThreadArgProc()+43 call B7F74410 B4601AE8 ? B7F8E9B6 ? CD298C0 ? B7F6337C ? CCF7A20 ?
Ns_ThreadList()+207 call B4601AE8 ? B7F8E9B6 ? CD298C0 ? 0 ? 4A0935D9 ? B7FBB174 ?
NsTclInfoObjCmd()+5 call B7F73B30 B4601AE8 ? B7F8917B ? 46 B7FBC080 ? B7FB34D3 ? 0 ? B4601AE4 ?
TclEvalObjvInternal call EF0B1C0 ? CE907D0 ? 2 ? ()+819 EC701D8 ? B304D010 ? A7DBAE50 ?
TclExecuteByteCode( call _init()+184 CE907D0 ? 2 ? EC701D8 ? 0 ? )+107130 ? 0 ?
TclCompEvalObj()+15 call TclExecuteByteCode( CE907D0 ? 0 ? 0 ? 0 ? 2 )B4602924 ? 34ECE ?
TclObjInterpProc()+ call B7EBE8E0 CE907D0 ? ABF19440 ? 645120C4660 ? 1 ? B7F565F0 ? 18 ?
TclEvalObjvInternal call ABF78CE8 ? CE907D0 ? 1 ? ()+819 EC701D4 ? B7F565F0 ? A7DBB540 ?
TclExecuteByteCode( call _init()+184 CE907D0 ? 1 ? EC701D4 ? 0 ? )+107130 ? 0 ?
TclCompEvalObj()+15 call TclExecuteByteCode( CE907D0 ? 3 ? 3 ? B7F565F0 ? 2 )B4602924 ? 34EC2 ?
TclObjInterpProc()+ call B7EBE8E0 CE907D0 ? ABF19320 ? 645120C4260 ? 1 ? 100 ? 100 ?
TclEvalObjvInternal call ABF76E28 ? CE907D0 ? 2 ? ()+819 EC701CC ? B7F565F0 ? A7DBAE50 ?
TclExecuteByteCode( call _init()+184 CE907D0 ? 2 ? EC701CC ? 0 ? )+107130 ? 0 ?
TclCompEvalObj()+15 call TclExecuteByteCode( CE907D0 ? 0 ? B7F2F0AB ? 2 )B7F565F0 ? A7DB7010 ? 87D6 ?
TclObjInterpProc()+ call B7EBE8E0 CE907D0 ? ABF19158 ? 645120C4260
[AOLSERVER] ns_cache key size
Digging deep into the ACS session properties, I discovered something disturbing: even though ns_cache holds as large a key as you want to give it, it will only uniquely identify according to the beginning of the key when setting. In other words, the following code:

    ns_log Notice "Sec properties: [ns_cache names util_memoize "sec_*"]"
    ns_cache set util_memoize $script [list [ns_time] $value]
    ns_log Notice "Sec properties: [ns_cache names util_memoize "sec_*"]"

executed where $script evaluates to "sec_lookup_property 2304 general-editor upload_content" ($value is irrelevant) shows this in my log:

    [14/May/2009:10:52:57][20497.4201664][-conn:bridge::2] Notice: Sec properties: {sec_lookup_property 2304 general-editor upload_file}
    [14/May/2009:10:52:57][20497.4201664][-conn:bridge::2] Notice: Sec properties: {sec_lookup_property 2304 general-editor upload_content}

So an entry that was there under a different key has disappeared and is now replaced by an entry with the new key. My only conclusion can be that ns_cache set assumed the two entries had the same key, although curiously it replaced the full value of the old key with the full value of the new key. Also curious is that "ns_cache get" can still tell the difference between the two keys.

I found that Ns_CacheCreate takes in a parameter that specifies the size of the key, and it seems to use the value of TCL_STRING_KEYS, though I don't know where that value is set. Is this the culprit, and if so how can I change this? If this is not the culprit, what is?

Titi Ala'ilima
Lead Architect
MedTouch, LLC
1100 Massachusetts Avenue
Cambridge, MA 02138
617.621.8670 x309
Fax: 888.770.5082
Re: [AOLSERVER] Diagnosing an AOLserver performance problem
I think you are right, Gustaf - I've given my client a choice of which way to go from here, one of which is to start fixing the code.

janine

On May 14, 2009, at 12:22 AM, Gustaf Neumann wrote:
> Your problem is most likely more an "application performance problem"
> than an "AOLserver performance problem".

---
Janine Sisk
President/CEO of furfly, LLC
503-693-6407
Re: [AOLSERVER] ns_cache key size
Grrr... after broadening my search I discovered that the "ns_cache set" was wiping out all the other entries as well, so it was actually an issue of the size limitation for the cache itself. I think I shall switch to using a separate time-limited cache for this purpose instead. Sorry for wasting your time.

Titi Ala'ilima
Lead Architect
MedTouch, LLC
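The effect described here (one "ns_cache set" making other entries disappear) is what a size-bounded cache does when a new value does not fit: older entries are evicted until the total is back under the limit. Below is a toy model in plain Tcl (8.5 or later assumed). It is not the ns_cache implementation; the proc names, the way sizes are counted (key length plus value length) and the oldest-first eviction order are all assumptions made for illustration.

```tcl
# Toy size-bounded cache (NOT ns_cache itself). Entries live in a dict,
# which keeps insertion order, so the first key is the oldest one.
proc cache_new {maxbytes} {
    return [dict create max $maxbytes used 0 entries [dict create]]
}

proc cache_set {cacheVar key value} {
    upvar 1 $cacheVar c
    # Replacing an existing key first releases the space it used.
    if {[dict exists $c entries $key]} {
        set old [dict get $c entries $key]
        dict incr c used -[expr {[string length $key] + [string length $old]}]
        dict unset c entries $key
    }
    set need [expr {[string length $key] + [string length $value]}]
    # Evict the oldest entries until the new one fits (or nothing is left).
    while {[dict get $c used] + $need > [dict get $c max]
           && [dict size [dict get $c entries]] > 0} {
        set victim [lindex [dict keys [dict get $c entries]] 0]
        set vval [dict get $c entries $victim]
        dict incr c used -[expr {[string length $victim] + [string length $vval]}]
        dict unset c entries $victim
    }
    dict set c entries $key $value
    dict incr c used $need
}

proc cache_names {cacheVar} {
    upvar 1 $cacheVar c
    return [dict keys [dict get $c entries]]
}

# Usage: a 100-byte limit cannot hold both entries (53 + 56 bytes),
# so setting the second one evicts the first.
set c [cache_new 100]
cache_set c "sec_lookup_property 2304 general-editor upload_file" "v1"
cache_set c "sec_lookup_property 2304 general-editor upload_content" "v2"
puts [cache_names c]   ;# only the upload_content key remains
```

In this model the two keys are never confused with each other; the older entry vanishes purely because of the size limit, which matches the conclusion above.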
Re: [AOLSERVER] Question on two AOLserver tickets
Hi,

Do you have some sort of background job that calls "ns_server active" (or similar) regularly? That could lead to random crashes. The description in http://dev.aolserver.com/trac/ticket/152 is accurate: the code, by design, is not strictly safe, as it's assumed to be used only interactively and occasionally as part of debugging and performance monitoring/optimization.

To make it safe would require adding mutex locks around areas that are assumed read-only and/or single-threaded, which could possibly lead to lock contention. I can't say if those assumptions have ever been proven true or false, but that was my thinking when the code was first written.

-Jim

On May 14, 2009, at 4:16 AM, Sep Ng wrote:
> http://dev.aolserver.com/trac/ticket/152
> --> This problem should only happen if the command ns_server was
> called in a multi-threaded environment, right?
Re: [AOLSERVER] Question on two AOLserver tickets
Ironically, we have some monitoring code that does use that functionality.

So our monitoring is killing our servers. Nice!

I'm removing that code now.

Jade Rubick
Director of Development
TRUiST
120 Wall Street, 4th Floor
New York, NY USA
jrub...@truist.com
+1 503 285 4963
+1 707 671 1333 fax
www.truist.com

The information contained in this email/document is confidential and may be legally privileged. Access to this mail/document by anyone other than the intended recipient(s) is unauthorized. If you are not an intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance to it, is prohibited.

On Thu, May 14, 2009 at 10:19 AM, Jim Davidson wrote:
> Do you have some sort of background job that calls "ns_server active" (or
> similar) regularly? That could lead to random crashes.
Re: [AOLSERVER] Question on two AOLserver tickets
Yup -- should really have been documented better -- sorry about that.

Anyway, what is the monitoring attempting to dig up? There may be some other safe ways to get the same information.

-Jim

On May 14, 2009, at 2:04 PM, Jade Rubick wrote:
> Ironically, we have some monitoring code that does use that functionality.
> So our monitoring is killing our servers. Nice!
Re: [AOLSERVER] Question on two AOLserver tickets
I'm just happy we figured it out.

We were using this call:

    set connections [ns_server active]

But it wasn't in a scheduled proc, so I just moved it behind a password-protected section, and put a warning around it. We seldom (never) used that page anyway. I think a bot may have found it or something.

Jade

Jade Rubick
Director of Development
TRUiST

On May 14, 2009, at 12:33 PM, Jim Davidson wrote:
> Anyway, what is the monitoring attempting to dig up?
Re: [AOLSERVER] Question on two AOLserver tickets
Maybe calling the API should result in a ns_log Warning to indicate a potential crash.

tom jackson

On Thu, 2009-05-14 at 13:26 -0700, Jade Rubick wrote:
> We were using this call:
>
> set connections [ns_server active]
Re: [AOLSERVER] Question on two AOLserver tickets
Good idea. Maybe it would make sense to disable it by default with some config flag to enable?

Jim

Sent from my iPhone

On May 14, 2009, at 4:49 PM, Tom Jackson wrote:
> Maybe calling the API should result in a ns_log Warning to indicate a
> potential crash.
Re: [AOLSERVER] Question on two AOLserver tickets
On Thu, 2009-05-14 at 18:08 -0400, Jim Davidson wrote:
> Good idea. Maybe it would make sense to disable it by default with
> some config flag to enable?

I was thinking the same, but I wasn't sure how many people actually use this command. This must be one of a very few commands that I have never used, so I'm far from informed on what it is used for.

> Sent from my iPhone

Sent via a local network shared with an iTouch.

tom jackson
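The two suggestions in this exchange (log a warning whenever the call is made, and keep it disabled unless a config flag turns it on) could be sketched in Tcl roughly as follows. This is only an illustration under stated assumptions: the proc name guarded_ns_server_active, the config section ns/server/<server>/module/monitoring and the allowactive parameter are all hypothetical and are not existing AOLserver settings. Only ns_server active, ns_log, ns_config and ns_info are standard commands.

```tcl
# Hypothetical wrapper around "ns_server active".
# The config section and the "allowactive" parameter are assumptions made
# for this sketch; they are not existing AOLserver settings.
proc guarded_ns_server_active {} {
    set section "ns/server/[ns_info server]/module/monitoring"

    # ns_config returns an empty string when the parameter is not set.
    set allowed [ns_config $section allowactive]

    # Treat a missing or non-boolean value as "disabled".
    if {![string is true -strict $allowed]} {
        ns_log Warning "ns_server active is disabled; set allowactive in $section to enable it"
        return [list]
    }

    # Enabled: still leave a trace in the log, since the call is not
    # strictly thread-safe (see ticket 152).
    ns_log Warning "ns_server active called; this call is not strictly thread-safe"
    return [ns_server active]
}
```

A status page would then use "set connections [guarded_ns_server_active]" instead of calling ns_server active directly. The wrapper does not make the underlying call safe; it only makes accidental or frequent use less likely and easier to spot in the log.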
Re: [AOLSERVER] Question on two AOLserver tickets
How about having the proc enabled only if debug settings are turned on in AOLserver?

On May 15, 6:08 am, Jim Davidson wrote:
> Good idea. Maybe it would make sense to disable it by default with
> some config flag to enable?
>
> Jim