On Fri, 2009-04-03 at 16:20 -0600, Jim Davidson wrote: > HI, > > Sorry -- I wrote the original and goofy code leading to this > confusion... >
Thanks for monitoring our head scratching! I have a few random thoughts, and I'm not sure they add up to a particular conclusion or not. But here goes: * The adp code is squishy. It doesn't necessarily execute with the same precision as Tcl code. Under some configurations errors might not instantly terminate execution. * The adp code does not execute like an equivalent tcl page, probably due 100% to tcl pages using ns_tcl_abort and the generic tcl error handling. * The adp code has "evolved" over the years, actually there have been significant changes to the structure of the code. During the evolution, the detailed logic changed, sometimes bugs were introduced to fix perceived problems. It is amazing that nobody has noticed these differences. My guess is that nobody noticed because the changes have been uniquely backwards compatible, except when an actual error occurred. But since the errors were never logged, nobody was looking for the cause. * There is a good argument to be made that when an error occurs, everything that follows is crap. Therefore, further execution is pointless and may lead to compounded errors. * Trace filters could be thought of as a way to clean up stuff that was initialized during the request processing. But this is a hackish and currently invalid method in AOLserver. Why? The url can change during a connection, so code initialized in a preauth or postauth filter due to url matching might not get cleaned up with a matching url when trace filters run. * ns_atclose used to be a great way to clean up initialized code. It still works for connection based code, but it used to work for scheduled procs. ns_atclose has good qualities: it is registered only when code is initialized, so if the connection craps out before the initialization code runs, the ns_atclose code does not run. The other quality is that the code runs regardless of errors. It simply runs when the connection closes. This would be a great place for logging! * If the server experiences an internal error, it should attempt to return this information to the client. Maybe the client has dropped the connection, but this is not an internal server error (and AOLserver silently forgets about client drops). On error, the server should destroy all headers and all content and return a 500 response. The 500 response should get logged. This is the actual result of the request: an error. You have to tell the client what happened. Lots of code paths in AOLserver follow this rule, but not ADP code. * Access logging should happen whenever a response is returned, but we should keep in mind that an access log entry includes the response code and the length of the response, if these figures differ from reality, they become worse than useless. * Sometimes the error log is the best place to look for malfunctioning code. * AOLserver does a pretty good job of logging and offers many options for moderating/maximizing this important task. * I can't figure out why the AOLserver developers consider tcl pages to be a security risk when compared to ADP pages. AOL seemed to like adp pages, but OpenACS chose to use specialized filters. I have decided to use tcl pages as the primary "start page", and sometimes filters for generic modules/packages. * Filters seem uniquely superior when the url pattern determines which code should run. Unfortunately filters are many times registered to /*, the global pattern. This is not a filter in the typical sense. The only benefit is the ability to stage code to run in a particular order at a particular stage. The question is if there should be a separate stage for global filters, or a separate mechanism to register global code which runs at certain point, maybe conditioned on previous events. This is a like a generalized ns_atclose. * The access logging module is ancient, simple and in desperate need of professional help. * The Internal redirecting functionality is poorly documented and understood. These are configured as proxies or redirects. * A significant number of AOLserver API can be misused, that is these API cannot be used anywhere and everywhere. This is really no different from Tcl, where [break] must be executed inside of a loop. On first contact, it just isn't very obvious where the use of some AOLserver API is guaranteed to produce an error. (like calling ns_adp_return after an ns_returnXXX call). The result is understandable complaints about features which appear to be bugs. > Based on this discussion I think it's NOT working as intended. If > there was a time I thought ns_adp_abort should mean "return NS_ERROR > for connection status and thus no-traces..." I've long since > forgotten the rationale. What I remember is ns_adp_abort means > "somebody did something weird but valid, e.g., with ns_returnXXX, so > try to stop Tcl execution and don't send the buffer content (if > any)...". In contrast, ns_adp_break means "ADP buffer is good but > we're done so try to stop executing Tcl code and send what's > there...". The code is messy because the logical "break" is done by > raising a Tcl error which is why the ADP exception is stored > elsewhere. Also, I say "try to stop..." because I think one could > possibly use "catch" to catch the attempt to unwind the Tcl stack and > keep executing code. I'm not aware of any draconian logical "jump" > out of a script -- Tcl maintains state on the stack and it has to be > unwound in the end. > This is a huge problem with Tcl, at least if you wish to introduce a new control structure which doesn't mess with the one built into Tcl. I wish Tcl would produce a generic [catch], I call it [catchonly] [catchonly] would ignore all un-enumerated return codes and would look like a simple switch statement, minus the default block. Probably enough for now. tom jackson > > Related, on the trace/filter/cleanup discussion, the code was written > over successive versions and was never very smart or satisfying: > > Traces: Stupid callbacks with no result code, they just all run if > the connection is "ok" run in FIFO order > > Filters: Slightly less stupid callbacks that could signal "stop > running filters" or "don't bother with connection" (in case of pre- > auth) in FIFO order > > Conn cleanup: More specific and perhaps not stupid: Callbacks run > unconditionally in LIFO order (suitable for cleanups). Evidently the > user could probe the connection context to see if things were bad or > good. > > I think over the years I figured out that just skipping some callbacks > because the connection was "bad" (itself a poorly defined state) was > dumb so the conn-cleanup stuff was introduced. > > And, just as old as the dumb trace code is the accesslog module -- it > was written when only the trace stuff was around and has stayed that > way. I'm not sure if switching accesslog to use a conn-cleanup would > work -- it may be accessing fields in the connection which may not be > initialized on a request cancelled very early on. Anyway, it seems > for the benefit of other filters (are there really any??), the ADP > code should be fixed so an "ns_adp_abort" is not an NS_ERROR. > > > BTW: I've wondered if accesslog is useful anyway -- seems like Google > Analytics gets you the insight you need on one end and a more specific > logging thing tied to the driver thread + cleanup would be useful for > the admin type folks. For example, I'm not sure you ever get a log > entry for a invalid connection request or aborted socket connection -- > that's knowledge the driver thread would have (and would be > interesting to admin folks) but would never generate an accesslog > because there would never have been any connection context to provide > for logging. > > > -Jim > > > > > > > > > > On Apr 3, 2009, at 3:11 PM, Tom Jackson wrote: > > > After reviewing the code from version 4.0 and up, it looks like the > > intended behavior is to not log an access log entry if an error > > occurs. > > > > In fact, the request was not serviced, so an entry in the access log > > would be incorrect, the correct log entry would be a 500 internal > > server > > error. > > > > I just found another bug, and the fact that errors are not returning > > an > > internal server error message indicates that there are further bugs in > > the way the code works. > > > > My guess is that if the internal server handler runs that the log > > entry > > will show up, we'll see. > > > > Anyway, here is another test adp which exposes another bug: > > > > <% > > > > ns_adp_puts whatsup > > ns_log Notice "Before include adp exception = [ns_adp_exception]" > > ns_adp_include test.adp > > ns_log Notice "adp exception = [ns_adp_exception]" > > ns_log Notice "I shouldn't be running" > > > > %> > > > > The include file test.adp: > > > > <% > > > > ns_return 200 text/plain hi > > > > ns_adp_abort > > > > ns_log Notice "After abort, I shouldn't be running" > > > > %> > > > > <% > > > > ns_log Notice "in separate adp block" > > > > %> > > > > > > With the bug the log notice after ns_adp_abort, inside the test.adp > > file > > does not execute (this is expected), but the statements after > > ns_adp_include do run, this is bad. Execution should stop with the > > include statement since test.adp returned ADP_ABORT. > > > > Here is a fix for the bug from > > line 1010 of nsd/adpeval.c > > /* > > * Clear the return exception and reset result. > > */ > > switch (itPtr->adp.exception) { > > case ADP_OK: > > break; > > case ADP_RETURN: > > itPtr->adp.exception = ADP_OK; > > result = TCL_OK; > > break; > > case ADP_ABORT: > > result = TCL_ERROR; > > break; > > default: > > result = TCL_OK; > > break; > > } > > > > But the code still has the bug which you identified, it may have to do > > with an actual error not clearing the adp.exception setting. > > > > tom jackson > > > > On Fri, 2009-04-03 at 14:54 -0500, Andrew Steets wrote: > >> My original concern was with the access logging proc, which happens > >> to > >> be run as a trace filter. I think that the access log entries should > >> be generated regardless of whether or not ns_adp_abort is called. I > >> don't care too much about anything else that was installed as a trace > >> filter. > >> > >> Do you agree that access log entries should be generated if > >> ns_adp_abort is called? > >> > >> -Andrew > >> > >> On Fri, Apr 3, 2009 at 10:33 AM, Tom Jackson <t...@rmadilo.com> wrote: > >> Andrew, > >> > >> I wasted a little more time looking at the actual code. My > >> impression is > >> that everything is working as expected. If there is an error > >> in a > >> postauth filter or in adp processing (registered proc), trace > >> filters > >> are skipped. Until about 4.5, errors during preauth also > >> skipped trace > >> filters. Not sure why this change was made. > >> > >> The only think that matters is what happens in Ns_AdpRequest. > >> If there > >> were no errors, the request will be logged. In order to get > >> ns_adp_abort > >> to work correctly, the tcl result must be set to TCL_ERROR > >> until code > >> returns to Ns_AdpRequest. This is why an additional structure > >> is > >> maintained for the adp exception, which is independent of the > >> tcl > >> exception. In this case, adp.exception indicates what actually > >> happened > >> during adp processing. > >> > >> So things seem to be working as intended, and they have been > >> working the > >> same way for a long time. It might be possible that you are > >> misusing > >> ns_adp_abort, or something else is messing up. > >> > >> Could you provide a simple test case, probably a few nested > >> adp > >> includes, which repeats the issue? Without a test case of what > >> you think > >> should work differently, it is hard to give any more advice. > >> > >> In general, when an error occurs during a request, the > >> response is by > >> definition an error response, so the original request might > >> get > >> transformed into an internal redirect to your error handling > >> page. An > >> error in this page, or a missing error page could cause > >> further > >> problems. > >> > >> Bottom line: no reason to believe that this is a bug. > >> > >> tom jackson > >> > >> p.s. this case seems to validate my belief that the hardest > >> bug to find > >> and fix is one that doesn't actually exist. > >> > >> On Thu, 2009-04-02 at 18:03 -0500, Andrew Steets wrote: > >> > >> > >>> What was the original purpose of "trace" filters? At the C > >> API level > >>> there is a distinction between between a trace filter and a > >> cleanup > >>> callback, but it doesn't look like you can register a > >> cleanup proc > >>> from TCL. Maybe this was mistakenly omitted? > >>> > >>> The cleanup procs run unconditionally. It seems like that > >> is the most > >>> appropriate place to handle "cleanup of resources." > >> Alternatively we > >>> could change the trace filters to run regardless of the > >>> Ns_ConnRunRequest() return status, but then that would make > >> them > >>> basically the same as the cleanups. > >>> > >>> I looked a little deeper into the source. The confusion > >> seems to > >>> arise in NsAdpFlush() which is run at the end of all ADP > >> processing. > >>> The code there is smart enough to recognize when an abort > >> exception > >>> has been signalled; it sets the TCL result to "adp flush > >> disabled: adp > >>> aborted", but it still returns TCL_ERROR. That is > >> essentially where > >>> the TCL exception gets turned into a full blown connection > >> processing > >>> error. We could change NsAdpFlush() to return success when > >> it > >>> recognizes the abort exception, or just not run NsAdpFlush() > >> for abort > >>> exceptions. > >>> > >>> There would still be cases where trace filters would not run > >> though. > >>> For instance if you called ns_returnxxx without calling > >> ns_adp_abort. > >>> I'm not sure if that is a bad thing. > >>> > >>> It would be nice to hear from anyone who knows about the > >> original > >>> motivation for the trace and cleanup filters. > >>> > >>> -Andrew > >>> > >>> On Thu, Apr 2, 2009 at 3:53 PM, Tom Jackson > >> <t...@rmadilo.com> wrote: > >>> Gustaf, > >>> > >>> You may be "using" traces but not realize it, it > >> sounds like > >>> ns_adp_abort isn't don't what was originally > >> intended. > >>> > >>> I wouldn't worry about an runtime error caused > >> during running > >>> traces, it > >>> would be an error to even use ns_adp_abort in a > >> trace filter > >>> because the > >>> connection is already finished. This is analogous to > >> calling > >>> [break] > >>> outside of a loop. > >>> > >>> It seems important to consider ns_adp_abort, > >> ns_adp_return and > >>> ns_adp_break as a unit. They add necessary loop type > >> controls > >>> so that > >>> developers can create deeply nested code and still > >> get out of > >>> it without > >>> the need to use [catch]. But, like a lot of > >> AOLserver specific > >>> procedures, there is no hand-holding in their use. > >> They can be > >>> misued. > >>> > >>> In this particular case, it looks like somewhere > >> along the > >>> way, > >>> ns_adp_abort was modified to not work as expected. > >>> > >>> The desired effect is exactly what you would get by > >> returning > >>> filter_return from a preauth or postauth filter. > >> This effect > >>> is to skip > >>> to trace filters, not past them. > >>> > >>> Skipping trace filters even on an aborted connection > >> would be > >>> a disaster > >>> for any application which relies on cleanup of > >> resources. > >>> > >>> tom jackson > >>> > >>> > >>> On Thu, 2009-04-02 at 11:12 +0200, Gustaf Neumann > >> wrote: > >>>> Andrew Steets schrieb: > >>>>> The patch I sent earlier seems to fulfill these > >> needs, but > >>> I am > >>>>> worried about corner cases where LogTrace (from > >> the nslog > >>> module) > >>>>> could blow up. Nothing about the state of the > >> Conn * > >>> seems to be > >>>>> guaranteed when the ConnCleanup callbacks are > >> called. > >>>>> > >>>> Dear Andrew, > >>>> > >>>> i think most (all?) of the repondents seems to > >> agree that > >>> writing in the > >>>> about case to > >>>> the access log file. For me there are still two > >> quesions > >>> open: > >>>> > >>>> a) is it possoble to call ns_adp_abort at some > >> time, where > >>> the server > >>>> might crash > >>>> (in normal operations, everthing looks fine to > >> me, > >>> problems might > >>>> occur in > >>>> when called from some traces; other calls are > >> likely to > >>> have similar > >>>> problems) > >>>> > >>>> b) the patch replaces the call to the regular > >> server trace > >>> by a > >>>> connection cleanup call. > >>>> this means, at least in 4.5.*, ns_adp_abort > >> seems to > >>> cancel all > >>>> traces (also > >>>> these registered with ns_register_trace). Is > >> this > >>> desired? > >>>> > >>>> From Tom's website: > >>> http://rmadilo.com/files/nsapi/ns_adp_abort.html > >>>> the doc of ns_adp_abort says > >>>> > >>>> ... Every ns_returnxxx call in an ADP should be > >> followed > >>> with a call > >>>> to ns_adp_abort.... > >>>> > >>>> With this recommendation, cancelling traces > >> seem wrong to > >>> me; or at > >>>> least, > >>>> this should be documented. > >>>> > >>>> We don't use traces, all of OpenACS does not use > >> it, so this > >>> is no > >>>> current issue for us. > >>>> > >>>> -gustaf neumann > >>>> > >>>> > >>>> -- > >>>> AOLserver - http://www.aolserver.com/ > >>>> > >>>> To Remove yourself from this list, simply send an > >> email to > >>> <lists...@listserv.aol.com> with the > >>>> body of "SIGNOFF AOLSERVER" in the email message. > >> You can > >>> leave the Subject: field of your email blank. > >>>> > >>> > >>> > >>> -- > >>> AOLserver - http://www.aolserver.com/ > >>> > >>> To Remove yourself from this list, simply send an > >> email to > >>> <lists...@listserv.aol.com> with the > >>> body of "SIGNOFF AOLSERVER" in the email message. > >> You can > >>> leave the Subject: field of your email blank. > >>> > >>> > >>> > >>> -- > >>> AOLserver - http://www.aolserver.com/ > >>> > >>> > >>> > >>> To Remove yourself from this list, simply send an email to > >> <lists...@listserv.aol.com> with the > >>> body of "SIGNOFF AOLSERVER" in the email message. You can > >> leave the Subject: field of your email blank. > >>> > >>> > >> > >> > >> -- > >> AOLserver - http://www.aolserver.com/ > >> > >> To Remove yourself from this list, simply send an email to > >> <lists...@listserv.aol.com> with the > >> body of "SIGNOFF AOLSERVER" in the email message. You can > >> leave the Subject: field of your email blank. > >> > >> > >> > >> -- > >> AOLserver - http://www.aolserver.com/ > >> > >> > >> > >> To Remove yourself from this list, simply send an email to > >> <lists...@listserv.aol.com > >> > with the > >> body of "SIGNOFF AOLSERVER" in the email message. You can leave the > >> Subject: field of your email blank. > >> > >> > > > > > > -- > > AOLserver - http://www.aolserver.com/ > > > > To Remove yourself from this list, simply send an email to > > <lists...@listserv.aol.com > > > with the > > body of "SIGNOFF AOLSERVER" in the email message. You can leave the > > Subject: field of your email blank. > > > -- > AOLserver - http://www.aolserver.com/ > > To Remove yourself from this list, simply send an email to > <lists...@listserv.aol.com> with the > body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: > field of your email blank. > -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to <lists...@listserv.aol.com> with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.