Hi, A bit old but let me try to be helpful here...
On Apr 3, 2009, at 11:45 PM, Tom Jackson wrote:
On Fri, 2009-04-03 at 16:20 -0600, Jim Davidson wrote:HI, Sorry -- I wrote the original and goofy code leading to this confusion...Thanks for monitoring our head scratching! I have a few random thoughts, and I'm not sure they add up to a particular conclusion or not. But here goes:* The adp code is squishy. It doesn't necessarily execute with the sameprecision as Tcl code. Under some configurations errors might not instantly terminate execution.
Yes - because some errors could be "caught" as in the end it's just a Tcl engine.
* The adp code does not execute like an equivalent tcl page, probably due 100% to tcl pages using ns_tcl_abort and the generic tcl error handling.
Yes -- the "tcl page" stuff is more CGI like, i.e., code which includes and generates embedded HTML. The ADP is more PHP, ASP, etc. like, i.e., HTML code with embedded Tcl. ADP is actually pretty good and efficient at what it does and includes a clever although arcane feature for caching bits of execution (e.g., results of a seldom- checked, slow remote services). The PHP'ish nature of ADP does make the ability to "unwind everything and start over..." technically imprecise. In practice I think most people don't use streaming and the odd case of "catch"'ing an exception doesn't happen much so it mostly works. Although, as we've discovered, the "squishiness" of mixing Tcl error/exception conditions with the weak concept of a connection statues has led to a bug in ns_adp_abort not doing the right thing.
* The adp code has "evolved" over the years, actually there have beensignificant changes to the structure of the code. During the evolution,the detailed logic changed, sometimes bugs were introduced to fix perceived problems. It is amazing that nobody has noticed these differences. My guess is that nobody noticed because the changes have been uniquely backwards compatible, except when an actual erroroccurred. But since the errors were never logged, nobody was looking forthe cause.
All true. Remember the code was mostly developed in the context of AOL's operations requirements when they used it for most all published content (they use it for almost nothing now) and so if something weird didn't get exercised in the AOL context it was never discovered. Conversely, if something was useful to AOL (e.g., the per-chunk caching), that may never have been publicized or used outside AOL.
* There is a good argument to be made that when an error occurs, everything that follows is crap. Therefore, further execution is pointless and may lead to compounded errors.
Reasonable assumption. Weirdness is the way filters, code, etc. can be re-used and layered together so one components error (e.g., one failed ADP include) is not interesting or an error to some other component (e.g., logging).
* Trace filters could be thought of as a way to clean up stuff that wasinitialized during the request processing. But this is a hackish andcurrently invalid method in AOLserver. Why? The url can change during aconnection, so code initialized in a preauth or postauth filter due to url matching might not get cleaned up with a matching url when trace filters run.
Traces were created to have some sort of API for what at the time was only logging. Not sure what else useful ever used it.
* ns_atclose used to be a great way to clean up initialized code. Itstill works for connection based code, but it used to work for scheduled procs. ns_atclose has good qualities: it is registered only when code isinitialized, so if the connection craps out before the initialization code runs, the ns_atclose code does not run. The other quality is that the code runs regardless of errors. It simply runs when the connection closes. This would be a great place for logging!
Agreed -- that thing was always a special-case thing, I think it had to do with closing files or something in the past.
* If the server experiences an internal error, it should attempt toreturn this information to the client. Maybe the client has dropped theconnection, but this is not an internal server error (and AOLserver silently forgets about client drops). On error, the server should destroy all headers and all content and return a 500 response. The 500response should get logged. This is the actual result of the request: an error. You have to tell the client what happened. Lots of code paths inAOLserver follow this rule, but not ADP code.
Agreed -- ADP is close but broken but the intent is as you described. The 500 error response stuff is comparatively low-level and very old.
* Access logging should happen whenever a response is returned, but weshould keep in mind that an access log entry includes the response codeand the length of the response, if these figures differ from reality, they become worse than useless.
Yes.
* Sometimes the error log is the best place to look for malfunctioning code.
Yes -- how many of us have "tail -f server.log..." running right now :)
* AOLserver does a pretty good job of logging and offers many options for moderating/maximizing this important task.
I suppose it's ok.
* I can't figure out why the AOLserver developers consider tcl pages tobe a security risk when compared to ADP pages. AOL seemed to like adp pages, but OpenACS chose to use specialized filters. I have decided to use tcl pages as the primary "start page", and sometimes filters for generic modules/packages.
Well, again I'm guilty. I recall quickly putting Tcl pages into the server code, improving it slightly at one point in time, but generally ignoring it in favor of ADP (no judgement - just preference). I can't recall who or when asserted it as a security risk -- seems reasonable to me. Use Tcl pages/code for start pages/filters, etc. makes sense to me.
* Filters seem uniquely superior when the url pattern determines whichcode should run. Unfortunately filters are many times registered to / *, the global pattern. This is not a filter in the typical sense. The onlybenefit is the ability to stage code to run in a particular order at a particular stage. The question is if there should be a separate stage for global filters, or a separate mechanism to register global codewhich runs at certain point, maybe conditioned on previous events. Thisis a like a generalized ns_atclose.
I think a good idea gets implemented, bad ideas linger to avoid breaking things so that's why both exist.
BTW: I remembered why the newer "connection cleanup" stuff was added -- it was designed to interact with the "connection local storage API" which can be accessed in the driver thread, ostensibly to fetch and store network-related assets in an efficient manner. The model is similar to Pthreads local storage and thus needed non-conditional cleanup callbacks.
* The access logging module is ancient, simple and in desperate need ofprofessional help.
Agreed. And, google analytics and chartbeat.com are pretty cool :)
* The Internal redirecting functionality is poorly documented and understood. These are configured as proxies or redirects.
Yes -- it was always a great disappointment for me :(
* A significant number of AOLserver API can be misused, that is theseAPI cannot be used anywhere and everywhere. This is really no differentfrom Tcl, where [break] must be executed inside of a loop. On firstcontact, it just isn't very obvious where the use of some AOLserver API is guaranteed to produce an error. (like calling ns_adp_return after anns_returnXXX call). The result is understandable complaints about features which appear to be bugs.
Yes -- sorry, I just never could get it to all "fit together" perfectly and a ton of muck has built up over time.
Based on this discussion I think it's NOT working as intended. If there was a time I thought ns_adp_abort should mean "return NS_ERROR for connection status and thus no-traces..." I've long since forgotten the rationale. What I remember is ns_adp_abort means "somebody did something weird but valid, e.g., with ns_returnXXX, so try to stop Tcl execution and don't send the buffer content (if any)...". In contrast, ns_adp_break means "ADP buffer is good but we're done so try to stop executing Tcl code and send what's there...". The code is messy because the logical "break" is done by raising a Tcl error which is why the ADP exception is stored elsewhere. Also, I say "try to stop..." because I think one could possibly use "catch" to catch the attempt to unwind the Tcl stack and keep executing code. I'm not aware of any draconian logical "jump" out of a script -- Tcl maintains state on the stack and it has to be unwound in the end.This is a huge problem with Tcl, at least if you wish to introduce a new control structure which doesn't mess with the one built into Tcl. I wish Tcl would produce a generic [catch], I call it [catchonly] [catchonly] would ignore all un-enumerated return codes and would look like a simple switch statement, minus the default block.
As with some other aspects of AOLserver, Tcl isn't always perfectly wedged in so I agree something more powerful in the exception/stack unwind process could be useful. I suppose that could be a TIP but I'm not sure exactly how it would be articulated or coded.
Thanks, -Jim
Probably enough for now. tom jacksonRelated, on the trace/filter/cleanup discussion, the code was written over successive versions and was never very smart or satisfying: Traces: Stupid callbacks with no result code, they just all run if the connection is "ok" run in FIFO order Filters: Slightly less stupid callbacks that could signal "stop running filters" or "don't bother with connection" (in case of pre- auth) in FIFO order Conn cleanup: More specific and perhaps not stupid: Callbacks run unconditionally in LIFO order (suitable for cleanups). Evidently the user could probe the connection context to see if things were bad or good.I think over the years I figured out that just skipping some callbacksbecause the connection was "bad" (itself a poorly defined state) was dumb so the conn-cleanup stuff was introduced. And, just as old as the dumb trace code is the accesslog module -- it was written when only the trace stuff was around and has stayed that way. I'm not sure if switching accesslog to use a conn-cleanup would work -- it may be accessing fields in the connection which may not be initialized on a request cancelled very early on. Anyway, it seems for the benefit of other filters (are there really any??), the ADP code should be fixed so an "ns_adp_abort" is not an NS_ERROR.BTW: I've wondered if accesslog is useful anyway -- seems like Google Analytics gets you the insight you need on one end and a more specificlogging thing tied to the driver thread + cleanup would be useful for the admin type folks. For example, I'm not sure you ever get a logentry for a invalid connection request or aborted socket connection --that's knowledge the driver thread would have (and would be interesting to admin folks) but would never generate an accesslog because there would never have been any connection context to provide for logging. -Jim On Apr 3, 2009, at 3:11 PM, Tom Jackson wrote:After reviewing the code from version 4.0 and up, it looks like the intended behavior is to not log an access log entry if an error occurs. In fact, the request was not serviced, so an entry in the access log would be incorrect, the correct log entry would be a 500 internal server error. I just found another bug, and the fact that errors are not returning aninternal server error message indicates that there are further bugs inthe way the code works. My guess is that if the internal server handler runs that the log entry will show up, we'll see. Anyway, here is another test adp which exposes another bug: <% ns_adp_puts whatsup ns_log Notice "Before include adp exception = [ns_adp_exception]" ns_adp_include test.adp ns_log Notice "adp exception = [ns_adp_exception]" ns_log Notice "I shouldn't be running" %> The include file test.adp: <% ns_return 200 text/plain hi ns_adp_abort ns_log Notice "After abort, I shouldn't be running" %> <% ns_log Notice "in separate adp block" %> With the bug the log notice after ns_adp_abort, inside the test.adp file does not execute (this is expected), but the statements after ns_adp_include do run, this is bad. Execution should stop with the include statement since test.adp returned ADP_ABORT. Here is a fix for the bug from line 1010 of nsd/adpeval.c /* * Clear the return exception and reset result. */ switch (itPtr->adp.exception) { case ADP_OK: break; case ADP_RETURN: itPtr->adp.exception = ADP_OK; result = TCL_OK; break; case ADP_ABORT: result = TCL_ERROR; break; default: result = TCL_OK; break; }But the code still has the bug which you identified, it may have to dowith an actual error not clearing the adp.exception setting. tom jackson On Fri, 2009-04-03 at 14:54 -0500, Andrew Steets wrote:My original concern was with the access logging proc, which happens tobe run as a trace filter. I think that the access log entries should be generated regardless of whether or not ns_adp_abort is called. I don't care too much about anything else that was installed as a tracefilter. Do you agree that access log entries should be generated if ns_adp_abort is called? -AndrewOn Fri, Apr 3, 2009 at 10:33 AM, Tom Jackson <t...@rmadilo.com> wrote:Andrew, I wasted a little more time looking at the actual code. My impression is that everything is working as expected. If there is an error in a postauth filter or in adp processing (registered proc), trace filters are skipped. Until about 4.5, errors during preauth also skipped trace filters. Not sure why this change was made. The only think that matters is what happens in Ns_AdpRequest. If there were no errors, the request will be logged. In order to get ns_adp_abort to work correctly, the tcl result must be set to TCL_ERROR until code returns to Ns_AdpRequest. This is why an additional structure is maintained for the adp exception, which is independent of the tclexception. In this case, adp.exception indicates what actuallyhappened during adp processing. So things seem to be working as intended, and they have been working the same way for a long time. It might be possible that you are misusing ns_adp_abort, or something else is messing up. Could you provide a simple test case, probably a few nested adpincludes, which repeats the issue? Without a test case of whatyou think should work differently, it is hard to give any more advice. In general, when an error occurs during a request, the response is by definition an error response, so the original request might get transformed into an internal redirect to your error handling page. An error in this page, or a missing error page could cause further problems. Bottom line: no reason to believe that this is a bug. tom jackson p.s. this case seems to validate my belief that the hardest bug to find and fix is one that doesn't actually exist. On Thu, 2009-04-02 at 18:03 -0500, Andrew Steets wrote:What was the original purpose of "trace" filters? At the CAPI levelthere is a distinction between between a trace filter and acleanupcallback, but it doesn't look like you can register acleanup procfrom TCL. Maybe this was mistakenly omitted? The cleanup procs run unconditionally. It seems like thatis the mostappropriate place to handle "cleanup of resources."Alternatively wecould change the trace filters to run regardless of the Ns_ConnRunRequest() return status, but then that would makethembasically the same as the cleanups. I looked a little deeper into the source. The confusionseems toarise in NsAdpFlush() which is run at the end of all ADPprocessing.The code there is smart enough to recognize when an abortexceptionhas been signalled; it sets the TCL result to "adp flushdisabled: adpaborted", but it still returns TCL_ERROR. That isessentially wherethe TCL exception gets turned into a full blown connectionprocessingerror. We could change NsAdpFlush() to return success whenitrecognizes the abort exception, or just not run NsAdpFlush()for abortexceptions. There would still be cases where trace filters would not runthough.For instance if you called ns_returnxxx without callingns_adp_abort.I'm not sure if that is a bad thing. It would be nice to hear from anyone who knows about theoriginalmotivation for the trace and cleanup filters. -Andrew On Thu, Apr 2, 2009 at 3:53 PM, Tom Jackson<t...@rmadilo.com> wrote:Gustaf, You may be "using" traces but not realize it, itsounds likens_adp_abort isn't don't what was originallyintended.I wouldn't worry about an runtime error causedduring runningtraces, it would be an error to even use ns_adp_abort in atrace filterbecause the connection is already finished. This is analogous tocalling[break] outside of a loop. It seems important to consider ns_adp_abort,ns_adp_return andns_adp_break as a unit. They add necessary loop typecontrolsso that developers can create deeply nested code and stillget out ofit without the need to use [catch]. But, like a lot ofAOLserver specificprocedures, there is no hand-holding in their use.They can bemisued. In this particular case, it looks like somewherealong theway, ns_adp_abort was modified to not work as expected. The desired effect is exactly what you would get byreturningfilter_return from a preauth or postauth filter.This effectis to skip to trace filters, not past them. Skipping trace filters even on an aborted connectionwould bea disaster for any application which relies on cleanup ofresources.tom jackson On Thu, 2009-04-02 at 11:12 +0200, Gustaf Neumannwrote:Andrew Steets schrieb:The patch I sent earlier seems to fulfill theseneeds, butI amworried about corner cases where LogTrace (fromthe nslogmodule)could blow up. Nothing about the state of theConn *seems to beguaranteed when the ConnCleanup callbacks arecalled.Dear Andrew, i think most (all?) of the repondents seems toagree thatwriting in theabout case to the access log file. For me there are still twoquesionsopen:a) is it possoble to call ns_adp_abort at sometime, wherethe servermight crash (in normal operations, everthing looks fine tome,problems mightoccur in when called from some traces; other calls arelikely tohave similarproblems) b) the patch replaces the call to the regularserver traceby aconnection cleanup call. this means, at least in 4.5.*, ns_adp_abortseems tocancel alltraces (also these registered with ns_register_trace). Isthisdesired?From Tom's website:http://rmadilo.com/files/nsapi/ns_adp_abort.htmlthe doc of ns_adp_abort says ... Every ns_returnxxx call in an ADP should befollowedwith a callto ns_adp_abort.... With this recommendation, cancelling tracesseem wrong tome; or atleast, this should be documented. We don't use traces, all of OpenACS does not useit, so thisis nocurrent issue for us. -gustaf neumann -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send anemail to<lists...@listserv.aol.com> with thebody of "SIGNOFF AOLSERVER" in the email message.You canleave the Subject: field of your email blank.-- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send anemail to<lists...@listserv.aol.com> with the body of "SIGNOFF AOLSERVER" in the email message.You canleave the Subject: field of your email blank. -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to<lists...@listserv.aol.com> with thebody of "SIGNOFF AOLSERVER" in the email message. You canleave the Subject: field of your email blank.-- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to <lists...@listserv.aol.com> with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank. -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to <lists...@listserv.aol.comwith thebody of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.-- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to <lists...@listserv.aol.comwith thebody of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.-- AOLserver - http://www.aolserver.com/To Remove yourself from this list, simply send an email to <lists...@listserv.aol.com > with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.-- AOLserver - http://www.aolserver.com/To Remove yourself from this list, simply send an email to <lists...@listserv.aol.com > with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.
-- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to <lists...@listserv.aol.com> with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.