Hi,

A bit old but let me try to be helpful here...


On Apr 3, 2009, at 11:45 PM, Tom Jackson wrote:

On Fri, 2009-04-03 at 16:20 -0600, Jim Davidson wrote:
HI,

Sorry -- I wrote the original and goofy code leading to this
confusion...


Thanks for monitoring our head scratching!

I have a few random thoughts, and I'm not sure they add up to a
particular conclusion or not. But here goes:

* The adp code is squishy. It doesn't necessarily execute with the same
precision as Tcl code. Under some configurations errors might not
instantly terminate execution.


Yes - because some errors could be "caught" as in the end it's just a Tcl engine.




* The adp code does not execute like an equivalent tcl page, probably
due 100% to tcl pages using ns_tcl_abort and the generic tcl error
handling.


Yes -- the "tcl page" stuff is more CGI like, i.e., code which includes and generates embedded HTML. The ADP is more PHP, ASP, etc. like, i.e., HTML code with embedded Tcl. ADP is actually pretty good and efficient at what it does and includes a clever although arcane feature for caching bits of execution (e.g., results of a seldom- checked, slow remote services). The PHP'ish nature of ADP does make the ability to "unwind everything and start over..." technically imprecise. In practice I think most people don't use streaming and the odd case of "catch"'ing an exception doesn't happen much so it mostly works. Although, as we've discovered, the "squishiness" of mixing Tcl error/exception conditions with the weak concept of a connection statues has led to a bug in ns_adp_abort not doing the right thing.





* The adp code has "evolved" over the years, actually there have been
significant changes to the structure of the code. During the evolution,
the detailed logic changed, sometimes bugs were introduced to fix
perceived problems. It is amazing that nobody has noticed these
differences. My guess is that nobody noticed because the changes have
been uniquely backwards compatible, except when an actual error
occurred. But since the errors were never logged, nobody was looking for
the cause.


All true. Remember the code was mostly developed in the context of AOL's operations requirements when they used it for most all published content (they use it for almost nothing now) and so if something weird didn't get exercised in the AOL context it was never discovered. Conversely, if something was useful to AOL (e.g., the per-chunk caching), that may never have been publicized or used outside AOL.



* There is a good argument to be made that when an error occurs,
everything that follows is crap. Therefore, further execution is
pointless and may lead to compounded errors.


Reasonable assumption. Weirdness is the way filters, code, etc. can be re-used and layered together so one components error (e.g., one failed ADP include) is not interesting or an error to some other component (e.g., logging).




* Trace filters could be thought of as a way to clean up stuff that was
initialized during the request processing. But this is a hackish and
currently invalid method in AOLserver. Why? The url can change during a
connection, so code initialized in a preauth or postauth filter due to
url matching might not get cleaned up with a matching url when trace
filters run.


Traces were created to have some sort of API for what at the time was only logging. Not sure what else useful ever used it.




* ns_atclose used to be a great way to clean up initialized code. It
still works for connection based code, but it used to work for scheduled procs. ns_atclose has good qualities: it is registered only when code is
initialized, so if the connection craps out before the initialization
code runs, the ns_atclose code does not run. The other quality is that
the code runs regardless of errors. It simply runs when the connection
closes. This would be a great place for logging!


Agreed -- that thing was always a special-case thing, I think it had to do with closing files or something in the past.




* If the server experiences an internal error, it should attempt to
return this information to the client. Maybe the client has dropped the
connection, but this is not an internal server error (and AOLserver
silently forgets about client drops). On error, the server should
destroy all headers and all content and return a 500 response. The 500
response should get logged. This is the actual result of the request: an error. You have to tell the client what happened. Lots of code paths in
AOLserver follow this rule, but not ADP code.


Agreed -- ADP is close but broken but the intent is as you described. The 500 error response stuff is comparatively low-level and very old.




* Access logging should happen whenever a response is returned, but we
should keep in mind that an access log entry includes the response code
and the length of the response, if these figures differ from reality,
they become worse than useless.


Yes.



* Sometimes the error log is the best place to look for malfunctioning
code.


Yes -- how many of us have "tail -f server.log..." running right now :)



* AOLserver does a pretty good job of logging and offers many options
for moderating/maximizing this important task.


I suppose it's ok.



* I can't figure out why the AOLserver developers consider tcl pages to
be a security risk when compared to ADP pages. AOL seemed to like adp
pages, but OpenACS chose to use specialized filters. I have decided to
use tcl pages as the primary "start page", and sometimes filters for
generic modules/packages.



Well, again I'm guilty. I recall quickly putting Tcl pages into the server code, improving it slightly at one point in time, but generally ignoring it in favor of ADP (no judgement - just preference). I can't recall who or when asserted it as a security risk -- seems reasonable to me. Use Tcl pages/code for start pages/filters, etc. makes sense to me.




* Filters seem uniquely superior when the url pattern determines which
code should run. Unfortunately filters are many times registered to / *, the global pattern. This is not a filter in the typical sense. The only
benefit is the ability to stage code to run in a particular order at a
particular stage. The question is if there should be a separate stage
for global filters, or a separate mechanism to register global code
which runs at certain point, maybe conditioned on previous events. This
is a like a generalized ns_atclose.


I think a good idea gets implemented, bad ideas linger to avoid breaking things so that's why both exist.

BTW: I remembered why the newer "connection cleanup" stuff was added -- it was designed to interact with the "connection local storage API" which can be accessed in the driver thread, ostensibly to fetch and store network-related assets in an efficient manner. The model is similar to Pthreads local storage and thus needed non-conditional cleanup callbacks.




* The access logging module is ancient, simple and in desperate need of
professional help.


Agreed.     And, google analytics and chartbeat.com are pretty cool :)



* The Internal redirecting functionality is poorly documented and
understood. These are configured as proxies or redirects.


Yes -- it was always a great disappointment for me :(



* A significant number of AOLserver API can be misused, that is these
API cannot be used anywhere and everywhere. This is really no different
from Tcl, where [break] must be executed inside of a loop. On first
contact, it just isn't very obvious where the use of some AOLserver API is guaranteed to produce an error. (like calling ns_adp_return after an
ns_returnXXX call). The result is understandable complaints about
features which appear to be bugs.


Yes -- sorry, I just never could get it to all "fit together" perfectly and a ton of muck has built up over time.





Based on this discussion I think it's NOT working as intended.  If
there was a time I thought ns_adp_abort should mean "return NS_ERROR
for connection status and thus no-traces..."  I've long since
forgotten the rationale.   What I remember is ns_adp_abort means
"somebody did something weird but valid, e.g., with ns_returnXXX, so
try to stop Tcl execution and don't send the buffer content (if
any)...".   In contrast, ns_adp_break means "ADP buffer is good but
we're done so try to stop executing Tcl code and send what's
there...".   The code is messy because the logical "break" is done by
raising a Tcl error which is why the ADP exception is stored
elsewhere.  Also, I say "try to stop..." because I think one could
possibly use "catch" to catch the attempt to unwind the Tcl stack and
keep executing code.  I'm not aware of any draconian logical "jump"
out of a script -- Tcl maintains state on the stack and it has to be
unwound in the end.


This is a huge problem with Tcl, at least if  you wish to introduce a
new control structure which doesn't mess with the one built into Tcl.

I wish Tcl would produce a generic [catch], I call it [catchonly]

[catchonly] would ignore all un-enumerated return codes and would look
like a simple switch statement, minus the default block.



As with some other aspects of AOLserver, Tcl isn't always perfectly wedged in so I agree something more powerful in the exception/stack unwind process could be useful. I suppose that could be a TIP but I'm not sure exactly how it would be articulated or coded.


Thanks, -Jim






Probably enough for now.

tom jackson



Related, on the trace/filter/cleanup discussion, the code was written
over successive versions and was never very smart or satisfying:

Traces:  Stupid callbacks with no result code, they just all run if
the connection is "ok" run in FIFO order

Filters:  Slightly less stupid callbacks that could signal "stop
running filters" or "don't bother with connection" (in case of pre-
auth) in FIFO order

Conn cleanup:  More specific and perhaps not stupid:  Callbacks run
unconditionally in LIFO order (suitable for cleanups).  Evidently the
user could probe the connection context to see if things were bad or
good.

I think over the years I figured out that just skipping some callbacks
because the connection was "bad" (itself a poorly defined state) was
dumb so the conn-cleanup stuff was introduced.

And, just as old as the dumb trace code is the accesslog module -- it
was written when only the trace stuff was around and has stayed that
way.  I'm not sure if switching accesslog to use a conn-cleanup would
work -- it may be accessing fields in the connection which may not be
initialized on a request cancelled very early on.   Anyway, it seems
for the benefit of other filters (are there really any??), the ADP
code should be fixed so an "ns_adp_abort" is not an NS_ERROR.


BTW: I've wondered if accesslog is useful anyway -- seems like Google Analytics gets you the insight you need on one end and a more specific
logging thing tied to the driver thread + cleanup would be useful for
the admin type folks.  For example, I'm not sure you ever get a log
entry for a invalid connection request or aborted socket connection --
that's knowledge the driver thread would have (and would be
interesting to admin folks) but would never generate an accesslog
because there would never have been any connection context to provide
for logging.


-Jim









On Apr 3, 2009, at 3:11 PM, Tom Jackson wrote:

After reviewing the code from version 4.0 and up, it looks like the
intended behavior is to not log an access log entry if an error
occurs.

In fact, the request was not serviced, so an entry in the access log
would be incorrect, the correct log entry would be a 500 internal
server
error.

I just found another bug, and the fact that errors are not returning
an
internal server error message indicates that there are further bugs in
the way the code works.

My guess is that if the internal server handler runs that the log
entry
will show up, we'll see.

Anyway, here is another test adp which exposes another bug:

<%

ns_adp_puts whatsup
ns_log Notice "Before include adp exception = [ns_adp_exception]"
ns_adp_include test.adp
ns_log Notice "adp exception = [ns_adp_exception]"
ns_log Notice "I shouldn't be running"

%>

The include file test.adp:

<%

ns_return 200 text/plain hi

ns_adp_abort

ns_log Notice "After abort, I shouldn't be running"

%>

<%

ns_log Notice "in separate adp block"

%>


With the bug the log notice after ns_adp_abort, inside the test.adp
file
does not execute (this is expected), but the statements after
ns_adp_include do run, this is bad. Execution should stop with the
include statement since test.adp returned ADP_ABORT.

Here is a fix for the bug from
line 1010 of nsd/adpeval.c
  /*
   * Clear the return exception and reset result.
   */
  switch (itPtr->adp.exception) {
  case ADP_OK:
        break;
  case ADP_RETURN:
        itPtr->adp.exception = ADP_OK;
        result = TCL_OK;
        break;
  case ADP_ABORT:
      result = TCL_ERROR;
      break;
  default:
        result = TCL_OK;
        break;
  }

But the code still has the bug which you identified, it may have to do
with an actual error not clearing the adp.exception setting.

tom jackson

On Fri, 2009-04-03 at 14:54 -0500, Andrew Steets wrote:
My original concern was with the access logging proc, which happens
to
be run as a trace filter. I think that the access log entries should be generated regardless of whether or not ns_adp_abort is called. I don't care too much about anything else that was installed as a trace
filter.

Do you agree that access log entries should be generated if
ns_adp_abort is called?

-Andrew

On Fri, Apr 3, 2009 at 10:33 AM, Tom Jackson <t...@rmadilo.com> wrote:
      Andrew,

      I wasted a little more time looking at the actual code. My
      impression is
      that everything is working as expected. If there is an error
      in a
      postauth filter or in adp processing (registered proc), trace
      filters
      are skipped. Until about 4.5, errors during preauth also
      skipped trace
      filters. Not sure why this change was made.

      The only think that matters is what happens in Ns_AdpRequest.
      If there
      were no errors, the request will be logged. In order to get
      ns_adp_abort
      to work correctly, the tcl result must be set to TCL_ERROR
      until code
      returns to Ns_AdpRequest. This is why an additional structure
      is
      maintained for the adp exception, which is independent of the
      tcl
exception. In this case, adp.exception indicates what actually
      happened
      during adp processing.

      So things seem to be working as intended, and they have been
      working the
      same way for a long time. It might be possible that you are
      misusing
      ns_adp_abort, or something else is messing up.

      Could you provide a simple test case, probably a few nested
      adp
includes, which repeats the issue? Without a test case of what
      you think
      should work differently, it is hard to give any more advice.

      In general, when an error occurs during a request, the
      response is by
      definition an error response, so the original request might
      get
      transformed into an internal redirect to your error handling
      page. An
      error in this page, or a missing error page could cause
      further
      problems.

      Bottom line: no reason to believe that this is a bug.

      tom jackson

      p.s. this case seems to validate my belief that the hardest
      bug to find
      and fix is one that doesn't actually exist.

      On Thu, 2009-04-02 at 18:03 -0500, Andrew Steets wrote:


What was the original purpose of "trace" filters?  At the C
      API level
there is a distinction between between a trace filter and a
      cleanup
callback, but it doesn't look like you can register a
      cleanup proc
from TCL.  Maybe this was mistakenly omitted?

The cleanup procs run unconditionally.  It seems like that
      is the most
appropriate place to handle "cleanup of resources."
       Alternatively we
could change the trace filters to run regardless of the
Ns_ConnRunRequest() return status, but then that would make
      them
basically the same as the cleanups.

I looked a little deeper into the source.  The confusion
      seems to
arise in NsAdpFlush() which is run at the end of all ADP
      processing.
The code there is smart enough to recognize when an abort
      exception
has been signalled; it sets the TCL result to "adp flush
      disabled: adp
aborted", but it still returns TCL_ERROR.  That is
      essentially where
the TCL exception gets turned into a full blown connection
      processing
error.  We could change NsAdpFlush() to return success when
      it
recognizes the abort exception, or just not run NsAdpFlush()
      for abort
exceptions.

There would still be cases where trace filters would not run
      though.
For instance if you called ns_returnxxx without calling
      ns_adp_abort.
I'm not sure if that is a bad thing.

It would be nice to hear from anyone who knows about the
      original
motivation for the trace and cleanup filters.

-Andrew

On Thu, Apr 2, 2009 at 3:53 PM, Tom Jackson
      <t...@rmadilo.com> wrote:
      Gustaf,

      You may be "using" traces but not realize it, it
      sounds like
      ns_adp_abort isn't don't what was originally
      intended.

      I wouldn't worry about an runtime error caused
      during running
      traces, it
      would be an error to even use ns_adp_abort in a
      trace filter
      because the
      connection is already finished. This is analogous to
      calling
      [break]
      outside of a loop.

      It seems important to consider ns_adp_abort,
      ns_adp_return and
      ns_adp_break as a unit. They add necessary loop type
      controls
      so that
      developers can create deeply nested code and still
      get out of
      it without
      the need to use [catch]. But, like a lot of
      AOLserver specific
      procedures, there is no hand-holding in their use.
      They can be
      misued.

      In this particular case, it looks like somewhere
      along the
      way,
      ns_adp_abort was modified to not work as expected.

      The desired effect is exactly what you would get by
      returning
      filter_return from a preauth or postauth filter.
      This effect
      is to skip
      to trace filters, not past them.

      Skipping trace filters even on an aborted connection
      would be
      a disaster
      for any application which relies on cleanup of
      resources.

      tom jackson


      On Thu, 2009-04-02 at 11:12 +0200, Gustaf Neumann
      wrote:
Andrew Steets schrieb:
The patch I sent earlier seems to fulfill these
      needs, but
      I am
worried about corner cases where LogTrace (from
      the nslog
      module)
could blow up.  Nothing about the state of the
      Conn *
      seems to be
guaranteed when the ConnCleanup callbacks are
      called.

Dear Andrew,

i think most (all?) of the repondents seems to
      agree that
      writing in the
about case to
the access log file. For me there are still two
      quesions
      open:

a) is it possoble to call ns_adp_abort at some
      time, where
      the server
might crash
 (in normal operations, everthing looks fine to
      me,
      problems might
occur in
 when called from some traces; other calls are
      likely to
      have similar
problems)

b) the patch replaces the call to the regular
      server trace
      by a
connection cleanup call.
 this means, at least in 4.5.*, ns_adp_abort
      seems to
      cancel all
traces (also
 these registered with ns_register_trace). Is
      this
      desired?

 From Tom's website:
      http://rmadilo.com/files/nsapi/ns_adp_abort.html
 the doc of ns_adp_abort says

 ... Every ns_returnxxx call in an ADP should be
      followed
      with a call
  to ns_adp_abort....

 With this recommendation, cancelling traces
      seem wrong to
      me; or at
least,
 this should be documented.

We don't use traces, all of OpenACS does not use
      it, so this
      is no
current issue for us.

-gustaf neumann


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an
      email to
      <lists...@listserv.aol.com> with the
body of "SIGNOFF AOLSERVER" in the email message.
      You can
      leave the Subject: field of your email blank.



      --
      AOLserver - http://www.aolserver.com/

      To Remove yourself from this list, simply send an
      email to
      <lists...@listserv.aol.com> with the
      body of "SIGNOFF AOLSERVER" in the email message.
      You can
      leave the Subject: field of your email blank.



--
AOLserver - http://www.aolserver.com/



To Remove yourself from this list, simply send an email to
      <lists...@listserv.aol.com> with the
body of "SIGNOFF AOLSERVER" in the email message. You can
      leave the Subject: field of your email blank.




      --
      AOLserver - http://www.aolserver.com/

      To Remove yourself from this list, simply send an email to
      <lists...@listserv.aol.com> with the
      body of "SIGNOFF AOLSERVER" in the email message. You can
      leave the Subject: field of your email blank.



--
AOLserver - http://www.aolserver.com/



To Remove yourself from this list, simply send an email to 
<lists...@listserv.aol.com
with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the
Subject: field of your email blank.




--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to 
<lists...@listserv.aol.com
with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the
Subject: field of your email blank.


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <lists...@listserv.aol.com > with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.



--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <lists...@listserv.aol.com > with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to 
<lists...@listserv.aol.com> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
field of your email blank.

Reply via email to