Re: [AOLSERVER] missing access log entries

Jim Davidson Mon, 13 Apr 2009 11:13:58 -0700

Hi,

A bit old but let me try to be helpful here...



On Apr 3, 2009, at 11:45 PM, Tom Jackson wrote:

On Fri, 2009-04-03 at 16:20 -0600, Jim Davidson wrote:

HI,

Sorry -- I wrote the original and goofy code leading to this
confusion...


Thanks for monitoring our head scratching!

I have a few random thoughts, and I'm not sure they add up to a
particular conclusion or not. But here goes:

* The adp code is squishy. It doesn't necessarily execute with thesame

precision as Tcl code. Under some configurations errors might not
instantly terminate execution.

Yes - because some errors could be "caught" as in the end it's just aTcl engine.


* The adp code does not execute like an equivalent tcl page, probably
due 100% to tcl pages using ns_tcl_abort and the generic tcl error
handling.

Yes -- the "tcl page" stuff is more CGI like, i.e., code whichincludes and generates embedded HTML. The ADP is more PHP, ASP, etc.like, i.e., HTML code with embedded Tcl. ADP is actually pretty goodand efficient at what it does and includes a clever although arcanefeature for caching bits of execution (e.g., results of a seldom-checked, slow remote services). The PHP'ish nature of ADP does makethe ability to "unwind everything and start over..." technicallyimprecise. In practice I think most people don't use streaming andthe odd case of "catch"'ing an exception doesn't happen much so itmostly works. Although, as we've discovered, the "squishiness" ofmixing Tcl error/exception conditions with the weak concept of aconnection statues has led to a bug in ns_adp_abort not doing theright thing.


* The adp code has "evolved" over the years, actually there have been

significant changes to the structure of the code. During theevolution,

the detailed logic changed, sometimes bugs were introduced to fix
perceived problems. It is amazing that nobody has noticed these
differences. My guess is that nobody noticed because the changes have
been uniquely backwards compatible, except when an actual error

occurred. But since the errors were never logged, nobody was lookingfor

the cause.

All true. Remember the code was mostly developed in the context ofAOL's operations requirements when they used it for most all publishedcontent (they use it for almost nothing now) and so if something weirddidn't get exercised in the AOL context it was never discovered.Conversely, if something was useful to AOL (e.g., the per-chunkcaching), that may never have been publicized or used outside AOL.


* There is a good argument to be made that when an error occurs,
everything that follows is crap. Therefore, further execution is
pointless and may lead to compounded errors.

Reasonable assumption. Weirdness is the way filters, code, etc. canbe re-used and layered together so one components error (e.g., onefailed ADP include) is not interesting or an error to some othercomponent (e.g., logging).

* Trace filters could be thought of as a way to clean up stuff thatwas
initialized during the request processing. But this is a hackish and
currently invalid method in AOLserver. Why? The url can changeduring a
connection, so code initialized in a preauth or postauth filter due to
url matching might not get cleaned up with a matching url when trace
filters run.

Traces were created to have some sort of API for what at the time wasonly logging. Not sure what else useful ever used it.


* ns_atclose used to be a great way to clean up initialized code. It

still works for connection based code, but it used to work forscheduledprocs. ns_atclose has good qualities: it is registered only whencode is

initialized, so if the connection craps out before the initialization
code runs, the ns_atclose code does not run. The other quality is that
the code runs regardless of errors. It simply runs when the connection
closes. This would be a great place for logging!

Agreed -- that thing was always a special-case thing, I think it hadto do with closing files or something in the past.

* If the server experiences an internal error, it should attempt to
return this information to the client. Maybe the client has droppedthe
connection, but this is not an internal server error (and AOLserver
silently forgets about client drops). On error, the server should
destroy all headers and all content and return a 500 response. The 500
response should get logged. This is the actual result of therequest: anerror. You have to tell the client what happened. Lots of code pathsin
AOLserver follow this rule, but not ADP code.

Agreed -- ADP is close but broken but the intent is as you described.The 500 error response stuff is comparatively low-level and very old.


* Access logging should happen whenever a response is returned, but we

should keep in mind that an access log entry includes the responsecode

and the length of the response, if these figures differ from reality,
they become worse than useless.



Yes.


* Sometimes the error log is the best place to look for malfunctioning
code.



Yes -- how many of us have "tail -f server.log..." running right now :)


* AOLserver does a pretty good job of logging and offers many options
for moderating/maximizing this important task.



I suppose it's ok.

* I can't figure out why the AOLserver developers consider tcl pagesto

be a security risk when compared to ADP pages. AOL seemed to like adp
pages, but OpenACS chose to use specialized filters. I have decided to
use tcl pages as the primary "start page", and sometimes filters for
generic modules/packages.

Well, again I'm guilty. I recall quickly putting Tcl pages into theserver code, improving it slightly at one point in time, but generallyignoring it in favor of ADP (no judgement - just preference). Ican't recall who or when asserted it as a security risk -- seemsreasonable to me. Use Tcl pages/code for start pages/filters, etc.makes sense to me.

* Filters seem uniquely superior when the url pattern determines which
code should run. Unfortunately filters are many times registered to /*,the global pattern. This is not a filter in the typical sense. Theonly
benefit is the ability to stage code to run in a particular order at a
particular stage. The question is if there should be a separate stage
for global filters, or a separate mechanism to register global code
which runs at certain point, maybe conditioned on previous events.This
is a like a generalized ns_atclose.

I think a good idea gets implemented, bad ideas linger to avoidbreaking things so that's why both exist.

BTW: I remembered why the newer "connection cleanup" stuff was added-- it was designed to interact with the "connection local storage API"which can be accessed in the driver thread, ostensibly to fetch andstore network-related assets in an efficient manner. The model issimilar to Pthreads local storage and thus needed non-conditionalcleanup callbacks.

* The access logging module is ancient, simple and in desperate needof
professional help.



Agreed.     And, google analytics and chartbeat.com are pretty cool :)


* The Internal redirecting functionality is poorly documented and
understood. These are configured as proxies or redirects.



Yes -- it was always a great disappointment for me :(

* A significant number of AOLserver API can be misused, that is these
API cannot be used anywhere and everywhere. This is really nodifferent
from Tcl, where [break] must be executed inside of a loop. On first
contact, it just isn't very obvious where the use of some AOLserverAPIis guaranteed to produce an error. (like calling ns_adp_return afteran
ns_returnXXX call). The result is understandable complaints about
features which appear to be bugs.

Yes -- sorry, I just never could get it to all "fit together"perfectly and a ton of muck has built up over time.

Based on this discussion I think it's NOT working as intended.  If
there was a time I thought ns_adp_abort should mean "return NS_ERROR
for connection status and thus no-traces..."  I've long since
forgotten the rationale.   What I remember is ns_adp_abort means
"somebody did something weird but valid, e.g., with ns_returnXXX, so
try to stop Tcl execution and don't send the buffer content (if
any)...".   In contrast, ns_adp_break means "ADP buffer is good but
we're done so try to stop executing Tcl code and send what's
there...".   The code is messy because the logical "break" is done by
raising a Tcl error which is why the ADP exception is stored
elsewhere.  Also, I say "try to stop..." because I think one could
possibly use "catch" to catch the attempt to unwind the Tcl stack and
keep executing code.  I'm not aware of any draconian logical "jump"
out of a script -- Tcl maintains state on the stack and it has to be
unwound in the end.


This is a huge problem with Tcl, at least if  you wish to introduce a
new control structure which doesn't mess with the one built into Tcl.

I wish Tcl would produce a generic [catch], I call it [catchonly]

[catchonly] would ignore all un-enumerated return codes and would look
like a simple switch statement, minus the default block.

As with some other aspects of AOLserver, Tcl isn't always perfectlywedged in so I agree something more powerful in the exception/stackunwind process could be useful. I suppose that could be a TIP butI'm not sure exactly how it would be articulated or coded.



Thanks, -Jim



Probably enough for now.

tom jackson


Related, on the trace/filter/cleanup discussion, the code was written
over successive versions and was never very smart or satisfying:

Traces:  Stupid callbacks with no result code, they just all run if
the connection is "ok" run in FIFO order

Filters:  Slightly less stupid callbacks that could signal "stop
running filters" or "don't bother with connection" (in case of pre-
auth) in FIFO order

Conn cleanup:  More specific and perhaps not stupid:  Callbacks run
unconditionally in LIFO order (suitable for cleanups).  Evidently the
user could probe the connection context to see if things were bad or
good.

I think over the years I figured out that just skipping somecallbacks

because the connection was "bad" (itself a poorly defined state) was
dumb so the conn-cleanup stuff was introduced.

And, just as old as the dumb trace code is the accesslog module -- it
was written when only the trace stuff was around and has stayed that
way.  I'm not sure if switching accesslog to use a conn-cleanup would
work -- it may be accessing fields in the connection which may not be
initialized on a request cancelled very early on.   Anyway, it seems
for the benefit of other filters (are there really any??), the ADP
code should be fixed so an "ns_adp_abort" is not an NS_ERROR.

BTW: I've wondered if accesslog is useful anyway -- seems likeGoogleAnalytics gets you the insight you need on one end and a morespecific

logging thing tied to the driver thread + cleanup would be useful for
the admin type folks.  For example, I'm not sure you ever get a log

entry for a invalid connection request or aborted socket connection--

that's knowledge the driver thread would have (and would be
interesting to admin folks) but would never generate an accesslog
because there would never have been any connection context to provide
for logging.


-Jim









On Apr 3, 2009, at 3:11 PM, Tom Jackson wrote:

After reviewing the code from version 4.0 and up, it looks like the
intended behavior is to not log an access log entry if an error
occurs.

In fact, the request was not serviced, so an entry in the access log
would be incorrect, the correct log entry would be a 500 internal
server
error.

I just found another bug, and the fact that errors are not returning
an

internal server error message indicates that there are furtherbugs in

the way the code works.

My guess is that if the internal server handler runs that the log
entry
will show up, we'll see.

Anyway, here is another test adp which exposes another bug:

<%

ns_adp_puts whatsup
ns_log Notice "Before include adp exception = [ns_adp_exception]"
ns_adp_include test.adp
ns_log Notice "adp exception = [ns_adp_exception]"
ns_log Notice "I shouldn't be running"

%>

The include file test.adp:

<%

ns_return 200 text/plain hi

ns_adp_abort

ns_log Notice "After abort, I shouldn't be running"

%>

<%

ns_log Notice "in separate adp block"

%>


With the bug the log notice after ns_adp_abort, inside the test.adp
file
does not execute (this is expected), but the statements after
ns_adp_include do run, this is bad. Execution should stop with the
include statement since test.adp returned ADP_ABORT.

Here is a fix for the bug from
line 1010 of nsd/adpeval.c
  /*
   * Clear the return exception and reset result.
   */
  switch (itPtr->adp.exception) {
  case ADP_OK:
        break;
  case ADP_RETURN:
        itPtr->adp.exception = ADP_OK;
        result = TCL_OK;
        break;
  case ADP_ABORT:
      result = TCL_ERROR;
      break;
  default:
        result = TCL_OK;
        break;
  }

But the code still has the bug which you identified, it may haveto do

with an actual error not clearing the adp.exception setting.

tom jackson

On Fri, 2009-04-03 at 14:54 -0500, Andrew Steets wrote:

My original concern was with the access logging proc, which happens
to

be run as a trace filter. I think that the access log entriesshouldbe generated regardless of whether or not ns_adp_abort iscalled. Idon't care too much about anything else that was installed as atrace

filter.

Do you agree that access log entries should be generated if
ns_adp_abort is called?

-Andrew

On Fri, Apr 3, 2009 at 10:33 AM, Tom Jackson <t...@rmadilo.com>wrote:

      Andrew,

      I wasted a little more time looking at the actual code. My
      impression is
      that everything is working as expected. If there is an error
      in a
      postauth filter or in adp processing (registered proc), trace
      filters
      are skipped. Until about 4.5, errors during preauth also
      skipped trace
      filters. Not sure why this change was made.

      The only think that matters is what happens in Ns_AdpRequest.
      If there
      were no errors, the request will be logged. In order to get
      ns_adp_abort
      to work correctly, the tcl result must be set to TCL_ERROR
      until code
      returns to Ns_AdpRequest. This is why an additional structure
      is
      maintained for the adp exception, which is independent of the
      tcl

exception. In this case, adp.exception indicates whatactually

      happened
      during adp processing.

      So things seem to be working as intended, and they have been
      working the
      same way for a long time. It might be possible that you are
      misusing
      ns_adp_abort, or something else is messing up.

      Could you provide a simple test case, probably a few nested
      adp

includes, which repeats the issue? Without a test case ofwhat

      you think
      should work differently, it is hard to give any more advice.

      In general, when an error occurs during a request, the
      response is by
      definition an error response, so the original request might
      get
      transformed into an internal redirect to your error handling
      page. An
      error in this page, or a missing error page could cause
      further
      problems.

      Bottom line: no reason to believe that this is a bug.

      tom jackson

      p.s. this case seems to validate my belief that the hardest
      bug to find
      and fix is one that doesn't actually exist.

      On Thu, 2009-04-02 at 18:03 -0500, Andrew Steets wrote:

What was the original purpose of "trace" filters?  At the C

      API level

there is a distinction between between a trace filter and a

      cleanup

callback, but it doesn't look like you can register a

      cleanup proc

from TCL.  Maybe this was mistakenly omitted?

The cleanup procs run unconditionally.  It seems like that

      is the most

appropriate place to handle "cleanup of resources."

       Alternatively we

could change the trace filters to run regardless of the
Ns_ConnRunRequest() return status, but then that would make

      them

basically the same as the cleanups.

I looked a little deeper into the source.  The confusion

      seems to

arise in NsAdpFlush() which is run at the end of all ADP

      processing.

The code there is smart enough to recognize when an abort

      exception

has been signalled; it sets the TCL result to "adp flush

      disabled: adp

aborted", but it still returns TCL_ERROR.  That is

      essentially where

the TCL exception gets turned into a full blown connection

      processing

error.  We could change NsAdpFlush() to return success when

it

recognizes the abort exception, or just not run NsAdpFlush()

      for abort

exceptions.

There would still be cases where trace filters would not run

      though.

For instance if you called ns_returnxxx without calling

      ns_adp_abort.

I'm not sure if that is a bad thing.

It would be nice to hear from anyone who knows about the

      original

motivation for the trace and cleanup filters.

-Andrew

On Thu, Apr 2, 2009 at 3:53 PM, Tom Jackson

      <t...@rmadilo.com> wrote:

      Gustaf,

      You may be "using" traces but not realize it, it

      sounds like

      ns_adp_abort isn't don't what was originally

      intended.


      I wouldn't worry about an runtime error caused

      during running

      traces, it
      would be an error to even use ns_adp_abort in a

      trace filter

      because the
      connection is already finished. This is analogous to

      calling

      [break]
      outside of a loop.

      It seems important to consider ns_adp_abort,

      ns_adp_return and

      ns_adp_break as a unit. They add necessary loop type

      controls

      so that
      developers can create deeply nested code and still

      get out of

      it without
      the need to use [catch]. But, like a lot of

      AOLserver specific

      procedures, there is no hand-holding in their use.

      They can be

      misued.

      In this particular case, it looks like somewhere

      along the

      way,
      ns_adp_abort was modified to not work as expected.

      The desired effect is exactly what you would get by

      returning

      filter_return from a preauth or postauth filter.

      This effect

      is to skip
      to trace filters, not past them.

      Skipping trace filters even on an aborted connection

      would be

      a disaster
      for any application which relies on cleanup of

      resources.


      tom jackson


      On Thu, 2009-04-02 at 11:12 +0200, Gustaf Neumann

      wrote:

Andrew Steets schrieb:

The patch I sent earlier seems to fulfill these

      needs, but

      I am

worried about corner cases where LogTrace (from

      the nslog

      module)

could blow up.  Nothing about the state of the

      Conn *

      seems to be

guaranteed when the ConnCleanup callbacks are

      called.

Dear Andrew,

i think most (all?) of the repondents seems to

      agree that

      writing in the

about case to
the access log file. For me there are still two

      quesions

      open:


a) is it possoble to call ns_adp_abort at some

      time, where

      the server

might crash
 (in normal operations, everthing looks fine to

me,

      problems might

occur in
 when called from some traces; other calls are

      likely to

      have similar

problems)

b) the patch replaces the call to the regular

      server trace

      by a

connection cleanup call.
 this means, at least in 4.5.*, ns_adp_abort

      seems to

      cancel all

traces (also
 these registered with ns_register_trace). Is

      this

      desired?


 From Tom's website:

      http://rmadilo.com/files/nsapi/ns_adp_abort.html

 the doc of ns_adp_abort says

 ... Every ns_returnxxx call in an ADP should be

      followed

      with a call

  to ns_adp_abort....

 With this recommendation, cancelling traces

      seem wrong to

      me; or at

least,
 this should be documented.

We don't use traces, all of OpenACS does not use

      it, so this

      is no

current issue for us.

-gustaf neumann


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an

      email to

      <lists...@listserv.aol.com> with the

body of "SIGNOFF AOLSERVER" in the email message.

      You can

      leave the Subject: field of your email blank.



      --
      AOLserver - http://www.aolserver.com/

      To Remove yourself from this list, simply send an

      email to

      <lists...@listserv.aol.com> with the
      body of "SIGNOFF AOLSERVER" in the email message.

      You can

      leave the Subject: field of your email blank.



--
AOLserver - http://www.aolserver.com/



To Remove yourself from this list, simply send an email to

      <lists...@listserv.aol.com> with the

body of "SIGNOFF AOLSERVER" in the email message. You can

      leave the Subject: field of your email blank.



      --
      AOLserver - http://www.aolserver.com/

      To Remove yourself from this list, simply send an email to
      <lists...@listserv.aol.com> with the
      body of "SIGNOFF AOLSERVER" in the email message. You can
      leave the Subject: field of your email blank.



--
AOLserver - http://www.aolserver.com/



To Remove yourself from this list, simply send an email to 
<lists...@listserv.aol.com

with the

body of "SIGNOFF AOLSERVER" in the email message. You can leave the
Subject: field of your email blank.



--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to 
<lists...@listserv.aol.com

with the

body of "SIGNOFF AOLSERVER" in the email message. You can leave the
Subject: field of your email blank.



--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <lists...@listserv.aol.com> with thebody of "SIGNOFF AOLSERVER" in the email message. You can leave theSubject: field of your email blank.



--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <lists...@listserv.aol.com> with thebody of "SIGNOFF AOLSERVER" in the email message. You can leave theSubject: field of your email blank.



--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to 
<lists...@listserv.aol.com> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
field of your email blank.

Re: [AOLSERVER] missing access log entries

Reply via email to