Re: [AOLSERVER] SSL read error: bad write retry

2007-01-26 Thread Andrew Steets

We have some servers that only service cURL/openssl clients and we never see
these SSL errors on those machines, regardless of load.  OTOH, we have other
servers that face real people, and they tend to log openssl errors
relatively frequently.  Both servers have identical configurations
(aolserver 4.0.10 / openssl 0.9.7e).

At one point, the mod_ssl guys decided there was some issue with the MSIE
ssl stack, but I'm not sure if any of it is still valid.

http://www.modssl.org/docs/2.8/ssl_faq.html#ToC49

-Andrew

On 1/26/07, Alex Kroman <[EMAIL PROTECTED]> wrote:


I can't seem to come up with a good test case that triggers this
behavior:

- I have never seen this occur in Firefox (my main browser).
- Using wget in an infinite loop with varying page sizes and varying
loads does not seem to trigger it.
- Just a few minutes ago I was clicking around with Internet Explorer
and reproduced the behavior.
- The pages that trigger this behavior seem to be completely random.

This site is an Intranet for a 100 person company.  I sent out a survey
to the heaviest users of the system and 100% of the Internet Explorer
users have encountered this behavior within the past week and none of
the Apple users have.

Alex

-Original Message-
From: AOLserver Discussion [mailto:[EMAIL PROTECTED] On Behalf
Of Steve Manning
Sent: Friday, January 26, 2007 12:56 AM
To: AOLSERVER@listserv.aol.com
Subject: Re: [AOLSERVER] SSL read error: bad write retry

Alex

We see this problem as well and I think it's related to the system load.
Our peak load is in October when we are averaging over 500,000 pages per
day and we have had reports of blank pages being returned during this
time.

I spoke to Dossy about it in Sept last year as I know he's been doing
some work on tidying it up but it's not yet been committed. See below.

Steve


On 2006.09.20, Steve Manning <[EMAIL PROTECTED]> wrote:
>Could you give us an update on the current state of nsopenssl.
>
>I'm currently using v3_0beta26 but I'm seeing increasing
>numbers of "SSL read error: ssl handshake failure" and "SSL
>write error: bad write retry" errors in the log as the site
>gets more busy (currently about 1.4m requests/day). I see there
>has been some activity in CVS - v3_0beta27 and Head and I'm
>wondering if these changes are worth having and if there is
>anything else in the pipeline.

I'm sitting on a whole chunk of changes ... and some of that
logging needs to be rationalized ... either demoted to "Debug"
level, or removed entirely.

At this point in time, are there any serious remaining bugs with
nsopenssl?  I'd like to finally declare "nsopenssl 3.0"
final ...
probably just call it "nsopenssl 3.1" to avoid all the confusion
with the MANY 3.0-beta-something versions.

Let's put together a TODO list for nsopenssl_v3_r1, divide up the
work (or, assign it all to me, doesn't matter) and I'll try to
put an estimate on it.

So: what are you (plural -- all of you) still waiting for to be
done in nsopenssl?

-- Dossy




On Thu, 2007-01-25 at 20:12 -0600, Alex Kroman wrote:
> Our production server is getting 57,000 pageviews per day but I am
> able to replicate this behavior on a development server that I am the
> only user on.
>
> Linux intra 2.6.8-3-686-smp #1 SMP Thu Feb 9 07:05:39 UTC 2006 i686 GNU/Linux
> OpenSSL 0.9.7e
>
>
> -Original Message-
> From: AOLserver Discussion [mailto:[EMAIL PROTECTED] On
> Behalf Of Scott Goodwin
> Sent: Thursday, January 25, 2007 5:37 PM
> To: AOLSERVER@LISTSERV.AOL.COM
> Subject: Re: [AOLSERVER] SSL read error: bad write retry
>
> How many connections a day does your server get, and can you give me
> an estimate of the rate of connection activity when the form
> submission fails? Also, send me the output of 'uname -a' and the
> version of OpenSSL you're using.
>
> thanks,
>
> /s.
>
> On Jan 25, 2007, at 5:52 PM, Alex Kroman wrote:
>
> > Hi all,
> >
> > Every day about 1% of connections to my website result in the
> > following error:
> >
> > Error: nsopenssl: SSL write error: bad write retry
> >
> > I can reproduce the error by repeatedly submitting a form.
> > Eventually one of those submits will fail and give the generic
> > Internet Explorer connection error and append the "bad write retry"
> > message to the log.
> >
> > Has anyone run into this problem?
> >
> > I am using the stock Debian versions of AOLserver 4.0.10 and
> > nsopenssl 3.0beta22.
> >
> > Here are some settings from my configuration file:
> >
> > ns_param   maxinput  [expr 1024 * 1024 * 100]
> > ns_param recvwait [expr 20 * 60]
> > ns_param socktimeout 240
> >
> > Thanks,
> > Alex
> >
> >
> > --
> > AOLserver - http://www.aolserver.com/
> >
> > To Remove yourself from this list, simply send an email to
> > <[EMAIL PROTECTED]> with the body of "SIGNOFF AOLSERV

Re: [AOLSERVER] SSL read error: bad write retry

2007-01-26 Thread Andrew Steets

We have had keepalivetimeout set to 0 for several months at least.  I don't
think that is a complete solution.  The most common error we see is 'SSL
read error: ssl handshake failure.'  It is the most common by far, but we
also see a few other misc SSL errors.  I tried turning tracing on 5-6 months
ago, but the output was not useful (at least not to me) and now I can't find
those logs.

I have a patch for nsopenssl that hooks into the message callback
(SSL_CTX_set_msg_callback), but I haven't tried using it in production
because it's really, really verbose.  I can give it to anyone that wants it,
but it's also pretty easy to derive from the msg_cb stuff in s_cb.c (from the
openssl dist).

It would be great if we could determine how to reproduce the issue in a
controlled environment.  I'm convinced it's related to the MS https stack.
I've tried just hammering an aolserver instance with a simple vb thing using
System.Net.WebRequest over and over, but it doesn't seem to cause the
error.  I'm not sure if it uses the same underlying code, or if a more
complicated series of events has to occur in order to generate the error.

-Andrew


On 1/26/07, Scott Goodwin <[EMAIL PROTECTED]> wrote:


Hi Steve,

If keepalivetimeout is not set at all in your nsd.tcl, it means you
are using keepalive and it is set to 30 seconds. Can you try adding
the keepalivetimeout parameter and setting it to 0 as I mentioned in
a previous message and see if that solves the problem? I'm pretty
sure Andrew found the correct information -- that MSIE has difficulty
with keepalive conns over SSL, particularly since no one has been
able to replicate the problem with other browsers or load testers.
Note that turning off keepalive will turn it off for non-SSL conns as
well, so if you try it, do be careful.

/s.

On Jan 26, 2007, at 2:44 PM, Steve Manning wrote:

> Hi Scott
>
> Long time no hear.
>
> The site is http://www.fancydress.com running on Linux - Centos 4.4
> (RHEL4 derived). We run AOLserver 4.0.10 with OpenACS 5.0.4 over the
> top.
>
> OpenSSL is 0.9.7a-43-14 from the supplied RPM and we're using the
> nsopenssl tagged as v3.0beta26 from cvs.
>
> From the config we have:
>
> ns_section ns/server/${server}/module/nsopenssl/sslcontext/users
> ns_param Role                server
> .
> .
> .
> .
> # for Protocols "ALL" = "SSLv2, SSLv3, TLSv1"
> ns_param Protocols           "SSLv3, TLSv1"
> ns_param CipherSuite         "ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP"
> ns_param PeerVerify          false
> ns_param PeerVerifyDepth     3
> ns_param Trace               false
> ns_param SessionCache        true
> ns_param SessionCacheID      1
> ns_param SessionCacheSize    512
> ns_param SessionCacheTimeout 300
>
> keepalivetimeout is not set.
>
> Just from this evenings log I can see e.g.
>
> [26/Jan/2007:18:52:34][25120.3050740656][-conn:fancydress::14]
> Error: nsopenssl (fancydress): SSL read error: bad write retry
>
> [26/Jan/2007:19:02:28][25120.3023371184][-conn:fancydress::40]
> Error: nsopenssl (fancydress): SSL read error: ssl handshake
> failure
>
> Let me know if you need anything else.
>
>   Steve
>
>
> On Fri, 2007-01-26 at 12:55 -0500, Scott Goodwin wrote:
>> Steve, what version of OpenSSL are you running on the site that
>> you're experiencing this problem on?
>>
>> /s.
> --
> Steve Manning - Mandrake Linux 10.1 - Gnome 2.6
> East Goscote  - Leicester - UK +44 (0)116 260 5457
> E-Mail: [EMAIL PROTECTED] - Web: www.festinalente.co.uk
> AIM: verbomania - Public Key: 25665CAF from wwwkeys.pgp.net
> ---
>  There are only 10 types of people in this world
>  Those who understand binary and those who don't
> ---



Re: [AOLSERVER] SSL read error: bad write retry

2007-01-29 Thread Andrew Steets

I noticed that the default ssl session cache size is only 128, and the
default session timeout is five minutes.  If clients are not expiring the
session before 5 minutes, and you've got more than 128 clients in 5 minutes,
then what should happen?

The openssl documentation is a bit unclear:

SSL_CTX_sess_set_cache_size(3):
When the maximum number of sessions is reached, no more new sessions are
added to the cache. New space may be added by calling
SSL_CTX_flush_sessions(3) to remove expired sessions.

SSL_CTX_flush_sessions(3):
As sessions will not be reused ones they are expired, they should be removed
from the cache to save resources. This can either be done automatically
whenever 255 new sessions were established or manually by calling
SSL_CTX_flush_sessions().

And it doesn't look like nsopenssl ever calls SSL_CTX_flush_sessions()
explicitly.

So the default cache size is 128, but it is only flushed after 255 sessions?
That sounds like trouble.  Has anyone tried increasing the
'sessioncachesize' parameter?

Also, it looks like openssl tracks cache full events on a per-ctx basis, but
they aren't exposed in nsopenssl.  That might be nice to have in a future
rev.

-Andrew

On 1/29/07, Alex Kroman <[EMAIL PROTECTED]> wrote:


Hi all,

I turned off keepalive on our production server but am still receiving
the "bad write retry" errors.

-Alex

-Original Message-
From: AOLserver Discussion [mailto:[EMAIL PROTECTED] On Behalf
Of Dossy Shiobara
Sent: Friday, January 26, 2007 10:35 AM
To: AOLSERVER@LISTSERV.AOL.COM
Subject: Re: [AOLSERVER] SSL read error: bad write retry

On 2007.01.26, Alex Kroman <[EMAIL PROTECTED]> wrote:
> I had Siege connect to my development server 50,000 times and did not
> receive the bad write retry once.  While clicking around the site with
> Siege active I still got the "bad write retry" and a blank page in
> about 75 clicks.  This is a similar result to what I would get when my
> development server is not under load.

I smell SSLv2 at play here.  I bet Firefox is using TLS or SSLv3, while
IE is still using SSLv2.

What do your "protocols" and "ciphersuite" ns_param's look like in your
nsopenssl config?

-- Dossy

--
Dossy Shiobara  | [EMAIL PROTECTED] | http://dossy.org/
Panoptic Computer Network   | http://panoptic.com/
  "He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on." (p. 70)


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to
<[EMAIL PROTECTED]> with the body of "SIGNOFF AOLSERVER" in the
email message. You can leave the Subject: field of your email blank.




Re: [AOLSERVER] SSL read error: bad write retry

2007-01-30 Thread Andrew Steets
case CConnectRenegotiateIdx:
+         sprintf(interp->result, "%ld",
+                 SSL_CTX_sess_connect_renegotiate(sslcontext->sslctx));
+         break;
+     case CAcceptIdx:
+         sprintf(interp->result, "%ld",
+                 SSL_CTX_sess_accept(sslcontext->sslctx));
+         break;
+     case CAcceptGoodIdx:
+         sprintf(interp->result, "%ld",
+                 SSL_CTX_sess_accept_good(sslcontext->sslctx));
+         break;
+     case CAcceptRenegotiateIdx:
+         sprintf(interp->result, "%ld",
+                 SSL_CTX_sess_accept_renegotiate(sslcontext->sslctx));
+         break;
+     case CCacheHitsIdx:
+         sprintf(interp->result, "%ld",
+                 SSL_CTX_sess_hits(sslcontext->sslctx));
+         break;
+     case CCacheMissesIdx:
+         sprintf(interp->result, "%ld",
+                 SSL_CTX_sess_misses(sslcontext->sslctx));
+         break;
+     case CCacheFullIdx:
+         sprintf(interp->result, "%ld",
+                 SSL_CTX_sess_cache_full(sslcontext->sslctx));
+         break;
+     case CTimeoutsIdx:
+         sprintf(interp->result, "%ld",
+                 SSL_CTX_sess_timeouts(sslcontext->sslctx));
+         break;
+ }
+
+ return TCL_OK;
+ }
+
+


 /*
  *------


On 1/29/07, Scott Goodwin <[EMAIL PROTECTED]> wrote:


At this point I'd prefer not to speculate -- much better to replicate the
problem and see it in all its dynamic glory. However, my sense is that
session caching, keepalive and other factors may make the problem worse but
are not likely to be root causes.

/s.


Re: [AOLSERVER] SSL read error: bad write retry

2007-01-31 Thread Andrew Steets

On 1/30/07, Andrew Steets <[EMAIL PROTECTED]> wrote:


I'm still looking into this and I'm still just confused.  There is a small
bug on line 1731 of sslcontext.c (CVS).

SSL_CTX_set_session_id_context(
sslcontext->sslctx,
(void *) &sslcontext->sessionCacheId,
sizeof(sslcontext->sessionCacheId)
);


That sizeof() should almost certainly be strlen() (sessionCacheId is char *,
not char[]), but I don't think that's really causing any of my problems.
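
To make the sizeof()/strlen() point concrete, here is a minimal standalone
sketch. DemoContext, IdLenViaSizeof and IdLenViaStrlen are hypothetical names
invented for illustration and are not part of nsopenssl; only the char * type
of the field is taken from the message above. If sessionCacheId really is a
char *, the quoted call also passes the address of the pointer rather than the
string it points to, so a real fix would presumably need to change both
arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the nsopenssl context struct; only the
 * field relevant to the discussion is reproduced here. */
typedef struct {
    char *sessionCacheId;   /* points to a NUL-terminated id string */
} DemoContext;

/* The length the current call passes: the size of the pointer itself,
 * which does not depend on the id string at all. */
size_t
IdLenViaSizeof(const DemoContext *ctx)
{
    return sizeof(ctx->sessionCacheId);
}

/* The length that was presumably intended: the number of bytes in the
 * id string. */
size_t
IdLenViaStrlen(const DemoContext *ctx)
{
    assert(ctx->sessionCacheId != NULL);
    return strlen(ctx->sessionCacheId);
}
```

With an id of "1", the strlen() variant returns 1, while the sizeof() variant
returns the platform's pointer size (commonly 4 or 8) whatever the id is.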

Here is a patch that exposes some of the openssl session cache statistics
collection functionality to the TCL layer.  I tend to get zero back from
everything in my test environment, so I get the feeling something is not
quite right with the session caching logic, but I still haven't found
anything substantial.  This page roughly describes what _should_ be
available.

http://www.openssl.org/docs/ssl/SSL_CTX_sess_number.html

Use it like 'ns_openssl_sess_stats cache_hits default_ctx'.

-Andrew

Index: tclcmds.c
===
RCS file: /cvsroot/aolserver/nsopenssl/tclcmds.c,v
retrieving revision 1.51
diff -c -r1.51 tclcmds.c
*** tclcmds.c   13 Jun 2004 04:21:31 -  1.51
--- tclcmds.c   31 Jan 2007 02:48:55 -
***
*** 138,144 
  NsTclOpenSSLSockListenObjCmd,
  NsTclOpenSSLSockListenCallbackObjCmd,
  NsTclOpenSSLSockCallbackObjCmd,
! NsTclOpenSSLGetUrlObjCmd;

  extern Tcl_CmdProc
  NsTclOpenSSLGetUrlCmd,
--- 138,145 
  NsTclOpenSSLSockListenObjCmd,
  NsTclOpenSSLSockListenCallbackObjCmd,
  NsTclOpenSSLSockCallbackObjCmd,
! NsTclOpenSSLGetUrlObjCmd,
! NsTclOpenSSLSessStatsObjCmd;

  extern Tcl_CmdProc
  NsTclOpenSSLGetUrlCmd,
***
*** 160,165 
--- 161,167 
  {"ns_openssl_socklisten",         NULL, NsTclOpenSSLSockListenObjCmd         },
  {"ns_openssl_sockcallback",       NULL, NsTclOpenSSLSockCallbackObjCmd       },
  {"ns_openssl_socklistencallback", NULL, NsTclOpenSSLSockListenCallbackObjCmd },
+ {"ns_openssl_sess_stats",         NULL, NsTclOpenSSLSessStatsObjCmd          },
  #if 0  /* these ns_openssl_sock* commands are not implemented */
  {"ns_openssl_socknread",          NsTclOpenSSLSockNReadCmd,  NULL            },
  {"ns_openssl_sockselect",         NsTclOpenSSLSockSelectCmd, NULL            },
***
*** 1254,1259 
--- 1256,1358 
  return TCL_OK;
  }

+ /*
+  *--
+  *
+  * NsTclOpenSSLSessStatsObjCmd --
+  *
+  *  Return per-context session statistics
+  *
+  * Results:
+  *  Tcl result.
+  *
+  * Side effects:
+  *  None.
+  *
+  *--
+  */
+
+ int
+ NsTclOpenSSLSessStatsObjCmd(ClientData arg, Tcl_Interp *interp, int objc,
+                             Tcl_Obj *CONST objv[])
+ {
+     /* The option table must be NULL-terminated for Tcl_GetIndexFromObj. */
+     static CONST char *opts[] = {
+         "number", "connect", "connect_good", "connect_renegotiate",
+         "accept", "accept_good", "accept_renegotiate", "cache_hits",
+         "cache_misses", "cache_full", "timeouts", NULL
+     };
+
+     enum ISubCmdIdx {
+         CNumberIdx, CConnectIdx, CConnectGoodIdx, CConnectRenegotiateIdx,
+         CAcceptIdx, CAcceptGoodIdx, CAcceptRenegotiateIdx, CCacheHitsIdx,
+         CCacheMissesIdx, CCacheFullIdx, CTimeoutsIdx
+     } opt;
+
+     Server            *thisServer = (Server *) arg;
+     NsOpenSSLContext  *sslcontext = NULL;
+
+     if (objc < 2 || objc > 3) {
+         Tcl_WrongNumArgs(interp, 1, objv, "name ?sslcontext?");
+         return TCL_ERROR;
+     }
+
+     /* objc is 2 or 3 here, so the named context is the objc == 3 case. */
+     if (objc == 3) {
+         sslcontext = Ns_OpenSSLServerSSLContextGet(thisServer->server, objv[2]);
+     } else {
+         sslcontext = NsOpenSSLContextClientDefaultGet(thisServer->server);
+     }
+
+     if (sslcontext == NULL) {
+         Tcl_SetResult(interp,
+                       "failed to use either named or default SSL context",
+                       TCL_STATIC);
+         return TCL_ERROR;
+     }
+
+ if (Tcl_GetIndexFr

[AOLSERVER] leak in nsoracle.c

2007-04-10 Thread Andrew Steets

In tcl_error_p(), two buffers (msgbuf and buf) are allocated from the heap.
Further along, if the error turns out to be ORA-1405, the call returns
without freeing the buffers.  It seems like an easy enough fix, but I don't
understand why 1405 gets handled differently.  To reproduce the leak, use
"ns_ora exec_plsql" on a function that returns NULL.

Anyone else run into this?

-Andrew




Re: [AOLSERVER] leak in nsoracle.c

2007-04-11 Thread Andrew Steets

A bit more discussion about this bug.  A proposed patch against HEAD
is attached.

tcl_error_p:

tcl_error_p() is a function used to check the return status of all OCI
calls.  If they return something bad, tcl_error_p generates a useful
error message, massages any database connections (if necessary), and
returns non-zero.

the memory leak:

First of all, it doesn't exist in the 2.7 version of nsoracle.  In 2.7,
any stored function returning NULL would cause ORA-01405 to be
generated, a TCL error would be thrown, and it was up to the
application to handle it appropriately.  In an update added to CVS
shortly after 2.7 was tagged, a special exception was added to
tcl_error_p() for ORA-01405.  If this error was detected, tcl_error_p
simply returned zero, as if there was no error returned from OCI.  The
memory leak is a result of this change: in a rush to exit the
function, no dynamic memory allocations were cleaned up.
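
For illustration only, the "easy enough fix" mentioned in the first message
(release the buffers on every exit path) could be sketched as below. This is
not the nsoracle code: DemoCheckError, DemoAlloc, DemoFree, DEMO_BUF_SIZE and
the allocation counter are hypothetical names and values, and the real
tcl_error_p() does considerably more. The sketch only shows the single-exit
shape that avoids this kind of leak.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_BUF_SIZE     512    /* hypothetical; the real buffer sizes differ */
#define DEMO_IGNORED_CODE 1405   /* stands in for ORA-01405 */

int demo_live_buffers = 0;       /* outstanding allocations, for illustration */

char *
DemoAlloc(size_t size)
{
    char *p = malloc(size);

    if (p != NULL) {
        demo_live_buffers++;
    }
    return p;
}

void
DemoFree(char *p)
{
    if (p != NULL) {
        assert(demo_live_buffers > 0);
        free(p);
        demo_live_buffers--;
    }
}

/*
 * Returns nonzero if 'code' should be reported as an error.  Every path,
 * including the special case, leaves through 'done', so both buffers are
 * always released.
 */
int
DemoCheckError(int code)
{
    int   is_error;
    char *msgbuf = DemoAlloc(DEMO_BUF_SIZE);
    char *buf = DemoAlloc(DEMO_BUF_SIZE);

    if (msgbuf == NULL || buf == NULL) {
        is_error = 1;            /* treat allocation failure as an error */
        goto done;
    }
    if (code == DEMO_IGNORED_CODE) {
        is_error = 0;            /* ignored, but without an early return */
        goto done;
    }
    is_error = (code != 0);

done:
    DemoFree(buf);
    DemoFree(msgbuf);
    return is_error;
}
```

The patch discussed below takes a different route and removes the special
case altogether, which makes the early return disappear.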

To reproduce the issue with a post-2.7 nsoracle driver, execute the
following in nscp and watch the aolserver memory footprint grow.  It
leaks about 40k on each iteration, so this will grow the aolserver by
a noticeable 35-40M.

nscp 1> set dbh [ns_db gethandle]
nscp 2> for {set i 0} {$i < 1000} {incr i 1} { ns_ora exec_plsql $dbh "begin :1 := to_char(null); end;" }

the patch:

It turns out that the OCI bind functions can take an "indicator
variable" argument.  It's just a pointer to a small integer (an sb2 in
the patch), and you're supposed to check it after the statement is
executed.  nsoracle was passing 0 for the indicator variable, so there
was no way to know whether OCI was returning an empty string or NULL.
If you pass a valid indicator variable, Oracle no longer throws
ORA-01405 when a stored function returns NULL.  That's effectively what
this patch does.  This allows removal of the explicit 1405 handling
code from tcl_error_p(), and eliminates the memory leak.

OCIBindByPos:
http://download-east.oracle.com/docs/cd/B19306_01/appdev.102/b14250/oci15rel003.htm#i456224

OCI Indicator variables:
http://download-east.oracle.com/docs/cd/B19306_01/appdev.102/b14250/oci02bas.htm#i462559

-Andrew

Index: nsoracle.c
===
RCS file: /cvsroot/aolserver/nsoracle/nsoracle.c,v
retrieving revision 1.25
diff -u -r1.25 nsoracle.c
--- nsoracle.c	22 Feb 2006 16:14:58 -	1.25
+++ nsoracle.c	11 Apr 2007 17:15:40 -
@@ -566,7 +566,17 @@
 ora_connection_t  *connection;
 oci_status_t   oci_status;
 char  *query, *buf;
-  
+
+/* This indicator variable is a dummy.  We don't actually check the
+ * status.  Previously, we set the indp parameter to OCIBindByPos to
+ * 0.  Oracle would throw ORA-01405 and we would specifically ignore
+ * it in tcl_error_p.  Now we pass this dummy variable, and Oracle
+ * returns OCI_SUCCESS whether or not the returned value is NULL.
+ * This eliminates the need for explicitly handling ORA-01405 in
+ * tcl_error_p. */
+
+sb2   null_indicator;
+
 if (objc != 4) {
 Tcl_WrongNumArgs(interp, 2, objv, 
 "dbhandle dbId sql");
@@ -616,7 +626,7 @@
 			   buf,
 			   EXEC_PLSQL_BUFFER_SIZE,
 			   SQLT_STR,
-			   0,
+			   &null_indicator,
 			   0,
 			   0,
 			   0,


[AOLSERVER] segfault in ns_driver query

2008-11-07 Thread Andrew Steets
Hello,

In certain cases the ns_driver query logic can end up dereferencing a
null pointer.  The code iterates over all the sockets waiting for I/O
events and prints out some info about the socket and the associated
conn via NsAppendConn(sockPtr->connPtr).  It turns out that if the
sock is in SOCK_CLOSEWAIT state, the connection associated with the
sock has already been freed, so the NsAppendConn call blows up.

I can reproduce the crash by logging into the nscp port and calling
ns_driver query over and over on a lightly loaded development server.

Here is a patch that just puts an empty list where the connection info
would be in the case that the conn is null.  Does it look ok?

--- driver.c.orig   2008-11-07 17:17:36.0 +
+++ driver.c        2008-11-07 17:20:33.0 +
@@ -1328,7 +1328,12 @@
pdata.pfds[sockPtr->pidx].revents,
sockPtr->acceptTime.sec, sockPtr->acceptTime.usec,
sockPtr->timeout.sec, sockPtr->timeout.usec);
-   NsAppendConn(drvPtr->queryPtr, sockPtr->connPtr, "i/o");
+   if (sockPtr->connPtr != NULL) {
+   NsAppendConn(drvPtr->queryPtr, sockPtr->connPtr, "i/o");
+   } else {
+   Tcl_DStringStartSublist(drvPtr->queryPtr);
+   Tcl_DStringEndSublist(drvPtr->queryPtr);
+   }
Tcl_DStringEndSublist(drvPtr->queryPtr);
sockPtr = sockPtr->nextPtr;
}

-Andrew




[AOLSERVER] missing access log entries

2009-03-26 Thread Andrew Steets
Hello,

There are certain cases where connections probably ought to generate
access log entries but do not.  Specifically if an ADP exits via
ns_adp_abort no access log entry will be generated, but data may have
been returned to the client.  This seems like a bug.

The access log callback is registered via Ns_RegisterServerTrace.  The
comments indicate that callbacks registered with this mechanism will
only be run if "the connection request procedure successfully responds
to the clients request."  There is another callback procedure called
Ns_RegisterConnCleanup which will run its traces "at the end of
connection no matter the result code from the connection's request
procedure."

Switching the nslog code to use Ns_RegisterConnCleanup rather than
Ns_RegisterServerTrace seems to solve the case of the missing access log
entries.  Does that seem like a reasonable thing to do?  I'm not
familiar with the details of the various tracing hooks.
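
As a rough model of the difference described above (not AOLserver's actual
implementation), the two kinds of hook can be pictured like this. DemoHooks,
DemoFinishConn, DemoCountEntry and the status codes are hypothetical names
made up for the sketch; the only behavior taken from the quoted comments is
that server traces run after a successful request while cleanups run
regardless of the result code.

```c
#include <assert.h>
#include <stddef.h>

typedef void (DemoTraceProc)(void *arg);

/* Hypothetical hook set for one server. */
typedef struct {
    DemoTraceProc *serverTrace;   /* run only after a successful request */
    DemoTraceProc *connCleanup;   /* run at the end of every connection */
    void          *arg;
} DemoHooks;

enum { DEMO_OK = 0, DEMO_ERROR = -1 };

/* Called once per connection with the request procedure's result. */
void
DemoFinishConn(const DemoHooks *hooks, int status)
{
    assert(hooks != NULL);
    if (status == DEMO_OK && hooks->serverTrace != NULL) {
        (*hooks->serverTrace)(hooks->arg);
    }
    if (hooks->connCleanup != NULL) {
        (*hooks->connCleanup)(hooks->arg);
    }
}

/* Stand-in for the access log callback: counts one entry per call. */
void
DemoCountEntry(void *arg)
{
    int *entries = (int *) arg;

    ++*entries;
}
```

In this model, a logger registered as a server trace records one entry for a
successful and a failed connection taken together, while the same logger
registered as a cleanup records two.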

Trivial patch is attached.

-Andrew


Index: nslog/nslog.c
===
RCS file: /cvsroot/aolserver/aolserver/nslog/nslog.c,v
retrieving revision 1.16
diff -u -r1.16 nslog.c
--- nslog/nslog.c	8 Aug 2005 11:32:18 -	1.16
+++ nslog/nslog.c	26 Mar 2009 16:49:52 -
@@ -217,7 +217,7 @@
 if (LogOpen(logPtr) != NS_OK) {
 	return NS_ERROR;
 }
-Ns_RegisterServerTrace(server, LogTrace, logPtr);
+Ns_RegisterConnCleanup(server, LogTrace, logPtr);
 Ns_RegisterAtShutdown(LogCloseCallback, logPtr);
 Ns_TclInitInterps(server, AddCmds, logPtr);
 return NS_OK;


Re: [AOLSERVER] missing access log entries

2009-03-26 Thread Andrew Steets
On Thu, Mar 26, 2009 at 1:40 PM, Dossy Shiobara  wrote:
> I wonder - should this be the documented known behavior of ns_adp_abort vs.
> ns_adp_return?  i.e., abort indicates that the connection is intentionally
> terminated, not logged, etc. vs. ns_adp_return which halts ADP processing
> but continues the connection, which includes logging, etc.

We have some ADP code that explicitly returns data via ns_return and
then calls ns_adp_abort to discontinue processing.  It isn't an error
per se, just another way of getting data back to the client.  Maybe
it's a pathological case.  I don't understand exactly why we do it
that way (not my code), but the ns_adp_abort documentation mentions
this type of strategy.

> I'm inclined to agree with you that the current behavior is a bug, but it
> raises the question: should there be such a function that says "this
> connection wasn't handled, don't even log it" - or, should ALL connections
> always be logged, even if it's aborted?

As Scott suggested, we should probably log everything, at least for
some reasonable value of "everything."  Even if you switch the access
log trace to the cleanup callback, you still don't get access entries
for clients who connect but don't issue a well formed HTTP request.  I
don't have a huge problem with that, and I think it would be difficult
to log those types of events.




Re: [AOLSERVER] missing access log entries

2009-03-26 Thread Andrew Steets
This thread has turned into an interesting discussion of what may be a
fairly useful ADP programming idiom, but I don't want to focus too
much on ns_adp_abort.  There are other cases that will cause the ADP
to run without generating an access log entry.  For example, if you
call one of the ns_returnxxx family of functions within an ADP, no
access log entry will be created regardless of whether or not
ns_adp_abort was called.  I think there are more cases but I can't
come up with any right now.

I suspect, though I have not had time to confirm, that in aolserver
4.0 this was not a problem.  If anyone has a testing 4.0 instance can
you please run a quick test?

If we decouple the web developers from the people who monitor and
administer the web server I think it becomes more clear.  If I am a
security administrator, I want to know all the URLs that have been
accessed on the server, regardless of what the ADP code does.  If I am
a systems administrator, I want to know how much data was sent out in
response to any request, and how much time it took to process the
request (I'm a *huge* fan of logreqtime), regardless of what the ADP
did.

The patch I sent earlier seems to fulfill these needs, but I am
worried about corner cases where LogTrace (from the nslog module)
could blow up.  Nothing about the state of the Conn * seems to be
guaranteed when the ConnCleanup callbacks are called.

-Andrew

On Thu, Mar 26, 2009 at 6:34 PM, Tom Jackson  wrote:
> By a strange coincidence I had a similar issue with Tcl (tcl
> pages).
>
> I did a ns_returnredirect way deep into an application. I was hoping to
> abort further execution of Tcl code, but by design, script execution
> continues.
>
> I considered that only throwing an error would unwind everything
> correctly. Since I hate the idea of doing this inside Tcl code, I
> decided to live with the problem.
>
> Then I discovered ns_tcl_abort. Here's the def:
> (from modules/file.tcl):
> #
> # ns_tcl_abort is a work-alike ns_adp_abort.
> #
> proc ns_tcl_abort {} {
>    error ns_tcl_abort "" NS_TCL_ABORT
> }
>
> So if this is a work-alike, the intent could be to stop processing deep
> within some code, but it shouldn't have any effect on the logging.
>
> Note that ns_sourceproc catches the above error:
> proc ns_sourceproc {..} {
> ...
>        if {$code == 1 && $errorCode == "NS_TCL_ABORT"} {
>            return
>        }
> 
> }
>
> So I think normal logging should take place.
>
> The best evidence for normal logging is that ns_adp_abort is called
> intentionally, so the programmer can decide when to do it.
>
> tom jackson
>
> On Thu, 2009-03-26 at 16:11 -0400, Scott Goodwin wrote:
>> All connections should be logged as requests that came from clients
>> along with details on how the server responds. Some indication that
>> the connection was aborted should be made in the log, perhaps with a
>> count of how many bytes were transferred. In cases where no response
>> is going to be sent and the connection aborted, the response code
>> shown in the log could be left blank or as a placeholder (e.g. "xxx").
>> The general principle is that we always want visibility into what
>> happens with every connection -- in many situations we are serving
>> anonymous clients who aren't going to call and complain or post a
>> trouble ticket, so it's nice to see such aborted conns in the logs as
>> an indication that we might need to investigate what's going on.
>>
>> /s.
>>
>> On Mar 26, 2009, at 2:40 PM, Dossy Shiobara wrote:
>>
>> > On 3/26/09 1:31 PM, Andrew Steets wrote:
>> >> Hello,
>> >>
>> >> There are certain cases where connections probably ought to generate
>> >> access log entries but do not.  Specifically if an ADP exits via
>> >> ns_adp_abort no access log entry will be generated, but data may have
>> >> been returned to the client.  This seems like a bug.
>> >
>> > I wonder - should this be the documented known behavior of
>> > ns_adp_abort vs. ns_adp_return?  i.e., abort indicates that the
>> > connection is intentionally terminated, not logged, etc. vs.
>> > ns_adp_return which halts ADP processing but continues the
>> > connection, which includes logging, etc.
>> >
>> > I'm inclined to agree with you that the current behavior is a bug,
>> > but it raises the question: should there be such a function that
>> > says "this connection wasn't handled, don't even log it" - or,
>> > should ALL connections always be logged, even if it's aborted?
>> >
>> > Thanks, Andrew.
>> >
>> > --
>> > Dossy Shiobara              | do...@panoptic.com | http://dossy.org/
>> > Panoptic Computer Network   | http://panoptic.com/
>> >  "He realized the fastest way to change is to laugh at your own
>> >    folly -- then you can let go and quickly move on." (p. 70)
>> >




Re: [AOLSERVER] missing access log entries

2009-04-02 Thread Andrew Steets
What was the original purpose of "trace" filters?  At the C API level there
is a distinction between a trace filter and a cleanup callback, but
it doesn't look like you can register a cleanup proc from TCL.  Maybe this
was mistakenly omitted?

The cleanup procs run unconditionally.  It seems like that is the most
appropriate place to handle "cleanup of resources."  Alternatively we could
change the trace filters to run regardless of the Ns_ConnRunRequest() return
status, but then that would make them basically the same as the cleanups.

I looked a little deeper into the source.  The confusion seems to arise in
NsAdpFlush() which is run at the end of all ADP processing.  The code there
is smart enough to recognize when an abort exception has been signalled; it
sets the TCL result to "adp flush disabled: adp aborted", but it still
returns TCL_ERROR.  That is essentially where the TCL exception gets turned
into a full blown connection processing error.  We could change NsAdpFlush()
to return success when it recognizes the abort exception, or just not run
NsAdpFlush() for abort exceptions.
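
A minimal sketch of the first option, to make the control flow concrete.
Everything here is a toy stand-in: the names, the enum values and the
abort_is_success switch are assumptions for illustration and are not taken
from the AOLserver source.  It only models the idea that trace filters (and
therefore the access log) run when the flush reports success.

```c
#include <assert.h>

/* Toy stand-ins for the Tcl and ADP codes.  The names and values are
 * illustrative and are not copied from the AOLserver headers. */
enum { TOY_TCL_OK = 0, TOY_TCL_ERROR = 1 };
enum toy_adp_exception { TOY_ADP_OK, TOY_ADP_ABORT };

/* Models the end-of-ADP flush.  With abort_is_success == 0 an abort is
 * reported as an error (the behavior described above); with 1 it is
 * reported as success (the proposed change). */
int toy_adp_flush(enum toy_adp_exception exc, int abort_is_success)
{
    assert(exc == TOY_ADP_OK || exc == TOY_ADP_ABORT);
    if (exc == TOY_ADP_ABORT) {
        return abort_is_success ? TOY_TCL_OK : TOY_TCL_ERROR;
    }
    return TOY_TCL_OK;
}

/* Models the connection code: trace filters, and so the access log
 * entry, only run when request processing reported success. */
int toy_traces_run(int flush_status)
{
    return flush_status == TOY_TCL_OK;
}
```

In this model toy_traces_run(toy_adp_flush(TOY_ADP_ABORT, 0)) is 0, which
is the missing log entry, and it becomes 1 once the abort is treated as
success.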

There would still be cases where trace filters would not run though.  For
instance if you called ns_returnxxx without calling ns_adp_abort.  I'm not
sure if that is a bad thing.

It would be nice to hear from anyone who knows about the original motivation
for the trace and cleanup filters.

-Andrew

On Thu, Apr 2, 2009 at 3:53 PM, Tom Jackson  wrote:

> Gustaf,
>
> You may be "using" traces but not realize it; it sounds like
> ns_adp_abort isn't doing what was originally intended.
>
> I wouldn't worry about a runtime error caused during running traces, it
> would be an error to even use ns_adp_abort in a trace filter because the
> connection is already finished. This is analogous to calling [break]
> outside of a loop.
>
> It seems important to consider ns_adp_abort, ns_adp_return and
> ns_adp_break as a unit. They add necessary loop type controls so that
> developers can create deeply nested code and still get out of it without
> the need to use [catch]. But, like a lot of AOLserver specific
> procedures, there is no hand-holding in their use. They can be misused.
>
> In this particular case, it looks like somewhere along the way,
> ns_adp_abort was modified to not work as expected.
>
> The desired effect is exactly what you would get by returning
> filter_return from a preauth or postauth filter. This effect is to skip
> to trace filters, not past them.
>
> Skipping trace filters even on an aborted connection would be a disaster
> for any application which relies on cleanup of resources.
>
> tom jackson
>
> On Thu, 2009-04-02 at 11:12 +0200, Gustaf Neumann wrote:
> > Andrew Steets schrieb:
> > > The patch I sent earlier seems to fulfill these needs, but I am
> > > worried about corner cases where LogTrace (from the nslog module)
> > > could blow up.  Nothing about the state of the Conn * seems to be
> > > guaranteed when the ConnCleanup callbacks are called.
> > >
> > Dear Andrew,
> >
> > I think most (all?) of the respondents seem to agree on writing to the
> > access log file in the abort case. For me there are still two questions
> > open:
> >
> > a) Is it possible to call ns_adp_abort at some time where the server
> >    might crash? (In normal operations everything looks fine to me;
> >    problems might occur when it is called from some traces; other calls
> >    are likely to have similar problems.)
> >
> > b) The patch replaces the call to the regular server trace by a
> >    connection cleanup call. This means that, at least in 4.5.*,
> >    ns_adp_abort seems to cancel all traces (also those registered
> >    with ns_register_trace). Is this desired?
> >
> >    From Tom's website:
> >    http://rmadilo.com/files/nsapi/ns_adp_abort.html
> >    the doc of ns_adp_abort says
> >
> >    ... Every ns_returnxxx call in an ADP should be followed with a call
> >    to ns_adp_abort
> >
> >    With this recommendation, cancelling traces seems wrong to me; or at
> >    least, this should be documented.
> >
> > We don't use traces, and neither does any of OpenACS, so this is not a
> > current issue for us.
> >
> > -gustaf neumann


Re: [AOLSERVER] missing access log entries

2009-04-03 Thread Andrew Steets
My original concern was with the access logging proc, which happens to be
run as a trace filter.  I think that the access log entries should be
generated regardless of whether or not ns_adp_abort is called.  I don't care
too much about anything else that was installed as a trace filter.

Do you agree that access log entries should be generated if ns_adp_abort is
called?

-Andrew

On Fri, Apr 3, 2009 at 10:33 AM, Tom Jackson  wrote:

> Andrew,
>
> I wasted a little more time looking at the actual code. My impression is
> that everything is working as expected. If there is an error in a
> postauth filter or in adp processing (registered proc), trace filters
> are skipped. Until about 4.5, errors during preauth also skipped trace
> filters. Not sure why this change was made.
>
> The only thing that matters is what happens in Ns_AdpRequest. If there
> were no errors, the request will be logged. In order to get ns_adp_abort
> to work correctly, the tcl result must be set to TCL_ERROR until code
> returns to Ns_AdpRequest. This is why an additional structure is
> maintained for the adp exception, which is independent of the tcl
> exception. In this case, adp.exception indicates what actually happened
> during adp processing.
>
> So things seem to be working as intended, and they have been working the
> same way for a long time. It might be possible that you are misusing
> ns_adp_abort, or something else is messing up.
>
> Could you provide a simple test case, probably a few nested adp
> includes, which repeats the issue? Without a test case of what you think
> should work differently, it is hard to give any more advice.
>
> In general, when an error occurs during a request, the response is by
> definition an error response, so the original request might get
> transformed into an internal redirect to your error handling page. An
> error in this page, or a missing error page could cause further
> problems.
>
> Bottom line: no reason to believe that this is a bug.
>
> tom jackson
>
> p.s. this case seems to validate my belief that the hardest bug to find
> and fix is one that doesn't actually exist.

Re: [AOLSERVER] missing access log entries

2009-04-03 Thread Andrew Steets
This is what I suggested a few emails ago :-)

> I looked a little deeper into the source.  The confusion seems to
> arise in NsAdpFlush() which is run at the end of all ADP processing.
> The code there is smart enough to recognize when an abort exception
> has been signalled; it sets the TCL result to "adp flush disabled: adp
> aborted", but it still returns TCL_ERROR.  That is essentially where
> the TCL exception gets turned into a full blown connection processing
> error.  We could change NsAdpFlush() to return success when it
> recognizes the abort exception, or just not run NsAdpFlush() for abort
> exceptions.

I'm fine with this patch.

-Andrew

On Fri, Apr 3, 2009 at 2:37 PM, Tom Jackson  wrote:

> Hey,
>
> Hopefully this is my last post on this subject, I think I actually found
> the bug.
>
> The bug is in NsAdpFlush from nsd/adprequest.c:
>
> 214-     */
> 215-
> 216-    Tcl_ResetResult(interp);
> 217:    if (itPtr->adp.exception == ADP_ABORT) {
> 218-        Tcl_SetResult(interp, "adp flush disabled: adp aborted", TCL_STATIC);
> 219-        result = TCL_OK;
> 220-    } else if (len == 0 && stream) {
>
>
> The bug was a missing line setting result to TCL_OK. (line 219).
>
>
> Also, ns_adp_return cannot be used after an ns_returnxxx command as adp
> processing continues after calling it.
>
> Here are two test files:
>
> test-adp-abort.adp:
>
> <%
>
> ns_return 200 text/plain hi
>
> ns_adp_abort
>
> %>
>
> test-adp-return.adp:
>
> <%
>
> ns_adp_puts hi
>
> ns_adp_return
>
> %>
>
> Both of these result in an access.log entry.
>
> Before the change, ns_adp_abort would lead to an error message:
>
> adp flush failed: connection closed
>abort exception raised
> while processing connection #2:
>GET /test.adp HTTP/1.1
>Host: 127.0.0.1:8000
>User-Agent: ...
>Accept: 
>Accept-Language: en-us,en;q=0.5
>Accept-Encoding: gzip,deflate
>Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
>Keep-Alive: 300
>Connection: keep-alive
>Cache-Control: max-age=0
>
>
> This error message is valid if ns_adp_return is used after an
> ns_returnxxx.
>
> tom jackson

Re: [AOLSERVER] missing access log entries

2009-04-04 Thread Andrew Steets
Hi Tom,

Attachments seem to work ok on this list.

I don't think we can return 500 internal server error after
Ns_ConnRunRequest has been invoked as it may have already sent an http
response code via streaming output or ns_returnxxx.

-Andrew

On Sat, Apr 4, 2009 at 12:00 PM, Tom Jackson  wrote:
> Here is a test patch for the ns_adp_abort issue.
>
> The patch enables sending an error message in the case of an actual
> error during adp processing, or after a postauth filter (preauth errors
> already allow this behavior).
>
> Also, logging is enabled in all cases. If an error occurs, a 500
> response is sent and this is what is logged.
>
> I haven't tested this with ns_adp_break. But it works with ns_adp_return
> and ns_adp_abort as well as error handling in an adp.
>
> tom jackson
>
> Not sure if I can attach the patch, but here goes:
>
>
>
>


Re: [AOLSERVER] missing access log entries

2009-04-06 Thread Andrew Steets
I got a chance to test this out this morning.  I don't understand what
it is supposed to fix.  I still don't get access log entries when
ns_adp_abort is called.

On Sun, Apr 5, 2009 at 12:52 PM, Tom Jackson  wrote:
> The attached patch fixes ns_adp_break, it differs from the previous
> patch by one line in  adpeval.c
>
> tom jackson
>
> On Sat, 2009-04-04 at 16:25 -0500, Andrew Steets wrote:
>> Hi Tom,
>>
>> Attachments seem to work ok on this list.
>
>
>


Re: [AOLSERVER] missing access log entries

2009-04-09 Thread Andrew Steets
Hi Tom,  sorry to go dark for so long.  It was operator error.  I was
in a hurry and I don't think I restarted the server after I installed
the patched version of the server.

I checked again just now and everything works as expected.

-Andrew

On Mon, Apr 6, 2009 at 4:14 PM, Tom Jackson  wrote:
> Andrew,
>
> Hmmm, well without knowing how you tested this, I can't help much. I
> created a few test adp pages. I tested them before my changes to
> identify the problems.
>
> Here is an example set of pages:
> include.adp:
> <%
> ns_adp_puts "before include"
> ns_adp_include test-ns-return.adp
> ns_adp_puts "after include"
> ns_log Notice "finished include.adp"
> %>
> test-ns-return.adp:
> <%
> ns_return 200 text/plain hi
> ns_adp_abort
> ns_log Notice "test-ns-return.adp after ns_adp_abort"
> %>
>
> The error.log should contain neither of the Notice logs.
> The access.log should have a 200 response of content length 2.
>
> Even this produces an access.log entry:
>
> <%
> ns_adp_puts hi
> ns_adp_abort
> %>
>
> A zero length 200 response:
>
> 127.0.0.1 - - [06/Apr/2009:13:54:52 -0700] "GET /just-abort.adp
> HTTP/1.1" 200 0 "" ""
>
> Did you patch the other two files? (Note that my queue.c file is not
> identical to yours, so the patch needs to be applied by hand I think.)
>
> queue.c handles changes to allow logging during error conditions
>
> adprequest.c changes allow distinguishing between actual errors and adp
> signaling and translates Tcl return codes into AOLserver request return
> codes. Changes also ensure that the adp buffer is cleaned up in all
> cases.
>
> adpeval.c changes just ensure that the tcl error code is set to
> correspond to the ADP exception code. The code probably needs a comment
> because actual errors in ADP processing are signaled when the Tcl return
> code is TCL_OK and the adp exception code is ADP_OK. Why? Because on a
> tcl error, the ADP code doesn't get to change adp.exception to something
> else.
>
> The bugs in the current code were due to the awkward but necessary
> maintenance of these two return codes. The ADP code has gone through a
> lot of significant changes, so it is easy to see how these details
> didn't make it through correctly. But the simplicity of fixing them
> indicates that the code is in pretty good shape.
>
> Anyway, those are the bugs. The code in queue.c is not bug related, but
> allows the client to receive a 500 response on error and to allow
> logging during error conditions.
>
> Here is a link to my patch to my code:
>
> http://www.junom.com/gitweb/gitweb.perl?p=aolserver.git;a=commit;h=ca26f1a
>
> tom jackson
>
>
>




Re: [AOLSERVER] TLS 1.6 and Aolserver

2009-04-29 Thread Andrew Steets
Hello,

We don't use this TLS package at Wayport, but I have seen similar
errors with OpenSSL before in other applications.  I pulled the TLS
code and glanced through it.  It doesn't look like you have registered
the locking callbacks for openssl, which means any openssl calls are
not thread safe.  That's going to be a problem inside aolserver :-)

Check out InitOpenSSL() in nsopenssl.c (in the nsopenssl module).  It
does all the basic stuff you need to get OpenSSL running in a
thread-safe manner.

Also:  http://openssl.org/docs/crypto/threads.html

If you 'info threads' and see other threads inside openssl crypto
functions this is almost certainly your problem.

HTH.

-Andrew

On Wed, Apr 29, 2009 at 5:29 PM, Jade Rubick  wrote:
> Jeff:
> Here is a backtrace of the crash with 1.6 stable. Did you need it from head?
> J
>
> Jade Rubick
>
> Director of Development
>
> TRUiST
>
> 120 Wall Street, 4th Floor
>
> New York, NY 10005 USA
>
> jrub...@truist.com
> +1 503 285 4963
>
> +1 707 671 1333 fax
>
> www.truist.com
>
> The information contained in this email/document is confidential and may be
> legally privileged. Access to this email/document by anyone other than the
> intended recipient(s) is unauthorized. If you are not an intended recipient,
> any disclosure, copying, distribution, or any action taken or omitted to be
> taken in reliance to it, is prohibited.
> Begin forwarded message:
>
> TLS BACKTRACE FROM 1.6 stable (without disabling DH)
> Complete backtrace:
> (gdb) bt
> #0  0xe410 in __kernel_vsyscall ()
> #1  0xb7cd4875 in raise () from /lib/tls/i686/cmov/libc.so.6
> #2  0xb7cd6201 in abort () from /lib/tls/i686/cmov/libc.so.6
> #3  0xb7ee7a4f in Tcl_PanicVA () from /usr/local/tcl/lib/libtcl8.4.so
> #4  0xb7ee7a77 in Tcl_Panic () from /usr/local/tcl/lib/libtcl8.4.so
> #5  0xb7ef6b4f in Ptr2Block () from /usr/local/tcl/lib/libtcl8.4.so
> #6  0xb7ef7117 in TclpFree () from /usr/local/tcl/lib/libtcl8.4.so
> #7  0xb7e9751d in Tcl_Free () from /usr/local/tcl/lib/libtcl8.4.so
> #8  0xb7f27251 in ns_free () from
> /usr/local/aolserver40r10/lib/libnsthread.so
> #9  0xb605c4aa in CRYPTO_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
> #10 0xb60890aa in BN_clear_free () from
> /usr/lib/i686/cmov/libcrypto.so.0.9.8
> #11 0xb60b0836 in DH_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
> #12 0xa1ffa1e5 in CTX_Init (statePtr=0x139ce5c0, proto=3, key=0x0, cert=0x0,
>     CAdir=0x0, CAfile=0x0, ciphers=0x0) at tls.c:1015
> #13 0xa1ff9a72 in ImportObjCmd (clientData=0x0, interp=0x16403240, objc=4,
>     objv=0xa97f96bc) at tls.c:800
> #14 0xb7e923c3 in TclEvalObjvInternal () from
> /usr/local/tcl/lib/libtcl8.4.so
> #15 0xb7e92987 in Tcl_EvalEx () from /usr/local/tcl/lib/libtcl8.4.so
> #16 0xb7e93635 in Tcl_EvalObjEx () from /usr/local/tcl/lib/libtcl8.4.so
> #17 0xb7e9a358 in Tcl_EvalObjCmd () from /usr/local/tcl/lib/libtcl8.4.so
> #18 0xb7e923c3 in TclEvalObjvInternal () from
> /usr/local/tcl/lib/libtcl8.4.so
> #19 0xb7ebf0db in TclExecuteByteCode () from /usr/local/tcl/lib/libtcl8.4.so
> #20 0xb7ec2dbc in TclCompEvalObj () from /usr/local/tcl/lib/libtcl8.4.so
> #21 0xb7eefd68 in TclObjInterpProc () from /usr/local/tcl/lib/libtcl8.4.so
> #22 0xb7e923c3 in TclEvalObjvInternal () from
> /usr/local/tcl/lib/libtcl8.4.so
> #23 0xb7e92987 in Tcl_EvalEx () from /usr/local/tcl/lib/libtcl8.4.so
> #24 0xb7e93635 in Tcl_EvalObjEx () from /usr/local/tcl/lib/libtcl8.4.so
> #25 0xb7e9a358 in Tcl_EvalObjCmd () from /usr/local/tcl/lib/libtcl8.4.so
> #26 0xb7e923c3 in TclEvalObjvInternal () from
> /usr/local/tcl/lib/libtcl8.4.so
> #27 0xb7ebf0db in TclExecuteByteCode () from /usr/local/tcl/lib/libtcl8.4.so
> #28 0xb7ec2dbc in TclCompEvalObj () from /usr/local/tcl/lib/libtcl8.4.so
> #29 0xb7eefd68 in TclObjInterpProc () from /usr/local/tcl/lib/libtcl8.4.so
> #30 0xb7e923c3 in TclEvalObjvInternal () from
> /usr/local/tcl/lib/libtcl8.4.so
> #31 0xb7ebf0db in TclExecuteByteCode () from /usr/local/tcl/lib/libtcl8.4.so
> #32 0xb7ec2dbc in TclCompEvalObj () from /usr/local/tcl/lib/libtcl8.4.so
> #33 0xb7e93539 in Tcl_EvalObjEx () from /usr/local/tcl/lib/libtcl8.4.so
> #34 0xb7e9fe07 in Tcl_IfObjCmd () from /usr/local/tcl/lib/libtcl8.4.so
> #35 0xb7e923c3 in TclEvalObjvInternal () from
> /usr/local/tcl/lib/libtcl8.4.so
> #36 0xb7e92987 in Tcl_EvalEx () from /usr/local/tcl/lib/libtcl8.4.so
> #37 0xb7edcccb in Tcl_FSEvalFile () from /usr/local/tcl/lib/libtcl8.4.so
> #38 0xb7ea5f16 in Tcl_SourceObjCmd () from /usr/local/tcl/lib/libtcl8.4.so
> #39 0xb7e923c3 in TclEvalObjvInternal () from
> /usr/local/tcl/lib/libtcl8.4.so
> #40 0xb7e92987 in Tcl_EvalEx () from /usr/local/tcl/lib/libtcl8.4.so
> #41 0xb7e93635 in Tcl_EvalObjEx () from /usr/local/tcl/lib/libtcl8.4.so
> #42 0xb7ee3cf1 in Tcl_NamespaceObjCmd () from
> /usr/local/tcl/lib/libtcl8.4.so
> #43 0xb7e923c3 in TclEvalObjvInternal () from
> /usr/local/tcl/lib/libtcl8.4.so
> #44 0xb7ebf0db in TclExecuteByteCode () from /usr/local/tcl/lib/libtcl8.4.so
> #45 0xb7ec2dbc in TclCompEvalO

[AOLSERVER] nsopenssl client file descriptor issues

2009-04-30 Thread Andrew Steets
Hello,

We recently discovered a problem with the nsopenssl ns_httpsXXX client
commands which was causing SSL close notify alerts (a.k.a. random
binary garbage) to be written to unrelated (non-ssl) file descriptors
in certain cases.  While we were trying to come up with a fix, we
stumbled across some other nsopenssl issues.

If you aren't using the nsopenssl *client* functionality this is
probably not interesting.  If you aren't interested in hacking the
nsopenssl code then you should realize that this may be a potential
source of frustration.  For anyone else, details follow.

All of the ns_https client TCL (https.tcl) commands eventually call
ns_openssl_sockopen to open an SSL connection to a server.
ns_openssl_sockopen, like ns_sockopen, returns two TCL channel ids,
one of which is for reading and the other for writing.  The TCL
channels are created in CreateTclChannel() in nsopenssl's tclcmds.c.
The channels are stored in a pair of structs with the following
definition:

typedef struct ChanInfo {
    NsOpenSSLConn   *sslconn;
    SOCKET   socket;
    Tcl_Channel  chan;
    void    *otherchaninfo;
} ChanInfo;

so the write chaninfo holds a pointer to the read chaninfo and vice
versa.  The channels are currently constructed such that the read
channel is associated with the original socket fd created for the ssl
connection, and the write channel is associated with another fd
dup()'ed from the original.  They are both associated with the same
NsOpenSSLConn struct, which itself holds the original socket fd as
well.

The channel close function, ChanCloseProc(), has to deal with this two
fd situation, and that is where we run into problems.  The close proc
will close the fd associated with whichever channel is being closed,
but will only shutdown the ssl connection when both channels have been
closed.

Here is the slightly edited close chan code:

static int
ChanCloseProc(ClientData arg, Tcl_Interp *interp)
{
    ChanInfo *chaninfo  = (ChanInfo *) arg;
    ChanInfo *otherchaninfo = NULL;

    Tcl_UnregisterChannel(interp, chaninfo->chan);
    ns_sockclose(chaninfo->socket);
    chaninfo->socket = INVALID_SOCKET;
    otherchaninfo = (ChanInfo *) chaninfo->otherchaninfo;

    if (otherchaninfo->socket == INVALID_SOCKET) {
        ns_free(otherchaninfo);
        NsOpenSSLConnDestroy(chaninfo->sslconn);
        ns_free(chaninfo);
    }

    return TCL_OK;
}

One problem is that the ns_sockclose() call precedes the
NsOpenSSLConnDestroy() call.  NsOpenSSLConnDestroy() calls
SSL_shutdown() on the file descriptor which was previously closed with
ns_sockclose().  SSL_shutdown() tries to write some ssl close notify
messages on the fd.  There is no way this can succeed because the fd
was already closed.  The error is silently ignored.  Clearly the sock
close needs to come after NsOpenSSLConnDestroy().
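
The close-then-write ordering can be reproduced outside AOLserver.  The
following is only a sketch: a plain pipe stands in for the SSL socket and
the function name is made up for illustration.  In a single-threaded
process, where nothing can reuse the descriptor in between, the write
fails with EBADF, which is the error that gets ignored.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Closes a descriptor and then tries to write to it, the same order of
 * operations as ns_sockclose() followed by SSL_shutdown().  Returns the
 * errno from the failed write, 0 if the write unexpectedly succeeded,
 * or -1 if the pipe could not be created. */
int errno_of_write_after_close(void)
{
    int fds[2];
    ssize_t n;
    int err;

    if (pipe(fds) != 0) {
        return -1;
    }
    close(fds[1]);                  /* stands in for ns_sockclose() */
    n = write(fds[1], "bye", 3);    /* stands in for the close notify */
    err = (n == -1) ? errno : 0;
    close(fds[0]);
    return err;
}
```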

But there is more.  Now we need to examine two possible cases.

Case 1: The write channel is closed before the read channel.  In this
case the dup fd is closed first, and the original FD is closed second.
There is a teensy little race condition here.  After the
ns_sockclose() call, the OS may context switch to another thread which
may call open(), dup(), socket() or anything that gets a new FD.  It's
also possible that the FD that the OS returns for that call may have
been the one which was previously closed with ns_sockclose().  If we
then switch back to the original thread and call
NsOpenSSLConnDestroy() -> SSL_shutdown(), then we will end up writing
and reading on somebody else's file descriptor!  This is
obviously bad, but the chances of this race actually occurring are
probably slim.

Case 2:  The read channel is closed before the write channel.  This is
the worst.  The original fd, the one in the NsOpenSSLConn struct is
closed, but NsOpenSSLConnDestroy is not called because the write
channel is still open and the sslconn * still holds the now invalid
fd.  Now we have a much larger window for that FD to be recycled by
the OS and we don't necessarily need an unlikely context switch to be
stung by the race.  The following ADP highlights this condition.

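<%
# Open an SSL client connection; returns a read channel and a write channel.
set fds [ns_openssl_sockopen -nonblock www.att.com 443]

set rfd [lindex $fds 0]
set wfd [lindex $fds 1]

ns_adp_puts "rfd: $rfd"
ns_adp_puts "wfd: $wfd"

_ns_https_puts 5 $wfd "GET / HTTP/1.0\r"

# Close the read channel first (Case 2).  The original fd is closed, but
# the ssl connection still holds its number.
close $rfd

# This open will most likely be handed the fd number that was just freed.
set tmpfd [open /tmp/nsopenssl w]

ns_adp_puts "tmpfd: $tmpfd"

# Closing the write channel triggers the ssl shutdown, which writes to the
# recycled fd, i.e. into /tmp/nsopenssl.
close $wfd
close $tmpfd
%>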

If you look at /tmp/nsopenssl after running this ADP, you should see
some random binary garbage from the ssl shutdown writing on the wrong
fd.
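
The fd recycling that both cases depend on can be reproduced without
nsopenssl at all.  The following is a minimal sketch in plain C, not code
from the module: /dev/null stands in for the SSL socket, a temp file
stands in for the unrelated open(), and the function name
stale_fd_write_demo() is made up for illustration.  It relies on the POSIX
rule that open() returns the lowest-numbered unused descriptor, and it
assumes no other thread is opening files at the same moment.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Replays Case 2 with ordinary files.  Returns the number of bytes that a
 * write through the stale descriptor left in an unrelated temp file, or -1
 * if a setup step failed.
 */
static long
stale_fd_write_demo(void)
{
    static const char msg[] = "close_notify";   /* fake shutdown message */
    char path[] = "/tmp/fd_reuse_demo_XXXXXX";
    char buf[sizeof(msg)];
    int  sock_fd, dup_fd, stale_fd, tmp_fd;
    long landed = -1;

    sock_fd = open("/dev/null", O_WRONLY);      /* stands in for the SSL socket */
    if (sock_fd < 0) {
        return -1;
    }
    dup_fd = dup(sock_fd);                      /* the write channel's fd */
    if (dup_fd < 0) {
        close(sock_fd);
        return -1;
    }
    stale_fd = sock_fd;                         /* number the "sslconn" still holds */

    close(sock_fd);                             /* read channel closed first */

    tmp_fd = mkstemp(path);                     /* unrelated code opens a file */
    if (tmp_fd >= 0) {
        /* Single-threaded, so the lowest free descriptor is the one just closed. */
        assert(tmp_fd == stale_fd);

        /* The "SSL shutdown" now writes through the stale number. */
        if (write(stale_fd, msg, strlen(msg)) == (ssize_t) strlen(msg)
            && lseek(tmp_fd, 0, SEEK_SET) == 0) {
            landed = (long) read(tmp_fd, buf, sizeof(buf));
        }
        close(tmp_fd);
        unlink(path);
    }
    close(dup_fd);
    return landed;
}
```

In a single-threaded run the function returns 12: the twelve bytes of the
fake close notify end up in the temp file rather than on the "socket".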

Here is a patch that we are using.  It switches the code to use only
one FD (but still two TCL channels).  It should avoid any close/open
FD races.  I have only tested on Linux, not sure how it might affect
other platforms.  Sorry about the verbosity.
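
For readers who don't want to dig through the diff, the general idea of
"one fd, two channels" can be sketched roughly as follows.  This is not the
patch itself: SharedConn, Chan, chan_open(), chan_close() and
record_shutdown() are hypothetical names, the shutdown callback stands in
for the SSL_shutdown() step, and the plain int refcount assumes both
channels are closed from the same thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the ssl connection: one fd for both channels. */
typedef struct SharedConn {
    int    fd;                      /* the only descriptor for the connection */
    int    refcount;                /* number of channels still open */
    void (*shutdown)(int fd);       /* stand-in for the SSL shutdown step */
} SharedConn;

/* Hypothetical per-channel state, loosely modeled on ChanInfo. */
typedef struct Chan {
    SharedConn *conn;
} Chan;

static int shutdown_saw_open_fd = 0;

/* Example shutdown hook: records whether the fd was still valid when called. */
static void
record_shutdown(int fd)
{
    shutdown_saw_open_fd = (fcntl(fd, F_GETFD) != -1);
}

static SharedConn *
shared_conn_create(int fd, void (*shutdown)(int fd))
{
    SharedConn *conn = malloc(sizeof(*conn));

    if (conn != NULL) {
        conn->fd = fd;
        conn->refcount = 0;
        conn->shutdown = shutdown;
    }
    return conn;
}

static Chan *
chan_open(SharedConn *conn)
{
    Chan *chan = malloc(sizeof(*chan));

    if (chan != NULL) {
        chan->conn = conn;
        conn->refcount++;
    }
    return chan;
}

/*
 * Close one channel.  Returns 1 if this was the last channel and the
 * connection was torn down, 0 if the other channel is still open.
 */
static int
chan_close(Chan *chan)
{
    SharedConn *conn = chan->conn;
    int         last;

    assert(conn->refcount > 0);
    last = (--conn->refcount == 0);
    free(chan);

    if (last) {
        if (conn->shutdown != NULL) {
            conn->shutdown(conn->fd);   /* the fd is still open at this point */
        }
        close(conn->fd);                /* closed exactly once, after shutdown */
        free(conn);
    }
    return last;
}
```

Because nothing closes the fd until the last channel goes away, and the
shutdown step runs before that close, there is no window in which the
connection holds a descriptor number that the OS could hand to someone else.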

-Andrew


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to 
 with the
body of "SIGNOFF AOLSERVER" i

Re: [AOLSERVER] Fwd: [AOLSERVER] TLS 1.6 and Aolserver

2009-05-01 Thread Andrew Steets
It's not a matter of compiling OpenSSL to be thread safe.  Someone
needs to update the TLS C code to call the right OpenSSL API functions
on module initialization.  In its current state I don't see how the
TLS module can safely call OpenSSL from a threaded context.

From the OpenSSL docs:
http://openssl.org/docs/crypto/threads.html#DESCRIPTION

"OpenSSL can safely be used in multi-threaded applications provided
that at least two callback functions are set, locking_function and
threadid_func."

The TLS C code doesn't set up either one of those callbacks, so that's
a problem.  I'm not sure if that is your problem specifically but it
would be a good place to start.
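
For reference, registering the two callbacks looks roughly like the sketch
below.  It is an illustration using pthreads, not the TLS module's or
nsopenssl's code.  CRYPTO_num_locks(), CRYPTO_set_locking_callback() and
CRYPTO_set_id_callback() are the pre-1.1.0 OpenSSL calls (OpenSSL 1.1.0 and
later do their own locking and no longer need them); the names lock_table,
locking_cb, thread_id_cb and init_openssl_threads, and the WITH_OPENSSL
switch, are made up here.  The cast in thread_id_cb assumes pthread_t is an
integer type, as it is on Linux.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#ifdef WITH_OPENSSL                 /* build with -DWITH_OPENSSL -lcrypto */
#include <openssl/crypto.h>
#else                               /* standalone build: same values as crypto.h */
#define CRYPTO_LOCK   1
#define CRYPTO_UNLOCK 2
#endif

static pthread_mutex_t *lock_table = NULL;  /* one mutex per OpenSSL lock */
static int              lock_count = 0;

/* Has the signature CRYPTO_set_locking_callback() expects. */
static void
locking_cb(int mode, int n, const char *file, int line)
{
    (void) file;
    (void) line;
    assert(n >= 0 && n < lock_count);
    if (mode & CRYPTO_LOCK) {
        pthread_mutex_lock(&lock_table[n]);
    } else {
        pthread_mutex_unlock(&lock_table[n]);
    }
}

/* Has the signature CRYPTO_set_id_callback() expects. */
static unsigned long
thread_id_cb(void)
{
    return (unsigned long) pthread_self();
}

/* Allocate and initialize n mutexes; returns 0 on success, -1 on failure. */
static int
lock_table_init(int n)
{
    int i;

    lock_table = malloc((size_t) n * sizeof(*lock_table));
    if (lock_table == NULL) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        pthread_mutex_init(&lock_table[i], NULL);
    }
    lock_count = n;
    return 0;
}

/* Call once at module initialization, before any thread touches OpenSSL. */
static int
init_openssl_threads(void)
{
#ifdef WITH_OPENSSL
    if (lock_table_init(CRYPTO_num_locks()) != 0) {
        return -1;
    }
    CRYPTO_set_id_callback(thread_id_cb);
    CRYPTO_set_locking_callback(locking_cb);
    return 0;
#else
    return lock_table_init(1);      /* one lock, enough to exercise the callbacks */
#endif
}
```

The key point is ordering: the lock table has to exist and both callbacks
have to be registered before the first worker thread makes an OpenSSL call.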

-Andrew

On Fri, May 1, 2009 at 12:59 PM, Jade Rubick  wrote:
>
> Jade Rubick
> Director of Development
> TRUiST
> 120 Wall Street, 4th Floor
> New York, NY USA
> jrub...@truist.com
> +1 503 285 4963
> +1 707 671 1333 fax
>
> www.truist.com
>
>
> The information contained in this email/document is confidential and may be
> legally privileged. Access to this email/document by anyone other than the
> intended recipient(s) is unauthorized. If you are not an intended recipient,
> any disclosure, copying, distribution, or any action taken or omitted to be
> taken in reliance to it, is prohibited.
>
>
> -- Forwarded message --
> From: Jack Schmidt 
> Date: Thu, Apr 30, 2009 at 4:03 PM
> Subject: Re: [AOLSERVER] TLS 1.6 and Aolserver
> To: Jade Rubick 
> Cc: tech 
>
>
> I just tried it by recompiling openssl with threads as compiler option and
> it produces the same problem.  Maybe there's another way of making openssl
> thread safe.  Not sure as of the moment.
>
> 2009/5/1 Jack Schmidt 
>>
>> It's certainly a possibility.  Since I'm also trying to debianize openssl
>> from Gutsy with debug symbols, we can easily slip in a threaded build.
>>
>> 2009/5/1 Jade Rubick 
>>>
>>> Maybe we didn't compile openssl to be threadsafe?
>>> J
>>>
>>> Begin forwarded message:
>>>
>>> From: Andrew Steets 
>>> Date: April 29, 2009 6:16:14 PM PDT
>>> To: AOLSERVER@LISTSERV.AOL.COM
>>> Subject: Re: [AOLSERVER] TLS 1.6 and Aolserver
>>> Reply-To: AOLserver Discussion 
>>> Hello,
>>>
>>> We don't use this TLS package at Wayport, but I have seen similar
>>> errors with OpenSSL before in other applications.  I pulled the TLS
>>> code and glanced through it.  It doesn't look like you have registered
>>> the locking callbacks for openssl, which means any openssl calls are
>>> not thread safe.  That's going to be a problem inside aolserver :-)
>>>
>>> Check out InitOpenSSL() in nsopenssl.c (in the nsopenssl module).  It
>>> does all the basic stuff you need to get OpenSSL running in a
>>> thread-safe manner.
>>>
>>> Also:  http://openssl.org/docs/crypto/threads.html
>>>
>>> If you run 'info threads' in gdb and see other threads inside openssl
>>> crypto functions, this is almost certainly your problem.
>>>
>>> HTH.
>>>
>>> -Andrew
>>>
>>> On Wed, Apr 29, 2009 at 5:29 PM, Jade Rubick  wrote:
>>>
>>> Jeff:
>>>
>>> Here is a backtrace of the crash with 1.6 stable. Did you need it from
>>> head?
>>>
>>> J
>>>

Re: [AOLSERVER] nsopenssl client file descriptor issues

2009-05-01 Thread Andrew Steets
I haven't been able to reproduce the crashing... I tried beta26 and
beta27.  It works out of the box for me.

-Andrew

On Fri, May 1, 2009 at 9:58 AM, Tom Jackson  wrote:
> Andrew,
>
> Do you have any up-to-date instructions on compiling nsopenssl? For some
> reason I'm getting a segfault the instant I try to use the client
> ns_httpspost.
>
> I think it is related to the linux distribution, but the crash isn't
> the random problem you are seeing.
>
> Thanks,
>
> tom jackson
>
> On Thu, 2009-04-30 at 17:59 -0500, Andrew Steets wrote:
>> Hello,
>>
>> We recently discovered a problem with the nsopenssl ns_httpsXXX client
>> commands which was causing SSL close notify alerts (a.k.a. random
>> binary garbage) to be written to unrelated (non-ssl) file descriptors
>> in certain cases.  While we were trying to come up with a fix, we
>> stumbled across some other nsopenssl issues.
>>
>> If you aren't using the nsopenssl *client* functionality this is
>> probably not interesting.  If you aren't interested in hacking the
>> nsopenssl code then you should realize that this may be a potential
>> source of frustration.  For anyone else, details follow.
>>
>> All of the ns_https client TCL (https.tcl) commands eventually call
>> ns_openssl_sockopen to open an SSL connection to a server.
>> ns_openssl_sockopen, like ns_sockopen, returns two TCL channel ids,
>> one of which is for reading and the other for writing.  The TCL
>> channels are created in CreateTclChannel() in nsopenssl's tclcmds.c.
>> The channels are stored in a pair of structs with the following
>> definition:
>>
>> typedef struct ChanInfo {
>>     NsOpenSSLConn   *sslconn;
>>     SOCKET           socket;
>>     Tcl_Channel      chan;
>>     void            *otherchaninfo;
>> } ChanInfo;
>>
>> so the write chaninfo holds a pointer to the read chaninfo and vice
>> versa.  The channels are currently constructed such that the read
>> channel is associated with the original socket fd created for the ssl
>> connection, and the write channel is associated with another fd
>> dup()'ed from the original.  They are both associated with the same
>> NsOpenSSLConn struct, which itself holds the original socket fd as
>> well.
>>
>> The channel close function, ChanCloseProc(), has to deal with this two
>> fd situation, and that is where we run into problems.  The close proc
>> will close the fd associated with whichever channel is being closed,
>> but will only shutdown the ssl connection when both channels have been
>> closed.
>>
>> Here is the slightly edited close chan code:
>>
>> static int
>> ChanCloseProc(ClientData arg, Tcl_Interp *interp)
>> {
>>     ChanInfo *chaninfo      = (ChanInfo *) arg;
>>     ChanInfo *otherchaninfo = NULL;
>>
>>     Tcl_UnregisterChannel(interp, chaninfo->chan);
>>     ns_sockclose(chaninfo->socket);
>>     chaninfo->socket = INVALID_SOCKET;
>>     otherchaninfo = (ChanInfo *) chaninfo->otherchaninfo;
>>
>>     if (otherchaninfo->socket == INVALID_SOCKET) {
>>         ns_free(otherchaninfo);
>>         NsOpenSSLConnDestroy(chaninfo->sslconn);
>>         ns_free(chaninfo);
>>     }
>>
>>     return TCL_OK;
>> }
>>
>> One problem is that the ns_sockclose() call precedes the
>> NsOpenSSLConnDestroy() call.  NsOpenSSLConnDestroy() calls
>> SSL_shutdown() on the file descriptor which was previously closed with
>> ns_sockclose().  SSL_shutdown() tries to write some ssl close notify
>> messages on the fd.  There is no way this can succeed because the fd
>> was already closed.  The error is silently ignored.  Clearly the socket
>> close needs to come after NsOpenSSLConnDestroy().
>>
>> But there is more.  Now we need to examine two possible cases.
>>
>> Case 1: The write channel is closed before the read channel.  In this
>> case the dup fd is closed first, and the original FD is closed second.
>>  There is a teensy little race condition here.  After the
>> ns_sockclose() call, the OS may context switch to another thread which
>> may call open(), dup(), socket() or anything that gets a new FD.  It's
>> also possible that the FD that the OS returns for that call may have
>> been the one which was previously closed with ns_sockclose().  If we
>> then switch back to the original thread and call
>> NsOpenSSLConnDestroy() -> SSL_shutdown(), then we will end up writing
>> and reading on somebody else's file descriptor!  This is
>> 

[AOLSERVER] Oracle driver update and SF question

2010-08-25 Thread Andrew Steets
Hello,

I just checked in a patch for the Oracle driver that fixes a crash bug we
were seeing on some of our servers.  Anyone running a relatively recent
(last two years) version of the Oracle driver may want to switch.

I saw some e-mail a while ago about switching to GitHub, but I don't see any
of the modules on GitHub.  I have some other nsoracle patches (e.g. logging
a warning and the query text for queries running > X seconds) that would probably
be better suited to a private fork or branch.  Any plans to migrate the
modules to GitHub?

-Andrew




Re: [AOLSERVER] Oracle driver update and SF question

2010-08-26 Thread Andrew Steets
There was an uninitialized int * being passed into OCI in the code that
handles 'ns_ora exec_plsql'.

2.7 is nearly 6 years old now.  If you can post the backtrace of a crash w/
the most recent nsoracle from CVS I'm happy to take a look at it.  Are you
able to reliably reproduce the crash?

-Andrew

On Wed, Aug 25, 2010 at 6:15 PM, Sep Ng  wrote:

> Hi,
>
> Can I ask what nature of the crash this fixes?  I remember trying
> nsoracle 2.8 fork and it was extremely unstable and had to rollback to
> 2.7.  Is this on the CVS tree now?
>
> Thanks!
>
> On Aug 26, 4:56 am, Andrew Steets  wrote:
> > Hello,
> >
> > I just checked in a patch for the Oracle driver that fixes a crash bug we
> > were seeing on some of our servers.  Anyone running a relatively recent
> > (last two years) version of the Oracle driver may want to switch.
> >
> > I saw some e-mail a while ago about switching to GitHub, but I don't see
> > any of the modules on GitHub.  I have some other nsoracle patches (eg. log
> > warning and query text for queries running > X seconds) that would
> > probably be better suited to a private fork or branch.  Any plans to
> > migrate the modules to GitHub?
> >
> > -Andrew




Re: [AOLSERVER] AOLServer Terminal Escape Sequence in Logs Command Injection Vulnerability

2010-09-09 Thread Andrew Steets
The exploit works like this:

1) Attacker sends HTTP request with ANSI escape sequence embedded in URL
2) Escape sequence is logged to access log.
3) Administrator on web server views log via cat, tail, etc.
4) Escape sequences are interpreted by terminal emulator.

In the case of extremely braindead terminal emulators, this can result in
arbitrary command execution.  The example in the SecurityFocus link sends an
escape sequence which changes the window title in most common terminal
emulators.  A more comprehensive overview of terminal emulator security
issues is available here: http://marc.info/?l=bugtraq&m=104612710031920

Some subtle (?) points:
1) The "remote" exploit actually occurs on the host running the terminal
emulator, not the web server.
2) Most terminal emulators do not support arbitrary command execution via
escape sequences.
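
As an illustration of the percent-encoding idea Dossy raises in the quoted
text below (logging %1B instead of the literal byte), here is a minimal
sketch of an escaping helper.  It is not the fix that was committed to
nslog.c; the function name log_escape() and its signature are made up for
this example.  It leaves '%' alone so that ordinary, already-encoded URLs
are logged unchanged, which means a logged %1B could come from either a raw
ESC byte or an encoded one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy src into dst, writing every byte outside printable ASCII
 * (0x20..0x7E) as %XX.  The output is always NUL-terminated and is
 * truncated if dst is too small.  Returns the number of bytes written,
 * not counting the NUL.
 */
static size_t
log_escape(const char *src, char *dst, size_t dstsize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (dstsize == 0) {
        return 0;
    }
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char) *src;

        if (c < 0x20 || c > 0x7E) {
            if (out + 3 >= dstsize) {       /* need 3 bytes plus the NUL */
                break;
            }
            dst[out++] = '%';
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        } else {
            if (out + 1 >= dstsize) {       /* need 1 byte plus the NUL */
                break;
            }
            dst[out++] = (char) c;
        }
    }
    dst[out] = '\0';
    return out;
}
```

With this, a request path containing an ESC byte followed by "]2;owned" and
a BEL byte would reach the log as %1B]2;owned%07, which a terminal displays
as plain text and a log analysis tool can still match on.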

-Andrew



On Thu, Sep 9, 2010 at 9:47 AM, Jade Rubick  wrote:

> Did I read this correctly: this is remotely exploitable?
>
> Jade
>
> Jade Rubick | Director of Development | TRUiST
> 2201 Wisconsin Ave NW, Suite 250 | Washington, DC 20007 | www.truist.com |
> +1 202 903 2564
>
>
> On Sep 9, 2010, at 5:41 AM, Dossy Shiobara wrote:
>
> As a short-term solution, this is probably adequate, but there's
> information loss -- it'd be nice to indicate the original byte sequence
> somehow in the log entry by escaping characters so that log analysis
> tools could detect such attacks, etc.
>
> Perhaps the right answer is to log the URI with proper URL-encoding, so
> that it would be logged as %1B instead of the literal byte.
>
>
> On 9/9/10 8:18 AM, Gustaf Neumann wrote:
>
> i have just now committed a quick fix for the problem to
> aolserver/nslog/nslog.c in the sourceforge module. please check whether
> this is sufficient in all cases.
>
>
> --
> Dossy Shiobara  | do...@panoptic.com | http://dossy.org/
> Panoptic Computer Network   | http://panoptic.com/
>  "He realized the fastest way to change is to laugh at your own
>folly -- then you can let go and quickly move on." (p. 70)
>
>

