Changed subject heading. See more of what I have uncovered below.
Not sure where to go next.

Graham Dumpleton wrote ..
> > > Unlike suggestions by someone else that "self" seemed to be getting
> > > corrupted, it looks fine to me, and code simply crashed down in:
> > >
> > >  apr_bucket_read(b, &data, &size, APR_BLOCK_READ)
> > >
> > > on very first call to it. Thus need to start tracking into Apache
> > > itself and see what there may be about bucket structures that isn't
> > > correct. This is where I got to last time before I gave up, feeling
> > > it wasn't worth the effort at the time. I'll try and build a version
> > > of Apache with debug so I can get a better stack trace.
> > 
> > The first thing I'd check is for validity of b. Buckets use reference
> > counting much like Python, so sometimes it's possible for a bucket to
> > "self-destruct".
> 
> Starting to delve into the bucket now. Haven't looked at reference count
> stuff yet, but the b->type object seems to be bogus. This is where the
> read() function pointer is kept and since it is a bad value it is why it
> dies.

This is starting to look really ugly.

In _conn_read(), it first creates a bucket brigade from the connection
object's pool, so there is no chance of the brigade being destroyed
prematurely.

    bb = apr_brigade_create(c->pool, c->bucket_alloc);

From what I understand, it then makes a call which links the bucket
brigade to the actual source of data.

    rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);

Under normal circumstances this would also have the side effect of
performing the first actual read of data off the socket connection which
the client created to Apache.

The important thing to note here is the value of:

  c->input_filters->frec->filter_func.in_func

going into the call. Not sure exactly, but I imagine that this is the first
input filter which handles reading from the socket.

My logging shows the address of the input filter in memory as 178456.

When ap_get_brigade() returns okay, the first actual bucket from the
bucket brigade is obtained:

    b = APR_BRIGADE_FIRST(bb);

There are two interesting values in the bucket worth looking at:

    b->type->name
    b->type->read

The first is the type of bucket object and the second is the pointer to a
function to read data from the bucket.

My logging shows the type of bucket as being "HEAP" and the address
of the read function pointer as 1819356.

I will not go into the rest of the function, except to say that it may
do additional reads using apr_bucket_read() when the data initially
read by ap_get_brigade() isn't enough.
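To make it clearer why a bad b->type is fatal, here is a minimal sketch of the dispatch involved. As far as I know, apr_bucket_read() is a macro that expands to a call through (e)->type->read, so the bucket type is consulted on every read. The mock_* names below are hypothetical stand-ins and not the real APR structures, which carry more fields and an extra blocking-mode argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for apr_bucket_type_t and apr_bucket.
 * Only the two fields discussed here (name and read) are modelled. */
typedef struct mock_bucket mock_bucket;

typedef struct mock_bucket_type {
    const char *name;
    int (*read)(mock_bucket *b, const char **data, size_t *len);
} mock_bucket_type;

struct mock_bucket {
    const mock_bucket_type *type;
    const char *payload;
    size_t length;
};

/* Read function for the mock "HEAP" type: hands back the stored payload. */
static int mock_heap_read(mock_bucket *b, const char **data, size_t *len)
{
    *data = b->payload;
    *len = b->length;
    return 0;
}

static const mock_bucket_type mock_type_heap = { "HEAP", mock_heap_read };

/* Same shape as the real read path: an indirect call through b->type->read.
 * The assert only catches NULL pointers; a garbage b->type (such as one
 * whose read pointer is 88) would still be followed and crash here. */
static int mock_bucket_read(mock_bucket *b, const char **data, size_t *len)
{
    assert(b != NULL && b->type != NULL);
    return b->type->read(b, data, len);
}

/* The check the logging effectively performs: does the type name look sane? */
static int mock_bucket_is_heap(const mock_bucket *b)
{
    return strcmp(b->type->name, "HEAP") == 0;
}
```

With a bucket initialised as { &mock_type_heap, "GET /", 5 }, mock_bucket_read() returns the five payload bytes. Nothing in the call itself can tell a valid type pointer from a bogus one, which is why logging the name and read address beforehand is useful.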

Anyway, the above is what happens when it is working okay, that is,
when I have the connection handler attached to my primary listener
port. As soon as I add into the main Apache configuration file an
additional socket for Apache to listen on, i.e., when I add:

  Listen 8081

it will crash in _conn_read() no matter whether I have attached the
connection handler to the primary listener port or the additional
listener port.

In contrast to the above, when it dies, the address of the input filter
in memory is still 178456, but the initial bucket in the bucket brigade
as populated by ap_get_brigade() is bogus. I.e., for the name I get crap
like:

  \x01\x80b\x18\x01\x8f\xec\x18\x01\x83b\x18\x01\x80b\x1c\x01\x8f\xcc\xb8

and the address of the read function is 88.

Importantly, the ap_get_brigade() function does not block on a read
waiting for the first data coming over the socket like it did before.

With the bogus bucket returned, when apr_bucket_read() is later called,
it tries to use the read function pointer in the initial bucket, which
is bogus, and that causes the crash.

Thus in summary, with a secondary listener port the ap_get_brigade()
function doesn't block on read waiting for first data, returning
immediately, but still seeming to return success. The initial bucket
in the bucket brigade then seems to be bogus.
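One thing that could be checked here, purely as a guess and not something the logging above confirms: an immediate successful return with a garbage first bucket is what you would see if the brigade came back empty. Brigades are rings, and on an empty ring APR_BRIGADE_FIRST(bb) gives back the sentinel rather than a real bucket, which is why APR provides APR_BRIGADE_EMPTY(bb) and APR_BRIGADE_SENTINEL(bb) to test for it. The sketch below models that behaviour with hypothetical ring_* names. In this model the sentinel is an ordinary node, whereas in APR, as I understand it, the sentinel is only a computed address, so reading fields through it yields unrelated memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a ring-based brigade: a circular
 * doubly linked list whose head doubles as the sentinel. */
typedef struct ring_node {
    struct ring_node *next;
    struct ring_node *prev;
    const char *type_name;      /* stand-in for b->type->name */
} ring_node;

typedef struct ring_brigade {
    ring_node head;             /* sentinel; its type_name is meaningless */
} ring_brigade;

static void ring_init(ring_brigade *bb)
{
    bb->head.next = &bb->head;
    bb->head.prev = &bb->head;
    bb->head.type_name = NULL;
}

static ring_node *ring_sentinel(ring_brigade *bb) { return &bb->head; }
static ring_node *ring_first(ring_brigade *bb)    { return bb->head.next; }

/* Empty means the first element is the sentinel itself. */
static int ring_empty(ring_brigade *bb)
{
    return ring_first(bb) == ring_sentinel(bb);
}

static void ring_insert_tail(ring_brigade *bb, ring_node *n)
{
    n->next = &bb->head;
    n->prev = bb->head.prev;
    bb->head.prev->next = n;
    bb->head.prev = n;
}

/* Defensive accessor: never hand the sentinel back as if it were a bucket. */
static ring_node *ring_first_or_null(ring_brigade *bb)
{
    return ring_empty(bb) ? NULL : ring_first(bb);
}
```

If this guess applies, the equivalent check in _conn_read() would be to test APR_BRIGADE_EMPTY(bb) after ap_get_brigade() returns and before touching the result of APR_BRIGADE_FIRST(bb). That would only turn the crash into a detectable condition; it would not explain why the call returns without blocking.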

What I might speculate is that if the test in mod_python for the
connection handler is set up to run on a secondary listener port,
with the primary still active, it may trigger the problem on other
systems like Linux. Jim, you might want to try this and see if you
can duplicate it on Linux.

BTW, I am not saying this is the same problem on the BSD systems,
but it certainly is not correct either way.

Graham
