Changed subject heading. See more of what I have uncovered below. Not sure where to go next.
Graham Dumpleton wrote ..
> > > Unlike suggestions by someone else that "self" seemed to be getting corrupted,
> > > it looks fine to me, and code simply crashed down in:
> > >
> > >   apr_bucket_read(b, &data, &size, APR_BLOCK_READ)
> > >
> > > on very first call to it. Thus need to start tracking into Apache itself and see what
> > > there may be about bucket structures that isn't correct. This is where I got to
> > > last time before I gave up, feeling it wasn't worth the effort at the time. I'll try
> > > and build a version of Apache with debug so I can get a better stack trace.
> >
> > The first thing I'd check is for validity of b. Buckets use reference
> > counting much like Python, so sometimes it's possible for a bucket to
> > "self-destruct".
>
> Starting to delve into the bucket now. Haven't looked at reference count
> stuff yet, but the b->type object seems to be bogus. This is where the
> read() function pointer is kept and since it is a bad value it is why it
> dies.

This is starting to look really ugly.

In _conn_read(), it first creates a bucket brigade from the connection object's pool. No chance of this being destroyed prematurely as a result.

    bb = apr_brigade_create(c->pool, c->bucket_alloc);

From what I understand, it then makes a call which links the bucket brigade to the actual source of data.

    rc = ap_get_brigade(c->input_filters, bb, mode, APR_BLOCK_READ, bufsize);

Under normal circumstances this would also have the side effect of performing the first actual read of data off the socket connection which the client created to Apache.

The important thing to note here is the value of:

    c->input_filters->frec->filter_func.in_func

going into the call. Not sure exactly, but I imagine that this is the first input filter which handles reading from the socket. My logging shows the address of the input filter in memory as 178456.

When ap_get_brigade() returns okay, the first actual bucket from the bucket brigade is obtained:

    b = APR_BRIGADE_FIRST(bb);

There are two interesting values in the bucket worth looking at:

    b->type->name
    b->type->read

The first is the type of bucket object and the second is the pointer to a function to read data from the bucket. My logging shows the type of bucket as being "HEAP" and the address of the read function pointer as 1819356.

I will not go into the rest of the function except to say that, as necessary, it may do additional reads using apr_bucket_read() to get more data when what ap_get_brigade() initially read isn't enough.

Anyway, the above is when it is working okay, that is, when I have the connection handler attached to my primary listener port.

As soon as I add into the main Apache configuration file an additional socket for Apache to listen on, i.e., when I add:

    Listen 8081

it will crash in _conn_read() no matter whether I have attached the connection handler to the primary listener port or the additional listener port.

In contrast to the above, when it dies, the address of the input filter in memory is still 178456, but the initial bucket in the bucket brigade as populated by ap_get_brigade() is bogus. I.e., I get for the name crap like:

    \x01\x80b\x18\x01\x8f\xec\x18\x01\x83b\x18\x01\x80b\x1c\x01\x8f\xcc\xb8

and the address of the read function is 88.

Importantly, the ap_get_brigade() function does not block on a read waiting for the first data coming over the socket like it did before.

With the bogus bucket returned, when apr_bucket_read() is later called, it tries to use the read function in the initial bucket, which being bogus causes the crash.

Thus in summary, with a secondary listener port the ap_get_brigade() function doesn't block on read waiting for first data. It returns immediately, but still seems to return success. The initial bucket in the bucket brigade then seems to be bogus.
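
As a debugging aid, a guard along the following lines could turn the crash into a log message. This is only a sketch under stated assumptions: every demo_* struct and function below is a hypothetical stand-in, not the real apr_bucket or apr_bucket_type_t, and the table of known types is invented for illustration. The idea is to compare the bucket's type pointer against the addresses of known bucket type objects (in the spirit of APR's APR_BUCKET_IS_HEAP() style macros) before calling through the read function pointer, without ever dereferencing a possibly bogus pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for apr_bucket_type_t and apr_bucket.
 * These are NOT the real APR definitions. */
typedef struct demo_bucket demo_bucket;

typedef struct demo_bucket_type {
    const char *name;
    int (*read)(demo_bucket *b, const char **data, size_t *len);
} demo_bucket_type;

struct demo_bucket {
    const demo_bucket_type *type;
    const char *data;
    size_t length;
};

/* Trivial read function used by the known demo types. */
static int demo_simple_read(demo_bucket *b, const char **data, size_t *len)
{
    *data = b->data;
    *len = b->length;
    return 0;
}

/* The only type objects this sketch considers valid (assumed list). */
static const demo_bucket_type demo_type_heap = { "HEAP", demo_simple_read };
static const demo_bucket_type demo_type_eos = { "EOS", demo_simple_read };

static const demo_bucket_type *const demo_known_types[] = {
    &demo_type_heap,
    &demo_type_eos
};

/* Return 1 if the pointer equals the address of a known type object.
 * Only addresses are compared; the pointer is never dereferenced. */
static int demo_type_is_known(const demo_bucket_type *type)
{
    size_t i;
    size_t n = sizeof demo_known_types / sizeof demo_known_types[0];

    for (i = 0; i < n; i++) {
        if (type == demo_known_types[i]) {
            return 1;
        }
    }
    return 0;
}

/* Read from the bucket only if its type pointer is recognised.
 * Returns 0 on success, -1 if the bucket is missing or looks bogus. */
static int demo_checked_read(demo_bucket *b, const char **data, size_t *len)
{
    assert(data != NULL && len != NULL);

    if (b == NULL) {
        fprintf(stderr, "no bucket to read\n");
        return -1;
    }
    if (!demo_type_is_known(b->type)) {
        fprintf(stderr, "bogus bucket type %p, refusing to read\n",
                (void *)b->type);
        return -1;
    }
    return b->type->read(b, data, len);
}
```

With a check like this in place, the failing case would produce a log line showing the unexpected type pointer instead of a jump through a bad function address. In real code the table would have to list the addresses of the actual APR bucket types, and which ones belong there is left open here.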

What I might speculate is that if the mod_python test for the connection handler is set up to run on a secondary listener port, with the primary still active, it may trigger the problem on other systems like Linux. Jim, you might want to try this and see if you can duplicate it on Linux.

BTW, I am not saying this is the same problem as on the BSD systems, but it certainly is not correct either way.

Graham