Today I stumbled across some rather weird intermittent segfaults / core dumps 
with trunk plus APR trunk that occur when running the perl test framework. 
Below is the stack trace:

#0  0x0000002a95f7f829 in kill () from /lib64/tls/libc.so.6
#1  <signal handler called>
#2  ap_ident_lookup (r=0x2a9cc9fd50) at mod_ident.c:314
#3  0x0000002a9899aeb9 in log_remote_logname (r=0x2a9cc9fd50, a=Variable "a" is 
not available.
) at mod_log_config.c:315
#4  0x0000002a9899be6a in config_log_transaction (r=0x2a9cc9fd50, 
cls=0x552ad88fe0, default_format=Variable "default_format" is not available.
) at mod_log_config.c:914
#5  0x0000002a9899bf49 in multi_log_transaction (r=0x2a9cc9fd50) at 
mod_log_config.c:1008
#6  0x000000552aadccd1 in ap_run_log_transaction (r=0x2a9cc9fd50) at 
protocol.c:1688
#7  0x000000552aae7e82 in eor_bucket_cleanup (data=Variable "data" is not 
available.
) at eor_bucket.c:35
#8  0x0000002a957abb8e in run_cleanups (cref=0x2a9cc9fc70) at 
memory/unix/apr_pools.c:1993
#9  0x0000002a957ac3d9 in apr_pool_destroy (pool=0x2a9cc9fc50) at 
memory/unix/apr_pools.c:766
#10 0x000000552aae858d in remove_empty_buckets (bb=0x2a9cc305b0) at 
core_filters.c:658
#11 0x000000552aae8ad8 in send_brigade_nonblocking (s=0x552ae86410, 
bb=0x2a9cc305b0, bytes_written=0x2a9cc30390, c=0x2a9cc03e40)
    at core_filters.c:647
#12 0x000000552aae8c87 in send_brigade_blocking (s=0x552ae86410, 
bb=0x2a9cc305b0, bytes_written=0x2a9cc30390, c=0x2a9cc03e40)
    at core_filters.c:671
#13 0x000000552aae8e30 in ap_core_output_filter (f=0x2a9cc03f90, 
new_bb=Variable "new_bb" is not available.
) at core_filters.c:472
#14 0x000000552aaf77e1 in ap_process_request (r=0x2a9cc9fd50) at 
http_request.c:352
#15 0x000000552aaf4cb5 in ap_process_http_connection (c=0x2a9cc03e40) at 
http_core.c:194
#16 0x000000552aaf0cf2 in ap_run_process_connection (c=0x2a9cc03e40) at 
connection.c:41
#17 0x000000552aafd841 in worker_thread (thd=0x552adc23f0, dummy=Variable 
"dummy" is not available.
) at worker.c:560
#18 0x0000002a95c1c10a in start_thread () from /lib64/tls/libpthread.so.0
#19 0x0000002a960178c3 in clone () from /lib64/tls/libc.so.6
#20 0x0000000000000000 in ?? ()

Jumping to frame 2 shows the following:

(gdb) frame 2
#2  ap_ident_lookup (r=0x2a9cc9fd50) at mod_ident.c:314
314         if (!(conf->do_rfc1413 & ~RFC1413_UNSET)) {
(gdb) p r->per_dir_config
$2 = (struct ap_conf_vector_t *) 0x2a9cc04010
(gdb) x/20 0x2a9cc04010
0x2a9cc04010:   0x61636f6c      0x736f686c      0x6f6c2e74      0x646c6163
0x2a9cc04020:   0x69616d6f      0x0000006e      0x00000000      0x00000000
0x2a9cc04030:   0x00000000      0x00000000      0x00000000      0x00000000
0x2a9cc04040:   0x00000000      0x00000000      0x00000000      0x00000000
0x2a9cc04050:   0x00000000      0x00000000      0x00000000      0x00000000
(gdb) p *r
$3 = {pool = 0x2a9cc9fc50, connection = 0x2a9cc03e40, server = 0x552aeadbf0, 
next = 0x0, prev = 0x0, main = 0x0,
  the_request = 0x2a9ccad0c0 "GET /modules/negotiation/en/compressed/ 
HTTP/1.0", assbackwards = 0, proxyreq = 0, header_only = 0,
  protocol = 0x2a9ccad1d0 "HTTP/1.0", proto_num = 1000, hostname = 0x2a9ccad3b0 
"localhost.localdomain", request_time = 1239099455443124,
  status_line = 0x552ab070e6 "200 OK", status = 200, method = 0x2a9ccad120 
"GET", method_number = 0, allowed = 1, allowed_xmethods = 0x0,
  allowed_methods = 0x2a9cc03fc0, sent_bodyct = 1, bytes_sent = 17, mtime = 
1218545841000000, chunked = 0, range = 0x0, clength = 17,
  remaining = 0, read_length = 0, read_body = 0, read_chunked = 0, 
expecting_100 = 0, headers_in = 0x2a9cca0060,
  headers_out = 0x2a9cc2fd00, err_headers_out = 0x2a9cc06df0, subprocess_env = 
0x2a9cc06f20, notes = 0x2a9cc06b20,
  content_type = 0x552af34030 "text/html", handler = 0x0, content_encoding = 
0x0, content_languages = 0x2a9cca2a80,
  vlist_validator = 0x0, user = 0x0, ap_auth_type = 0x0, no_cache = 0, 
no_local_copy = 0,
  unparsed_uri = 0x2a9ccad170 "/modules/negotiation/en/compressed/", uri = 
0x2a9cca2600 "`'?\234*", filename = 0x2a9cca2640 "`'?\234*",
  canonical_filename = 0x2a9cca2550 
"/home/pluem/apache/perl-framework/t/htdocs/modules/negotiation/en/compressed/index.html.fr",
  path_info = 0x0, args = 0x0, finfo = {pool = 0x2a9cca1010, valid = 7598960, 
protection = 1604, filetype = APR_REG, user = 505,
    group = 100, inode = 264340, device = 64769, nlink = 1, size = 17, csize = 
4096, atime = 1239098874000000, mtime = 1218545841000000,
    ctime = 1218545841000000,
    fname = 0x2a9cca2550 
"/home/pluem/apache/perl-framework/t/htdocs/modules/negotiation/en/compressed/index.html.fr",
 name = 0x0,
    filehand = 0x0}, parsed_uri = {scheme = 0x0, hostinfo = 0x0, user = 0x0, 
password = 0x0, hostname = 0x0,
    port_str = 0x2a9ccad3d9 "8529", path = 0x2a9ccad1a0 
"/modules/negotiation/en/compressed/", query = 0x0, fragment = 0x0,
    hostent = 0x0, port = 8529, is_initialized = 1, dns_looked_up = 0, 
dns_resolved = 0}, used_path_info = 2,
  per_dir_config = 0x2a9cc04010, request_config = 0x2a9cc00fa0, htaccess = 0x0, 
output_filters = 0x2a9ccacf10,
  input_filters = 0x2a9ccad3f0, proto_output_filters = 0x2a9ccacf10, 
proto_input_filters = 0x2a9ccad3f0, eos_sent = 1, kept_body = 0x0,
  invoke_mtx = 0x2a9cc036a0, body_table = 0x0}

So r->per_dir_config seems to be corrupt, which ultimately causes the segfault. 
Furthermore, r->uri and r->filename are also corrupted, as they only contain 
garbage. The remaining fields of the request_rec show that this request was 
"fast internally redirected" (ap_internal_fast_redirect) by mod_dir (see 
r->the_request / r->canonical_filename).
I think the reason for this behaviour is the following:

1. The subrequest created by mod_dir uses a subpool of r->pool for its 
allocations.
2. ap_internal_fast_redirect uses the data allocated out of this subpool for r 
(especially uri / filename / per_dir_config).
3. The call to apr_pool_destroy in frame 9 destroys the request pool of r.
4. ap_ident_lookup in frame 2 is called by a cleanup of r->pool (the EOR bucket 
cleanup, frame 7) and uses data in r (r->per_dir_config in this case), but the 
subpools of r->pool are already destroyed, as the cleanups run only after the 
subpools have been destroyed (see the sketch below).
   ==> BOOM.
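
To make the ordering concrete, here is a minimal standalone sketch (plain APR, 
my own illustration rather than httpd code) of the same pattern: a cleanup 
registered on a parent pool that dereferences memory allocated from one of its 
subpools:

#include <stdio.h>

#include "apr_general.h"   /* apr_initialize(), apr_terminate() */
#include "apr_pools.h"
#include "apr_strings.h"   /* apr_pstrdup() */

/* Cleanup registered on the parent pool, but handed memory that lives in a
 * child pool - the same relationship as the EOR cleanup and the subrequest
 * allocations referenced through r. */
static apr_status_t parent_cleanup(void *data)
{
    /* By the time this runs, the child pool - and 'data' with it - has
     * already been destroyed, just like r->per_dir_config above. */
    printf("cleanup reads: %s\n", (const char *)data);
    return APR_SUCCESS;
}

int main(void)
{
    apr_pool_t *parent, *child;
    char *s;

    apr_initialize();
    apr_pool_create(&parent, NULL);    /* stands in for r->pool */
    apr_pool_create(&child, parent);   /* stands in for the subrequest pool */

    s = apr_pstrdup(child, "allocated from the subpool");

    apr_pool_cleanup_register(parent, s, parent_cleanup,
                              apr_pool_cleanup_null);

    /* apr_pool_destroy() tears down the child pools first and only then runs
     * the cleanups of 'parent', so parent_cleanup reads freed memory. */
    apr_pool_destroy(parent);

    apr_terminate();
    return 0;
}

Whether the cleanup still sees the old string contents or garbage is of course 
undefined - which is exactly why the crashes are intermittent.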

The code has been this way for a long time. Why didn't we see this earlier?

IMHO this is exposed by the new pool implementation in APR trunk (note: only 
exposed, not caused). As we now free all memory used by the subpools via 
free(), the chance has increased that this memory is reused for other purposes. 
In the old implementation we only returned the memory to the allocator, and as 
long as r->pool had a large enough memory chunk, further allocations for 
r->pool were done out of it and none of the memory on the allocator's free 
list was touched.

How to solve?

I don't know. That is why I am writing this mail :-).

Seriously, I guess we would either need to prevent the destruction of the 
subrequest subpool or run all of our cleanups for the request as pre-cleanups 
(a rough sketch of the latter follows below). Neither seems to be a usable 
approach.
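
For illustration only, the "pre-cleanup" idea would mean registering the EOR 
bucket's cleanup so that it runs before the subpools of r->pool are torn down. 
The sketch below is hypothetical: it assumes a registration function along the 
lines of apr_pool_pre_cleanup_register(pool, data, cleanup) in APR (which may 
or may not exist on trunk yet), and the cleanup body is only an approximation 
of the real code in eor_bucket.c:

#include "httpd.h"
#include "http_protocol.h"  /* ap_run_log_transaction() */
#include "apr_pools.h"

/* Hypothetical sketch, not a patch: assumes APR offers (or grows) a
 * pre-cleanup hook along the lines of
 *   apr_pool_pre_cleanup_register(pool, data, cleanup)
 * that runs 'cleanup' before any subpool of 'pool' is destroyed. */

static apr_status_t eor_bucket_cleanup(void *data)
{
    request_rec *r = data;

    /* Run as a pre-cleanup, the subrequest subpool - and with it
     * r->per_dir_config / r->uri / r->filename - would still be valid. */
    ap_run_log_transaction(r);
    return APR_SUCCESS;
}

static void register_eor_cleanup(request_rec *r)
{
    /* Today (roughly) a normal cleanup is registered, which only runs after
     * the subpools of r->pool are gone:
     *
     *   apr_pool_cleanup_register(r->pool, r, eor_bucket_cleanup,
     *                             apr_pool_cleanup_null);
     *
     * The "pre-cleanup" idea would instead be: */
    apr_pool_pre_cleanup_register(r->pool, r, eor_bucket_cleanup);
}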


Regards

Rüdiger
