Today I stumbled across some rather weird intermittent segfaults / core dumps with trunk plus APR trunk when running the perl test framework. Below is the stack trace:
#0  0x0000002a95f7f829 in kill () from /lib64/tls/libc.so.6
#1  <signal handler called>
#2  ap_ident_lookup (r=0x2a9cc9fd50) at mod_ident.c:314
#3  0x0000002a9899aeb9 in log_remote_logname (r=0x2a9cc9fd50, a=Variable "a" is not available.) at mod_log_config.c:315
#4  0x0000002a9899be6a in config_log_transaction (r=0x2a9cc9fd50, cls=0x552ad88fe0, default_format=Variable "default_format" is not available.) at mod_log_config.c:914
#5  0x0000002a9899bf49 in multi_log_transaction (r=0x2a9cc9fd50) at mod_log_config.c:1008
#6  0x000000552aadccd1 in ap_run_log_transaction (r=0x2a9cc9fd50) at protocol.c:1688
#7  0x000000552aae7e82 in eor_bucket_cleanup (data=Variable "data" is not available.) at eor_bucket.c:35
#8  0x0000002a957abb8e in run_cleanups (cref=0x2a9cc9fc70) at memory/unix/apr_pools.c:1993
#9  0x0000002a957ac3d9 in apr_pool_destroy (pool=0x2a9cc9fc50) at memory/unix/apr_pools.c:766
#10 0x000000552aae858d in remove_empty_buckets (bb=0x2a9cc305b0) at core_filters.c:658
#11 0x000000552aae8ad8 in send_brigade_nonblocking (s=0x552ae86410, bb=0x2a9cc305b0, bytes_written=0x2a9cc30390, c=0x2a9cc03e40) at core_filters.c:647
#12 0x000000552aae8c87 in send_brigade_blocking (s=0x552ae86410, bb=0x2a9cc305b0, bytes_written=0x2a9cc30390, c=0x2a9cc03e40) at core_filters.c:671
#13 0x000000552aae8e30 in ap_core_output_filter (f=0x2a9cc03f90, new_bb=Variable "new_bb" is not available.) at core_filters.c:472
#14 0x000000552aaf77e1 in ap_process_request (r=0x2a9cc9fd50) at http_request.c:352
#15 0x000000552aaf4cb5 in ap_process_http_connection (c=0x2a9cc03e40) at http_core.c:194
#16 0x000000552aaf0cf2 in ap_run_process_connection (c=0x2a9cc03e40) at connection.c:41
#17 0x000000552aafd841 in worker_thread (thd=0x552adc23f0, dummy=Variable "dummy" is not available.) at worker.c:560
#18 0x0000002a95c1c10a in start_thread () from /lib64/tls/libpthread.so.0
#19 0x0000002a960178c3 in clone () from /lib64/tls/libc.so.6
#20 0x0000000000000000 in ?? ()

Jumping to frame 2 shows the following:

(gdb) frame 2
#2  ap_ident_lookup (r=0x2a9cc9fd50) at mod_ident.c:314
314         if (!(conf->do_rfc1413 & ~RFC1413_UNSET)) {
(gdb) p r->per_dir_config
$2 = (struct ap_conf_vector_t *) 0x2a9cc04010
(gdb) x/20 0x2a9cc04010
0x2a9cc04010:   0x61636f6c      0x736f686c      0x6f6c2e74      0x646c6163
0x2a9cc04020:   0x69616d6f      0x0000006e      0x00000000      0x00000000
0x2a9cc04030:   0x00000000      0x00000000      0x00000000      0x00000000
0x2a9cc04040:   0x00000000      0x00000000      0x00000000      0x00000000
0x2a9cc04050:   0x00000000      0x00000000      0x00000000      0x00000000
(gdb) p *r
$3 = {pool = 0x2a9cc9fc50, connection = 0x2a9cc03e40, server = 0x552aeadbf0, next = 0x0, prev = 0x0,
  main = 0x0, the_request = 0x2a9ccad0c0 "GET /modules/negotiation/en/compressed/ HTTP/1.0",
  assbackwards = 0, proxyreq = 0, header_only = 0, protocol = 0x2a9ccad1d0 "HTTP/1.0", proto_num = 1000,
  hostname = 0x2a9ccad3b0 "localhost.localdomain", request_time = 1239099455443124,
  status_line = 0x552ab070e6 "200 OK", status = 200, method = 0x2a9ccad120 "GET", method_number = 0,
  allowed = 1, allowed_xmethods = 0x0, allowed_methods = 0x2a9cc03fc0, sent_bodyct = 1, bytes_sent = 17,
  mtime = 1218545841000000, chunked = 0, range = 0x0, clength = 17, remaining = 0, read_length = 0,
  read_body = 0, read_chunked = 0, expecting_100 = 0, headers_in = 0x2a9cca0060,
  headers_out = 0x2a9cc2fd00, err_headers_out = 0x2a9cc06df0, subprocess_env = 0x2a9cc06f20,
  notes = 0x2a9cc06b20, content_type = 0x552af34030 "text/html", handler = 0x0, content_encoding = 0x0,
  content_languages = 0x2a9cca2a80, vlist_validator = 0x0, user = 0x0, ap_auth_type = 0x0, no_cache = 0,
  no_local_copy = 0, unparsed_uri = 0x2a9ccad170 "/modules/negotiation/en/compressed/",
  uri = 0x2a9cca2600 "`'?\234*", filename = 0x2a9cca2640 "`'?\234*",
  canonical_filename = 0x2a9cca2550 "/home/pluem/apache/perl-framework/t/htdocs/modules/negotiation/en/compressed/index.html.fr",
  path_info = 0x0, args = 0x0, finfo = {pool = 0x2a9cca1010, valid = 7598960, protection = 1604,
    filetype = APR_REG, user = 505, group = 100, inode = 264340, device = 64769, nlink = 1, size = 17,
    csize = 4096, atime = 1239098874000000, mtime = 1218545841000000, ctime = 1218545841000000,
    fname = 0x2a9cca2550 "/home/pluem/apache/perl-framework/t/htdocs/modules/negotiation/en/compressed/index.html.fr",
    name = 0x0, filehand = 0x0}, parsed_uri = {scheme = 0x0, hostinfo = 0x0, user = 0x0, password = 0x0,
    hostname = 0x0, port_str = 0x2a9ccad3d9 "8529",
    path = 0x2a9ccad1a0 "/modules/negotiation/en/compressed/", query = 0x0, fragment = 0x0,
    hostent = 0x0, port = 8529, is_initialized = 1, dns_looked_up = 0, dns_resolved = 0},
  used_path_info = 2, per_dir_config = 0x2a9cc04010, request_config = 0x2a9cc00fa0, htaccess = 0x0,
  output_filters = 0x2a9ccacf10, input_filters = 0x2a9ccad3f0, proto_output_filters = 0x2a9ccacf10,
  proto_input_filters = 0x2a9ccad3f0, eos_sent = 1, kept_body = 0x0, invoke_mtx = 0x2a9cc036a0,
  body_table = 0x0}

So r->per_dir_config seems to be corrupt, which finally causes the segfault. Furthermore, r->uri / r->filename are also corrupted, as they contain only garbage. The remaining data in the request_rec shows that this request was "fast internally redirected" (ap_internal_fast_redirect) by mod_dir (see r->the_request / r->canonical_filename). I think the reason for this behaviour is the following:

1. The subrequest created by mod_dir uses a subpool of r->pool for its allocations.
2. ap_internal_fast_redirect points r at data allocated out of this subpool (especially uri / filename / per_dir_config).
3. The call to apr_pool_destroy in frame 9 destroys the request pool of r.
4. ap_ident_lookup in frame 2 is called by a cleanup of r->pool and uses data in r (r->per_dir_config in this case), but the subpools of r->pool have already been destroyed, since the cleanups run after the destruction of the subpools.

==> BOOM.

The code has been this way for a long time. Why didn't we see this earlier?
IMHO this is exposed by the new pool implementation in APR trunk (note: only exposed, not caused). Since we now free all memory used by the subpools via free(), the chance has increased that this memory is reused for other purposes. In the old implementation we only returned the memory to the allocator, and as long as r->pool had a large enough memory chunk, further allocations for r->pool were done out of it and none of the memory in the allocator's free list was touched.

How to solve? I don't know. This is why I am writing this mail :-). Seriously, I guess we would either need to prevent the destruction of the subrequest's subpool, or we would need to run all our cleanups for the request as pre-cleanups. Neither seems to be a usable approach.

Regards

Rüdiger