https://bz.apache.org/bugzilla/show_bug.cgi?id=66646
Bug ID: 66646
Summary: apache http2 h2eos bucket corruption in
httpd-2.4.55/56/57 (with workaround)
Product: Apache httpd-2
Version: 2.4.57
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: mod_http2
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
Created attachment 38585
--> https://bz.apache.org/bugzilla/attachment.cgi?id=38585&action=edit
check for h2eos bucket corruption
Hello,
starting with the big mod_http2 update in httpd-2.4.55 we see random
crashes in core_filter.c send_brigade_nonblocking/writev_nonblocking
because of corrupted buckets.
- the crashes do not occur when the current httpd-code (httpd-2.4.57)
is used with mod_http2 from httpd-2.4.54.
- our servers are configured to run with 64 worker-threads per process,
and the machines use redhat-linux. For each server which serves
about 20 million requests per day, we see between 1 and 5 crashes daily.
- crashes occur always in the context of a HTTP2-request (mostly simple
GET-requests). See example below.
- traces show that the crash happens always in cases where a h2eos
bucket is destroyed and the following bucket from the brigade
belongs to the same stream.
see patch: httpd-trunk-mod_http2-check-for-h2eos-bucket-corruption
- the next/prev pointers of this following bucket sometimes point to
illegal memory and indicate a use-after-free bug where the memory
gets reused after the bucket-destroy-handler of the h2eos token
was called.
- the patch checks for such cases in core_filter:send_brigade_nonblocking,
and shows that the problem occurs far more often than the count of
crashes we see.
- after a fresh apache restart it always takes some time (20-30 minutes)
on our busy servers until the bug is triggered, so it might have todo
with a special behavior of the apache-handlers in situtions where for
example 'MaxConnectionsPerChild' is reached.
We are currently using a workaround to fix these crashes, which
was built to find out more about the problem:
- in h2_c1_io.c:pass_output the buckets are reordered, so that a
flush bucket and scratch heap bucket is inserted before the h2eos bucket
when mod_http2 passes the output-brigade to the filters of other
apache modules.
- since the h2eos<->flush swap fix in h2_c1_io.c seems not to be sufficient
to catch all cases where a flush bucket is inserted, the remaining cases
are swapped in core_filter.
see patch: httpd-trunk-mod_http2-workaround-for-h2eos-bucket-corruption
(should be applied on top of the other patch).
- with this patch applied the crashes are absent & the check doesn't trigger.
I think a real fix must more generally enforce the bucket-order in
mod_http2, so that h2eos is always the last bucket in a http2 stream.
Another possibility might be to allocate the flush/scratch-heap buckets
from a pool with a longer lifetime than the pool used for h2eos.
Example output when _checked_bucket_delete triggers:
---------------------------------------------------
[Tue Jun 06 12:49:55 2023] [error] [client ...] send_brigade_nonblocking_len0:
BAD BUCKET type H2EOS bucket 0x7f681c0009c8 before:(next == 0x7f681c000d88,
prev == 0x7f690c09df98) (next.next == 0x7f690c09df98, next.prev ==
0x7f681c0009c8) (first == 0x7f681c0009c8, last == 0x7f681c000d88) bb-sentinel
0x7f690c09df98 after:next type FLUSH 0x7f681c000d88 (next == 0x7f681c000ba8,
prev == 0x7f690c09df98) bb (first == 0x7f690c09df98, last == 0x7f690c09df98)
- before:
head: 0x7f690c09df98 [first: h2eos, last: flush]
h2eos: 0x7f681c0009c8 [next: flush, prev: head]
flush: 0x7f681c000d88 [next: head, prev: h2eos]
- after bucket_delete(h2eos):
head: 0x7f690c09df98 [first: head, last: head]
flush: 0x7f681c000d88 [next: next, prev: head]
next: 0x7f681c000ba8 ???
Example crash core:
------------------
Note: Line numbers here do not match repo because of included debug code.
Such crashes occur after a check like above triggered in a
preceeding send_brigade_nonblocking loop iteration.
(gdb) bt
#0 0x00007f1494013c68 in ?? ()
#1 0x00000000004c9db7 in writev_nonblocking (s=s@entry=0x7f15d80b8340,
bb=bb@entry=0x7f15d80b8af0, bytes_to_write=bytes_to_write@entry=361,
nvec=nvec@entry=4, c=c@entry=0x7f15d80b8558, ctx=<optimized out>,
ctx=<optimized out>) at core_filters.c:850 --> if (n) { apr_bucket_split() ...
}
#2 0x00000000004caf75 in send_brigade_nonblocking (c=0x7f15d80b8558,
ctx=0x7f1494071678, bb=0x7f15d80b8af0, s=0x7f15d80b8340) at core_filters.c:819
--> if (nvec > 0) { rv = writev_nonblocking .. }
#3 ap_core_output_filter (f=0x7f1494071348, new_bb=0x7f15d80b8af0) at
core_filters.c:538
#4 0x00000000005b58bc in ssl_io_filter_output (f=0x7f15d80b8aa8,
bb=0x7f14940717e0) at ssl_engine_io.c:2094
#5 0x00000000005b234e in ssl_io_filter_coalesce (f=0x7f15d80b8a80,
bb=0x7f14940717e0) at ssl_engine_io.c:2037
#6 0x00000000005ef937 in pass_output (flush=1, io=0x7f1494049af0) at
h2_c1_io.c:281
#7 h2_c1_io_assure_flushed (io=io@entry=0x7f1494049af0) at h2_c1_io.c:340
#8 0x00000000005dc9f0 in h2_session_shutdown_notice (session=0x7f1494049ab0)
at h2_session.c:735
#9 h2_session_ev_mpm_stopping (arg=0, msg=0x0, session=0x7f1494049ab0) at
h2_session.c:1524
#10 h2_session_dispatch_event (session=0x7f1494049ab0, ev=<optimized out>,
arg=0, msg=0x0) at h2_session.c:1726
#11 0x00000000005de673 in h2_session_process (session=0x7f1494049ab0,
async=0) at h2_session.c:1798
#12 0x00000000005cb0a9 in h2_c1_run (c=c@entry=0x7f15d80b8558) at h2_c1.c:130
#13 0x00000000005cb4bd in h2_c1_hook_process_connection (c=0x7f15d80b8558) at
h2_c1.c:278
#14 0x00000000004dbd10 in ap_run_process_connection
(c=c@entry=0x7f15d80b8558) at connection.c:42
#15 0x00000000004dc23e in ap_process_connection (c=c@entry=0x7f15d80b8558,
csd=csd@entry=0x7f15d80b8340) at connection.c:217
#16 0x00000000008212db in process_socket (bucket_alloc=0x7f1494013108,
my_thread_num=8, my_child_num=102, sock=0x7f15d80b8340, p=0x7f15d80b82b8,
thd=0x2e74528) at worker.c:485
#17 worker_thread (thd=0x2e74528, dummy=<optimized out>) at worker.c:814
#18 0x00007f15ff802ea5 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f15ff327b0d in clone () from /lib64/libc.so.6
(gdb) frame 1
(gdb) info locals
apr__d = 0x7f15d80b8af8
apr__b = 0x7f15d80b8af8
bucket = 0x7f15d80b8af8
n = 46
rv = 0
vec = <optimized out>
bytes_written = 361
i = 3
offset = 3
(gdb) p *bb
$1 = {p = 0x7f15d80b82b8, list = {next = 0x7f15d80b8af8, prev =
0x7f15d80b8af8}, bucket_alloc = 0x7f1494013108}
(gdb) p *bucket
$2 = {link = {next = 0x7f15d80b8af8, prev = 0x7f15d80b8af8}, type =
0x7f1494013108, length = 139731795674416, start = 139731795675888, data =
0x92e640 <brigade_cleanup>, free = 0x987aa0 <apr_pool_cleanup_null>, list =
0x7f15d80b8b28}
Greetings,
Barnim
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]