On 2013-11-25 23:25, Jeff Trawick wrote: > On Mon, Nov 25, 2013 at 5:03 PM, Jeff Trawick <traw...@gmail.com> wrote: > >> On Mon, Nov 25, 2013 at 4:28 PM, olli hauer <oha...@gmx.de> wrote: >> >>> On 2013-11-25 22:14, Jeff Trawick wrote: >>>> On Sun, Nov 24, 2013 at 8:39 PM, Jeff Trawick <traw...@gmail.com> >>> wrote: >>>> >>>>> Let's move this to dev@httpd and omit dev@apr (after this e-mail)... >>>>> >>>>> >>>>> On Sun, Nov 24, 2013 at 8:28 AM, olli hauer <oha...@gmx.de> wrote: >>>>> >>>>>> On 2013-11-22 00:08, Jeff Trawick wrote: >>>>>>> On Thu, Nov 21, 2013 at 5:48 PM, olli hauer <oha...@gmx.de> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> sorry for late response to apr-1.5.0 ... >>>>>>>> >>>>>>>> I've done some tests with apr-1.5.0 on FreeBSD 10 (amd64) >>>>>>>> and it seems there is an issue that breaks apache24. >>>>>>>> >>>>>>>> With apr-1.5.0 apache22 works but apache24 is broken. >>>>>>>> apache starts fine, nothing special in the logs or during >>>>>>>> start with -X but no response is coming back. >>>>>>>> >>>>>>>> apr/apr-util test: >>>>>>>> ======================================== >>>>>>>> apr-1.5.0: all tests passed [1] >>>>>>>> apr-util-1.5.3: all tests passed >>>>>>>> >>>>>>>> >>>>>>>> working configurations (FreeBSD beta3 [1] >>>>>>>> ========================================= >>>>>>>> apache22-2.2.26 apr-1.4.8 apr-util-1.5.3 >>>>>>>> apache22-2.2.26 apr-1.5.0 apr-util-1.5.3 >>>>>>>> apache24-2.4.6 apr-1.4.8 apr-util-1.5.2 >>>>>>>> apache24-2.4.7 apr-1.4.8 apr-util-1.5.2 >>>>>>>> apache24-2.4.6 apr-1.4.8 apr-util-1.5.3 >>>>>>>> apache24-2.4.7 apr-1.4.8 apr-util-1.5.3 >>>>>>>> >>>>>>>> broken combinations: >>>>>>>> ========================================= >>>>>>>> apache24-2.4.6 apr-1.5.0 apr-util-1.5.3 >>>>>>>> apache24-2.4.7 apr-1.5.0 apr-util-1.5.3 >>>>>>>> >>>>>>>> All tests where done with MPM worker. 
>>>>>>>> >>>>>>>> >>>>>>>> FreeBSD 8.4 (amd64) seems OK in all combinations >>>>>>>> FreeBSD 9.2 (amd64) seems OK in all combinations >>>>>>>> >>>>>>>> [1] FreeBSD 10 beta3 with iconv UTF7 patch r258316 >>>>>>>> (head/lib/libiconv_modules/UTF7/citrus_utf7.c) >>>>>>>> >>>>>>>> Any hints where to start? >>>>>>>> >>>>>>> >>>>>>> Set LogLevel trace8 and compare good vs. bad. >>>>>>> Start with -X then attach with dtruss and compare good vs. bad. >>>>>>> Get open fds displayed by lsof and compare good vs. bad. >>>>>>> Is connection to client held open? Get backtraces. >>>>>>> >>>>>>> I just compared 1.4.8 vs. 1.5.0 and didn't see anything that looked >>>>>>> remotely likely. >>>>>>> >>>>>> >>>>>> Comparing trace8 outputs showed the request is processed >>>>>> but the following code snippet in server/core_filters.c >>>>>> never get TRUE except the client cancels the request. >>>>>> >>>>>> To get some better log entries I've used server/core_filters.c >>>>>> r1510295 from trunk. >>>>>> >>>>>> >>>>>> @@server/core_filters.c (line 510) >>>>>> if (APR_BUCKET_IS_FLUSH(bucket) >>>>>> || non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER >>>>>> || morphing_bucket_in_brigade >>>>>> || eor_buckets_in_brigade > MAX_REQUESTS_IN_PIPELINE) { >>>>>> ... 
>>>>>> } >>>>>> >>>>>> [http:trace3] http_filters.c(974):[client x.x.x.x:x] Response sent >>> with >>>>>> status 200, headers: >>>>>> [http:trace5] http_filters.c(983):[client x.x.x.x:x] Date: Sun, 24 >>> Nov >>>>>> 2013 10:28:37 GMT >>>>>> [http:trace5] http_filters.c(986):[client x.x.x.x:x] Server: >>>>>> Apache/2.4.7 (FreeBSD) >>>>>> [http:trace4] http_filters.c(804):[client x.x.x.x:x] Last-Modified: >>>>>> Sat, 23 Nov 2013 16:51:58 GMT >>>>>> [http:trace4] http_filters.c(804):[client x.x.x.x:x] ETag: >>>>>> \\"be-4ebdaf2ef2780\\" >>>>>> [http:trace4] http_filters.c(804):[client x.x.x.x:x] Accept-Ranges: >>>>>> bytes >>>>>> [http:trace4] http_filters.c(804):[client x.x.x.x:x] >>> Content-Length: 190 >>>>>> [http:trace4] http_filters.c(804):[client x.x.x.x:x] Keep-Alive: >>>>>> timeout=5, max=100 >>>>>> [http:trace4] http_filters.c(804):[client x.x.x.x:x] Connection: >>>>>> Keep-Alive >>>>>> [http:trace4] http_filters.c(804):[client x.x.x.x:x] Content-Type: >>>>>> text/html >>>>>> [core:trace8] core_filters.c(576):[client x.x.x.x:x] brigade contains: >>>>>> bytes: 284, non-file bytes: 284, eor buckets: 0, morphing buckets: 0 >>>>>> [core:trace8] core_filters.c(576):[client x.x.x.x:x] brigade contains: >>>>>> bytes: 474, non-file bytes: 284, eor buckets: 0, morphing buckets: 0 >>>>>> [core:trace8] core_filters.c(576):[client x.x.x.x:x] brigade contains: >>>>>> bytes: 474, non-file bytes: 284, eor buckets: 1, morphing buckets: 0 >>>>>> >>>>>> >>>>>> This following lines are only seen if >>>>>> apr-1-5.0 was build without IPv6 support >>>>>> or apache24 was build with v4-mapping enabled >>>>>> or "Listen $IP:$port" is given in httpd.conf >>>>>> >>>>>> [core:trace6] core_filters.c(526):[client x.x.x.x:x] will flush >>> because >>>>>> of FLUSH bucket >>>>>> [core:trace8] core_filters.c(528):[client x.x.x.x:x] seen in brigade >>> so >>>>>> far: bytes: 474, non-file bytes: 284, eor buckets: 1, morphing >>> buckets: 0 >>>>>> [core:trace8] core_filters.c(555):[client 
x.x.x.x:x] flushing now >>>>>> [core:trace8] core_filters.c(568):[client x.x.x.x:x] total bytes >>> written: >>>>>> 474 >>>>>> [core:trace8] core_filters.c(576):[client x.x.x.x:x] brigade contains: >>>>>> bytes: 0, non-file bytes: 0, eor buckets: 0, morphing buckets: 0 >>>>>> >>>>>> However a flush is triggered if the client cancels the request, but no >>>>>> data is sent over the wire ... >>>>>> >>>>>> >>>>>> I've searched if other also have seen a similar issue and found >>>>>> instead the following interesting article from Nov. 2002 ;) >>>>>> http://people.apache.org/~trawick/v4mapped.html >>>>>> >>>>>> >>>>> I haven't analyzed the trace messages you showed above. Here's what I >>> did >>>>> on FreeBSD 9: >>>>> >>>>> Apply this patch to fix the version check for disabling v4mapped >>> addresses: >>>>> >>>>> Index: configure.in >>>>> =================================================================== >>>>> --- configure.in (revision 1545127) >>>>> +++ configure.in (working copy) >>>>> @@ -774,7 +774,10 @@ >>>>> ], >>>>> [ >>>>> case $host in >>>>> - *freebsd5*|*netbsd*|*openbsd*) >>>>> + *freebsd[1234].*) >>>>> + v4mapped=yes >>>>> + ;; >>>>> + *freebsd*|*netbsd*|*openbsd*) >>>>> v4mapped=no >>>>> ;; >>>>> *) >>>>> >>>>> That gives me >>>>> >>>>> $ bin/apachectl -V >>>>> AH00558: httpd: Could not reliably determine the server's fully >>> qualified >>>>> domain name, using 127.0.0.1. Set the 'ServerName' directive globally >>> to >>>>> suppress this message >>>>> Server version: Apache/2.4.8-dev (Unix) >>>>> Server built: Nov 24 2013 20:21:30 >>>>> Server's Module Magic Number: 20120211:27 >>>>> Server loaded: APR 1.5.1-dev, APR-UTIL 1.5.4-dev >>>>> Compiled using: APR 1.5.1-dev, APR-UTIL 1.5.4-dev >>>>> Architecture: 64-bit >>>>> Server MPM: worker >>>>> threaded: yes (fixed thread count) >>>>> forked: yes (variable process count) >>>>> Server compiled with.... 
>>>>> -D APR_HAS_SENDFILE >>>>> -D APR_HAS_MMAP >>>>> -D APR_HAVE_IPV6 (IPv4-mapped addresses disabled) >>>>> >>>>> The last line shown indicates that mapped addresses are disabled in >>> this >>>>> build. >>>>> >>>>> The only Listen I have is "Listen 8080". procstat says I have separate >>>>> sockets, as expected: >>>>> >>>>> 60734 httpd 3 s - rw---n--- 9 0 TCP ::.8080 ::.0 >>>>> 60734 httpd 4 s - rw---n--- 9 0 TCP 0.0.0.0:8080 >>>>> 0.0.0.0:0 >>>>> >>>>> From netstat: >>>>> >>>>> tcp4 0 0 *.8080 *.* >>> LISTEN >>>>> tcp6 0 0 *.8080 *.* >>> LISTEN >>>>> >>>>> (You had shown sockstat before; this is the same stuff.) >>>>> >>>>> --/-- >>>>> >>>>> Are you able to check for the issue on FreeBSD 9? >>>>> >>>>> When it hangs, are you connecting to the IPv4 listener or the IPv6 >>>>> listener? Does it matter whether you use loopback or the address of a >>>>> network interface? >>>>> >>>>> You're using a regular web browser in the failing case, right? Have >>> you >>>>> tried something as simple as netcat for the failing address/port? >>> Example: >>>>> >>>>> $ echo "GET /" | nc 127.0.0.1 8080 >>>>> <html><body><h1>It works!</h1></body></html> >>>>> >>>>> -- >>>>> Born in Roswell... married an alien... >>>>> http://emptyhammock.com/ >>>>> >>>> >>>> I see the hang on FreeBSD 10 Beta 3 using the same sources, though >>> clang is >>>> installed instead of gcc. >>>> >>>> FWIW, the simple "GET /<EOF>" request works but an HTTP/1.1 request from >>>> telnet or from Firefox hangs. I am able to try loopback over IPv4 and >>> IPv6 >>>> and they both hang. >>>> >>>> Now move the gcc binaries from FreeBSD 9 to FreeBSD 10 (along with >>>> libpcre.0): I don't see the hang with the simple testcases that >>> triggered >>>> it with the clang builds. >>>> >>>> I wouldn't be surprised if your apr-1.5-as-trigger observation reflects >>>> "accidents" like code moving around and/or changing in size. 
I didn't >>> see >>>> any interesting changes in the source that would affect the path where >>> the >>>> hang occurs. But who really knows :) >>>> >>> >>> Playing with the code (damage included), moving 'flush_upto = next;' >>> and the page loads. I still have some problems to understand how complete >>> code is working (or even not). >>> >>> What is not clear to me why 'if (APR_BUCKET_IS_FLUSH(bucket)' is not >>> triggered? >>> >> >> It is for me at least during lingering close, and I go into the "if >> (flush_upto != NULL) {" block as expected. >> >> From the start of a response to the simple index.html I see a HEAP bucket >> (hdrs?), FILE bucket, EOS bucket, EOR bucket, and maybe something else I >> glossed over, but no flush bucket. >> >> I think the issue may be that after generating the response for one >> request, httpd thinks there is a pipelined request and instead of flushing >> the first response it goes to read the next request but none is available. >> >> Yep, hung here: >> >> Breakpoint 1, check_pipeline (c=0x8026282b8) at http_request.c:221 >> 221 if (c->keepalive != AP_CONN_CLOSE) { >> Current language: auto; currently minimal >> (gdb) p c->keepalive >> $1 = AP_CONN_KEEPALIVE >> (gdb) n >> 223 apr_bucket_brigade *bb = apr_brigade_create(c->pool, >> c->bucket_alloc); >> (gdb) >> 225 rv = ap_get_brigade(c->input_filters, bb, AP_MODE_SPECULATIVE, >> (gdb) list >> 220 { >> 221 if (c->keepalive != AP_CONN_CLOSE) { >> 222 apr_status_t rv; >> 223 apr_bucket_brigade *bb = apr_brigade_create(c->pool, >> c->bucket_alloc); >> 224 >> 225 rv = ap_get_brigade(c->input_filters, bb, AP_MODE_SPECULATIVE, >> 226 APR_NONBLOCK_READ, 1); >> 227 if (rv != APR_SUCCESS || APR_BRIGADE_EMPTY(bb)) { >> 228 /* >> 229 * Error or empty brigade: There is no data present in the >> input >> (gdb) n >> >> (HANGING in call at line 225) >> > > Ha! 
>
> See if this brings any happiness:
>
> Index: network_io/unix/sockets.c
> ===================================================================
> --- network_io/unix/sockets.c (revision 1545394)
> +++ network_io/unix/sockets.c (working copy)
> @@ -273,7 +273,7 @@
>  #endif /* TCP_NODELAY_INHERITED */
>  #if APR_O_NONBLOCK_INHERITED
>      if (apr_is_option_set(sock, APR_SO_NONBLOCK) == 1) {
> -        apr_set_option(*new, APR_SO_NONBLOCK, 1);
> +        /* apr_set_option(*new, APR_SO_NONBLOCK, 1); */
>      }
>  #endif /* APR_O_NONBLOCK_INHERITED */

I can confirm that after removing the line in apr-1.5.0, apache24 no longer
hangs (tested with apache-2.4.6 / 2.4.7).

I now see why this is triggered, by comparing apr.h after `./configure':

> grep APR_O_NONBLOCK_INHERITED work/apr-1.4.8/include/apr.h
./work/apr-1.4.8/include/apr.h:#define APR_O_NONBLOCK_INHERITED 0

> grep APR_O_NONBLOCK_INHERITED work/apr-1.5.0/include/apr.h
work/apr-1.5.0/include/apr.h:#define APR_O_NONBLOCK_INHERITED 1

> There are some APR 1.5 autoconf changes to consider at
> http://svn.apache.org/viewvc/apr/apr/branches/1.5.x/build/apr_network.m4?view=log

OK, backing out the changes from r1502805 seems to do the trick on 10b3:
http://svn.apache.org/viewvc?view=revision&revision=1502805

> grep APR_O_NONBLOCK_INHERITED work/apr-1.5.0/include/apr.h
work/apr-1.5.0/include/apr.h:#define APR_O_NONBLOCK_INHERITED 0

> Also, be careful which apr and apr-util are getting loaded. httpd rpath
> has /usr/local/lib in front of the location of my apr and apr-util, and
> /usr/local/lib has alternate versions. (apachectl -V should show the same
> compiled-with/running-with for apr and apr-util.)

I always build the test packages in a clean environment with tinderbox or
poudriere, so I can be sure that no old libs are flying around.

--
Thanks
olli
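
For anyone who wants to check a given system directly: as far as I understand it, APR_O_NONBLOCK_INHERITED is meant to record whether a socket returned by accept() already carries O_NONBLOCK when the listening socket has it set. Below is a minimal sketch of such a probe using plain POSIX calls. It is not the actual test from apr_network.m4, and the function name accepted_socket_inherits_nonblock is made up for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of APR.
 * Returns 1 if a socket obtained from accept() on a non-blocking listener
 * has O_NONBLOCK set, 0 if it does not, and -1 if the probe could not run. */
int accepted_socket_inherits_nonblock(void)
{
    int result = -1;
    int listener = -1, client = -1, accepted = -1;
    int flags;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct pollfd pfd;

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                    /* let the kernel pick a free port */

    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;
    if (listen(listener, 1) < 0) goto done;
    if (getsockname(listener, (struct sockaddr *)&addr, &len) < 0) goto done;

    /* Put the listening socket into non-blocking mode. */
    flags = fcntl(listener, F_GETFL, 0);
    if (flags < 0 || fcntl(listener, F_SETFL, flags | O_NONBLOCK) < 0) goto done;

    /* Connect to ourselves over loopback with an ordinary blocking socket. */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;

    /* Wait (up to 2 s) until the connection is ready to be accepted. */
    pfd.fd = listener;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 2000) <= 0) goto done;

    accepted = accept(listener, NULL, NULL);
    if (accepted < 0) goto done;

    /* The interesting part: inspect the flags of the accepted socket. */
    flags = fcntl(accepted, F_GETFL, 0);
    if (flags < 0) goto done;
    result = (flags & O_NONBLOCK) ? 1 : 0;

done:
    if (accepted >= 0) close(accepted);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    assert(result >= -1 && result <= 1);  /* only -1, 0 or 1 are ever set */
    return result;
}
```

Calling this from a small main() and printing the return value on FreeBSD 9.2 and on 10 beta3 would show what the kernel itself does, independent of what configure wrote into apr.h. The sketch only reports kernel behaviour; it says nothing about whether httpd expects the accepted socket to be blocking at that point.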