Doesn't the socket_bucket_read() call (frame #3) enter apr_socket_timeout_set(p, 0), and hence sononblock() which puts the socket in O_NONBLOCK?
Maybe "strace" would help here too, since we are in the system now. On Thu, Feb 4, 2016 at 9:17 AM, Joachim Achtzehnter <joac...@kraut.ca> wrote: > Here is the backtrace: > >> #0 0xb7fddc40 in __kernel_vsyscall () >> #1 0xb7e05f03 in __read_nocancel () at >> ../sysdeps/unix/syscall-template.S:81 >> #2 0xb7f39589 in apr_socket_recv (sock=sock@entry=0x8156d18, >> buf=buf@entry=0x8168750 "html, body {\n font-family: arial, >> helvetica, sans-serif;\n font-size: 12px;\n height: 100%;\n >> margin:0;\n padding:0;\n height:100%;\n}\nbody a, a:hover {\n >> text-decoration: underline;\n border:"..., len=len@entry=0xbffff5e4) at >> network_io/unix/sendrecv.c:81 >> #3 0xb7f744d5 in socket_bucket_read (a=0x8160fc8, str=0xbffff5e0, >> len=0xbffff5e4, block=APR_NONBLOCK_READ) >> at buckets/apr_buckets_socket.c:36 >> #4 0x08086a98 in ap_core_input_filter (f=0x8165770, b=0x8165730, >> mode=AP_MODE_READBYTES, block=APR_NONBLOCK_READ, readbytes=5) >> at core_filters.c:235 >> #5 0xb7ce732a in bio_filter_in_read (bio=0x81666d8, in=0x816bc83 "", >> inlen=5) >> at ssl_engine_io.c:487 >> #6 0xb7b1bc66 in BIO_read () >> from /home/joachima/httpd-2.4.18/lib/libcrypto.so.1.0.0 >> #7 0xb7c8e040 in ssl3_read_n () >> from /home/joachima/httpd-2.4.18/lib/libssl.so.1.0.0 >> #8 0xb7c8fa75 in ssl3_read_bytes () >> from /home/joachima/httpd-2.4.18/lib/libssl.so.1.0.0 >> #9 0xb7c8c211 in ssl3_read () >> from /home/joachima/httpd-2.4.18/lib/libssl.so.1.0.0 >> #10 0xb7ca8679 in SSL_read () >> from /home/joachima/httpd-2.4.18/lib/libssl.so.1.0.0 >> #11 0xb7ce80d7 in ssl_io_input_read (inctx=inctx@entry=0x81636e8, >> buf=buf@entry=0x8163710 "GET /css/subpage.css HTTP/1.1\r\nUser-Agent: >> curl/7.40.0\r\nHost: na171\r\nAccept: */*\r\n\r\n", >> len=len@entry=0xbffff8f8) >> at ssl_engine_io.c:611 >> #12 0xb7cea36c in ssl_io_filter_input (f=0x8165718, bb=0x81660d8, >> mode=AP_MODE_SPECULATIVE, block=APR_NONBLOCK_READ, readbytes=1) >> at ssl_engine_io.c:1407 >> #13 0x080a013f in check_pipeline (bb=0x81660d8, c=0x8156eb0) >> at http_request.c:244 >> #14 ap_process_request_after_handler (r=0x817e2d8) at http_request.c:367 >> #15 0x080a1229 in ap_process_async_request (r=r@entry=0x817e2d8) >> at http_request.c:448 >> #16 0x080a1575 in ap_process_request (r=r@entry=0x817e2d8) >> at http_request.c:458 >> #17 0x0809d547 in ap_process_http_sync_connection (c=0x8156eb0) >> at http_core.c:210 >> #18 ap_process_http_connection (c=0x8156eb0) at http_core.c:251 >> #19 0x0809549d in ap_run_process_connection (c=0x8156eb0) at >> connection.c:41 >> #20 0x0809589d in ap_process_connection (c=c@entry=0x8156eb0, >> csd=0x8156d18) >> at connection.c:213 >> #21 0x080a7849 in child_main (child_num_arg=child_num_arg@entry=0, >> child_bucket=child_bucket@entry=0) at prefork.c:723 >> #22 0x080a7a9f in make_child (s=0x80f9f38, slot=slot@entry=0, >> bucket=bucket@entry=0) at prefork.c:767 >> #23 0x080a9091 in prefork_run (_pconf=0x80d50a8, plog=0x80fd9b8, >> s=0x80f9f38) >> at prefork.c:979 >> #24 0x08070959 in ap_run_mpm (pconf=pconf@entry=0x80d50a8, plog=0x80fd9b8, >> s=0x80f9f38) at mpm_common.c:94 >> #25 0x08069f23 in main (argc=4, argv=0xbffffd54) at main.c:777 > > > > On 2016-02-03 11:50 PM, Plüm, Rüdiger, Vodafone Group wrote: > >> Can you also post the backtrace when the server is hanging in the read? >> >> Regards >> >> Rüdiger >> >>> -----Original Message----- >>> From: Plüm, Rüdiger, Vodafone Group >>> Sent: Donnerstag, 4. Februar 2016 08:39 >>> To: dev@httpd.apache.org >>> Subject: RE: HTTPS connections lock-up with 2.4.18 >>> >>> Thanks. Which version of APR / APR-UTIL are you using? >>> >>> Regards >>> >>> Rüdiger >>> >>>> -----Original Message----- >>>> From: Joachim Achtzehnter [mailto:joac...@kraut.ca] >>>> Sent: Donnerstag, 4. Februar 2016 03:56 >>>> To: dev@httpd.apache.org >>>> Subject: Re: HTTPS connections lock-up with 2.4.18 >>>> >>>> Perhaps, there is an important clue this time. When I retrieve a file >>>> that works we see the following log messages near the time when the >>>> reply is being written: >>>> >>>>> [Wed Feb 03 18:13:51.402289 2016] [ssl:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] bio[9dabd50] out: pass 271 bytes >>>>> >>>>> [Wed Feb 03 18:13:51.402463 2016] [ssl:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] bio[9dabd50] out: pass 4020 bytes >>>>> >>>>> [Wed Feb 03 18:13:51.402515 2016] [core:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] core_output_filter: sent 4291 on 4291 bytes >>>>> >>>>> [Wed Feb 03 18:13:51.402527 2016] [ssl:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] bio[9dabd50] out: METADATA EOS >>>>> >>>>> [Wed Feb 03 18:13:51.402581 2016] [ssl:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] bio[9dabd50] out: METADATA EOR >>>>> >>>>> [Wed Feb 03 18:13:51.406356 2016] [ssl:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] bio[9db96d8] in: read 5 bytes >>>>> >>>>> [Wed Feb 03 18:13:51.406375 2016] [ssl:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] bio[9db96d8] in: read 26 bytes >>>>> >>>>> [Wed Feb 03 18:13:51.406392 2016] [http:notice] [pid 1676] (70014)End >>> >>> of >>>> >>>> file found: [client 205.159.216.185:55978] check_pipeline: no data >>>>> >>>>> [Wed Feb 03 18:13:51.406476 2016] [http:notice] [pid 1676] (70014)End >>> >>> of >>>> >>>> file found: [client 205.159.216.185:55978] check_pipeline: >>>> data_in_input_filters = 0 >>>>> >>>>> [Wed Feb 03 18:13:51.406492 2016] [ssl:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] bio[9dabd50] out: METADATA FLUSH >>>>> >>>>> [Wed Feb 03 18:13:51.406591 2016] [ssl:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] bio[9dabd50] out: METADATA FLUSH >>>>> >>>>> [Wed Feb 03 18:13:51.406613 2016] [ssl:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] bio[9dabd50] out: METADATA EOC >>>>> >>>>> [Wed Feb 03 18:13:51.406624 2016] [ssl:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] bio[9dabd50] out: pass 31 bytes >>>>> >>>>> [Wed Feb 03 18:13:51.406673 2016] [ssl:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] bio[9dabd50] out: FLUSH >>>>> >>>>> [Wed Feb 03 18:13:51.406723 2016] [core:notice] [pid 1676] [client >>>> >>>> 205.159.216.185:55978] core_output_filter: sent 31 on 31 bytes >>>>> >>>>> [Wed Feb 03 18:13:51.406735 2016] [ssl:debug] [pid 1676] >>>> >>>> ssl_engine_io.c(1029): [client 205.159.216.185:55978] AH02001: >>> >>> Connection >>>> >>>> closed to child 0 with standard shutdown (server localhost:443) >>>> >>>> Notice the call to check_pipeline returns quickly. Now the same part of >>>> the log for the failing file: >>>> >>>>> [Wed Feb 03 17:56:55.424124 2016] [ssl:notice] [pid 1663] [client >>>> >>>> 205.159.216.185:55962] bio[8158d50] out: pass 271 bytes >>>>> >>>>> [Wed Feb 03 17:56:55.424192 2016] [ssl:notice] [pid 1663] [client >>>> >>>> 205.159.216.185:55962] bio[8158d50] out: pass 1727 bytes >>>>> >>>>> [Wed Feb 03 17:56:55.424223 2016] [ssl:notice] [pid 1663] [client >>>> >>>> 205.159.216.185:55962] bio[8158d50] out: METADATA EOS >>>>> >>>>> [Wed Feb 03 17:56:55.424235 2016] [ssl:notice] [pid 1663] [client >>>> >>>> 205.159.216.185:55962] bio[8158d50] out: METADATA EOR >>>>> >>>>> [Wed Feb 03 18:09:56.652288 2016] [http:notice] [pid 1663] >>> >>> (11)Resource >>>> >>>> temporarily unavailable: [client 205.159.216.185:55962] check_pipeline: >>> >>> no >>>> >>>> data >>>>> >>>>> [Wed Feb 03 18:09:56.652376 2016] [http:notice] [pid 1663] >>> >>> (11)Resource >>>> >>>> temporarily unavailable: [client 205.159.216.185:55962] check_pipeline: >>>> data_in_input_filters = 0 >>>>> >>>>> [Wed Feb 03 18:09:56.652398 2016] [ssl:notice] [pid 1663] [client >>>> >>>> 205.159.216.185:55962] bio[8158d50] out: METADATA FLUSH >>>>> >>>>> [Wed Feb 03 18:09:56.652444 2016] [core:notice] [pid 1663] [client >>>> >>>> 205.159.216.185:55962] core_output_filter: sent 1998 on 1998 bytes >>>>> >>>>> [Wed Feb 03 18:09:56.652510 2016] [ssl:info] [pid 1663] (70014)End of >>>> >>>> file found: [client 205.159.216.185:55962] AH01991: SSL input filter >>> >>> read >>>> >>>> failed. >>>>> >>>>> [Wed Feb 03 18:09:56.652519 2016] [ssl:notice] [pid 1663] [client >>>> >>>> 205.159.216.185:55962] bio[8158d50] out: METADATA FLUSH >>>>> >>>>> [Wed Feb 03 18:09:56.652525 2016] [ssl:notice] [pid 1663] [client >>>> >>>> 205.159.216.185:55962] bio[8158d50] out: METADATA EOC >>>>> >>>>> [Wed Feb 03 18:09:56.652545 2016] [ssl:notice] [pid 1663] [client >>>> >>>> 205.159.216.185:55962] bio[8158d50] out: pass 31 bytes >>>>> >>>>> [Wed Feb 03 18:09:56.652553 2016] [ssl:notice] [pid 1663] [client >>>> >>>> 205.159.216.185:55962] bio[8158d50] out: FLUSH >>>>> >>>>> [Wed Feb 03 18:09:56.652568 2016] [core:notice] [pid 1663] [client >>>> >>>> 205.159.216.185:55962] core_output_filter: sent 31 on 31 bytes >>>>> >>>>> [Wed Feb 03 18:09:56.652576 2016] [ssl:debug] [pid 1663] >>>> >>>> ssl_engine_io.c(1029): [client 205.159.216.185:55962] AH02001: >>> >>> Connection >>>> >>>> closed to child 0 with standard shutdown (server localhost:443) >>>> >>>> The check_pipeline call only returns when I kill the client. I first >>>> noticed this in the debugger where httpd was sitting in a blocking read. >>>> Somehow, in this case a blocking read is done instead of a non-blocking >>>> read. >>>> >>>> To be specific, it ended up in the do/while loop that calls read in >>>> function apr_socket_recv (sendrecv.c), called from socket_bucket_read in >>>> apr_buckets_socket.c, called from ap_core_input_filter in >>>> core_filters.c. The block parameter was APR_NONBLOCK_READ, but this >>>> didn't stop the lower-level functions from blocking, somehow. The >>>> apr_socket_recv function seems to do the select version of reading based >>>> on the APR_INCOMPLETE_READ flag in sock->options. >>>> >>>> By the way, to answer an earlier question, this is a prefork server with >>>> separate processes, not threaded. >>>> >>>> Thanks, >>>> >>>> Joachim >>>> >>>> >>>> On 2016-02-03 8:06, Yann Ylavic wrote: >>>>> >>>>> On Wed, Feb 3, 2016 at 4:53 PM, Yann Ylavic <ylavic....@gmail.com> >>>> >>>> wrote: >>>>>> >>>>>> >>>>>> I suggest you use the attached patch instead (replacing any previous >>>>>> one), so that we log the output path taken when mod_ssl is in the >>>>>> place. >>>>> >>>>> >>>>> Forgot about the casting issue (SSL_get_app_data) with your >>> >>> compiler... >>>>> >>>>> This v5 (attached) should compile fine. >>>>> >>>>>> >>>>>>> And maybe take network traces on the server side? >>>>>> >>>>>> >>>>>> That would be really nice too. >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Yann. >>>> >>>> >>>> -- >>>> joac...@kraut.ca http://www.kraut.ca > > > > -- > joac...@kraut.ca www.kraut.ca