In Yann's v3 of the patch, it triggers only when I stop the server while the 
test case is ongoing,

Thread 2 Crashed:
0   libsystem_kernel.dylib              0x00007fff202e292e __pthread_kill + 10
1   libsystem_pthread.dylib             0x00007fff203115bd pthread_kill + 263
2   libsystem_c.dylib                   0x00007fff202664ab __abort + 139
3   libsystem_c.dylib                   0x00007fff20266420 abort + 135
4   httpd                               0x000000010625bb72 ap_log_assert + 130 
(log.c:1649)
5   mod_mpm_event.so                    0x0000000106b73549 
close_socket_nonblocking + 217 (event.c:569)
6   mod_mpm_event.so                    0x0000000106b731f2 kill_connection + 64 
(event.c:834) [inlined]
7   mod_mpm_event.so                    0x0000000106b731f2 
start_lingering_close_blocking + 178 (event.c:860)
8   mod_mpm_event.so                    0x0000000106b72e52 process_socket + 
1282 (event.c:1258)
9   mod_mpm_event.so                    0x0000000106b72819 worker_thread + 809 
(event.c:2390)
10  libapr-1.0.dylib                    0x0000000106412341 dummy_worker + 17 
(thread.c:147)
11  libsystem_pthread.dylib             0x00007fff203118fc _pthread_start + 224
12  libsystem_pthread.dylib             0x00007fff2030d443 thread_start + 15


[Tue Jul 13 14:12:18.800125 2021] [mpm_event:trace4] [pid 47786:tid 
123145484374016] event.c(563): closing socket 15/7f95ba07b4a0
[Tue Jul 13 14:12:19.551039 2021] [mpm_event:trace4] [pid 47787:tid 
123145488666624] event.c(1322): closing listeners (connection_count=0)
[Tue Jul 13 14:12:19.551146 2021] [mpm_event:trace4] [pid 47786:tid 
123145497251840] event.c(1322): closing listeners (connection_count=1)
[Tue Jul 13 14:12:19.551176 2021] [mpm_event:trace4] [pid 47788:tid 
123145488666624] event.c(1322): closing listeners (connection_count=0)
[Tue Jul 13 14:12:19.551209 2021] [mpm_event:trace4] [pid 47786:tid 4476272128] 
event.c(563): closing socket 15/7f95ba07b4b0
[Tue Jul 13 14:12:19.552305 2021] [mpm_event:trace4] [pid 47792:tid 
123145488666624] event.c(1322): closing listeners (connection_count=0)
[Tue Jul 13 14:12:19.805283 2021] [mpm_event:trace4] [pid 47786:tid 
123145484910592] event.c(563): closing socket -1/7f95ba07b4b0
[Tue Jul 13 14:12:19.805289 2021] [mpm_event:error] [pid 47786:tid 
123145484910592] (9)Bad file descriptor: AH00468: error closing socket 
-1/7f95ba07b4b0

You can see that socket 7f95ba07b4b0 is closed twice. I attach my log and v5 
patch (which is Yann's plus some extra log around closing).

Attachment: error_log
Description: Binary data

Attachment: event_ka_no_lingerv5.diff
Description: Binary data

> Am 13.07.2021 um 16:03 schrieb Stefan Eissing <stefan.eiss...@greenbytes.de>:
> 
> Wait a minute...need to confirm that I am not lying and used an older patch 
> version...
> 
>> Am 13.07.2021 um 16:01 schrieb Stefan Eissing <stefan.eiss...@greenbytes.de>:
>> 
>> Found one condition where the assertion failure happens. The test case I 
>> have does this:
>> 
>> sock = socket.create_connection((host, int(env.https_port)))
>> try:
>>   sock.settimeout(0.5)
>>   sock.recv(1024)
>>   assert False
>> except Exception as ex:
>>   print(f"as expected: {ex}")
>> sock.close()
>> 
>> Which is all that's needed to trigger the assertion. The server is not 
>> stopped.
>> 
>> The log for this thread only gives:
>> [Tue Jul 13 13:51:29.476747 2021] [mpm_event:trace4] [pid 31582:tid 
>> 123145480441856] event.c(858): [client 127.0.0.1:51768] lingering close in 
>> state 5
>> [Tue Jul 13 13:51:29.476769 2021] [mpm_event:trace4] [pid 31582:tid 
>> 123145480441856] event.c(833): [remote 127.0.0.1:51768] kill connection in 
>> state 5
>> [Tue Jul 13 13:51:29.476775 2021] [mpm_event:trace4] [pid 31582:tid 
>> 123145480441856] event.c(563): closing socket -1/7fa24681d0b0 from 
>> process_socket
>> [Tue Jul 13 13:51:29.476779 2021] [mpm_event:error] [pid 31582:tid 
>> 123145480441856] (9)Bad file descriptor: AH00468: error closing socket 
>> -1/7fa24681d0b0 from process_socket
>> [Tue Jul 13 13:51:29.476799 2021] [core:crit] [pid 31582:tid 
>> 123145480441856] AH00102: [Tue Jul 13 13:51:29 2021] file event.c, line 569, 
>> assertion "0" failed
>> 
>> I added a trace4 "closing socket" before the apr_socket_close(), grepping 
>> for that in the log shows:
>> [Tue Jul 13 13:51:28.467677 2021] [mpm_event:trace4] [pid 31582:tid 
>> 123145479905280] event.c(563): closing socket 15/7fa24681d0a0 from 
>> process_lingering_close
>> [Tue Jul 13 13:51:29.476775 2021] [mpm_event:trace4] [pid 31582:tid 
>> 123145480441856] event.c(563): closing socket -1/7fa24681d0b0 from 
>> process_socket
>> [Tue Jul 13 13:51:29.476779 2021] [mpm_event:error] [pid 31582:tid 
>> 123145480441856] (9)Bad file descriptor: AH00468: error closing socket 
>> -1/7fa24681d0b0 from process_socket
>> [Tue Jul 13 13:51:29.984132 2021] [mpm_event:trace4] [pid 31584:tid 
>> 123145470783488] event.c(563): closing socket -1/7fa24681f0b0 from 
>> process_socket
>> [Tue Jul 13 13:51:29.984138 2021] [mpm_event:error] [pid 31584:tid 
>> 123145470783488] (9)Bad file descriptor: AH00468: error closing socket 
>> -1/7fa24681f0b0 from process_socket
>> 
>> 
>> 
>> Thread 21 Crashed:
>> 0   libsystem_kernel.dylib           0x00007fff202e292e __pthread_kill + 10
>> 1   libsystem_pthread.dylib          0x00007fff203115bd pthread_kill + 263
>> 2   libsystem_c.dylib                0x00007fff202664ab __abort + 139
>> 3   libsystem_c.dylib                0x00007fff20266420 abort + 135
>> 4   httpd                            0x000000010dbfdb72 ap_log_assert + 130 
>> (log.c:1649)
>> 5   mod_mpm_event.so                 0x000000010e5154bf 
>> close_socket_nonblocking_ex + 223 (event.c:569)
>> 6   mod_mpm_event.so                 0x000000010e515158 kill_connection_ex + 
>> 67 (event.c:835) [inlined]
>> 7   mod_mpm_event.so                 0x000000010e515158 
>> start_lingering_close_blocking_ex + 184 (event.c:862)
>> 8   mod_mpm_event.so                 0x000000010e514da9 process_socket + 
>> 1289 (event.c:1261)
>> 9   mod_mpm_event.so                 0x000000010e514769 worker_thread + 809 
>> (event.c:2394)
>> 10  libapr-1.0.dylib                 0x000000010ddb9341 dummy_worker + 17 
>> (thread.c:147)
>> 11  libsystem_pthread.dylib          0x00007fff203118fc _pthread_start + 224
>> 12  libsystem_pthread.dylib          0x00007fff2030d443 thread_start + 15
>> 
>> 
>> 
>> - Stefan
>> 
>>> Am 13.07.2021 um 12:39 schrieb Stefan Eissing 
>>> <stefan.eiss...@greenbytes.de>:
>>> 
>>> Coming back to this: I vote for applying this to trunk. On a fresh setup, I 
>>> do not see and assert failures in my test.
>>> 
>>> - Stefan
>>> 
>>>> Am 08.07.2021 um 20:17 schrieb Yann Ylavic <ylavic....@gmail.com>:
>>>> 
>>>> On Thu, Jul 8, 2021 at 4:21 PM Stefan Eissing
>>>> <stefan.eiss...@greenbytes.de> wrote:
>>>>> 
>>>>> I see many of those:
>>>>> [Thu Jul 08 14:16:55.301670 2021] [mpm_event:error] [pid 81101:tid 
>>>>> 123145411510272] (9)Bad file descriptor: AH00468: error closing socket 
>>>>> -1/7ff9cf0086b0 from process_socket
>>>>> 
>>>>> which come from event.c#1263: rc = start_lingering_close_blocking(cs);
>>>>> calling event.c#864: kill_connection_ex(cs, from);
>>>>> and event.c#837: close_socket_nonblocking_ex(cs->pfd.desc.s, from);
>>>> 
>>>> OK, so I think this could be addressed by the attached patch.
>>>> This is the same as v0 plus some changes in ap_start_lingering_close()
>>>> (from server/connection.c) to not close the socket on error.
>>>> ap_start_lingering_close() did not close the socket consistently, so
>>>> the caller had to call apr_socket_close() anyway and sometimes fail..
>>>> The close on failure is moved to ap_lingering_close() which is the one
>>>> that should care.
>>>> 
>>>> Still the AP_DEBUG_ASSERT(0) triggering with this?
>>>> <event_ka_no_lingerv3.diff>
>>> 
>> 
> 

Reply via email to