Hi Thierry,

On 27-4-2018 at 1:54, PiBa-NL wrote:
Hi Thierry,

On 26-4-2018 at 12:25, thierry.fourn...@arpalert.org wrote:
Your trace shows a corrupted tree. Maybe it is due to the FreeBSD
architecture and the corruption is not reproducible on Linux? I do not have
FreeBSD for testing.

Regards,
Thierry

My 'best' reproduction scenario involves around 50 to 100 requests in total, sometimes fewer, made from my Chrome browser (over a VPN connection which adds a little latency; a ping/reply to the haproxy server takes +-17ms, maybe that helps reproduce.. :/ ). I've changed the Lua function 'print_r' to not write anything in its wr() function, to rule out the FreeBSD console / ssh session causing the issue. It stays the same though.

I've also been adding some printf statements around how the eb_tree is used, which I think might be interesting:
    printf("eb32sc_insert(%d)\n",new);
    printf("eb32sc_next(%d)  leaf_p: %d \n",eb32, eb32->node.leaf_p);
    printf("eb32sc_delete(%d)\n",eb32);

The pattern I see here is that usually a task is inserted, the next task is looked up, and the task gets deleted again. In the last round before the crash I see that the task is inserted, then deleted, and only afterwards is the 'next' task retrieved for that same item.. which fails.. Perhaps because it was the last task to run, and there is no longer a higher root to jump further up the tree? I must admit I don't fully grasp how the tree is traversed exactly.. I do see that at least for the first task put into the tree there is some special handling.. On the other side, if it isn't in the tree, is there technically still a 'next' item??
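Just to check my own understanding I made a tiny standalone example (NOT HAProxy code; a toy 3-node tree where 'up' plays the role of leaf_p, and all names are made up). The next() walk starts from the node's own up-pointer and delete() clears it, which is what the 'leaf_p: 0' right before the crash in the log below looks like to me. So once a node has been removed from the tree there is simply no valid 'next' reachable from it anymore:

    #include <stdio.h>

    /* Toy 3-node "tree": a root with a left and a right leaf. The 'up' pointer
     * plays the role of leaf_p: it is how next() climbs out of the current leaf. */
    struct toy_node {
        struct toy_node *up;
        struct toy_node *left, *right;
        unsigned int key;
    };

    /* stand-in for eb32sc_next(): climb to the parent, then take the other
     * branch; this only makes sense while the node is still linked in the tree */
    static struct toy_node *toy_next(struct toy_node *n)
    {
        struct toy_node *up = n->up;   /* with up == 0 a real walk would be
                                          dereferencing a NULL/garbage pointer */
        if (!up)
            return NULL;
        return (up->left == n) ? up->right : NULL;
    }

    /* stand-in for eb32sc_delete(): unlink and clear the up pointer, which is
     * what the "leaf_p: 0" in my log looks like after a delete */
    static void toy_delete(struct toy_node *n)
    {
        if (n->up) {
            if (n->up->left == n)
                n->up->left = NULL;
            else
                n->up->right = NULL;
            n->up = NULL;
        }
    }

    int main(void)
    {
        struct toy_node root = { NULL, NULL, NULL, 0 };
        struct toy_node a = { &root, NULL, NULL, 1 };
        struct toy_node b = { &root, NULL, NULL, 2 };

        root.left = &a;
        root.right = &b;

        printf("next(a) while linked: %p\n", (void *)toy_next(&a)); /* -> &b   */
        toy_delete(&a);
        printf("next(a) after delete: %p\n", (void *)toy_next(&a)); /* -> NULL */
        return 0;
    }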

Below is the last part of that logging; the complete log from the start is also attached.. Perhaps it gives a clue?

Regards,

PiBa-NL (Pieter)

eb32sc_insert(35826016)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35826016)  leaf_p: 8606272
eb32sc_delete(35826016)
task_wakeup  35826016  32
active_tasks_mask |= t->thread_mask  (35826016)
eb32sc_insert(35826016)
task_wakeup  35826656  8
active_tasks_mask |= t->thread_mask  (35826656)
eb32sc_insert(35826656)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35826656)  leaf_p: 35826656
eb32sc_delete(35826656)
00000013:TestSite.srvrep[0006:ffffffff]: HTTP/1.1 200 OK
00000013:TestSite.srvhdr[0006:ffffffff]: Refresh: 1
00000013:TestSite.srvhdr[0006:ffffffff]: Server: haproxy/webstats
00000013:TestSite.srvhdr[0006:ffffffff]: Content-Type: text/html
00000013:TestSite.srvhdr[0006:ffffffff]: Content-Length: 1778
eb32sc_delete(35826016)
00000013:TestSite.srvcls[0006:ffffffff]
eb32sc_next(35826016)  leaf_p: 0
Segmentation fault (core dumped)

I tried to dig a little further.. I'm pretty sure these are the steps to the issue; the exact reproduction is probably rather timing sensitive though. A TCP connection 'stream task' gets closed before the applet is done running, and this removes the applet task from the tree. That applet task, however, is what 'rq_next' is pointing to..
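To make that suspected sequence concrete I put it into a small standalone program (again NOT HAProxy code; all names are made up and a single-linked list stands in for the run queue / ebtree). The loop caches rq_next before running the current task, the 'stream' handler then unlinks AND frees the task that rq_next points to, and the next iteration reads freed memory. Under AddressSanitizer or valgrind the read-after-free is reported right away; without such a tool it may or may not crash, much like the real thing:

    #include <stdio.h>
    #include <stdlib.h>

    /* stand-in for struct task; a single-linked list plays the run queue */
    struct fake_task {
        struct fake_task *next;
        void (*process)(struct fake_task *t);
        const char *name;
    };

    static struct fake_task *runqueue;
    static struct fake_task *applet_task;   /* the hlua applet task */

    static void rq_insert(struct fake_task *t)
    {
        t->next = runqueue;
        runqueue = t;
    }

    static void rq_delete(struct fake_task *t)
    {
        struct fake_task **p = &runqueue;

        while (*p && *p != t)
            p = &(*p)->next;
        if (*p == t)
            *p = t->next;
    }

    /* stands in for process_stream() ending up in hlua_applet_http_release():
     * the stream is shut down while the applet task is still queued, so the
     * applet task gets unlinked AND freed here, behind the scheduler's back */
    static void stream_handler(struct fake_task *t)
    {
        (void)t;
        rq_delete(applet_task);
        free(applet_task);                  /* ..but rq_next still points to it */
    }

    static void applet_handler(struct fake_task *t)
    {
        printf("%s runs\n", t->name);
    }

    int main(void)
    {
        struct fake_task *stream = malloc(sizeof(*stream));

        applet_task = malloc(sizeof(*applet_task));
        *applet_task = (struct fake_task){ NULL, applet_handler, "applet" };
        *stream      = (struct fake_task){ NULL, stream_handler, "stream" };

        rq_insert(applet_task);
        rq_insert(stream);                  /* the stream task is picked up first */

        /* stand-in for the process_runnable_tasks() loop: look up the next
         * entry first, unlink the current one, then run it */
        struct fake_task *t = runqueue;
        while (t) {
            struct fake_task *rq_next = t->next;  /* cached before t runs */
            rq_delete(t);
            t->process(t);                        /* may free what rq_next points to */
            t = rq_next;                          /* dangling once stream_handler ran;
                                                     the next iteration reads freed memory */
        }

        free(stream);
        return 0;
    }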

The stack below seems to be what removes and frees the applet task.. Not exactly sure whether this is the exact one causing the crash a little later; it's not related to the 'log' further below anyhow..

hlua_applet_http_release (ctx=0x802238780) at P:\Git\haproxy\src\hlua.c:6668
si_applet_release (si=0x8022accf0) at P:\Git\haproxy\include\proto\stream_interface.h:234
stream_int_shutw_applet (si=0x8022accf0) at P:\Git\haproxy\src\stream_interface.c:1506
si_shutw (si=0x8022accf0) at P:\Git\haproxy\include\proto\stream_interface.h:321
process_stream (t=0x80222a8c0) at P:\Git\haproxy\src\stream.c:2161
process_runnable_tasks () at P:\Git\haproxy\src\task.c:236
run_poll_loop () at P:\Git\haproxy\src\haproxy.c:2404
run_thread_poll_loop (data=0x8022420a0) at P:\Git\haproxy\src\haproxy.c:2469
main (argc=5, argv=0x7fffffffea68) at P:\Git\haproxy\src\haproxy.c:3060


    *    Below is some printf output, mostly containing function names, plus a few extra statements inside the process_runnable_tasks loop to show what the current and next task were...
active_tasks_mask |= t->thread_mask  (35827936)
eb32sc_insert(35827936)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
find rq_next, current task 35827936  rq_current:35827936
eb32sc_next(35827936)  leaf_p: 8610368
eb32sc_delete(35827936)
process task 35827936  rq_next:0
task_wakeup  35827936  32
active_tasks_mask |= t->thread_mask  (35827936)
eb32sc_insert(35827936)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
find rq_next, current task 35827936  rq_current:35827936
eb32sc_next(35827936)  leaf_p: 8610368
eb32sc_delete(35827936)
process task 35827936  rq_next:0
task_wakeup  35827936  32
active_tasks_mask |= t->thread_mask  (35827936)
eb32sc_insert(35827936)
task_wakeup  35827616  8
active_tasks_mask |= t->thread_mask  (35827616)
eb32sc_insert(35827616)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
find rq_next, current task 35827616  rq_current:35827616
eb32sc_next(35827616)  leaf_p: 35827616
eb32sc_delete(35827616)
    *    Here processing of task ..16 starts and rq_next is filled with ..36..
process task 35827616  rq_next:35827936
0000002b:TestSite.srvrep[0014:ffffffff]: HTTP/1.1 200 OK
0000002b:TestSite.srvhdr[0014:ffffffff]: Content-Type: text/html
0000002b:TestSite.srvhdr[0014:ffffffff]: Server: haproxy/webstats
0000002b:TestSite.srvhdr[0014:ffffffff]: Refresh: 1
0000002b:TestSite.srvhdr[0014:ffffffff]: Content-Length: 1779
    *    There is a part that says some states may not happen for applets, but the other state SI_ST_EST above falls through to that part; maybe that's okay, maybe not:
stream_int_shutw_applet, may not happen si->state:7 si->flags:16820 ic->flags:-2080370622
si_applet_release
hlua_applet_http_release
    *    While ..16 is running, ..36 gets deleted and freed.
task_delete (task:35827936)
eb32sc_delete(35827936)
0000002b:TestSite.srvcls[0014:ffffffff]
find rq_next, current task 35827936  rq_current:35827936
   *     On the next loop an attempt is made to read the 'next' of ..36; however, ..36 itself is not part of the tree anymore..
eb32sc_next(35827936)  leaf_p: 0
   *     Coredump..

Perhaps the task should not be deleted and freed directly, but flagged 'to be destroyed', and then later in the process_runnable_tasks loop the actual remove & free could be done? I hope this is enough info to maybe try and patch it even without a reproduction? Or maybe I can try and reproduce it by 'forcing' some creative delays here and there to get the 'correct' order of events?
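Roughly the shape of that idea as a standalone sketch (not a patch; the TASK_KILLED flag and all names here are made up): the release path only marks the task, and the scheduler loop is the only place that unlinks and frees, after it has already advanced its own rq_next, so the cached pointer can never point to freed memory:

    #include <stdlib.h>

    #define TASK_KILLED 0x01

    struct fake_task {
        struct fake_task *next;     /* stand-in for the ebtree linkage */
        unsigned int flags;
        void (*process)(struct fake_task *t);
    };

    /* what e.g. the applet release path would call instead of freeing:
     * only mark the task and leave it queued */
    static void task_kill(struct fake_task *t)
    {
        t->flags |= TASK_KILLED;
    }

    /* the scheduler is the only place that unlinks and frees, and it does so
     * only after it has advanced past the entry, so a cached rq_next can
     * never point to memory that was freed behind its back */
    static void run_tasks(struct fake_task **runqueue)
    {
        struct fake_task *t = *runqueue;

        while (t) {
            struct fake_task *rq_next = t->next;
            *runqueue = rq_next;            /* unlink the current entry */
            if (t->flags & TASK_KILLED)
                free(t);                    /* deferred destruction happens here */
            else
                t->process(t);              /* handlers may task_kill() others,
                                               but never free() them directly */
            t = rq_next;
        }
    }

    static void noop(struct fake_task *t) { (void)t; }

    int main(void)
    {
        struct fake_task *a = calloc(1, sizeof(*a));  /* keeps running normally */
        struct fake_task *b = calloc(1, sizeof(*b));  /* gets "killed" while queued */
        struct fake_task *rq;

        a->process = noop;
        b->process = noop;
        a->next = b;
        b->next = NULL;
        rq = a;

        task_kill(b);       /* the release path only flags it...               */
        run_tasks(&rq);     /* ...and the scheduler frees it when it gets there */

        free(a);
        return 0;
    }

Of course the real thing would also have to deal with the wait queue and with multiple threads; this is just to show what I mean.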

Regards,

PiBa-NL (Pieter)

