[ 
https://issues.apache.org/jira/browse/DISPATCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295462#comment-16295462
 ] 

Ganesh Murthy commented on DISPATCH-902:
----------------------------------------

This crash is easy to reproduce thanks to the reproducer provided by Kim van 
der Riet.

Following those steps, I get this backtrace:

(gdb) bt
#0  pn_transport_tail_closed (transport=0x0) at /home/gmurthy/opensource/qpid-proton/proton-c/src/core/transport.c:3056
#1  0x00007fccc7788349 in pn_connection_driver_read_closed (d=d@entry=0x7fcca4036538) at /home/gmurthy/opensource/qpid-proton/proton-c/src/core/connection_driver.c:109
#2  0x00007fccc7569ed1 in pconnection_rclosed (pc=0x7fcca4035f90) at /home/gmurthy/opensource/qpid-proton/proton-c/src/proactor/epoll.c:898
#3  pconnection_process (pc=0x7fcca4035f90, events=<optimized out>, timeout=timeout@entry=false, topup=false) at /home/gmurthy/opensource/qpid-proton/proton-c/src/proactor/epoll.c:1084
#4  0x00007fccc756a95b in proactor_do_epoll (p=0x10012a0, can_block=can_block@entry=true) at /home/gmurthy/opensource/qpid-proton/proton-c/src/proactor/epoll.c:2007
#5  0x00007fccc756b97a in pn_proactor_wait (p=<optimized out>) at /home/gmurthy/opensource/qpid-proton/proton-c/src/proactor/epoll.c:2025
#6  0x00007fccc79f4e02 in thread_run (arg=0xfb75e0) at /home/gmurthy/opensource/qpid-dispatch/src/server.c:932
#7  0x00007fccc734d609 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fccc6603e6f in clone () from /lib64/libc.so.6

The failure comes from this function in transport.c:

bool pn_transport_tail_closed(pn_transport_t *transport) { return transport->tail_closed; }

It is called with a NULL transport (transport=0x0 in frame #0), so the 
dereference faults.

In trying to fix the crash, I modified pn_transport_tail_closed to

bool pn_transport_tail_closed(pn_transport_t *transport) { return transport && transport->tail_closed; }

This only postpones the problem: the router no longer crashes immediately, but 
it still crashes after a couple of minutes. Modifying pn_transport_tail_closed 
is not the solution. The code should not even get here in the first place, 
since the transport is already closed due to connection termination.
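
For illustration only, here is a minimal sketch of the difference between 
guarding inside the accessor and checking in the caller. The demo_* types and 
functions are hypothetical stand-ins, not the Proton types or API, and the 
caller-side check is an assumption about one possible direction, not a 
proposed or adopted fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT the Proton types. */
typedef struct { bool tail_closed; } demo_transport_t;
typedef struct { demo_transport_t *transport; } demo_driver_t;

/* Original shape: dereferences unconditionally, so a NULL transport faults. */
static bool demo_tail_closed_unchecked(const demo_transport_t *t) {
    return t->tail_closed;
}

/* Guarded shape: a NULL transport yields false ("not closed") instead of
 * faulting, which hides the bad state from the caller. */
static bool demo_tail_closed_guarded(const demo_transport_t *t) {
    return t && t->tail_closed;
}

/* Caller-side check (assumption): a driver whose transport has already been
 * released is reported as closed, so the accessor is never reached. */
static bool demo_driver_read_closed(const demo_driver_t *d) {
    assert(d != NULL);            /* a NULL driver is a programming error */
    if (d->transport == NULL) {
        return true;              /* released transport: treat as closed */
    }
    return demo_tail_closed_unchecked(d->transport);
}
```

Note that the guarded accessor answers "not closed" for a NULL transport, so a 
caller may keep treating the connection as readable. That may be related to 
why the guard only delayed the crash, though this has not been confirmed.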

Assigning this issue to [~aconway] for further research into the proton API.




> Intermittent crash with link to broker when broker closed
> ---------------------------------------------------------
>
>                 Key: DISPATCH-902
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-902
>             Project: Qpid Dispatch
>          Issue Type: Bug
>            Reporter: Kim van der Riet
>            Assignee: Ted Ross
>         Attachments: qdrouterd.node1.conf, qdrouterd.node2.conf, 
> qpidd.d2n.conf
>
>
> When using dispatch in a 2-node configuration with a broker between them:
> {noformat}
>         9002           10001           10001            9003
> sender ----> dispatch1 -----> qpid-cpp -----> dispatch2 -----> receiver
> {noformat}
> and initializing in the following order:
> # start dispatch1
> # start dispatch2
> # start qpid-cpp
> # wait for "Link Route Activated" messages on both dispatch nodes
> # stop qpid-cpp
> then the dispatch nodes will core after a random amount of time and after 
> sending a random number of 
> {noformat}
> (info) Connection to localhost:10001 failed: proton:io Connection refused - 
> on read from localhost:10001
> {noformat}
> messages.
> The stack trace is as follows for all occurrences:
> {noformat}
> Thread 3 "qdrouterd" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffea269700 (LWP 10954)]
> pn_transport_tail_closed (transport=0x0) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/core/transport.c:3044
> 3044  bool pn_transport_tail_closed(pn_transport_t *transport) { return 
> transport->tail_closed; }
> (gdb) thread apply all bt
> Thread 5 (Thread 0x7fffe9267700 (LWP 10956)):
> #0  0x00007ffff67eb6d3 in epoll_wait () at 
> ../sysdeps/unix/syscall-template.S:84
> #1  0x00007ffff77327e2 in proactor_do_epoll (p=0x89b550, 
> can_block=can_block@entry=true) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:1978
> #2  0x00007ffff77337ca in pn_proactor_wait (p=<optimized out>) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:2025
> #3  0x00007ffff7bbc219 in thread_run (arg=0x89ec20) at 
> /home/kpvdr/RedHat/qpid-dispatch/src/server.c:932
> #4  0x00007ffff75185ca in start_thread (arg=0x7fffe9267700) at 
> pthread_create.c:333
> #5  0x00007ffff67eb0cd in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> Thread 4 (Thread 0x7fffe9a68700 (LWP 10955)):
> #0  0x00007ffff67eb6d3 in epoll_wait () at 
> ../sysdeps/unix/syscall-template.S:84
> #1  0x00007ffff77327e2 in proactor_do_epoll (p=0x89b550, 
> can_block=can_block@entry=true) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:1978
> #2  0x00007ffff77337ca in pn_proactor_wait (p=<optimized out>) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:2025
> #3  0x00007ffff7bbc219 in thread_run (arg=0x89ec20) at 
> /home/kpvdr/RedHat/qpid-dispatch/src/server.c:932
> #4  0x00007ffff75185ca in start_thread (arg=0x7fffe9a68700) at 
> pthread_create.c:333
> #5  0x00007ffff67eb0cd in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> Thread 3 (Thread 0x7fffea269700 (LWP 10954)):
> #0  pn_transport_tail_closed (transport=0x0) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/core/transport.c:3044
> #1  0x00007ffff794f4f9 in pn_connection_driver_read_closed 
> (d=d@entry=0x7fffdc054288) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/core/connection_driver.c:109
> #2  0x00007ffff7731ef1 in pconnection_rclosed (pc=0x7fffdc053ce0) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:898
> #3  pconnection_process (pc=0x7fffdc053ce0, events=<optimized out>, 
> timeout=timeout@entry=false, topup=topup@entry=false) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:1084
> #4  0x00007ffff7732945 in proactor_do_epoll (p=0x89b550, 
> can_block=can_block@entry=true) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:2007
> #5  0x00007ffff77337ca in pn_proactor_wait (p=<optimized out>) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:2025
> #6  0x00007ffff7bbc219 in thread_run (arg=0x89ec20) at 
> /home/kpvdr/RedHat/qpid-dispatch/src/server.c:932
> #7  0x00007ffff75185ca in start_thread (arg=0x7fffea269700) at 
> pthread_create.c:333
> #8  0x00007ffff67eb0cd in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> Thread 2 (Thread 0x7fffeaa6a700 (LWP 10953)):
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at 
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1  0x00007ffff7ba2949 in sys_cond_wait (cond=<optimized out>, 
> held_mutex=<optimized out>) at 
> /home/kpvdr/RedHat/qpid-dispatch/src/posix/threading.c:91
> #2  0x00007ffff7bb0cf5 in router_core_thread (arg=0x8f8c90) at 
> /home/kpvdr/RedHat/qpid-dispatch/src/router_core/router_core_thread.c:66
> #3  0x00007ffff75185ca in start_thread (arg=0x7fffeaa6a700) at 
> pthread_create.c:333
> #4  0x00007ffff67eb0cd in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> Thread 1 (Thread 0x7ffff7fbb180 (LWP 10946)):
> #0  0x00007ffff67eb6d3 in epoll_wait () at 
> ../sysdeps/unix/syscall-template.S:84
> #1  0x00007ffff77327e2 in proactor_do_epoll (p=0x89b550, 
> can_block=can_block@entry=true) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:1978
> #2  0x00007ffff77337ca in pn_proactor_wait (p=<optimized out>) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:2025
> #3  0x00007ffff7bbc219 in thread_run (arg=arg@entry=0x89ec20) at 
> /home/kpvdr/RedHat/qpid-dispatch/src/server.c:932
> #4  0x00007ffff7bbc2f0 in qd_server_run (qd=<optimized out>) at 
> /home/kpvdr/RedHat/qpid-dispatch/src/server.c:1186
> #5  0x00000000004017dc in main_process (config_path=0x7fffffffda56 
> "/home/kpvdr/RedHat/install/etc/qpid-dispatch/qdrouterd.node2.conf", 
> python_pkgdir=<optimized out>, fd=2)
>     at /home/kpvdr/RedHat/qpid-dispatch/router/src/main.c:111
> #6  0x00000000004015ec in main (argc=3, argv=0x7fffffffd638) at 
> /home/kpvdr/RedHat/qpid-dispatch/router/src/main.c:318
> {noformat}
> More detail:
>  
> {noformat}
> (gdb) bt full
> #0  pn_transport_tail_closed (transport=0x0) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/core/transport.c:3044
> No locals.
> #1  0x00007ffff794f4f9 in pn_connection_driver_read_closed 
> (d=d@entry=0x7fffe0071108) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/core/connection_driver.c:109
> No locals.
> #2  0x00007ffff7731ef1 in pconnection_rclosed (pc=0x7fffe0070b60) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:898
> No locals.
> #3  pconnection_process (pc=0x7fffe0070b60, events=<optimized out>, 
> timeout=timeout@entry=false, topup=topup@entry=false) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:1084
>         inbound_wake = <optimized out>
>         rearm_timer = <optimized out>
>         timer_fired = <optimized out>
>         waking = false
>         tick_required = false
>         rearm_pc = <optimized out>
> #4  0x00007ffff7732945 in proactor_do_epoll (p=0x89b550, 
> can_block=can_block@entry=true) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:2007
>         batch = 0x0
>         ev = {events = 21, data = {ptr = 0x7fffe0070b70, fd = -536409232, u32 
> = 3758558064, u64 = 140736951946096}}
>         n = <optimized out>
>         ee = 0x7fffe0070b70
>         timeout = -1
> #5  0x00007ffff77337ca in pn_proactor_wait (p=<optimized out>) at 
> /home/kpvdr/RedHat/qpid-proton/proton-c/src/proactor/epoll.c:2025
> No locals.
> #6  0x00007ffff7bbc219 in thread_run (arg=0x89ec20) at 
> /home/kpvdr/RedHat/qpid-dispatch/src/server.c:932
>         events = <optimized out>
>         e = <optimized out>
>         qd_server = 0x89ec20
>         running = true
> #7  0x00007ffff75185ca in start_thread (arg=0x7fffe9a68700) at 
> pthread_create.c:333
>         __res = <optimized out>
>         pd = 0x7fffe9a68700
>         now = <optimized out>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737113392896, 
> -7680037156526760930, 140737488344079, 4096, 140737113392896, 
> 140737113393600, 7679998188122266654, 7680018654140947486}, 
>               mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data 
> = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
>         not_first_call = <optimized out>
>         pagesize_m1 = <optimized out>
>         sp = <optimized out>
>         freesize = <optimized out>
> #8  0x00007ffff67eb0cd in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> No locals.
> {noformat}
> Dispatch, qpid-cpp and Proton were all built from master yesterday (Dec 14).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
