[ 
https://issues.apache.org/jira/browse/DISPATCH-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471157#comment-16471157
 ] 

Gordon Sim commented on DISPATCH-994:
-------------------------------------

Digging a little deeper, the issue here is the re-use of a link name on a
session before the previous use has fully closed. The test case attached here
is arguably incorrect, as it does not wait for the connection to be confirmed
as closed before resubscribing with the same link names. However, even a
modified version that does wait can cause the same problem. DISPATCH-997 is a
different symptom of the same root cause, and the test there also waits for
connection close before reusing the names. If router-c is run under valgrind,
that test too can trigger this segfault.

The only way to avoid it would be for the application to wait for the link 
detach to be confirmed before closing the connection. That is not something 
that can be relied on. If the connection ends (cleanly or due to disconnect) 
before the link is closed, then the router will confirm the close of the 
connection before waiting for the detach it relays down the link route to be 
echoed back.

If you get an attach with the same name before the detach for the previous use
of that name has been echoed back, then the previous link is not fully closed
(it is locally open, remotely closed), and when proton handles the attach it
returns the previous link object, which is in the wrong state. This leads to
one of two outcomes. Either the router incorrectly treats the attach as the
echoing back of a router-initiated link, which causes the segfault described
in this issue because the correct context has not been set up, or the attach
is ignored and never echoed back. The former happens when the detach is echoed
back slowly, so running router-c under valgrind makes it more likely.
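
To make the sequence concrete, here is a toy model of the name lookup. It is
only a sketch and does not use the proton or dispatch APIs: the Session and
Link classes, their method names and the returned strings are all invented
for illustration. It assumes a link stays registered under its name until it
is closed in both directions, which is the behaviour described above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Link:
    # Hypothetical stand-in for a link endpoint; not a proton type.
    name: str
    local_open: bool = True
    remote_open: bool = True


@dataclass
class Session:
    # Links are keyed by name and kept until closed in both directions.
    links: Dict[str, Link] = field(default_factory=dict)

    def remote_attach(self, name: str) -> str:
        link = self.links.get(name)
        if link is None:
            # Name is free: this is a genuinely new incoming link.
            self.links[name] = Link(name)
            return "new link"
        # Name still registered: the half-closed object is handed back,
        # so the attach looks like the echo of an existing link.
        link.remote_open = True
        return "treated as echo of existing link"

    def remote_detach(self, name: str) -> None:
        self.links[name].remote_open = False
        self._free_if_closed(name)

    def local_detach(self, name: str) -> None:
        # In this model, called once the relayed detach has been echoed back.
        self.links[name].local_open = False
        self._free_if_closed(name)

    def _free_if_closed(self, name: str) -> None:
        link = self.links[name]
        if not link.local_open and not link.remote_open:
            del self.links[name]


def reuse_name(wait_for_echo: bool) -> str:
    # Attach, detach, then attach again with the same (made-up) link name.
    session = Session()
    session.remote_attach("sub-1")
    session.remote_detach("sub-1")
    if wait_for_echo:
        session.local_detach("sub-1")
    return session.remote_attach("sub-1")
```

In this model reuse_name(True) sees a new link, while reuse_name(False) gets
the stale object back, which corresponds to the misclassified attach.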

Fundamentally I think this is an issue in using the same session for all routed 
links, where the links are detached asynchronously.

> segfault in qdr_link_second_attach
> ----------------------------------
>
>                 Key: DISPATCH-994
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-994
>             Project: Qpid Dispatch
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Gordon Sim
>            Priority: Major
>         Attachments: router-a.conf, router-b.conf, router-c.conf, 
> topic_test.py
>
>
> Link routing from router A through router B to a 'broker', and closing and 
> opening two receivers causes a segfault.
> {noformat}
> ==25674== Thread 4:
> ==25674== Invalid read of size 8
> ==25674==    at 0x4E77EEF: qdr_link_second_attach (connections.c:474)
> ==25674==    by 0x4E87142: AMQP_link_attach_handler (router_node.c:680)
> ==25674==    by 0x4E8BF2B: handle (server.c:940)
> ==25674==    by 0x4E8CBA7: thread_run (server.c:958)
> ==25674==    by 0x54FA739: start_thread (in /usr/lib64/libpthread-2.24.so)
> ==25674==    by 0x6288E7E: clone (in /usr/lib64/libc-2.24.so)
> ==25674==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
> ==25674== 
> ==25674== 
> ==25674== Process terminating with default action of signal 11 (SIGSEGV): 
> dumping core
> ==25674==  Access not within mapped region at address 0x10
> ==25674==    at 0x4E77EEF: qdr_link_second_attach (connections.c:474)
> ==25674==    by 0x4E87142: AMQP_link_attach_handler (router_node.c:680)
> ==25674==    by 0x4E8BF2B: handle (server.c:940)
> ==25674==    by 0x4E8CBA7: thread_run (server.c:958)
> ==25674==    by 0x54FA739: start_thread (in /usr/lib64/libpthread-2.24.so)
> ==25674==    by 0x6288E7E: clone (in /usr/lib64/libc-2.24.so)
> ==25674==  If you believe this happened as a result of a stack
> ==25674==  overflow in your program's main thread (unlikely but
> ==25674==  possible), you can try to increase the size of the
> ==25674==  main thread stack using the --main-stacksize= flag.
> ==25674==  The main thread stack size used in this run was 8388608
> {noformat}
> To reproduce, start three routers with the attached config files, then run 
> the attached python test program.


