[ https://issues.apache.org/jira/browse/PROTON-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418723#comment-17418723 ]

Clifford Jansen commented on PROTON-2432:
-----------------------------------------

Further to Robbie's excellent response:

See also the "Thread-safety" note in messaging_handler.hpp. Useful examples of 
working with work queues can be found in cpp/examples, including broker.cpp and 
the multithreaded clients.

An alternative to proton::work_queue for achieving thread safety in Proton is 
to use connection::wake() paired with on_connection_wake(): maintain your own 
work queue with your own locking, and ensure that active use of the connection 
only ever happens in the dedicated thread that receives the connection 
callbacks.

One frequent "gotcha" is inadvertent use of the connection or its sub-objects 
(senders/receivers/deliveries) from another thread. Destructors and copy 
constructors are the usual culprits. A good strategy is to obtain a smart 
pointer to the Proton object while in the callback, stash it until a future 
safe callback where the application is ready to release it, and release it 
there via shared_ptr::reset(). That way the destructor is called exactly when 
you want it, and any unnoticed copies of the shared pointer in another thread 
will make no surprise calls into the Proton engine.

 

> Proton crashes because of a concurrency failure in collector->pool
> ------------------------------------------------------------------
>
>                 Key: PROTON-2432
>                 URL: https://issues.apache.org/jira/browse/PROTON-2432
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.32.0
>         Environment: RHEL 7 
>            Reporter: Jesse Hulsizer
>            Priority: Major
>         Attachments: proton-2432.patch
>
>
> While running our application tests, our application crashes with many 
> different backtraces that look similar to this...
> {noformat}
> #0  0x0000000000000000 in ?? ()
> #1  0x00007fc777579198 in pn_class_incref () from /usr/lib64/libqpid-proton.so.11
> #2  0x00007fc777587d8a in pn_collector_put () from /usr/lib64/libqpid-proton.so.11
> #3  0x00007fc7775887ea in ?? () from /usr/lib64/libqpid-proton.so.11
> #4  0x00007fc777588c7b in pn_transport_pending () from /usr/lib64/libqpid-proton.so.11
> #5  0x00007fc777588d9e in pn_transport_pop () from /usr/lib64/libqpid-proton.so.11
> #6  0x00007fc777599298 in ?? () from /usr/lib64/libqpid-proton.so.11
> #7  0x00007fc77759a784 in ?? () from /usr/lib64/libqpid-proton.so.11
> #8  0x00007fc7773236f0 in proton::container::impl::thread() () from /usr/lib64/libqpid-proton-cpp.so.12
> #9  0x00007fc7760b2470 in ?? () from /usr/lib64/libstdc++.so.6
> #10 0x00007fc776309aa1 in start_thread () from /lib64/libpthread.so.0
> #11 0x00007fc7758b6bdd in clone () from /lib64/libc.so.6
> {noformat}
> Using gdb to probe one of the backtraces shows that the collector->pool size 
> is -1 (seen here as 18446744073709551615)...
> {noformat}
> (gdb) p *collector
> $1 = {pool = 0x7fa7182de180, head = 0x7fa7182de250, tail = 0x7fa7182b8b90, prev = 0x7fa7182ea010, freed = false}
> (gdb) p collector->pool
> $2 = (pn_list_t *) 0x7fa7182de180
> (gdb) p *collector->pool
> $3 = {clazz = 0x7fa74eb7c000, capacity = 16, size = 18446744073709551615, elements = 0x7fa7182de1b0}
> {noformat}
> The proton code was marked up with print statements which show that two 
> threads were accessing the collector->pool data structure at the same time...
> {noformat}
>  7b070700: pn_list_pop index 0 list->0x7fec401e0b70 value->0x7fec3c728a10
>  4ffff700: pn_list_add index 1 size 2 list->0x7fec401e0b70 value->0x7fec402095b0
>  7b070700: pn_list_pop size 1 list->0x7fec401e0b70
>  4ffff700: pn_list_pop size 1 list->0x7fec401e0b70
>  7b070700: pn_list_pop index 0 list->0x7fec401e0b70 value->0x7fec3c728a10
>  4ffff700: pn_list_pop index 0 list->0x7fec401e0b70 value->0x7fec3c728a10
> {noformat}
> The hex number on the far left is the thread id. As can be seen in the last 
> two lines, two threads are popping from the collector->pool simultaneously. 
> This produces the -1 size seen above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
