[ 
https://issues.apache.org/jira/browse/AMQCPP-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062650#comment-13062650
 ] 

Timothy Bish commented on AMQCPP-376:
-------------------------------------

I think I see what the cause is; it's a bit tricky figuring out how it's tying 
itself in this particular knot, though.  You could try the code in trunk, as the 
threading model has changed, but it's still not quite release ready.

I need to think on this one for a bit to come up with the right fix.

> Deadlock in IOTransport when network of brokers restart and failover is used. 
> ------------------------------------------------------------------------------
>
>                 Key: AMQCPP-376
>                 URL: https://issues.apache.org/jira/browse/AMQCPP-376
>             Project: ActiveMQ C++ Client
>          Issue Type: Bug
>          Components: Other C++ Clients
>    Affects Versions: 3.4.0
>         Environment: ActiveMQ-CPP  ver - 3.4.0
> Broker  5.3.1
> Machine: Linux mars 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 
> x86_64 x86_64 GNU/Linux
> gcc version: 4.1.2 20080704 (Red Hat 4.1.2-44))
>            Reporter: igor khaustov
>            Assignee: Timothy Bish
>         Attachments: bt_1.txt, bt_2.txt
>
>
> The problem description:
> We run a network of brokers (4 in number).
> Broker URI:
> 'failover://(tcp://10.10.13.20:61616,tcp://10.10.13.22:61616,tcp://10.10.13.24:61616,tcp://10.10.13.26:61616)?randomize=true&connection.closeTimeout=10000&transport.soTimeout=3000&timeout=3000&connection.useAsyncSend=true&connection.alwaysSyncSend=false'
> The producer loads the broker with 1000 messages/sec. We are testing the 
> producer behavior during failover by restarting all brokers in a row (all 4) 
> while sending messages, and we get the deadlock shown below.
> Note: the problem was tested only with a network of brokers.
> The backtrace ( only relevant threads ):
> +Thread 16 (process 26892):+
> *#0  0x00000032ef00ce74 in __lll_lock_wait () from /lib64/libpthread.so.0*
> #1  0x00000032ef008874 in _L_lock_106 () from /lib64/libpthread.so.0
> #2  0x00000032ef0082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000000dc5a04 in decaf::internal::util::concurrent::MutexImpl::lock 
> (handle=0xfefdd38) at decaf/internal/util/concurrent/unix/MutexImpl.cpp:77
> #4  0x0000000000bd9092 in decaf::util::concurrent::Mutex::lock 
> (this=0xff54100) at decaf/util/concurrent/Mutex.cpp:111
> #5  0x0000000000d51f3f in 
> decaf::util::AbstractCollection<decaf::lang::Pointer<activemq::transport::Transport,
>  decaf::util::concurrent::atomic::AtomicRefCounter> >::lock (this=0xff540f8) 
> at ./decaf/util/AbstractCollection.h:331
> #6  0x0000000000bd86c9 in decaf::util::concurrent::Lock::lock 
> (this=0x4c7b9c90) at decaf/util/concurrent/Lock.cpp:54
> #7  0x0000000000bd883a in Lock (this=0x4c7b9c90, object=0xff54188, 
> intiallyLocked=true) at decaf/util/concurrent/Lock.cpp:32
> *#8  0x0000000000d47a77 in 
> activemq::transport::failover::CloseTransportsTask::add (this=0xff540e8, 
> transport=@0x4c7b9cf0) at 
> activemq/transport/failover/CloseTransportsTask.cpp:46*
> #9  0x0000000000b1b748 in 
> activemq::transport::failover::FailoverTransport::handleTransportFailure 
> (this=0xffed498, error=@0x4c7b9ee0) at 
> activemq/transport/failover/FailoverTransport.cpp:483
> #10 0x0000000000b41a06 in 
> activemq::transport::failover::FailoverTransportListener::onException 
> (this=0xfde2e58, ex=@0x4c7b9ee0) at 
> activemq/transport/failover/FailoverTransportListener.cpp:76
> #11 0x0000000000d34813 in activemq::transport::TransportFilter::fire 
> (this=0x10627498, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #12 0x0000000000d34841 in activemq::transport::TransportFilter::onException 
> (this=0x10627498, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #13 0x0000000000d34813 in activemq::transport::TransportFilter::fire 
> (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #14 0x0000000000d34841 in activemq::transport::TransportFilter::onException 
> (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #15 0x0000000000d554c8 in 
> activemq::transport::inactivity::InactivityMonitor::onException 
> (this=0xfeeb558, ex=@0x4c7b9ee0) at 
> activemq/transport/inactivity/InactivityMonitor.cpp:312
> #16 0x0000000000d34813 in activemq::transport::TransportFilter::fire 
> (this=0x1020c118, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #17 0x0000000000d34841 in activemq::transport::TransportFilter::onException 
> (this=0x1020c118, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #18 0x0000000000d326f2 in activemq::transport::IOTransport::fire 
> (this=0xdce10b8, ex=@0x4c7b9ee0) at activemq/transport/IOTransport.cpp:87
> #19 0x0000000000d32982 in activemq::transport::IOTransport::run 
> (this=0xdce10b8) at activemq/transport/IOTransport.cpp:264
> #20 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback 
> (properties=0x105871d8) at decaf/lang/Thread.cpp:137
> #21 0x0000000000ba9068 in threadWorker (arg=0x105871d8) at 
> decaf/lang/Thread.cpp:190
> #22 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
> #23 0x00000032ee4d30ad in clone () from /lib64/libc.so.6
> +Thread 9 (process 14470):+
> *#0  0x00000032ef00a899 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0*
> #1  0x0000000000dc54b3 in 
> decaf::internal::util::concurrent::ConditionImpl::wait (condition=0x1072d2b8) 
> at decaf/internal/util/concurrent/unix/ConditionImpl.cpp:101
> #2  0x0000000000bd9033 in decaf::util::concurrent::Mutex::wait 
> (this=0x105871d8) at decaf/util/concurrent/Mutex.cpp:126
> #3  0x0000000000ba8538 in decaf::lang::Thread::join (this=0x12a4a418) at 
> decaf/lang/Thread.cpp:452
> #4  0x0000000000d32c28 in activemq::transport::IOTransport::close 
> (this=0xdce10b8) at activemq/transport/IOTransport.cpp:222
> #5  0x0000000000d34bfe in activemq::transport::TransportFilter::close 
> (this=0x1020c118) at activemq/transport/TransportFilter.cpp:106
> #6  0x0000000000b47d3a in activemq::transport::tcp::TcpTransport::close 
> (this=0x1020c118) at activemq/transport/tcp/TcpTransport.cpp:74
> #7  0x0000000000d34bfe in activemq::transport::TransportFilter::close 
> (this=0xfeeb558) at activemq/transport/TransportFilter.cpp:106
> #8  0x0000000000d554ec in 
> activemq::transport::inactivity::InactivityMonitor::close (this=0xfeeb558) at 
> activemq/transport/inactivity/InactivityMonitor.cpp:300
> #9  0x0000000000d77867 in 
> activemq::wireformat::openwire::OpenWireFormatNegotiator::close 
> (this=0x10627498) at 
> activemq/wireformat/openwire/OpenWireFormatNegotiator.cpp:248
> *#10 0x0000000000d478ff in 
> activemq::transport::failover::CloseTransportsTask::iterate (this=0xff540e8) 
> at activemq/transport/failover/CloseTransportsTask.cpp:75*
> #11 0x0000000000d25891 in activemq::threads::CompositeTaskRunner::iterate 
> (this=0xddc0108) at activemq/threads/CompositeTaskRunner.cpp:173
> #12 0x0000000000d25ae4 in activemq::threads::CompositeTaskRunner::run 
> (this=0xddc0108) at activemq/threads/CompositeTaskRunner.cpp:107
> #13 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback 
> (properties=0xfeeb2b8) at decaf/lang/Thread.cpp:137
> #14 0x0000000000ba9068 in threadWorker (arg=0xfeeb2b8) at 
> decaf/lang/Thread.cpp:190
> #15 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
> #16 0x00000032ee4d30ad in clone () from /lib64/libc.so.6
> As you can see, +Thread 16+ is in lock_wait for *_synchronized( &transports 
> )_* in activemq::transport::failover::CloseTransportsTask::add.
> The *_synchronized( &transports )_* is locked by +Thread 9+ in 
> activemq::threads::CompositeTaskRunner::iterate. But +Thread 9+ is in 
> pthread_cond_wait, which has to be signalled by +Thread 16+.
> Kind regards .
> Igor.
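
The cycle described in the report (one thread joins a reader thread while
holding the collection lock, and that reader thread blocks trying to take the
same lock) can be illustrated with a small standalone sketch. This is not the
ActiveMQ-CPP code and not the project's fix: FakeTransport, CloseTask and
closeAllPending are hypothetical names, and std::mutex / std::thread stand in
for the decaf classes. It only shows one common way to avoid this kind of
cycle, assuming the item can be removed from the collection first: take it out
under the lock, release the lock, and only then make the blocking close() call.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a transport that owns a reader thread.
struct FakeTransport {
    std::atomic<bool> closing{false};
    std::thread reader;

    // Blocks until the reader thread has exited, similar to the
    // Thread::join call under IOTransport::close in the backtrace.
    void close() {
        closing = true;
        if (reader.joinable()) {
            reader.join();
        }
    }
};

// Hypothetical stand-in for a task that closes failed transports.
class CloseTask {
public:
    // Called from reader threads when they report a failure.
    void add(std::shared_ptr<FakeTransport> transport) {
        std::lock_guard<std::mutex> guard(mutex_);
        transports_.push_back(std::move(transport));
    }

    // Closes one pending transport; returns false when none are left.
    bool iterate() {
        std::shared_ptr<FakeTransport> next;
        {
            std::lock_guard<std::mutex> guard(mutex_);
            if (transports_.empty()) {
                return false;
            }
            next = std::move(transports_.front());
            transports_.pop_front();
        }  // Lock is released here, before the blocking join.
        next->close();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<std::shared_ptr<FakeTransport>> transports_;
};

// Drives the scenario: the reader thread calls add() while its own
// transport is being closed. Returns how many transports were closed.
int closeAllPending() {
    CloseTask task;
    auto failing = std::make_shared<FakeTransport>();
    auto other = std::make_shared<FakeTransport>();  // has no reader thread

    FakeTransport* self = failing.get();
    failing->reader = std::thread([&task, self, other] {
        // Wait until close() is in progress, then try to take the lock.
        while (!self->closing) {
            std::this_thread::yield();
        }
        task.add(other);
    });

    task.add(failing);
    int closed = 0;
    while (task.iterate()) {
        ++closed;
    }
    assert(!failing->reader.joinable());
    return closed;
}
```

If iterate() kept the lock_guard alive across next->close(), the reader thread
in this sketch would block in add() and the join would never return, which is
the same shape as the two backtraces above.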

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
