[
https://issues.apache.org/jira/browse/QPID-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287353#comment-16287353
]
Håkan Johansson commented on QPID-8056:
---------------------------------------
I have done yet more digging and found three more error cases:
h3. Case 1:
{noformat}
Thread error [apa.bepa3]: Transaction outcome unknown: transport failure
pure virtual method called
terminate called without an active exception
{noformat}
Traceback (from gdb):
{noformat}
#0 0x0000003dbf032495 in raise () from /lib64/libc.so.6
#1 0x0000003dbf033c75 in abort () from /lib64/libc.so.6
#2 0x0000003dc5cbea8d in __gnu_cxx::__verbose_terminate_handler() () from
/usr/lib64/libstdc++.so.6
#3 0x0000003dc5cbcbe6 in ?? () from /usr/lib64/libstdc++.so.6
#4 0x0000003dc5cbcc13 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5 0x0000003dc5cbd56f in __cxa_pure_virtual () from /usr/lib64/libstdc++.so.6
#6 0x00007ffff7d7afc7 in qpid::messaging::Connection::isOpen
(this=0x7fffffffdc88) at
/users/hakjo/work/Slask/qpid_messaging_crash/src/qpid-cpp-1.36.0/src/qpid/messaging/Connection.cpp:80
...
(gdb) p impl (at qpid::messaging::Connection::isOpen step)
$1 = (warning: can't find linker symbol for virtual table for
`qpid::messaging::ConnectionImpl' value
warning: found `construction vtable for
qpid::messaging::ConnectionImpl-in-qpid::messaging::amqp::ConnectionHandle'
instead
warning: can't find linker symbol for virtual table for
`qpid::messaging::ConnectionImpl' value
warning: found `construction vtable for
qpid::messaging::ConnectionImpl-in-qpid::messaging::amqp::ConnectionHandle'
instead
qpid::messaging::ConnectionImpl *) 0x7fffec000ff0
{noformat}
Our code is doing this at the time:
{noformat}
if (not connection_.isValid() or not connection_.isOpen()) {
{noformat}
I googled this error and did not find much, but similar errors can happen with
buffer overruns or writes to "dead pointers", where the memory has been reused.
h3. Case 2:
Valgrind errors:
{noformat}
==25997== Thread 5:
==25997== Invalid read of size 4
==25997== at 0x3DBF409470: pthread_mutex_lock (in /lib64/libpthread-2.12.so)
==25997== by 0x4C6B1CB: qpid::sys::Mutex::lock() (Mutex.h:116)
==25997== by 0x4C6D2E5:
qpid::sys::ScopedLock<qpid::sys::Mutex>::ScopedLock(qpid::sys::Mutex&)
(Mutex.h:33)
==25997== by 0x4C600E4:
qpid::messaging::amqp::ConnectionContext::activateOutput()
(ConnectionContext.cpp:640)
==25997== by 0x4C5A98C: qpid::messaging::amqp::(anonymous
namespace)::ConnectionTickerTask::fire() (ConnectionContext.cpp:115)
==25997== by 0x59C4B06: qpid::sys::TimerTask::fireTask() (Timer.cpp:66)
==25997== by 0x59C5D60:
qpid::sys::Timer::fire(boost::intrusive_ptr<qpid::sys::TimerTask>)
(Timer.cpp:222)
==25997== by 0x59C5694: qpid::sys::Timer::run() (Timer.cpp:156)
==25997== by 0x5933086: qpid::sys::(anonymous namespace)::runRunnable(void*)
(Thread.cpp:35)
==25997== by 0x3DBF407AA0: start_thread (in /lib64/libpthread-2.12.so)
==25997== by 0x3DBF0E8BCC: clone (in /lib64/libc-2.12.so)
==25997== Address 0x64bd3a0 is 608 bytes inside a block of size 744 free'd
==25997== at 0x4A06016: operator delete(void*) (vg_replace_malloc.c:480)
==25997== by 0x4C5B34D:
qpid::messaging::amqp::ConnectionContext::~ConnectionContext()
(ConnectionContext.cpp:158)
==25997== by 0x4C771ED: void
boost::checked_delete<qpid::messaging::amqp::ConnectionContext>(qpid::messaging::amqp::ConnectionContext*)
(checked_delete.hpp:34)
==25997== by 0x4C77395:
boost::detail::sp_counted_impl_p<qpid::messaging::amqp::ConnectionContext>::dispose()
(sp_counted_impl.hpp:78)
==25997== by 0x4C4811D: boost::detail::sp_counted_base::release()
(sp_counted_base_gcc_x86.hpp:146)
==25997== by 0x4C48196: boost::detail::shared_count::~shared_count()
(shared_count.hpp:467)
==25997== by 0x4C76EAB:
boost::shared_ptr<qpid::messaging::amqp::ConnectionContext>::~shared_ptr()
(shared_ptr.hpp:331)
==25997== by 0x4C772A0:
qpid::messaging::amqp::ConnectionHandle::~ConnectionHandle()
(ConnectionHandle.h:41)
==25997== by 0x4C7736B:
qpid::messaging::amqp::ConnectionHandle::~ConnectionHandle()
(ConnectionHandle.h:41)
==25997== by 0x4C6BBCF: qpid::RefCounted::released() const (RefCounted.h:48)
==25997== by 0x4C6BBA2: qpid::RefCounted::release() const (RefCounted.h:42)
==25997== by 0x4C6BC03: qpid::intrusive_ptr_release(qpid::RefCounted const*)
(RefCounted.h:54)
==25997== by 0x4CE6646:
qpid::messaging::PrivateImplRef<qpid::messaging::Connection>::set(qpid::messaging::Connection&,
boost::intrusive_ptr<qpid::messaging::ConnectionImpl> const&)
(PrivateImplRef.h:80)
==25997== by 0x4CE6544:
qpid::messaging::PrivateImplRef<qpid::messaging::Connection>::assign(qpid::messaging::Connection&,
qpid::messaging::Connection const&) (PrivateImplRef.h:89)
==25997== by 0x4CE599A:
qpid::messaging::Connection::operator=(qpid::messaging::Connection const&)
(Connection.cpp:43)
==25997== by 0x407E31: ioserver::qpid::QpidMessageSource::close()
(QpidMessageSource.cc:165)
==25997== by 0x407921: ioserver::qpid::QpidMessageSource::commitMessages()
(QpidMessageSource.cc:105)
==25997== by 0x40BEA9: ThreadedQpidMessageSource::threadFunc()
(ThreadedQpidMessageSource.cc:68)
==25997== by 0x40D646: boost::_mfi::mf0<void,
ThreadedQpidMessageSource>::operator()(ThreadedQpidMessageSource*) const
(mem_fn_template.hpp:49)
==25997== by 0x40D5B5: void
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*>
>::operator()<boost::_mfi::mf0<void, ThreadedQpidMessageSource>,
boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void,
ThreadedQpidMessageSource>&, boost::_bi::list0&, int) (bind.hpp:246)
==25997== by 0x40D562: boost::_bi::bind_t<void, boost::_mfi::mf0<void,
ThreadedQpidMessageSource>,
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*> >
>::operator()() (bind_template.hpp:20)
==25997== by 0x40D521: boost::detail::thread_data<boost::_bi::bind_t<void,
boost::_mfi::mf0<void, ThreadedQpidMessageSource>,
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*> > > >::run()
(thread.hpp:59)
==25997== by 0x4F9AD46: thread_proxy (in /usr/lib64/libboost_thread-mt.so.5)
==25997== by 0x3DBF407AA0: start_thread (in /lib64/libpthread-2.12.so)
==25997== by 0x3DBF0E8BCC: clone (in /lib64/libc-2.12.so)
==25997==
==25997== Invalid read of size 4
==25997== at 0x3DBF408F73: __pthread_mutex_lock_full (in
/lib64/libpthread-2.12.so)
==25997== by 0x4C6B1CB: qpid::sys::Mutex::lock() (Mutex.h:116)
==25997== by 0x4C6D2E5:
qpid::sys::ScopedLock<qpid::sys::Mutex>::ScopedLock(qpid::sys::Mutex&)
(Mutex.h:33)
==25997== by 0x4C600E4:
qpid::messaging::amqp::ConnectionContext::activateOutput()
(ConnectionContext.cpp:640)
==25997== by 0x4C5A98C: qpid::messaging::amqp::(anonymous
namespace)::ConnectionTickerTask::fire() (ConnectionContext.cpp:115)
==25997== by 0x59C4B06: qpid::sys::TimerTask::fireTask() (Timer.cpp:66)
==25997== by 0x59C5D60:
qpid::sys::Timer::fire(boost::intrusive_ptr<qpid::sys::TimerTask>)
(Timer.cpp:222)
==25997== by 0x59C5694: qpid::sys::Timer::run() (Timer.cpp:156)
==25997== by 0x5933086: qpid::sys::(anonymous namespace)::runRunnable(void*)
(Thread.cpp:35)
==25997== by 0x3DBF407AA0: start_thread (in /lib64/libpthread-2.12.so)
==25997== by 0x3DBF0E8BCC: clone (in /lib64/libc-2.12.so)
==25997== Address 0x64bd3a0 is 608 bytes inside a block of size 744 free'd
==25997== at 0x4A06016: operator delete(void*) (vg_replace_malloc.c:480)
==25997== by 0x4C5B34D:
qpid::messaging::amqp::ConnectionContext::~ConnectionContext()
(ConnectionContext.cpp:158)
==25997== by 0x4C771ED: void
boost::checked_delete<qpid::messaging::amqp::ConnectionContext>(qpid::messaging::amqp::ConnectionContext*)
(checked_delete.hpp:34)
==25997== by 0x4C77395:
boost::detail::sp_counted_impl_p<qpid::messaging::amqp::ConnectionContext>::dispose()
(sp_counted_impl.hpp:78)
==25997== by 0x4C4811D: boost::detail::sp_counted_base::release()
(sp_counted_base_gcc_x86.hpp:146)
==25997== by 0x4C48196: boost::detail::shared_count::~shared_count()
(shared_count.hpp:467)
==25997== by 0x4C76EAB:
boost::shared_ptr<qpid::messaging::amqp::ConnectionContext>::~shared_ptr()
(shared_ptr.hpp:331)
==25997== by 0x4C772A0:
qpid::messaging::amqp::ConnectionHandle::~ConnectionHandle()
(ConnectionHandle.h:41)
==25997== by 0x4C7736B:
qpid::messaging::amqp::ConnectionHandle::~ConnectionHandle()
(ConnectionHandle.h:41)
==25997== by 0x4C6BBCF: qpid::RefCounted::released() const (RefCounted.h:48)
==25997== by 0x4C6BBA2: qpid::RefCounted::release() const (RefCounted.h:42)
==25997== by 0x4C6BC03: qpid::intrusive_ptr_release(qpid::RefCounted const*)
(RefCounted.h:54)
==25997== by 0x4CE6646:
qpid::messaging::PrivateImplRef<qpid::messaging::Connection>::set(qpid::messaging::Connection&,
boost::intrusive_ptr<qpid::messaging::ConnectionImpl> const&)
(PrivateImplRef.h:80)
==25997== by 0x4CE6544:
qpid::messaging::PrivateImplRef<qpid::messaging::Connection>::assign(qpid::messaging::Connection&,
qpid::messaging::Connection const&) (PrivateImplRef.h:89)
==25997== by 0x4CE599A:
qpid::messaging::Connection::operator=(qpid::messaging::Connection const&)
(Connection.cpp:43)
==25997== by 0x407E31: ioserver::qpid::QpidMessageSource::close()
(QpidMessageSource.cc:165)
==25997== by 0x407921: ioserver::qpid::QpidMessageSource::commitMessages()
(QpidMessageSource.cc:105)
==25997== by 0x40BEA9: ThreadedQpidMessageSource::threadFunc()
(ThreadedQpidMessageSource.cc:68)
==25997== by 0x40D646: boost::_mfi::mf0<void,
ThreadedQpidMessageSource>::operator()(ThreadedQpidMessageSource*) const
(mem_fn_template.hpp:49)
==25997== by 0x40D5B5: void
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*>
>::operator()<boost::_mfi::mf0<void, ThreadedQpidMessageSource>,
boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void,
ThreadedQpidMessageSource>&, boost::_bi::list0&, int) (bind.hpp:246)
==25997== by 0x40D562: boost::_bi::bind_t<void, boost::_mfi::mf0<void,
ThreadedQpidMessageSource>,
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*> >
>::operator()() (bind_template.hpp:20)
==25997== by 0x40D521: boost::detail::thread_data<boost::_bi::bind_t<void,
boost::_mfi::mf0<void, ThreadedQpidMessageSource>,
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*> > > >::run()
(thread.hpp:59)
==25997== by 0x4F9AD46: thread_proxy (in /usr/lib64/libboost_thread-mt.so.5)
==25997== by 0x3DBF407AA0: start_thread (in /lib64/libpthread-2.12.so)
==25997== by 0x3DBF0E8BCC: clone (in /lib64/libc-2.12.so)
==25997==
{noformat}
The Timer fires a callback in a deleted object.
This is very similar to the case with the Poller thread calling callbacks in
deleted objects.
The life cycle management for callbacks is broken in qpid-cpp.
h3. Case 3:
Valgrind error:
{noformat}
==61784== Thread 4:
==61784== Invalid write of size 1
==61784== at 0x4C9973C:
qpid::messaging::amqp::Transaction::discharge(boost::function<void
(boost::shared_ptr<qpid::messaging::amqp::SessionContext>,
boost::shared_ptr<qpid::messaging::amqp::SenderContext>,
qpid::messaging::Message const&, bool,
qpid::messaging::amqp::SenderContext::Delivery**)>,
boost::shared_ptr<qpid::messaging::amqp::SessionContext> const&, bool)
(Transaction.cpp:95)
==61784== by 0x4C689A2:
qpid::messaging::amqp::ConnectionContext::discharge(boost::shared_ptr<qpid::messaging::amqp::SessionContext>,
bool) (ConnectionContext.cpp:1350)
==61784== by 0x4C68B15:
qpid::messaging::amqp::ConnectionContext::commit(boost::shared_ptr<qpid::messaging::amqp::SessionContext>)
(ConnectionContext.cpp:1356)
==61784== by 0x4C93446: qpid::messaging::amqp::SessionHandle::commit()
(SessionHandle.cpp:45)
==61784== by 0x4CECFC0: qpid::messaging::Session::commit() (Session.cpp:42)
==61784== by 0x407899: ioserver::qpid::QpidMessageSource::commitMessages()
(QpidMessageSource.cc:101)
==61784== by 0x40BEA9: ThreadedQpidMessageSource::threadFunc()
(ThreadedQpidMessageSource.cc:68)
==61784== by 0x40D646: boost::_mfi::mf0<void,
ThreadedQpidMessageSource>::operator()(ThreadedQpidMessageSource*) const
(mem_fn_template.hpp:49)
==61784== by 0x40D5B5: void
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*>
>::operator()<boost::_mfi::mf0<void, ThreadedQpidMessageSource>,
boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void,
ThreadedQpidMessageSource>&, boost::_bi::list0&, int) (bind.hpp:246)
==61784== by 0x40D562: boost::_bi::bind_t<void, boost::_mfi::mf0<void,
ThreadedQpidMessageSource>,
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*> >
>::operator()() (bind_template.hpp:20)
==61784== by 0x40D521: boost::detail::thread_data<boost::_bi::bind_t<void,
boost::_mfi::mf0<void, ThreadedQpidMessageSource>,
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*> > > >::run()
(thread.hpp:59)
==61784== by 0x4F9AD46: thread_proxy (in /usr/lib64/libboost_thread-mt.so.5)
==61784== by 0x3DBF407AA0: start_thread (in /lib64/libpthread-2.12.so)
==61784== by 0x3DBF0E8BCC: clone (in /lib64/libc-2.12.so)
==61784== Address 0x6752300 is 512 bytes inside a block of size 520 free'd
==61784== at 0x4A06016: operator delete(void*) (vg_replace_malloc.c:480)
==61784== by 0x4C9BF23: qpid::messaging::amqp::Transaction::~Transaction()
(Transaction.h:43)
==61784== by 0x4C73C1F: void
boost::checked_delete<qpid::messaging::amqp::Transaction>(qpid::messaging::amqp::Transaction*)
(checked_delete.hpp:34)
==61784== by 0x4C75D0F:
boost::detail::sp_counted_impl_p<qpid::messaging::amqp::Transaction>::dispose()
(sp_counted_impl.hpp:78)
==61784== by 0x4C4811D: boost::detail::sp_counted_base::release()
(sp_counted_base_gcc_x86.hpp:146)
==61784== by 0x4C48196: boost::detail::shared_count::~shared_count()
(shared_count.hpp:467)
==61784== by 0x4C75BC5:
boost::shared_ptr<qpid::messaging::amqp::Transaction>::~shared_ptr()
(shared_ptr.hpp:331)
==61784== by 0x4C9954C:
qpid::messaging::amqp::Transaction::discharge(boost::function<void
(boost::shared_ptr<qpid::messaging::amqp::SessionContext>,
boost::shared_ptr<qpid::messaging::amqp::SenderContext>,
qpid::messaging::Message const&, bool,
qpid::messaging::amqp::SenderContext::Delivery**)>,
boost::shared_ptr<qpid::messaging::amqp::SessionContext> const&, bool)
(Transaction.cpp:86)
==61784== by 0x4C689A2:
qpid::messaging::amqp::ConnectionContext::discharge(boost::shared_ptr<qpid::messaging::amqp::SessionContext>,
bool) (ConnectionContext.cpp:1350)
==61784== by 0x4C68B15:
qpid::messaging::amqp::ConnectionContext::commit(boost::shared_ptr<qpid::messaging::amqp::SessionContext>)
(ConnectionContext.cpp:1356)
==61784== by 0x4C93446: qpid::messaging::amqp::SessionHandle::commit()
(SessionHandle.cpp:45)
==61784== by 0x4CECFC0: qpid::messaging::Session::commit() (Session.cpp:42)
==61784== by 0x407899: ioserver::qpid::QpidMessageSource::commitMessages()
(QpidMessageSource.cc:101)
==61784== by 0x40BEA9: ThreadedQpidMessageSource::threadFunc()
(ThreadedQpidMessageSource.cc:68)
==61784== by 0x40D646: boost::_mfi::mf0<void,
ThreadedQpidMessageSource>::operator()(ThreadedQpidMessageSource*) const
(mem_fn_template.hpp:49)
==61784== by 0x40D5B5: void
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*>
>::operator()<boost::_mfi::mf0<void, ThreadedQpidMessageSource>,
boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void,
ThreadedQpidMessageSource>&, boost::_bi::list0&, int) (bind.hpp:246)
==61784== by 0x40D562: boost::_bi::bind_t<void, boost::_mfi::mf0<void,
ThreadedQpidMessageSource>,
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*> >
>::operator()() (bind_template.hpp:20)
==61784== by 0x40D521: boost::detail::thread_data<boost::_bi::bind_t<void,
boost::_mfi::mf0<void, ThreadedQpidMessageSource>,
boost::_bi::list1<boost::_bi::value<ThreadedQpidMessageSource*> > > >::run()
(thread.hpp:59)
==61784== by 0x4F9AD46: thread_proxy (in /usr/lib64/libboost_thread-mt.so.5)
==61784== by 0x3DBF407AA0: start_thread (in /lib64/libpthread-2.12.so)
==61784== by 0x3DBF0E8BCC: clone (in /lib64/libc-2.12.so)
==61784==
{noformat}
This one is really weird. Some code (called from
{{ConnectionContext::discharge}}):
{noformat}
void Transaction::discharge(SendFunction send, const SessionPtr& session, bool
fail) {
error.raise();
committing = !fail;
try {
// Send a discharge message to the remote coordinator.
Variant::List dischargeList;
dischargeList.push_back(Variant(id));
dischargeList.push_back(Variant(fail));
Variant discharge(dischargeList);
discharge.setDescriptor(qpid::amqp::transaction::DISCHARGE_CODE);
SenderContext::Delivery* delivery = 0;
send(session, shared_from_this(), Message(discharge), true, &delivery);
if (!delivery->accepted())
throw TransactionAborted(delivery->error());
committing = false;
}
catch(const TransactionError&) {
throw;
}
catch(const Exception& e) {
committing = false;
throw TransactionAborted(e.what());
}
}
{noformat}
The {{send}} calls throws an exception that is caught by the second catch
block. At that point the {{Transaction}} is deleted.
It looks like the shared pointer created by {{shared_from_this()}} deletes
{{this}}} when the exception is thrown.
This is weird since the transaction object is owned by the session object.
Maybe the session object is deleted as well, but I see no trace of that in the
code.
I have only been able to reproduce this error once, making investigation really
hard.
> qpid::messaging::ConnectionContext crash after network disconnect (with patch)
> ------------------------------------------------------------------------------
>
> Key: QPID-8056
> URL: https://issues.apache.org/jira/browse/QPID-8056
> Project: Qpid
> Issue Type: Bug
> Components: C++ Client
> Affects Versions: qpid-cpp-1.36.0
> Environment: RedHat Enterprise Linux 6
> Reporter: Håkan Johansson
> Labels: crash
> Fix For: qpid-cpp-1.38.0
>
> Attachments: connection_context.diff, valgrind.txt
>
>
> When doing HA testing we found that our application often crashed inside the
> Qpid Messaging library.
> Our test:
> * One ActiveMQ broker.
> * Two proxies connecting to the AMQP port on the broker. At the start, only
> one of the proxies are running.
> * Test program configured to use failover between the two proxies. Protocol
> is "amqp1.0". It reads messages in a loop using a transactional session. On
> error it closes the connection and opens a new.
> * Three queues are read from in parallel, each reader using its own
> connection in a thread. Nothing is shared between the threads in the client
> code.
> * Send some messages and let the test program process them.
> * Stop proxy1 and start proxy2.
> * Send some more messages and let the test program process them.
> * Stop proxy2 and start proxy1.
> * And so on...
> After a couple of switches the test program crashes, but not always. It's a
> timing thing.
> A typical error message that we see before the crash:
> {noformat}
> Exception when trying to close the qpid connection: Transaction outcome
> unknown: transport failure
> {noformat}
> The reason for the crash is that the poller thread is still active when the
> connection is being deleted. The destructor of the
> {{qpid::messaging::ConnectionContext}} class deletes the {{TcpTransport}}
> instance at the same time as, or right before, the poller thread is calling a
> callback on it ({{qpid::messaging::amqp::TcpTransport::disconnected}}).
> I have attached a patch to solve the issue, at least for this use case.
> I cannot test this on {{1.37.0}} as I cannot build that version on RHEL6 as
> it uses Python 2.6 which is no longer supported in {{1.37.0}}. The code in
> question is identical in {{1.36.0}} and {{1.37.0}} though.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]