Alan Conway created QPID-6278:
---------------------------------

             Summary:  HA broker abort in TXN soak test
                 Key: QPID-6278
                 URL: https://issues.apache.org/jira/browse/QPID-6278
             Project: Qpid
          Issue Type: Bug
          Components: C++ Clustering
    Affects Versions: 0.30
            Reporter: Alan Conway
            Assignee: Alan Conway


see also https://bugzilla.redhat.com/show_bug.cgi?id=1145386

I have a repeatable crash in the primary HA broker when running a soak test of transactions (TXNs).


This is with trunk code updated an hour ago:
  
URL: https://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 1626916
Node Kind: directory
Schedule: normal
Last Changed Author: aconway
Last Changed Rev: 1626887


I did a standard build, first of proton and then of qpidd, except that I had
both installed in /usr instead of /usr/local.




Here are the scripts I use.


script 1
starting the HA cluster
{
#! /bin/bash


export PYTHONPATH=/home/mick/trunk/qpid/python

QPIDD=/usr/sbin/qpidd
QPID_HA=/home/mick/trunk/qpid/tools/src/py/qpid-ha


# This is where I put the log files.
rm -rf /tmp/mick
mkdir /tmp/mick



for N in 1 2 3
do
  $QPIDD                                                          \
    --auth=no                                                     \
    --no-module-dir                                               \
    --load-module /usr/lib64/qpid/daemon/ha.so                    \
    --log-enable debug+:ha::                                      \
    --ha-cluster=yes                                              \
    --ha-replicate=all                                            \
    --ha-brokers-url=localhost:5801,localhost:5802,localhost:5803 \
    --ha-public-url=localhost:5801,localhost:5802,localhost:5803  \
    -p 580$N                                                      \
    --data-dir /tmp/mick/data_$N                                  \
    --log-to-file /tmp/mick/qpidd_$N.log                          \
    --mgmt-enable=yes                                             \
    -d
  echo "============================================"
  echo "started broker $N from $QPIDD"
  echo "============================================"
  sleep 1
done


# Now promote one broker to primary.
echo "Promoting broker 5801..."
${QPID_HA} promote --cluster-manager -b localhost:5801
echo "done."

}


script 2
create the tx queues, and load the first one with 1000 messages
{
#! /bin/bash

TXTEST2=/usr/libexec/qpid/tests/qpid-txtest2

echo "Loading data to queues..."
${TXTEST2} --init=yes --transfer=no --check=no                           \
           --port 5801                                                   \
           --total-messages 1000 --connection-options '{reconnect:true}' \
           --messages-per-tx 10 --tx-count 100                           \
           --queue-base-name=tx --fetch-timeout=1
}



script 3
now beat the heck out of the TXN code
{
#! /bin/bash

TXTEST2=/usr/libexec/qpid/tests/qpid-txtest2


echo "starting transfers..."
${TXTEST2} --init=no --transfer=yes --check=no                           \
           --port 5801                                                   \
           --total-messages 5000000 --connection-options '{reconnect:true}' \
           --messages-per-tx 10 --tx-count 500000                          \
           --queue-base-name=tx --fetch-timeout=1

}





I do *not* do any failovers. I just let the TXN-exercising script run until the
primary broker dies.

It took quite a while.  In my most recent test, I got through something like 
300,000 transactions (10 messages each) before the broker became brokest.
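For scale, the figures above can be checked with plain shell arithmetic. This is a sketch using only the numbers in this report (the 300,000 count is the approximate figure from the most recent run, and the variable names are illustrative):

```shell
#!/bin/bash
# Numbers taken from script 3 and from the most recent run described above.
TX_COUNT=500000          # --tx-count passed to qpid-txtest2 in script 3
MESSAGES_PER_TX=10       # --messages-per-tx passed to qpid-txtest2 in script 3
TX_BEFORE_CRASH=300000   # approximate number of transactions before the abort

messages_before_crash=$(( TX_BEFORE_CRASH * MESSAGES_PER_TX ))
percent_of_run=$(( 100 * TX_BEFORE_CRASH / TX_COUNT ))

echo "messages in transactions before the abort: $messages_before_crash"  # 3000000
echo "share of the planned run completed: ${percent_of_run}%"             # 60%
```

So the primary handled roughly three million messages, a little over half of the planned run, before it aborted.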

I then tried the same test on a standalone broker and it got all the way 
through.
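Because the HA run is simply left going until the primary exits, it can help to record exactly when that happens. The following is a minimal sketch that is not part of the original scripts. The function name `wait_for_broker_exit` and its 10-second default poll interval are hypothetical. It assumes the brokers were started with `-d` and the default pid directory, so that `qpidd --check -p PORT` exits 0 while the broker on PORT is running and non-zero once it is gone.

```shell
#!/bin/bash
# Hypothetical helper (not in the original report): block until the broker
# listening on PORT is no longer running, then print a timestamp.
# Assumes the broker was started with -d and the default --pid-dir, so that
# "qpidd --check -p PORT" exits 0 while it runs and non-zero once it is gone.
wait_for_broker_exit() {
  local port=$1
  local qpidd=${2:-/usr/sbin/qpidd}   # binary (or command) used for the check
  local interval=${3:-10}             # seconds between checks
  while "$qpidd" --check -p "$port" >/dev/null 2>&1; do
    sleep "$interval"
  done
  echo "broker on port $port is no longer running at $(date)"
}
```

For example, running `wait_for_broker_exit 5801` in a second terminal while script 3 runs prints a timestamp when the primary dies, which can then be matched against /tmp/mick/qpidd_1.log.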




Here is the traceback:

#0  0x0000003186a328a5 in raise () from /lib64/libc.so.6
#1  0x0000003186a34085 in abort () from /lib64/libc.so.6
#2  0x0000003186a2ba1e in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003186a2bae0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f6bb72b4f16 in operator-> (this=0x7f6b48378060, sync=<value optimized out>)
    at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:166
#5  qpid::broker::SessionState::IncompleteIngressMsgXfer::completed (this=0x7f6b48378060,
    sync=<value optimized out>) at /home/mick/trunk/qpid/cpp/src/qpid/broker/SessionState.cpp:409
#6  0x00007f6bb726d670 in invokeCallback (this=<value optimized out>, msg=<value optimized out>)
    at /home/mick/trunk/qpid/cpp/src/qpid/broker/AsyncCompletion.h:117
#7  finishCompleter (this=<value optimized out>, msg=<value optimized out>)
    at /home/mick/trunk/qpid/cpp/src/qpid/broker/AsyncCompletion.h:158
#8  enqueueComplete (this=<value optimized out>, msg=<value optimized out>)
    at /home/mick/trunk/qpid/cpp/src/qpid/broker/PersistableMessage.h:76
#9  qpid::broker::NullMessageStore::enqueue (this=<value optimized out>, msg=<value optimized out>)
    at /home/mick/trunk/qpid/cpp/src/qpid/broker/NullMessageStore.cpp:97
#10 0x00007f6bb71f4578 in qpid::broker::Queue::enqueue (this=0x7f6b4801ef90, ctxt=0x7f6b6821bdf0, msg=...)
    at /home/mick/trunk/qpid/cpp/src/qpid/broker/Queue.cpp:910
#11 0x00007f6bb71f46db in qpid::broker::Queue::TxPublish::prepare (this=0x7f6b48435c70,
    ctxt=<value optimized out>) at /home/mick/trunk/qpid/cpp/src/qpid/broker/Queue.cpp:159
#12 0x00007f6bb72c8b3d in qpid::broker::TxBuffer::prepare (this=0x7f6b68549120, ctxt=0x7f6b6821bdf0)
    at /home/mick/trunk/qpid/cpp/src/qpid/broker/TxBuffer.cpp:42
#13 0x00007f6bb72c9dbe in qpid::broker::TxBuffer::startCommit (this=0x7f6b68549120,
    store=<value optimized out>) at /home/mick/trunk/qpid/cpp/src/qpid/broker/TxBuffer.cpp:73
#14 0x00007f6bb7298c74 in qpid::broker::SemanticState::commit (this=0x7f6b6c567fb8, store=0x2460440)
    at /home/mick/trunk/qpid/cpp/src/qpid/broker/SemanticState.cpp:198
#15 0x00007f6bb6c5886e in invoke<qpid::framing::AMQP_ServerOperations::TxHandler> (this=0x7f6b8bffd4a0,
    body=<value optimized out>) at /home/mick/trunk/qpid/cpp/build/src/qpid/framing/TxCommitBody.h:53
#16 qpid::framing::AMQP_ServerOperations::TxHandler::Invoker::visit (this=0x7f6b8bffd4a0,
    body=<value optimized out>) at /home/mick/trunk/qpid/cpp/build/src/qpid/framing/ServerInvoker.cpp:582
#17 0x00007f6bb6c5cc41 in qpid::framing::AMQP_ServerOperations::Invoker::visit (this=0x7f6b8bffd670, body=...)
    at /home/mick/trunk/qpid/cpp/build/src/qpid/framing/ServerInvoker.cpp:278
#18 0x00007f6bb72b504c in invoke<qpid::broker::SessionAdapter> (this=<value optimized out>,
    method=0x7f6b68130790) at /home/mick/trunk/qpid/cpp/src/qpid/framing/Invoker.h:67
#19 qpid::broker::SessionState::handleCommand (this=<value optimized out>, method=0x7f6b68130790)
    at /home/mick/trunk/qpid/cpp/src/qpid/broker/SessionState.cpp:198
#20 0x00007f6bb72b6235 in qpid::broker::SessionState::handleIn (this=0x7f6b6c567df0, frame=...)
    at /home/mick/trunk/qpid/cpp/src/qpid/broker/SessionState.cpp:295
#21 0x00007f6bb6cd5291 in qpid::amqp_0_10::SessionHandler::handleIn (this=0x7f6b6c4e2120, f=...)
    at /home/mick/trunk/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:93
#22 0x00007f6bb722692b in operator() (this=0x7f6b500ab840, frame=...)
    at /home/mick/trunk/qpid/cpp/src/qpid/framing/Handler.h:39
#23 qpid::broker::ConnectionHandler::handle (this=0x7f6b500ab840, frame=...)
    at /home/mick/trunk/qpid/cpp/src/qpid/broker/ConnectionHandler.cpp:94
#24 0x00007f6bb7221ba8 in qpid::broker::amqp_0_10::Connection::received (this=0x7f6b500ab660, frame=...)
    at /home/mick/trunk/qpid/cpp/src/qpid/broker/amqp_0_10/Connection.cpp:198
#25 0x00007f6bb71aea4d in qpid::amqp_0_10::Connection::decode (this=0x7f6b5005d770,
    buffer=<value optimized out>, size=<value optimized out>)
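The trace above stops at frame #25 and covers a single thread. If the abort leaves a core file, a dump of every thread's stack can be more informative. This is a minimal sketch that assumes core dumps were enabled (for example with `ulimit -c unlimited` in the shell that starts the brokers) and that gdb is installed. The helper name `print_backtrace` is hypothetical, and the core file path must be supplied by the caller.

```shell
#!/bin/bash
# Hypothetical helper: print a backtrace of every thread in a qpidd core file.
# Requires gdb; QPIDD must be the same binary that produced the core.
print_backtrace() {
  local core=$1
  local qpidd=${2:-/usr/sbin/qpidd}
  if [[ ! -f "$core" ]]; then
    echo "no such core file: $core" >&2
    return 1
  fi
  # -batch exits after running the -ex commands; "thread apply all bt"
  # prints the stack of each thread in the process.
  gdb -batch -ex "thread apply all bt" "$qpidd" "$core"
}
```

Usage would be along the lines of `print_backtrace /path/to/core > backtrace.txt`, where the path is a placeholder for wherever the system writes the core.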



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
