[
https://issues.apache.org/jira/browse/QPID-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204363#comment-15204363
]
Alan Conway commented on QPID-7149:
-----------------------------------
That narrows it down considerably to the code that runs on the primary even
when no backups are present. The primary does do some tracking of queue state
even when there are no backups, in order to be ready if a backup does connect.
That is probably where the trouble lies.
> [HA] active HA broker memory leak when ring queue discards overflow messages
> ----------------------------------------------------------------------------
>
> Key: QPID-7149
> URL: https://issues.apache.org/jira/browse/QPID-7149
> Project: Qpid
> Issue Type: Bug
> Components: C++ Broker
> Environment: RHEL6
> qpid trunk svn rev. 1735384
> - issue seen in very old releases (since active-passive HA cluster initial
> implementation, most probably)
> libstdc++-devel-4.4.7-4.el6.x86_64
> gcc-c++-4.4.7-4.el6.x86_64
> libgcc-4.4.7-4.el6.x86_64
> libstdc++-4.4.7-4.el6.x86_64
> gcc-4.4.7-4.el6.x86_64
> Reporter: Pavel Moravec
>
> There is a memory leak on the active HA broker, most probably triggered by
> purging overflow messages from a ring queue. The basic scenario is to set up
> an HA cluster, promote one broker to primary, and feed a ring queue with
> messages indefinitely.
> Detailed scenario:
> 1) Start brokers and promote one to primary:
> {noformat}
> start_broker() {
>   port=$1
>   shift
>   rm -rf _${port}
>   mkdir _${port}
>   nohup qpidd --load-module=ha.so --port=$port \
>     --log-to-file=qpidd.$port.log --data-dir=_${port} --auth=no \
>     --log-to-stderr=no --ha-cluster=yes \
>     --ha-brokers-url="$(hostname):5672,$(hostname):5673,$(hostname):5674" \
>     --ha-replicate=all --acl-file=/root/qpidd.acl "$@" > /dev/null 2>&1 &
>   sleep 1
> }
> killall qpidd qpid-receive 2> /dev/null
> rm -f qpidd.*.log
> start_broker 5672
> sleep 1
> qpid-ha promote -b $(hostname):5672 --cluster-manager
> sleep 1
> start_broker 5673
> sleep 1
> start_broker 5674
> {noformat}
> 2) Create ring queues and send messages to them (one queue is enough;
> having more should show the leak faster):
> {noformat}
> for i in $(seq 0 9); do
>   qpid-config add queue FromKeyServer_$i --max-queue-size=10000 \
>     --max-queue-count=10 --limit-policy=ring --argument=x-qpid-priorities=10
> done
> while true; do
>   for j in $(seq 1 10); do
>     for i in $(seq 1 10); do
>       for k in $(seq 0 9); do
>         qpid-send -a FromKeyServer_$k -m 100 \
>           --send-rate=50 --priority=$(($((RANDOM))%10)) &
>       done
>     done
>     wait
>     while [ $(qpid-stat -q | grep broker-replicator | sed "s/Y//g" |
>         awk '{ print $2 }' | sort -n | tail -n1) != "0" ]; do
>       sleep 1
>     done
>   done
>   date
>   ps aux | grep qpidd | grep "port=5672" |
>     awk -F "--store-dir" '{ print $1 }'
> done
> {noformat}
> (the "while [ $(qpid-stat -q | .." loop is there only to slow down message
> enqueues so that the replication federation queues don't build up a big
> backlog, which would interfere with observing memory consumption)
> 3) Run those scripts and monitor memory consumption.
> - the leak is also evident without priority queues and with messages sent
> without priorities - sometimes smaller, sometimes the same
> - valgrind (on some older versions I tested more thoroughly before) detects
> nothing (neither leaked memory nor memory still reachable at shutdown)
> - the same leak is evident even with --ha-replicate=none
> - the number of backup brokers does not affect the memory leak
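For step 3, the memory growth can be easier to see if the broker's resident set size is logged as plain numbers instead of full "ps aux" lines. The following is only a minimal sketch and not part of the original reproducer: the function name sample_rss and its arguments (PID, interval in seconds, number of samples) are made up for illustration, and it assumes a ps that supports "-o rss=" and "-p".

```shell
# Hypothetical helper, not from the original report.
# Prints "<epoch-seconds> <rss-in-KiB>" for process $1, every $2 seconds,
# at most $3 times. Stops early if the process no longer exists.
sample_rss() {
    pid=$1
    interval=$2
    count=$3
    n=0
    while [ "$n" -lt "$count" ]; do
        # "rss=" suppresses the header line; tr strips the column padding.
        rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ') || true
        if [ -z "$rss" ]; then
            break    # process is gone (or the PID is invalid)
        fi
        echo "$(date +%s) $rss"
        n=$((n + 1))
        if [ "$n" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, assuming exactly one qpidd process was started with --port=5672, something like `sample_rss "$(pgrep -f 'qpidd.*--port=5672')" 60 1000 > rss.log` would record one sample per minute while the feeding loop from step 2 runs in another terminal. A steadily increasing second column would correspond to the leak described above.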