[ 
https://issues.apache.org/jira/browse/QPID-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Moravec updated QPID-4343:
--------------------------------

    Attachment: bz854666.patch

Patch proposal.

Instead of UpdateClient sending a single AMQP message that is too large to 
encode in order to update the state of MessageGroupManager, several state 
updates are sent, one per message group. As a message group consists of only 
a few messages, this approach should no longer hit the original problem.

a/src/qpid/cluster/UpdateClient.cpp has to be changed so that one 
StatefulQueueObserver can send potentially more than one update.

The change to a/src/qpid/broker/QueueFlowLimit.h is a direct consequence of 
that.

MessageGroupManager::getState and MessageGroupManager::setState in fact do 
the same as before, but without the "for (GroupMap::const_iterator .." loop, 
which is now done from UpdateClient.
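
To illustrate the idea only (this is not the code from bz854666.patch), here 
is a minimal sketch of splitting one large state update into one update per 
message group. The names GroupMap, StateUpdate, encodedSize, singleUpdateSize, 
perGroupUpdates and applyUpdate are hypothetical stand-ins, and the size 
formula is an assumed toy encoding, not the real AMQP encoding used by the 
broker.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the real qpid::broker types.
typedef std::vector<unsigned> Positions;            // queue positions of one group's messages
typedef std::map<std::string, Positions> GroupMap;  // group name -> positions

// One state update as it would travel from updater to updatee.
struct StateUpdate {
    std::string group;
    Positions positions;
};

// Assumed toy encoding cost: the group name plus 4 bytes per position.
std::size_t encodedSize(const StateUpdate& u) {
    return u.group.size() + 4 * u.positions.size();
}

// Old approach: everything goes into a single update, so the size grows
// with the total number of grouped messages on the queue.
std::size_t singleUpdateSize(const GroupMap& groups) {
    std::size_t total = 0;
    for (const auto& entry : groups) {
        total += entry.first.size() + 4 * entry.second.size();
    }
    return total;
}

// Proposed approach: the caller iterates over the groups and emits one
// small update per group.
std::vector<StateUpdate> perGroupUpdates(const GroupMap& groups) {
    std::vector<StateUpdate> updates;
    for (const auto& entry : groups) {
        StateUpdate u;
        u.group = entry.first;
        u.positions = entry.second;
        updates.push_back(u);
    }
    return updates;
}

// Updatee side: each update restores exactly one group.
void applyUpdate(GroupMap& target, const StateUpdate& u) {
    assert(target.find(u.group) == target.end());  // each group is sent once
    target[u.group] = u.positions;
}
```

With this shape, the size of each update depends on the size of one group 
rather than on the whole queue, and applying all updates on the updatee side 
rebuilds the same map that the updater started from.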
                
> cluster initial update stall when a queue has >10k messages with message 
> groups set
> -----------------------------------------------------------------------------------
>
>                 Key: QPID-4343
>                 URL: https://issues.apache.org/jira/browse/QPID-4343
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker
>    Affects Versions: 0.18
>            Reporter: Pavel Moravec
>              Labels: patch
>         Attachments: bz854666.patch
>
>
> Description of problem:
> With a qpid broker in a cluster and message groups in use, an attempt to 
> join a clustered peer stalls the cluster during the initial update process 
> when some queue has >10k messages with message groups set.
> The reason is that the updater node sends information about message groups 
> in a ClusterConnectionQueueObserverStateBody message (exactly one message 
> per queue). If some queue has "too many" messages with message groups, the 
> ClusterConnectionQueueObserverStateBody message does not fit into one AMQP 
> frame and it is silently(!) dropped by the updater.
> The updatee node then waits for the message while the updater node (and 
> consequently the whole cluster) waits for the updatee to mark itself as ready.
> Version-Release number of selected component (if applicable):
> 0.14-21, almost surely in 0.18
> How reproducible:
> 100%
> Steps to Reproduce:
> 1. Have a 2-node cluster with 1 node running
> 2. Produce at least 10k messages with message groups to it:
> qpid-send --group-key "GROUP_KEY" -m 10000 -a "groupQ; {create:always, 
> node:{type:queue, x-declare:{ arguments:{'qpid.group_header_key':'GROUP_KEY', 
> 'qpid.shared_msg_group':1 }}}}"
> 3. (Re)start the 2nd node twice - for some unknown reason, the first start 
> succeeds while the second does not.
> Actual results:
> The new joiner stalls the cluster.
> Expected results:
> No broker joining a cluster should be able to stall the cluster.
> Additional info:
> patch proposed
