[ https://issues.apache.org/jira/browse/QPID-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pavel Moravec updated QPID-4343:
--------------------------------
    Attachment: bz854666.patch

Patch proposal.

Instead of sending one AMQP message from UpdateClient that is too large to encode in order to update the state of MessageGroupManager, several state updates are sent, one per message group. As a message group consists of only a few messages, this approach should no longer hit the original problem.

a/src/qpid/cluster/UpdateClient.cpp has to be changed so that one StatefulQueueObserver can send potentially more than one update.

The change to a/src/qpid/broker/QueueFlowLimit.h is a direct consequence of that.

MessageGroupManager::getState and MessageGroupManager::setState in fact do the same as before, but without the "for (GroupMap::const_iterator .." loop, which is done from UpdateClient.

> cluster initial update stall when a queue has >10k messages with message groups set
> -----------------------------------------------------------------------------------
>
>                 Key: QPID-4343
>                 URL: https://issues.apache.org/jira/browse/QPID-4343
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker
>    Affects Versions: 0.18
>            Reporter: Pavel Moravec
>              Labels: patch
>         Attachments: bz854666.patch
>
>
> Description of problem:
> With a qpid broker in a cluster and message groups in use, an attempt to join
> a clustered peer causes a cluster stall during the initial update process when
> some queue has >10k messages with message groups set.
> The reason is that the updater node sends information about message groups in
> a ClusterConnectionQueueObserverStateBody message (exactly one message per
> queue). If some queue has "too many" messages with message groups, that
> ClusterConnectionQueueObserverStateBody message does not fit into one AMQP
> frame and it is silently(!) dropped by the updater.
> The updatee node then waits for the message while the updater node (and
> consequently the whole cluster) waits for the updatee to mark itself as ready.
> Version-Release number of selected component (if applicable):
> 0.14-21, almost surely in 0.18
> How reproducible:
> 100%
> Steps to Reproduce:
> 1. Have a 2-node cluster with 1 node running
> 2. Produce at least 10k messages with message groups to it:
> qpid-send --group-key "GROUP_KEY" -m 10000 -a "groupQ; {create:always, node:{type:queue, x-declare:{ arguments:{'qpid.group_header_key':'GROUP_KEY', 'qpid.shared_msg_group':1 }}}}"
> 3. (Re)start the 2nd node twice; for some unknown reason, the first start
> succeeds while the second does not.
> Actual results:
> New joiner stalls the cluster.
> Expected results:
> No broker joining a cluster can stall the cluster.
> Additional info:
> patch proposed
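
For readers without the patch at hand, the per-group update idea from the patch proposal can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in bz854666.patch: GroupMap here is a plain std::map stand-in rather than the broker's type, StateUpdate, buildPerGroupUpdates and monolithicSize are hypothetical names, and encodedSize() is a crude byte count, not the real AMQP encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the group state of one queue:
// group name -> positions of the messages in that group.
// (Not the real qpid::broker type.)
using GroupMap = std::map<std::string, std::vector<std::uint32_t>>;

// Hypothetical state update carrying the state of a single message group.
struct StateUpdate {
    std::string group;
    std::vector<std::uint32_t> positions;

    // Crude size model (assumption): name bytes plus 4 bytes per position.
    std::size_t encodedSize() const {
        return group.size() + positions.size() * sizeof(std::uint32_t);
    }
};

// Build one small update per group instead of a single update for the
// whole queue, so each update scales with one group, not with the queue.
std::vector<StateUpdate> buildPerGroupUpdates(const GroupMap& groups) {
    std::vector<StateUpdate> updates;
    updates.reserve(groups.size());
    for (const auto& entry : groups) {
        updates.push_back(StateUpdate{entry.first, entry.second});
    }
    return updates;
}

// Size of the single update that would carry every group at once,
// under the same crude size model.
std::size_t monolithicSize(const GroupMap& groups) {
    std::size_t total = 0;
    for (const auto& entry : groups) {
        total += entry.first.size() + entry.second.size() * sizeof(std::uint32_t);
    }
    return total;
}
```

A caller would iterate over the result of buildPerGroupUpdates() and send each update separately. The size of any single update then depends only on how many messages one group holds, whereas monolithicSize() grows with the total number of grouped messages on the queue, which is what pushed the single update past the frame limit.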