[
https://issues.apache.org/jira/browse/AMQCPP-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035954#comment-18035954
]
Justin Bertram commented on AMQCPP-760:
---------------------------------------
[~nautilus], I'm thoroughly confused by your recent activity on this issue.
You've added two comments that look like brand new issues unrelated to this
one, and most recently you marked this issue as resolved/fixed in 3.9.6 even
though no fix has been committed in the [activemq-cpp
repository|https://github.com/apache/activemq-cpp]. Can you clarify your
comments and why you marked this as resolved/fixed?
> ActiveMQ-CPP 3.9.5 Failover Timeout Issue with Advisory Topic in PROD
> ---------------------------------------------------------------------
>
> Key: AMQCPP-760
> URL: https://issues.apache.org/jira/browse/AMQCPP-760
> Project: ActiveMQ C++ Client
> Issue Type: Bug
> Components: Transports
> Environment: * {*}ActiveMQ-CPP Version{*}: 3.9.5
> * {*}ActiveMQ Broker Version{*}: 5.17.3
> * {*}Broker Hosts{*}: cca-prdappqua01.icc.corp:20801,
> cca-prdappqua02.icc.corp:20801
> * {*}Client Host{*}: cca-prdappbta05.icc.corp
> * {*}OS{*}: RHEL 7.9
> * {*}AMQ_URL{*}:
> "failover:(tcp://cca-prdappqua01.icc.corp:20801,tcp://cca-prdappqua02.icc.corp:20801)?maxReconnectAttempts=10&initialReconnectDelay=1000&timeout=30000&randomize=false"
> * {*}Demo Application{*}: C++ program (demo_sys_amq_modules.exe) subscribing
> to ActiveMQ.Advisory.Consumer.Queue.101 with a 30-second receive timeout.
> (attached)
> Reporter: Oren
> Priority: Major
> Fix For: 3.9.6
>
> Attachments: activemq.log, activemq.xml, demo_sys_amq_modules.cpp
>
>
> h2. Summary
> We are experiencing a timeout issue in our PROD environment when using the
> ActiveMQ-CPP 3.9.5 client with a {{failover:}} URI, specifically when
> subscribing to the advisory topic
> {{ActiveMQ.Advisory.Consumer.Queue.101}}. The issue persists despite
> upgrading from 3.9.3 (where AMQCPP-610 was suspected) and correcting a typo
> in the {{AMQ_URL}}. The same code works in non-PROD environments
> (TEST/BETA) and in PROD with a direct {{tcp://}} URI.
> h2. Issue Description
> The demo times out in PROD when using the failover: protocol to receive
> advisory messages for Queue.101. Key observations:
> * Works in TEST/BETA with identical broker settings and failover: URI
> (maxReconnectAttempts=10).
> {noformat}
> [BETA l_onissa@cca-betappbta01 cca_domain]$
> AMQ_URL="failover:([tcp://cca-betappqua01.icc.corp:20801])"
> [BETA l_onissa@cca-betappbta01 cca_domain]$ demo_sys_amq_modules.exe 101
> ActiveMQ-CPP initialized. Connecting to
> failover:([tcp://cca-betappqua01.icc.corp:20801])
> Waiting for advisory messages on: ActiveMQ.Advisory.Consumer.Queue.101 ...
> Received advisory message of type: Advisory{noformat}
> * Works in PROD with direct [tcp://cca-prdappqua01.icc.corp:20801].
> {noformat}
> [PROD l_onissa@cca-prdappbta05 cca_domain]$
> AMQ_URL="[tcp://cca-prdappqua01.icc.corp:20801]"
> [PROD l_onissa@cca-prdappbta05 cca_domain]$ demo_sys_amq_modules.exe 101
> ActiveMQ-CPP initialized. Connecting to [tcp://cca-prdappqua01.icc.corp:20801]
> Waiting for advisory messages on: ActiveMQ.Advisory.Consumer.Queue.101 ...
> Received advisory message of type: Advisory{noformat}
> * Fails in PROD with failover: even after upgrading to 3.9.5 (to address
> [AMQCPP-610]) and correcting a typo in the original AMQ_URL
> (the misspelled maxReconnectAttemps=0 was not applied, so
> maxReconnectAttempts fell back to -1).
> {noformat}
> [PROD l_onissa@cca-prdappbta05 cca_domain]$
> AMQ_URL="failover:([tcp://cca-prdappqua01.icc.corp:20801])"
> [PROD l_onissa@cca-prdappbta05 cca_domain]$ demo_sys_amq_modules.exe 101
> ActiveMQ-CPP initialized. Connecting to
> failover:([tcp://cca-prdappqua01.icc.corp:20801])
> Waiting for advisory messages on: ActiveMQ.Advisory.Consumer.Queue.101 ...
> No advisory message received (timeout).{noformat}
> * Java-based clients work reliably in PROD with failover:.
> h2. Relevant Broker Log Entries
> From activemq.log on cca-prdappqua01 (ActiveMQ 5.17.3); the full file is
> attached:
> {noformat}
> 2025-05-19 21:21:23,839 | WARN | TopicSubscription:
> consumer=ID:cca-prdappbta07.icc.corp-14259-1747678799256-0:0:-1:1, ...
> dispatched=1000, delivered=0, matched=1001, ... has twice its prefetch limit
> pending, without an ack; it appears to be slow: [tcp://10.222.12.83:31247]
> 2025-05-19 17:31:22,146 | WARN | Transport Connection to:
> [tcp://10.222.12.74:25787] failed: Broken pipe (Write failed)
> 2025-05-24 20:25:15,567 | WARN | Transport Connection to:
> [tcp://10.222.12.84:33681] failed: Cannot send, channel has already
> failed{noformat}
> * Slow topic consumers on cca-prdappbta07, cca-prdappbta02, cca-prdappbta08
> suggest broker resource contention.
> * Failed connections indicate potential network instability affecting
> failover clients.
> h2. Suspected Causes
> # {*}Broker Overload{*}: Slow consumers may delay advisory message delivery,
> impacting failover clients.
> # {*}Network Issues{*}: Failed connections suggest network instability,
> disrupting failover retries or subscriptions.
> # {*}Advisory Message Absence{*}: Possible lack of consumer activity on
> Queue.101 in PROD.
> # {*}Residual Effects{*}: Earlier 3.9.3 clients with maxReconnectAttemps=0
> (defaulting to -1) may have stressed the broker, affecting current 3.9.5
> clients.
> # {*}Failover Transport{*}: Possible edge case in 3.9.5 failover handling
> for advisory topics.
> h2. Steps Taken
> * Upgraded from ActiveMQ-CPP 3.9.3 to 3.9.5 to address AMQCPP-610.
> * Corrected {{AMQ_URL}} typo (maxReconnectAttemps to
> maxReconnectAttempts=10).
> * Tested with direct {{tcp://}} URI (works) and {{failover:}} (times out).
> * Verified broker connectivity via telnet (successful).
> * Added consumer for {{Queue.101}} to trigger advisory messages (no success
> yet).
> * Enabled debug logging in the demo (logs pending).
> h2. Request for Assistance
> * Is there a known issue in ActiveMQ-CPP 3.9.5 with failover and advisory
> topic subscriptions under high broker load or network instability?
> * Could slow consumers (as seen in the log) affect failover client
> subscriptions? Any recommended configurations to mitigate this?
> * Are there additional failover parameters or patches in 3.9.5 to improve
> stability for advisory topics?
> * Suggestions for diagnosing broker-side issues (e.g., clearing stale
> connections without restart)?
> h2. Additional Information
> * {*}Broker Config{*}: advisorySupport is true (the default in 5.17.3; see
> the attached activemq.xml).
> * {*}Compile Command{*}:
> {noformat}
> g++ -O2 -finline-functions -D_XOPEN_SOURCE=600 -m64 -fPIC -Wall \
> -L/usr/lib64 -lactivemq-cpp \
> -isystem /usr/include/activemq-cpp-3.9.5 \
> -std=c++11 \
> src/demo_sys_amq_modules.cpp -o target/bin/demo_sys_amq_modules.exe{noformat}
> * {*}Next Steps Planned{*}: Test with non-advisory topic, increase receive
> timeout to 60s, check activemq.xml for advisory settings, and consider
> upgrading to 3.10.0.
> Please advise on next steps or known issues. Thank you for your support!
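
The issue description mentions that a misspelled URI option
(maxReconnectAttemps) went unnoticed and, per the report, was not applied. A
client could check the option names itself before connecting. The following
is a minimal sketch of that idea, not part of ActiveMQ-CPP: the helper name
unexpectedOptions and the kExpectedOptions list are hypothetical, and the
list contains only the four options that appear in the AMQ_URL quoted above,
not everything the failover transport accepts.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Assumption for illustration: only the option names that appear in the
// AMQ_URL from this issue. Extend this set for any other options in use.
static const std::set<std::string> kExpectedOptions = {
    "maxReconnectAttempts", "initialReconnectDelay", "timeout", "randomize"};

// Hypothetical helper (not an ActiveMQ-CPP API): returns the names of the
// outer query options in `uri` that are not in `expected`, in URI order.
std::vector<std::string> unexpectedOptions(
    const std::string& uri,
    const std::set<std::string>& expected = kExpectedOptions) {
    std::vector<std::string> unknown;

    // For failover:(...)?a=b the outer options follow the last ')'.
    // Without parentheses, fall back to the first '?' in the URI.
    const std::string::size_type close = uri.rfind(')');
    const std::string::size_type start =
        (close == std::string::npos) ? 0 : close;
    const std::string::size_type q = uri.find('?', start);
    if (q == std::string::npos) {
        return unknown;  // no options at all
    }

    std::istringstream query(uri.substr(q + 1));
    std::string pair;
    while (std::getline(query, pair, '&')) {
        // Everything before '=' is the option name.
        const std::string key = pair.substr(0, pair.find('='));
        if (!key.empty() && expected.count(key) == 0) {
            unknown.push_back(key);
        }
    }
    return unknown;
}
```

Calling unexpectedOptions() on a URI containing maxReconnectAttemps=0 returns
that name, so the application can log it or refuse to start; the corrected
AMQ_URL from the Environment section yields an empty result.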