[ https://issues.apache.org/jira/browse/AMQ-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501051#comment-14501051 ]
Chad Kelly edited comment on AMQ-5155 at 4/20/15 2:01 AM:
----------------------------------------------------------

REVISION

Can someone confirm this bug occurs in their environment? If so, this bug relates to AMQ-5155 and AMQ-4674. I recommend that the ActiveMQ WebSocket example should work as is, regardless of whether parameters such as transport.hbGracePeriodMultiplier are used.

The bug manifests itself in "/activemq-client/src/main/java/org/apache/activemq/transport/AbstractInactivityMonitor.java" when the timing of onCommand() (triggered by the web browser's PING or a message) crosses over the timing of readCheck(). In other words, the 10-second onCommand() driven by the browser's PING normally occurs just before the 10-second readCheck() timer thread, but a small time drift can make readCheck() start to occur before onCommand(). At that exact crossover point, readCheck() runs twice without an intervening onCommand(), and this causes the bug.
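To make the race easier to see, here is a minimal, self-contained sketch of the read-check pattern described above. It is an illustration only, not the actual ActiveMQ code: the class name, the flag, and the scheduling details are assumptions of mine; the real logic lives in AbstractInactivityMonitor.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified model of the readCheck()/onCommand() race. Names and intervals
// are illustrative; the 10-second period mirrors heart-beat:10000,10000.
public class ReadCheckSketch {

    private static final long READ_CHECK_MS = 10_000;

    private final AtomicBoolean commandReceived = new AtomicBoolean(false);
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    // Called whenever a STOMP frame or heart-beat ("\n" PING) arrives from the browser.
    public void onCommand() {
        commandReceived.set(true);
    }

    public void start() {
        timer.scheduleAtFixedRate(this::readCheck, READ_CHECK_MS, READ_CHECK_MS, TimeUnit.MILLISECONDS);
    }

    // Runs every 10 seconds. If no command arrived since the previous check,
    // the channel is declared inactive. When the browser's PING drifts from
    // landing just before this tick to just after it, one full window passes
    // with no onCommand(), the check sees commandReceived == false, and the
    // connection is dropped even though the client is only milliseconds late.
    private void readCheck() {
        if (!commandReceived.getAndSet(false)) {
            System.out.println("No message received since last read check. Throwing InactivityIOException.");
            timer.shutdown();
        }
    }
}

Because both sides negotiate heart-beat:10000,10000, the browser's PING and the broker's read check tick with the same nominal period, so which one fires first is decided purely by drift; that is why the failure typically shows up only after 50-500 PINGs.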
In the log below, the timing sequence is:

17:05:21,162 | DEBUG | onCommand()
17:05:21,162 | DEBUG | readCheck()
17:05:31,162 | DEBUG | readCheck()
17:05:31,164 | DEBUG | onCommand()

Opening Web Socket...
Web Socket Opened...
>>> CONNECT
accept-version:1.1,1.0
heart-beat:10000,10000
login:guesttgg
passcode:guest

<<< CONNECTED
server:ActiveMQ/5.12-SNAPSHOT
heart-beat:10000,10000
session:ID:c1-39862-1429234022608-4:2
version:1.1

connected to server ActiveMQ/5.12-SNAPSHOT
send PING every 10000ms
check PONG every 10000ms
connected to Stomp
>>> SUBSCRIBE
id:sub-0
destination:/queue/test
(same behavior for /topic/test)

<<< PONG
>>> PING
... 395 times in this case, but 50-500 times in general
<<< PONG
>>> PING
<<< PONG
>>> PING
>>> PING
>>> PING
did not receive server activity for the last 20003ms
Whoops! Lost connection to ws://localhost:61614/stomp

INFO version of data/activemq.log:

2015-04-17 15:49:12,750 | WARN | Transport Connection to: StompSocket_919027643 failed: org.apache.activemq.transport.InactivityIOException: Channel was inactive for too (>10000) long: StompSocket_919027643 | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ InactivityMonitor Worker

DEBUG (with special instrumentation added) version of data/activemq.log:

2015-04-19 17:05:21,162 | DEBUG | onCommand() | org.apache.activemq.transport.AbstractInactivityMonitor | qtp1493158871-33
2015-04-19 17:05:21,162 | DEBUG | 10000ms elapsed since last read check. | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor ReadCheckTimer
2015-04-19 17:05:21,162 | DEBUG | readCheck() | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor ReadCheckTimer
2015-04-19 17:05:31,161 | DEBUG | WriteChecker: 10000ms elapsed since last write check. | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor WriteCheckTimer
2015-04-19 17:05:31,161 | DEBUG | Running WriteCheck[StompSocket_1693369093] | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor Worker
2015-04-19 17:05:31,162 | DEBUG | 10000ms elapsed since last read check. | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor ReadCheckTimer
2015-04-19 17:05:31,162 | DEBUG | readCheck() | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor ReadCheckTimer
2015-04-19 17:05:31,163 | DEBUG | No message received since last read check for org.apache.activemq.transport.ws.jetty8.StompSocket@64eec305. Throwing InactivityIOException. | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor ReadCheckTimer
2015-04-19 17:05:31,164 | DEBUG | Running ReadCheck[StompSocket_1693369093] | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor Worker
2015-04-19 17:05:31,164 | DEBUG | onCommand() | org.apache.activemq.transport.AbstractInactivityMonitor | qtp1493158871-30

was (Author: chadkelly):
REVISION

Can someone confirm this bug occurs in their environment? If so, this bug relates to AMQ-5155 and AMQ-4674. I recommend that the ActiveMQ WebSocket example should work as is, regardless of whether parameters such as transport.hbGracePeriodMultiplier are used.

Testing for up to 12 continuous hours suggests that Fedora 21 with Chrome 39.0.2171.95 (x64) and Chrome 41.0.2272.118 (x32) works correctly for all test cases. Wireshark (on Linux, capturing localhost) shows that WebSocket/ActiveMQ always sends its "\n" (PONG) with a gap of more than 0.1 second after Chrome sends its "\n" (PING).

Fedora 21 with Firefox 34 (x64) and Firefox 37 (x32) ONLY works ~25% of the time. ~75% of the time Firefox FAILS after 50 to 500 PINGs. Wireshark shows correct functionality when the gap between WebSocket/ActiveMQ's "\n" (PONG) and Firefox's "\n" (PING) is larger than 0.1 second. However, when the time gap between the PONG and PING is small, an InactivityIOException can occur ~75% of the time. For example, in the attached PCAP file "firefox_bug_example1.pcapng", frames 97 and 99 cross over in PING/PONG timing.

Opening Web Socket...
Web Socket Opened...
>>> CONNECT
accept-version:1.1,1.0
heart-beat:10000,10000
login:guesttgg
passcode:guest

<<< CONNECTED
server:ActiveMQ/5.12-SNAPSHOT
heart-beat:10000,10000
session:ID:c1-39862-1429234022608-4:2
version:1.1

connected to server ActiveMQ/5.12-SNAPSHOT
send PING every 10000ms
check PONG every 10000ms
connected to Stomp
>>> SUBSCRIBE
id:sub-0
destination:/queue/test
(same behavior for /topic/test)

<<< PONG
>>> PING
... 395 times in this case, but 50-500 times in general
<<< PONG
>>> PING
<<< PONG
>>> PING
>>> PING
>>> PING
did not receive server activity for the last 20003ms
Whoops! Lost connection to ws://localhost:61614/stomp

In data/activemq.log:

2015-04-17 15:49:12,750 | WARN | Transport Connection to: StompSocket_919027643 failed: org.apache.activemq.transport.InactivityIOException: Channel was inactive for too (>10000) long: StompSocket_919027643 | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ InactivityMonitor Worker
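For reference, here is a rough sketch of how a heart-beat grace period such as the transport.hbGracePeriodMultiplier option mentioned above changes the read-check arithmetic. The option name comes from this comment and AMQ-4674; the multiplier value of 1.5 and the class below are illustrative assumptions, not the broker's actual implementation or defaults.

// Illustrative arithmetic only: a grace multiplier widens the window the broker
// waits before declaring the channel inactive, which would absorb the small PING
// drift observed in the Wireshark captures above.
public class HeartBeatGraceSketch {

    public static void main(String[] args) {
        long clientSendIntervalMs = 10_000;   // "heart-beat:10000,10000" from the CONNECT frame
        double hbGracePeriodMultiplier = 1.5; // assumed value for illustration

        // Without a grace period the read check uses the raw interval, so a PING
        // arriving even a few milliseconds past the tick counts as missed.
        long strictWindowMs = clientSendIntervalMs;

        // With the multiplier applied, the same few-millisecond drift falls well
        // inside the window.
        long gracedWindowMs = (long) (clientSendIntervalMs * hbGracePeriodMultiplier);

        System.out.println("strict read-check window: " + strictWindowMs + "ms");
        System.out.println("graced read-check window: " + gracedWindowMs + "ms");
    }
}

The point of the comment above, though, is that the stock WebSocket example should tolerate this drift without requiring the option at all.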
> Heartbeat fails in STOMP over WebSockets
> ----------------------------------------
>
>                  Key: AMQ-5155
>                  URL: https://issues.apache.org/jira/browse/AMQ-5155
>              Project: ActiveMQ
>           Issue Type: Bug
>     Affects Versions: 5.9.1, 5.10.0
>             Reporter: Arjan van den Berg
>             Assignee: Timothy Bish
>             Priority: Minor
>              Fix For: 5.12.0
>
>          Attachments: AMQ-5155-jetty8.diff, AMQ-5155-jetty9.diff, firefox_bug_example1.pcapng
>
>
> From AMQ-4740:
> I receive the following error after establishing a connection and heartbeat through stomp.js. This seems to occur after the 'PING' is sent.
> ---------- stomp.js output
> <<< CONNECTED
> heart-beat:10000,10000
> session:ID:localhost.localdomain-45596-1396530920609-2:2
> server:ActiveMQ/5.10-SNAPSHOT
> version:1.1
> send PING every 10000ms
> check PONG every 10000ms
> <<< PONG
> >>> PING
> did not receive server activity for the last 20005ms
> Whoops! Lost connection to ws://172.16.99.73:61614/stomp
> --------------------- activemq console ---------------
> WARN | Transport Connection to: StompSocket_19548821 failed: java.io.IOException
> Exception in thread "ActiveMQ InactivityMonitor Worker" java.lang.NullPointerException
>     at org.apache.activemq.transport.AbstractInactivityMonitor.onException(AbstractInactivityMonitor.java:314)
>     at org.apache.activemq.transport.AbstractInactivityMonitor$4.run(AbstractInactivityMonitor.java:215)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)
> WARN | Transport Connection to: StompSocket_19548821 failed: java.io.IOException
> Exception in thread "ActiveMQ InactivityMonitor Worker" java.lang.NullPointerException
>     at org.apache.activemq.transport.AbstractInactivityMonitor.onException(AbstractInactivityMonitor.java:314)
>     at org.apache.activemq.transport.AbstractInactivityMonitor$4.run(AbstractInactivityMonitor.java:215)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)
> For me it looks as if the StompInactivityMonitor is delivering its events to the wrong Transport, i.e. it needs a "narrow()" when setting it up.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)