[ https://issues.apache.org/jira/browse/AMQ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Pavlovich updated AMQ-2317:
--------------------------------
    Labels: close-pending  (was: )

> Duplicate messages with transacted persistent messages during JDBC Master/Slave failover
> ----------------------------------------------------------------------------------------
>
>                 Key: AMQ-2317
>                 URL: https://issues.apache.org/jira/browse/AMQ-2317
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.3.0
>         Environment: OS: MacOS X 10.5.7, MacBook Core 2 Duo 2 GHz
> DBMS: MySQL 5.0.83 (through macports), SQLServer 2005 (in VMWare); others
> suspected but not thoroughly tested (including HSQL)
> All observations are against trunk: rev 790957 (2009-07-03 23:07:04 +0700
> (Fri, 03 Jul 2009)) (fuse progress 5.3.0.3 and ActiveMQ 5.2.0 seem to have
> the same problem though)
>            Reporter: Daniel Mueller
>            Priority: Critical
>              Labels: close-pending
>         Attachments: FailoverTransactionalTest.patch
>
>
> There is a race condition somewhere in the transaction/replay code involving
> failovers of JDBC-only Master/Slave configurations.
> Observed problems:
> If messages are sent to a master broker in one transaction, and the master
> fails over to the slave while the transaction is in progress, then the
> messages seem to be replayed twice (the database holds duplicates (see the
> query at the end) and the broker reports a message count that includes the
> duplicates).
> Severity:
> If the clients are connected to the new master and start consuming, the
> broker will not deliver dups. The dups will be delivered, though, if there is
> another failover (a common case for system upgrades). It seems that a single
> consumer will not get duplicates, even if it fails over again to a new
> broker, but if the consumer is restarted, it loses its state as well and
> subsequently gets the duplicates delivered.
> Attached is a testcase that demonstrates the problem.
> It shows that with a single producer committing after each send, one
> additional message is created in the broker with a duplicate MSGID_SEQ. If
> everything is committed in one transaction, then every single message in the
> transaction is duplicated (and not only the ones sent before the failover
> occurred).
> The testcase uses an external MySQL instance, though, and needs DBCP and the
> MySQL JDBC connector on the classpath (the pom is patched in the attached
> file to resolve that automatically).
> Out of the 6 tests, the following almost always fail on my machine:
> testProducer_MasterFailoverByShutdown_AtRandomTimes_CommitPerMessage
> (expected <6000>, but was <6001>)
> testProducer_MasterFailoverByShutdown_AtRandomTimes_OneCommit (expected
> <6000>, but was <12000>)
> Rarely (3-5% of the cases) this one also fails:
> testProducer_MasterFailoverByShutdown_SingleMsgCommit_AfterCommit (expected
> <500>, but was <501>)
> Other observations made:
> 1) The problem seems to be a race condition: while I was trying to find the
> cause through debugging, the problem disappeared when I set a breakpoint in
> TransactionInfo.visit(line:100). The race condition is hit on my machine
> (specs above) basically all the time without interaction (from maven, on the
> shell with a build, inside eclipse both debugged and normal).
> 2) It seems that TransactionBroker.commitTransaction(line:100) is called once
> with duplicated synchronizations (2x size). On the other hand,
> MemoryTransactionStore$Tx(line:109) is called twice, first with the correct
> amount and later with a doubled amount.
> 3) The problem is not reproducible with Kaha, so it appears to be related to
> JDBC.
> 4) It might be possible to have the testcase fail reliably with one of
> Derby/HSQL/H2, but I didn't investigate.
> 5) The testcase is not exactly very pretty, but it does show the problem ;)
> 6) The attached testcase is a patch against activemq-core.
> 7) The tests can be executed directly (in bash) with:
> env MAVEN_OPTS="$MAVEN_OPTS -Xmx800M" mvn -Dtest=org.apache.activemq.transport.failover.FailoverTransactionalTest test
> 8) For MySQL, the following query should list the duplicates:
> SELECT
>     MSGID_PROD
>    ,MSGID_SEQ
> FROM activemq_msgs
> GROUP BY MSGID_PROD,MSGID_SEQ
> HAVING ( COUNT(MSGID_SEQ) > 1 );
> 9) If you need the my.cnf for the database, I can attach that as well.
> 10) The tables are correctly created as InnoDB.
> I think that's it...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
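
The duplicate check from item 8 can be tried without a broker or a MySQL instance. The following is only a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of MySQL, and a hypothetical toy table that contains just the two columns the query touches (the real activemq_msgs table has more columns). The sample rows and the find_duplicates helper are made up for illustration and are not part of the attached testcase.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL store.
conn = sqlite3.connect(":memory:")

# Hypothetical, reduced version of the message table: only the columns
# used by the duplicate query from the report are modelled here.
conn.execute(
    "CREATE TABLE activemq_msgs ("
    "ID INTEGER PRIMARY KEY, MSGID_PROD TEXT, MSGID_SEQ INTEGER)"
)

# Made-up sample data: one producer, with sequence number 2 stored twice
# to mimic the duplicate described in the report.
rows = [
    ("ID:producer-1", 1),
    ("ID:producer-1", 2),
    ("ID:producer-1", 2),  # simulated duplicate
    ("ID:producer-1", 3),
]
conn.executemany(
    "INSERT INTO activemq_msgs (MSGID_PROD, MSGID_SEQ) VALUES (?, ?)", rows
)

# Same query as in item 8: group by producer and sequence number and keep
# only the groups that occur more than once.
DUPLICATE_QUERY = """
SELECT MSGID_PROD, MSGID_SEQ
FROM activemq_msgs
GROUP BY MSGID_PROD, MSGID_SEQ
HAVING COUNT(MSGID_SEQ) > 1
"""


def find_duplicates(connection):
    """Return the (MSGID_PROD, MSGID_SEQ) pairs that are stored more than once."""
    return connection.execute(DUPLICATE_QUERY).fetchall()


duplicates = find_duplicates(conn)
print(duplicates)  # [('ID:producer-1', 2)]
```

An empty result means no (MSGID_PROD, MSGID_SEQ) pair is stored more than once. Against a real broker database the same query would be run through the MySQL client or a JDBC connection instead.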