[ 
https://issues.apache.org/jira/browse/AMQ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Pavlovich updated AMQ-2317:
--------------------------------
    Labels: close-pending  (was: )

> Duplicate messages with transacted persistent messages during JDBC 
> Master/Slave failover
> ----------------------------------------------------------------------------------------
>
>                 Key: AMQ-2317
>                 URL: https://issues.apache.org/jira/browse/AMQ-2317
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.3.0
>         Environment: OS: MacOS X  10.5.7 MacBook Core 2 Duo 2 Ghz
> DBMS: MySQL 5.0.83 (through macports), SQLServer 2005 (in VMWare), others 
> suspected but not thoroughly tested (including HSQL)
> All observations are against trunk: rev 790957 (2009-07-03 23:07:04 +0700 
> (Fri, 03 Jul 2009)) (fuse progress 5.3.0.3 and ActiveMQ 5.2.0 seem to have 
> the same problem though)
>            Reporter: Daniel Mueller
>            Priority: Critical
>              Labels: close-pending
>         Attachments: FailoverTransactionalTest.patch
>
>
> There is a race condition somewhere in the transaction/replay code involving 
> failovers of JDBC-only Master/Slave configurations.
> Observed problems:
> If messages are sent to a master broker in one transaction, and the master 
> fails over to the slave while that transaction is still open, then the 
> messages seem to be replayed twice (the database holds duplicates (see the 
> query at the end) and the broker reports a message count that includes them).
> Severity: 
> If the clients are connected to the new master and start consuming, the 
> broker will not deliver dups. The dups will be delivered, though, if there is 
> another failover (a common case for system upgrades). It seems that a single 
> consumer will not get duplicates, even if it fails over again to a new broker, 
> but if the consumer is restarted, it loses its state as well and 
> subsequently gets the duplicates delivered.
> Attached is a testcase that demonstrates the problem. It shows that a 
> single producer committing after each send creates one additional 
> message in the broker with a duplicate MSGID_SEQ. If everything is committed 
> in one transaction, then every single message in the transaction is 
> duplicated (and not only the ones before the failover occurred).
> The testcase uses an external MySQL instance though, and needs the DBCP and 
> the MySQL JDBC connector on the classpath (the pom is patched in the attached 
> file to resolve that automatically).
> Out of the 6 tests, the following almost always fail on my machine:
> testProducer_MasterFailoverByShutdown_AtRandomTimes_CommitPerMessage  
> (expected <6000>, but was <6001>)
> testProducer_MasterFailoverByShutdown_AtRandomTimes_OneCommit  (expected 
> <6000>, but was <12000>)
> Rarely (3-5% of the cases) this one also fails:
> testProducer_MasterFailoverByShutdown_SingleMsgCommit_AfterCommit  (expected 
> <500>, but was <501>)
> Other observations made:
> 1) The problem seems to be a race condition because, while trying to find the 
> cause through debugging, the problem disappeared when setting a breakpoint in 
> TransactionInfo.visit(line:100). The race condition is hit on my machine 
> (specs above) basically all the time without interaction (from maven, on the 
> shell with a build, inside eclipse debugged and normal).
> 2) It seems that TransactionBroker.commitTransaction(line:100) is called once 
> with duplicated synchronizations (2x size). On the other hand, 
> MemoryTransactionStore$Tx(line:109) is called twice, first with the correct 
> amount and later with a doubled amount.
> 3) The problem is not reproducible with Kaha; it is related to JDBC.
> 4) It might be possible to have the testcase fail reliably with one of 
> Derby/HSQL/H2, but I didn't investigate.
> 5) The testcase is not exactly very pretty, but it does show the problem ;)
> 6) The attached testcase is a patch against activemq-core.
> 7) The tests can be executed directly (in bash) with:
> env MAVEN_OPTS="$MAVEN_OPTS -Xmx800M" mvn 
> -Dtest=org.apache.activemq.transport.failover.FailoverTransactionalTest test
> 8) For MySQL the following should work: 
> SELECT 
>       MSGID_PROD
>      ,MSGID_SEQ
>   FROM activemq_msgs
> GROUP BY MSGID_PROD,MSGID_SEQ
> HAVING ( COUNT(MSGID_SEQ) > 1 );
> 9) If you need the my.cnf for the database, I can attach that as well.
> 10) The tables are correctly created as InnoDB.
> I think that's it...
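
The GROUP BY / HAVING query in item 8 can also be mirrored on the client side once the (MSGID_PROD, MSGID_SEQ) pairs have been read from activemq_msgs. The following is only a minimal sketch of that check, not code from ActiveMQ or from the attached patch: the class name DuplicateMessageCheck, the method findDuplicates, and the representation of each row as a two-element String array are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of ActiveMQ): an in-memory equivalent of
//   SELECT MSGID_PROD, MSGID_SEQ FROM activemq_msgs
//   GROUP BY MSGID_PROD, MSGID_SEQ HAVING COUNT(MSGID_SEQ) > 1
final class DuplicateMessageCheck {

    private DuplicateMessageCheck() {
    }

    // Each row is assumed to be {MSGID_PROD, MSGID_SEQ}, both as strings.
    // Returns "MSGID_PROD/MSGID_SEQ" keys that occur more than once,
    // in the order in which each key was first seen.
    static List<String> findDuplicates(List<String[]> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            String key = row[0] + "/" + row[1];
            counts.merge(key, 1, Integer::sum);
        }

        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }
}
```

An empty result means no (MSGID_PROD, MSGID_SEQ) pair is stored more than once; any returned key corresponds to a row the SQL query would also report.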



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
