[
https://issues.apache.org/activemq/browse/AMQ-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_39780
]
Mario Siegenthaler commented on AMQ-1350:
-----------------------------------------
We've also experienced this behavior on a 4.1.1 master/slave configuration
using SQL Server. The master somehow lost the lock during a database
maintenance operation (we suspect a DB admin killed the lock in order to be
able to back up the database) and we ended up with two masters.
> JDBC master/slave does not work properly with datasources that can reconnect
> to the database
> --------------------------------------------------------------------------------------------
>
> Key: AMQ-1350
> URL: https://issues.apache.org/activemq/browse/AMQ-1350
> Project: ActiveMQ
> Issue Type: Bug
> Components: Message Store
> Affects Versions: 5.x
> Environment: Linux x86_64, Sun jdk 1.6, Postgresql 8.2.4, c3p0 or
> other pooling datasources
> Reporter: Eric Anderson
>
> This problem involves the JDBC master/slave configuration when the db server
> is restarted, or when the brokers lose their JDBC connections for whatever
> reason temporarily, and when a datasource is in use that can re-establish
> stale connections prior to providing them to the broker.
> The problem lies with the JDBC locking strategy used to determine which
> broker is master and which are slaves. Let's say there are two brokers, a
> master and a slave, and they've successfully initialized. If you restart the
> database server, the slave will throw an exception, because its blocked
> attempt to get the lock has just failed. The slave will then
> *retry* the process of getting a lock over and over again. Now, since the
> database was bounced, the *master* will have lost its lock in the
> activemq_lock table. However, with the current 4.x-5.x code, it will never
> "know" that it has lost the lock. There is no mechanism to check the lock
> state. So it will continue to think that it is the master and will leave all
> of its network connectors active.
> When the slave tries to acquire the lock now, if the datasource has restored
> connections to the now-restarted database server, it will succeed. The slave
> will come up as master, and there will be two masters active concurrently.
> Both masters should at this point be fully functional, as both will have
> datasources that can talk to the database server once again.
> I have tested this with c3p0 and verified that I get two masters after
> bouncing the database server. If, at that point, I kill the original slave
> broker, the original master still appears to be functioning normally. If,
> instead, I kill the original master broker, messages are still delivered via
> the original slave (now co-master). It does not seem to matter which broker
> the clients connect to - both work.
> There is no workaround that I can think of that would function correctly
> across multiple database bounces. If a slave's datasource does not have the
> functionality to do database reconnects, then, after the first database
> server restart, it will never be able to establish a connection to the db
> server in order to attempt to acquire the lock. This, combined with the fact
> that the JDBC master/slave topology does not have any favored brokers -- all
> can be masters or slaves depending on start-up order and the failures that
> have occurred over time, means that a datasource that can do reconnects is
> required on all brokers. Therefore it would seem that in the JDBC
> master/slave topology a database restart or temporary loss of database
> connectivity will always result in multiple masters.
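
The report says the master has no mechanism to check its lock state. As a rough illustration of what such a check could look like, here is a minimal sketch. It is not ActiveMQ code: `LockProbe`, `LockKeepAlive` and `checkOnce` are hypothetical names invented for this example. The sketch assumes a scheduler would call `checkOnce()` periodically, and that the probe can tell whether the broker still holds the lock (for example by running a statement on the connection that originally took it).

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical probe: returns true only if this broker still holds the database lock. */
interface LockProbe {
    boolean stillHoldsLock() throws Exception;
}

/** Hypothetical keep-alive that demotes a master once its lock can no longer be confirmed. */
final class LockKeepAlive {
    private final LockProbe probe;
    private final Runnable onLockLost;
    private final AtomicBoolean master = new AtomicBoolean(true);

    LockKeepAlive(LockProbe probe, Runnable onLockLost) {
        this.probe = probe;
        this.onLockLost = onLockLost;
    }

    /**
     * Runs a single check; meant to be invoked periodically.
     * Returns true while the lock is still confirmed as held.
     */
    boolean checkOnce() {
        if (!master.get()) {
            return false; // already demoted, nothing left to check
        }
        boolean held;
        try {
            held = probe.stillHoldsLock();
        } catch (Exception e) {
            // Assumption: a failed check is treated the same as a lost lock.
            held = false;
        }
        // compareAndSet ensures the callback fires at most once.
        if (!held && master.compareAndSet(true, false)) {
            onLockLost.run(); // e.g. stop connectors so only one master remains
        }
        return held;
    }

    boolean isMaster() {
        return master.get();
    }
}
```

With something along these lines, the original master would stop serving clients after the database bounce instead of continuing alongside the newly promoted slave. Treating a failed check as a lost lock trades availability for safety, which may or may not suit a given deployment.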