[jira] [Commented] (ARTEMIS-4276) Message Group does not replicate properly during failover

Justin Bertram (Jira) Wed, 17 May 2023 09:27:10 -0700


    [ https://issues.apache.org/jira/browse/ARTEMIS-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723532#comment-17723532 ]


Justin Bertram commented on ARTEMIS-4276:
-----------------------------------------

bq. We are using virtual topics for that.

Now that you're on ActiveMQ Artemis, you can use JMS 2's 
[shared topic consumer|https://docs.oracle.com/javaee/7/api/javax/jms/Session.html#createSharedConsumer-javax.jms.Topic-java.lang.String-].

bq. By using grouping we ensure that the same consumer will process all 
versions of the same transaction. 

As noted previously, grouping *doesn't* ensure that the *same* consumer will 
process all the messages in the group. It only guarantees that _one consumer at 
a time_ will process the messages in the group and therefore the messages will 
be processed in order.
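To make the distinction concrete, here is a toy model in Java. It is not the Artemis implementation, and {{GroupRouter}} and its methods are hypothetical. Each group is bound to one consumer at a time, and the binding is simply forgotten when that consumer goes away. For simplicity an unbound group goes to the first consumer in the list, whereas the real broker has its own selection logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of message grouping, NOT the Artemis implementation: each group ID
// is bound to one consumer at a time, and the binding is forgotten when that
// consumer goes away.
class GroupRouter {
    private final List<String> consumers = new ArrayList<>();
    private final Map<String, String> groupOwner = new HashMap<>();

    void addConsumer(String name) {
        consumers.add(name);
    }

    void removeConsumer(String name) {
        consumers.remove(name);
        // Drop every group bound to the departed consumer; nothing records
        // which consumer used to own it.
        groupOwner.values().removeIf(owner -> owner.equals(name));
    }

    // Returns the consumer that receives the next message of the given group.
    // Unbound groups go to the first consumer in the list (a simplification).
    String route(String groupId) {
        if (consumers.isEmpty()) {
            throw new IllegalStateException("no consumers available");
        }
        return groupOwner.computeIfAbsent(groupId, g -> consumers.get(0));
    }
}
```

Routing {{EXT_BOND_ID}} with consumers {{LDR1}} and {{LDR2}} returns {{LDR1}}; after {{removeConsumer("LDR1")}} it returns {{LDR2}}, and it keeps returning {{LDR2}} even after {{LDR1}} is added back. Ordering within the group is preserved, but the owning consumer is not.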

bq. To handle a message duplication all our consumer's listeners are using a 
LRU (least recently used) cache of the already processed messages.

A local, volatile LRU cache is not enough to mitigate duplicate messages. Keep 
in mind that even _if_ the broker maintained the consumer-group relationship 
during broker failover, the consumer itself can still fail at any point (e.g. 
JVM crash, hardware failure, network glitch, etc.). When that happens, a new 
consumer will be chosen for the group, and it may process duplicate messages 
since the _new_ consumer won't have the already-processed messages in its LRU 
cache.

In short, guaranteeing that the same consumer gets the same group on broker 
failover does not adequately deal with the threat of duplicate messages.
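The LRU point can be illustrated with a short Java sketch. {{LruDedupConsumer}} is hypothetical; it uses {{java.util.LinkedHashMap}} in access order with {{removeEldestEntry}}, a common way to build a bounded LRU cache. The cache only protects the consumer instance that owns it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical consumer that skips duplicates using a local, in-memory LRU cache.
class LruDedupConsumer {
    private final Map<String, Boolean> seen;
    final List<String> processed = new ArrayList<>();

    LruDedupConsumer(int capacity) {
        // accessOrder = true keeps entries ordered least-recently-used first,
        // and removeEldestEntry evicts that entry once the cache is over capacity.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    void onMessage(String messageId) {
        // put() returns the previous value, so non-null means "seen before".
        if (seen.put(messageId, Boolean.TRUE) != null) {
            return; // a duplicate according to THIS consumer's cache only
        }
        processed.add(messageId); // stands in for the real work, e.g. a DB insert
    }
}
```

A replacement consumer starts with an empty cache, so a redelivered {{EXT_BOND_ID_4}} is processed a second time. Eviction from a bounded cache has the same effect even without any failure.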

Generally speaking, distributing state like this (i.e. in the consumer's LRU 
cache) is not a good idea because it typically leads to consistency issues. 
State should be concentrated in the non-distributed components (i.e. message 
broker & database).

bq. Is the grouping cache used by the broker distributed or persisted during 
the failover switch?

No. The consumer-group relationship is not designed to survive failover for 
the reasons I outlined previously.

bq. Is there any setup to circumvent this?

Yes. Simply put, your consumers need to be 
[_idempotent_|https://en.wikipedia.org/wiki/Idempotence]. In your situation I 
can think of a few ways to do this.

Often when folks need to keep data in sync between two resources like a 
message broker and a database, they use an 
[XA transaction|https://en.wikipedia.org/wiki/X/Open_XA]. In Java this is 
implemented via [JTA|https://github.com/jakartaee/transactions]. This is very 
common in Java, especially when an application is running in a Java EE 
application server, because MDBs are transactional by default and any other XA 
resource used in the course of processing a JMS message in an MDB is 
automatically enlisted into the same transaction. This means all the work is 
done _atomically_ (i.e. either it all succeeds or it all fails). By using a JTA 
transaction that spans the JMS and JDBC resources, you ensure that if the JDBC 
insert succeeds but the JMS message acknowledgement fails, everything will be 
rolled back: the JMS message is not consumed and the data is not actually 
inserted into the database. When the message is consumed again later, there 
will be no duplicate entries in the database.
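The all-or-nothing behaviour can be sketched as a toy two-phase commit, which is the protocol XA is built on. This is not the JTA API; {{Participant}}, {{ToyCoordinator}}, {{ToyTable}} and {{ToyAck}} are hypothetical stand-ins for the transaction manager, the JDBC resource and the JMS acknowledgement. A real transaction manager also logs its decision so it can finish the transaction after a crash, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

// Toy two-phase commit, NOT the JTA/XA API: it only shows the all-or-nothing outcome.
interface Participant {
    boolean prepare(); // phase 1: vote to commit (true) or abort (false)
    void commit();     // phase 2 when every participant voted yes
    void rollback();   // phase 2 when any participant voted no
}

final class ToyCoordinator {
    static boolean complete(Participant... participants) {
        for (Participant p : participants) {
            if (!p.prepare()) {
                // A single "no" vote aborts the whole unit of work.
                for (Participant q : participants) q.rollback();
                return false;
            }
        }
        for (Participant p : participants) p.commit();
        return true;
    }
}

// Stands in for the JDBC resource: an insert stays pending until commit.
class ToyTable implements Participant {
    final List<String> rows = new ArrayList<>();
    private String pending;

    void insert(String row) { pending = row; }
    public boolean prepare() { return true; }
    public void commit() { if (pending != null) rows.add(pending); pending = null; }
    public void rollback() { pending = null; }
}

// Stands in for the JMS acknowledgement; can be told to fail once.
class ToyAck implements Participant {
    boolean failNextPrepare;
    boolean acknowledged;

    public boolean prepare() {
        boolean ok = !failNextPrepare;
        failNextPrepare = false;
        return ok;
    }
    public void commit() { acknowledged = true; }
    public void rollback() { /* not acknowledged: the broker will redeliver */ }
}
```

If {{ToyAck}} is told to fail once, the first {{complete(table, ack)}} returns false and leaves {{table.rows}} empty; after the redelivered message is inserted again, the second call commits both, so the table ends up with exactly one row.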

Another way to deal with this would be to set up a primary key on the table (or 
tables) where you're inserting data. This would prevent duplicate records from 
being inserted into the database when consumers receive duplicate messages. The 
primary key could be a combination of the {{JMSXGroupID}} and the version (e.g. 
{{EXT_BOND_ID_4}}). Therefore, in the scenario you outlined in your comment, 
when *LDR1* receives *EXT_BOND_ID* with version *4* it will process it, but 
when it tries to insert it into the database the insert will be rejected 
because a row with that primary key already exists.
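A minimal sketch of the primary-key approach, using a {{HashMap}} as a stand-in for the table's primary key index ({{BondTable}} is hypothetical). Because the check lives in the shared database rather than in any one consumer, it keeps working no matter which consumer receives the duplicate. With JDBC the rejected insert would normally surface as an {{SQLException}} whose SQLState is in class 23 (integrity constraint violation), which the consumer could treat as "already processed" before acknowledging the message.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a table with a primary key; a real schema would more
// likely declare a composite PRIMARY KEY (group_id, version).
class BondTable {
    private final Map<String, String> rows = new HashMap<>();

    // Mimics an INSERT that the primary key constraint rejects on a duplicate key.
    // Returns false instead of throwing, to keep the sketch short.
    boolean insert(String groupId, int version, String payload) {
        String primaryKey = groupId + "_" + version; // e.g. EXT_BOND_ID_4
        return rows.putIfAbsent(primaryKey, payload) == null;
    }

    int size() {
        return rows.size();
    }
}
```

If *LDR2* has already inserted version 4 of *EXT_BOND_ID*, the same insert from *LDR1* returns {{false}}, while version 5 is still accepted.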

> Message Group does not replicate properly during failover
> ---------------------------------------------------------
>
>                 Key: ARTEMIS-4276
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4276
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.28.0
>            Reporter: Liviu Citu
>            Priority: Major
>
> Hi,
> We are currently migrating our software from Classic to Artemis and we plan 
> to use failover functionality.
> We were using message group functionality by setting *JMSXGroupID* and this 
> was working as expected. However, after the failover switch I noticed that 
> messages are sent to the wrong consumers.
> Our gateway/interface application is actually a collection of servers:
>  * gateway adapter server: receives messages from external systems and 
> puts them on a specific/virtual topic
>  * gateway loader server (can be load-balanced): picks up the messages from 
> the topic and processes them
>  * gateway fail queue: monitors all messages that failed processing and can 
> resubmit them (users correct the processing errors and then resubmit the 
> transaction)
> *JMSXGroupID* is used to ensure that, when a message is resubmitted, it is 
> processed by the same consumer/loader that originally processed it.
> However, if the message is resubmitted during the failover switch, we have 
> noticed that it is not sent to the right consumer as it should be. 
> Basically, the first available consumer is used, which is not what we want.
> I have searched for configuration changes but couldn't find any relevant 
> information.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
