[jira] [Commented] (ARTEMIS-4276) Message Group does not replicate properly during failover

Justin Bertram (Jira) Thu, 18 May 2023 10:01:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARTEMIS-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723979#comment-17723979
 ]


Justin Bertram commented on ARTEMIS-4276:
-----------------------------------------

I think you've misunderstood much of what I wrote. Here are some additional 
comments and clarifications...

bq. Our plan during migration from Classic ActiveMQ to Artemis is to modify as 
little as possible the source code to reduce the regression impact.

Fair enough.

bq. Our software is C++ code based and we are using CMS API (ActiveMQ CPP) as a 
client.

The CMS API was originally based on JMS 1.1, and I don't believe it has been 
updated since JMS 2 was released in 2013. Therefore, I wouldn't expect it to 
have the methods for creating a shared subscription.

bq. We do not want to process the same message in more than one group (please 
correct me if I am wrong)...

The whole point of sharing a subscription between multiple consumers is to 
ensure that each message is delivered to only one of those consumers rather 
than to every one of them.

I recommended the move to JMS 2 shared subscriptions assuming you were using a 
JMS client. This would make your code more portable and easier to understand. 
However, since you're using CMS that's obviously out of the question.

bq. ActiveMQ CPP does not have idempotent consumers.

Idempotency is something you, as the application developer, must implement. It 
is not something inherent to the client implementation which you use to 
communicate with the broker (i.e. ActiveMQ CPP).
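
To illustrate, here is a rough sketch of what an idempotent consumer can look 
like when the database primary key is the de-duplication mechanism. It is in 
Python with an in-memory SQLite table purely to keep it short and runnable. The 
table layout and the names {{make_store}} and {{handle_message}} are 
hypothetical and not taken from your application or from any client API.

```python
import sqlite3


def make_store():
    """Create an in-memory table keyed by transaction ID (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (txn_id TEXT PRIMARY KEY, payload TEXT)"
    )
    return conn


def handle_message(conn, txn_id, payload):
    """Process one message.

    Returns True if the message was new and False if it was a redelivery.
    """
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO transactions (txn_id, payload) VALUES (?, ?)",
                (txn_id, payload),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key is the only constraint in this schema, so this means
        # the transaction was already imported. Treat the redelivery as a
        # no-op rather than reporting a business failure.
        return False
```

With this shape it doesn't matter which consumer receives a redelivered 
message or why it was redelivered. The second attempt simply does nothing.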

bq. Indeed the CMS consumer gets restored during failover but the object is not 
recreated so our wrapper is still valid and the cache still stands in this 
context.

The scenario where the primary broker fails and the client switches to the 
backup broker (i.e. "failover") is _not_ what I was describing. The problem I 
was trying to describe is what happens when some kind of failure renders the 
cache invalid. This could happen for any number of reasons, some of which I 
outlined in my previous comment. This is a weakness in the application design 
which will lead to the same problems with duplicate messages as you have when a 
broker failure causes the consumer-group relationship to change.

bq. The synchronization problem between database and JMS Broker is not 
necessarily related to failover or Artemis usage.

Yes, of course. This is a general problem in computing, which is why XA 
transactions were invented in the first place. Their use is certainly not 
restricted to databases and message brokers or even to Java. They are used 
across the industry with many different kinds of resources in many different 
programming languages.

Typically the need for consistency between resources is identified before 
implementation and is part of the fundamental application design. XA is not 
simple and care is needed when fitting all the pieces together.

bq. At the database level we have a protection with primary keys and indeed the 
same transaction cannot be inserted twice.

This seems to flatly contradict what you said in your previous comment, "This 
leads to the same transaction being imported in the database twice..." Please 
clarify.

bq. We just wanted to explore the possibility to have a way of removing these 
"fake" failures caused by failover or somehow to distinguish them from those 
which are real business failures.

The "fake" failures are the result of your application design (i.e. the 
consumers are not idempotent). To be clear, even _if_ the broker maintained the 
consumer-group relationship during failover you'd still have the risk of these 
kinds of "fake" failures in other scenarios.

That said, the client knows when a failover has occurred so it knows that, at 
least for a little while, there is a fair chance of duplicate messages and 
therefore primary key violations on the database. It could either add this 
context to the failure notification to help whoever reads it or it could simply 
ignore the primary key violations for a time.
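
As a rough sketch of that idea, the client could record when it last saw a 
failover and classify any primary key violation that occurs shortly afterwards 
differently. Everything here is an assumption made for illustration: the class 
name, the 60-second default, and the labels. How {{on_failover}} gets called 
depends on whatever failover notification your client library exposes, so that 
wiring is left out. It is in Python only for brevity.

```python
import time


class FailoverAwareReporter:
    """Tags primary key violations that happen soon after a failover."""

    def __init__(self, grace_seconds=60.0, clock=time.monotonic):
        self._grace = grace_seconds    # hypothetical window length
        self._clock = clock            # injectable so it can be tested
        self._last_failover = None     # no failover seen yet

    def on_failover(self):
        """Call this from the client's failover notification (assumed hook)."""
        self._last_failover = self._clock()

    def classify_key_violation(self):
        """Return a label to attach to the failure notification."""
        if self._last_failover is not None:
            elapsed = self._clock() - self._last_failover
            if elapsed <= self._grace:
                return "likely-duplicate-after-failover"
        return "business-failure"
```

The label could be added to the failure notification, or violations tagged as 
likely duplicates could be dropped. Either way this only covers duplicates 
caused by failover, not the other scenarios mentioned above.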

> Message Group does not replicate properly during failover
> ---------------------------------------------------------
>
>                 Key: ARTEMIS-4276
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4276
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.28.0
>            Reporter: Liviu Citu
>            Priority: Major
>
> Hi,
> We are currently migrating our software from Classic to Artemis and we plan 
> to use failover functionality.
> We were using message group functionality by setting *JMSXGroupID* and this 
> was working as expected. However, after a failover switch I noticed that 
> messages are sent to the wrong consumers.
> Our gateway/interface application is actually a collection of servers:
>  * gateway adapter server: receives messages from external systems and 
> puts them on a specific/virtual topic
>  * gateway loader server (can be balanced): picks up the messages from the 
> topic and processes them
>  * gateway fail queue: monitors all messages that failed processing and has a 
> functionality of resubmitting the message (users will correct the processing 
> errors and then resubmit transaction)
> *JMSXGroupID* is used to ensure that when a message is resubmitted it is 
> processed by the same consumer/loader that originally processed it.
> However, if the message is resubmitted during a failover switch, we have 
> noticed that it is not sent to the right consumer. 
> Basically the first available consumer is used, which is not what we want.
> I have searched for configuration changes but couldn't find any relevant 
> information.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
