[ 
https://issues.apache.org/jira/browse/ARTEMIS-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nmeylan updated ARTEMIS-4794:
-----------------------------
    Description: 
+Attached test *BridgeDuplicateMessagesARTEMIS4794Test.java*+ highlights the 
issue with _org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl_

Place it under 
_tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/cluster/bridge_

{*}Summary{*}:
     When a bridge is stopped while messages are being consumed by the target 
node, duplicate messages can result.

{*}Description{*}:
    When a bridge is *stopped* programmatically while messages are being 
consumed by the target node, the source node fails to receive the 
acknowledgement from the target node, and the messages then exist on both the 
source and the target node.

It appears that setting the "active" flag to false when 
BridgeImpl.StopRunnable runs prevents messages from being acknowledged by the 
_BridgeImpl::sendAcknowledged_ function.
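
The suspected mechanism can be illustrated with a toy model. This is only a sketch: {{ToyBridge}} and its fields are hypothetical stand-ins written for this example, not the real BridgeImpl code, and the assumption that a late confirmation is simply ignored once the bridge is inactive comes from the observation above.

```java
import java.util.ArrayList;
import java.util.List;

public class DroppedAckSketch {

    // Hypothetical stand-in for a bridge; NOT the real BridgeImpl.
    static class ToyBridge {
        volatile boolean active = true;
        final List<String> sourceQueue = new ArrayList<>(); // messages still held by the source
        final List<String> targetQueue = new ArrayList<>(); // messages received by the target

        // The source keeps the message until the target confirms it.
        void forward(String msg) {
            targetQueue.add(msg);
        }

        void stop() {
            active = false;
        }

        // Assumed behaviour: a confirmation arriving after stop() is ignored.
        void sendAcknowledged(String msg) {
            if (!active) {
                return;
            }
            sourceQueue.remove(msg);
        }
    }

    // Returns the messages that end up on both the source and the target.
    static List<String> duplicatesAfterStop() {
        ToyBridge bridge = new ToyBridge();
        bridge.sourceQueue.add("m1");
        bridge.forward("m1");          // message is in flight to the target
        bridge.stop();                 // bridge stopped before the confirmation arrives
        bridge.sendAcknowledged("m1"); // late confirmation is dropped
        List<String> duplicates = new ArrayList<>(bridge.sourceQueue);
        duplicates.retainAll(bridge.targetQueue);
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(duplicatesAfterStop()); // prints [m1]
    }
}
```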

 

{*}Context{*}:
This bug appears in my code (a custom plugin) because I start and stop the 
bridge programmatically to move messages from one node to another when certain 
conditions are met. When they are no longer met, I want to stop moving messages.
 

*Notes:*
 * Changing the bridge configuration parameters 
{_}useDuplicateDetection{_}, {_}confirmationWindowSize{_} or 
_producerWindowSize_ does not mitigate the issue.
 * Not related to large messages; I use large messages in my test only to make 
reproduction easier.
 * Reproduced on 2.30 and 2.34.
 * Calling pause() does not create duplicates: 
{_}server.getClusterManager().getBridges().get(bridgeName).pause(){_};

 

*Resolution:*
Maybe _StopRunnable::run_ should wait until _queue.getDeliveringCount() == 0_ 
after removing the consumer but before going further in the stop process.
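
A minimal sketch of such a wait, assuming a bounded polling loop is acceptable. The class and method names, the {{IntSupplier}} standing in for _queue.getDeliveringCount()_, and the timeout values are all hypothetical choices for this example, not existing Artemis code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class DrainWaitSketch {

    // Polls the supplied count until it reaches zero or the timeout expires.
    // Returns true if the count drained in time, false on timeout or interrupt.
    static boolean awaitDrained(IntSupplier deliveringCount, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (deliveringCount.getAsInt() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up rather than block the stop forever
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated in-flight messages that get acknowledged one by one.
        AtomicInteger inFlight = new AtomicInteger(3);
        Thread acker = new Thread(() -> {
            while (inFlight.get() > 0) {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    return;
                }
                inFlight.decrementAndGet();
            }
        });
        acker.start();
        System.out.println("drained: " + awaitDrained(inFlight::get, 2000));
    }
}
```

The timeout matters here: if the target never confirms, an unbounded wait would hang the stop process.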

 

 

*UPDATE:* When using pause instead of stop in the above scenario, messages 
become undeliverable.

{*}Summary{*}:
When a bridge is paused while *large* messages are being consumed by the target 
node, messages can become undeliverable to consumers.

{*}Description{*}:
When a bridge is paused programmatically while large messages are being 
delivered to the target node, the task created in 
_BridgeImpl::deliverLargeMessage_ is not awaited. The bridge is paused, and 
only then does the Runnable of deliverLargeMessage run. Because the consumer 
has already been removed, the message cannot be consumed by the target node, 
and it will not be delivered to new consumers either.

{*}Notes{*}:
 * PauseRunnable does not wait for tasks in {{executor}} to complete, and 
{{deliverLargeMessage}} does create a task in {{executor}}.
 ** We can see that deliverLargeMessage's task still runs even after 
PauseRunnable has completed.
 * If I call {{bridge1.onCreditsFlow(true, null);}} to set the 
{{blockedOnFlowControl}} flag to true before calling pause, it prevents new 
tasks from being put on the executor and mitigates the issue, but it feels 
weird, and I think there might still be a race condition.
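
The ordering problem described in these notes can be reproduced with plain java.util.concurrent. The sketch below is not Artemis code: the single-thread executor, the event names and the "flush" step (submitting an empty task and waiting for it) are assumptions made for illustration. It shows that a pause which does not drain the executor can complete before an already queued task runs, while draining first restores the expected order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PauseFlushSketch {

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns the order in which the queued task and the pause completed.
    static List<String> run(boolean flushBeforePauseCompletes) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<String> events = new CopyOnWriteArrayList<>();
        try {
            // Stand-in for the task queued by a large message delivery.
            executor.execute(() -> {
                sleepQuietly(100);
                events.add("large message task ran");
            });
            if (flushBeforePauseCompletes) {
                // A single-thread executor runs tasks in FIFO order, so waiting
                // for this empty task means every earlier task has finished.
                executor.submit(() -> { }).get(5, TimeUnit.SECONDS);
            }
            events.add("pause complete");
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println("without flush: " + run(false));
        System.out.println("with flush:    " + run(true));
    }
}
```

Without the flush, "pause complete" is recorded before the queued task runs, which matches the behaviour observed above; with the flush, the task finishes first.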


> CoreBridge: Duplicate message when bridge is stopped while messages being 
> consumed by target node
> -------------------------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-4794
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4794
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 2.30.0, 2.34.0
>            Reporter: nmeylan
>            Priority: Major
>         Attachments: BridgeARTEMIS4794Test.java, 
> message-not-deliverable.log.txt
>
>


