[ 
https://issues.apache.org/jira/browse/IGNITE-27109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Petrov updated IGNITE-27109:
------------------------------------
    Ignite Flags: Release Notes Required  (was: Docs Required,Release Notes 
Required)

> IgniteCache#putAll may silently lose entries while any node is leaving the 
> cluster
> ----------------------------------------------------------------------------------
>
>                 Key: IGNITE-27109
>                 URL: https://issues.apache.org/jira/browse/IGNITE-27109
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Petrov
>            Assignee: Mikhail Petrov
>            Priority: Major
>              Labels: ise
>
> An IgniteCache#putAll call may succeed even though some of the specified 
> entries are not stored in the cache. This may happen for ATOMIC caches when a 
> node leaves the cluster during IgniteCache#putAll execution. Even though 
> putAll can partially fail for atomic caches, the user should still get a 
> CachePartialUpdateException.
> The problem is reproduced by the ReliabilityTest.testFailover test. Cache 
> configuration: ATOMIC, REPLICATED, FULL_SYNC
> See: 
> https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-8360567487297938069&tab=testDetails&branch_IgniteTests24Java8=%3Cdefault%3E
> Explanation :
> Consider a cluster with 3 nodes: node0, node1, node2.
> 1. node0 accepts putAll request, maps all keys to corresponding primary nodes 
> and sends GridNearAtomicFullUpdateRequest to node1 and node2.
> 2. node1 starts processing cache entries. Halfway through this process node1 
> receives a stop signal (Ignite#close). All remaining attempts to process cache 
> entries fail with an exception; see 
> IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#invoke and 
> IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#operationCancelledException.
> 3. node1 manages to send GridDhtAtomicUpdateRequest with all processed 
> entries to backups - node2 and node0.
> 4. node1 fails to send GridNearAtomicUpdateResponse with failed keys to node0 
> because NIO was stopped. This message is an indication to the "near" node 
> that some keys could not be processed and the operation should be terminated 
> with an exception.
> 5. node0 and node2 process the entries from the GridDhtAtomicUpdateRequest 
> messages and send GridDhtAtomicNearResponse messages to node0.
> 6. node1 is removed from the cluster.
> 7. node0 gets an event that node1 (the primary node for some keys) has left 
> the cluster, but it has already received GridDhtAtomicNearResponse messages 
> from all backups. So node0 does nothing and eventually completes the putAll 
> operation successfully.
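
The sequence above can be replayed with a small toy model. This is only a sketch under stated assumptions: PutAllRaceModel, replay and Result are hypothetical names, none of this is Ignite code, and the completion rule simply restates step 7 as described (node0 completes the operation once all backup responses have arrived, unless it was told about failed keys).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy replay of steps 1-7 for the keys whose primary is node1. Hypothetical model, not Ignite code. */
final class PutAllRaceModel {
    /** What the putAll caller on node0 is told, and which keys never reached the cache. */
    static final class Result {
        final boolean reportedSuccess;
        final Set<String> missingKeys;

        Result(boolean reportedSuccess, Set<String> missingKeys) {
            this.reportedSuccess = reportedSuccess;
            this.missingKeys = missingKeys;
        }
    }

    static Result replay(List<String> keysOnNode1, int processedBeforeStop, boolean nearResponseDelivered) {
        // Step 2: node1 processes a prefix of its keys; the rest fail after the stop signal.
        Set<String> processed = new LinkedHashSet<>(keysOnNode1.subList(0, processedBeforeStop));
        Set<String> failed = new LinkedHashSet<>(keysOnNode1.subList(processedBeforeStop, keysOnNode1.size()));

        // Steps 3 and 5: backups receive only the processed entries, store them and respond to node0.
        Set<String> storedOnBackups = new LinkedHashSet<>(processed);
        boolean allBackupResponsesReceived = true;

        // Step 4: node0 learns about the failed keys only if the near response gets through.
        boolean failuresReported = nearResponseDelivered && !failed.isEmpty();

        // Steps 6-7: node1 leaves; node0 sees all backup responses and no reported failures.
        boolean reportedSuccess = allBackupResponsesReceived && !failuresReported;

        // Keys that were requested but are stored nowhere in the cluster.
        Set<String> missing = new LinkedHashSet<>(keysOnNode1);
        missing.removeAll(storedOnBackups);
        return new Result(reportedSuccess, missing);
    }

    public static void main(String[] args) {
        // node1 owns four keys, processes two before stopping, and its near response is lost.
        Result r = replay(Arrays.asList("k1", "k2", "k3", "k4"), 2, false);
        System.out.println("reportedSuccess=" + r.reportedSuccess + ", missingKeys=" + r.missingKeys);
    }
}
```

With nearResponseDelivered set to false the model reports success while two keys are missing, which is the silent loss described in this issue. With it set to true the same keys are missing but the failure is reported to the caller.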



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
