[ 
https://issues.apache.org/jira/browse/IGNITE-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712295#comment-17712295
 ] 

Aleksey Plekhanov commented on IGNITE-19115:
--------------------------------------------

Cherry-picked to 2.15

> Possible deadlock in handling pending cache messages when the cache is 
> recreated
> --------------------------------------------------------------------------------
>
>                 Key: IGNITE-19115
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19115
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.14
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>             Fix For: 2.15
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Let's consider the following scenario:
>   Precondition:
>     There is a cluster of two server nodes (node A, the coordinator, and node B) 
> and an atomic cache that resides on those nodes.
>     The current topology version is (x, y).
> Node B initiates putting a new key-value pair into the atomic cache. Let's 
> assume the primary partition that the key belongs to resides on node A.
> The previous step requires acquiring a gateway lock for the corresponding 
> cache (GridCacheGateway read lock) and registering 
> GridNearAtomicSingleUpdateFuture in the MVCC manager. It is important to 
> note that the cache future does not acquire the topology lock and so should 
> not block PME.
> Concurrently, node A initiates destroying the cache. The corresponding PME 
> completes successfully on the coordinator node but is blocked on node B 
> because the gateway lock is already held there:
> {noformat}
> Thread [name="sys-#105%dht.IgniteCacheRecreateTest1%", id=123, state=TIMED_WAITING, blockCnt=0, waitCnt=350]
>         at java.lang.Thread.sleep(Native Method)
>         at o.a.i.i.util.IgniteUtils.sleep(IgniteUtils.java:8316)
>         at o.a.i.i.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:324)
>         at o.a.i.i.processors.cache.GridCacheProcessor.stopGateway(GridCacheProcessor.java:2582)
>         at o.a.i.i.processors.cache.GridCacheProcessor.lambda$processCacheStopRequestOnExchangeDone$1c59e5cf$1(GridCacheProcessor.java:2776)
>         at o.a.i.i.processors.cache.GridCacheProcessor$$Lambda$714/770930142.apply(Unknown Source)
>         at o.a.i.i.util.IgniteUtils.doInParallel(IgniteUtils.java:11628)
>         at o.a.i.i.util.IgniteUtils.doInParallel(IgniteUtils.java:11530)
>         at o.a.i.i.processors.cache.GridCacheProcessor.processCacheStopRequestOnExchangeDone(GridCacheProcessor.java:2755)
>         at o.a.i.i.processors.cache.GridCacheProcessor.onExchangeDone(GridCacheProcessor.java:2945)
>         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2528)
>         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:4785)
>         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$1500(GridDhtPartitionsExchangeFuture.java:161)
>         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4453)
>         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4441)
>         at o.a.i.i.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:464)
>         at o.a.i.i.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:355)
>         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:4441)
>         at o.a.i.i.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:1991)
>         at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:469)
>         at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:454)
>         at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3765)
>         at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3744)
>         at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1151)
>         at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:592)
>         at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
>         at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
>         at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:110)
>         at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:309)
>         at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
>         at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
>         at o.a.i.i.managers.communication.GridIoManager.access$5300(GridIoManager.java:243)
>         at o.a.i.i.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
>         at o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750)
> {noformat}
> Node A initiates creating a new cache with the same name as the previously 
> destroyed one.
> Node A receives the cache update message but cannot process it, because a 
> new cache (a cache with the same cacheId) is starting, so processing of 
> this message has to be postponed until PME is completed. (In this case a 
> GridDhtForceKeysFuture is created, and the message will not be processed 
> until PME is completed. So, the near node will not receive a response and 
> will not be able to complete the previous exchange future. See IGNITE-10251.)
> The new PME on node B cannot proceed further because of 3.
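
The circular wait described in the quoted scenario can be reproduced in a
toy model. The sketch below is only an illustration under stated assumptions:
GatewayDeadlockSketch and stopCompletes are hypothetical names, not Ignite
APIs; a ReentrantReadWriteLock stands in for the cache gateway, and a
CountDownLatch stands in for the update response from node A. The polling
loop imitates the sleep seen under GridCacheGateway.onStopped in the stack
trace; it is not a copy of that code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of the circular wait. None of these names are Ignite APIs. */
final class GatewayDeadlockSketch {
    /**
     * @param responsePostponed true if node A postpones the update until the next exchange completes.
     * @param timeoutMs how long the simulated cache stop polls for the gateway write lock.
     * @return true if the simulated cache stop acquired the write lock within the timeout.
     */
    static boolean stopCompletes(boolean responsePostponed, long timeoutMs) {
        ReentrantReadWriteLock gateway = new ReentrantReadWriteLock(); // stand-in for the cache gateway
        CountDownLatch readLockHeld = new CountDownLatch(1);
        CountDownLatch response = new CountDownLatch(1);               // stand-in for node A's reply

        // Node B, user thread: the update holds the gateway read lock until the reply arrives.
        Thread update = new Thread(() -> {
            gateway.readLock().lock();
            try {
                readLockHeld.countDown();
                response.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                gateway.readLock().unlock();
            }
        });
        update.setDaemon(true);
        update.start();

        try {
            readLockHeld.await();

            // Node A: reply immediately unless processing is postponed until the next exchange.
            if (!responsePostponed)
                response.countDown();

            // Node B, exchange thread: poll for the write lock, sleeping between attempts.
            boolean stopped = false;
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (System.nanoTime() < deadline) {
                if (gateway.writeLock().tryLock()) {
                    gateway.writeLock().unlock();
                    stopped = true;
                    break;
                }
                Thread.sleep(10);
            }

            // Clean up the model: release the update thread regardless of the outcome.
            response.countDown();
            update.join();
            return stopped;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("postponed response, stop completes: " + stopCompletes(true, 200));
        System.out.println("immediate response, stop completes: " + stopCompletes(false, 2000));
    }
}
```

In this model the stop can only finish once the update releases the read
lock, and the update can only finish once the reply arrives. When the reply
is itself gated on a later exchange, the stop polls until the timeout, which
mirrors the blocked exchange thread in the stack trace.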



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
