[ https://issues.apache.org/jira/browse/IGNITE-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360722#comment-16360722 ]
Alexey Goncharuk commented on IGNITE-6113:
------------------------------------------

Pavel,

1) The code around clearFuture looks suspicious to me: some of the clearFuture.onDone() calls are synchronized, some are not. Also, note that there is a clearFuture.listen(), and the listener may be called either inside the sync block or outside it (if the listener is invoked from another thread). In this case, the reset() call is likely unsynchronized with the listener chain invocation.

2) The partition clear await is synchronous in the system pool; we must avoid this. In the best case this will lead to a significant performance drop, in the worst case to a deadlock. The wait should be asynchronous. We should probably also report some sort of partition clear progress (or at least have a metric/mbean indicating that rebalancing won't start because we are waiting for these partitions).

3) There is a suspicious getter remaining() in GridDhtPartitionDemander: the method is synchronized, but it returns a reference to a map. What if the map changes afterwards?

4) Please add a specific test which will reproduce the absence of PME when async eviction is happening. Also, we should add tests for the following partition state transitions:
   MOVING->RENTING->MOVING->OWNING (add an optional node crash for each transition)
   RENTING->MOVING->RENTING->EVICTED (add an optional node crash for each transition)

> Partition eviction prevents exchange from completion
> ----------------------------------------------------
>
>                 Key: IGNITE-6113
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6113
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Vladislav Pyatkov
>            Assignee: Alexey Goncharuk
>            Priority: Major
>
> I have waited for 3 hours for completion without any success.
> exchange-worker is blocked.
> {noformat}
> "exchange-worker-#92%DPL_GRID%grid554.ca.sbrf.ru%" #173 prio=5 os_prio=0 tid=0x00007f0835c2e000 nid=0xb907 runnable [0x00007e74ab1d0000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00007efee630a7c0> (a org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition$1)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:189)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.assign(GridDhtPreloader.java:340)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1801)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> {noformat}
> "sys-#124%DPL_GRID%grid554.ca.sbrf.ru%" #278 prio=5 os_prio=0 tid=0x00007e731c02d000 nid=0xbf4d runnable [0x00007e734e7f7000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:51)
>         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
>         - locked <0x00007f056161bf88> (a java.lang.Object)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.writeBuffer(FileWriteAheadLogManager.java:1829)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.flush(FileWriteAheadLogManager.java:1572)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:1421)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.access$800(FileWriteAheadLogManager.java:1331)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:339)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.beforeReleaseWrite(PageMemoryImpl.java:1287)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1142)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageImpl.releaseWrite(PageImpl.java:167)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writeUnlock(PageHandler.java:193)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:242)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:119)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.doRemoveFromLeaf(BPlusTree.java:2886)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.removeFromLeaf(BPlusTree.java:2865)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.access$6900(BPlusTree.java:2515)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1607)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.doRemove(BPlusTree.java:1481)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.remove(BPlusTree.java:1451)
>         at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.remove(H2TreeIndex.java:307)
>         at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.doUpdate(GridH2Table.java:637)
>         at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:517)
>         at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:664)
>         at org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:1186)
>         at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:467)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1090)
>         at org.gridgain.grid.cache.db.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:993)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:357)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3621)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:599)
>         - locked <0x00007f054d45bad8> (a org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCacheEntry)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:956)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:793)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:856)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:843)
>         at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6660)
>         at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:925)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
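
To illustrate point 2 of the comment: the exchange-worker trace shows the thread parked in GridFutureAdapter.get(), called from GridDhtPreloader.assign(), while a sys-pool thread is still clearing the partition. Below is a minimal sketch of the asynchronous alternative. It uses the JDK's CompletableFuture as a stand-in for the Ignite future, and every name in it (AsyncPartitionClearSketch, clearFut, waitingForClear, assign, onPartitionCleared) is hypothetical and not taken from the patch under review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch: start rebalancing from a listener instead of blocking on get(). */
final class AsyncPartitionClearSketch {
    /** Events in the order they happen, so the non-blocking behaviour is visible. */
    final List<String> events = new ArrayList<>();

    /** Stand-in for the partition clear future (assumption: not the real Ignite type). */
    final CompletableFuture<Void> clearFut = new CompletableFuture<>();

    /** True while rebalancing is postponed; a flag like this could back a metric/mbean. */
    volatile boolean waitingForClear;

    /** Called by the exchange thread: registers a continuation and returns immediately. */
    void assign() {
        waitingForClear = true;

        // No clearFut.get() here: the caller is never parked waiting for eviction.
        clearFut.thenRun(() -> {
            waitingForClear = false;
            events.add("rebalance-started");
        });

        events.add("assign-returned");
    }

    /** Called by the eviction thread once the partition has been cleared. */
    void onPartitionCleared() {
        events.add("partition-cleared");
        clearFut.complete(null); // Runs the continuation registered in assign().
    }

    public static void main(String[] args) {
        AsyncPartitionClearSketch s = new AsyncPartitionClearSketch();

        s.assign();
        System.out.println(s.waitingForClear); // prints true

        s.onPartitionCleared();
        System.out.println(s.events); // prints [assign-returned, partition-cleared, rebalance-started]
    }
}
```

In this sketch the continuation runs on whichever thread completes the future, so real code would still need to decide which pool starts rebalancing and how the listener is synchronized with reset() (point 1).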