[ https://issues.apache.org/jira/browse/IGNITE-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360722#comment-16360722 ]

Alexey Goncharuk commented on IGNITE-6113:
------------------------------------------

Pavel,

1) The code around clearFuture looks suspicious to me: some of the
clearFuture.onDone() calls are synchronized, some are not. Also, note that
there is a clearFuture.listen(), and the listener may be called either inside
the synchronized block or outside of it (if the listener is invoked from
another thread). In the latter case, the reset() call is likely not
synchronized with the invocation of the listener chain.
2) Waiting for partitions to be cleared is done synchronously in the system
pool; we must avoid this. In the best case this will lead to a significant
performance drop, in the worst case to a deadlock. The wait should be
asynchronous. We should probably also report some sort of partition clearing
progress (or at least have a metric/MBean indicating that rebalancing won't
start because we are waiting for these partitions to be cleared).
3) There is a suspicious getter, remaining(), in GridDhtPartitionDemander: the
method is synchronized, but it returns a reference to a map. What if the map
changes afterwards?
4) Please add a specific test that reproduces the absence of PME (partition
map exchange) while asynchronous eviction is in progress. We should also add
tests for the following partition state transitions, each with an optional
node crash at every transition:
   MOVING->RENTING->MOVING->OWNING
   RENTING->MOVING->RENTING->EVICTED
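For points 1 and 3, here is a minimal sketch of the invariants being asked for, assuming a simplified stand-in for the partition clear future rather than Ignite's real GridDhtLocalPartition or GridDhtPartitionDemander code. Completion, listener registration and reset() all take the same lock; listeners run outside the lock from a snapshot taken under it; a generation counter (a hypothetical design choice) keeps a stale onDone() from completing a newer clearing round; and the synchronized getter returns a copy instead of the live map. `ClearFutureSketch` and `DemanderStateSketch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a resettable partition clear future (point 1). */
class ClearFutureSketch {
    private final Object mux = new Object();
    private boolean done;
    private int generation; // bumped by reset() so a stale onDone() cannot complete a newer round
    private final List<Runnable> listeners = new ArrayList<>();

    /** Registers a listener, or runs it immediately if the future is already done. */
    void listen(Runnable lsnr) {
        boolean runNow;
        synchronized (mux) {
            runNow = done;
            if (!runNow)
                listeners.add(lsnr);
        }
        if (runNow)
            lsnr.run(); // never run listener code while holding mux
    }

    /** Completes the given generation; returns false if it is stale or already done. */
    boolean onDone(int gen) {
        List<Runnable> toRun;
        synchronized (mux) {
            if (done || gen != generation)
                return false;
            done = true;
            toRun = new ArrayList<>(listeners); // snapshot taken under the same lock as reset()
            listeners.clear();
        }
        for (Runnable r : toRun)
            r.run();
        return true;
    }

    /** Starts a new clearing round; listeners that are still pending keep waiting for it. */
    int reset() {
        synchronized (mux) {
            done = false;
            return ++generation;
        }
    }

    boolean isDone() {
        synchronized (mux) {
            return done;
        }
    }
}

/** Hypothetical demander state with a copying getter (point 3). */
class DemanderStateSketch {
    private final Map<Integer, Long> remaining = new HashMap<>();

    synchronized void put(int part, long cnt) {
        remaining.put(part, cnt);
    }

    synchronized void remove(int part) {
        remaining.remove(part);
    }

    /** Returns a read-only snapshot, so later updates cannot leak into callers. */
    synchronized Map<Integer, Long> remaining() {
        return Collections.unmodifiableMap(new HashMap<>(remaining));
    }
}
```

The generation counter is only one option; another is to replace the future object on every reset(), so that late completions hit an abandoned instance instead of the current one.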

> Partition eviction prevents exchange from completion
> ----------------------------------------------------
>
>                 Key: IGNITE-6113
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6113
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Vladislav Pyatkov
>            Assignee: Alexey Goncharuk
>            Priority: Major
>
> I have waited 3 hours for the exchange to complete, without success.
> The exchange-worker thread is blocked:
> {noformat}
> "exchange-worker-#92%DPL_GRID%grid554.ca.sbrf.ru%" #173 prio=5 os_prio=0 tid=0x00007f0835c2e000 nid=0xb907 runnable [0x00007e74ab1d0000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007efee630a7c0> (a org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition$1)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:189)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.assign(GridDhtPreloader.java:340)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1801)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> {noformat}
> "sys-#124%DPL_GRID%grid554.ca.sbrf.ru%" #278 prio=5 os_prio=0 tid=0x00007e731c02d000 nid=0xbf4d runnable [0x00007e734e7f7000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:51)
>         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
>         - locked <0x00007f056161bf88> (a java.lang.Object)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.writeBuffer(FileWriteAheadLogManager.java:1829)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.flush(FileWriteAheadLogManager.java:1572)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:1421)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.access$800(FileWriteAheadLogManager.java:1331)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:339)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.beforeReleaseWrite(PageMemoryImpl.java:1287)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1142)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageImpl.releaseWrite(PageImpl.java:167)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writeUnlock(PageHandler.java:193)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:242)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:119)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.doRemoveFromLeaf(BPlusTree.java:2886)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.removeFromLeaf(BPlusTree.java:2865)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.access$6900(BPlusTree.java:2515)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1607)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.doRemove(BPlusTree.java:1481)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.remove(BPlusTree.java:1451)
>         at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.remove(H2TreeIndex.java:307)
>         at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.doUpdate(GridH2Table.java:637)
>         at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:517)
>         at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:664)
>         at org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:1186)
>         at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:467)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1090)
>         at org.gridgain.grid.cache.db.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:993)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:357)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3621)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:599)
>         - locked <0x00007f054d45bad8> (a org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCacheEntry)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:956)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:793)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:856)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:843)
>         at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6660)
>         at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:925)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
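In the dumps above, the exchange worker is parked in GridFutureAdapter.get() inside GridDhtPreloader.assign() while a sys-pool thread is still inside GridDhtLocalPartition.tryEvict(). Below is a hedged sketch of the asynchronous alternative from point 2. It uses java.util.concurrent.CompletableFuture as a stand-in for Ignite's internal futures: instead of blocking, the caller chains the rebalance start onto the completion of all pending clear futures, and it exposes the number of partitions still being cleared as a simple counter that could back a metric or MBean. `AsyncClearWaitSketch`, `awaitClearing` and `pendingClears` are hypothetical names, not Ignite APIs.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical non-blocking wait for partition clearing before rebalancing. */
class AsyncClearWaitSketch {
    /** Number of partitions whose clearing has not finished yet. */
    private final AtomicInteger pendingClears = new AtomicInteger();

    int pendingClears() {
        return pendingClears.get();
    }

    /**
     * Returns immediately instead of blocking on each clear future.
     * startRebalance runs once every clear future has completed normally;
     * if any of them fails, the returned future completes exceptionally
     * and startRebalance is not run.
     */
    CompletableFuture<Void> awaitClearing(List<CompletableFuture<Void>> clearFuts, Runnable startRebalance) {
        pendingClears.addAndGet(clearFuts.size());

        // Decrement the counter as each partition finishes, successfully or not.
        for (CompletableFuture<Void> f : clearFuts)
            f.whenComplete((res, err) -> pendingClears.decrementAndGet());

        return CompletableFuture.allOf(clearFuts.toArray(new CompletableFuture<?>[0]))
            .thenRun(startRebalance);
    }
}
```

With thenRun, whichever thread completes the last clear future also runs startRebalance. A real implementation would likely use thenRunAsync with an explicit Executor so that the eviction thread is not reused for rebalancing.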



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
