[ https://issues.apache.org/jira/browse/IGNITE-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004416#comment-16004416 ]
ASF GitHub Bot commented on IGNITE-5125:
----------------------------------------

GitHub user zstan opened a pull request:

    https://github.com/apache/ignite/pull/1922

    IGNITE-5125, improve logging in case of hang while exchange cannot fi…

    …nish.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-5125

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/1922.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1922

----
commit 5587c33c99c3e40d4efca0068a0508c40f8abc5c
Author: Evgeny Stanilovskiy <estanilovs...@gridgain.com>
Date:   2017-05-05T12:35:54Z

    IGNITE-5125, improve logging in case of hang while exchange cannot finish.

----

> Need to improve logging in case of hang
> ---------------------------------------
>
>                 Key: IGNITE-5125
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5125
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Yakov Zhdanov
>            Assignee: Stanilovsky Evgeny
>            Priority: Critical
>             Fix For: 2.1
>
>
> 1. When a cache operation hangs on a node, it is not reported as hung, even though
> the partition map exchange cannot finish.
> {noformat}
> java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00007fdd260fd7c0> (a org.apache.ignite.internal.util.future.GridEmbeddedFuture)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:159)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:119)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter$22.op(GridCacheAdapter.java:2356)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter$22.op(GridCacheAdapter.java:2354)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4168)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put0(GridCacheAdapter.java:2354)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2335)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2312)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.put(IgniteCacheProxy.java:1379)
> {noformat}
> {noformat}
> java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00007fa0698fee50> (a org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:161)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:119)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get0(GridCacheAdapter.java:4570)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4544)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1428)
>     at org.apache.ignite.internal.processors.cache.GridCacheProxyImpl.get(GridCacheProxyImpl.java:329)
>     at org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.getAtomic(DataStructuresProcessor.java:589)
>     at org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.sequence(DataStructuresProcessor.java:397)
> {noformat}
> 2. The partition exchange future dumps objects only a limited number of times. I
> would suggest switching to a mode where the delay between dumps doubles each
> time, up to a maximum of 30 min.
> 3.
If the exchange worker is stuck at GridDhtPartitionsExchangeFuture.waitPartitionRelease, the unreleased partitions should be reported (the same rules as in item 2 apply).
> {noformat}
> "exchange-worker-#93%...%" #143 prio=5 os_prio=0 tid=0x00007fd782df3000 nid=0x1526 waiting on condition [0x00007fc2dc9c5000]
> java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00007fcab30006b8> (a org.apache.ignite.internal.util.future.GridCompoundFuture)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:189)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitPartitionRelease(GridDhtPartitionsExchangeFuture.java:980)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:907)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:617)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1701)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
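
Item 2 of the issue proposes doubling the delay between dumps each time, capped at 30 minutes. A minimal sketch of that schedule follows. It is not taken from the pull request: the class name DumpBackoff is hypothetical (it is not an Ignite class), and the 10-second initial delay is an assumption, since the issue does not state a starting value.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Ignite): yields the delay to wait before
 * each diagnostic dump, doubling it after every call up to a fixed cap.
 */
public class DumpBackoff {
    private final long maxDelayMs;
    private long nextDelayMs;

    public DumpBackoff(long initialDelayMs, long maxDelayMs) {
        if (initialDelayMs <= 0 || maxDelayMs < initialDelayMs)
            throw new IllegalArgumentException("Expected 0 < initialDelayMs <= maxDelayMs");

        this.nextDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Returns the current delay, then doubles the stored delay for the next call (capped). */
    public long nextDelayMs() {
        long cur = nextDelayMs;

        // Compare against half of the cap so the doubling itself can never overflow.
        nextDelayMs = cur > maxDelayMs / 2 ? maxDelayMs : cur * 2;

        return cur;
    }

    public static void main(String[] args) {
        // Assumed values: 10 s initial delay, 30 min cap as suggested in the issue.
        DumpBackoff backoff = new DumpBackoff(
            TimeUnit.SECONDS.toMillis(10), TimeUnit.MINUTES.toMillis(30));

        for (int i = 1; i <= 10; i++)
            System.out.println("dump #" + i + " after " + backoff.nextDelayMs() + " ms");
    }
}
```

With these assumed values the delay reaches the 30-minute cap on the ninth dump and stays there, so a long hang keeps producing diagnostics without flooding the log.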