[jira] [Created] (IGNITE-8821) Huge logs on BPlusTreeSelfTest put/remove family tests
Dmitriy Sorokin created IGNITE-8821: --- Summary: Huge logs on BPlusTreeSelfTest put/remove family tests Key: IGNITE-8821 URL: https://issues.apache.org/jira/browse/IGNITE-8821 Project: Ignite Issue Type: Test Components: general Reporter: Dmitriy Sorokin Assignee: Dmitriy Sorokin Fix For: 2.6 The printLocks method generates a huge number of ## XX log lines that carry no additional information. Suppressing these uninformative lines is suggested. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8769) JVM crash in Basic1 suite in master branch on TC
[ https://issues.apache.org/jira/browse/IGNITE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515680#comment-16515680 ] Dmitriy Sorokin commented on IGNITE-8769: - [~ivan.glukos], review my patch, please! > JVM crash in Basic1 suite in master branch on TC > > > Key: IGNITE-8769 > URL: https://issues.apache.org/jira/browse/IGNITE-8769 > Project: Ignite > Issue Type: Bug >Reporter: Sergey Chugunov >Assignee: Dmitriy Sorokin >Priority: Blocker > Labels: MakeTeamcityGreenAgain > Fix For: 2.6 > > > Latest build with crash: [TC > link|https://ci.ignite.apache.org/viewLog.html?buildId=1373991&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_Basic1] > There is another crash in the history: [TC > link|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Basic1&branch_IgniteTests24Java8=%3Cdefault%3E&tab=buildTypeStatusDiv] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8749) Exception for "no space left" situation should be propagated to FailureHandler
[ https://issues.apache.org/jira/browse/IGNITE-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513761#comment-16513761 ] Dmitriy Sorokin commented on IGNITE-8749: - [~agura], review my patch, please! > Exception for "no space left" situation should be propagated to FailureHandler > -- > > Key: IGNITE-8749 > URL: https://issues.apache.org/jira/browse/IGNITE-8749 > Project: Ignite > Issue Type: Improvement > Components: persistence >Reporter: Sergey Chugunov >Assignee: Dmitriy Sorokin >Priority: Major > Fix For: 2.6 > > > Currently, if a "no space left" situation is detected in the > FileWriteAheadLogManager#formatFile method and the corresponding exception is > thrown, the exception is not propagated to FailureHandler and the node > continues working. > As "no space left" is a critical situation, the corresponding exception should be > propagated to the handler so that it can take the necessary actions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
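The propagation requested in IGNITE-8749 can be illustrated with a minimal, self-contained sketch. It is not the actual patch: SketchFailureHandler and WalFormatSketch are hypothetical stand-ins (they are not Ignite's FailureHandler or FileWriteAheadLogManager), and the "no space left" condition is simulated by comparing two numbers. The only point shown is that the exception is reported to a handler before it is rethrown, so the caller can no longer ignore it silently.

```java
import java.io.IOException;

/** Hypothetical stand-in for a failure handler; NOT the Ignite interface. */
interface SketchFailureHandler {
    /** Called with a failure type label and the error that caused it. */
    boolean onFailure(String type, Throwable err);
}

/** Hypothetical WAL formatter that reports I/O failures to the handler. */
class WalFormatSketch {
    private final SketchFailureHandler hnd;

    WalFormatSketch(SketchFailureHandler hnd) {
        this.hnd = hnd;
    }

    /** Simulates formatting a segment of segmentSize bytes with freeBytes available. */
    void formatFile(long segmentSize, long freeBytes) throws IOException {
        try {
            if (segmentSize > freeBytes)
                throw new IOException("No space left on device");
        }
        catch (IOException e) {
            // Assumption: the failure is reported first, then rethrown to the caller.
            hnd.onFailure("CRITICAL_ERROR", e);

            throw e;
        }
    }
}
```

With a handler that records its calls, formatFile(64, 16) both notifies the handler and throws, while formatFile(16, 64) does neither.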
[jira] [Assigned] (IGNITE-8769) JVM crash in Basic1 suite in master branch on TC
[ https://issues.apache.org/jira/browse/IGNITE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-8769: --- Assignee: Dmitriy Sorokin > JVM crash in Basic1 suite in master branch on TC > > > Key: IGNITE-8769 > URL: https://issues.apache.org/jira/browse/IGNITE-8769 > Project: Ignite > Issue Type: Bug >Reporter: Sergey Chugunov >Assignee: Dmitriy Sorokin >Priority: Blocker > Labels: MakeTeamcityGreenAgain > Fix For: 2.6 > > > Latest build with crash: [TC > link|https://ci.ignite.apache.org/viewLog.html?buildId=1373991&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_Basic1] > There is another crash in the history: [TC > link|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Basic1&branch_IgniteTests24Java8=%3Cdefault%3E&tab=buildTypeStatusDiv] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-8749) Exception for "no space left" situation should be propagated to FailureHandler
[ https://issues.apache.org/jira/browse/IGNITE-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-8749: --- Assignee: Dmitriy Sorokin (was: Andrey Gura) > Exception for "no space left" situation should be propagated to FailureHandler > -- > > Key: IGNITE-8749 > URL: https://issues.apache.org/jira/browse/IGNITE-8749 > Project: Ignite > Issue Type: Improvement > Components: persistence >Reporter: Sergey Chugunov >Assignee: Dmitriy Sorokin >Priority: Major > Fix For: 2.6 > > > Currently, if a "no space left" situation is detected in the > FileWriteAheadLogManager#formatFile method and the corresponding exception is > thrown, the exception is not propagated to FailureHandler and the node > continues working. > As "no space left" is a critical situation, the corresponding exception should be > propagated to the handler so that it can take the necessary actions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-8742) Direct IO 2 suite is timed out by 'out of disk space' failure emulation test: WAL manager failure does not stoped execution
[ https://issues.apache.org/jira/browse/IGNITE-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-8742: --- Assignee: Dmitriy Sorokin > Direct IO 2 suite is timed out by 'out of disk space' failure emulation test: > WAL manager failure does not stoped execution > --- > > Key: IGNITE-8742 > URL: https://issues.apache.org/jira/browse/IGNITE-8742 > Project: Ignite > Issue Type: Test > Components: persistence >Reporter: Dmitriy Pavlov >Assignee: Dmitriy Sorokin >Priority: Major > Labels: MakeTeamcityGreenAgain > > https://ci.ignite.apache.org/viewLog.html?buildId=1366882&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_PdsDirectIo2 > Test > org.apache.ignite.internal.processors.cache.persistence.IgniteNativeIoWalFlushFsyncSelfTest#testFailAfterStart > emulates problem with disc space using exception. > In direct IO environment real IO with disk is performed, tmpfs is not used. > Sometimes this error can come from rollover() of segment, failure handler > reacted accordingly. > {noformat} > detected. 
Will be handled accordingly to configured handler [hnd=class > o.a.i.failure.StopNodeFailureHandler, failureCtx=FailureContext > [type=CRITICAL_ERROR, err=class o.a.i.i.pagemem.wal.StorageException: Unable > to write]] > class org.apache.ignite.internal.pagemem.wal.StorageException: Unable to write > at > org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager$FileWriteHandle.writeBuffer(FsyncModeFileWriteAheadLogManager.java:2964) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager$FileWriteHandle.flush(FsyncModeFileWriteAheadLogManager.java:2640) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager$FileWriteHandle.flush(FsyncModeFileWriteAheadLogManager.java:2572) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager$FileWriteHandle.flushOrWait(FsyncModeFileWriteAheadLogManager.java:2525) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager$FileWriteHandle.close(FsyncModeFileWriteAheadLogManager.java:2795) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager$FileWriteHandle.access$700(FsyncModeFileWriteAheadLogManager.java:2340) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.rollOver(FsyncModeFileWriteAheadLogManager.java:1029) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.log(FsyncModeFileWriteAheadLogManager.java:673) > {noformat} > But test seems to be not able to stop, node stopper thread tries to stop > cache, flush WAL. flush wait for rollover, which will never happen. 
> {noformat} > Thread [name="node-stopper", id=2836, state=WAITING, blockCnt=7, waitCnt=9] > Lock > [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@47f6473, > ownerName=null, ownerId=-1] > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976) > at o.a.i.i.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7473) > at > o.a.i.i.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager$FileWriteHandle.flushOrWait(FsyncModeFileWriteAheadLogManager.java:2546) > at > o.a.i.i.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager$FileWriteHandle.fsync(FsyncModeFileWriteAheadLogManager.java:2750) > at > o.a.i.i.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager$FileWriteHandle.access$2000(FsyncModeFileWriteAheadLogManager.java:2340) > at > o.a.i.i.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.flush(FsyncModeFileWriteAheadLogManager.java:699) > at > o.a.i.i.processors.cache.GridCacheProcessor.stopCache(GridCacheProcessor.java:1243) > at > o.a.i.i.processors.cache.GridCacheProcessor.stopCaches(GridCacheProcessor.java:969) > at > o.a.i.i.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:943) > at o.a.i.i.IgniteKernal.stop0(IgniteKernal.java:2289) > at o.a.i.i.IgniteKernal.stop(IgniteKernal.java:2167) > at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2588) > - locked o.a.i.i.IgnitionEx$IgniteNamedInstance@90f6bfd > at o.a.i.i.IgnitionEx$Igni
[jira] [Commented] (IGNITE-8311) IgniteClientRejoinTest.testClientsReconnectDisabled causes exchange-worker to terminate via NPE
[ https://issues.apache.org/jira/browse/IGNITE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500129#comment-16500129 ] Dmitriy Sorokin commented on IGNITE-8311: - [~agura], review my patch, please! > IgniteClientRejoinTest.testClientsReconnectDisabled causes exchange-worker to > terminate via NPE > --- > > Key: IGNITE-8311 > URL: https://issues.apache.org/jira/browse/IGNITE-8311 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.4 >Reporter: Andrey Kuznetsov >Assignee: Dmitriy Sorokin >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.6 > > > Currently, tests use {{NoOpFailureHandler}} by default, hence this > exchange-worker termination is masked. We are to fix it: test code should not > be able to terminate system-critical thread. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-8311) IgniteClientRejoinTest.testClientsReconnectDisabled causes exchange-worker to terminate via NPE
[ https://issues.apache.org/jira/browse/IGNITE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496627#comment-16496627 ] Dmitriy Sorokin edited comment on IGNITE-8311 at 5/31/18 2:26 PM: -- The root cause of this error is an inconsistent state of GridAffinityAssignmentCache, which arises when the main loop of ExchangeWorker.body0() continues working after an IgniteCheckedException occurs (last catch block at the end of the loop). Proposed solution: rethrow the caught exception, which prevents grid components from entering an inconsistent state. However, some tests that start a grid with an incorrect configuration will then halt the JVM because a critical system error is detected; that issue should be fixed in IGNITE-1094. was (Author: cyberdemon): The root cause of this error is an inconsistent state of GridAffinityAssignmentCache, which arises when the main loop of ExchangeWorker.body0() continues working after an IgniteCheckedException occurs (last catch block at the end of the loop). Proposed solution: rethrow the caught exception, which prevents grid components from entering an inconsistent state. However, some tests that start a grid with an incorrect configuration will then halt the JVM because a critical system error is detected; that issue should be fixed in -IGNITE-1094-. > IgniteClientRejoinTest.testClientsReconnectDisabled causes exchange-worker to > terminate via NPE > --- > > Key: IGNITE-8311 > URL: https://issues.apache.org/jira/browse/IGNITE-8311 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.4 >Reporter: Andrey Kuznetsov >Assignee: Dmitriy Sorokin >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.6 > > > Currently, tests use {{NoOpFailureHandler}} by default, hence this > exchange-worker termination is masked. We are to fix it: test code should not > be able to terminate system-critical thread. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
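The difference between swallowing and rethrowing an exception inside a worker loop, as discussed in the IGNITE-8311 comment above, can be sketched as follows. This is an illustration under stated assumptions, not the code of ExchangeWorker.body0(): SketchCheckedException stands in for IgniteCheckedException, ExchangeLoopSketch and its task names are hypothetical, and a task whose name starts with "bad" simulates a failing exchange step.

```java
import java.util.Queue;

/** Hypothetical checked exception standing in for IgniteCheckedException. */
class SketchCheckedException extends Exception {
    SketchCheckedException(String msg) {
        super(msg);
    }
}

/** Hypothetical worker loop; all names are illustrative only. */
class ExchangeLoopSketch {
    /** Number of tasks that completed successfully. */
    int processed;

    /** Swallowing variant: the loop keeps going after a failed task. */
    void runSwallowing(Queue<String> tasks) {
        while (!tasks.isEmpty()) {
            try {
                handle(tasks.poll());
            }
            catch (SketchCheckedException e) {
                // Ignored: later tasks run on top of possibly inconsistent state.
            }
        }
    }

    /** Rethrowing variant: the first failure terminates the loop. */
    void runRethrowing(Queue<String> tasks) throws SketchCheckedException {
        while (!tasks.isEmpty()) {
            try {
                handle(tasks.poll());
            }
            catch (SketchCheckedException e) {
                // Propagate so that the caller (or a failure handler) can react.
                throw e;
            }
        }
    }

    private void handle(String task) throws SketchCheckedException {
        if (task.startsWith("bad"))
            throw new SketchCheckedException("Failed: " + task);

        processed++;
    }
}
```

Given the tasks "t1", "bad", "t2", the swallowing variant processes both "t1" and "t2", whereas the rethrowing variant stops after "t1" and leaves "t2" in the queue.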
[jira] [Comment Edited] (IGNITE-8311) IgniteClientRejoinTest.testClientsReconnectDisabled causes exchange-worker to terminate via NPE
[ https://issues.apache.org/jira/browse/IGNITE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496627#comment-16496627 ] Dmitriy Sorokin edited comment on IGNITE-8311 at 5/31/18 2:25 PM: -- The root cause of this error is an inconsistent state of GridAffinityAssignmentCache, which arises when the main loop of ExchangeWorker.body0() continues working after an IgniteCheckedException occurs (last catch block at the end of the loop). Proposed solution: rethrow the caught exception, which prevents grid components from entering an inconsistent state. However, some tests that start a grid with an incorrect configuration will then halt the JVM because a critical system error is detected; that issue should be fixed in -IGNITE-1094-. was (Author: cyberdemon): The root cause of this error is an inconsistent state of GridAffinityAssignmentCache, which arises when the main loop of ExchangeWorker.body0() continues working after an IgniteCheckedException occurs (last catch block at the end of the loop). Proposed solution: rethrow the caught exception, which prevents grid components from entering an inconsistent state. However, some tests that start a grid with an incorrect configuration will then halt the JVM because a critical system error is detected; that issue should be fixed in IGNITE-1049. > IgniteClientRejoinTest.testClientsReconnectDisabled causes exchange-worker to > terminate via NPE > --- > > Key: IGNITE-8311 > URL: https://issues.apache.org/jira/browse/IGNITE-8311 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.4 >Reporter: Andrey Kuznetsov >Assignee: Dmitriy Sorokin >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.6 > > > Currently, tests use {{NoOpFailureHandler}} by default, hence this > exchange-worker termination is masked. We are to fix it: test code should not > be able to terminate system-critical thread. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8311) IgniteClientRejoinTest.testClientsReconnectDisabled causes exchange-worker to terminate via NPE
[ https://issues.apache.org/jira/browse/IGNITE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496627#comment-16496627 ] Dmitriy Sorokin commented on IGNITE-8311: - The root cause of this error is an inconsistent state of GridAffinityAssignmentCache, which arises when the main loop of ExchangeWorker.body0() continues working after an IgniteCheckedException occurs (last catch block at the end of the loop). Proposed solution: rethrow the caught exception, which prevents grid components from entering an inconsistent state. However, some tests that start a grid with an incorrect configuration will then halt the JVM because a critical system error is detected; that issue should be fixed in IGNITE-1049. > IgniteClientRejoinTest.testClientsReconnectDisabled causes exchange-worker to > terminate via NPE > --- > > Key: IGNITE-8311 > URL: https://issues.apache.org/jira/browse/IGNITE-8311 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.4 >Reporter: Andrey Kuznetsov >Assignee: Dmitriy Sorokin >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.6 > > > Currently, tests use {{NoOpFailureHandler}} by default, hence this > exchange-worker termination is masked. We are to fix it: test code should not > be able to terminate system-critical thread. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IGNITE-5997) [Test Failed] DynamicIndexPartitionedAtomicConcurrentSelfTest.testCoordinatorChange
[ https://issues.apache.org/jira/browse/IGNITE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin resolved IGNITE-5997. - Resolution: Cannot Reproduce > [Test Failed] > DynamicIndexPartitionedAtomicConcurrentSelfTest.testCoordinatorChange > --- > > Key: IGNITE-5997 > URL: https://issues.apache.org/jira/browse/IGNITE-5997 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.1 >Reporter: Eduard Shangareev >Assignee: Dmitriy Sorokin >Priority: Major > Labels: MakeTeamcityGreenAgain > > It fails more often locally on linux machine > http://ci.ignite.apache.org/viewLog.html?buildId=752869&tab=buildResultsDiv&buildTypeId=Ignite20Tests_IgniteQueries2#testNameId-4226597044755906475 > {code} > SchemaOperationException [code=0, msg=Client node is disconnected (operation > result is unknown).] > at > org.apache.ignite.internal.processors.query.GridQueryProcessor.onDisconnected(GridQueryProcessor.java:822) > at > org.apache.ignite.internal.IgniteKernal.onDisconnected(IgniteKernal.java:3770) > at > org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:749) > at > org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery(GridDiscoveryManager.java:559) > at > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2391) > at > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2370) > at > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1686) > at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-5997) [Test Failed] DynamicIndexPartitionedAtomicConcurrentSelfTest.testCoordinatorChange
[ https://issues.apache.org/jira/browse/IGNITE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496455#comment-16496455 ] Dmitriy Sorokin commented on IGNITE-5997: - I ran this test about 40 times and no failures occurred. In addition, the [TC history of this test|https://ci.ignite.apache.org/project.html?tab=testDetails&projectId=IgniteTests24Java8&testNameId=-4226597044755906475&page=1] does not contain any failed runs. So I think this ticket should be closed as non-reproducible. > [Test Failed] > DynamicIndexPartitionedAtomicConcurrentSelfTest.testCoordinatorChange > --- > > Key: IGNITE-5997 > URL: https://issues.apache.org/jira/browse/IGNITE-5997 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.1 >Reporter: Eduard Shangareev >Assignee: Dmitriy Sorokin >Priority: Major > Labels: MakeTeamcityGreenAgain > > It fails more often locally on linux machine > http://ci.ignite.apache.org/viewLog.html?buildId=752869&tab=buildResultsDiv&buildTypeId=Ignite20Tests_IgniteQueries2#testNameId-4226597044755906475 > {code} > SchemaOperationException [code=0, msg=Client node is disconnected (operation > result is unknown).] 
> at > org.apache.ignite.internal.processors.query.GridQueryProcessor.onDisconnected(GridQueryProcessor.java:822) > at > org.apache.ignite.internal.IgniteKernal.onDisconnected(IgniteKernal.java:3770) > at > org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:749) > at > org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery(GridDiscoveryManager.java:559) > at > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2391) > at > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2370) > at > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1686) > at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8584) Provide ability to terminate any thread with enabled test features
[ https://issues.apache.org/jira/browse/IGNITE-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487807#comment-16487807 ] Dmitriy Sorokin commented on IGNITE-8584: - [~agura], review my patch, please! > Provide ability to terminate any thread with enabled test features > -- > > Key: IGNITE-8584 > URL: https://issues.apache.org/jira/browse/IGNITE-8584 > Project: Ignite > Issue Type: New Feature >Reporter: Andrey Gura >Assignee: Dmitriy Sorokin >Priority: Major > Fix For: 2.6 > > > We already have {{WorkersControlMXBean}}, which makes it possible to > interrupt a thread registered in the system workers registry. We also want to stop any > thread in the system for testing purposes. > A method {{stop(String threadName)}} should be added that finds a thread > by name and stops it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
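The lookup-by-name part of the requested {{stop(String threadName)}} method can be sketched with the standard JDK API alone. The class ThreadControlSketch below is hypothetical and is not the WorkersControlMXBean implementation. It also makes one loud assumption: "stop" is modeled as Thread.interrupt(), because Thread.stop() is deprecated and unsupported on recent JDKs, so the target thread must cooperate by reacting to interruption.

```java
import java.util.Optional;

/** Hypothetical helper; NOT the WorkersControlMXBean implementation. */
class ThreadControlSketch {
    /** Finds a live thread by exact name, if one exists. */
    static Optional<Thread> findByName(String name) {
        // Thread.getAllStackTraces() returns a map keyed by all live threads.
        return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.getName().equals(name))
            .findFirst();
    }

    /**
     * Assumption: "stopping" means interrupting the thread.
     *
     * @return true if a thread with the given name was found and interrupted.
     */
    static boolean stop(String threadName) {
        Optional<Thread> thread = findByName(threadName);

        thread.ifPresent(Thread::interrupt);

        return thread.isPresent();
    }
}
```

If several live threads share a name, this sketch interrupts only the first match; a real implementation would have to decide how to treat such collisions.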
[jira] [Assigned] (IGNITE-8584) Provide ability to terminate any thread with enabled test features
[ https://issues.apache.org/jira/browse/IGNITE-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-8584: --- Assignee: Dmitriy Sorokin > Provide ability to terminate any thread with enabled test features > -- > > Key: IGNITE-8584 > URL: https://issues.apache.org/jira/browse/IGNITE-8584 > Project: Ignite > Issue Type: New Feature >Reporter: Andrey Gura >Assignee: Dmitriy Sorokin >Priority: Major > Fix For: 2.6 > > > We already have {{WorkersControlMXBean}}, which makes it possible to > interrupt a thread registered in the system workers registry. We also want to stop any > thread in the system for testing purposes. > A method {{stop(String threadName)}} should be added that finds a thread > by name and stops it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-5997) [Test Failed] DynamicIndexPartitionedAtomicConcurrentSelfTest.testCoordinatorChange
[ https://issues.apache.org/jira/browse/IGNITE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-5997: --- Assignee: Dmitriy Sorokin > [Test Failed] > DynamicIndexPartitionedAtomicConcurrentSelfTest.testCoordinatorChange > --- > > Key: IGNITE-5997 > URL: https://issues.apache.org/jira/browse/IGNITE-5997 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.1 >Reporter: Eduard Shangareev >Assignee: Dmitriy Sorokin >Priority: Major > Labels: MakeTeamcityGreenAgain > > It fails more often locally on linux machine > http://ci.ignite.apache.org/viewLog.html?buildId=752869&tab=buildResultsDiv&buildTypeId=Ignite20Tests_IgniteQueries2#testNameId-4226597044755906475 > {code} > SchemaOperationException [code=0, msg=Client node is disconnected (operation > result is unknown).] > at > org.apache.ignite.internal.processors.query.GridQueryProcessor.onDisconnected(GridQueryProcessor.java:822) > at > org.apache.ignite.internal.IgniteKernal.onDisconnected(IgniteKernal.java:3770) > at > org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:749) > at > org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery(GridDiscoveryManager.java:559) > at > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2391) > at > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2370) > at > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1686) > at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-8311) IgniteClientRejoinTest.testClientsReconnectDisabled causes exchange-worker to terminate via NPE
[ https://issues.apache.org/jira/browse/IGNITE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-8311: --- Assignee: Dmitriy Sorokin > IgniteClientRejoinTest.testClientsReconnectDisabled causes exchange-worker to > terminate via NPE > --- > > Key: IGNITE-8311 > URL: https://issues.apache.org/jira/browse/IGNITE-8311 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.4 >Reporter: Andrey Kuznetsov >Assignee: Dmitriy Sorokin >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.6 > > > Currently, tests use {{NoOpFailureHandler}} by default, hence this > exchange-worker termination is masked. We are to fix it: test code should not > be able to terminate system-critical thread. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-4958) Make data pages recyclable into index/meta/etc pages and vice versa
[ https://issues.apache.org/jira/browse/IGNITE-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452571#comment-16452571 ] Dmitriy Sorokin commented on IGNITE-4958: - [~agura], [~ivan.glukos], please review my patch; the test results look good to me. > Make data pages recyclable into index/meta/etc pages and vice versa > --- > > Key: IGNITE-4958 > URL: https://issues.apache.org/jira/browse/IGNITE-4958 > Project: Ignite > Issue Type: Improvement > Components: cache >Affects Versions: 2.0 >Reporter: Ivan Rakov >Assignee: Dmitriy Sorokin >Priority: Major > Fix For: 2.6 > > > Recycling for data pages is disabled for now. Empty data pages are > accumulated in FreeListImpl#emptyDataPagesBucket, and can be reused only as > data pages again. What has to be done: > * Empty data pages should be recycled into the reuse bucket > * We should check the reuse bucket first before allocating a new data page > * MemoryPolicyConfiguration#emptyPagesPoolSize should be removed -- This message was sent by Atlassian JIRA (v7.6.3#76005)
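The first two items of the IGNITE-4958 task list (recycle empty pages into a reuse bucket, and check that bucket before allocating a new page) can be sketched as below. This is a simplified model under stated assumptions, not FreeListImpl: PageAllocatorSketch is a hypothetical class, pages are represented by bare numeric IDs, and there is no concurrency control or page typing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical page allocator; NOT Ignite's FreeListImpl. */
class PageAllocatorSketch {
    /** IDs of empty pages that may be reused for any purpose. */
    private final Deque<Long> reuseBucket = new ArrayDeque<>();

    /** Next never-used page ID. */
    private long nextPageId = 1;

    /** Returns a recycled page if one is available, otherwise a fresh one. */
    long allocatePage() {
        Long recycled = reuseBucket.pollFirst();

        return recycled != null ? recycled : nextPageId++;
    }

    /** Puts an empty page into the reuse bucket. */
    void recycle(long pageId) {
        reuseBucket.addLast(pageId);
    }
}
```

In this model, allocating pages 1 and 2, recycling page 1, and allocating twice more yields page 1 again followed by the fresh page 3.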
[jira] [Created] (IGNITE-8255) Possible name collisions in WorkersRegistry
Dmitriy Sorokin created IGNITE-8255: --- Summary: Possible name collisions in WorkersRegistry Key: IGNITE-8255 URL: https://issues.apache.org/jira/browse/IGNITE-8255 Project: Ignite Issue Type: Bug Reporter: Dmitriy Sorokin Assignee: Dmitriy Sorokin Fix For: 2.5 {code:java} java.lang.IllegalStateException: Worker is already registered [worker=GridWorker [name=ttl-cleanup-worker, igniteInstanceName=null, finished=false, hashCode=612569625, interrupted=true, runner=ttl-cleanup-worker-#66]] at org.apache.ignite.internal.worker.WorkersRegistry.register(WorkersRegistry.java:40) at org.apache.ignite.internal.worker.WorkersRegistry.onStarted(WorkersRegistry.java:73) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:108) at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8101) Ability to terminate system workers by JMX for test purposes
[ https://issues.apache.org/jira/browse/IGNITE-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426833#comment-16426833 ] Dmitriy Sorokin commented on IGNITE-8101: - [~agura], review my patch, please! > Ability to terminate system workers by JMX for test purposes > > > Key: IGNITE-8101 > URL: https://issues.apache.org/jira/browse/IGNITE-8101 > Project: Ignite > Issue Type: Improvement >Reporter: Dmitriy Sorokin >Assignee: Dmitriy Sorokin >Priority: Major > Labels: iep-14 > Fix For: 2.5 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-8101) Ability to terminate system workers by JMX for test purposes
[ https://issues.apache.org/jira/browse/IGNITE-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin updated IGNITE-8101: Description: (was: [0] disco-event-worker [0] [1] grid-timeout-worker [1] [2] partition-exchanger [2]) > Ability to terminate system workers by JMX for test purposes > > > Key: IGNITE-8101 > URL: https://issues.apache.org/jira/browse/IGNITE-8101 > Project: Ignite > Issue Type: Improvement >Reporter: Dmitriy Sorokin >Assignee: Dmitriy Sorokin >Priority: Major > Labels: iep-14 > Fix For: 2.5 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-8101) Ability to terminate system workers by JMX for test purposes
[ https://issues.apache.org/jira/browse/IGNITE-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin updated IGNITE-8101: Description: [0] disco-event-worker [0] [1] grid-timeout-worker [1] [2] partition-exchanger [2] > Ability to terminate system workers by JMX for test purposes > > > Key: IGNITE-8101 > URL: https://issues.apache.org/jira/browse/IGNITE-8101 > Project: Ignite > Issue Type: Improvement >Reporter: Dmitriy Sorokin >Assignee: Dmitriy Sorokin >Priority: Major > Labels: iep-14 > Fix For: 2.5 > > > [0] disco-event-worker [0] > [1] grid-timeout-worker [1] > [2] partition-exchanger [2] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8071) Add tests for failure handlers
[ https://issues.apache.org/jira/browse/IGNITE-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422602#comment-16422602 ] Dmitriy Sorokin commented on IGNITE-8071: - [~agura], review my patch, please! > Add tests for failure handlers > -- > > Key: IGNITE-8071 > URL: https://issues.apache.org/jira/browse/IGNITE-8071 > Project: Ignite > Issue Type: Improvement >Reporter: Andrey Gura >Assignee: Dmitriy Sorokin >Priority: Major > Labels: iep-14 > Fix For: 2.5 > > > Different failure handlers were implemented due to IEP-14 (IGNITE-6890). > Tests should be added for these implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8101) Ability to terminate system workers by JMX for test purposes
Dmitriy Sorokin created IGNITE-8101: --- Summary: Ability to terminate system workers by JMX for test purposes Key: IGNITE-8101 URL: https://issues.apache.org/jira/browse/IGNITE-8101 Project: Ignite Issue Type: Improvement Reporter: Dmitriy Sorokin Assignee: Dmitriy Sorokin Fix For: 2.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-8068) Some tests failed due to JVM halt by default FailureHandler
[ https://issues.apache.org/jira/browse/IGNITE-8068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin updated IGNITE-8068: Fix Version/s: 2.5 > Some tests failed due to JVM halt by default FailureHandler > --- > > Key: IGNITE-8068 > URL: https://issues.apache.org/jira/browse/IGNITE-8068 > Project: Ignite > Issue Type: Bug >Reporter: Dmitriy Sorokin >Assignee: Dmitriy Sorokin >Priority: Major > Labels: iep-14 > Fix For: 2.5 > > > NoOpFailureHandler is needed by default in test IgniteConfiguration instances. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-8068) Some tests failed due to JVM halt by default FailureHandler
[ https://issues.apache.org/jira/browse/IGNITE-8068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin updated IGNITE-8068: Labels: iep-14 (was: ) > Some tests failed due to JVM halt by default FailureHandler > --- > > Key: IGNITE-8068 > URL: https://issues.apache.org/jira/browse/IGNITE-8068 > Project: Ignite > Issue Type: Bug >Reporter: Dmitriy Sorokin >Assignee: Dmitriy Sorokin >Priority: Major > Labels: iep-14 > Fix For: 2.5 > > > NoOpFailureHandler is needed by default in test IgniteConfiguration instances. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8068) Some tests failed due to JVM halt by default FailureHandler
Dmitriy Sorokin created IGNITE-8068: --- Summary: Some tests failed due to JVM halt by default FailureHandler Key: IGNITE-8068 URL: https://issues.apache.org/jira/browse/IGNITE-8068 Project: Ignite Issue Type: Bug Reporter: Dmitriy Sorokin Assignee: Dmitriy Sorokin NoOpFailureHandler is needed by default in test IgniteConfiguration instances. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-6890) General way for handling Ignite failures
[ https://issues.apache.org/jira/browse/IGNITE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360603#comment-16360603 ] Dmitriy Sorokin commented on IGNITE-6890: - [~avinogradov], review my new patch, please. > General way for handling Ignite failures > > > Key: IGNITE-6890 > URL: https://issues.apache.org/jira/browse/IGNITE-6890 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin >Priority: Major > Labels: iep-7 > Fix For: 2.5 > > > Ignite failures which should be handled are: > # Topology segmentation; > # Exchange worker stop; > # Persistence errors. > Proper behavior should be selected according to result of calling > IgniteFailureHandler instance, custom implementation of which can be provided > in IgniteConfiguration. It can be node stop, restart or nothing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
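The handler contract described in the ticket above (a user-supplied handler decides whether the node stops, restarts or does nothing) can be sketched as follows. This is only an illustration under stated assumptions: `FailureKind`, `FailureAction`, `FailureHandlerSketch`, `NodeConfigSketch` and `NodeSketch` are hypothetical stand-ins invented here, not Ignite classes, and the default of stopping the node is an assumption as well.

```java
/** Hypothetical failure categories, mirroring the three cases listed in the ticket. */
enum FailureKind { SEGMENTATION, EXCHANGE_WORKER_STOP, PERSISTENCE_ERROR }

/** Hypothetical reactions: the ticket names node stop, restart or nothing. */
enum FailureAction { STOP, RESTART, NOOP }

/** Stand-in for the handler the ticket calls IgniteFailureHandler (not the real interface). */
interface FailureHandlerSketch {
    FailureAction onFailure(FailureKind kind, Throwable cause);
}

/** Stand-in for the part of the node configuration that carries the handler. */
final class NodeConfigSketch {
    /** Assumed default: stop the node on any failure. */
    private FailureHandlerSketch hnd = (kind, cause) -> FailureAction.STOP;

    NodeConfigSketch setFailureHandler(FailureHandlerSketch hnd) {
        this.hnd = hnd;
        return this;
    }

    FailureHandlerSketch getFailureHandler() {
        return hnd;
    }
}

/** Stand-in for a node that asks the configured handler what to do when something breaks. */
final class NodeSketch {
    private final NodeConfigSketch cfg;

    /** Simplified lifecycle state, kept as a string for brevity. */
    String state = "RUNNING";

    NodeSketch(NodeConfigSketch cfg) {
        this.cfg = cfg;
    }

    /** Delegates the decision to the handler and applies the selected behavior. */
    FailureAction fail(FailureKind kind, Throwable cause) {
        FailureAction act = cfg.getFailureHandler().onFailure(kind, cause);

        switch (act) {
            case STOP:
                state = "STOPPED";
                break;
            case RESTART:
                state = "RESTARTED";
                break;
            default:
                // NOOP: the node keeps running.
                break;
        }

        return act;
    }
}
```

A custom handler would be supplied through the configuration object, for example `new NodeConfigSketch().setFailureHandler((kind, cause) -> FailureAction.NOOP)`, which is the kind of no-op handler a test setup might prefer.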
[jira] [Resolved] (IGNITE-6891) Proper behavior on Persistence errors
[ https://issues.apache.org/jira/browse/IGNITE-6891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin resolved IGNITE-6891. - Resolution: Duplicate > Proper behavior on Persistence errors > -- > > Key: IGNITE-6891 > URL: https://issues.apache.org/jira/browse/IGNITE-6891 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin >Priority: Major > Labels: iep-7 > Fix For: 2.5 > > > Node should be stopped anyway, what we can provide is user callback, > something like beforeNodeStop'. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-6890) General way for handling Ignite failures
[ https://issues.apache.org/jira/browse/IGNITE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin updated IGNITE-6890: Summary: General way for handling Ignite failures (was: Proper behavior on ExchangeWorker exits with error ) > General way for handling Ignite failures > > > Key: IGNITE-6890 > URL: https://issues.apache.org/jira/browse/IGNITE-6890 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin >Priority: Major > Labels: iep-7 > Fix For: 2.5 > > > Ignite failures which should be handled are: > # Topology segmentation; > # Exchange worker stop; > # Persistence errors. > Proper behavior should be selected according to result of calling > IgniteFailureHandler instance, custom implementation of which can be provided > in IgniteConfiguration. It can be node stop, restart or nothing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-6890) Proper behavior on ExchangeWorker exits with error
[ https://issues.apache.org/jira/browse/IGNITE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin updated IGNITE-6890: Description: Ignite failures which should be handled are: # Topology segmentation; # Exchange worker stop; # Persistence errors. Proper behavior should be selected according to result of calling IgniteFailureHandler instance, custom implementation of which can be provided in IgniteConfiguration. It can be node stop, restart or nothing. was:Node should be stopped anyway, what we can provide is user callback, something like beforeNodeStop'. > Proper behavior on ExchangeWorker exits with error > --- > > Key: IGNITE-6890 > URL: https://issues.apache.org/jira/browse/IGNITE-6890 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin >Priority: Major > Labels: iep-7 > Fix For: 2.5 > > > Ignite failures which should be handled are: > # Topology segmentation; > # Exchange worker stop; > # Persistence errors. > Proper behavior should be selected according to result of calling > IgniteFailureHandler instance, custom implementation of which can be provided > in IgniteConfiguration. It can be node stop, restart or nothing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7019) Cluster can not survive after IgniteOOM
[ https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343239#comment-16343239 ] Dmitriy Sorokin commented on IGNITE-7019: - [~avinogradov], Review my patch, please. > Cluster can not survive after IgniteOOM > --- > > Key: IGNITE-7019 > URL: https://issues.apache.org/jira/browse/IGNITE-7019 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.3 >Reporter: Mikhail Cherkasov >Assignee: Dmitriy Sorokin >Priority: Critical > Labels: iep-7 > Fix For: 2.5 > > > even if we have full sync mode and transactional cache we can't add new nodes > if there was IgniteOOM, after adding new nodes and re-balancing, old nodes > can't evict partitions: > {code} > [2017-11-17 20:02:24,588][ERROR][sys-#65%DR1%][GridDhtPreloader] Partition > eviction failed, this can cause grid hang. > class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Not enough > memory allocated [policyName=100MB_Region_Eviction, size=104.9 MB] > Consider increasing memory policy size, enabling evictions, adding more nodes > to the cluster, reducing number of backups or reducing model size. 
> at > org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl.allocatePage(PageMemoryNoStoreImpl.java:294) > at > org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePageNoReuse(DataStructure.java:117) > at > org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePage(DataStructure.java:105) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.addStripe(PagesList.java:413) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.getPageForPut(PagesList.java:528) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.put(PagesList.java:617) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.FreeListImpl.addForRecycle(FreeListImpl.java:582) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.reuseFreePages(BPlusTree.java:3847) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.releaseAll(BPlusTree.java:4106) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.access$6900(BPlusTree.java:3166) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1782) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.remove(BPlusTree.java:1567) > at > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1387) > at > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:374) > at > org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3233) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588) > at > 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:892) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:750) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580) > at > org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639) > at > org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > {code} > Discussion on the dev list: > http://apache-ignite-developers.2346864.n4.nabble.com/How-properly-handle-IgniteOOM-td25288.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7019) Cluster can not survive after IgniteOOM
[ https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340947#comment-16340947 ] Dmitriy Sorokin commented on IGNITE-7019: - The final solution, as coded, passes a ReuseBag instance as a parameter through PagesList's getPageForPut and addStripe methods to the allocatePage method. This allows pages from the ReuseBag to be used before trying to allocate new pages. > Cluster can not survive after IgniteOOM > --- > > Key: IGNITE-7019 > URL: https://issues.apache.org/jira/browse/IGNITE-7019 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.3 >Reporter: Mikhail Cherkasov >Assignee: Dmitriy Sorokin >Priority: Critical > Labels: iep-7 > Fix For: 2.5 > > > even if we have full sync mode and transactional cache we can't add new nodes > if there was IgniteOOM, after adding new nodes and re-balancing, old nodes > can't evict partitions: > {code} > [2017-11-17 20:02:24,588][ERROR][sys-#65%DR1%][GridDhtPreloader] Partition > eviction failed, this can cause grid hang. > class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Not enough > memory allocated [policyName=100MB_Region_Eviction, size=104.9 MB] > Consider increasing memory policy size, enabling evictions, adding more nodes > to the cluster, reducing number of backups or reducing model size. 
> at > org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl.allocatePage(PageMemoryNoStoreImpl.java:294) > at > org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePageNoReuse(DataStructure.java:117) > at > org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePage(DataStructure.java:105) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.addStripe(PagesList.java:413) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.getPageForPut(PagesList.java:528) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.put(PagesList.java:617) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.FreeListImpl.addForRecycle(FreeListImpl.java:582) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.reuseFreePages(BPlusTree.java:3847) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.releaseAll(BPlusTree.java:4106) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.access$6900(BPlusTree.java:3166) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1782) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.remove(BPlusTree.java:1567) > at > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1387) > at > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:374) > at > org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3233) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588) > at > 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:892) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:750) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580) > at > org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639) > at > org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > {code} > Discussion on the dev list: > http://apache-ignite-developers.2346864.n4.nabble.com/How-properly-handle-IgniteOOM-td25288.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
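The idea in the comment above (hand the bag of pages being recycled down to the allocation call, so that a stripe can be added even when the region is full) can be illustrated with a small self-contained sketch. All classes here are simplified stand-ins written for this example; they only borrow the method names mentioned in the comment and do not reproduce the real PagesList, ReuseBag or page memory code. The `IllegalStateException` stands in for IgniteOutOfMemoryException.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified stand-in for a bag of page ids that are being recycled. */
final class ReuseBagSketch {
    private final Deque<Long> pages = new ArrayDeque<>();

    void addFreePage(long pageId) {
        pages.push(pageId);
    }

    /** Returns a recycled page id, or -1 if the bag is empty. */
    long pollFreePage() {
        return pages.isEmpty() ? -1L : pages.pop();
    }
}

/** Simplified stand-in for a memory region with a fixed page budget. */
final class PageMemorySketch {
    private final int maxPages;
    private int allocated;

    PageMemorySketch(int maxPages) {
        this.maxPages = maxPages;
    }

    long allocatePage() {
        if (allocated >= maxPages)
            throw new IllegalStateException("Not enough memory allocated (stand-in for IgniteOutOfMemoryException)");

        return ++allocated;
    }
}

/** Simplified stand-in for the free-list structure; only the call chain matters here. */
final class PagesListSketch {
    private final PageMemorySketch mem;
    private final List<Long> stripes = new ArrayList<>();

    PagesListSketch(PageMemorySketch mem) {
        this.mem = mem;
    }

    /** Tries the bag first and touches the memory region only if the bag has nothing to offer. */
    long allocatePage(ReuseBagSketch bag) {
        long pageId = bag != null ? bag.pollFreePage() : -1L;

        return pageId != -1L ? pageId : mem.allocatePage();
    }

    /** Adds a new stripe, passing the bag down to the allocation. */
    long addStripe(ReuseBagSketch bag) {
        long pageId = allocatePage(bag);

        stripes.add(pageId);

        return pageId;
    }

    /** Returns the tail stripe to put into, creating the first stripe on demand. */
    long getPageForPut(ReuseBagSketch bag) {
        return stripes.isEmpty() ? addStripe(bag) : stripes.get(stripes.size() - 1);
    }
}
```

With a region that has no free pages left, `getPageForPut` still succeeds as long as the bag holds at least one page, which is the situation in the stack trace quoted above: the remove operation is freeing pages at the very moment the allocation fails.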
[jira] [Commented] (IGNITE-7019) Cluster can not survive after IgniteOOM
[ https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334085#comment-16334085 ] Dmitriy Sorokin commented on IGNITE-7019: - We discussed possible solutions with [~mcherkasov] and [~avinogradov], and chose the following: first, when an IOOME occurs while moving a page from a bucket with a lower index to a higher one, we leave the page in the old bucket; second, we add a periodic task that looks for such pages (left in the wrong buckets) and corrects their placement when possible (no IOOME on page move). We also need a reproducer for this bug; I'll write it first. > Cluster can not survive after IgniteOOM > --- > > Key: IGNITE-7019 > URL: https://issues.apache.org/jira/browse/IGNITE-7019 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.3 >Reporter: Mikhail Cherkasov >Assignee: Dmitriy Sorokin >Priority: Critical > Labels: iep-7 > Fix For: 2.5 > > > even if we have full sync mode and transactional cache we can't add new nodes > if there was IgniteOOM, after adding new nodes and re-balancing, old nodes > can't evict partitions: > {code} > [2017-11-17 20:02:24,588][ERROR][sys-#65%DR1%][GridDhtPreloader] Partition > eviction failed, this can cause grid hang. > class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Not enough > memory allocated [policyName=100MB_Region_Eviction, size=104.9 MB] > Consider increasing memory policy size, enabling evictions, adding more nodes > to the cluster, reducing number of backups or reducing model size. 
> at > org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl.allocatePage(PageMemoryNoStoreImpl.java:294) > at > org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePageNoReuse(DataStructure.java:117) > at > org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePage(DataStructure.java:105) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.addStripe(PagesList.java:413) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.getPageForPut(PagesList.java:528) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.put(PagesList.java:617) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.FreeListImpl.addForRecycle(FreeListImpl.java:582) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.reuseFreePages(BPlusTree.java:3847) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.releaseAll(BPlusTree.java:4106) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.access$6900(BPlusTree.java:3166) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1782) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.remove(BPlusTree.java:1567) > at > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1387) > at > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:374) > at > org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3233) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588) > at > 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:892) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:750) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580) > at > org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639) > at > org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > {code} > Discussion on the dev list: > http://apache-ignite-developers.2346864.n4.nabble.com/How-properly-handle-IgniteOOM-td25288.html -- This message was sent by Atlassian
[jira] [Assigned] (IGNITE-7019) Cluster can not survive after IgniteOOM
[ https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-7019: --- Assignee: Dmitriy Sorokin > Cluster can not survive after IgniteOOM > --- > > Key: IGNITE-7019 > URL: https://issues.apache.org/jira/browse/IGNITE-7019 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.3 >Reporter: Mikhail Cherkasov >Assignee: Dmitriy Sorokin >Priority: Critical > Labels: iep-7 > Fix For: 2.4 > > > even if we have full sync mode and transactional cache we can't add new nodes > if there was IgniteOOM, after adding new nodes and re-balancing, old nodes > can't evict partitions: > {code} > [2017-11-17 20:02:24,588][ERROR][sys-#65%DR1%][GridDhtPreloader] Partition > eviction failed, this can cause grid hang. > class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Not enough > memory allocated [policyName=100MB_Region_Eviction, size=104.9 MB] > Consider increasing memory policy size, enabling evictions, adding more nodes > to the cluster, reducing number of backups or reducing model size. 
> at > org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl.allocatePage(PageMemoryNoStoreImpl.java:294) > at > org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePageNoReuse(DataStructure.java:117) > at > org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePage(DataStructure.java:105) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.addStripe(PagesList.java:413) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.getPageForPut(PagesList.java:528) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.put(PagesList.java:617) > at > org.apache.ignite.internal.processors.cache.persistence.freelist.FreeListImpl.addForRecycle(FreeListImpl.java:582) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.reuseFreePages(BPlusTree.java:3847) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.releaseAll(BPlusTree.java:4106) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.access$6900(BPlusTree.java:3166) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1782) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.remove(BPlusTree.java:1567) > at > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1387) > at > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:374) > at > org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3233) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588) > at > 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:892) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:750) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580) > at > org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639) > at > org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > {code} > Discussion on the dev list: > http://apache-ignite-developers.2346864.n4.nabble.com/How-properly-handle-IgniteOOM-td25288.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6742) Java 9: rework Cleaner usage in PlatformMemoryPool class
[ https://issues.apache.org/jira/browse/IGNITE-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16303773#comment-16303773 ] Dmitriy Sorokin commented on IGNITE-6742: - [~agura], review my patch, please. I think that passing the Ignite Basic tests is enough for this task. > Java 9: rework Cleaner usage in PlatformMemoryPool class > > > Key: IGNITE-6742 > URL: https://issues.apache.org/jira/browse/IGNITE-6742 > Project: Ignite > Issue Type: Task > Components: platforms >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Fix For: 2.4 > > > We attach special cleaner to {{PlatformMemoryPool}} using > {{sun.misc.Cleaner.create}} method. This way we ensure that thread-local > native memory (which is used to pass data between platform and Java) is > released properly. > Need to rework this API to reflection-based approach, which works for both > Java 7/8 and Java 9. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
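A reflection-based approach of the kind the ticket above asks for might look roughly like the sketch below. It is not the patch under review: `CleanerCompatSketch` and its `register` method are hypothetical names, and the only real APIs used are `java.lang.ref.Cleaner` (available from Java 9) and `sun.misc.Cleaner` (the Java 7/8 class named in the ticket), both looked up by name so that the code compiles on either platform.

```java
import java.lang.reflect.Method;

/** Hypothetical helper that registers a cleanup action without compile-time Cleaner dependencies. */
final class CleanerCompatSketch {
    private CleanerCompatSketch() {
        // No instances.
    }

    /**
     * Registers {@code action} to run once {@code referent} becomes unreachable.
     * The action must not hold a reference to the referent, or it would never run.
     *
     * @return a handle that runs the cleanup explicitly (at most once) when invoked.
     */
    static Runnable register(Object referent, Runnable action) {
        try {
            final Object cleanable;
            final Method clean;

            Class<?> java9Cleaner = findClass("java.lang.ref.Cleaner");

            if (java9Cleaner != null) {
                // Java 9+: Cleaner.create().register(referent, action).
                // A real implementation would share one Cleaner, since each instance owns a thread.
                Object cleaner = java9Cleaner.getMethod("create").invoke(null);

                cleanable = java9Cleaner.getMethod("register", Object.class, Runnable.class)
                    .invoke(cleaner, referent, action);

                clean = Class.forName("java.lang.ref.Cleaner$Cleanable").getMethod("clean");
            }
            else {
                // Java 7/8: sun.misc.Cleaner.create(referent, action).
                Class<?> legacyCleaner = Class.forName("sun.misc.Cleaner");

                cleanable = legacyCleaner.getMethod("create", Object.class, Runnable.class)
                    .invoke(null, referent, action);

                clean = legacyCleaner.getMethod("clean");
            }

            return () -> {
                try {
                    clean.invoke(cleanable);
                }
                catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Failed to run cleaner", e);
                }
            };
        }
        catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No supported Cleaner API found", e);
        }
    }

    /** Returns the class with the given name, or null if this runtime does not have it. */
    private static Class<?> findClass(String name) {
        try {
            return Class.forName(name);
        }
        catch (ClassNotFoundException ignored) {
            return null;
        }
    }
}
```

The returned handle is mainly useful for deterministic release; if it is never invoked, the action still runs after the referent is garbage collected, which is the behavior the original `sun.misc.Cleaner` usage relied on.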
[jira] [Assigned] (IGNITE-6894) Hanged Tx monitoring
[ https://issues.apache.org/jira/browse/IGNITE-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-6894: --- Assignee: Dmitriy Sorokin > Hanged Tx monitoring > > > Key: IGNITE-6894 > URL: https://issues.apache.org/jira/browse/IGNITE-6894 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin > Labels: iep-7 > Fix For: 2.4 > > > Hanging Transactions not Related to Deadlock > Description > This situation can occur if user explicitly markups the transaction (esp > Pessimistic Repeatable Read) and, for example, calls remote service (which > may be unresponsive) after acquiring some locks. All other transactions > depending on the same keys will hang. > Detection and Solution > This most likely cannot be resolved automatically other than rollback TX by > timeout and release all the locks acquired so far. Also such TXs can be > rolled back from Web Console as described above. > If transaction has been rolled back on timeout or via UI then any further > action in the transaction, e.g. lock acquisition or commit attempt should > throw exception. > Report > Web Console should provide ability to rollback any transaction via UI. > Long running transaction should be reported to logs. Log record should > contain: near nodes, transaction IDs, cache names, keys (limited to several > tens of), etc ( ?). > Also there should be a screen in Web Console that will list all ongoing > transactions in the cluster including the info as above. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-6895) TX deadlock monitoring
[ https://issues.apache.org/jira/browse/IGNITE-6895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-6895: --- Assignee: Dmitriy Sorokin > TX deadlock monitoring > -- > > Key: IGNITE-6895 > URL: https://issues.apache.org/jira/browse/IGNITE-6895 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin > Labels: iep-7 > Fix For: 2.4 > > > Deadlocks with Cache Transactions > Description > Deadlocks of this type are possible if user locks 2 or more keys within 2 or > more transactions in different orders (this does not apply to OPTIMISTIC > SERIALIZABLE transactions as they are capable to detect deadlock and choose > winning tx). Currently, Ignite can detect deadlocked transactions but this > procedure is started only for transactions that have timeout set explicitly > or default timeout in configuration set to value greater than 0. > Detection and Solution > Each NEAR node should periodically (need new config property?) scan the list > of local transactions and initiate the same procedure as we have now for > timed out transactions. If deadlock found it should be reported to logs. Log > record should contain: near nodes, transaction IDs, cache names, keys > (limited to several tens of) involved in deadlock. User should have ability > to configure default behavior - REPORT_ONLY, ROLLBACK (any more?) or manually > rollback selected transaction through web console or Visor. > Report > If deadlock found it should be reported to logs. Log record should contain: > near nodes, transaction IDs, cache names, keys (limited to several tens of) > involved in deadlock. > Also there should be a screen in Web Console that will list all ongoing > transactions in the cluster including the following info: > - Near node > - Start time > - DHT nodes > - Pending Locks (by request) > Web Console should provide ability to rollback any transaction via UI. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
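The periodic scan proposed in the ticket above boils down to looking for a cycle among transactions that wait for each other's locks. The sketch below shows only that core step, under strong simplifying assumptions: it runs on a single node, each transaction waits for at most one other transaction, and the wait-for relation is already available as a map. `DeadlockScanSketch` and `findCycle` are hypothetical names, and this is not Ignite's existing distributed detection procedure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical single-node scan over a simplified wait-for relation. */
final class DeadlockScanSketch {
    private DeadlockScanSketch() {
        // No instances.
    }

    /**
     * @param waitsFor Maps a transaction id to the id of the transaction holding the key it waits for.
     * @return Transaction ids forming a deadlock cycle, or an empty list if none is found.
     */
    static List<String> findCycle(Map<String, String> waitsFor) {
        for (String start : waitsFor.keySet()) {
            List<String> path = new ArrayList<>();

            String cur = start;

            // Follow the chain of waiters until it ends or revisits a transaction.
            while (cur != null && !path.contains(cur)) {
                path.add(cur);

                cur = waitsFor.get(cur);
            }

            // A revisited transaction closes a cycle; drop the lead-in part of the path.
            if (cur != null)
                return new ArrayList<>(path.subList(path.indexOf(cur), path.size()));
        }

        return Collections.emptyList();
    }
}
```

For two transactions that lock the same two keys in opposite order, the map would contain `tx1 -> tx2` and `tx2 -> tx1`, and the scan reports both ids; what to do next (report only or roll one of them back) is the configurable behavior the ticket discusses.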
[jira] [Issue Comment Deleted] (IGNITE-6890) Proper behavior on ExchangeWorker exits with error
[ https://issues.apache.org/jira/browse/IGNITE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin updated IGNITE-6890: Comment: was deleted (was: [~alex_pl], review my last patch, please. [Ignite 2.0 Tests :: Ignite Basic|https://ci.ignite.apache.org/viewType.html?buildTypeId=Ignite20Tests_IgniteBasic&tab=buildTypeStatusDiv&branch_Ignite20Tests=pull%2F3083%2Fhead]) > Proper behavior on ExchangeWorker exits with error > --- > > Key: IGNITE-6890 > URL: https://issues.apache.org/jira/browse/IGNITE-6890 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin > Labels: iep-7 > Fix For: 2.4 > > > Node should be stopped anyway, what we can provide is user callback, > something like beforeNodeStop'. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6890) Proper behavior on ExchangeWorker exits with error
[ https://issues.apache.org/jira/browse/IGNITE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297049#comment-16297049 ] Dmitriy Sorokin commented on IGNITE-6890: - [~alex_pl], review my last patch, please. [Ignite 2.0 Tests :: Ignite Basic|https://ci.ignite.apache.org/viewType.html?buildTypeId=Ignite20Tests_IgniteBasic&tab=buildTypeStatusDiv&branch_Ignite20Tests=pull%2F3083%2Fhead] > Proper behavior on ExchangeWorker exits with error > --- > > Key: IGNITE-6890 > URL: https://issues.apache.org/jira/browse/IGNITE-6890 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin > Labels: iep-7 > Fix For: 2.4 > > > Node should be stopped anyway, what we can provide is user callback, > something like beforeNodeStop'. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-5302) Empty LOST partition may be used as OWNING after resetting lost partitions
[ https://issues.apache.org/jira/browse/IGNITE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296899#comment-16296899 ] Dmitriy Sorokin commented on IGNITE-5302: - The last research results for this ticket (on the head of branch ignite-5267): Without grid activation after start of nodes (last line in code block below) {code} IgniteEx ignite1 = (IgniteEx)G.start(getConfiguration("test1")); IgniteEx ignite2 = (IgniteEx)G.start(getConfiguration("test2")); IgniteEx ignite3 = (IgniteEx)G.start(getConfiguration("test3")); IgniteEx ignite4 = (IgniteEx)G.start(getConfiguration("test4")); ignite1.active(true); {code} test fails with exception shown below: {noformat} class org.apache.ignite.IgniteException: Can not perform the operation because the cluster is inactive. Note, that the cluster is considered inactive by default if Ignite Persistent Store is used to let all the nodes join the cluster. To activate the cluster call Ignite.activate(true). at org.apache.ignite.internal.IgniteKernal.checkClusterState(IgniteKernal.java:3693) at org.apache.ignite.internal.IgniteKernal.cache(IgniteKernal.java:2713) at org.apache.ignite.internal.processors.cache.persistence.IgnitePdsCacheRebalancingAbstractTest.testPartitionLossAndRecover(IgnitePdsCacheRebalancingAbstractTest.java:336) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at junit.framework.TestCase.runTest(TestCase.java:176) at org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1995) at org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132) at org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1910) at 
java.lang.Thread.run(Thread.java:745) {noformat} With grid activation, the test fails with an assertion error: {noformat} java.lang.AssertionError at org.apache.ignite.internal.processors.cache.persistence.IgnitePdsCacheRebalancingAbstractTest.testPartitionLossAndRecover(IgnitePdsCacheRebalancingAbstractTest.java:350) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at junit.framework.TestCase.runTest(TestCase.java:176) at org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1995) at org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132) at org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1910) at java.lang.Thread.run(Thread.java:745) {noformat} The assertion error shown above is raised on this line: {code} assert !ignite1.cache(cacheName).lostPartitions().isEmpty(); {code} So we first need to actually produce lost partitions in our test; then we will be able to fix the problem. > Empty LOST partition may be used as OWNING after resetting lost partitions > -- > > Key: IGNITE-5302 > URL: https://issues.apache.org/jira/browse/IGNITE-5302 > Project: Ignite > Issue Type: Bug >Reporter: Sergey Chugunov >Priority: Blocker > Labels: MakeTeamcityGreenAgain, Muted_test, test-fail > Fix For: 2.4 > > > h2. Notes > Test *testPartitionLossAndRecover* reproducing the issue can be found in > ignite-5267 branch with PDS functionality. > h2. Steps to reproduce > # Four nodes are started, some key is added to partitioned cache > # Primary and backup nodes for the key are stopped, key's partition is > declared LOST on remaining nodes > # Primary and backup nodes are started again, cache's lost partitions are > reset > # Key is requested from cache > h2. 
Expected behavior > Correct value is returned from primary for this partition > h2. Actual behavior > Request for value is sent to node where partition is empty (not to primary > node), null is returned > h2. Latest findings > # The main problem with the scenario is that request for key gets mapped not > only to P/B nodes with real value but also to the node where that partition > existed only in LOST state after P/B shutdown on step #2 > # It was found that on step #3 after primary and backup are joined partition > counter is increased for empty partition in LOST state
[jira] [Assigned] (IGNITE-5302) Empty LOST partition may be used as OWNING after resetting lost partitions
[ https://issues.apache.org/jira/browse/IGNITE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-5302: --- Assignee: (was: Dmitriy Sorokin) > Empty LOST partition may be used as OWNING after resetting lost partitions > -- > > Key: IGNITE-5302 > URL: https://issues.apache.org/jira/browse/IGNITE-5302 > Project: Ignite > Issue Type: Bug >Reporter: Sergey Chugunov >Priority: Blocker > Labels: MakeTeamcityGreenAgain, Muted_test, test-fail > Fix For: 2.4 > > > h2. Notes > Test *testPartitionLossAndRecover* reproducing the issue can be found in > ignite-5267 branch with PDS functionality. > h2. Steps to reproduce > # Four nodes are started, some key is added to partitioned cache > # Primary and backup nodes for the key are stopped, key's partition is > declared LOST on remaining nodes > # Primary and backup nodes are started again, cache's lost partitions are > reset > # Key is requested from cache > h2. Expected behavior > Correct value is returned from primary for this partition > h2. Actual behavior > Request for value is sent to node where partition is empty (not to primary > node), null is returned > h2. Latest findings > # The main problem with the scenario is that request for key gets mapped not > only to P/B nodes with real value but also to the node where that partition > existed only in LOST state after P/B shutdown on step #2 > # It was found that on step #3 after primary and backup are joined partition > counter is increased for empty partition in LOST state which looks wrong -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6890) Proper behavior on ExchangeWorker exits with error
[ https://issues.apache.org/jira/browse/IGNITE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290688#comment-16290688 ] Dmitriy Sorokin commented on IGNITE-6890: - [~avinogradov], review my new patch, please. I think that a passing Ignite Basic test suite is enough for this patch. [Ignite 2.0 Tests :: Ignite Basic|https://ci.ignite.apache.org/viewType.html?buildTypeId=Ignite20Tests_IgniteBasic&branch_Ignite20Tests=pull%2F3083%2Fhead&tab=buildTypeStatusDiv] > Proper behavior on ExchangeWorker exits with error > --- > > Key: IGNITE-6890 > URL: https://issues.apache.org/jira/browse/IGNITE-6890 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin > Labels: iep-7 > Fix For: 2.4 > > > The node should be stopped anyway; what we can provide is a user callback, > something like 'beforeNodeStop'. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287401#comment-16287401 ] Dmitriy Sorokin edited comment on IGNITE-6171 at 12/13/17 1:22 PM: --- [~avinogradov], review my new patch, please. I think that a passing Ignite Basic test suite is enough for this patch. [Ignite 2.0 Tests :: Ignite Basic|https://ci.ignite.apache.org/viewType.html?buildTypeId=Ignite20Tests_IgniteBasic&branch_Ignite20Tests=pull%2F3076%2Fhead&tab=buildTypeStatusDiv] was (Author: cyberdemon): [~avinogradov], review my new patch, please. I think that a passing Ignite Basic test suite is enough for this patch. [Ignite20Tests_IgniteBasic|https://ci.ignite.apache.org/viewType.html?buildTypeId=Ignite20Tests_IgniteBasic&branch_Ignite20Tests=pull%2F3076%2Fhead&tab=buildTypeStatusDiv] > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
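The detection idea in the proposed solution can be illustrated without JNI. The sketch below is not the native two-thread design from the ticket: it swaps in a simpler pure-Java heartbeat that sleeps for a fixed interval and measures how much later than expected it wakes up, so a stall is noticed only after it ends. The class name `PauseDetectorSketch`, its parameters, and the callback are hypothetical and are not taken from any patch.

```java
import java.util.function.LongConsumer;

/** Hypothetical sketch: reports stalls by measuring how late a periodic sleep wakes up. */
final class PauseDetectorSketch {
    private final long intervalMs;
    private final long thresholdMs;
    private final LongConsumer onLongPause;

    PauseDetectorSketch(long intervalMs, long thresholdMs, LongConsumer onLongPause) {
        this.intervalMs = intervalMs;
        this.thresholdMs = thresholdMs;
        this.onLongPause = onLongPause;
    }

    /** Pause estimate: observed elapsed time minus the requested sleep, never negative. */
    static long pauseMs(long elapsedMs, long intervalMs) {
        return Math.max(0L, elapsedMs - intervalMs);
    }

    /** Handles one wake-up; returns true if the estimated pause exceeded the threshold. */
    boolean onWakeUp(long elapsedMs) {
        long pause = pauseMs(elapsedMs, intervalMs);

        if (pause <= thresholdMs)
            return false;

        // The "compensating action" from the ticket is left to the caller.
        onLongPause.accept(pause);

        return true;
    }

    /** Starts the daemon heartbeat thread; interrupt it to stop. */
    Thread start() {
        Thread t = new Thread(() -> {
            long last = System.nanoTime();

            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(intervalMs);
                }
                catch (InterruptedException e) {
                    return;
                }

                long now = System.nanoTime();

                onWakeUp((now - last) / 1_000_000L);

                last = now;
            }
        }, "pause-detector-sketch");

        t.setDaemon(true);
        t.start();

        return t;
    }
}
```

A caller might pass a callback that logs a warning or stops the node. Because the measurement is based on wake-up delay, it cannot tell a GC pause from any other cause of a stall.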
[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287401#comment-16287401 ] Dmitriy Sorokin edited comment on IGNITE-6171 at 12/13/17 1:21 PM: --- [~avinogradov], review my new patch, please. I think that a passing Ignite Basic test suite is enough for this patch. [Ignite20Tests_IgniteBasic|https://ci.ignite.apache.org/viewType.html?buildTypeId=Ignite20Tests_IgniteBasic&branch_Ignite20Tests=pull%2F3076%2Fhead&tab=buildTypeStatusDiv] was (Author: cyberdemon): [~avinogradov], review my new patch, please. I think that a passing Ignite Basic test suite is enough for this patch. [Ignite20Tests_IgniteBasic|https://ci.ignite.apache.org/viewLog.html?buildId=991128&tab=buildResultsDiv&buildTypeId=Ignite20Tests_IgniteBasic] > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287401#comment-16287401 ] Dmitriy Sorokin edited comment on IGNITE-6171 at 12/13/17 1:18 PM: --- [~avinogradov], review my new patch, please. I think that a passing Ignite Basic test suite is enough for this patch. [Ignite20Tests_IgniteBasic|https://ci.ignite.apache.org/viewLog.html?buildId=991128&tab=buildResultsDiv&buildTypeId=Ignite20Tests_IgniteBasic] was (Author: cyberdemon): [~avinogradov], review my new patch, please. I think that a passing Ignite Basic test suite is enough for this patch. > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287401#comment-16287401 ] Dmitriy Sorokin commented on IGNITE-6171: - [~avinogradov], review my new patch, please. I think that a passing Ignite Basic test suite is enough for this patch. > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6890) Proper behavior on ExchangeWorker exits with error
[ https://issues.apache.org/jira/browse/IGNITE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284932#comment-16284932 ] Dmitriy Sorokin commented on IGNITE-6890: - As proposed by [~avinogradov] in the discussion thread [Internal problems requiring graceful node shutdown, reboot, etc.|http://apache-ignite-developers.2346864.n4.nabble.com/Internal-problems-requiring-graceful-node-shutdown-reboot-etc-td24856.html], a scheme with IgniteFailureHandler and IgniteFailureAction will be implemented: {code} interface IgniteFailureHandler { IgniteFailureAction onFailure(IgniteFailureCause cause); } public enum IgniteFailureAction { RESTART_JVM, STOP, NOOP; } {code} A default implementation of IgniteFailureHandler will be provided and enabled by default, and the ability to set a custom user implementation in IgniteConfiguration will be added. > Proper behavior on ExchangeWorker exits with error > --- > > Key: IGNITE-6890 > URL: https://issues.apache.org/jira/browse/IGNITE-6890 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin > Labels: iep-7 > Fix For: 2.4 > > > The node should be stopped anyway; what we can provide is a user callback, > something like 'beforeNodeStop'. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
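To make the proposed scheme concrete, here is a minimal sketch of what a custom handler might look like. The interface and the enum constants are copied from the comment above. Everything else is an assumption made for illustration: the comment does not define `IgniteFailureCause`, so the placeholder type with a `Type` enum is hypothetical, as is the `StopOnCriticalFailureHandler` class and its policy.

```java
/** Action names as listed in the comment above. */
enum IgniteFailureAction { RESTART_JVM, STOP, NOOP }

/** Hypothetical placeholder: the comment does not define this type. */
final class IgniteFailureCause {
    /** Hypothetical failure categories, loosely based on the iep-7 ticket titles. */
    enum Type { EXCHANGE_WORKER_TERMINATED, PERSISTENCE_ERROR, OTHER }

    final Type type;
    final Throwable error;

    IgniteFailureCause(Type type, Throwable error) {
        this.type = type;
        this.error = error;
    }
}

/** Handler interface as given in the comment above. */
interface IgniteFailureHandler {
    IgniteFailureAction onFailure(IgniteFailureCause cause);
}

/** Hypothetical user implementation: stop the node on critical failures, ignore the rest. */
final class StopOnCriticalFailureHandler implements IgniteFailureHandler {
    @Override public IgniteFailureAction onFailure(IgniteFailureCause cause) {
        switch (cause.type) {
            case EXCHANGE_WORKER_TERMINATED:
            case PERSISTENCE_ERROR:
                return IgniteFailureAction.STOP;

            default:
                return IgniteFailureAction.NOOP;
        }
    }
}
```

Returning an action instead of performing it keeps the handler free of side effects; under this assumption, stopping the node or restarting the JVM stays with the caller.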
[jira] [Issue Comment Deleted] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin updated IGNITE-6171: Comment: was deleted (was: [~avinogradov], review new patch, please!) > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281641#comment-16281641 ] Dmitriy Sorokin commented on IGNITE-6171: - [~avinogradov], review new patch, please! > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-6742) Java 9: rework Cleaner usage in PlatformMemoryPool class
[ https://issues.apache.org/jira/browse/IGNITE-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-6742: --- Assignee: Dmitriy Sorokin > Java 9: rework Cleaner usage in PlatformMemoryPool class > > > Key: IGNITE-6742 > URL: https://issues.apache.org/jira/browse/IGNITE-6742 > Project: Ignite > Issue Type: Task > Components: platforms >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Fix For: 2.4 > > > We attach a special cleaner to {{PlatformMemoryPool}} using the > {{sun.misc.Cleaner.create}} method. This way we ensure that thread-local > native memory (which is used to pass data between platform and Java) is > released properly. > This API usage needs to be reworked into a reflection-based approach that works for both > Java 7/8 and Java 9. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
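A reflection-based approach of the kind the ticket asks for could look roughly like the sketch below. It is an illustration under stated assumptions, not the patch for this issue: the class `ReflectiveCleanerSketch` and its `register` method are hypothetical names. It tries the public `java.lang.ref.Cleaner` introduced in Java 9 first and falls back to `sun.misc.Cleaner.create`, the method named in the ticket. The sketch itself uses lambdas, so as written it needs Java 8 or later.

```java
import java.lang.reflect.Method;

/** Hypothetical helper (not Ignite code): picks a Cleaner implementation at runtime. */
final class ReflectiveCleanerSketch {
    private ReflectiveCleanerSketch() {
    }

    /**
     * Registers cleanup to run once referent becomes unreachable.
     * The cleanup must not reference the referent, or the referent never becomes unreachable.
     * Returns a handle that runs the cleanup early.
     */
    static Runnable register(Object referent, Runnable cleanup) {
        try {
            // Java 9+: java.lang.ref.Cleaner.create().register(referent, cleanup).
            // A real implementation would share one Cleaner, since each one owns a thread.
            Class<?> cleanerCls = Class.forName("java.lang.ref.Cleaner");
            Object cleaner = cleanerCls.getMethod("create").invoke(null);
            Object cleanable = cleanerCls
                .getMethod("register", Object.class, Runnable.class)
                .invoke(cleaner, referent, cleanup);

            // Look up clean() on the public interface, not on the internal implementation class.
            Method clean = Class.forName("java.lang.ref.Cleaner$Cleanable").getMethod("clean");

            return () -> invoke(clean, cleanable);
        }
        catch (ClassNotFoundException ignored) {
            // Pre-Java 9 runtime: fall back to sun.misc.Cleaner below.
        }
        catch (ReflectiveOperationException e) {
            throw new IllegalStateException("java.lang.ref.Cleaner is not usable", e);
        }

        try {
            // Java 8: sun.misc.Cleaner.create(referent, cleanup).
            Class<?> cleanerCls = Class.forName("sun.misc.Cleaner");
            Object cleaner = cleanerCls
                .getMethod("create", Object.class, Runnable.class)
                .invoke(null, referent, cleanup);
            Method clean = cleanerCls.getMethod("clean");

            return () -> invoke(clean, cleaner);
        }
        catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No supported Cleaner found", e);
        }
    }

    /** Invokes a no-arg method reflectively, wrapping checked reflection errors. */
    private static void invoke(Method m, Object target) {
        try {
            m.invoke(target);
        }
        catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cleaner invocation failed", e);
        }
    }
}
```

Resolving the classes by name at runtime means the code compiles without a build-time dependency on either Cleaner class, which is the point of going through reflection.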
[jira] [Commented] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274600#comment-16274600 ] Dmitriy Sorokin commented on IGNITE-6171: - [~avinogradov], review new patch, please! > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-6891) Proper behavior on Persistence errors
[ https://issues.apache.org/jira/browse/IGNITE-6891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-6891: --- Assignee: Dmitriy Sorokin > Proper behavior on Persistence errors > -- > > Key: IGNITE-6891 > URL: https://issues.apache.org/jira/browse/IGNITE-6891 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin > Labels: iep-7 > Fix For: 2.4 > > > The node should be stopped anyway; what we can provide is a user callback, > something like 'beforeNodeStop'. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264634#comment-16264634 ] Dmitriy Sorokin edited comment on IGNITE-6171 at 11/23/17 5:22 PM: --- [~avinogradov], please review new patch. was (Author: cyberdemon): Anton Vinogradov, please review new patch. > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264634#comment-16264634 ] Dmitriy Sorokin commented on IGNITE-6171: - Anton Vinogradov, please review new patch. > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6890) Proper behavior on ExchangeWorker exits with error
[ https://issues.apache.org/jira/browse/IGNITE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264407#comment-16264407 ] Dmitriy Sorokin commented on IGNITE-6890: - [~avinogradov], please review my patch. > Proper behavior on ExchangeWorker exits with error > --- > > Key: IGNITE-6890 > URL: https://issues.apache.org/jira/browse/IGNITE-6890 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin > Labels: iep-7 > Fix For: 2.4 > > > The node should be stopped anyway; what we can provide is a user callback, > something like 'beforeNodeStop'. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264324#comment-16264324 ] Dmitriy Sorokin commented on IGNITE-6171: - As agreed with [~avinogradov] and [~vozerov], a new metric will be added: a window (with configurable size) of pairs, time (in ms) -> duration (in ms), of JVM pause events whose duration exceeds a threshold. > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
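The window metric described in this comment can be pictured as a small bounded buffer. The sketch below is only an illustration of that idea; the class `PauseWindowSketch`, its constructor parameters, and the choice of a `long[]` pair are assumptions and do not come from the patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: keeps the most recent long-pause events as {timestampMs, durationMs} pairs. */
final class PauseWindowSketch {
    private final int capacity;
    private final long thresholdMs;
    private final Deque<long[]> events = new ArrayDeque<>();

    PauseWindowSketch(int capacity, long thresholdMs) {
        if (capacity <= 0)
            throw new IllegalArgumentException("capacity must be positive");

        this.capacity = capacity;
        this.thresholdMs = thresholdMs;
    }

    /** Records the event only if its duration exceeds the threshold; evicts the oldest when full. */
    synchronized boolean record(long timestampMs, long durationMs) {
        if (durationMs <= thresholdMs)
            return false;

        if (events.size() == capacity)
            events.removeFirst();

        events.addLast(new long[] {timestampMs, durationMs});

        return true;
    }

    /** Returns a copy of the window, oldest event first. */
    synchronized List<long[]> snapshot() {
        return new ArrayList<>(events);
    }
}
```

A fixed capacity keeps memory use bounded however many pauses occur, at the cost of forgetting older events.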
[jira] [Commented] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262267#comment-16262267 ] Dmitriy Sorokin commented on IGNITE-6171: - [~avinogradov], please review my corrections. > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16260873#comment-16260873 ] Dmitriy Sorokin commented on IGNITE-6171: - [~avinogradov], please review patch. > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Affects Versions: 2.3 >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > Fix For: 2.4 > > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-6890) Proper behavior on ExchangeWorker exits with error
[ https://issues.apache.org/jira/browse/IGNITE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-6890: --- Assignee: Dmitriy Sorokin > Proper behavior on ExchangeWorker exits with error > --- > > Key: IGNITE-6890 > URL: https://issues.apache.org/jira/browse/IGNITE-6890 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Dmitriy Sorokin > Labels: iep-7 > Fix For: 2.4 > > > The node should be stopped anyway; what we can provide is a user callback, > something like 'beforeNodeStop'. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16260656#comment-16260656 ] Dmitriy Sorokin commented on IGNITE-6171: - [~vozerov], [~avinogradov] The implementation and, moreover, the very existence of that bean may differ between JVM implementations. Also, pauses may in theory have causes other than GC STW pauses. > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked on safepoint, we cannot use > Java's thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call dummy JNI method returning current time, > and set this time to shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in GC pause. Once a certain > threshold is reached - trigger compensating action, whether this is a > warning, process kill, or whatever else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if STW pause is in progress. Java method cannot > be empty, as JVM is smart enough and can reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16260600#comment-16260600 ] Dmitriy Sorokin commented on IGNITE-6171: - I discussed the set of required metrics with [~avinogradov], and we decided that the metrics will be the total count and duration of pauses exceeding the threshold. > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked at a safepoint, we cannot use > a Java thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call a dummy JNI method returning the current time, > and set this time to a shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in a GC pause. Once a certain > threshold is reached - trigger a compensating action, whether this is a > warning, process kill, or anything else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if an STW pause is in progress. The Java method cannot > be empty, as the JVM is smart enough to reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
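The two metrics named in the comment above (count and duration of pauses exceeding the threshold) could be accumulated roughly as follows. This is only a sketch: the class name `LongPauseMetrics`, its method names, and the reading of "duration" as a running total in milliseconds are assumptions made for illustration, not the API that was actually committed to Ignite.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for the two metrics; not an Ignite class. */
final class LongPauseMetrics {
    /** Pauses at or below this length (ms) are ignored. */
    private final long thresholdMs;

    /** Number of pauses that exceeded the threshold. */
    private final AtomicLong totalCount = new AtomicLong();

    /** Summed length (ms) of the pauses that exceeded the threshold (assumed meaning of "duration"). */
    private final AtomicLong totalDurationMs = new AtomicLong();

    LongPauseMetrics(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    /** Records a detected pause, but only if it is longer than the threshold. */
    void onPause(long pauseMs) {
        if (pauseMs > thresholdMs) {
            totalCount.incrementAndGet();
            totalDurationMs.addAndGet(pauseMs);
        }
    }

    long totalCount() {
        return totalCount.get();
    }

    long totalDurationMs() {
        return totalDurationMs.get();
    }

    public static void main(String[] args) {
        LongPauseMetrics m = new LongPauseMetrics(500);
        m.onPause(120);   // below threshold, ignored
        m.onPause(700);   // counted
        m.onPause(1300);  // counted
        System.out.println(m.totalCount() + " pauses, " + m.totalDurationMs() + " ms");
    }
}
```

Atomic counters are used here so that a detector thread can update the values while a monitoring thread reads them without extra locking.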
[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255216#comment-16255216 ] Dmitriy Sorokin edited comment on IGNITE-6171 at 11/16/17 12:22 PM: [~vozerov], [~avinogradov] I think that we don't need a JNI method; we only need a standard thread that wakes up after a small fixed timeout (20 ms, for example) and updates the time value with the current system time, calculating the difference from the previous value. If that difference deviates significantly from the expected one, it means that our thread has been frozen for some time, and it does not matter whether the cause was an STW pause or some other degradation of system responsiveness. A state in which our control thread no longer runs at all cannot arise instantaneously, so we can detect system response degradation this way. was (Author: cyberdemon): I think that we don't need a JNI method; we only need a standard thread that wakes up after a small fixed timeout (20 ms, for example) and updates the time value with the current system time, calculating the difference from the previous value. If that difference deviates significantly from the expected one, it means that our thread has been frozen for some time, and it does not matter whether the cause was an STW pause or some other degradation of system responsiveness. A state in which our control thread no longer runs at all cannot arise instantaneously, so we can detect system response degradation this way. > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked at a safepoint, we cannot use > a Java thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call a dummy JNI method returning the current time, > and set this time to a shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in a GC pause. Once a certain > threshold is reached - trigger a compensating action, whether this is a > warning, process kill, or anything else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if an STW pause is in progress. The Java method cannot > be empty, as the JVM is smart enough to reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255216#comment-16255216 ] Dmitriy Sorokin commented on IGNITE-6171: - I think that we don't need a JNI method; we only need a standard thread that wakes up after a small fixed timeout (20 ms, for example) and updates the time value with the current system time, calculating the difference from the previous value. If that difference deviates significantly from the expected one, it means that our thread has been frozen for some time, and it does not matter whether the cause was an STW pause or some other degradation of system responsiveness. A state in which our control thread no longer runs at all cannot arise instantaneously, so we can detect system response degradation this way. > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked at a safepoint, we cannot use > a Java thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call a dummy JNI method returning the current time, > and set this time to a shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in a GC pause. Once a certain > threshold is reached - trigger a compensating action, whether this is a > warning, process kill, or anything else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if an STW pause is in progress. The Java method cannot > be empty, as the JVM is smart enough to reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
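The plain-thread approach described in the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch that was submitted: the class name `PauseDetector`, the callback, and the threshold parameter are hypothetical, and `System.nanoTime()` is swapped in for the "current system time" mentioned in the comment because it is monotonic and unaffected by wall-clock adjustments.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

/** Hypothetical sketch of a pause detector built on an ordinary Java thread. */
final class PauseDetector implements Runnable {
    private final long intervalMs;      // wake-up period (the comment suggests about 20 ms)
    private final long thresholdMs;     // extra delay beyond the period that counts as a pause
    private final LongConsumer onPause; // receives the estimated pause length in ms

    PauseDetector(long intervalMs, long thresholdMs, LongConsumer onPause) {
        this.intervalMs = intervalMs;
        this.thresholdMs = thresholdMs;
        this.onPause = onPause;
    }

    /** Delay in ms beyond the expected wake-up interval; never negative. */
    static long excessMs(long prevNanos, long nowNanos, long intervalMs) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(nowNanos - prevNanos);
        return Math.max(0, elapsedMs - intervalMs);
    }

    @Override public void run() {
        long prev = System.nanoTime();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(intervalMs);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the flag and exit
                return;
            }
            long now = System.nanoTime();
            // If this thread was frozen (STW pause or anything else), far more time
            // than intervalMs has passed between the two timestamps.
            long excess = excessMs(prev, now, intervalMs);
            if (excess > thresholdMs)
                onPause.accept(excess);
            prev = now;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(
            new PauseDetector(20, 100, ms -> System.out.println("Possible pause: ~" + ms + " ms")),
            "pause-detector");
        t.setDaemon(true);
        t.start();
        Thread.sleep(1000); // let the detector run for a second
        t.interrupt();
        t.join();
    }
}
```

Because the detector thread is itself stopped during an STW pause, it cannot react while the pause is in progress; it only notices the gap once it resumes, which matches the comment's point that the cause of the freeze does not matter.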
[jira] [Commented] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253223#comment-16253223 ] Dmitriy Sorokin commented on IGNITE-6171: - [~vozerov] [~avinogradov] Hi, people! What length of STW pause should we consider very long? > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: iep-7, usability > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked at a safepoint, we cannot use > a Java thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call a dummy JNI method returning the current time, > and set this time to a shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in a GC pause. Once a certain > threshold is reached - trigger a compensating action, whether this is a > warning, process kill, or anything else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if an STW pause is in progress. The Java method cannot > be empty, as the JVM is smart enough to reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-6171) Native facility to control excessive GC pauses
[ https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-6171: --- Assignee: Dmitriy Sorokin > Native facility to control excessive GC pauses > -- > > Key: IGNITE-6171 > URL: https://issues.apache.org/jira/browse/IGNITE-6171 > Project: Ignite > Issue Type: Task > Components: general >Reporter: Vladimir Ozerov >Assignee: Dmitriy Sorokin > Labels: usability > > Ignite is a Java-based application. If a node experiences long GC pauses it may > negatively affect other nodes. We need to find a way to detect long GC pauses > within the process and trigger some actions in response, e.g. node stop. > This is a kind of Inception \[1\], when you need to understand that you sleep > while sleeping. As all Java threads are blocked at a safepoint, we cannot use > a Java thread to detect Java's GC. Native threads should be used instead. > Proposed solution: > 1) Thread 1 should periodically call a dummy JNI method returning the current time, > and set this time to a shared variable; > 2) Thread 2 should periodically check that variable. If it has not been > changed for some time - most likely we are in a GC pause. Once a certain > threshold is reached - trigger a compensating action, whether this is a > warning, process kill, or anything else. > Justification: crossing native -> Java boundaries involves safepoints. This > way Thread 1 will be trapped if an STW pause is in progress. The Java method cannot > be empty, as the JVM is smart enough to reduce it to a no-op. > \[1\] http://www.imdb.com/title/tt1375666/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-5811) Detect internal Ignite problems (java-level deadlock, hangs, etc) and act according to a policy configured.
[ https://issues.apache.org/jira/browse/IGNITE-5811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-5811: --- Assignee: (was: Dmitriy Sorokin) > Detect internal Ignite problems (java-level deadlock, hangs, etc) and act > according to a policy configured. > --- > > Key: IGNITE-5811 > URL: https://issues.apache.org/jira/browse/IGNITE-5811 > Project: Ignite > Issue Type: New Feature >Reporter: Yakov Zhdanov > Labels: usability > > This has something in common with the segmentation policy we currently have. The user > should get notified of a deadlock problem and the node should take an action > (stop by default). > Also Ignite may react to internal errors and hangs in the same way - fire an > event and take the appropriate action. > Current list of cases when a node should (by default) stop itself: > # Discovery reports segmentation (already implemented) > # Critical discovery thread fails (already implemented) > # NIO communication thread fails (already implemented) > The following needs to be added > # Java-deadlock detected > # Internal threads stuck (no progress on current tasks during a defined period) > # ExchangeWorker exits with error > We need to rework the handling of all situations above to use the same > mechanism and make the node take the action according to a configured policy -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-5811) Detect internal Ignite problems (java-level deadlock, hangs, etc) and act according to a policy configured.
[ https://issues.apache.org/jira/browse/IGNITE-5811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-5811: --- Assignee: Dmitriy Sorokin > Detect internal Ignite problems (java-level deadlock, hangs, etc) and act > according to a policy configured. > --- > > Key: IGNITE-5811 > URL: https://issues.apache.org/jira/browse/IGNITE-5811 > Project: Ignite > Issue Type: New Feature >Reporter: Yakov Zhdanov >Assignee: Dmitriy Sorokin > Labels: usability > > This has something in common with the segmentation policy we currently have. The user > should get notified of a deadlock problem and the node should take an action > (stop by default). > Also Ignite may react to internal errors and hangs in the same way - fire an > event and take the appropriate action. > Current list of cases when a node should (by default) stop itself: > # Discovery reports segmentation (already implemented) > # Critical discovery thread fails (already implemented) > # NIO communication thread fails (already implemented) > The following needs to be added > # Java-deadlock detected > # Internal threads stuck (no progress on current tasks during a defined period) > # ExchangeWorker exits with error > We need to rework the handling of all situations above to use the same > mechanism and make the node take the action according to a configured policy -- This message was sent by Atlassian JIRA (v6.4.14#64029)
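For the "Java-deadlock detected" case listed in the issue above, the JDK already exposes a check through `java.lang.management.ThreadMXBean`. The following is a rough sketch of how such a check could feed a configured policy; the class `DeadlockCheck`, the `Policy` enum and its values, and `shouldStop` are hypothetical names introduced here, and the issue itself only states that stopping the node is the default action.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical sketch; class, enum and method names are illustrative, not Ignite API. */
final class DeadlockCheck {
    /** Assumed policy values; the issue only says that stopping the node is the default. */
    enum Policy { STOP, WARN_ONLY }

    /** Returns the number of threads the JVM currently reports as deadlocked. */
    static int deadlockedThreadCount() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() also covers java.util.concurrent locks but is optional;
        // fall back to monitor-only detection when it is not supported.
        long[] ids = bean.isSynchronizerUsageSupported()
            ? bean.findDeadlockedThreads()
            : bean.findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length; // both calls return null when nothing is deadlocked
    }

    /** Decides whether the node should stop, given the check result and the configured policy. */
    static boolean shouldStop(int deadlocked, Policy policy) {
        return deadlocked > 0 && policy == Policy.STOP;
    }

    public static void main(String[] args) {
        int n = deadlockedThreadCount();
        System.out.println("Deadlocked threads: " + n + ", stop node: " + shouldStop(n, Policy.STOP));
    }
}
```

A check like this would need to run periodically from a dedicated thread, since a deadlocked worker cannot report its own state.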
[jira] [Assigned] (IGNITE-5691) IgniteHadoopFileSystemShmemExternalDualAsyncSelfTest sometimes hangs on TC
[ https://issues.apache.org/jira/browse/IGNITE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-5691: --- Assignee: Dmitriy Sorokin > IgniteHadoopFileSystemShmemExternalDualAsyncSelfTest sometimes hangs on TC > -- > > Key: IGNITE-5691 > URL: https://issues.apache.org/jira/browse/IGNITE-5691 > Project: Ignite > Issue Type: Bug > Components: hadoop >Affects Versions: 2.1 >Reporter: Ilya Lantukh >Assignee: Dmitriy Sorokin >Priority: Critical > Labels: MakeTeamcityGreenAgain, Muted_test, test-fail > Attachments: Ignite_2.0_Tests_Ignite_IGFS_Linux_and_MacOS_444.log.zip > > > Hangs when output stream is closed: > {noformat} > [12:38:39]W: [org.apache.ignite:ignite-hadoop] Thread > [name="test-runner-#15168%grid%", id=24808, state=WAITING, blockCnt=0, > waitCnt=3] > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > sun.misc.Unsafe.park(Native Method) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:315) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:176) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > o.a.i.i.processors.hadoop.impl.igfs.HadoopIgfsOutProc.closeStream(HadoopIgfsOutProc.java:446) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > o.a.i.i.processors.hadoop.impl.igfs.HadoopIgfsOutputStream.close(HadoopIgfsOutputStream.java:142) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > java.io.FilterOutputStream.close(FilterOutputStream.java:160) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) > [12:38:39]W: 
[org.apache.ignite:ignite-hadoop] at > o.a.i.i.processors.hadoop.impl.igfs.IgniteHadoopFileSystemAbstractSelfTest.testDeleteSuccessfulIfPathIsOpenedToRead(IgniteHadoopFileSystemAbstractSelfTest.java:752) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > java.lang.reflect.Method.invoke(Method.java:606) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > junit.framework.TestCase.runTest(TestCase.java:176) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > o.a.i.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1997) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > o.a.i.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > o.a.i.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1912) > [12:38:39]W: [org.apache.ignite:ignite-hadoop] at > java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-5302) Empty LOST partition may be used as OWNING after resetting lost partitions
[ https://issues.apache.org/jira/browse/IGNITE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187704#comment-16187704 ] Dmitriy Sorokin commented on IGNITE-5302: - [~vozerov], I ran into the problem of the lack of lost partitions after stopping two of the four nodes, and now I have some questions about this topic. And I think that we can move the fix to release 2.4 too. > Empty LOST partition may be used as OWNING after resetting lost partitions > -- > > Key: IGNITE-5302 > URL: https://issues.apache.org/jira/browse/IGNITE-5302 > Project: Ignite > Issue Type: Bug >Reporter: Sergey Chugunov >Assignee: Dmitriy Sorokin >Priority: Blocker > Labels: MakeTeamcityGreenAgain, Muted_test, test-fail > Fix For: 2.3 > > > h2. Notes > Test *testPartitionLossAndRecover* reproducing the issue can be found in > ignite-5267 branch with PDS functionality. > h2. Steps to reproduce > # Four nodes are started, some key is added to partitioned cache > # Primary and backup nodes for the key are stopped, key's partition is > declared LOST on remaining nodes > # Primary and backup nodes are started again, cache's lost partitions are > reset > # Key is requested from cache > h2. Expected behavior > Correct value is returned from primary for this partition > h2. Actual behavior > Request for value is sent to node where partition is empty (not to primary > node), null is returned > h2. Latest findings > # The main problem with the scenario is that request for key gets mapped not > only to P/B nodes with real value but also to the node where that partition > existed only in LOST state after P/B shutdown on step #2 > # It was found that on step #3 after primary and backup are joined partition > counter is increased for empty partition in LOST state which looks wrong -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-5302) Empty LOST partition may be used as OWNING after resetting lost partitions
[ https://issues.apache.org/jira/browse/IGNITE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-5302: --- Assignee: Dmitriy Sorokin > Empty LOST partition may be used as OWNING after resetting lost partitions > -- > > Key: IGNITE-5302 > URL: https://issues.apache.org/jira/browse/IGNITE-5302 > Project: Ignite > Issue Type: Bug >Reporter: Sergey Chugunov >Assignee: Dmitriy Sorokin >Priority: Blocker > Labels: MakeTeamcityGreenAgain, Muted_test, test-fail > Fix For: 2.3 > > > h2. Notes > Test *testPartitionLossAndRecover* reproducing the issue can be found in > ignite-5267 branch with PDS functionality. > h2. Steps to reproduce > # Four nodes are started, some key is added to partitioned cache > # Primary and backup nodes for the key are stopped, key's partition is > declared LOST on remaining nodes > # Primary and backup nodes are started again, cache's lost partitions are > reset > # Key is requested from cache > h2. Expected behavior > Correct value is returned from primary for this partition > h2. Actual behavior > Request for value is sent to node where partition is empty (not to primary > node), null is returned > h2. Latest findings > # The main problem with the scenario is that request for key gets mapped not > only to P/B nodes with real value but also to the node where that partition > existed only in LOST state after P/B shutdown on step #2 > # It was found that on step #3 after primary and backup are joined partition > counter is increased for empty partition in LOST state which looks wrong -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (IGNITE-4181) The several runs of ServicesExample causes NPE
[ https://issues.apache.org/jira/browse/IGNITE-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin updated IGNITE-4181: Fix Version/s: (was: 2.2) 2.1 > The several runs of ServicesExample causes NPE > -- > > Key: IGNITE-4181 > URL: https://issues.apache.org/jira/browse/IGNITE-4181 > Project: Ignite > Issue Type: Bug > Components: general >Affects Versions: 1.6, 1.7, 2.0 > Environment: Windows 10, Oracle JDK 7 >Reporter: Sergey Kozlov >Assignee: Dmitriy Sorokin > Labels: newbie > Fix For: 2.1 > > > 0. Open example project in IDEA > 1. Start 2-3 {{ExampleNodeStartup}} > 2. Run {{ServicesExample}} several times. > Sometimes it causes NullPointerException: > {noformat} > Executing closure [mapSize=10] > Service was cancelled: myNodeSingletonService > [15:37:20,020][INFO ][srvc-deploy-#24%null%][GridServiceProcessor] Cancelled > service instance [name=myNodeSingletonService, > execId=88a92a4d-c1cb-4a9b-8930-c67ac7f42bf3] > [15:37:20,032][INFO ][sys-#33%null%][GridCacheProcessor] Stopped cache: > myNodeSingletonService > [15:37:20,033][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=4], evt=DISCOVERY_CUSTOM_EVT, > node=5faac72a-72ab-4277-9643-0e962973b3f4] > [15:37:20,045][INFO ][sys-#39%null%][GridCacheProcessor] Stopped cache: > myClusterSingletonService > [15:37:20,046][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=5], evt=DISCOVERY_CUSTOM_EVT, > node=478f1752-fdce-42c6-aef6-55a5f4c08d90] > [15:37:20,062][INFO ][disco-event-worker-#20%null%][GridDiscoveryManager] > Node left topology: TcpDiscoveryNode > [id=4f9cbc67-d756-4c25-9ee4-aee6528da024, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.25.4.107, 2001:0:9d38:6ab8:34b2:9f3e:3c6f:269], > sockAddrs=[/2001:0:9d38:6ab8:34b2:9f3e:3c6f:269:0, 
/127.0.0.1:0, > /0:0:0:0:0:0:0:1:0, work-pc/172.25.4.107:0], discPort=0, order=10, > intOrder=7, lastExchangeTime=1478522239236, loc=false, > ver=1.7.3#20161107-sha1:5132ac87, isClient=true] > [15:37:20,063][INFO ][disco-event-worker-#20%null%][GridDiscoveryManager] > Topology snapshot [ver=11, servers=3, clients=0, CPUs=8, heap=11.0GB] > [15:37:20,064][INFO ][sys-#44%null%][GridCacheProcessor] Stopped cache: > myMultiService > [15:37:20,066][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=6], evt=DISCOVERY_CUSTOM_EVT, > node=5faac72a-72ab-4277-9643-0e962973b3f4] > [15:37:20,076][INFO ][exchange-worker-#23%null%][GridCacheProcessor] Started > cache [name=myClusterSingletonService, mode=PARTITIONED] > [15:37:20,115][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=7], evt=DISCOVERY_CUSTOM_EVT, > node=478f1752-fdce-42c6-aef6-55a5f4c08d90] > [15:37:20,121][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=11, > minorTopVer=0], evt=NODE_LEFT, node=4f9cbc67-d756-4c25-9ee4-aee6528da024] > [15:37:20,133][INFO ][exchange-worker-#23%null%][GridCacheProcessor] Started > cache [name=myMultiService, mode=PARTITIONED] > [15:37:20,135][ERROR][exchange-worker-#23%null%][GridDhtPartitionsExchangeFuture] > Failed to reinitialize local partitions (preloading will be stopped): > GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=11, > minorTopVer=1], nodeId=5faac72a, evt=DISCOVERY_CUSTOM_EVT] > java.lang.NullPointerException > at > org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initStartedCacheOnCoordinator(CacheAffinitySharedManager.java:743) > at > 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:413) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:565) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:448) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1447) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at java.lang.Thread.run(Thread.java:745) > [15:37:2
[jira] [Assigned] (IGNITE-4181) The several runs of ServicesExample causes NPE
[ https://issues.apache.org/jira/browse/IGNITE-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin reassigned IGNITE-4181: --- Assignee: Dmitriy Sorokin (was: Andrey Kuznetsov) > The several runs of ServicesExample causes NPE > -- > > Key: IGNITE-4181 > URL: https://issues.apache.org/jira/browse/IGNITE-4181 > Project: Ignite > Issue Type: Bug > Components: general >Affects Versions: 1.6, 1.7, 2.0 > Environment: Windows 10, Oracle JDK 7 >Reporter: Sergey Kozlov >Assignee: Dmitriy Sorokin > Labels: newbie > Fix For: 2.2 > > > 0. Open example project in IDEA > 1. Start 2-3 {{ExampleNodeStartup}} > 2. Run {{ServicesExample}} several times. > Sometimes it causes NullPointerException: > {noformat} > Executing closure [mapSize=10] > Service was cancelled: myNodeSingletonService > [15:37:20,020][INFO ][srvc-deploy-#24%null%][GridServiceProcessor] Cancelled > service instance [name=myNodeSingletonService, > execId=88a92a4d-c1cb-4a9b-8930-c67ac7f42bf3] > [15:37:20,032][INFO ][sys-#33%null%][GridCacheProcessor] Stopped cache: > myNodeSingletonService > [15:37:20,033][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=4], evt=DISCOVERY_CUSTOM_EVT, > node=5faac72a-72ab-4277-9643-0e962973b3f4] > [15:37:20,045][INFO ][sys-#39%null%][GridCacheProcessor] Stopped cache: > myClusterSingletonService > [15:37:20,046][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=5], evt=DISCOVERY_CUSTOM_EVT, > node=478f1752-fdce-42c6-aef6-55a5f4c08d90] > [15:37:20,062][INFO ][disco-event-worker-#20%null%][GridDiscoveryManager] > Node left topology: TcpDiscoveryNode > [id=4f9cbc67-d756-4c25-9ee4-aee6528da024, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.25.4.107, 2001:0:9d38:6ab8:34b2:9f3e:3c6f:269], > 
sockAddrs=[/2001:0:9d38:6ab8:34b2:9f3e:3c6f:269:0, /127.0.0.1:0, > /0:0:0:0:0:0:0:1:0, work-pc/172.25.4.107:0], discPort=0, order=10, > intOrder=7, lastExchangeTime=1478522239236, loc=false, > ver=1.7.3#20161107-sha1:5132ac87, isClient=true] > [15:37:20,063][INFO ][disco-event-worker-#20%null%][GridDiscoveryManager] > Topology snapshot [ver=11, servers=3, clients=0, CPUs=8, heap=11.0GB] > [15:37:20,064][INFO ][sys-#44%null%][GridCacheProcessor] Stopped cache: > myMultiService > [15:37:20,066][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=6], evt=DISCOVERY_CUSTOM_EVT, > node=5faac72a-72ab-4277-9643-0e962973b3f4] > [15:37:20,076][INFO ][exchange-worker-#23%null%][GridCacheProcessor] Started > cache [name=myClusterSingletonService, mode=PARTITIONED] > [15:37:20,115][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=7], evt=DISCOVERY_CUSTOM_EVT, > node=478f1752-fdce-42c6-aef6-55a5f4c08d90] > [15:37:20,121][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=11, > minorTopVer=0], evt=NODE_LEFT, node=4f9cbc67-d756-4c25-9ee4-aee6528da024] > [15:37:20,133][INFO ][exchange-worker-#23%null%][GridCacheProcessor] Started > cache [name=myMultiService, mode=PARTITIONED] > [15:37:20,135][ERROR][exchange-worker-#23%null%][GridDhtPartitionsExchangeFuture] > Failed to reinitialize local partitions (preloading will be stopped): > GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=11, > minorTopVer=1], nodeId=5faac72a, evt=DISCOVERY_CUSTOM_EVT] > java.lang.NullPointerException > at > org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initStartedCacheOnCoordinator(CacheAffinitySharedManager.java:743) > at > 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:413) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:565) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:448) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1447) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at java.lang.Thread.run(Thread.java:745) > [15
[jira] [Commented] (IGNITE-4181) The several runs of ServicesExample causes NPE
[ https://issues.apache.org/jira/browse/IGNITE-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098501#comment-16098501 ] Dmitriy Sorokin commented on IGNITE-4181: - This issue seems resolved by: commit 7e45010b4848d0a570995e6dc938875710d846d8 Author: sboikov Date: 2017-06-04T08:02:31Z ignite-5075 'logical' caches sharing the same 'physical' cache group > The several runs of ServicesExample causes NPE > -- > > Key: IGNITE-4181 > URL: https://issues.apache.org/jira/browse/IGNITE-4181 > Project: Ignite > Issue Type: Bug > Components: general >Affects Versions: 1.6, 1.7, 2.0 > Environment: Windows 10, Oracle JDK 7 >Reporter: Sergey Kozlov >Assignee: Andrey Kuznetsov > Labels: newbie > Fix For: 2.2 > > > 0. Open example project in IDEA > 1. Start 2-3 {{ExampleNodeStartup}} > 2. Run {{ServicesExample}} several times. > Sometimes it causes NullPointerException: > {noformat} > Executing closure [mapSize=10] > Service was cancelled: myNodeSingletonService > [15:37:20,020][INFO ][srvc-deploy-#24%null%][GridServiceProcessor] Cancelled > service instance [name=myNodeSingletonService, > execId=88a92a4d-c1cb-4a9b-8930-c67ac7f42bf3] > [15:37:20,032][INFO ][sys-#33%null%][GridCacheProcessor] Stopped cache: > myNodeSingletonService > [15:37:20,033][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=4], evt=DISCOVERY_CUSTOM_EVT, > node=5faac72a-72ab-4277-9643-0e962973b3f4] > [15:37:20,045][INFO ][sys-#39%null%][GridCacheProcessor] Stopped cache: > myClusterSingletonService > [15:37:20,046][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=5], evt=DISCOVERY_CUSTOM_EVT, > node=478f1752-fdce-42c6-aef6-55a5f4c08d90] > [15:37:20,062][INFO ][disco-event-worker-#20%null%][GridDiscoveryManager] > Node left 
topology: TcpDiscoveryNode > [id=4f9cbc67-d756-4c25-9ee4-aee6528da024, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.25.4.107, 2001:0:9d38:6ab8:34b2:9f3e:3c6f:269], > sockAddrs=[/2001:0:9d38:6ab8:34b2:9f3e:3c6f:269:0, /127.0.0.1:0, > /0:0:0:0:0:0:0:1:0, work-pc/172.25.4.107:0], discPort=0, order=10, > intOrder=7, lastExchangeTime=1478522239236, loc=false, > ver=1.7.3#20161107-sha1:5132ac87, isClient=true] > [15:37:20,063][INFO ][disco-event-worker-#20%null%][GridDiscoveryManager] > Topology snapshot [ver=11, servers=3, clients=0, CPUs=8, heap=11.0GB] > [15:37:20,064][INFO ][sys-#44%null%][GridCacheProcessor] Stopped cache: > myMultiService > [15:37:20,066][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=6], evt=DISCOVERY_CUSTOM_EVT, > node=5faac72a-72ab-4277-9643-0e962973b3f4] > [15:37:20,076][INFO ][exchange-worker-#23%null%][GridCacheProcessor] Started > cache [name=myClusterSingletonService, mode=PARTITIONED] > [15:37:20,115][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=7], evt=DISCOVERY_CUSTOM_EVT, > node=478f1752-fdce-42c6-aef6-55a5f4c08d90] > [15:37:20,121][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=11, > minorTopVer=0], evt=NODE_LEFT, node=4f9cbc67-d756-4c25-9ee4-aee6528da024] > [15:37:20,133][INFO ][exchange-worker-#23%null%][GridCacheProcessor] Started > cache [name=myMultiService, mode=PARTITIONED] > [15:37:20,135][ERROR][exchange-worker-#23%null%][GridDhtPartitionsExchangeFuture] > Failed to reinitialize local partitions (preloading will be stopped): > GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=11, > minorTopVer=1], nodeId=5faac72a, evt=DISCOVERY_CUSTOM_EVT] > 
java.lang.NullPointerException > at > org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initStartedCacheOnCoordinator(CacheAffinitySharedManager.java:743) > at > org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:413) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:565) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:448) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.bod
[jira] [Commented] (IGNITE-4181) The several runs of ServicesExample causes NPE
[ https://issues.apache.org/jira/browse/IGNITE-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098485#comment-16098485 ] Dmitriy Sorokin commented on IGNITE-4181: - This issue was caused by concurrent use of GridCacheProcessor's cache descriptor map by event handlers running on TCP discovery workers and by exchange processing running on the exchange worker. A cache creation event puts the newly created cache descriptor into the map and then triggers an exchange; when that exchange is processed, it tries to take the cache descriptor, which may already have been removed by a cache deletion event that was processed later than the cache creation event but earlier than the cache creation exchange. > The several runs of ServicesExample causes NPE > -- > > Key: IGNITE-4181 > URL: https://issues.apache.org/jira/browse/IGNITE-4181 > Project: Ignite > Issue Type: Bug > Components: general >Affects Versions: 1.6, 1.7, 2.0 > Environment: Windows 10, Oracle JDK 7 >Reporter: Sergey Kozlov >Assignee: Andrey Kuznetsov > Labels: newbie > Fix For: 2.2 > > > 0. Open example project in IDEA > 1. Start 2-3 {{ExampleNodeStartup}} > 2. Run {{ServicesExample}} several times. 
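The ordering described in the comment on IGNITE-4181 above (cache creation event, then cache deletion event, then the delayed cache creation exchange) can be modelled in a few lines of Java. This is only a sketch under stated assumptions: every class, field and method name below is hypothetical and is not Ignite code, and the interleaving is replayed on a single thread so that it is deterministic. The null check at the end is one possible way to tolerate the missing descriptor; it is not claimed to be what the ignite-5075 commit does.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of the race; none of these names are Ignite APIs. */
final class DescriptorRaceSketch {
    /** Stand-in for a cache descriptor. */
    static final class Descriptor {
        final String cacheName;

        Descriptor(String cacheName) {
            this.cacheName = cacheName;
        }
    }

    /** Shared registry: written by discovery event handlers, read by the exchange worker. */
    static final Map<String, Descriptor> descriptors = new ConcurrentHashMap<>();

    /** Cache-start exchanges queued by the discovery handlers for later processing. */
    static final Queue<String> pendingStartExchanges = new ArrayDeque<>();

    /** Discovery side: register the descriptor, then schedule the exchange. */
    static void onCacheStartEvent(String name) {
        descriptors.put(name, new Descriptor(name));
        pendingStartExchanges.add(name);
    }

    /** Discovery side: the descriptor is removed as soon as the stop event is handled. */
    static void onCacheStopEvent(String name) {
        descriptors.remove(name);
    }

    /** Exchange side, unguarded: throws NullPointerException if the descriptor is gone. */
    static String processStartExchangeUnsafe(String name) {
        return descriptors.get(name).cacheName;
    }

    /** Exchange side, guarded: returns null so the caller can skip the stale exchange. */
    static String processStartExchangeGuarded(String name) {
        Descriptor desc = descriptors.get(name);
        return desc == null ? null : desc.cacheName;
    }

    public static void main(String[] args) {
        // Both discovery events are handled before the exchange worker catches up.
        onCacheStartEvent("myMultiService");
        onCacheStopEvent("myMultiService");

        String name = pendingStartExchanges.remove();

        try {
            processStartExchangeUnsafe(name);
        } catch (NullPointerException e) {
            System.out.println("Unguarded exchange failed for " + name);
        }

        System.out.println("Guarded exchange result: " + processStartExchangeGuarded(name));
    }
}
```

Using a ConcurrentHashMap does not help here: each map operation is thread-safe on its own, but nothing ties the lifetime of the descriptor to the exchange that was queued for it.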
[jira] [Updated] (IGNITE-4181) The several runs of ServicesExample causes NPE
[ https://issues.apache.org/jira/browse/IGNITE-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Sorokin updated IGNITE-4181: Affects Version/s: 2.0 > The several runs of ServicesExample causes NPE > -- > > Key: IGNITE-4181 > URL: https://issues.apache.org/jira/browse/IGNITE-4181 > Project: Ignite > Issue Type: Bug > Components: general >Affects Versions: 1.6, 1.7, 2.0 > Environment: Windows 10, Oracle JDK 7 >Reporter: Sergey Kozlov >Assignee: Andrey Kuznetsov > Labels: newbie > Fix For: 2.2 > > > 0. Open example project in IDEA > 1. Start 2-3 {{ExampleNodeStartup}} > 2. Run {{ServicesExample}} several times. > Sometimes it causes NullPointerException: > {noformat} > Executing closure [mapSize=10] > Service was cancelled: myNodeSingletonService > [15:37:20,020][INFO ][srvc-deploy-#24%null%][GridServiceProcessor] Cancelled > service instance [name=myNodeSingletonService, > execId=88a92a4d-c1cb-4a9b-8930-c67ac7f42bf3] > [15:37:20,032][INFO ][sys-#33%null%][GridCacheProcessor] Stopped cache: > myNodeSingletonService > [15:37:20,033][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=4], evt=DISCOVERY_CUSTOM_EVT, > node=5faac72a-72ab-4277-9643-0e962973b3f4] > [15:37:20,045][INFO ][sys-#39%null%][GridCacheProcessor] Stopped cache: > myClusterSingletonService > [15:37:20,046][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=5], evt=DISCOVERY_CUSTOM_EVT, > node=478f1752-fdce-42c6-aef6-55a5f4c08d90] > [15:37:20,062][INFO ][disco-event-worker-#20%null%][GridDiscoveryManager] > Node left topology: TcpDiscoveryNode > [id=4f9cbc67-d756-4c25-9ee4-aee6528da024, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.25.4.107, 2001:0:9d38:6ab8:34b2:9f3e:3c6f:269], > sockAddrs=[/2001:0:9d38:6ab8:34b2:9f3e:3c6f:269:0, /127.0.0.1:0, 
> /0:0:0:0:0:0:0:1:0, work-pc/172.25.4.107:0], discPort=0, order=10, > intOrder=7, lastExchangeTime=1478522239236, loc=false, > ver=1.7.3#20161107-sha1:5132ac87, isClient=true] > [15:37:20,063][INFO ][disco-event-worker-#20%null%][GridDiscoveryManager] > Topology snapshot [ver=11, servers=3, clients=0, CPUs=8, heap=11.0GB] > [15:37:20,064][INFO ][sys-#44%null%][GridCacheProcessor] Stopped cache: > myMultiService > [15:37:20,066][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=6], evt=DISCOVERY_CUSTOM_EVT, > node=5faac72a-72ab-4277-9643-0e962973b3f4] > [15:37:20,076][INFO ][exchange-worker-#23%null%][GridCacheProcessor] Started > cache [name=myClusterSingletonService, mode=PARTITIONED] > [15:37:20,115][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=10, > minorTopVer=7], evt=DISCOVERY_CUSTOM_EVT, > node=478f1752-fdce-42c6-aef6-55a5f4c08d90] > [15:37:20,121][INFO > ][exchange-worker-#23%null%][GridCachePartitionExchangeManager] Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=11, > minorTopVer=0], evt=NODE_LEFT, node=4f9cbc67-d756-4c25-9ee4-aee6528da024] > [15:37:20,133][INFO ][exchange-worker-#23%null%][GridCacheProcessor] Started > cache [name=myMultiService, mode=PARTITIONED] > [15:37:20,135][ERROR][exchange-worker-#23%null%][GridDhtPartitionsExchangeFuture] > Failed to reinitialize local partitions (preloading will be stopped): > GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=11, > minorTopVer=1], nodeId=5faac72a, evt=DISCOVERY_CUSTOM_EVT] > java.lang.NullPointerException > at > org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initStartedCacheOnCoordinator(CacheAffinitySharedManager.java:743) > at > 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:413) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:565) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:448) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1447) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at java.lang.Thread.run(Thread.java:745) > [15:37:20,142][ERROR][exchange-worker