Alexey Goncharuk created IGNITE-11616: -----------------------------------------
             Summary: NPE in MvccProcessorImpl when stopping a starting node
                 Key: IGNITE-11616
                 URL: https://issues.apache.org/jira/browse/IGNITE-11616
             Project: Ignite
          Issue Type: Test
          Components: sql
            Reporter: Alexey Goncharuk
             Fix For: 2.8

I observe the following NPE in IgniteBaselineAffinityTopologyActivationTest. It happens because we shut down when the MVCC coordinator is not assigned yet:
{code}
java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1106)
    at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097)
    at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorFailed(MvccProcessorImpl.java:527)
    at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onKernalStop(MvccProcessorImpl.java:459)
    at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2335)
    at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2283)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1194)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1992)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1683)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1109)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:607)
    at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:984)
    at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:925)
    at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:913)
    at org.apache.ignite.internal.processors.cache.persistence.IgniteBaselineAffinityTopologyActivationTest.startGridWithConsistentId(IgniteBaselineAffinityTopologyActivationTest.java:729)
    at org.apache.ignite.internal.processors.cache.persistence.IgniteBaselineAffinityTopologyActivationTest.testNodeWithBltIsNotAllowedToJoinClusterDuringFirstActivation(IgniteBaselineAffinityTopologyActivationTest.java:532)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2102)
    at java.lang.Thread.run(Thread.java:745)
{code}

At first glance it looks like we can simply ignore the {{null}} node ID. However, there is a race: in {{onKernalStop}} we block the busy lock and remove the discovery listener, then do the coordinator cleanup. The discovery notification worker is only stopped in the {{stop}} phase, while the MVCC manager does its cleanup in the {{onKernalStop}} phase. As a result, the listener can still execute some code after {{onKernalStop}} has run, because there is no busy lock protection in the discovery listener itself.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
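The two points in the description (the {{null}} coordinator ID and the unguarded discovery listener) can be illustrated with a minimal, self-contained sketch. This is not the Ignite code and not a proposed patch: the class {{GuardedCoordinatorCleanup}}, its fields, and its methods are hypothetical names, and a {{ReentrantReadWriteLock}} plus a {{stopping}} flag stands in for whatever busy lock the real processor uses. It only shows the pattern of having the listener enter the busy lock before touching state, and of skipping cleanup when no coordinator was ever assigned.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for an MVCC-style processor; not actual Ignite code. */
final class GuardedCoordinatorCleanup {
    /** Assumed busy lock: listeners take the read side, stop takes the write side. */
    private final ReentrantReadWriteLock busyLock = new ReentrantReadWriteLock();

    /** Per-node state keyed by node ID. ConcurrentHashMap rejects null keys with an NPE. */
    private final ConcurrentHashMap<UUID, String> perNodeState = new ConcurrentHashMap<>();

    /** Set once the stop phase has begun. */
    private volatile boolean stopping;

    /** Stays null until a coordinator is assigned. */
    private volatile UUID crdId;

    void assignCoordinator(UUID nodeId) {
        crdId = nodeId;
    }

    void trackNode(UUID nodeId) {
        perNodeState.put(nodeId, "active");
    }

    int trackedNodes() {
        return perNodeState.size();
    }

    /** Returns false if the processor is stopping, so the caller must not touch state. */
    private boolean enterBusy() {
        if (!busyLock.readLock().tryLock())
            return false;

        if (stopping) {
            busyLock.readLock().unlock();

            return false;
        }

        return true;
    }

    private void leaveBusy() {
        busyLock.readLock().unlock();
    }

    /** Discovery listener callback: does its work only while holding the busy lock. */
    boolean onDiscoveryEvent(UUID failedNodeId) {
        if (!enterBusy())
            return false; // Stop already started: ignore the late notification.

        try {
            onCoordinatorFailed(failedNodeId);

            return true;
        }
        finally {
            leaveBusy();
        }
    }

    private void onCoordinatorFailed(UUID nodeId) {
        if (nodeId == null)
            return; // No coordinator was assigned yet, so there is nothing to clean up.

        perNodeState.remove(nodeId);
    }

    /** Stop phase: wait for in-flight listeners, reject new ones, then clean up. */
    void onKernalStop() {
        busyLock.writeLock().lock();

        try {
            stopping = true;
        }
        finally {
            busyLock.writeLock().unlock();
        }

        onCoordinatorFailed(crdId);
    }
}
```

In this sketch a listener invocation that arrives after {{onKernalStop}} returns {{false}} from {{onDiscoveryEvent}} and never reaches the cleanup path, and stopping a node whose coordinator was never assigned does not throw.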