[jira] [Commented] (IGNITE-11253) When a node that is not part of the baseline topology joins the cluster, it may lead to a node failure.

2019-02-08 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763636#comment-16763636
 ] 

Ignite TC Bot commented on IGNITE-11253:


{panel:title=--> Run :: All: Possible 
Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Service Grid (legacy mode){color} [[tests 0 TIMEOUT , Exit Code 
|https://ci.ignite.apache.org/viewLog.html?buildId=3038050]]

{color:#d04437}Cache (Restarts) 1{color} [[tests 
4|https://ci.ignite.apache.org/viewLog.html?buildId=3038010]]
* IgniteCacheRestartTestSuite: 
GridCacheReplicatedNodeRestartSelfTest.testRestartWithPutFourNodesNoBackups - 
0,0% fails in last 384 master runs.

{color:#d04437}ZooKeeper (Discovery) 1{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=3037987]]
* ZookeeperDiscoverySpiTestSuite1: 
ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3 - 3,8% 
fails in last 286 master runs.

{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=3038065&buildTypeId=IgniteTests24Java8_RunAll]

> When a node that is not part of the baseline topology joins the cluster, it 
> may lead to a node failure.
> ---
>
> Key: IGNITE-11253
> URL: https://issues.apache.org/jira/browse/IGNITE-11253
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * In case of eager TTL is configured, a starting node creates and starts 
> {{cleanupWorker}} (see {{GridCacheTtlManager.start0()}})
>  * {{GridCacheSharedTtlCleanupManager.CleanupWorker}}, in its turn, has to 
> wait for {{discovery().localJoin()}} future that is completed by discovery 
> thread.
>  * On the other hand, the exchange thread stops cache contexts and, 
> therefore, it stops the {{cleanupWorker}} as well.
>  
> {code:java}
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager.stopCleanupWorker(GridCacheSharedTtlCleanupManager.java:109)
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager.unregister(GridCacheSharedTtlCleanupManager.java:82)
> org.apache.ignite.internal.processors.cache.GridCacheTtlManager.onKernalStop0(GridCacheTtlManager.java:110)
> org.apache.ignite.internal.processors.cache.GridCacheManagerAdapter.onKernalStop(GridCacheManagerAdapter.java:111)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:1495)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStopCaches(GridCacheProcessor.java:1182)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor$CacheRecoveryLifecycle.onBaselineChange(GridCacheProcessor.java:5637)
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:910)
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:792)
> {code}
> So, exchange thread may try to stop the {{cleanupWorker}} before the 
> {{localJoin}} future is completed by discovery thread. Unfortunately, 
> `cleanupWorker` incorrectly handles this situation, and this fact can lead to 
> a node failure:
> {code:java}
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Got 
> interrupted while waiting for future to complete.]]
> class org.apache.ignite.IgniteException: Got interrupted while waiting for 
> future to complete.
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.localJoin(GridDiscoveryManager.java:2217)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:136)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: class 
> org.apache.ignite.internal.IgniteInterruptedCheckedException: Got interrupted 
> while waiting for future to complete.
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:186)
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.localJoin(GridDiscoveryManage

[jira] [Commented] (IGNITE-11253) When a node that is not part of the baseline topology joins the cluster, it may lead to a node failure.

2019-02-11 Thread Andrey Gura (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764906#comment-16764906
 ] 

Andrey Gura commented on IGNITE-11253:
--

[~slava.koptilin] Could you please add at least {{InterruptedException}} to 
{{X.hasCause}} method's parameters. Usually any worker stops via {{cancel}} 
call that interrupt worker's runner.

> When a node that is not part of the baseline topology joins the cluster, it 
> may lead to a node failure.
> ---
>
> Key: IGNITE-11253
> URL: https://issues.apache.org/jira/browse/IGNITE-11253
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * In case of eager TTL is configured, a starting node creates and starts 
> {{cleanupWorker}} (see {{GridCacheTtlManager.start0()}})
>  * {{GridCacheSharedTtlCleanupManager.CleanupWorker}}, in its turn, has to 
> wait for {{discovery().localJoin()}} future that is completed by discovery 
> thread.
>  * On the other hand, the exchange thread stops cache contexts and, 
> therefore, it stops the {{cleanupWorker}} as well.
>  
> {code:java}
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager.stopCleanupWorker(GridCacheSharedTtlCleanupManager.java:109)
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager.unregister(GridCacheSharedTtlCleanupManager.java:82)
> org.apache.ignite.internal.processors.cache.GridCacheTtlManager.onKernalStop0(GridCacheTtlManager.java:110)
> org.apache.ignite.internal.processors.cache.GridCacheManagerAdapter.onKernalStop(GridCacheManagerAdapter.java:111)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:1495)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStopCaches(GridCacheProcessor.java:1182)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor$CacheRecoveryLifecycle.onBaselineChange(GridCacheProcessor.java:5637)
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:910)
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:792)
> {code}
> So, exchange thread may try to stop the {{cleanupWorker}} before the 
> {{localJoin}} future is completed by discovery thread. Unfortunately, 
> `cleanupWorker` incorrectly handles this situation, and this fact can lead to 
> a node failure:
> {code:java}
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Got 
> interrupted while waiting for future to complete.]]
> class org.apache.ignite.IgniteException: Got interrupted while waiting for 
> future to complete.
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.localJoin(GridDiscoveryManager.java:2217)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:136)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: class 
> org.apache.ignite.internal.IgniteInterruptedCheckedException: Got interrupted 
> while waiting for future to complete.
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:186)
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.localJoin(GridDiscoveryManager.java:2214)
> ... 3 more
> {code}
>  The obvious fix is changing the catch block
> {code:java}
> catch (Throwable t) {
> if (!(t instanceof IgniteInterruptedCheckedException))
> err = t;
> throw t;
> }
> {code}
> to the following:
> {code:java}
> catch (Throwable t) {
> if (!(X.hasCause(t, IgniteInterruptedCheckedException.class)))
> err = t;
> throw t;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11253) When a node that is not part of the baseline topology joins the cluster, it may lead to a node failure.

2019-02-11 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765746#comment-16765746
 ] 

Ignite TC Bot commented on IGNITE-11253:


{panel:title=--> Run :: All: Possible 
Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform .NET (NuGet)*{color} [[tests 0 Exit Code , Compilation 
Error |https://ci.ignite.apache.org/viewLog.html?buildId=3066504]]

{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=3064210&buildTypeId=IgniteTests24Java8_RunAll]

> When a node that is not part of the baseline topology joins the cluster, it 
> may lead to a node failure.
> ---
>
> Key: IGNITE-11253
> URL: https://issues.apache.org/jira/browse/IGNITE-11253
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * In case of eager TTL is configured, a starting node creates and starts 
> {{cleanupWorker}} (see {{GridCacheTtlManager.start0()}})
>  * {{GridCacheSharedTtlCleanupManager.CleanupWorker}}, in its turn, has to 
> wait for {{discovery().localJoin()}} future that is completed by discovery 
> thread.
>  * On the other hand, the exchange thread stops cache contexts and, 
> therefore, it stops the {{cleanupWorker}} as well.
>  
> {code:java}
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager.stopCleanupWorker(GridCacheSharedTtlCleanupManager.java:109)
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager.unregister(GridCacheSharedTtlCleanupManager.java:82)
> org.apache.ignite.internal.processors.cache.GridCacheTtlManager.onKernalStop0(GridCacheTtlManager.java:110)
> org.apache.ignite.internal.processors.cache.GridCacheManagerAdapter.onKernalStop(GridCacheManagerAdapter.java:111)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:1495)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStopCaches(GridCacheProcessor.java:1182)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor$CacheRecoveryLifecycle.onBaselineChange(GridCacheProcessor.java:5637)
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:910)
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:792)
> {code}
> So, exchange thread may try to stop the {{cleanupWorker}} before the 
> {{localJoin}} future is completed by discovery thread. Unfortunately, 
> `cleanupWorker` incorrectly handles this situation, and this fact can lead to 
> a node failure:
> {code:java}
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Got 
> interrupted while waiting for future to complete.]]
> class org.apache.ignite.IgniteException: Got interrupted while waiting for 
> future to complete.
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.localJoin(GridDiscoveryManager.java:2217)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:136)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: class 
> org.apache.ignite.internal.IgniteInterruptedCheckedException: Got interrupted 
> while waiting for future to complete.
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:186)
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.localJoin(GridDiscoveryManager.java:2214)
> ... 3 more
> {code}
>  The obvious fix is changing the catch block
> {code:java}
> catch (Throwable t) {
> if (!(t instanceof IgniteInterruptedCheckedException))
> err = t;
> throw t;
> }
> {code}
> to the following:
> {code:java}
> catch (Throwable t) {
> if (!(X.hasCause(t, IgniteInterruptedCheckedException.class)))
> err = t;
> throw t;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11253) When a node that is not part of the baseline topology joins the cluster, it may lead to a node failure.

2019-02-12 Thread Vyacheslav Koptilin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765807#comment-16765807
 ] 

Vyacheslav Koptilin commented on IGNITE-11253:
--

Hi [~agura] ,

> Could you please add at least {{InterruptedException}} to {{X.hasCause}} 
>method's parameters. 
Done!

> When a node that is not part of the baseline topology joins the cluster, it 
> may lead to a node failure.
> ---
>
> Key: IGNITE-11253
> URL: https://issues.apache.org/jira/browse/IGNITE-11253
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * In case of eager TTL is configured, a starting node creates and starts 
> {{cleanupWorker}} (see {{GridCacheTtlManager.start0()}})
>  * {{GridCacheSharedTtlCleanupManager.CleanupWorker}}, in its turn, has to 
> wait for {{discovery().localJoin()}} future that is completed by discovery 
> thread.
>  * On the other hand, the exchange thread stops cache contexts and, 
> therefore, it stops the {{cleanupWorker}} as well.
>  
> {code:java}
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager.stopCleanupWorker(GridCacheSharedTtlCleanupManager.java:109)
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager.unregister(GridCacheSharedTtlCleanupManager.java:82)
> org.apache.ignite.internal.processors.cache.GridCacheTtlManager.onKernalStop0(GridCacheTtlManager.java:110)
> org.apache.ignite.internal.processors.cache.GridCacheManagerAdapter.onKernalStop(GridCacheManagerAdapter.java:111)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:1495)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStopCaches(GridCacheProcessor.java:1182)
> org.apache.ignite.internal.processors.cache.GridCacheProcessor$CacheRecoveryLifecycle.onBaselineChange(GridCacheProcessor.java:5637)
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:910)
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:792)
> {code}
> So, exchange thread may try to stop the {{cleanupWorker}} before the 
> {{localJoin}} future is completed by discovery thread. Unfortunately, 
> `cleanupWorker` incorrectly handles this situation, and this fact can lead to 
> a node failure:
> {code:java}
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Got 
> interrupted while waiting for future to complete.]]
> class org.apache.ignite.IgniteException: Got interrupted while waiting for 
> future to complete.
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.localJoin(GridDiscoveryManager.java:2217)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:136)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: class 
> org.apache.ignite.internal.IgniteInterruptedCheckedException: Got interrupted 
> while waiting for future to complete.
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:186)
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.localJoin(GridDiscoveryManager.java:2214)
> ... 3 more
> {code}
>  The obvious fix is changing the catch block
> {code:java}
> catch (Throwable t) {
> if (!(t instanceof IgniteInterruptedCheckedException))
> err = t;
> throw t;
> }
> {code}
> to the following:
> {code:java}
> catch (Throwable t) {
> if (!(X.hasCause(t, IgniteInterruptedCheckedException.class)))
> err = t;
> throw t;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)