[jira] [Commented] (IGNITE-9296) Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest

2018-08-22 Thread Sergey Kosarev (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588705#comment-16588705
 ] 

Sergey Kosarev commented on IGNITE-9296:


[~agura], OK, I've updated the fix according to your suggestion and started a new TC run.

>  Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest
> --
>
> Key: IGNITE-9296
> URL: https://issues.apache.org/jira/browse/IGNITE-9296
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Kosarev
>Assignee: Sergey Kosarev
>Priority: Major
> Attachments: logs.zip
>
>
> Here are log messages:
> {code}
> [18:46:27]W: [org.apache.ignite:ignite-core] [2018-08-15 15:46:27,442][ERROR][main][root] Test has been timed out and will be interrupted (threads dump will be taken before interruption) [test=testFailWhileStart, timeout=6]
> {code}
> And later on all the suite also hangs up:
> {code}
> [22:22:49]E: [Step 3/4] The build Ignite Tests 2.4+ (Java 8)::PDS 2 #2184 {buildId=1662285} has been running for more than 240 minutes. Terminating...
> Main thread locked by node-stopper:
> [18:46:27] : [Step 3/4] Thread [name="test-runner-#7695%wal.IgniteWalFlushBackgroundSelfTest%", id=9150, state=BLOCKED, blockCnt=4, waitCnt=142]
> [18:46:27] : [Step 3/4] Lock [object=o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90, ownerName=node-stopper, ownerId=9267]
> [18:46:27] : [Step 3/4] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2565)
> [18:46:27] : [Step 3/4] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
> [18:46:27] : [Step 3/4] at o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
> [18:46:27] : [Step 3/4] at o.a.i.Ignition.stop(Ignition.java:229)
> [18:46:27] : [Step 3/4] at o.a.i.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1088)
> [18:46:27] : [Step 3/4] at o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1131)
> [18:46:27] : [Step 3/4] at o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1109)
> [18:46:27] : [Step 3/4] at o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.failWhilePut(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:213)
> [18:46:27] : [Step 3/4] at o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.testFailWhileStart(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:147)
> node-stopper waits for the wal-segment-syncer stopping
> [18:46:28]W: [org.apache.ignite:ignite-core] Thread [name="node-stopper", id=9267, state=WAITING, blockCnt=19, waitCnt=22]
> [18:46:28]W: [org.apache.ignite:ignite-core] Lock [object=java.lang.Object@5ba26eb0, ownerName=null, ownerId=-1]
> [18:46:28]W: [org.apache.ignite:ignite-core] at java.lang.Object.wait(Native Method)
> [18:46:28]W: [org.apache.ignite:ignite-core] at java.lang.Object.wait(Object.java:502)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.util.worker.GridWorker.join(GridWorker.java:233)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.util.IgniteUtils.join(IgniteUtils.java:4692)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.shutdown(FileWriteAheadLogManager.java:3562)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.access$700(FileWriteAheadLogManager.java:3527)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:578)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:951)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.IgniteKernal.stop0(IgniteKernal.java:2303)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.IgniteKernal.stop(IgniteKernal.java:2181)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2594)
> [18:46:28]W: [org.apache.ignite:ignite-core]  

[jira] [Commented] (IGNITE-9296) Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest

2018-08-21 Thread Andrey Gura (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587881#comment-16587881
 ] 

Andrey Gura commented on IGNITE-9296:
-

[~macrergate] I've looked at your change. It looks like an odd fix from my point of view. 
It seems that the syncer worker should simply be stopped before the WAL writer.
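
To illustrate the ordering idea, here is a minimal sketch that assumes two generic background workers. The `StopOrderSketch` and `Worker` classes are hypothetical and are not Ignite's `FileWriteAheadLogManager` code; they only show a dependent worker (the syncer) being shut down and joined before the worker it relies on (the WAL writer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sketch of an ordered shutdown; not Ignite's actual implementation. */
class StopOrderSketch {
    /** A trivial background worker that idles until it is cancelled. */
    static class Worker {
        private final String name;
        private final List<String> stopLog;
        private final Thread thread;
        private volatile boolean cancelled;

        Worker(String name, List<String> stopLog) {
            this.name = name;
            this.stopLog = stopLog;
            this.thread = new Thread(() -> {
                while (!cancelled)
                    LockSupport.parkNanos(1_000_000L); // idle for about 1 ms per iteration
            }, name);
            this.thread.setDaemon(true);
        }

        void start() {
            thread.start();
        }

        /** Cancels the worker, waits for its thread, and records the name once it has exited. */
        void shutdown() {
            cancelled = true;
            LockSupport.unpark(thread);
            try {
                thread.join(5_000); // bounded wait so a stuck worker cannot hang the caller forever
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            if (!thread.isAlive())
                stopLog.add(name);
        }
    }

    /** Stops the syncer first, then the writer, and returns the names in the order they stopped. */
    static List<String> stopInOrder() {
        List<String> stopLog = new ArrayList<>();
        Worker walWriter = new Worker("wal-writer", stopLog);
        Worker syncer = new Worker("wal-segment-syncer", stopLog);

        walWriter.start();
        syncer.start();

        syncer.shutdown();    // dependent worker goes first
        walWriter.shutdown(); // the worker it relies on goes last
        return stopLog;
    }
}
```

Calling `StopOrderSketch.stopInOrder()` returns the worker names in stop order. The bounded `join` is an extra safety choice in this sketch, not something taken from the change under review.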


[jira] [Commented] (IGNITE-9296) Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest

2018-08-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583804#comment-16583804
 ] 

ASF GitHub Bot commented on IGNITE-9296:


GitHub user macrergate opened a pull request:

https://github.com/apache/ignite/pull/4565

IGNITE-9296 should not wait if walWriter is cancelled



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-9296

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4565


commit 652a656d83c583dc76fcf2d51c0ffd9789ecd721
Author: Sergey Kosarev 
Date:   2018-08-17T11:35:14Z

IGNITE-9296 should not wait if walWriter is cancelled





[jira] [Commented] (IGNITE-9296) Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest

2018-08-16 Thread Sergey Kosarev (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582595#comment-16582595
 ] 

Sergey Kosarev commented on IGNITE-9296:


Suggested fix: while(true) -> while(!isCancelled()) in 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.WALWriter#flushBuffer
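
A minimal sketch of the loop change, assuming a simplified worker with a cancellation flag. The `CancellableFlushSketch` class, its `flushed` field, and the polling interval are hypothetical and do not reproduce the real body of `WALWriter#flushBuffer`; the point is only that a wait loop guarded by `!isCancelled()` can exit once the worker has been cancelled, whereas `while (true)` cannot.

```java
import java.util.concurrent.locks.LockSupport;

/** Hypothetical stand-in for a WAL writer; this is not Ignite's WALWriter. */
class CancellableFlushSketch {
    /** Set when the worker is asked to stop. */
    private volatile boolean cancelled;

    /** Never set in this sketch: models a flush that can no longer complete. */
    private volatile boolean flushed;

    void cancel() {
        cancelled = true;
    }

    boolean isCancelled() {
        return cancelled;
    }

    /** Returns true if the buffer was flushed, false if the wait was abandoned on cancellation. */
    boolean flushBuffer() {
        while (!isCancelled()) { // was: while (true)
            if (flushed)
                return true;
            LockSupport.parkNanos(1_000_000L); // back off for about 1 ms between checks
        }
        return false;
    }

    /** Starts a waiter, cancels the worker, and reports whether the waiter thread exited. */
    static boolean waiterExitsAfterCancel() {
        CancellableFlushSketch writer = new CancellableFlushSketch();
        Thread waiter = new Thread(writer::flushBuffer, "flush-waiter");
        waiter.setDaemon(true);
        waiter.start();

        writer.cancel();
        try {
            waiter.join(5_000);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !waiter.isAlive();
    }
}
```

With the `while (true)` form, the waiter in this sketch would keep spinning after `cancel()`, which mirrors the hang seen in the thread dump.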
