[jira] [Commented] (IGNITE-9296) Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest
[ https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588705#comment-16588705 ]

Sergey Kosarev commented on IGNITE-9296:

[~agura], I've updated the fix according to your suggestion and started a new TC run.

> Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest
> --
>
> Key: IGNITE-9296
> URL: https://issues.apache.org/jira/browse/IGNITE-9296
> Project: Ignite
> Issue Type: Bug
> Reporter: Sergey Kosarev
> Assignee: Sergey Kosarev
> Priority: Major
> Attachments: logs.zip
>
>
> Here are the log messages:
> {code}
> [18:46:27]W: [org.apache.ignite:ignite-core] [2018-08-15 15:46:27,442][ERROR][main][root] Test has been timed out and will be interrupted (threads dump will be taken before interruption) [test=testFailWhileStart, timeout=6]
> {code}
> Later on, the whole suite hangs as well:
> {code}
> [22:22:49]E: [Step 3/4] The build Ignite Tests 2.4+ (Java 8)::PDS 2 #2184 {buildId=1662285} has been running for more than 240 minutes. Terminating...
> Main thread is blocked by node-stopper:
> [18:46:27] : [Step 3/4] Thread [name="test-runner-#7695%wal.IgniteWalFlushBackgroundSelfTest%", id=9150, state=BLOCKED, blockCnt=4, waitCnt=142]
> [18:46:27] : [Step 3/4] Lock [object=o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90, ownerName=node-stopper, ownerId=9267]
> [18:46:27] : [Step 3/4] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2565)
> [18:46:27] : [Step 3/4] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
> [18:46:27] : [Step 3/4] at o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
> [18:46:27] : [Step 3/4] at o.a.i.Ignition.stop(Ignition.java:229)
> [18:46:27] : [Step 3/4] at o.a.i.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1088)
> [18:46:27] : [Step 3/4] at o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1131)
> [18:46:27] : [Step 3/4] at o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1109)
> [18:46:27] : [Step 3/4] at o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.failWhilePut(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:213)
> [18:46:27] : [Step 3/4] at o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.testFailWhileStart(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:147)
> node-stopper waits for wal-segment-syncer to stop:
> [18:46:28]W: [org.apache.ignite:ignite-core] Thread [name="node-stopper", id=9267, state=WAITING, blockCnt=19, waitCnt=22]
> [18:46:28]W: [org.apache.ignite:ignite-core] Lock [object=java.lang.Object@5ba26eb0, ownerName=null, ownerId=-1]
> [18:46:28]W: [org.apache.ignite:ignite-core] at java.lang.Object.wait(Native Method)
> [18:46:28]W: [org.apache.ignite:ignite-core] at java.lang.Object.wait(Object.java:502)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.util.worker.GridWorker.join(GridWorker.java:233)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.util.IgniteUtils.join(IgniteUtils.java:4692)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.shutdown(FileWriteAheadLogManager.java:3562)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.access$700(FileWriteAheadLogManager.java:3527)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:578)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:951)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.IgniteKernal.stop0(IgniteKernal.java:2303)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.IgniteKernal.stop(IgniteKernal.java:2181)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2594)
> [18:46:28]W: [org.apache.ignite:ignite-core]
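To make the chain in these two dumps easier to follow, here is a minimal, hypothetical Java model of it. It is not Ignite code: `StopHangModel` and the `*-model` thread names are made up for illustration. One thread holds a monitor (standing in for the `IgniteNamedInstance` lock) while joining a worker with no timeout; a second thread that needs the same monitor then stays BLOCKED, which is the test-runner/node-stopper pair seen in the dumps.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of the blocking chain in the dumps; names are illustrative, not Ignite's. */
class StopHangModel {
    /** Stands in for the IgniteNamedInstance monitor held by node-stopper. */
    private final Object instanceMonitor = new Object();

    /** Counted down only at the end of reproduce(), so the model threads can finish. */
    private final CountDownLatch release = new CountDownLatch(1);

    /** Returns {runner state, stopper state} once the chain has formed or timeoutMs has elapsed. */
    Thread.State[] reproduce(long timeoutMs) {
        CountDownLatch monitorHeld = new CountDownLatch(1);

        // Worker that does not finish on its own (plays the role of wal-segment-syncer).
        Thread syncer = daemon(() -> awaitQuietly(release), "wal-segment-syncer-model");

        // Holds the instance monitor while joining the worker with no timeout (cf. GridWorker.join).
        Thread stopper = daemon(() -> {
            synchronized (instanceMonitor) {
                monitorHeld.countDown();
                joinQuietly(syncer);
            }
        }, "node-stopper-model");

        // Tries to stop the same instance and therefore needs the same monitor (cf. Ignition.stop).
        Thread runner = daemon(() -> {
            synchronized (instanceMonitor) {
                // its own stop logic would run here once the monitor is acquired
            }
        }, "test-runner-model");

        try {
            syncer.start();
            stopper.start();
            monitorHeld.await(timeoutMs, TimeUnit.MILLISECONDS);
            runner.start();

            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);

            // Poll until runner is BLOCKED on the monitor and stopper is WAITING in join().
            while ((runner.getState() != Thread.State.BLOCKED || stopper.getState() != Thread.State.WAITING)
                && System.nanoTime() < deadline)
                Thread.sleep(10);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        Thread.State[] states = {runner.getState(), stopper.getState()};

        release.countDown(); // end the model: syncer exits, stopper releases the monitor, runner proceeds

        return states;
    }

    private static Thread daemon(Runnable r, String name) {
        Thread t = new Thread(r, name);
        t.setDaemon(true);
        return t;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In the model the chain only ends because `reproduce` counts down the latch; if nothing ever lets the worker exit, both threads stay parked indefinitely.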
[jira] [Commented] (IGNITE-9296) Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest
[ https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587881#comment-16587881 ]

Andrey Gura commented on IGNITE-9296:

[~macrergate] I've looked at your change. It looks like an odd fix to me. It seems that the syncer worker should simply be stopped before the WAL writer.
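The suggested ordering can be sketched with a small, hypothetical model. It assumes a syncer that waits for writer progress; `ShutdownOrderSketch` and its threads are illustrative and do not mirror `FileWriteAheadLogManager`. The dependent worker (the syncer) is stopped and joined first, while the worker it waits on (the writer) is still alive to make progress.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical model of two WAL workers; not Ignite's real classes. */
class ShutdownOrderSketch {
    private static final long PARK_NANOS = 100_000; // 0.1 ms back-off between checks

    /** Position advanced by the writer thread while it runs. */
    private final AtomicLong written = new AtomicLong();

    private volatile boolean writerStopped;
    private volatile boolean syncerStopped;

    private final Thread writer = new Thread(() -> {
        while (!writerStopped) {
            written.incrementAndGet();
            LockSupport.parkNanos(PARK_NANOS);
        }
    }, "wal-writer-model");

    private final Thread syncer = new Thread(() -> {
        while (!syncerStopped) {
            long target = written.get() + 1;

            // Waits for writer progress and does not look at writerStopped:
            // if the writer is already gone, this inner loop never finishes.
            while (written.get() < target)
                LockSupport.parkNanos(PARK_NANOS);
        }
    }, "wal-segment-syncer-model");

    void start() {
        writer.setDaemon(true);
        syncer.setDaemon(true);
        writer.start();
        syncer.start();
    }

    /** Stops the dependent syncer first, then the writer it waits on. */
    boolean stopSyncerThenWriter(long timeoutMs) {
        syncerStopped = true;
        boolean syncerDone = joinQuietly(syncer, timeoutMs); // writer still running, so syncer can exit

        writerStopped = true;
        boolean writerDone = joinQuietly(writer, timeoutMs);

        return syncerDone && writerDone;
    }

    private static boolean joinQuietly(Thread t, long timeoutMs) {
        try {
            t.join(timeoutMs);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        return !t.isAlive();
    }
}
```

In this model, reversing the order (stopping the writer first) can leave the syncer stuck in its inner loop, though whether it does depends on timing; stopping the syncer first avoids that dependency entirely.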
[jira] [Commented] (IGNITE-9296) Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest
[ https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583804#comment-16583804 ]

ASF GitHub Bot commented on IGNITE-9296:

GitHub user macrergate opened a pull request:

https://github.com/apache/ignite/pull/4565

IGNITE-9296 should not wait if walWriter is cancelled

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-9296

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4565.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4565

commit 652a656d83c583dc76fcf2d51c0ffd9789ecd721
Author: Sergey Kosarev
Date: 2018-08-17T11:35:14Z

IGNITE-9296 should not wait if walWriter is cancelled
[jira] [Commented] (IGNITE-9296) Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest
[ https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582595#comment-16582595 ]

Sergey Kosarev commented on IGNITE-9296:

Suggested fix: replace while(true) with while(!isCancelled()) in org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.WALWriter#flushBuffer
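The proposed loop change can be illustrated with a small, hypothetical model. `WalWriterModel` is not Ignite's `WALWriter`, and the real `flushBuffer` body is not shown in this thread; the sketch only assumes a loop that waits until a given position has been written. With `while (true)` a caller stays in that loop forever once the writer is cancelled, whereas checking `isCancelled()` lets it give up.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical model of a WAL writer; not Ignite's real WALWriter class. */
class WalWriterModel {
    /** Highest position written so far; a real writer thread would advance it. */
    private final AtomicLong written = new AtomicLong();

    /** Set when the writer is stopped, e.g. by a failure handler. */
    private volatile boolean cancelled;

    boolean isCancelled() {
        return cancelled;
    }

    void cancel() {
        cancelled = true;
    }

    void advance(long pos) {
        written.accumulateAndGet(pos, Math::max);
    }

    /**
     * Waits until data up to expPos has been written.
     *
     * @return true if expPos was reached, false if the writer was cancelled first.
     */
    boolean flushBuffer(long expPos) {
        while (!isCancelled()) { // was: while (true)
            if (written.get() >= expPos)
                return true;

            LockSupport.parkNanos(TimeUnit.MICROSECONDS.toNanos(100)); // back off briefly
        }

        return false;
    }

    /** Starts a thread blocked in flushBuffer, cancels the writer and reports whether the thread exits. */
    static boolean waiterExitsAfterCancel(long timeoutMs) {
        WalWriterModel writer = new WalWriterModel();
        Thread waiter = new Thread(() -> writer.flushBuffer(Long.MAX_VALUE), "wal-segment-syncer-model");

        waiter.setDaemon(true);
        waiter.start();
        writer.cancel();

        try {
            waiter.join(timeoutMs);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        return !waiter.isAlive();
    }
}
```

Returning `false` on cancellation is an arbitrary choice in this sketch; a real implementation may need to report the interrupted flush differently to its callers.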