[ https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Kosarev reassigned IGNITE-9296:
--------------------------------------

    Assignee: Sergey Kosarev

> Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-9296
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9296
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Kosarev
>            Assignee: Sergey Kosarev
>            Priority: Major
>         Attachments: logs.zip
>
>
> Here are the log messages:
> {code}
> [18:46:27]W: [org.apache.ignite:ignite-core] [2018-08-15 15:46:27,442][ERROR][main][root] Test has been timed out and will be interrupted (threads dump will be taken before interruption) [test=testFailWhileStart, timeout=60000]
> {code}
> Later on, the whole suite also hangs:
> {code}
> [22:22:49]E: [Step 3/4] The build Ignite Tests 2.4+ (Java 8)::PDS 2 #2184 {buildId=1662285} has been running for more than 240 minutes. Terminating...
>
> The test-runner thread is blocked on a lock held by node-stopper:
> [18:46:27] : [Step 3/4] Thread [name="test-runner-#7695%wal.IgniteWalFlushBackgroundSelfTest%", id=9150, state=BLOCKED, blockCnt=4, waitCnt=142]
> [18:46:27] : [Step 3/4] Lock [object=o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90, ownerName=node-stopper, ownerId=9267]
> [18:46:27] : [Step 3/4] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2565)
> [18:46:27] : [Step 3/4] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
> [18:46:27] : [Step 3/4] at o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
> [18:46:27] : [Step 3/4] at o.a.i.Ignition.stop(Ignition.java:229)
> [18:46:27] : [Step 3/4] at o.a.i.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1088)
> [18:46:27] : [Step 3/4] at o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1131)
> [18:46:27] : [Step 3/4] at o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1109)
> [18:46:27] : [Step 3/4] at o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.failWhilePut(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:213)
> [18:46:27] : [Step 3/4] at o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.testFailWhileStart(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:147)
>
> node-stopper waits for wal-segment-syncer to stop:
> [18:46:28]W: [org.apache.ignite:ignite-core] Thread [name="node-stopper", id=9267, state=WAITING, blockCnt=19, waitCnt=22]
> [18:46:28]W: [org.apache.ignite:ignite-core] Lock [object=java.lang.Object@5ba26eb0, ownerName=null, ownerId=-1]
> [18:46:28]W: [org.apache.ignite:ignite-core] at java.lang.Object.wait(Native Method)
> [18:46:28]W: [org.apache.ignite:ignite-core] at java.lang.Object.wait(Object.java:502)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.util.worker.GridWorker.join(GridWorker.java:233)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.util.IgniteUtils.join(IgniteUtils.java:4692)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.shutdown(FileWriteAheadLogManager.java:3562)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.access$700(FileWriteAheadLogManager.java:3527)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:578)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:951)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.IgniteKernal.stop0(IgniteKernal.java:2303)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.IgniteKernal.stop(IgniteKernal.java:2181)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2594)
> [18:46:28]W: [org.apache.ignite:ignite-core] - locked o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36)
>
> wal-segment-syncer waits until wal-write-worker flushes the data:
> [18:46:28]W: [org.apache.ignite:ignite-core] Thread [name="wal-segment-syncer-#7782%wal.IgniteWalFlushBackgroundSelfTest1%", id=9253, state=RUNNABLE, blockCnt=0, waitCnt=860657904]
> [18:46:28]W: [org.apache.ignite:ignite-core] at sun.misc.Unsafe.park(Native Method)
> [18:46:28]W: [org.apache.ignite:ignite-core] at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.flushBuffer(FileWriteAheadLogManager.java:3455)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.flushAll(FileWriteAheadLogManager.java:3419)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.flush(FileWriteAheadLogManager.java:2704)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.flushOrWait(FileWriteAheadLogManager.java:2696)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.fsync(FileWriteAheadLogManager.java:2776)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1900(FileWriteAheadLogManager.java:2538)
> [18:46:28]W: [org.apache.ignite:ignite-core] at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.flush(FileWriteAheadLogManager.java:820)
>
> And there is no wal-write-worker on the node, as it has already been interrupted:
> [18:45:34]W: [org.apache.ignite:ignite-core] [2018-08-15 15:45:34,132][ERROR][wal-write-worker%wal.IgniteWalFlushBackgroundSelfTest1-#7783%wal.IgniteWalFlushBackgroundSelfTest1%][IgniteTestResources] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeFailureHandler, failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.pagemem.wal.StorageException: Failed to write buffer.]]
> Caused by: java.io.IOException: No space left on device (This exception is generated intentionally by test logic)
> {code}
> Since wal-write-worker no longer exists, wal-segment-syncer will wait forever.
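
The wait chain described above (a thread parked until a writer makes progress, while the writer is already gone) can be reproduced in isolation. The following is a minimal sketch, not Ignite code and not the fix applied for this issue: the class `FlushWaitSketch`, its methods, and the `flushed` flag are hypothetical stand-ins. It assumes one possible guard, namely that the waiting side also checks whether the writer thread is still alive and parks with a timeout instead of parking unconditionally.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sketch: waiting for a flush without hanging when the writer thread has died. */
public class FlushWaitSketch {
    /**
     * Waits until the writer reports a completed flush.
     *
     * @param flushed Hypothetical flag that the writer sets once the buffer is flushed.
     * @param writer Thread expected to perform the flush.
     * @return true if the flush completed, false if the writer died without flushing.
     */
    static boolean awaitFlush(AtomicBoolean flushed, Thread writer) {
        while (!flushed.get()) {
            // An unconditional LockSupport.park() here would never return once the writer is gone.
            if (!writer.isAlive())
                return flushed.get(); // Re-check: the writer may have flushed just before exiting.

            // Timed park, so that the liveness check above is re-evaluated periodically.
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(10));
        }

        return true;
    }

    /** Simulates a writer that terminated (for example, after a failure) without ever flushing. */
    static boolean waitWithDeadWriter() {
        Thread writer = new Thread(() -> { /* Exits immediately, never sets the flag. */ },
            "wal-write-worker-sketch");

        writer.start();

        try {
            writer.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }

        return awaitFlush(new AtomicBoolean(false), writer);
    }

    public static void main(String[] args) {
        System.out.println("flushed=" + waitWithDeadWriter());
    }
}
```

With the guard in place the waiter returns instead of parking indefinitely, so a subsequent join on the waiting thread can complete. Whether a liveness check, a cancellation flag, or waking the waiters from the failure path fits the actual WAL code best is a design decision this sketch does not settle.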