[
https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Kosarev updated IGNITE-9296:
-----------------------------------
Description:
Here are log messages:
[18:46:27]W: [org.apache.ignite:ignite-core] [2018-08-15
15:46:27,442][ERROR][main][root] Test has been timed out and will be
interrupted (threads dump will be taken before interruption)
[test=testFailWhileStart, timeout=60000]
Later on, the whole suite also hangs:
[22:22:49]E: [Step 3/4] The build Ignite Tests 2.4+ (Java 8)::PDS 2 #2184
{buildId=1662285} has been running for more than 240 minutes. Terminating...
The main (test-runner) thread is blocked on a lock held by node-stopper:
[18:46:27] : [Step 3/4] Thread
[name="test-runner-#7695%wal.IgniteWalFlushBackgroundSelfTest%", id=9150,
state=BLOCKED, blockCnt=4, waitCnt=142]
[18:46:27] : [Step 3/4] Lock
[object=o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90,
ownerName=node-stopper, ownerId=9267]
[18:46:27] : [Step 3/4] at
o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2565)
[18:46:27] : [Step 3/4] at
o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
[18:46:27] : [Step 3/4] at
o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
[18:46:27] : [Step 3/4] at o.a.i.Ignition.stop(Ignition.java:229)
[18:46:27] : [Step 3/4] at
o.a.i.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1088)
[18:46:27] : [Step 3/4] at
o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1131)
[18:46:27] : [Step 3/4] at
o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1109)
[18:46:27] : [Step 3/4] at
o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.failWhilePut(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:213)
[18:46:27] : [Step 3/4] at
o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.testFailWhileStart(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:147)
node-stopper waits for wal-segment-syncer to stop:
[18:46:28]W: [org.apache.ignite:ignite-core] Thread
[name="node-stopper", id=9267, state=WAITING, blockCnt=19, waitCnt=22]
[18:46:28]W: [org.apache.ignite:ignite-core] Lock
[object=java.lang.Object@5ba26eb0, ownerName=null, ownerId=-1]
[18:46:28]W: [org.apache.ignite:ignite-core] at
java.lang.Object.wait(Native Method)
[18:46:28]W: [org.apache.ignite:ignite-core] at
java.lang.Object.wait(Object.java:502)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.util.worker.GridWorker.join(GridWorker.java:233)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.util.IgniteUtils.join(IgniteUtils.java:4692)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.shutdown(FileWriteAheadLogManager.java:3562)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.access$700(FileWriteAheadLogManager.java:3527)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:578)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:951)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.IgniteKernal.stop0(IgniteKernal.java:2303)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.IgniteKernal.stop(IgniteKernal.java:2181)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2594)
[18:46:28]W: [org.apache.ignite:ignite-core] - locked
o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36)
wal-segment-syncer waits until wal-write-worker flushes data:
[18:46:28]W: [org.apache.ignite:ignite-core] Thread
[name="wal-segment-syncer-#7782%wal.IgniteWalFlushBackgroundSelfTest1%",
id=9253, state=RUNNABLE, blockCnt=0, waitCnt=860657904]
[18:46:28]W: [org.apache.ignite:ignite-core] at
sun.misc.Unsafe.park(Native Method)
[18:46:28]W: [org.apache.ignite:ignite-core] at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.flushBuffer(FileWriteAheadLogManager.java:3455)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.flushAll(FileWriteAheadLogManager.java:3419)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.flush(FileWriteAheadLogManager.java:2704)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.flushOrWait(FileWriteAheadLogManager.java:2696)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.fsync(FileWriteAheadLogManager.java:2776)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1900(FileWriteAheadLogManager.java:2538)
[18:46:28]W: [org.apache.ignite:ignite-core] at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.flush(FileWriteAheadLogManager.java:820)
And there is no wal-write-worker on the node, as it has already been interrupted:
[18:45:34]W: [org.apache.ignite:ignite-core] [2018-08-15
15:45:34,132][ERROR][wal-write-worker%wal.IgniteWalFlushBackgroundSelfTest1-#7783%wal.IgniteWalFlushBackgroundSelfTest1%][IgniteTestResources]
Critical system error detected. Will be handled accordingly to
configured handler [hnd=class o.a.i.failure.StopNodeFailureHandler,
failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
o.a.i.i.pagemem.wal.StorageException: Failed to write buffer.]]
Caused by: java.io.IOException: No space left on device (This exception is
generated intentionally by test logic)
Since there is no wal-write-worker any more, wal-segment-syncer will wait
forever.
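
The wait pattern described above can be illustrated with a minimal sketch.
This is not Ignite's code: the class FlushWaitSketch, the Writer type, and
the awaitFlush method are hypothetical names invented for illustration. It
only shows the general idea that a thread parking until a writer makes
progress should also check whether that writer has already failed, so that
it can give up instead of parking indefinitely.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

public class FlushWaitSketch {
    /** Hypothetical stand-in for the state published by a writer thread. */
    static final class Writer {
        /** Position up to which data has been flushed. */
        final AtomicLong flushedPos = new AtomicLong();

        /** Set once when the writer terminates with an error. */
        volatile Throwable err;

        void fail(Throwable t) {
            err = t;
        }
    }

    /**
     * Waits until the writer has flushed up to pos.
     * Returns true on success, false if the writer failed first.
     */
    static boolean awaitFlush(Writer w, long pos) {
        while (w.flushedPos.get() < pos) {
            // Without this check the loop would park forever once the
            // writer is gone, which is the situation in the report above.
            if (w.err != null)
                return false;

            // Timed park, so the failure flag is re-checked periodically.
            LockSupport.parkNanos(1_000_000L);
        }

        return true;
    }

    public static void main(String[] args) {
        Writer w = new Writer();

        // Simulate the writer dying before anything was flushed.
        w.fail(new IOException("No space left on device"));

        System.out.println(awaitFlush(w, 10)); // prints "false"
    }
}
```

Whether the actual fix should take this shape (a liveness check in the
waiter) or make the shutdown path unblock the waiter is a design decision
that the report does not settle.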
> Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest
> ------------------------------------------------------------------------------
>
> Key: IGNITE-9296
> URL: https://issues.apache.org/jira/browse/IGNITE-9296
> Project: Ignite
> Issue Type: Bug
> Reporter: Sergey Kosarev
> Priority: Major
>