[ https://issues.apache.org/jira/browse/IGNITE-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660237#comment-16660237 ]
Ignite TC Bot commented on IGNITE-8879:
---------------------------------------

{panel:title=Possible Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Inspections (AOP){color} [[tests 0 Exit Code |https://ci.ignite.apache.org/viewLog.html?buildId=2118974]]

{color:#d04437}Inspections (Core){color} [[tests 0 Exit Code |https://ci.ignite.apache.org/viewLog.html?buildId=2119093]]

{color:#d04437}_Licenses Headers_{color} [[tests 0 Exit Code |https://ci.ignite.apache.org/viewLog.html?buildId=2119216]]

{color:#d04437}Platform .NET (Long Running){color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=2114375]]
* exe: IgniteStartStopTest.TestProcessorInit - 0,0% fails in last 100 master runs.

{color:#d04437}Platform .NET{color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=2114370]]
* exe: ServicesTest.TestWithKeepBinaryServer - 0,0% fails in last 100 master runs.

{panel}
[TeamCity Run All|http://ci.ignite.apache.org/viewLog.html?buildId=2114381&buildTypeId=IgniteTests24Java8_RunAll]

> Blinking baseline node sometimes unable to connect to cluster
> -------------------------------------------------------------
>
> Key: IGNITE-8879
> URL: https://issues.apache.org/jira/browse/IGNITE-8879
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.5
> Reporter: Dmitry Sherstobitov
> Assignee: Ivan Daschinskiy
> Priority: Critical
> Fix For: 2.8
>
> Attachments: IGNITE-8879.zip
>
>
> Almost the same scenario as in IGNITE-8874 but node left baseline while blinking
> All caches with 2 backups
> 4 nodes in cluster
> # Start cluster, load data
> # Start transactional loading (8 threads, 100 ops/second put/get in each op)
> # Repeat 10 times: kill one node, remove from baseline, start node again (*with no LFS clean*), wait for rebalance
> # Check idle_verify, check data corruption
>
> At some point the killed node is unable to start and join the cluster because of an error
> (Attachments info: grid.1.node2.X.log - blinking node logs, X
> - iteration counter from step 3)
> {code:java}
> 080ee8-END.bin]
> [2018-06-26 19:01:43,039][INFO ][main][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24800, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
> [2018-06-26 19:01:43,039][INFO ][main][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=583691, len=119], lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
> [2018-06-26 19:01:43,050][INFO ][main][GridCacheDatabaseSharedManager] Found last checkpoint marker [cpId=7fca4dbb-8f01-4b63-95e2-43283b080ee8, pos=FileWALPointer [idx=0, fileOff=583691, len=119]]
> [2018-06-26 19:01:43,082][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: EOF at position [1000000] expected to read [1] bytes, ptr=FileWALPointer [idx=0, fileOff=1000000, len=0]
> [2018-06-26 19:01:43,219][WARN ][main][FileWriteAheadLogManager] WAL segment tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state : {Index=3602879702215753728,Offset=775434544} ]
> [2018-06-26 19:01:43,243][INFO ][main][GridCacheDatabaseSharedManager] Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
> [2018-06-26 19:01:43,246][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: EOF at position [1000000] expected to read [1] bytes, ptr=FileWALPointer [idx=0, fileOff=1000000, len=0]
> [2018-06-26 19:01:43,336][WARN ][main][FileWriteAheadLogManager] WAL segment tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state : {Index=3602879702215753728,Offset=775434544} ]
> [2018-06-26 19:01:43,336][INFO ][main][GridCacheDatabaseSharedManager] Finished applying WAL changes [updatesApplied=0, time=101ms]
> [2018-06-26 19:01:43,450][INFO ][main][GridSnapshotAwareClusterStateProcessorImpl] Restoring history for BaselineTopology[id=4]
> [2018-06-26 19:01:43,454][ERROR][main][IgniteKernal] Exception during start processors, node will be stopped and close connections
> class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
>     at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769)
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
>     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
>     at org.apache.ignite.Ignition.start(Ignition.java:352)
>     at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
> Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=1
>     at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
>     at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:381)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:643)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:486)
>     at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
>     at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700)
>     at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1766)
>     ... 11 more
> [2018-06-26 19:01:43,456][ERROR][main][IgniteKernal] Got exception while starting (will rollback startup routine).
> class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
>     at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769)
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
>     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
>     at org.apache.ignite.Ignition.start(Ignition.java:352)
>     at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
> Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=1
>     at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
>     at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:381)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:643)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:486)
>     at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
>     at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700)
>     at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1766)
>     ... 11 more
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
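
The log excerpt above reports the same startup failure twice, and the actual root cause only appears in the "Caused by:" lines. When going through the per-iteration logs from the attachment, it may help to pull out just the distinct root causes. The following is a minimal sketch that uses only the Java standard library; the class name CausedByScanner and the method distinctCauses are hypothetical, and it assumes the log lines are unwrapped (one log record or stack frame per line), which is not how they appear in this email.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ignite: collects distinct "Caused by:" entries from log lines.
public class CausedByScanner {
    // Matches "Caused by: [class ]<exception type>: <message>" on a single, unwrapped line.
    private static final Pattern CAUSED_BY =
        Pattern.compile("^Caused by: (?:class )?([\\w.$]+): (.*)$");

    // Returns "<exception type>: <message>" for every distinct cause, in order of first appearance.
    static Set<String> distinctCauses(List<String> logLines) {
        Set<String> causes = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = CAUSED_BY.matcher(line.trim());
            if (m.matches())
                causes.add(m.group(1) + ": " + m.group(2));
        }
        return causes;
    }

    public static void main(String[] args) {
        // Sample lines taken from the log in this issue (stack frames shortened to one per trace).
        List<String> sample = Arrays.asList(
            "class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []",
            "    at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769)",
            "Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=1",
            "    at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)",
            "class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []",
            "Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=1"
        );
        // The repeated cause is reported only once.
        distinctCauses(sample).forEach(System.out::println);
    }
}
```

For the sample lines, both traces collapse into the single BaselineTopology history error, which is the line to search for in the other grid.1.node2.X.log files.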