Dmitry Sherstobitov created IGNITE-8879:
-------------------------------------------
Summary: Blinking baseline node sometimes unable to connect to cluster
Key: IGNITE-8879
URL: https://issues.apache.org/jira/browse/IGNITE-8879
Project: Ignite
Issue Type: Bug
Affects Versions: 2.5
Reporter: Dmitry Sherstobitov
Almost the same scenario as in IGNITE-8874, but the node is removed from the baseline while blinking.
All caches have 2 backups.
The cluster has 4 nodes.
# Start the cluster and load data.
# Start transactional loading (8 threads, 100 ops/second, put/get in each op).
# Repeat 10 times: kill one node, remove it from the baseline, start the node again (*without cleaning LFS*), wait for rebalance.
# Run idle_verify and check for data corruption.
At some point the killed node is unable to start and join the cluster because of the following error:
{code:java}
080ee8-END.bin]
[2018-06-26 19:01:43,039][INFO ][main][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24800, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[2018-06-26 19:01:43,039][INFO ][main][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=583691, len=119], lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
[2018-06-26 19:01:43,050][INFO ][main][GridCacheDatabaseSharedManager] Found last checkpoint marker [cpId=7fca4dbb-8f01-4b63-95e2-43283b080ee8, pos=FileWALPointer [idx=0, fileOff=583691, len=119]]
[2018-06-26 19:01:43,082][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: EOF at position [1000000] expected to read [1] bytes, ptr=FileWALPointer [idx=0, fileOff=1000000, len=0]
[2018-06-26 19:01:43,219][WARN ][main][FileWriteAheadLogManager] WAL segment tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state : {Index=3602879702215753728,Offset=775434544} ]
[2018-06-26 19:01:43,243][INFO ][main][GridCacheDatabaseSharedManager] Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
[2018-06-26 19:01:43,246][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: EOF at position [1000000] expected to read [1] bytes, ptr=FileWALPointer [idx=0, fileOff=1000000, len=0]
[2018-06-26 19:01:43,336][WARN ][main][FileWriteAheadLogManager] WAL segment tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state : {Index=3602879702215753728,Offset=775434544} ]
[2018-06-26 19:01:43,336][INFO ][main][GridCacheDatabaseSharedManager] Finished applying WAL changes [updatesApplied=0, time=101ms]
[2018-06-26 19:01:43,450][INFO ][main][GridSnapshotAwareClusterStateProcessorImpl] Restoring history for BaselineTopology[id=4]
[2018-06-26 19:01:43,454][ERROR][main][IgniteKernal] Exception during start processors, node will be stopped and close connections
class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
	at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769)
	at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001)
	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
	at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
	at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
	at org.apache.ignite.Ignition.start(Ignition.java:352)
	at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=1
	at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
	at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:381)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:643)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:486)
	at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
	at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700)
	at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1766)
	... 11 more
[2018-06-26 19:01:43,456][ERROR][main][IgniteKernal] Got exception while starting (will rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
	at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769)
	at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001)
	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
	at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
	at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
	at org.apache.ignite.Ignition.start(Ignition.java:352)
	at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=1
	at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
	at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:381)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:643)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:486)
	at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
	at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700)
	at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1766)
	... 11 more
{code}
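To make the {{Caused by}} line easier to read, here is a simplified toy model of the check that fails. It is an assumption, not Ignite's actual code: the baseline history is modelled as one persisted item per baseline id below the current one, and restore stops on the first missing id. {{BaselineHistoryModel}}, {{put}}, {{lose}} and {{restore}} are hypothetical names. With current baseline id 4 (as in the {{Restoring history for BaselineTopology[id=4]}} line) and item 1 missing, the model produces the same message as the log.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, NOT Ignite code): baseline topology history kept
 * as one item per id; restoring BaselineTopology[id=N] expects ids 0..N-1.
 */
class BaselineHistoryModel {

    /** Hypothetical stand-in for the persisted metastorage: id -> history item. */
    private final Map<Integer, String> store = new HashMap<>();

    /** Records the history item for the given baseline id. */
    public void put(int id, String item) {
        store.put(id, item);
    }

    /** Simulates a history item that is missing from the persisted store. */
    public void lose(int id) {
        store.remove(id);
    }

    /** Restores items 0..lastId-1 in order and fails on the first gap. */
    public List<String> restore(int lastId) {
        List<String> history = new ArrayList<>();
        for (int i = 0; i < lastId; i++) {
            String item = store.get(i);
            if (item == null) {
                throw new IllegalStateException(
                    "Restoring of BaselineTopology history has failed, "
                        + "expected history item not found for id=" + i);
            }
            history.add(item);
        }
        return history;
    }

    public static void main(String[] args) {
        BaselineHistoryModel model = new BaselineHistoryModel();
        for (int id = 0; id < 4; id++) {
            model.put(id, "BaselineTopology[id=" + id + "]");
        }
        model.lose(1); // gap in the chain: current id=4, item for id=1 missing
        try {
            model.restore(4);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

In this model a single missing item anywhere below the current id is enough to abort the whole restore, which matches the node refusing to start rather than skipping the gap.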
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)