[ https://issues.apache.org/jira/browse/IGNITE-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527323#comment-16527323 ]
Andrey Aleksandrov commented on IGNITE-8893:
--------------------------------------------
This issue has a similar scenario.
> Blinking node in baseline may corrupt own WAL records
> -----------------------------------------------------
>
> Key: IGNITE-8893
> URL: https://issues.apache.org/jira/browse/IGNITE-8893
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.5
> Reporter: Dmitry Sherstobitov
> Priority: Major
>
> # Start the cluster and load data
> # Start an additional node that is not in the BLT
> # Repeat 10 times: kill 1 node in the baseline and 1 node not in the
> baseline, then start a node in the BLT and a node not in the BLT
> At some point, a node in the baseline may be unable to start because of a
> corrupted WAL (see the log at the end of this description).
> Note that there is no load on the cluster at all, so there is no reason for
> the WAL to be corrupted; rebalance should be interruptible.
> There is also another scenario that may cause the same error (and may also
> cause a JVM crash):
> # Start the cluster, load data, start nodes
> # Repeat 10 times: kill 1 node in the baseline, clean its LFS, start the node
> again and, while rebalance is in progress, blink the node that should
> rebalance data to the previously killed node
> The node that should rebalance data to the cleaned node may corrupt its own
> WAL. However, this second scenario has a configuration "error": the number of
> backups in each case is 1, so blinking 2 nodes may obviously cause data loss.
> {code:java}
> [2018-06-28 17:33:39,583][ERROR][wal-file-archiver%null-#63][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.AssertionError: lastArchived=757, current=42]]
> java.lang.AssertionError: lastArchived=757, current=42
> at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1629)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> {code}
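The assertion message in the log above suggests that the archiver expects the index of the last archived WAL segment never to exceed the index of the current segment. The following is a minimal, self-contained sketch of that kind of check, written under that assumption. It is not Ignite's actual code: the class and method names (WalArchiverInvariantSketch, isConsistent, describe, check) are hypothetical and only reproduce the shape of the message.

```java
// Hypothetical sketch: illustrates the ordering invariant that the log
// message implies. This is NOT the code of FileWriteAheadLogManager.
class WalArchiverInvariantSketch {
    /** Returns true when nothing beyond the current segment has been archived. */
    static boolean isConsistent(long lastArchivedIdx, long currentIdx) {
        return lastArchivedIdx <= currentIdx;
    }

    /** Builds a message in the same shape as the one seen in the log. */
    static String describe(long lastArchivedIdx, long currentIdx) {
        return "lastArchived=" + lastArchivedIdx + ", current=" + currentIdx;
    }

    /** Throws AssertionError when the assumed invariant is violated. */
    static void check(long lastArchivedIdx, long currentIdx) {
        if (!isConsistent(lastArchivedIdx, currentIdx))
            throw new AssertionError(describe(lastArchivedIdx, currentIdx));
    }

    public static void main(String[] args) {
        // Consistent state: the archiver is behind the current segment.
        check(41, 42);

        // The values from the log: the archiver is far ahead of the current segment.
        try {
            check(757, 42);
        }
        catch (AssertionError e) {
            System.out.println("java.lang.AssertionError: " + e.getMessage());
        }
    }
}
```

If this reading is right, the values in the log (757 against 42) would mean the restarted node sees a current segment index far behind what was already archived, which fits the "corrupted WAL" description.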
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)