[jira] [Commented] (IGNITE-8893) Blinking node in baseline may corrupt own WAL records

Andrey Aleksandrov (JIRA) Fri, 29 Jun 2018 01:44:08 -0700


    [ 
https://issues.apache.org/jira/browse/IGNITE-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527323#comment-16527323
 ]


Andrey Aleksandrov commented on IGNITE-8893:
--------------------------------------------

This issue has a similar scenario.

> Blinking node in baseline may corrupt own WAL records
> -----------------------------------------------------
>
>                 Key: IGNITE-8893
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8893
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.5
>            Reporter: Dmitry Sherstobitov
>            Priority: Major
>
> # Start the cluster and load data.
>  # Start an additional node that is not in the BLT.
>  # Repeat 10 times: kill 1 node in the baseline and 1 node not in the 
> baseline, then start the baseline node and the non-baseline node again.
> At some point the node in the baseline may be unable to start because of a 
> corrupted WAL (see the log below).
> Note that there is no load on the cluster at all, so there is no reason for 
> the WAL to be corrupted; rebalance should be interruptible.
> There is also another scenario that may cause the same error (and may also 
> cause a JVM crash):
>  # Start the cluster, load data, start nodes.
>  # Repeat 10 times: kill 1 node in the baseline, clean its LFS, and start the 
> node again; during rebalance, blink the node that should rebalance data to 
> the previously killed node.
> The node that should rebalance data to the cleaned node may corrupt its own 
> WAL. However, this second scenario has a configuration "error": the number of 
> backups in each case is 1, so two blinking nodes may obviously cause data 
> loss.
> {code:java}
> [2018-06-28 17:33:39,583][ERROR][wal-file-archiver%null-#63][root] Critical 
> system error detected. Will be handled accordingly to configured handler 
> [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, 
> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, 
> err=java.lang.AssertionError: lastArchived=757, current=42]]
> java.lang.AssertionError: lastArchived=757, current=42
>         at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1629)
>         at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110){code}
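
For readers of the trace above: the assertion message suggests that the WAL archiver expects the index of the last archived segment never to exceed the index of the current segment. The sketch below only illustrates that presumed invariant. It is not the actual FileWriteAheadLogManager code; the class name WalIndexInvariant and its methods are hypothetical, and the "lastArchived <= current" condition is an assumption inferred from the message text.

```java
// Hypothetical illustration of the invariant implied by the assertion message.
// This is NOT Ignite code; the condition is an assumption inferred from the log.
public class WalIndexInvariant {
    /** Returns true when the archived index does not run ahead of the current index. */
    public static boolean isConsistent(long lastArchived, long current) {
        return lastArchived <= current;
    }

    /** Throws AssertionError with a message shaped like the one in the reported log. */
    public static void check(long lastArchived, long current) {
        if (!isConsistent(lastArchived, current))
            throw new AssertionError("lastArchived=" + lastArchived + ", current=" + current);
    }

    public static void main(String[] args) {
        check(41, 42); // consistent pair, passes silently

        try {
            check(757, 42); // the pair reported in the trace
        } catch (AssertionError e) {
            System.out.println("Invariant violated: " + e.getMessage());
        }
    }
}
```

Under this reading, a node that restarts with an archived index of 757 but a current index of 42 fails the check immediately, which would match the archiver thread terminating at startup.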



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)