[
https://issues.apache.org/jira/browse/ZOOKEEPER-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802108#comment-17802108
]
Maria Ramos edited comment on ZOOKEEPER-2560 at 1/3/24 10:45 AM:
-----------------------------------------------------------------
This issue still exists in v3.7.1.
We have a tool, [LazyFS|https://github.com/dsrhaslab/lazyfs], that can
reproduce this problem. To do so, follow these steps:
{*}1{*}. Mount LazyFS on the directory where ZooKeeper data will be saved, with a
specified root directory. Assuming the ZooKeeper data path is
{{/home/data/zk}} and the root directory is {{{}/home/data/zk-root{}}}, add the
following lines to the default configuration file ({{config/default.toml}}):
{{[[injection]]}}
{{type="reorder"}}
{{op="write"}}
{{file="/home/data/zk-root/version-2/log.100000001"}}
{{occurrence=1}}
{{persist=[1,4,7,8]}}
These lines define the fault to be injected: a power failure will be simulated
after the eighth write to the
{{{}/home/data/zk-root/version-2/log.100000001{}}} file. At the moment of the
power fault, only the first, fourth, seventh, and eighth writes will have been
persisted. The {{occurrence}} parameter specifies that the fault applies to the
first group of consecutive writes, as there might be more than one such group
(consecutive writes are writes that are not interleaved with {{fsync}}
operations).
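To make the {{persist}} semantics concrete, here is a small illustrative sketch (not LazyFS code): given one group of consecutive writes, only the writes whose 1-based position appears in the {{persist}} list survive the simulated power failure.

```python
def surviving_writes(writes, persist):
    """Return the payloads that reach disk: only writes whose 1-based
    position within the fsync-free group appears in the persist list."""
    return [w for pos, w in enumerate(writes, start=1) if pos in persist]

# Eight consecutive writes (no fsync in between), fault persist=[1, 4, 7, 8]:
group = [f"write-{i}" for i in range(1, 9)]
print(surviving_writes(group, [1, 4, 7, 8]))
# ['write-1', 'write-4', 'write-7', 'write-8']
```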
{*}2{*}. Start LazyFS as the underlying file system of a node in the cluster
with the following command:
{{./scripts/mount-lazyfs.sh -c config/default.toml -m /home/data/zk -r
/home/data/zk-root -f}}
{*}3{*}. Start ZooKeeper with the command
{{apache-zookeeper-3.7.1-bin/bin/zkServer.sh start-foreground}}
{*}4{*}. Execute the {{zk-client-write.py}} script (code below) that creates
some znodes.
Immediately after this step, LazyFS will be unmounted, simulating a power
failure, and ZooKeeper will keep printing error messages in the terminal,
requiring a forced shutdown.
{*}5{*}. Remove the fault from the configuration file and unmount the filesystem
with {{{}fusermount -uz /home/data/zk{}}}.
{*}6{*}. Mount LazyFS again with the previously provided command.
{*}7{*}. Attempt to start ZooKeeper (it fails with the same error).
When trying to fix this CRC mismatch error as ZooKeeper recommends in
[https://zookeeper.apache.org/doc/current/zookeeperTools.html] we get the error:
{{ZooKeeper Transactional Log File with dbid 0 txnlog format version 2}}
{{Exception in thread "main" java.io.IOException: Unsupported Txn with
type=%d0}}
{{at
org.apache.zookeeper.server.util.SerializeUtils.deserializeTxn(SerializeUtils.java:104)}}
{{at
org.apache.zookeeper.server.persistence.TxnLogToolkit.printTxn(TxnLogToolkit.java:287)}}
{{at
org.apache.zookeeper.server.persistence.TxnLogToolkit.dump(TxnLogToolkit.java:218)}}
{{at
org.apache.zookeeper.server.persistence.TxnLogToolkit.main(TxnLogToolkit.java:121)}}
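The CRC failure itself is easy to illustrate: ZooKeeper's {{FileTxnLog}} stores an Adler32 checksum ahead of each serialized transaction record, and the "CRC check failed" path on startup recomputes it over the bytes read back. If a middle chunk of the record never reached disk, the recomputed checksum no longer matches. A minimal sketch (the transaction bytes and split points below are made up):

```python
import zlib

# Hypothetical serialized txn bytes; ZooKeeper writes an Adler32 checksum
# before each txn record in the log, which startup recovery verifies.
txn = b"create /node-42 payload"
checksum = zlib.adler32(txn)

# Suppose the record spanned several writes and a middle write was lost
# in the crash: the persisted bytes are the record minus its middle chunk.
persisted = txn[:8] + txn[16:]

# On restart the recomputed checksum no longer matches -> CRC check failed.
assert zlib.adler32(persisted) != checksum
```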
__________________________________________
{{zk-client-write.py}}
{{from kazoo.client import KazooClient}}
{{import random}}
{{dados = {}}}
{{for i in range(0, 100):}}
{{    j = random.randint(20, 8000)}}
{{    value = ""}}
{{    for k in range(0, j):}}
{{        value += str(k)}}
{{    dados[i] = value}}
{{# Connect the client and create znodes}}
{{zk = KazooClient(hosts='127.0.0.1:2181')}}
{{zk.start()}}
{{for k, v in dados.items():}}
{{    zk.create("/" + str(k), bytes(v, 'utf-8'))}}
{{zk.stop()}}
> Possible Cluster Unavailability
> -------------------------------
>
> Key: ZOOKEEPER-2560
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2560
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Environment: Three node linux cluster
> Reporter: Ramnatthan Alagappan
> Priority: Major
> Fix For: 3.4.8
>
>
> Possible Cluster Unavailability
> I am running a three node ZooKeeper cluster. Each node runs Linux.
> I see the below sequence of system calls when ZooKeeper appends a user data
> item to the log file.
> 1 write("/data/version-2/log.200000001", offset=65, count=12)
> 2 write("/data/version-2/log.200000001", offset=77, count=16323)
> 3 write("/data/version-2/log.200000001", offset=16400, count=4209)
> 4 write("/data/version-2/log.200000001", offset=20609, count=1)
> 5 fdatasync("/data//version-2/log.200000001")
> Now, a crash could happen just after operation 4 but before the final
> fdatasync. In this situation, the file system could persist the 4th operation
> but fail to persist the 3rd, because there is no fsync between them. In such
> cases, the ZooKeeper server fails to start with the following messages in its
> log file:
> [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from:
> /tmp/zoo2.cfg
> [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname:
> 127.0.0.2 to address: /127.0.0.2
> [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname:
> 127.0.0.4 to address: /127.0.0.4
> [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname:
> 127.0.0.3 to address: /127.0.0.3
> [myid:] - INFO [main:QuorumPeerConfig@331] - Defaulting to majority quorums
> [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount
> set to 3
> [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval
> set to 0
> [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not
> scheduled.
> [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
> [myid:1] - INFO [main:NIOServerCnxnFactory@89] - binding to port
> 0.0.0.0/0.0.0.0:2182
> [myid:1] - INFO [main:QuorumPeer@1019] - tickTime set to 2000
> [myid:1] - INFO [main:QuorumPeer@1039] - minSessionTimeout set to -1
> [myid:1] - INFO [main:QuorumPeer@1050] - maxSessionTimeout set to -1
> [myid:1] - INFO [main:QuorumPeer@1065] - initLimit set to 5
> [myid:1] - INFO [main:FileSnap@83] - Reading snapshot
> /data/version-2/snapshot.100000002
> [myid:1] - ERROR [main:QuorumPeer@557] - Unable to load database on disk
> java.io.IOException: CRC check failed
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:635)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
> at
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> 2016-04-15 04:00:32,795 [myid:1] - ERROR [main:QuorumPeerMain@89] -
> Unexpected exception, exiting abnormally
> java.lang.RuntimeException: Unable to run quorum server
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> Caused by: java.io.IOException: CRC check failed
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:635)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
> at
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
> ... 4 more
> The same happens when the 3rd and 4th writes hit the disk but the 2nd
> operation does not.
> Now, two nodes of a three node cluster can easily reach this state, rendering
> the entire cluster unavailable. ZooKeeper, on recovery should be able to
> handle such checksum mismatches gracefully to maintain cluster availability.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)