[
https://issues.apache.org/jira/browse/ZOOKEEPER-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dharani updated ZOOKEEPER-4878:
-------------------------------
Description:
We are running ZooKeeper in Kubernetes as a StatefulSet with 3 replicas. After
we ran a Chaos Mesh IO fault experiment, the ZooKeeper servers did not recover.
{code:java}
2024-10-24T09:43:40.896+0000 [myid:] - ERROR
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=[0:0:0:0:0:0:0:0]:2281):o.a.z.s.ZooKeeperServer@552]
- Severe unrecoverable error, exiting
java.io.FileNotFoundException: /var/lib/zookeeper/data/version-2/snapshot.1100000859 (Input/output error)
at java.base/java.io.FileOutputStream.open0(Native Method)
at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:187)
at org.apache.zookeeper.server.persistence.SnapStream.getOutputStream(SnapStream.java:133)
at org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:242)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:481)
at org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:550)
at org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:544)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:540)
at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:597)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1552)
2024-10-24T09:43:40.898+0000 [myid:] - ERROR
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=[0:0:0:0:0:0:0:0]:2281):o.a.z.u.ServiceUtils@48]
- Exiting JVM with code 10 {code}
was:
We are running ZooKeeper in Kubernetes as a StatefulSet with 3 replicas. After
we ran a Chaos Mesh IO fault experiment, the ZooKeeper servers did not recover.
"[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=[0:0:0:0:0:0:0:0]:2281):o.a.z.s.ZooKeeperServer@552]
- Severe unrecoverable error, exiting"
java.io.FileNotFoundException:
/var/lib/zookeeper/data/version-2/snapshot.400000ed9 (Input/output error)
> Zookeeper servers not running after Chaos mesh IO fault experiment
> ------------------------------------------------------------------
>
> Key: ZOOKEEPER-4878
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4878
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.8.3
> Reporter: Dharani
> Priority: Major
>
> We are running ZooKeeper in Kubernetes as a StatefulSet with 3 replicas.
> After we ran a Chaos Mesh IO fault experiment, the ZooKeeper servers did not
> recover.
> {code:java}
> 2024-10-24T09:43:40.896+0000 [myid:] - ERROR
> [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=[0:0:0:0:0:0:0:0]:2281):o.a.z.s.ZooKeeperServer@552]
> - Severe unrecoverable error, exiting
> java.io.FileNotFoundException: /var/lib/zookeeper/data/version-2/snapshot.1100000859 (Input/output error)
> at java.base/java.io.FileOutputStream.open0(Native Method)
> at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
> at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
> at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:187)
> at org.apache.zookeeper.server.persistence.SnapStream.getOutputStream(SnapStream.java:133)
> at org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:242)
> at org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:481)
> at org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:550)
> at org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:544)
> at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:540)
> at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:597)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1552)
> 2024-10-24T09:43:40.898+0000 [myid:] - ERROR
> [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=[0:0:0:0:0:0:0:0]:2281):o.a.z.u.ServiceUtils@48]
> - Exiting JVM with code 10 {code}
>
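A note on reading the trace: the exception type is FileNotFoundException, but
the parenthesized "(Input/output error)" suggests the snapshot open failed at
the OS level rather than because the file was missing. The
java.io.FileOutputStream constructor is documented to throw
FileNotFoundException whenever the file cannot be opened for any reason. The
sketch below is only an illustration of that JDK behavior, not ZooKeeper code:
the class and method names are hypothetical, and it triggers the failure by
opening a directory for writing, since an IO fault cannot be reproduced
portably.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical demo class; not part of ZooKeeper.
public class OpenFailureDemo {

    /**
     * Tries to open the given path for writing.
     * Returns null if the open succeeds, otherwise the exception message,
     * which carries the platform-specific reason for the failure.
     */
    public static String tryOpenForWrite(File path) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path)) {
            return null;
        } catch (FileNotFoundException e) {
            // Thrown for any open failure, not only for a missing file.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // The directory exists, yet opening it for writing still fails
        // with FileNotFoundException.
        File dir = Files.createTempDirectory("snapdemo").toFile();
        dir.deleteOnExit();
        System.out.println("open failed: " + tryOpenForWrite(dir));
    }
}
```

The message text depends on the platform, so the sketch prints it rather than
comparing it against a fixed string.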