[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Kalmar updated ZOOKEEPER-3333:
--------------------------------------
    Affects Version/s:     (was: 3.6.0)

> Detect if txnlogs and / or snapshots is deleted under a running ZK instance
> ---------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3333
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.5.5, 3.4.14
>            Reporter: Norbert Kalmar
>            Priority: Major
>
> ZK does not notice if txnlogs are deleted from it's dataDir, and it will just 
> keep running, writing txns in the buffer. Than, when ZK is restarted, it will 
> lose all data.
> To reproduce:
> I run a 3 node ZK ensemble, and deleted dataDir for just one instance, than 
> wrote some data. It turns out, it will not write the transaction to disk. ZK 
> stores everything in memory, until it “feels like” it’s time to persist it on 
> disk. So it doesn’t even notice the file is deleted, and when it tried to 
> flush, I imagine it just fails and keeps it in the buffer. 
> So anyway, I restarted the instance, it got the snapshot + latest txn logs 
> from the other nodes, as expected it would. It also wrote them in dataDir, so 
> now every node had the dataDir.
> So deleting from one node is fine (again, as expected, they will sync after a 
> restart).
> Then, I deleted all 3 nodes dataDir under running instances. Until restart, 
> it worked fine (of course I was getting my buffer full, I did not test until 
> the point it got overflowed).
> But after restart, I got a fresh new ZK with all my znodes gone.
> For starter, I think ZK should detect if the file it is appending is removed. 
> What should ZK do? At least give a warning log message. The question should 
> it try to create a new file? Or try to get it from other nodes? Or just fail 
> instantly? Restart itself, see if it can sync?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to