[jira] [Commented] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-11-19 Thread Abhishek Rai (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15679486#comment-15679486
 ] 

Abhishek Rai commented on ZOOKEEPER-2574:
-

[~hanm] and [~rakeshr], thanks for finding the relation to ZOOKEEPER-2420 and 
thanks for your guidance.

I've created a pull request as per your suggestion with the following changes:
(1) Patch previously uploaded containing fix and tests.
(2) Tests from ZOOKEEPER-2420 and enabling code.
(3) Documentation fixes.

[~rakeshr] great call on the documentation review.  As I went through it, I 
found multiple inconsistencies about the snapshot-log dependency.  I've fixed 
all that I could find in the docs/ directory.

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
>Reporter: Abhishek Rai
>Assignee: Abhishek Rai
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.2.patch, ZOOKEEPER-2574.3.patch, 
> ZOOKEEPER-2574.4.patch, ZOOKEEPER-2574.5.patch, ZOOKEEPER-2574.6.patch, 
> ZOOKEEPER-2574.patch
>
>
> As part of the fix for ZOOKEEPER-1797, the call to 
> FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
> result, some old-looking but required txn log files can be deleted, resulting 
> in data corruption or loss.
> For example, consider the following:
> 1. Configuration:
> autopurge.snapRetainCount=3
> 2. Following files exist:
> log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
> snapshot.110 - snapshot as of zxid=110
> snapshot.120 - snapshot as of zxid=120
> snapshot.130 - snapshot as of zxid=130
> Above scenario is possible when snapshotting has happened multiple times but 
> without accompanying log rollover, which is possible if the server was 
> running as a learner.
> 3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
> older than the zxid of the oldest snapshot (110).  This results in loss of 
> transactions in the range 131-140.
> Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
> FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log 
> file with starting zxid < oldest retained snapshot's highest zxid.
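
The purge rule described above can be sketched in a few lines.  This is a 
minimal illustration and not the code in PurgeTxnLog.java: the class and method 
names (PurgeRetentionSketch, logsToDeleteNaive, logsToDeleteSafe) are 
hypothetical, and each log file is represented only by its starting zxid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names are illustrative, not ZooKeeper APIs.
final class PurgeRetentionSketch {

    // Naive rule: delete every log whose starting zxid is older than the
    // oldest retained snapshot. This is the behavior the issue describes.
    static List<Long> logsToDeleteNaive(List<Long> logStartZxids, long oldestRetainedSnapZxid) {
        List<Long> doomed = new ArrayList<>();
        for (long start : logStartZxids) {
            if (start < oldestRetainedSnapZxid) {
                doomed.add(start);
            }
        }
        Collections.sort(doomed);
        return doomed;
    }

    // Safer rule: same as above, but keep the newest log that starts before
    // the oldest retained snapshot, since it may hold later transactions.
    static List<Long> logsToDeleteSafe(List<Long> logStartZxids, long oldestRetainedSnapZxid) {
        List<Long> doomed = logsToDeleteNaive(logStartZxids, oldestRetainedSnapZxid);
        if (!doomed.isEmpty()) {
            doomed.remove(doomed.size() - 1); // int index: drops the newest candidate
        }
        return doomed;
    }

    public static void main(String[] args) {
        List<Long> logs = Arrays.asList(100L);      // log.100 from the example
        long oldestRetainedSnapshot = 110L;         // snapshot.110
        System.out.println(logsToDeleteNaive(logs, oldestRetainedSnapshot)); // [100]
        System.out.println(logsToDeleteSafe(logs, oldestRetainedSnapshot));  // []
    }
}
```

With the files from the example, the naive rule marks log.100 for deletion, 
while the safer rule keeps it, so transactions 131-140 remain recoverable.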



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-11-04 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-1621:

Attachment: ZOOKEEPER-1621.2.patch

Based on the discussion with [~mkizner] above, skipping of the truncated txn 
log file is insufficient, and its deletion is necessary.  Otherwise we can run 
into problems in two places:

- FileTxnLog is required to include the latest txn log before the snapshot that 
it's loading.  If that latest txn log is truncated (and previously skipped), 
then it can incorrectly satisfy this requirement.  Instead, if we delete the 
truncated file, then we are forced to reach back into the older valid txn log.

- PurgeTxnLog has similar logic about retaining the latest txn log before the 
last retained snapshot.  Therefore, without the deletion, its requirements 
would similarly be met by a truncated and useless txn log.

I've now updated [~michim]'s patch with two changes and corresponding testing 
changes:
- Deletion as described here.
- Use a tighter exception (EOFException) instead of IOException.
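
To illustrate why skipping alone is insufficient, here is a small sketch of the 
"latest txn log before the snapshot" selection.  The types and names 
(TruncatedLogSketch, LogFile, newestLogBefore, withoutTruncated) and the zxid 
values are hypothetical and only model the reasoning above; they are not 
taken from FileTxnLog or PurgeTxnLog.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of txn log selection; not ZooKeeper code.
final class TruncatedLogSketch {

    static final class LogFile {
        final long startZxid;
        final boolean headerTruncated;

        LogFile(long startZxid, boolean headerTruncated) {
            this.startZxid = startZxid;
            this.headerTruncated = headerTruncated;
        }
    }

    // Returns the starting zxid of the newest log that starts before the
    // snapshot, or -1 if there is none.
    static long newestLogBefore(List<LogFile> logs, long snapshotZxid) {
        long best = -1;
        for (LogFile f : logs) {
            if (f.startZxid < snapshotZxid && f.startZxid > best) {
                best = f.startZxid;
            }
        }
        return best;
    }

    // Models deletion of truncated files: they no longer take part in selection.
    static List<LogFile> withoutTruncated(List<LogFile> logs) {
        List<LogFile> kept = new ArrayList<>();
        for (LogFile f : logs) {
            if (!f.headerTruncated) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<LogFile> logs = Arrays.asList(
                new LogFile(100L, false),  // valid log
                new LogFile(150L, true));  // truncated header, no usable data
        System.out.println(newestLogBefore(logs, 160L));                   // 150
        System.out.println(newestLogBefore(withoutTruncated(logs), 160L)); // 100
    }
}
```

If the truncated file is only skipped while reading, a selection like this 
still picks it; once it is deleted, the selection falls back to the older 
valid log.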

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> 

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-10-28 Thread Abhishek Rai (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616237#comment-15616237
 ] 

Abhishek Rai commented on ZOOKEEPER-1621:
-

Thanks [~mkizner].  Your suggestion of doing this only for the most recent txn 
log file is sound.  Are you also suggesting that we delete this truncated txn 
log file?

Because if we skip it and don't delete it, newer txn log files will get 
created in the future.  The truncated txn log file will then no longer be the 
latest txn log when we do a purge afterwards.

Deletion seems consistent with this approach as well as consistent with 
PurgeTxnLog's behavior.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?





[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-10-12 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2574:

Attachment: ZOOKEEPER-2574.6.patch

Thanks [~rakeshr].  I've updated the doc now, please take another look.  Thanks.

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
>Reporter: Abhishek Rai
>Assignee: Abhishek Rai
>Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.2.patch, ZOOKEEPER-2574.3.patch, 
> ZOOKEEPER-2574.4.patch, ZOOKEEPER-2574.5.patch, ZOOKEEPER-2574.6.patch, 
> ZOOKEEPER-2574.patch
>
>





[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-10-09 Thread Abhishek Rai (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560526#comment-15560526
 ] 

Abhishek Rai commented on ZOOKEEPER-1621:
-

Reviving this old thread.  [~shralex] has a valid concern about trading off 
consistency for availability.  However, for the specific issue being addressed 
here, we can have both.

The patch skips transaction logs with an incomplete header (the first 16 
bytes).  Skipping such files should not cause any loss of data as the header is 
an internal bookkeeping write from Zookeeper and does not contain any user 
data.  This avoids the current behavior of Zookeeper crashing on encountering 
an incomplete header, which compromises availability.

This has been a recurring problem for us in production because our app's 
operating environment occasionally causes a Zookeeper server's disk to become 
full.  After that, the server invariably runs into this problem - perhaps 
because there's something else that deterministically triggers a log rotation 
when the previous txn log throws an IOException due to disk full?

That said, we can tighten the exception being caught in [~michim]'s patch to 
EOFException instead of IOException to make sure that the log we are skipping 
indeed only has a partially written header and nothing else (in 
FileTxnLog.goToNextLog).

Additionally, I have written a test to verify that EOFException is thrown if 
and only if the header is truncated.  Zookeeper already ignores any other 
partially written transactions in the txn log.  If that's useful, I can upload 
the test, thanks.
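
As a rough illustration of the header check (not the test mentioned above), the 
sketch below reads a 16-byte header and reports whether it is complete.  It 
assumes the header is laid out as two 4-byte ints followed by one 8-byte long; 
that layout and the names (HeaderCheckSketch, hasCompleteHeader) are 
assumptions made for this example, since the comment only states the 16-byte 
size.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; the field layout is an assumption for illustration.
final class HeaderCheckSketch {

    // Returns true if at least 16 header bytes can be read, false if the
    // stream ends early (DataInputStream signals that with EOFException).
    static boolean hasCompleteHeader(byte[] fileBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes))) {
            in.readInt();   // assumed: magic number (4 bytes)
            in.readInt();   // assumed: format version (4 bytes)
            in.readLong();  // assumed: database id (8 bytes)
            return true;
        } catch (EOFException e) {
            return false;   // header was cut short
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasCompleteHeader(new byte[0]));  // false
        System.out.println(hasCompleteHeader(new byte[15])); // false
        System.out.println(hasCompleteHeader(new byte[16])); // true
    }
}
```

Catching EOFException rather than IOException keeps other I/O failures 
visible instead of treating them as a truncated header.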

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>

[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-19 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2574:

Attachment: ZOOKEEPER-2574.5.patch

Thanks [~abrahamfine].

>> I switched logsToPurge from a List to an ArrayList so I can 
>> simply use remove(0) to remove the first element in the list on line 239
> I think I must be missing something as all of the lists are ArrayLists. For 
> example, this still passes:

Sorry I was confused about something, fixed the usage of logsToPurge as you 
suggested, thanks for persisting.
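
For reference, remove(int) is declared on java.util.List itself, so a variable 
typed as List can drop its first element as long as the underlying list is 
modifiable.  A minimal sketch follows; the class and method names 
(ListRemoveSketch, dropFirst) and the file names are hypothetical, and String 
elements are used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the code in PurgeTxnTest.java.
final class ListRemoveSketch {

    // Removes and returns the first element through the List interface.
    static String dropFirst(List<String> logsToPurge) {
        return logsToPurge.remove(0);
    }

    public static void main(String[] args) {
        // Wrap in ArrayList: Arrays.asList alone is fixed-size and would
        // throw UnsupportedOperationException on remove.
        List<String> logsToPurge = new ArrayList<>(Arrays.asList("log.100", "log.200"));
        System.out.println(dropFirst(logsToPurge)); // log.100
        System.out.println(logsToPurge);            // [log.200]
    }
}
```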

>> Is there a way to achieve both goals, logging and console output (preferably 
>> stdout) without any duplication.
> I'm not sure, perhaps system.err?

I tried System.err.println, but then this output comes at the end of the test 
log under "stderr" section.  It may have limited utility in debugging since 
it's not inline with other related logging.

Thanks

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
>Reporter: Abhishek Rai
>Assignee: Abhishek Rai
>Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.2.patch, ZOOKEEPER-2574.3.patch, 
> ZOOKEEPER-2574.4.patch, ZOOKEEPER-2574.5.patch, ZOOKEEPER-2574.patch
>
>





[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-18 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2574:

Attachment: ZOOKEEPER-2574.4.patch

Thanks for the review [~abrahamfine].  I've applied your comments and uploaded 
a new patch set, please take another look.

> PurgeTxnTest.java:224 Can we change ArrayList logsToPurge back to 
> List logsToPurge?
I switched logsToPurge from a List to an ArrayList so I can simply 
use remove(0) to remove the first element in the list on line 239.  However, as 
you pointed out, this is probably not obvious given that all other lists around 
it are List, so I've added a comment explaining the choice.

> PurgeTxnLog.java:138 Do we need to use the FileFilter here since we do 
> "filtering" on line 142?
Both filters are required.  The FileFilter used in lines 134-138 is useful 
for listing all snapshot and log files with zxid >= leastZxidToBeRetain.  The 
check on line 142 is to skip deletion of the newest log file that comes before 
the oldest retained snapshot.  However, I agree that the logic would be 
simpler if all filtering logic were in one place, in MyFileFilter.accept().  
I've moved it there now.
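
A combined filter of that kind might look roughly like the sketch below.  It 
is not the MyFileFilter from the patch: the class name PurgeFilterSketch, the 
retainedNames set, and the parsing of the zxid as a hexadecimal file-name 
suffix are assumptions made for this example.  Here accept() returns true for 
files that may be purged.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of a single-place purge filter; not the patch's code.
final class PurgeFilterSketch implements FileFilter {
    private final String prefix;              // "log" or "snapshot"
    private final long leastZxidToBeRetain;   // files at or above this zxid are kept
    private final Set<String> retainedNames;  // extra files to keep, e.g. the newest older log

    PurgeFilterSketch(String prefix, long leastZxidToBeRetain, Set<String> retainedNames) {
        this.prefix = prefix;
        this.leastZxidToBeRetain = leastZxidToBeRetain;
        this.retainedNames = retainedNames;
    }

    @Override
    public boolean accept(File f) {
        String name = f.getName();
        if (!name.startsWith(prefix + ".")) {
            return false; // not a file of the kind being purged
        }
        if (retainedNames.contains(name)) {
            return false; // explicitly retained
        }
        try {
            // Assumption: the suffix after "<prefix>." is a hex zxid.
            long zxid = Long.parseLong(name.substring(prefix.length() + 1), 16);
            return zxid < leastZxidToBeRetain;
        } catch (NumberFormatException e) {
            return false; // unparseable names are left alone
        }
    }

    public static void main(String[] args) {
        FileFilter filter = new PurgeFilterSketch("log", 0x110L, Collections.singleton("log.100"));
        System.out.println(filter.accept(new File("log.50")));  // true
        System.out.println(filter.accept(new File("log.100"))); // false
        System.out.println(filter.accept(new File("log.120"))); // false
    }
}
```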

> PurgeTxnLog.java:148 We do logging and System.out.println for the same 
> String, do we need both?

My goal here was to capture the output in the log file generated by the ant 
test run.  System.out.println wasn't useful in this context.  However, I needed 
to retain System.out.println because PurgeTxnLog can also be invoked 
interactively from a console.  Is there a way to achieve both goals, logging 
and console output (preferably stdout), without any duplication?

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
>Reporter: Abhishek Rai
>Assignee: Abhishek Rai
>Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.2.patch, ZOOKEEPER-2574.3.patch, 
> ZOOKEEPER-2574.4.patch, ZOOKEEPER-2574.patch
>
>





[jira] [Commented] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-12 Thread Abhishek Rai (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483721#comment-15483721
 ] 

Abhishek Rai commented on ZOOKEEPER-2574:
-

Thanks for the references [~rakeshr].  The learner writes the snapshot in 
response to the NEWLEADER message received from the leader.  Based on my 
understanding, this is because the leader could be ahead of the learner - 
meaning that the learner is missing some transactions that the leader has.  So 
receiving a snapshot and committing it locally is a valid option for the 
learner to catch up and join the quorum.  However, going forward it will 
receive subsequent transactions, which as you mentioned get appended to the 
existing txn log file.  It seems a log rollover could have been done before 
snapshotting in the learner, but perhaps changing behaviors at this point is 
not worth it given the need to support old behavior too?

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
>Reporter: Abhishek Rai
>Assignee: Abhishek Rai
>Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.2.patch, ZOOKEEPER-2574.3.patch, 
> ZOOKEEPER-2574.patch
>
>





[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-11 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2574:

Attachment: ZOOKEEPER-2574.3.patch

Thanks [~arshad.mohammad] for the review.  I've applied your suggestions and 
uploaded the latest patch.

Also, I noticed that on Hadoop QA, a test is failing 
(org.apache.zookeeper.test.QuorumTest) but I cannot reproduce this failure 
locally and it also seems unrelated.

Thanks!

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
>Reporter: Abhishek Rai
>Assignee: Abhishek Rai
>Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.2.patch, ZOOKEEPER-2574.3.patch, 
> ZOOKEEPER-2574.patch
>
>





[jira] [Updated] (ZOOKEEPER-2310) Snapshot files must be synced to prevent inconsistency or data loss

2016-09-11 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2310:

Attachment: ZOOKEEPER-2310.3.patch

> Snapshot files must be synced to prevent inconsistency or data loss
> ---
>
> Key: ZOOKEEPER-2310
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2310
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Abhishek Rai
>Assignee: Abhishek Rai
> Attachments: ZOOKEEPER-2310.3.patch, zookeeper-2310-version-2.patch, 
> zookeeper-2310.patch
>
>
> Today, Zookeeper server syncs transaction log files to disk by default, but 
> does not sync snapshot files.  Consequently, an untimely crash may result in 
> a lost or incomplete snapshot file.  During recovery, if the server finds a 
> valid older snapshot file, it will load it and replay subsequent log(s), 
> skipping the incomplete snapshot file.  It's possible that the skipped file 
> had some transactions which are not present in the replayed transaction logs. 
>  Since quorum synchronization is based on last transaction ID of each server, 
> this will never get noticed, resulting in inconsistency between servers and 
> possible data loss.
> Following sequence of events describes a sample scenario where this can 
> happen:
> # Server F is a follower in a Zookeeper ensemble.
> # F's most recent valid snapshot file is named "snapshot.10" containing state 
> up to zxid = 10.  F is currently writing to the transaction log file 
> "log.11", with the most recent zxid = 20.
> # Fresh round of election.
> # F receives a few new transactions 21 to 30 from new leader L as the "diff". 
>  Current server behavior is to dump current state plus diff to a new snapshot 
> file, "snapshot.30".
> # F finalizes the snapshot file, but file contents are still buffered in OS 
> caches.  Zookeeper does not sync snapshot file contents to disk.
> # F receives a new transaction 31 from the leader, which it appends to the 
> existing transaction log file, "log.11" and syncs the file to disk.
> # Server machine crashes or is cold rebooted.
> # After recovery, snapshot file "snapshot.30" may not exist or may be empty.  
> See below for why that may happen.
> # In either case, F looks for the last finalized snapshot file, finds and 
> loads "snapshot.10".  It then replays transactions from "log.11".  
> Ultimately, its last seen zxid will be 31, but it would not have replayed 
> transactions 21 to 30 received via the "diff" from the leader.
> # Clients which are connected to F may see different data than clients 
> connected to other members of the ensemble, violating single system image 
> invariant.  Also, if F were to become a leader at some point, it could use 
> its state to seed other servers, and they all could lose the writes in the 
> missing interval above.
> *Notes:*
> - Reason why snapshot file may be missing or incomplete:
> -- Zookeeper does not sync the data directory after creating a snapshot file. 
>  Even if a newly created file is synced to disk, if the corresponding 
> directory entry is not, then the file will not be visible in the namespace.
> -- Zookeeper does not sync snapshot files.  So, they may be empty or 
> incomplete during recovery from an untimely crash.
> - In step (6) above, the server could also have written the new transaction 
> 31 to a new log file, "log.31".  The final outcome would still be the same.
> We are able to deterministically reproduce this problem using the following 
> steps:
> # Create a new Zookeeper ensemble on 3 hosts: A, B, and C.
> # Ensure each server has at least one snapshot file in its data dir.
> # Stop Zookeeper process on server A.
> # Slow down disk syncs on server A (see example script below). This ensures 
> that snapshot files written by Zookeeper don't make it to disk spontaneously. 
>  Log files will be written to disk as Zookeeper explicitly issues a sync call 
> on such files.
> # Connect to server B and create a new znode /test1.
> # Start Zookeeper process on A, wait for it to write a new snapshot to its 
> datadir.  This snapshot would contain /test1 but it won’t be synced to disk 
> yet.
> # Connect to A and verify that /test1 is visible.
> # Connect to B and create another znode /test2.  This will cause A’s 
> transaction log to grow further to receive /test2.
> # Cold reboot A.
> # A’s last snapshot is a zero-sized file or is missing altogether since it 
> did not get synced to disk before reboot.  We have seen both in different 
> runs.
> # Connect to A and verify that /test1 does not exist.  It exists on B and C.
> Slowing down disk syncs:
> {noformat}
> echo 36 | sudo tee /proc/sys/vm/dirty_writeback_centisecs
> echo 36 | sudo tee 

[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-11 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2574:

Attachment: (was: ZOOKEEPER-2574.2.patch)

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
> Reporter: Abhishek Rai
> Assignee: Abhishek Rai
> Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.2.patch, ZOOKEEPER-2574.patch
>
>
> As part of the fix for ZOOKEEPER-1797, the call to 
> FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
> result, some old-looking but required txn log files can be deleted, resulting 
> in data corruption or loss.
> For example, consider the following:
> 1. Configuration:
> autopurge.snapRetainCount=3
> 2. Following files exist:
> log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
> snapshot.110 - snapshot as of zxid=110
> snapshot.120 - snapshot as of zxid=120
> snapshot.130 - snapshot as of zxid=130
> Above scenario is possible when snapshotting has happened multiple times but 
> without accompanying log rollover, which is possible if the server was 
> running as a learner.
> 3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
> older than the zxid of the oldest snapshot (110).  This results in loss of 
> transactions in the range 131-140.
> Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
> FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log 
> file with starting zxid < oldest retained snapshot's highest zxid.





[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-11 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2574:

Attachment: ZOOKEEPER-2574.2.patch

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
> Reporter: Abhishek Rai
> Assignee: Abhishek Rai
> Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.2.patch, ZOOKEEPER-2574.patch
>
>
> As part of the fix for ZOOKEEPER-1797, the call to 
> FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
> result, some old-looking but required txn log files can be deleted, resulting 
> in data corruption or loss.
> For example, consider the following:
> 1. Configuration:
> autopurge.snapRetainCount=3
> 2. Following files exist:
> log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
> snapshot.110 - snapshot as of zxid=110
> snapshot.120 - snapshot as of zxid=120
> snapshot.130 - snapshot as of zxid=130
> Above scenario is possible when snapshotting has happened multiple times but 
> without accompanying log rollover, which is possible if the server was 
> running as a learner.
> 3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
> older than the zxid of the oldest snapshot (110).  This results in loss of 
> transactions in the range 131-140.
> Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
> FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log 
> file with starting zxid < oldest retained snapshot's highest zxid.





[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-11 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2574:

Attachment: ZOOKEEPER-2574.2.patch

Uploading a patch for trunk; the previous patch does not work on trunk (it works 
on 3.4.8 and 3.5.2).

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
> Reporter: Abhishek Rai
> Assignee: Abhishek Rai
> Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.2.patch, ZOOKEEPER-2574.patch
>
>
> As part of the fix for ZOOKEEPER-1797, the call to 
> FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
> result, some old-looking but required txn log files can be deleted, resulting 
> in data corruption or loss.
> For example, consider the following:
> 1. Configuration:
> autopurge.snapRetainCount=3
> 2. Following files exist:
> log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
> snapshot.110 - snapshot as of zxid=110
> snapshot.120 - snapshot as of zxid=120
> snapshot.130 - snapshot as of zxid=130
> Above scenario is possible when snapshotting has happened multiple times but 
> without accompanying log rollover, which is possible if the server was 
> running as a learner.
> 3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
> older than the zxid of the oldest snapshot (110).  This results in loss of 
> transactions in the range 131-140.
> Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
> FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log 
> file with starting zxid < oldest retained snapshot's highest zxid.





[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-11 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2574:

Attachment: (was: ZOOKEEPER-2574.patch)

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
> Reporter: Abhishek Rai
> Assignee: Abhishek Rai
> Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.patch
>
>
> As part of the fix for ZOOKEEPER-1797, the call to 
> FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
> result, some old-looking but required txn log files can be deleted, resulting 
> in data corruption or loss.
> For example, consider the following:
> 1. Configuration:
> autopurge.snapRetainCount=3
> 2. Following files exist:
> log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
> snapshot.110 - snapshot as of zxid=110
> snapshot.120 - snapshot as of zxid=120
> snapshot.130 - snapshot as of zxid=130
> Above scenario is possible when snapshotting has happened multiple times but 
> without accompanying log rollover, which is possible if the server was 
> running as a learner.
> 3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
> older than the zxid of the oldest snapshot (110).  This results in loss of 
> transactions in the range 131-140.
> Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
> FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log 
> file with starting zxid < oldest retained snapshot's highest zxid.





[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-11 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2574:

Attachment: ZOOKEEPER-2574.patch

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
> Reporter: Abhishek Rai
> Assignee: Abhishek Rai
> Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.patch, ZOOKEEPER-2574.patch
>
>
> As part of the fix for ZOOKEEPER-1797, the call to 
> FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
> result, some old-looking but required txn log files can be deleted, resulting 
> in data corruption or loss.
> For example, consider the following:
> 1. Configuration:
> autopurge.snapRetainCount=3
> 2. Following files exist:
> log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
> snapshot.110 - snapshot as of zxid=110
> snapshot.120 - snapshot as of zxid=120
> snapshot.130 - snapshot as of zxid=130
> Above scenario is possible when snapshotting has happened multiple times but 
> without accompanying log rollover, which is possible if the server was 
> running as a learner.
> 3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
> older than the zxid of the oldest snapshot (110).  This results in loss of 
> transactions in the range 131-140.
> Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
> FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log 
> file with starting zxid < oldest retained snapshot's highest zxid.





[jira] [Commented] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-10 Thread Abhishek Rai (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15481129#comment-15481129
 ] 

Abhishek Rai commented on ZOOKEEPER-2574:
-

Thanks [~phunt]. I've uploaded a fix and a unit test.  Without the fix, the 
unit test fails at the assertion below.

{noformat}
/**
 * Verify that the last znode that was created above exists.  This znode's
 * creation was captured by the transaction log which was created before any
 * of the above SNAP_RETAIN_COUNT snapshots were created, but it's not
 * captured in any of these snapshots.  So for it to exist, the (only)
 * existing log file should not have been purged.
 */
final String lastZnode = "/snap-" + (unique - 1);
final Stat stat = zk.exists(lastZnode, false);
Assert.assertNotNull("Last znode does not exist: " + lastZnode, stat);
{noformat}

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
> Reporter: Abhishek Rai
> Assignee: Abhishek Rai
> Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.patch
>
>
> As part of the fix for ZOOKEEPER-1797, the call to 
> FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
> result, some old-looking but required txn log files can be deleted, resulting 
> in data corruption or loss.
> For example, consider the following:
> 1. Configuration:
> autopurge.snapRetainCount=3
> 2. Following files exist:
> log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
> snapshot.110 - snapshot as of zxid=110
> snapshot.120 - snapshot as of zxid=120
> snapshot.130 - snapshot as of zxid=130
> Above scenario is possible when snapshotting has happened multiple times but 
> without accompanying log rollover, which is possible if the server was 
> running as a learner.
> 3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
> older than the zxid of the oldest snapshot (110).  This results in loss of 
> transactions in the range 131-140.
> Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
> FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log 
> file with starting zxid < oldest retained snapshot's highest zxid.





[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-10 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2574:

Attachment: ZOOKEEPER-2574.patch

Fix and unittest for ZOOKEEPER-2574.

> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
> Reporter: Abhishek Rai
> Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.patch
>
>
> As part of the fix for ZOOKEEPER-1797, the call to 
> FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
> result, some old-looking but required txn log files can be deleted, resulting 
> in data corruption or loss.
> For example, consider the following:
> 1. Configuration:
> autopurge.snapRetainCount=3
> 2. Following files exist:
> log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
> snapshot.110 - snapshot as of zxid=110
> snapshot.120 - snapshot as of zxid=120
> snapshot.130 - snapshot as of zxid=130
> Above scenario is possible when snapshotting has happened multiple times but 
> without accompanying log rollover, which is possible if the server was 
> running as a learner.
> 3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
> older than the zxid of the oldest snapshot (110).  This results in loss of 
> transactions in the range 131-140.
> Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
> FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log 
> file with starting zxid < oldest retained snapshot's highest zxid.





[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-10 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2574:

Description: 
As part of the fix for ZOOKEEPER-1797, the call to 
FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
result, some old-looking but required txn log files can be deleted, resulting 
in data corruption or loss.

For example, consider the following:

1. Configuration:
autopurge.snapRetainCount=3

2. Following files exist:
log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
snapshot.110 - snapshot as of zxid=110
snapshot.120 - snapshot as of zxid=120
snapshot.130 - snapshot as of zxid=130

Above scenario is possible when snapshotting has happened multiple times but 
without accompanying log rollover, which is possible if the server was running 
as a learner.

3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
older than the zxid of the oldest snapshot (110).  This results in loss of 
transactions in the range 131-140.

Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log 
file with starting zxid < oldest retained snapshot's highest zxid.

  was:
As part of the fix for ZOOKEEPER-1797, the call to 
FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
result, some old-looking but required txn log files can be deleted, resulting 
in data corruption or loss.

For example, consider the following:

1. Configuration:
autopurge.snapRetainCount=3

2. Following files exist:
log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
snapshot.110 - snapshot as of zxid=110
snapshot.120 - snapshot as of zxid=120
snapshot.130 - snapshot as of zxid=130

3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
older than the zxid of the oldest snapshot (110).  This results in loss of 
transactions in the range 131-140.

Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
FileTxnSnapLog.getSnapshotLogs() which finds the newest txn log file with 
starting zxid < snapshot zxid.


> PurgeTxnLog can inadvertently delete required txn log files
> ---
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
> Reporter: Abhishek Rai
> Priority: Critical
>
> As part of the fix for ZOOKEEPER-1797, the call to 
> FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
> result, some old-looking but required txn log files can be deleted, resulting 
> in data corruption or loss.
> For example, consider the following:
> 1. Configuration:
> autopurge.snapRetainCount=3
> 2. Following files exist:
> log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
> snapshot.110 - snapshot as of zxid=110
> snapshot.120 - snapshot as of zxid=120
> snapshot.130 - snapshot as of zxid=130
> Above scenario is possible when snapshotting has happened multiple times but 
> without accompanying log rollover, which is possible if the server was 
> running as a learner.
> 3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
> older than the zxid of the oldest snapshot (110).  This results in loss of 
> transactions in the range 131-140.
> Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
> FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log 
> file with starting zxid < oldest retained snapshot's highest zxid.





[jira] [Created] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

2016-09-09 Thread Abhishek Rai (JIRA)
Abhishek Rai created ZOOKEEPER-2574:
---

Summary: PurgeTxnLog can inadvertently delete required txn log files
Key: ZOOKEEPER-2574
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.5.2, 3.5.1, 3.5.0, 3.4.8, 3.4.7
Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
Reporter: Abhishek Rai
Priority: Critical


As part of the fix for ZOOKEEPER-1797, the call to 
FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a 
result, some old-looking but required txn log files can be deleted, resulting 
in data corruption or loss.

For example, consider the following:

1. Configuration:
autopurge.snapRetainCount=3

2. Following files exist:
log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
snapshot.110 - snapshot as of zxid=110
snapshot.120 - snapshot as of zxid=120
snapshot.130 - snapshot as of zxid=130

3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is 
older than the zxid of the oldest snapshot (110).  This results in loss of 
transactions in the range 131-140.

Before the fix for ZOOKEEPER-1797, this was avoided by the call to 
FileTxnSnapLog.getSnapshotLogs() which finds the newest txn log file with 
starting zxid < snapshot zxid.
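
The retention rule described above can be sketched in a few lines of Java.  This 
is only an illustration of the scenario, not the code in PurgeTxnLog or 
FileTxnSnapLog: the class and method names (PurgeSketch, logsToRetain, 
naiveLogsToRetain) are hypothetical, and log files are modeled purely by their 
starting zxid.  The "safe" variant keeps the newest log whose starting zxid is 
below the oldest retained snapshot's zxid, plus every newer log; the "naive" 
variant keeps only logs that start at or after that snapshot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PurgeSketch {
    /**
     * Safe rule (hypothetical helper): retain the newest log whose starting
     * zxid is strictly below the oldest retained snapshot's zxid, and every
     * log newer than it.  If no such log exists, retain all logs.
     */
    static List<Long> logsToRetain(TreeSet<Long> logStartZxids, long oldestRetainedSnapZxid) {
        Long anchor = logStartZxids.lower(oldestRetainedSnapZxid);
        if (anchor == null) {
            return new ArrayList<>(logStartZxids);
        }
        return new ArrayList<>(logStartZxids.tailSet(anchor, true));
    }

    /**
     * Naive rule (hypothetical helper): retain only logs whose starting zxid
     * is at or after the oldest retained snapshot's zxid.  This drops a log
     * that started earlier but still holds later transactions.
     */
    static List<Long> naiveLogsToRetain(TreeSet<Long> logStartZxids, long oldestRetainedSnapZxid) {
        return new ArrayList<>(logStartZxids.tailSet(oldestRetainedSnapZxid, true));
    }

    public static void main(String[] args) {
        // Scenario from the description: only log.100 exists, and the
        // retained snapshots are snapshot.110, snapshot.120, snapshot.130.
        TreeSet<Long> logs = new TreeSet<>(Arrays.asList(100L));
        long oldestRetainedSnapshot = 110L;

        System.out.println("safe rule keeps logs starting at:  "
                + logsToRetain(logs, oldestRetainedSnapshot));
        System.out.println("naive rule keeps logs starting at: "
                + naiveLogsToRetain(logs, oldestRetainedSnapshot));
    }
}
```

With the files from the example, the safe rule keeps log.100 while the naive 
rule keeps nothing, which is how transactions 131-140 would be lost.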





[jira] [Commented] (ZOOKEEPER-2310) Snapshot files must be synced to prevent inconsistency or data loss

2016-05-03 Thread Abhishek Rai (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269602#comment-15269602
 ] 

Abhishek Rai commented on ZOOKEEPER-2310:
-

Thanks for bringing this up [~zhangyongxyz].  As you pointed out, FileChannel 
does not provide a way of accomplishing this in Windows.  There are conflicting 
opinions online about whether it's even necessary for Windows based on how it 
automatically handles updates to folders.

I've provided a modified patch (zookeeper-2310-version-2.patch) which skips 
syncing of directory on Windows.  The pattern I used has been used elsewhere in 
Zookeeper source, so should be safe.
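
For readers who have not opened the patch, the general idea can be sketched as 
follows.  This is not the patch itself: the class and method names 
(DurableWriteSketch, writeDurably, syncDirectory) are hypothetical, and the 
os.name check is an assumption about how one might detect Windows.  The sketch 
forces the file contents to disk with FileChannel.force(true), then forces the 
parent directory so that the new directory entry is durable too, skipping the 
directory step on Windows.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DurableWriteSketch {
    // Assumption: detect Windows via the os.name system property.
    static final boolean IS_WINDOWS =
            System.getProperty("os.name", "").toLowerCase().startsWith("windows");

    /** Writes data to file, syncs the file, then syncs its parent directory. */
    static void writeDurably(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Flush file contents and metadata from OS caches to the device.
            ch.force(true);
        }
        syncDirectory(file.toAbsolutePath().getParent());
    }

    /** Syncs a directory so a newly created entry survives a crash. */
    static void syncDirectory(Path dir) throws IOException {
        if (IS_WINDOWS || dir == null) {
            // Opening a directory through FileChannel is not supported on Windows.
            return;
        }
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("snapdir");
        Path snapshot = dir.resolve("snapshot.30");
        writeDurably(snapshot, "state up to zxid 30".getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + Files.size(snapshot) + " bytes to " + snapshot);
    }
}
```

Syncing the file alone is not enough in the scenario from the issue 
description: if the directory entry is not on disk, the file may not be visible 
after a crash even though its contents were flushed.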

> Snapshot files must be synced to prevent inconsistency or data loss
> ---
>
> Key: ZOOKEEPER-2310
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2310
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6
> Reporter: Abhishek Rai
> Assignee: Abhishek Rai
> Attachments: zookeeper-2310-version-2.patch, zookeeper-2310.patch
>
>
> Today, Zookeeper server syncs transaction log files to disk by default, but 
> does not sync snapshot files.  Consequently, an untimely crash may result in 
> a lost or incomplete snapshot file.  During recovery, if the server finds a 
> valid older snapshot file, it will load it and replay subsequent log(s), 
> skipping the incomplete snapshot file.  It's possible that the skipped file 
> had some transactions which are not present in the replayed transaction logs. 
>  Since quorum synchronization is based on last transaction ID of each server, 
> this will never get noticed, resulting in inconsistency between servers and 
> possible data loss.
> Following sequence of events describes a sample scenario where this can 
> happen:
> # Server F is a follower in a Zookeeper ensemble.
> # F's most recent valid snapshot file is named "snapshot.10" containing state 
> up to zxid = 10.  F is currently writing to the transaction log file 
> "log.11", with the most recent zxid = 20.
> # Fresh round of election.
> # F receives a few new transactions 21 to 30 from new leader L as the "diff". 
>  Current server behavior is to dump current state plus diff to a new snapshot 
> file, "snapshot.30".
> # F finalizes the snapshot file, but file contents are still buffered in OS 
> caches.  Zookeeper does not sync snapshot file contents to disk.
> # F receives a new transaction 31 from the leader, which it appends to the 
> existing transaction log file, "log.11" and syncs the file to disk.
> # Server machine crashes or is cold rebooted.
> # After recovery, snapshot file "snapshot.30" may not exist or may be empty.  
> See below for why that may happen.
> # In either case, F looks for the last finalized snapshot file, finds and 
> loads "snapshot.10".  It then replays transactions from "log.11".  
> Ultimately, its last seen zxid will be 31, but it would not have replayed 
> transactions 21 to 30 received via the "diff" from the leader.
> # Clients which are connected to F may see different data than clients 
> connected to other members of the ensemble, violating single system image 
> invariant.  Also, if F were to become a leader at some point, it could use 
> its state to seed other servers, and they all could lose the writes in the 
> missing interval above.
> *Notes:*
> - Reason why snapshot file may be missing or incomplete:
> -- Zookeeper does not sync the data directory after creating a snapshot file. 
>  Even if a newly created file is synced to disk, if the corresponding 
> directory entry is not, then the file will not be visible in the namespace.
> -- Zookeeper does not sync snapshot files.  So, they may be empty or 
> incomplete during recovery from an untimely crash.
> - In step (6) above, the server could also have written the new transaction 
> 31 to a new log file, "log.31".  The final outcome would still be the same.
> We are able to deterministically reproduce this problem using the following 
> steps:
> # Create a new Zookeeper ensemble on 3 hosts: A, B, and C.
> # Ensure each server has at least one snapshot file in its data dir.
> # Stop Zookeeper process on server A.
> # Slow down disk syncs on server A (see example script below). This ensures 
> that snapshot files written by Zookeeper don't make it to disk spontaneously. 
>  Log files will be written to disk as Zookeeper explicitly issues a sync call 
> on such files.
> # Connect to server B and create a new znode /test1.
> # Start Zookeeper process on A, wait for it to write a new snapshot to its 
> datadir.  This snapshot would contain /test1 but it won’t be synced to disk 
> yet.
> # Connect to A and verify that /test1 is visible.
> # Connect to B and create another znode /test2.  This will cause A’s 
> transaction 

[jira] [Updated] (ZOOKEEPER-2310) Snapshot files must be synced to prevent inconsistency or data loss

2016-05-03 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2310:

Attachment: zookeeper-2310-version-2.patch

> Snapshot files must be synced to prevent inconsistency or data loss
> ---
>
> Key: ZOOKEEPER-2310
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2310
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6
> Reporter: Abhishek Rai
> Assignee: Abhishek Rai
> Attachments: zookeeper-2310-version-2.patch, zookeeper-2310.patch
>
>
> Today, Zookeeper server syncs transaction log files to disk by default, but 
> does not sync snapshot files.  Consequently, an untimely crash may result in 
> a lost or incomplete snapshot file.  During recovery, if the server finds a 
> valid older snapshot file, it will load it and replay subsequent log(s), 
> skipping the incomplete snapshot file.  It's possible that the skipped file 
> had some transactions which are not present in the replayed transaction logs. 
>  Since quorum synchronization is based on last transaction ID of each server, 
> this will never get noticed, resulting in inconsistency between servers and 
> possible data loss.
> Following sequence of events describes a sample scenario where this can 
> happen:
> # Server F is a follower in a Zookeeper ensemble.
> # F's most recent valid snapshot file is named "snapshot.10" containing state 
> up to zxid = 10.  F is currently writing to the transaction log file 
> "log.11", with the most recent zxid = 20.
> # Fresh round of election.
> # F receives a few new transactions 21 to 30 from new leader L as the "diff". 
>  Current server behavior is to dump current state plus diff to a new snapshot 
> file, "snapshot.30".
> # F finalizes the snapshot file, but file contents are still buffered in OS 
> caches.  Zookeeper does not sync snapshot file contents to disk.
> # F receives a new transaction 31 from the leader, which it appends to the 
> existing transaction log file, "log.11" and syncs the file to disk.
> # Server machine crashes or is cold rebooted.
> # After recovery, snapshot file "snapshot.30" may not exist or may be empty.  
> See below for why that may happen.
> # In either case, F looks for the last finalized snapshot file, finds and 
> loads "snapshot.10".  It then replays transactions from "log.11".  
> Ultimately, its last seen zxid will be 31, but it would not have replayed 
> transactions 21 to 30 received via the "diff" from the leader.
> # Clients which are connected to F may see different data than clients 
> connected to other members of the ensemble, violating single system image 
> invariant.  Also, if F were to become a leader at some point, it could use 
> its state to seed other servers, and they all could lose the writes in the 
> missing interval above.
> *Notes:*
> - Reason why snapshot file may be missing or incomplete:
> -- Zookeeper does not sync the data directory after creating a snapshot file. 
>  Even if a newly created file is synced to disk, if the corresponding 
> directory entry is not, then the file will not be visible in the namespace.
> -- Zookeeper does not sync snapshot files.  So, they may be empty or 
> incomplete during recovery from an untimely crash.
> - In step (6) above, the server could also have written the new transaction 
> 31 to a new log file, "log.31".  The final outcome would still be the same.
> We are able to deterministically reproduce this problem using the following 
> steps:
> # Create a new Zookeeper ensemble on 3 hosts: A, B, and C.
> # Ensured each server has at least one snapshot file in its data dir.
> # Stop Zookeeper process on server A.
> # Slow down disk syncs on server A (see example script below). This ensures 
> that snapshot files written by Zookeeper don't make it to disk spontaneously. 
>  Log files will be written to disk as Zookeeper explicitly issues a sync call 
> on such files.
> # Connect to server B and create a new znode /test1.
> # Start Zookeeper process on A, wait for it to write a new snapshot to its 
> datadir.  This snapshot would contain /test1 but it won’t be synced to disk 
> yet.
> # Connect to A and verify that /test1 is visible.
> # Connect to B and create another znode /test2.  This will cause A’s 
> transaction log to grow further to receive /test2.
> # Cold reboot A.
> # A’s last snapshot is a zero-sized file or is missing altogether since it 
> did not get synced to disk before reboot.  We have seen both in different 
> runs.
> # Connect to A and verify that /test1 does not exist.  It exists on B and C.
> Slowing down disk syncs:
> {noformat}
> echo 36 | sudo tee /proc/sys/vm/dirty_writeback_centisecs
> echo 36 | sudo tee /proc/sys/vm/dirty_expire_centisecs
> echo 

[jira] [Created] (ZOOKEEPER-2310) Snapshot files must be synced to prevent inconsistency or data loss

2015-11-01 Thread Abhishek Rai (JIRA)
Abhishek Rai created ZOOKEEPER-2310:
---

 Summary: Snapshot files must be synced to prevent inconsistency or 
data loss
 Key: ZOOKEEPER-2310
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2310
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6
Reporter: Abhishek Rai


Today, the ZooKeeper server syncs transaction log files to disk by default, but 
does not sync snapshot files.  Consequently, an untimely crash may result in a 
lost or incomplete snapshot file.  During recovery, if the server finds a valid 
older snapshot file, it will load it and replay subsequent log(s), skipping the 
incomplete snapshot file.  It's possible that the skipped file reflected some 
transactions which are not present in the replayed transaction logs.  Since 
quorum synchronization is based on the last transaction ID of each server, the 
gap is never noticed, resulting in inconsistency between servers and possible 
data loss.

The following sequence of events describes a sample scenario where this can happen:

# Server F is a follower in a Zookeeper ensemble.
# F's most recent valid snapshot file is named "snapshot.10" containing state 
up to zxid = 10.  F is currently writing to the transaction log file "log.11", 
with the most recent zxid = 20.
# A fresh round of leader election takes place.
# F receives a few new transactions 21 to 30 from new leader L as the "diff".  
Current server behavior is to dump current state plus diff to a new snapshot 
file, "snapshot.30".
# F finalizes the snapshot file, but file contents are still buffered in OS 
caches.  Zookeeper does not sync snapshot file contents to disk.
# F receives a new transaction 31 from the leader, which it appends to the 
existing transaction log file, "log.11", and syncs the file to disk.
# Server machine crashes or is cold rebooted.
# After recovery, snapshot file "snapshot.30" may not exist or may be empty.  
See below for why that may happen.
# In either case, F looks for the last finalized snapshot file, finds and loads 
"snapshot.10".  It then replays transactions from "log.11".  Ultimately, its 
last seen zxid will be 31, but it would not have replayed transactions 21 to 30 
received via the "diff" from the leader.
# Clients connected to F may see different data than clients connected to 
other members of the ensemble, violating the single system image invariant.  
Also, if F were to become leader at some point, it could use its state to seed 
other servers, and they could all lose the writes in the missing interval 
above.

*Notes:*
- Reasons why the snapshot file may be missing or incomplete:
-- ZooKeeper does not sync the data directory after creating a snapshot file.  
Even if a newly created file is synced to disk, if the corresponding directory 
entry is not, then the file will not be visible in the namespace.
-- ZooKeeper does not sync snapshot files.  So, they may be empty or incomplete 
during recovery from an untimely crash.
- In step (6) above, the server could also have written the new transaction 31 
to a new log file, "log.31".  The final outcome would still be the same.

We are able to deterministically reproduce this problem using the following 
steps:

# Create a new Zookeeper ensemble on 3 hosts: A, B, and C.
# Ensure each server has at least one snapshot file in its data dir.
# Stop Zookeeper process on server A.
# Slow down disk syncs on server A (see example script below). This ensures 
that snapshot files written by Zookeeper don't make it to disk spontaneously.  
Log files will be written to disk as Zookeeper explicitly issues a sync call on 
such files.
# Connect to server B and create a new znode /test1.
# Start Zookeeper process on A and wait for it to write a new snapshot to its 
datadir.  This snapshot would contain /test1 but it won’t be synced to disk yet.
# Connect to A and verify that /test1 is visible.
# Connect to B and create another znode /test2.  This will cause A’s 
transaction log to grow further to receive /test2.
# Cold reboot A.
# A’s last snapshot is a zero-sized file or is missing altogether since it did 
not get synced to disk before reboot.  We have seen both in different runs.
# Connect to A and verify that /test1 does not exist.  It exists on B and C.

Slowing down disk syncs:
{noformat}
echo 36 | sudo tee /proc/sys/vm/dirty_writeback_centisecs
echo 36 | sudo tee /proc/sys/vm/dirty_expire_centisecs
echo 99 | sudo tee /proc/sys/vm/dirty_background_ratio
echo 99 | sudo tee /proc/sys/vm/dirty_ratio
{noformat}






[jira] [Updated] (ZOOKEEPER-2310) Snapshot files must be synced to prevent inconsistency or data loss

2015-11-01 Thread Abhishek Rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rai updated ZOOKEEPER-2310:

Attachment: zookeeper-2310.patch

Patch for the above issue, which:
# Syncs the snapshot file
# Syncs the snapshot directory
# Logs a debug message about the snapshot file once it is written
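
For readers unfamiliar with the two kinds of sync involved, here is a minimal sketch in plain Java NIO.  It is not the code in zookeeper-2310.patch; the class and method names (DurableSnapshotWriter, writeDurably, syncDirectory) are hypothetical and chosen only for illustration.  It assumes a Linux-like platform where a directory can be opened for reading and forced to disk.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not the code in zookeeper-2310.patch.
final class DurableSnapshotWriter {

    // Writes data to the given file, then forces both the file contents and
    // the parent directory entry to stable storage before returning.
    static void writeDurably(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Sync file data and metadata; without this the bytes may still
            // sit in OS caches after the file is closed.
            ch.force(true);
        }
        // Sync the directory so the new file name itself survives a crash.
        syncDirectory(file.toAbsolutePath().getParent());
    }

    // Assumption: opening a directory for READ and calling force() works on
    // Linux and most Unix-like systems; some platforms (for example Windows)
    // reject it with an IOException, which is propagated to the caller.
    static void syncDirectory(Path dir) throws IOException {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        }
    }
}
```

A caller would invoke something like DurableSnapshotWriter.writeDurably(dataDir.resolve("snapshot.30"), bytes) and treat the snapshot as finalized only after the call returns.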

> Snapshot files must be synced to prevent inconsistency or data loss
> ---
>
> Key: ZOOKEEPER-2310
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2310
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Abhishek Rai
> Attachments: zookeeper-2310.patch
>
>

[jira] [Commented] (ZOOKEEPER-2310) Snapshot files must be synced to prevent inconsistency or data loss

2015-11-01 Thread Abhishek Rai (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984796#comment-14984796
 ] 

Abhishek Rai commented on ZOOKEEPER-2310:
-

Thanks for your response [~fpj].  I think my claim about the "diff" being 
present in the snapshot and not in the log is incorrect.  When pushing a diff, 
the leader (LearnerHandler) pushes individual transactions which the follower 
writes to its log (Learner.syncWithLeader).  The leader eventually sends a 
"NEWLEADER"; in response, the follower snapshots.  Ultimately, the diff is 
visible in both the log and the snapshot.

But consider the case of leader (LearnerHandler) pushing a full snapshot to the 
follower.  In this case, the follower does not receive the individual 
transactions contributing to that snapshot.  In fact, it's not practical to do 
so - by design, the snapshot is sent when the diff is too large.  Thus, the 
follower can have a snapshot which reflects some transactions that are not 
present in the log.  After writing the snapshot, the follower continues writing 
subsequent transactions to the log.

Imagine a crash + recovery is induced at this point, such that the latest 
snapshot file is incomplete or non-existent.  The follower would try to load 
the preceding healthy snapshot, and replay the log since then.  Since the log 
does not contain some transactions corresponding to the missing snapshot file, 
the follower would never find out about them.  This would cause the 
inconsistency scenario I described above.

Without syncing the snapshot file (and its parent directory) to disk, we cannot 
guarantee that the snapshot file exists during recovery.  And the loss of a 
finalized snapshot file can result in data loss, since not all transactions may 
be present in the log.
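
To make the failure mode concrete, here is a toy model of the recovery logic described above.  It is only a sketch: RecoveryModel and its methods are hypothetical names, not ZooKeeper classes, and the zxid values (snapshot.10, log entries 11 to 20, a full snapshot at zxid 30, then transaction 31) are borrowed from the issue description purely as an example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model for illustration only; these names are hypothetical and do not
// correspond to ZooKeeper classes.
final class RecoveryModel {

    // Outcome of recovery: the zxids reflected in the in-memory state.
    static final class State {
        final TreeSet<Long> applied;
        State(TreeSet<Long> applied) { this.applied = applied; }
        long lastZxid() { return applied.isEmpty() ? 0L : applied.last(); }
    }

    // snapshots: snapshot zxid -> zxids reflected in that snapshot, containing
    // only the snapshot files that survived the crash.
    // log: zxids present in the (synced) transaction log.
    static State recover(NavigableMap<Long, TreeSet<Long>> snapshots, List<Long> log) {
        TreeSet<Long> applied = new TreeSet<>();
        long base = 0L;
        if (!snapshots.isEmpty()) {
            base = snapshots.lastKey();          // newest surviving snapshot
            applied.addAll(snapshots.get(base));
        }
        for (long zxid : log) {
            if (zxid > base) {
                applied.add(zxid);               // replay log entries after it
            }
        }
        return new State(applied);
    }

    static TreeSet<Long> range(long from, long to) {
        TreeSet<Long> s = new TreeSet<>();
        for (long z = from; z <= to; z++) {
            s.add(z);
        }
        return s;
    }

    // The follower holds snapshot.10 and has logged 11..20.  The leader then
    // sends a full snapshot up to zxid 30 (21..30 never reach the log), and
    // the follower logs transaction 31 afterwards.
    static State scenario(boolean newestSnapshotSurvives) {
        NavigableMap<Long, TreeSet<Long>> snapshots = new TreeMap<>();
        snapshots.put(10L, range(1, 10));
        if (newestSnapshotSurvives) {
            snapshots.put(30L, range(1, 30));
        }
        List<Long> log = new ArrayList<>(range(11, 20));
        log.add(31L);
        return recover(snapshots, log);
    }

    public static void main(String[] args) {
        State lost = scenario(false);
        System.out.println("last zxid: " + lost.lastZxid());
        System.out.println("missing 21..30: " + !lost.applied.containsAll(range(21, 30)));
    }
}
```

In this model the recovered follower reports the same last zxid whether or not the newest snapshot survived, so a comparison based only on the last zxid cannot tell the two states apart.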

> Snapshot files must be synced to prevent inconsistency or data loss
> ---
>
> Key: ZOOKEEPER-2310
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2310
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Abhishek Rai
>Assignee: Abhishek Rai
> Attachments: zookeeper-2310.patch
>
>
> Today, Zookeeper server syncs transaction log files to disk by default, but 
> does not sync snapshot files.  Consequently, an untimely crash may result in 
> a lost or incomplete snapshot file.  During recovery, if the server finds a 
> valid older snapshot file, it will load it and replay subsequent log(s), 
> skipping the incomplete snapshot file.  It's possible that the skipped file 
> had some transactions which are not present in the replayed transaction logs. 
>  Since quorum synchronization is based on last transaction ID of each server, 
> this will never get noticed, resulting in inconsistency between servers and 
> possible data loss.
> Following sequence of events describes a sample scenario where this can 
> happen:
> # Server F is a follower in a Zookeeper ensemble.
> # F's most recent valid snapshot file is named "snapshot.10" containing state 
> up to zxid = 10.  F is currently writing to the transaction log file 
> "log.11", with the most recent zxid = 20.
> # Fresh round of election.
> # F receives a few new transactions 21 to 30 from new leader L as the "diff". 
>  Current server behavior is to dump current state plus diff to a new snapshot 
> file, "snapshot.30".
> # F finalizes the snapshot file, but file contents are still buffered in OS 
> caches.  Zookeeper does not sync snapshot file contents to disk.
> # F receives a new transaction 31 from the leader, which it appends to the 
> existing transaction log file, "log.11" and syncs the file to disk.
> # Server machine crashes or is cold rebooted.
> # After recovery, snapshot file "snapshot.30" may not exist or may be empty.  
> See below for why that may happen.
> # In either case, F looks for the last finalized snapshot file, finds and 
> loads "snapshot.10".  It then replays transactions from "log.11".  
> Ultimately, its last seen zxid will be 31, but it would not have replayed 
> transactions 21 to 30 received via the "diff" from the leader.
> # Clients which are connected to F may see different data than clients 
> connected to other members of the ensemble, violating single system image 
> invariant.  Also, if F were to become a leader at some point, it could use 
> its state to seed other servers, and they all could lose the writes in the 
> missing interval above.
> *Notes:*
> - Reason why snapshot file may be missing or incomplete:
> -- Zookeeper does not sync the data directory after creating a snapshot file. 
>  Even if a newly created file is synced to disk, if the corresponding 
> directory entry is not, then the file will not be visible in the namespace.
> -- Zookeeper does not sync snapshot files.  So, they may