Github user enixon commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/560#discussion_r203474532 --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java --- @@ -399,8 +399,30 @@ public void save(DataTree dataTree, File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid)); LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid), snapshotFile); - snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, syncSnap); - + try { + snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, syncSnap); + } catch (IOException e) { + if (snapshotFile.length() == 0) { --- End diff -- I really like the idea! Clearly this patch is aimed at closing one very particular kind of error case around snapshot writing and the idea generalizes to many more errors. My only concern with "always delete" is that I do not know how wide a net that casts and whether we might end up deleting a legitimately useful snapshot under some circumstances. Unfortunately, I don't have time to do the testing required to validate the safety of the more general approach. If that sounds reasonable, I'd propose that we commit on this patch and close the hole created from this particular issue (snapshot problems on no disk space) and I'll open a separate JIRA with details about extending the solution so someone can pick up the issue and do the necessary testing.
---