fanyang89 commented on code in PR #2141:
URL: https://github.com/apache/zookeeper/pull/2141#discussion_r1758105294


##########
zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnLog.java:
##########
@@ -701,14 +708,27 @@ public long getStorageSize() {
 
         /**
          * go to the next logfile
+         *
          * @return true if there is one and false if there is no
          * new file to be read
-         * @throws IOException
          */
         private boolean goToNextLog() throws IOException {
-            if (storedFiles.size() > 0) {
+            if (!storedFiles.isEmpty()) {
                 this.logFile = storedFiles.remove(storedFiles.size() - 1);
-                ia = createInputArchive(this.logFile);
+                try {
+                    ia = createInputArchive(this.logFile);
+                } catch (EOFException ex) {
+                    // If this file is the last log file in the database and is empty,
+                    // it means that the last time the file was created
+                    // before the header was written.
+                    if (storedFiles.isEmpty() && this.logFile.length() == 0) {
+                        boolean deleted = this.logFile.delete();

Review Comment:
   Delete failures are usually caused by permissions or by disk failure. For
   example, if the `zookeeper` user only has read access to the data directory,
   deleting the empty log file will fail. For a failing disk, I see three cases:

   1) If the disk completes the I/O before the I/O timeout expires, the request
      may be retried in the block layer. It is slower, but it does not fail.
   2) When the disk I/O timeout does occur, most filesystems remount as
      read-only, and ZooKeeper then fails with insufficient permissions.
   3) In my experience, I/O does not always return if the disk is broken, and
      the process stays blocked in the uninterruptible state.

   If the delete fails (read-only filesystem) and we throw an I/O exception,
   the server restarts and fails again. If we do not throw the exception, the
   server fails on the next write with fsync (usually when writing the new
   epoch files after FLE completes and the initLimit timeout is reached).
   Either way, I don't think we can do much more about the deletion failure.
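
The failure causes listed above can at least be made visible in the error message. The following is a minimal sketch, not the code in this PR: `EmptyLogCleanup` and `deleteIfEmpty` are hypothetical names, and it assumes the caller is willing to use `java.nio.file.Files.delete`, which throws an exception describing the cause (for example `AccessDeniedException`), instead of `File.delete`, which only returns `false`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper for illustration only; it is not part of FileTxnLog.
final class EmptyLogCleanup {
    private EmptyLogCleanup() {
    }

    /**
     * Deletes a zero-length log file.
     *
     * @return true if the file was empty and has been removed; false if the
     *         file is not empty (it is left alone) or does not exist
     * @throws IOException if the size check or the delete fails, for example
     *         on missing permissions or a read-only filesystem; the original
     *         exception is kept as the cause
     */
    static boolean deleteIfEmpty(Path logFile) throws IOException {
        try {
            if (Files.size(logFile) != 0) {
                return false; // never delete a file that holds data
            }
            Files.delete(logFile);
            return true;
        } catch (NoSuchFileException e) {
            return false; // already gone, nothing to clean up
        } catch (IOException e) {
            // Surface the underlying reason instead of a bare boolean.
            throw new IOException("Failed to delete empty log file " + logFile + ": " + e, e);
        }
    }
}
```

This does not make the delete succeed on a read-only filesystem or a broken disk. It only reports why the delete failed, so the operator sees the cause on the first restart.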



