sajjad-moradi commented on code in PR #14883:
URL: https://github.com/apache/pinot/pull/14883#discussion_r1927454933
##########
pinot-spi/src/main/java/org/apache/pinot/spi/filesystem/LocalPinotFS.java:
##########
@@ -193,19 +200,78 @@ private static File toFile(URI uri) {
private static void copy(File srcFile, File dstFile, boolean recursive)
throws IOException {
- if (dstFile.exists()) {
- FileUtils.deleteQuietly(dstFile);
+ boolean oldFileExists = dstFile.exists(); // Automatically set
oldFileExists if the old file exists
+
+ // Step 1: Calculate CRC of srcFile/directory
+ long srcCrc = calculateCrc(srcFile);
+ File backupFile = null;
+
+ if (oldFileExists) {
+ // Step 2: Rename destination file if it exists
+ backupFile = new File(dstFile.getAbsolutePath() + BACKUP +
System.currentTimeMillis());
+ if (!dstFile.renameTo(backupFile)) {
+ throw new IOException("Failed to rename destination file to backup.");
+ }
}
- if (srcFile.isDirectory()) {
- if (recursive) {
- FileUtils.copyDirectory(srcFile, dstFile);
+
+ // If any error occurs during the copy process, we want to restore the
destination file from the backup
+ try {
+ // Step 3: Copy the file or directory
+ if (srcFile.isDirectory()) {
+ if (recursive) {
+ FileUtils.copyDirectory(srcFile, dstFile);
+ } else {
+ throw new IOException(srcFile.getAbsolutePath() + " is a directory
and recursive copy is not enabled.");
+ }
+ } else {
+ FileUtils.copyFile(srcFile, dstFile);
Review Comment:
I think this is still problematic. For example for the case of refreshing an
existing segment, if controller restarts during the copy operation, the
corrupted segment with dstFile name remains in storage layer, which is not
desirable.
A simpler way is to copy srsFile to a temp file in deep store. Then do a
rename/move from temp file to dstFile. If copy fails for any reason, it's on a
temp file, so there's no harm. If copy passes, then since rename is atomic, we
should be in a good state in both failure or success of rename.
IMO if we do ^^, the CRC check is not needed here. If we still have corner
cases that CRC check can help, we can create a periodic task to check CRC.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]