[GitHub] [hudi] nsivabalan commented on a change in pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues

2022-02-01 Thread GitBox


nsivabalan commented on a change in pull request #4716:
URL: https://github.com/apache/hudi/pull/4716#discussion_r797157620



##
File path: hudi-common/src/main/avro/HoodieRollbackMetadata.avsc
##
@@ -38,14 +38,6 @@
 "type": "long",
 "doc": "Size of this file in bytes"
 }
-}], "default":null },

Review comment:
   hmmm, interesting. Can you try this out explicitly? Write an Avro record using 
the master branch, and then try to read it using this branch. Or you can just try 
it out using a stand-alone Java main class, whichever works. 
   Just wanted to ensure we are good here. 
   
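To make the compatibility experiment concrete: the question is governed by Avro's schema-resolution rule, which the sketch below models in plain Java (no Avro dependency; the class and method names are made up for illustration, not Hudi or Avro code). Writer-only fields are silently dropped on read, while reader-only fields must carry a default.

```java
import java.util.*;

// Minimal stand-alone model of Avro's schema-resolution rule (hypothetical helper,
// NOT the Avro library): when data written with an OLD schema is read with a NEW
// reader schema, writer-only fields are skipped, and reader-only fields need defaults.
public class SchemaEvolutionSketch {

    // readerDefaults maps reader-only field name -> default value.
    static Map<String, Object> resolve(Map<String, Object> writtenRecord,
                                       Set<String> readerFields,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : readerFields) {
            if (writtenRecord.containsKey(field)) {
                out.put(field, writtenRecord.get(field));   // present in both schemas
            } else if (readerDefaults.containsKey(field)) {
                out.put(field, readerDefaults.get(field));  // reader-only field with default
            } else {
                // this is the case that would break backward compatibility
                throw new IllegalStateException("No default for reader-only field: " + field);
            }
        }
        return out; // writer-only fields (e.g. a field removed from the schema) are dropped
    }

    public static void main(String[] args) {
        // Record written with an older schema that still had a "size" field.
        Map<String, Object> old = new LinkedHashMap<>();
        old.put("path", "f1.log");
        old.put("size", 1024L); // field removed in the newer reader schema

        Map<String, Object> read = resolve(old,
            new LinkedHashSet<>(Collections.singletonList("path")),
            Collections.emptyMap());
        System.out.println(read); // {path=f1.log} -- the removed field is dropped on read
    }
}
```

Under this rule, removing a field is safe for new readers of old data; the risk the reviewer raises is the reverse direction (old readers of new data), which is why the removed field's `"default":null` mattered.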




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues

2022-02-01 Thread GitBox


nsivabalan commented on a change in pull request #4716:
URL: https://github.com/apache/hudi/pull/4716#discussion_r797155269



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/MarkerBasedRollbackStrategy.java
##
@@ -90,42 +86,41 @@ public MarkerBasedRollbackStrategy(HoodieTable 
table, HoodieEngineCo
 Collections.singletonList(fullDeletePath.toString()),
 Collections.emptyMap());
   case APPEND:
+// NOTE: This marker file-path does NOT correspond to a log-file, 
but rather is a phony
+//   path serving as a "container" for the following 
components:
+//  - Base file's file-id
+//  - Base file's commit instant
+//  - Partition path
 return 
getRollbackRequestForAppend(WriteMarkers.stripMarkerSuffix(markerFilePath));
   default:
 throw new HoodieRollbackException("Unknown marker type, during 
rollback of " + instantToRollback);
 }
-  }, parallelism).stream().collect(Collectors.toList());
+  }, parallelism);
 } catch (Exception e) {
   throw new HoodieRollbackException("Error rolling back using marker files 
written for " + instantToRollback, e);
 }
   }
 
-  protected HoodieRollbackRequest getRollbackRequestForAppend(String 
appendBaseFilePath) throws IOException {
-Path baseFilePathForAppend = new Path(basePath, appendBaseFilePath);
+  protected HoodieRollbackRequest getRollbackRequestForAppend(String 
markerFilePath) throws IOException {
+Path baseFilePathForAppend = new Path(basePath, markerFilePath);
 String fileId = FSUtils.getFileIdFromFilePath(baseFilePathForAppend);
 String baseCommitTime = 
FSUtils.getCommitTime(baseFilePathForAppend.getName());
-String partitionPath = FSUtils.getRelativePartitionPath(new 
Path(basePath), new Path(basePath, appendBaseFilePath).getParent());
-Map writtenLogFileSizeMap = 
getWrittenLogFileSizeMap(partitionPath, baseCommitTime, fileId);
-Map writtenLogFileStrSizeMap = new HashMap<>();
-for (Map.Entry entry : writtenLogFileSizeMap.entrySet()) 
{
-  writtenLogFileStrSizeMap.put(entry.getKey().getPath().toString(), 
entry.getValue());
-}
-return new HoodieRollbackRequest(partitionPath, fileId, baseCommitTime, 
Collections.emptyList(), writtenLogFileStrSizeMap);
+String relativePartitionPath = FSUtils.getRelativePartitionPath(new 
Path(basePath), baseFilePathForAppend.getParent());
+Path partitionPath = FSUtils.getPartitionPath(config.getBasePath(), 
relativePartitionPath);
+
+// NOTE: Since we're rolling back incomplete Delta Commit, it only could 
have appended its
+//   block to the latest log-file
+// TODO(HUDI-1517) use provided marker-file's path instead
+HoodieLogFile latestLogFile = 
FSUtils.getLatestLogFile(table.getMetaClient().getFs(), partitionPath, fileId,
+HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime).get();
+
+// NOTE: Marker's don't carry information about the cumulative size of the 
blocks that have been appended,
+//   therefore we simply stub this value.
+Map logFilesWithBlocsToRollback =

Review comment:
   Based on offline discussion, this will cause an issue in storage schemes 
like HDFS, where the rollback block could be appended to an existing log file. 
Will take it up as a follow-up PR. 
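For context on the diff above: the stripped marker path acts as a "container" for the base file's components. Here is a rough stand-alone parse of such a path, assuming Hudi's `fileId_writeToken_instantTime.ext` base-file naming convention (this is a sketch with hypothetical names, not the real `FSUtils` helpers):

```java
import java.util.*;

public class MarkerPathSketch {
    // Hypothetical parse of a stripped APPEND marker path such as
    // "2022/01/01/file-1_0-1-2_20220101010101.parquet". This mimics what
    // getFileIdFromFilePath / getCommitTime / getRelativePartitionPath
    // recover from that single string; it is NOT Hudi code.
    static Map<String, String> parse(String strippedMarkerPath) {
        int slash = strippedMarkerPath.lastIndexOf('/');
        String partitionPath = slash < 0 ? "" : strippedMarkerPath.substring(0, slash);
        String fileName = strippedMarkerPath.substring(slash + 1);
        String[] parts = fileName.split("_"); // fileId _ writeToken _ instant.ext
        String fileId = parts[0];
        String baseCommitTime = parts[2].substring(0, parts[2].lastIndexOf('.'));
        Map<String, String> out = new LinkedHashMap<>();
        out.put("partitionPath", partitionPath);
        out.put("fileId", fileId);
        out.put("baseCommitTime", baseCommitTime);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("2022/01/01/file-1_0-1-2_20220101010101.parquet"));
        // {partitionPath=2022/01/01, fileId=file-1, baseCommitTime=20220101010101}
    }
}
```

Note the path never names an actual log file, which is why the rollback request must then look up the latest log file for that file-id, and why the log-file sizes end up stubbed.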








[GitHub] [hudi] nsivabalan commented on a change in pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues

2022-02-01 Thread GitBox


nsivabalan commented on a change in pull request #4716:
URL: https://github.com/apache/hudi/pull/4716#discussion_r797153919



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java
##
@@ -442,8 +443,11 @@ private void updateTableMetadata(HoodieTable>, JavaRD
 metaClient, config, context, 
SparkUpgradeDowngradeHelper.getInstance())
 .run(HoodieTableVersion.current(), instantTime);
 metaClient.reloadActiveTimeline();
-initializeMetadataTable(Option.of(instantTime));
   }
+  // Initialize Metadata Table to make sure it's bootstrapped _before_ the 
operation,
+  // if it didn't exist before
+  // See https://issues.apache.org/jira/browse/HUDI-3343 for more details
+  initializeMetadataTable(Option.of(instantTime));

Review comment:
   Based on offline discussion, we are good here. MDT initialization may 
not kick in if there are any pending operations in the data table.








[GitHub] [hudi] nsivabalan commented on a change in pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues

2022-01-29 Thread GitBox


nsivabalan commented on a change in pull request #4716:
URL: https://github.com/apache/hudi/pull/4716#discussion_r795095166



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java
##
@@ -442,8 +443,11 @@ private void updateTableMetadata(HoodieTable>, JavaRD
 metaClient, config, context, 
SparkUpgradeDowngradeHelper.getInstance())
 .run(HoodieTableVersion.current(), instantTime);
 metaClient.reloadActiveTimeline();
-initializeMetadataTable(Option.of(instantTime));
   }
+  // Initialize Metadata Table to make sure it's bootstrapped _before_ the 
operation,
+  // if it didn't exist before
+  // See https://issues.apache.org/jira/browse/HUDI-3343 for more details
+  initializeMetadataTable(Option.of(instantTime));

Review comment:
   I get the fix here. But let's say there is a failed commit just before 
this current writer. Wouldn't the MDT bootstrap also include the files from the 
failed commit? 
   Or are we going to rely on the fact that MDT bootstrap will not happen if 
there is any pending operation in the data table, so that unless the rollback of 
the failed commit completes, we do not trigger MDT bootstrap? 
   
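The guard being relied on in this exchange can be sketched as follows (the class, enum, and method names are made up for illustration and are not Hudi's actual API): metadata-table bootstrap is deferred while the data-table timeline contains any non-completed instant, so files from an un-rolled-back failed commit are never indexed.

```java
import java.util.*;

// Hypothetical sketch of the "skip MDT bootstrap while commits are pending" rule.
public class MdtBootstrapGuard {
    enum State { COMPLETED, INFLIGHT, REQUESTED }

    // Any non-completed instant means a commit may still be rolled back,
    // so bootstrap must wait until the timeline is clean.
    static boolean canBootstrapMetadataTable(List<State> dataTableInstants) {
        return dataTableInstants.stream().allMatch(s -> s == State.COMPLETED);
    }

    public static void main(String[] args) {
        // A failed (inflight) commit blocks bootstrap...
        System.out.println(canBootstrapMetadataTable(
            Arrays.asList(State.COMPLETED, State.INFLIGHT))); // false
        // ...and once its rollback completes, bootstrap can proceed.
        System.out.println(canBootstrapMetadataTable(
            Arrays.asList(State.COMPLETED, State.COMPLETED))); // true
    }
}
```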
   








[GitHub] [hudi] nsivabalan commented on a change in pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues

2022-01-29 Thread GitBox


nsivabalan commented on a change in pull request #4716:
URL: https://github.com/apache/hudi/pull/4716#discussion_r795093095



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/MarkerBasedRollbackStrategy.java
##
@@ -90,42 +86,41 @@ public MarkerBasedRollbackStrategy(HoodieTable 
table, HoodieEngineCo
 Collections.singletonList(fullDeletePath.toString()),
 Collections.emptyMap());
   case APPEND:
+// NOTE: This marker file-path does NOT correspond to a log-file, 
but rather is a phony
+//   path serving as a "container" for the following 
components:
+//  - Base file's file-id
+//  - Base file's commit instant
+//  - Partition path
 return 
getRollbackRequestForAppend(WriteMarkers.stripMarkerSuffix(markerFilePath));
   default:
 throw new HoodieRollbackException("Unknown marker type, during 
rollback of " + instantToRollback);
 }
-  }, parallelism).stream().collect(Collectors.toList());
+  }, parallelism);
 } catch (Exception e) {
   throw new HoodieRollbackException("Error rolling back using marker files 
written for " + instantToRollback, e);
 }
   }
 
-  protected HoodieRollbackRequest getRollbackRequestForAppend(String 
appendBaseFilePath) throws IOException {
-Path baseFilePathForAppend = new Path(basePath, appendBaseFilePath);
+  protected HoodieRollbackRequest getRollbackRequestForAppend(String 
markerFilePath) throws IOException {
+Path baseFilePathForAppend = new Path(basePath, markerFilePath);
 String fileId = FSUtils.getFileIdFromFilePath(baseFilePathForAppend);
 String baseCommitTime = 
FSUtils.getCommitTime(baseFilePathForAppend.getName());
-String partitionPath = FSUtils.getRelativePartitionPath(new 
Path(basePath), new Path(basePath, appendBaseFilePath).getParent());
-Map writtenLogFileSizeMap = 
getWrittenLogFileSizeMap(partitionPath, baseCommitTime, fileId);
-Map writtenLogFileStrSizeMap = new HashMap<>();
-for (Map.Entry entry : writtenLogFileSizeMap.entrySet()) 
{
-  writtenLogFileStrSizeMap.put(entry.getKey().getPath().toString(), 
entry.getValue());
-}
-return new HoodieRollbackRequest(partitionPath, fileId, baseCommitTime, 
Collections.emptyList(), writtenLogFileStrSizeMap);
+String relativePartitionPath = FSUtils.getRelativePartitionPath(new 
Path(basePath), baseFilePathForAppend.getParent());
+Path partitionPath = FSUtils.getPartitionPath(config.getBasePath(), 
relativePartitionPath);
+
+// NOTE: Since we're rolling back incomplete Delta Commit, it only could 
have appended its
+//   block to the latest log-file
+// TODO(HUDI-1517) use provided marker-file's path instead
+HoodieLogFile latestLogFile = 
FSUtils.getLatestLogFile(table.getMetaClient().getFs(), partitionPath, fileId,
+HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime).get();
+
+// NOTE: Marker's don't carry information about the cumulative size of the 
blocks that have been appended,
+//   therefore we simply stub this value.
+Map logFilesWithBlocsToRollback =

Review comment:
   I was also wondering whether we should fix the way we store the file size in 
HoodieMetadataPayload.
   ```
HoodieMetadataFileInfo(long size, boolean isFullSize, boolean isDelete)
   ```
   With CommitMetadata, these would represent delta sizes, and during 
rollback/restore, they would represent full sizes. We could then combine multiple 
metadata records based on whether each carries a delta or a full size. 
   
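That merge rule can be sketched stand-alone as below (the `FileInfo` class is a stand-in for the proposed `HoodieMetadataFileInfo`, not Hudi code): delta sizes from commits accumulate, a full size from rollback/restore overrides the running value, and a delete resets it.

```java
import java.util.*;

// Hypothetical sketch of merging per-file size records in the metadata table.
public class FileSizeMergeSketch {
    static final class FileInfo {
        final long size;
        final boolean isFullSize; // true for rollback/restore records
        final boolean isDeleted;
        FileInfo(long size, boolean isFullSize, boolean isDeleted) {
            this.size = size; this.isFullSize = isFullSize; this.isDeleted = isDeleted;
        }
    }

    // Records are merged oldest-to-newest.
    static long mergeSizes(List<FileInfo> records) {
        long current = 0L;
        for (FileInfo info : records) {
            if (info.isDeleted) {
                current = 0L;        // delete resets the file
            } else if (info.isFullSize) {
                current = info.size; // rollback/restore: authoritative full size
            } else {
                current += info.size; // commit: delta appended to the log file
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<FileInfo> records = Arrays.asList(
            new FileInfo(100, false, false), // delta commit appends 100 bytes
            new FileInfo(50, false, false),  // another delta: running size is now 150
            new FileInfo(120, true, false)); // rollback reports the true full size
        System.out.println(mergeSizes(records)); // 120
    }
}
```

This shows why distinguishing delta from full sizes matters: without the `isFullSize` flag, the rollback record's stubbed or full size would be added on top of the deltas instead of replacing them.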

##
File path: hudi-common/src/main/avro/HoodieRollbackMetadata.avsc
##
@@ -38,14 +38,6 @@
 "type": "long",
 "doc": "Size of this file in bytes"
 }
-}], "default":null },

Review comment:
   This may not be backward compatible when reading rollback metadata 
written with 0.10.0 or previous versions. 

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/MarkerBasedRollbackStrategy.java
##
@@ -90,42 +86,41 @@ public MarkerBasedRollbackStrategy(HoodieTable 
table, HoodieEngineCo
 Collections.singletonList(fullDeletePath.toString()),
 Collections.emptyMap());
   case APPEND:
+// NOTE: This marker file-path does NOT correspond to a log-file, 
but rather is a phony
+//   path serving as a "container" for the following 
components:
+//  - Base file's file-id
+//  - Base file's commit instant
+//  - Partition path
 return 
getRollbackRequestForAppend(WriteMarkers.stripMarkerSuffix(markerFilePath));
   default:
 throw new HoodieRollbackException("Unknown marker type, during 
rollback of " + instantToRol