Re: [PR] [HUDI-8632] Fix bootstrap support with file group reader in compaction and clustering in Spark [hudi]

via GitHub Tue, 21 Jan 2025 15:50:43 -0800


yihua commented on code in PR #12490:
URL: https://github.com/apache/hudi/pull/12490#discussion_r1924503544



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -575,7 +575,7 @@ public Iterator<InternalRow> call(ClusteringOperation clusteringOperation) throw
             getHoodieTable().getMetaClient(),
             getHoodieTable().getMetaClient().getTableConfig().getProps(),
             0,
-            Long.MAX_VALUE,
+            -1,

Review Comment:
   We want to avoid file system calls that fetch file status as much as 
possible.  In most cases, the file system view already provides the file 
information, including the length.  However, the compaction and clustering 
plans do not store the file length, so we have to pass in `-1` here so that 
the file group reader looks up the length with an extra file system call.
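
   To illustrate the convention described above, here is a minimal sketch of 
how a reader might treat a negative length as "unknown" and fall back to a 
lookup only in that case.  This is not Hudi's actual implementation: 
`FileLengthResolver`, `resolveLength`, `UNKNOWN_LENGTH`, and the `fsLookup` 
function (a stand-in for a real file system status call) are all hypothetical 
names used for illustration.

```java
import java.util.function.ToLongFunction;

// Hypothetical helper, for illustration only; not part of Hudi.
final class FileLengthResolver {
    // Assumed sentinel meaning "length unknown", mirroring the -1 in the diff.
    static final long UNKNOWN_LENGTH = -1L;

    /**
     * Returns knownLength when it is non-negative. Otherwise asks fsLookup,
     * which stands in for an extra file system status call.
     */
    static long resolveLength(String path, long knownLength,
                              ToLongFunction<String> fsLookup) {
        if (knownLength >= 0) {
            // Length was already available (e.g. from a file system view),
            // so no extra call is made.
            return knownLength;
        }
        // Length was not recorded (e.g. in a compaction or clustering plan),
        // so pay for one lookup.
        return fsLookup.applyAsLong(path);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake lookup that counts how often it is invoked.
        ToLongFunction<String> fakeFs = p -> { calls[0]++; return 1024L; };

        System.out.println(resolveLength("base.parquet", 4096L, fakeFs));
        System.out.println(resolveLength("base.parquet", UNKNOWN_LENGTH, fakeFs));
        System.out.println("lookups made: " + calls[0]);
    }
}
```

   In this sketch the lookup runs only for the call that passes the sentinel, 
so callers that already know the length never trigger the extra call.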



