yihua commented on code in PR #12490:
URL: https://github.com/apache/hudi/pull/12490#discussion_r1924503544
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -575,7 +575,7 @@ public Iterator<InternalRow> call(ClusteringOperation clusteringOperation) throw
getHoodieTable().getMetaClient(),
getHoodieTable().getMetaClient().getTableConfig().getProps(),
0,
- Long.MAX_VALUE,
+ -1,
Review Comment:
We want to avoid file system calls that fetch file status wherever possible.
In most cases, the file system view already provides the file information,
including the length. However, the compaction and clustering plans do not
store the file length, so we have to pass in `-1` here, which makes the file
group reader look up the length with an extra file system call.
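
To illustrate the convention, here is a minimal sketch of a length resolver
that trusts a non-negative length from the caller and only falls back to the
file system when it receives `-1`. The class `FileLengthResolver`, the constant
`UNKNOWN_LENGTH`, and the method `resolveLength` are hypothetical names made up
for this example. They are not Hudi APIs, and the actual file group reader may
handle this differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not part of Hudi.
class FileLengthResolver {
    // Assumed sentinel meaning "the caller does not know the file length".
    static final long UNKNOWN_LENGTH = -1L;

    // Counts file system lookups so the extra cost is visible.
    static int fsCalls = 0;

    // Returns the known length when it is non-negative; otherwise pays for
    // one file system call to fetch the real length.
    static long resolveLength(Path file, long knownLength) {
        if (knownLength >= 0) {
            return knownLength; // e.g. supplied by a file system view
        }
        fsCalls++;
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("base-file", ".bin");
        Files.write(file, new byte[128]);

        // Length already known: no file system call is made.
        System.out.println(resolveLength(file, 128L) + " bytes, fsCalls=" + fsCalls);

        // Length unknown (as with a plan that does not store it): one lookup.
        System.out.println(resolveLength(file, UNKNOWN_LENGTH) + " bytes, fsCalls=" + fsCalls);

        Files.delete(file);
    }
}
```

Unlike a placeholder such as `Long.MAX_VALUE`, a negative sentinel cannot be
mistaken for a real length, so the reader can tell "unknown" apart from "known".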