yihua commented on code in PR #18678:
URL: https://github.com/apache/hudi/pull/18678#discussion_r3178430212
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedFileFormat.scala:
##########
@@ -220,7 +220,8 @@ class HoodieFileGroupReaderBasedFileFormat(tablePath: String,
// This will enable us to take advantage of spark's file splitting capability.
// For overly large single files, we can use multiple concurrent tasks to read them, thereby reducing the overall job reading time consumption
val superSplitable = super.isSplitable(sparkSession, options, path)
- val splitable = !isMOR && !isIncremental && !isBootstrap && superSplitable
+ val isLance = hoodieFileFormat == HoodieFileFormat.LANCE
+ val splitable = !isMOR && !isIncremental && !isBootstrap && !isLance && superSplitable
Review Comment:
Could we follow up to revisit all such hardcoded format-related conditions? They should be made pluggable through a file format adapter, so that adding a new file format only requires changing the adapter implementation instead of updating every such condition scattered across the codebase.
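   For illustration, one possible shape of such an adapter (a hypothetical sketch; the trait name `HoodieFileFormatAdapter` and method `supportsSplitting` are illustrative and do not exist in Hudi today):

   ```scala
   // Hypothetical sketch: each file format supplies its own adapter, so
   // format-specific behavior lives in one place per format.
   trait HoodieFileFormatAdapter {
     /** Whether files of this format may be split across multiple read tasks. */
     def supportsSplitting: Boolean
   }

   object ParquetFormatAdapter extends HoodieFileFormatAdapter {
     override def supportsSplitting: Boolean = true
   }

   object LanceFormatAdapter extends HoodieFileFormatAdapter {
     // Lance files cannot be split across tasks, so this format opts out.
     override def supportsSplitting: Boolean = false
   }

   // isSplitable would then consult the adapter instead of checking the
   // format enum directly, e.g.:
   // val splitable = !isMOR && !isIncremental && !isBootstrap &&
   //   formatAdapter.supportsSplitting && superSplitable
   ```

   With this pattern, adding a new format means implementing one adapter object rather than touching each call site that currently compares against `HoodieFileFormat`.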
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]