hudi-agent commented on code in PR #18715:
URL: https://github.com/apache/hudi/pull/18715#discussion_r3237115899


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileFormat.java:
##########
@@ -74,13 +74,35 @@ public String getFileExtension() {
     return extension;
   }
 
+  public boolean requiresSparkRecordType() {
+    return this == LANCE;
+  }
+
+  public HoodieRecord.HoodieRecordType 
resolveRecordType(HoodieRecord.HoodieRecordType fallback) {
+    return requiresSparkRecordType() ? HoodieRecord.HoodieRecordType.SPARK : 
fallback;

Review Comment:
   🤖 Could this introduce a regression for `.orc` (and `.hfile`) base files? 
Previously `filterKeysFromFile` always used the AVRO reader factory (which 
supports Parquet/ORC/HFile via `HoodieAvroFileReaderFactory`). After this 
change, when `config.getRecordMerger().getRecordType()` returns SPARK, 
`resolveRecordTypeForExtension(".orc", SPARK)` returns SPARK — and 
`HoodieSparkFileReaderFactory.newOrcFileReader` throws `HoodieIOException("Not 
support read orc file")` (same for HFile). It might be safer to special-case 
Lance only and keep returning AVRO for the others, or override the fallback 
only when `requiresSparkRecordType()` is true (which it already does — the 
concern is whether `getRecordMerger().getRecordType()` being SPARK + ORC base 
file was previously working via the AVRO path). @yihua could you confirm 
whether ORC bloom-index filtering with a Spark merger was previously functional?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to