hudi-agent commented on code in PR #18385:
URL: https://github.com/apache/hudi/pull/18385#discussion_r3191802421
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/CloudObjectsSelectorCommon.java:
##########
@@ -546,6 +547,34 @@ private static Option<String> getPropVal(TypedProperties props, ConfigProperty<S
return Option.empty();
}
+ /**
+ * Enables Spark {@code mergeSchema} for cloud object batches of Parquet or ORC files when configured, so
+ * heterogeneous files in one sync round share a merged struct type. Applied before user
+ * {@link CloudSourceConfig#SPARK_DATASOURCE_OPTIONS} so explicit reader options can override.
+ *
+ * <p>Spark's native Parquet reader honors {@code mergeSchema} on all supported versions. Spark's native ORC
+ * reader honors it on Spark 3.0+ (the native ORC impl is the default since Spark 2.4); on older runtimes the
+ * option is silently ignored, which is harmless.
+ */
+ private DataFrameReader applyMergeSchemaOption(DataFrameReader reader, String fileFormat) {
+ if (!isParquetOrOrcFileFormat(fileFormat)) {
+ return reader;
+ }
+ if (!getBooleanWithAltKeys(properties, CLOUD_INCREMENTAL_MERGE_SCHEMA)) {
+ return reader;
+ }
+ return reader.option("mergeSchema", "true");
+ }
+
+ // Package-private for unit testing — see TestCloudObjectsSelectorCommon.
+ static boolean isParquetOrOrcFileFormat(String fileFormat) {
+ if (fileFormat == null) {
+ return false;
+ }
Review Comment:
🤖 nit: could you rename `f` to something like `trimmed` or
`normalizedFormat`? Single-letter locals are fine in tiny lambdas, but here
it's a named local in a package-private method that test code calls directly,
so a slightly longer name would make its purpose clearer to readers at a glance.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
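
For illustration, here is a minimal sketch of how the method might read after the rename. The hunk above cuts off right after the null check, so the trim and lower-case normalization, the comparison against `parquet` and `orc`, and the wrapper class name `FileFormatCheck` are all assumptions rather than the PR's actual code.

```java
import java.util.Locale;

// Hypothetical holder class, used only to make the sketch self-contained.
final class FileFormatCheck {
  private FileFormatCheck() {
  }

  // Assumed body: only the null check is visible in the hunk, so the
  // normalization and the comparisons below are illustrative guesses.
  static boolean isParquetOrOrcFileFormat(String fileFormat) {
    if (fileFormat == null) {
      return false;
    }
    // Descriptive name in place of the single-letter local `f`.
    String normalizedFormat = fileFormat.trim().toLowerCase(Locale.ROOT);
    return "parquet".equals(normalizedFormat) || "orc".equals(normalizedFormat);
  }
}
```

With a name like `normalizedFormat`, the comparison line documents itself: the value being matched has already been trimmed and case-folded.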