voonhous commented on code in PR #7480: URL: https://github.com/apache/hudi/pull/7480#discussion_r1053930850
##########
hudi-spark-datasource/hudi-spark3.2plus-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark32PlusHoodieParquetFileFormat.scala:
##########
@@ -228,7 +228,24 @@ class Spark32PlusHoodieParquetFileFormat(private val shouldAppendPartitionValues
         SparkInternalSchemaConverter.collectTypeChangedCols(querySchemaOption.get(), mergedInternalSchema)
       } else {
-        new java.util.HashMap()
+        val implicitTypeChangeInfo: java.util.Map[Integer, Pair[DataType, DataType]] = new java.util.HashMap()

Review Comment:
> Nonetheless, a configuration key can be introduced where in this behaviour is enabled by default.

> > @voonhous Maybe we need a parameter to control this feature, not all tables need to follow this logic
>
> Hmmm, CMIIW, Hudi has been relying on ASR for schema resolution since `hudi-0.7`. As such, I was under the impression that this should be a default behaviour.
>
> Nonetheless, a configuration key can be introduced where in this behaviour is enabled by default.
>
> However, validation will need to be performed such that the choice between ASR/HFSE is mutually exclusive. i.e. if ASR is enabled, HFSE should be disabled and vice-versa. WDYT?

@xiarixiaoyao I looked at the code and realised that there is no way to validate configuration values based on other configuration values.

I wanted to add an `AVRO_SCHEMA_RESOLUTION_ENABLE` configuration key with the description:

```text
Enable support for schema evolution using Avro's Schema Resolution (ASR). This configuration is mutually exclusive to Hudi's Full/Comprehensive Schema Evolution (HFSE) feature via the configuration key (hoodie.schema.on.read.enable).
The choice between ASR/HFSE is mutually exclusive. i.e. if ASR is enabled, HFSE should be disabled and vice-versa.
HFSE will take precedence over ASR. i.e. Enabling both HFSE and ASR will cause Hudi to default to HFSE for schema evolution.
```

Given that this is the intended behaviour, and given the lack of configuration validation, I see no benefit in introducing `AVRO_SCHEMA_RESOLUTION_ENABLE`.

Since `SCHEMA_EVOLUTION_ENABLE` will take precedence over `AVRO_SCHEMA_RESOLUTION_ENABLE`, I think we can rely on the former (`SCHEMA_EVOLUTION_ENABLE`) to determine whether ASR should be used: if `SCHEMA_EVOLUTION_ENABLE` is enabled, use HFSE; otherwise, fall back to ASR.

WDYT?
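For illustration, the single-switch fallback rule proposed in this comment could look roughly like the sketch below. This is not code from the PR: `SchemaResolutionSketch`, `SchemaEvolutionMode` and `resolveMode` are hypothetical names, and treating an absent key as disabled is an assumption of the sketch. Only the key `hoodie.schema.on.read.enable` is taken from the description quoted above.

```scala
// Hypothetical sketch only: none of these names are Hudi APIs.
object SchemaResolutionSketch {

  sealed trait SchemaEvolutionMode
  case object HFSE extends SchemaEvolutionMode // Hudi Full/Comprehensive Schema Evolution
  case object ASR extends SchemaEvolutionMode  // Avro Schema Resolution

  // Configuration key quoted in the review comment.
  val SchemaOnReadEnableKey = "hoodie.schema.on.read.enable"

  // One switch decides the mode: HFSE when the key is "true", ASR otherwise.
  // Assumption: a missing or unparsable value is treated as disabled.
  def resolveMode(options: Map[String, String]): SchemaEvolutionMode = {
    val hfseEnabled = options
      .get(SchemaOnReadEnableKey)
      .exists(_.trim.equalsIgnoreCase("true"))
    if (hfseEnabled) HFSE else ASR
  }
}
```

Because only one key is consulted, the two modes cannot both be selected, so no cross-key validation would be needed under this approach.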