danny0405 commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1357686036
########## hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaCompatibility.java: ########## @@ -372,7 +372,8 @@ private SchemaCompatibilityResult calculateCompatibility(final Schema reader, fi return (writer.getType() == Type.STRING) ? result : result.mergedWith(typeMismatch(reader, writer, locations)); } case STRING: { - return (writer.getType() == Type.BYTES) ? result : result.mergedWith(typeMismatch(reader, writer, locations)); + return ((writer.getType() == Type.BYTES) || (writer.getType() == Type.INT) || (writer.getType() == Type.LONG) Review Comment: Maybe we can have a utilities to decide whether the given type belongs to a NUMERIC type. ########## hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaCompatibility.java: ########## @@ -372,7 +372,8 @@ private SchemaCompatibilityResult calculateCompatibility(final Schema reader, fi return (writer.getType() == Type.STRING) ? result : result.mergedWith(typeMismatch(reader, writer, locations)); } case STRING: { - return (writer.getType() == Type.BYTES) ? result : result.mergedWith(typeMismatch(reader, writer, locations)); + return ((writer.getType() == Type.BYTES) || (writer.getType() == Type.INT) || (writer.getType() == Type.LONG) Review Comment: Maybe we can have a utility to decide whether the type is a numeric data type. ########## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java: ########## @@ -71,6 +71,14 @@ public class HoodieCommonConfig extends HoodieConfig { + " operation will fail schema compatibility check. Set this option to true will make the newly added " + " column nullable to successfully complete the write operation."); + public static final ConfigProperty<String> ADD_NULL_FOR_DELETED_COLUMNS = ConfigProperty Review Comment: +1 for this to be default behavior or just fail the write operation. ########## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java: ########## @@ -1131,6 +1196,25 @@ private static Schema getActualSchemaFromUnion(Schema schema, Object data) { return actualSchema; } + private static Schema getActualSchemaFromUnion(Schema schema) { + Schema actualSchema; + if (schema.getType() != UNION) { + return schema; + } + if (schema.getTypes().size() == 2 + && schema.getTypes().get(0).getType() == Schema.Type.NULL) { + actualSchema = schema.getTypes().get(1); + } else if (schema.getTypes().size() == 2 + && schema.getTypes().get(1).getType() == Schema.Type.NULL) { + actualSchema = schema.getTypes().get(0); + } else if (schema.getTypes().size() == 1) { + actualSchema = schema.getTypes().get(0); Review Comment: +1 ########## hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java: ########## @@ -206,6 +213,9 @@ public IndexedRecord next() { IndexedRecord record = this.reader.read(null, decoder); this.dis.skipBytes(recordLength); this.readRecords++; + if (this.promotedSchema.isPresent()) { + return HoodieAvroUtils.rewriteRecordWithNewSchema(record, this.promotedSchema.get()); Review Comment: Is it possible we read the record with the `promotedSchema` directly so that there is no need to rewrite each record ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org