ramitg254 commented on code in PR #6565:
URL: https://github.com/apache/hive/pull/6565#discussion_r3504394459
##########
iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java:
##########
@@ -429,6 +446,24 @@ public static void setDefaultValues(Record record,
List<Types.NestedField> missi
}
}
+ /**
+ * Backfills struct column that is null on read using nested {@code
initialDefault} metadata.
+ * This applies to rows written before {@code ADD COLUMNS} added the struct.
+ * Spec allows struct defaults as {@code {}} (see
https://iceberg.apache.org/spec/#default-values), but
+ * {@code UpdateSchema} add column only supports primitives today;
+ * if empty structs are allowed, this backfill can be removed.
+ */
+ public static void backfillStructInitialDefaults(Record iceRecord,
List<Types.NestedField> columns) {
+ for (Types.NestedField field : columns) {
+ if (field.type().isStructType() && iceRecord.getField(field.name()) ==
null) {
Review Comment:
This works fine in the case you mentioned because the columns passed here
are `readSchema.columns()`, i.e. the current schema's columns. I chose
`field.name()` over `field.id()` because `Record` has a direct getter,
`getField`, that finds a field value by name. There is no such getter for a
field ID: we would need to loop through all the columns and match the ID to
find the field itself.
I have also optimized it; please check the latest version.
I verified it as well with the following set of queries, which show correct results:
```
set hive.vectorized.execution.enabled=false;
CREATE TABLE ice_drop_read (
id INT)
STORED BY ICEBERG stored as parquet
TBLPROPERTIES ('format-version'='3');
INSERT INTO ice_drop_read (id) VALUES (1);
ALTER TABLE ice_drop_read ADD COLUMNS (meta INT DEFAULT 99);
INSERT INTO ice_drop_read (id, meta) VALUES (2, 10);
ALTER TABLE ice_drop_read DROP COLUMN meta;
ALTER TABLE ice_drop_read ADD COLUMNS (meta STRUCT<x:INT, y:INT> DEFAULT
'{"x":10,"y":20}');
INSERT INTO ice_drop_read (id) VALUES (3);
SELECT * FROM ice_drop_read ORDER BY id;
```
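The name-versus-ID trade-off described in this comment can be sketched in isolation. This is only an illustration: `Column`, `Row`, `byName`, and `byId` are hypothetical stand-ins, not the Iceberg `Types.NestedField` or `Record` classes, and the field IDs are made up. It shows that a name lookup is one direct call, while an ID lookup first has to scan the columns to resolve the ID.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldLookupSketch {

  // Hypothetical stand-in for a schema column: a field ID plus its current name.
  static final class Column {
    final int id;
    final String name;

    Column(int id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  // Hypothetical stand-in for a row that only exposes a getter keyed by name.
  static final class Row {
    private final Map<String, Object> values = new LinkedHashMap<>();

    void setField(String name, Object value) {
      values.put(name, value);
    }

    Object getField(String name) {
      return values.get(name);
    }
  }

  // Lookup by name: a single direct call on the row.
  static Object byName(Row row, Column column) {
    return row.getField(column.name);
  }

  // Lookup by ID: scan the columns to resolve the ID to a name, then read the value.
  static Object byId(Row row, List<Column> columns, int fieldId) {
    for (Column column : columns) {
      if (column.id == fieldId) {
        return row.getField(column.name);
      }
    }
    return null; // no column with that ID in the schema passed in
  }

  public static void main(String[] args) {
    // Illustrative IDs only; the values do not come from a real table.
    List<Column> columns = List.of(new Column(1, "id"), new Column(3, "meta"));
    Row row = new Row();
    row.setField("id", 3);
    row.setField("meta", "x=10,y=20");

    System.out.println(byName(row, columns.get(1)));
    System.out.println(byId(row, columns, 3));
  }
}
```

Both calls return the same value when the columns passed in are the current schema's columns, which is the situation the comment describes; the ID variant just pays for an extra scan.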
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]