ramitg254 commented on code in PR #6565:
URL: https://github.com/apache/hive/pull/6565#discussion_r3504394459


##########
iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java:
##########
@@ -429,6 +446,24 @@ public static void setDefaultValues(Record record, List<Types.NestedField> missi
     }
   }
 
+  /**
+   * Backfills struct column that is null on read using nested {@code initialDefault} metadata.
+   * This applies to rows written before {@code ADD COLUMNS} added the struct.
+   * Spec allows struct defaults as {@code {}} (see https://iceberg.apache.org/spec/#default-values), but
+   * {@code UpdateSchema} add column only supports primitives today;
+   * if empty structs are allowed, this backfill can be removed.
+   */
+  public static void backfillStructInitialDefaults(Record iceRecord, List<Types.NestedField> columns) {
+    for (Types.NestedField field : columns) {
+      if (field.type().isStructType() && iceRecord.getField(field.name()) == null) {

Review Comment:
   This works in the case you mentioned because the columns passed here are 
   `readSchema.columns()`, i.e. the current schema columns. I chose 
   `field.name()` over `field.id()` because `Record` has a direct getter, 
   `getField`, that looks up a field value by name. There is no equivalent for 
   a field id: we would need to loop through all columns and match the id to 
   find the field itself.
   I have also optimized it; please check the latest version.
   I also verified it with the following queries, which return correct results:
   ```
   set hive.vectorized.execution.enabled=false;
   CREATE TABLE ice_drop_read (
     id INT)
   STORED BY ICEBERG stored as parquet
   TBLPROPERTIES ('format-version'='3');
   INSERT INTO ice_drop_read (id) VALUES (1);
   ALTER TABLE ice_drop_read ADD COLUMNS (meta INT DEFAULT 99);
   INSERT INTO ice_drop_read (id, meta) VALUES (2, 10);
   ALTER TABLE ice_drop_read DROP COLUMN meta;
   ALTER TABLE ice_drop_read ADD COLUMNS (meta STRUCT<x:INT, y:INT> DEFAULT '{"x":10,"y":20}');
   INSERT INTO ice_drop_read (id) VALUES (3);
   SELECT * FROM ice_drop_read ORDER BY id;
   ```
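The name-versus-id trade-off described above can be sketched roughly as follows. This is only an illustration, not the code in the PR: `Field` and `SimpleRecord` are hypothetical stand-ins for `Types.NestedField` and `Record`, and the field ids (1 and 3) are assumed values chosen to mimic a column that was dropped and re-added.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldLookupSketch {
  // Hypothetical stand-in for Types.NestedField: just an id and a name.
  public static final class Field {
    final int id;
    final String name;

    public Field(int id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  // Hypothetical stand-in for Record: values are reachable by name only.
  public static final class SimpleRecord {
    private final Map<String, Object> values = new HashMap<>();

    public Object getField(String name) {
      return values.get(name);
    }

    public void setField(String name, Object value) {
      values.put(name, value);
    }
  }

  // Lookup by name: a single direct call on the record.
  public static Object byName(SimpleRecord rec, Field field) {
    return rec.getField(field.name);
  }

  // Lookup by id: scan the columns to resolve the id to a field first.
  public static Object byId(SimpleRecord rec, List<Field> columns, int fieldId) {
    for (Field column : columns) {
      if (column.id == fieldId) {
        return rec.getField(column.name);
      }
    }
    return null; // no column with that id
  }

  public static void main(String[] args) {
    // Assumed ids: "meta" was dropped and re-added, so it carries a new id.
    List<Field> columns = List.of(new Field(1, "id"), new Field(3, "meta"));
    SimpleRecord row = new SimpleRecord();
    row.setField("id", 1); // old row: "meta" was never written

    System.out.println(byName(row, columns.get(1))); // prints "null"
    System.out.println(byId(row, columns, 3));       // prints "null"
  }
}
```

Both lookups report the missing struct as null. The name-based one avoids the extra scan, which is safe here only because the columns come from the current read schema.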



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

