nsivabalan commented on code in PR #11381:
URL: https://github.com/apache/hudi/pull/11381#discussion_r1623751675


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/AvroSchemaEvolutionUtils.java:
##########
@@ -113,6 +120,21 @@ public static InternalSchema reconcileSchema(Schema 
incomingSchema, InternalSche
       typeChange.updateColumnType(col, inComingInternalSchema.findType(col));
     });
 
+    // mark columns missing from incoming schema as nullable
+    Set<String> visited = new HashSet<>();
+    diffFromOldSchema.stream()
+        // ignore meta fields
+        .filter(col -> !META_FIELD_NAMES.contains(col))
+        .sorted()
+        .forEach(col -> {
+          // if parent is marked as nullable, only update the parent and not 
all the missing children field
+          String parent = TableChangesHelper.getParentName(col);
+          if (!visited.contains(parent)) {
+            typeChange.updateColumnNullability(col, true);
+          }
+          visited.add(col);
+        });

Review Comment:
   I do see that we call this only when the config value is true within 
HoodieSchemaUtils. but I am trying to be future proof and avoid mis-use of the 
method. 
   



##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/AvroSchemaEvolutionUtils.java:
##########
@@ -113,6 +120,21 @@ public static InternalSchema reconcileSchema(Schema 
incomingSchema, InternalSche
       typeChange.updateColumnType(col, inComingInternalSchema.findType(col));
     });
 
+    // mark columns missing from incoming schema as nullable
+    Set<String> visited = new HashSet<>();
+    diffFromOldSchema.stream()
+        // ignore meta fields
+        .filter(col -> !META_FIELD_NAMES.contains(col))
+        .sorted()
+        .forEach(col -> {
+          // if parent is marked as nullable, only update the parent and not 
all the missing children field
+          String parent = TableChangesHelper.getParentName(col);
+          if (!visited.contains(parent)) {
+            typeChange.updateColumnNullability(col, true);
+          }
+          visited.add(col);
+        });

Review Comment:
   hey @the-other-tim-brown : where is the check for 
"hoodie.write.set.null.for.missing.columns". I don't see it being used or 
checked within AvroSchemaEvolutionUtils.reconcileSchema() 
   
   I am wondering how can we be sure that all callers will be calling this 
method only when "hoodie.write.set.null.for.missing.columns" is set to true.
   



##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/AvroSchemaEvolutionUtils.java:
##########
@@ -113,6 +120,21 @@ public static InternalSchema reconcileSchema(Schema 
incomingSchema, InternalSche
       typeChange.updateColumnType(col, inComingInternalSchema.findType(col));
     });
 
+    // mark columns missing from incoming schema as nullable
+    Set<String> visited = new HashSet<>();

Review Comment:
   hey @jonvex : when we added support for 
"hoodie.write.set.null.for.missing.columns", how did we miss this. the java 
docs for this method does call out that nulls will be injected for missing 
cols. but prior to this patch, I don't see that happening. 
   



##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/AvroSchemaEvolutionUtils.java:
##########
@@ -113,6 +120,21 @@ public static InternalSchema reconcileSchema(Schema 
incomingSchema, InternalSche
       typeChange.updateColumnType(col, inComingInternalSchema.findType(col));
     });
 
+    // mark columns missing from incoming schema as nullable
+    Set<String> visited = new HashSet<>();

Review Comment:
   do you think we can rename 2 variable names to make it more comprehensible 
or w/o ambiguity. 
   
   diffFromOldSchema -> missingFromIncoming
   diffFromEvolutionColumns -> additionsToTableSchema or additionsToOldSchema 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to