wmoustafa commented on code in PR #4242:
URL: https://github.com/apache/iceberg/pull/4242#discussion_r877311220


##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithTypeVisitor.java:
##########
@@ -79,11 +79,29 @@ private static <T> T visitRecord(Types.StructType struct, 
Schema record, AvroSch
   private static <T> T visitUnion(Type type, Schema union, 
AvroSchemaWithTypeVisitor<T> visitor) {
     List<Schema> types = union.getTypes();
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit((Type) null, branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+
+    // simple union case
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit((Type) null, branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else { // complex union case
+      Preconditions.checkArgument(type instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: 
%s", type, union);

Review Comment:
   What about the second option, where we expect them to be in the same order? 
(see this [PR](https://github.com/linkedin/iceberg/pull/108) to support missing 
fields in the case of projection pruning)? This approach will also make sure 
that the deep types also match.
   For the above suggestion, I am a bit worried about using only the top level 
types since it will fail in unexpected places and could lead to cryptic error 
messages if things go wrong. Also, can we list the whole type hierarchy 
instead? 
   
   That said, does it need to be retrofit into name mapping? I feel we could 
implement it without extending the name mapping as long as we have functions to 
deeply compare types.
   
   I think as a starting point, we could assume they must align in terms of 
order (but do not necessarily be the same, as we could skip some in the 
struct), and later we can implement deep type comparisons, which will relax the 
ordering expectation (which is also forward compatible with the type-based 
check).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to