alexeykudinkin commented on code in PR #7787:
URL: https://github.com/apache/hudi/pull/7787#discussion_r1090073190


##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -88,7 +106,7 @@ public static boolean isCompatibleProjectionOf(Schema sourceSchema, Schema targe
   private static boolean isAtomicSchemasCompatible(Schema oneAtomicType, Schema anotherAtomicType) {
     // NOTE: Checking for compatibility of atomic types, we should ignore their
     //       corresponding fully-qualified names (as irrelevant)
-    return isSchemaCompatible(oneAtomicType, anotherAtomicType, false);
+    return isSchemaCompatible(oneAtomicType, anotherAtomicType, false, true);

Review Comment:
   Because this check is validating whether one schema is a projection of another
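   A minimal sketch of that point (illustrative only, not Hudi's actual isSchemaCompatible logic): two named atomic schemas can be structurally identical while carrying different fully-qualified names, so a projection check should compare structure and ignore the names.
   
       import org.apache.avro.Schema
   
       // Two fixed schemas that differ only in namespace; a name-aware check
       // would reject them even though they are structurally identical.
       val a = Schema.createFixed("Digest", null, "ns1", 16)
       val b = Schema.createFixed("Digest", null, "ns2", 16)
   
       assert(a.getFullName != b.getFullName)                              // "ns1.Digest" vs "ns2.Digest"
       assert(a.getType == b.getType && a.getFixedSize == b.getFixedSize)  // same shape => projection-compatible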



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -217,31 +217,6 @@ object HoodieSparkSqlWriter {
         }
       }
 
-      // NOTE: Target writer's schema is deduced based on
-      //         - Source's schema
-      //         - Existing table's schema (including its Hudi's [[InternalSchema]] representation)
-      val writerSchema = deduceWriterSchema(sourceSchema, latestTableSchemaOpt, internalSchemaOpt, parameters)

Review Comment:
   This code has been moved to avoid running it for operations like delete/delete_partition
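   A rough sketch of that reordering, using illustrative names only (not the actual placement in HoodieSparkSqlWriter): schema deduction is gated on the operation type, so pure-delete paths never compute a writer schema.
   
       import org.apache.avro.Schema
       import org.apache.hudi.common.model.WriteOperationType
   
       // Hypothetical helper, only to illustrate the idea: deduce the writer
       // schema lazily and skip it entirely for delete-style operations.
       def maybeDeduceWriterSchema(operation: WriteOperationType,
                                   deduce: () => Schema): Option[Schema] =
         operation match {
           case WriteOperationType.DELETE | WriteOperationType.DELETE_PARTITION => None
           case _ => Some(deduce())
         }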



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -217,31 +217,6 @@ object HoodieSparkSqlWriter {
         }
       }
 
-      // NOTE: Target writer's schema is deduced based on
-      //         - Source's schema
-      //         - Existing table's schema (including its Hudi's [[InternalSchema]] representation)
-      val writerSchema = deduceWriterSchema(sourceSchema, latestTableSchemaOpt, internalSchemaOpt, parameters)
-
-      validateSchemaForHoodieIsDeleted(writerSchema)
-
-      // NOTE: PLEASE READ CAREFULLY BEFORE CHANGING THIS
-      //       We have to register w/ Kryo all of the Avro schemas that might potentially be used to decode
-      //       records into Avro format. Otherwise, Kryo wouldn't be able to apply an optimization allowing
-      //       it to avoid the need to ser/de the whole schema along _every_ Avro record
-      val targetAvroSchemas = sourceSchema +: writerSchema +: latestTableSchemaOpt.toSeq

Review Comment:
   This code is actually misleading: unfortunately, after the Spark Session has been started it's impossible to register Kryo schemas with it (therefore this code is removed to unblock writer schema handling)
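   For reference, a sketch of how Kryo registration is normally wired up in Spark (standard Spark APIs, not Hudi-specific code): the registrator is configured on the SparkConf before the session/context is created, which is why it cannot be added from within the writer once the session already exists.
   
       import com.esotericsoftware.kryo.Kryo
       import org.apache.spark.SparkConf
       import org.apache.spark.serializer.KryoRegistrator
       import org.apache.spark.sql.SparkSession
   
       // Hypothetical registrator: classes are registered here, and the
       // registrator itself must be set on the SparkConf up front.
       class ExampleKryoRegistrator extends KryoRegistrator {
         override def registerClasses(kryo: Kryo): Unit = {
           kryo.register(classOf[org.apache.avro.util.Utf8])
         }
       }
   
       val conf = new SparkConf()
         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .set("spark.kryo.registrator", classOf[ExampleKryoRegistrator].getName)
   
       // Once this session exists, its Kryo serializer is already initialized;
       // later registration attempts from application code won't take effect.
       val spark = SparkSession.builder().config(conf).getOrCreate()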


