n3nash commented on a change in pull request #2927:
URL: https://github.com/apache/hudi/pull/2927#discussion_r630724709



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -353,6 +361,91 @@ public static boolean isSchemaCompatible(String oldSchema, String newSchema) {
     return isSchemaCompatible(new Schema.Parser().parse(oldSchema), new Schema.Parser().parse(newSchema));
   }
 
+  /**
+   * Get latest schema either from incoming schema or table schema.
+   * @param incomingSchema incoming batch's schema.
+   * @param convertTableSchemaToAddNamespace {@code true} if table schema needs to be converted. {@code false} otherwise.
+   * @param converterFn converter function to be called over table schema. In DeltaSync flow, table schema needs to convert
+   * from avro -> df -> avro to add the namespace in the schema. But in spark writer flow, no such conversion is required.
+   * This package does not have access to some elements needed for conversion, hence added it as function call rather than embedding here.
+   * @return the latest schema.
+   */
+  public Schema getLatestSchema(Schema incomingSchema, boolean convertTableSchemaToAddNamespace,
+      Function1<Schema, Schema> converterFn) {
+    Schema latestSchema = incomingSchema;
+    try {
+      if (isTimelineNonEmpty()) {
+        Schema tableSchema = getTableAvroSchemaWithoutMetadataFields();
+        if (convertTableSchemaToAddNamespace) {
+          tableSchema = converterFn.apply(tableSchema);
+        }
+        if (incomingSchema.getFields().size() < tableSchema.getFields().size() && isSchemaSubset(tableSchema, incomingSchema)) {

Review comment:
       What if a nested field has been added to the new incoming schema? In
that case, does incomingSchema.getFields() count it? Basically, what level of
the schema does getFields() return: only the top-level fields, or nested ones too?
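
To make the concern concrete, here is a minimal, stdlib-only sketch. As far as I know, Avro's `Schema.getFields()` on a record schema returns only that record's direct fields, and a nested record's fields are reached through the field's own schema. The `Rec` and `Fld` types below are hypothetical stand-ins that mimic that shape. They are not Avro or Hudi classes, and `countAllFields` is an illustrative helper rather than anything in this PR.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldCountSketch {
    /** Hypothetical stand-in for a schema field; nested is null for a primitive field. */
    static final class Fld {
        final String name;
        final Rec nested;
        Fld(String name, Rec nested) { this.name = name; this.nested = nested; }
    }

    /** Hypothetical stand-in for a record schema; getFields() is top-level only. */
    static final class Rec {
        final List<Fld> fields = new ArrayList<>();
        Rec add(String name) { fields.add(new Fld(name, null)); return this; }
        Rec add(String name, Rec nested) { fields.add(new Fld(name, nested)); return this; }
        List<Fld> getFields() { return fields; }
    }

    /** Counts fields at every nesting level, unlike getFields().size(). */
    static int countAllFields(Rec rec) {
        int total = 0;
        for (Fld f : rec.getFields()) {
            total++;
            if (f.nested != null) {
                total += countAllFields(f.nested);
            }
        }
        return total;
    }

    /** Table schema: id, address { city }. */
    static Rec tableSchema() {
        return new Rec().add("id").add("address", new Rec().add("city"));
    }

    /** Incoming schema: same top level, but address gains a nested zip field. */
    static Rec incomingSchema() {
        return new Rec().add("id").add("address", new Rec().add("city").add("zip"));
    }

    public static void main(String[] args) {
        Rec table = tableSchema();
        Rec incoming = incomingSchema();
        // Top-level counts are equal, so a size() comparison cannot see the nested addition.
        System.out.println(table.getFields().size() + " vs " + incoming.getFields().size());
        // A recursive count does differ.
        System.out.println(countAllFields(table) + " vs " + countAllFields(incoming));
    }
}
```

If the real API behaves like this stand-in, a size comparison on `getFields()` alone would treat the two schemas as equally wide even though the incoming one carries an extra nested field.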




