Re: [PR] [HUDI-6800] Support writing partial updates to the data blocks in MOR tables [hudi]

via GitHub Mon, 23 Oct 2023 18:27:25 -0700


danny0405 commented on code in PR #9876:
URL: https://github.com/apache/hudi/pull/9876#discussion_r1369447936



##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala:
##########
@@ -411,10 +414,14 @@ object ExpressionPayload {
     parseSchema(props.getProperty(PAYLOAD_RECORD_AVRO_SCHEMA))
   }
 
-  private def getWriterSchema(props: Properties): Schema = {
-    ValidationUtils.checkArgument(props.containsKey(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key),
-      s"Missing ${HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key} property")
-    parseSchema(props.getProperty(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key))
+  private def getWriterSchema(props: Properties, isPartialUpdate: Boolean): Schema = {
+    if (isPartialUpdate) {
+      parseSchema(props.getProperty(HoodieWriteConfig.WRITE_PARTIAL_UPDATE_SCHEMA.key))

Review Comment:
   Generally, there are three possible modes for fields that are not updated in a partial update:
   1. keep the existing value as it is;
   2. force-update it to null (which I think should never happen in a real case);
   3. overwrite it with the default (if a default is defined in the schema).
   
   I think 1 is the most natural handling, but in any case the reader should always use its own reader schema for merging, not the writer schema.
   
   Another question is when to evolve the table schema: does it happen before or after the commit succeeds?
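
A minimal sketch of the three modes, to make the difference concrete. It uses plain Scala maps rather than Avro records, and every name in it (`MissingFieldMode`, `PartialMergeSketch`, `merge`, `readerFields`, `defaults`) is hypothetical and not part of Hudi. The merged record is always shaped by the reader's field list, never by the writer's. Falling back to the existing value in mode 3 when no default is defined is an assumption of this sketch.

```scala
// Hypothetical illustration only; none of these names exist in Hudi.
sealed trait MissingFieldMode
case object KeepExisting extends MissingFieldMode // mode 1
case object ForceNull extends MissingFieldMode    // mode 2
case object UseDefault extends MissingFieldMode   // mode 3

object PartialMergeSketch {
  type Record = Map[String, Any]

  /** Merges a partial update into an existing record.
    *
    * @param existing     the full record currently stored
    * @param partial      only the fields carried by the partial update
    * @param readerFields field names of the reader schema; the result has exactly these fields
    * @param defaults     defaults declared in the schema, keyed by field name
    * @param mode         how to fill fields that the partial update does not carry
    */
  def merge(existing: Record,
            partial: Record,
            readerFields: Seq[String],
            defaults: Map[String, Any],
            mode: MissingFieldMode): Record =
    readerFields.map { name =>
      val value: Any = partial.get(name) match {
        // The update carries this field, so the new value wins.
        case Some(updated) => updated
        // The update does not carry this field, so apply the chosen mode.
        case None =>
          mode match {
            case KeepExisting => existing.getOrElse(name, null)
            case ForceNull    => null
            // Assumption: with no declared default, keep the existing value.
            case UseDefault   => defaults.getOrElse(name, existing.getOrElse(name, null))
          }
      }
      name -> value
    }.toMap
}
```

For example, with an existing record `(id = 1, name = "a", price = 10.0)` and a partial update that carries only `id` and `price`, mode 1 leaves `name` as `"a"`, mode 2 sets it to null, and mode 3 sets it to whatever default is supplied for `name`.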



