[GitHub] [hudi] YannByron commented on a change in pull request #4565: [HUDI-3215] Solve UT for Spark 3.2

2022-01-16 Thread GitBox


YannByron commented on a change in pull request #4565:
URL: https://github.com/apache/hudi/pull/4565#discussion_r785606065



##
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -729,6 +733,12 @@ object HoodieSparkSqlWriter {
 mergedParams(key) = value
   }
 }
+
+// use preCombineField to fill in PAYLOAD_ORDERING_FIELD_PROP_KEY and PAYLOAD_EVENT_TIME_FIELD_PROP_KEY
+if (mergedParams.contains(PRECOMBINE_FIELD.key())) {
+  mergedParams.put(HoodiePayloadProps.PAYLOAD_ORDERING_FIELD_PROP_KEY, mergedParams(PRECOMBINE_FIELD.key()))
+  mergedParams.put(HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY, mergedParams(PRECOMBINE_FIELD.key()))
+}

Review comment:
   but I see it didn't do that, and I don't know how it could run correctly before.
   
   btw, why is `PAYLOAD_EVENT_TIME_FIELD_PROP_KEY` never required? After all, `DefaultHoodieRecordPayload` uses it for now.
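   For context, here is a minimal sketch of what the added block does, assuming `mergedParams` is the mutable options map built in `HoodieSparkSqlWriter` and that `HoodiePayloadProps` is importable from `org.apache.hudi.common.config`; the helper name and the `precombineFieldKey` parameter are illustrative, not from the PR:

```scala
import scala.collection.mutable

import org.apache.hudi.common.config.HoodiePayloadProps

// Sketch only: when a precombine field is configured, propagate it into the
// payload properties that DefaultHoodieRecordPayload reads at merge time.
// `precombineFieldKey` stands in for PRECOMBINE_FIELD.key().
def fillPayloadProps(mergedParams: mutable.Map[String, String],
                     precombineFieldKey: String): Unit = {
  mergedParams.get(precombineFieldKey).foreach { orderingField =>
    mergedParams.put(HoodiePayloadProps.PAYLOAD_ORDERING_FIELD_PROP_KEY, orderingField)
    mergedParams.put(HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY, orderingField)
  }
}
```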




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [hudi] YannByron commented on a change in pull request #4565: [HUDI-3215] Solve UT for Spark 3.2

2022-01-16 Thread GitBox


YannByron commented on a change in pull request #4565:
URL: https://github.com/apache/hudi/pull/4565#discussion_r785605100



##
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala
##
@@ -118,6 +123,18 @@ class HoodieMergeOnReadRDD(@transient sc: SparkContext,
 rows
   }
 
+  private def extractRequiredSchema(iter: Iterator[InternalRow]): Iterator[InternalRow] = {
+    val tableAvroSchema = new Schema.Parser().parse(tableState.tableAvroSchema)
+    val requiredAvroSchema = new Schema.Parser().parse(tableState.requiredAvroSchema)

Review comment:
   yes, I should remove this.
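   As an aside, here is a generic sketch, not this PR's code, of how an `Iterator[InternalRow]` can be projected onto a required subset of columns with Spark's catalyst utilities; the helper name and the use of `StructType` schemas are assumptions made for illustration:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{BoundReference, UnsafeProjection}
import org.apache.spark.sql.types.StructType

// Illustrative only: select the required columns by ordinal from each row.
def projectToRequiredSchema(iter: Iterator[InternalRow],
                            tableSchema: StructType,
                            requiredSchema: StructType): Iterator[InternalRow] = {
  val boundRefs = requiredSchema.fieldNames.map { name =>
    val ordinal = tableSchema.fieldIndex(name)
    BoundReference(ordinal, tableSchema(ordinal).dataType, tableSchema(ordinal).nullable)
  }
  val projection = UnsafeProjection.create(boundRefs)
  // UnsafeProjection reuses its output buffer, so copy the row before handing it on.
  iter.map(row => projection(row).copy())
}
```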




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] YannByron commented on a change in pull request #4565: [HUDI-3215] Solve UT for Spark 3.2

2022-01-16 Thread GitBox


YannByron commented on a change in pull request #4565:
URL: https://github.com/apache/hudi/pull/4565#discussion_r785604585



##
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala
##
@@ -54,15 +54,20 @@ class HoodieMergeOnReadRDD(@transient sc: SparkContext,
   private val preCombineField = tableState.preCombineField
   private val recordKeyFieldOpt = tableState.recordKeyFieldOpt
   private val payloadProps = if (preCombineField.isDefined) {
-    Some(HoodiePayloadConfig.newBuilder.withPayloadOrderingField(preCombineField.get).build.getProps)
+val properties = HoodiePayloadConfig.newBuilder
+  .withPayloadOrderingField(preCombineField.get)
+  .withPayloadEventTimeField(preCombineField.get)

Review comment:
   I think not. But I also have no idea about the exception that occurs when `returnNullIfNotFound` is set to true and `getNestedFieldVal` is called. However, if `PAYLOAD_ORDERING_FIELD_PROP_KEY` is set automatically whenever `preCombineField` is set, which is what we should have done, it works well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] YannByron commented on a change in pull request #4565: [HUDI-3215] Solve UT for Spark 3.2

2022-01-16 Thread GitBox


YannByron commented on a change in pull request #4565:
URL: https://github.com/apache/hudi/pull/4565#discussion_r785454198



##
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala
##
@@ -54,15 +54,20 @@ class HoodieMergeOnReadRDD(@transient sc: SparkContext,
   private val preCombineField = tableState.preCombineField
   private val recordKeyFieldOpt = tableState.recordKeyFieldOpt
   private val payloadProps = if (preCombineField.isDefined) {
-    Some(HoodiePayloadConfig.newBuilder.withPayloadOrderingField(preCombineField.get).build.getProps)
+val properties = HoodiePayloadConfig.newBuilder
+  .withPayloadOrderingField(preCombineField.get)
+  .withPayloadEventTimeField(preCombineField.get)

Review comment:
   I think yes.
   If it is not set, an upsert may produce a wrong result when `DefaultHoodieRecordPayload` is used. And in Spark 3.2, if no `ordering.field` or `event.time.field` is found, setting `returnNullIfNotFound` to true does not take effect when `HoodieAvroUtils.getNestedFieldVal` is called, like #4169.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org