[GitHub] [hudi] YannByron commented on a change in pull request #4565: [HUDI-3215] Solve UT for Spark 3.2
YannByron commented on a change in pull request #4565:
URL: https://github.com/apache/hudi/pull/4565#discussion_r785606065

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala

@@ -729,6 +733,12 @@ object HoodieSparkSqlWriter {
        mergedParams(key) = value
      }
    }
+
+    // use preCombineField to fill in PAYLOAD_ORDERING_FIELD_PROP_KEY and PAYLOAD_EVENT_TIME_FIELD_PROP_KEY
+    if (mergedParams.contains(PRECOMBINE_FIELD.key())) {
+      mergedParams.put(HoodiePayloadProps.PAYLOAD_ORDERING_FIELD_PROP_KEY, mergedParams(PRECOMBINE_FIELD.key()))
+      mergedParams.put(HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY, mergedParams(PRECOMBINE_FIELD.key()))
+    }

Review comment: But I see it didn't do that, and I don't know how it could have run correctly before. By the way, why is `PAYLOAD_EVENT_TIME_FIELD_PROP_KEY` never required? After all, `DefaultHoodieRecordPayload` uses it for now.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
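The question above is about why the ordering field matters to the payload class. As a rough illustration of the preCombine semantics that `DefaultHoodieRecordPayload` implements (this is a simplified standalone sketch, not Hudi's actual API), a payload honoring an ordering field keeps the record whose ordering value is largest, so a late-arriving record cannot overwrite a newer one:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: pick the record with the larger ordering-field value,
// mimicking the preCombine behavior the ordering field enables.
// All names here are illustrative, not Hudi's real classes or signatures.
public class OrderingFieldMerge {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Map<String, Object> preCombine(Map<String, Object> oldRec,
                                          Map<String, Object> newRec,
                                          String orderingField) {
        Comparable oldVal = (Comparable) oldRec.get(orderingField);
        Comparable newVal = (Comparable) newRec.get(orderingField);
        // Keep the incoming record only if its ordering value is >= the stored one.
        return newVal.compareTo(oldVal) >= 0 ? newRec : oldRec;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("id", 1); stored.put("ts", 100L); stored.put("v", "current");
        Map<String, Object> incoming = new HashMap<>();
        incoming.put("id", 1); incoming.put("ts", 90L); incoming.put("v", "late");
        // The late-arriving record (smaller ts) loses the merge.
        System.out.println(preCombine(stored, incoming, "ts").get("v"));
    }
}
```

Without the ordering field set in the payload properties, a payload class has no way to make this comparison, which is why the snippet in the diff propagates the preCombine field into `PAYLOAD_ORDERING_FIELD_PROP_KEY`.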
[GitHub] [hudi] YannByron commented on a change in pull request #4565: [HUDI-3215] Solve UT for Spark 3.2
YannByron commented on a change in pull request #4565:
URL: https://github.com/apache/hudi/pull/4565#discussion_r785605100

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala

@@ -118,6 +123,18 @@ class HoodieMergeOnReadRDD(@transient sc: SparkContext,
     rows
   }

+  private def extractRequiredSchema(iter: Iterator[InternalRow]): Iterator[InternalRow] = {
+    val tableAvroSchema = new Schema.Parser().parse(tableState.tableAvroSchema)
+    val requiredAvroSchema = new Schema.Parser().parse(tableState.requiredAvroSchema)

Review comment: Yes, I should remove this.
[GitHub] [hudi] YannByron commented on a change in pull request #4565: [HUDI-3215] Solve UT for Spark 3.2
YannByron commented on a change in pull request #4565:
URL: https://github.com/apache/hudi/pull/4565#discussion_r785604585

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala

@@ -54,15 +54,20 @@ class HoodieMergeOnReadRDD(@transient sc: SparkContext,
   private val preCombineField = tableState.preCombineField
   private val recordKeyFieldOpt = tableState.recordKeyFieldOpt
   private val payloadProps = if (preCombineField.isDefined) {
-    Some(HoodiePayloadConfig.newBuilder.withPayloadOrderingField(preCombineField.get).build.getProps)
+    val properties = HoodiePayloadConfig.newBuilder
+      .withPayloadOrderingField(preCombineField.get)
+      .withPayloadEventTimeField(preCombineField.get)

Review comment: I think not. But I also have no idea about the exception thrown when `returnNullIfNotFound` is set to true and `getNestedFieldVal` is called. However, if `PAYLOAD_ORDERING_FIELD_PROP_KEY` is set automatically whenever `preCombineField` is set, which is what we should have done, it works well.
[GitHub] [hudi] YannByron commented on a change in pull request #4565: [HUDI-3215] Solve UT for Spark 3.2
YannByron commented on a change in pull request #4565:
URL: https://github.com/apache/hudi/pull/4565#discussion_r785454198

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala

@@ -54,15 +54,20 @@ class HoodieMergeOnReadRDD(@transient sc: SparkContext,
   private val preCombineField = tableState.preCombineField
   private val recordKeyFieldOpt = tableState.recordKeyFieldOpt
   private val payloadProps = if (preCombineField.isDefined) {
-    Some(HoodiePayloadConfig.newBuilder.withPayloadOrderingField(preCombineField.get).build.getProps)
+    val properties = HoodiePayloadConfig.newBuilder
+      .withPayloadOrderingField(preCombineField.get)
+      .withPayloadEventTimeField(preCombineField.get)

Review comment: I think yes. If it is not set, upsert may produce a wrong result when `DefaultHoodieRecordPayload` is used. And in Spark 3.2, if no `ordering.field` or `event.time.field` is found, setting `returnNullIfNotFound` to true does not take effect when calling `HoodieAvroUtils.getNestedFieldVal`, as in #4169.
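The `returnNullIfNotFound` behavior under discussion can be sketched with a simplified nested lookup. This is illustrative only: Hudi's real `HoodieAvroUtils.getNestedFieldVal` operates on Avro `GenericRecord`s, whereas this sketch uses plain maps to show the intended contract — a missing field yields `null` when the flag is true, and an exception otherwise:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of a dot-separated nested field lookup with a
// returnNullIfNotFound flag, loosely modeled on the behavior discussed
// for HoodieAvroUtils.getNestedFieldVal. Illustrative only, not Hudi code.
public class NestedFieldLookup {
    @SuppressWarnings("unchecked")
    static Object getNestedFieldVal(Map<String, Object> record,
                                    String fieldName,
                                    boolean returnNullIfNotFound) {
        Object current = record;
        for (String part : fieldName.split("\\.")) {
            if (!(current instanceof Map)
                    || !((Map<String, Object>) current).containsKey(part)) {
                if (returnNullIfNotFound) {
                    return null; // swallow the miss instead of failing
                }
                throw new IllegalArgumentException(part + " not found in record");
            }
            current = ((Map<String, Object>) current).get(part);
        }
        return current;
    }
}
```

The bug described in the comment is that the flag did not take effect, so a lookup of a missing `ordering.field` or `event.time.field` failed instead of returning `null` — which is why pre-filling those properties from the preCombine field sidesteps the problem.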