Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
yihua commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2024077244

> > We have #7146 which also attempted to solve the same problem. Should we close #7146 and prefer this one?
>
> That does not solve the problem as the sorting (of the input batch) is thrown away by the hashing based mapping of the record to a specific bucket. This tries to solve the problem by implementing a new partitioner `UpsertSortPartitioner`, derived from `UpsertPartitioner`, which preserves the sorted nature of the input batch (by assigning a contiguous range of sorted input records to a single bucket/spark-partition)

Then #7146 can be deprecated?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2019436395

> We have #7146 which also attempted to solve the same problem. Should we close #7146 and prefer this one?

That does not solve the problem, because the sorting of the input batch is thrown away by the hash-based mapping of each record to a specific bucket. This PR tries to solve the problem by implementing a new partitioner, `UpsertSortPartitioner`, derived from `UpsertPartitioner`, which preserves the sorted order of the input batch by assigning a contiguous range of sorted input records to a single bucket (Spark partition).
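To make the difference between the two mappings concrete, here is a minimal plain-Java sketch. It is not the code from this PR or from #7146: the class, the method names, and the fixed `recordsPerBucket` chunk size are hypothetical, and it only contrasts hashing a key to a bucket with assigning buckets by position in the sorted batch.

```java
import java.util.ArrayList;
import java.util.List;

public class BucketAssignmentSketch {

    // Hash-based mapping: neighbouring keys in sort order may land in different buckets,
    // so a prior global sort is not reflected in the per-bucket contents.
    static int hashBucket(String recordKey, int numBuckets) {
        return Math.floorMod(recordKey.hashCode(), numBuckets);
    }

    // Range-based mapping (hypothetical): the record at position sortedIndex in the
    // sorted batch goes to bucket sortedIndex / recordsPerBucket.
    static int rangeBucket(long sortedIndex, long recordsPerBucket) {
        return (int) (sortedIndex / recordsPerBucket);
    }

    // Assigns a bucket to every key of an already sorted list, purely by position.
    static List<Integer> rangeBuckets(List<String> sortedKeys, long recordsPerBucket) {
        List<Integer> buckets = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i++) {
            buckets.add(rangeBucket(i, recordsPerBucket));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> sortedKeys =
            List.of("key-001", "key-002", "key-003", "key-004", "key-005", "key-006");
        for (int i = 0; i < sortedKeys.size(); i++) {
            String key = sortedKeys.get(i);
            System.out.println(key + " hash=" + hashBucket(key, 3) + " range=" + rangeBucket(i, 2));
        }
    }
}
```

With the range mapping, each bucket receives one contiguous slice of the sorted batch, so the order survives partitioning; with the hash mapping there is no such guarantee.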
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
yihua commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2019412865

We have #7146 which also attempted to solve the same problem. Should we close #7146 and prefer this one?
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2016810146

## CI report:

* 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN
* 9329d8d43e9274478e64a0d40cbe7a5a0362ec90 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23010)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2016795397

## CI report:

* 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN
* e2296a2de6391dee42a83d390410eb71f193d55c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23004)
* 9329d8d43e9274478e64a0d40cbe7a5a0362ec90 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23010)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2016793663

## CI report:

* 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN
* e2296a2de6391dee42a83d390410eb71f193d55c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23004)
* 9329d8d43e9274478e64a0d40cbe7a5a0362ec90 UNKNOWN
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2016391528

## CI report:

* 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN
* e2296a2de6391dee42a83d390410eb71f193d55c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23004)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2016360644

## CI report:

* 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN
* a84507191a942c5d8c98610958ca48f47188bc48 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22994)
* e2296a2de6391dee42a83d390410eb71f193d55c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23004)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2016357819

## CI report:

* 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN
* a84507191a942c5d8c98610958ca48f47188bc48 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22994)
* e2296a2de6391dee42a83d390410eb71f193d55c UNKNOWN
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1536561023

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ##
@@ -480,6 +480,20 @@ public class HoodieWriteConfig extends HoodieConfig {
       .markAdvanced()
       .withDocumentation(BulkInsertSortMode.class);
 
+  public static final ConfigProperty INSERT_SORT = ConfigProperty

Review Comment:
   Already handled by setting valid values for the config property.
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1536560898

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ##
@@ -411,4 +427,90 @@ public Partitioner getLayoutPartitioner(WorkloadProfile profile, String layoutPa
   protected void runPrecommitValidators(HoodieWriteMetadata> writeMetadata) {
     SparkValidatorUtils.runValidators(config, writeMetadata, context, table, instantTime);
   }
+
+  private HoodieData sortAndMapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) {
+    JavaPairRDD, HoodieRecord> mappedRDD = getSortedIndexedRecords(dedupedRecords);
+    JavaPairRDD, HoodieRecord> partitionedRDD;
+    if (table.requireSortedRecords()) {
+      // Partition and sort within each partition as a single step. This is faster than partitioning first and then
+      // applying a sort.
+      Comparator> comparator = (Comparator> & Serializable) (t1, t2) -> {
+        HoodieKey key1 = t1._1();
+        HoodieKey key2 = t2._1();
+        return key1.getRecordKey().compareTo(key2.getRecordKey());
+      };
+      partitionedRDD = mappedRDD.repartitionAndSortWithinPartitions(partitioner, comparator);
+    } else {
+      // Partition only
+      partitionedRDD = mappedRDD.partitionBy(partitioner);
+    }
+
+    return HoodieJavaRDD.of(partitionedRDD.map(Tuple2::_2).mapPartitionsWithIndex((partition, recordItr) -> {
+      if (WriteOperationType.isChangingRecords(operationType)) {
+        return handleUpsertPartition(instantTime, partition, recordItr, partitioner);
+      } else {
+        return handleInsertPartition(instantTime, partition, recordItr, partitioner);
+      }
+    }, true).flatMap(List::iterator));
+  }
+
+  private boolean operationRequiresSorting() {
+    return operationType == WriteOperationType.INSERT && config.getBoolean(INSERT_SORT);

Review Comment:
   The current implementation (in this PR) does not support sorting for the UPSERT operation.
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
vinothchandar commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1536245569

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ##
@@ -411,4 +427,90 @@ public Partitioner getLayoutPartitioner(WorkloadProfile profile, String layoutPa
   protected void runPrecommitValidators(HoodieWriteMetadata> writeMetadata) {
     SparkValidatorUtils.runValidators(config, writeMetadata, context, table, instantTime);
   }
+
+  private HoodieData sortAndMapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) {

Review Comment:
   yes lets rename.
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
vinothchandar commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1536245244

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ##
@@ -230,6 +236,10 @@ protected Partitioner getPartitioner(WorkloadProfile profile) {
   }
 
   private HoodieData mapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) {
+    if (operationRequiresSorting()) {

Review Comment:
   upsert is updates and inserts.
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
vinothchandar commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1536245082

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ##
@@ -480,6 +480,20 @@ public class HoodieWriteConfig extends HoodieConfig {
       .markAdvanced()
       .withDocumentation(BulkInsertSortMode.class);
 
+  public static final ConfigProperty INSERT_SORT = ConfigProperty

Review Comment:
   lets make sure we throw an exception for the unsupported mode.
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2015437643

## CI report:

* 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN
* a84507191a942c5d8c98610958ca48f47188bc48 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22994)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2015350367

## CI report:

* b802619f011c1d9ef5b334ecf67ab7df74964e08 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22958)
* 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN
* a84507191a942c5d8c98610958ca48f47188bc48 UNKNOWN
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2015337228

## CI report:

* b802619f011c1d9ef5b334ecf67ab7df74964e08 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22958)
* 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2015327684

> IIUC this adds additional shuffle and a new job? I'd like to understand how we think this impacts the current insert DAG. Yet to review the new partitioner, will do once I hear back on these.

Yes, there is a sorting stage (a global sort of the input batch), which might add a shuffle. The new job assigns sequentially increasing indexes to the sorted records. `UpsertSortPartitioner` relies on these indexes to preserve the sorted order of the input batch while still handling small files as efficiently as possible. Not sure if this is what you meant.
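The two extra steps described above (a global sort, then sequential index assignment) can be sketched without Spark. The following plain-Java example is only an illustration under stated assumptions: `InputRecord` and `IndexedRecord` are hypothetical stand-ins, the sort key is the partition path joined to the record key with `+` as in the quoted diff, and the loop plays the role that `zipWithIndex()` plays on an RDD.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortAndIndexSketch {

    /** Hypothetical stand-in for a Hudi record; only the fields needed for sorting. */
    static final class InputRecord {
        final String partitionPath;
        final String recordKey;

        InputRecord(String partitionPath, String recordKey) {
            this.partitionPath = partitionPath;
            this.recordKey = recordKey;
        }

        // Sort key: partition path first, then record key.
        String sortKey() {
            return partitionPath + "+" + recordKey;
        }
    }

    /** A record paired with its position in the globally sorted batch. */
    static final class IndexedRecord {
        final long index;
        final InputRecord value;

        IndexedRecord(long index, InputRecord value) {
            this.index = index;
            this.value = value;
        }
    }

    // Step 1: global sort of the batch. Step 2: assign 0, 1, 2, ... in sorted order.
    static List<IndexedRecord> sortAndIndex(List<InputRecord> batch) {
        List<InputRecord> sorted = new ArrayList<>(batch);
        sorted.sort(Comparator.comparing(InputRecord::sortKey));
        List<IndexedRecord> indexed = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            indexed.add(new IndexedRecord(i, sorted.get(i)));
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<InputRecord> batch = List.of(
            new InputRecord("2024/02", "k1"),
            new InputRecord("2024/01", "k9"),
            new InputRecord("2024/01", "k2"));
        for (IndexedRecord r : sortAndIndex(batch)) {
            System.out.println(r.index + " -> " + r.value.sortKey());
        }
    }
}
```

In Spark the sort is a distributed operation, and `zipWithIndex()` has to know the size of each preceding partition before it can number records, which is presumably the additional job discussed in this comment.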
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1535756962

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ##
@@ -411,4 +427,90 @@ public Partitioner getLayoutPartitioner(WorkloadProfile profile, String layoutPa
   protected void runPrecommitValidators(HoodieWriteMetadata> writeMetadata) {
     SparkValidatorUtils.runValidators(config, writeMetadata, context, table, instantTime);
   }
+
+  private HoodieData sortAndMapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) {

Review Comment:
   > lets UT this method?

   Done.

   > also rename? this is performing the actual write . sortIfNeededAndWrite ?

   I borrowed the name from `mapPartitionsAsRDD(...)`, which also performs a write without making that explicit. I can rename this if it makes the code easier to read.
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535754607 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -411,4 +427,90 @@ public Partitioner getLayoutPartitioner(WorkloadProfile profile, String layoutPa protected void runPrecommitValidators(HoodieWriteMetadata> writeMetadata) { SparkValidatorUtils.runValidators(config, writeMetadata, context, table, instantTime); } + + private HoodieData sortAndMapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) { +JavaPairRDD, HoodieRecord> mappedRDD = getSortedIndexedRecords(dedupedRecords); +JavaPairRDD, HoodieRecord> partitionedRDD; +if (table.requireSortedRecords()) { + // Partition and sort within each partition as a single step. This is faster than partitioning first and then + // applying a sort. + Comparator> comparator = (Comparator> & Serializable) (t1, t2) -> { +HoodieKey key1 = t1._1(); +HoodieKey key2 = t2._1(); +return key1.getRecordKey().compareTo(key2.getRecordKey()); + }; + partitionedRDD = mappedRDD.repartitionAndSortWithinPartitions(partitioner, comparator); +} else { + // Partition only + partitionedRDD = mappedRDD.partitionBy(partitioner); +} + +return HoodieJavaRDD.of(partitionedRDD.map(Tuple2::_2).mapPartitionsWithIndex((partition, recordItr) -> { + if (WriteOperationType.isChangingRecords(operationType)) { +return handleUpsertPartition(instantTime, partition, recordItr, partitioner); + } else { +return handleInsertPartition(instantTime, partition, recordItr, partitioner); + } +}, true).flatMap(List::iterator)); + } + + private boolean operationRequiresSorting() { +return operationType == WriteOperationType.INSERT && config.getBoolean(INSERT_SORT); + } + + private JavaPairRDD, HoodieRecord> getSortedIndexedRecords(HoodieData> dedupedRecords) { +// Get any user specified sort columns +String customSortColField = 
config.getString(INSERT_USER_DEFINED_SORT_COLUMNS); + +String[] sortColumns; +if (!isNullOrEmpty(customSortColField)) { + // Extract user specified sort-column fields as an array + sortColumns = Arrays.stream(customSortColField.split(",")) + .map(String::trim).toArray(String[]::new); +} else { + // Use record-key as sort column + sortColumns = Arrays.stream(HoodieRecord.HoodieMetadataField.RECORD_KEY_METADATA_FIELD.getFieldName().split(","))

Review Comment:
   > left a comment already. idk how this works for partitioned tables?

   I am not sure I understand. Why would it not work for partitioned tables?

   > do we need the .split(,"). here

   That block was there to placate the compiler. I have reworked the code and removed it.
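For reference, the column-resolution logic under discussion can be written without splitting the record-key field name at all. This is a minimal standalone sketch, not the reworked code from the PR: `resolveSortColumns` is a hypothetical helper, `_hoodie_record_key` is the name of Hudi's record-key metadata column, and dropping empty entries is an extra assumption that the quoted diff does not make.

```java
import java.util.Arrays;

public class SortColumnsSketch {

    // Name of Hudi's record-key metadata column, used as the default sort column.
    static final String RECORD_KEY_FIELD = "_hoodie_record_key";

    // Hypothetical helper: turns a comma-separated config value into sort columns,
    // falling back to the record key when nothing is configured.
    static String[] resolveSortColumns(String userDefinedColumns) {
        if (userDefinedColumns == null || userDefinedColumns.trim().isEmpty()) {
            // No split needed here: the default is a single, known column name.
            return new String[] {RECORD_KEY_FIELD};
        }
        return Arrays.stream(userDefinedColumns.split(","))
            .map(String::trim)
            .filter(column -> !column.isEmpty()) // assumption: ignore blank entries such as "a,,b"
            .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resolveSortColumns(" city , ts ")));
        System.out.println(Arrays.toString(resolveSortColumns(null)));
    }
}
```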
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535751978 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -411,4 +427,90 @@ public Partitioner getLayoutPartitioner(WorkloadProfile profile, String layoutPa protected void runPrecommitValidators(HoodieWriteMetadata> writeMetadata) { SparkValidatorUtils.runValidators(config, writeMetadata, context, table, instantTime); } + + private HoodieData sortAndMapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) { +JavaPairRDD, HoodieRecord> mappedRDD = getSortedIndexedRecords(dedupedRecords); +JavaPairRDD, HoodieRecord> partitionedRDD; +if (table.requireSortedRecords()) { + // Partition and sort within each partition as a single step. This is faster than partitioning first and then + // applying a sort. + Comparator> comparator = (Comparator> & Serializable) (t1, t2) -> { +HoodieKey key1 = t1._1(); +HoodieKey key2 = t2._1(); +return key1.getRecordKey().compareTo(key2.getRecordKey()); + }; + partitionedRDD = mappedRDD.repartitionAndSortWithinPartitions(partitioner, comparator); +} else { + // Partition only + partitionedRDD = mappedRDD.partitionBy(partitioner); +} + +return HoodieJavaRDD.of(partitionedRDD.map(Tuple2::_2).mapPartitionsWithIndex((partition, recordItr) -> { + if (WriteOperationType.isChangingRecords(operationType)) { +return handleUpsertPartition(instantTime, partition, recordItr, partitioner); + } else { +return handleInsertPartition(instantTime, partition, recordItr, partitioner); + } +}, true).flatMap(List::iterator)); + } + + private boolean operationRequiresSorting() { +return operationType == WriteOperationType.INSERT && config.getBoolean(INSERT_SORT); + } + + private JavaPairRDD, HoodieRecord> getSortedIndexedRecords(HoodieData> dedupedRecords) { +// Get any user specified sort columns +String customSortColField = 
config.getString(INSERT_USER_DEFINED_SORT_COLUMNS); + +String[] sortColumns; +if (!isNullOrEmpty(customSortColField)) { + // Extract user specified sort-column fields as an array + sortColumns = Arrays.stream(customSortColField.split(",")) + .map(String::trim).toArray(String[]::new); +} else { + // Use record-key as sort column + sortColumns = Arrays.stream(HoodieRecord.HoodieMetadataField.RECORD_KEY_METADATA_FIELD.getFieldName().split(",")) + .map(String::trim).toArray(String[]::new); +} + +// Get the record's schema from the write config +SerializableSchema serializableSchema = new SerializableSchema(new Schema.Parser().parse(config.getSchema())); + +JavaRDD> javaRdd = HoodieJavaRDD.getJavaRDD(dedupedRecords); +JavaRDD> sortedRecords = javaRdd.sortBy(record -> {

Review Comment:
   My understanding is that `repartitionAndSortWithinPartitions` sorts within a bucket (a Spark RDD partition) after `UpsertPartitioner` has already partitioned the input batch. It handles the case of writing sorted key-values to file formats that depend on that ordering (e.g. HFile). I am not sure how partitioning first and then sorting within each partition would be useful here.
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535749601 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -411,4 +427,90 @@ public Partitioner getLayoutPartitioner(WorkloadProfile profile, String layoutPa protected void runPrecommitValidators(HoodieWriteMetadata> writeMetadata) { SparkValidatorUtils.runValidators(config, writeMetadata, context, table, instantTime); } + + private HoodieData sortAndMapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) { +JavaPairRDD, HoodieRecord> mappedRDD = getSortedIndexedRecords(dedupedRecords); +JavaPairRDD, HoodieRecord> partitionedRDD; +if (table.requireSortedRecords()) { + // Partition and sort within each partition as a single step. This is faster than partitioning first and then + // applying a sort. + Comparator> comparator = (Comparator> & Serializable) (t1, t2) -> { +HoodieKey key1 = t1._1(); +HoodieKey key2 = t2._1(); +return key1.getRecordKey().compareTo(key2.getRecordKey()); + }; + partitionedRDD = mappedRDD.repartitionAndSortWithinPartitions(partitioner, comparator); +} else { + // Partition only + partitionedRDD = mappedRDD.partitionBy(partitioner); +} + +return HoodieJavaRDD.of(partitionedRDD.map(Tuple2::_2).mapPartitionsWithIndex((partition, recordItr) -> { + if (WriteOperationType.isChangingRecords(operationType)) { +return handleUpsertPartition(instantTime, partition, recordItr, partitioner); + } else { +return handleInsertPartition(instantTime, partition, recordItr, partitioner); + } +}, true).flatMap(List::iterator)); + } + + private boolean operationRequiresSorting() { +return operationType == WriteOperationType.INSERT && config.getBoolean(INSERT_SORT); + } + + private JavaPairRDD, HoodieRecord> getSortedIndexedRecords(HoodieData> dedupedRecords) { +// Get any user specified sort columns +String customSortColField = 
config.getString(INSERT_USER_DEFINED_SORT_COLUMNS); + +String[] sortColumns; +if (!isNullOrEmpty(customSortColField)) { + // Extract user specified sort-column fields as an array + sortColumns = Arrays.stream(customSortColField.split(",")) + .map(String::trim).toArray(String[]::new); +} else { + // Use record-key as sort column + sortColumns = Arrays.stream(HoodieRecord.HoodieMetadataField.RECORD_KEY_METADATA_FIELD.getFieldName().split(",")) + .map(String::trim).toArray(String[]::new); +} + +// Get the record's schema from the write config +SerializableSchema serializableSchema = new SerializableSchema(new Schema.Parser().parse(config.getSchema())); + +JavaRDD> javaRdd = HoodieJavaRDD.getJavaRDD(dedupedRecords); +JavaRDD> sortedRecords = javaRdd.sortBy(record -> { + if (isNullOrEmpty(customSortColField)) { +// If sorting based on record-key, extract it directly using record.getRecordKey() +return new StringBuilder() +.append(record.getPartitionPath()) +.append("+") +.append(record.getRecordKey()) +.toString(); + } else { +// Extract the sort columns from the record and return it as string (prepended with partition-path) +Object[] columnValues = record.getColumnValues(serializableSchema.get(), sortColumns, false); +String sortColString = Arrays.stream(columnValues).map(Object::toString).collect(Collectors.joining()); +return new StringBuilder() +.append(record.getPartitionPath()) +.append("+") +.append(sortColString) +.toString(); + } +}, true, 0); + +// Assign index to each record in the RDD +JavaRDD, Long>> indexedRecords = sortedRecords.zipWithIndex() Review Comment: This is required by the partitioner to assign a contiguous chunk of sorted input records to a single bucket (bucket in turn maps to a single file, hence the sorted records are written to a single file). 
I am not sure if there is any other way to assign sequentially increasing indexes to the sorted records. The indexes are then used in `UpsertSortPartitioner::getPartition(...)` to determine the bucket that a record maps to.
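One way to picture how a sequential index can be turned into a bucket is a lookup over cumulative bucket capacities. The sketch below is an assumption about the general technique, not the actual `UpsertSortPartitioner::getPartition(...)` implementation: the class name, the capacity array, and the binary search are all hypothetical. Capacities may differ per bucket, which is how remaining space in small files could be modelled.

```java
import java.util.Arrays;

public class IndexToBucketSketch {

    // cumulativeEnd[b] is the first sorted index that is NOT covered by bucket b.
    private final long[] cumulativeEnd;

    // bucketCapacities[b] = number of records bucket b should receive (must be positive).
    IndexToBucketSketch(long[] bucketCapacities) {
        cumulativeEnd = new long[bucketCapacities.length];
        long running = 0;
        for (int b = 0; b < bucketCapacities.length; b++) {
            if (bucketCapacities[b] <= 0) {
                throw new IllegalArgumentException("Bucket capacity must be positive");
            }
            running += bucketCapacities[b];
            cumulativeEnd[b] = running;
        }
    }

    // Maps the position of a record in the sorted batch to its bucket.
    int getBucket(long sortedIndex) {
        if (cumulativeEnd.length == 0 || sortedIndex < 0
            || sortedIndex >= cumulativeEnd[cumulativeEnd.length - 1]) {
            throw new IllegalArgumentException("Index outside bucket ranges: " + sortedIndex);
        }
        int pos = Arrays.binarySearch(cumulativeEnd, sortedIndex);
        // An exact hit on cumulativeEnd[b] means the index starts bucket b + 1;
        // otherwise binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Bucket 0 takes 2 records, bucket 1 takes 3, bucket 2 takes 4.
        IndexToBucketSketch lookup = new IndexToBucketSketch(new long[] {2, 3, 4});
        for (long i = 0; i < 9; i++) {
            System.out.println("index " + i + " -> bucket " + lookup.getBucket(i));
        }
    }
}
```

Because the boundaries increase monotonically, every bucket receives a contiguous index range, so records that were adjacent after the global sort stay together in one file.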
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1535706564

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ##
@@ -394,6 +404,12 @@ public Partitioner getUpsertPartitioner(WorkloadProfile profile) {
     if (profile == null) {
       throw new HoodieUpsertException("Need workload profile to construct the upsert partitioner.");
     }
+
+    if (operationRequiresSorting()) {
+      // Return UpsertSortPartitioner if the input records are going to be sorted
+      return new UpsertSortPartitioner<>(profile, context, table, config);
+    }

Review Comment:
   Done.
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1535704794

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ##
@@ -230,6 +236,10 @@ protected Partitioner getPartitioner(WorkloadProfile profile) {
   }
 
   private HoodieData mapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) {
+    if (operationRequiresSorting()) {

Review Comment:
   What does sorting mean for the 'upsert' operation? If a record is really being updated, won't there be an index lookup that routes the record to its specific file group? Or is there a benefit to supporting sorting when an upsert batch contains new records that are being written for the first time? This PR allows sorting only for the INSERT operation; `BaseSparkCommitActionExecutor::operationRequiresSorting(...)` takes care of that. If the config needs to be made unambiguous for future use cases, should I rename it to `WRITE_SORT_MODE`, `WRITE_SORT_OPERATIONS` and `WRITE_USER_DEFINED_PARTITIONER_SORT_COLUMNS`?
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1535694790

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
@@ -480,6 +480,20 @@ public class HoodieWriteConfig extends HoodieConfig {
       .markAdvanced()
       .withDocumentation(BulkInsertSortMode.class);

+  public static final ConfigProperty INSERT_SORT = ConfigProperty

Review Comment:
Done. IIUC, you are asking to use `BulkInsertSortMode::NONE` and `BulkInsertSortMode::GLOBAL_SORT` (instead of a boolean). FYI, there are no other sort modes for `insert`. There is only global sort (i.e., sorting the entire input batch). Hence the valid values are NONE or GLOBAL_SORT.
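Editor's sketch of what a two-valued sort mode could look like in place of a boolean flag. The enum `SortMode`, the class name, and the choice to treat a missing value as NONE are assumptions made for illustration; the thread only states that the valid values are NONE or GLOBAL_SORT.

```java
import java.util.Locale;

// Hypothetical helper; not part of Hudi.
final class InsertSortModeSketch {
  // Only the two values named in the comment above are accepted.
  enum SortMode { NONE, GLOBAL_SORT }

  // Parses a raw config string. A missing or blank value maps to NONE (an assumption);
  // any other unknown value makes Enum.valueOf throw IllegalArgumentException.
  static SortMode parse(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return SortMode.NONE;
    }
    return SortMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
  }

  static boolean sortEnabled(String raw) {
    return parse(raw) == SortMode.GLOBAL_SORT;
  }
}
```

A value such as `PARTITION_SORT`, which is not one of the two accepted values in this sketch, would be rejected at parse time rather than silently ignored.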
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1535695254

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
@@ -480,6 +480,20 @@ public class HoodieWriteConfig extends HoodieConfig {
       .markAdvanced()
       .withDocumentation(BulkInsertSortMode.class);

+  public static final ConfigProperty INSERT_SORT = ConfigProperty
+      .key("hoodie.insert.sort")
+      .defaultValue(false)
+      .markAdvanced()
+      .withDocumentation("Determines whether the insert operation should sort the input records. The sorting for insert is always"
+          + " global (among all input records in a batch)");
+
+  public static final ConfigProperty INSERT_USER_DEFINED_SORT_COLUMNS = ConfigProperty
+      .key("hoodie.insert.user.defined.sort.columns")
+      .noDefaultValue()
+      .markAdvanced()
+      .withDocumentation("Columns to sort the data by when hoodie.insert.sort is set to true. If not specified, record-key is used for sorting."

Review Comment:
Yes, it is. It was just not explicitly mentioned here. Updated the documentation to be more explicit.
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1535473655

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
@@ -480,6 +480,20 @@ public class HoodieWriteConfig extends HoodieConfig {
       .markAdvanced()
       .withDocumentation(BulkInsertSortMode.class);

+  public static final ConfigProperty INSERT_SORT = ConfigProperty
+      .key("hoodie.insert.sort")
+      .defaultValue(false)
+      .markAdvanced()
+      .withDocumentation("Determines whether the insert operation should sort the input records. The sorting for insert is always"
+          + " global (among all input records in a batch)");
+
+  public static final ConfigProperty INSERT_USER_DEFINED_SORT_COLUMNS = ConfigProperty
+      .key("hoodie.insert.user.defined.sort.columns")

Review Comment:
Bulk insert's sort column config is named `hoodie.bulkinsert.user.defined.partitioner.sort.columns`. Hence using `hoodie.insert.user.defined.partitioner.sort.columns` here.
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
vinothchandar commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1534279513

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
@@ -480,6 +480,20 @@ public class HoodieWriteConfig extends HoodieConfig {
       .markAdvanced()
       .withDocumentation(BulkInsertSortMode.class);

+  public static final ConfigProperty INSERT_SORT = ConfigProperty
+      .key("hoodie.insert.sort")
+      .defaultValue(false)
+      .markAdvanced()
+      .withDocumentation("Determines whether the insert operation should sort the input records. The sorting for insert is always"
+          + " global (among all input records in a batch)");
+
+  public static final ConfigProperty INSERT_USER_DEFINED_SORT_COLUMNS = ConfigProperty
+      .key("hoodie.insert.user.defined.sort.columns")

Review Comment:
Let's make sure it's consistent in naming with bulk_insert's config.

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java:
@@ -230,6 +236,10 @@ protected Partitioner getPartitioner(WorkloadProfile profile) {
   }

   private HoodieData mapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) {
+    if (operationRequiresSorting()) {

Review Comment:
So technically, does this work for both insert and upsert operations, or just insert? If both, then we can't name the configs just around `insert`.

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java:
@@ -411,4 +427,90 @@ public Partitioner getLayoutPartitioner(WorkloadProfile profile, String layoutPa
   protected void runPrecommitValidators(HoodieWriteMetadata> writeMetadata) {
     SparkValidatorUtils.runValidators(config, writeMetadata, context, table, instantTime);
   }
+
+  private HoodieData sortAndMapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) {
+    JavaPairRDD, HoodieRecord> mappedRDD = getSortedIndexedRecords(dedupedRecords);
+    JavaPairRDD, HoodieRecord> partitionedRDD;
+    if (table.requireSortedRecords()) {
+      // Partition and sort within each partition as a single step. This is faster than partitioning first and then
+      // applying a sort.
+      Comparator> comparator = (Comparator> & Serializable) (t1, t2) -> {
+        HoodieKey key1 = t1._1();
+        HoodieKey key2 = t2._1();
+        return key1.getRecordKey().compareTo(key2.getRecordKey());
+      };
+      partitionedRDD = mappedRDD.repartitionAndSortWithinPartitions(partitioner, comparator);
+    } else {
+      // Partition only
+      partitionedRDD = mappedRDD.partitionBy(partitioner);
+    }
+
+    return HoodieJavaRDD.of(partitionedRDD.map(Tuple2::_2).mapPartitionsWithIndex((partition, recordItr) -> {
+      if (WriteOperationType.isChangingRecords(operationType)) {
+        return handleUpsertPartition(instantTime, partition, recordItr, partitioner);
+      } else {
+        return handleInsertPartition(instantTime, partition, recordItr, partitioner);
+      }
+    }, true).flatMap(List::iterator));
+  }
+
+  private boolean operationRequiresSorting() {
+    return operationType == WriteOperationType.INSERT && config.getBoolean(INSERT_SORT);

Review Comment:
OK, here we are skipping upserts. But should this be done for upserts too?

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java:
@@ -411,4 +427,90 @@ public Partitioner getLayoutPartitioner(WorkloadProfile profile, String layoutPa
   protected void runPrecommitValidators(HoodieWriteMetadata> writeMetadata) {
     SparkValidatorUtils.runValidators(config, writeMetadata, context, table, instantTime);
   }
+
+  private HoodieData sortAndMapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) {

Review Comment:
Let's unit test this method?

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java:
@@ -411,4 +427,90 @@ public Partitioner getLayoutPartitioner(WorkloadProfile profile, String layoutPa
   protected void runPrecommitValidators(HoodieWriteMetadata> writeMetadata) {
     SparkValidatorUtils.runValidators(config, writeMetadata, context, table, instantTime);
   }
+
+  private HoodieData sortAndMapPartitionsAsRDD(HoodieData> dedupedRecords, Partitioner partitioner) {
+    JavaPairRDD, HoodieRecord> mappedRDD = getSortedIndexedRecords(dedupedRecords);
+    JavaPairRDD, HoodieRecord> partitionedRDD;
+    if (table.requireSortedRecords()) {
+      // Partition and sort within each partition as a single step. This is faster than partitioning first and then
+      // applying a sort.
+      Comparator> comparator = (Comparator> & Serializable) (t1, t2) -> {
+        HoodieKey key1 = t1._1();
+        HoodieKey key2
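Editor's sketch, in plain Java without Spark, of the behavior the quoted `sortAndMapPartitionsAsRDD` hunk relies on: records are grouped by the bucket their partitioner assigned, and each bucket is ordered by record key, mirroring the comparator on `getRecordKey()`. The `Rec` type and the pre-assigned `bucket` field are hypothetical simplifications, and the grouping loop stands in for `repartitionAndSortWithinPartitions`. A check of this kind is one way the requested unit test could assert ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Spark-free model of "partition, then sort within each partition".
final class PartitionAndSortSketch {
  static final class Rec {
    final String recordKey;
    final int bucket; // bucket chosen by some partitioner (assumed to be precomputed)

    Rec(String recordKey, int bucket) {
      this.recordKey = recordKey;
      this.bucket = bucket;
    }
  }

  // Returns, per bucket, the record keys in ascending record-key order.
  static Map<Integer, List<String>> partitionAndSort(List<Rec> records) {
    Comparator<Rec> byRecordKey = Comparator.comparing(r -> r.recordKey);
    Map<Integer, List<Rec>> partitions = new TreeMap<>();
    for (Rec r : records) {
      partitions.computeIfAbsent(r.bucket, b -> new ArrayList<>()).add(r);
    }
    Map<Integer, List<String>> keysByBucket = new TreeMap<>();
    for (Map.Entry<Integer, List<Rec>> e : partitions.entrySet()) {
      e.getValue().sort(byRecordKey);
      List<String> keys = new ArrayList<>();
      for (Rec r : e.getValue()) {
        keys.add(r.recordKey);
      }
      keysByBucket.put(e.getKey(), keys);
    }
    return keysByBucket;
  }
}
```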
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2007159791

## CI report:
* b802619f011c1d9ef5b334ecf67ab7df74964e08 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22958)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2007022918

## CI report:
* bd71699ccef3e28be182c2cd5f8093b0cb507694 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22951)
* b802619f011c1d9ef5b334ecf67ab7df74964e08 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22958)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2007009003

## CI report:
* bd71699ccef3e28be182c2cd5f8093b0cb507694 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22951)
* b802619f011c1d9ef5b334ecf67ab7df74964e08 UNKNOWN
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2006227969

## CI report:
* bd71699ccef3e28be182c2cd5f8093b0cb507694 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22951)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
bhat-vinay commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1529876646

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java:
@@ -90,8 +94,11 @@ public class UpsertPartitioner extends SparkHoodiePartitioner {
   public UpsertPartitioner(WorkloadProfile profile, HoodieEngineContext context, HoodieTable table,

Review Comment:
Done. The changes were minimal, hence did not add it earlier.
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2006041219

## CI report:
* 5016a9c8d9daeea9f6f28f63cc090514482571a4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22941)
* bd71699ccef3e28be182c2cd5f8093b0cb507694 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22951)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2006019061

## CI report:
* 5016a9c8d9daeea9f6f28f63cc090514482571a4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22941)
* bd71699ccef3e28be182c2cd5f8093b0cb507694 UNKNOWN
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
rmahindra123 commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1529460065

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java:
@@ -90,8 +94,11 @@ public class UpsertPartitioner extends SparkHoodiePartitioner {
   public UpsertPartitioner(WorkloadProfile profile, HoodieEngineContext context, HoodieTable table,

Review Comment:
Should we add the implementation to a new class, maybe sortedUpsertPartitioner or something, so there is a clean separation? We can use the same config to control which one gets called.
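Editor's sketch of the idea behind a separate sort-preserving partitioner. Earlier in this thread the new partitioner is described as assigning a contiguous range of sorted input records to a single bucket, in contrast to hash-based mapping, which scatters neighbors. The function below is a hypothetical illustration that splits the sorted batch into equal-sized contiguous ranges; the equal-size split and all names are assumptions, and the real partitioner may size buckets differently (for example, from the workload profile).

```java
// Hypothetical illustration; not the UpsertSortPartitioner from the PR.
final class ContiguousRangeSketch {
  // Maps a record's position in the globally sorted batch to a bucket so that
  // every bucket receives one contiguous run of sorted records.
  static int bucketForSortedIndex(long sortedIndex, long totalRecords, int numBuckets) {
    if (numBuckets <= 0 || totalRecords <= 0 || sortedIndex < 0 || sortedIndex >= totalRecords) {
      throw new IllegalArgumentException("index, total and bucket count must be in range");
    }
    long perBucket = (totalRecords + numBuckets - 1) / numBuckets; // ceiling division
    return (int) (sortedIndex / perBucket);
  }

  public static void main(String[] args) {
    // Show the bucket chosen for each of 10 sorted positions spread over 3 buckets.
    for (long i = 0; i < 10; i++) {
      System.out.println("sorted index " + i + " -> bucket " + bucketForSortedIndex(i, 10, 3));
    }
  }
}
```

Because the bucket number never decreases as the sorted index grows, reading the buckets in order yields the batch in its original sorted order, which a hash of the record key would not guarantee.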
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2003610845

## CI report:
* 5016a9c8d9daeea9f6f28f63cc090514482571a4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22941)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2003452778

## CI report:
* f3c15a77a88d778d532dcc3fbed186441b3fa04c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22937)
* 5016a9c8d9daeea9f6f28f63cc090514482571a4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22941)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2003422824

## CI report:
* f3c15a77a88d778d532dcc3fbed186441b3fa04c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22937)
* 5016a9c8d9daeea9f6f28f63cc090514482571a4 UNKNOWN
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2003139899

## CI report:
* f3c15a77a88d778d532dcc3fbed186441b3fa04c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22937)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2003066457

## CI report:
* f3c15a77a88d778d532dcc3fbed186441b3fa04c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22937)
Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]
hudi-bot commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2003058078

## CI report:
* f3c15a77a88d778d532dcc3fbed186441b3fa04c UNKNOWN