[GitHub] [hudi] ssandona commented on issue #7032: [SUPPORT] When metatable enabled, some query using index column as filter will get empty result
ssandona commented on issue #7032: URL: https://github.com/apache/hudi/issues/7032#issuecomment-1288506888 We are observing the same behavior with Hudi 0.11.1. In our case we are filtering on a string column containing a timestamp like "202001110858". We obtain different results depending on whether "hoodie.enable.data.skipping" is enabled or disabled. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a diff in pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map
danny0405 commented on code in PR #6632: URL: https://github.com/apache/hudi/pull/6632#discussion_r1002934556 ## hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java: ## @@ -202,22 +199,19 @@ public R get(Object key) { @Override public R put(T key, R value) { +if (this.currentInMemoryMapSize >= maxInMemorySizeInBytes || inMemoryMap.size() % NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE == 0) { Review Comment: What is the purpose of the estimation pre-check `this.currentInMemoryMapSize >= maxInMemorySizeInBytes`? And why do we have this evaluation expression:
```java
this.estimatedPayloadSize = (long) (this.estimatedPayloadSize * 0.9
    + (keySizeEstimator.sizeEstimate(key) + valueSizeEstimator.sizeEstimate(value)) * 0.1)
```
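The quoted expression is an exponential moving average: each newly sampled record shifts the running payload-size estimate by only 10%, so a single outlier record cannot swing it. A minimal standalone sketch of that update rule (the class and method names here are illustrative, not Hudi's actual SizeEstimator API):

```java
public class PayloadSizeEma {

    private long estimatedPayloadSize;

    public PayloadSizeEma(long initialEstimate) {
        this.estimatedPayloadSize = initialEstimate;
    }

    // Blend 90% of the previous estimate with 10% of the latest sample,
    // mirroring: estimate = estimate * 0.9 + (keySize + valueSize) * 0.1
    public long update(long observedKeySize, long observedValueSize) {
        long sample = observedKeySize + observedValueSize;
        estimatedPayloadSize = (long) (estimatedPayloadSize * 0.9 + sample * 0.1);
        return estimatedPayloadSize;
    }

    public static void main(String[] args) {
        PayloadSizeEma ema = new PayloadSizeEma(100);
        // A 10x larger record moves the estimate only modestly: 100 * 0.9 + 1000 * 0.1 = 190
        System.out.println(ema.update(200, 800)); // prints 190
    }
}
```

Re-estimating only every N records (rather than on every put) keeps the potentially expensive size-estimation call off the hot path.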
[jira] [Created] (HUDI-5083) A bug occurs when the schema changes multiple times to a once existed column
shenshengli created HUDI-5083: - Summary: A bug occurs when the schema changes multiple times to a once existed column Key: HUDI-5083 URL: https://issues.apache.org/jira/browse/HUDI-5083 Project: Apache Hudi Issue Type: Bug Reporter: shenshengli -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] danny0405 commented on a diff in pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map
danny0405 commented on code in PR #6632: URL: https://github.com/apache/hudi/pull/6632#discussion_r1002932637 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java: ## @@ -521,11 +522,16 @@ private void writeToBuffer(HoodieRecord record) { * Checks if the number of records have reached the set threshold and then flushes the records to disk. */ private void flushToDiskIfRequired(HoodieRecord record) { +if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize) +|| numberOfRecords % NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE == 0) { + averageRecordSize = (long) (averageRecordSize * 0.8 + sizeEstimator.sizeEstimate(record) * 0.2); +} + // Append if max number of records reached to achieve block size if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize)) { // Recompute averageRecordSize before writing a new block and update existing value with // avg of new and old - LOG.info("AvgRecordSize => " + averageRecordSize); + LOG.info("Flush log block to disk, the current avgRecordSize => " + averageRecordSize); Review Comment: What's the problem here if we only estimate the record size on flushing?
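The change under review keeps averageRecordSize fresh with an 80/20 moving average, refreshed every NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE records (or when the flush threshold is reached), and flushes once the record count reaches maxBlockSize / averageRecordSize. A self-contained sketch of that flush policy (the class name and the sampling-interval constant value are illustrative, not the actual HoodieAppendHandle):

```java
public class BlockFlushPolicy {
    // Illustrative sampling interval (stands in for NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE).
    private static final int RECORDS_PER_ESTIMATE = 100;

    private final long maxBlockSize;
    private long averageRecordSize;
    private int numberOfRecords;

    public BlockFlushPolicy(long maxBlockSize, long initialAvgRecordSize) {
        this.maxBlockSize = maxBlockSize;
        this.averageRecordSize = initialAvgRecordSize;
    }

    /** Buffers one record; returns true when the block should be flushed to disk. */
    public boolean onRecord(long observedRecordSize) {
        // Refresh the average near the flush threshold or every RECORDS_PER_ESTIMATE
        // records: 80% previous estimate, 20% new sample.
        if (numberOfRecords >= maxBlockSize / averageRecordSize
                || numberOfRecords % RECORDS_PER_ESTIMATE == 0) {
            averageRecordSize = (long) (averageRecordSize * 0.8 + observedRecordSize * 0.2);
        }
        numberOfRecords++;
        // Flush once the estimated buffered bytes reach the target block size.
        if (numberOfRecords >= maxBlockSize / averageRecordSize) {
            numberOfRecords = 0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BlockFlushPolicy policy = new BlockFlushPolicy(1000, 100);
        for (int i = 1; i <= 10; i++) {
            if (policy.onRecord(100)) {
                System.out.println("flush after record " + i); // prints: flush after record 10
            }
        }
    }
}
```

With a 1000-byte block target and a stable 100-byte record size, the policy flushes after every tenth record; if records grow, the rising average lowers the flush threshold on the next block.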
[GitHub] [hudi] Zouxxyy commented on pull request #6999: [HUDI-5057] Fix msck repair hudi table
Zouxxyy commented on PR #6999: URL: https://github.com/apache/hudi/pull/6999#issuecomment-1288485908 @Zouxxyy Fixed all comments
[GitHub] [hudi] danny0405 commented on a diff in pull request #6976: [HUDI-5042]fix clustering schedule problem in flink
danny0405 commented on code in PR #6976: URL: https://github.com/apache/hudi/pull/6976#discussion_r1002916349 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java: ## @@ -171,7 +171,7 @@ public static HoodieWriteConfig getHoodieClientConfig( .withMergeAllowDuplicateOnInserts(OptionsResolver.insertClustering(conf)) .withClusteringConfig( HoodieClusteringConfig.newBuilder() - .withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED)) + .withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED) || conf.getBoolean(FlinkOptions.CLUSTERING_SCHEDULE_ENABLED)) Review Comment: ```java // mainly for clustering scheduling withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_SCHEDULE_ENABLED)) ``` Configuring it with the scheduling option alone should work.
[GitHub] [hudi] YannByron commented on issue #6931: SparkSQL create hudi DDL do not support hoodie.datasource.write.operation = 'insert'
YannByron commented on issue #6931: URL: https://github.com/apache/hudi/issues/6931#issuecomment-1288475156 @nsivabalan I think we can close this. The Spark SQL side of this issue has been explained by @Zouxxyy and @boneanxs, and @Zouxxyy has provided PR https://github.com/apache/hudi/pull/6949 describing how `hoodie.datasource.write.operation` and `hoodie.merge.allow.duplicate.on.inserts` work together. If there is still an issue on the Flink SQL side, it would be better to create a new issue to follow up.
[jira] [Assigned] (HUDI-5082) Improve the cdc log file name format
[ https://issues.apache.org/jira/browse/HUDI-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yann Byron reassigned HUDI-5082: Assignee: Yann Byron > Improve the cdc log file name format > > > Key: HUDI-5082 > URL: https://issues.apache.org/jira/browse/HUDI-5082 > Project: Apache Hudi > Issue Type: Improvement > Components: core >Reporter: Yann Byron >Assignee: Yann Byron >Priority: Major >
[jira] [Created] (HUDI-5082) Improve the cdc log file name format
Yann Byron created HUDI-5082: Summary: Improve the cdc log file name format Key: HUDI-5082 URL: https://issues.apache.org/jira/browse/HUDI-5082 Project: Apache Hudi Issue Type: Improvement Components: core Reporter: Yann Byron
[GitHub] [hudi] hudi-bot commented on pull request #7000: [HUDI-5060] Make all clean policies support incremental mode to find partition paths
hudi-bot commented on PR #7000: URL: https://github.com/apache/hudi/pull/7000#issuecomment-1288469554 ## CI report: * f0c09d506905d6e80f109b900e6e04bacffec4e6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12500) * f77806bcd4a38c2f4c1d44e970199d19bfc72737 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12511) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7000: [HUDI-5060] Make all clean policies support incremental mode to find partition paths
hudi-bot commented on PR #7000: URL: https://github.com/apache/hudi/pull/7000#issuecomment-1288464282 ## CI report: * f0c09d506905d6e80f109b900e6e04bacffec4e6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12500) * f77806bcd4a38c2f4c1d44e970199d19bfc72737 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6976: [HUDI-5042]fix clustering schedule problem in flink
hudi-bot commented on PR #6976: URL: https://github.com/apache/hudi/pull/6976#issuecomment-1288464150 ## CI report: * 89e792414d09daaa8a367ecc4011450cc21e7069 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12386) * c6954076cdb8818f7df54e297a9184c48c7217d0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12510)
[GitHub] [hudi] hudi-bot commented on pull request #6976: [HUDI-5042]fix clustering schedule problem in flink
hudi-bot commented on PR #6976: URL: https://github.com/apache/hudi/pull/6976#issuecomment-1288459087 ## CI report: * 89e792414d09daaa8a367ecc4011450cc21e7069 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12386) * c6954076cdb8818f7df54e297a9184c48c7217d0 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7018: [HUDI-5067] Merge the columns stats of multiple log blocks from the s…
hudi-bot commented on PR #7018: URL: https://github.com/apache/hudi/pull/7018#issuecomment-1288455437 ## CI report: * 2fd1d5ab5b34cdf0b5f9042e38efccd6b8091a60 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12501)
[GitHub] [hudi] hudi-bot commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto
hudi-bot commented on PR #6989: URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288420287 ## CI report: * 11d8108e89bc1de462978acbaee3905f9cb9edba Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12447) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12503)
[GitHub] [hudi] hudi-bot commented on pull request #7042: [MINOR] Improve the cdc log file name format
hudi-bot commented on PR #7042: URL: https://github.com/apache/hudi/pull/7042#issuecomment-1288417562 ## CI report: * 6ad211f90e9d94467ca6888e11bc28903b79ad15 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12509)
[GitHub] [hudi] hudi-bot commented on pull request #7003: [minor] add more test for rfc46
hudi-bot commented on PR #7003: URL: https://github.com/apache/hudi/pull/7003#issuecomment-1288417452 ## CI report: * 77ff687b1e0e945d6658ffe47992bd85484d78b2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12497) * 01c496501a59412c66df656a6d8801f1d2c45d6b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12508)
[GitHub] [hudi] hudi-bot commented on pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map
hudi-bot commented on PR #6632: URL: https://github.com/apache/hudi/pull/6632#issuecomment-1288417059 ## CI report: * d9e12ddf962b670b8ec1e2260d5389c688e16001 UNKNOWN * ba3513d5b65e39f7cbb71e851ddd34cfe9d846a0 UNKNOWN * 0836cbf5794ede5be427ef529cf7b660c2a6f4fa Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12480) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12490) * 8b7f94e6743c5f2decfeddaf164585a2a471a6c6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12507)
[GitHub] [hudi] hudi-bot commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency
hudi-bot commented on PR #5416: URL: https://github.com/apache/hudi/pull/5416#issuecomment-1288416366 ## CI report: * b838e1f406902c9bdfb5e84d53ef5a5effd0765b UNKNOWN * 6114ee2aa59f087e5ef0b1b53979eec143b33f5e UNKNOWN * 92760dbf5a047fe1f9941fa4b36c944eb3bec5c7 UNKNOWN * 0fc24d3af4181d8fb68d803b97be78e5cd448787 UNKNOWN * 298f66d2842b1fa3ce9c487fd3d0f94eda4bd2b1 UNKNOWN * a81ffdf9c24a3f2f984161ca193af3b387b1e9a1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12324) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12331) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12348) * e5c17f060235551dd9130e7bc7bbc33b294ebb18 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12506)
[GitHub] [hudi] hudi-bot commented on pull request #7042: [MINOR] Improve the cdc log file name format
hudi-bot commented on PR #7042: URL: https://github.com/apache/hudi/pull/7042#issuecomment-1288414199 ## CI report: * 6ad211f90e9d94467ca6888e11bc28903b79ad15 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7003: [minor] add more test for rfc46
hudi-bot commented on PR #7003: URL: https://github.com/apache/hudi/pull/7003#issuecomment-1288414073 ## CI report: * 77ff687b1e0e945d6658ffe47992bd85484d78b2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12497) * 01c496501a59412c66df656a6d8801f1d2c45d6b UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6946: [HUDI-5027] Improve getHBaseConnection Use Constants Replace HardCode.
hudi-bot commented on PR #6946: URL: https://github.com/apache/hudi/pull/6946#issuecomment-1288413958 ## CI report: * 86099181bd76a59cdd1b537eb724f6f51ed0c711 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12494)
[GitHub] [hudi] hudi-bot commented on pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map
hudi-bot commented on PR #6632: URL: https://github.com/apache/hudi/pull/6632#issuecomment-1288413659 ## CI report: * d9e12ddf962b670b8ec1e2260d5389c688e16001 UNKNOWN * ba3513d5b65e39f7cbb71e851ddd34cfe9d846a0 UNKNOWN * 0836cbf5794ede5be427ef529cf7b660c2a6f4fa Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12480) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12490) * 8b7f94e6743c5f2decfeddaf164585a2a471a6c6 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution
hudi-bot commented on PR #5830: URL: https://github.com/apache/hudi/pull/5830#issuecomment-1288413251 ## CI report: * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860)
[GitHub] [hudi] hudi-bot commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency
hudi-bot commented on PR #5416: URL: https://github.com/apache/hudi/pull/5416#issuecomment-1288413070 ## CI report: * b838e1f406902c9bdfb5e84d53ef5a5effd0765b UNKNOWN * 6114ee2aa59f087e5ef0b1b53979eec143b33f5e UNKNOWN * 92760dbf5a047fe1f9941fa4b36c944eb3bec5c7 UNKNOWN * 0fc24d3af4181d8fb68d803b97be78e5cd448787 UNKNOWN * 298f66d2842b1fa3ce9c487fd3d0f94eda4bd2b1 UNKNOWN * a81ffdf9c24a3f2f984161ca193af3b387b1e9a1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12324) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12331) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12348) * e5c17f060235551dd9130e7bc7bbc33b294ebb18 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7001: [HUDI-5061] bulk insert operation don't throw other exception except IOE Exception
hudi-bot commented on PR #7001: URL: https://github.com/apache/hudi/pull/7001#issuecomment-1288410932 ## CI report: * 03cb91b295d74d1fa7daf73592e53df76c84bc85 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12485) * 67282ced98d0531a1096bcc418c0126836d0fb51 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12504)
[GitHub] [hudi] hudi-bot commented on pull request #7000: [HUDI-5060] Make all clean policies support incremental mode to find partition paths
hudi-bot commented on PR #7000: URL: https://github.com/apache/hudi/pull/7000#issuecomment-1288410910 ## CI report: * f0c09d506905d6e80f109b900e6e04bacffec4e6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12500)
[GitHub] [hudi] hudi-bot commented on pull request #6946: [HUDI-5027] Improve getHBaseConnection Use Constants Replace HardCode.
hudi-bot commented on PR #6946: URL: https://github.com/apache/hudi/pull/6946#issuecomment-1288410810 ## CI report: * 86099181bd76a59cdd1b537eb724f6f51ed0c711 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution
hudi-bot commented on PR #5830: URL: https://github.com/apache/hudi/pull/5830#issuecomment-1288410026 ## CI report: * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860)
[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution
trushev commented on PR #5830: URL: https://github.com/apache/hudi/pull/5830#issuecomment-1288409306 @hudi-bot run azure
[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution
trushev commented on code in PR #5830: URL: https://github.com/apache/hudi/pull/5830#discussion_r1002875151 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/SchemaEvolutionContext.java: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.table.format.mor.MergeOnReadInputSplit;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This class is responsible for calculating the names and types of fields that are actual at a certain point in time.
+ * If a field was renamed in the queried schema, its old name, relevant at the provided time, will be returned.
+ * If the type of a field was changed, its old type will be returned, and a projection will be created to convert the old type to the queried one.
+ */
+public final class SchemaEvolutionContext implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private final HoodieTableMetaClient metaClient;
+  private final InternalSchema querySchema;
+
+  public static Option<SchemaEvolutionContext> of(Configuration conf) {
+    if (conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+      return new TableSchemaResolver(metaClient)
+          .getTableInternalSchemaFromCommitMetadata()
+          .map(schema -> new SchemaEvolutionContext(metaClient, schema));
+    } else {
+      return Option.empty();
+    }
+  }
+
+  public SchemaEvolutionContext(HoodieTableMetaClient metaClient, InternalSchema querySchema) {
+    this.metaClient = metaClient;
+    this.querySchema = querySchema;
+  }
+
+  public InternalSchema getQuerySchema() {
+    return querySchema;
+  }
+
+  public InternalSchema getActualSchema(FileInputSplit fileSplit) {
+    return getActualSchema(FSUtils.getCommitTime(fileSplit.getPath().getName()));
+  }
+
+  public InternalSchema getActualSchema(MergeOnReadInputSplit split) {
+    String commitTime = split.getBasePath()
+        .map(FSUtils::getCommitTime)
+        .orElse(split.getLatestCommit());
+    return getActualSchema(commitTime);
+  }
+
+  public List<String> getFieldNames(InternalSchema internalSchema) {
+    return internalSchema.columns().stream().map(Types.Field::name).collect(Collectors.toList());
+  }
+
+  public List<DataType> getFieldTypes(InternalSchema internalSchema) {
+    return AvroSchemaConverter.convertToDataType(

Review Comment:
   Fixed

## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/CastMap.java:
## @@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, e
[jira] [Updated] (HUDI-3303) CI tests Improvements
[ https://issues.apache.org/jira/browse/HUDI-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3303:
-----------------------------
    Epic Name: CI tests improvements  (was: CI Improvements)

> CI tests Improvements
> ---------------------
>
>                 Key: HUDI-3303
>                 URL: https://issues.apache.org/jira/browse/HUDI-3303
>             Project: Apache Hudi
>          Issue Type: Epic
>          Components: tests-ci
>            Reporter: Raymond Xu
>            Priority: Blocker
>
> Automate tests that need to be manually performed before releases.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Closed] (HUDI-5081) Resources clean-up in hudi-utilities tests
[ https://issues.apache.org/jira/browse/HUDI-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu closed HUDI-5081.
----------------------------
    Resolution: Fixed

> Resources clean-up in hudi-utilities tests
> ------------------------------------------
>
>                 Key: HUDI-5081
>                 URL: https://issues.apache.org/jira/browse/HUDI-5081
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: tests-ci
>            Reporter: Raymond Xu
>            Assignee: Timothy Brown
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.2
[hudi] branch master updated (0bc3eb8aab -> fa04e814cd)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 0bc3eb8aab [HUDI-4971] Remove direct use of kryo from `SerDeUtils` (#7014)
     add fa04e814cd [HUDI-5081] Tests clean up in hudi-utilities (#7033)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/utilities/TestHoodieIndexer.java   | 205 -
 .../deser/TestKafkaAvroSchemaDeserializer.java     |   3 +-
 .../utilities/sources/TestGcsEventsSource.java     |  13 --
 3 files changed, 79 insertions(+), 142 deletions(-)
[GitHub] [hudi] xushiyan merged pull request #7033: [HUDI-5081] Tests clean up in hudi-utilities
xushiyan merged PR #7033:
URL: https://github.com/apache/hudi/pull/7033

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on pull request #7033: [HUDI-5081] Tests clean up in hudi-utilities
xushiyan commented on PR #7033:
URL: https://github.com/apache/hudi/pull/7033#issuecomment-1288403021

   The CI failure is due to unrelated flakiness. This change only touches the utilities tests, which have passed.
[jira] [Updated] (HUDI-5081) Resources clean-up in hudi-utilities tests
[ https://issues.apache.org/jira/browse/HUDI-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5081:
---------------------------------
    Labels: pull-request-available  (was: )

> Resources clean-up in hudi-utilities tests
> ------------------------------------------
>
>                 Key: HUDI-5081
>                 URL: https://issues.apache.org/jira/browse/HUDI-5081
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: tests-ci
>            Reporter: Raymond Xu
>            Assignee: Timothy Brown
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.2
[GitHub] [hudi] xushiyan commented on a diff in pull request #7033: [HUDI-5081] Tests clean up in hudi-utilities
xushiyan commented on code in PR #7033:
URL: https://github.com/apache/hudi/pull/7033#discussion_r1002872498

## hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieIndexer.java:
## @@ -75,46 +68,29 @@
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;

-public class TestHoodieIndexer extends HoodieCommonTestHarness implements SparkProvider {
+public class TestHoodieIndexer extends SparkClientFunctionalTestHarness implements SparkProvider {

Review Comment:
   /nit SparkProvider already implemented by SparkClientFunctionalTestHarness
[jira] [Updated] (HUDI-5081) Resources clean-up in hudi-utilities tests
[ https://issues.apache.org/jira/browse/HUDI-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5081:
-----------------------------
    Fix Version/s: 0.12.2

> Resources clean-up in hudi-utilities tests
> ------------------------------------------
>
>                 Key: HUDI-5081
>                 URL: https://issues.apache.org/jira/browse/HUDI-5081
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: tests-ci
>            Reporter: Raymond Xu
>            Assignee: Timothy Brown
>            Priority: Major
>             Fix For: 0.12.2
[jira] [Created] (HUDI-5081) Resources clean-up in hudi-utilities tests
Raymond Xu created HUDI-5081:
--------------------------------

             Summary: Resources clean-up in hudi-utilities tests
                 Key: HUDI-5081
                 URL: https://issues.apache.org/jira/browse/HUDI-5081
             Project: Apache Hudi
          Issue Type: Task
          Components: tests-ci
            Reporter: Raymond Xu
            Assignee: Timothy Brown
[jira] [Updated] (HUDI-3303) CI tests Improvements
[ https://issues.apache.org/jira/browse/HUDI-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3303:
-----------------------------
    Summary: CI tests Improvements  (was: CI Improvements)

> CI tests Improvements
> ---------------------
>
>                 Key: HUDI-3303
>                 URL: https://issues.apache.org/jira/browse/HUDI-3303
>             Project: Apache Hudi
>          Issue Type: Epic
>          Components: tests-ci
>            Reporter: Raymond Xu
>            Priority: Blocker
>
> Automate tests that need to be manually performed before releases.
[GitHub] [hudi] YannByron commented on a diff in pull request #6999: [HUDI-5057] Fix msck repair hudi table
YannByron commented on code in PR #6999:
URL: https://github.com/apache/hudi/pull/6999#discussion_r1002869098

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/RepairHoodieTableCommand.scala:
## @@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi.command
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileSystem, Path, PathFilter}
+import org.apache.hadoop.mapred.{FileInputFormat, JobConf}
+
+import org.apache.hudi.common.table.HoodieTableConfig
+
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
+import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.execution.command.PartitionStatistics
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.util.{SerializableConfiguration, ThreadUtils}
+
+import java.util.concurrent.TimeUnit.MILLISECONDS
+
+import scala.util.control.NonFatal
+
+/**
+ * Command for repair hudi table's partitions.
+ * Use hoodieCatalogTable.getPartitionPaths() to get partitions instead of scanning the file system.
+ */
+case class RepairHoodieTableCommand(tableName: TableIdentifier,
+    enableAddPartitions: Boolean,
+    enableDropPartitions: Boolean,
+    cmd: String = "MSCK REPAIR TABLE") extends HoodieLeafRunnableCommand {
+
+  // These are list of statistics that can be collected quickly without requiring a scan of the data
+  // see https://github.com/apache/hive/blob/master/
+  // common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java
+  val NUM_FILES = "numFiles"
+  val TOTAL_SIZE = "totalSize"
+  val DDL_TIME = "transient_lastDdlTime"
+
+  private def getPathFilter(hadoopConf: Configuration): PathFilter = {
+    // Dummy jobconf to get to the pathFilter defined in configuration
+    // It's very expensive to create a JobConf(ClassUtil.findContainingJar() is slow)
+    val jobConf = new JobConf(hadoopConf, this.getClass)
+    val pathFilter = FileInputFormat.getInputPathFilter(jobConf)
+    new PathFilter {
+      override def accept(path: Path): Boolean = {
+        val name = path.getName
+        if (name != "_SUCCESS" && name != "_temporary" && !name.startsWith(".")) {
+          pathFilter == null || pathFilter.accept(path)
+        } else {
+          false
+        }
+      }
+    }
+  }
+
+  override def run(spark: SparkSession): Seq[Row] = {
+    val catalog = spark.sessionState.catalog
+    val table = catalog.getTableMetadata(tableName)
+    val tableIdentWithDB = table.identifier.quotedString
+    if (table.partitionColumnNames.isEmpty) {
+      throw new AnalysisException(
+        s"Operation not allowed: $cmd only works on partitioned tables: $tableIdentWithDB")
+    }
+
+    if (table.storage.locationUri.isEmpty) {
+      throw new AnalysisException(s"Operation not allowed: $cmd only works on table with " +
+        s"location provided: $tableIdentWithDB")
+    }
+
+    val root = new Path(table.location)
+    logInfo(s"Recover all the partitions in $root")
+
+    val hoodieCatalogTable = HoodieCatalogTable(spark, table.identifier)
+    val isHiveStyledPartitioning = hoodieCatalogTable.catalogProperties.
+      getOrElse(HoodieTableConfig.HIVE_STYLE_PARTITIONING_ENABLE.key, "true").equals("true")
+    val partitionSpecsAndLocs: Seq[(TablePartitionSpec, Path)] = hoodieCatalogTable.getPartitionPaths.map(partitionPath => {
+      var values = partitionPath.split('/')
+      if (isHiveStyledPartitioning) {
+        values = values.map(_.split('=')(1))
+      }
+      (table.partitionColumnNames.zip(values).toMap, new Path(root, partitionPath))
+    })
+
+    val droppedAmount = if (enableDropPartitions) {
+      dropPartitions(catalog, partitionSpecsAndLocs)
+    } else 0
+    val addedAmount = if (enableAddPartitions) {
+      val hadoopConf = spark.sessionState.newHadoopConf()
+      val fs = root.getFileSystem(hadoopConf)
+      val pathFilter = getPathFilter(hadoopConf)
+      val threshold = spark.sparkContext.conf
[GitHub] [hudi] YannByron commented on a diff in pull request #6999: [HUDI-5057] Fix msck repair hudi table
YannByron commented on code in PR #6999:
URL: https://github.com/apache/hudi/pull/6999#discussion_r1002867035

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/RepairHoodieTableCommand.scala:
## @@ -0,0 +1,221 @@
[diff context identical to the quote in the previous comment, up to this line]
+      getOrElse(HoodieTableConfig.HIVE_STYLE_PARTITIONING_ENABLE.key, "true").equals("true")

Review Comment:
   `toBoolean` can work.
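The partition-path handling quoted above (split the path on `/`, and strip the `column=` prefix from each segment when hive-style partitioning is enabled) can be sketched outside Spark as follows. The class and method names here are hypothetical illustrations, not Hudi or Spark API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mirroring the logic in RepairHoodieTableCommand quoted
// above: split a partition path into its values, stripping "key=" prefixes
// when hive-style partitioning is enabled,
// e.g. "dt=2022-10-06/hh=11" -> ["2022-10-06", "11"].
public class PartitionPathParser {

    public static List<String> parseValues(String partitionPath, boolean hiveStyle) {
        List<String> values = new ArrayList<>();
        for (String segment : partitionPath.split("/")) {
            // With hive-style partitioning each segment is "<column>=<value>";
            // otherwise the segment itself is the value.
            values.add(hiveStyle ? segment.split("=", 2)[1] : segment);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseValues("dt=2022-10-06/hh=11", true));   // [2022-10-06, 11]
        System.out.println(parseValues("2022-10-06/11", false));        // [2022-10-06, 11]
    }
}
```

The reviewer's point stands in this sketch too: comparing the property against `"true"` with `.equals` is what Scala's `.toBoolean` would replace.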
[GitHub] [hudi] xushiyan commented on pull request #7042: [MINOR] Improve the cdc log file name format
xushiyan commented on PR #7042:
URL: https://github.com/apache/hudi/pull/7042#issuecomment-1288385370

   @YannByron can you file a JIRA please?
[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002865594

## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java:
## @@ -99,10 +113,36 @@ public CopyOnWriteInputFormat(
   this.selectedFields = selectedFields;
   this.conf = new SerializableConfiguration(conf);
   this.utcTimestamp = utcTimestamp;
+  this.schemaEvolutionContext = SchemaEvolutionContext.of(flinkConf);
 }

 @Override
 public void open(FileInputSplit fileSplit) throws IOException {
+  String[] actualFieldNames;
+  DataType[] actualFieldTypes;
+  if (schemaEvolutionContext.isPresent()) {
+    SchemaEvolutionContext context = schemaEvolutionContext.get();
+    InternalSchema actualSchema = context.getActualSchema(fileSplit);

Review Comment:
   Moved this logic to separate method `setActualFields`
[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002864979

## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
## @@ -135,6 +139,12 @@
  */
 private boolean closed = true;

+private final Option<SchemaEvolutionContext> schemaEvolutionContext;
+private List<String> actualFieldNames;
+private List<DataType> actualFieldTypes;
+private InternalSchema actualSchema;
+private InternalSchema querySchema;

Review Comment:
   Fields `actualFieldNames`, `actualFieldTypes`, `actualSchema`, `querySchema` are removed
[GitHub] [hudi] YannByron commented on a diff in pull request #6999: [HUDI-5057] Fix msck repair hudi table
YannByron commented on code in PR #6999:
URL: https://github.com/apache/hudi/pull/6999#discussion_r1002864497

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestRepairTable.scala:
## @@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import org.apache.hudi.DataSourceWriteOptions.{PARTITIONPATH_FIELD, PRECOMBINE_FIELD, RECORDKEY_FIELD}
+import org.apache.hudi.HoodieSparkUtils
+import org.apache.hudi.common.table.HoodieTableConfig.HIVE_STYLE_PARTITIONING_ENABLE
+import org.apache.hudi.config.HoodieWriteConfig.TBL_NAME
+
+import org.apache.spark.sql.SaveMode
+
+class TestRepairTable extends HoodieSparkSqlTestBase {
+
+  test("Test msck repair non-partitioned table") {
+    Seq("true", "false").foreach { hiveStylePartitionEnable =>
+      withTempDir { tmp =>
+        val tableName = generateTableName
+        val basePath = s"${tmp.getCanonicalPath}/$tableName"
+        spark.sql(
+          s"""
+             | create table $tableName (
+             |  id int,
+             |  name string,
+             |  ts long,
+             |  dt string,
+             |  hh string
+             | ) using hudi
+             | location '$basePath'
+             | tblproperties (
+             |  primaryKey = 'id',
+             |  preCombineField = 'ts',
+             |  hoodie.datasource.write.hive_style_partitioning = '$hiveStylePartitionEnable'
+             | )
+       """.stripMargin)
+
+        checkExceptionContain(s"msck repair table $tableName")(
+          s"Operation not allowed")
+      }
+    }
+  }
+
+  test("Test msck repair partitioned table") {
+    Seq("true", "false").foreach { hiveStylePartitionEnable =>
+      withTempDir { tmp =>
+        val tableName = generateTableName
+        val basePath = s"${tmp.getCanonicalPath}/$tableName"
+        spark.sql(
+          s"""
+             | create table $tableName (
+             |  id int,
+             |  name string,
+             |  ts long,
+             |  dt string,
+             |  hh string
+             | ) using hudi
+             | partitioned by (dt, hh)
+             | location '$basePath'
+             | tblproperties (
+             |  primaryKey = 'id',
+             |  preCombineField = 'ts',
+             |  hoodie.datasource.write.hive_style_partitioning = '$hiveStylePartitionEnable'
+             | )
+       """.stripMargin)
+        val table = spark.sessionState.sqlParser.parseTableIdentifier(tableName)
+
+        import spark.implicits._
+        val df = Seq((1, "a1", 1000, "2022-10-06", "11"), (2, "a2", 1001, "2022-10-06", "12"))
+          .toDF("id", "name", "ts", "dt", "hh")
+        df.write.format("hudi")
+          .option(RECORDKEY_FIELD.key, "id")
+          .option(PRECOMBINE_FIELD.key, "ts")
+          .option(PARTITIONPATH_FIELD.key, "dt, hh")
+          .option(HIVE_STYLE_PARTITIONING_ENABLE.key, hiveStylePartitionEnable)
+          .mode(SaveMode.Append)
+          .save(basePath)
+
+        assertResult(Seq())(spark.sessionState.catalog.listPartitionNames(table))
+        spark.sql(s"msck repair table $tableName")
+        spark.sql(s"msck repair table $tableName")

Review Comment:
   Why execute this SQL twice?
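Running the repair statement twice in the test above presumably checks idempotency: a repair should add exactly the partitions that exist on storage but are missing from the catalog, so a second run finds nothing left to do. That set-difference logic can be sketched as follows (the class and method names are hypothetical, not the actual command's implementation):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the idempotency property exercised by the test:
// repairing means registering only the partitions present on storage but
// absent from the catalog, so a repeated repair is a no-op.
public class RepairSketch {

    public static List<String> partitionsToAdd(Set<String> onStorage, Set<String> inCatalog) {
        return onStorage.stream()
                .filter(p -> !inCatalog.contains(p))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> storage = new LinkedHashSet<>(List.of("dt=2022-10-06/hh=11", "dt=2022-10-06/hh=12"));
        Set<String> catalog = new LinkedHashSet<>();

        List<String> firstRun = partitionsToAdd(storage, catalog);
        catalog.addAll(firstRun);                      // first repair registers both partitions
        List<String> secondRun = partitionsToAdd(storage, catalog);

        System.out.println(firstRun);   // [dt=2022-10-06/hh=11, dt=2022-10-06/hh=12]
        System.out.println(secondRun);  // []
    }
}
```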
[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002864355

## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FormatUtils.java:
## @@ -130,6 +132,7 @@ public static HoodieMergedLogRecordScanner logScanner(
     .withBasePath(split.getTablePath())
     .withLogFilePaths(split.getLogPaths().get())
     .withReaderSchema(logSchema)
+    .withInternalSchema(internalSchema)
     .withLatestInstantTime(split.getLatestCommit())

Review Comment:
   Fixed
[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002864150

## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieUnMergedLogRecordScanner.java:
## @@ -135,10 +137,15 @@ public Builder withLogRecordScannerCallback(LogRecordScannerCallback callback) {
   return this;
 }

+public Builder withInternalSchema(InternalSchema internalSchema) {
+  this.internalSchema = internalSchema;
+  return this;

Review Comment:
   I reverted changes in `HoodieMergedLogRecordScanner`. Now there is only one schema -- `InternalSchema`, which wraps `org.apache.avro.Schema`. The same approach is used in `HoodieUnMergedLogRecordScanner`.
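The "single schema" approach described in the comment above — a fluent builder carrying one optional schema field, as in the quoted `withInternalSchema(...)` — can be illustrated with a minimal, self-contained sketch. These classes are stand-ins for illustration only, not Hudi's actual `HoodieUnMergedLogRecordScanner` or `InternalSchema` API:

```java
// Minimal illustrative sketch of a scanner builder holding one optional
// schema, mirroring the "single InternalSchema that wraps the Avro schema"
// idea from the review comment. All names here are hypothetical stand-ins.
public class ScannerSketch {

    static final class Schema {
        final String name;
        final boolean empty;
        Schema(String name, boolean empty) { this.name = name; this.empty = empty; }
        static Schema empty() { return new Schema("<empty>", true); }
    }

    private final Schema schema;

    private ScannerSketch(Schema schema) { this.schema = schema; }

    static final class Builder {
        // Default sentinel: no evolved schema was supplied by the caller.
        private Schema schema = Schema.empty();

        Builder withInternalSchema(Schema schema) {
            this.schema = schema;
            return this;  // fluent style, as in the quoted builder method
        }

        ScannerSketch build() { return new ScannerSketch(schema); }
    }

    boolean usesEvolvedSchema() { return !schema.empty; }

    public static void main(String[] args) {
        ScannerSketch plain = new Builder().build();
        ScannerSketch evolved = new Builder()
            .withInternalSchema(new Schema("tripsSchemaV2", false))
            .build();
        System.out.println(plain.usesEvolvedSchema());   // false
        System.out.println(evolved.usesEvolvedSchema()); // true
    }
}
```

The point of the sentinel is that downstream code checks a single field rather than juggling an Avro schema and an internal schema separately.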
[GitHub] [hudi] YannByron commented on a diff in pull request #6999: [HUDI-5057] Fix msck repair hudi table
YannByron commented on code in PR #6999:
URL: https://github.com/apache/hudi/pull/6999#discussion_r1002864134

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestRepairTable.scala:
[diff context identical to the quote in the previous comment, continuing one line further]
+        assertResult(Seq("dt=2022-10-06/hh=11", "dt=2022-10-06/hh=12"))(spark.sessionState.catalog.listPartitionNames(table))

Review Comment:
   nit: code format.
[GitHub] [hudi] YannByron commented on a diff in pull request #6999: [HUDI-5057] Fix msck repair hudi table
YannByron commented on code in PR #6999:
URL: https://github.com/apache/hudi/pull/6999#discussion_r1002863123

## hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/HoodieSpark2CatalystPlanUtils.scala:
## @@ -74,4 +74,15 @@
   override def getRelationTimeTravel(plan: LogicalPlan): Option[(LogicalPlan, Option[Expression], Option[String])] = {
     throw new IllegalStateException(s"Should not call getRelationTimeTravel for spark2")
   }
+
+  override def isRepairTable(plan: LogicalPlan): Boolean = {
+    plan.isInstanceOf[AlterTableRecoverPartitionsCommand]
+  }
+
+  override def getRepairTableChildren(plan: LogicalPlan): Option[(TableIdentifier, Boolean, Boolean, String)] = {
+    plan match {
+      case c: AlterTableRecoverPartitionsCommand =>
+        Some((c.tableName, true, false, c.cmd))

Review Comment:
   Please explain in a code comment why `true` is used as the 2nd parameter's default value and `false` as the 3rd.

## hudi-spark-datasource/hudi-spark3.1.x/src/main/scala/org/apache/spark/sql/HoodieSpark31CatalystPlanUtils.scala:
## @@ -31,4 +33,15 @@
   }

   override def projectOverSchema(schema: StructType, output: AttributeSet): ProjectionOverSchema = ProjectionOverSchema(schema)
+
+  override def isRepairTable(plan: LogicalPlan): Boolean = {
+    plan.isInstanceOf[AlterTableRecoverPartitionsCommand]
+  }
+
+  override def getRepairTableChildren(plan: LogicalPlan): Option[(TableIdentifier, Boolean, Boolean, String)] = {
+    plan match {
+      case c: AlterTableRecoverPartitionsCommand =>
+        Some((c.tableName, true, false, c.cmd))

Review Comment:
   ditto
[GitHub] [hudi] YannByron opened a new pull request, #7042: [MINOR] optimize the cdc log file name
YannByron opened a new pull request, #7042:
URL: https://github.com/apache/hudi/pull/7042

### Change Logs

Optimize the cdc log file name.

### Impact

none

### Risk level (write none, low medium or high below)

none

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_
- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #7001: [HUDI-5061] bulk insert operation don't throw other exception except IOE Exception
hudi-bot commented on PR #7001: URL: https://github.com/apache/hudi/pull/7001#issuecomment-1288371602 ## CI report: * 03cb91b295d74d1fa7daf73592e53df76c84bc85 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12485) * 67282ced98d0531a1096bcc418c0126836d0fb51 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liufangqi commented on pull request #7001: [HUDI-5061] bulk insert operation don't throw other exception except IOE Exception
liufangqi commented on PR #7001: URL: https://github.com/apache/hudi/pull/7001#issuecomment-1288368502 > rebased w/ latest master. @nsivabalan Thanks for the reminder; I have done the rebase and squash work. Please check it again when you are free. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto
hudi-bot commented on PR #6989: URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288368335 ## CI report: * 11d8108e89bc1de462978acbaee3905f9cb9edba Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12447) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12503) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] MihawkZoro commented on issue #7040: [SUPPORT] spark-sql schema_evolution
MihawkZoro commented on issue #7040: URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288368315 @xiarixiaoyao Thank you. When will the official Spark bundle jar with the fix be released? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto
xiarixiaoyao commented on PR #6989: URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288366080 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on issue #7040: [SUPPORT] spark-sql schema_evolution
xiarixiaoyao commented on issue #7040: URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288365585 @MihawkZoro Schema evolution for Hive and Presto (MOR table) can be found at https://github.com/apache/hudi/pull/6989 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7018: [HUDI-5067] Merge the columns stats of multiple log blocks from the s…
hudi-bot commented on PR #7018: URL: https://github.com/apache/hudi/pull/7018#issuecomment-1288365113 ## CI report: * 622fb9f5639ace8e15db0a778dfcb03b4c059ca8 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12422) * 2fd1d5ab5b34cdf0b5f9042e38efccd6b8091a60 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12501) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7020: [HUDI-5069] TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky
hudi-bot commented on PR #7020: URL: https://github.com/apache/hudi/pull/7020#issuecomment-1288365144 ## CI report: * 04d5ca0a11f5f8568f2d389dc4ec60468c04b596 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12488) * 23754c9d84c66721016d846ca6a20614626baa35 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12502) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7020: [HUDI-5069] TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky
hudi-bot commented on PR #7020: URL: https://github.com/apache/hudi/pull/7020#issuecomment-1288361616 ## CI report: * 04d5ca0a11f5f8568f2d389dc4ec60468c04b596 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12488) * 23754c9d84c66721016d846ca6a20614626baa35 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7018: [HUDI-5067] Merge the columns stats of multiple log blocks from the s…
hudi-bot commented on PR #7018: URL: https://github.com/apache/hudi/pull/7018#issuecomment-1288361577 ## CI report: * 622fb9f5639ace8e15db0a778dfcb03b4c059ca8 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12422) * 2fd1d5ab5b34cdf0b5f9042e38efccd6b8091a60 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on issue #7040: [SUPPORT] spark-sql schema_evolution
xiarixiaoyao commented on issue #7040: URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288359886 Already fixed locally; let me raise a PR.

    spark.sql("set hoodie.schema.on.read.enable=true")
    spark.sql("""create table ddl_test_t2 (
      | col1 string,
      | col2 string,
      | col3 string,
      | ts bigint
      |) using hudi
      |tblproperties (
      | type = 'mor',
      | primaryKey = 'col1',
      | preCombineField = 'ts'
      |)""".stripMargin)
    spark.sql("insert into ddl_test_t2 values('1','col2','col3',1),('2','col2','col3',2),('3','col2','col3',3)")
    spark.sql("""ALTER TABLE ddl_test_t2 DROP COLUMN col3""")
    spark.sql("ALTER TABLE ddl_test_t2 RENAME COLUMN col2 to col3")
    spark.sql("insert into ddl_test_t2 values('4','col2',4)")
    spark.sql("select col3 from ddl_test_t2").show(false)

    +----+
    |col3|
    +----+
    |col2|
    |col2|
    |col2|
    |col2|
    +----+

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] MihawkZoro commented on issue #7040: [SUPPORT] spark-sql schema_evolution
MihawkZoro commented on issue #7040: URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288357272 @xiarixiaoyao When will this bug be fixed? We are using this feature and it is urgent. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on issue #7040: [SUPPORT] spark-sql schema_evolution
xiarixiaoyao commented on issue #7040: URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288348142 `rewriteRecordWithNewSchema` fails to handle the rename; it should deal with the rename first. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
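The ordering problem described above — resolving each field against the new schema before consulting the rename mapping silently drops the renamed column's values — can be sketched in a few lines. This is a hypothetical illustration only: plain Python dicts stand in for Avro records, and `rename_map` is an assumed name, not Hudi's actual `rewriteRecordWithNewSchema` code.

```python
def rewrite_record(record, new_fields, rename_map):
    """Rewrite `record` against a new field list, resolving renames FIRST.

    rename_map maps new field name -> old field name. If the rename were
    not consulted before the lookup, a renamed column would resolve to
    None, which is the symptom reported in the issue above.
    """
    out = {}
    for field in new_fields:
        # Resolve the rename before looking the value up in the old record.
        source = rename_map.get(field, field)
        out[field] = record.get(source)
    return out


# Schema history from the repro: col3 dropped, then col2 renamed to col3.
old_record = {"col1": "1", "col2": "col2", "col3": "col3", "ts": 1}
new_fields = ["col1", "col3", "ts"]
rename_map = {"col3": "col2"}  # new name -> old name

print(rewrite_record(old_record, new_fields, rename_map))
# {'col1': '1', 'col3': 'col2', 'ts': 1}
```

Looking up `col3` directly in the old record would return the dropped column's value (or nothing, once it is gone), so applying the rename mapping first is what preserves the old `col2` data.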
[GitHub] [hudi] eric9204 commented on issue #6965: [SUPPORT]Data can be found in the latest partition of hudi table, but not in the historical partition.
eric9204 commented on issue #6965: URL: https://github.com/apache/hudi/issues/6965#issuecomment-1288338928 > @eric9204 I tried to reproduce this problem but failed. Could you please provide some dummy data? @xiarixiaoyao Sorry, I don't know what you mean. Do you need the Hoodie Parquet files on HDFS? Or like this: ![image](https://user-images.githubusercontent.com/90449228/197438688-3f7e87b0-55fc-4571-a117-a6c048f2cfc5.png) ![image](https://user-images.githubusercontent.com/90449228/197438831-c8db835d-f88a-4712-be1d-66119779cdc7.png) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xicm commented on pull request #7020: [HUDI-5069] TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky
xicm commented on PR #7020: URL: https://github.com/apache/hudi/pull/7020#issuecomment-1288335284 This test fails on my laptop but succeeds on Azure. I guess my laptop is too slow: the elapsed time reaches `INLINE_COMPACT_TIME_DELTA_SECONDS` quickly, so the compaction is executed earlier than the test expects. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
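The flakiness described above comes from compaction firing on either of two thresholds — delta-commit count or elapsed time — so a slow machine can cross the time threshold before accumulating the commit count the test expects to observe. A minimal sketch of such an OR-trigger (illustrative names only, not Hudi's actual scheduling code):

```python
def should_compact(num_delta_commits, elapsed_seconds,
                   max_delta_commits, time_delta_seconds):
    # Compaction fires when EITHER threshold is crossed, so on a slow
    # machine the elapsed-time condition can win before the commit-count
    # condition that the test was written around.
    return (num_delta_commits >= max_delta_commits
            or elapsed_seconds >= time_delta_seconds)


# Fast machine: the commit-count threshold triggers first.
assert should_compact(3, 10, 3, 60)
# Slow machine: only 2 commits so far, but the clock already passed the
# time delta, so compaction runs "earlier" than the test anticipated.
assert should_compact(2, 75, 3, 60)
# Neither threshold crossed yet: no compaction.
assert not should_compact(2, 10, 3, 60)
```

This is why time-based triggers tend to make tests environment-sensitive: the outcome depends on wall-clock speed, not just the sequence of operations.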
[GitHub] [hudi] xiarixiaoyao commented on issue #7040: [SUPPORT] spark-sql schema_evolution
xiarixiaoyao commented on issue #7040: URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288334741 @MihawkZoro Thank you for your test. This is really a bug: the final write `insert into ddl_test_t2 values('4','col2',4);` triggers it. I will fix this bug as soon as possible. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] KnightChess commented on a diff in pull request #6824: [HUDI-4946] fix merge into with no preCombineField has dup row by onl…
KnightChess commented on code in PR #6824: URL: https://github.com/apache/hudi/pull/6824#discussion_r1002828098 ## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala: ## @@ -160,7 +167,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie // column order changed after left anti join , we should keep column order of source dataframe val cols = removeMetaFields(sourceDF).columns - executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), parameters) + executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), writeParam) Review Comment: @YannByron sorry, will add these days -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] KnightChess commented on a diff in pull request #6824: [HUDI-4946] fix merge into with no preCombineField has dup row by onl…
KnightChess commented on code in PR #6824: URL: https://github.com/apache/hudi/pull/6824#discussion_r1002827833 ## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala: ## @@ -160,7 +167,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie // column order changed after left anti join , we should keep column order of source dataframe val cols = removeMetaFields(sourceDF).columns - executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), parameters) + executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), writeParam) Review Comment: @xushiyan I think `executeInsertOnly` and `executeUpsert` are different from the Hudi ops `insert` and `upsert`; they are just condition branches for the `merge into` SQL. As for the SQL semantics, I think `merge into` should only be used for the `upsert` op, and should not even follow the Hudi `precombineKey`, because the `merge into` SQL has a lot of flexibility to update the records we want. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto
hudi-bot commented on PR #6989: URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288312080 ## CI report: * 11d8108e89bc1de462978acbaee3905f9cb9edba Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12447) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7000: [HUDI-5060] Make all clean policies support incremental mode to find partition paths
hudi-bot commented on PR #7000: URL: https://github.com/apache/hudi/pull/7000#issuecomment-1288312127 ## CI report: * c9fe9314c82cc42ac497f47ccc8a53ae266beb55 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12462) * f0c09d506905d6e80f109b900e6e04bacffec4e6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12500) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex
hudi-bot commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1288311641 ## CI report: * a09114e1c326791e33e910b2f660aaa6882dcfc9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12493) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on issue #6965: [SUPPORT]Data can be found in the latest partition of hudi table, but not in the historical partition.
xiarixiaoyao commented on issue #6965: URL: https://github.com/apache/hudi/issues/6965#issuecomment-1288309523 @eric9204 I tried to reproduce this problem but failed. Could you please provide some dummy data? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7000: [HUDI-5060] Make all clean policies support incremental mode to find partition paths
hudi-bot commented on PR #7000: URL: https://github.com/apache/hudi/pull/7000#issuecomment-1288307439 ## CI report: * c9fe9314c82cc42ac497f47ccc8a53ae266beb55 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12462) * f0c09d506905d6e80f109b900e6e04bacffec4e6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto
hudi-bot commented on PR #6989: URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288307376 ## CI report: * 11d8108e89bc1de462978acbaee3905f9cb9edba Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12447) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex
hudi-bot commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1288306886 ## CI report: * Unknown: [CANCELED](TBD) * a09114e1c326791e33e910b2f660aaa6882dcfc9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto
xiarixiaoyao commented on PR #6989: URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288306078 @hudi-bot run azur -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YuweiXiao commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex
YuweiXiao commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1288289575 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7033: [MINOR] test cleanup
hudi-bot commented on PR #7033: URL: https://github.com/apache/hudi/pull/7033#issuecomment-1288232199 ## CI report: * c4dcf26eba06562edb428e668fccbc94ed48f07b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12499) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning
hudi-bot commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1288228417 ## CI report: * acd416d779132b9fd7a7b1fe58eaaeebcf1b821f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12498) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7003: [minor] add more test for rfc46
hudi-bot commented on PR #7003: URL: https://github.com/apache/hudi/pull/7003#issuecomment-1288211985 ## CI report: * 77ff687b1e0e945d6658ffe47992bd85484d78b2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12497) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6952: [HUDI-5035] Remove usage of deprecated HoodieTimer constructor
hudi-bot commented on PR #6952: URL: https://github.com/apache/hudi/pull/6952#issuecomment-1288211938 ## CI report: * 41b6c99a662d2361e5f351079dc06c61f507f791 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12495) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (HUDI-5056) Add support to DELETE_PARTITIONS w/ wild card
[ https://issues.apache.org/jira/browse/HUDI-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hussein Awala reassigned HUDI-5056: --- Assignee: Hussein Awala > Add support to DELETE_PARTITIONS w/ wild card > - > > Key: HUDI-5056 > URL: https://issues.apache.org/jira/browse/HUDI-5056 > Project: Apache Hudi > Issue Type: Improvement >Reporter: sivabalan narayanan >Assignee: Hussein Awala >Priority: Major > > As of now, DELETE_PARTITIONS expects a comma-separated list of partitions to > delete, but we would like to support wildcards with it. > For example, > year=2022/month=10/day=05/* > assuming hour-based partitioning > > Ref: https://github.com/apache/hudi/issues/6866 -- This message was sent by Atlassian Jira (v8.20.10#820010)
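The wildcard support requested in HUDI-5056 can be sketched as a glob match over the table's partition paths. This is a hypothetical illustration using Python's `fnmatch`, not Hudi's implementation; the function and variable names are assumptions:

```python
from fnmatch import fnmatch


def expand_partition_patterns(patterns, all_partitions):
    """Expand a comma-separated partition spec, treating entries that
    contain '*' as glob patterns against the table's partition paths."""
    selected = []
    for pattern in (p.strip() for p in patterns.split(",")):
        if "*" in pattern:
            # Glob entry: match it against every known partition path.
            selected.extend(p for p in all_partitions if fnmatch(p, pattern))
        else:
            # Plain entry: keep it as-is, like today's behavior.
            selected.append(pattern)
    return selected


partitions = [
    "year=2022/month=10/day=05/hour=00",
    "year=2022/month=10/day=05/hour=01",
    "year=2022/month=10/day=06/hour=00",
]
print(expand_partition_patterns("year=2022/month=10/day=05/*", partitions))
# ['year=2022/month=10/day=05/hour=00', 'year=2022/month=10/day=05/hour=01']
```

Plain entries pass through unchanged, so existing comma-separated usage keeps working while `year=2022/month=10/day=05/*` expands to every hourly partition under that day.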
[GitHub] [hudi] hudi-bot commented on pull request #7033: [MINOR] test cleanup
hudi-bot commented on PR #7033: URL: https://github.com/apache/hudi/pull/7033#issuecomment-1288200718 ## CI report: * 94bac61f150805e4ec82f3d5bf44b55699257534 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12471) * c4dcf26eba06562edb428e668fccbc94ed48f07b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12499) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7033: [MINOR] test cleanup
hudi-bot commented on PR #7033: URL: https://github.com/apache/hudi/pull/7033#issuecomment-1288199741 ## CI report: * 94bac61f150805e4ec82f3d5bf44b55699257534 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12471) * c4dcf26eba06562edb428e668fccbc94ed48f07b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6946: [HUDI-5027] Improve getHBaseConnection Use Constants Replace HardCode.
hudi-bot commented on PR #6946: URL: https://github.com/apache/hudi/pull/6946#issuecomment-1288198608 ## CI report: * 86099181bd76a59cdd1b537eb724f6f51ed0c711 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12494) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] the-other-tim-brown commented on a diff in pull request #7033: [MINOR] test cleanup
the-other-tim-brown commented on code in PR #7033: URL: https://github.com/apache/hudi/pull/7033#discussion_r1002767620 ## hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieIndexer.java: ## @@ -100,6 +101,16 @@ public void init() throws IOException { initMetaClient(); } + @AfterAll + public static void cleanup() { Review Comment: Yes, I've updated the code -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] ft-bazookanu commented on issue #6970: [SUPPORT] Performance of Snapshot Exporter
ft-bazookanu commented on issue #6970: URL: https://github.com/apache/hudi/issues/6970#issuecomment-1288194206 Please see https://hudi.apache.org/docs/snapshot_exporter/: partitioner configs are ignored when the output format is hudi. Moreover, we're using this as a backup and do not want to repartition. I feel my issue is orthogonal to partitioning: - why does performance _decrease_ when increasing memory/cores per executor? - why does performance saturate at 16 executors, although the table has far more than 16 partitions? - most of the time is spent exporting the contents of `.hoodie/`, which appears to happen serially (not in parallel). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning
hudi-bot commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1288185718 ## CI report: * 3300f7bdbf9d1cb178390d36523db2ec0279448c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12496) * acd416d779132b9fd7a7b1fe58eaaeebcf1b821f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12498) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning
hudi-bot commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1288184587

## CI report:

* 3300f7bdbf9d1cb178390d36523db2ec0279448c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12496)
* acd416d779132b9fd7a7b1fe58eaaeebcf1b821f UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6660: [MINOR] Skip loading last completed txn for single writer
hudi-bot commented on PR #6660: URL: https://github.com/apache/hudi/pull/6660#issuecomment-1288182996

## CI report:

* 542d9421e85f8b745780102e2201982763ed8db3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12492)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] xushiyan commented on a diff in pull request #6878: [HUDI-3397] Guard repeated rdd triggers
xushiyan commented on code in PR #6878: URL: https://github.com/apache/hudi/pull/6878#discussion_r1002756379

## hudi-common/src/main/java/org/apache/hudi/common/data/HoodieListData.java:

```java
@@ -148,6 +148,11 @@ public HoodieData distinctWithKey(SerializableFunction keyGetter, i
         .values();
   }

+  @Override
+  public int getNumPartitions() {
+    return 1;
```

Review Comment: Revert this unneeded change.

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java:

```java
@@ -176,7 +177,10 @@ public HoodieWriteMetadata> execute(HoodieData
     writeStatuses = mapPartitionsAsRDD(inputRecordsWithClusteringUpdate, partitioner);
     HoodieWriteMetadata> result = new HoodieWriteMetadata<>();
-    updateIndexAndCommitIfNeeded(writeStatuses, result);
+    // dereference rdd so that no double de-referencing can happen by mistake.
+    int numPartitions = Math.max(1, writeStatuses.getNumPartitions());
```

Review Comment: I don't think we need to guard it with a minimum of 1; the `getNumPartitions()` API should guarantee a meaningful return value.
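The "dereference once" comment above is about lazy evaluation: every action on an uncached RDD re-runs the pipeline, so the partition count should be captured into a local and reused. A minimal toy model of that behavior (plain Java, no Spark; `LazyPipeline` and its run counter are invented stand-ins for an uncached RDD, not Hudi's actual classes):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class DerefOnce {

    // Toy stand-in for a lazily evaluated RDD: every call to compute()
    // re-runs the (expensive) pipeline, like an action on an uncached RDD.
    static class LazyPipeline {
        final AtomicInteger runs = new AtomicInteger();
        final Supplier<int[]> work;

        LazyPipeline(Supplier<int[]> work) { this.work = work; }

        int[] compute() {
            runs.incrementAndGet(); // count how many times the pipeline executes
            return work.get();
        }

        int getNumPartitions() {
            return compute().length; // each call re-triggers the pipeline
        }
    }

    public static void main(String[] args) {
        // Anti-pattern: dereferencing twice triggers the pipeline twice.
        LazyPipeline doubleDeref = new LazyPipeline(() -> new int[]{10, 20, 30});
        int a = doubleDeref.getNumPartitions();
        int b = doubleDeref.getNumPartitions();
        System.out.println("runs-after-double-deref=" + doubleDeref.runs.get());

        // Pattern from the review: capture the value once, then reuse the int.
        LazyPipeline singleDeref = new LazyPipeline(() -> new int[]{10, 20, 30});
        int numPartitions = singleDeref.getNumPartitions();
        int reusedLater = numPartitions; // plain int copy, no second trigger
        System.out.println("runs-after-single-deref=" + singleDeref.runs.get());
    }
}
```

In real Spark code the same effect is achieved by capturing `rdd.getNumPartitions()` (or persisting the RDD) before any later use, so no code path can accidentally materialize the pipeline a second time.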
[GitHub] [hudi] hudi-bot commented on pull request #7003: [minor] add more test for rfc46
hudi-bot commented on PR #7003: URL: https://github.com/apache/hudi/pull/7003#issuecomment-1288172095

## CI report:

* 9671826a7dfc417a79ad00e4eb4feec09853acb2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12465)
* 77ff687b1e0e945d6658ffe47992bd85484d78b2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12497)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7003: [minor] add more test for rfc46
hudi-bot commented on PR #7003: URL: https://github.com/apache/hudi/pull/7003#issuecomment-1288171209

## CI report:

* 9671826a7dfc417a79ad00e4eb4feec09853acb2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12465)
* 77ff687b1e0e945d6658ffe47992bd85484d78b2 UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7021: [Minor] fix multi deser avro payload
hudi-bot commented on PR #7021: URL: https://github.com/apache/hudi/pull/7021#issuecomment-1288169004

## CI report:

* 359ee069037b3e252564ef71668d8453e1481267 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12466) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12478) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12491)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map
hudi-bot commented on PR #6632: URL: https://github.com/apache/hudi/pull/6632#issuecomment-1288168839

## CI report:

* d9e12ddf962b670b8ec1e2260d5389c688e16001 UNKNOWN
* ba3513d5b65e39f7cbb71e851ddd34cfe9d846a0 UNKNOWN
* 0836cbf5794ede5be427ef529cf7b660c2a6f4fa Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12480) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12490)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning
hudi-bot commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1288143002

## CI report:

* 3300f7bdbf9d1cb178390d36523db2ec0279448c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12496)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning
hudi-bot commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1288142113

## CI report:

* 3300f7bdbf9d1cb178390d36523db2ec0279448c UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build