[GitHub] [hudi] yyar commented on issue #7472: [SUPPORT] Too many metadata timeline file caused by old rollback active timeline
yyar commented on issue #7472: URL: https://github.com/apache/hudi/issues/7472#issuecomment-1367774930 Thanks, @yihua That's good news. I'll check it maybe next week. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] stayrascal opened a new pull request, #7584: [HUDI-5205] support flink 1.16.0
stayrascal opened a new pull request, #7584: URL: https://github.com/apache/hudi/pull/7584 ### Change Logs - support flink 1.16.0 - Based on [PR](https://github.com/apache/hudi/pull/7397) - copy the existing `adapters` from `hudi-flink1.15.x` to `hudi-flink1.16.x` - Add new adapters `StreamWriteOperatorCoordinatorAdapter` & `SortOperatorGenAdapter` in each flink module ### Impact Low ### Risk level (write none, low medium or high below) Low ### Documentation Update the official documents need to be updated ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua commented on issue #7562: [SUPPORT] How to Fire Async Compaction on Pyspark
yihua commented on issue #7562: URL: https://github.com/apache/hudi/issues/7562#issuecomment-1367767235 @soumilshah1995 Thanks for raising the question. `HoodieCompactor` is a Java class and the command-line arguments for spark-submit are parsed using JCommander, which is not compatible with PySpark. One way to get around this is to call Java class in Python like [this](https://stackoverflow.com/questions/33544105/running-custom-java-class-in-pyspark), but then you have to construct `HoodieCompactor.Config` yourself to pass in relevant args. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua commented on a diff in pull request #7536: [HUDI-5455] Add commons-configuration2 in hudi cli bundle
yihua commented on code in PR #7536: URL: https://github.com/apache/hudi/pull/7536#discussion_r1059263208 ## packaging/hudi-cli-bundle/pom.xml: ## @@ -239,5 +241,11 @@ httpclient ${http.version} + + org.apache.commons + commons-configuration2 Review Comment: This is missing based on @rahil-c 's testing. Without this, hudi-cli-bundle will fail with Spark 3.2 + Hadoop 3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode
hudi-bot commented on PR #4966: URL: https://github.com/apache/hudi/pull/4966#issuecomment-1367755118 ## CI report: * 24ea27ad2bc29400d8e5271f8f683662d0e0a93b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14048) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488]Make sure Discrupt queue start first, then insert records
hudi-bot commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1367754215 ## CI report: * 424e4a25b477f8aab3f8b4e5590023d20cca98f3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14049) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-5420) Fix metadata table validator to exclude uncommitted log files in successful deltacommits
[ https://issues.apache.org/jira/browse/HUDI-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-5420. --- Resolution: Fixed > Fix metadata table validator to exclude uncommitted log files in successful > deltacommits > > > Key: HUDI-5420 > URL: https://issues.apache.org/jira/browse/HUDI-5420 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > When a write transaction writes uncommitted log files in a delta commit, > e.g., due to Spark task retries, these log files stay in the file system > after the successful delta commit for some time (unlike uncommitted base > files which are deleted based on the markers). The delta commit metadata > does not contain these log files, and the metadata table does not contain > these entries either. Currently, the metadata table validator does not > consider such valid case for discrepancy and thus throws errors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5434) Fix archival in MDT to not rely on rollbacks/clean in DT
[ https://issues.apache.org/jira/browse/HUDI-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5434: Status: Patch Available (was: In Progress) > Fix archival in MDT to not rely on rollbacks/clean in DT > > > Key: HUDI-5434 > URL: https://issues.apache.org/jira/browse/HUDI-5434 > Project: Apache Hudi > Issue Type: Bug > Components: metadata > Reporter: sivabalan narayanan > Assignee: Ethan Guo > Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > As of now, archival in MDT is guarded by the first entry in DT's active timeline. But DT could contain a rollback that dates back a few days or even weeks. So we need to fix that to check for the first write action in DT (commit, delta commit, replace commit) and then guard MDT archival based on that. > > Impact: > This could result in a huge number of entries in the active timeline in MDT, which might hamper performance or cause throttling in cloud stores.
[GitHub] [hudi] boneanxs commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
boneanxs commented on PR #7571: URL: https://github.com/apache/hudi/pull/7571#issuecomment-1367749797 The test failure is caused by https://github.com/apache/hudi/pull/7582 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope opened a new issue, #7583: [SUPPORT] Unable to query Partitioned COW Hudi tables with metadata enabled using Trino-Hudi Connector
codope opened a new issue, #7583: URL: https://github.com/apache/hudi/issues/7583 **Describe the problem you faced** Original issue: https://github.com/trinodb/trino/issues/15368 > Our team is testing the same on COPY ON WRITE HUDI (0.10.1) tables with metadata enabled at version using Trino 400. And we are facing the error while reading from partitioned tables. > `Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex`. The issue was resolved by placing some dependencies in the classpath. Interestingly, those dependencies are [already included in the trino-hudi-bundle](https://github.com/apache/hudi/blob/release-0.12.1/packaging/hudi-trino-bundle/pom.xml#L69-L98). This issue tracks any gap in packaging. **To Reproduce** Steps to reproduce the behavior: 1. Write a Hudi COW table with the below properties and metadata enabled. 2. Query the same table using the trino-hudi connector (properties mentioned below) with `hudi.metadata-enabled=true`. 
**Trino Hudi Connector Properties:**
```
connector.name=hudi
hive.metastore.uri={METASTORE_URI}
hive.s3.iam-role={S3_IAM_ROLE}
hive.metastore-refresh-interval=2m
hive.metastore-timeout=3m
hudi.max-outstanding-splits=1800
hive.s3.max-error-retries=50
hive.s3.connect-timeout=1m
hive.s3.socket-timeout=2m
hudi.parquet.use-column-names=true
hudi.metadata-enabled=true
```
**Hudi Properties set while writing:**
```
hoodie.datasource.write.partitionpath.field = "insert_ds_ist",
hoodie.datasource.write.recordkey.field = "id",
hoodie.datasource.write.precombine.field = "_hoodie_incremental_key", (self generated column),
hoodie.datasource.write.hive_style_partitioning = "true",
hoodie.datasource.hive_sync.auto_create_database = "true",
hoodie.parquet.compression.codec = "gzip",
hoodie.table.name = "",
hoodie.datasource.write.keygenerator.class = "org.apache.hudi.keygen.SimpleKeyGenerator",
hoodie.datasource.write.table.type = "COPY_ON_WRITE",
hoodie.metadata.enable = "true",
hoodie.datasource.hive_sync.enable = "true",
hoodie.datasource.hive_sync.partition_fields = "insert_ds_ist",
hoodie.datasource.hive_sync.partition_extractor_class = "org.apache.hudi.hive.MultiPartKeysValueExtractor"
```
**General information of table:** Total rows = 1,213,959,199 Total Partitions = 2400+ Total file objects = 120,000 Total Size on S3 = 12~13 GB The table was upgraded from 0.9.0 to 0.10.1 **Coordinator Relevant Logs:** **Expected behavior** The query should work out-of-the-box without having to place jars in the classpath. **Environment Description** * Hudi version : 0.10.1 * Spark version : 2.4 * Trino version : [400](https://github.com/trinodb/trino/tree/400) * Hadoop version : * Storage (HDFS/S3/GCS..) : * Running on Docker? (yes/no) : no
**Stacktrace** Full stacktrace in [Partitioned_COW_Hudi_Coordinator_logs.log](https://github.com/apache/hudi/files/10323254/Partitioned_COW_Hudi_Coordinator_logs.log)
[GitHub] [hudi] hudi-bot commented on pull request #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table
hudi-bot commented on PR #7580: URL: https://github.com/apache/hudi/pull/7580#issuecomment-1367736903 ## CI report: * df101606342f8b91be6cc232d99d7009c4577ed9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14045) * 363e7ec434dfac617a963387e65ffa1aa4b8308b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14050) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table
hudi-bot commented on PR #7580: URL: https://github.com/apache/hudi/pull/7580#issuecomment-1367735720 ## CI report: * df101606342f8b91be6cc232d99d7009c4577ed9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14045) * 363e7ec434dfac617a963387e65ffa1aa4b8308b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
hudi-bot commented on PR #7571: URL: https://github.com/apache/hudi/pull/7571#issuecomment-1367734589 ## CI report: * 4de5c804a29ff11796ccae4308cbb2ce86def8e2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14026) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14047) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367734470 ## CI report: * 33b17128e551a134dd8287d5f1a660f50b561848 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14034) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14038) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14046) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488]Make sure Discrupt queue start first, then insert records
hudi-bot commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1367714336 ## CI report: * 424e4a25b477f8aab3f8b4e5590023d20cca98f3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14049) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488]Make sure Discrupt queue start first, then insert records
hudi-bot commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1367712487 ## CI report: * 424e4a25b477f8aab3f8b4e5590023d20cca98f3 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5488) Make sure Discrupt queue start first, then insert records
[ https://issues.apache.org/jira/browse/HUDI-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui An updated HUDI-5488: - Description: We must make sure the Disruptor queue is set up first, so that the producer can then insert records into the queue. But currently there is no guarantee which thread starts first, so this PR tries to fix it. CompletableFuture consuming = startConsumingAsync(); CompletableFuture producing = startProducingAsync(); Also, I think the failures of the tests TestDisruptorExecutionInSpark#testExecutor and TestDisruptorMessageQueue#testRecordReading relate to this bug. > Make sure Discrupt queue start first, then insert records > - > > Key: HUDI-5488 > URL: https://issues.apache.org/jira/browse/HUDI-5488 > Project: Apache Hudi > Issue Type: Bug > Components: core > Reporter: Hui An > Priority: Major > Labels: pull-request-available > > We must make sure the Disruptor queue is set up first, so that the producer can then insert records into the queue. But currently there is no guarantee which thread starts first, so this PR tries to fix it. > CompletableFuture consuming = startConsumingAsync(); > CompletableFuture producing = startProducingAsync(); > Also, I think the failures of the tests TestDisruptorExecutionInSpark#testExecutor and TestDisruptorMessageQueue#testRecordReading relate to this bug.
[GitHub] [hudi] boneanxs commented on pull request #7582: [HUDI-5488]Make sure Discrupt queue start first, then insert records
boneanxs commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1367711026 @alexeykudinkin @zhangyue19921010 Could you please help to take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5488) Make sure Discrupt queue start first, then insert records
[ https://issues.apache.org/jira/browse/HUDI-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5488: - Labels: pull-request-available (was: ) > Make sure Discrupt queue start first, then insert records > - > > Key: HUDI-5488 > URL: https://issues.apache.org/jira/browse/HUDI-5488 > Project: Apache Hudi > Issue Type: Bug > Components: core >Reporter: Hui An >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] boneanxs opened a new pull request, #7582: [HUDI-5488]Make sure Discrupt queue start first, then insert records
boneanxs opened a new pull request, #7582: URL: https://github.com/apache/hudi/pull/7582 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ We must make sure the Disruptor queue is set up first, so that the producer can then insert records into the queue. But currently there is no guarantee which thread starts first, so this PR tries to fix it. ```java CompletableFuture consuming = startConsumingAsync(); CompletableFuture producing = startProducingAsync(); ``` Also, I think the failures of the tests `TestDisruptorExecutionInSpark#testExecutor` and `TestDisruptorMessageQueue#testRecordReading` relate to this bug. https://user-images.githubusercontent.com/10115332/210033047-7b3573ec-c43b-44b3-a898-c4269b6bfd14.png ### Impact _Describe any public API or user-facing feature change or any performance impact._ none ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ none ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
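The start-order problem described in this pull request can be illustrated with a small, self-contained sketch. It is not the change made in PR #7582 and does not use the Disruptor library; the class `StartOrderDemo`, the event names, and the use of a `CountDownLatch` are all assumptions chosen for illustration. The producer task is submitted first on purpose, and the latch still forces it to wait until the consumer side reports that the queue is ready.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StartOrderDemo {

    // Runs a producer and a consumer task concurrently and returns the order
    // in which the two hypothetical events were recorded.
    static List<String> run() throws Exception {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch queueReady = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Submitted first, but blocks until the queue has been started.
            CompletableFuture<Void> producing = CompletableFuture.runAsync(() -> {
                try {
                    queueReady.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
                events.add("record-inserted");
            }, pool);

            // Stand-in for starting the queue; signals readiness afterwards.
            CompletableFuture<Void> consuming = CompletableFuture.runAsync(() -> {
                events.add("queue-started");
                queueReady.countDown();
            }, pool);

            CompletableFuture.allOf(consuming, producing).get();
            return events;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints [queue-started, record-inserted]
    }
}
```

Without the latch, the two `runAsync` calls give no ordering guarantee, which matches the situation the description calls "no idea which thread starts first".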
[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client
hudi-bot commented on PR #7561: URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367708973 ## CI report: * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14043) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-5488) Make sure Discrupt queue start first, then insert records
Hui An created HUDI-5488: Summary: Make sure Discrupt queue start first, then insert records Key: HUDI-5488 URL: https://issues.apache.org/jira/browse/HUDI-5488 Project: Apache Hudi Issue Type: Bug Components: core Reporter: Hui An -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] danny0405 commented on a diff in pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client
danny0405 commented on code in PR #7561: URL: https://github.com/apache/hudi/pull/7561#discussion_r1059228760 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineUtils.java: ## @@ -210,11 +210,30 @@ public static HoodieDefaultTimeline getTimeline(HoodieTableMetaClient metaClient return activeTimeline; } + /** + * Returns a Hudi timeline with commits after the given instant time (exclusive). + * + * @param metaClient {@link HoodieTableMetaClient} instance. + * @param exclusiveStartInstantTime Start instant time (exclusive). + * @return Hudi timeline. + */ + public static HoodieTimeline getCommitsTimelineAfter( + HoodieTableMetaClient metaClient, String exclusiveStartInstantTime) { +HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline(); +HoodieDefaultTimeline timeline = +activeTimeline.isBeforeTimelineStarts(exclusiveStartInstantTime) +? metaClient.getArchivedTimeline(exclusiveStartInstantTime) +.mergeTimeline(activeTimeline) +: activeTimeline; +return timeline.getCommitsTimeline() +.findInstantsAfter(exclusiveStartInstantTime, Integer.MAX_VALUE); + } Review Comment: I mean that if `activeTimeline.isBeforeTimelineStarts(exclusiveStartInstantTime)` is true, the whole merged timeline should be scanned, so there is no need to call `#findInstantsAfter`.
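To make the "(exclusive)" wording in the Javadoc above concrete, here is a minimal plain-Java model of an exclusive lower bound over sorted instant times. It does not call any Hudi API: `InstantsAfterDemo`, `instantsAfter`, and the timestamp values are hypothetical, and it assumes instant times are fixed-width timestamp strings so that string comparison matches chronological order.

```java
import java.util.List;
import java.util.stream.Collectors;

public class InstantsAfterDemo {

    // Hypothetical stand-in for an exclusive lower bound: keeps only the
    // instants that are strictly greater than the given start instant.
    static List<String> instantsAfter(List<String> sortedInstants, String exclusiveStart) {
        return sortedInstants.stream()
            .filter(ts -> ts.compareTo(exclusiveStart) > 0)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up instant times in yyyyMMddHHmmss form.
        List<String> merged = List.of("20221228093000", "20221229101500", "20221230120000");
        // The start instant itself is excluded from the result.
        System.out.println(instantsAfter(merged, "20221229101500")); // prints [20221230120000]
    }
}
```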
[GitHub] [hudi] danny0405 commented on a diff in pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
danny0405 commented on code in PR #7571: URL: https://github.com/apache/hudi/pull/7571#discussion_r1059227583 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/index/bloom/TestKeyRangeLookupTree.java: ## @@ -68,7 +68,7 @@ public void testFileGroupLookUpManyEntriesWithSameStartValue() { updateExpectedMatchesToTest(toInsert); keyRangeLookupTree.insert(toInsert); for (int i = 0; i < 10; i++) { - endKey += 1 + RANDOM.nextInt(100); + endKey += 1 + RANDOM.nextInt(50); toInsert = new KeyRangeNode(startKey, Long.toString(endKey), UUID.randomUUID().toString()); updateExpectedMatchesToTest(toInsert); Review Comment: Yeah, the fix works, it is better if we can fix the record key comparing with Long instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode
hudi-bot commented on PR #4966: URL: https://github.com/apache/hudi/pull/4966#issuecomment-1367694657 ## CI report: * 24ea27ad2bc29400d8e5271f8f683662d0e0a93b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14048) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode
hudi-bot commented on PR #4966: URL: https://github.com/apache/hudi/pull/4966#issuecomment-1367693592 ## CI report: * 24ea27ad2bc29400d8e5271f8f683662d0e0a93b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
hudi-bot commented on PR #7571: URL: https://github.com/apache/hudi/pull/7571#issuecomment-1367691314 ## CI report: * 4de5c804a29ff11796ccae4308cbb2ce86def8e2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14026) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14047) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client
hudi-bot commented on PR #7561: URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367691272 ## CI report: * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14043) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] boneanxs commented on a diff in pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
boneanxs commented on code in PR #7571: URL: https://github.com/apache/hudi/pull/7571#discussion_r1059217632 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/index/bloom/TestKeyRangeLookupTree.java: ## @@ -68,7 +68,7 @@ public void testFileGroupLookUpManyEntriesWithSameStartValue() { updateExpectedMatchesToTest(toInsert); keyRangeLookupTree.insert(toInsert); for (int i = 0; i < 10; i++) { - endKey += 1 + RANDOM.nextInt(100); + endKey += 1 + RANDOM.nextInt(50); toInsert = new KeyRangeNode(startKey, Long.toString(endKey), UUID.randomUUID().toString()); updateExpectedMatchesToTest(toInsert); Review Comment: Since `KeyRangeNode` stores recordValue, which is always a string value, `KeyRangeNode` doesn't need to compare against other types. I think the test intends to use Long ordering to stand in for string ordering when exercising `KeyRangeNode`, which works as long as `endKey` stays below 1000. Before the fix, the maximum `endKey` could be 100 * 10 + 250 = 1250, which can exceed 1000. Now each iteration adds at most 50, and there are at most 10 iterations, so `endKey` cannot exceed 50 * 10 + 250 = 750, which is smaller than 1000; in this range, Long ordering is the same as string ordering.
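The ordering argument in this comment can be checked with a short standalone sketch. It does not use `KeyRangeNode` or any Hudi class; `KeyOrderDemo` and `sameOrder` are hypothetical names, and the starting value 250 is taken from the arithmetic in the comment. The sketch compares two numbers both as longs and as their decimal strings and reports whether the two orderings agree.

```java
public class KeyOrderDemo {

    // Returns true when comparing the decimal strings of a and b gives the
    // same ordering as comparing the numeric values.
    static boolean sameOrder(long a, long b) {
        int numeric = Long.compare(a, b);
        int lexical = Long.toString(a).compareTo(Long.toString(b));
        return Integer.signum(numeric) == Integer.signum(lexical);
    }

    public static void main(String[] args) {
        // Both values have three digits, so string order matches numeric order.
        System.out.println(sameOrder(250, 750));  // prints true
        // "1250" starts with '1', so it sorts before "250" as a string.
        System.out.println(sameOrder(250, 1250)); // prints false
    }
}
```

Keeping every key in the three-digit range 250 to 750 means all strings have the same length, which is why the lexicographic and numeric orders coincide there.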
[GitHub] [hudi] boneanxs commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
boneanxs commented on PR #7571: URL: https://github.com/apache/hudi/pull/7571#issuecomment-1367689666 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client
hudi-bot commented on PR #7561: URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367689402 ## CI report: * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367689266 ## CI report: * 33b17128e551a134dd8287d5f1a660f50b561848 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14034) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14038) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14046) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] boneanxs commented on a diff in pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
boneanxs commented on code in PR #7571: URL: https://github.com/apache/hudi/pull/7571#discussion_r1059217632 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/index/bloom/TestKeyRangeLookupTree.java: ## @@ -68,7 +68,7 @@ public void testFileGroupLookUpManyEntriesWithSameStartValue() { updateExpectedMatchesToTest(toInsert); keyRangeLookupTree.insert(toInsert); for (int i = 0; i < 10; i++) { - endKey += 1 + RANDOM.nextInt(100); + endKey += 1 + RANDOM.nextInt(50); toInsert = new KeyRangeNode(startKey, Long.toString(endKey), UUID.randomUUID().toString()); updateExpectedMatchesToTest(toInsert); Review Comment: `KeyRangeNode` stores the record value, which is always a string, so `KeyRangeNode` never needs to compare against any other type. I think the test uses the ordering of `Long` values to stand in for the ordering of their string forms when exercising `KeyRangeNode`, which only works if `endKey` stays below 1000. Before the fix, `endKey` could grow to 100 * 10 + 250 = 1250, which exceeds 1000. With the fix, `RANDOM` adds at most 50 per iteration and there are at most 10 iterations, so `endKey` cannot exceed 50 * 10 + 250 = 750, which is below 1000; in that range the `Long` order is the same as the string order.
[GitHub] [hudi] xicm commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field
xicm commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367685026 @hudi-bot run azure
[GitHub] [hudi] SteNicholas commented on pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
SteNicholas commented on PR #7568: URL: https://github.com/apache/hudi/pull/7568#issuecomment-1367682208 @yihua, could you please review this pull request? @leesf has approved these changes.
[GitHub] [hudi] hudi-bot commented on pull request #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table
hudi-bot commented on PR #7580: URL: https://github.com/apache/hudi/pull/7580#issuecomment-1367669253 ## CI report: * df101606342f8b91be6cc232d99d7009c4577ed9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14045) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client
hudi-bot commented on PR #7561: URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367638148 ## CI report: * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14043) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua merged pull request #7581: [MINOR][BLOG] - 2022 Blog post
yihua merged PR #7581: URL: https://github.com/apache/hudi/pull/7581
[hudi] branch asf-site updated: [DOCS][BLOG] 2022 Blog post (#7581)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 4880154bb11 [DOCS][BLOG] 2022 Blog post (#7581) 4880154bb11 is described below commit 4880154bb1152353acbcc51b6390176e6d1e926b Author: Kyle Weller AuthorDate: Thu Dec 29 15:45:29 2022 -0700 [DOCS][BLOG] 2022 Blog post (#7581) --- ...2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md | 89 + .../assets/images/blog/Apache-Hudi-2022-Review.png | Bin 0 -> 664778 bytes .../assets/images/blog/Apache-Hudi-Conferences.png | Bin 0 -> 6480488 bytes .../blog/Apache-Hudi-Pull-Request-History.png | Bin 0 -> 296199 bytes 4 files changed, 89 insertions(+) diff --git a/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md b/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md new file mode 100644 index 000..82246324766 --- /dev/null +++ b/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md @@ -0,0 +1,89 @@ +--- +title: "Apache Hudi 2022 - A year in Review" +excerpt: "2022 was the best year for Apache Hudi yet! Huge thank you to everyone who contributed!" +author: Sivabalan Narayanan +category: blog +image: /assets/images/blog/Apache-Hudi-2022-Review.png +tags: +- apache hudi +--- + + + +## Apache Hudi Momentum +As we wrap up 2022 I want to take the opportunity to reflect on and highlight the incredible progress of the Apache Hudi +project and most importantly, the community. First and foremost, I want to thank all of the contributors who have made +2022 the best year for the project ever. There were [over 2,200 PRs](https://ossinsight.io/analyze/apache/hudi#pull-requests) +created (+38% YoY) and over 600+ users engaged on [Github](https://github.com/apache/hudi/). 
The Apache Hudi community +[slack channel](https://join.slack.com/t/apache-hudi/shared_invite/zt-1e94d3xro-JvlNO1kSeIHJBTVfLPlI5w) has grown to more +than 2,600 users (+100% YoY growth) averaging nearly 200 messages per month! The most impressive stat is that with this +volume growth, the median response time to questions is ~3h. [Come join the community](https://join.slack.com/t/apache-hudi/shared_invite/zt-1e94d3xro-JvlNO1kSeIHJBTVfLPlI5w) +where people are sharing and helping each other! + + + +## Key Releases in 2022 +2022 has been a year jam packed with exciting new features for Apache Hudi across 0.11.0 and 0.12.0 releases. In addition to new features, vendor/ecosystem partnerships and relationships have been strengthened across many in the community. [AWS continues to double down](https://www.onehouse.ai/blog/apache-hudi-native-aws-integrations) on Apache Hudi, upgrading versions in [EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi.html), [Athena](https://docs.aws.amazon.com/athena [...] + +While there are too many features added in 2022 to list them all, take a look at some of the exciting highlights: + +- [Multi-Modal Index](https://hudi.apache.org/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi) is a first-of-its-kind high-performance indexing subsystem for the Lakehouse. It improves metadata lookup performance by up to 100x and reduces overall query latency by up to 30x. Two new indices were added to the metadata table - Bloom filter index that enables faster upsert performance and[ column stats index along with Data skipping](https://hudi.apache.org/bl [...] +- Hudi added support for [asynchronous indexing](https://hudi.apache.org/releases/release-0.11.0/#async-indexer) to assist building such indices without blocking ingestion so that regular writers don't need to scale up resources for such one off spikes. +- A new type of index called Bucket Index was introduced this year. 
This could be game changing for deterministic workloads with partitioned datasets. It is very light-weight and allows the distribution of records to buckets using a hash function. +- Filesystem based Lock Provider - This implementation avoids the need of external systems and leverages the abilities of underlying filesystem to support lock provider needed for optimistic concurrency control in case of multiple writers. Please check the [lock configuration](https://hudi.apache.org/docs/configurations#Locks-Configurations) for details. +- Deltastreamer Graceful Completion - Users can now configure a post-write completion strategy with deltastreamer continuous mode for graceful shutdown. +- Schema on read is supported as an experimental feature since 0.11.0, allowing users to leverage Spark SQL DDL support for [evolving data schema](https://hudi.apache.org/docs/schema_evolution) needs(drop, rename etc). Added support for a lot of [CALL commands](https://hudi.apache.org/docs/procedures/) to invoke an array of actions on Hudi tables. +- It is now feasible to
[GitHub] [hudi] kywe665 commented on pull request #7581: [MINOR][BLOG] - 2022 Blog post
kywe665 commented on PR #7581: URL: https://github.com/apache/hudi/pull/7581#issuecomment-1367617115 preview https://user-images.githubusercontent.com/1703248/210017096-a1fbd3c0-07ee-43a3-a794-eee6f555ee05.png https://user-images.githubusercontent.com/1703248/210017128-bf5c643f-b5b5-44f4-8e24-fed0c684cfde.png https://user-images.githubusercontent.com/1703248/210017182-da84774b-83b8-41c3-9904-3dc46099dede.png https://user-images.githubusercontent.com/1703248/210017165-6937efcd-4ab4-4544-8a15-2ddaf04ffd13.png https://user-images.githubusercontent.com/1703248/210017149-358a26be-5f05-4f45-b19b-3e923122d104.png
[GitHub] [hudi] hudi-bot commented on pull request #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table
hudi-bot commented on PR #7580: URL: https://github.com/apache/hudi/pull/7580#issuecomment-1367613254 ## CI report: * df101606342f8b91be6cc232d99d7009c4577ed9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14045) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table
hudi-bot commented on PR #7580: URL: https://github.com/apache/hudi/pull/7580#issuecomment-1367611336 ## CI report: * df101606342f8b91be6cc232d99d7009c4577ed9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] kywe665 opened a new pull request, #7581: [MINOR][BLOG] - 2022 Blog post
kywe665 opened a new pull request, #7581: URL: https://github.com/apache/hudi/pull/7581 ### Change Logs added a blog post and images to docs site ### Impact no impact ### Risk level (write none, low medium or high below) none, docs only ### Documentation Update n/a ### Contributor's checklist - [X] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [X] Change Logs and Impact were stated clearly - [X] Adequate tests were added if applicable - [X] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua commented on issue #7472: [SUPPORT] Too many metadata timeline file caused by old rollback active timeline
yihua commented on issue #7472: URL: https://github.com/apache/hudi/issues/7472#issuecomment-1367591309 Hi @yyar I've put up the fix #7580 and verified locally that it works. Could you try it and see if it solves your problem?
[jira] [Updated] (HUDI-5434) Fix archival in MDT to not rely on rollbacks/clean in DT
[ https://issues.apache.org/jira/browse/HUDI-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5434: - Labels: pull-request-available (was: ) > Fix archival in MDT to not rely on rollbacks/clean in DT > > > Key: HUDI-5434 > URL: https://issues.apache.org/jira/browse/HUDI-5434 > Project: Apache Hudi > Issue Type: Bug > Components: metadata >Reporter: sivabalan narayanan >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > As of now, archival in the MDT is guarded until the first entry in the DT's active timeline. However, the DT could contain a rollback that dates back a few days or even weeks. We need to fix this by checking for the first write action in the DT (commit, delta commit, replace commit) and guarding MDT archival based on that. > > Impact: > This could result in a huge number of entries in the MDT's active timeline, which might hamper performance or cause throttling in cloud stores. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] yihua opened a new pull request, #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table
yihua opened a new pull request, #7580: URL: https://github.com/apache/hudi/pull/7580 ### Change Logs Before this PR, archival for the metadata table used the earliest instant of all actions from the active timeline of the data table. In the archival process, CLEAN and ROLLBACK instants are archived separately from commits (check HoodieTimelineArchiver#getCleanInstantsToArchive). Because of this, a very old completed CLEAN or ROLLBACK instant in the data table can block archival of the metadata table timeline and cause the active timeline of the metadata table to become extremely long, leading to performance issues when loading the timeline. This PR changes archival in the metadata table to not rely on completed rollback or clean instants in the data table, by archiving the metadata table's instants after the earliest commit (COMMIT, DELTA_COMMIT, and REPLACE_COMMIT only) and the earliest inflight instant (all actions) in the data table's active timeline. Savepoints are handled seamlessly here, i.e., completed savepoints do not affect the archival process in the metadata table. ### Impact Makes the active timeline of the metadata table shorter and improves the performance of loading the active timeline of the metadata table. ### Risk level low ### Documentation Update N/A ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
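The boundary rule described in the PR above can be sketched roughly as follows. This is an illustrative model, not the code from the PR: `TimelineInstant`, the action strings, and `earliestInstantToRetain` are hypothetical stand-ins, and instant times are assumed to be fixed-width strings so that lexicographic comparison matches chronological order. The idea is to take the earliest instant among commit-type actions and pending instants of any action, so that old completed clean or rollback instants no longer hold the boundary back.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of the rule, for illustration only.
final class ArchivalBoundarySketch {

    // Minimal stand-in for an instant on the data table's active timeline.
    static final class TimelineInstant {
        final String timestamp;
        final String action;
        final boolean completed;

        TimelineInstant(String timestamp, String action, boolean completed) {
            this.timestamp = timestamp;
            this.action = action;
            this.completed = completed;
        }
    }

    // Assumed labels for the three commit-type actions named in the PR.
    private static final Set<String> COMMIT_ACTIONS =
        new HashSet<>(Arrays.asList("commit", "deltacommit", "replacecommit"));

    // Earliest instant that should bound metadata-table archival:
    // the minimum over commit-type instants and pending instants of any action.
    // Completed clean, rollback, and savepoint instants are ignored.
    static Optional<String> earliestInstantToRetain(List<TimelineInstant> activeTimeline) {
        return activeTimeline.stream()
            .filter(i -> !i.completed || COMMIT_ACTIONS.contains(i.action))
            .map(i -> i.timestamp)
            .min(String::compareTo);
    }

    public static void main(String[] args) {
        List<TimelineInstant> timeline = Arrays.asList(
            new TimelineInstant("20221201000000", "rollback", true),
            new TimelineInstant("20221210000000", "clean", true),
            new TimelineInstant("20221228000000", "commit", true),
            new TimelineInstant("20221229000000", "deltacommit", false));
        // The old rollback and clean instants do not affect the result.
        System.out.println(earliestInstantToRetain(timeline).orElse("none"));
    }
}
```

Under the previous behavior described in the PR, the boundary in this example would have been the weeks-old rollback instant instead of the first commit.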
[hudi] branch master updated (495b6fbb062 -> fb28ad8f737)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 495b6fbb062 [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive (#7385) add fb28ad8f737 [HUDI-5420] Fix metadata table validator to exclude uncommitted log files due to retry (#7517) No new revisions were added by this update. Summary of changes: .../utilities/HoodieMetadataTableValidator.java| 109 + 1 file changed, 91 insertions(+), 18 deletions(-)
[GitHub] [hudi] yihua merged pull request #7517: [HUDI-5420] Fix metadata table validator to exclude uncommitted log files due to retry
yihua merged PR #7517: URL: https://github.com/apache/hudi/pull/7517
[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client
hudi-bot commented on PR #7561: URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367579907 ## CI report: * df28b5141ea2b920a55149668c12ebda1416194a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13979) * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14043) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7517: [HUDI-5420] Fix metadata table validator to exclude uncommitted log files due to retry
hudi-bot commented on PR #7517: URL: https://github.com/apache/hudi/pull/7517#issuecomment-1367579852 ## CI report: * eaa6d00e2952cd6b1dc6d67d9d06df99eb882b98 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14023) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client
hudi-bot commented on PR #7561: URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367577262 ## CI report: * df28b5141ea2b920a55149668c12ebda1416194a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13979) * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7448: [HUDI-5160] Fix data source write save as table
hudi-bot commented on PR #7448: URL: https://github.com/apache/hudi/pull/7448#issuecomment-1367577123 ## CI report: * 9314399c3b40e65689ffeeade5be40ed289563f0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14042) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client
[ https://issues.apache.org/jira/browse/HUDI-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5477: Status: Patch Available (was: In Progress) > Optimize timeline loading in Hudi sync client > - > > Key: HUDI-5477 > URL: https://issues.apache.org/jira/browse/HUDI-5477 > Project: Apache Hudi > Issue Type: Improvement > Components: archiving, meta-sync >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > The Hudi archived timeline is always loaded during the metastore sync process > if the last sync time is given. Besides, the archived timeline is not cached > inside the meta client if the start instant time is given. These cause > performance issues and read timeout on cloud storage due to rate limiting on > requests because of loading archived timeline from the storage, when the > archived timeline is huge, e.g., hundreds of log files in > {{.hoodie/archived}} folder. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client
[ https://issues.apache.org/jira/browse/HUDI-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5477: Status: In Progress (was: Open) > Optimize timeline loading in Hudi sync client > - > > Key: HUDI-5477 > URL: https://issues.apache.org/jira/browse/HUDI-5477 > Project: Apache Hudi > Issue Type: Improvement > Components: archiving, meta-sync >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > The Hudi archived timeline is always loaded during the metastore sync process > if the last sync time is given. Besides, the archived timeline is not cached > inside the meta client if the start instant time is given. These cause > performance issues and read timeout on cloud storage due to rate limiting on > requests because of loading archived timeline from the storage, when the > archived timeline is huge, e.g., hundreds of log files in > {{.hoodie/archived}} folder. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client
[ https://issues.apache.org/jira/browse/HUDI-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5477: Story Points: 2 > Optimize timeline loading in Hudi sync client > - > > Key: HUDI-5477 > URL: https://issues.apache.org/jira/browse/HUDI-5477 > Project: Apache Hudi > Issue Type: Improvement > Components: archiving, meta-sync >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > The Hudi archived timeline is always loaded during the metastore sync process > if the last sync time is given. Besides, the archived timeline is not cached > inside the meta client if the start instant time is given. These cause > performance issues and read timeout on cloud storage due to rate limiting on > requests because of loading archived timeline from the storage, when the > archived timeline is huge, e.g., hundreds of log files in > {{.hoodie/archived}} folder. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client
[ https://issues.apache.org/jira/browse/HUDI-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5477: Reviewers: Danny Chen > Optimize timeline loading in Hudi sync client > - > > Key: HUDI-5477 > URL: https://issues.apache.org/jira/browse/HUDI-5477 > Project: Apache Hudi > Issue Type: Improvement > Components: archiving, meta-sync >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > The Hudi archived timeline is always loaded during the metastore sync process > if the last sync time is given. Besides, the archived timeline is not cached > inside the meta client if the start instant time is given. These cause > performance issues and read timeout on cloud storage due to rate limiting on > requests because of loading archived timeline from the storage, when the > archived timeline is huge, e.g., hundreds of log files in > {{.hoodie/archived}} folder. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client
[ https://issues.apache.org/jira/browse/HUDI-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5477: Sprint: 0.13.0 Final Sprint > Optimize timeline loading in Hudi sync client > - > > Key: HUDI-5477 > URL: https://issues.apache.org/jira/browse/HUDI-5477 > Project: Apache Hudi > Issue Type: Improvement > Components: archiving, meta-sync >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > The Hudi archived timeline is always loaded during the metastore sync process > if the last sync time is given. Besides, the archived timeline is not cached > inside the meta client if the start instant time is given. These cause > performance issues and read timeout on cloud storage due to rate limiting on > requests because of loading archived timeline from the storage, when the > archived timeline is huge, e.g., hundreds of log files in > {{.hoodie/archived}} folder. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5486) Update 0.12.x release notes with Long Term Support
[ https://issues.apache.org/jira/browse/HUDI-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-5486. --- Resolution: Fixed > Update 0.12.x release notes with Long Term Support > --- > > Key: HUDI-5486 > URL: https://issues.apache.org/jira/browse/HUDI-5486 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] yihua commented on a diff in pull request #7561: [HUDI-5477][DO NOT MERGE] Optimize timeline loading in Hudi sync client
yihua commented on code in PR #7561: URL: https://github.com/apache/hudi/pull/7561#discussion_r1059134988 ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java: ## @@ -385,21 +386,44 @@ public HoodieMetastoreConfig getMetastoreConfig() { } /** - * Returns fresh new archived commits as a timeline from startTs (inclusive). - * - * This is costly operation if really early endTs is specified. - * Be caution to use this only when the time range is short. - * - * This method is not thread safe. + * Returns the cached archived timeline from startTs (inclusive). * - * @return Archived commit timeline + * @param startTs The start instant time (inclusive) of the archived timeline. + * @return the archived timeline. */ public HoodieArchivedTimeline getArchivedTimeline(String startTs) { -return new HoodieArchivedTimeline(this, startTs); +return getArchivedTimeline(startTs, true); + } + + /** + * Returns the cached archived timeline if using in-memory cache or a fresh new archived + * timeline if not using cache, from startTs (inclusive). + * + * Instantiating an archived timeline is costly operation if really early startTs is + * specified. + * + * This method is not thread safe. + * + * @param startTs The start instant time (inclusive) of the archived timeline. + * @param useCache Whether to use in-memory cache. + * @return the archived timeline based on the arguments. + */ + public HoodieArchivedTimeline getArchivedTimeline(String startTs, boolean useCache) { +if (useCache) { + return archivedTimelineMap.computeIfAbsent(startTs, this::instantiateArchivedTimeline); Review Comment: This is fixed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
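The caching pattern visible in the diff above (`computeIfAbsent` keyed by `startTs`) can be illustrated with a standalone sketch. Everything here is hypothetical: the class names, the `HashMap` choice, the load counter, and in particular the non-cached branch, which the quoted diff cuts off before showing and which is inferred only from the Javadoc ("a fresh new archived timeline if not using cache"). Like the original method, this sketch is not thread safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the meta client, for illustration only.
final class ArchivedTimelineCacheSketch {

    // Placeholder for an expensive-to-build archived timeline.
    static final class ArchivedTimeline {
        final String startTs;

        ArchivedTimeline(String startTs) {
            this.startTs = startTs;
        }
    }

    // One cached timeline per start instant time (not thread safe).
    private final Map<String, ArchivedTimeline> cache = new HashMap<>();

    // Counts how often the costly load path runs.
    int loads = 0;

    ArchivedTimeline getArchivedTimeline(String startTs) {
        return getArchivedTimeline(startTs, true);
    }

    ArchivedTimeline getArchivedTimeline(String startTs, boolean useCache) {
        if (useCache) {
            // Loads at most once per distinct startTs.
            return cache.computeIfAbsent(startTs, this::instantiate);
        }
        // Assumed behavior: bypass the cache and build a fresh instance.
        return instantiate(startTs);
    }

    private ArchivedTimeline instantiate(String startTs) {
        loads++;
        return new ArchivedTimeline(startTs);
    }

    public static void main(String[] args) {
        ArchivedTimelineCacheSketch client = new ArchivedTimelineCacheSketch();
        ArchivedTimeline first = client.getArchivedTimeline("20221201000000");
        ArchivedTimeline second = client.getArchivedTimeline("20221201000000");
        System.out.println(first == second); // same cached instance
        System.out.println(client.loads);
    }
}
```

Keying the cache by `startTs` means repeated sync calls with the same start instant reuse one loaded timeline, while a different `startTs` still triggers its own load.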
[GitHub] [hudi] hudi-bot commented on pull request #7517: [HUDI-5420] Fix metadata table validator to exclude uncommitted log files due to retry
hudi-bot commented on PR #7517: URL: https://github.com/apache/hudi/pull/7517#issuecomment-1367536674 ## CI report: * eaa6d00e2952cd6b1dc6d67d9d06df99eb882b98 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14023) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7517: [HUDI-5420] Fix metadata table validator to exclude uncommitted log files due to retry
hudi-bot commented on PR #7517: URL: https://github.com/apache/hudi/pull/7517#issuecomment-1367534143 ## CI report: * eaa6d00e2952cd6b1dc6d67d9d06df99eb882b98 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7371: [HUDI-3673] Clean up hbase shading dependencies
hudi-bot commented on PR #7371: URL: https://github.com/apache/hudi/pull/7371#issuecomment-1367533938 ## CI report: * f3d658be1ab30458c286ace26ec67b4715e188fe Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14040) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] yihua commented on a diff in pull request #7561: [HUDI-5477][DO NOT MERGE] Optimize timeline loading in Hudi sync client
yihua commented on code in PR #7561:
URL: https://github.com/apache/hudi/pull/7561#discussion_r1059107210

## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java: ##
@@ -385,21 +386,44 @@ public HoodieMetastoreConfig getMetastoreConfig() {
   }

   /**
-   * Returns fresh new archived commits as a timeline from startTs (inclusive).
-   *
-   * This is costly operation if really early endTs is specified.
-   * Be caution to use this only when the time range is short.
-   *
-   * This method is not thread safe.
+   * Returns the cached archived timeline from startTs (inclusive).
    *
-   * @return Archived commit timeline
+   * @param startTs The start instant time (inclusive) of the archived timeline.
+   * @return the archived timeline.
    */
   public HoodieArchivedTimeline getArchivedTimeline(String startTs) {
-    return new HoodieArchivedTimeline(this, startTs);
+    return getArchivedTimeline(startTs, true);
+  }
+
+  /**
+   * Returns the cached archived timeline if using in-memory cache or a fresh new archived
+   * timeline if not using cache, from startTs (inclusive).
+   *
+   * Instantiating an archived timeline is costly operation if really early startTs is
+   * specified.
+   *
+   * This method is not thread safe.
+   *
+   * @param startTs  The start instant time (inclusive) of the archived timeline.
+   * @param useCache Whether to use in-memory cache.
+   * @return the archived timeline based on the arguments.
+   */
+  public HoodieArchivedTimeline getArchivedTimeline(String startTs, boolean useCache) {
+    if (useCache) {
+      return archivedTimelineMap.computeIfAbsent(startTs, this::instantiateArchivedTimeline);

Review Comment:
The assumption is that the cache holds only one `startTs`, so there is no need to clear it; the cache is discarded once the lifecycle of the meta client is over. I can clear it when a new `startTs` comes in.
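For illustration, the caching behaviour discussed in this review comment can be sketched in isolation. This is a minimal sketch, not Hudi's implementation: the class `StartTsCacheSketch`, its `loader` function, and the `loadCount`/`size` helpers are hypothetical names introduced here, and the clear-on-new-key policy is only the option floated in the comment, not something the diff shows. It keys an expensive load by `startTs` with `Map.computeIfAbsent` and bypasses the map when `useCache` is false.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a client that caches an expensive load per startTs. */
final class StartTsCacheSketch<V> {
  // A plain HashMap: like the method in the diff, this sketch is not thread safe.
  private final Map<String, V> cache = new HashMap<>();
  private final Function<String, V> loader;
  private int loads = 0;

  StartTsCacheSketch(Function<String, V> loader) {
    this.loader = loader;
  }

  V get(String startTs, boolean useCache) {
    if (!useCache) {
      // Bypass the cache entirely and build a fresh value.
      return load(startTs);
    }
    // Assumed policy: keep at most one startTs cached, dropping the old entry
    // when a different startTs arrives.
    if (!cache.containsKey(startTs)) {
      cache.clear();
    }
    return cache.computeIfAbsent(startTs, this::load);
  }

  private V load(String startTs) {
    loads++;
    return loader.apply(startTs);
  }

  int loadCount() {
    return loads;
  }

  int size() {
    return cache.size();
  }
}
```

Repeated `get(ts, true)` calls with the same `ts` reuse one loaded value, while `get(ts, false)` always invokes the loader and leaves the cached entry untouched.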
## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineUtils.java: ##
@@ -210,11 +210,30 @@ public static HoodieDefaultTimeline getTimeline(HoodieTableMetaClient metaClient
     return activeTimeline;
   }

+  /**
+   * Returns a Hudi timeline with commits after the given instant time (exclusive).
+   *
+   * @param metaClient                {@link HoodieTableMetaClient} instance.
+   * @param exclusiveStartInstantTime Start instant time (exclusive).
+   * @return Hudi timeline.
+   */
+  public static HoodieTimeline getCommitsTimelineAfter(
+      HoodieTableMetaClient metaClient, String exclusiveStartInstantTime) {
+    HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
+    HoodieDefaultTimeline timeline =
+        activeTimeline.isBeforeTimelineStarts(exclusiveStartInstantTime)
+            ? metaClient.getArchivedTimeline(exclusiveStartInstantTime)
+            .mergeTimeline(activeTimeline)
+            : activeTimeline;
+    return timeline.getCommitsTimeline()
+        .findInstantsAfter(exclusiveStartInstantTime, Integer.MAX_VALUE);
+  }

Review Comment:
We need to scan all the instants since `exclusiveStartInstantTime` to figure out the touched partitions. It is possible that `exclusiveStartInstantTime` is before the start of the active timeline, in which case we still need to scan the archived timeline (see #6662 for details). In most cases, `exclusiveStartInstantTime` should be after the start of the active timeline, so the archived timeline is not loaded.
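The selection logic described in this comment can be illustrated with plain collections. This is a minimal sketch under stated assumptions, not Hudi code: `CommitsAfterSketch`, `commitsAfter`, and `archivedLoader` are hypothetical names, instants are modelled as fixed-width timestamp strings that sort lexicographically, and an empty active timeline is treated as "do not load the archive", which may differ from the real behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class CommitsAfterSketch {
  /**
   * Returns the instants strictly after exclusiveStart, in order.
   * The archived instants are fetched through the supplier only when
   * exclusiveStart precedes the first active instant.
   */
  static List<String> commitsAfter(
      List<String> active, Supplier<List<String>> archivedLoader, String exclusiveStart) {
    boolean startsBeforeActive =
        !active.isEmpty() && exclusiveStart.compareTo(active.get(0)) < 0;

    List<String> merged = new ArrayList<>();
    if (startsBeforeActive) {
      // The costly path: only taken when the start instant is older than the active timeline.
      merged.addAll(archivedLoader.get());
    }
    merged.addAll(active);

    List<String> result = new ArrayList<>();
    for (String instant : merged) {
      if (instant.compareTo(exclusiveStart) > 0) {
        result.add(instant);
      }
    }
    return result;
  }
}
```

Passing the archive as a `Supplier` mirrors the point made in the comment: in the common case the start instant falls inside the active timeline and the archive is never read.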
[GitHub] [hudi] hudi-bot commented on pull request #7448: [HUDI-5160] Fix data source write save as table
hudi-bot commented on PR #7448: URL: https://github.com/apache/hudi/pull/7448#issuecomment-1367493784 ## CI report: * b79a063798079dfdb34d61dc57ec0341e93d7c57 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14041) * 9314399c3b40e65689ffeeade5be40ed289563f0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14042) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7575: [MINOR] Set engine when creating meta write config
hudi-bot commented on PR #7575: URL: https://github.com/apache/hudi/pull/7575#issuecomment-1367491053 ## CI report: * a35c9c05aec17c775e39c0472fbe952178b2f60e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14021) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14024) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14039) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7448: [HUDI-5160] Fix data source write save as table
hudi-bot commented on PR #7448: URL: https://github.com/apache/hudi/pull/7448#issuecomment-1367490841 ## CI report: * eaed1745913960ef5e40a323eedaeaf96438c5fb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13695) * b79a063798079dfdb34d61dc57ec0341e93d7c57 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14041) * 9314399c3b40e65689ffeeade5be40ed289563f0 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7448: [HUDI-5160] Fix data source write save as table
hudi-bot commented on PR #7448: URL: https://github.com/apache/hudi/pull/7448#issuecomment-1367454386 ## CI report: * eaed1745913960ef5e40a323eedaeaf96438c5fb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13695) * b79a063798079dfdb34d61dc57ec0341e93d7c57 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14041) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7371: [HUDI-3673] Clean up hbase shading dependencies
hudi-bot commented on PR #7371: URL: https://github.com/apache/hudi/pull/7371#issuecomment-1367454213 ## CI report: * 2f69501f430d9e536a78b65e38e91cc710c69832 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13432) * f3d658be1ab30458c286ace26ec67b4715e188fe Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14040) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-5160) Spark df saveAsTable failed with CTAS
[ https://issues.apache.org/jira/browse/HUDI-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5160:
    Status: In Progress (was: Open)

> Spark df saveAsTable failed with CTAS
>
>                 Key: HUDI-5160
>                 URL: https://issues.apache.org/jira/browse/HUDI-5160
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>            Reporter: 董可伦
>            Assignee: Raymond Xu
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
> This worked in version 0.9.0, but it now fails.
> {code:java}
> import spark.implicits._
> val partitionValue = "2022-11-05"
> val df = Seq((1, "a1", 10, 1000, partitionValue)).toDF("id", "name", "value", "ts", "dt")
> val tableName = "test_hudi_table"
> // Write a table by spark dataframe.
> df.write.format("hudi")
>   .option(HoodieWriteConfig.TBL_NAME.key, tableName)
>   .option(TABLE_TYPE.key, MOR_TABLE_TYPE_OPT_VAL)
>   // .option(HoodieTableConfig.TYPE.key(), MOR_TABLE_TYPE_OPT_VAL)
>   .option(RECORDKEY_FIELD.key, "id")
>   .option(PRECOMBINE_FIELD.key, "ts")
>   .option(PARTITIONPATH_FIELD.key, "dt")
>   .option(KEYGENERATOR_CLASS_NAME.key, classOf[SimpleKeyGenerator].getName)
>   .option(HoodieWriteConfig.INSERT_PARALLELISM_VALUE.key, "1")
>   .option(HoodieWriteConfig.UPSERT_PARALLELISM_VALUE.key, "1")
>   .partitionBy("dt")
>   .mode(SaveMode.Overwrite)
>   .saveAsTable(tableName){code}
>
> {code:java}
> Can't find primaryKey `uuid` in root
>  |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- id: integer (nullable = false)
>  |-- name: string (nullable = true)
>  |-- value: integer (nullable = false)
>  |-- ts: integer (nullable = false)
>  |-- dt: string (nullable = true)
> .
> java.lang.IllegalArgumentException: Can't find primaryKey `uuid` in root
>  |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- id: integer (nullable = false)
>  |-- name: string (nullable = true)
>  |-- value: integer (nullable = false)
>  |-- ts: integer (nullable = false)
>  |-- dt: string (nullable = true)
> .
>     at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
>     at org.apache.spark.sql.hudi.HoodieOptionConfig$$anonfun$validateTable$1.apply(HoodieOptionConfig.scala:201)
>     at org.apache.spark.sql.hudi.HoodieOptionConfig$$anonfun$validateTable$1.apply(HoodieOptionConfig.scala:200)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>     at org.apache.spark.sql.hudi.HoodieOptionConfig$.validateTable(HoodieOptionConfig.scala:200)
>     at org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable.parseSchemaAndConfigs(HoodieCatalogTable.scala:256)
>     at org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable.initHoodieTable(HoodieCatalogTable.scala:171)
>     at org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.run(CreateHoodieTableAsSelectCommand.scala:99){code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5160) Spark df saveAsTable failed with CTAS
[ https://issues.apache.org/jira/browse/HUDI-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5160: - Status: Patch Available (was: In Progress) > Spark df saveAsTable failed with CTAS > - > > Key: HUDI-5160 > URL: https://issues.apache.org/jira/browse/HUDI-5160 > Project: Apache Hudi > Issue Type: Bug > Components: spark-sql >Reporter: 董可伦 >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0
[GitHub] [hudi] hudi-bot commented on pull request #7448: [HUDI-5160] Fix data source write save as table
hudi-bot commented on PR #7448: URL: https://github.com/apache/hudi/pull/7448#issuecomment-1367450975 ## CI report: * eaed1745913960ef5e40a323eedaeaf96438c5fb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13695) * b79a063798079dfdb34d61dc57ec0341e93d7c57 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7371: [HUDI-3673] Clean up hbase shading dependencies
hudi-bot commented on PR #7371: URL: https://github.com/apache/hudi/pull/7371#issuecomment-1367450840 ## CI report: * 2f69501f430d9e536a78b65e38e91cc710c69832 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13432) * f3d658be1ab30458c286ace26ec67b4715e188fe UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] xushiyan commented on pull request #7139: [HUDI-5160] Spark df saveAsTable failed with CTAS
xushiyan commented on PR #7139: URL: https://github.com/apache/hudi/pull/7139#issuecomment-1367445735 closing in favor of #7448
[GitHub] [hudi] xushiyan closed pull request #7139: [HUDI-5160] Spark df saveAsTable failed with CTAS
xushiyan closed pull request #7139: [HUDI-5160] Spark df saveAsTable failed with CTAS URL: https://github.com/apache/hudi/pull/7139
[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367443562 ## CI report: * 33b17128e551a134dd8287d5f1a660f50b561848 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14034) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14038) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] yihua commented on a diff in pull request #7578: [HUDI-5486][DOCS] Update 0.12.x release notes with Long Term Support
yihua commented on code in PR #7578:
URL: https://github.com/apache/hudi/pull/7578#discussion_r1059039497

## website/releases/download.md: ##
@@ -7,14 +7,17 @@ last_modified_at: 2022-12-27T15:59:57-04:00
 ---

 ### Release 0.12.2
+* [Long Term Support](/releases/release-0.12.2#long-term-support): this is the latest stable release
 * Source Release : [Apache Hudi 0.12.2 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.12.2/hudi-0.12.2.src.tgz) ([asc](https://downloads.apache.org/hudi/0.12.2/hudi-0.12.2.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.12.2/hudi-0.12.2.src.tgz.sha512))

Review Comment:
Yes.
[hudi] branch asf-site updated: [HUDI-5486][DOCS] Update 0.12.x release notes with Long Term Support (#7578)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new c18d9621a1c [HUDI-5486][DOCS] Update 0.12.x release notes with Long Term Support (#7578)
c18d9621a1c is described below

commit c18d9621a1c375c39bd5aaeb57ca13635753e601
Author: Y Ethan Guo
AuthorDate: Thu Dec 29 08:10:13 2022 -0800

    [HUDI-5486][DOCS] Update 0.12.x release notes with Long Term Support (#7578)
---
 website/releases/download.md       | 3 +++
 website/releases/release-0.12.0.md | 5 +++++
 website/releases/release-0.12.1.md | 5 +++++
 website/releases/release-0.12.2.md | 5 +++++
 4 files changed, 18 insertions(+)

diff --git a/website/releases/download.md b/website/releases/download.md
index 609fdff5862..e7ceb1d5c56 100644
--- a/website/releases/download.md
+++ b/website/releases/download.md
@@ -7,14 +7,17 @@ last_modified_at: 2022-12-27T15:59:57-04:00
 ---

 ### Release 0.12.2
+* [Long Term Support](/releases/release-0.12.2#long-term-support): this is the latest stable release
 * Source Release : [Apache Hudi 0.12.2 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.12.2/hudi-0.12.2.src.tgz) ([asc](https://downloads.apache.org/hudi/0.12.2/hudi-0.12.2.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.12.2/hudi-0.12.2.src.tgz.sha512))
 * Release Note : ([Release Note for Apache Hudi 0.12.2](/releases/release-0.12.2))

 ### Release 0.12.1
+* [Long Term Support](/releases/release-0.12.1#long-term-support): upgrade to [0.12.2](/releases/release-0.12.2) for the latest stable release
 * Source Release : [Apache Hudi 0.12.1 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.12.1/hudi-0.12.1.src.tgz) ([asc](https://downloads.apache.org/hudi/0.12.1/hudi-0.12.1.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.12.1/hudi-0.12.1.src.tgz.sha512))
 * Release Note : ([Release Note for Apache Hudi 0.12.1](/releases/release-0.12.1))

 ### Release 0.12.0
+* [Long Term Support](/releases/release-0.12.0#long-term-support): upgrade to [0.12.2](/releases/release-0.12.2) for the latest stable release
 * Source Release : [Apache Hudi 0.12.0 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.12.0/hudi-0.12.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.12.0/hudi-0.12.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.12.0/hudi-0.12.0.src.tgz.sha512))
 * Release Note : ([Release Note for Apache Hudi 0.12.0](/releases/release-0.12.0))

diff --git a/website/releases/release-0.12.0.md b/website/releases/release-0.12.0.md
index ba072cb0423..fe764b5dd8a 100644
--- a/website/releases/release-0.12.0.md
+++ b/website/releases/release-0.12.0.md
@@ -7,6 +7,11 @@ last_modified_at: 2022-08-17T10:30:00+05:30
 ---
 # [Release 0.12.0](https://github.com/apache/hudi/releases/tag/release-0.12.0) ([docs](/docs/quick-start-guide))

+## Long Term Support
+
+We aim to maintain 0.12 for a longer period of time and provide a stable release through the latest 0.12.x release for
+users to migrate to. The latest 0.12 release is [0.12.2](/releases/release-0.12.2).
+
 ## Migration Guide

 In this release, there have been a few API and configuration updates listed below that warranted a new table version.
diff --git a/website/releases/release-0.12.1.md b/website/releases/release-0.12.1.md
index 709c8adbdcc..dbd98f98ed9 100644
--- a/website/releases/release-0.12.1.md
+++ b/website/releases/release-0.12.1.md
@@ -7,6 +7,11 @@ last_modified_at: 2022-08-17T10:30:00+05:30
 ---
 # [Release 0.12.1](https://github.com/apache/hudi/releases/tag/release-0.12.1) ([docs](/docs/quick-start-guide))

+## Long Term Support
+
+We aim to maintain 0.12 for a longer period of time and provide a stable release through the latest 0.12.x release for
+users to migrate to. The latest 0.12 release is [0.12.2](/releases/release-0.12.2).
+
 ## Migration Guide

 * This release (0.12.1) does not introduce any new table version, thus no migration is needed if you are on 0.12.0.
diff --git a/website/releases/release-0.12.2.md b/website/releases/release-0.12.2.md
index a40c1e032b8..3594206cda4 100644
--- a/website/releases/release-0.12.2.md
+++ b/website/releases/release-0.12.2.md
@@ -7,6 +7,11 @@ last_modified_at: 2022-12-27T10:30:00+05:30
 ---
 # [Release 0.12.2](https://github.com/apache/hudi/releases/tag/release-0.12.2) ([docs](/docs/quick-start-guide))

+## Long Term Support
+
+We aim to maintain 0.12 for a longer period of time and provide a stable release through the latest 0.12.x release for
+users to migrate to. This release (0.12.2) is the latest 0.12 release.
+
 ## Migration Guide

 * This release (0.12.2) does not introduce any new table version, thus no migration is needed if you are on 0.12.0.
[GitHub] [hudi] nsivabalan merged pull request #7578: [HUDI-5486][DOCS] Update 0.12.x release notes with Long Term Support
nsivabalan merged PR #7578: URL: https://github.com/apache/hudi/pull/7578
[GitHub] [hudi] hudi-bot commented on pull request #7579: [HUDI-5487] Reduce duplicate Logs in ExternalSpillableMap
hudi-bot commented on PR #7579: URL: https://github.com/apache/hudi/pull/7579#issuecomment-1367390604 ## CI report: * ba9aa020afa608a3b51d7085c48217d97bbc1881 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14032) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14037) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7575: [MINOR] Set engine when creating meta write config
hudi-bot commented on PR #7575: URL: https://github.com/apache/hudi/pull/7575#issuecomment-1367390548 ## CI report: * a35c9c05aec17c775e39c0472fbe952178b2f60e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14021) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14024) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14039) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] xccui commented on pull request #7575: [MINOR] Set engine when creating meta write config
xccui commented on PR #7575: URL: https://github.com/apache/hudi/pull/7575#issuecomment-1367387566 @hudi-bot run azure
[GitHub] [hudi] minihippo commented on a diff in pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.
minihippo commented on code in PR #7572:
URL: https://github.com/apache/hudi/pull/7572#discussion_r1058989874

## hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java: ##
@@ -120,7 +118,7 @@ private boolean checkIfExceptionInRetryList(Exception e) {
     // if users didn't set hoodie.filesystem.operation.retry.exceptions
     // we will retry all the IOException and RuntimeException
-    if (retryExceptionsClasses.isEmpty()) {
+    if (retryExceptionsClasses.equals(RETRY_EXCEPTION_CLASS)) {
       return true;
     }

Review Comment:
fix
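As a rough illustration of deciding retries from a list of exception classes, here is a minimal sketch. It is not the `RetryHelper` code under review: `RetryMatchSketch`, `DEFAULT_RETRYABLE`, and `shouldRetry` are hypothetical names, and unlike the diff above, which returns `true` outright when the configured list equals the default, this sketch always tests membership with `Class.isInstance`.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class RetryMatchSketch {
  // Assumed default: retry on IOException and RuntimeException, including subclasses.
  static final List<Class<? extends Exception>> DEFAULT_RETRYABLE =
      Arrays.asList(IOException.class, RuntimeException.class);

  /** Returns true when e is an instance of any class in the retryable list. */
  static boolean shouldRetry(Exception e, List<Class<? extends Exception>> retryable) {
    for (Class<? extends Exception> candidate : retryable) {
      if (candidate.isInstance(e)) {
        return true;
      }
    }
    return false;
  }
}
```

With the default list, a `FileNotFoundException` or an `IllegalStateException` is retried because each is a subclass of a listed type, while a plain checked `Exception` is not.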
[GitHub] [hudi] minihippo commented on a diff in pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.
minihippo commented on code in PR #7572:
URL: https://github.com/apache/hudi/pull/7572#discussion_r1058989340

## hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java: ##
@@ -36,9 +36,10 @@
  *
  * @param Type of return value for checked function.
  */
-public class RetryHelper implements Serializable {
+public class RetryHelper implements Serializable {
   private static final Logger LOG = LogManager.getLogger(RetryHelper.class);
-  private transient CheckedFunction func;
+  private static final List> RETRY_EXCEPTION_CLASS = Arrays.asList(IOException.class, RuntimeException.class);

Review Comment:
fix
[GitHub] [hudi] hudi-bot commented on pull request #6983: [HUDI-5031] Fix MERGE INTO creates empty partition files when source table has partitions but target table does not
hudi-bot commented on PR #6983: URL: https://github.com/apache/hudi/pull/6983#issuecomment-1367320447 ## CI report: * d2f4ce7779a835a6f524aabd8fa16c7c5dcc8c6e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14035) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367283242 ## CI report: * 33b17128e551a134dd8287d5f1a660f50b561848 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14034) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14038)
[GitHub] [hudi] xicm commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field
xicm commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367282191 @hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager
hudi-bot commented on PR #6732: URL: https://github.com/apache/hudi/pull/6732#issuecomment-1367275891 ## CI report: * c20aa589730546c0c7bb82969c92aa6d364af101 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14020) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14033)
[GitHub] [hudi] hudi-bot commented on pull request #7579: [HUDI-5487] Reduce duplicate Logs in ExternalSpillableMap
hudi-bot commented on PR #7579: URL: https://github.com/apache/hudi/pull/7579#issuecomment-1367272629 ## CI report: * ba9aa020afa608a3b51d7085c48217d97bbc1881 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14032) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14037)
[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367272014 ## CI report: * 33b17128e551a134dd8287d5f1a660f50b561848 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14034)
[GitHub] [hudi] SteNicholas commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
SteNicholas commented on code in PR #7568:
URL: https://github.com/apache/hudi/pull/7568#discussion_r1058910341

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ##

@@ -432,6 +433,11 @@ private Stream getCommitInstantsToArchive() {
                 table.getActiveTimeline(), config.getInlineCompactDeltaCommitMax())
             : Option.empty();
+    // The clustering commit instant can not be archived unless we ensure that the replaced files have been cleaned,
+    // without the replaced files metadata on the timeline, the fs view would expose duplicates for readers.
+    Option oldestInstantToRetainForClustering =

Review Comment:
@leesf, refer to the naming of `oldestInstantToRetainForCompaction`.
[GitHub] [hudi] leesf commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
leesf commented on code in PR #7568:
URL: https://github.com/apache/hudi/pull/7568#discussion_r1058904972

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ##

@@ -432,6 +433,11 @@ private Stream getCommitInstantsToArchive() {
                 table.getActiveTimeline(), config.getInlineCompactDeltaCommitMax())
             : Option.empty();
+    // The clustering commit instant can not be archived unless we ensure that the replaced files have been cleaned,
+    // without the replaced files metadata on the timeline, the fs view would expose duplicates for readers.
+    Option oldestInstantToRetainForClustering =

Review Comment:
This name is easily confused with `oldestPendingCompactionAndReplaceInstant` below.
[hudi] branch master updated: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive (#7385)
This is an automated email from the ASF dual-hosted git repository.

forwardxu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 495b6fbb062 [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive (#7385)
495b6fbb062 is described below

commit 495b6fbb062c843d19de420acfefd3a6a2ee3c58
Author: cxzl25
AuthorDate: Thu Dec 29 19:17:43 2022 +0800

    [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive (#7385)
---
 .../main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java       | 11 ++-
 .../java/org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java      | 11 ++-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java
index c14536a2774..fbba5861741 100644
--- a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java
+++ b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java
@@ -30,6 +30,7 @@ import org.apache.hudi.sync.common.model.PartitionValueExtractor;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hive.common.StatsSetupConst;
+import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.IMetaStoreClient;
 import org.apache.hadoop.hive.metastore.TableType;
 import org.apache.hadoop.hive.metastore.api.Database;
@@ -48,6 +49,7 @@ import org.apache.log4j.Logger;
 import org.apache.parquet.schema.MessageType;
 import org.apache.thrift.TException;
+import java.lang.reflect.InvocationTargetException;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.LinkedHashMap;
@@ -78,7 +80,14 @@ public class HMSDDLExecutor implements DDLExecutor {
   public HMSDDLExecutor(HiveSyncConfig syncConfig) throws HiveException, MetaException {
     this.syncConfig = syncConfig;
     this.databaseName = syncConfig.getStringOrDefault(META_SYNC_DATABASE_NAME);
-    this.client = Hive.get(syncConfig.getHiveConf()).getMSC();
+    HiveConf hiveConf = syncConfig.getHiveConf();
+    IMetaStoreClient tempMetaStoreClient;
+    try {
+      tempMetaStoreClient = ((Hive) Hive.class.getMethod("getWithoutRegisterFns", HiveConf.class).invoke(null, hiveConf)).getMSC();
+    } catch (NoSuchMethodException | IllegalAccessException | IllegalArgumentException | InvocationTargetException ex) {
+      tempMetaStoreClient = Hive.get(hiveConf).getMSC();
+    }
+    this.client = tempMetaStoreClient;
     try {
       this.partitionValueExtractor = (PartitionValueExtractor) Class.forName(syncConfig.getStringOrDefault(META_SYNC_PARTITION_EXTRACTOR_CLASS)).newInstance();
diff --git a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java
index 93ae3cfbf73..e0f7dab5f35 100644
--- a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java
+++ b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java
@@ -23,6 +23,7 @@ import org.apache.hudi.hive.HiveSyncConfig;
 import org.apache.hudi.hive.HoodieHiveSyncException;
 import org.apache.hudi.hive.util.HivePartitionUtil;
+import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.IMetaStoreClient;
 import org.apache.hadoop.hive.metastore.api.FieldSchema;
 import org.apache.hadoop.hive.metastore.api.MetaException;
@@ -37,6 +38,7 @@ import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
 import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.HashMap;
@@ -59,7 +61,14 @@ public class HiveQueryDDLExecutor extends QueryBasedDDLExecutor {
   public HiveQueryDDLExecutor(HiveSyncConfig config) throws HiveException, MetaException {
     super(config);
-    this.metaStoreClient = Hive.get(config.getHiveConf()).getMSC();
+    HiveConf hiveConf = config.getHiveConf();
+    IMetaStoreClient tempMetaStoreClient;
+    try {
+      tempMetaStoreClient = ((Hive) Hive.class.getMethod("getWithoutRegisterFns", HiveConf.class).invoke(null, hiveConf)).getMSC();
+    } catch (NoSuchMethodException | IllegalAccessException | IllegalArgumentException | InvocationTargetException ex) {
+      tempMetaStoreClient = Hive.get(hiveConf).getMSC();
+    }
+    this.metaStoreClient = tempMetaStoreClient;
     try {
       this.sessionState = new SessionState(config.getHiveConf(), UserGroupInformation.getCurrentUser().getShortUserName());
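The commit above looks up a preferred static method by name through reflection and falls back to the older call when the lookup or invocation fails. The same pattern can be tried without Hive on the classpath; the sketch below uses `Integer.parseInt` as a stand-in target, and the class `ReflectiveFallbackSketch` and method `invokeStaticOrFallback` are hypothetical names chosen for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.function.Supplier;

// Hypothetical sketch of "try a newer static method by name, else fall back".
final class ReflectiveFallbackSketch {
  // Invokes owner.methodName(arg) reflectively as a static method.
  // If the method is missing, inaccessible, rejects the argument, or throws,
  // the fallback supplier is used instead, as in the catch block of the commit.
  static Object invokeStaticOrFallback(Class<?> owner, String methodName,
                                       Class<?> argType, Object arg,
                                       Supplier<Object> fallback) {
    try {
      Method method = owner.getMethod(methodName, argType);
      return method.invoke(null, arg); // null receiver: the target must be static
    } catch (NoSuchMethodException | IllegalAccessException
        | IllegalArgumentException | InvocationTargetException ex) {
      return fallback.get();
    }
  }

  public static void main(String[] args) {
    // The method exists, so the reflective call is used.
    System.out.println(invokeStaticOrFallback(Integer.class, "parseInt", String.class, "42", () -> -1));
    // The method does not exist on Integer, so the fallback value is returned.
    System.out.println(invokeStaticOrFallback(Integer.class, "noSuchMethod", String.class, "42", () -> -1));
  }
}
```

One property of this pattern worth keeping in mind: catching `InvocationTargetException` means an exception thrown inside the preferred method also triggers the fallback, so a genuine failure in the newer code path is silently retried through the older one.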
[GitHub] [hudi] XuQianJin-Stars merged pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive
XuQianJin-Stars merged PR #7385: URL: https://github.com/apache/hudi/pull/7385
[GitHub] [hudi] lokeshj1703 commented on issue #7363: [SUPPORT] how to get hudi table schema and get table list under the same database
lokeshj1703 commented on issue #7363:
URL: https://github.com/apache/hudi/issues/7363#issuecomment-1367241201

```
public static final ConfigProperty CREATE_SCHEMA = ConfigProperty
    .key("hoodie.table.create.schema")
    .noDefaultValue()
    .withDocumentation("Schema used when creating the table, for the first time.");
```

This is the config value returned by function `hoodieTableMetaClient.getTableConfig().getTableCreateSchema()`. There is no default value for this config. It seems this would return a value only if configured.

```
scala> import org.apache.hudi.common.table.TableSchemaResolver;
import org.apache.hudi.common.table.TableSchemaResolver

scala> var schemaResolver = new TableSchemaResolver(hoodieTableMetaClient);
schemaResolver: org.apache.hudi.common.table.TableSchemaResolver = org.apache.hudi.common.table.TableSchemaResolver@3662dc9b

scala> schemaResolver.getTableAvroSchema()
res20: org.apache.avro.Schema = {"type":"record","name":"hudi_table_record","namespace":"hoodie.hudi_table","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"emp_id","type":["null","long"],"default":null},{"name":"employee_name","type":["null","string"],"default":null},{"name":"department","type":["null","string"],"default":null},{"name":"state","type":["null","string"],"default":null},{"name":"salary","type":["null","long"...

scala> schemaResolver.getTableParquetSchema()
res21: org.apache.parquet.schema.MessageType =
message hoodie.hudi_table.hudi_table_record {
  optional binary _hoodie_commit_time (UTF8);
  optional binary _hoodie_commit_seqno (UTF8);
  optional binary _hoodie_record_key (UTF8);
  optional binary _hoodie_partition_path (UTF8);
  optional binary _hoodie_file_name (UTF8);
  optional int64 emp_id;
  optional binary employee_name (UTF8);
  optional binary department (UTF8);
  optional binary state (UTF8);
  optional int64 salary;
  optional int64 age;
  optional int64 bonus;
  optional int64 ts;
}
```

You can use the above snippet for fetching the table schema instead. cc @xushiyan
[GitHub] [hudi] perfectcw commented on issue #7570: [SUPPORT]Sync hive lost some partitions when submit multiple commits at the same time
perfectcw commented on issue #7570: URL: https://github.com/apache/hudi/issues/7570#issuecomment-1367235107 > Thanks!
[GitHub] [hudi] cxzl25 commented on pull request #7579: [HUDI-5487] Reduce duplicate Logs in ExternalSpillableMap
cxzl25 commented on PR #7579: URL: https://github.com/apache/hudi/pull/7579#issuecomment-1367234808 @hudi-bot run azure
[GitHub] [hudi] lucasberlang closed issue #7223: [SUPPORT] Error to write .hoodie_partition_metadata in IBM Cloud Object Storage
lucasberlang closed issue #7223: [SUPPORT] Error to write .hoodie_partition_metadata in IBM Cloud Object Storage URL: https://github.com/apache/hudi/issues/7223
[GitHub] [hudi] lucasberlang commented on issue #7223: [SUPPORT] Error to write .hoodie_partition_metadata in IBM Cloud Object Storage
lucasberlang commented on issue #7223:
URL: https://github.com/apache/hudi/issues/7223#issuecomment-1367233792

Good news! It is working now. I finally fixed it by adding these properties to core-site.xml:

```xml
fs.s3a.access.key
fs.s3a.secret.key
fs.s3a.awsAccessKeyId
fs.s3a.awsSecretAccessKey
fs.s3a.server-side-encryption.key
```

Thanks @yihua for the support!
[GitHub] [hudi] fengjian428 commented on issue #7570: [SUPPORT]Sync hive lost some partitions when submit multiple commits at the same time
fengjian428 commented on issue #7570:
URL: https://github.com/apache/hudi/issues/7570#issuecomment-1367231286

> > > Thanks for your reply. Could you explain the specific reason? Is it because some commits are archived and so cannot be synced to Hive?
> > >
> > > The sync logic is: check last_update_time in the Hive table properties, get all commits from that time, then update last_update_time. This does not work for multiple writers.
> >
> > Does that mean that when 20221227042855832.commit goes to sync Hive, if the last_update_time in the Hive table properties is 20221227042906103, then the commit 20221227042855832 will not be synced to Hive?

yes
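The failure mode confirmed in this reply can be reproduced with a toy model of the sync step. The sketch below is not Hudi's sync code: `LastSyncTimeSketch` and `commitsToSync` are hypothetical names, and the "strictly later than last_update_time" comparison is an assumption about how "get all commits from that time" is applied. Instant times are treated as fixed-width digit strings, so string comparison matches chronological order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "sync everything after last_update_time".
final class LastSyncTimeSketch {
  // Returns the completed instants whose time is strictly later than lastSyncTime.
  static List<String> commitsToSync(List<String> completedInstants, String lastSyncTime) {
    List<String> result = new ArrayList<>();
    for (String instant : completedInstants) {
      if (instant.compareTo(lastSyncTime) > 0) {
        result.add(instant);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Another writer has already synced and moved last_update_time forward.
    String lastSyncTime = "20221227042906103";
    // The commit with the earlier instant time completes afterwards.
    List<String> completed = Arrays.asList("20221227042855832", "20221227042906103");
    // Nothing is selected, so the partitions written by 20221227042855832 never reach Hive.
    System.out.println(commitsToSync(completed, lastSyncTime));
  }
}
```

With a single writer, instants complete in timestamp order and the filter loses nothing; the gap only appears when a commit with an earlier instant time completes after a later one has been synced.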
[GitHub] [hudi] hudi-bot commented on pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode
hudi-bot commented on PR #4966: URL: https://github.com/apache/hudi/pull/4966#issuecomment-1367230023 ## CI report: