[GitHub] [hudi] audas007 commented on issue #8017: [SUPPORT] Parquet file size is small after running deltastreamer in BULK_INSERT which results in large number of files under same partitioning
audas007 commented on issue #8017: URL: https://github.com/apache/hudi/issues/8017#issuecomment-1508006549 Was able to get this to work, with a config hoodie.copyonwrite.record.size.estimate=150 per suggestion here https://github.com/apache/hudi/issues/1583#issuecomment-622894674 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
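A rough worked example of why the estimate matters. It assumes the writer plans records per new file as `hoodie.parquet.max.file.size` (120 MB by default) divided by `hoodie.copyonwrite.record.size.estimate` (1024 bytes by default) when there is no earlier commit to learn the real average record size from. This thread does not confirm that BULK_INSERT takes exactly this path, and the 150-byte actual record size is an assumption for illustration.

```java
public class RecordSizeEstimateSketch {

    // Default hoodie.parquet.max.file.size (120 MB).
    static final long MAX_FILE_SIZE_BYTES = 120L * 1024 * 1024;

    // Assumed sizing rule: records planned per new file = ceil(maxFileSize / recordSizeEstimate).
    static long recordsPerFile(long maxFileSizeBytes, long recordSizeEstimateBytes) {
        return (maxFileSizeBytes + recordSizeEstimateBytes - 1) / recordSizeEstimateBytes;
    }

    // Bytes actually written when each planned record really occupies actualRecordBytes.
    static long actualFileSizeBytes(long plannedRecords, long actualRecordBytes) {
        return plannedRecords * actualRecordBytes;
    }

    public static void main(String[] args) {
        long actualRecordBytes = 150; // assumption for illustration
        for (long estimate : new long[] {1024, 150}) {
            long records = recordsPerFile(MAX_FILE_SIZE_BYTES, estimate);
            long bytes = actualFileSizeBytes(records, actualRecordBytes);
            System.out.printf("estimate=%d B -> %d records/file -> ~%.1f MB per file%n",
                    estimate, records, bytes / (1024.0 * 1024.0));
        }
    }
}
```

With the default estimate, about 122,880 records are planned per file, which at 150 bytes each is only around 17.6 MB; an estimate close to the real record size lets files grow to roughly the 120 MB target.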
[GitHub] [hudi] ThinkerLei commented on issue #8425: [SUPPORT] When the downstream tasks read the logfile, use the startoffset of each logfile recorded in the deltacommit metadata to read the logfile
ThinkerLei commented on issue #8425: URL: https://github.com/apache/hudi/issues/8425#issuecomment-1508002140 @danny0405 thanks for your quick feedback. I'm going to make some modifications here
[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…
hudi-bot commented on PR #8418: URL: https://github.com/apache/hudi/pull/8418#issuecomment-1507993349 ## CI report: * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN * cb05421be9bb950f7dadfc6a5cdfa4c07e5de6a3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16340) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] bvaradar commented on pull request #7834: [HUDI-5690] Add simpleBucketPartitioner to support using the simple bucket index under bulkinsert
bvaradar commented on PR #7834: URL: https://github.com/apache/hudi/pull/7834#issuecomment-1507984765 Code changes look good. Will wait for tests to pass before merging
[GitHub] [hudi] LinMingQiang opened a new issue, #8459: [Discuss] Do we need to promote the bucket number as table config instead of a write config
LinMingQiang opened a new issue, #8459: URL: https://github.com/apache/hudi/issues/8459 **_Tips before filing an issue_** Users may sometimes modify the bucket number, and an inconsistent bucket number leads to data duplication and makes the table unusable. So, should we promote the bucket number to a table config instead of a write config? That way, we can perform a configuration consistency check before starting the job. PR link: https://github.com/apache/hudi/pull/8338
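A minimal sketch of why a changed bucket number can duplicate data, assuming the bucket id is a non-negative hash of the record key modulo the bucket count (a simplification; the exact hashing used by Hudi's bucket index is not shown in this thread). `validateBucketNum` is a hypothetical illustration of the proposed consistency check, not an existing Hudi API.

```java
public class BucketNumSketch {

    // Simplified bucket assignment: non-negative hash of the record key modulo numBuckets.
    static int bucketId(String recordKey, int numBuckets) {
        return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Hypothetical pre-flight check: fail fast if the write config disagrees with the
    // bucket number persisted as a table config.
    static void validateBucketNum(int tableBucketNum, int writeBucketNum) {
        if (tableBucketNum != writeBucketNum) {
            throw new IllegalArgumentException("Bucket number mismatch: table config has "
                    + tableBucketNum + " but write config has " + writeBucketNum);
        }
    }

    public static void main(String[] args) {
        String key = "e";
        // The same key lands in a different bucket once the bucket count changes, so an
        // update goes to a new file group instead of merging with the existing record.
        System.out.println(bucketId(key, 4) + " vs " + bucketId(key, 8)); // prints "1 vs 5"
        try {
            validateBucketNum(4, 8);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Persisting the bucket number as a table config would let a check like this run before the job starts, rather than surfacing as duplicates after data has been written.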
[hudi] branch asf-site updated: [DOCS][MINOR] Add new blogs (#8458)
This is an automated email from the ASF dual-hosted git repository.
sivabalan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new c3fbf95bef9 [DOCS][MINOR] Add new blogs (#8458)
c3fbf95bef9 is described below
commit c3fbf95bef9986759e87edaf4161aa2efa815ced
Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
AuthorDate: Thu Apr 13 22:55:27 2023 -0700
    [DOCS][MINOR] Add new blogs (#8458)
---
 ...ake-in-Motion-with-Incremental-ETL-Using-Apache-Hudi.mdx | 1 -
 ...ur-write-latencies-using-Bucket-Index-in-Apache-Hudi.mdx | 13 +
 website/src/pages/videos.md | 2 --
 3 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/website/blog/2023-03-16-Setting-Uber-Transactional-Data-Lake-in-Motion-with-Incremental-ETL-Using-Apache-Hudi.mdx b/website/blog/2023-03-16-Setting-Uber-Transactional-Data-Lake-in-Motion-with-Incremental-ETL-Using-Apache-Hudi.mdx
index ccd88bf7c5e..8dfe96d34a4 100644
--- a/website/blog/2023-03-16-Setting-Uber-Transactional-Data-Lake-in-Motion-with-Incremental-ETL-Using-Apache-Hudi.mdx
+++ b/website/blog/2023-03-16-Setting-Uber-Transactional-Data-Lake-in-Motion-with-Incremental-ETL-Using-Apache-Hudi.mdx
@@ -1,4 +1,3 @@
-
 ---
 title: "Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi"
 authors:
diff --git a/website/blog/2023-04-07-Speed-up-your-write-latencies-using-Bucket-Index-in-Apache-Hudi.mdx b/website/blog/2023-04-07-Speed-up-your-write-latencies-using-Bucket-Index-in-Apache-Hudi.mdx
new file mode 100644
index 000..f2a9d3e2a25
--- /dev/null
+++ b/website/blog/2023-04-07-Speed-up-your-write-latencies-using-Bucket-Index-in-Apache-Hudi.mdx
@@ -0,0 +1,13 @@
+---
+title: "Speed up your write latencies using Bucket Index in Apache Hudi"
+authors:
+- name: Sivabalan Narayanan
+category: blog
+tags:
+- how-to
+- indexing
+- hudi
+---
+import Redirect from '@site/src/components/Redirect';
+
+https://medium.com/@simpsons/speed-up-your-write-latencies-using-bucket-index-in-apache-hudi-2f7c297493dc Redirecting... please wait!!
diff --git a/website/src/pages/videos.md b/website/src/pages/videos.md
index a3ce0cce5a0..b5a6ab64e8f 100644
--- a/website/src/pages/videos.md
+++ b/website/src/pages/videos.md
@@ -160,5 +160,3 @@ last_modified_at: 2022-12-21T15:59:57-04:00
 58. [Data Analysis for Apache Hudi Blogs on Medium with Pandas](https://www.youtube.com/watch?v=a7FD4zIOwVg)- By Soumil Shah, Mar 24th 2023
-58. [How to scrape all Blogs about a topic from medium like pro with Python](https://www.youtube.com/watch?v=-KUSaC_1X6M)- By Soumil Shah, Mar 24th 2023
-
[GitHub] [hudi] nsivabalan merged pull request #8458: [DOCS][MINOR] Add new blogs
nsivabalan merged PR #8458: URL: https://github.com/apache/hudi/pull/8458
[GitHub] [hudi] bhasudha opened a new pull request, #8458: [DOCS][MINOR] Add new blogs
bhasudha opened a new pull request, #8458: URL: https://github.com/apache/hudi/pull/8458 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] SteNicholas commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte
SteNicholas commented on PR #8455: URL: https://github.com/apache/hudi/pull/8455#issuecomment-1507933748 @Zouxxyy, did you take the compatibility of this change into consideration? With this change, the configured value of `clustering.plan.strategy.small.file.limit` must be changed when upgrading to the latest version.
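To illustrate the compatibility concern, assume the limit used to be read in megabytes and is now read in bytes, and that files smaller than the limit are treated as clustering candidates. An unchanged value then shrinks by a factor of 1024 * 1024. The helpers below and the example values (600, a 50 MB file) are hypothetical.

```java
public class SmallFileLimitSketch {

    static final long BYTES_PER_MB = 1024L * 1024L;

    // Hypothetical migration helper: a limit configured in MB, expressed in bytes.
    static long mbToBytes(long limitMb) {
        return limitMb * BYTES_PER_MB;
    }

    // Assumed rule: a file below the byte limit counts as "small" (a clustering candidate).
    static boolean isSmallFile(long fileSizeBytes, long limitBytes) {
        return fileSizeBytes < limitBytes;
    }

    public static void main(String[] args) {
        long oldConfigValue = 600;              // meant as 600 MB before the unit change
        long fileSize = 50L * BYTES_PER_MB;     // a 50 MB file
        // Left unchanged after upgrading, 600 is read as 600 bytes, so the 50 MB file
        // is no longer considered small.
        System.out.println(isSmallFile(fileSize, oldConfigValue));            // false
        System.out.println(isSmallFile(fileSize, mbToBytes(oldConfigValue))); // true
    }
}
```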
[GitHub] [hudi] codope commented on a diff in pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time
codope commented on code in PR #7627: URL: https://github.com/apache/hudi/pull/7627#discussion_r1166270869
## hudi-common/src/main/avro/HoodieArchivedMetaEntry.avsc: ##
@@ -128,6 +128,11 @@
         "HoodieIndexCommitMetadata"
       ],
       "default": null
+    },
+    {
+      "name":"stateTransitionTime",
+      "type":["null","string"],
+      "default": null
Review Comment: +1 we should save it in the archived metadata. I can see other potential use cases when there can be holes in the timeline after we allow archival beyond savepoint.
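A minimal sketch of why the nullable field with `"default": null` matters: entries archived before the field existed come back with no state transition time, so any filter on it has to tolerate null. `ArchivedInstant` and `filterByStateTransitionTime` are hypothetical, not the `HoodieTimeline` API from the PR, and instant times are compared as strings on the assumption that they are fixed-width timestamps such as `20230413225527000`.

```java
import java.util.ArrayList;
import java.util.List;

public class StateTransitionFilterSketch {

    // Hypothetical stand-in for an archived timeline entry.
    static final class ArchivedInstant {
        final String instantTime;
        final String stateTransitionTime; // null for entries archived before the field existed

        ArchivedInstant(String instantTime, String stateTransitionTime) {
            this.instantTime = instantTime;
            this.stateTransitionTime = stateTransitionTime;
        }
    }

    // Keeps instants whose state transition time lies in [startInclusive, endExclusive).
    // Entries without a recorded transition time are skipped instead of causing a failure.
    static List<String> filterByStateTransitionTime(
            List<ArchivedInstant> instants, String startInclusive, String endExclusive) {
        List<String> result = new ArrayList<>();
        for (ArchivedInstant instant : instants) {
            String t = instant.stateTransitionTime;
            if (t != null && t.compareTo(startInclusive) >= 0 && t.compareTo(endExclusive) < 0) {
                result.add(instant.instantTime);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ArchivedInstant> instants = new ArrayList<>();
        instants.add(new ArchivedInstant("20230410100000000", null));
        instants.add(new ArchivedInstant("20230411100000000", "20230411100500000"));
        instants.add(new ArchivedInstant("20230412100000000", "20230413090000000"));
        System.out.println(filterByStateTransitionTime(
                instants, "20230411000000000", "20230413000000000"));
    }
}
```

Skipping null entries is a design choice for the sketch; a caller could instead fall back to the instant time itself for older entries.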
[GitHub] [hudi] nfarah86 opened a new pull request, #8457: demo change for compaction docs
nfarah86 opened a new pull request, #8457: URL: https://github.com/apache/hudi/pull/8457 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ I made a change to the compaction doc header: https://user-images.githubusercontent.com/5392555/231940689-b397a167-73c0-48e1-8f69-16e71f081564.png _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] t-raghavan commented on issue #8016: Inline Clustering : Clustering failed to write to files
t-raghavan commented on issue #8016: URL: https://github.com/apache/hudi/issues/8016#issuecomment-1507900897 Thanks for the suggestion and it worked. 👍
[GitHub] [hudi] hudi-bot commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink
hudi-bot commented on PR #8456: URL: https://github.com/apache/hudi/pull/8456#issuecomment-1507885169 ## CI report: * 5fc5cf9e31b0209f280e06eb10bf2a75e201b807 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16345)
[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs
hudi-bot commented on PR #8329: URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507884806 ## CI report: * 44487a13a5abd52affb6212f85482976f461790a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16336) * 054fbfeae4583b99bc4a6cd319be5fc9e4572214 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16344)
[GitHub] [hudi] xushiyan merged pull request #7993: [MINOR] Fix hard-coded storage level for indexing
xushiyan merged PR #7993: URL: https://github.com/apache/hudi/pull/7993
[GitHub] [hudi] hudi-bot commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink
hudi-bot commented on PR #8456: URL: https://github.com/apache/hudi/pull/8456#issuecomment-1507881515 ## CI report: * 5fc5cf9e31b0209f280e06eb10bf2a75e201b807 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs
hudi-bot commented on PR #8329: URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507881248 ## CI report: * a7b0e52741609f58fa47adabbcc34387e5f1b678 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16335) * 44487a13a5abd52affb6212f85482976f461790a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16336) * 054fbfeae4583b99bc4a6cd319be5fc9e4572214 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8443: [HUDI-6068] Improve logic of getOldestInstantToRetainForClustering wh…
hudi-bot commented on PR #8443: URL: https://github.com/apache/hudi/pull/8443#issuecomment-1507878055 ## CI report: * 996608e7ab379d38fdd997b9532b8b90dcfe99ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16300) * 0dff5a604b6db928602ad4c242464a1bb52feb91 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16343)
[GitHub] [hudi] ad1happy2go commented on issue #8416: [SUPPORT] data loss after createRdd method in HoodieSparkUtils.scala
ad1happy2go commented on issue #8416: URL: https://github.com/apache/hudi/issues/8416#issuecomment-1507871268 https://github.com/apache/hudi/pull/7334 fixed the issue in 0.13.0.
[GitHub] [hudi] ad1happy2go commented on issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9
ad1happy2go commented on issue #8436: URL: https://github.com/apache/hudi/issues/8436#issuecomment-1507866544 I see an issue related to quotes in the spark-submit command. Try this: spark-submit --class org.apache.hudi.utilities.HoodieCleaner /usr/lib/hudi/hudi-utilities-bundle.jar --target-base-path s3://edi-dp-qa-datalake/DATA_PLATFORM/ods/ods_d_crm_crmd_customer_i_prod_r/hudiTable/.hoodie --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS --hoodie-conf hoodie.cleaner.commits.retained=10 --hoodie-conf hoodie.cleaner.parallelism=200
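If the cleaner is launched from Java rather than a shell, one way to sidestep quoting problems entirely is to pass each token as its own argument. This is a sketch only: it assumes `spark-submit` is on the `PATH`, reuses the jar path, base path, and `--hoodie-conf` values from the command above, and builds the `ProcessBuilder` without starting it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CleanerLauncherSketch {

    // Builds the spark-submit argument list. Each --hoodie-conf value is a single list
    // element, so no shell quoting or escaping is involved.
    static List<String> cleanerCommand(String basePath, List<String> hoodieConfs) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "spark-submit",
                "--class", "org.apache.hudi.utilities.HoodieCleaner",
                "/usr/lib/hudi/hudi-utilities-bundle.jar",
                "--target-base-path", basePath));
        for (String conf : hoodieConfs) {
            cmd.add("--hoodie-conf");
            cmd.add(conf);
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = cleanerCommand(
                "s3://edi-dp-qa-datalake/DATA_PLATFORM/ods/ods_d_crm_crmd_customer_i_prod_r/hudiTable/.hoodie",
                Arrays.asList(
                        "hoodie.cleaner.policy=KEEP_LATEST_COMMITS",
                        "hoodie.cleaner.commits.retained=10",
                        "hoodie.cleaner.parallelism=200"));
        ProcessBuilder pb = new ProcessBuilder(cmd).inheritIO();
        // pb.start() would launch the job; it is intentionally not called here.
        System.out.println(String.join(" ", pb.command()));
    }
}
```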
[GitHub] [hudi] Zouxxyy commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink
Zouxxyy commented on PR #8456: URL: https://github.com/apache/hudi/pull/8456#issuecomment-1507861640 @danny0405 One problem is that Flink only has `clean.async.enabled`, whereas Spark has `hoodie.clean.automatic` to control whether to clean automatically. Should we add a parameter, or use `clean.async.enabled` uniformly to control the clean behavior in Flink?
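One way to picture the two options, as a hypothetical sketch rather than Hudi's actual Flink wiring: a switch analogous to Spark's `hoodie.clean.automatic` would decide whether clean runs at all, while `clean.async.enabled` would only decide how it runs. The `CleanMode` enum and both `resolve*` methods are illustrative names.

```java
public class FlinkCleanModeSketch {

    enum CleanMode { DISABLED, SYNC, ASYNC }

    // Option A (hypothetical): a separate on/off switch, analogous to hoodie.clean.automatic,
    // with clean.async.enabled choosing between synchronous and asynchronous clean.
    static CleanMode resolveWithAutomaticSwitch(boolean cleanAutomatic, boolean cleanAsyncEnabled) {
        if (!cleanAutomatic) {
            return CleanMode.DISABLED;
        }
        return cleanAsyncEnabled ? CleanMode.ASYNC : CleanMode.SYNC;
    }

    // Option B (hypothetical): clean.async.enabled alone decides whether clean runs at all.
    static CleanMode resolveAsyncOnly(boolean cleanAsyncEnabled) {
        return cleanAsyncEnabled ? CleanMode.ASYNC : CleanMode.DISABLED;
    }

    public static void main(String[] args) {
        for (boolean automatic : new boolean[] {true, false}) {
            for (boolean async : new boolean[] {true, false}) {
                System.out.printf("automatic=%b async=%b -> A: %s, B: %s%n", automatic, async,
                        resolveWithAutomaticSwitch(automatic, async), resolveAsyncOnly(async));
            }
        }
    }
}
```

Option B needs no new config but has no way to express a synchronous clean; Option A keeps that possibility at the cost of one more parameter.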
[jira] [Updated] (HUDI-6078) Clean is always triggered with flink
[ https://issues.apache.org/jira/browse/HUDI-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6078: - Labels: pull-request-available (was: ) > Clean is always triggered with flink > > > Key: HUDI-6078 > URL: https://issues.apache.org/jira/browse/HUDI-6078 > Project: Apache Hudi > Issue Type: Bug >Reporter: zouxxyy >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] Zouxxyy opened a new pull request, #8456: [HUDI-6078] Make clean controlled by parameter in flink
Zouxxyy opened a new pull request, #8456: URL: https://github.com/apache/hudi/pull/8456 ### Change Logs Make clean controlled by parameters in flink ### Impact Make clean controlled by parameters in flink ### Risk level (write none, low medium or high below) low ### Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Created] (HUDI-6078) Clean is always triggered with flink
zouxxyy created HUDI-6078: - Summary: Clean is always triggered with flink Key: HUDI-6078 URL: https://issues.apache.org/jira/browse/HUDI-6078 Project: Apache Hudi Issue Type: Bug Reporter: zouxxyy
[GitHub] [hudi] hudi-bot commented on pull request #8450: [HUDI-6073] Table create schema should not include metadata fields
hudi-bot commented on PR #8450: URL: https://github.com/apache/hudi/pull/8450#issuecomment-1507853676 ## CI report: * fd8641d8f762b47ad2a0721d406083deb7d23933 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16315) * b9e5ebaba08cf4176ef039298ee7719e50aba3d3 UNKNOWN * 5433223d35c2216ef5d58c2705466bb8a0550a1c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16341)
[GitHub] [hudi] hudi-bot commented on pull request #8443: [HUDI-6068] Improve logic of getOldestInstantToRetainForClustering wh…
hudi-bot commented on PR #8443: URL: https://github.com/apache/hudi/pull/8443#issuecomment-1507853639 ## CI report: * 996608e7ab379d38fdd997b9532b8b90dcfe99ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16300) * 0dff5a604b6db928602ad4c242464a1bb52feb91 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…
hudi-bot commented on PR #8418: URL: https://github.com/apache/hudi/pull/8418#issuecomment-1507853543 ## CI report: * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN * e3b8fc000ef3a7d57dd23aec1c9b37afaeb3c2dd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16319) * cb05421be9bb950f7dadfc6a5cdfa4c07e5de6a3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16340)
[GitHub] [hudi] hudi-bot commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte
hudi-bot commented on PR #8455: URL: https://github.com/apache/hudi/pull/8455#issuecomment-1507848469 ## CI report: * f0215afb8f8298848391fa8168189832c614a667 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16342)
[GitHub] [hudi] hudi-bot commented on pull request #8450: [HUDI-6073] Table create schema should not include metadata fields
hudi-bot commented on PR #8450: URL: https://github.com/apache/hudi/pull/8450#issuecomment-1507848443 ## CI report: * fd8641d8f762b47ad2a0721d406083deb7d23933 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16315) * b9e5ebaba08cf4176ef039298ee7719e50aba3d3 UNKNOWN * 5433223d35c2216ef5d58c2705466bb8a0550a1c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…
hudi-bot commented on PR #8418: URL: https://github.com/apache/hudi/pull/8418#issuecomment-1507848318 ## CI report: * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN * e3b8fc000ef3a7d57dd23aec1c9b37afaeb3c2dd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16319) * cb05421be9bb950f7dadfc6a5cdfa4c07e5de6a3 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte
hudi-bot commented on PR #8455: URL: https://github.com/apache/hudi/pull/8455#issuecomment-1507843416 ## CI report: * f0215afb8f8298848391fa8168189832c614a667 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8450: [HUDI-6073] Table create schema should not include metadata fields
hudi-bot commented on PR #8450: URL: https://github.com/apache/hudi/pull/8450#issuecomment-1507843227 ## CI report: * fd8641d8f762b47ad2a0721d406083deb7d23933 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16315) * b9e5ebaba08cf4176ef039298ee7719e50aba3d3 UNKNOWN
[GitHub] [hudi] voonhous commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…
voonhous commented on code in PR #8418: URL: https://github.com/apache/hudi/pull/8418#discussion_r1165173485
## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetRowDataWriter.java: ##
@@ -283,34 +291,66 @@ public void write(ArrayData array, int ordinal) {
     }
   }
-  /**
-   * Timestamp of INT96 bytes, julianDay(4) + nanosOfDay(8). See
-   * https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp
-   * TIMESTAMP_MILLIS and TIMESTAMP_MICROS are the deprecated ConvertedType.
-   */
   private class Timestamp64Writer implements FieldWriter {
-    private Timestamp64Writer() {
+
+    private final int precision;
+    private Timestamp64Writer(int precision) {
+      this.precision = precision;
     }
     @Override
     public void write(RowData row, int ordinal) {
-      recordConsumer.addLong(timestampToInt64(row.getTimestamp(ordinal, 3)));
+      TimestampData timestampData = row.getTimestamp(ordinal, precision);
+      recordConsumer.addLong(timestampToInt64(timestampData, precision));
     }
     @Override
     public void write(ArrayData array, int ordinal) {
-      recordConsumer.addLong(timestampToInt64(array.getTimestamp(ordinal, 3)));
+      TimestampData timestampData = array.getTimestamp(ordinal, precision);
+      recordConsumer.addLong(timestampToInt64(timestampData, precision));
     }
   }
-  private long timestampToInt64(TimestampData timestampData) {
-    return utcTimestamp ? timestampData.getMillisecond() : timestampData.toTimestamp().getTime();
+  /**
+   * Converts a {@code TimestampData} to its corresponding int64 value. This function only accepts TimestampData of
+   * precision 3 or 6. Special attention will need to be given to a TimestampData of precision = 6.
+
+   * For example representing `1970-01-01T00:00:03.11` of precision 6 will have:
+   *
+   * millisecond = 3100
+   * nanoOfMillisecond = 1000
+   *
+   * As such, the int64 value will be:
+   *
+   * millisecond * 1000 + nanoOfMillisecond / 1000
+   *
+   * @param timestampData TimestampData to be converted to int64 format
+   * @param precision the precision of the TimestampData
+   * @return int64 value of the TimestampData
+   */
+  private long timestampToInt64(TimestampData timestampData, int precision) {
+    if (!utcTimestamp) {
+      // toTimestamp is agnostic of precision
+      return timestampData.toTimestamp().getTime();
Review Comment: Fixed
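A small worked check of the int64 arithmetic in the Javadoc above, using plain `long`/`int` parameters in place of Flink's `TimestampData` (they mirror its `getMillisecond()` and `getNanoOfMillisecond()` values). Note that `millisecond = 3100` with `nanoOfMillisecond = 1000` corresponds to `1970-01-01T00:00:03.100001`; `00:00:03.11` would instead give `millisecond = 3110` and `nanoOfMillisecond = 0`. Treating precision 3 as epoch milliseconds is an assumption consistent with the removed `getMillisecond()` path.

```java
public class TimestampInt64Sketch {

    // Converts (epoch millisecond, nano-of-millisecond) to the int64 value for a precision:
    // precision 3 -> epoch milliseconds, precision 6 -> epoch microseconds.
    static long toInt64(long millisecond, int nanoOfMillisecond, int precision) {
        if (precision == 3) {
            return millisecond;
        }
        if (precision == 6) {
            return millisecond * 1000 + nanoOfMillisecond / 1000;
        }
        throw new IllegalArgumentException("Only precision 3 or 6 is supported, got " + precision);
    }

    public static void main(String[] args) {
        // 1970-01-01T00:00:03.100001
        System.out.println(toInt64(3100, 1000, 6)); // 3100001
        System.out.println(toInt64(3100, 1000, 3)); // 3100
        // 1970-01-01T00:00:03.11
        System.out.println(toInt64(3110, 0, 6));    // 3110000
    }
}
```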
[GitHub] [hudi] waitingF commented on a diff in pull request #8376: [HUDI-6019] support config minPartitions when reading from kafka
waitingF commented on code in PR #8376: URL: https://github.com/apache/hudi/pull/8376#discussion_r1166212077 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java: ## @@ -148,9 +166,58 @@ public static OffsetRange[] computeOffsetRanges(Map fromOf } Review Comment: @bvaradar I think the algorithm would not work well with data skew, because it does not divide a skewed partition evenly. For example, given topic partitions "0:0->100, 1:0->500" and minPartitions=3, the algorithm generates the ranges "0:0->100, 1:0->200, 1:200->300", and the 2 ranges for partition 1 are not divided evenly. The more skewed the partitions, the worse it gets. With data skew, resplit generates even ranges within each TopicPartition, because it first allocates ranges for the topic partitions and then resplits the allocated ranges into roughly minPartitions ranges. Given this, and since the complexity of the resplit should be very small, I think resplit is the better choice. What do you think?
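The resplit idea described in the comment can be sketched as follows. This is not the code in PR #8376: `Range` is a hypothetical stand-in for Spark's Kafka `OffsetRange`, and the rule "cut each range into ceil(count / target) pieces whose sizes differ by at most one" is an assumption about how an even resplit could be done.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetResplitSketch {

    /** Hypothetical stand-in for a Kafka offset range [from, until) of one topic partition. */
    public static final class Range {
        final int partition;
        final long from;
        final long until;

        public Range(int partition, long from, long until) {
            this.partition = partition;
            this.from = from;
            this.until = until;
        }

        long count() {
            return until - from;
        }

        @Override
        public String toString() {
            return partition + ":" + from + "->" + until;
        }
    }

    /** Resplits already-allocated ranges into roughly minPartitions evenly sized pieces. */
    public static List<Range> resplit(List<Range> ranges, int minPartitions) {
        long total = 0;
        for (Range r : ranges) {
            total += r.count();
        }
        List<Range> result = new ArrayList<>();
        if (total == 0 || minPartitions <= 1) {
            result.addAll(ranges);
            return result;
        }
        long target = (total + minPartitions - 1) / minPartitions; // ceil(total / minPartitions)
        for (Range r : ranges) {
            long n = r.count();
            if (n <= 0) {
                result.add(r); // nothing to split
                continue;
            }
            long pieces = (n + target - 1) / target; // ceil(n / target)
            long base = n / pieces;
            long extra = n % pieces; // the first `extra` pieces get one more offset
            long start = r.from;
            for (long i = 0; i < pieces; i++) {
                long size = base + (i < extra ? 1 : 0);
                result.add(new Range(r.partition, start, start + size));
                start += size;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Range> allocated = List.of(new Range(0, 0, 100), new Range(1, 0, 500));
        System.out.println(resplit(allocated, 3)); // [0:0->100, 1:0->167, 1:167->334, 1:334->500]
    }
}
```

With the example from the comment, the skewed partition 1 is cut into three pieces of 167, 167 and 166 offsets instead of leaving a short remainder; the result has 4 ranges, i.e. "roughly" minPartitions rather than exactly.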
[jira] [Updated] (HUDI-6077) Add more partition push down filters
[ https://issues.apache.org/jira/browse/HUDI-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6077: - Labels: pull-request-available (was: ) > Add more partition push down filters > > > Key: HUDI-6077 > URL: https://issues.apache.org/jira/browse/HUDI-6077 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Hui An >Priority: Major > Labels: pull-request-available > > 1. Implement some basic `Expression`s for HUDI > 2. Try to convert all Spark `Expression`s to HUDI `Expression`s > 3. Implement `PartialBindVisitor` and `BindVisitor` to bind values to HUDI > `Expression`s > 4. Currently, we only push down `EqualTo` filters if they are on the first > level of partitions (by path prefix); this PR tries to push down more complex > partition filters (like `And`, `Or`, `EqualTo`, etc.) when fetching partitions. > While listing partition paths in parallel, `PartialBindVisitor` is used to > bind the partitions that have been listed and to change unresolved references to > `AlwaysTrue`. > e.g. > {code:java} > Given a table with 3 partition levels (year, month, day) and the existing > partition paths: > year=2023/month=2/day=11 > year=2023/month=2/day=12 > year=2024/month=2/day=12 > To push down the filter `year=2023 AND day=12`, when listing the > first partition level `year`, only `year` is bound in `PartialBindVisitor`. > Since `day` is not yet available, the filter becomes `year=2023 AND > TRUE` (optimized to `year=2023`), so the first 2 paths are selected. > The next level is then listed in parallel under those 2 paths; `day` is still not > available, so both paths remain selected. > Finally, when the last partition level is listed, the full filter `year=2023 AND > day=12` is applied and returns `year=2023/month=2/day=12` > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
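The level-by-level pruning in the HUDI-6077 example can be sketched in a few lines. This is a simplified illustration, not Hudi's `PartialBindVisitor`: the `Filter` interface, `eq`/`and`/`or` helpers and the path-prefix binding are hypothetical, and listing is simulated by evaluating the first `depth` path segments of already known paths.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartialBindSketch {

    /** Evaluates a filter against the partition columns bound so far. */
    interface Filter {
        boolean test(Map<String, String> bound);
    }

    static Filter eq(String column, String value) {
        // An unbound (unresolved) column is treated as TRUE, so the path is kept for deeper listing.
        return bound -> !bound.containsKey(column) || bound.get(column).equals(value);
    }

    static Filter and(Filter left, Filter right) {
        return bound -> left.test(bound) && right.test(bound);
    }

    static Filter or(Filter left, Filter right) {
        return bound -> left.test(bound) || right.test(bound);
    }

    /** Binds the first `depth` segments of a path like "year=2023/month=2/day=12". */
    static Map<String, String> bindPrefix(String path, int depth) {
        Map<String, String> bound = new LinkedHashMap<>();
        String[] segments = path.split("/");
        for (int i = 0; i < depth && i < segments.length; i++) {
            String[] kv = segments[i].split("=", 2);
            bound.put(kv[0], kv[1]);
        }
        return bound;
    }

    /** Keeps the paths whose first `depth` partition levels do not contradict the filter. */
    static List<String> prune(List<String> paths, Filter filter, int depth) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (filter.test(bindPrefix(p, depth))) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("year=2023/month=2/day=11", "year=2023/month=2/day=12", "year=2024/month=2/day=12");
        Filter filter = and(eq("year", "2023"), eq("day", "12"));
        System.out.println(prune(paths, filter, 1)); // [year=2023/month=2/day=11, year=2023/month=2/day=12]
        System.out.println(prune(paths, filter, 3)); // [year=2023/month=2/day=12]
    }
}
```

Replacing unresolved references with TRUE is only a safe over-approximation for filters built from `And`, `Or` and comparisons; a `Not` over an unbound column would turn TRUE into FALSE and wrongly prune paths, which is why the sketch leaves `Not` out.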
[GitHub] [hudi] boneanxs commented on pull request #8452: [WIP] [HUDI-6077] Add more partition push down filters
boneanxs commented on PR #8452: URL: https://github.com/apache/hudi/pull/8452#issuecomment-1507833845 > @boneanxs Can you please create a JIRA and add more details to it? Don't we already push down `EqualTo`? Thanks @codope, updated the description. Currently we only push down `EqualTo` if the filter is on the first partition level of the table. This PR 1) supports filters on all partition levels and 2) tries to push down more filters when listing partitions.
[jira] [Created] (HUDI-6077) Add more partition push down filters
Hui An created HUDI-6077: Summary: Add more partition push down filters Key: HUDI-6077 URL: https://issues.apache.org/jira/browse/HUDI-6077 Project: Apache Hudi Issue Type: Improvement Reporter: Hui An 1. Implement some basic `Expression`s for HUDI 2. Try to convert all Spark `Expression`s to HUDI `Expression`s 3. Implement `PartialBindVisitor` and `BindVisitor` to bind values to HUDI `Expression`s 4. Currently, we only push down `EqualTo` filters if they are on the first level of partitions (by path prefix); this PR tries to push down more complex partition filters (like `And`, `Or`, `EqualTo`, etc.) when fetching partitions. While listing partition paths in parallel, `PartialBindVisitor` is used to bind the partitions that have been listed and to change unresolved references to `AlwaysTrue`. e.g. {code:java} Given a table with 3 partition levels (year, month, day) and the existing partition paths: year=2023/month=2/day=11 year=2023/month=2/day=12 year=2024/month=2/day=12 To push down the filter `year=2023 AND day=12`, when listing the first partition level `year`, only `year` is bound in `PartialBindVisitor`. Since `day` is not yet available, the filter becomes `year=2023 AND TRUE` (optimized to `year=2023`), so the first 2 paths are selected. The next level is then listed in parallel under those 2 paths; `day` is still not available, so both paths remain selected. Finally, when the last partition level is listed, the full filter `year=2023 AND day=12` is applied and returns `year=2023/month=2/day=12` {code}
[GitHub] [hudi] Zouxxyy commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte
Zouxxyy commented on PR #8455: URL: https://github.com/apache/hudi/pull/8455#issuecomment-1507819932 @danny0405 Can you help with a review?
[hudi] branch master updated (d0a13e64c8c -> 46c9bc1791b)
This is an automated email from the ASF dual-hosted git repository. biyan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from d0a13e64c8c [HUDI-6074] Check inlineClusteringEnabled in isAsyncClusteringEnabled (#8453) add 46c9bc1791b [HUDI-6000] Fix RunClusteringProcedure when no partition matched (#8318) No new revisions were added by this update. Summary of changes: .../sql/hudi/command/procedures/RunClusteringProcedure.scala | 11 +++ .../spark/sql/hudi/procedure/TestClusteringProcedure.scala| 9 - 2 files changed, 15 insertions(+), 5 deletions(-)
[GitHub] [hudi] YannByron merged pull request #8318: [HUDI-6000] Fix RunClusteringProcedure when no partition matched
YannByron merged PR #8318: URL: https://github.com/apache/hudi/pull/8318
[jira] [Updated] (HUDI-6074) check inlineClusteringEnabled in isAsyncClusteringEnabled
[ https://issues.apache.org/jira/browse/HUDI-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-6074: -- Fix Version/s: 0.14.0 > check inlineClusteringEnabled in isAsyncClusteringEnabled > - > > Key: HUDI-6074 > URL: https://issues.apache.org/jira/browse/HUDI-6074 > Project: Apache Hudi > Issue Type: Bug >Reporter: zouxxyy >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > >
[jira] [Closed] (HUDI-6074) check inlineClusteringEnabled in isAsyncClusteringEnabled
[ https://issues.apache.org/jira/browse/HUDI-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-6074. - Resolution: Fixed > check inlineClusteringEnabled in isAsyncClusteringEnabled > - > > Key: HUDI-6074 > URL: https://issues.apache.org/jira/browse/HUDI-6074 > Project: Apache Hudi > Issue Type: Bug >Reporter: zouxxyy >Priority: Major > Labels: pull-request-available >
[hudi] branch master updated: [HUDI-6074] Check inlineClusteringEnabled in isAsyncClusteringEnabled (#8453)
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new d0a13e64c8c [HUDI-6074] Check inlineClusteringEnabled in isAsyncClusteringEnabled (#8453) d0a13e64c8c is described below commit d0a13e64c8c755e28c2c0920d246f711b0663bc1 Author: Zouxxyy AuthorDate: Fri Apr 14 09:50:36 2023 +0800 [HUDI-6074] Check inlineClusteringEnabled in isAsyncClusteringEnabled (#8453) --- .../main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala| 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala index 1f9e218572e..d338f74bc5a 100644 --- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala +++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala @@ -1015,18 +1015,16 @@ object HoodieSparkSqlWriter { tableConfig: HoodieTableConfig, parameters: Map[String, String], configuration: Configuration): Boolean = { log.info(s"Config.inlineCompactionEnabled ? ${client.getConfig.inlineCompactionEnabled}") -if (asyncCompactionTriggerFnDefined && !client.getConfig.inlineCompactionEnabled - && parameters.get(ASYNC_COMPACT_ENABLE.key).exists(r => r.toBoolean)) { - tableConfig.getTableType == HoodieTableType.MERGE_ON_READ -} else { - false -} +(asyncCompactionTriggerFnDefined && !client.getConfig.inlineCompactionEnabled + && parameters.get(ASYNC_COMPACT_ENABLE.key).exists(r => r.toBoolean) + && tableConfig.getTableType == HoodieTableType.MERGE_ON_READ) } private def isAsyncClusteringEnabled(client: SparkRDDWriteClient[_], parameters: Map[String, String]): Boolean = { log.info(s"Config.asyncClusteringEnabled ? 
${client.getConfig.isAsyncClusteringEnabled}") -asyncClusteringTriggerFnDefined && client.getConfig.isAsyncClusteringEnabled +(asyncClusteringTriggerFnDefined && !client.getConfig.inlineClusteringEnabled + && client.getConfig.isAsyncClusteringEnabled) } /**
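The two Scala predicates changed in HUDI-6074 can be rendered as plain boolean functions to make their truth tables explicit. This is a Java sketch with hypothetical boolean parameters standing in for `asyncCompactionTriggerFnDefined` / `asyncClusteringTriggerFnDefined`, `client.getConfig.inlineCompactionEnabled` / `inlineClusteringEnabled`, the `ASYNC_COMPACT_ENABLE` option, `isAsyncClusteringEnabled` and the MERGE_ON_READ table-type check; it is not Hudi code.

```java
public class AsyncTableServiceFlags {

    /** Post-change compaction check: the if/else collapsed into one conjunction (same behavior as before). */
    static boolean isAsyncCompactionEnabled(boolean triggerFnDefined, boolean inlineCompaction,
                                            boolean asyncCompactOption, boolean mergeOnRead) {
        return triggerFnDefined && !inlineCompaction && asyncCompactOption && mergeOnRead;
    }

    /** Post-change clustering check: inline clustering now disables the async path. */
    static boolean isAsyncClusteringEnabled(boolean triggerFnDefined, boolean inlineClustering,
                                            boolean asyncClustering) {
        return triggerFnDefined && !inlineClustering && asyncClustering;
    }

    public static void main(String[] args) {
        // Inline and async clustering both enabled: the old check returned true, the new one false.
        System.out.println(isAsyncClusteringEnabled(true, true, true));  // false
        System.out.println(isAsyncClusteringEnabled(true, false, true)); // true
    }
}
```

The compaction refactor is behavior-preserving (`if (c) isMOR else false` equals `c && isMOR`), whereas the clustering change adds the new `!inlineClusteringEnabled` guard.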
[GitHub] [hudi] codope merged pull request #8453: [HUDI-6074] Check inlineClusteringEnabled in isAsyncClusteringEnabled
codope merged PR #8453: URL: https://github.com/apache/hudi/pull/8453
[jira] [Updated] (HUDI-6048) Exceptions should not be thrown when querying partitions that do not exist
[ https://issues.apache.org/jira/browse/HUDI-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-6048: -- Fix Version/s: 0.14.0 > Exceptions should not be thrown when querying partitions that do not exist > -- > > Key: HUDI-6048 > URL: https://issues.apache.org/jira/browse/HUDI-6048 > Project: Apache Hudi > Issue Type: Bug >Reporter: zouxxyy >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > >
[jira] [Closed] (HUDI-6048) Exceptions should not be thrown when querying partitions that do not exist
[ https://issues.apache.org/jira/browse/HUDI-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-6048. - Resolution: Fixed > Exceptions should not be thrown when querying partitions that do not exist > -- > > Key: HUDI-6048 > URL: https://issues.apache.org/jira/browse/HUDI-6048 > Project: Apache Hudi > Issue Type: Bug >Reporter: zouxxyy >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (HUDI-6076) clustering.plan.strategy.small.file.limit's unit should is byte
[ https://issues.apache.org/jira/browse/HUDI-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6076: - Labels: pull-request-available (was: ) > clustering.plan.strategy.small.file.limit's unit should is byte > --- > > Key: HUDI-6076 > URL: https://issues.apache.org/jira/browse/HUDI-6076 > Project: Apache Hudi > Issue Type: Bug >Reporter: zouxxyy >Priority: Major > Labels: pull-request-available >
[GitHub] [hudi] Zouxxyy opened a new pull request, #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte
Zouxxyy opened a new pull request, #8455: URL: https://github.com/apache/hudi/pull/8455

### Change Logs

The unit of `clustering.plan.strategy.target.file.max.bytes` is bytes, so `clustering.plan.strategy.small.file.limit` should use the same unit. The two options are also compared with each other in places where their units currently differ, for example:

```java
this.conf.setLong(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT.key(),
    this.conf.getLong(FlinkOptions.CLUSTERING_PLAN_STRATEGY_TARGET_FILE_MAX_BYTES)
        > this.conf.getLong(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT)
        ? this.conf.getLong(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT)
        : this.conf.getLong(FlinkOptions.CLUSTERING_PLAN_STRATEGY_TARGET_FILE_MAX_BYTES));
```

### Impact

Change the unit of `clustering.plan.strategy.small.file.limit` to bytes.

### Risk level (write none, low medium or high below)

low

### Documentation Update

none

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
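Why mixed units break that comparison can be shown with a small sketch. The method names are hypothetical, and the 600 MB limit and 100 MB target are illustrative values chosen for the example, not Hudi defaults. `capMixedUnits` mirrors the ternary in the PR description with the limit still in MB; `capInBytes` converts first.

```java
public class SmallFileLimitUnits {

    static final long BYTES_PER_MB = 1024L * 1024L;

    /** Mirrors the ternary above while the small-file limit is still expressed in MB. */
    static long capMixedUnits(long smallFileLimitMb, long targetFileMaxBytes) {
        return targetFileMaxBytes > smallFileLimitMb ? smallFileLimitMb : targetFileMaxBytes;
    }

    /** Unit-consistent version: convert the limit to bytes before taking the minimum. */
    static long capInBytes(long smallFileLimitMb, long targetFileMaxBytes) {
        return Math.min(smallFileLimitMb * BYTES_PER_MB, targetFileMaxBytes);
    }

    public static void main(String[] args) {
        long targetFileMaxBytes = 100 * BYTES_PER_MB; // illustrative 100 MB target
        long smallFileLimitMb = 600;                  // illustrative 600 MB limit
        // 600 is kept and later read as 600 MB, which exceeds the 100 MB target
        System.out.println(capMixedUnits(smallFileLimitMb, targetFileMaxBytes)); // 600
        System.out.println(capInBytes(smallFileLimitMb, targetFileMaxBytes));    // 104857600
    }
}
```

Making both options byte-valued, as the PR proposes, lets the comparison work without any conversion step.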
[hudi] branch master updated: [HUDI-6048] Check if partition exists before list partition by path prefix (#8402)
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 657b837aaa6 [HUDI-6048] Check if partition exists before list partition by path prefix (#8402) 657b837aaa6 is described below commit 657b837aaa6fa825945625579c52ff7365b1ecfd Author: Zouxxyy AuthorDate: Fri Apr 14 09:48:48 2023 +0800 [HUDI-6048] Check if partition exists before list partition by path prefix (#8402) --- .../src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala | 4 +++- .../src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala | 7 ++- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala index a9a20057795..6459c967c56 100644 --- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala +++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala @@ -300,7 +300,9 @@ class SparkHoodieTableFileIndex(spark: SparkSession, // prefix to try to reduce the scope of the required file-listing val relativePartitionPathPrefix = composeRelativePartitionPath(staticPartitionColumnNameValuePairs) - if (staticPartitionColumnNameValuePairs.length == partitionColumnNames.length) { + if (!metaClient.getFs.exists(new Path(getBasePath, relativePartitionPathPrefix))) { +Seq() + } else if (staticPartitionColumnNameValuePairs.length == partitionColumnNames.length) { // In case composed partition path is complete, we can return it directly avoiding extra listing operation Seq(new PartitionPath(relativePartitionPathPrefix, staticPartitionColumnNameValuePairs.map(_._2._2.asInstanceOf[AnyRef]).toArray)) } else { diff --git 
a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala index e69819fb6f4..ed73940186d 100644 --- a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala +++ b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala @@ -519,7 +519,12 @@ class TestHoodieFileIndex extends HoodieSparkClientTestBase with ScalaAssertionS EqualTo(attribute("region_code"), literal("1"))), "dt = '2023/01/01' and region_code = '1'", enablePartitionPathPrefixAnalysis, -Seq(("1", "2023/01/01"))) +Seq(("1", "2023/01/01"))), + // no partition matched + (Seq(EqualTo(attribute("region_code"), literal("0"))), +"region_code = '0'", +enablePartitionPathPrefixAnalysis, +Seq()) ) testCases.foreach(testCase => {
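The control flow that HUDI-6048 adds to `SparkHoodieTableFileIndex` can be sketched as a standalone Java function. The `exists` predicate and `listUnder` function are hypothetical stand-ins for `metaClient.getFs.exists(new Path(getBasePath, relativePartitionPathPrefix))` and the prefix-based listing, and paths are plain strings relative to the base path. The directory layout in `main` is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

public class PartitionPrefixListing {

    static List<String> listPartitions(String prefix, int staticColumns, int partitionColumns,
                                       Predicate<String> exists, Function<String, List<String>> listUnder) {
        if (!exists.test(prefix)) {
            // New guard: a prefix that matches no directory yields no partitions instead of failing in the listing
            return List.of();
        } else if (staticColumns == partitionColumns) {
            // Every partition column is pinned, so the prefix is itself a complete partition path
            return List.of(prefix);
        } else {
            // Only part of the path is known: list below the prefix
            return new ArrayList<>(listUnder.apply(prefix));
        }
    }

    public static void main(String[] args) {
        Set<String> dirs = Set.of("1", "1/2023/01/01");
        Function<String, List<String>> lister = p -> p.equals("1") ? List.of("1/2023/01/01") : List.of();
        System.out.println(listPartitions("0", 1, 2, dirs::contains, lister)); // []
        System.out.println(listPartitions("1", 1, 2, dirs::contains, lister)); // [1/2023/01/01]
    }
}
```

The new test case `region_code = '0'` corresponds to the first call: the prefix does not exist, so the result is an empty sequence rather than an exception.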
[GitHub] [hudi] codope merged pull request #8402: [HUDI-6048] Check if partition exists before list partition by path prefix
codope merged PR #8402: URL: https://github.com/apache/hudi/pull/8402
[GitHub] [hudi] nfarah86 closed pull request #8454: change for demo
nfarah86 closed pull request #8454: change for demo URL: https://github.com/apache/hudi/pull/8454
[jira] [Created] (HUDI-6076) clustering.plan.strategy.small.file.limit's unit should is byte
zouxxyy created HUDI-6076: - Summary: clustering.plan.strategy.small.file.limit's unit should is byte Key: HUDI-6076 URL: https://issues.apache.org/jira/browse/HUDI-6076 Project: Apache Hudi Issue Type: Bug Reporter: zouxxyy
[GitHub] [hudi] hudi-bot commented on pull request #8454: change for demo
hudi-bot commented on PR #8454: URL: https://github.com/apache/hudi/pull/8454#issuecomment-1507810698 ## CI report: * 9b3e77b8a3f70da310c72fba7177932bef4bb548 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16337) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs
hudi-bot commented on PR #8329: URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507810397 ## CI report: * a7b0e52741609f58fa47adabbcc34387e5f1b678 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16335) * 44487a13a5abd52affb6212f85482976f461790a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16336)
[GitHub] [hudi] hudi-bot commented on pull request #8454: change for demo
hudi-bot commented on PR #8454: URL: https://github.com/apache/hudi/pull/8454#issuecomment-1507807085 ## CI report: * 9b3e77b8a3f70da310c72fba7177932bef4bb548 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11
hudi-bot commented on PR #8439: URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507802385 ## CI report: * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN * 4adec3849535bce65c8d1a3d1909679a94da4d44 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16334)
[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs
hudi-bot commented on PR #8329: URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507802024 ## CI report: * a7b0e52741609f58fa47adabbcc34387e5f1b678 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16335) * 44487a13a5abd52affb6212f85482976f461790a UNKNOWN
[GitHub] [hudi] bvaradar commented on pull request #8353: [MINOR] Remove unused code
bvaradar commented on PR #8353: URL: https://github.com/apache/hudi/pull/8353#issuecomment-1507795687 @huangxiaopingRD : Can you merge all the refactoring changes into a single PR? That makes it easier to review and land. Thanks, Balaji.V
[GitHub] [hudi] nfarah86 opened a new pull request, #8454: change for demo
nfarah86 opened a new pull request, #8454: URL: https://github.com/apache/hudi/pull/8454 ### Change Logs THIS IS A DEMO _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs
hudi-bot commented on PR #8329: URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507771015 ## CI report: * 9d2145a0062e913ae8ad7008103a808333929fd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16325) * a7b0e52741609f58fa47adabbcc34387e5f1b678 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16335) * 44487a13a5abd52affb6212f85482976f461790a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs
hudi-bot commented on PR #8329: URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507767580 ## CI report: * 9d2145a0062e913ae8ad7008103a808333929fd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16325) * a7b0e52741609f58fa47adabbcc34387e5f1b678 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16335)
[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs
hudi-bot commented on PR #8329: URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507763904 ## CI report: * 9d2145a0062e913ae8ad7008103a808333929fd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16325) * a7b0e52741609f58fa47adabbcc34387e5f1b678 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8440: [DO NOT MERGE] run gh actions with java 17
hudi-bot commented on PR #8440: URL: https://github.com/apache/hudi/pull/8440#issuecomment-1507730119 ## CI report: * 6b7a1a58b0df09c8e262cda3fc3087e08cbf3905 UNKNOWN * 7a3e534cf0abd559f9eec2640dcc7e18b9215288 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16331)
[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11
hudi-bot commented on PR #8439: URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507726163 ## CI report: * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN * d69ef5d7b5ce3d2257a0fc4e5d6961690cf10ae2 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16333) * 4adec3849535bce65c8d1a3d1909679a94da4d44 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16334)
[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.
hudi-bot commented on PR #8434: URL: https://github.com/apache/hudi/pull/8434#issuecomment-1507726100 ## CI report: * 4197e6d1216ee9bc269fb5368ab6c919c55ec146 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16329)
[GitHub] [hudi] hudi-bot commented on pull request #8410: [HUDI-6050] Fix write helper deduplicate records lost origin data operation
hudi-bot commented on PR #8410: URL: https://github.com/apache/hudi/pull/8410#issuecomment-1507726024 ## CI report: * 98195cd8fa7423d6756c845f6afcc0d6d2fb2eea Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16328)
[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11
hudi-bot commented on PR #8439: URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507695160 ## CI report: * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16332) * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN * d69ef5d7b5ce3d2257a0fc4e5d6961690cf10ae2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16333) * 4adec3849535bce65c8d1a3d1909679a94da4d44 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8449: [HUDI-6071] If the Flink Hive Catalog is used and the table type is B…
hudi-bot commented on PR #8449: URL: https://github.com/apache/hudi/pull/8449#issuecomment-1507685663 ## CI report: * c8acee7666a4cabce9b9eb76b1da71b1f6826bf9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16326)
[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11
hudi-bot commented on PR #8439: URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507645429 ## CI report: * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16332) * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN * d69ef5d7b5ce3d2257a0fc4e5d6961690cf10ae2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16333)
[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11
hudi-bot commented on PR #8439: URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507639266 ## CI report: * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16332) * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN * d69ef5d7b5ce3d2257a0fc4e5d6961690cf10ae2 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11
hudi-bot commented on PR #8439: URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507630741 ## CI report: * b485d5800a466164d1619e2d7696c9116bd7d123 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16297) * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16332) * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs
hudi-bot commented on PR #8329: URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507630311 ## CI report: * 9d2145a0062e913ae8ad7008103a808333929fd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16325)
[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11
hudi-bot commented on PR #8439: URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507574064 ## CI report: * b485d5800a466164d1619e2d7696c9116bd7d123 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16297) * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16332)
[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark
hudi-bot commented on PR #8303: URL: https://github.com/apache/hudi/pull/8303#issuecomment-1507573458 ## CI report: * 9cda89b23cbf8514e1c2e0049eea4624f3b49f10 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324)
[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11
hudi-bot commented on PR #8439: URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507521807 ## CI report: * b485d5800a466164d1619e2d7696c9116bd7d123 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16297) * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8453: [HUDI-6074] Check inlineClusteringEnabled in isAsyncClusteringEnabled
hudi-bot commented on PR #8453: URL: https://github.com/apache/hudi/pull/8453#issuecomment-1507504914 ## CI report: * fbfacaab486ef7bc97a5880d91f7bbd88830e789 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16322)
[GitHub] [hudi] hudi-bot commented on pull request #8440: [DO NOT MERGE] run gh actions with java 17
hudi-bot commented on PR #8440: URL: https://github.com/apache/hudi/pull/8440#issuecomment-1507448923 ## CI report: * 6b7a1a58b0df09c8e262cda3fc3087e08cbf3905 UNKNOWN * b268bf853f7cbb88f3204a8f50d26ac44a8edc2a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16293) * 7a3e534cf0abd559f9eec2640dcc7e18b9215288 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16331)
[GitHub] [hudi] hudi-bot commented on pull request #8440: [DO NOT MERGE] run gh actions with java 17
hudi-bot commented on PR #8440: URL: https://github.com/apache/hudi/pull/8440#issuecomment-1507440429 ## CI report: * 6b7a1a58b0df09c8e262cda3fc3087e08cbf3905 UNKNOWN * b268bf853f7cbb88f3204a8f50d26ac44a8edc2a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16293) * 7a3e534cf0abd559f9eec2640dcc7e18b9215288 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8445: [HUDI-3088] Use Spark 3.2 as default Spark version
hudi-bot commented on PR #8445: URL: https://github.com/apache/hudi/pull/8445#issuecomment-1507356898 ## CI report: * 25574338253f1e7c9db7eabcb1239a8cb5ca2b1d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16321)
[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…
hudi-bot commented on PR #8418: URL: https://github.com/apache/hudi/pull/8418#issuecomment-1507356645 ## CI report: * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN * e3b8fc000ef3a7d57dd23aec1c9b37afaeb3c2dd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16319)
[jira] [Updated] (HUDI-6075) Improve config generation script and docs
[ https://issues.apache.org/jira/browse/HUDI-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6075: Summary: Improve config generation script and docs (was: Improve config generation script) > Improve config generation script and docs > - > > Key: HUDI-6075 > URL: https://issues.apache.org/jira/browse/HUDI-6075 > Project: Apache Hudi > Issue Type: New Feature > Components: configs >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 0.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6075) Improve config generation script
[ https://issues.apache.org/jira/browse/HUDI-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6075: Fix Version/s: 0.14.0 > Improve config generation script > > > Key: HUDI-6075 > URL: https://issues.apache.org/jira/browse/HUDI-6075 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 0.14.0 > >
[jira] [Updated] (HUDI-6075) Improve config generation script
[ https://issues.apache.org/jira/browse/HUDI-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6075: Story Points: 0.5 > Improve config generation script > > > Key: HUDI-6075 > URL: https://issues.apache.org/jira/browse/HUDI-6075 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major >
[jira] [Updated] (HUDI-6075) Improve config generation script
[ https://issues.apache.org/jira/browse/HUDI-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6075: Component/s: configs Epic Link: HUDI-5738 > Improve config generation script > > > Key: HUDI-6075 > URL: https://issues.apache.org/jira/browse/HUDI-6075 > Project: Apache Hudi > Issue Type: New Feature > Components: configs >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 0.14.0 > >
[jira] [Assigned] (HUDI-6075) Improve config generation script
[ https://issues.apache.org/jira/browse/HUDI-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-6075: --- Assignee: Ethan Guo > Improve config generation script > > > Key: HUDI-6075 > URL: https://issues.apache.org/jira/browse/HUDI-6075 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major >
[jira] [Created] (HUDI-6075) Improve config generation script
Ethan Guo created HUDI-6075: --- Summary: Improve config generation script Key: HUDI-6075 URL: https://issues.apache.org/jira/browse/HUDI-6075 Project: Apache Hudi Issue Type: New Feature Reporter: Ethan Guo
[GitHub] [hudi] codope commented on a diff in pull request #8402: [HUDI-6048] Check if partition exists before list partition by path prefix
codope commented on code in PR #8402: URL: https://github.com/apache/hudi/pull/8402#discussion_r1165824450
## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala: ##
@@ -299,7 +299,9 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
       // prefix to try to reduce the scope of the required file-listing
       val relativePartitionPathPrefix = composeRelativePartitionPath(staticPartitionColumnNameValuePairs)
-      if (staticPartitionColumnNameValuePairs.length == partitionColumnNames.length) {
+      if (!metaClient.getFs.exists(new Path(getBasePath, relativePartitionPathPrefix))) {
Review Comment: Got it 👍
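The diff above guards the prefix-based listing with an existence check on `basePath/relativePartitionPathPrefix`. The Java code below is a minimal local-filesystem analogue of that idea. It uses `java.nio.file` as a stand-in for the Hadoop `FileSystem` returned by `metaClient.getFs`, and `listPartitionsUnderPrefix` is a hypothetical helper. The truncated diff does not show what Hudi does when the prefix is missing, so returning an empty list in that case is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PrefixListing {

    // Hypothetical helper: lists the sub-directories directly under
    // basePath/relativePrefix. If the prefix directory does not exist, it returns
    // an empty list instead of attempting the listing ("check before listing").
    static List<String> listPartitionsUnderPrefix(Path basePath, String relativePrefix) throws IOException {
        Path prefixPath = relativePrefix.isEmpty() ? basePath : basePath.resolve(relativePrefix);
        if (!Files.exists(prefixPath)) {
            return Collections.emptyList();
        }
        // Files.list opens a directory stream, so close it with try-with-resources.
        try (Stream<Path> children = Files.list(prefixPath)) {
            return children
                .filter(Files::isDirectory)
                .map(p -> p.getFileName().toString())
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("hudi-table");
        Files.createDirectories(base.resolve("year=2023").resolve("month=04"));
        System.out.println(listPartitionsUnderPrefix(base, "year=2023")); // [month=04]
        System.out.println(listPartitionsUnderPrefix(base, "year=2022")); // []
    }
}
```

Checking first keeps a missing static-partition prefix from turning into a failed or wasted listing call. The check and the listing are still two separate filesystem calls, so a directory removed in between is not covered.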
[jira] [Closed] (HUDI-5990) Incremental queries on MOR sometimes miss data
[ https://issues.apache.org/jira/browse/HUDI-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-5990. Resolution: Fixed
> Incremental queries on MOR sometimes miss data
>
> Key: HUDI-5990
> URL: https://issues.apache.org/jira/browse/HUDI-5990
> Project: Apache Hudi
> Issue Type: Bug
> Components: spark-sql
> Affects Versions: 0.12.2, 0.13.0
> Reporter: ruofan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> env: hudi-0.12.2 spark-3.2.0
> Currently, we have the following Hudi timeline and data files.
> {code:java}
> -rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 20230326095758155.deltacommit
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:57 20230326095758155.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:57 20230326095758155.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.6K Mar 26 09:58 20230326095810406.deltacommit
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095810406.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095810406.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:58 20230326095811072.deltacommit
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095811072.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095811072.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:58 20230326095820974.deltacommit
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095820974.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095820974.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.8K Mar 26 09:58 20230326095830980.deltacommit
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095830980.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095830980.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.8K Mar 26 09:58 20230326095840978.compaction.requested
> -rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 20230326095841125.deltacommit
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095841125.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095841125.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.6K Mar 26 09:59 20230326095850994.deltacommit
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095850994.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:58 20230326095850994.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:59 20230326095900988.deltacommit
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:59 20230326095900988.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:59 20230326095900988.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:59 20230326095910983.deltacommit
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:59 20230326095910983.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:59 20230326095910983.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:59 20230326095920986.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu 0 Mar 26 09:59 20230326095920986.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.1_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.2_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.3_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.4_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.5_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.1_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.2_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.3_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.4_0-1-0
> {code}
> We use Spark to query this Hudi table incrementally. Data may go missing when the incremental range contains an incomplete compaction plan.
> Here is an example of an incremental query. Normally, 6 commits should have been found between begin_instance_time and end_instance_time, but only 3 were found.
> {code:java}
> sql:
> call copy_to_table(table=>'hudi_table',new_table=>'incremental_table',query_type=>'incremental',begin_instance_time=>'20230326095810406',end_instance_time=>'20230326095900988');
> select _hoodie_commit_time,count(*) from incremental_table group by _hoodie_commit_time order by _hoodie_commit_time desc;
> actual result:
> +---++
> |_hoodie_commit_time|count(
[hudi] branch master updated: [HUDI-5990] Avoid missing data during incremental queries (#8299)
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new c91d7e1f78d [HUDI-5990] Avoid missing data during incremental queries (#8299) c91d7e1f78d is described below commit c91d7e1f78dbb4a12dab23b5d4b147bfb150002a Author: rfyu <39233058+r...@users.noreply.github.com> AuthorDate: Fri Apr 14 01:15:13 2023 +0800 [HUDI-5990] Avoid missing data during incremental queries (#8299) The reason for missing data is that the timeline used by `MergeOnReadIncrementalRelation` only contains completed instants. When the incremental range contains an incomplete compaction plan, fsView.getLatestMergedFileSlicesBeforeOrOn in collectFileSplits will filter out some file slices. --- .../hudi/MergeOnReadIncrementalRelation.scala | 4 +- .../functional/TestParquetColumnProjection.scala | 75 -- 2 files changed, 73 insertions(+), 6 deletions(-) diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala index 93bf730a56d..636624f3950 100644 --- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala +++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala @@ -60,9 +60,9 @@ case class MergeOnReadIncrementalRelation(override val sqlContext: SQLContext, override protected def timeline: HoodieTimeline = { if (fullTableScan) { - super.timeline + metaClient.getCommitsAndCompactionTimeline } else { - super.timeline.findInstantsInRange(startTimestamp, endTimestamp) + metaClient.getCommitsAndCompactionTimeline.findInstantsInRange(startTimestamp, endTimestamp) } } diff --git
a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala index 0eefc7beeec..eaf1839d5dc 100644 --- a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala +++ b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala @@ -22,12 +22,13 @@ import org.apache.calcite.runtime.SqlFunctions.abs import org.apache.hudi.HoodieBaseRelation.projectSchema import org.apache.hudi.common.config.{HoodieMetadataConfig, HoodieStorageConfig} import org.apache.hudi.common.model.{HoodieRecord, OverwriteNonDefaultsWithLatestAvroPayload} -import org.apache.hudi.common.table.HoodieTableConfig +import org.apache.hudi.common.table.{HoodieTableConfig, HoodieTableMetaClient} import org.apache.hudi.common.testutils.{HadoopMapRedUtils, HoodieTestDataGenerator} -import org.apache.hudi.config.HoodieWriteConfig +import org.apache.hudi.config.{HoodieCompactionConfig, HoodieWriteConfig} +import org.apache.hudi.keygen.NonpartitionedKeyGenerator import org.apache.hudi.testutils.SparkClientFunctionalTestHarness import org.apache.hudi.testutils.SparkClientFunctionalTestHarness.getSparkSqlConf -import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, DefaultSource, HoodieBaseRelation, HoodieSparkUtils, HoodieUnsafeRDD} +import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, DefaultSource, HoodieBaseRelation, HoodieMergeOnReadRDD, HoodieSparkUtils, HoodieUnsafeRDD} import org.apache.parquet.hadoop.util.counters.BenchmarkCounter import org.apache.spark.SparkConf import org.apache.spark.internal.Logging @@ -252,7 +253,6 @@ class TestParquetColumnProjection extends SparkClientFunctionalTestHarness with runTest(tableState, DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL, 
DataSourceReadOptions.REALTIME_PAYLOAD_COMBINE_OPT_VAL, fullColumnsReadStats) } - // TODO add test for incremental query of the table with logs @Test def testMergeOnReadIncrementalRelationWithNoDeltaLogs(): Unit = { val tablePath = s"$basePath/mor-no-logs" @@ -296,6 +296,41 @@ class TestParquetColumnProjection extends SparkClientFunctionalTestHarness with projectedColumnsReadStats, incrementalOpts) } + @Test + def testMergeOnReadIncrementalRelationWithDeltaLogs(): Unit = { +val tablePath = s"$basePath/mor-with-logs-incr" +val targetRecordsCount = 100 + +bootstrapMORTableWithDeltaLog(tablePath, targetRecordsCount, defaultWriteOpts, populateMetaFields = true) + +println(s"Running test for $tablePath / incremental") +/** + * State of timeline and updated data + * +--+--+--+--++--+--+--+
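The commit message above attributes the missing rows to the relation's timeline holding only completed instants. As a result, file slices whose base instant is a still-pending compaction are filtered out. The Java toy model below illustrates that mechanism using file names and instant times from the HUDI-5990 listing. It is not Hudi's file-system-view logic. Three parts of it are assumptions:

- A log file is treated as readable only if its slice's base instant is on the timeline.
- Log-file names are parsed as `.<fileId>_<baseInstant>.log.<version>_<writeToken>`.
- The replacement timeline (`getCommitsAndCompactionTimeline` in the diff) is modelled as one that also keeps non-completed instants.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PendingCompactionToyModel {

    // Simplified instant: timestamp, action name and whether it has completed.
    static final class Instant {
        final String ts;
        final String action;
        final boolean completed;

        Instant(String ts, String action, boolean completed) {
            this.ts = ts;
            this.action = action;
            this.completed = completed;
        }
    }

    // Assumed naming: ".<fileId>_<baseInstant>.log.<version>_<writeToken>".
    static String baseInstantOf(String logFileName) {
        int underscore = logFileName.indexOf('_');
        int logSuffix = logFileName.indexOf(".log", underscore);
        return logFileName.substring(underscore + 1, logSuffix);
    }

    // Old behaviour (simplified): only completed instants are on the timeline.
    static Set<String> completedTimeline(List<Instant> instants) {
        Set<String> result = new HashSet<>();
        for (Instant i : instants) {
            if (i.completed) result.add(i.ts);
        }
        return result;
    }

    // Fixed behaviour (simplified): instants are kept regardless of state,
    // so the requested compaction is on the timeline too.
    static Set<String> allStatesTimeline(List<Instant> instants) {
        Set<String> result = new HashSet<>();
        for (Instant i : instants) result.add(i.ts);
        return result;
    }

    // Simplifying assumption: a log file is read only if its base instant is on the timeline.
    static List<String> visibleLogs(List<String> logFiles, Set<String> timeline) {
        List<String> visible = new ArrayList<>();
        for (String f : logFiles) {
            if (timeline.contains(baseInstantOf(f))) visible.add(f);
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Instant> instants = List.of(
            new Instant("20230326095758155", "deltacommit", true),
            new Instant("20230326095840978", "compaction", false), // only .compaction.requested exists
            new Instant("20230326095841125", "deltacommit", true));
        String fileId = ".b9f3a322-b0fe-4f70-8ad8-aa2664be957c";
        List<String> logs = List.of(
            fileId + "_20230326095758155.log.5_0-1-0",
            fileId + "_20230326095840978.log.1_0-1-0",
            fileId + "_20230326095840978.log.2_0-1-0");
        System.out.println("completed-only timeline: "
            + visibleLogs(logs, completedTimeline(instants)).size() + " of " + logs.size());
        System.out.println("timeline with pending instants: "
            + visibleLogs(logs, allStatesTimeline(instants)).size() + " of " + logs.size());
    }
}
```

In this model, the completed-only timeline hides both log files written under the pending compaction's base instant. Keeping that instant on the timeline makes all three log files visible again.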
[GitHub] [hudi] codope merged pull request #8299: [HUDI-5990]Avoid missing data during incremental queries
codope merged PR #8299: URL: https://github.com/apache/hudi/pull/8299
[GitHub] [hudi] sydneyhoran commented on pull request #5071: [HUDI-1881]: draft implementation for trigger based on data availability
sydneyhoran commented on PR #5071: URL: https://github.com/apache/hudi/pull/5071#issuecomment-1507301215 I am also looking forward to this PR being merged 😄
[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords
hudi-bot commented on PR #8300: URL: https://github.com/apache/hudi/pull/8300#issuecomment-1507251814 ## CI report: * 059463c77c641929a07e9b9ebb9e369d746c157f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16318)
[GitHub] [hudi] hudi-bot commented on pull request #8299: [HUDI-5990]Avoid missing data during incremental queries
hudi-bot commented on PR #8299: URL: https://github.com/apache/hudi/pull/8299#issuecomment-1507251690 ## CI report: * c0fc740641546218180be303626a86aea628b3a2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16317)
[GitHub] [hudi] codope commented on pull request #8452: [WIP] Add more partition push down filters
codope commented on PR #8452: URL: https://github.com/apache/hudi/pull/8452#issuecomment-1507244545 @boneanxs Can you please create a JIRA and add more details to it? Don't we already push down `EqualTo`?
[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.
hudi-bot commented on PR #8434: URL: https://github.com/apache/hudi/pull/8434#issuecomment-1507195751 ## CI report: * 321a9073e72e4c06cc8c93b6fb114b9ed6aecfbd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16269) * 4197e6d1216ee9bc269fb5368ab6c919c55ec146 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16329)
[GitHub] [hudi] hudi-bot commented on pull request #8410: [HUDI-6050] Fix write helper deduplicate records lost origin data operation
hudi-bot commented on PR #8410: URL: https://github.com/apache/hudi/pull/8410#issuecomment-1507195540 ## CI report: * efda2dce4010d10f0342c30bfb45adf1cf3fe5c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16199) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16327) * 98195cd8fa7423d6756c845f6afcc0d6d2fb2eea Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16328)
[GitHub] [hudi] slfan1989 commented on a diff in pull request #8435: [HUDI-6064] Improve JDBCExecutor#getTableSchema Use ColName.
slfan1989 commented on code in PR #8435: URL: https://github.com/apache/hudi/pull/8435#discussion_r1165710809

## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/JDBCExecutor.java: ##
@@ -129,10 +129,11 @@ public Map getTableSchema(String tableName) {
   ResultSet result = null;
   try {
     DatabaseMetaData databaseMetaData = connection.getMetaData();
-    result = databaseMetaData.getColumns(null, databaseName, tableName, null);
+    String catalog = connection.getCatalog();
+    result = databaseMetaData.getColumns(catalog, databaseName, tableName, "%");

Review Comment: @danny0405 Thank you very much for your help in reviewing the code! For this part of JDBC, we use Hive JDBC. I referred to how Hive's Beeline does this and modified this part of the code accordingly.

HiveDatabaseMetaData#getColumns
```
public class HiveDatabaseMetaData implements DatabaseMetaData {
  public ResultSet getColumns(String catalog, String schemaPattern,
      String tableNamePattern, String columnNamePattern) throws SQLException {}
  // ...
}
```

The call stack of the code is as follows:
```
Hive
\-- CLIService#getColumns
  \-- HiveSessionImpl#getColumns
    \-- GetColumnsOperation#runInternal
```

Reading GetColumnsOperation#runInternal shows that `catalogName` has no obvious effect, so we could set it to null. However, Hive's Beeline code passes the connection's `getCatalog()` directly (`HiveConnection#getCatalog`), so I follow Hive's usage here. Passing null as columnNamePattern also returns all columns, but that is less obvious to a reader; `%` explicitly means no filtering (it matches every column), which is easier to understand.

Beeline#getColumns
```
ResultSet getColumns(String table) throws SQLException {
  if (!(assertConnection())) {
    return null;
  }
  return getDatabaseConnection().getDatabaseMetaData().getColumns(
      getDatabaseConnection().getDatabaseMetaData().getConnection().getCatalog(),
      null, table, "%");
}
```
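A minimal Java sketch of why `null` and `"%"` return the same columns, assuming the pattern rules documented in the `java.sql.DatabaseMetaData` Javadoc: `%` matches any substring of zero or more characters, `_` matches exactly one character, and a `null` pattern drops that criterion from the search. `JdbcPatternSketch`, `matches`, and `filter` are hypothetical helpers, not Hudi, Hive, or JDBC API; escape handling (`getSearchStringEscape`) and driver-specific case sensitivity are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical model (not Hudi/Hive/JDBC code) of how a DatabaseMetaData
 * "pattern" argument such as columnNamePattern selects names.
 * Escape characters and case-insensitive drivers are not modelled.
 */
class JdbcPatternSketch {

  /** Returns true if the name is selected by the JDBC-style pattern. */
  static boolean matches(String pattern, String name) {
    if (pattern == null) {
      return true; // null: the criterion is dropped from the search
    }
    StringBuilder regex = new StringBuilder();
    for (char c : pattern.toCharArray()) {
      if (c == '%') {
        regex.append(".*"); // zero or more characters
      } else if (c == '_') {
        regex.append('.'); // exactly one character
      } else {
        regex.append(Pattern.quote(String.valueOf(c))); // literal character
      }
    }
    return Pattern.matches(regex.toString(), name);
  }

  /** Keeps the column names selected by the pattern, in input order. */
  static List<String> filter(String pattern, List<String> columns) {
    List<String> result = new ArrayList<>();
    for (String col : columns) {
      if (matches(pattern, col)) {
        result.add(col);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> columns = List.of("id", "name", "ts", "partition_path");
    System.out.println(filter(null, columns)); // [id, name, ts, partition_path]
    System.out.println(filter("%", columns));  // [id, name, ts, partition_path]
    System.out.println(filter("_d", columns)); // [id]
  }
}
```

Under these assumptions `null` and `"%"` are interchangeable for "all columns"; `"%"` simply states the intent explicitly at the call site.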
[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.
hudi-bot commented on PR #8434: URL: https://github.com/apache/hudi/pull/8434#issuecomment-1507183590 ## CI report: * 321a9073e72e4c06cc8c93b6fb114b9ed6aecfbd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16269) * 4197e6d1216ee9bc269fb5368ab6c919c55ec146 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8410: [HUDI-6050] Fix write helper deduplicate records lost origin data operation
hudi-bot commented on PR #8410: URL: https://github.com/apache/hudi/pull/8410#issuecomment-1507183327 ## CI report: * efda2dce4010d10f0342c30bfb45adf1cf3fe5c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16199) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16327) * 98195cd8fa7423d6756c845f6afcc0d6d2fb2eea UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] slfan1989 commented on a diff in pull request #8435: [HUDI-6064] Improve JDBCExecutor#getTableSchema Use ColName.
slfan1989 commented on code in PR #8435: URL: https://github.com/apache/hudi/pull/8435#discussion_r1165710809

## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/JDBCExecutor.java: ##
@@ -129,10 +129,11 @@ public Map getTableSchema(String tableName) {
   ResultSet result = null;
   try {
     DatabaseMetaData databaseMetaData = connection.getMetaData();
-    result = databaseMetaData.getColumns(null, databaseName, tableName, null);
+    String catalog = connection.getCatalog();
+    result = databaseMetaData.getColumns(catalog, databaseName, tableName, "%");

Review Comment: @danny0405 Thank you very much for your help in reviewing the code! For this part of JDBC, we use Hive JDBC. Before making the change, I referred to some of Hive's code.

HiveDatabaseMetaData#getColumns
```
public class HiveDatabaseMetaData implements DatabaseMetaData {
  public ResultSet getColumns(String catalog, String schemaPattern,
      String tableNamePattern, String columnNamePattern) throws SQLException {}
  // ...
}
```

The call stack of the code is as follows:
```
Hive
\-- CLIService#getColumns
  \-- HiveSessionImpl#getColumns
    \-- GetColumnsOperation#runInternal
```

Reading GetColumnsOperation#runInternal shows that `catalogName` has no obvious effect, so we could set it to null. However, Hive's Beeline code passes the connection's `getCatalog()` directly (`HiveConnection#getCatalog`), so I follow Hive's usage here. Passing null as columnNamePattern also returns all columns, but that is less obvious to a reader; `%` explicitly means no filtering (it matches every column), which is easier to understand.
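A second minimal sketch, again hypothetical, models the `catalog` argument of `DatabaseMetaData#getColumns` as its Javadoc describes it: `null` means the catalog name is not used to narrow the search, `""` selects entries without a catalog, and any other value must match the stored catalog name exactly. When every column already lives in the connection's own catalog, passing `null` or `connection.getCatalog()` selects the same rows, which is consistent with the observation that the catalog name has no obvious effect here. `CatalogArgSketch`, `ColumnRow`, and the catalog name `cat1` are invented for illustration, and representing "without a catalog" as a `null` or empty stored name is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Hudi/Hive code) of how the JDBC `catalog`
 * argument narrows DatabaseMetaData#getColumns results.
 */
class CatalogArgSketch {

  /** One metadata row; a null or empty catalog models "no catalog" (assumption). */
  static final class ColumnRow {
    final String catalog;
    final String column;

    ColumnRow(String catalog, String column) {
      this.catalog = catalog;
      this.column = column;
    }
  }

  /** Applies the documented null / "" / exact-name rules for the catalog argument. */
  static boolean catalogMatches(String catalogArg, String rowCatalog) {
    if (catalogArg == null) {
      return true; // catalog is not used to narrow the search
    }
    if (catalogArg.isEmpty()) {
      return rowCatalog == null || rowCatalog.isEmpty(); // only entries without a catalog
    }
    return catalogArg.equals(rowCatalog); // exact match on the stored name
  }

  /** Returns the column names whose row passes the catalog filter. */
  static List<String> columns(String catalogArg, List<ColumnRow> rows) {
    List<String> result = new ArrayList<>();
    for (ColumnRow row : rows) {
      if (catalogMatches(catalogArg, row.catalog)) {
        result.add(row.column);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Every row lives in the same hypothetical catalog "cat1".
    List<ColumnRow> rows = List.of(new ColumnRow("cat1", "id"), new ColumnRow("cat1", "name"));
    System.out.println(columns(null, rows));   // [id, name]
    System.out.println(columns("cat1", rows)); // [id, name]
    System.out.println(columns("cat2", rows)); // []
  }
}
```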