[GitHub] [hudi] zhuanshenbsj1 commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
zhuanshenbsj1 commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1347877944 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] waywtdcc commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution
waywtdcc commented on PR #5830: URL: https://github.com/apache/hudi/pull/5830#issuecomment-1347854543 Hope this pr can be merged into 0.12.2
[GitHub] [hudi] codope commented on a diff in pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient
codope commented on code in PR #7437:
URL: https://github.com/apache/hudi/pull/7437#discussion_r1046736411

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -1125,6 +1130,14 @@ private HoodieData getFilesPartitionRecords(String createInstantTi
 return filesPartitionRecords.union(fileListRecords);
 }
+ protected void closeInternal() {
+try {
+ close();
+} catch (Exception e) {

Review Comment:
AutoCloseable#close can throw an exception if the resource cannot be closed. However, I think it is better to catch it and wrap it in a HoodieException, which is easier to search for and debug.
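The catch-and-wrap pattern suggested in the review comment above can be sketched as follows. This is only an illustration under stated assumptions: `WrappedCloseException` is a hypothetical stand-in for Hudi's HoodieException, and `CloseWrapSketch` and its `closeInternal(AutoCloseable)` signature are invented for the example; none of it is the code from the PR.

```java
public class CloseWrapSketch {

    /** Hypothetical stand-in for HoodieException; not the real Hudi class. */
    static class WrappedCloseException extends RuntimeException {
        WrappedCloseException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Closes the resource and rethrows any failure as a single unchecked
     * exception type, keeping the original exception as the cause.
     */
    static void closeInternal(AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception e) {
            // One well-known wrapper type is easier to grep for in logs.
            throw new WrappedCloseException("Failed to close resource", e);
        }
    }
}
```

Callers then handle one exception type, while the original failure stays reachable through `getCause()`.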
[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
zhuanshenbsj1 commented on code in PR #7159:
URL: https://github.com/apache/hudi/pull/7159#discussion_r1046699321

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala:
##
@@ -118,6 +118,7 @@ class TestCDCDataFrameSuite extends HoodieCDCTestBase {
 .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL)
 .option("hoodie.clustering.inline", "true")
 .option("hoodie.clustering.inline.max.commits", "1")
+ .option("hoodie.clustering.plan.strategy.sort.columns", "_row_key")
 .mode(SaveMode.Append)

Review Comment:
Without this change, after inputDF5 the timeline will be:
commit(instantC,cleaned)->clustering(instantD,cleaned)->commit(instantE)->clean(instantF)
and the file belonging to instantA will be archived. That causes lines 193-194:
allVisibleCDCData = cdcDataFrame((commitTime1.toLong - 1).toString)
assertCDCOpCnt(allVisibleCDCData, totalInsertedCnt, totalUpdatedCnt, totalDeletedCnt)
to fail with a "can't get instantC file" error.
With this change, after inputDF5 the timeline will be:
clustering(instantD,cleaned)->clustering(instantE,cleaned)->commit(instantF)->clean(instantG),
which fixes the problem.
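For reference, the three inline-clustering options that appear in the diff above can be collected in a plain map. This is a minimal sketch: only the option keys and values are taken from the diff, while `ClusteringOptionsSketch` and `inlineClusteringOptions` are hypothetical names and no Hudi or Spark API is called.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClusteringOptionsSketch {

    /** Returns the clustering options quoted in the diff, in the same order. */
    static Map<String, String> inlineClusteringOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.clustering.inline", "true");
        opts.put("hoodie.clustering.inline.max.commits", "1");
        // The option this PR adds to the test.
        opts.put("hoodie.clustering.plan.strategy.sort.columns", "_row_key");
        return opts;
    }
}
```

In the test itself these pairs are passed one by one through `.option(key, value)` on the DataFrame writer.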
[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
zhuanshenbsj1 commented on code in PR #7159:
URL: https://github.com/apache/hudi/pull/7159#discussion_r1046731112

## hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java:
##
@@ -141,6 +141,7 @@ private void setHiveColumnNameProps(List fields, JobConf jobConf,
 jobConf.set(hive_metastoreConstants.META_TABLE_PARTITION_COLUMNS, PARTITION_COLUMN);
 }
 jobConf.set(hive_metastoreConstants.META_TABLE_COLUMNS, hiveOrderedColumnNames);
+jobConf.set("columns.types", "string,string,string,string,string,string,string,string,bigint,string,string");
 }

Review Comment:
This function (TestHoodieRealtimeRecordReader#testIncrementalWithOnlylog) emits a warning, and I should move this change here.
![image](https://user-images.githubusercontent.com/34104400/207249735-5b4443c3-6fae-4287-9eb3-d680679ece7a.png)
[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase
hudi-bot commented on PR #7345: URL: https://github.com/apache/hudi/pull/7345#issuecomment-1347828520 ## CI report: * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN * 29e31dd516112fa9a38463a9fedccb423db589cb Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13676) * 2cfcca5c4f1a3a17b68d50f605f736c3a03c2e3f UNKNOWN * 1930cfe77fc3ddbd75564a75558b1211f823be89 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase
hudi-bot commented on PR #7345: URL: https://github.com/apache/hudi/pull/7345#issuecomment-1347824795 ## CI report: * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN * f559ebdc000ac712c15ce2d7b1f6fda3302dfabf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13580) * 29e31dd516112fa9a38463a9fedccb423db589cb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13676) * 2cfcca5c4f1a3a17b68d50f605f736c3a03c2e3f UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase
hudi-bot commented on PR #7345: URL: https://github.com/apache/hudi/pull/7345#issuecomment-1347821193 ## CI report: * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN * f559ebdc000ac712c15ce2d7b1f6fda3302dfabf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13580) * 29e31dd516112fa9a38463a9fedccb423db589cb UNKNOWN
svn commit: r58689 - /dev/hudi/KEYS
Author: sivabalan Date: Tue Dec 13 06:46:05 2022 New Revision: 58689 Log: Adding satish's gpg keys Modified: dev/hudi/KEYS Modified: dev/hudi/KEYS == --- dev/hudi/KEYS (original) +++ dev/hudi/KEYS Tue Dec 13 06:46:05 2022 @@ -1171,3 +1171,62 @@ vVmwNnpErMRCa+GMaulS06s2mkJdLVX8EW5z3BLz RRaeFMCVTqi/Xw== =LZ7a -END PGP PUBLIC KEY BLOCK- +pub rsa4096 2022-11-19 [SC] [expires: 2026-11-19] + 6DA0B39A13C2658D22AE7D14D08C4B6BD98EA659 +uid [ultimate] hudi +sig 3D08C4B6BD98EA659 2022-11-19 hudi +sub rsa4096 2022-11-19 [E] [expires: 2026-11-19] +sig D08C4B6BD98EA659 2022-11-19 hudi + +-BEGIN PGP PUBLIC KEY BLOCK- + +mQINBGN5EkYBEACeR5uneVMbI5W5BtHEibjXEuskBHT7Z+MiU158YQEAmVvy+9NK +TBntrIpNioqgdQlyZypcGQvuse11+TYh0AbCmK6y+8iOqi7EiDsSsdDOwTR7ROXJ +Br1/ZLQbjb9poPFQxOzSLQWeQT6ETU6wZwrZ8ZC/ZJ1hGKX+SsDjqHWGZhPZInfB +c/uqmB1advGZEdWRKSN4b4IIcOL69vO50NfGFTbu5n6MQjpFGnBoW9Ed3IO+UsvN +3K7opD6/DYth5F88shvW5YEEuS/yHBHKHPs4XAqVCtjUozDmMmpbIEKuF3Id9eO6 +l+mhAUBLy9Tj6dgkIyQ6nkjzJwe0BsijyL/U1O5XFpnP+ETK1QsrPbQrQ4ykFv3R +LV8qnXilyiM3iKtxDwvEtZPF7fMtGwhQEKDBwWu0zfsVZ1kQSvuvYC/T83BCQ2Rx +faP+Xy3bc/979WtvxM30yxp2aQ+ZcCLB6ORF95irXTVu3Z6QSZRGCUfJmAfckXOv +u7mLh4wH2Lua8ppruqE8Ic1cpl/VQOLuYOBM0yMuJSpAUuSVt9k8XKdZt8zrg0BV +NdIo9uir7lf0csKtFR79vPJq0YBOo7sj56C6KDQZLQ6B9Dx54qGr03RZTAeidb2R +qTaNSrvJzVtMNdPwgHtn507OGt2ZJOo4cIbl+Im5IzMesvC4uRzrqZTrJwARAQAB +tBhodWRpIDxzYXRpc2hAYXBhY2hlLm9yZz6JAlQEEwEIAD4WIQRtoLOaE8JljSKu +fRTQjEtr2Y6mWQUCY3kSRgIbAwUJB4YfXQULCQgHAgYVCgkICwIEFgIDAQIeAQIX +gAAKCRDQjEtr2Y6mWdyuD/9AgaM08CYxbAYDtPAb6uC1edCZbvPzkP98us4m8jL/ +979grfvgyPkH2c87f8ec/JlGIOZSaDZOsNO9hhsCyfT3SrN/DQnIqlimEkh4k7Wb +DGp3aktP5Qv80BtExkIca8J92Z7Cs5FRub9Vp51bqfS9wDgBZDvTbOpXc2snJgK5 +Bh9JlfFUyb4ev6pFizrT/sL5COhkqYKgyunl8fMOiX2hgl/aNyOjOCOQrHQNpW8d +EsdTvj8+IVkadeCkD5+lqaNS1cY0U7ycGpciwjGZ4aNypb27lF2L0o5zmKT0u2yD +gtOg28RpMV6uz4rQWNibz1vH/USGxIdd67dPFVYshUqicdhjmH4848qGkkXvhgqE +wL0e5EH9HxbN6VocxZ9YHvnNfA8hy2K0sJTm+TMQJvD7dW112LLX3u8XLmDp4URQ +bYwtEw82VfcbYIZbXUIWY5NPLNevDKVs3SXmkdXXz0OzsX0ODb+pp3rW/NztOeQ9 
+4huvwXLmm9WiKiTgz7SQXvhZNpi6sUIlX82yEHr/+KbCXRTsz4xmSMBrqKufbTrF +P/QyH6mONeXdsCb50jkMG8L2TzFEQdElchInobcfAZ0E2SuZ8Rmm6HdB2iS4SN6O +jexAkN0VVi63f2Zsl2XjZhckW3x/X52CzlyAPc6m0NxsrEYjsmdzX0ACn0g/6KCj +M7kCDQRjeRJGARAAs8p3JcS7icMBJIl1MHDjF0nGBMrway7HpqzROfnXJLogjUu9 +L0ASGojloytecQzcDGDbB8zuF2o+qyu0EtEzMc8m2PrRgBmOg+TMEZOovCSjiIEZ +/w7ZOlfOU8Iva3fBbAg++oFb24LEOC+z1gjcUie0QlvReZWLWvZ1ATD3y0oWapqR +IyyqOHaSF/l8cIRvv2kgigEvLch8iVuVHnc/ZOjyQ5iEbBZpe/ejg08dlPU2VCdO +jGcL17JOqoltCKsQmK+xBnAHQL5VNcTd9fEo6FeiUIofyf/9d/LpqPVjo1zo6Eld +9hk7q3I5Ms4Lh+cbtslnzi7t7U+cI+Zs0s7G0FcMBIXzqdgiHP3doQVm1Viex12f +Wp+lN+QJDmyo+wEtkxbWSXKutiL0OAdSmO/1Cx901ygSTw5F/lmTxzqn3oc8F3vq +lMQKn+WpKRcMwWeQU3nhtSi/zw7Zto/LyWmt8JQdQUoFYAXIXIbQaihP1k3COnnC +wPj3XbzIBUWCM2642jcX/ieUrsKLRu7/WoVf0L6CPLqk8QKPBI85HUooB5oA8ZzX +U4Io/VzRZ9plo1q8I9JOR9g4HLoGh4GovWlfjsifa/h5j2W1o7Z/Ix9Ze5fDvKW+ +0wvRONQKiAeiKYy0k/SfX6WBxHhMNg9VmaWT6bhCKstSJX/Mo1uu4d+BUxkAEQEA +AYkCPAQYAQgAJhYhBG2gs5oTwmWNIq59FNCMS2vZjqZZBQJjeRJGAhsMBQkHhh9d +AAoJENCMS2vZjqZZYs0P/RspAruxRd5dhWc0YA1KM8BkGg7UZDa1o3EYBkX/clm5 +QaeI2ozTphVPACyonCSxsH1AkC8Vi5TFkg3PKHMe7MAlPDxlW94nLnBBIk+ncDeL +kz+CI1oFDXF1KrohSyzgxTfw9wHMn5vsBMJ+Of1+YSSNhTN5XmMgA1qwz9po6SU2 +FgzTrfrMSv8E7vANusqcl+hfGpdUg6oOA9LRJziIzd+Zrddq0urq49qAaDF3VEq6 +kh/nAMtvRiT/idLJE6z0O1INRpj6Bq8J6JsadM9CSsVHYn4Vn/38rJTl4FPJpaxn +Hyfn/j+BsaWCr1mCRqVsUexcIhDQCtND9mVkYt0RBJaCJ+jVpGReoXcxL/yqpV2H +rQyKTQYYJmBTUimbvX30ct+7UH7wM7llTcRRqF+EnVU+5+y8AMtbDlIoByX1NnPM +qgJYRxlxbo79FGg03fKA0NRSxszpZb2BqGuVZtVfXMLCorMpua/5S8KriHujGLlx +KJtkpiC/npPXxvWvVyi/4h184Xrp7wCQ0ITapxnCHaxLHdBak2kSCbhBuAcNnalJ +FDeahocca6V+Sxosds8J9keQNIzz+HAoFbdGBBRsPv/3rZbG50s4CTGGxPpiBIpR +bTekUhOFAo/Xl12LSY0Wv5c7YEWWgbFH9qfKg5srtYEGqjJe0yWKWpzQKZVuSMA4 +=pbmq +-END PGP PUBLIC KEY BLOCK- +
[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table
hudi-bot commented on PR #7394: URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347817544 ## CI report: * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN * 6d3d125caa257a3b290ae286dd77499a39683750 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13665)
[hudi] branch asf-site updated: [DOCS] Update community sync schedule and fix broken links (#7442)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
 new 226ff1ffd1f [DOCS] Update community sync schedule and fix broken links (#7442)
226ff1ffd1f is described below

commit 226ff1ffd1f1eeac0353ce1a65a621416efb7f5a
Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
AuthorDate: Mon Dec 12 22:31:12 2022 -0800

 [DOCS] Update community sync schedule and fix broken links (#7442)
---
 website/docusaurus.config.js | 2 +-
 .../assets/images/upcoming-community-calls.png | Bin 191558 -> 80694 bytes
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js
index 66490d2cd53..eb79bc7a8ee 100644
--- a/website/docusaurus.config.js
+++ b/website/docusaurus.config.js
@@ -338,7 +338,7 @@ module.exports = {
 items: [
 {
 label: 'Get Involved',
- to: '/contribute/get-involved'
+ to: '/community/get-involved'
 },
 {
 label: 'Slack',
diff --git a/website/static/assets/images/upcoming-community-calls.png b/website/static/assets/images/upcoming-community-calls.png
index 72451a76d10..dbe42c7f8b0 100644
Binary files a/website/static/assets/images/upcoming-community-calls.png and b/website/static/assets/images/upcoming-community-calls.png differ
[GitHub] [hudi] xushiyan merged pull request #7442: [DOCS] Update community sync schedule and fix broken links
xushiyan merged PR #7442: URL: https://github.com/apache/hudi/pull/7442
[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
zhuanshenbsj1 commented on code in PR #7159:
URL: https://github.com/apache/hudi/pull/7159#discussion_r1046699321

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala:
##
@@ -118,6 +118,7 @@ class TestCDCDataFrameSuite extends HoodieCDCTestBase {
 .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL)
 .option("hoodie.clustering.inline", "true")
 .option("hoodie.clustering.inline.max.commits", "1")
+ .option("hoodie.clustering.plan.strategy.sort.columns", "_row_key")
 .mode(SaveMode.Append)

Review Comment:
Without this change, after inputDF5 the timeline will be:
commit(instantC,cleaned)->clustering(instantD,cleaned)->commit(instantE)->clean(instantF)
and the file belonging to instantA will be archived. That causes lines 193-194:
allVisibleCDCData = cdcDataFrame((commitTime1.toLong - 1).toString)
assertCDCOpCnt(allVisibleCDCData, totalInsertedCnt, totalUpdatedCnt, totalDeletedCnt)
to fail with a "can't get instantA file" error.
With this change, after inputDF5 the timeline will be:
clustering(instantD,cleaned)->clustering(instantE,cleaned)->commit(instantF)->clean(instantG),
which fixes the problem.
[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
zhuanshenbsj1 commented on code in PR #7159:
URL: https://github.com/apache/hudi/pull/7159#discussion_r1046698488

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala:
##
@@ -118,6 +118,7 @@ class TestCDCDataFrameSuite extends HoodieCDCTestBase {
 .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL)
 .option("hoodie.clustering.inline", "true")
 .option("hoodie.clustering.inline.max.commits", "1")
+ .option("hoodie.clustering.plan.strategy.sort.columns", "_row_key")
 .mode(SaveMode.Append)

Review Comment:
Without this change, after inputDF5 the timeline will be:
commit(instantC,cleaned)->clustering(instantD,cleaned)->commit(instantE)->clean(instantF)
and the file belonging to instantA will be archived. That causes lines 193-194:
---
allVisibleCDCData = cdcDataFrame((commitTime1.toLong - 1).toString)
assertCDCOpCnt(allVisibleCDCData, totalInsertedCnt, totalUpdatedCnt, totalDeletedCnt)
---
to fail with a "can't get instantA file" error.
With this change, after inputDF5 the timeline will be:
clustering(instantD,cleaned)->clustering(instantE,cleaned)->commit(instantF)->clean(instantG),
which fixes the problem.
[GitHub] [hudi] bhasudha opened a new pull request, #7442: [DOCS] Update community sync schedule and fix broken links
bhasudha opened a new pull request, #7442:
URL: https://github.com/apache/hudi/pull/7442

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Add call stack information to lock file
hudi-bot commented on PR #7440: URL: https://github.com/apache/hudi/pull/7440#issuecomment-1347769648 ## CI report: * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN * 3ae5c2f76e605c9674d216cb87279bb662f07e2f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13667) * e8abc2db2ed326381ca8de35611b40467d7c17ae Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13671)
[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
hudi-bot commented on PR #7438: URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347769625 ## CI report: * 5c821f53d8eef00588491421dd751e3bb04866fb Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13666) * 0e1dfccb119b2595e420528986ca1d8cf0431543 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13675)
[GitHub] [hudi] hudi-bot commented on pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient
hudi-bot commented on PR #7437: URL: https://github.com/apache/hudi/pull/7437#issuecomment-1347769590 ## CI report: * 7aae826c0ffc9be3dbb72e48a38c5a595d2fe4bb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13657) * 40c69ac7d433245f25296fd2883205c890596dd9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13674)
[GitHub] [hudi] hudi-bot commented on pull request #7423: [MINOR] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`
hudi-bot commented on PR #7423: URL: https://github.com/apache/hudi/pull/7423#issuecomment-1347769543 ## CI report: * c6623fc8d2a5c5eb71181bc5c458bfdbc976d15a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13605) * 4c28887b7079a7e00ca0543a7ac3daee9872422b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13673)
[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Add call stack information to lock file
hudi-bot commented on PR #7440: URL: https://github.com/apache/hudi/pull/7440#issuecomment-1347766446 ## CI report: * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN * 3ae5c2f76e605c9674d216cb87279bb662f07e2f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13667) * e8abc2db2ed326381ca8de35611b40467d7c17ae UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7441: [HUDI-5378] Remove minlog.Log
hudi-bot commented on PR #7441: URL: https://github.com/apache/hudi/pull/7441#issuecomment-1347766468 ## CI report: * 4aceef648f1ec5513df5283945f2ba3e42733ae4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13672)
[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
hudi-bot commented on PR #7438: URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347766413 ## CI report: * baa62578663a77cc37725533fad04e4b75a47e1a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13658) * 5c821f53d8eef00588491421dd751e3bb04866fb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13666) * 0e1dfccb119b2595e420528986ca1d8cf0431543 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient
hudi-bot commented on PR #7437: URL: https://github.com/apache/hudi/pull/7437#issuecomment-1347766367 ## CI report: * 7aae826c0ffc9be3dbb72e48a38c5a595d2fe4bb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13657) * 40c69ac7d433245f25296fd2883205c890596dd9 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7423: [MINOR] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`
hudi-bot commented on PR #7423: URL: https://github.com/apache/hudi/pull/7423#issuecomment-1347766281 ## CI report: * c6623fc8d2a5c5eb71181bc5c458bfdbc976d15a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13605) * 4c28887b7079a7e00ca0543a7ac3daee9872422b UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7383: [HUDI-4432] Checkpoint management for muti-writer scenario
hudi-bot commented on PR #7383: URL: https://github.com/apache/hudi/pull/7383#issuecomment-1347766161 ## CI report: * bcb09ed13fc86a7b68219384161ed0c6a8ee8556 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13461) * 1d096935d79eadebfaa64fcab5439681547a9223 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13670)
[GitHub] [hudi] hudi-bot commented on pull request #7383: [HUDI-4432] Checkpoint management for muti-writer scenario
hudi-bot commented on PR #7383: URL: https://github.com/apache/hudi/pull/7383#issuecomment-1347762210 ## CI report: * bcb09ed13fc86a7b68219384161ed0c6a8ee8556 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13461) * 1d096935d79eadebfaa64fcab5439681547a9223 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1347762103 ## CI report: * be82fc49989f5262d833fb2b803fd6ea69af8d0c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13661)
[GitHub] [hudi] hudi-bot commented on pull request #7441: [HUDI-5378] Remove minlog.Log
hudi-bot commented on PR #7441: URL: https://github.com/apache/hudi/pull/7441#issuecomment-1347762441 ## CI report: * 4aceef648f1ec5513df5283945f2ba3e42733ae4 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7366: [HUDI-5318] Fix partition pruning for clustering scheduling
hudi-bot commented on PR #7366: URL: https://github.com/apache/hudi/pull/7366#issuecomment-1347762156 ## CI report: * 3f6572349834d904a697fbd8c8546f56a7f2844a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13662)
[hudi] branch master updated: [HUDI-5351] Handle populateMetaFields when repartitioning in sort partitioner (#7411)
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new ae426bc483f [HUDI-5351] Handle populateMetaFields when repartitioning in sort partitioner (#7411) ae426bc483f is described below commit ae426bc483ffb310e99738219e6ecc9cb8336c0c Author: Sagar Sumit AuthorDate: Tue Dec 13 10:22:10 2022 +0530 [HUDI-5351] Handle populateMetaFields when repartitioning in sort partitioner (#7411) --- .../MultipleSparkJobExecutionStrategy.java | 6 +- .../BulkInsertInternalPartitionerFactory.java | 26 +++ ...lkInsertInternalPartitionerWithRowsFactory.java | 19 ++--- .../bulkinsert/GlobalSortPartitioner.java | 14 .../bulkinsert/GlobalSortPartitionerWithRows.java | 14 ...PartitionPathRepartitionAndSortPartitioner.java | 12 +++- ...nPathRepartitionAndSortPartitionerWithRows.java | 12 +++- .../PartitionPathRepartitionPartitioner.java | 12 +++- ...artitionPathRepartitionPartitionerWithRows.java | 12 +++- .../PartitionSortPartitionerWithRows.java | 14 .../bulkinsert/RDDPartitionSortPartitioner.java| 14 .../TestBulkInsertInternalPartitioner.java | 83 ++ .../TestBulkInsertInternalPartitionerForRows.java | 69 -- .../org/apache/hudi/HoodieSparkSqlWriter.scala | 5 +- 14 files changed, 227 insertions(+), 85 deletions(-) diff --git a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java index 074deaa6212..954daaad1e1 100644 --- a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java +++ b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java @@ -206,10 +206,8 @@ 
public abstract class MultipleSparkJobExecutionStrategy> get(BulkInsertSortMode sortMode, + public static BulkInsertPartitioner> get(HoodieWriteConfig config, boolean isTablePartitioned) { -return get(sortMode, isTablePartitioned, false); +return get(config, isTablePartitioned, false); } - public static BulkInsertPartitioner> get( - BulkInsertSortMode sortMode, boolean isTablePartitioned, boolean enforceNumOutputPartitions) { + public static BulkInsertPartitioner> get(HoodieWriteConfig config, +boolean isTablePartitioned, +boolean enforceNumOutputPartitions) { +BulkInsertSortMode sortMode = config.getBulkInsertSortMode(); switch (sortMode) { case NONE: return new NonSortPartitionerWithRows(enforceNumOutputPartitions); case GLOBAL_SORT: -return new GlobalSortPartitionerWithRows(); +return new GlobalSortPartitionerWithRows(config); case PARTITION_SORT: -return new PartitionSortPartitionerWithRows(); +return new PartitionSortPartitionerWithRows(config); case PARTITION_PATH_REPARTITION: -return new PartitionPathRepartitionPartitionerWithRows(isTablePartitioned); +return new PartitionPathRepartitionPartitionerWithRows(isTablePartitioned, config); case PARTITION_PATH_REPARTITION_AND_SORT: -return new PartitionPathRepartitionAndSortPartitionerWithRows(isTablePartitioned); +return new PartitionPathRepartitionAndSortPartitionerWithRows(isTablePartitioned, config); default: throw new UnsupportedOperationException("The bulk insert sort mode \"" + sortMode.name() + "\" is not supported."); } diff --git a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/GlobalSortPartitioner.java b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/GlobalSortPartitioner.java index a184c009a1b..e10d23743da 100644 --- a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/GlobalSortPartitioner.java +++ 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/GlobalSortPartitioner.java @@ -20,10 +20,14 @@ package org.apache.hudi.execution.bulkinsert; import org.apache.hudi.common.model.HoodieRecord; import org.apache.hudi.common.model.HoodieRecordPayload; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.exception.HoodieException; import org.apache.hudi.table.BulkInsertPartitioner; import org.apache.spark.api.java.JavaRDD; +import static org.apache.hudi.execution.bulkinsert.BulkInsertSortMode.GLOBAL_SORT; + /** * A built-in partitioner
[GitHub] [hudi] codope merged pull request #7411: [HUDI-5351] Handle populateMetaFields when repartitioning in sort partitioner
codope merged PR #7411: URL: https://github.com/apache/hudi/pull/7411 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on pull request #7411: [HUDI-5351] Handle populateMetaFields when repartitioning in sort partitioner
codope commented on PR #7411: URL: https://github.com/apache/hudi/pull/7411#issuecomment-1347745239 https://user-images.githubusercontent.com/16440354/207229581-83e9c594-b690-44a9-8894-2df6791ae683.png CI succeeded for the latest commit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a diff in pull request #7397: [HUDI-5205] Upgrade Flink to 1.16.0
danny0405 commented on code in PR #7397: URL: https://github.com/apache/hudi/pull/7397#discussion_r1046653338 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java: ## @@ -291,19 +291,19 @@ public void handleEventFromOperator(int i, OperatorEvent operatorEvent) { } @Override - public void subtaskFailed(int i, @Nullable Throwable throwable) { -// reset the event -this.eventBuffer[i] = null; -LOG.warn("Reset the event for task [" + i + "]", throwable); + public void subtaskReset(int i, long l) { Review Comment: Is the change forward-compatible? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (5beadbfbe54 -> ef721d0af7d)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 5beadbfbe54 [HUDI-5373] Different fileids are assigned to the same bucket (#7433) add ef721d0af7d 【HUDI-4917】Optimized the way to get HoodieBaseFile of loadColumnRangesFromFiles of Bloom Index (#6793) No new revisions were added by this update. Summary of changes: .../java/org/apache/hudi/index/bloom/HoodieBloomIndex.java | 13 +++-- .../main/java/org/apache/hudi/io/HoodieRangeInfoHandle.java | 8 .../src/main/java/org/apache/hudi/io/HoodieReadHandle.java | 5 + 3 files changed, 20 insertions(+), 6 deletions(-)
[GitHub] [hudi] nsivabalan merged pull request #6793: [HUDI-4917] Optimized the way to get HoodieBaseFile of loadColumnRangesFromFiles of Bloom Index
nsivabalan merged PR #6793: URL: https://github.com/apache/hudi/pull/6793 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
nsivabalan commented on PR #7438: URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347735635 @danny0405: no issues from a querying standpoint. There might be some perf hit, but no correctness issues or failures. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Add call stack information to lock file
hudi-bot commented on PR #7440: URL: https://github.com/apache/hudi/pull/7440#issuecomment-1347720835 ## CI report: * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN * 3ae5c2f76e605c9674d216cb87279bb662f07e2f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13667) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5378) Remove minlog.Log
[ https://issues.apache.org/jira/browse/HUDI-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5378: - Labels: pull-request-available (was: ) > Remove minlog.Log > - > > Key: HUDI-5378 > URL: https://issues.apache.org/jira/browse/HUDI-5378 > Project: Apache Hudi > Issue Type: Improvement >Reporter: dzcxzl >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] cxzl25 opened a new pull request, #7441: [HUDI-5378] Remove minlog.Log
cxzl25 opened a new pull request, #7441: URL: https://github.com/apache/hudi/pull/7441 ### Change Logs Remove minlog.Log ### Impact Use the correct log4j configuration ### Risk level (write none, low medium or high below) ### Documentation Update ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Add call stack information to lock file
hudi-bot commented on PR #7440: URL: https://github.com/apache/hudi/pull/7440#issuecomment-1347713314 ## CI report: * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN * 3ae5c2f76e605c9674d216cb87279bb662f07e2f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
hudi-bot commented on PR #7438: URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347713244 ## CI report: * baa62578663a77cc37725533fad04e4b75a47e1a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13658) * 5c821f53d8eef00588491421dd751e3bb04866fb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13666) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table
hudi-bot commented on PR #7394: URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347712770 ## CI report: * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN * bbe54597ed63dcf9eb94b84cdd4f80d45c49634f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13663) * 6d3d125caa257a3b290ae286dd77499a39683750 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13665) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-5378) Remove minlog.Log
dzcxzl created HUDI-5378: Summary: Remove minlog.Log Key: HUDI-5378 URL: https://issues.apache.org/jira/browse/HUDI-5378 Project: Apache Hudi Issue Type: Improvement Reporter: dzcxzl -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Add call stack information to lock file
hudi-bot commented on PR #7440: URL: https://github.com/apache/hudi/pull/7440#issuecomment-1347708193 ## CI report: * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
hudi-bot commented on PR #7438: URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347708169 ## CI report: * baa62578663a77cc37725533fad04e4b75a47e1a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13658) * 5c821f53d8eef00588491421dd751e3bb04866fb UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table
hudi-bot commented on PR #7394: URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347708063 ## CI report: * bb40c512b05286e266eb5b05e2f31b9ea926 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13660) * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN * bbe54597ed63dcf9eb94b84cdd4f80d45c49634f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13663) * 6d3d125caa257a3b290ae286dd77499a39683750 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table
hudi-bot commented on PR #7394: URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347704997 ## CI report: * bb40c512b05286e266eb5b05e2f31b9ea926 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13660) * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN * bbe54597ed63dcf9eb94b84cdd4f80d45c49634f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13663) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5377) Add call stack information to lock file
[ https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5377: - Labels: pull-request-available (was: ) > Add call stack information to lock file > --- > > Key: HUDI-5377 > URL: https://issues.apache.org/jira/browse/HUDI-5377 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > Labels: pull-request-available > > When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire > lock', > We need to know which step caused the deadlock. > like : > > LOCK-TIME : 2022-12-13 11:13:15.015 > LOCK-STACK-INFO : > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock > (FileSystemBasedLockProvider.java:148) > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock > (FileSystemBasedLockProvider.java:100) > org.apache.hudi.client.transaction.lock.LockManager.lock > (LockManager.java:102) > org.apache.hudi.client.transaction.TransactionManager.beginTransaction > (TransactionManager.java:58) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService > (BaseHoodieWriteClient.java:1425) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant > (BaseHoodieWriteClient.java:1037) > org.apache.hudi.util.CompactionUtil.scheduleCompaction > (CompactionUtil.java:72) > > org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 > (StreamWriteOperatorCoordinator.java:250) > org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 > (NonThrownExecutor.java:130) > java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) > java.lang.Thread.run (Thread.java:750) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] LinMingQiang opened a new pull request, #7440: [HUDI-5377] Add call stack information to lock file
LinMingQiang opened a new pull request, #7440: URL: https://github.com/apache/hudi/pull/7440 ### Change Logs Add call stack information to lock file. ### Impact When OCC (optimistic concurrency control) is enabled, an 'Unable to acquire lock' exception is sometimes thrown. We need to know which step caused the deadlock. ### Risk level (write none, low medium or high below) none ### Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
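The lock-file content proposed in HUDI-5377 (a `LOCK-TIME` line followed by a `LOCK-STACK-INFO` list of frames) could be produced roughly as follows. This is only a sketch of the idea, not the code from PR #7440: the class `LockStackInfoSketch` and its methods are hypothetical names, and the only assumption taken from the thread is the output layout shown in the Jira description.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hudi: renders a timestamp and a call stack
// in the "LOCK-TIME / LOCK-STACK-INFO" layout quoted in HUDI-5377.
final class LockStackInfoSketch {
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

  // Formats the given frames, one per line, as "class.method (File.java:line)".
  static String describe(LocalDateTime lockTime, StackTraceElement[] stack) {
    StringBuilder sb = new StringBuilder();
    sb.append("LOCK-TIME : ").append(FORMAT.format(lockTime)).append('\n');
    sb.append("LOCK-STACK-INFO :\n");
    for (StackTraceElement frame : stack) {
      sb.append("  ")
          .append(frame.getClassName()).append('.').append(frame.getMethodName())
          .append(" (").append(frame.getFileName()).append(':')
          .append(frame.getLineNumber()).append(")\n");
    }
    return sb.toString();
  }

  // Captures the stack of the thread that is about to take the lock.
  static String describeCurrentThread() {
    return describe(LocalDateTime.now(), Thread.currentThread().getStackTrace());
  }

  public static void main(String[] args) {
    System.out.print(describeCurrentThread());
  }
}
```

Writing this text into the lock file at acquisition time would let a later writer that fails with 'Unable to acquire lock' read which call path is holding the lock.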
[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
zhuanshenbsj1 commented on code in PR #7159: URL: https://github.com/apache/hudi/pull/7159#discussion_r1046629024 ## hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java: ## @@ -141,6 +141,7 @@ private void setHiveColumnNameProps(List fields, JobConf jobConf, jobConf.set(hive_metastoreConstants.META_TABLE_PARTITION_COLUMNS, PARTITION_COLUMN); } jobConf.set(hive_metastoreConstants.META_TABLE_COLUMNS, hiveOrderedColumnNames); +jobConf.set("columns.types", "string,string,string,string,string,string,string,string,bigint,string,string"); } Review Comment: Without this change, after inputDF5 the timeline will be commit(instantA) -> clustering(instantB) -> commit(instantC) -> clean(instantD), and instantA will be archived by instantB (see line 193: allVisibleCDCData = cdcDataFrame((commitTime1.toLong - 1).toString)). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (HUDI-5377) Add call stack information to lock file
[ https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-5377: --- Assignee: HunterXHunter > Add call stack information to lock file > --- > > Key: HUDI-5377 > URL: https://issues.apache.org/jira/browse/HUDI-5377 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire > lock', > We need to know which step caused the deadlock. > like : > > LOCK-TIME : 2022-12-13 11:13:15.015 > LOCK-STACK-INFO : > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock > (FileSystemBasedLockProvider.java:148) > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock > (FileSystemBasedLockProvider.java:100) > org.apache.hudi.client.transaction.lock.LockManager.lock > (LockManager.java:102) > org.apache.hudi.client.transaction.TransactionManager.beginTransaction > (TransactionManager.java:58) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService > (BaseHoodieWriteClient.java:1425) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant > (BaseHoodieWriteClient.java:1037) > org.apache.hudi.util.CompactionUtil.scheduleCompaction > (CompactionUtil.java:72) > > org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 > (StreamWriteOperatorCoordinator.java:250) > org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 > (NonThrownExecutor.java:130) > java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) > java.lang.Thread.run (Thread.java:750) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] codope commented on a diff in pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient
codope commented on code in PR #7437: URL: https://github.com/apache/hudi/pull/7437#discussion_r1046622218 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -752,6 +752,7 @@ public void dropMetadataPartitions(List metadataPartition LOG.warn("Deleting pending indexing instant from the timeline for partition: " + partitionPath); deletePendingIndexingInstant(dataMetaClient, partitionPath); } +closeInternal(); } Review Comment: `HoodieBackedTableMetadataWriter` extends `HoodieTableMetadataWriter`, which implements `AutoCloseable`. But yeah, we could create the writer in a try-with-resources block. +1 for fixing `HoodieFlinkWriteClient` too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5377) Add call stack information to lock file
[ https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5377: Description: When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire lock', We need to know which step caused the deadlock. like : LOCK-TIME : 2022-12-13 11:13:15.015 LOCK-STACK-INFO : org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock (FileSystemBasedLockProvider.java:148) org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock (FileSystemBasedLockProvider.java:100) org.apache.hudi.client.transaction.lock.LockManager.lock (LockManager.java:102) org.apache.hudi.client.transaction.TransactionManager.beginTransaction (TransactionManager.java:58) org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService (BaseHoodieWriteClient.java:1425) org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant (BaseHoodieWriteClient.java:1037) org.apache.hudi.util.CompactionUtil.scheduleCompaction (CompactionUtil.java:72) org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 (StreamWriteOperatorCoordinator.java:250) org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 (NonThrownExecutor.java:130) java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) java.lang.Thread.run (Thread.java:750) was: When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire lock', We need to know which step caused the deadlock. > Add call stack information to lock file > --- > > Key: HUDI-5377 > URL: https://issues.apache.org/jira/browse/HUDI-5377 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Priority: Major > > When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire > lock', > We need to know which step caused the deadlock. 
> like : > > LOCK-TIME : 2022-12-13 11:13:15.015 > LOCK-STACK-INFO : > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock > (FileSystemBasedLockProvider.java:148) > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock > (FileSystemBasedLockProvider.java:100) > org.apache.hudi.client.transaction.lock.LockManager.lock > (LockManager.java:102) > org.apache.hudi.client.transaction.TransactionManager.beginTransaction > (TransactionManager.java:58) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService > (BaseHoodieWriteClient.java:1425) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant > (BaseHoodieWriteClient.java:1037) > org.apache.hudi.util.CompactionUtil.scheduleCompaction > (CompactionUtil.java:72) > > org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 > (StreamWriteOperatorCoordinator.java:250) > org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 > (NonThrownExecutor.java:130) > java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) > java.lang.Thread.run (Thread.java:750) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] danny0405 commented on a diff in pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient
danny0405 commented on code in PR #7437: URL: https://github.com/apache/hudi/pull/7437#discussion_r1046617576 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -1125,6 +1130,14 @@ private HoodieData getFilesPartitionRecords(String createInstantTi return filesPartitionRecords.union(fileListRecords); } + protected void closeInternal() { +try { + close(); +} catch (Exception e) { Review Comment: If we do not want `#close` to throw a checked exception everywhere, just remove the `throws` clause from the interface. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
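The suggestion in this review comment relies on a general Java rule: an overriding method may declare fewer checked exceptions than the method it overrides, so a sub-interface of `AutoCloseable` can redeclare `close()` with no `throws` clause. A minimal sketch of that pattern follows. All names in it (`QuietCloseable`, `DemoMetadataWriter`, `CloseWithoutThrowsSketch`) are hypothetical stand-ins for illustration and are not Hudi types; only `AutoCloseable` is real.

```java
// Hypothetical sub-interface: narrows AutoCloseable.close() so it declares
// no checked exception. This is legal because an override may throw less.
interface QuietCloseable extends AutoCloseable {
  @Override
  void close();
}

// Hypothetical stand-in for a metadata writer; it only tracks its closed state.
final class DemoMetadataWriter implements QuietCloseable {
  private boolean closed = false;

  void write(String record) {
    if (closed) {
      throw new IllegalStateException("writer is closed");
    }
    // A real writer would persist the record here.
  }

  @Override
  public void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

final class CloseWithoutThrowsSketch {
  static DemoMetadataWriter writeAndClose(String record) {
    DemoMetadataWriter writer = new DemoMetadataWriter();
    // No catch block is required: the resource type's close() declares
    // no checked exception, so try-with-resources compiles as-is.
    try (DemoMetadataWriter w = writer) {
      w.write(record);
    }
    return writer;
  }

  public static void main(String[] args) {
    System.out.println(writeAndClose("r1").isClosed());
  }
}
```

With the `throws` clause removed at the interface level, callers would not need a wrapper such as `closeInternal()` just to swallow or rethrow a checked exception, though whether that fits the actual Hudi interfaces is for the PR authors to judge.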
[GitHub] [hudi] danny0405 commented on a diff in pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient
danny0405 commented on code in PR #7437: URL: https://github.com/apache/hudi/pull/7437#discussion_r1046617209 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -752,6 +752,7 @@ public void dropMetadataPartitions(List metadataPartition LOG.warn("Deleting pending indexing instant from the timeline for partition: " + partitionPath); deletePendingIndexingInstant(dataMetaClient, partitionPath); } +closeInternal(); } Review Comment: Can we let `HoodieBackedTableMetadataWriter` implement `AutoCloseable` so that it can be used in a try-finally block? Can we also fix `HoodieFlinkWriteClient#writeTableMetadata`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-5377) Add call stack information to lock file
HunterXHunter created HUDI-5377: --- Summary: Add call stack information to lock file Key: HUDI-5377 URL: https://issues.apache.org/jira/browse/HUDI-5377 Project: Apache Hudi Issue Type: Improvement Reporter: HunterXHunter When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire lock', We need to know which step caused the deadlock. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] danny0405 commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
danny0405 commented on PR #7438: URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347672319 Thanks for the fix. What is the effect without this fix? Can the user not query the latest result set if the file index is cached? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on issue #6260: [SUPPORT]Caused by: java.util.NoSuchElementException: No value present in Option
danny0405 commented on issue #6260: URL: https://github.com/apache/hudi/issues/6260#issuecomment-1347668956 Take a look at this PR: https://github.com/apache/hudi/pull/6766
[GitHub] [hudi] danny0405 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
danny0405 commented on code in PR #7159: URL: https://github.com/apache/hudi/pull/7159#discussion_r1046605266 ## hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java: ## @@ -141,6 +141,7 @@ private void setHiveColumnNameProps(List fields, JobConf jobConf, jobConf.set(hive_metastoreConstants.META_TABLE_PARTITION_COLUMNS, PARTITION_COLUMN); } jobConf.set(hive_metastoreConstants.META_TABLE_COLUMNS, hiveOrderedColumnNames); +jobConf.set("columns.types", "string,string,string,string,string,string,string,string,bigint,string,string"); } Review Comment: What is this change for? ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala: ## @@ -118,6 +118,7 @@ class TestCDCDataFrameSuite extends HoodieCDCTestBase { .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL) .option("hoodie.clustering.inline", "true") .option("hoodie.clustering.inline.max.commits", "1") + .option("hoodie.clustering.plan.strategy.sort.columns", "_row_key") .mode(SaveMode.Append) Review Comment: What is this change for?
[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table
hudi-bot commented on PR #7394: URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347667491 ## CI report: * bb40c512b05286e266eb5b05e2f31b9ea926 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13660) * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN * bbe54597ed63dcf9eb94b84cdd4f80d45c49634f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7366: [HUDI-5318] Fix partition pruning for clustering scheduling
hudi-bot commented on PR #7366: URL: https://github.com/apache/hudi/pull/7366#issuecomment-1347667416 ## CI report: * 419d479d3469507566ad7d856f41ffb2182d7765 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13632) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13647) * 3f6572349834d904a697fbd8c8546f56a7f2844a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13662)
[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1347667377 ## CI report: * 34c111a0fe150fe513fea39697976da06a912f5c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13460) * be82fc49989f5262d833fb2b803fd6ea69af8d0c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13661)
[jira] [Commented] (HUDI-5373) Different fileids are assigned to the same bucket
[ https://issues.apache.org/jira/browse/HUDI-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646400#comment-17646400 ]
Danny Chen commented on HUDI-5373:
--
Fixed via master branch: 5beadbfbe544e513ae2391e534a0ad8443566e9a

> Different fileids are assigned to the same bucket
> --
>
> Key: HUDI-5373
> URL: https://issues.apache.org/jira/browse/HUDI-5373
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: loukey_j
> Assignee: loukey_j
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.2, 0.13.0
>
> partition = 30, bucketNum = 11: bucketId = 3011
> partition = 301, bucketNum = 1: bucketId = 3011
>
> Different fileids are assigned to the same bucket
>
> final String bucketId = partition + bucketNum;
> if (incBucketIndex.contains(bucketId)) {
>   location = new HoodieRecordLocation("I", bucketToFileId.get(bucketNum));
> } else if (bucketToFileId.containsKey(bucketNum)) {
>   location = new HoodieRecordLocation("U", bucketToFileId.get(bucketNum));
> } else {
>   String newFileId = BucketIdentifier.newBucketFileIdPrefix(bucketNum);
>   location = new HoodieRecordLocation("I", newFileId);
>   bucketToFileId.put(bucketNum, newFileId);
>   incBucketIndex.add(bucketId);
> }
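The collision described in the issue can be reproduced in isolation. The sketch below is only an illustration: `concatenatedId` mirrors the `partition + bucketNum` expression quoted above, while `delimitedId` shows one possible way to make the key unambiguous. The `"/"` separator is an assumption; this thread does not show the one-line change that was actually merged.

```java
class BucketIdCollisionSketch {
  // Mirrors the expression quoted in the issue: plain string concatenation.
  static String concatenatedId(String partition, int bucketNum) {
    return partition + bucketNum;
  }

  // Hypothetical alternative: a separator keeps the two parts distinguishable,
  // because the bucket number is always the digits after the last separator.
  static String delimitedId(String partition, int bucketNum) {
    return partition + "/" + bucketNum;
  }

  public static void main(String[] args) {
    // Both inputs from the issue collapse to the same concatenated key.
    System.out.println(concatenatedId("30", 11));
    System.out.println(concatenatedId("301", 1));
    // With a separator the keys stay distinct.
    System.out.println(delimitedId("30", 11));
    System.out.println(delimitedId("301", 1));
  }
}
```

Because `incBucketIndex` is keyed by this string, two different (partition, bucket) pairs that map to the same key are treated as the same bucket, which matches the symptom in the issue title.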
[jira] [Resolved] (HUDI-5373) Different fileids are assigned to the same bucket
[ https://issues.apache.org/jira/browse/HUDI-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-5373. -- > Different fileids are assigned to the same bucket > -- > > Key: HUDI-5373 > URL: https://issues.apache.org/jira/browse/HUDI-5373 > Project: Apache Hudi > Issue Type: Bug > Reporter: loukey_j > Assignee: loukey_j > Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0
[jira] [Updated] (HUDI-5373) Different fileids are assigned to the same bucket
[ https://issues.apache.org/jira/browse/HUDI-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-5373: - Fix Version/s: 0.12.2 0.13.0 > Different fileids are assigned to the same bucket > -- > > Key: HUDI-5373 > URL: https://issues.apache.org/jira/browse/HUDI-5373 > Project: Apache Hudi > Issue Type: Bug > Reporter: loukey_j > Assignee: loukey_j > Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0
[hudi] branch master updated (13a8e5c7297 -> 5beadbfbe54)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 13a8e5c7297 [HUDI-5348] Cache file slices in HoodieBackedTableMetadata (#7436) add 5beadbfbe54 [HUDI-5373] Different fileids are assigned to the same bucket (#7433) No new revisions were added by this update. Summary of changes: .../java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [hudi] danny0405 merged pull request #7433: [HUDI-5373] Different fileids are assigned to the same bucket
danny0405 merged PR #7433: URL: https://github.com/apache/hudi/pull/7433
[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table
hudi-bot commented on PR #7394: URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347663548 ## CI report: * 17a887b98e0dd10d71a596ea87382911f3fdcef7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13490) * bb40c512b05286e266eb5b05e2f31b9ea926 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13660) * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7366: [HUDI-5318] Fix partition pruning for clustering scheduling
hudi-bot commented on PR #7366: URL: https://github.com/apache/hudi/pull/7366#issuecomment-1347663469 ## CI report: * 419d479d3469507566ad7d856f41ffb2182d7765 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13632) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13647) * 3f6572349834d904a697fbd8c8546f56a7f2844a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1347663412 ## CI report: * 34c111a0fe150fe513fea39697976da06a912f5c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13460) * be82fc49989f5262d833fb2b803fd6ea69af8d0c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
hudi-bot commented on PR #7438: URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347660023 ## CI report: * baa62578663a77cc37725533fad04e4b75a47e1a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13658)
[GitHub] [hudi] aizain commented on issue #7375: [SUPPORT] Hudi 0.12.1 support for Spark Structured Streaming. read clustering metadata replace avro file error. Unrecognized token 'Obj^A^B^Vavro'
aizain commented on issue #7375: URL: https://github.com/apache/hudi/issues/7375#issuecomment-1347658226 thanks~
[GitHub] [hudi] danny0405 commented on a diff in pull request #7175: [HUDI-5191] Fix compatibility with avro 1.10
danny0405 commented on code in PR #7175: URL: https://github.com/apache/hudi/pull/7175#discussion_r1046588788 ## .github/workflows/bot.yml: ## @@ -73,6 +73,14 @@ jobs: run: | HUDI_VERSION=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout) ./packaging/bundle-validation/ci_run.sh $HUDI_VERSION + - name: Common Test Review Comment: From the cmd, it seems to test the `hudi-common` module specifically.
[GitHub] [hudi] danny0405 commented on issue #6588: [SUPPORT]Caused by: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.avro.util.Utf8
danny0405 commented on issue #6588: URL: https://github.com/apache/hudi/issues/6588#issuecomment-1347638244 @xushiyan Do you know the background for why the spark-bundle does not include avro as a dependency?
[GitHub] [hudi] loukey-lj commented on pull request #7433: [HUDI-5373] Different fileids are assigned to the same bucket
loukey-lj commented on PR #7433: URL: https://github.com/apache/hudi/pull/7433#issuecomment-1347636220 @danny0405 Please review the code for me
[GitHub] [hudi] danny0405 commented on issue #6588: [SUPPORT]Caused by: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.avro.util.Utf8
danny0405 commented on issue #6588: URL: https://github.com/apache/hudi/issues/6588#issuecomment-1347633205 It seems you did not shade avro correctly in your spark bundle jar.
[GitHub] [hudi] danny0405 commented on a diff in pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table
danny0405 commented on code in PR #7394: URL: https://github.com/apache/hudi/pull/7394#discussion_r1046580467 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala: ## @@ -245,6 +245,10 @@ class HoodieCatalogTable(val spark: SparkSession, var table: CatalogTable) exten case (_, false) => ValidationUtils.checkArgument(table.schema.nonEmpty, s"Missing schema for Create Table: $catalogTableName") +if (sqlOptions.contains("hoodie.datasource.write.keygenerator.class") && +!sqlOptions.contains("hoodie.table.keygenerator.class")) { Review Comment: There is similar logic in `HoodieSparkSqlWriter#mergeParamsAndGetHoodieConfig`, so we can unify the two code paths. Also, we should not use hard-coded option keys.
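To make the "no hard-coded option keys" point concrete, here is a minimal sketch in plain Java. The two key strings are the ones quoted in the diff; the constant names, the helper `withTableKeyGenerator`, and the assumption that the branch copies the write-side value into the table-side key are all hypothetical, since the body of the `if` is not shown in this thread. In Hudi itself the keys would presumably come from the existing config classes rather than new constants.

```java
import java.util.HashMap;
import java.util.Map;

class KeyGenOptionSketch {
  // Hypothetical constant names; the string values are the keys quoted in the diff.
  static final String WRITE_KEYGEN_CLASS_KEY = "hoodie.datasource.write.keygenerator.class";
  static final String TABLE_KEYGEN_CLASS_KEY = "hoodie.table.keygenerator.class";

  // Assumed behaviour: fall back to the write-side key generator when the
  // table-side key is absent. A single shared helper avoids duplicating the check.
  static Map<String, String> withTableKeyGenerator(Map<String, String> sqlOptions) {
    Map<String, String> merged = new HashMap<>(sqlOptions);
    if (merged.containsKey(WRITE_KEYGEN_CLASS_KEY) && !merged.containsKey(TABLE_KEYGEN_CLASS_KEY)) {
      merged.put(TABLE_KEYGEN_CLASS_KEY, merged.get(WRITE_KEYGEN_CLASS_KEY));
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put(WRITE_KEYGEN_CLASS_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator");
    System.out.println(withTableKeyGenerator(options));
  }
}
```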
[GitHub] [hudi] daihw commented on issue #6588: [SUPPORT]Caused by: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.avro.util.Utf8
daihw commented on issue #6588: URL: https://github.com/apache/hudi/issues/6588#issuecomment-1347627637

> I solved this problem by adding the following configuration in Packaging/Hudi-spark-bundle/pom.xml
>
> ```
> ...
> <include>org.apache.avro:avro</include>
> ...
> ...
> <relocation>
>   <pattern>org.apache.avro.</pattern>
>   <shadedPattern>org.apache.hudi.org.apache.avro.</shadedPattern>
> </relocation>
> ...
> ...
> <dependency>
>   <groupId>org.apache.avro</groupId>
>   <artifactId>avro</artifactId>
>   <version>1.8.2</version>
>   <scope>compile</scope>
> </dependency>
> ...
> ```

Hi, I got the same problem as you. When I fixed it using your method, I ran into a new problem and could not use Spark to insert data. The error message is:

java.lang.ClassCastException: org.apache.hudi.org.apache.avro.Schema$RecordSchema cannot be cast to org.apache.avro.Schema
at org.apache.spark.SparkConf$$anonfun$registerAvroSchemas$1.apply(SparkConf.scala:221)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.spark.SparkConf.registerAvroSchemas(SparkConf.scala:221)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:280)
at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:101)
at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.run(InsertIntoHoodieTableCommand.scala:60)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:232)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
165739 [HiveServer2-Background-Pool: Thread-133] ERROR org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: java.lang.ClassCastException: org.apache.hudi.org.apache.avro.Schema$RecordSchema cannot be cast to org.apache.avro.Schema
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:269)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
at java.security.AccessController.doPrivileged(Native Method)
at
[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table
hudi-bot commented on PR #7394: URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347605174 ## CI report: * 17a887b98e0dd10d71a596ea87382911f3fdcef7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13490) * bb40c512b05286e266eb5b05e2f31b9ea926 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13660)
[GitHub] [hudi] hudi-bot commented on pull request #7411: [HUDI-5351] Handle populateMetaFields when repartitioning in sort partitioner
hudi-bot commented on PR #7411: URL: https://github.com/apache/hudi/pull/7411#issuecomment-1347600350 ## CI report: * a2739c7a7cc5f6ebd38a4b1c4be46a7a652f1d38 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13656)
[GitHub] [hudi] hudi-bot commented on pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient
hudi-bot commented on PR #7437: URL: https://github.com/apache/hudi/pull/7437#issuecomment-1347600547 ## CI report: * 7aae826c0ffc9be3dbb72e48a38c5a595d2fe4bb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13657)
[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table
hudi-bot commented on PR #7394: URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347600245 ## CI report: * 17a887b98e0dd10d71a596ea87382911f3fdcef7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13490) * bb40c512b05286e266eb5b05e2f31b9ea926 UNKNOWN
[hudi] branch master updated: [HUDI-5348] Cache file slices in HoodieBackedTableMetadata (#7436)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 13a8e5c7297 [HUDI-5348] Cache file slices in HoodieBackedTableMetadata (#7436) 13a8e5c7297 is described below commit 13a8e5c729750ba5907d75df3d22473feaaa2a03 Author: Y Ethan Guo AuthorDate: Mon Dec 12 17:00:10 2022 -0800 [HUDI-5348] Cache file slices in HoodieBackedTableMetadata (#7436) --- .../org/apache/hudi/metadata/HoodieBackedTableMetadata.java | 13 +++-- .../org/apache/hudi/metadata/HoodieTableMetadataUtil.java | 10 ++ .../java/org/apache/hudi/utilities/TestHoodieIndexer.java | 7 +-- 3 files changed, 22 insertions(+), 8 deletions(-) diff --git a/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java b/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java index 7743a65bf05..e2fbc4e6716 100644 --- a/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java +++ b/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java @@ -40,6 +40,7 @@ import org.apache.hudi.common.table.timeline.HoodieActiveTimeline; import org.apache.hudi.common.table.timeline.HoodieInstant; import org.apache.hudi.common.table.timeline.HoodieTimeline; import org.apache.hudi.common.table.timeline.TimelineMetadataUtils; +import org.apache.hudi.common.table.view.HoodieTableFileSystemView; import org.apache.hudi.common.util.ClosableIterator; import org.apache.hudi.common.util.HoodieTimer; import org.apache.hudi.common.util.Option; @@ -78,6 +79,7 @@ import static org.apache.hudi.common.util.ValidationUtils.checkArgument; import static org.apache.hudi.metadata.HoodieTableMetadataUtil.PARTITION_NAME_BLOOM_FILTERS; import static org.apache.hudi.metadata.HoodieTableMetadataUtil.PARTITION_NAME_COLUMN_STATS; import static 
org.apache.hudi.metadata.HoodieTableMetadataUtil.PARTITION_NAME_FILES; +import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getFileSystemView; /** * Table metadata provided by an internal DFS backed Hudi metadata table. @@ -92,6 +94,7 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata { // Metadata table's timeline and metaclient private HoodieTableMetaClient metadataMetaClient; private HoodieTableConfig metadataTableConfig; + private HoodieTableFileSystemView metadataFileSystemView; // should we reuse the open file handles, across calls private final boolean reuse; @@ -120,6 +123,7 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata { } else if (this.metadataMetaClient == null) { try { this.metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf.get()).setBasePath(metadataBasePath).build(); +this.metadataFileSystemView = getFileSystemView(metadataMetaClient); this.metadataTableConfig = metadataMetaClient.getTableConfig(); this.isBloomFilterIndexEnabled = metadataConfig.isBloomFilterIndexEnabled(); this.isColumnStatsIndexEnabled = metadataConfig.isColumnStatsIndexEnabled(); @@ -127,11 +131,13 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata { LOG.warn("Metadata table was not found at path " + metadataBasePath); this.isMetadataTableEnabled = false; this.metadataMetaClient = null; +this.metadataFileSystemView = null; this.metadataTableConfig = null; } catch (Exception e) { LOG.error("Failed to initialize metadata table at path " + metadataBasePath, e); this.isMetadataTableEnabled = false; this.metadataMetaClient = null; +this.metadataFileSystemView = null; this.metadataTableConfig = null; } } @@ -162,7 +168,8 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata { // to scan all file-groups for all key-prefixes as each of these might contain some // records matching the key-prefix List partitionFileSlices = - 
HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, partitionName); +HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices( +metadataMetaClient, metadataFileSystemView, partitionName); return (shouldLoadInMemory ? HoodieListData.lazy(partitionFileSlices) : engineContext.parallelize(partitionFileSlices)) .flatMap((SerializableFunction>>) fileSlice -> { @@ -379,7 +386,8 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata { private Map, List> getPartitionFileSliceToKeysMapping(final String partitionName, final List keys) { // Metadata is in sync till the latest completed instant on the dataset List latestFileSlices = -
[GitHub] [hudi] nsivabalan merged pull request #7436: [HUDI-5348] Cache file slices in HoodieBackedTableMetadata
nsivabalan merged PR #7436: URL: https://github.com/apache/hudi/pull/7436
[GitHub] [hudi] nsivabalan commented on a diff in pull request #7436: [HUDI-5348] Cache file slices in HoodieBackedTableMetadata
nsivabalan commented on code in PR #7436: URL: https://github.com/apache/hudi/pull/7436#discussion_r1046561322 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java: ## @@ -379,7 +386,8 @@ private HoodieRecord composeRecord(GenericRecord avroReco private Map, List> getPartitionFileSliceToKeysMapping(final String partitionName, final List keys) { // Metadata is in sync till the latest completed instant on the dataset List latestFileSlices = - HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, partitionName); +HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices( Review Comment: Looks like the FileSystemView (MDFSV) caches the entities, so we are good.
[GitHub] [hudi] jonvex commented on issue #7351: [SUPPORT] The keygenerator.class value set when using SparkSQL to create a table does not finally take effect in hoodie.properties
jonvex commented on issue #7351: URL: https://github.com/apache/hudi/issues/7351#issuecomment-1347568412

The PRs are ready for review, and then we can close this out.
[GitHub] [hudi] nsivabalan commented on a diff in pull request #7035: [HUDI-5075] Adding support to rollback residual clustering after disabling clustering
nsivabalan commented on code in PR #7035:
URL: https://github.com/apache/hudi/pull/7035#discussion_r1046543459

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ##
@@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, HoodieCommitMetadata me
         metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), "true");
         inlineScheduleClustering(extraMetadata);
       }
+
+      // if clustering is disabled, but we might need to rollback any inflight clustering when clustering was enabled previously.
+      if (!config.inlineClusteringEnabled() && !config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering()

Review Comment:
This is already the case. The issue we are trying to solve here is that, if the requested replace commit is left in the data timeline, metadata table compaction is stopped. That's why.
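The check discussed in this review comment can be pictured with the small sketch below. It is only an illustration under stated assumptions: the quoted condition is cut off in the diff, so the fourth input (`hasPendingReplaceCommit`) and the class and method names are hypothetical and are not taken from Hudi.

```java
/** Hypothetical helper; the names below are illustrative and not taken from Hudi. */
final class ResidualClusteringCheck {
    private ResidualClusteringCheck() {}

    /**
     * Returns true when every clustering mode is switched off but a replace
     * commit from an earlier clustering run is still pending on the timeline.
     * The hasPendingReplaceCommit flag is an assumption of this sketch.
     */
    static boolean shouldRollbackPendingClustering(
            boolean inlineClusteringEnabled,
            boolean asyncClusteringEnabled,
            boolean scheduleInlineClustering,
            boolean hasPendingReplaceCommit) {
        boolean clusteringDisabled =
                !inlineClusteringEnabled && !asyncClusteringEnabled && !scheduleInlineClustering;
        return clusteringDisabled && hasPendingReplaceCommit;
    }
}
```

Keeping the decision in a pure function like this makes the three "disabled" flags and the "something is left over" condition easy to test separately from the write client.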
[jira] [Updated] (HUDI-5376) Update quickstart guide for hudi hoodie.datasource.write.keygenerator.class spark-sql change
[ https://issues.apache.org/jira/browse/HUDI-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5376:
    Labels: pull-request-available  (was: )

> Update quickstart guide for hudi hoodie.datasource.write.keygenerator.class spark-sql change
>
> Key: HUDI-5376
> URL: https://issues.apache.org/jira/browse/HUDI-5376
> Project: Apache Hudi
> Issue Type: Improvement
> Components: docs
> Reporter: Jonathan Vexler
> Assignee: Jonathan Vexler
> Priority: Major
> Labels: pull-request-available

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] jonvex opened a new pull request, #7439: [HUDI-5376] Remove incorrect spark-sql keygen info from quickstart guide
jonvex opened a new pull request, #7439:
URL: https://github.com/apache/hudi/pull/7439

### Change Logs
Since 0.11.1, the keygen logic in spark-sql has been the same as everywhere else, but the quickstart guide was never updated.

### Impact
Documentation is correct now.

### Risk level (write none, low, medium or high below)
none

### Documentation Update
N/A

### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] nsivabalan commented on a diff in pull request #7436: [HUDI-5348] Cache file slices in HoodieBackedTableMetadata
nsivabalan commented on code in PR #7436:
URL: https://github.com/apache/hudi/pull/7436#discussion_r1046539085

## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java: ##
@@ -379,7 +386,8 @@ private HoodieRecord composeRecord(GenericRecord avroReco
   private Map, List> getPartitionFileSliceToKeysMapping(final String partitionName, final List keys) {
     // Metadata is in sync till the latest completed instant on the dataset
     List latestFileSlices =
-        HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, partitionName);
+        HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(

Review Comment:
I am thinking we can also cache the file slices, similar to how we cache the file readers. I don't see a reason for the file slices to change unless there is a change in the timeline, in which case the entire FileSystemView will be refreshed.
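The caching idea in this comment could look roughly like the sketch below. It is not the code from the PR: `FileSliceCache`, its `loader` function, and `reset()` are hypothetical names, and file slices are represented as plain strings so that the example stays self-contained. The one assumption carried over from the comment is that the cached slices stay valid until the timeline changes, at which point the whole cache is dropped.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-partition cache of file slices; not part of Hudi's API. */
final class FileSliceCache {
    // Partition name -> latest file slices for that partition (strings stand in for FileSlice).
    private final Map<String, List<String>> slicesByPartition = new ConcurrentHashMap<>();
    // Expensive lookup that would normally list the partition; supplied by the caller.
    private final Function<String, List<String>> loader;

    FileSliceCache(Function<String, List<String>> loader) {
        this.loader = loader;
    }

    /** Runs the loader on the first request for a partition and reuses the result afterwards. */
    List<String> getLatestFileSlices(String partitionName) {
        return slicesByPartition.computeIfAbsent(partitionName, loader);
    }

    /** Drops every cached entry; meant to be called when the timeline changes. */
    void reset() {
        slicesByPartition.clear();
    }
}
```

Repeated lookups for the same partition then cost one map read, and a single `reset()` keeps the cache in step with a refreshed view.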
[jira] [Created] (HUDI-5376) Update quickstart guide for hudi hoodie.datasource.write.keygenerator.class spark-sql change
Jonathan Vexler created HUDI-5376:

Summary: Update quickstart guide for hudi hoodie.datasource.write.keygenerator.class spark-sql change
Key: HUDI-5376
URL: https://issues.apache.org/jira/browse/HUDI-5376
Project: Apache Hudi
Issue Type: Improvement
Components: docs
Reporter: Jonathan Vexler
Assignee: Jonathan Vexler
[jira] [Updated] (HUDI-5262) When creating table in spark-sql setting wrong keygenerator config does not warn
[ https://issues.apache.org/jira/browse/HUDI-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Vexler updated HUDI-5262:
    Status: Patch Available  (was: In Progress)

> When creating table in spark-sql setting wrong keygenerator config does not warn
>
> Key: HUDI-5262
> URL: https://issues.apache.org/jira/browse/HUDI-5262
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark-sql
> Reporter: Jonathan Vexler
> Assignee: Jonathan Vexler
> Priority: Minor
> Labels: pull-request-available
>
> Setting `hoodie.datasource.write.keygenerator.class` when creating a table
> does nothing; `hoodie.table.keygenerator.class` needs to be set instead. We
> should warn when the former is set on create table. Maybe we should warn
> about any configs that do nothing when set on table creation? The error will
> appear on the first write if the keygenerator is not the default.
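The warning proposed in the issue above might be sketched as follows. This is a hedged illustration, not Hudi's implementation: `CreateTableConfigCheck` and `warningsFor` are hypothetical names, and the lookup table holds only the one key pair named in the issue, following the behavior the issue describes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical create-table check; the class and method names are illustrative only. */
final class CreateTableConfigCheck {
    private CreateTableConfigCheck() {}

    // Key that, per the issue, does nothing on create table -> key to set instead.
    private static final Map<String, String> IGNORED_ON_CREATE = Collections.singletonMap(
            "hoodie.datasource.write.keygenerator.class",
            "hoodie.table.keygenerator.class");

    /** Returns one warning per supplied property that has no effect at table creation. */
    static List<String> warningsFor(Map<String, String> tableProps) {
        List<String> warnings = new ArrayList<>();
        for (String key : tableProps.keySet()) {
            String replacement = IGNORED_ON_CREATE.get(key);
            if (replacement != null) {
                warnings.add("'" + key + "' has no effect on create table; set '"
                        + replacement + "' instead");
            }
        }
        return warnings;
    }
}
```

Returning the warnings as a list instead of logging them directly leaves it to the caller to decide whether to log them or fail fast, and extending the map would cover the broader "any configs that do nothing" case raised in the issue.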
[jira] [Updated] (HUDI-5262) When creating table in spark-sql setting wrong keygenerator config does not warn
[ https://issues.apache.org/jira/browse/HUDI-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Vexler updated HUDI-5262:
    Status: In Progress  (was: Open)
[jira] [Assigned] (HUDI-5262) When creating table in spark-sql setting wrong keygenerator config does not warn
[ https://issues.apache.org/jira/browse/HUDI-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Vexler reassigned HUDI-5262:
    Assignee: Jonathan Vexler
[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
hudi-bot commented on PR #7438: URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347535112

## CI report:
* baa62578663a77cc37725533fad04e4b75a47e1a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13658)

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7436: [HUDI-5348] Cache file slices in HoodieBackedTableMetadata
hudi-bot commented on PR #7436: URL: https://github.com/apache/hudi/pull/7436#issuecomment-1347513477

## CI report:
* 96b2aff47666ae63124f1a7601167388b501fe1b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13655)
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7175: [HUDI-5191] Fix compatibility with avro 1.10
alexeykudinkin commented on code in PR #7175:
URL: https://github.com/apache/hudi/pull/7175#discussion_r1046493214

## .github/workflows/bot.yml: ##
@@ -73,6 +73,14 @@ jobs:
         run: |
           HUDI_VERSION=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout)
           ./packaging/bundle-validation/ci_run.sh $HUDI_VERSION
+      - name: Common Test

Review Comment:
@Zouxxyy What specifically are we looking to test here? We need to be careful about expanding the scope.
[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
hudi-bot commented on PR #7438: URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347432717

## CI report:
* baa62578663a77cc37725533fad04e4b75a47e1a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient
hudi-bot commented on PR #7437: URL: https://github.com/apache/hudi/pull/7437#issuecomment-1347432690

## CI report:
* 7aae826c0ffc9be3dbb72e48a38c5a595d2fe4bb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13657)
[GitHub] [hudi] hudi-bot commented on pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient
hudi-bot commented on PR #7437: URL: https://github.com/apache/hudi/pull/7437#issuecomment-1347427053

## CI report:
* 7aae826c0ffc9be3dbb72e48a38c5a595d2fe4bb UNKNOWN