[GitHub] [hudi] zhuanshenbsj1 commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-12 Thread GitBox


zhuanshenbsj1 commented on PR #7159:
URL: https://github.com/apache/hudi/pull/7159#issuecomment-1347877944

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] waywtdcc commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

2022-12-12 Thread GitBox


waywtdcc commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1347854543

   Hope this PR can be merged into 0.12.2.





[GitHub] [hudi] codope commented on a diff in pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient

2022-12-12 Thread GitBox


codope commented on code in PR #7437:
URL: https://github.com/apache/hudi/pull/7437#discussion_r1046736411


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -1125,6 +1130,14 @@ private HoodieData 
getFilesPartitionRecords(String createInstantTi
 return filesPartitionRecords.union(fileListRecords);
   }
 
+  protected void closeInternal() {
+    try {
+      close();
+    } catch (Exception e) {

Review Comment:
   AutoCloseable#close() throws an exception if the resource cannot be closed.
However, I think it is better to catch it and wrap it in a HoodieException, which
is easier to search for and debug.
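
   The catch-and-wrap pattern suggested in this comment could look roughly like the sketch below. It is a minimal, self-contained illustration, not the code from the PR: `WrappedCloseException` is a hypothetical stand-in for Hudi's `HoodieException`, `FlakyWriter` is a hypothetical resource, and the error message is an assumption.

```java
import java.io.IOException;

public class CloseWrapSketch {

  /** Hypothetical stand-in for HoodieException so the sketch compiles on its own. */
  static class WrappedCloseException extends RuntimeException {
    WrappedCloseException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Hypothetical resource whose close() can be made to fail. */
  static class FlakyWriter implements AutoCloseable {
    private final boolean failOnClose;

    FlakyWriter(boolean failOnClose) {
      this.failOnClose = failOnClose;
    }

    @Override
    public void close() throws Exception {
      if (failOnClose) {
        throw new IOException("underlying stream could not be closed");
      }
    }
  }

  /** Closes the resource and rethrows any checked failure as one unchecked, searchable type. */
  static void closeInternal(AutoCloseable resource) {
    try {
      resource.close();
    } catch (Exception e) {
      // Keep the original exception as the cause so the stack trace is not lost.
      throw new WrappedCloseException("Failed to close metadata writer", e);
    }
  }

  public static void main(String[] args) {
    closeInternal(new FlakyWriter(false)); // closes cleanly, nothing is thrown
    try {
      closeInternal(new FlakyWriter(true));
    } catch (WrappedCloseException e) {
      System.out.println(e.getMessage() + " <- " + e.getCause().getMessage());
    }
  }
}
```

   Callers then only need to handle one unchecked exception type, and the original failure stays reachable through `getCause()`.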






[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-12 Thread GitBox


zhuanshenbsj1 commented on code in PR #7159:
URL: https://github.com/apache/hudi/pull/7159#discussion_r1046699321


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala:
##
@@ -118,6 +118,7 @@ class TestCDCDataFrameSuite extends HoodieCDCTestBase {
   .option(DataSourceWriteOptions.OPERATION.key, 
DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL)
   .option("hoodie.clustering.inline", "true")
   .option("hoodie.clustering.inline.max.commits", "1")
+  .option("hoodie.clustering.plan.strategy.sort.columns", "_row_key")
   .mode(SaveMode.Append)

Review Comment:
   Without this change, after inputDF5 the timeline will be:
  
commit(instantC,cleaned)->clustering(instantD,cleaned)->commit(instantE)->clean(instantF)
   and the files belonging to instantA will be archived, which causes lines 193-194:
  allVisibleCDCData = cdcDataFrame((commitTime1.toLong - 1).toString)
  assertCDCOpCnt(allVisibleCDCData, totalInsertedCnt, totalUpdatedCnt, totalDeletedCnt)
   to fail with a "can't get instantC file" error.
   
   With this change, after inputDF5 the timeline will be:
  
clustering(instantD,cleaned)->clustering(instantE,cleaned)->commit(instantF)->clean(instantG),
   which fixes the problem.
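
   For reference, the three clustering options visible in the diff above can be collected in a plain map, as in the following sketch. The class and method names are hypothetical and exist only for illustration; in the actual test the options are passed one by one through `.option(...)` on the Spark writer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClusteringOptionsSketch {

  /** Builds the inline-clustering options shown in the diff; the helper itself is hypothetical. */
  static Map<String, String> inlineClusteringOptions(String sortColumns) {
    Map<String, String> opts = new LinkedHashMap<>();
    // Run clustering inline with the write.
    opts.put("hoodie.clustering.inline", "true");
    // Setting taken from the diff: inline clustering is keyed to a commit count of 1.
    opts.put("hoodie.clustering.inline.max.commits", "1");
    // The option added by this change: sort columns for the clustering plan strategy.
    opts.put("hoodie.clustering.plan.strategy.sort.columns", sortColumns);
    return opts;
  }

  public static void main(String[] args) {
    inlineClusteringOptions("_row_key")
        .forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```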
   






[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-12 Thread GitBox


zhuanshenbsj1 commented on code in PR #7159:
URL: https://github.com/apache/hudi/pull/7159#discussion_r1046731112


##
hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java:
##
@@ -141,6 +141,7 @@ private void setHiveColumnNameProps(List 
fields, JobConf jobConf,
   jobConf.set(hive_metastoreConstants.META_TABLE_PARTITION_COLUMNS, 
PARTITION_COLUMN);
 }
 jobConf.set(hive_metastoreConstants.META_TABLE_COLUMNS, 
hiveOrderedColumnNames);
+jobConf.set("columns.types", 
"string,string,string,string,string,string,string,string,bigint,string,string");
   }

Review Comment:
   This function (TestHoodieRealtimeRecordReader#testIncrementalWithOnlylog) 
produces a warning, so I should move this change here.
   
![image](https://user-images.githubusercontent.com/34104400/207249735-5b4443c3-6fae-4287-9eb3-d680679ece7a.png)
   






[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-12 Thread GitBox


hudi-bot commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1347828520

   
   ## CI report:
   
   * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN
   * 29e31dd516112fa9a38463a9fedccb423db589cb Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13676)
 
   * 2cfcca5c4f1a3a17b68d50f605f736c3a03c2e3f UNKNOWN
   * 1930cfe77fc3ddbd75564a75558b1211f823be89 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-12 Thread GitBox


hudi-bot commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1347824795

   
   ## CI report:
   
   * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN
   * f559ebdc000ac712c15ce2d7b1f6fda3302dfabf Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13580)
 
   * 29e31dd516112fa9a38463a9fedccb423db589cb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13676)
 
   * 2cfcca5c4f1a3a17b68d50f605f736c3a03c2e3f UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-12 Thread GitBox


hudi-bot commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1347821193

   
   ## CI report:
   
   * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN
   * f559ebdc000ac712c15ce2d7b1f6fda3302dfabf Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13580)
 
   * 29e31dd516112fa9a38463a9fedccb423db589cb UNKNOWN
   
   



svn commit: r58689 - /dev/hudi/KEYS

2022-12-12 Thread sivabalan
Author: sivabalan
Date: Tue Dec 13 06:46:05 2022
New Revision: 58689

Log:
Adding satish's gpg keys

Modified:
dev/hudi/KEYS

Modified: dev/hudi/KEYS
==
--- dev/hudi/KEYS (original)
+++ dev/hudi/KEYS Tue Dec 13 06:46:05 2022
@@ -1171,3 +1171,62 @@ vVmwNnpErMRCa+GMaulS06s2mkJdLVX8EW5z3BLz
 RRaeFMCVTqi/Xw==
 =LZ7a
 -END PGP PUBLIC KEY BLOCK-
+pub   rsa4096 2022-11-19 [SC] [expires: 2026-11-19]
+  6DA0B39A13C2658D22AE7D14D08C4B6BD98EA659
+uid   [ultimate] hudi 
+sig 3D08C4B6BD98EA659 2022-11-19  hudi 
+sub   rsa4096 2022-11-19 [E] [expires: 2026-11-19]
+sig  D08C4B6BD98EA659 2022-11-19  hudi 
+
+-BEGIN PGP PUBLIC KEY BLOCK-
+
+mQINBGN5EkYBEACeR5uneVMbI5W5BtHEibjXEuskBHT7Z+MiU158YQEAmVvy+9NK
+TBntrIpNioqgdQlyZypcGQvuse11+TYh0AbCmK6y+8iOqi7EiDsSsdDOwTR7ROXJ
+Br1/ZLQbjb9poPFQxOzSLQWeQT6ETU6wZwrZ8ZC/ZJ1hGKX+SsDjqHWGZhPZInfB
+c/uqmB1advGZEdWRKSN4b4IIcOL69vO50NfGFTbu5n6MQjpFGnBoW9Ed3IO+UsvN
+3K7opD6/DYth5F88shvW5YEEuS/yHBHKHPs4XAqVCtjUozDmMmpbIEKuF3Id9eO6
+l+mhAUBLy9Tj6dgkIyQ6nkjzJwe0BsijyL/U1O5XFpnP+ETK1QsrPbQrQ4ykFv3R
+LV8qnXilyiM3iKtxDwvEtZPF7fMtGwhQEKDBwWu0zfsVZ1kQSvuvYC/T83BCQ2Rx
+faP+Xy3bc/979WtvxM30yxp2aQ+ZcCLB6ORF95irXTVu3Z6QSZRGCUfJmAfckXOv
+u7mLh4wH2Lua8ppruqE8Ic1cpl/VQOLuYOBM0yMuJSpAUuSVt9k8XKdZt8zrg0BV
+NdIo9uir7lf0csKtFR79vPJq0YBOo7sj56C6KDQZLQ6B9Dx54qGr03RZTAeidb2R
+qTaNSrvJzVtMNdPwgHtn507OGt2ZJOo4cIbl+Im5IzMesvC4uRzrqZTrJwARAQAB
+tBhodWRpIDxzYXRpc2hAYXBhY2hlLm9yZz6JAlQEEwEIAD4WIQRtoLOaE8JljSKu
+fRTQjEtr2Y6mWQUCY3kSRgIbAwUJB4YfXQULCQgHAgYVCgkICwIEFgIDAQIeAQIX
+gAAKCRDQjEtr2Y6mWdyuD/9AgaM08CYxbAYDtPAb6uC1edCZbvPzkP98us4m8jL/
+979grfvgyPkH2c87f8ec/JlGIOZSaDZOsNO9hhsCyfT3SrN/DQnIqlimEkh4k7Wb
+DGp3aktP5Qv80BtExkIca8J92Z7Cs5FRub9Vp51bqfS9wDgBZDvTbOpXc2snJgK5
+Bh9JlfFUyb4ev6pFizrT/sL5COhkqYKgyunl8fMOiX2hgl/aNyOjOCOQrHQNpW8d
+EsdTvj8+IVkadeCkD5+lqaNS1cY0U7ycGpciwjGZ4aNypb27lF2L0o5zmKT0u2yD
+gtOg28RpMV6uz4rQWNibz1vH/USGxIdd67dPFVYshUqicdhjmH4848qGkkXvhgqE
+wL0e5EH9HxbN6VocxZ9YHvnNfA8hy2K0sJTm+TMQJvD7dW112LLX3u8XLmDp4URQ
+bYwtEw82VfcbYIZbXUIWY5NPLNevDKVs3SXmkdXXz0OzsX0ODb+pp3rW/NztOeQ9
+4huvwXLmm9WiKiTgz7SQXvhZNpi6sUIlX82yEHr/+KbCXRTsz4xmSMBrqKufbTrF
+P/QyH6mONeXdsCb50jkMG8L2TzFEQdElchInobcfAZ0E2SuZ8Rmm6HdB2iS4SN6O
+jexAkN0VVi63f2Zsl2XjZhckW3x/X52CzlyAPc6m0NxsrEYjsmdzX0ACn0g/6KCj
+M7kCDQRjeRJGARAAs8p3JcS7icMBJIl1MHDjF0nGBMrway7HpqzROfnXJLogjUu9
+L0ASGojloytecQzcDGDbB8zuF2o+qyu0EtEzMc8m2PrRgBmOg+TMEZOovCSjiIEZ
+/w7ZOlfOU8Iva3fBbAg++oFb24LEOC+z1gjcUie0QlvReZWLWvZ1ATD3y0oWapqR
+IyyqOHaSF/l8cIRvv2kgigEvLch8iVuVHnc/ZOjyQ5iEbBZpe/ejg08dlPU2VCdO
+jGcL17JOqoltCKsQmK+xBnAHQL5VNcTd9fEo6FeiUIofyf/9d/LpqPVjo1zo6Eld
+9hk7q3I5Ms4Lh+cbtslnzi7t7U+cI+Zs0s7G0FcMBIXzqdgiHP3doQVm1Viex12f
+Wp+lN+QJDmyo+wEtkxbWSXKutiL0OAdSmO/1Cx901ygSTw5F/lmTxzqn3oc8F3vq
+lMQKn+WpKRcMwWeQU3nhtSi/zw7Zto/LyWmt8JQdQUoFYAXIXIbQaihP1k3COnnC
+wPj3XbzIBUWCM2642jcX/ieUrsKLRu7/WoVf0L6CPLqk8QKPBI85HUooB5oA8ZzX
+U4Io/VzRZ9plo1q8I9JOR9g4HLoGh4GovWlfjsifa/h5j2W1o7Z/Ix9Ze5fDvKW+
+0wvRONQKiAeiKYy0k/SfX6WBxHhMNg9VmaWT6bhCKstSJX/Mo1uu4d+BUxkAEQEA
+AYkCPAQYAQgAJhYhBG2gs5oTwmWNIq59FNCMS2vZjqZZBQJjeRJGAhsMBQkHhh9d
+AAoJENCMS2vZjqZZYs0P/RspAruxRd5dhWc0YA1KM8BkGg7UZDa1o3EYBkX/clm5
+QaeI2ozTphVPACyonCSxsH1AkC8Vi5TFkg3PKHMe7MAlPDxlW94nLnBBIk+ncDeL
+kz+CI1oFDXF1KrohSyzgxTfw9wHMn5vsBMJ+Of1+YSSNhTN5XmMgA1qwz9po6SU2
+FgzTrfrMSv8E7vANusqcl+hfGpdUg6oOA9LRJziIzd+Zrddq0urq49qAaDF3VEq6
+kh/nAMtvRiT/idLJE6z0O1INRpj6Bq8J6JsadM9CSsVHYn4Vn/38rJTl4FPJpaxn
+Hyfn/j+BsaWCr1mCRqVsUexcIhDQCtND9mVkYt0RBJaCJ+jVpGReoXcxL/yqpV2H
+rQyKTQYYJmBTUimbvX30ct+7UH7wM7llTcRRqF+EnVU+5+y8AMtbDlIoByX1NnPM
+qgJYRxlxbo79FGg03fKA0NRSxszpZb2BqGuVZtVfXMLCorMpua/5S8KriHujGLlx
+KJtkpiC/npPXxvWvVyi/4h184Xrp7wCQ0ITapxnCHaxLHdBak2kSCbhBuAcNnalJ
+FDeahocca6V+Sxosds8J9keQNIzz+HAoFbdGBBRsPv/3rZbG50s4CTGGxPpiBIpR
+bTekUhOFAo/Xl12LSY0Wv5c7YEWWgbFH9qfKg5srtYEGqjJe0yWKWpzQKZVuSMA4
+=pbmq
+-END PGP PUBLIC KEY BLOCK-
+




[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table

2022-12-12 Thread GitBox


hudi-bot commented on PR #7394:
URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347817544

   
   ## CI report:
   
   * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN
   * 6d3d125caa257a3b290ae286dd77499a39683750 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13665)
 
   
   



[hudi] branch asf-site updated: [DOCS] Update community sync schedule and fix broken links (#7442)

2022-12-12 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 226ff1ffd1f [DOCS] Update community sync schedule and fix broken links 
(#7442)
226ff1ffd1f is described below

commit 226ff1ffd1f1eeac0353ce1a65a621416efb7f5a
Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
AuthorDate: Mon Dec 12 22:31:12 2022 -0800

[DOCS] Update community sync schedule and fix broken links (#7442)
---
 website/docusaurus.config.js   |   2 +-
 .../assets/images/upcoming-community-calls.png | Bin 191558 -> 80694 bytes
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js
index 66490d2cd53..eb79bc7a8ee 100644
--- a/website/docusaurus.config.js
+++ b/website/docusaurus.config.js
@@ -338,7 +338,7 @@ module.exports = {
   items: [
 {
   label: 'Get Involved',
-  to: '/contribute/get-involved'
+  to: '/community/get-involved'
 },
 {
   label: 'Slack',
diff --git a/website/static/assets/images/upcoming-community-calls.png 
b/website/static/assets/images/upcoming-community-calls.png
index 72451a76d10..dbe42c7f8b0 100644
Binary files a/website/static/assets/images/upcoming-community-calls.png and 
b/website/static/assets/images/upcoming-community-calls.png differ



[GitHub] [hudi] xushiyan merged pull request #7442: [DOCS] Update community sync schedule and fix broken links

2022-12-12 Thread GitBox


xushiyan merged PR #7442:
URL: https://github.com/apache/hudi/pull/7442





[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-12 Thread GitBox


zhuanshenbsj1 commented on code in PR #7159:
URL: https://github.com/apache/hudi/pull/7159#discussion_r1046699321


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala:
##
@@ -118,6 +118,7 @@ class TestCDCDataFrameSuite extends HoodieCDCTestBase {
   .option(DataSourceWriteOptions.OPERATION.key, 
DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL)
   .option("hoodie.clustering.inline", "true")
   .option("hoodie.clustering.inline.max.commits", "1")
+  .option("hoodie.clustering.plan.strategy.sort.columns", "_row_key")
   .mode(SaveMode.Append)

Review Comment:
   Without this change, after inputDF5 the timeline will be:
  
commit(instantC,cleaned)->clustering(instantD,cleaned)->commit(instantE)->clean(instantF)
   and the files belonging to instantA will be archived, which causes lines 193-194:
  allVisibleCDCData = cdcDataFrame((commitTime1.toLong - 1).toString)
  assertCDCOpCnt(allVisibleCDCData, totalInsertedCnt, totalUpdatedCnt, totalDeletedCnt)
   to fail with a "can't get instantA file" error.
   
   With this change, after inputDF5 the timeline will be:
  
clustering(instantD,cleaned)->clustering(instantE,cleaned)->commit(instantF)->clean(instantG),
   which fixes the problem.
   






[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-12 Thread GitBox


zhuanshenbsj1 commented on code in PR #7159:
URL: https://github.com/apache/hudi/pull/7159#discussion_r1046698488


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala:
##
@@ -118,6 +118,7 @@ class TestCDCDataFrameSuite extends HoodieCDCTestBase {
   .option(DataSourceWriteOptions.OPERATION.key, 
DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL)
   .option("hoodie.clustering.inline", "true")
   .option("hoodie.clustering.inline.max.commits", "1")
+  .option("hoodie.clustering.plan.strategy.sort.columns", "_row_key")
   .mode(SaveMode.Append)

Review Comment:
   Without this change, after inputDF5 the timeline will be:
  
commit(instantC,cleaned)->clustering(instantD,cleaned)->commit(instantE)->clean(instantF)
   and the files belonging to instantA will be archived, which causes lines 193-194:
   
---
  allVisibleCDCData = cdcDataFrame((commitTime1.toLong - 1).toString)
  assertCDCOpCnt(allVisibleCDCData, totalInsertedCnt, totalUpdatedCnt, totalDeletedCnt)
   
---
   to fail with a "can't get instantA file" error.
   
   With this change, after inputDF5 the timeline will be:
  
clustering(instantD,cleaned)->clustering(instantE,cleaned)->commit(instantF)->clean(instantG),
   which fixes the problem.
   






[GitHub] [hudi] bhasudha opened a new pull request, #7442: [DOCS] Update community sync schedule and fix broken links

2022-12-12 Thread GitBox


bhasudha opened a new pull request, #7442:
URL: https://github.com/apache/hudi/pull/7442

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Add call stack information to lock file

2022-12-12 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1347769648

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 3ae5c2f76e605c9674d216cb87279bb662f07e2f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13667)
 
   * e8abc2db2ed326381ca8de35611b40467d7c17ae Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13671)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-12 Thread GitBox


hudi-bot commented on PR #7438:
URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347769625

   
   ## CI report:
   
   * 5c821f53d8eef00588491421dd751e3bb04866fb Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13666)
 
   * 0e1dfccb119b2595e420528986ca1d8cf0431543 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13675)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient

2022-12-12 Thread GitBox


hudi-bot commented on PR #7437:
URL: https://github.com/apache/hudi/pull/7437#issuecomment-1347769590

   
   ## CI report:
   
   * 7aae826c0ffc9be3dbb72e48a38c5a595d2fe4bb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13657)
 
   * 40c69ac7d433245f25296fd2883205c890596dd9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13674)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7423: [MINOR] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`

2022-12-12 Thread GitBox


hudi-bot commented on PR #7423:
URL: https://github.com/apache/hudi/pull/7423#issuecomment-1347769543

   
   ## CI report:
   
   * c6623fc8d2a5c5eb71181bc5c458bfdbc976d15a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13605)
 
   * 4c28887b7079a7e00ca0543a7ac3daee9872422b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13673)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Add call stack information to lock file

2022-12-12 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1347766446

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 3ae5c2f76e605c9674d216cb87279bb662f07e2f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13667)
 
   * e8abc2db2ed326381ca8de35611b40467d7c17ae UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7441: [HUDI-5378] Remove minlog.Log

2022-12-12 Thread GitBox


hudi-bot commented on PR #7441:
URL: https://github.com/apache/hudi/pull/7441#issuecomment-1347766468

   
   ## CI report:
   
   * 4aceef648f1ec5513df5283945f2ba3e42733ae4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13672)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-12 Thread GitBox


hudi-bot commented on PR #7438:
URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347766413

   
   ## CI report:
   
   * baa62578663a77cc37725533fad04e4b75a47e1a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13658)
 
   * 5c821f53d8eef00588491421dd751e3bb04866fb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13666)
 
   * 0e1dfccb119b2595e420528986ca1d8cf0431543 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient

2022-12-12 Thread GitBox


hudi-bot commented on PR #7437:
URL: https://github.com/apache/hudi/pull/7437#issuecomment-1347766367

   
   ## CI report:
   
   * 7aae826c0ffc9be3dbb72e48a38c5a595d2fe4bb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13657)
 
   * 40c69ac7d433245f25296fd2883205c890596dd9 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7423: [MINOR] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`

2022-12-12 Thread GitBox


hudi-bot commented on PR #7423:
URL: https://github.com/apache/hudi/pull/7423#issuecomment-1347766281

   
   ## CI report:
   
   * c6623fc8d2a5c5eb71181bc5c458bfdbc976d15a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13605)
 
   * 4c28887b7079a7e00ca0543a7ac3daee9872422b UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7383: [HUDI-4432] Checkpoint management for muti-writer scenario

2022-12-12 Thread GitBox


hudi-bot commented on PR #7383:
URL: https://github.com/apache/hudi/pull/7383#issuecomment-1347766161

   
   ## CI report:
   
   * bcb09ed13fc86a7b68219384161ed0c6a8ee8556 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13461)
 
   * 1d096935d79eadebfaa64fcab5439681547a9223 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13670)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7383: [HUDI-4432] Checkpoint management for muti-writer scenario

2022-12-12 Thread GitBox


hudi-bot commented on PR #7383:
URL: https://github.com/apache/hudi/pull/7383#issuecomment-1347762210

   
   ## CI report:
   
   * bcb09ed13fc86a7b68219384161ed0c6a8ee8556 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13461)
 
   * 1d096935d79eadebfaa64fcab5439681547a9223 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field

2022-12-12 Thread GitBox


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1347762103

   
   ## CI report:
   
   * be82fc49989f5262d833fb2b803fd6ea69af8d0c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13661)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7441: [HUDI-5378] Remove minlog.Log

2022-12-12 Thread GitBox


hudi-bot commented on PR #7441:
URL: https://github.com/apache/hudi/pull/7441#issuecomment-1347762441

   
   ## CI report:
   
   * 4aceef648f1ec5513df5283945f2ba3e42733ae4 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7366: [HUDI-5318] Fix partition pruning for clustering scheduling

2022-12-12 Thread GitBox


hudi-bot commented on PR #7366:
URL: https://github.com/apache/hudi/pull/7366#issuecomment-1347762156

   
   ## CI report:
   
   * 3f6572349834d904a697fbd8c8546f56a7f2844a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13662)
 
   
   



[hudi] branch master updated: [HUDI-5351] Handle populateMetaFields when repartitioning in sort partitioner (#7411)

2022-12-12 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new ae426bc483f [HUDI-5351] Handle populateMetaFields when repartitioning 
in sort partitioner (#7411)
ae426bc483f is described below

commit ae426bc483ffb310e99738219e6ecc9cb8336c0c
Author: Sagar Sumit 
AuthorDate: Tue Dec 13 10:22:10 2022 +0530

[HUDI-5351] Handle populateMetaFields when repartitioning in sort 
partitioner (#7411)
---
 .../MultipleSparkJobExecutionStrategy.java |  6 +-
 .../BulkInsertInternalPartitionerFactory.java  | 26 +++
 ...lkInsertInternalPartitionerWithRowsFactory.java | 19 ++---
 .../bulkinsert/GlobalSortPartitioner.java  | 14 
 .../bulkinsert/GlobalSortPartitionerWithRows.java  | 14 
 ...PartitionPathRepartitionAndSortPartitioner.java | 12 +++-
 ...nPathRepartitionAndSortPartitionerWithRows.java | 12 +++-
 .../PartitionPathRepartitionPartitioner.java   | 12 +++-
 ...artitionPathRepartitionPartitionerWithRows.java | 12 +++-
 .../PartitionSortPartitionerWithRows.java  | 14 
 .../bulkinsert/RDDPartitionSortPartitioner.java| 14 
 .../TestBulkInsertInternalPartitioner.java | 83 ++
 .../TestBulkInsertInternalPartitionerForRows.java  | 69 --
 .../org/apache/hudi/HoodieSparkSqlWriter.scala |  5 +-
 14 files changed, 227 insertions(+), 85 deletions(-)

diff --git 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
index 074deaa6212..954daaad1e1 100644
--- 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
+++ 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
@@ -206,10 +206,8 @@ public abstract class MultipleSparkJobExecutionStrategy> get(BulkInsertSortMode 
sortMode,
+  public static BulkInsertPartitioner> get(HoodieWriteConfig 
config,
 boolean 
isTablePartitioned) {
-return get(sortMode, isTablePartitioned, false);
+return get(config, isTablePartitioned, false);
   }
 
-  public static BulkInsertPartitioner> get(
-  BulkInsertSortMode sortMode, boolean isTablePartitioned, boolean 
enforceNumOutputPartitions) {
+  public static BulkInsertPartitioner> get(HoodieWriteConfig 
config,
+boolean 
isTablePartitioned,
+boolean 
enforceNumOutputPartitions) {
+BulkInsertSortMode sortMode = config.getBulkInsertSortMode();
 switch (sortMode) {
   case NONE:
 return new NonSortPartitionerWithRows(enforceNumOutputPartitions);
   case GLOBAL_SORT:
-return new GlobalSortPartitionerWithRows();
+return new GlobalSortPartitionerWithRows(config);
   case PARTITION_SORT:
-return new PartitionSortPartitionerWithRows();
+return new PartitionSortPartitionerWithRows(config);
   case PARTITION_PATH_REPARTITION:
-return new 
PartitionPathRepartitionPartitionerWithRows(isTablePartitioned);
+return new 
PartitionPathRepartitionPartitionerWithRows(isTablePartitioned, config);
   case PARTITION_PATH_REPARTITION_AND_SORT:
-return new 
PartitionPathRepartitionAndSortPartitionerWithRows(isTablePartitioned);
+return new 
PartitionPathRepartitionAndSortPartitionerWithRows(isTablePartitioned, config);
   default:
 throw new UnsupportedOperationException("The bulk insert sort mode \"" 
+ sortMode.name() + "\" is not supported.");
 }
diff --git 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/GlobalSortPartitioner.java
 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/GlobalSortPartitioner.java
index a184c009a1b..e10d23743da 100644
--- 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/GlobalSortPartitioner.java
+++ 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/GlobalSortPartitioner.java
@@ -20,10 +20,14 @@ package org.apache.hudi.execution.bulkinsert;
 
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.table.BulkInsertPartitioner;
 
 import org.apache.spark.api.java.JavaRDD;
 
+import static 
org.apache.hudi.execution.bulkinsert.BulkInsertSortMode.GLOBAL_SORT;
+
 /**
  * A built-in partitioner 

[GitHub] [hudi] codope merged pull request #7411: [HUDI-5351] Handle populateMetaFields when repartitioning in sort partitioner

2022-12-12 Thread GitBox


codope merged PR #7411:
URL: https://github.com/apache/hudi/pull/7411





[GitHub] [hudi] codope commented on pull request #7411: [HUDI-5351] Handle populateMetaFields when repartitioning in sort partitioner

2022-12-12 Thread GitBox


codope commented on PR #7411:
URL: https://github.com/apache/hudi/pull/7411#issuecomment-1347745239

   
   https://user-images.githubusercontent.com/16440354/207229581-83e9c594-b690-44a9-8894-2df6791ae683.png
   
   CI succeeded for the latest commit.





[GitHub] [hudi] danny0405 commented on a diff in pull request #7397: [HUDI-5205] Upgrade Flink to 1.16.0

2022-12-12 Thread GitBox


danny0405 commented on code in PR #7397:
URL: https://github.com/apache/hudi/pull/7397#discussion_r1046653338


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java:
##
@@ -291,19 +291,19 @@ public void handleEventFromOperator(int i, OperatorEvent 
operatorEvent) {
   }
 
   @Override
-  public void subtaskFailed(int i, @Nullable Throwable throwable) {
-// reset the event
-this.eventBuffer[i] = null;
-LOG.warn("Reset the event for task [" + i + "]", throwable);
+  public void subtaskReset(int i, long l) {

Review Comment:
   Is this change forward-compatible?






[hudi] branch master updated (5beadbfbe54 -> ef721d0af7d)

2022-12-12 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 5beadbfbe54 [HUDI-5373] Different fileids are assigned to the same 
bucket (#7433)
 add ef721d0af7d 【HUDI-4917】Optimized the way to get HoodieBaseFile of 
loadColumnRangesFromFiles of Bloom Index (#6793)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/index/bloom/HoodieBloomIndex.java  | 13 +++--
 .../main/java/org/apache/hudi/io/HoodieRangeInfoHandle.java |  8 
 .../src/main/java/org/apache/hudi/io/HoodieReadHandle.java  |  5 +
 3 files changed, 20 insertions(+), 6 deletions(-)



[GitHub] [hudi] nsivabalan merged pull request #6793: [HUDI-4917] Optimized the way to get HoodieBaseFile of loadColumnRangesFromFiles of Bloom Index

2022-12-12 Thread GitBox


nsivabalan merged PR #6793:
URL: https://github.com/apache/hudi/pull/6793





[GitHub] [hudi] nsivabalan commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-12 Thread GitBox


nsivabalan commented on PR #7438:
URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347735635

   @danny0405: No issues from a querying standpoint. There might be some perf hit, 
but no correctness issues or failures.





[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Add call stack information to lock file

2022-12-12 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1347720835

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 3ae5c2f76e605c9674d216cb87279bb662f07e2f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13667)
 
   
   



[jira] [Updated] (HUDI-5378) Remove minlog.Log

2022-12-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5378:
-
Labels: pull-request-available  (was: )

> Remove minlog.Log
> -
>
> Key: HUDI-5378
> URL: https://issues.apache.org/jira/browse/HUDI-5378
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] cxzl25 opened a new pull request, #7441: [HUDI-5378] Remove minlog.Log

2022-12-12 Thread GitBox


cxzl25 opened a new pull request, #7441:
URL: https://github.com/apache/hudi/pull/7441

   ### Change Logs
   
   Remove minlog.Log
   
   ### Impact
   
   Use the correct log4j configuration
   
   ### Risk level (write none, low medium or high below)
   
   
   
   ### Documentation Update
   
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Add call stack information to lock file

2022-12-12 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1347713314

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 3ae5c2f76e605c9674d216cb87279bb662f07e2f UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-12 Thread GitBox


hudi-bot commented on PR #7438:
URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347713244

   
   ## CI report:
   
   * baa62578663a77cc37725533fad04e4b75a47e1a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13658)
 
   * 5c821f53d8eef00588491421dd751e3bb04866fb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13666)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table

2022-12-12 Thread GitBox


hudi-bot commented on PR #7394:
URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347712770

   
   ## CI report:
   
   * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN
   * bbe54597ed63dcf9eb94b84cdd4f80d45c49634f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13663)
 
   * 6d3d125caa257a3b290ae286dd77499a39683750 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13665)
 
   
   



[jira] [Created] (HUDI-5378) Remove minlog.Log

2022-12-12 Thread dzcxzl (Jira)
dzcxzl created HUDI-5378:


 Summary: Remove minlog.Log
 Key: HUDI-5378
 URL: https://issues.apache.org/jira/browse/HUDI-5378
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: dzcxzl








[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Add call stack information to lock file

2022-12-12 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1347708193

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-12 Thread GitBox


hudi-bot commented on PR #7438:
URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347708169

   
   ## CI report:
   
   * baa62578663a77cc37725533fad04e4b75a47e1a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13658)
 
   * 5c821f53d8eef00588491421dd751e3bb04866fb UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table

2022-12-12 Thread GitBox


hudi-bot commented on PR #7394:
URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347708063

   
   ## CI report:
   
   * bb40c512b05286e266eb5b05e2f31b9ea926 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13660)
 
   * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN
   * bbe54597ed63dcf9eb94b84cdd4f80d45c49634f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13663)
 
   * 6d3d125caa257a3b290ae286dd77499a39683750 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table

2022-12-12 Thread GitBox


hudi-bot commented on PR #7394:
URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347704997

   
   ## CI report:
   
   * bb40c512b05286e266eb5b05e2f31b9ea926 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13660)
 
   * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN
   * bbe54597ed63dcf9eb94b84cdd4f80d45c49634f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13663)
 
   
   



[jira] [Updated] (HUDI-5377) Add call stack information to lock file

2022-12-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5377:
-
Labels: pull-request-available  (was: )

> Add call stack information to lock file
> ---
>
> Key: HUDI-5377
> URL: https://issues.apache.org/jira/browse/HUDI-5377
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> When OCC is enabled, an 'Unable to acquire lock' exception is sometimes thrown.
> We need to know which step caused the deadlock.
> For example:
>  
> LOCK-TIME : 2022-12-13 11:13:15.015
> LOCK-STACK-INFO :
>      org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock (FileSystemBasedLockProvider.java:148)
>      org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock (FileSystemBasedLockProvider.java:100)
>      org.apache.hudi.client.transaction.lock.LockManager.lock (LockManager.java:102)
>      org.apache.hudi.client.transaction.TransactionManager.beginTransaction (TransactionManager.java:58)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService (BaseHoodieWriteClient.java:1425)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant (BaseHoodieWriteClient.java:1037)
>      org.apache.hudi.util.CompactionUtil.scheduleCompaction (CompactionUtil.java:72)
>      org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 (StreamWriteOperatorCoordinator.java:250)
>      org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 (NonThrownExecutor.java:130)
>      java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
>      java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
>      java.lang.Thread.run (Thread.java:750)





[GitHub] [hudi] LinMingQiang opened a new pull request, #7440: [HUDI-5377] Add call stack information to lock file

2022-12-12 Thread GitBox


LinMingQiang opened a new pull request, #7440:
URL: https://github.com/apache/hudi/pull/7440

   ### Change Logs
   
   Add call stack information to lock file.
   
   ### Impact
   When OCC is enabled, an 'Unable to acquire lock' exception is sometimes thrown.
   We need to know which step caused the deadlock.
   ### Risk level (write none, low medium or high below)
   none
   ### Documentation Update
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
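
The idea of recording the acquiring call stack can be sketched as follows. This is a minimal illustration, not the code in this PR: `LockInfoDemo` and `buildLockInfo` are hypothetical names, and the `LOCK-TIME` / `LOCK-STACK-INFO` layout is only assumed to match the example given in HUDI-5377. It relies solely on the standard `Thread.currentThread().getStackTrace()` call.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

/** Hypothetical helper; it is not part of Hudi. */
class LockInfoDemo {

  /** Builds the text a lock provider could write into its lock file. */
  static String buildLockInfo() {
    StringBuilder sb = new StringBuilder();
    // Timestamp layout assumed from the example in the Jira description.
    sb.append("LOCK-TIME : ")
        .append(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").format(new Date()))
        .append('\n');
    sb.append("LOCK-STACK-INFO :\n");
    // One line per frame of the thread that is acquiring the lock.
    for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
      sb.append("     ")
          .append(frame.getClassName()).append('.').append(frame.getMethodName())
          .append(" (").append(frame.getFileName()).append(':')
          .append(frame.getLineNumber()).append(")\n");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(buildLockInfo());
  }
}
```

When a second writer fails with 'Unable to acquire lock', reading this text back from the lock file would show which code path is still holding it.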
   





[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-12 Thread GitBox


zhuanshenbsj1 commented on code in PR #7159:
URL: https://github.com/apache/hudi/pull/7159#discussion_r1046629024


##
hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java:
##
@@ -141,6 +141,7 @@ private void setHiveColumnNameProps(List 
fields, JobConf jobConf,
   jobConf.set(hive_metastoreConstants.META_TABLE_PARTITION_COLUMNS, 
PARTITION_COLUMN);
 }
 jobConf.set(hive_metastoreConstants.META_TABLE_COLUMNS, 
hiveOrderedColumnNames);
+jobConf.set("columns.types", 
"string,string,string,string,string,string,string,string,bigint,string,string");
   }

Review Comment:
   Without this change, after inputDF5 the timeline will be 
commit(instantA)->clustering(instantB)->commit(instantC)->clean(instantD), and 
instantA will be archived by instantB. See line 193: `allVisibleCDCData = 
cdcDataFrame((commitTime1.toLong - 1).toString)`






[jira] [Assigned] (HUDI-5377) Add call stack information to lock file

2022-12-12 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-5377:
---

Assignee: HunterXHunter

> Add call stack information to lock file
> ---
>
> Key: HUDI-5377
> URL: https://issues.apache.org/jira/browse/HUDI-5377
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
> When OCC is enabled, an 'Unable to acquire lock' exception is sometimes thrown.
> We need to know which step caused the deadlock.
> For example:
>  
> LOCK-TIME : 2022-12-13 11:13:15.015
> LOCK-STACK-INFO :
>      org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock (FileSystemBasedLockProvider.java:148)
>      org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock (FileSystemBasedLockProvider.java:100)
>      org.apache.hudi.client.transaction.lock.LockManager.lock (LockManager.java:102)
>      org.apache.hudi.client.transaction.TransactionManager.beginTransaction (TransactionManager.java:58)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService (BaseHoodieWriteClient.java:1425)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant (BaseHoodieWriteClient.java:1037)
>      org.apache.hudi.util.CompactionUtil.scheduleCompaction (CompactionUtil.java:72)
>      org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 (StreamWriteOperatorCoordinator.java:250)
>      org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 (NonThrownExecutor.java:130)
>      java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
>      java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
>      java.lang.Thread.run (Thread.java:750)





[GitHub] [hudi] codope commented on a diff in pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient

2022-12-12 Thread GitBox


codope commented on code in PR #7437:
URL: https://github.com/apache/hudi/pull/7437#discussion_r1046622218


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -752,6 +752,7 @@ public void 
dropMetadataPartitions(List metadataPartition
   LOG.warn("Deleting pending indexing instant from the timeline for 
partition: " + partitionPath);
   deletePendingIndexingInstant(dataMetaClient, partitionPath);
 }
+closeInternal();
   }
 

Review Comment:
   `HoodieBackedTableMetadataWriter` extends `HoodieTableMetadataWriter`, which 
implements `AutoCloseable`. But yes, we could create the writer in a 
try-with-resources block.
   
   +1 for fixing `HoodieFlinkWriteClient` too.
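
A minimal sketch of the try-with-resources suggestion, under stated assumptions: `DemoMetadataWriter` is a hypothetical stand-in for the metadata writer, not Hudi's actual class, and its `close()` is assumed not to declare a checked exception.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a metadata writer; it is not Hudi's class. */
class DemoMetadataWriter implements AutoCloseable {
  private final List<String> events;

  DemoMetadataWriter(List<String> events) {
    this.events = events;
    events.add("open");
  }

  void update(String instant) {
    events.add("update:" + instant);
  }

  @Override
  public void close() {
    // Released automatically when the try block exits, even on an exception.
    events.add("close");
  }
}

class TryWithResourcesDemo {
  static List<String> run() {
    List<String> events = new ArrayList<>();
    // The writer is scoped to the block, so the caller cannot forget to close it.
    try (DemoMetadataWriter writer = new DemoMetadataWriter(events)) {
      writer.update("001");
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(run()); // prints [open, update:001, close]
  }
}
```

With this shape, no explicit `closeInternal()` call is needed at the end of each method that creates a writer.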






[jira] [Updated] (HUDI-5377) Add call stack information to lock file

2022-12-12 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5377:

Description: 
When OCC is enabled, an 'Unable to acquire lock' exception is sometimes thrown.

We need to know which step caused the deadlock.

For example:

 

LOCK-TIME : 2022-12-13 11:13:15.015
LOCK-STACK-INFO :
     org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock (FileSystemBasedLockProvider.java:148)
     org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock (FileSystemBasedLockProvider.java:100)
     org.apache.hudi.client.transaction.lock.LockManager.lock (LockManager.java:102)
     org.apache.hudi.client.transaction.TransactionManager.beginTransaction (TransactionManager.java:58)
     org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService (BaseHoodieWriteClient.java:1425)
     org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant (BaseHoodieWriteClient.java:1037)
     org.apache.hudi.util.CompactionUtil.scheduleCompaction (CompactionUtil.java:72)
     org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 (StreamWriteOperatorCoordinator.java:250)
     org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 (NonThrownExecutor.java:130)
     java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
     java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
     java.lang.Thread.run (Thread.java:750)

  was:
When OCC is enabled, an 'Unable to acquire lock' exception is sometimes thrown.

We need to know which step caused the deadlock.


> Add call stack information to lock file
> ---
>
> Key: HUDI-5377
> URL: https://issues.apache.org/jira/browse/HUDI-5377
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Priority: Major
>
> When OCC is enabled, an 'Unable to acquire lock' exception is sometimes thrown.
> We need to know which step caused the deadlock.
> For example:
>  
> LOCK-TIME : 2022-12-13 11:13:15.015
> LOCK-STACK-INFO :
>      org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock (FileSystemBasedLockProvider.java:148)
>      org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock (FileSystemBasedLockProvider.java:100)
>      org.apache.hudi.client.transaction.lock.LockManager.lock (LockManager.java:102)
>      org.apache.hudi.client.transaction.TransactionManager.beginTransaction (TransactionManager.java:58)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService (BaseHoodieWriteClient.java:1425)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant (BaseHoodieWriteClient.java:1037)
>      org.apache.hudi.util.CompactionUtil.scheduleCompaction (CompactionUtil.java:72)
>      org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 (StreamWriteOperatorCoordinator.java:250)
>      org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 (NonThrownExecutor.java:130)
>      java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
>      java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
>      java.lang.Thread.run (Thread.java:750)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] danny0405 commented on a diff in pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient

2022-12-12 Thread GitBox


danny0405 commented on code in PR #7437:
URL: https://github.com/apache/hudi/pull/7437#discussion_r1046617576


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -1125,6 +1130,14 @@ private HoodieData 
getFilesPartitionRecords(String createInstantTi
 return filesPartitionRecords.union(fileListRecords);
   }
 
+  protected void closeInternal() {
+try {
+  close();
+} catch (Exception e) {

Review Comment:
   If we do not want `#close` to throw a checked exception everywhere, just 
remove the `throws` clause from the interface.
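
A minimal sketch of this suggestion, assuming the writer interface extends `AutoCloseable`: Java lets an overriding `close()` declare fewer checked exceptions than `AutoCloseable#close`, so callers would not need a try/catch wrapper such as `closeInternal()`. The names `MetadataWriterSketch` and `InMemoryWriter` are hypothetical and are not Hudi classes.

```java
final class CloseWithoutCheckedException {
    public static void main(String[] args) {
        InMemoryWriter writer = new InMemoryWriter();
        writer.close(); // compiles without try/catch because close() declares no throws
        System.out.println("closed = " + writer.isClosed());
    }
}

/** Hypothetical writer interface; not the actual Hudi interface. */
interface MetadataWriterSketch extends AutoCloseable {
    // Overriding close() without a throws clause narrows AutoCloseable#close,
    // so callers are not forced to handle a checked exception.
    @Override
    void close();
}

/** Hypothetical implementation that only records whether it was closed. */
final class InMemoryWriter implements MetadataWriterSketch {
    private boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}
```

Any failure inside such a `close()` would then have to surface as an unchecked exception, which is a design choice rather than something this thread settles.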



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient

2022-12-12 Thread GitBox


danny0405 commented on code in PR #7437:
URL: https://github.com/apache/hudi/pull/7437#discussion_r1046617209


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -752,6 +752,7 @@ public void 
dropMetadataPartitions(List metadataPartition
   LOG.warn("Deleting pending indexing instant from the timeline for partition: " + partitionPath);
   deletePendingIndexingInstant(dataMetaClient, partitionPath);
 }
+closeInternal();
   }
 

Review Comment:
   Can we let `HoodieBackedTableMetadataWriter` implement `AutoCloseable` so 
that it can be used in a try-finally block? Can we also fix 
`HoodieFlinkWriteClient#writeTableMetadata`?
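
As a hedged illustration of why `AutoCloseable` helps here: a resource declared in a try-with-resources statement is closed even when the body throws. `TrackingWriter` and `writeOnce` below are hypothetical names used only for this sketch, not Hudi APIs.

```java
final class TryWithResourcesSketch {
    /** Hypothetical stand-in for a metadata writer; not a Hudi class. */
    static final class TrackingWriter implements AutoCloseable {
        boolean closed = false;

        void update(String instantTime) {
            if (instantTime.isEmpty()) {
                throw new IllegalArgumentException("instant time must not be empty");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Runs one update and returns the writer so the caller can inspect its state. */
    static TrackingWriter writeOnce(String instantTime) {
        TrackingWriter writer = new TrackingWriter();
        try (TrackingWriter w = writer) {
            w.update(instantTime);
        } catch (IllegalArgumentException e) {
            // Swallowed only for this demonstration; the writer is already closed here.
        }
        return writer;
    }

    public static void main(String[] args) {
        System.out.println(writeOnce("001").closed); // normal path
        System.out.println(writeOnce("").closed);    // failing path
    }
}
```

In both calls the writer ends up closed, because the resource is released before the `catch` block runs.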






[jira] [Created] (HUDI-5377) Add call stack information to lock file

2022-12-12 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5377:
---

 Summary: Add call stack information to lock file
 Key: HUDI-5377
 URL: https://issues.apache.org/jira/browse/HUDI-5377
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: HunterXHunter


When OCC is enabled, an 'Unable to acquire lock' exception is sometimes thrown.

We need to know which step caused the deadlock.
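
A rough sketch of what such lock-file content could look like, assuming the lock provider simply serializes the acquisition time and the acquiring thread's stack trace. `LockInfoSketch` and `buildLockInfo` are hypothetical names; this is not the Hudi implementation.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LockInfoSketch {
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** Builds the text that a lock provider could write into its lock file. */
    static String buildLockInfo(LocalDateTime lockTime, StackTraceElement[] stack) {
        StringBuilder sb = new StringBuilder();
        sb.append("LOCK-TIME : ").append(FORMAT.format(lockTime)).append('\n');
        sb.append("LOCK-STACK-INFO :\n");
        for (StackTraceElement frame : stack) {
            // One frame per line: fully qualified method, then (file:line).
            sb.append("     ")
              .append(frame.getClassName()).append('.').append(frame.getMethodName())
              .append(" (").append(frame.getFileName()).append(':')
              .append(frame.getLineNumber()).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Captures the stack of the thread that would be acquiring the lock.
        System.out.print(buildLockInfo(LocalDateTime.now(),
            Thread.currentThread().getStackTrace()));
    }
}
```

A later writer that fails to acquire the lock could read this text back and report which code path is holding it.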





[GitHub] [hudi] danny0405 commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-12 Thread GitBox


danny0405 commented on PR #7438:
URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347672319

   Thanks for the fix. What is the effect without this fix: can the user not 
query the latest result set if the file index is cached?





[GitHub] [hudi] danny0405 commented on issue #6260: [SUPPORT]Caused by: java.util.NoSuchElementException: No value present in Option

2022-12-12 Thread GitBox


danny0405 commented on issue #6260:
URL: https://github.com/apache/hudi/issues/6260#issuecomment-1347668956

   Take a look at this PR: https://github.com/apache/hudi/pull/6766





[GitHub] [hudi] danny0405 commented on a diff in pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-12 Thread GitBox


danny0405 commented on code in PR #7159:
URL: https://github.com/apache/hudi/pull/7159#discussion_r1046605266


##
hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java:
##
@@ -141,6 +141,7 @@ private void setHiveColumnNameProps(List 
fields, JobConf jobConf,
   jobConf.set(hive_metastoreConstants.META_TABLE_PARTITION_COLUMNS, PARTITION_COLUMN);
 }
 jobConf.set(hive_metastoreConstants.META_TABLE_COLUMNS, hiveOrderedColumnNames);
+jobConf.set("columns.types", "string,string,string,string,string,string,string,string,bigint,string,string");
   }

Review Comment:
   What is this change for?



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala:
##
@@ -118,6 +118,7 @@ class TestCDCDataFrameSuite extends HoodieCDCTestBase {
   .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL)
   .option("hoodie.clustering.inline", "true")
   .option("hoodie.clustering.inline.max.commits", "1")
+  .option("hoodie.clustering.plan.strategy.sort.columns", "_row_key")
   .mode(SaveMode.Append)

Review Comment:
   What is this change for?






[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table

2022-12-12 Thread GitBox


hudi-bot commented on PR #7394:
URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347667491

   
   ## CI report:
   
   * bb40c512b05286e266eb5b05e2f31b9ea926 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13660)
 
   * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN
   * bbe54597ed63dcf9eb94b84cdd4f80d45c49634f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7366: [HUDI-5318] Fix partition pruning for clustering scheduling

2022-12-12 Thread GitBox


hudi-bot commented on PR #7366:
URL: https://github.com/apache/hudi/pull/7366#issuecomment-1347667416

   
   ## CI report:
   
   * 419d479d3469507566ad7d856f41ffb2182d7765 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13632)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13647)
 
   * 3f6572349834d904a697fbd8c8546f56a7f2844a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13662)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field

2022-12-12 Thread GitBox


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1347667377

   
   ## CI report:
   
   * 34c111a0fe150fe513fea39697976da06a912f5c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13460)
 
   * be82fc49989f5262d833fb2b803fd6ea69af8d0c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13661)
 
   
   



[jira] [Commented] (HUDI-5373) Different fileids are assigned to the same bucket

2022-12-12 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646400#comment-17646400
 ] 

Danny Chen commented on HUDI-5373:
--

Fixed via master branch: 5beadbfbe544e513ae2391e534a0ad8443566e9a

>  Different fileids are assigned to the same bucket
> --
>
> Key: HUDI-5373
> URL: https://issues.apache.org/jira/browse/HUDI-5373
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: loukey_j
>Assignee: loukey_j
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.2, 0.13.0
>
>
> Because bucketId is built by concatenating partition and bucketNum without a 
> separator, two different (partition, bucketNum) pairs can produce the same bucketId:
> partition = 30,  bucketNum = 11  ->  bucketId = 3011
> partition = 301, bucketNum = 1   ->  bucketId = 3011
>  
> As a result, different fileids are assigned to the same bucket:
> final String bucketId = partition + bucketNum;
> if (incBucketIndex.contains(bucketId)) {
>   location = new HoodieRecordLocation("I", bucketToFileId.get(bucketNum));
> } else if (bucketToFileId.containsKey(bucketNum)) {
>   location = new HoodieRecordLocation("U", bucketToFileId.get(bucketNum));
> } else {
>   String newFileId = BucketIdentifier.newBucketFileIdPrefix(bucketNum);
>   location = new HoodieRecordLocation("I", newFileId);
>   bucketToFileId.put(bucketNum, newFileId);
>   incBucketIndex.add(bucketId);
> }
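
The collision described in the issue can be reproduced in isolation. The sketch below uses the two example pairs from the description; the `"_"` separator in `separatedId` is an assumption chosen for illustration and is not necessarily what the merged one-line fix in `BucketStreamWriteFunction` uses.

```java
final class BucketIdSketch {
    /** Mirrors the concatenation quoted in the issue: no separator between the parts. */
    static String concatenatedId(String partition, int bucketNum) {
        return partition + bucketNum;
    }

    /** Assumed remedy: a separator keeps partition and bucket number distinguishable. */
    static String separatedId(String partition, int bucketNum) {
        return partition + "_" + bucketNum;
    }

    public static void main(String[] args) {
        System.out.println(concatenatedId("30", 11));  // 3011
        System.out.println(concatenatedId("301", 1));  // 3011 (collides with the line above)
        System.out.println(separatedId("30", 11));     // 30_11
        System.out.println(separatedId("301", 1));     // 301_1
    }
}
```

Because the bucket number is an integer and never contains the separator, the text after the last separator always identifies the bucket unambiguously.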





[jira] [Resolved] (HUDI-5373) Different fileids are assigned to the same bucket

2022-12-12 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-5373.
--



[jira] [Updated] (HUDI-5373) Different fileids are assigned to the same bucket

2022-12-12 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-5373:
-
Fix Version/s: 0.12.2
   0.13.0



[hudi] branch master updated (13a8e5c7297 -> 5beadbfbe54)

2022-12-12 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 13a8e5c7297 [HUDI-5348] Cache file slices in HoodieBackedTableMetadata 
(#7436)
 add 5beadbfbe54 [HUDI-5373] Different fileids are assigned to the same 
bucket (#7433)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[GitHub] [hudi] danny0405 merged pull request #7433: [HUDI-5373] Different fileids are assigned to the same bucket

2022-12-12 Thread GitBox


danny0405 merged PR #7433:
URL: https://github.com/apache/hudi/pull/7433





[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table

2022-12-12 Thread GitBox


hudi-bot commented on PR #7394:
URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347663548

   
   ## CI report:
   
   * 17a887b98e0dd10d71a596ea87382911f3fdcef7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13490)
 
   * bb40c512b05286e266eb5b05e2f31b9ea926 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13660)
 
   * 43a31c8ce9849f487e521c1c9b467dd4eada6331 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7366: [HUDI-5318] Fix partition pruning for clustering scheduling

2022-12-12 Thread GitBox


hudi-bot commented on PR #7366:
URL: https://github.com/apache/hudi/pull/7366#issuecomment-1347663469

   
   ## CI report:
   
   * 419d479d3469507566ad7d856f41ffb2182d7765 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13632)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13647)
 
   * 3f6572349834d904a697fbd8c8546f56a7f2844a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field

2022-12-12 Thread GitBox


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1347663412

   
   ## CI report:
   
   * 34c111a0fe150fe513fea39697976da06a912f5c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13460)
 
   * be82fc49989f5262d833fb2b803fd6ea69af8d0c UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-12 Thread GitBox


hudi-bot commented on PR #7438:
URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347660023

   
   ## CI report:
   
   * baa62578663a77cc37725533fad04e4b75a47e1a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13658)
 
   
   



[GitHub] [hudi] aizain commented on issue #7375: [SUPPORT] Hudi 0.12.1 support for Spark Structured Streaming. read clustering metadata replace avro file error. Unrecognized token 'Obj^A^B^Vavro'

2022-12-12 Thread GitBox


aizain commented on issue #7375:
URL: https://github.com/apache/hudi/issues/7375#issuecomment-1347658226

   thanks~





[GitHub] [hudi] danny0405 commented on a diff in pull request #7175: [HUDI-5191] Fix compatibility with avro 1.10

2022-12-12 Thread GitBox


danny0405 commented on code in PR #7175:
URL: https://github.com/apache/hudi/pull/7175#discussion_r1046588788


##
.github/workflows/bot.yml:
##
@@ -73,6 +73,14 @@ jobs:
 run: |
   HUDI_VERSION=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout)
   ./packaging/bundle-validation/ci_run.sh $HUDI_VERSION
+  - name: Common Test

Review Comment:
   From the cmd, it seems to test the `hudi-common` module specifically.






[GitHub] [hudi] danny0405 commented on issue #6588: [SUPPORT]Caused by: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.avro.util.Utf8

2022-12-12 Thread GitBox


danny0405 commented on issue #6588:
URL: https://github.com/apache/hudi/issues/6588#issuecomment-1347638244

   @xushiyan Do you know the background on why the spark-bundle does not include 
avro as a dependency?





[GitHub] [hudi] loukey-lj commented on pull request #7433: [HUDI-5373] Different fileids are assigned to the same bucket

2022-12-12 Thread GitBox


loukey-lj commented on PR #7433:
URL: https://github.com/apache/hudi/pull/7433#issuecomment-1347636220

   @danny0405 Please review the code for me





[GitHub] [hudi] danny0405 commented on issue #6588: [SUPPORT]Caused by: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.avro.util.Utf8

2022-12-12 Thread GitBox


danny0405 commented on issue #6588:
URL: https://github.com/apache/hudi/issues/6588#issuecomment-1347633205

   It seems you did not shade avro correctly in your spark bundle jar.





[GitHub] [hudi] danny0405 commented on a diff in pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table

2022-12-12 Thread GitBox


danny0405 commented on code in PR #7394:
URL: https://github.com/apache/hudi/pull/7394#discussion_r1046580467


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala:
##
@@ -245,6 +245,10 @@ class HoodieCatalogTable(val spark: SparkSession, var 
table: CatalogTable) exten
   case (_, false) =>
 ValidationUtils.checkArgument(table.schema.nonEmpty,
   s"Missing schema for Create Table: $catalogTableName")
+if (sqlOptions.contains("hoodie.datasource.write.keygenerator.class") &&
+!sqlOptions.contains("hoodie.table.keygenerator.class")) {

Review Comment:
   `HoodieSparkSqlWriter#mergeParamsAndGetHoodieConfig` contains similar logic, so 
we can unify the code. Also, we should not use hard-coded option keys.
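
One way to read this suggestion, sketched in Java for illustration even though the file under review is Scala: define each option key once as a named constant and keep the fallback in a single helper. The constant names, the helper, and the copy-over behavior are assumptions made for this sketch (the diff only shows the condition); only the two key strings come from the diff.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyGeneratorOptionSketch {
    // Hypothetical constants; real code would reference the existing config definitions.
    static final String WRITE_KEYGEN_CLASS_KEY = "hoodie.datasource.write.keygenerator.class";
    static final String TABLE_KEYGEN_CLASS_KEY = "hoodie.table.keygenerator.class";

    /**
     * Returns a copy of the options in which the table-level key generator falls back
     * to the write-level one when only the latter is set (assumed behavior).
     */
    static Map<String, String> withTableKeyGenerator(Map<String, String> sqlOptions) {
        Map<String, String> merged = new HashMap<>(sqlOptions);
        if (merged.containsKey(WRITE_KEYGEN_CLASS_KEY) && !merged.containsKey(TABLE_KEYGEN_CLASS_KEY)) {
            merged.put(TABLE_KEYGEN_CLASS_KEY, merged.get(WRITE_KEYGEN_CLASS_KEY));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(WRITE_KEYGEN_CLASS_KEY, "com.example.MyKeyGenerator"); // hypothetical class name
        System.out.println(withTableKeyGenerator(options).get(TABLE_KEYGEN_CLASS_KEY));
    }
}
```

Keeping the rule in one helper means the SQL path and the writer path cannot drift apart.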






[GitHub] [hudi] daihw commented on issue #6588: [SUPPORT]Caused by: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.avro.util.Utf8

2022-12-12 Thread GitBox


daihw commented on issue #6588:
URL: https://github.com/apache/hudi/issues/6588#issuecomment-1347627637

   > I solved this problem by adding the following configuration in 
   > packaging/hudi-spark-bundle/pom.xml
   > 
   > ```
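   > ...
   > <include>org.apache.avro:avro</include>
   > ...
   > ...
   > <relocation>
   >   <pattern>org.apache.avro.</pattern>
   >   <shadedPattern>org.apache.hudi.org.apache.avro.</shadedPattern>
   > </relocation>
   > ...
   > ...
   > <dependency>
   >   <groupId>org.apache.avro</groupId>
   >   <artifactId>avro</artifactId>
   >   <version>1.8.2</version>
   >   <scope>compile</scope>
   > </dependency>
   > ...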
   > ```
   
   Hi, I got the same problem as you. When I applied your fix, I encountered a 
new problem and could not use Spark to insert data. The error message is:
   java.lang.ClassCastException: org.apache.hudi.org.apache.avro.Schema$RecordSchema cannot be cast to org.apache.avro.Schema
      at org.apache.spark.SparkConf$$anonfun$registerAvroSchemas$1.apply(SparkConf.scala:221)
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
      at org.apache.spark.SparkConf.registerAvroSchemas(SparkConf.scala:221)
      at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:280)
      at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:101)
      at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.run(InsertIntoHoodieTableCommand.scala:60)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
      at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
      at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
      at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
      at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
      at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
      at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
      at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
      at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
      at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
      at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
      at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:232)
      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
   165739 [HiveServer2-Background-Pool: Thread-133] ERROR org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation  - Error running hive query: 
   org.apache.hive.service.cli.HiveSQLException: java.lang.ClassCastException: org.apache.hudi.org.apache.avro.Schema$RecordSchema cannot be cast to org.apache.avro.Schema
      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:269)
      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
      at java.security.AccessController.doPrivileged(Native Method)
      at 

[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table

2022-12-12 Thread GitBox


hudi-bot commented on PR #7394:
URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347605174

   
   ## CI report:
   
   * 17a887b98e0dd10d71a596ea87382911f3fdcef7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13490)
 
   * bb40c512b05286e266eb5b05e2f31b9ea926 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13660)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7411: [HUDI-5351] Handle populateMetaFields when repartitioning in sort partitioner

2022-12-12 Thread GitBox


hudi-bot commented on PR #7411:
URL: https://github.com/apache/hudi/pull/7411#issuecomment-1347600350

   
   ## CI report:
   
   * a2739c7a7cc5f6ebd38a4b1c4be46a7a652f1d38 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13656)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient

2022-12-12 Thread GitBox


hudi-bot commented on PR #7437:
URL: https://github.com/apache/hudi/pull/7437#issuecomment-1347600547

   
   ## CI report:
   
   * 7aae826c0ffc9be3dbb72e48a38c5a595d2fe4bb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13657)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7394: [HUDI-5262] Allow hoodie.datasource.write.keygenerator.class to be used in spark-sql create table

2022-12-12 Thread GitBox


hudi-bot commented on PR #7394:
URL: https://github.com/apache/hudi/pull/7394#issuecomment-1347600245

   
   ## CI report:
   
   * 17a887b98e0dd10d71a596ea87382911f3fdcef7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13490)
 
   * bb40c512b05286e266eb5b05e2f31b9ea926 UNKNOWN
   
   



[hudi] branch master updated: [HUDI-5348] Cache file slices in HoodieBackedTableMetadata (#7436)

2022-12-12 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 13a8e5c7297 [HUDI-5348] Cache file slices in HoodieBackedTableMetadata (#7436)
13a8e5c7297 is described below

commit 13a8e5c729750ba5907d75df3d22473feaaa2a03
Author: Y Ethan Guo 
AuthorDate: Mon Dec 12 17:00:10 2022 -0800

[HUDI-5348] Cache file slices in HoodieBackedTableMetadata (#7436)
---
 .../org/apache/hudi/metadata/HoodieBackedTableMetadata.java | 13 +++--
 .../org/apache/hudi/metadata/HoodieTableMetadataUtil.java   | 10 ++
 .../java/org/apache/hudi/utilities/TestHoodieIndexer.java   |  7 +--
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java b/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
index 7743a65bf05..e2fbc4e6716 100644
--- a/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
+++ b/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
@@ -40,6 +40,7 @@ import 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
 import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
 import org.apache.hudi.common.util.ClosableIterator;
 import org.apache.hudi.common.util.HoodieTimer;
 import org.apache.hudi.common.util.Option;
@@ -78,6 +79,7 @@ import static 
org.apache.hudi.common.util.ValidationUtils.checkArgument;
 import static 
org.apache.hudi.metadata.HoodieTableMetadataUtil.PARTITION_NAME_BLOOM_FILTERS;
 import static 
org.apache.hudi.metadata.HoodieTableMetadataUtil.PARTITION_NAME_COLUMN_STATS;
 import static 
org.apache.hudi.metadata.HoodieTableMetadataUtil.PARTITION_NAME_FILES;
+import static 
org.apache.hudi.metadata.HoodieTableMetadataUtil.getFileSystemView;
 
 /**
  * Table metadata provided by an internal DFS backed Hudi metadata table.
@@ -92,6 +94,7 @@ public class HoodieBackedTableMetadata extends 
BaseTableMetadata {
   // Metadata table's timeline and metaclient
   private HoodieTableMetaClient metadataMetaClient;
   private HoodieTableConfig metadataTableConfig;
+  private HoodieTableFileSystemView metadataFileSystemView;
   // should we reuse the open file handles, across calls
   private final boolean reuse;
 
@@ -120,6 +123,7 @@ public class HoodieBackedTableMetadata extends 
BaseTableMetadata {
 } else if (this.metadataMetaClient == null) {
   try {
 this.metadataMetaClient = 
HoodieTableMetaClient.builder().setConf(hadoopConf.get()).setBasePath(metadataBasePath).build();
+this.metadataFileSystemView = getFileSystemView(metadataMetaClient);
 this.metadataTableConfig = metadataMetaClient.getTableConfig();
 this.isBloomFilterIndexEnabled = 
metadataConfig.isBloomFilterIndexEnabled();
 this.isColumnStatsIndexEnabled = 
metadataConfig.isColumnStatsIndexEnabled();
@@ -127,11 +131,13 @@ public class HoodieBackedTableMetadata extends 
BaseTableMetadata {
 LOG.warn("Metadata table was not found at path " + metadataBasePath);
 this.isMetadataTableEnabled = false;
 this.metadataMetaClient = null;
+this.metadataFileSystemView = null;
 this.metadataTableConfig = null;
   } catch (Exception e) {
 LOG.error("Failed to initialize metadata table at path " + 
metadataBasePath, e);
 this.isMetadataTableEnabled = false;
 this.metadataMetaClient = null;
+this.metadataFileSystemView = null;
 this.metadataTableConfig = null;
   }
 }
@@ -162,7 +168,8 @@ public class HoodieBackedTableMetadata extends 
BaseTableMetadata {
 //   to scan all file-groups for all key-prefixes as each of these 
might contain some
 //   records matching the key-prefix
 List partitionFileSlices =
-
HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, 
partitionName);
+HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(
+metadataMetaClient, metadataFileSystemView, partitionName);
 
 return (shouldLoadInMemory ? HoodieListData.lazy(partitionFileSlices) : 
engineContext.parallelize(partitionFileSlices))
 .flatMap((SerializableFunction>>) fileSlice -> {
@@ -379,7 +386,8 @@ public class HoodieBackedTableMetadata extends 
BaseTableMetadata {
   private Map, List> 
getPartitionFileSliceToKeysMapping(final String partitionName, final 
List keys) {
 // Metadata is in sync till the latest completed instant on the dataset
 List latestFileSlices =
-

[GitHub] [hudi] nsivabalan merged pull request #7436: [HUDI-5348] Cache file slices in HoodieBackedTableMetadata

2022-12-12 Thread GitBox


nsivabalan merged PR #7436:
URL: https://github.com/apache/hudi/pull/7436





[GitHub] [hudi] nsivabalan commented on a diff in pull request #7436: [HUDI-5348] Cache file slices in HoodieBackedTableMetadata

2022-12-12 Thread GitBox


nsivabalan commented on code in PR #7436:
URL: https://github.com/apache/hudi/pull/7436#discussion_r1046561322


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##
@@ -379,7 +386,8 @@ private HoodieRecord 
composeRecord(GenericRecord avroReco
   private Map, List> 
getPartitionFileSliceToKeysMapping(final String partitionName, final 
List keys) {
 // Metadata is in sync till the latest completed instant on the dataset
 List latestFileSlices =
-
HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, 
partitionName);
+HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(

Review Comment:
   Looks like the FileSystemView (MDFSV) caches the entities, so we are good.






[GitHub] [hudi] jonvex commented on issue #7351: [SUPPORT] The keygenerator.class value set when using SparkSQL to create a table does not finally take effect in hoodie.properties

2022-12-12 Thread GitBox


jonvex commented on issue #7351:
URL: https://github.com/apache/hudi/issues/7351#issuecomment-1347568412

   The PRs are ready for review; once they are merged we can close this out.





[GitHub] [hudi] nsivabalan commented on a diff in pull request #7035: [HUDI-5075] Adding support to rollback residual clustering after disabling clustering

2022-12-12 Thread GitBox


nsivabalan commented on code in PR #7035:
URL: https://github.com/apache/hudi/pull/7035#discussion_r1046543459


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##
@@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, 
HoodieCommitMetadata me
 
metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), 
"true");
 inlineScheduleClustering(extraMetadata);
   }
+
+  // if clustering is disabled, but we might need to rollback any inflight 
clustering when clustering was enabled previously.
+  if (!config.inlineClusteringEnabled() && 
!config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering()

Review Comment:
   This is already the case. The issue we are trying to solve here is that
   if the requested replace commit is left on the data timeline, metadata
   table compaction is blocked. That's why we need this rollback.
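
The condition under review can be sketched in isolation. This is a minimal illustration only: the class, field, and method names below are hypothetical stand-ins for the three config checks shown in the diff, not Hudi's actual API, and the "pending replace commit" flag is assumed to be computed elsewhere from the data timeline.

```java
// Hypothetical sketch; names are illustrative and not part of Hudi.
final class ClusteringRollbackCheck {
  private final boolean inlineClustering;
  private final boolean asyncClustering;
  private final boolean scheduleInlineClustering;

  ClusteringRollbackCheck(boolean inlineClustering,
                          boolean asyncClustering,
                          boolean scheduleInlineClustering) {
    this.inlineClustering = inlineClustering;
    this.asyncClustering = asyncClustering;
    this.scheduleInlineClustering = scheduleInlineClustering;
  }

  // Clustering counts as disabled only when every mode is switched off.
  boolean clusteringFullyDisabled() {
    return !inlineClustering && !asyncClustering && !scheduleInlineClustering;
  }

  // Roll back only when clustering is off AND a requested replace commit
  // from an earlier clustering run is still sitting on the data timeline.
  boolean shouldRollback(boolean pendingReplaceCommitOnTimeline) {
    return clusteringFullyDisabled() && pendingReplaceCommitOnTimeline;
  }
}
```

Keeping the "is anything left to roll back" input separate from the config flags makes it easy to see that a table with clustering still enabled in any mode is never touched.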






[jira] [Updated] (HUDI-5376) Update quickstart guide for hudi hoodie.datasource.write.keygenerator.class spark-sql change

2022-12-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5376:
-
Labels: pull-request-available  (was: )

> Update quickstart guide for hudi hoodie.datasource.write.keygenerator.class 
> spark-sql change
> 
>
> Key: HUDI-5376
> URL: https://issues.apache.org/jira/browse/HUDI-5376
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: docs
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] jonvex opened a new pull request, #7439: [HUDI-5376] Remove incorrect spark-sql keygen info from quickstart guide

2022-12-12 Thread GitBox


jonvex opened a new pull request, #7439:
URL: https://github.com/apache/hudi/pull/7439

   ### Change Logs
   
   Since 0.11.1, the keygen logic in spark-sql has been the same as everywhere
   else, but the quickstart guide was never updated.
   
   ### Impact
   
   Documentation is correct now.
   
   ### Risk level (write none, low, medium or high below)
   
   none
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] nsivabalan commented on a diff in pull request #7436: [HUDI-5348] Cache file slices in HoodieBackedTableMetadata

2022-12-12 Thread GitBox


nsivabalan commented on code in PR #7436:
URL: https://github.com/apache/hudi/pull/7436#discussion_r1046539085


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##
@@ -379,7 +386,8 @@ private HoodieRecord 
composeRecord(GenericRecord avroReco
   private Map, List> 
getPartitionFileSliceToKeysMapping(final String partitionName, final 
List keys) {
 // Metadata is in sync till the latest completed instant on the dataset
 List latestFileSlices =
-
HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, 
partitionName);
+HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(

Review Comment:
   I am thinking we can also cache the file slices, similar to how we cache the
   file readers. I don't see a reason for the file slices to change unless the
   timeline changes, in which case the entire FileSystemView will be refreshed.
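
A minimal sketch of the caching idea in this comment, under stated assumptions: slices are cached per partition and the whole cache is dropped when the latest timeline instant changes. `PartitionSliceCache`, its loader function, and the string instant are all hypothetical; the merged change itself holds a `HoodieTableFileSystemView` field rather than a map like this.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these names are Hudi APIs.
final class PartitionSliceCache<S> {
  private final Function<String, List<S>> loader;   // expensive listing call
  private final Map<String, List<S>> cache = new HashMap<>();
  private String cachedInstant;                     // instant the cache was built against

  PartitionSliceCache(Function<String, List<S>> loader) {
    this.loader = loader;
  }

  // Returns cached slices for the partition, reloading only when the
  // partition has not been seen since the last timeline change.
  synchronized List<S> get(String partition, String latestInstant) {
    if (!latestInstant.equals(cachedInstant)) {
      // Timeline moved: every cached entry may be stale, so drop them all.
      cache.clear();
      cachedInstant = latestInstant;
    }
    return cache.computeIfAbsent(partition, loader);
  }
}
```

Invalidating everything on a timeline change is deliberately coarse; it mirrors the observation that the entire view is refreshed in that case anyway.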






[jira] [Created] (HUDI-5376) Update quickstart guide for hudi hoodie.datasource.write.keygenerator.class spark-sql change

2022-12-12 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-5376:
-

 Summary: Update quickstart guide for hudi 
hoodie.datasource.write.keygenerator.class spark-sql change
 Key: HUDI-5376
 URL: https://issues.apache.org/jira/browse/HUDI-5376
 Project: Apache Hudi
  Issue Type: Improvement
  Components: docs
Reporter: Jonathan Vexler
Assignee: Jonathan Vexler








[jira] [Updated] (HUDI-5262) When creating table in spark-sql setting wrong keygenerator config does not warn

2022-12-12 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler updated HUDI-5262:
--
Status: Patch Available  (was: In Progress)

> When creating table in spark-sql setting wrong keygenerator config does not 
> warn
> 
>
> Key: HUDI-5262
> URL: https://issues.apache.org/jira/browse/HUDI-5262
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Minor
>  Labels: pull-request-available
>
> Setting `hoodie.datasource.write.keygenerator.class` when creating a table 
> does nothing. `hoodie.table.keygenerator.class` needs to be set. We should 
> warn when this is set on create table. Maybe we should warn about any configs 
> that do nothing when set on table creation? The error will present on the 
> first write if the keygenerator is not the default.
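
The warning proposed in this description could look roughly like the sketch below. The two config keys come from the issue text; the class, the method, and the message wording are hypothetical and not taken from Hudi's code. It only covers the single key pair named here, not the broader "any config that does nothing" idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; only the two config key strings come from the issue.
final class CreateTableConfigCheck {
  static final String WRITE_KEYGEN = "hoodie.datasource.write.keygenerator.class";
  static final String TABLE_KEYGEN = "hoodie.table.keygenerator.class";

  // Returns warnings for create-table options that would be silently ignored.
  static List<String> warnings(Map<String, String> createTableOptions) {
    List<String> out = new ArrayList<>();
    boolean writeKeySet = createTableOptions.containsKey(WRITE_KEYGEN);
    boolean tableKeySet = createTableOptions.containsKey(TABLE_KEYGEN);
    if (writeKeySet && !tableKeySet) {
      out.add("'" + WRITE_KEYGEN + "' has no effect on table creation; set '"
          + TABLE_KEYGEN + "' instead.");
    }
    return out;
  }
}
```

Returning the messages instead of logging them directly keeps the check easy to test and lets the caller decide whether to warn or fail.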





[jira] [Updated] (HUDI-5262) When creating table in spark-sql setting wrong keygenerator config does not warn

2022-12-12 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler updated HUDI-5262:
--
Status: In Progress  (was: Open)

> When creating table in spark-sql setting wrong keygenerator config does not 
> warn
> 
>
> Key: HUDI-5262
> URL: https://issues.apache.org/jira/browse/HUDI-5262
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Minor
>  Labels: pull-request-available
>
> Setting `hoodie.datasource.write.keygenerator.class` when creating a table 
> does nothing. `hoodie.table.keygenerator.class` needs to be set. We should 
> warn when this is set on create table. Maybe we should warn about any configs 
> that do nothing when set on table creation? The error will present on the 
> first write if the keygenerator is not the default.





[jira] [Assigned] (HUDI-5262) When creating table in spark-sql setting wrong keygenerator config does not warn

2022-12-12 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler reassigned HUDI-5262:
-

Assignee: Jonathan Vexler

> When creating table in spark-sql setting wrong keygenerator config does not 
> warn
> 
>
> Key: HUDI-5262
> URL: https://issues.apache.org/jira/browse/HUDI-5262
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Minor
>  Labels: pull-request-available
>
> Setting `hoodie.datasource.write.keygenerator.class` when creating a table 
> does nothing. `hoodie.table.keygenerator.class` needs to be set. We should 
> warn when this is set on create table. Maybe we should warn about any configs 
> that do nothing when set on table creation? The error will present on the 
> first write if the keygenerator is not the default.





[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-12 Thread GitBox


hudi-bot commented on PR #7438:
URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347535112

   
   ## CI report:
   
   * baa62578663a77cc37725533fad04e4b75a47e1a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13658)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7436: [HUDI-5348] Cache file slices in HoodieBackedTableMetadata

2022-12-12 Thread GitBox


hudi-bot commented on PR #7436:
URL: https://github.com/apache/hudi/pull/7436#issuecomment-1347513477

   
   ## CI report:
   
   * 96b2aff47666ae63124f1a7601167388b501fe1b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13655)
 
   
   



[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7175: [HUDI-5191] Fix compatibility with avro 1.10

2022-12-12 Thread GitBox


alexeykudinkin commented on code in PR #7175:
URL: https://github.com/apache/hudi/pull/7175#discussion_r1046493214


##
.github/workflows/bot.yml:
##
@@ -73,6 +73,14 @@ jobs:
 run: |
   HUDI_VERSION=$(mvn help:evaluate -Dexpression=project.version -q 
-DforceStdout)
   ./packaging/bundle-validation/ci_run.sh $HUDI_VERSION
+  - name: Common Test

Review Comment:
   @Zouxxyy what specifically are we looking to test here? We need to be
   careful about expanding the scope here.






[GitHub] [hudi] hudi-bot commented on pull request #7438: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-12 Thread GitBox


hudi-bot commented on PR #7438:
URL: https://github.com/apache/hudi/pull/7438#issuecomment-1347432717

   
   ## CI report:
   
   * baa62578663a77cc37725533fad04e4b75a47e1a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient

2022-12-12 Thread GitBox


hudi-bot commented on PR #7437:
URL: https://github.com/apache/hudi/pull/7437#issuecomment-1347432690

   
   ## CI report:
   
   * 7aae826c0ffc9be3dbb72e48a38c5a595d2fe4bb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13657)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7437: [HUDI-5366] Closing metadata writer from within writeClient

2022-12-12 Thread GitBox


hudi-bot commented on PR #7437:
URL: https://github.com/apache/hudi/pull/7437#issuecomment-1347427053

   
   ## CI report:
   
   * 7aae826c0ffc9be3dbb72e48a38c5a595d2fe4bb UNKNOWN
   
   


