[jira] [Updated] (HUDI-5353) Close file reader wherever missing
[ https://issues.apache.org/jira/browse/HUDI-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5353: - Fix Version/s: 0.13.0 (was: 0.12.2) > Close file reader wherever missing > -- > > Key: HUDI-5353 > URL: https://issues.apache.org/jira/browse/HUDI-5353 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > If not closed, open file handles could lead to > {code:java} > java.io.InterruptedIOException: getFileStatus on > s3a://bucket/base/path/274df949-03a5-4837-840f-a0b558b82827-0_0-9095-234238_20221206220929477.parquet: > com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout > waiting for connection from pool {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
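The failure above comes from a reader that is opened and never closed, which keeps an S3A HTTP connection checked out of the pool until the pool is exhausted. A minimal sketch of the close-on-every-path pattern, using hypothetical reader and factory types rather than Hudi's actual classes:

{code:java}
import java.io.Closeable;
import java.io.IOException;

class ReaderCloseSketch {

  // Hypothetical stand-in for whichever base file reader the code path opens.
  interface BaseFileReader extends Closeable {
    long getTotalRecords();
  }

  // Hypothetical stand-in for the factory that opens the reader.
  interface FileReaderFactory {
    BaseFileReader open(String path) throws IOException;
  }

  static long countRecords(FileReaderFactory factory, String path) throws IOException {
    // try-with-resources guarantees close() runs even if reading throws,
    // so the underlying connection is always returned to the pool.
    try (BaseFileReader reader = factory.open(path)) {
      return reader.getTotalRecords();
    }
  }
}
{code}

The same pattern applies to any reader, scanner, or iterator backed by a filesystem handle.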
[jira] [Closed] (HUDI-5353) Close file reader wherever missing
[ https://issues.apache.org/jira/browse/HUDI-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5353. Resolution: Fixed > Close file reader wherever missing > -- > > Key: HUDI-5353 > URL: https://issues.apache.org/jira/browse/HUDI-5353 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2 > > > If not closed, open file handles could lead to > {code:java} > java.io.InterruptedIOException: getFileStatus on > s3a://bucket/base/path/274df949-03a5-4837-840f-a0b558b82827-0_0-9095-234238_20221206220929477.parquet: > com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout > waiting for connection from pool {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HUDI-5353) Close file reader wherever missing
[ https://issues.apache.org/jira/browse/HUDI-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish resolved HUDI-5353. -- > Close file reader wherever missing > -- > > Key: HUDI-5353 > URL: https://issues.apache.org/jira/browse/HUDI-5353 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2 > > > If not closed, open file handles could lead to > {code:java} > java.io.InterruptedIOException: getFileStatus on > s3a://bucket/base/path/274df949-03a5-4837-840f-a0b558b82827-0_0-9095-234238_20221206220929477.parquet: > com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout > waiting for connection from pool {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5091) MergeInto syntax merge_condition does not support Non-Equal
[ https://issues.apache.org/jira/browse/HUDI-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5091: - Fix Version/s: (was: 0.12.2) > MergeInto syntax merge_condition does not support Non-Equal > --- > > Key: HUDI-5091 > URL: https://issues.apache.org/jira/browse/HUDI-5091 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql >Reporter: KnightChess >Assignee: KnightChess >Priority: Major > > The MERGE INTO SQL merge_condition should support non-equal conditions: > https://github.com/apache/hudi/issues/6400 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5393) Remove the reuse of metadata table writer for flink write client
[ https://issues.apache.org/jira/browse/HUDI-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5393. Resolution: Resolved > Remove the reuse of metadata table writer for flink write client > > > Key: HUDI-5393 > URL: https://issues.apache.org/jira/browse/HUDI-5393 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5350) oom cause compaction event lost
[ https://issues.apache.org/jira/browse/HUDI-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5350. Resolution: Resolved > oom cause compaction event lost > --- > > Key: HUDI-5350 > URL: https://issues.apache.org/jira/browse/HUDI-5350 > Project: Apache Hudi > Issue Type: Bug > Components: compaction, flink >Reporter: HBG >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5221) Make the decision for flink sql bucket index case-insensitive
[ https://issues.apache.org/jira/browse/HUDI-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5221. Resolution: Resolved > Make the decision for flink sql bucket index case-insensitive > - > > Key: HUDI-5221 > URL: https://issues.apache.org/jira/browse/HUDI-5221 > Project: Apache Hudi > Issue Type: Task > Components: flink-sql >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5228) Flink table service job fs view conf overwrites the one of writing job
[ https://issues.apache.org/jira/browse/HUDI-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5228. Resolution: Resolved > Flink table service job fs view conf overwrites the one of writing job > -- > > Key: HUDI-5228 > URL: https://issues.apache.org/jira/browse/HUDI-5228 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5227) Upgrade Jetty to 9.4.48
[ https://issues.apache.org/jira/browse/HUDI-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5227. Resolution: Resolved > Upgrade Jetty to 9.4.48 > --- > > Key: HUDI-5227 > URL: https://issues.apache.org/jira/browse/HUDI-5227 > Project: Apache Hudi > Issue Type: Task >Reporter: Rahil Chertara >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5412) Send the bootstrap event if the JM also rebooted
[ https://issues.apache.org/jira/browse/HUDI-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5412. Resolution: Resolved > Send the bootstrap event if the JM also rebooted > --- > > Key: HUDI-5412 > URL: https://issues.apache.org/jira/browse/HUDI-5412 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-3661) Flink async compaction is not thread safe when use watermark
[ https://issues.apache.org/jira/browse/HUDI-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-3661. Resolution: Resolved > Flink async compaction is not thread safe when use watermark > > > Key: HUDI-3661 > URL: https://issues.apache.org/jira/browse/HUDI-3661 > Project: Apache Hudi > Issue Type: Bug >Reporter: hd zhou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > Attachments: image-2022-03-18-19-38-39-257.png > > > Async compaction starts an executor to run compaction asynchronously and sends the compaction result message to the next Flink operator. But collector.collect() is not a thread-safe function; when a watermark or latencyMarker is used, both call collector.collect(), which may cause issues. > We should not let async compaction = false > > !image-2022-03-18-19-38-39-257.png! > > > !https://git.bilibili.co/datacenter/bili-hudi/uploads/79608d01b0301de84d1d9e3cf24f1d21/image.png! > > !https://git.bilibili.co/datacenter/bili-hudi/uploads/e9c2f27d395e708a407bcf40f672c870/image.png! > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5223) Partial failover for flink
[ https://issues.apache.org/jira/browse/HUDI-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5223. Resolution: Resolved > Partial failover for flink > -- > > Key: HUDI-5223 > URL: https://issues.apache.org/jira/browse/HUDI-5223 > Project: Apache Hudi > Issue Type: Task > Components: flink >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5373) Different fileids are assigned to the same bucket
[ https://issues.apache.org/jira/browse/HUDI-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5373. Resolution: Resolved > Different fileids are assigned to the same bucket > -- > > Key: HUDI-5373 > URL: https://issues.apache.org/jira/browse/HUDI-5373 > Project: Apache Hudi > Issue Type: Bug >Reporter: loukey_j >Assignee: loukey_j >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > >
> partition=30, bucketNum=11 -> bucketId = 3011
> partition=301, bucketNum=1 -> bucketId = 3011
>
> Different fileids are assigned to the same bucket:
> final String bucketId = partition + bucketNum;
> if (incBucketIndex.contains(bucketId)) {
>   location = new HoodieRecordLocation("I", bucketToFileId.get(bucketNum));
> } else if (bucketToFileId.containsKey(bucketNum)) {
>   location = new HoodieRecordLocation("U", bucketToFileId.get(bucketNum));
> } else {
>   String newFileId = BucketIdentifier.newBucketFileIdPrefix(bucketNum);
>   location = new HoodieRecordLocation("I", newFileId);
>   bucketToFileId.put(bucketNum, newFileId);
>   incBucketIndex.add(bucketId);
> }
-- This message was sent by Atlassian Jira (v8.20.10#820010)
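The collision is easy to reproduce: concatenating the partition path and the bucket number yields the same key for different (partition, bucket) pairs. A self-contained sketch; the zero-padding shown is one possible remedy for illustration, not necessarily the actual patch:

{code:java}
public class BucketIdCollisionSketch {

  // The buggy form quoted above: "30" + 11 and "301" + 1 both produce "3011".
  static String bucketIdConcat(String partition, int bucketNum) {
    return partition + bucketNum;
  }

  // One possible remedy (illustrative): zero-pad the bucket number so the
  // partition boundary cannot blur into the bucket number.
  static String bucketIdPadded(String partition, int bucketNum) {
    return partition + String.format("%08d", bucketNum);
  }

  public static void main(String[] args) {
    System.out.println(bucketIdConcat("30", 11));  // 3011
    System.out.println(bucketIdConcat("301", 1));  // 3011 -> collision
    System.out.println(bucketIdPadded("30", 11));  // 3000000011
    System.out.println(bucketIdPadded("301", 1));  // 30100000001 -> distinct
  }
}
{code}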
[jira] [Updated] (HUDI-5372) Fix NPE caused by alter table add column
[ https://issues.apache.org/jira/browse/HUDI-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5372: - Fix Version/s: (was: 0.12.2) > Fix NPE caused by alter table add column > > > Key: HUDI-5372 > URL: https://issues.apache.org/jira/browse/HUDI-5372 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Priority: Critical > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5101) Adding spark structured streaming tests to integ tests
[ https://issues.apache.org/jira/browse/HUDI-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5101: - Fix Version/s: 0.13.0 (was: 0.12.2) > Adding spark structured streaming tests to integ tests > -- > > Key: HUDI-5101 > URL: https://issues.apache.org/jira/browse/HUDI-5101 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
svn commit: r58988 - in /dev/hudi/hudi-0.12.2: ./ hudi-0.12.2.src.tgz hudi-0.12.2.src.tgz.asc hudi-0.12.2.src.tgz.sha512
Author: satish Date: Mon Dec 26 08:52:59 2022 New Revision: 58988 Log: Add hudi-0.12.2 release binaries Added: dev/hudi/hudi-0.12.2/ dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz (with props) dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.asc dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.sha512 Added: dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz == Binary file - no diff available. Propchange: dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz -- svn:mime-type = application/octet-stream Added: dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.asc == --- dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.asc (added) +++ dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.asc Mon Dec 26 08:52:59 2022 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCAAdFiEEbaCzmhPCZY0irn0U0IxLa9mOplkFAmOpXpoACgkQ0IxLa9mO +plntVg//aEPjDc03kzSuShWjcmdU94OuBoMW+j1urw43UA+bmC1ENC65HuxfUvVO +nAQW6ZiHsHSKAGZBHP846jZIKXRfIQMVNv/Yj+fFAtsKC4UliAKfof5+srwzveZf +NKa0zyurYKxwPbFjy/8jZSyO91Hwf22sx+oe7NkcuaY/7s7cVTs8Qu45kH6VAUQG +WOOSpDCTPGaHPknUhQ/kiGdIlQSvpzMdsmZIYKOmyWUeF0LvtbTg0bOe/s2yKbpJ +7A55Xq2pTc0vx3icJmwZCDuUCDeFeB5bMSi+j3pmDpar1lX5OUhpgkO+hg9Riz6b +lloiRRDpeNfbll9gJxSjOXvuS64CUIo6hffQ3OywQj0wCVZIDPtKynSMrBjHmNUh +kQibDwoDKMlwDWCrnn/v3UHl2c1XhjgWnhMI848VQFaKWC1qlzKGrlYhQl2YEZrL +e4NlENM75rKYSf+QUOTRo76/bXlBumuySnXg+r7NAFcXsZMr4p91mig6HwXE7VvW +zSPbMTfzZHOvAY/9OOJK5wxCuLp2n0+2WwSex7Jcn8Kd0slOHGDNuY2JhByxEmN3 +IGx7vuqq4nVScSleFqeEmdL7lnPffX8RgHXJncaDxbRKruyFie3DrzpKDXsSESzG +g70ZQBTmi06uGacg8U8m2S2MpSMKpuRSuxoWxNRsy/rWPbf8HMo= +=7LE4 +-END PGP SIGNATURE- Added: dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.sha512 == --- dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.sha512 (added) +++ dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.sha512 Mon Dec 26 08:52:59 2022 @@ -0,0 +1 @@ +8cb2cf9844c1280fa0a16371a7e39103f09d8a48eae57f2a9c7861db245a3c41625c0012472b553699dcb97495f224290cbd6657120d017496da385474d12b8e hudi-0.12.2.src.tgz
[hudi] branch release-0.12.2 updated (94db72e2c9 -> aea5bb6f0a)
This is an automated email from the ASF dual-hosted git repository. satish pushed a change to branch release-0.12.2 in repository https://gitbox.apache.org/repos/asf/hudi.git from 94db72e2c9 Bumping mvn version to 0.12.2-1 add 975eb91b21 [HUDI-5357] Fix release build commands (#7501) add aea5bb6f0a [MINOR] Update release version to reflect published version 0.12.2 No new revisions were added by this update. Summary of changes: docker/hoodie/hadoop/base/pom.xml | 2 +- docker/hoodie/hadoop/base_java11/pom.xml | 2 +- docker/hoodie/hadoop/datanode/pom.xml | 2 +- docker/hoodie/hadoop/historyserver/pom.xml | 2 +- docker/hoodie/hadoop/hive_base/pom.xml | 2 +- docker/hoodie/hadoop/namenode/pom.xml | 2 +- docker/hoodie/hadoop/pom.xml | 2 +- docker/hoodie/hadoop/prestobase/pom.xml| 2 +- docker/hoodie/hadoop/spark_base/pom.xml| 2 +- docker/hoodie/hadoop/sparkadhoc/pom.xml| 2 +- docker/hoodie/hadoop/sparkmaster/pom.xml | 2 +- docker/hoodie/hadoop/sparkworker/pom.xml | 2 +- docker/hoodie/hadoop/trinobase/pom.xml | 2 +- docker/hoodie/hadoop/trinocoordinator/pom.xml | 2 +- docker/hoodie/hadoop/trinoworker/pom.xml | 2 +- hudi-aws/pom.xml | 4 +-- hudi-cli/pom.xml | 2 +- hudi-client/hudi-client-common/pom.xml | 4 +-- hudi-client/hudi-flink-client/pom.xml | 4 +-- hudi-client/hudi-java-client/pom.xml | 4 +-- hudi-client/hudi-spark-client/pom.xml | 4 +-- hudi-client/pom.xml| 2 +- hudi-common/pom.xml| 2 +- hudi-examples/hudi-examples-common/pom.xml | 2 +- hudi-examples/hudi-examples-flink/pom.xml | 2 +- hudi-examples/hudi-examples-java/pom.xml | 2 +- hudi-examples/hudi-examples-spark/pom.xml | 2 +- hudi-examples/pom.xml | 2 +- hudi-flink-datasource/hudi-flink/pom.xml | 4 +-- hudi-flink-datasource/hudi-flink1.13.x/pom.xml | 4 +-- hudi-flink-datasource/hudi-flink1.14.x/pom.xml | 4 +-- hudi-flink-datasource/hudi-flink1.15.x/pom.xml | 4 +-- hudi-flink-datasource/pom.xml | 4 +-- hudi-gcp/pom.xml | 2 +- hudi-hadoop-mr/pom.xml | 2 +- hudi-integ-test/pom.xml| 2 +- hudi-kafka-connect/pom.xml | 4 +-- hudi-spark-datasource/hudi-spark-common/pom.xml| 4 +-- hudi-spark-datasource/hudi-spark/pom.xml | 4 +-- hudi-spark-datasource/hudi-spark2-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark2/pom.xml | 4 +-- hudi-spark-datasource/hudi-spark3-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark3.1.x/pom.xml | 4 +-- hudi-spark-datasource/hudi-spark3.2.x/pom.xml | 4 +-- .../hudi-spark3.2plus-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark3.3.x/pom.xml | 4 +-- hudi-spark-datasource/pom.xml | 2 +- hudi-sync/hudi-adb-sync/pom.xml| 2 +- hudi-sync/hudi-datahub-sync/pom.xml| 2 +- hudi-sync/hudi-hive-sync/pom.xml | 2 +- hudi-sync/hudi-sync-common/pom.xml | 2 +- hudi-sync/pom.xml | 2 +- hudi-tests-common/pom.xml | 2 +- hudi-timeline-service/pom.xml | 2 +- hudi-utilities/pom.xml | 2 +- packaging/hudi-aws-bundle/pom.xml | 2 +- packaging/hudi-datahub-sync-bundle/pom.xml | 2 +- packaging/hudi-flink-bundle/pom.xml| 2 +- packaging/hudi-gcp-bundle/pom.xml | 2 +- packaging/hudi-hadoop-mr-bundle/pom.xml| 2 +- packaging/hudi-hive-sync-bundle/pom.xml| 2 +- packaging/hudi-integ-test-bundle/pom.xml | 2 +- packaging/hudi-kafka-connect-bundle/pom.xml| 2 +- packaging/hudi-presto-bundle/pom.xml | 2 +- packaging/hudi-spark-bundle/pom.xml| 2 +- packaging/hudi-timeline-server-bundle/pom.xml | 2 +- packaging/hudi-trino-bundle/pom.xml| 2 +- packaging/hudi-utilities-bundle/pom.xml| 2 +- packaging/hudi-utilities-slim-bundle/pom.xml | 2 +- pom.xml| 2 +- scripts/release/deploy_staging_jars.sh | 37 ++ scripts/release/validate_staged_bundles.sh | 4 +-- 
72 files changed, 111 insertions(+), 104 deletions(-)
[hudi] annotated tag release-0.12.2 updated (aea5bb6f0a -> db9e7e8830)
This is an automated email from the ASF dual-hosted git repository. satish pushed a change to annotated tag release-0.12.2 in repository https://gitbox.apache.org/repos/asf/hudi.git *** WARNING: tag release-0.12.2 was modified! *** from aea5bb6f0a (commit) to db9e7e8830 (tag) tagging aea5bb6f0ab824247f5e3498762ad94f643a2cb6 (commit) replaces release-0.12.2-rc1 by Satish Kotha on Sat Dec 24 15:51:59 2022 -0800 - Log - 0.12.2 -BEGIN PGP SIGNATURE- iQIzBAABCAAdFiEEbaCzmhPCZY0irn0U0IxLa9mOplkFAmOnkJ8ACgkQ0IxLa9mO pllbCg/+MsCqEWauNhqd6VjY3+eP/Ii1Un6/7xP30dbMMuMMOIFW5MrPjAO1ceRM 6jzizpp/TKSRJ8JtHLU/cF36H4v3jt8VrUjGbAX+HAhiDUSo5q+n/fivZKlXNFtZ BXu+CqiTMC1eZRKAcx9Yo9B4wxpIDX3VMXVo9Pjwheg7PzZlBUgrI8zDu51v0qUI IQahgUxeQKlABEd11G1m9o6bANw/KfMl2bKRxn/ZbUntX61oiwxGYlQF95M09n8f aWj2BaigYN3wk0csUO326mPxXJz126Xx6A7kDiXu0yNpg2WMB4k+xTB3WIodnXC/ 9cWP7l2/yLe4YfCDAraJgAeNxUTGl9t2dijieSVwgTfmx/XOKGWejSI6JGW1XYiH jHYzYnY4n2sMnIgLk+5p8TIdTxR2JyLn9hI1hzcNhABMQFVUlxUH9qKP7aLMoHd6 PMmsfOEIhFscG8H6rG8YJnsEqffFjRideFvqdvegCtp5m5577NyMy1wGFf09+QHj iC/CXas3gN3YYOk6j5/bbPzJCsDDh1kAnbwl7yPTMFibeAgX0bCWV9BgEIALq+/c uH4DIEKof+z6uxNO244kyGJp6GWCCnLTQBQNmne7DTiJcJvceAYyCT+f6TRPw7/8 Q1+2xyg5AGL1IBBmmIUwmcWKk40OFGTIWijESVDVae1+mdPrdLY= =kFeM -END PGP SIGNATURE- --- No new revisions were added by this update. Summary of changes:
[jira] [Updated] (HUDI-5022) Add better error messages to pr compliance
[ https://issues.apache.org/jira/browse/HUDI-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5022: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add better error messages to pr compliance > -- > > Key: HUDI-5022 > URL: https://issues.apache.org/jira/browse/HUDI-5022 > Project: Apache Hudi > Issue Type: Bug > Components: code-quality, dev-experience, docs, tests-ci >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > > When the pr compliance fails, the messages could be more helpful to users -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4970) hudi-kafka-connect-bundle: Could not initialize class org.apache.hadoop.security.UserGroupInformation
[ https://issues.apache.org/jira/browse/HUDI-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4970: - Fix Version/s: 0.13.0 (was: 0.12.2) > hudi-kafka-connect-bundle: Could not initialize class > org.apache.hadoop.security.UserGroupInformation > - > > Key: HUDI-4970 > URL: https://issues.apache.org/jira/browse/HUDI-4970 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > The Kafka connect sink loads successfully but fails to sync Hudi table due to > NoClassDefFoundError: Could not initialize class > org.apache.hadoop.security.UserGroupInformation > {code:java} > [2022-10-03 14:31:49,872] INFO The value of > hoodie.datasource.write.keygenerator.type is empty, using SIMPLE > (org.apache.hudi.keygen.factory.HoodieAvroKeyGeneratorFactory:63)[2022-10-03 > 14:31:49,872] INFO Setting record key volume and partition fields date for > table file:///tmp/hoodie/hudi-test-topichudi-test-topic > (org.apache.hudi.connect.writers.KafkaConnectTransactionServices:93)[2022-10-03 > 14:31:49,872] INFO Initializing file:///tmp/hoodie/hudi-test-topic as hoodie > table file:///tmp/hoodie/hudi-test-topic > (org.apache.hudi.common.table.HoodieTableMetaClient:424)[2022-10-03 > 14:31:49,872] INFO Existing partitions deleted [hudi-test-topic-0] > (org.apache.hudi.connect.HoodieSinkTask:156)[2022-10-03 14:31:49,872] ERROR > WorkerSinkTask{id=hudi-sink-3} Task threw an uncaught and unrecoverable > exception. Task is being killed and will not recover until manually restarted > (org.apache.kafka.connect.runtime.WorkerTask:184)java.lang.NoClassDefFoundError: > Could not initialize class org.apache.hadoop.security.UserGroupInformation > at org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:3431) > at > org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:3421) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3263) at > org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475) at > org.apache.hadoop.fs.Path.getFileSystem(Path.java:356) at > org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:110)at > org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:103)at > org.apache.hudi.common.table.HoodieTableMetaClient.initTableAndGetMetaClient(HoodieTableMetaClient.java:426) > at > org.apache.hudi.common.table.HoodieTableMetaClient$PropertyBuilder.initTable(HoodieTableMetaClient.java:1110) > at > org.apache.hudi.connect.writers.KafkaConnectTransactionServices.(KafkaConnectTransactionServices.java:104) > at > org.apache.hudi.connect.transaction.ConnectTransactionCoordinator.(ConnectTransactionCoordinator.java:88) > at > org.apache.hudi.connect.HoodieSinkTask.bootstrap(HoodieSinkTask.java:191) > at org.apache.hudi.connect.HoodieSinkTask.open(HoodieSinkTask.java:151) at > org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:635) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.access$1000(WorkerSinkTask.java:71){code} > Follow [https://github.com/apache/hudi/tree/master/hudi-kafka-connect#readme] > to reproduce. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5285) Exclude hive-site.xml from packaging in hudi-utilities
[ https://issues.apache.org/jira/browse/HUDI-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5285: - Fix Version/s: 0.13.0 (was: 0.12.2) > Exclude hive-site.xml from packaging in hudi-utilities > -- > > Key: HUDI-5285 > URL: https://issues.apache.org/jira/browse/HUDI-5285 > Project: Apache Hudi > Issue Type: Task >Reporter: Sagar Sumit >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > the spark cluster can fail to access the external hive source normally due to > conflict with hive-site.xml packaged with hudi -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4963) Extend InProcessLockProvider to support multiple table ingestion
[ https://issues.apache.org/jira/browse/HUDI-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4963: - Fix Version/s: 0.13.0 (was: 0.12.2) > Extend InProcessLockProvider to support multiple table ingestion > > > Key: HUDI-4963 > URL: https://issues.apache.org/jira/browse/HUDI-4963 > Project: Apache Hudi > Issue Type: Task >Reporter: Rajesh Mahindra >Assignee: Rajesh Mahindra >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5404) add flink bundle validation
[ https://issues.apache.org/jira/browse/HUDI-5404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5404: - Fix Version/s: 0.13.0 (was: 0.12.2) > add flink bundle validation > --- > > Key: HUDI-5404 > URL: https://issues.apache.org/jira/browse/HUDI-5404 > Project: Apache Hudi > Issue Type: Test > Components: tests-ci >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > Make flink bundles validated via GitHub actions CI -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4605) Upgrade hudi-presto-bundle version to 0.12.0
[ https://issues.apache.org/jira/browse/HUDI-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4605: - Fix Version/s: 0.13.0 (was: 0.12.2) > Upgrade hudi-presto-bundle version to 0.12.0 > > > Key: HUDI-4605 > URL: https://issues.apache.org/jira/browse/HUDI-4605 > Project: Apache Hudi > Issue Type: Task >Reporter: Sagar Sumit >Assignee: Ethan Guo >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5145) Remove HDFS from DeltaStreamer UT/FT
[ https://issues.apache.org/jira/browse/HUDI-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5145: - Fix Version/s: 0.13.0 (was: 0.12.2) > Remove HDFS from DeltaStreamer UT/FT > > > Key: HUDI-5145 > URL: https://issues.apache.org/jira/browse/HUDI-5145 > Project: Apache Hudi > Issue Type: Test >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5131) Bundle validation: upgrade/downgrade
[ https://issues.apache.org/jira/browse/HUDI-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5131: - Fix Version/s: 0.13.0 (was: 0.12.2) > Bundle validation: upgrade/downgrade > > > Key: HUDI-5131 > URL: https://issues.apache.org/jira/browse/HUDI-5131 > Project: Apache Hudi > Issue Type: Test > Components: tests-ci >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5132) Bundle validation: Hive QL 3
[ https://issues.apache.org/jira/browse/HUDI-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5132: - Fix Version/s: 0.13.0 (was: 0.12.2) > Bundle validation: Hive QL 3 > > > Key: HUDI-5132 > URL: https://issues.apache.org/jira/browse/HUDI-5132 > Project: Apache Hudi > Issue Type: Test > Components: dependencies >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5371) Fix flaky testMetadataColumnStatsIndex
[ https://issues.apache.org/jira/browse/HUDI-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5371: - Fix Version/s: 0.13.0 (was: 0.12.2) > Fix flaky testMetadataColumnStatsIndex > -- > > Key: HUDI-5371 > URL: https://issues.apache.org/jira/browse/HUDI-5371 > Project: Apache Hudi > Issue Type: Test >Reporter: Sagar Sumit >Priority: Major > Fix For: 0.13.0 > > > The test started flaking after [https://github.com/apache/hudi/pull/7349] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5099) Update stock data so that new records are added in batch_2
[ https://issues.apache.org/jira/browse/HUDI-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5099: - Fix Version/s: 0.13.0 (was: 0.12.2) > Update stock data so that new records are added in batch_2 > -- > > Key: HUDI-5099 > URL: https://issues.apache.org/jira/browse/HUDI-5099 > Project: Apache Hudi > Issue Type: Test >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > The record key is "\{stock name}_\{date} \{hour}". We have the data from > 9:30-10:29 in batch_1 and batch_2 contains data from 10:30-10:59. This means > that no new records are introduced, and therefore, only updates occur when > ingesting batch_2. This makes validation of the data take too long for our > testing. Proposed solution is to move the data from 10:00-10:29 into batch_2 > so that we will have updates and inserts in both files -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5200) Resources are not cleaned up in UT
[ https://issues.apache.org/jira/browse/HUDI-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5200: - Fix Version/s: 0.13.0 (was: 0.12.2) > Resources are not cleaned up in UT > -- > > Key: HUDI-5200 > URL: https://issues.apache.org/jira/browse/HUDI-5200 > Project: Apache Hudi > Issue Type: Test > Components: tests-ci >Reporter: zouxxyy >Assignee: zouxxyy >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Resources are not cleaned up at UT -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4209) Avoid using HDFS in HoodieClientTestHarness
[ https://issues.apache.org/jira/browse/HUDI-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4209: - Fix Version/s: 0.13.0 (was: 0.12.2) > Avoid using HDFS in HoodieClientTestHarness > --- > > Key: HUDI-4209 > URL: https://issues.apache.org/jira/browse/HUDI-4209 > Project: Apache Hudi > Issue Type: Test > Components: tests-ci >Reporter: Sagar Sumit >Assignee: Raymond Xu >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4982) Make bundle combination testing covered in CI
[ https://issues.apache.org/jira/browse/HUDI-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4982: - Fix Version/s: 0.13.0 (was: 0.12.2) > Make bundle combination testing covered in CI > - > > Key: HUDI-4982 > URL: https://issues.apache.org/jira/browse/HUDI-4982 > Project: Apache Hudi > Issue Type: Test >Reporter: Raymond Xu >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > this is to cover > - spark-bundle > - utilities-bundle > - utilities-slim-bundle -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5098) Enable Spark2.4 bundle testing in GH Actions
[ https://issues.apache.org/jira/browse/HUDI-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5098: - Fix Version/s: 0.13.0 (was: 0.12.2) > Enable Spark2.4 bundle testing in GH Actions > > > Key: HUDI-5098 > URL: https://issues.apache.org/jira/browse/HUDI-5098 > Project: Apache Hudi > Issue Type: Test >Reporter: Jonathan Vexler >Priority: Major > Fix For: 0.13.0 > > > Bundle testing works for 3.1,3.2,3.3, but there was a hive setup issue that > wasn't being handled properly. Because we have azure-ci running with 2.4, we > decided to resolve this issue in the future -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-2673) Add integration/e2e test for kafka-connect functionality
[ https://issues.apache.org/jira/browse/HUDI-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-2673: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add integration/e2e test for kafka-connect functionality > > > Key: HUDI-2673 > URL: https://issues.apache.org/jira/browse/HUDI-2673 > Project: Apache Hudi > Issue Type: Test > Components: kafka-connect, tests-ci >Reporter: Ethan Guo >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > The integration test should use bundle jar and run in docker setup. This can > prevent any issue in the bundle, like HUDI-3903, that is not covered by unit > and functional tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5358) Fix flaky tests in TestCleanerInsertAndCleanByCommits
[ https://issues.apache.org/jira/browse/HUDI-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5358: - Fix Version/s: 0.13.0 (was: 0.12.2) > Fix flaky tests in TestCleanerInsertAndCleanByCommits > - > > Key: HUDI-5358 > URL: https://issues.apache.org/jira/browse/HUDI-5358 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > In the tests, the {{KEEP_LATEST_COMMITS}} cleaner policy is used. This policy > first figures out the earliest commit to retain based on the config of the > number of retained commits ({{{}hoodie.cleaner.commits.retained{}}}). Then, > for each file group, one more version before the earliest commit to retain is > also kept from cleaning. The commit for the version can be different among > file groups. > However, the current validation logic only statically picks the one commit > before the earliest commit to retain in the Hudi timeline for all file > groups, which does not match the {{KEEP_LATEST_COMMITS}} cleaner policy. -- This message was sent by Atlassian Jira (v8.20.10#820010)
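As a reading aid, a simplified sketch of the per-file-group retention the policy describes; the types and instant-time handling are stand-ins, not Hudi's cleaner code:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class KeepLatestCommitsSketch {

  // versions: commit times of all slices of one file group, in any order.
  // Returns the commit times that must be kept for that file group.
  static List<String> retainedVersions(List<String> versions, String earliestCommitToRetain) {
    List<String> sorted = new ArrayList<>(versions);
    sorted.sort(Comparator.reverseOrder()); // newest first; instant times sort lexicographically
    List<String> retained = new ArrayList<>();
    boolean keptOneBeforeWindow = false;
    for (String commit : sorted) {
      if (commit.compareTo(earliestCommitToRetain) >= 0) {
        retained.add(commit); // inside the retained window
      } else if (!keptOneBeforeWindow) {
        retained.add(commit); // one extra version before the window, per file group
        keptOneBeforeWindow = true;
      }
    }
    return retained;
  }
}
{code}

The extra version kept before the window can differ per file group, which is why a validation that picks a single commit from the timeline for all file groups does not match the policy.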
[jira] [Updated] (HUDI-5330) Add docs for virtual keys
[ https://issues.apache.org/jira/browse/HUDI-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5330: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add docs for virtual keys > - > > Key: HUDI-5330 > URL: https://issues.apache.org/jira/browse/HUDI-5330 > Project: Apache Hudi > Issue Type: Improvement > Components: docs >Reporter: Ethan Guo >Priority: Major > Fix For: 0.13.0 > > > Currently, the virtual key support is only presented in a blog: > [https://hudi.apache.org/blog/2021/08/18/virtual-keys/#virtual-key-support.] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5339) Update docs regarding the behavior change in NONE sort mode for bulk insert
[ https://issues.apache.org/jira/browse/HUDI-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5339: - Fix Version/s: 0.13.0 (was: 0.12.2) > Update docs regarding the behavior change in NONE sort mode for bulk insert > --- > > Key: HUDI-5339 > URL: https://issues.apache.org/jira/browse/HUDI-5339 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5295) With multiple meta syncs, one meta sync failure should not impact other meta syncs.
[ https://issues.apache.org/jira/browse/HUDI-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5295: - Fix Version/s: 0.13.0 (was: 0.12.2) > With multiple meta syncs, one meta sync failure should not impact other meta > syncs. > --- > > Key: HUDI-5295 > URL: https://issues.apache.org/jira/browse/HUDI-5295 > Project: Apache Hudi > Issue Type: Improvement > Components: deltastreamer, meta-sync, spark-sql >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > For example, if you are using HMS and glue, if HMS sync fails, we should > still sync with glue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5343) HoodieFlinkStreamer supports async clustering for append mode
[ https://issues.apache.org/jira/browse/HUDI-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5343: - Fix Version/s: 0.13.0 (was: 0.12.2) > HoodieFlinkStreamer supports async clustering for append mode > - > > Key: HUDI-5343 > URL: https://issues.apache.org/jira/browse/HUDI-5343 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > > HoodieFlinkStreamer supports async clustering for append mode, which keeps > the consistent with the pipeline of HoodieTableSink. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5292) Exclude the test resources from every module packaging
[ https://issues.apache.org/jira/browse/HUDI-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5292: - Fix Version/s: 0.13.0 (was: 0.12.2) > Exclude the test resources from every module packaging > -- > > Key: HUDI-5292 > URL: https://issues.apache.org/jira/browse/HUDI-5292 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Sagar Sumit >Priority: Major > Fix For: 0.13.0 > > > Exclude the test resources, especially the properties files that conflict > with user-provided resources, from every module. This is a followup to > https://github.com/apache/hudi/pull/7310#issuecomment-1328728297 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5294) Support type change for schema on read enable + reconcile schema
[ https://issues.apache.org/jira/browse/HUDI-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5294: - Fix Version/s: 0.13.0 (was: 0.12.2) > Support type change for schema on read enable + reconcile schema > > > Key: HUDI-5294 > URL: https://issues.apache.org/jira/browse/HUDI-5294 > Project: Apache Hudi > Issue Type: Improvement > Components: core >Reporter: Tao Meng >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > https://github.com/apache/hudi/issues/7283 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5283) Replace deprecated method Schema.parse with Schema.Parser
[ https://issues.apache.org/jira/browse/HUDI-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5283: - Fix Version/s: (was: 0.12.2) > Replace deprecated method Schema.parse with Schema.Parser > - > > Key: HUDI-5283 > URL: https://issues.apache.org/jira/browse/HUDI-5283 > Project: Apache Hudi > Issue Type: Improvement > Components: cli >Reporter: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > When reading the code, I found that HoodieBootstrapSchemaProvider#getBootstrapSchema uses the deprecated method Schema.parse, which can be replaced by Schema.Parser().parse(). Searching at the module level, this is the only place that uses the deprecated method. -- This message was sent by Atlassian Jira (v8.20.10#820010)
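For reference, the deprecated Avro call and its recommended replacement; the schema JSON here is only a placeholder:

{code:java}
import org.apache.avro.Schema;

public class SchemaParseExample {

  private static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}";

  public static void main(String[] args) {
    // Deprecated:
    // Schema legacy = Schema.parse(SCHEMA_JSON);

    // Recommended replacement:
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    System.out.println(schema.getName()); // Rec
  }
}
{code}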
[jira] [Updated] (HUDI-5293) Schema on read + reconcile schema fails w/ 0.12.1
[ https://issues.apache.org/jira/browse/HUDI-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5293: - Fix Version/s: 0.13.0 (was: 0.12.2) > Schema on read + reconcile schema fails w/ 0.12.1 > - > > Key: HUDI-5293 > URL: https://issues.apache.org/jira/browse/HUDI-5293 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > if I do schema on read on commit1 and then schema on read + reconcile schema > for 2nd batch, it fails w/ > {code:java} > warning: there was one deprecation warning; re-run with -deprecation for > details > 22/11/28 16:44:26 ERROR BaseSparkCommitActionExecutor: Error upserting > bucketType UPDATE for partition :2 > java.lang.IllegalArgumentException: cannot modify hudi meta col: > _hoodie_commit_time > at > org.apache.hudi.internal.schema.action.TableChange$BaseColumnChange.checkColModifyIsLegal(TableChange.java:157) > at > org.apache.hudi.internal.schema.action.TableChanges$ColumnAddChange.addColumns(TableChanges.java:314) > at > org.apache.hudi.internal.schema.utils.AvroSchemaEvolutionUtils.lambda$reconcileSchema$5(AvroSchemaEvolutionUtils.java:92) > at > java.util.TreeMap$EntrySpliterator.forEachRemaining(TreeMap.java:2969) > at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) > at > org.apache.hudi.internal.schema.utils.AvroSchemaEvolutionUtils.reconcileSchema(AvroSchemaEvolutionUtils.java:80) > at > org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:103) > at > org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:358) > at > org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:349) > at > org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322) > at > org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:310) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346) > at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359) > at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357) > at > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182) > at > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156) > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091) > at > org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156) > at > org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882) > at 
org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:308) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:310) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:123) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(Thre
[jira] [Updated] (HUDI-5258) Address checkstyle warnings in hudi-common module
[ https://issues.apache.org/jira/browse/HUDI-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5258: - Fix Version/s: 0.13.0 (was: 0.12.2) > Address checkstyle warnings in hudi-common module > - > > Key: HUDI-5258 > URL: https://issues.apache.org/jira/browse/HUDI-5258 > Project: Apache Hudi > Issue Type: Improvement > Components: dev-experience >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5261) Use proper parallelism for engine context APIs
[ https://issues.apache.org/jira/browse/HUDI-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5261: - Fix Version/s: 0.13.0 (was: 0.12.2) > Use proper parallelism for engine context APIs > -- > > Key: HUDI-5261 > URL: https://issues.apache.org/jira/browse/HUDI-5261 > Project: Apache Hudi > Issue Type: Improvement > Components: performance >Reporter: Raymond Xu >Assignee: Jonathan Vexler >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > Do a global search of these APIs > - org.apache.hudi.common.engine.HoodieEngineContext#flatMap > - org.apache.hudi.common.engine.HoodieEngineContext#map > and similar ones that take in a parallelism argument. > Many occurrences use the number of items as the parallelism, which affects performance. Parallelism should be based on the number of cores available in the cluster and set by the user via parallelism configs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
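A minimal sketch of the intent, assuming the user-supplied parallelism config acts as the ceiling; the method and parameter names are hypothetical:

{code:java}
import java.util.List;

class ParallelismSketch {

  // Cap the parallelism at the user-configured value instead of items.size(),
  // so a table with thousands of partitions does not spawn thousands of tasks.
  static int boundedParallelism(List<String> items, int configuredParallelism) {
    return Math.max(1, Math.min(items.size(), configuredParallelism));
  }
}
{code}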
[jira] [Updated] (HUDI-5269) Enhancing core user flow tests for spark-sql writes
[ https://issues.apache.org/jira/browse/HUDI-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5269: - Fix Version/s: 0.13.0 (was: 0.12.2) > Enhancing core user flow tests for spark-sql writes > --- > > Key: HUDI-5269 > URL: https://issues.apache.org/jira/browse/HUDI-5269 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql, tests-ci >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > We triaged some of the core user flows and it looks like we don't have good coverage for those flows. > > Table/index combinations to cover: COW and MOR (w/ and w/o metadata enabled); partitioned (BLOOM, SIMPLE, GLOBAL_BLOOM, BUCKET) and non-partitioned (GLOBAL_BLOOM). > Write flows to cover: > 1. Immutable data: pure bulk_insert row writing. > 2. Immutable data w/ file sizing: pure inserts. > 3. Initial bulk ingest followed by updates: bulk_insert followed by upserts. > 4. Regular inserts + updates combined. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5252) ClusteringCommitSink supports to rollback clustering
[ https://issues.apache.org/jira/browse/HUDI-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5252: - Fix Version/s: (was: 0.12.2) > ClusteringCommitSink supports to rollback clustering > > > Key: HUDI-5252 > URL: https://issues.apache.org/jira/browse/HUDI-5252 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > When commit buffer has failed ClusteringCommitEvent, the ClusteringCommitSink > invokes the CompactionUtil#rollbackCompaction to rollback clustering. > ClusteringCommitSink should call ClusteringUtil#rollbackClustering to > rollback clustering. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5246) Improve validation for partition path
[ https://issues.apache.org/jira/browse/HUDI-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5246: - Fix Version/s: (was: 0.12.2) > Improve validation for partition path > - > > Key: HUDI-5246 > URL: https://issues.apache.org/jira/browse/HUDI-5246 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Raymond Xu >Assignee: Hemanth Gowda >Priority: Minor > Labels: hudi-on-call, new-to-hudi, pull-request-available > > To fail early if absolute path is set for partition (e.g. with leading `/`) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5241) Optimize HoodieDefaultTimeline API
[ https://issues.apache.org/jira/browse/HUDI-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5241: - Fix Version/s: 0.13.0 (was: 0.12.2) > Optimize HoodieDefaultTimeline API > -- > > Key: HUDI-5241 > URL: https://issues.apache.org/jira/browse/HUDI-5241 > Project: Apache Hudi > Issue Type: Improvement > Components: core >Reporter: Yann Byron >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5246) Improve validation for partition path
[ https://issues.apache.org/jira/browse/HUDI-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5246: - Fix Version/s: 0.13.0 > Improve validation for partition path > - > > Key: HUDI-5246 > URL: https://issues.apache.org/jira/browse/HUDI-5246 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Raymond Xu >Assignee: Hemanth Gowda >Priority: Minor > Labels: hudi-on-call, new-to-hudi, pull-request-available > Fix For: 0.13.0 > > > To fail early if absolute path is set for partition (e.g. with leading `/`) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5198) add in minor perf wins in hudi-utilities and locking related tests
[ https://issues.apache.org/jira/browse/HUDI-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5198: - Fix Version/s: 0.13.0 (was: 0.12.2) > add in minor perf wins in hudi-utilities and locking related tests > -- > > Key: HUDI-5198 > URL: https://issues.apache.org/jira/browse/HUDI-5198 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5234) Streaming read skip clustering instants Configurable
[ https://issues.apache.org/jira/browse/HUDI-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5234: - Fix Version/s: (was: 0.12.2) > Streaming read skip clustering instants Configurable > > > Key: HUDI-5234 > URL: https://issues.apache.org/jira/browse/HUDI-5234 > Project: Apache Hudi > Issue Type: Improvement > Components: clustering >Reporter: zhuanshenbsj1 >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5167) Reduce test run time for virtual key tests
[ https://issues.apache.org/jira/browse/HUDI-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5167: - Fix Version/s: 0.13.0 (was: 0.12.2) > Reduce test run time for virtual key tests > -- > > Key: HUDI-5167 > URL: https://issues.apache.org/jira/browse/HUDI-5167 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > We did parametrized for quite a few tests when we added virtual keys. some of > them may not be required. so lets revisit them and reduce whereever > applicable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5181) Enhance keygen class validation
[ https://issues.apache.org/jira/browse/HUDI-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5181: - Fix Version/s: 0.13.0 (was: 0.12.2) > Enhance keygen class validation > --- > > Key: HUDI-5181 > URL: https://issues.apache.org/jira/browse/HUDI-5181 > Project: Apache Hudi > Issue Type: Improvement > Components: configs >Reporter: Raymond Xu >Priority: Major > Fix For: 0.13.0 > > > Some in-code validations can be added to alert users early when they set keygen configs improperly. For example, in TimestampBased keygen, the output format cannot be empty. > We should audit all built-in keygen classes and add UTs and proper validations. This improves usability and saves troubleshooting time when misconfiguration happens. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5166) Reduce test run time for top time consuming tests
[ https://issues.apache.org/jira/browse/HUDI-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5166: - Fix Version/s: 0.13.0 (was: 0.12.2) > Reduce test run time for top time consuming tests > - > > Key: HUDI-5166 > URL: https://issues.apache.org/jira/browse/HUDI-5166 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5178) Add Call show_table_properties for spark sql
[ https://issues.apache.org/jira/browse/HUDI-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5178: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add Call show_table_properties for spark sql > > > Key: HUDI-5178 > URL: https://issues.apache.org/jira/browse/HUDI-5178 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Forward Xu >Assignee: Forward Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5162) Allow user specified start offset for streaming query
[ https://issues.apache.org/jira/browse/HUDI-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5162: - Fix Version/s: 0.13.0 (was: 0.12.2) > Allow user specified start offset for streaming query > - > > Key: HUDI-5162 > URL: https://issues.apache.org/jira/browse/HUDI-5162 > Project: Apache Hudi > Issue Type: Improvement > Components: reader-core, spark >Reporter: Hui An >Assignee: Hui An >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Add new configure: hoodie.datasource.streaming.startOffset to allow users to > specify start offset for streaming query -- This message was sent by Atlassian Jira (v8.20.10#820010)
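A usage sketch with the Spark Java API; the option key comes from this ticket, while the table path and the instant-time value are placeholders:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HudiStreamingReadSketch {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder().appName("hudi-stream-read").getOrCreate();

    Dataset<Row> stream = spark.readStream()
        .format("hudi")
        // Start consuming from a specific instant time instead of the default start offset.
        .option("hoodie.datasource.streaming.startOffset", "20221201000000")
        .load("/path/to/hudi_table");

    stream.writeStream()
        .format("console")
        .start()
        .awaitTermination();
  }
}
{code}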
[jira] [Updated] (HUDI-5112) Add presto query validation support for all tests in integ tests
[ https://issues.apache.org/jira/browse/HUDI-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5112: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add presto query validation support for all tests in integ tests > > > Key: HUDI-5112 > URL: https://issues.apache.org/jira/browse/HUDI-5112 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5113) Add support to test different indexes with integ test
[ https://issues.apache.org/jira/browse/HUDI-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5113: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add support to test different indexes with integ test > - > > Key: HUDI-5113 > URL: https://issues.apache.org/jira/browse/HUDI-5113 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5060) Make all clean policies support incremental mode to find partition paths
[ https://issues.apache.org/jira/browse/HUDI-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5060: - Fix Version/s: (was: 0.12.2) > Make all clean policies support incremental mode to find partition paths > > > Key: HUDI-5060 > URL: https://issues.apache.org/jira/browse/HUDI-5060 > Project: Apache Hudi > Issue Type: Improvement > Components: cleaning >Reporter: 董可伦 >Assignee: 董可伦 >Priority: Major > Labels: pull-request-available > > Make all clean policies support incremental mode to find partition paths -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5072) Extract transform duplicate code
[ https://issues.apache.org/jira/browse/HUDI-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5072: - Fix Version/s: 0.13.0 (was: 0.12.2) > Extract transform duplicate code > > > Key: HUDI-5072 > URL: https://issues.apache.org/jira/browse/HUDI-5072 > Project: Apache Hudi > Issue Type: Improvement > Components: cli >Reporter: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > When reading the code, I found that the transform methods of > MultipleSparkJobExecutionStrategy and SingleSparkJobExecutionStrategy have > redundant code. I think we can extract them to make the code cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5052) Update 0.12.0 docs for regression
[ https://issues.apache.org/jira/browse/HUDI-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5052: - Fix Version/s: 0.13.0 (was: 0.12.2) > Update 0.12.0 docs for regression > - > > Key: HUDI-5052 > URL: https://issues.apache.org/jira/browse/HUDI-5052 > Project: Apache Hudi > Issue Type: Improvement > Components: docs >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5051) Add a functional regression test for Bloom Index followed on w/ Upserts
[ https://issues.apache.org/jira/browse/HUDI-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5051: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add a functional regression test for Bloom Index followed on w/ Upserts > --- > > Key: HUDI-5051 > URL: https://issues.apache.org/jira/browse/HUDI-5051 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: Alexey Kudinkin >Assignee: Jonathan Vexler >Priority: Blocker > Fix For: 0.13.0 > > > In the test > * State is initially bootstrapped by Bulk Insert (row-writing) > * Follow-up w/ upserts -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5035) Remove deprecated API usage in SparkPreCommitValidator#validate
[ https://issues.apache.org/jira/browse/HUDI-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5035: - Fix Version/s: 0.13.0 (was: 0.12.2) > Remove deprecated API usage in SparkPreCommitValidator#validate > --- > > Key: HUDI-5035 > URL: https://issues.apache.org/jira/browse/HUDI-5035 > Project: Apache Hudi > Issue Type: Improvement > Components: cli >Reporter: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Attachments: image-2022-10-15-07-23-43-689.png > > > I found that the code uses a deprecated API; modify the code to use the > recommended API > > !image-2022-10-15-07-23-43-689.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5032) Add Archiving to the CLI
[ https://issues.apache.org/jira/browse/HUDI-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5032: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add Archiving to the CLI > > > Key: HUDI-5032 > URL: https://issues.apache.org/jira/browse/HUDI-5032 > Project: Apache Hudi > Issue Type: Improvement > Components: archiving, cli >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4990) Parallelize deduplication in CLI tool
[ https://issues.apache.org/jira/browse/HUDI-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4990: - Fix Version/s: 0.13.0 (was: 0.12.2) > Parallelize deduplication in CLI tool > - > > Key: HUDI-4990 > URL: https://issues.apache.org/jira/browse/HUDI-4990 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: sivabalan narayanan >Priority: Minor > Fix For: 0.13.0 > > > The CLI tool command `repair deduplicate` repairs one partition at a time. To > repair hundreds of partitions, this takes time. We should add a mode to take > multiple partition paths for the CLI and run the dedup job for multiple > partitions at the same time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5018) Make user-provided copyOnWriteRecordSizeEstimate first precedence
[ https://issues.apache.org/jira/browse/HUDI-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5018: - Fix Version/s: 0.13.0 (was: 0.12.2) > Make user-provided copyOnWriteRecordSizeEstimate first precedence > - > > Key: HUDI-5018 > URL: https://issues.apache.org/jira/browse/HUDI-5018 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: Raymond Xu >Assignee: xi chaomin >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > For the estimated avg record size > https://hudi.apache.org/docs/configurations/#hoodiecopyonwriterecordsizeestimate > which is used here > https://github.com/apache/hudi/blob/86a1efbff1300603a8180111eae117c7f9dbd8a5/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java#L372 > Propose to respect the user setting by following the precedence below > 1) if the user sets a value, then use it as is > 2) if the user does not set it, infer it from timeline commit metadata > 3) if the timeline is empty, use a default (current: 1024) -- This message was sent by Atlassian Jira (v8.20.10#820010)
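A hedged sketch (not the actual UpsertPartitioner code) of the proposed precedence; the helper name and parameters are hypothetical and only illustrate the three steps above.
{code:scala}
// Hedged sketch of the proposed precedence; the names below are hypothetical
// and do not correspond to Hudi's actual API.
def estimateRecordSize(userProvided: Option[Long],
                       timelineEstimate: Option[Long],
                       defaultEstimate: Long = 1024L): Long =
  userProvided                  // 1) a user-set value wins
    .orElse(timelineEstimate)   // 2) otherwise infer from commit metadata
    .getOrElse(defaultEstimate) // 3) otherwise fall back to the default
{code}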
[jira] [Updated] (HUDI-4967) Improve docs for meta sync with TimestampBasedKeyGenerator
[ https://issues.apache.org/jira/browse/HUDI-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4967: - Fix Version/s: 0.13.0 (was: 0.12.2) > Improve docs for meta sync with TimestampBasedKeyGenerator > -- > > Key: HUDI-4967 > URL: https://issues.apache.org/jira/browse/HUDI-4967 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Related fix: HUDI-4966 > We need to add docs on how to properly set the meta sync configuration, > especially the hoodie.datasource.hive_sync.partition_value_extractor, in > [https://hudi.apache.org/docs/key_generation] (for different Hudi versions, > the config can be different). Check the ticket above and PR description of > [https://github.com/apache/hudi/pull/6851] for more details. > We should also add the migration setup on the key generation page as well: > [https://hudi.apache.org/releases/release-0.12.0/#configuration-updates] > * {{{}hoodie.datasource.hive_sync.partition_value_extractor{}}}: This config > is used to extract and transform partition value during Hive sync. Its > default value has been changed from > {{SlashEncodedDayPartitionValueExtractor}} to > {{{}MultiPartKeysValueExtractor{}}}. If you relied on the previous default > value (i.e., have not set it explicitly), you are required to set the config > to {{{}org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor{}}}. From > this release, if this config is not set and Hive sync is enabled, then > partition value extractor class will be *automatically inferred* on the basis > of number of partition fields and whether or not hive style partitioning is > enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
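For illustration only, a write-time snippet showing where the config discussed in this ticket is set, assuming a DataFrame df; the extractor class comes from the description, while the table name, partition field, and base path are placeholders rather than a recommended setup.
{code:scala}
// Hedged example of setting the partition value extractor discussed above;
// the table name, partition field and base path are illustrative placeholders.
df.write.format("hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.datasource.hive_sync.enable", "true")
  .option("hoodie.datasource.hive_sync.partition_fields", "dt")
  .option("hoodie.datasource.hive_sync.partition_value_extractor",
          "org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor")
  .mode("append")
  .save("s3a://bucket/base/path")
{code}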
[jira] [Updated] (HUDI-4888) Add validation to block COW table to use consistent hashing bucket index
[ https://issues.apache.org/jira/browse/HUDI-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4888: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add validation to block COW table to use consistent hashing bucket index > > > Key: HUDI-4888 > URL: https://issues.apache.org/jira/browse/HUDI-4888 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Yuwei Xiao >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Consistent hashing bucket index's resizing relies on the log feature of the MOR > table. So with a COW table, the consistent hashing bucket index cannot achieve > resizing currently. > We should block the user from using it at the very beginning (i.e., table > creation), and suggest they use a MOR table or the Simple Bucket Index. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4881) Push down filters if possible when syncing partitions to Hive
[ https://issues.apache.org/jira/browse/HUDI-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4881: - Fix Version/s: 0.13.0 (was: 0.12.2) > Push down filters if possible when syncing partitions to Hive > - > > Key: HUDI-4881 > URL: https://issues.apache.org/jira/browse/HUDI-4881 > Project: Apache Hudi > Issue Type: Improvement > Components: hive, meta-sync >Reporter: Hui An >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4839) rocksdbjni is not compatible with apple silicon
[ https://issues.apache.org/jira/browse/HUDI-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4839: - Fix Version/s: 0.13.0 (was: 0.12.2) > rocksdbjni is not compatible with apple silicon > --- > > Key: HUDI-4839 > URL: https://issues.apache.org/jira/browse/HUDI-4839 > Project: Apache Hudi > Issue Type: Improvement >Reporter: zouxxyy >Assignee: zouxxyy >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > rocksdbjni 5.17.2 is not compatible with apple silicon > when FileSystemViewStorageType.EMBEDDED_KV_STORE is set on an Apple M1, an error > like this is raised > {code:java} > java.lang.UnsatisfiedLinkError: > /private/var/folders/px/y3gybll50ggctcjp2t4r2b50gp/T/librocksdbjni1847223031371241574.jnilib: > > dlopen(/private/var/folders/px/y3gybll50ggctcjp2t4r2b50gp/T/librocksdbjni1847223031371241574.jnilib, > 0x0001): tried: > '/private/var/folders/px/y3gybll50ggctcjp2t4r2b50gp/T/librocksdbjni1847223031371241574.jnilib' > (mach-o file, but is an incompatible architecture (have 'x86_64', need > 'arm64e')) {code} > After 6.29.4.1, rocksdb can work on M1 macs. > [here|https://github.com/facebook/rocksdb/issues/7720] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4823) Add read_optimize spark_session config to use in spark-sql
[ https://issues.apache.org/jira/browse/HUDI-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4823: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add read_optimize spark_session config to use in spark-sql > -- > > Key: HUDI-4823 > URL: https://issues.apache.org/jira/browse/HUDI-4823 > Project: Apache Hudi > Issue Type: Improvement >Reporter: yonghua jian >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > When creating a table without using the hive catalog in spark, we cannot easily do a > read_optimized query in spark-sql (using the global hudi config file is > inconvenient), so I add the read_optimize spark_session config for use in > spark-sql -- This message was sent by Atlassian Jira (v8.20.10#820010)
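The ticket does not name the exact session config it adds, so the sketch below only shows the existing DataSource-level way to run a read-optimized query as a point of comparison, assuming a SparkSession spark; the table path and view name are placeholders.
{code:scala}
// Hedged illustration of a read_optimized query via the DataFrame API; the
// ticket's new spark_session config is not shown because its exact key is
// not given in the description.
val roDf = spark.read
  .format("hudi")
  .option("hoodie.datasource.query.type", "read_optimized")
  .load("s3a://bucket/base/path")

roDf.createOrReplaceTempView("my_table_ro")
spark.sql("select count(*) from my_table_ro").show()
{code}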
[jira] [Updated] (HUDI-2913) Disable auto clean in writer task
[ https://issues.apache.org/jira/browse/HUDI-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-2913: - Fix Version/s: 0.13.0 (was: 0.12.2) > Disable auto clean in writer task > - > > Key: HUDI-2913 > URL: https://issues.apache.org/jira/browse/HUDI-2913 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Zhaojing Yu >Assignee: Zhaojing Yu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-3954) Don't keep the last commit before the earliest commit to retain
[ https://issues.apache.org/jira/browse/HUDI-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-3954: - Fix Version/s: 0.13.0 (was: 0.12.2) > Don't keep the last commit before the earliest commit to retain > --- > > Key: HUDI-3954 > URL: https://issues.apache.org/jira/browse/HUDI-3954 > Project: Apache Hudi > Issue Type: Improvement > Components: cleaning >Reporter: 董可伦 >Assignee: sivabalan narayanan >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > Don't keep the last commit before the earliest commit to retain > According to the documentation of {{{}hoodie.cleaner.commits.retained{}}}: > Number of commits to retain, without cleaning. This will be retained for > num_of_commits * time_between_commits (scheduled). This also directly > translates into how much data retention the table supports for incremental > queries. > > We only need to keep the number of commits configured through the parameter > {{{}hoodie.cleaner.commits.retained{}}}. > And the commits retained by clean are completed. This ensures that “This will be > retained for num_of_commits * time_between_commits” holds as stated in the document. > So we don't need to keep the last commit before the earliest commit to > retain. If we want to keep more versions, we can increase the parameter > {{hoodie.cleaner.commits.retained}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
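For context, a minimal write-time configuration that exercises this cleaner policy, assuming a DataFrame df; the retained-commit count, table name, and base path are illustrative values, not recommendations.
{code:scala}
// Hedged example of the commits-based cleaning policy; the retained-commit
// count, table name and base path are illustrative placeholders.
df.write.format("hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.clean.automatic", "true")
  .option("hoodie.cleaner.policy", "KEEP_LATEST_COMMITS")
  .option("hoodie.cleaner.commits.retained", "10")
  .mode("append")
  .save("s3a://bucket/base/path")
{code}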
[jira] [Updated] (HUDI-712) Improve exporter performance and memory usage
[ https://issues.apache.org/jira/browse/HUDI-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-712: Fix Version/s: 0.13.0 (was: 0.12.2) > Improve exporter performance and memory usage > - > > Key: HUDI-712 > URL: https://issues.apache.org/jira/browse/HUDI-712 > Project: Apache Hudi > Issue Type: Improvement > Components: Utilities >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > > [https://github.com/apache/incubator-hudi/blob/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java#L103-L107] > The way the data file list for export is collected can be improved due to > * not parallelized among partitions > * the list can be too large > * listing partition to get the latest files requires scanning all files > (RFC-15 could solve this) > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-1570) Add Avg record size in commit metadata
[ https://issues.apache.org/jira/browse/HUDI-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-1570: - Fix Version/s: (was: 0.12.2) > Add Avg record size in commit metadata > -- > > Key: HUDI-1570 > URL: https://issues.apache.org/jira/browse/HUDI-1570 > Project: Apache Hudi > Issue Type: Improvement > Components: Utilities >Reporter: sivabalan narayanan >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Attachments: Screen Shot 2021-01-31 at 7.05.55 PM.png > > Original Estimate: 2h > Remaining Estimate: 2h > > Many users want to understand their avg record size in hudi > storage. They need this so that they can deduce their bloom config values. > As of now, there is no easy way for the end user to fetch the record size. Even > with hudi-cli, we could decipher it from commit metadata, but we need to make some > rough calculation. So, it would be better if we store the avg record size with > WriteStats (total bytes written / total records written), as well as in > commit metadata. Then, in hudi-cli, we could expose this info along with "commit > showpartitions" or expose another command "commit showmetadata" or something. > As of now, we could calculate the avg size from bytes written/records written > from commit metadata. > !Screen Shot 2021-01-31 at 7.05.55 PM.png! > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
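A hedged sketch of the arithmetic described above (avg record size = total bytes written / total records written); the case class and sample numbers are hypothetical stand-ins, not Hudi's actual metadata classes.
{code:scala}
// Hedged sketch: CommitTotals and the sample numbers are hypothetical.
final case class CommitTotals(totalBytesWritten: Long, totalRecordsWritten: Long)

def avgRecordSizeBytes(totals: CommitTotals): Option[Long] =
  if (totals.totalRecordsWritten > 0)
    Some(totals.totalBytesWritten / totals.totalRecordsWritten)
  else None

// e.g. 1 GiB written across 8 million records is roughly 134 bytes per record
println(avgRecordSizeBytes(CommitTotals(1L << 30, 8000000L)))
{code}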
[jira] [Updated] (HUDI-5105) Add Call show_commit_extra_metadata for spark sql
[ https://issues.apache.org/jira/browse/HUDI-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5105: - Fix Version/s: 0.13.0 > Add Call show_commit_extra_metadata for spark sql > - > > Key: HUDI-5105 > URL: https://issues.apache.org/jira/browse/HUDI-5105 > Project: Apache Hudi > Issue Type: New Feature > Components: spark-sql >Reporter: Forward Xu >Assignee: Forward Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5201) add totalRecordsDeleted metric
[ https://issues.apache.org/jira/browse/HUDI-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5201: - Fix Version/s: (was: 0.12.2) > add totalRecordsDeleted metric > -- > > Key: HUDI-5201 > URL: https://issues.apache.org/jira/browse/HUDI-5201 > Project: Apache Hudi > Issue Type: New Feature > Components: metrics >Reporter: Hussein Awala >Assignee: Hussein Awala >Priority: Major > Labels: pull-request-available > > Add missing {{totalRecordsDeleted}} metric to commit action metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5105) Add Call show_commit_extra_metadata for spark sql
[ https://issues.apache.org/jira/browse/HUDI-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5105: - Fix Version/s: (was: 0.12.2) > Add Call show_commit_extra_metadata for spark sql > - > > Key: HUDI-5105 > URL: https://issues.apache.org/jira/browse/HUDI-5105 > Project: Apache Hudi > Issue Type: New Feature > Components: spark-sql >Reporter: Forward Xu >Assignee: Forward Xu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5059) Support automatic setting of certain attributes when creating a table in the flink catalog
[ https://issues.apache.org/jira/browse/HUDI-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5059: - Fix Version/s: 0.13.0 (was: 0.12.2) > Support automatic setting of certain attributes when creating a table in the > flink catalog > -- > > Key: HUDI-5059 > URL: https://issues.apache.org/jira/browse/HUDI-5059 > Project: Apache Hudi > Issue Type: New Feature > Components: flink-sql >Reporter: waywtdcc >Priority: Major > Fix For: 0.13.0 > > > Support the automatic setting of certain attributes when creating a table in > the flink catalog. For example, when creating a hudi catalog, apply some > default attributes, such as the number of write.tasks. Automatically carrying > these attributes when creating tables reduces the development workload -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5048) add CopyToTempView support
[ https://issues.apache.org/jira/browse/HUDI-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5048: - Fix Version/s: 0.13.0 (was: 0.12.2) > add CopyToTempView support > -- > > Key: HUDI-5048 > URL: https://issues.apache.org/jira/browse/HUDI-5048 > Project: Apache Hudi > Issue Type: New Feature >Reporter: scx >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Previously, when using spark sql, we didn't have a good way to > incrementally read or time travel the hudi table. So, I added the > CopyToTempView procedure. This procedure registers the hudi table as a > spark temporary view, so that data development can directly query the > view for the different ways of reading. -- This message was sent by Atlassian Jira (v8.20.10#820010)
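For illustration, roughly what such a procedure would automate when done by hand today, assuming a SparkSession spark; the options, instant time, base path, and view name are placeholders, and this is not the procedure's actual implementation.
{code:scala}
// Hedged illustration of registering a Hudi incremental read as a temp view,
// i.e. what the proposed CopyToTempView procedure would automate; the begin
// instant time, base path and view name are illustrative placeholders.
spark.read
  .format("hudi")
  .option("hoodie.datasource.query.type", "incremental")
  .option("hoodie.datasource.read.begin.instanttime", "20221101000000000")
  .load("s3a://bucket/base/path")
  .createOrReplaceTempView("hudi_incr_view")

spark.sql("select _hoodie_commit_time, count(*) from hudi_incr_view group by 1").show()
{code}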
[jira] [Updated] (HUDI-4809) Hudi Support AWS Glue DropPartitions
[ https://issues.apache.org/jira/browse/HUDI-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4809: - Fix Version/s: 0.13.0 > Hudi Support AWS Glue DropPartitions > - > > Key: HUDI-4809 > URL: https://issues.apache.org/jira/browse/HUDI-4809 > Project: Apache Hudi > Issue Type: New Feature > Components: metadata >Reporter: XixiHua >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4809) Hudi Support AWS Glue DropPartitions
[ https://issues.apache.org/jira/browse/HUDI-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4809: - Fix Version/s: (was: 0.12.2) > Hudi Support AWS Glue DropPartitions > - > > Key: HUDI-4809 > URL: https://issues.apache.org/jira/browse/HUDI-4809 > Project: Apache Hudi > Issue Type: New Feature > Components: metadata >Reporter: XixiHua >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5168) Flink metrics integration
[ https://issues.apache.org/jira/browse/HUDI-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5168: - Fix Version/s: 0.13.0 (was: 0.12.2) > Flink metrics integration > - > > Key: HUDI-5168 > URL: https://issues.apache.org/jira/browse/HUDI-5168 > Project: Apache Hudi > Issue Type: Epic > Components: flink, flink-sql >Reporter: Zhaojing Yu >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5334) Get checkpoint from non-completed instant
[ https://issues.apache.org/jira/browse/HUDI-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5334: - Fix Version/s: 0.13.0 (was: 0.12.2) > Get checkpoint from non-completed instant > - > > Key: HUDI-5334 > URL: https://issues.apache.org/jira/browse/HUDI-5334 > Project: Apache Hudi > Issue Type: Bug > Components: spark >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Original issue https://github.com/apache/hudi/issues/7375 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5318) Clustering scheduling now will list all partitions in the table when PARTITION_SELECTED is set
[ https://issues.apache.org/jira/browse/HUDI-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5318: - Fix Version/s: 0.13.0 (was: 0.12.2) > Clustering scheduling now will list all partitions in the table when > PARTITION_SELECTED is set > > > Key: HUDI-5318 > URL: https://issues.apache.org/jira/browse/HUDI-5318 > Project: Apache Hudi > Issue Type: Bug > Components: clustering >Reporter: Qijun Fu >Assignee: Qijun Fu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Currently PartitionAwareClusteringPlanStrategy will list all partitions in the > table whether PARTITION_SELECTED is set or not. Listing all partitions in the > dataset is a very expensive operation when the number of partitions is huge. > We can skip listing all partitions when PARTITION_SELECTED is set, so that > clustering scheduling can benefit a lot from partition pruning. -- This message was sent by Atlassian Jira (v8.20.10#820010)
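A hedged example of restricting clustering to specific partitions, assuming a DataFrame df; the key hoodie.clustering.plan.strategy.partition.selected is assumed to be the config behind PARTITION_SELECTED, and the partition list, table name, and base path are placeholders.
{code:scala}
// Hedged example; "hoodie.clustering.plan.strategy.partition.selected" is
// assumed to back PARTITION_SELECTED, and the partition list, table name and
// base path are illustrative placeholders.
df.write.format("hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.clustering.inline", "true")
  .option("hoodie.clustering.plan.strategy.partition.selected", "2021-12-14,2021-12-15")
  .mode("append")
  .save("s3a://bucket/base/path")
{code}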
[jira] [Updated] (HUDI-5229) Add flink avro version entry in root pom
[ https://issues.apache.org/jira/browse/HUDI-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5229: - Fix Version/s: (was: 0.12.2) > Add flink avro version entry in root pom > > > Key: HUDI-5229 > URL: https://issues.apache.org/jira/browse/HUDI-5229 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5220) failed to snapshot query in hive when querying an empty partition
[ https://issues.apache.org/jira/browse/HUDI-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5220: - Fix Version/s: (was: 0.12.2) > failed to snapshot query in hive when querying an empty partition > -- > > Key: HUDI-5220 > URL: https://issues.apache.org/jira/browse/HUDI-5220 > Project: Apache Hudi > Issue Type: Bug > Components: hive >Reporter: yuehanwang >Priority: Major > Labels: pull-request-available > > When querying an empty partition, hive will return an empty file in the split path. > This path will be added as a NonHoodieInputPaths. In this case > HoodieParquetRealtimeInputFormat reads a file split rather than a > RealtimeSplit, throwing an exception: > HoodieRealtimeRecordReader can only work on RealtimeSplit and not with > hdfs://test-cluster/tmp/hive/20220520/hive/4273589d-49be-4a60-9890-a29660d81927/hive_2022-11-14_11-32-41_221_5694963332005566615-17/-mr-10004/74adf5bb-b07e-4eac-a90b-1b5a7fc3d5c4/emptyFile:0+466 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5270) Duplicate key error when insert_overwrite same partition in multi writer
[ https://issues.apache.org/jira/browse/HUDI-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5270: - Fix Version/s: 0.13.0 (was: 0.12.2) > Duplicate key error when insert_overwrite same partition in multi writer > > > Key: HUDI-5270 > URL: https://issues.apache.org/jira/browse/HUDI-5270 > Project: Apache Hudi > Issue Type: Bug > Components: multi-writer, spark-sql >Affects Versions: 0.11.0 >Reporter: weiming >Assignee: weiming >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > If the occ is enabled for hudi spark table, multiple threads insert_overwrite > the same partition. The data of the later task should overwrite the data of > the previous task. However, an error occurs. > {code:java} > // execute sql insert overwrite same partition > ##THREAD-1 EXECUTE SQL > insert overwrite table hudi_test_wm1_mor_02 partition (dt = '2021-12-14',hh = > '6') select id,name,price,ts from hudi_test_wm1_mor_01 where dt='2021-12-11' > and hh ='2'; > ##THREAD-2 EXECUTE SQL > insert overwrite table hudi_test_wm1_mor_02 partition (dt = '2021-12-14',hh = > '6') select id,name,price,ts from hudi_test_wm1_mor_01 where dt='2021-12-11' > and hh ='4'; {code} > {code:java} > // ERROR LOG > 22/11/07 15:24:53 ERROR SparkSQLDriver: Failed in [insert overwrite table > hudi_test_wm1_mor_02 partition (dt = '2021-12-14',hh = '6') select > id,name,price,ts from hudi_test_wm1_mor_01 where dt='2021-12-11' and hh > ='4']java.lang.IllegalStateException: Duplicate key > [20221107152403967__replacecommit__COMPLETED]at > java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133) > at java.util.HashMap.merge(HashMap.java:1245)at > java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)at > java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) > at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270) > at java.util.Iterator.forEachRemaining(Iterator.java:116)at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) > at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:244) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:108) > at > org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:108) > at > org.apache.hudi.co
[jira] [Updated] (HUDI-5174) Clustering w/ two multi-writers could lead to issues
[ https://issues.apache.org/jira/browse/HUDI-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5174: - Fix Version/s: (was: 0.12.2) > Clustering w/ two multi-writers could lead to issues > > > Key: HUDI-5174 > URL: https://issues.apache.org/jira/browse/HUDI-5174 > Project: Apache Hudi > Issue Type: Bug > Components: clustering, table-service >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Critical > > If two writers have enabled clustering, each could roll back the clustering > that the other writer is currently executing, which could lead to unrecoverable > issues. > > > {code:java} > t1 t2 > ➝ t > writer1 |-| > writer2 |--|{code} > Let's say writer1 starts a clustering at t1, > and then writer2 starts clustering at time t2. At this time, it will roll back > the clustering started at time t1, but writer1 could still be continuing to > execute the clustering. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5177) Revisit HiveIncrPullSource and JdbcSource for interleaved inflight commits
[ https://issues.apache.org/jira/browse/HUDI-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5177: - Fix Version/s: (was: 0.12.2) > Revisit HiveIncrPullSource and JdbcSource for interleaved inflight commits > -- > > Key: HUDI-5177 > URL: https://issues.apache.org/jira/browse/HUDI-5177 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Jonathan Vexler >Priority: Critical > > HUDI-5176 > We have fixed the Hudi incremental source when there are inflight commits > before completed commits. We need to revisit the logic for > HiveIncrPullSource and JdbcSource as well regarding the same scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5171) Ensure validateTableConfig also checks for partition path field value switch
[ https://issues.apache.org/jira/browse/HUDI-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5171: - Fix Version/s: 0.13.0 (was: 0.12.2) > Ensure validateTableConfig also checks for partition path field value switch > > > Key: HUDI-5171 > URL: https://issues.apache.org/jira/browse/HUDI-5171 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Affects Versions: 0.12.1 >Reporter: sivabalan narayanan >Assignee: Jonathan Vexler >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > As of now, validateTableConfig does not consider a switch in the partition path > field value. We need to consider that as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5107) Fix hadoop config in DirectWriteMarkers, HoodieFlinkEngineContext and StreamerUtil are not consistent issue
[ https://issues.apache.org/jira/browse/HUDI-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5107: - Fix Version/s: (was: 0.12.2) > Fix hadoop config in DirectWriteMarkers, HoodieFlinkEngineContext and > StreamerUtil are not consistent issue > --- > > Key: HUDI-5107 > URL: https://issues.apache.org/jira/browse/HUDI-5107 > Project: Apache Hudi > Issue Type: Bug >Reporter: JinxinTang >Assignee: JinxinTang >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5069) TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky
[ https://issues.apache.org/jira/browse/HUDI-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5069: - Fix Version/s: (was: 0.12.2) > TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky > --- > > Key: HUDI-5069 > URL: https://issues.apache.org/jira/browse/HUDI-5069 > Project: Apache Hudi > Issue Type: Bug > Components: tests-ci >Reporter: xi chaomin >Priority: Major > Labels: pull-request-available > > {code:java} > org.opentest4j.AssertionFailedError: Expect baseInstant to be less than or > equal to latestDeltaCommit ==> > Expected :true > Actual :false > > at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55) > at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40) > at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:193) > at > org.apache.hudi.table.action.compact.CompactionTestBase.lambda$validateDeltaCommit$0(CompactionTestBase.java:103) > at java.util.ArrayList.forEach(ArrayList.java:1257) > at > org.apache.hudi.table.action.compact.CompactionTestBase.validateDeltaCommit(CompactionTestBase.java:95) > at > org.apache.hudi.table.action.compact.CompactionTestBase.runNextDeltaCommits(CompactionTestBase.java:148) > at > org.apache.hudi.table.action.compact.TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime(TestInlineCompaction.java:227) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5017) Modify the logic of defaultMode in BootstrapRegexModeSelector
[ https://issues.apache.org/jira/browse/HUDI-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5017: - Fix Version/s: (was: 0.12.2) > Modify the logic of defaultMode in BootstrapRegexModeSelector > - > > Key: HUDI-5017 > URL: https://issues.apache.org/jira/browse/HUDI-5017 > Project: Apache Hudi > Issue Type: Bug > Components: bootstrap >Reporter: 董可伦 >Assignee: 董可伦 >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4876) DT archival is blocked by MDT compaction
[ https://issues.apache.org/jira/browse/HUDI-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4876: - Fix Version/s: (was: 0.12.2) > DT archival is blocked by MDT compaction > > > Key: HUDI-4876 > URL: https://issues.apache.org/jira/browse/HUDI-4876 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Assignee: sivabalan narayanan >Priority: Major > > Reference GitHub Issue: > [https://github.com/apache/hudi/issues/6716] > > If ONLY INSERT-OVERWRITEs are performed on a DT, MDT will not be compacted, > causing DT commits to not be archived. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5078) When applying changes to MDT, any replace commit is considered a table service
[ https://issues.apache.org/jira/browse/HUDI-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5078: - Fix Version/s: (was: 0.12.2) > When applying changes to MDT, any replace commit is considered a table service > -- > > Key: HUDI-5078 > URL: https://issues.apache.org/jira/browse/HUDI-5078 > Project: Apache Hudi > Issue Type: Bug > Components: metadata >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Critical > Labels: pull-request-available > > Table services in the metadata table can only be invoked by non-table-service > operations from the data table. In other words, compaction and clustering from the data > table cannot trigger compaction in the MDT. > But we mistakenly considered any replace commit as a table service. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4629) Create hive table from existing hoodie Table failed when the table schema is not defined
[ https://issues.apache.org/jira/browse/HUDI-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4629: - Fix Version/s: (was: 0.12.2) > Create hive table from existing hoodie Table failed when the table schema is > not defined > > > Key: HUDI-4629 > URL: https://issues.apache.org/jira/browse/HUDI-4629 > Project: Apache Hudi > Issue Type: Bug > Components: spark-sql >Reporter: 董可伦 >Assignee: 董可伦 >Priority: Major > Labels: pull-request-available > > Create hive table from existing hoodie Table failed when the table schema is > not defined > {code:java} > WARN CreateHoodieTableCommand: Failed to create catalog table in metastore: > org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be > specified for the table{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4852) Incremental sync not updating pending file groups under clustering
[ https://issues.apache.org/jira/browse/HUDI-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4852: - Fix Version/s: (was: 0.12.2) > Incremental sync not updating pending file groups under clustering > -- > > Key: HUDI-4852 > URL: https://issues.apache.org/jira/browse/HUDI-4852 > Project: Apache Hudi > Issue Type: Bug >Reporter: Surya Prasanna Yalla >Assignee: Surya Prasanna Yalla >Priority: Major > > Pending file groups under clustering are not updated through incremental sync > calls. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4625) Clean up KafkaOffsetGen
[ https://issues.apache.org/jira/browse/HUDI-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4625: - Fix Version/s: (was: 0.12.2) > Clean up KafkaOffsetGen > --- > > Key: HUDI-4625 > URL: https://issues.apache.org/jira/browse/HUDI-4625 > Project: Apache Hudi > Issue Type: Bug > Components: deltastreamer >Reporter: Alexey Kudinkin >Priority: Major > > There are a few issues within KafkaOffsetGen that we should follow up on, > annotated with corresponding TODOs: > # Using a proper retrying client (instead of using sleeps for coordination) > # Cleaning up incorrect assertions -- This message was sent by Atlassian Jira (v8.20.10#820010)