[jira] [Updated] (HUDI-5353) Close file reader wherever missing
[ https://issues.apache.org/jira/browse/HUDI-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5353: - Fix Version/s: 0.13.0 (was: 0.12.2) > Close file reader wherever missing > -- > > Key: HUDI-5353 > URL: https://issues.apache.org/jira/browse/HUDI-5353 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > If not closed, open file handles could lead to > {code:java} > java.io.InterruptedIOException: getFileStatus on > s3a://bucket/base/path/274df949-03a5-4837-840f-a0b558b82827-0_0-9095-234238_20221206220929477.parquet: > com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout > waiting for connection from pool {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
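The failure above comes from a reader that is opened and never closed, which keeps an S3A HTTP connection checked out of the pool until the pool is exhausted. A minimal sketch of the close-on-every-path pattern, using hypothetical reader and factory types rather than Hudi's actual classes:

{code:java}
import java.io.Closeable;
import java.io.IOException;

class ReaderCloseSketch {

  // Hypothetical stand-in for whichever base file reader the code path opens.
  interface BaseFileReader extends Closeable {
    long getTotalRecords();
  }

  // Hypothetical stand-in for the factory that opens the reader.
  interface FileReaderFactory {
    BaseFileReader open(String path) throws IOException;
  }

  static long countRecords(FileReaderFactory factory, String path) throws IOException {
    // try-with-resources guarantees close() runs even if reading throws,
    // so the underlying connection is always returned to the pool.
    try (BaseFileReader reader = factory.open(path)) {
      return reader.getTotalRecords();
    }
  }
}
{code}

The same pattern applies to any reader, scanner, or iterator backed by a filesystem handle.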
[jira] [Closed] (HUDI-5353) Close file reader wherever missing
[ https://issues.apache.org/jira/browse/HUDI-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5353. Resolution: Fixed > Close file reader wherever missing > -- > > Key: HUDI-5353 > URL: https://issues.apache.org/jira/browse/HUDI-5353 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2 > > > If not closed, open file handles could lead to > {code:java} > java.io.InterruptedIOException: getFileStatus on > s3a://bucket/base/path/274df949-03a5-4837-840f-a0b558b82827-0_0-9095-234238_20221206220929477.parquet: > com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout > waiting for connection from pool {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HUDI-5353) Close file reader wherever missing
[ https://issues.apache.org/jira/browse/HUDI-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish resolved HUDI-5353. -- > Close file reader wherever missing > -- > > Key: HUDI-5353 > URL: https://issues.apache.org/jira/browse/HUDI-5353 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2 > > > If not closed, open file handles could lead to > {code:java} > java.io.InterruptedIOException: getFileStatus on > s3a://bucket/base/path/274df949-03a5-4837-840f-a0b558b82827-0_0-9095-234238_20221206220929477.parquet: > com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout > waiting for connection from pool {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5091) MergeInto syntax merge_condition does not support Non-Equal
[ https://issues.apache.org/jira/browse/HUDI-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5091: - Fix Version/s: (was: 0.12.2) > MergeInto syntax merge_condition does not support Non-Equal > --- > > Key: HUDI-5091 > URL: https://issues.apache.org/jira/browse/HUDI-5091 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql >Reporter: KnightChess >Assignee: KnightChess >Priority: Major > > The MERGE INTO SQL merge_condition should support non-equal conditions: > https://github.com/apache/hudi/issues/6400 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5393) Remove the reuse of metadata table writer for flink write client
[ https://issues.apache.org/jira/browse/HUDI-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5393. Resolution: Resolved > Remove the reuse of metadata table writer for flink write client > > > Key: HUDI-5393 > URL: https://issues.apache.org/jira/browse/HUDI-5393 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5350) oom cause compaction event lost
[ https://issues.apache.org/jira/browse/HUDI-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5350. Resolution: Resolved > oom cause compaction event lost > --- > > Key: HUDI-5350 > URL: https://issues.apache.org/jira/browse/HUDI-5350 > Project: Apache Hudi > Issue Type: Bug > Components: compaction, flink >Reporter: HBG >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5221) Make the decision for flink sql bucket index case-insensitive
[ https://issues.apache.org/jira/browse/HUDI-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5221. Resolution: Resolved > Make the decision for flink sql bucket index case-insensitive > - > > Key: HUDI-5221 > URL: https://issues.apache.org/jira/browse/HUDI-5221 > Project: Apache Hudi > Issue Type: Task > Components: flink-sql >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5228) Flink table service job fs view conf overwrites the one of writing job
[ https://issues.apache.org/jira/browse/HUDI-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5228. Resolution: Resolved > Flink table service job fs view conf overwrites the one of writing job > -- > > Key: HUDI-5228 > URL: https://issues.apache.org/jira/browse/HUDI-5228 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5227) Upgrade Jetty to 9.4.48
[ https://issues.apache.org/jira/browse/HUDI-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5227. Resolution: Resolved > Upgrade Jetty to 9.4.48 > --- > > Key: HUDI-5227 > URL: https://issues.apache.org/jira/browse/HUDI-5227 > Project: Apache Hudi > Issue Type: Task >Reporter: Rahil Chertara >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5412) Send the bootstrap event if the JM also rebooted
[ https://issues.apache.org/jira/browse/HUDI-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5412. Resolution: Resolved > Send the bootstrap event if the JM also rebooted > --- > > Key: HUDI-5412 > URL: https://issues.apache.org/jira/browse/HUDI-5412 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-3661) Flink async compaction is not thread safe when use watermark
[ https://issues.apache.org/jira/browse/HUDI-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-3661. Resolution: Resolved > Flink async compaction is not thread safe when use watermark > > > Key: HUDI-3661 > URL: https://issues.apache.org/jira/browse/HUDI-3661 > Project: Apache Hudi > Issue Type: Bug >Reporter: hd zhou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > Attachments: image-2022-03-18-19-38-39-257.png > > > Async compaction starts an executor to run compaction asynchronously and sends the compaction result message to the next Flink operator. But collector.collect() is not a thread-safe function; when a watermark or latencyMarker is used, both call collector.collect(), which may cause issues. > We should not let async compaction = false > > !image-2022-03-18-19-38-39-257.png! > > > !https://git.bilibili.co/datacenter/bili-hudi/uploads/79608d01b0301de84d1d9e3cf24f1d21/image.png! > > !https://git.bilibili.co/datacenter/bili-hudi/uploads/e9c2f27d395e708a407bcf40f672c870/image.png! > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5223) Partial failover for flink
[ https://issues.apache.org/jira/browse/HUDI-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5223. Resolution: Resolved > Partial failover for flink > -- > > Key: HUDI-5223 > URL: https://issues.apache.org/jira/browse/HUDI-5223 > Project: Apache Hudi > Issue Type: Task > Components: flink >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5373) Different fileids are assigned to the same bucket
[ https://issues.apache.org/jira/browse/HUDI-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-5373. Resolution: Resolved > Different fileids are assigned to the same bucket > -- > > Key: HUDI-5373 > URL: https://issues.apache.org/jira/browse/HUDI-5373 > Project: Apache Hudi > Issue Type: Bug >Reporter: loukey_j >Assignee: loukey_j >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > >
> partition=30, bucketNum=11 -> bucketId = 3011
> partition=301, bucketNum=1 -> bucketId = 3011
>
> Different fileids are assigned to the same bucket:
> final String bucketId = partition + bucketNum;
> if (incBucketIndex.contains(bucketId)) {
>   location = new HoodieRecordLocation("I", bucketToFileId.get(bucketNum));
> } else if (bucketToFileId.containsKey(bucketNum)) {
>   location = new HoodieRecordLocation("U", bucketToFileId.get(bucketNum));
> } else {
>   String newFileId = BucketIdentifier.newBucketFileIdPrefix(bucketNum);
>   location = new HoodieRecordLocation("I", newFileId);
>   bucketToFileId.put(bucketNum, newFileId);
>   incBucketIndex.add(bucketId);
> }
-- This message was sent by Atlassian Jira (v8.20.10#820010)
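The collision is easy to reproduce: concatenating the partition path and the bucket number yields the same key for different (partition, bucket) pairs. A self-contained sketch; the zero-padding shown is one possible remedy for illustration, not necessarily the actual patch:

{code:java}
public class BucketIdCollisionSketch {

  // The buggy form quoted above: "30" + 11 and "301" + 1 both produce "3011".
  static String bucketIdConcat(String partition, int bucketNum) {
    return partition + bucketNum;
  }

  // One possible remedy (illustrative): zero-pad the bucket number so the
  // partition boundary cannot blur into the bucket number.
  static String bucketIdPadded(String partition, int bucketNum) {
    return partition + String.format("%08d", bucketNum);
  }

  public static void main(String[] args) {
    System.out.println(bucketIdConcat("30", 11));  // 3011
    System.out.println(bucketIdConcat("301", 1));  // 3011 -> collision
    System.out.println(bucketIdPadded("30", 11));  // 3000000011
    System.out.println(bucketIdPadded("301", 1));  // 30100000001 -> distinct
  }
}
{code}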
[jira] [Updated] (HUDI-5372) Fix NPE caused by alter table add column
[ https://issues.apache.org/jira/browse/HUDI-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5372: - Fix Version/s: (was: 0.12.2) > Fix NPE caused by alter table add column > > > Key: HUDI-5372 > URL: https://issues.apache.org/jira/browse/HUDI-5372 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Priority: Critical > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5101) Adding spark structured streaming tests to integ tests
[ https://issues.apache.org/jira/browse/HUDI-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5101: - Fix Version/s: 0.13.0 (was: 0.12.2) > Adding spark structured streaming tests to integ tests > -- > > Key: HUDI-5101 > URL: https://issues.apache.org/jira/browse/HUDI-5101 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
svn commit: r58988 - in /dev/hudi/hudi-0.12.2: ./ hudi-0.12.2.src.tgz hudi-0.12.2.src.tgz.asc hudi-0.12.2.src.tgz.sha512
Author: satish Date: Mon Dec 26 08:52:59 2022 New Revision: 58988 Log: Add hudi-0.12.2 release binaries Added: dev/hudi/hudi-0.12.2/ dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz (with props) dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.asc dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.sha512 Added: dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz == Binary file - no diff available. Propchange: dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz -- svn:mime-type = application/octet-stream Added: dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.asc == --- dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.asc (added) +++ dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.asc Mon Dec 26 08:52:59 2022 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCAAdFiEEbaCzmhPCZY0irn0U0IxLa9mOplkFAmOpXpoACgkQ0IxLa9mO +plntVg//aEPjDc03kzSuShWjcmdU94OuBoMW+j1urw43UA+bmC1ENC65HuxfUvVO +nAQW6ZiHsHSKAGZBHP846jZIKXRfIQMVNv/Yj+fFAtsKC4UliAKfof5+srwzveZf +NKa0zyurYKxwPbFjy/8jZSyO91Hwf22sx+oe7NkcuaY/7s7cVTs8Qu45kH6VAUQG +WOOSpDCTPGaHPknUhQ/kiGdIlQSvpzMdsmZIYKOmyWUeF0LvtbTg0bOe/s2yKbpJ +7A55Xq2pTc0vx3icJmwZCDuUCDeFeB5bMSi+j3pmDpar1lX5OUhpgkO+hg9Riz6b +lloiRRDpeNfbll9gJxSjOXvuS64CUIo6hffQ3OywQj0wCVZIDPtKynSMrBjHmNUh +kQibDwoDKMlwDWCrnn/v3UHl2c1XhjgWnhMI848VQFaKWC1qlzKGrlYhQl2YEZrL +e4NlENM75rKYSf+QUOTRo76/bXlBumuySnXg+r7NAFcXsZMr4p91mig6HwXE7VvW +zSPbMTfzZHOvAY/9OOJK5wxCuLp2n0+2WwSex7Jcn8Kd0slOHGDNuY2JhByxEmN3 +IGx7vuqq4nVScSleFqeEmdL7lnPffX8RgHXJncaDxbRKruyFie3DrzpKDXsSESzG +g70ZQBTmi06uGacg8U8m2S2MpSMKpuRSuxoWxNRsy/rWPbf8HMo= +=7LE4 +-END PGP SIGNATURE- Added: dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.sha512 == --- dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.sha512 (added) +++ dev/hudi/hudi-0.12.2/hudi-0.12.2.src.tgz.sha512 Mon Dec 26 08:52:59 2022 @@ -0,0 +1 @@ +8cb2cf9844c1280fa0a16371a7e39103f09d8a48eae57f2a9c7861db245a3c41625c0012472b553699dcb97495f224290cbd6657120d017496da385474d12b8e hudi-0.12.2.src.tgz
[hudi] branch release-0.12.2 updated (94db72e2c9 -> aea5bb6f0a)
This is an automated email from the ASF dual-hosted git repository. satish pushed a change to branch release-0.12.2 in repository https://gitbox.apache.org/repos/asf/hudi.git from 94db72e2c9 Bumping mvn version to 0.12.2-1 add 975eb91b21 [HUDI-5357] Fix release build commands (#7501) add aea5bb6f0a [MINOR] Update release version to reflect published version 0.12.2 No new revisions were added by this update. Summary of changes: docker/hoodie/hadoop/base/pom.xml | 2 +- docker/hoodie/hadoop/base_java11/pom.xml | 2 +- docker/hoodie/hadoop/datanode/pom.xml | 2 +- docker/hoodie/hadoop/historyserver/pom.xml | 2 +- docker/hoodie/hadoop/hive_base/pom.xml | 2 +- docker/hoodie/hadoop/namenode/pom.xml | 2 +- docker/hoodie/hadoop/pom.xml | 2 +- docker/hoodie/hadoop/prestobase/pom.xml| 2 +- docker/hoodie/hadoop/spark_base/pom.xml| 2 +- docker/hoodie/hadoop/sparkadhoc/pom.xml| 2 +- docker/hoodie/hadoop/sparkmaster/pom.xml | 2 +- docker/hoodie/hadoop/sparkworker/pom.xml | 2 +- docker/hoodie/hadoop/trinobase/pom.xml | 2 +- docker/hoodie/hadoop/trinocoordinator/pom.xml | 2 +- docker/hoodie/hadoop/trinoworker/pom.xml | 2 +- hudi-aws/pom.xml | 4 +-- hudi-cli/pom.xml | 2 +- hudi-client/hudi-client-common/pom.xml | 4 +-- hudi-client/hudi-flink-client/pom.xml | 4 +-- hudi-client/hudi-java-client/pom.xml | 4 +-- hudi-client/hudi-spark-client/pom.xml | 4 +-- hudi-client/pom.xml| 2 +- hudi-common/pom.xml| 2 +- hudi-examples/hudi-examples-common/pom.xml | 2 +- hudi-examples/hudi-examples-flink/pom.xml | 2 +- hudi-examples/hudi-examples-java/pom.xml | 2 +- hudi-examples/hudi-examples-spark/pom.xml | 2 +- hudi-examples/pom.xml | 2 +- hudi-flink-datasource/hudi-flink/pom.xml | 4 +-- hudi-flink-datasource/hudi-flink1.13.x/pom.xml | 4 +-- hudi-flink-datasource/hudi-flink1.14.x/pom.xml | 4 +-- hudi-flink-datasource/hudi-flink1.15.x/pom.xml | 4 +-- hudi-flink-datasource/pom.xml | 4 +-- hudi-gcp/pom.xml | 2 +- hudi-hadoop-mr/pom.xml | 2 +- hudi-integ-test/pom.xml| 2 +- hudi-kafka-connect/pom.xml | 4 +-- hudi-spark-datasource/hudi-spark-common/pom.xml| 4 +-- hudi-spark-datasource/hudi-spark/pom.xml | 4 +-- hudi-spark-datasource/hudi-spark2-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark2/pom.xml | 4 +-- hudi-spark-datasource/hudi-spark3-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark3.1.x/pom.xml | 4 +-- hudi-spark-datasource/hudi-spark3.2.x/pom.xml | 4 +-- .../hudi-spark3.2plus-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark3.3.x/pom.xml | 4 +-- hudi-spark-datasource/pom.xml | 2 +- hudi-sync/hudi-adb-sync/pom.xml| 2 +- hudi-sync/hudi-datahub-sync/pom.xml| 2 +- hudi-sync/hudi-hive-sync/pom.xml | 2 +- hudi-sync/hudi-sync-common/pom.xml | 2 +- hudi-sync/pom.xml | 2 +- hudi-tests-common/pom.xml | 2 +- hudi-timeline-service/pom.xml | 2 +- hudi-utilities/pom.xml | 2 +- packaging/hudi-aws-bundle/pom.xml | 2 +- packaging/hudi-datahub-sync-bundle/pom.xml | 2 +- packaging/hudi-flink-bundle/pom.xml| 2 +- packaging/hudi-gcp-bundle/pom.xml | 2 +- packaging/hudi-hadoop-mr-bundle/pom.xml| 2 +- packaging/hudi-hive-sync-bundle/pom.xml| 2 +- packaging/hudi-integ-test-bundle/pom.xml | 2 +- packaging/hudi-kafka-connect-bundle/pom.xml| 2 +- packaging/hudi-presto-bundle/pom.xml | 2 +- packaging/hudi-spark-bundle/pom.xml| 2 +- packaging/hudi-timeline-server-bundle/pom.xml | 2 +- packaging/hudi-trino-bundle/pom.xml| 2 +- packaging/hudi-utilities-bundle/pom.xml| 2 +- packaging/hudi-utilities-slim-bundle/pom.xml | 2 +- pom.xml| 2 +- scripts/release/deploy_staging_jars.sh | 37 ++ scripts/release/validate_staged_bundles.sh | 4 +-- 
72 files changed, 111 insertions(+), 104 deletions(-)
[hudi] annotated tag release-0.12.2 updated (aea5bb6f0a -> db9e7e8830)
This is an automated email from the ASF dual-hosted git repository. satish pushed a change to annotated tag release-0.12.2 in repository https://gitbox.apache.org/repos/asf/hudi.git *** WARNING: tag release-0.12.2 was modified! *** from aea5bb6f0a (commit) to db9e7e8830 (tag) tagging aea5bb6f0ab824247f5e3498762ad94f643a2cb6 (commit) replaces release-0.12.2-rc1 by Satish Kotha on Sat Dec 24 15:51:59 2022 -0800 - Log - 0.12.2 -BEGIN PGP SIGNATURE- iQIzBAABCAAdFiEEbaCzmhPCZY0irn0U0IxLa9mOplkFAmOnkJ8ACgkQ0IxLa9mO pllbCg/+MsCqEWauNhqd6VjY3+eP/Ii1Un6/7xP30dbMMuMMOIFW5MrPjAO1ceRM 6jzizpp/TKSRJ8JtHLU/cF36H4v3jt8VrUjGbAX+HAhiDUSo5q+n/fivZKlXNFtZ BXu+CqiTMC1eZRKAcx9Yo9B4wxpIDX3VMXVo9Pjwheg7PzZlBUgrI8zDu51v0qUI IQahgUxeQKlABEd11G1m9o6bANw/KfMl2bKRxn/ZbUntX61oiwxGYlQF95M09n8f aWj2BaigYN3wk0csUO326mPxXJz126Xx6A7kDiXu0yNpg2WMB4k+xTB3WIodnXC/ 9cWP7l2/yLe4YfCDAraJgAeNxUTGl9t2dijieSVwgTfmx/XOKGWejSI6JGW1XYiH jHYzYnY4n2sMnIgLk+5p8TIdTxR2JyLn9hI1hzcNhABMQFVUlxUH9qKP7aLMoHd6 PMmsfOEIhFscG8H6rG8YJnsEqffFjRideFvqdvegCtp5m5577NyMy1wGFf09+QHj iC/CXas3gN3YYOk6j5/bbPzJCsDDh1kAnbwl7yPTMFibeAgX0bCWV9BgEIALq+/c uH4DIEKof+z6uxNO244kyGJp6GWCCnLTQBQNmne7DTiJcJvceAYyCT+f6TRPw7/8 Q1+2xyg5AGL1IBBmmIUwmcWKk40OFGTIWijESVDVae1+mdPrdLY= =kFeM -END PGP SIGNATURE- --- No new revisions were added by this update. Summary of changes:
[jira] [Updated] (HUDI-5022) Add better error messages to pr compliance
[ https://issues.apache.org/jira/browse/HUDI-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5022: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add better error messages to pr compliance > -- > > Key: HUDI-5022 > URL: https://issues.apache.org/jira/browse/HUDI-5022 > Project: Apache Hudi > Issue Type: Bug > Components: code-quality, dev-experience, docs, tests-ci >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > > When the pr compliance fails, the messages could be more helpful to users -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4970) hudi-kafka-connect-bundle: Could not initialize class org.apache.hadoop.security.UserGroupInformation
[ https://issues.apache.org/jira/browse/HUDI-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4970: - Fix Version/s: 0.13.0 (was: 0.12.2) > hudi-kafka-connect-bundle: Could not initialize class > org.apache.hadoop.security.UserGroupInformation > - > > Key: HUDI-4970 > URL: https://issues.apache.org/jira/browse/HUDI-4970 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > The Kafka connect sink loads successfully but fails to sync Hudi table due to > NoClassDefFoundError: Could not initialize class > org.apache.hadoop.security.UserGroupInformation > {code:java} > [2022-10-03 14:31:49,872] INFO The value of > hoodie.datasource.write.keygenerator.type is empty, using SIMPLE > (org.apache.hudi.keygen.factory.HoodieAvroKeyGeneratorFactory:63)[2022-10-03 > 14:31:49,872] INFO Setting record key volume and partition fields date for > table file:///tmp/hoodie/hudi-test-topichudi-test-topic > (org.apache.hudi.connect.writers.KafkaConnectTransactionServices:93)[2022-10-03 > 14:31:49,872] INFO Initializing file:///tmp/hoodie/hudi-test-topic as hoodie > table file:///tmp/hoodie/hudi-test-topic > (org.apache.hudi.common.table.HoodieTableMetaClient:424)[2022-10-03 > 14:31:49,872] INFO Existing partitions deleted [hudi-test-topic-0] > (org.apache.hudi.connect.HoodieSinkTask:156)[2022-10-03 14:31:49,872] ERROR > WorkerSinkTask{id=hudi-sink-3} Task threw an uncaught and unrecoverable > exception. Task is being killed and will not recover until manually restarted > (org.apache.kafka.connect.runtime.WorkerTask:184)java.lang.NoClassDefFoundError: > Could not initialize class org.apache.hadoop.security.UserGroupInformation > at org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:3431) > at > org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:3421) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3263) at > org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475) at > org.apache.hadoop.fs.Path.getFileSystem(Path.java:356) at > org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:110)at > org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:103)at > org.apache.hudi.common.table.HoodieTableMetaClient.initTableAndGetMetaClient(HoodieTableMetaClient.java:426) > at > org.apache.hudi.common.table.HoodieTableMetaClient$PropertyBuilder.initTable(HoodieTableMetaClient.java:1110) > at > org.apache.hudi.connect.writers.KafkaConnectTransactionServices.(KafkaConnectTransactionServices.java:104) > at > org.apache.hudi.connect.transaction.ConnectTransactionCoordinator.(ConnectTransactionCoordinator.java:88) > at > org.apache.hudi.connect.HoodieSinkTask.bootstrap(HoodieSinkTask.java:191) > at org.apache.hudi.connect.HoodieSinkTask.open(HoodieSinkTask.java:151) at > org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:635) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.access$1000(WorkerSinkTask.java:71){code} > Follow [https://github.com/apache/hudi/tree/master/hudi-kafka-connect#readme] > to reproduce. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5285) Exclude hive-site.xml from packaging in hudi-utilities
[ https://issues.apache.org/jira/browse/HUDI-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5285: - Fix Version/s: 0.13.0 (was: 0.12.2) > Exclude hive-site.xml from packaging in hudi-utilities > -- > > Key: HUDI-5285 > URL: https://issues.apache.org/jira/browse/HUDI-5285 > Project: Apache Hudi > Issue Type: Task >Reporter: Sagar Sumit >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > the spark cluster can fail to access the external hive source normally due to > conflict with hive-site.xml packaged with hudi -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4963) Extend InProcessLockProvider to support multiple table ingestion
[ https://issues.apache.org/jira/browse/HUDI-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4963: - Fix Version/s: 0.13.0 (was: 0.12.2) > Extend InProcessLockProvider to support multiple table ingestion > > > Key: HUDI-4963 > URL: https://issues.apache.org/jira/browse/HUDI-4963 > Project: Apache Hudi > Issue Type: Task >Reporter: Rajesh Mahindra >Assignee: Rajesh Mahindra >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5404) add flink bundle validation
[ https://issues.apache.org/jira/browse/HUDI-5404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5404: - Fix Version/s: 0.13.0 (was: 0.12.2) > add flink bundle validation > --- > > Key: HUDI-5404 > URL: https://issues.apache.org/jira/browse/HUDI-5404 > Project: Apache Hudi > Issue Type: Test > Components: tests-ci >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > Make flink bundles validated via GitHub actions CI -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4605) Upgrade hudi-presto-bundle version to 0.12.0
[ https://issues.apache.org/jira/browse/HUDI-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4605: - Fix Version/s: 0.13.0 (was: 0.12.2) > Upgrade hudi-presto-bundle version to 0.12.0 > > > Key: HUDI-4605 > URL: https://issues.apache.org/jira/browse/HUDI-4605 > Project: Apache Hudi > Issue Type: Task >Reporter: Sagar Sumit >Assignee: Ethan Guo >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5145) Remove HDFS from DeltaStreamer UT/FT
[ https://issues.apache.org/jira/browse/HUDI-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5145: - Fix Version/s: 0.13.0 (was: 0.12.2) > Remove HDFS from DeltaStreamer UT/FT > > > Key: HUDI-5145 > URL: https://issues.apache.org/jira/browse/HUDI-5145 > Project: Apache Hudi > Issue Type: Test >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5131) Bundle validation: upgrade/downgrade
[ https://issues.apache.org/jira/browse/HUDI-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5131: - Fix Version/s: 0.13.0 (was: 0.12.2) > Bundle validation: upgrade/downgrade > > > Key: HUDI-5131 > URL: https://issues.apache.org/jira/browse/HUDI-5131 > Project: Apache Hudi > Issue Type: Test > Components: tests-ci >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5132) Bundle validation: Hive QL 3
[ https://issues.apache.org/jira/browse/HUDI-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5132: - Fix Version/s: 0.13.0 (was: 0.12.2) > Bundle validation: Hive QL 3 > > > Key: HUDI-5132 > URL: https://issues.apache.org/jira/browse/HUDI-5132 > Project: Apache Hudi > Issue Type: Test > Components: dependencies >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5371) Fix flaky testMetadataColumnStatsIndex
[ https://issues.apache.org/jira/browse/HUDI-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5371: - Fix Version/s: 0.13.0 (was: 0.12.2) > Fix flaky testMetadataColumnStatsIndex > -- > > Key: HUDI-5371 > URL: https://issues.apache.org/jira/browse/HUDI-5371 > Project: Apache Hudi > Issue Type: Test >Reporter: Sagar Sumit >Priority: Major > Fix For: 0.13.0 > > > The test started flaking after [https://github.com/apache/hudi/pull/7349] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5099) Update stock data so that new records are added in batch_2
[ https://issues.apache.org/jira/browse/HUDI-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5099: - Fix Version/s: 0.13.0 (was: 0.12.2) > Update stock data so that new records are added in batch_2 > -- > > Key: HUDI-5099 > URL: https://issues.apache.org/jira/browse/HUDI-5099 > Project: Apache Hudi > Issue Type: Test >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > The record key is "\{stock name}_\{date} \{hour}". We have the data from > 9:30-10:29 in batch_1 and batch_2 contains data from 10:30-10:59. This means > that no new records are introduced, and therefore, only updates occur when > ingesting batch_2. This makes validation of the data take too long for our > testing. Proposed solution is to move the data from 10:00-10:29 into batch_2 > so that we will have updates and inserts in both files -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5200) Resources are not cleaned up in UT
[ https://issues.apache.org/jira/browse/HUDI-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5200: - Fix Version/s: 0.13.0 (was: 0.12.2) > Resources are not cleaned up in UT > -- > > Key: HUDI-5200 > URL: https://issues.apache.org/jira/browse/HUDI-5200 > Project: Apache Hudi > Issue Type: Test > Components: tests-ci >Reporter: zouxxyy >Assignee: zouxxyy >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Resources are not cleaned up at UT -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4209) Avoid using HDFS in HoodieClientTestHarness
[ https://issues.apache.org/jira/browse/HUDI-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4209: - Fix Version/s: 0.13.0 (was: 0.12.2) > Avoid using HDFS in HoodieClientTestHarness > --- > > Key: HUDI-4209 > URL: https://issues.apache.org/jira/browse/HUDI-4209 > Project: Apache Hudi > Issue Type: Test > Components: tests-ci >Reporter: Sagar Sumit >Assignee: Raymond Xu >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4982) Make bundle combination testing covered in CI
[ https://issues.apache.org/jira/browse/HUDI-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4982: - Fix Version/s: 0.13.0 (was: 0.12.2) > Make bundle combination testing covered in CI > - > > Key: HUDI-4982 > URL: https://issues.apache.org/jira/browse/HUDI-4982 > Project: Apache Hudi > Issue Type: Test >Reporter: Raymond Xu >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > this is to cover > - spark-bundle > - utilities-bundle > - utilities-slim-bundle -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5098) Enable Spark2.4 bundle testing in GH Actions
[ https://issues.apache.org/jira/browse/HUDI-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5098: - Fix Version/s: 0.13.0 (was: 0.12.2) > Enable Spark2.4 bundle testing in GH Actions > > > Key: HUDI-5098 > URL: https://issues.apache.org/jira/browse/HUDI-5098 > Project: Apache Hudi > Issue Type: Test >Reporter: Jonathan Vexler >Priority: Major > Fix For: 0.13.0 > > > Bundle testing works for 3.1,3.2,3.3, but there was a hive setup issue that > wasn't being handled properly. Because we have azure-ci running with 2.4, we > decided to resolve this issue in the future -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-2673) Add integration/e2e test for kafka-connect functionality
[ https://issues.apache.org/jira/browse/HUDI-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-2673: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add integration/e2e test for kafka-connect functionality > > > Key: HUDI-2673 > URL: https://issues.apache.org/jira/browse/HUDI-2673 > Project: Apache Hudi > Issue Type: Test > Components: kafka-connect, tests-ci >Reporter: Ethan Guo >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > The integration test should use bundle jar and run in docker setup. This can > prevent any issue in the bundle, like HUDI-3903, that is not covered by unit > and functional tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5358) Fix flaky tests in TestCleanerInsertAndCleanByCommits
[ https://issues.apache.org/jira/browse/HUDI-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5358: - Fix Version/s: 0.13.0 (was: 0.12.2) > Fix flaky tests in TestCleanerInsertAndCleanByCommits > - > > Key: HUDI-5358 > URL: https://issues.apache.org/jira/browse/HUDI-5358 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > In the tests, the {{KEEP_LATEST_COMMITS}} cleaner policy is used. This policy > first figures out the earliest commit to retain based on the config of the > number of retained commits ({{{}hoodie.cleaner.commits.retained{}}}). Then, > for each file group, one more version before the earliest commit to retain is > also kept from cleaning. The commit for the version can be different among > file groups. > However, the current validation logic only statically picks the one commit > before the earliest commit to retain in the Hudi timeline for all file > groups, which does not match the {{KEEP_LATEST_COMMITS}} cleaner policy. -- This message was sent by Atlassian Jira (v8.20.10#820010)
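As a reading aid, a simplified sketch of the per-file-group retention the policy describes; the types and instant-time handling are stand-ins, not Hudi's cleaner code:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class KeepLatestCommitsSketch {

  // versions: commit times of all slices of one file group, in any order.
  // Returns the commit times that must be kept for that file group.
  static List<String> retainedVersions(List<String> versions, String earliestCommitToRetain) {
    List<String> sorted = new ArrayList<>(versions);
    sorted.sort(Comparator.reverseOrder()); // newest first; instant times sort lexicographically
    List<String> retained = new ArrayList<>();
    boolean keptOneBeforeWindow = false;
    for (String commit : sorted) {
      if (commit.compareTo(earliestCommitToRetain) >= 0) {
        retained.add(commit); // inside the retained window
      } else if (!keptOneBeforeWindow) {
        retained.add(commit); // one extra version before the window, per file group
        keptOneBeforeWindow = true;
      }
    }
    return retained;
  }
}
{code}

The extra version kept before the window can differ per file group, which is why a validation that picks a single commit from the timeline for all file groups does not match the policy.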
[jira] [Updated] (HUDI-5330) Add docs for virtual keys
[ https://issues.apache.org/jira/browse/HUDI-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5330: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add docs for virtual keys > - > > Key: HUDI-5330 > URL: https://issues.apache.org/jira/browse/HUDI-5330 > Project: Apache Hudi > Issue Type: Improvement > Components: docs >Reporter: Ethan Guo >Priority: Major > Fix For: 0.13.0 > > > Currently, the virtual key support is only presented in a blog: > [https://hudi.apache.org/blog/2021/08/18/virtual-keys/#virtual-key-support.] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5339) Update docs regarding the behavior change in NONE sort mode for bulk insert
[ https://issues.apache.org/jira/browse/HUDI-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5339: - Fix Version/s: 0.13.0 (was: 0.12.2) > Update docs regarding the behavior change in NONE sort mode for bulk insert > --- > > Key: HUDI-5339 > URL: https://issues.apache.org/jira/browse/HUDI-5339 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5295) With multiple meta syncs, one meta sync failure should not impact other meta syncs.
[ https://issues.apache.org/jira/browse/HUDI-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5295: - Fix Version/s: 0.13.0 (was: 0.12.2) > With multiple meta syncs, one meta sync failure should not impact other meta > syncs. > --- > > Key: HUDI-5295 > URL: https://issues.apache.org/jira/browse/HUDI-5295 > Project: Apache Hudi > Issue Type: Improvement > Components: deltastreamer, meta-sync, spark-sql >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > For example, if you are using HMS and glue, if HMS sync fails, we should > still sync with glue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5343) HoodieFlinkStreamer supports async clustering for append mode
[ https://issues.apache.org/jira/browse/HUDI-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5343: - Fix Version/s: 0.13.0 (was: 0.12.2) > HoodieFlinkStreamer supports async clustering for append mode > - > > Key: HUDI-5343 > URL: https://issues.apache.org/jira/browse/HUDI-5343 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > > HoodieFlinkStreamer supports async clustering for append mode, which keeps > the consistent with the pipeline of HoodieTableSink. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5292) Exclude the test resources from every module packaging
[ https://issues.apache.org/jira/browse/HUDI-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5292: - Fix Version/s: 0.13.0 (was: 0.12.2) > Exclude the test resources from every module packaging > -- > > Key: HUDI-5292 > URL: https://issues.apache.org/jira/browse/HUDI-5292 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Sagar Sumit >Priority: Major > Fix For: 0.13.0 > > > Exclude the test resources, especially the properties files that conflict > with user-provided resources, from every module. This is a followup to > https://github.com/apache/hudi/pull/7310#issuecomment-1328728297 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5294) Support type change for schema on read enable + reconcile schema
[ https://issues.apache.org/jira/browse/HUDI-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5294: - Fix Version/s: 0.13.0 (was: 0.12.2) > Support type change for schema on read enable + reconcile schema > > > Key: HUDI-5294 > URL: https://issues.apache.org/jira/browse/HUDI-5294 > Project: Apache Hudi > Issue Type: Improvement > Components: core >Reporter: Tao Meng >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > https://github.com/apache/hudi/issues/7283 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5283) Replace deprecated method Schema.parse with Schema.Parser
[ https://issues.apache.org/jira/browse/HUDI-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5283: - Fix Version/s: (was: 0.12.2) > Replace deprecated method Schema.parse with Schema.Parser > - > > Key: HUDI-5283 > URL: https://issues.apache.org/jira/browse/HUDI-5283 > Project: Apache Hudi > Issue Type: Improvement > Components: cli >Reporter: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > When reading the code, I found that HoodieBootstrapSchemaProvider#getBootstrapSchema uses the deprecated method Schema.parse, which can be replaced by Schema.Parser().parse(). Searching at the module level, this is the only place that uses the deprecated method. -- This message was sent by Atlassian Jira (v8.20.10#820010)
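For reference, the deprecated Avro call and its recommended replacement; the schema JSON here is only a placeholder:

{code:java}
import org.apache.avro.Schema;

public class SchemaParseExample {

  private static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}";

  public static void main(String[] args) {
    // Deprecated:
    // Schema legacy = Schema.parse(SCHEMA_JSON);

    // Recommended replacement:
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    System.out.println(schema.getName()); // Rec
  }
}
{code}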
[jira] [Updated] (HUDI-5293) Schema on read + reconcile schema fails w/ 0.12.1
[ https://issues.apache.org/jira/browse/HUDI-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5293: - Fix Version/s: 0.13.0 (was: 0.12.2) > Schema on read + reconcile schema fails w/ 0.12.1 > - > > Key: HUDI-5293 > URL: https://issues.apache.org/jira/browse/HUDI-5293 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > if I do schema on read on commit1 and then schema on read + reconcile schema > for 2nd batch, it fails w/ > {code:java} > warning: there was one deprecation warning; re-run with -deprecation for > details > 22/11/28 16:44:26 ERROR BaseSparkCommitActionExecutor: Error upserting > bucketType UPDATE for partition :2 > java.lang.IllegalArgumentException: cannot modify hudi meta col: > _hoodie_commit_time > at > org.apache.hudi.internal.schema.action.TableChange$BaseColumnChange.checkColModifyIsLegal(TableChange.java:157) > at > org.apache.hudi.internal.schema.action.TableChanges$ColumnAddChange.addColumns(TableChanges.java:314) > at > org.apache.hudi.internal.schema.utils.AvroSchemaEvolutionUtils.lambda$reconcileSchema$5(AvroSchemaEvolutionUtils.java:92) > at > java.util.TreeMap$EntrySpliterator.forEachRemaining(TreeMap.java:2969) > at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) > at > org.apache.hudi.internal.schema.utils.AvroSchemaEvolutionUtils.reconcileSchema(AvroSchemaEvolutionUtils.java:80) > at > org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:103) > at > org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:358) > at > org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:349) > at > org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322) > at > org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:310) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346) > at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359) > at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357) > at > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182) > at > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156) > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091) > at > org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156) > at > org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882) > at 
org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:308) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:310) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:123) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(Thre
[jira] [Updated] (HUDI-5258) Address checkstyle warnings in hudi-common module
[ https://issues.apache.org/jira/browse/HUDI-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5258: - Fix Version/s: 0.13.0 (was: 0.12.2) > Address checkstyle warnings in hudi-common module > - > > Key: HUDI-5258 > URL: https://issues.apache.org/jira/browse/HUDI-5258 > Project: Apache Hudi > Issue Type: Improvement > Components: dev-experience >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5261) Use proper parallelism for engine context APIs
[ https://issues.apache.org/jira/browse/HUDI-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5261: - Fix Version/s: 0.13.0 (was: 0.12.2) > Use proper parallelism for engine context APIs > -- > > Key: HUDI-5261 > URL: https://issues.apache.org/jira/browse/HUDI-5261 > Project: Apache Hudi > Issue Type: Improvement > Components: performance >Reporter: Raymond Xu >Assignee: Jonathan Vexler >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > Do a global search of these APIs > - org.apache.hudi.common.engine.HoodieEngineContext#flatMap > - org.apache.hudi.common.engine.HoodieEngineContext#map > and similar ones that take in a parallelism argument. > Many occurrences use the number of items as the parallelism, which affects performance. Parallelism should be based on the number of cores available in the cluster and set by the user via parallelism configs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
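A minimal sketch of the intent, assuming the user-supplied parallelism config acts as the ceiling; the method and parameter names are hypothetical:

{code:java}
import java.util.List;

class ParallelismSketch {

  // Cap the parallelism at the user-configured value instead of items.size(),
  // so a table with thousands of partitions does not spawn thousands of tasks.
  static int boundedParallelism(List<String> items, int configuredParallelism) {
    return Math.max(1, Math.min(items.size(), configuredParallelism));
  }
}
{code}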
[jira] [Updated] (HUDI-5269) Enhancing core user flow tests for spark-sql writes
[ https://issues.apache.org/jira/browse/HUDI-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5269: - Fix Version/s: 0.13.0 (was: 0.12.2) > Enhancing core user flow tests for spark-sql writes > --- > > Key: HUDI-5269 > URL: https://issues.apache.org/jira/browse/HUDI-5269 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql, tests-ci >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > We triaged some of the core user flows and it looks like we don't have good coverage for those flows. > > Table/index combinations to cover: COW and MOR (w/ and w/o metadata enabled); partitioned (BLOOM, SIMPLE, GLOBAL_BLOOM, BUCKET) and non-partitioned (GLOBAL_BLOOM). > Write flows to cover: > 1. Immutable data: pure bulk_insert row writing. > 2. Immutable data w/ file sizing: pure inserts. > 3. Initial bulk ingest followed by updates: bulk_insert followed by upserts. > 4. Regular inserts + updates combined. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5252) ClusteringCommitSink supports to rollback clustering
[ https://issues.apache.org/jira/browse/HUDI-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5252: - Fix Version/s: (was: 0.12.2) > ClusteringCommitSink supports to rollback clustering > > > Key: HUDI-5252 > URL: https://issues.apache.org/jira/browse/HUDI-5252 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > When commit buffer has failed ClusteringCommitEvent, the ClusteringCommitSink > invokes the CompactionUtil#rollbackCompaction to rollback clustering. > ClusteringCommitSink should call ClusteringUtil#rollbackClustering to > rollback clustering. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5246) Improve validation for partition path
[ https://issues.apache.org/jira/browse/HUDI-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5246: - Fix Version/s: (was: 0.12.2) > Improve validation for partition path > - > > Key: HUDI-5246 > URL: https://issues.apache.org/jira/browse/HUDI-5246 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Raymond Xu >Assignee: Hemanth Gowda >Priority: Minor > Labels: hudi-on-call, new-to-hudi, pull-request-available > > To fail early if absolute path is set for partition (e.g. with leading `/`) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5241) Optimize HoodieDefaultTimeline API
[ https://issues.apache.org/jira/browse/HUDI-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5241: - Fix Version/s: 0.13.0 (was: 0.12.2) > Optimize HoodieDefaultTimeline API > -- > > Key: HUDI-5241 > URL: https://issues.apache.org/jira/browse/HUDI-5241 > Project: Apache Hudi > Issue Type: Improvement > Components: core >Reporter: Yann Byron >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5246) Improve validation for partition path
[ https://issues.apache.org/jira/browse/HUDI-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5246: - Fix Version/s: 0.13.0 > Improve validation for partition path > - > > Key: HUDI-5246 > URL: https://issues.apache.org/jira/browse/HUDI-5246 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Raymond Xu >Assignee: Hemanth Gowda >Priority: Minor > Labels: hudi-on-call, new-to-hudi, pull-request-available > Fix For: 0.13.0 > > > To fail early if absolute path is set for partition (e.g. with leading `/`) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5198) add in minor perf wins in hudi-utilities and locking related tests
[ https://issues.apache.org/jira/browse/HUDI-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5198: - Fix Version/s: 0.13.0 (was: 0.12.2) > add in minor perf wins in hudi-utilities and locking related tests > -- > > Key: HUDI-5198 > URL: https://issues.apache.org/jira/browse/HUDI-5198 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5234) Streaming read skip clustering instants Configurable
[ https://issues.apache.org/jira/browse/HUDI-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5234: - Fix Version/s: (was: 0.12.2) > Streaming read skip clustering instants Configurable > > > Key: HUDI-5234 > URL: https://issues.apache.org/jira/browse/HUDI-5234 > Project: Apache Hudi > Issue Type: Improvement > Components: clustering >Reporter: zhuanshenbsj1 >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5167) Reduce test run time for virtual key tests
[ https://issues.apache.org/jira/browse/HUDI-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5167: - Fix Version/s: 0.13.0 (was: 0.12.2) > Reduce test run time for virtual key tests > -- > > Key: HUDI-5167 > URL: https://issues.apache.org/jira/browse/HUDI-5167 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > We did parametrized for quite a few tests when we added virtual keys. some of > them may not be required. so lets revisit them and reduce whereever > applicable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5181) Enhance keygen class validation
[ https://issues.apache.org/jira/browse/HUDI-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5181: - Fix Version/s: 0.13.0 (was: 0.12.2) > Enhance keygen class validation > --- > > Key: HUDI-5181 > URL: https://issues.apache.org/jira/browse/HUDI-5181 > Project: Apache Hudi > Issue Type: Improvement > Components: configs >Reporter: Raymond Xu >Priority: Major > Fix For: 0.13.0 > > > Some in-code validations can be added to alert users early when they set keygen configs improperly. For example, in TimestampBased keygen, the output format cannot be empty. > We should audit all built-in keygen classes and add UTs and proper validations. This improves usability and saves troubleshooting time when misconfiguration happens. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5166) Reduce test run time for top time consuming tests
[ https://issues.apache.org/jira/browse/HUDI-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5166: - Fix Version/s: 0.13.0 (was: 0.12.2) > Reduce test run time for top time consuming tests > - > > Key: HUDI-5166 > URL: https://issues.apache.org/jira/browse/HUDI-5166 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5178) Add Call show_table_properties for spark sql
[ https://issues.apache.org/jira/browse/HUDI-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5178: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add Call show_table_properties for spark sql > > > Key: HUDI-5178 > URL: https://issues.apache.org/jira/browse/HUDI-5178 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Forward Xu >Assignee: Forward Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5162) Allow user specified start offset for streaming query
[ https://issues.apache.org/jira/browse/HUDI-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5162: - Fix Version/s: 0.13.0 (was: 0.12.2) > Allow user specified start offset for streaming query > - > > Key: HUDI-5162 > URL: https://issues.apache.org/jira/browse/HUDI-5162 > Project: Apache Hudi > Issue Type: Improvement > Components: reader-core, spark >Reporter: Hui An >Assignee: Hui An >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Add new configure: hoodie.datasource.streaming.startOffset to allow users to > specify start offset for streaming query -- This message was sent by Atlassian Jira (v8.20.10#820010)
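A usage sketch with the Spark Java API; the option key comes from this ticket, while the table path and the instant-time value are placeholders:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HudiStreamingReadSketch {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder().appName("hudi-stream-read").getOrCreate();

    Dataset<Row> stream = spark.readStream()
        .format("hudi")
        // Start consuming from a specific instant time instead of the default start offset.
        .option("hoodie.datasource.streaming.startOffset", "20221201000000")
        .load("/path/to/hudi_table");

    stream.writeStream()
        .format("console")
        .start()
        .awaitTermination();
  }
}
{code}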
[jira] [Updated] (HUDI-5112) Add presto query validation support for all tests in integ tests
[ https://issues.apache.org/jira/browse/HUDI-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5112: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add presto query validation support for all tests in integ tests > > > Key: HUDI-5112 > URL: https://issues.apache.org/jira/browse/HUDI-5112 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5113) Add support to test different indexes with integ test
[ https://issues.apache.org/jira/browse/HUDI-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5113: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add support to test different indexes with integ test > - > > Key: HUDI-5113 > URL: https://issues.apache.org/jira/browse/HUDI-5113 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5060) Make all clean policies support incremental mode to find partition paths
[ https://issues.apache.org/jira/browse/HUDI-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5060: - Fix Version/s: (was: 0.12.2) > Make all clean policies support incremental mode to find partition paths > > > Key: HUDI-5060 > URL: https://issues.apache.org/jira/browse/HUDI-5060 > Project: Apache Hudi > Issue Type: Improvement > Components: cleaning >Reporter: 董可伦 >Assignee: 董可伦 >Priority: Major > Labels: pull-request-available > > Make all clean policies support incremental mode to find partition paths -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5072) Extract transform duplicate code
[ https://issues.apache.org/jira/browse/HUDI-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5072: - Fix Version/s: 0.13.0 (was: 0.12.2) > Extract transform duplicate code > > > Key: HUDI-5072 > URL: https://issues.apache.org/jira/browse/HUDI-5072 > Project: Apache Hudi > Issue Type: Improvement > Components: cli >Reporter: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > When reading the code, I found that the transform methods of > MultipleSparkJobExecutionStrategy and SingleSparkJobExecutionStrategy have > redundant code. I think we can extract them to make the code cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5052) Update 0.12.0 docs for regression
[ https://issues.apache.org/jira/browse/HUDI-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5052: - Fix Version/s: 0.13.0 (was: 0.12.2) > Update 0.12.0 docs for regression > - > > Key: HUDI-5052 > URL: https://issues.apache.org/jira/browse/HUDI-5052 > Project: Apache Hudi > Issue Type: Improvement > Components: docs >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5051) Add a functional regression test for Bloom Index followed on w/ Upserts
[ https://issues.apache.org/jira/browse/HUDI-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5051: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add a functional regression test for Bloom Index followed on w/ Upserts > --- > > Key: HUDI-5051 > URL: https://issues.apache.org/jira/browse/HUDI-5051 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: Alexey Kudinkin >Assignee: Jonathan Vexler >Priority: Blocker > Fix For: 0.13.0 > > > In the test > * State is initially bootstrapped by Bulk Insert (row-writing) > * Follow-up w/ upserts -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5035) Remove deprecated API usage in SparkPreCommitValidator#validate
[ https://issues.apache.org/jira/browse/HUDI-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5035: - Fix Version/s: 0.13.0 (was: 0.12.2) > Remove deprecated API usage in SparkPreCommitValidator#validate > --- > > Key: HUDI-5035 > URL: https://issues.apache.org/jira/browse/HUDI-5035 > Project: Apache Hudi > Issue Type: Improvement > Components: cli >Reporter: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Attachments: image-2022-10-15-07-23-43-689.png > > > I found that the code uses a deprecated API; modify the code to use the > recommended API > > !image-2022-10-15-07-23-43-689.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5032) Add Archiving to the CLI
[ https://issues.apache.org/jira/browse/HUDI-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5032: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add Archiving to the CLI > > > Key: HUDI-5032 > URL: https://issues.apache.org/jira/browse/HUDI-5032 > Project: Apache Hudi > Issue Type: Improvement > Components: archiving, cli >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4990) Parallelize deduplication in CLI tool
[ https://issues.apache.org/jira/browse/HUDI-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4990: - Fix Version/s: 0.13.0 (was: 0.12.2) > Parallelize deduplication in CLI tool > - > > Key: HUDI-4990 > URL: https://issues.apache.org/jira/browse/HUDI-4990 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: sivabalan narayanan >Priority: Minor > Fix For: 0.13.0 > > > The CLI tool command `repair deduplicate` repairs one partition at a time. To > repair hundreds of partitions, this takes time. We should add a mode to take > multiple partition paths for the CLI and run the dedup job for multiple > partitions at the same time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5018) Make user-provided copyOnWriteRecordSizeEstimate first precedence
[ https://issues.apache.org/jira/browse/HUDI-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5018: - Fix Version/s: 0.13.0 (was: 0.12.2) > Make user-provided copyOnWriteRecordSizeEstimate first precedence > - > > Key: HUDI-5018 > URL: https://issues.apache.org/jira/browse/HUDI-5018 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: Raymond Xu >Assignee: xi chaomin >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > For the estimated avg record size > https://hudi.apache.org/docs/configurations/#hoodiecopyonwriterecordsizeestimate > which is used here > https://github.com/apache/hudi/blob/86a1efbff1300603a8180111eae117c7f9dbd8a5/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java#L372 > Propose to respect the user setting by following the precedence below > 1) if the user sets a value, then use it as is > 2) if the user does not set it, infer it from timeline commit metadata > 3) if the timeline is empty, use a default (current: 1024) -- This message was sent by Atlassian Jira (v8.20.10#820010)
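A hedged sketch (not the actual UpsertPartitioner code) of the proposed precedence; the helper name and parameters are hypothetical and only illustrate the three steps above.
{code:scala}
// Hedged sketch of the proposed precedence; the names below are hypothetical
// and do not correspond to Hudi's actual API.
def estimateRecordSize(userProvided: Option[Long],
                       timelineEstimate: Option[Long],
                       defaultEstimate: Long = 1024L): Long =
  userProvided                  // 1) a user-set value wins
    .orElse(timelineEstimate)   // 2) otherwise infer from commit metadata
    .getOrElse(defaultEstimate) // 3) otherwise fall back to the default
{code}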
[jira] [Updated] (HUDI-4967) Improve docs for meta sync with TimestampBasedKeyGenerator
[ https://issues.apache.org/jira/browse/HUDI-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4967: - Fix Version/s: 0.13.0 (was: 0.12.2) > Improve docs for meta sync with TimestampBasedKeyGenerator > -- > > Key: HUDI-4967 > URL: https://issues.apache.org/jira/browse/HUDI-4967 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Related fix: HUDI-4966 > We need to add docs on how to properly set the meta sync configuration, > especially the hoodie.datasource.hive_sync.partition_value_extractor, in > [https://hudi.apache.org/docs/key_generation] (for different Hudi versions, > the config can be different). Check the ticket above and PR description of > [https://github.com/apache/hudi/pull/6851] for more details. > We should also add the migration setup on the key generation page as well: > [https://hudi.apache.org/releases/release-0.12.0/#configuration-updates] > * {{{}hoodie.datasource.hive_sync.partition_value_extractor{}}}: This config > is used to extract and transform partition value during Hive sync. Its > default value has been changed from > {{SlashEncodedDayPartitionValueExtractor}} to > {{{}MultiPartKeysValueExtractor{}}}. If you relied on the previous default > value (i.e., have not set it explicitly), you are required to set the config > to {{{}org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor{}}}. From > this release, if this config is not set and Hive sync is enabled, then > partition value extractor class will be *automatically inferred* on the basis > of number of partition fields and whether or not hive style partitioning is > enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
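For illustration only, a write-time snippet showing where the config discussed in this ticket is set, assuming a DataFrame df; the extractor class comes from the description, while the table name, partition field, and base path are placeholders rather than a recommended setup.
{code:scala}
// Hedged example of setting the partition value extractor discussed above;
// the table name, partition field and base path are illustrative placeholders.
df.write.format("hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.datasource.hive_sync.enable", "true")
  .option("hoodie.datasource.hive_sync.partition_fields", "dt")
  .option("hoodie.datasource.hive_sync.partition_value_extractor",
          "org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor")
  .mode("append")
  .save("s3a://bucket/base/path")
{code}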
[jira] [Updated] (HUDI-4888) Add validation to block COW table to use consistent hashing bucket index
[ https://issues.apache.org/jira/browse/HUDI-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4888: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add validation to block COW table to use consistent hashing bucket index > > > Key: HUDI-4888 > URL: https://issues.apache.org/jira/browse/HUDI-4888 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Yuwei Xiao >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Consistent hashing bucket index's resizing relies on the log feature of the MOR > table. So with a COW table, the consistent hashing bucket index cannot achieve > resizing currently. > We should block the user from using it at the very beginning (i.e., table > creation), and suggest they use a MOR table or the Simple Bucket Index. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4881) Push down filters if possible when syncing partitions to Hive
[ https://issues.apache.org/jira/browse/HUDI-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4881: - Fix Version/s: 0.13.0 (was: 0.12.2) > Push down filters if possible when syncing partitions to Hive > - > > Key: HUDI-4881 > URL: https://issues.apache.org/jira/browse/HUDI-4881 > Project: Apache Hudi > Issue Type: Improvement > Components: hive, meta-sync >Reporter: Hui An >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4839) rocksdbjni is not compatible with apple silicon
[ https://issues.apache.org/jira/browse/HUDI-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4839: - Fix Version/s: 0.13.0 (was: 0.12.2) > rocksdbjni is not compatible with apple silicon > --- > > Key: HUDI-4839 > URL: https://issues.apache.org/jira/browse/HUDI-4839 > Project: Apache Hudi > Issue Type: Improvement >Reporter: zouxxyy >Assignee: zouxxyy >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > rocksdbjni 5.17.2 is not compatible with apple silicon > when FileSystemViewStorageType.EMBEDDED_KV_STORE is set on an Apple M1, an error > like this is raised > {code:java} > java.lang.UnsatisfiedLinkError: > /private/var/folders/px/y3gybll50ggctcjp2t4r2b50gp/T/librocksdbjni1847223031371241574.jnilib: > > dlopen(/private/var/folders/px/y3gybll50ggctcjp2t4r2b50gp/T/librocksdbjni1847223031371241574.jnilib, > 0x0001): tried: > '/private/var/folders/px/y3gybll50ggctcjp2t4r2b50gp/T/librocksdbjni1847223031371241574.jnilib' > (mach-o file, but is an incompatible architecture (have 'x86_64', need > 'arm64e')) {code} > After 6.29.4.1, rocksdb can work on M1 macs. > [here|https://github.com/facebook/rocksdb/issues/7720] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4823) Add read_optimize spark_session config to use in spark-sql
[ https://issues.apache.org/jira/browse/HUDI-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4823: - Fix Version/s: 0.13.0 (was: 0.12.2) > Add read_optimize spark_session config to use in spark-sql > -- > > Key: HUDI-4823 > URL: https://issues.apache.org/jira/browse/HUDI-4823 > Project: Apache Hudi > Issue Type: Improvement >Reporter: yonghua jian >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > When creating a table without using the hive catalog in spark, we cannot easily do a > read_optimized query in spark-sql (using the global hudi config file is > inconvenient), so I add the read_optimize spark_session config for use in > spark-sql -- This message was sent by Atlassian Jira (v8.20.10#820010)
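The ticket does not name the exact session config it adds, so the sketch below only shows the existing DataSource-level way to run a read-optimized query as a point of comparison, assuming a SparkSession spark; the table path and view name are placeholders.
{code:scala}
// Hedged illustration of a read_optimized query via the DataFrame API; the
// ticket's new spark_session config is not shown because its exact key is
// not given in the description.
val roDf = spark.read
  .format("hudi")
  .option("hoodie.datasource.query.type", "read_optimized")
  .load("s3a://bucket/base/path")

roDf.createOrReplaceTempView("my_table_ro")
spark.sql("select count(*) from my_table_ro").show()
{code}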
[jira] [Updated] (HUDI-2913) Disable auto clean in writer task
[ https://issues.apache.org/jira/browse/HUDI-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-2913: - Fix Version/s: 0.13.0 (was: 0.12.2) > Disable auto clean in writer task > - > > Key: HUDI-2913 > URL: https://issues.apache.org/jira/browse/HUDI-2913 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Zhaojing Yu >Assignee: Zhaojing Yu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-3954) Don't keep the last commit before the earliest commit to retain
[ https://issues.apache.org/jira/browse/HUDI-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-3954: - Fix Version/s: 0.13.0 (was: 0.12.2) > Don't keep the last commit before the earliest commit to retain > --- > > Key: HUDI-3954 > URL: https://issues.apache.org/jira/browse/HUDI-3954 > Project: Apache Hudi > Issue Type: Improvement > Components: cleaning >Reporter: 董可伦 >Assignee: sivabalan narayanan >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > Don't keep the last commit before the earliest commit to retain > According to the documentation of {{{}hoodie.cleaner.commits.retained{}}}: > Number of commits to retain, without cleaning. This will be retained for > num_of_commits * time_between_commits (scheduled). This also directly > translates into how much data retention the table supports for incremental > queries. > > We only need to keep the number of commits configured through the parameter > {{{}hoodie.cleaner.commits.retained{}}}. > And the commits retained by clean are completed. This ensures that “This will be > retained for num_of_commits * time_between_commits” holds as stated in the document. > So we don't need to keep the last commit before the earliest commit to > retain. If we want to keep more versions, we can increase the parameter > {{hoodie.cleaner.commits.retained}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
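For context, a minimal write-time configuration that exercises this cleaner policy, assuming a DataFrame df; the retained-commit count, table name, and base path are illustrative values, not recommendations.
{code:scala}
// Hedged example of the commits-based cleaning policy; the retained-commit
// count, table name and base path are illustrative placeholders.
df.write.format("hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.clean.automatic", "true")
  .option("hoodie.cleaner.policy", "KEEP_LATEST_COMMITS")
  .option("hoodie.cleaner.commits.retained", "10")
  .mode("append")
  .save("s3a://bucket/base/path")
{code}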
[jira] [Updated] (HUDI-712) Improve exporter performance and memory usage
[ https://issues.apache.org/jira/browse/HUDI-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-712: Fix Version/s: 0.13.0 (was: 0.12.2) > Improve exporter performance and memory usage > - > > Key: HUDI-712 > URL: https://issues.apache.org/jira/browse/HUDI-712 > Project: Apache Hudi > Issue Type: Improvement > Components: Utilities >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > > [https://github.com/apache/incubator-hudi/blob/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java#L103-L107] > The way the data file list for export is collected can be improved due to > * not parallelized among partitions > * the list can be too large > * listing partition to get the latest files requires scanning all files > (RFC-15 could solve this) > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-1570) Add Avg record size in commit metadata
[ https://issues.apache.org/jira/browse/HUDI-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-1570: - Fix Version/s: (was: 0.12.2) > Add Avg record size in commit metadata > -- > > Key: HUDI-1570 > URL: https://issues.apache.org/jira/browse/HUDI-1570 > Project: Apache Hudi > Issue Type: Improvement > Components: Utilities >Reporter: sivabalan narayanan >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Attachments: Screen Shot 2021-01-31 at 7.05.55 PM.png > > Original Estimate: 2h > Remaining Estimate: 2h > > Many users want to understand their avg record size in hudi > storage. They need this so that they can deduce their bloom config values. > As of now, there is no easy way for the end user to fetch the record size. Even > with hudi-cli, we could decipher it from commit metadata, but we need to make some > rough calculation. So, it would be better if we store the avg record size with > WriteStats (total bytes written / total records written), as well as in > commit metadata. Then, in hudi-cli, we could expose this info along with "commit > showpartitions" or expose another command "commit showmetadata" or something. > As of now, we could calculate the avg size from bytes written/records written > from commit metadata. > !Screen Shot 2021-01-31 at 7.05.55 PM.png! > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
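A hedged sketch of the arithmetic described above (avg record size = total bytes written / total records written); the case class and sample numbers are hypothetical stand-ins, not Hudi's actual metadata classes.
{code:scala}
// Hedged sketch: CommitTotals and the sample numbers are hypothetical.
final case class CommitTotals(totalBytesWritten: Long, totalRecordsWritten: Long)

def avgRecordSizeBytes(totals: CommitTotals): Option[Long] =
  if (totals.totalRecordsWritten > 0)
    Some(totals.totalBytesWritten / totals.totalRecordsWritten)
  else None

// e.g. 1 GiB written across 8 million records is roughly 134 bytes per record
println(avgRecordSizeBytes(CommitTotals(1L << 30, 8000000L)))
{code}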
[jira] [Updated] (HUDI-5105) Add Call show_commit_extra_metadata for spark sql
[ https://issues.apache.org/jira/browse/HUDI-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5105: - Fix Version/s: 0.13.0 > Add Call show_commit_extra_metadata for spark sql > - > > Key: HUDI-5105 > URL: https://issues.apache.org/jira/browse/HUDI-5105 > Project: Apache Hudi > Issue Type: New Feature > Components: spark-sql >Reporter: Forward Xu >Assignee: Forward Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5201) add totalRecordsDeleted metric
[ https://issues.apache.org/jira/browse/HUDI-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5201: - Fix Version/s: (was: 0.12.2) > add totalRecordsDeleted metric > -- > > Key: HUDI-5201 > URL: https://issues.apache.org/jira/browse/HUDI-5201 > Project: Apache Hudi > Issue Type: New Feature > Components: metrics >Reporter: Hussein Awala >Assignee: Hussein Awala >Priority: Major > Labels: pull-request-available > > Add missing {{totalRecordsDeleted}} metric to commit action metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5105) Add Call show_commit_extra_metadata for spark sql
[ https://issues.apache.org/jira/browse/HUDI-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5105: - Fix Version/s: (was: 0.12.2) > Add Call show_commit_extra_metadata for spark sql > - > > Key: HUDI-5105 > URL: https://issues.apache.org/jira/browse/HUDI-5105 > Project: Apache Hudi > Issue Type: New Feature > Components: spark-sql >Reporter: Forward Xu >Assignee: Forward Xu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5059) Support automatic setting of certain attributes when creating a table in the flink catalog
[ https://issues.apache.org/jira/browse/HUDI-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5059: - Fix Version/s: 0.13.0 (was: 0.12.2) > Support automatic setting of certain attributes when creating a table in the > flink catalog > -- > > Key: HUDI-5059 > URL: https://issues.apache.org/jira/browse/HUDI-5059 > Project: Apache Hudi > Issue Type: New Feature > Components: flink-sql >Reporter: waywtdcc >Priority: Major > Fix For: 0.13.0 > > > Support the automatic setting of certain attributes when creating a table in > the flink catalog. For example, when creating a hudi catalog, apply some > default attributes, such as the number of write.tasks. Automatically carrying > these attributes when creating tables reduces the development workload -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5048) add CopyToTempView support
[ https://issues.apache.org/jira/browse/HUDI-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5048: - Fix Version/s: 0.13.0 (was: 0.12.2) > add CopyToTempView support > -- > > Key: HUDI-5048 > URL: https://issues.apache.org/jira/browse/HUDI-5048 > Project: Apache Hudi > Issue Type: New Feature >Reporter: scx >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Previously, when using spark sql, we didn't have a good way to > incrementally read or time travel the hudi table. So, I added the > CopyToTempView procedure. This procedure registers the hudi table as a > spark temporary view, so that data development can directly query the > view for the different ways of reading. -- This message was sent by Atlassian Jira (v8.20.10#820010)
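For illustration, roughly what such a procedure would automate when done by hand today, assuming a SparkSession spark; the options, instant time, base path, and view name are placeholders, and this is not the procedure's actual implementation.
{code:scala}
// Hedged illustration of registering a Hudi incremental read as a temp view,
// i.e. what the proposed CopyToTempView procedure would automate; the begin
// instant time, base path and view name are illustrative placeholders.
spark.read
  .format("hudi")
  .option("hoodie.datasource.query.type", "incremental")
  .option("hoodie.datasource.read.begin.instanttime", "20221101000000000")
  .load("s3a://bucket/base/path")
  .createOrReplaceTempView("hudi_incr_view")

spark.sql("select _hoodie_commit_time, count(*) from hudi_incr_view group by 1").show()
{code}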
[jira] [Updated] (HUDI-4809) Hudi Support AWS Glue DropPartitions
[ https://issues.apache.org/jira/browse/HUDI-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4809: - Fix Version/s: 0.13.0 > Hudi Support AWS Glue DropPartitions > - > > Key: HUDI-4809 > URL: https://issues.apache.org/jira/browse/HUDI-4809 > Project: Apache Hudi > Issue Type: New Feature > Components: metadata >Reporter: XixiHua >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4809) Hudi Support AWS Glue DropPartitions
[ https://issues.apache.org/jira/browse/HUDI-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4809: - Fix Version/s: (was: 0.12.2) > Hudi Support AWS Glue DropPartitions > - > > Key: HUDI-4809 > URL: https://issues.apache.org/jira/browse/HUDI-4809 > Project: Apache Hudi > Issue Type: New Feature > Components: metadata >Reporter: XixiHua >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5168) Flink metrics integration
[ https://issues.apache.org/jira/browse/HUDI-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5168: - Fix Version/s: 0.13.0 (was: 0.12.2) > Flink metrics integration > - > > Key: HUDI-5168 > URL: https://issues.apache.org/jira/browse/HUDI-5168 > Project: Apache Hudi > Issue Type: Epic > Components: flink, flink-sql >Reporter: Zhaojing Yu >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5334) Get checkpoint from non-completed instant
[ https://issues.apache.org/jira/browse/HUDI-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5334: - Fix Version/s: 0.13.0 (was: 0.12.2) > Get checkpoint from non-completed instant > - > > Key: HUDI-5334 > URL: https://issues.apache.org/jira/browse/HUDI-5334 > Project: Apache Hudi > Issue Type: Bug > Components: spark >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Original issue https://github.com/apache/hudi/issues/7375 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5318) Clustering scheduling now will list all partitions in the table when PARTITION_SELECTED is set
[ https://issues.apache.org/jira/browse/HUDI-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5318: - Fix Version/s: 0.13.0 (was: 0.12.2) > Clustering scheduling now will list all partitions in the table when > PARTITION_SELECTED is set > > > Key: HUDI-5318 > URL: https://issues.apache.org/jira/browse/HUDI-5318 > Project: Apache Hudi > Issue Type: Bug > Components: clustering >Reporter: Qijun Fu >Assignee: Qijun Fu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > Currently PartitionAwareClusteringPlanStrategy will list all partitions in the > table whether PARTITION_SELECTED is set or not. Listing all partitions in the > dataset is a very expensive operation when the number of partitions is huge. > We can skip listing all partitions when PARTITION_SELECTED is set, so that > clustering scheduling can benefit a lot from partition pruning. -- This message was sent by Atlassian Jira (v8.20.10#820010)
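A hedged example of restricting clustering to specific partitions, assuming a DataFrame df; the key hoodie.clustering.plan.strategy.partition.selected is assumed to be the config behind PARTITION_SELECTED, and the partition list, table name, and base path are placeholders.
{code:scala}
// Hedged example; "hoodie.clustering.plan.strategy.partition.selected" is
// assumed to back PARTITION_SELECTED, and the partition list, table name and
// base path are illustrative placeholders.
df.write.format("hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.clustering.inline", "true")
  .option("hoodie.clustering.plan.strategy.partition.selected", "2021-12-14,2021-12-15")
  .mode("append")
  .save("s3a://bucket/base/path")
{code}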
[jira] [Updated] (HUDI-5229) Add flink avro version entry in root pom
[ https://issues.apache.org/jira/browse/HUDI-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5229: - Fix Version/s: (was: 0.12.2) > Add flink avro version entry in root pom > > > Key: HUDI-5229 > URL: https://issues.apache.org/jira/browse/HUDI-5229 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5220) failed to snapshot query in hive when querying an empty partition
[ https://issues.apache.org/jira/browse/HUDI-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5220: - Fix Version/s: (was: 0.12.2) > failed to snapshot query in hive when querying an empty partition > -- > > Key: HUDI-5220 > URL: https://issues.apache.org/jira/browse/HUDI-5220 > Project: Apache Hudi > Issue Type: Bug > Components: hive >Reporter: yuehanwang >Priority: Major > Labels: pull-request-available > > When querying an empty partition, hive will return an empty file in the split path. > This path will be added as a NonHoodieInputPaths. In this case > HoodieParquetRealtimeInputFormat reads a file split rather than a > RealtimeSplit, throwing an exception: > HoodieRealtimeRecordReader can only work on RealtimeSplit and not with > hdfs://test-cluster/tmp/hive/20220520/hive/4273589d-49be-4a60-9890-a29660d81927/hive_2022-11-14_11-32-41_221_5694963332005566615-17/-mr-10004/74adf5bb-b07e-4eac-a90b-1b5a7fc3d5c4/emptyFile:0+466 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5270) Duplicate key error when insert_overwrite same partition in multi writer
[ https://issues.apache.org/jira/browse/HUDI-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5270: - Fix Version/s: 0.13.0 (was: 0.12.2) > Duplicate key error when insert_overwrite same partition in multi writer > > > Key: HUDI-5270 > URL: https://issues.apache.org/jira/browse/HUDI-5270 > Project: Apache Hudi > Issue Type: Bug > Components: multi-writer, spark-sql >Affects Versions: 0.11.0 >Reporter: weiming >Assignee: weiming >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > If the occ is enabled for hudi spark table, multiple threads insert_overwrite > the same partition. The data of the later task should overwrite the data of > the previous task. However, an error occurs. > {code:java} > // execute sql insert overwrite same partition > ##THREAD-1 EXECUTE SQL > insert overwrite table hudi_test_wm1_mor_02 partition (dt = '2021-12-14',hh = > '6') select id,name,price,ts from hudi_test_wm1_mor_01 where dt='2021-12-11' > and hh ='2'; > ##THREAD-2 EXECUTE SQL > insert overwrite table hudi_test_wm1_mor_02 partition (dt = '2021-12-14',hh = > '6') select id,name,price,ts from hudi_test_wm1_mor_01 where dt='2021-12-11' > and hh ='4'; {code} > {code:java} > // ERROR LOG > 22/11/07 15:24:53 ERROR SparkSQLDriver: Failed in [insert overwrite table > hudi_test_wm1_mor_02 partition (dt = '2021-12-14',hh = '6') select > id,name,price,ts from hudi_test_wm1_mor_01 where dt='2021-12-11' and hh > ='4']java.lang.IllegalStateException: Duplicate key > [20221107152403967__replacecommit__COMPLETED]at > java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133) > at java.util.HashMap.merge(HashMap.java:1245)at > java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)at > java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) > at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270) > at java.util.Iterator.forEachRemaining(Iterator.java:116)at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) > at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:244) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:108) > at > org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:108) > at > org.apache.hudi.co
[jira] [Updated] (HUDI-5174) Clustering w/ two multi-writers could lead to issues
[ https://issues.apache.org/jira/browse/HUDI-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5174: - Fix Version/s: (was: 0.12.2) > Clustering w/ two multi-writers could lead to issues > > > Key: HUDI-5174 > URL: https://issues.apache.org/jira/browse/HUDI-5174 > Project: Apache Hudi > Issue Type: Bug > Components: clustering, table-service >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Critical > > If two writers have enabled clustering, each could roll back the clustering > that the other writer is currently executing, which could lead to unrecoverable > issues. > > > {code:java} > t1 t2 > ➝ t > writer1 |-| > writer2 |--|{code} > Let's say writer1 starts a clustering at t1, > and then writer2 starts clustering at time t2. At this time, it will roll back > the clustering started at time t1, but writer1 could still be continuing to > execute the clustering. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5177) Revisit HiveIncrPullSource and JdbcSource for interleaved inflight commits
[ https://issues.apache.org/jira/browse/HUDI-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5177: - Fix Version/s: (was: 0.12.2) > Revisit HiveIncrPullSource and JdbcSource for interleaved inflight commits > -- > > Key: HUDI-5177 > URL: https://issues.apache.org/jira/browse/HUDI-5177 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Jonathan Vexler >Priority: Critical > > HUDI-5176 > We have fixed the Hudi incremental source when there are inflight commits > before completed commits. We need to revisit the logic for > HiveIncrPullSource and JdbcSource as well regarding the same scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5171) Ensure validateTableConfig also checks for partition path field value switch
[ https://issues.apache.org/jira/browse/HUDI-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5171: - Fix Version/s: 0.13.0 (was: 0.12.2) > Ensure validateTableConfig also checks for partition path field value switch > > > Key: HUDI-5171 > URL: https://issues.apache.org/jira/browse/HUDI-5171 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Affects Versions: 0.12.1 >Reporter: sivabalan narayanan >Assignee: Jonathan Vexler >Priority: Critical > Labels: pull-request-available > Fix For: 0.13.0 > > > As of now, validateTableConfig does not consider a switch in the partition path > field value. We need to consider that as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5107) Fix hadoop config in DirectWriteMarkers, HoodieFlinkEngineContext and StreamerUtil are not consistent issue
[ https://issues.apache.org/jira/browse/HUDI-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5107: - Fix Version/s: (was: 0.12.2) > Fix hadoop config in DirectWriteMarkers, HoodieFlinkEngineContext and > StreamerUtil are not consistent issue > --- > > Key: HUDI-5107 > URL: https://issues.apache.org/jira/browse/HUDI-5107 > Project: Apache Hudi > Issue Type: Bug >Reporter: JinxinTang >Assignee: JinxinTang >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5069) TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky
[ https://issues.apache.org/jira/browse/HUDI-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5069: - Fix Version/s: (was: 0.12.2) > TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky > --- > > Key: HUDI-5069 > URL: https://issues.apache.org/jira/browse/HUDI-5069 > Project: Apache Hudi > Issue Type: Bug > Components: tests-ci >Reporter: xi chaomin >Priority: Major > Labels: pull-request-available > > {code:java} > org.opentest4j.AssertionFailedError: Expect baseInstant to be less than or > equal to latestDeltaCommit ==> > Expected :true > Actual :false > > at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55) > at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40) > at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:193) > at > org.apache.hudi.table.action.compact.CompactionTestBase.lambda$validateDeltaCommit$0(CompactionTestBase.java:103) > at java.util.ArrayList.forEach(ArrayList.java:1257) > at > org.apache.hudi.table.action.compact.CompactionTestBase.validateDeltaCommit(CompactionTestBase.java:95) > at > org.apache.hudi.table.action.compact.CompactionTestBase.runNextDeltaCommits(CompactionTestBase.java:148) > at > org.apache.hudi.table.action.compact.TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime(TestInlineCompaction.java:227) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5017) Modify the logic of defaultMode in BootstrapRegexModeSelector
[ https://issues.apache.org/jira/browse/HUDI-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5017: - Fix Version/s: (was: 0.12.2) > Modify the logic of defaultMode in BootstrapRegexModeSelector > - > > Key: HUDI-5017 > URL: https://issues.apache.org/jira/browse/HUDI-5017 > Project: Apache Hudi > Issue Type: Bug > Components: bootstrap >Reporter: 董可伦 >Assignee: 董可伦 >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4876) DT archival is blocked by MDT compaction
[ https://issues.apache.org/jira/browse/HUDI-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4876: - Fix Version/s: (was: 0.12.2) > DT archival is blocked by MDT compaction > > > Key: HUDI-4876 > URL: https://issues.apache.org/jira/browse/HUDI-4876 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Assignee: sivabalan narayanan >Priority: Major > > Reference GitHub Issue: > [https://github.com/apache/hudi/issues/6716] > > If ONLY INSERT-OVERWRITEs are performed on a DT, MDT will not be compacted, > causing DT commits to not be archived. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5078) When applying changes to MDT, any replace commit is considered a table service
[ https://issues.apache.org/jira/browse/HUDI-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5078: - Fix Version/s: (was: 0.12.2) > When applying changes to MDT, any replace commit is considered a table service > -- > > Key: HUDI-5078 > URL: https://issues.apache.org/jira/browse/HUDI-5078 > Project: Apache Hudi > Issue Type: Bug > Components: metadata >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Critical > Labels: pull-request-available > > Table services in the metadata table can only be invoked by non-table-service > operations from the data table. In other words, compaction and clustering from the data > table cannot trigger compaction in the MDT. > But we mistakenly considered any replace commit as a table service. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4629) Create hive table from existing hoodie Table failed when the table schema is not defined
[ https://issues.apache.org/jira/browse/HUDI-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4629: - Fix Version/s: (was: 0.12.2) > Create hive table from existing hoodie Table failed when the table schema is > not defined > > > Key: HUDI-4629 > URL: https://issues.apache.org/jira/browse/HUDI-4629 > Project: Apache Hudi > Issue Type: Bug > Components: spark-sql >Reporter: 董可伦 >Assignee: 董可伦 >Priority: Major > Labels: pull-request-available > > Create hive table from existing hoodie Table failed when the table schema is > not defined > {code:java} > WARN CreateHoodieTableCommand: Failed to create catalog table in metastore: > org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be > specified for the table{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4852) Incremental sync not updating pending file groups under clustering
[ https://issues.apache.org/jira/browse/HUDI-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4852: - Fix Version/s: (was: 0.12.2) > Incremental sync not updating pending file groups under clustering > -- > > Key: HUDI-4852 > URL: https://issues.apache.org/jira/browse/HUDI-4852 > Project: Apache Hudi > Issue Type: Bug >Reporter: Surya Prasanna Yalla >Assignee: Surya Prasanna Yalla >Priority: Major > > Pending file groups under clustering are not updated through incremental sync > calls. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4625) Clean up KafkaOffsetGen
[ https://issues.apache.org/jira/browse/HUDI-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4625: - Fix Version/s: (was: 0.12.2) > Clean up KafkaOffsetGen > --- > > Key: HUDI-4625 > URL: https://issues.apache.org/jira/browse/HUDI-4625 > Project: Apache Hudi > Issue Type: Bug > Components: deltastreamer >Reporter: Alexey Kudinkin >Priority: Major > > There are a few issues within KafkaOffsetGen that we should follow up on, > annotated with corresponding TODOs: > # Using a proper retrying client (instead of using sleeps for coordination) > # Cleaning up incorrect assertions -- This message was sent by Atlassian Jira (v8.20.10#820010)