[jira] [Updated] (HUDI-7929) Add Flink Hudi Example for K8s

2024-06-24 Thread Shiyan Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiyan Xu updated HUDI-7929:

Component/s: flink

> Add Flink Hudi Example for K8s
> --
>
> Key: HUDI-7929
> URL: https://issues.apache.org/jira/browse/HUDI-7929
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: flink
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Major
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7929) Add Flink Hudi Example for K8s

2024-06-24 Thread Shiyan Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiyan Xu updated HUDI-7929:

Fix Version/s: 1.0.0

> Add Flink Hudi Example for K8s
> --
>
> Key: HUDI-7929
> URL: https://issues.apache.org/jira/browse/HUDI-7929
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Assigned] (HUDI-7929) Add Flink Hudi Example for K8s

2024-06-24 Thread Shiyan Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiyan Xu reassigned HUDI-7929:
---

Assignee: Zhenqiu Huang

> Add Flink Hudi Example for K8s
> --
>
> Key: HUDI-7929
> URL: https://issues.apache.org/jira/browse/HUDI-7929
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Major
>






[jira] [Created] (HUDI-7929) Add Flink Hudi Example for K8s

2024-06-24 Thread Zhenqiu Huang (Jira)
Zhenqiu Huang created HUDI-7929:
---

 Summary: Add Flink Hudi Example for K8s
 Key: HUDI-7929
 URL: https://issues.apache.org/jira/browse/HUDI-7929
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Zhenqiu Huang








[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7928:

Fix Version/s: 1.0.0-beta2

> Fix shared HFile reader in HoodieNativeAvroHFileReader
> --
>
> Key: HUDI-7928
> URL: https://issues.apache.org/jira/browse/HUDI-7928
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0-beta2, 1.0.0
>
>
> The shared HFile reader in HoodieNativeAvroHFileReader uses significant 
> memory for reading meta info from the HFile.  We should avoid keeping the 
> reference to the shared HFile reader and cache the meta info only.
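The idea in the description above can be sketched as follows. This is a minimal illustration only, not the actual HoodieNativeAvroHFileReader code: `HeavyReaderSketch`, `MetaInfoCacheSketch`, and the metadata keys are hypothetical names made up for the example. The reader is opened once, the small meta info map is copied out, and the reader is closed right away so no reference to it (or its buffers) is retained.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a reader that owns large buffers while it is open.
class HeavyReaderSketch implements AutoCloseable {
  private final byte[] buffers = new byte[1 << 20]; // simulated memory cost
  private boolean closed = false;

  // Returns a small map of file-level metadata (keys are illustrative).
  Map<String, String> readMetaInfo() {
    Map<String, String> meta = new HashMap<>();
    meta.put("schema", "example-schema");
    meta.put("entryCount", "42");
    return meta;
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Caches only the meta info; the reader itself is never stored in a field.
class MetaInfoCacheSketch {
  private final Supplier<HeavyReaderSketch> readerFactory;
  private Map<String, String> metaInfo;

  MetaInfoCacheSketch(Supplier<HeavyReaderSketch> readerFactory) {
    this.readerFactory = readerFactory;
  }

  synchronized Map<String, String> getMetaInfo() {
    if (metaInfo == null) {
      // Open the reader only for the duration of the metadata read.
      try (HeavyReaderSketch reader = readerFactory.get()) {
        metaInfo = Collections.unmodifiableMap(new HashMap<>(reader.readMetaInfo()));
      }
    }
    return metaInfo;
  }
}
```

With this shape, repeated calls to `getMetaInfo()` reuse the cached map, and the memory held by the reader becomes eligible for garbage collection after the first call.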





[jira] [Updated] (HUDI-7903) Partition Stats Index not getting created with SQL

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7903:

Status: In Progress  (was: Open)

> Partition Stats Index not getting created with SQL
> --
>
> Key: HUDI-7903
> URL: https://issues.apache.org/jira/browse/HUDI-7903
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 1.0.0-beta2, 1.0.0
>
>
> {code:java}
> spark.sql(
>   s"""
>      | create table $tableName using hudi
>      | partitioned by (dt)
>      | tblproperties(
>      |    primaryKey = 'id',
>      |    preCombineField = 'ts',
>      |    'hoodie.metadata.index.partition.stats.enable' = 'true'
>      | )
>      | location '$tablePath'
>      | AS
>      | select 1 as id, 'a1' as name, 10 as price, 1000 as ts, cast('2021-05-06' as date) as dt
>    """.stripMargin
> ) {code}
> Even when partition stats is enabled, the partition stats index is not 
> created when the table is created through SQL. It works through the 
> datasource path.





[jira] [Resolved] (HUDI-7395) Fix computation for metrics in HoodieMetadataMetrics

2024-06-24 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain resolved HUDI-7395.
---

> Fix computation for metrics in HoodieMetadataMetrics
> 
>
> Key: HUDI-7395
> URL: https://issues.apache.org/jira/browse/HUDI-7395
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata, metrics
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
>
> For some metric types, such as duration, the code uses incrementMetric 
> where setMetric is needed.
> Some redundant metrics are also removed. For example, a count-type metric 
> had both a count and a duration metric pushed even though the duration 
> was never calculated.
> A file lookup count metric is added for bloom filter and column stats.
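To illustrate why the choice between the two calls matters, here is a minimal sketch. `MetricSketch` is a hypothetical in-memory registry written for this example; it only borrows the method names from the description and is not the HoodieMetadataMetrics implementation. An increment accumulates across calls, which suits counts, while a set overwrites the previous value, which suits a point-in-time measurement such as a duration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory registry; not Hudi's actual metrics API.
class MetricSketch {
  private final Map<String, Long> values = new HashMap<>();

  // Accumulates: each call adds to the stored value (suitable for counts).
  void incrementMetric(String name, long delta) {
    values.merge(name, delta, Long::sum);
  }

  // Overwrites: each call replaces the stored value (suitable for durations).
  void setMetric(String name, long value) {
    values.put(name, value);
  }

  long get(String name) {
    return values.getOrDefault(name, 0L);
  }
}
```

If a duration were reported through `incrementMetric`, two lookups of 120 ms and 80 ms would show up as 200 ms, which is the kind of miscomputation the issue describes.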





[jira] [Updated] (HUDI-7922) Add Hudi CLI bundle for Scala 2.13

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7922:

Reviewers: Jonathan Vexler

> Add Hudi CLI bundle for Scala 2.13
> --
>
> Key: HUDI-7922
> URL: https://issues.apache.org/jira/browse/HUDI-7922
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Build of Hudi CLI bundle should succeed on Scala 2.13 and work on Spark 3.5 
> and Scala 2.13.





[jira] [Updated] (HUDI-7922) Add Hudi CLI bundle for Scala 2.13

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7922:

Status: In Progress  (was: Open)

> Add Hudi CLI bundle for Scala 2.13
> --
>
> Key: HUDI-7922
> URL: https://issues.apache.org/jira/browse/HUDI-7922
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Build of Hudi CLI bundle should succeed on Scala 2.13 and work on Spark 3.5 
> and Scala 2.13.





[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7928:

Status: In Progress  (was: Open)

> Fix shared HFile reader in HoodieNativeAvroHFileReader
> --
>
> Key: HUDI-7928
> URL: https://issues.apache.org/jira/browse/HUDI-7928
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> The shared HFile reader in HoodieNativeAvroHFileReader uses significant 
> memory for reading meta info from the HFile.  We should avoid keeping the 
> reference to the shared HFile reader and cache the meta info only.





[jira] [Updated] (HUDI-7922) Add Hudi CLI bundle for Scala 2.13

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7922:

Sprint: 2024/06/17-30

> Add Hudi CLI bundle for Scala 2.13
> --
>
> Key: HUDI-7922
> URL: https://issues.apache.org/jira/browse/HUDI-7922
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Build of Hudi CLI bundle should succeed on Scala 2.13 and work on Spark 3.5 
> and Scala 2.13.





[jira] [Updated] (HUDI-7922) Add Hudi CLI bundle for Scala 2.13

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7922:

Status: Patch Available  (was: In Progress)

> Add Hudi CLI bundle for Scala 2.13
> --
>
> Key: HUDI-7922
> URL: https://issues.apache.org/jira/browse/HUDI-7922
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Build of Hudi CLI bundle should succeed on Scala 2.13 and work on Spark 3.5 
> and Scala 2.13.





[jira] [Updated] (HUDI-6508) Java 11 compile time support

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6508:

Status: Patch Available  (was: In Progress)

> Java 11 compile time support
> 
>
> Key: HUDI-6508
> URL: https://issues.apache.org/jira/browse/HUDI-6508
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Certify Hudi with Java 11 runtime support





[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7928:

Description: The shared HFile reader in HoodieNativeAvroHFileReader uses 
significant memory for reading meta info from the HFile.  We should avoid 
keeping the reference to the shared HFile reader and cache the meta info only.  
(was: The shared HFile reader in uses a significant memory )

> Fix shared HFile reader in HoodieNativeAvroHFileReader
> --
>
> Key: HUDI-7928
> URL: https://issues.apache.org/jira/browse/HUDI-7928
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> The shared HFile reader in HoodieNativeAvroHFileReader uses significant 
> memory for reading meta info from the HFile.  We should avoid keeping the 
> reference to the shared HFile reader and cache the meta info only.





[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7928:

Story Points: 4

> Fix shared HFile reader in HoodieNativeAvroHFileReader
> --
>
> Key: HUDI-7928
> URL: https://issues.apache.org/jira/browse/HUDI-7928
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> The shared HFile reader in HoodieNativeAvroHFileReader uses significant 
> memory for reading meta info from the HFile.  We should avoid keeping the 
> reference to the shared HFile reader and cache the meta info only.





[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7928:

Sprint: 2024/06/17-30

> Fix shared HFile reader in HoodieNativeAvroHFileReader
> --
>
> Key: HUDI-7928
> URL: https://issues.apache.org/jira/browse/HUDI-7928
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> The shared HFile reader in HoodieNativeAvroHFileReader uses significant 
> memory for reading meta info from the HFile.  We should avoid keeping the 
> reference to the shared HFile reader and cache the meta info only.





[jira] [Assigned] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7928:
---

Assignee: Ethan Guo

> Fix shared HFile reader in HoodieNativeAvroHFileReader
> --
>
> Key: HUDI-7928
> URL: https://issues.apache.org/jira/browse/HUDI-7928
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Created] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader

2024-06-24 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7928:
---

 Summary: Fix shared HFile reader in HoodieNativeAvroHFileReader
 Key: HUDI-7928
 URL: https://issues.apache.org/jira/browse/HUDI-7928
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7928:

Fix Version/s: 1.0.0

> Fix shared HFile reader in HoodieNativeAvroHFileReader
> --
>
> Key: HUDI-7928
> URL: https://issues.apache.org/jira/browse/HUDI-7928
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7928:

Description: The shared HFile reader in uses a significant memory 

> Fix shared HFile reader in HoodieNativeAvroHFileReader
> --
>
> Key: HUDI-7928
> URL: https://issues.apache.org/jira/browse/HUDI-7928
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> The shared HFile reader in uses a significant memory 





(hudi) branch master updated: [MINOR] Reduce logging volume (#11505)

2024-06-24 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 6192cfb0e95 [MINOR] Reduce logging volume (#11505)
6192cfb0e95 is described below

commit 6192cfb0e95d23da462d01bb5b4c849dfbaa1f2a
Author: Tim Brown 
AuthorDate: Mon Jun 24 19:16:53 2024 -0500

[MINOR] Reduce logging volume (#11505)
---
 .../apache/hudi/client/timeline/HoodieTimelineArchiver.java| 10 --
 .../plan/generators/BaseHoodieCompactionPlanGenerator.java |  4 +++-
 .../common/table/log/BaseHoodieMergedLogRecordScanner.java |  9 +++--
 .../apache/hudi/common/table/view/FileSystemViewManager.java   | 10 +-
 .../src/main/java/org/apache/hudi/hive/HiveSyncTool.java   |  2 +-
 5 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java
index 2f5ecb2816d..817c3f650d9 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java
@@ -112,14 +112,14 @@ public class HoodieTimelineArchiver {
   // Sort again because the cleaning and rollback instants could break the sequence.
   List instantsToArchive = getInstantsToArchive().sorted().collect(Collectors.toList());
   if (!instantsToArchive.isEmpty()) {
-LOG.info("Archiving instants " + instantsToArchive);
+LOG.info("Archiving and deleting instants {}", instantsToArchive);
 Consumer exceptionHandler = e -> {
   if (this.config.isFailOnTimelineArchivingEnabled()) {
 throw new HoodieException(e);
   }
 };
 this.timelineWriter.write(instantsToArchive, Option.of(action -> deleteAnyLeftOverMarkers(context, action)), Option.of(exceptionHandler));
-LOG.info("Deleting archived instants " + instantsToArchive);
+LOG.debug("Deleting archived instants");
 deleteArchivedInstants(instantsToArchive, context);
 // triggers compaction and cleaning only after archiving action
 this.timelineWriter.compactAndClean(context);
@@ -221,7 +221,7 @@ public class HoodieTimelineArchiver {
   LOG.info("Not archiving as there is no compaction yet on the metadata table");
   return Collections.emptyList();
 } else {
-  LOG.info("Limiting archiving of instants to latest compaction on metadata table at " + latestCompactionTime.get());
+  LOG.info("Limiting archiving of instants to latest compaction on metadata table at {}", latestCompactionTime.get());
   earliestInstantToRetainCandidates.add(
   completedCommitsTimeline.findInstantsModifiedAfterByCompletionTime(latestCompactionTime.get()).firstInstant());
 }
@@ -324,8 +324,6 @@ public class HoodieTimelineArchiver {
   }
 
   private boolean deleteArchivedInstants(List activeActions, HoodieEngineContext context) {
-LOG.info("Deleting instants " + activeActions);
-
 List pendingInstants = new ArrayList<>();
 List completedInstants = new ArrayList<>();
 
@@ -365,7 +363,7 @@ public class HoodieTimelineArchiver {
   private void deleteAnyLeftOverMarkers(HoodieEngineContext context, ActiveAction activeAction) {
 WriteMarkers writeMarkers = WriteMarkersFactory.get(config.getMarkersType(), table, activeAction.getInstantTime());
 if (writeMarkers.deleteMarkerDir(context, config.getMarkersDeleteParallelism())) {
-  LOG.info("Cleaned up left over marker directory for instant :" + activeAction);
+  LOG.info("Cleaned up left over marker directory for instant: {}", activeAction);
 }
   }
 }
diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java
index f768004cbce..e5ac5af9f64 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java
@@ -91,7 +91,9 @@ public abstract class BaseHoodieCompactionPlanGenerator e
 this.numMergedRecordsInLog = records.size();
 
 if (LOG.isInfoEnabled()) {
-  LOG.info("Number of log files scanned => {}", logFilePaths.size());
-  LOG.info("MaxMemoryInBytes allowed for compaction => {}", 
maxMemorySizeInBytes);
-  LOG.info("Number of entries in Memory

[jira] [Updated] (HUDI-7586) Use table or write schema instead of deducing schema per file group for clustering

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7586:

Description: Right now each clustering group derives the schema on its own. 
Conceptually we can use one schema for all clustering groups. This is a 
behavior change and needs revisiting clustering logic end-to-end.

> Use table or write schema instead of deducing schema per file group for 
> clustering
> --
>
> Key: HUDI-7586
> URL: https://issues.apache.org/jira/browse/HUDI-7586
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.16.0, 1.0.0
>
>
> Right now each clustering group derives the schema on its own. Conceptually 
> we can use one schema for all clustering groups. This is a behavior change 
> and needs revisiting clustering logic end-to-end.





[jira] [Updated] (HUDI-7585) Avoid reading log files for resolving schema for _hoodie_operation field

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7585:

Description: The table schema resolver needs to read schema from the data 
files (base or log files) to see whether _hoodie_operation field is present for 
Flink CDC use cases.  This can cause overhead of reading data file footers 
multiple times.  We should see if we can store a table config to indicate if or 
simplify the Flink CDC format in Hudi 1.0 (thus no need of _hoodie_operation 
field and schema resolver).  (was: The table schema resolver needs to read 
schema from the data files (base or log files) to see whether _hoodie_operation 
field is present for Flink CDC use cases.  This can cause overhead of reading 
data file footers multiple times.  We should see if we can store or simplify 
the Flink CDC format in Hudi 1.0 (thus no need of ).)

> Avoid reading log files for resolving schema for _hoodie_operation field
> 
>
> Key: HUDI-7585
> URL: https://issues.apache.org/jira/browse/HUDI-7585
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Jing Zhang
>Priority: Major
> Fix For: 1.0.0
>
>
> The table schema resolver needs to read schema from the data files (base or 
> log files) to see whether _hoodie_operation field is present for Flink CDC 
> use cases.  This can cause overhead of reading data file footers multiple 
> times.  We should see if we can store a table config to indicate if or 
> simplify the Flink CDC format in Hudi 1.0 (thus no need of _hoodie_operation 
> field and schema resolver).





[jira] [Updated] (HUDI-7585) Avoid reading log files for resolving schema for _hoodie_operation field

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7585:

Description: The table schema resolver needs to read schema from the data 
files (base or log files) to see whether _hoodie_operation field is present for 
Flink CDC use cases.  This can cause overhead of reading data file footers 
multiple times.  We should see if we can store a table config to indicate if 
_hoodie_operation field is present in the table, or simplify the Flink CDC 
format in Hudi 1.0 (thus no need of _hoodie_operation field and schema 
resolver).  (was: The table schema resolver needs to read schema from the data 
files (base or log files) to see whether _hoodie_operation field is present for 
Flink CDC use cases.  This can cause overhead of reading data file footers 
multiple times.  We should see if we can store a table config to indicate if or 
simplify the Flink CDC format in Hudi 1.0 (thus no need of _hoodie_operation 
field and schema resolver).)

> Avoid reading log files for resolving schema for _hoodie_operation field
> 
>
> Key: HUDI-7585
> URL: https://issues.apache.org/jira/browse/HUDI-7585
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Jing Zhang
>Priority: Major
> Fix For: 1.0.0
>
>
> The table schema resolver needs to read schema from the data files (base or 
> log files) to see whether _hoodie_operation field is present for Flink CDC 
> use cases.  This can cause overhead of reading data file footers multiple 
> times.  We should see if we can store a table config to indicate if 
> _hoodie_operation field is present in the table, or simplify the Flink CDC 
> format in Hudi 1.0 (thus no need of _hoodie_operation field and schema 
> resolver).
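The first option in the description, consulting a table config before touching data files, might look roughly like the sketch below. Everything here is an assumption made for illustration: the property key `example.table.has.operation.field` is invented and is not an actual Hudi table config, and `footerScan` stands in for whatever expensive footer-reading path exists today.

```java
import java.util.Properties;
import java.util.function.BooleanSupplier;

class OperationFieldSketch {
  // Hypothetical key, made up for this sketch; not a real Hudi table config.
  static final String HYPOTHETICAL_FLAG_KEY = "example.table.has.operation.field";

  // Prefers the stored flag; falls back to the expensive footer scan only
  // when the table config carries no answer.
  static boolean hasOperationField(Properties tableConfig, BooleanSupplier footerScan) {
    String flag = tableConfig.getProperty(HYPOTHETICAL_FLAG_KEY);
    if (flag != null) {
      return Boolean.parseBoolean(flag);
    }
    return footerScan.getAsBoolean();
  }
}
```

The fallback keeps tables written before such a flag existed readable, at the cost of the footer reads the issue wants to avoid.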





[jira] [Updated] (HUDI-7585) Avoid reading log files for resolving schema for _hoodie_operation field

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7585:

Description: The table schema resolver needs to read schema from the data 
files (base or log files) to see whether _hoodie_operation field is present for 
Flink CDC use cases.  This can cause overhead of reading data file footers 
multiple times.  We should see if we can store or simplify the Flink CDC format 
in Hudi 1.0 (thus no need of ).  (was: The table schema resolver needs to read 
schema from the data files (base or log files) to see whether )

> Avoid reading log files for resolving schema for _hoodie_operation field
> 
>
> Key: HUDI-7585
> URL: https://issues.apache.org/jira/browse/HUDI-7585
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Jing Zhang
>Priority: Major
> Fix For: 1.0.0
>
>
> The table schema resolver needs to read schema from the data files (base or 
> log files) to see whether _hoodie_operation field is present for Flink CDC 
> use cases.  This can cause overhead of reading data file footers multiple 
> times.  We should see if we can store or simplify the Flink CDC format in 
> Hudi 1.0 (thus no need of ).





[jira] [Updated] (HUDI-7585) Avoid reading log files for resolving schema for _hoodie_operation field

2024-06-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7585:

Description: The table schema resolver needs to read schema from the data 
files (base or log files) to see whether 

> Avoid reading log files for resolving schema for _hoodie_operation field
> 
>
> Key: HUDI-7585
> URL: https://issues.apache.org/jira/browse/HUDI-7585
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Jing Zhang
>Priority: Major
> Fix For: 1.0.0
>
>
> The table schema resolver needs to read schema from the data files (base or 
> log files) to see whether 





(hudi) branch asf-site updated: [DOCS] Update video guides (#11504)

2024-06-24 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 6f798f73561 [DOCS] Update video guides (#11504)
6f798f73561 is described below

commit 6f798f7356194e805d385e882275ba7410236c2a
Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
AuthorDate: Mon Jun 24 10:51:38 2024 -0700

[DOCS] Update video guides (#11504)
---
 ...emental-etl-and-broadcast-joins-for-faster-etl.png | Bin 0 -> 117903 bytes
 ...-changing-dimension-and-query-that-using-trino.png | Bin 0 -> 126625 bytes
 ...ing-dimension-type-2-and-query-real-time-trino.png | Bin 0 -> 125492 bytes
 ...utes-with-spark-sql-minio-and-query-with-trino.png | Bin 0 -> 120015 bytes
 ...from-pulsar-topic-into-hudi-with-deltastreamer.png | Bin 0 -> 121697 bytes
 ...24-06-05-multiple-spark-writers-to-hudi-tables.png | Bin 0 -> 114232 bytes
 commits-and-hoodie.keep.max.commits-explained.png | Bin 0 -> 123320 bytes
 ...time-travel-query-to-investigate-bid-and-spend.png | Bin 0 -> 124663 bytes
 ...tes-delete-incremental-query-stored-procedures.png | Bin 0 -> 125426 bytes
 ...st-xml-files-with-aws-glue-into-hudi-datalakes.png | Bin 0 -> 122656 bytes
 ...-Apache-Hudi-Commit-time-in-Python-and-PySpark.png | Bin 0 -> 205616 bytes
 ...emental-etl-and-broadcast-joins-for-faster-etl.mdx |  17 +
 ...-changing-dimension-and-query-that-using-trino.mdx |  17 +
 ...ing-dimension-type-2-and-query-real-time-trino.mdx |  18 ++
 ...utes-with-spark-sql-minio-and-query-with-trino.mdx |  18 ++
 ...from-pulsar-topic-into-hudi-with-deltastreamer.mdx |  16 
 ...24-06-05-multiple-spark-writers-to-hudi-tables.mdx |  15 +++
 commits-and-hoodie.keep.max.commits-explained.mdx |  14 ++
 ...time-travel-query-to-investigate-bid-and-spend.mdx |  14 ++
 ...tes-delete-incremental-query-stored-procedures.mdx |  18 ++
 ...st-xml-files-with-aws-glue-into-hudi-datalakes.mdx |  15 +++
 ...-Apache-Hudi-Commit-time-in-Python-and-PySpark.mdx |  16 
 22 files changed, 178 insertions(+)

diff --git a/website/static/assets/images/video_blogs/2024-05-20-deltastreamer-with-incremental-etl-and-broadcast-joins-for-faster-etl.png b/website/static/assets/images/video_blogs/2024-05-20-deltastreamer-with-incremental-etl-and-broadcast-joins-for-faster-etl.png
new file mode 100644
index 000..f35bccc20fb
Binary files /dev/null and b/website/static/assets/images/video_blogs/2024-05-20-deltastreamer-with-incremental-etl-and-broadcast-joins-for-faster-etl.png differ
diff --git a/website/static/assets/images/video_blogs/2024-05-22-hudi-delta-streamer-implementing-slowly-changing-dimension-and-query-that-using-trino.png b/website/static/assets/images/video_blogs/2024-05-22-hudi-delta-streamer-implementing-slowly-changing-dimension-and-query-that-using-trino.png
new file mode 100644
index 000..b2a0c0e6cdc
Binary files /dev/null and b/website/static/assets/images/video_blogs/2024-05-22-hudi-delta-streamer-implementing-slowly-changing-dimension-and-query-that-using-trino.png differ
diff --git a/website/static/assets/images/video_blogs/2024-05-22-hudi-streamer-implementing-slowly-changing-dimension-type-2-and-query-real-time-trino.png b/website/static/assets/images/video_blogs/2024-05-22-hudi-streamer-implementing-slowly-changing-dimension-type-2-and-query-real-time-trino.png
new file mode 100644
index 000..d80b0677277
Binary files /dev/null and b/website/static/assets/images/video_blogs/2024-05-22-hudi-streamer-implementing-slowly-changing-dimension-type-2-and-query-real-time-trino.png differ
diff --git a/website/static/assets/images/video_blogs/2024-05-23-build-hudi-date-dimension-in-minutes-with-spark-sql-minio-and-query-with-trino.png b/website/static/assets/images/video_blogs/2024-05-23-build-hudi-date-dimension-in-minutes-with-spark-sql-minio-and-query-with-trino.png
new file mode 100644
index 000..08424b3e2bd
Binary files /dev/null and b/website/static/assets/images/video_blogs/2024-05-23-build-hudi-date-dimension-in-minutes-with-spark-sql-minio-and-query-with-trino.png differ
diff --git a/website/static/assets/images/video_blogs/2024-05-25-learn-how-to-ingest-data-from-pulsar-topic-into-hudi-with-deltastreamer.png b/website/static/assets/images/video_blogs/2024-05-25-learn-how-to-ingest-data-from-pulsar-topic-into-hudi-with-deltastreamer.png
new file mode 100644
index 000..4ba03a5692a
Binary files /dev/null and b/website/static/assets/images/video_blogs/2024-05-25-learn-how-to-ingest-data-from-pulsar-topic-into-hudi-with-deltastreamer.png differ
diff --git 
a/website/static/assets/images/video_blogs/2024-06-05-multiple-spark-writers-to-hudi-tables.png
 
b/websit

(hudi) branch asf-site updated: [DOCS] Update Roadmap page (#11491)

2024-06-24 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new fac2aaaf5e2 [DOCS] Update Roadmap page (#11491)
fac2aaaf5e2 is described below

commit fac2aaaf5e2b4a293fa31543e0a6ab9e102c45c7
Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
AuthorDate: Mon Jun 24 10:27:15 2024 -0700

[DOCS] Update Roadmap page (#11491)
---
 website/src/pages/roadmap.md | 89 
 1 file changed, 48 insertions(+), 41 deletions(-)

diff --git a/website/src/pages/roadmap.md b/website/src/pages/roadmap.md
index 32c2ad2c081..19dabef81ad 100644
--- a/website/src/pages/roadmap.md
+++ b/website/src/pages/roadmap.md
@@ -8,64 +8,71 @@ Hudi community strives to deliver major releases every 3-4 months, while offerin
 This page captures the forward-looking roadmap of ongoing & upcoming projects and when they are expected to land, broken
 down by areas on our [stack](blog/2021/07/21/streaming-data-lake-platform/#hudi-stack).
 
+## Recent Release
+[0.15.0](https://hudi.apache.org/releases/release-0.15.0) (June 2024)
+
 ## Future Releases
 
-Next upcoming release : [0.14.1](https://issues.apache.org/jira/projects/HUDI/versions/12353493) (Dec 2023)
+| Release | Timeline |
+|---|---|
+| 1.0.0-beta2 | July 2024 |
+| 0.16.0 (Bridge release supporting reads of both 1.x and 0.x Hudi versions) | Q3, 2024 |
+| 1.0.0 | Q3, 2024 |
+
 
 
 ## Transactional Database Layer
 
-| Feature | Target Release | Tracking |
-|---|---|---|
-| Support for primary key-less table | 0.14.0 | [HUDI-4699](https://issues.apache.org/jira/browse/HUDI-4699) |
-| Efficient bootstrap and migration of existing non-Hudi dataset | 0.14.0 | [HUDI-1265](https://issues.apache.org/jira/browse/HUDI-1265) |
-| Record-level index to speed up UUID-based upserts and deletes | 0.14.0 | [RFC-08](https://cwiki.apache.org/confluence/display/HUDI/RFC-08++Record+level+indexing+mechanisms+for+Hudi+datasets), [HUDI-53](https://issues.apache.org/jira/browse/HUDI-53) |
-| 1.x Storage format | 1.0.0 | [HUDI-6242](https://issues.apache.org/jira/browse/HUDI-6242) |
-| Writer performance improvements | 1.0.0 | [HUDI-3249](https://issues.apache.org/jira/browse/HUDI-3249) |
-| Non-blocking concurrency control | 1.0.0 | [HUDI-3187](https://issues.apache.org/jira/browse/HUDI-3187), [HUDI-1042](https://issues.apache.org/jira/browse/HUDI-1042), [RFC-66](https://github.com/apache/hudi/pull/7907) |
-| Time Travel updates, deletes | 1.0.0 | |
+| Feature | Target Release | Tracking |
+|---|---|---|
+| 1.x Storage

Re: [I] CDC data_before_after mode does not convert Spark DecimalType correctly [hudi]

2024-06-24 Thread via GitHub


phamvinh1712 commented on issue #8616:
URL: https://github.com/apache/hudi/issues/8616#issuecomment-2186698563

   Hi @danny0405, is there any news on this issue, or any new plan to solve 
it?
   
   We're planning to use the CDC format to handle some complex incremental 
processing use cases, like those presented in this blog: 
https://www.onehouse.ai/blog/getting-started-incrementally-process-data-with-apache-hudi.
 However, because decimal values are not returned correctly, we can't make use 
of the CDC format.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7906] improve the parallelism deduce in rdd write [hudi]

2024-06-24 Thread via GitHub


KnightChess commented on PR #11470:
URL: https://github.com/apache/hudi/pull/11470#issuecomment-2186680135

   @bibhu107 0.16.0 and 1.0.0, but you can cherry-pick or copy it into your version
   
![image](https://github.com/apache/hudi/assets/20125927/b706356b-063c-4305-8422-05a035820802)
   





Re: [PR] [DOCS] Update Blogs [hudi]

2024-06-24 Thread via GitHub


bhasudha merged PR #11503:
URL: https://github.com/apache/hudi/pull/11503





(hudi) branch asf-site updated: [DOCS] Update Blogs (#11503)

2024-06-24 Thread bhavanisudha

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new c00568f2ade [DOCS] Update Blogs (#11503)
c00568f2ade is described below

commit c00568f2ade760b447e2b7d17bf7a4ef8b96ce02
Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
AuthorDate: Mon Jun 24 07:11:36 2024 -0700

[DOCS] Update Blogs (#11503)
---
 ...s-apache-iceberg-a-comprehensive-comparison.mdx |  19 ++
 ...he-hudi-tables-python-using-daft-spark-free.mdx |  19 ++
 ...ead-hudi-data-aws-glue-ray-using-daft-spark.mdx |  20 +++
 ...-lakehouse-using-apache-hudi-daft-streamlit.mdx |  21 
 .../blog/2024-05-19-apache-hudi-on-aws-glue.mdx|  18 +
 ...ge-to-seamlessly-share-apache-hudi-datasets.mdx |  22 +
 ...ng-the-right-tool-for-your-data-lake-on-aws.mdx |  19 ++
 ...-hudi-a-deep-dive-with-python-code-examples.mdx |  19 ++
 ...6-18-how-to-use-apache-hudi-with-databricks.mdx |  18 +
 website/src/pages/talks.md |   2 ++
 ...s-apache-iceberg-a-comprehensive-comparison.png | Bin 0 -> 632131 bytes
 ...he-hudi-tables-python-using-daft-spark-free.png | Bin 0 -> 128244 bytes
 ...ead-hudi-data-aws-glue-ray-using-daft-spark.png | Bin 0 -> 251050 bytes
 ...-lakehouse-using-apache-hudi-daft-streamlit.png | Bin 0 -> 204137 bytes
 .../blog/2024-05-19-apache-hudi-on-aws-glue.png| Bin 0 -> 184116 bytes
 ...ge-to-seamlessly-share-apache-hudi-datasets.png | Bin 0 -> 115478 bytes
 ...ng-the-right-tool-for-your-data-lake-on-aws.png | Bin 0 -> 678081 bytes
 ...-hudi-a-deep-dive-with-python-code-examples.png | Bin 0 -> 103833 bytes
 ...-18-how-to-use-apache-hudi-with-databricks.jpeg | Bin 0 -> 563195 bytes
 19 files changed, 177 insertions(+)

diff --git 
a/website/blog/2024-04-25-apache-hudi-vs-apache-iceberg-a-comprehensive-comparison.mdx
 
b/website/blog/2024-04-25-apache-hudi-vs-apache-iceberg-a-comprehensive-comparison.mdx
new file mode 100644
index 000..10a8864de4b
--- /dev/null
+++ 
b/website/blog/2024-04-25-apache-hudi-vs-apache-iceberg-a-comprehensive-comparison.mdx
@@ -0,0 +1,19 @@
+---
+title: "Apache Hudi vs Apache Iceberg: A Comprehensive Comparison"
+author: RisingWave marketing team
+category: blog
+image: 
/assets/images/blog/2024-04-25-apache-hudi-vs-apache-iceberg-a-comprehensive-comparison.png
+tags:
+- blog
+- apache hudi
+- apache iceberg
+- comparison
+- risingwave
+---
+
+
+
+import Redirect from '@site/src/components/Redirect';
+
+https://risingwave.com/blog/apache-hudi-vs-apache-iceberg-a-comprehensive-comparison/";>Redirecting...
 please wait!! 
+
diff --git 
a/website/blog/2024-05-02-how-query-apache-hudi-tables-python-using-daft-spark-free.mdx
 
b/website/blog/2024-05-02-how-query-apache-hudi-tables-python-using-daft-spark-free.mdx
new file mode 100644
index 000..31a0dfce34e
--- /dev/null
+++ 
b/website/blog/2024-05-02-how-query-apache-hudi-tables-python-using-daft-spark-free.mdx
@@ -0,0 +1,19 @@
+---
+title: "How to Query Apache Hudi Tables with Python Using Daft: A Spark-Free 
Approach"
+author: Soumil Shah
+category: blog
+image: 
/assets/images/blog/2024-05-02-how-query-apache-hudi-tables-python-using-daft-spark-free.png
+tags:
+- blog
+- apache hudi
+- python
+- daft
+- linkedin
+---
+
+
+
+import Redirect from '@site/src/components/Redirect';
+
+https://www.linkedin.com/pulse/how-query-apache-hudi-tables-python-using-daft-spark-free-soumil-shah-hpdwf/";>Redirecting...
 please wait!! 
+
diff --git 
a/website/blog/2024-05-07-learn-how-read-hudi-data-aws-glue-ray-using-daft-spark.mdx
 
b/website/blog/2024-05-07-learn-how-read-hudi-data-aws-glue-ray-using-daft-spark.mdx
new file mode 100644
index 000..ce695f31fd9
--- /dev/null
+++ 
b/website/blog/2024-05-07-learn-how-read-hudi-data-aws-glue-ray-using-daft-spark.mdx
@@ -0,0 +1,20 @@
+---
+title: "Learn how to read Hudi data with AWS Glue Ray using Daft (No Spark)"
+author: Soumil Shah
+category: blog
+image: 
/assets/images/blog/2024-05-07-learn-how-read-hudi-data-aws-glue-ray-using-daft-spark.png
+tags:
+- blog
+- apache hudi
+- aws glue
+- ray
+- daft
+- linkedin
+---
+
+
+
+import Redirect from '@site/src/components/Redirect';
+
+https://www.linkedin.com/pulse/learn-how-read-hudi-data-aws-glue-ray-using-daft-spark-soumil-shah-kycbe/";>Redirecting...
 please wait!! 
+
diff --git 
a/website/blog/2024-05-10-building-analytical-apps-on-the-lakehouse-using-apache-hudi-daft-streamlit.mdx
 
b/website/blog/2024-05-10-building-analytical-apps-on-the-lakehouse-using-apache-hudi-daft-streamlit.mdx
new file mode 100644
index 000..15f1523053b
--- /dev/null
+++ 
b/website/blog/2024-05-10-building-analytical-apps-on-the-lakehouse-using-apache-hudi-daft-streamlit.mdx

(hudi) branch asf-site updated: [DOCS][MINOR] Update team and syncs pages (#11500)

2024-06-24 Thread bhavanisudha

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 3db29c96405 [DOCS][MINOR] Update team and syncs pages (#11500)
3db29c96405 is described below

commit 3db29c9640574985a87935f4321f2d14b57b2415
Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
AuthorDate: Mon Jun 24 07:11:52 2024 -0700

[DOCS][MINOR] Update team and syncs pages (#11500)
---
 website/community/syncs.md |   6 +-
 website/community/team.md  |  96 -
 .../assets/images/upcoming-community-calls.png | Bin 273228 -> 0 bytes
 3 files changed, 59 insertions(+), 43 deletions(-)

diff --git a/website/community/syncs.md b/website/community/syncs.md
index 4870ed4a4da..d8c148fc297 100644
--- a/website/community/syncs.md
+++ b/website/community/syncs.md
@@ -37,5 +37,9 @@ If you would like to present in one of the community calls, 
please fill out the
 Refer to the [Apache Hudi events 
calendar](https://calendar.google.com/calendar/embed?src=rgpb1ta2mgp5au38fr2834poa8%40group.calendar.google.com&ctz=America%2FLos_Angeles)
 to find upcoming Hudi events.
 
 Here's a quick view of the upcoming community calls: 
-![Upcoming calls](/assets/images/upcoming-community-calls.png)
+ - 24th Jul 2024, 9:00 - 10:00am pacific time
+ - 28th Aug 2024, 9:00 - 10:00am pacific time
+ - 25th Sep 2024, 9:00 - 10:00am pacific time
+ - 23rd Oct 2024, 9:00 - 10:00am pacific time
+ - 27th Nov 2024, 9:00 - 10:00am pacific time
 
diff --git a/website/community/team.md b/website/community/team.md
index 75c662f1b76..13e687873c2 100644
--- a/website/community/team.md
+++ b/website/community/team.md
@@ -5,46 +5,58 @@ toc: true
 last_modified_at: 2020-09-01T15:59:57-04:00
 ---
 
-### Active Team
-
-| Image| Name  
   | Role| Apache ID
|
-|  | 
 | --- 
|  |
-| https://avatars.githubusercontent.com/alexeykudinkin"} 
className="profile-pic" alt="alexeykudinkin" align="middle" /> | [Alexey 
Kudinkin](https://github.com/alexeykudinkin)   | Committer | akudinkin  
 |
-| https://avatars.githubusercontent.com/alunarbeach"} 
className="profile-pic" alt="alunarbeach" align="middle" /> | [Anbu 
Cheeralan](https://github.com/alunarbeach) | PMC, Committer | 
anchee   |
-| https://avatars.githubusercontent.com/bhasudha"} 
className="profile-pic" alt="bhasudha" align="middle" /> | [Bhavani 
Sudha](https://github.com/bhasudha) | PMC, Committer | 
bhavanisudha |
-| https://avatars.githubusercontent.com/bvaradar"} 
className="profile-pic" alt="bvaradar" align="middle" /> | [Balaji 
Varadarajan](https://github.com/bvaradar)| PMC, Committer | vbalaji 
 |
-| https://avatars.githubusercontent.com/danny0405"} 
className="profile-pic" alt="danny0405" align="middle" /> | [Danny 
Chan](https://github.com/danny0405)  | PMC, Committer   
| danny0405|
-| https://avatars.githubusercontent.com/yihua"} 
className="profile-pic" alt="yihua" align="middle" /> | [Ethan 
Guo](https://github.com/yihua)  | PMC, Committer| yihua 
   |
-| https://avatars.githubusercontent.com/XuQianJin-Stars"} 
className="profile-pic" alt="XuQianJin-Stars" align="middle" /> | [Forward 
Xu](https://github.com/XuQianJin-Stars)  | Committer   
| forwardxu|
-| https://avatars.githubusercontent.com/garyli1019"} 
className="profile-pic" alt="garyli1019" align="middle" /> | [Gary 
Li](https://github.com/garyli1019)  | PMC, Committer   
| garyli|
-| https://avatars.githubusercontent.com/boneanxs"} 
className="profile-pic" alt="boneanxs" align="middle" /> | [Hui 
An](https://github.com/boneanxs)  | Committer   | rexan 
|
-| https://avatars.githubusercontent.com/jonvex"} 
className="profile-pic" alt="jonvex" align="middle" /> | [Jonathan 
Vexler](https://github.com/jonvex)   | Committer   | jonvex 
  |
-| https://avatars.githubusercontent.com/beyond1920"} 
className="profile-pic" alt="beyond1920" align="middle" /> | [Jing 
Zhang](https://github.com/beyond1920)| Committer   | beyond1920 
  |
-| https://avatars.githubusercontent.com/lresende"} 
className="profile-pic" alt="lresende" align="middle" /> | [Luciano 
Resende](https://github.com/lresende)   | PMC, Committer | lresende 
|
-| https://avatars.githubusercontent.com/lamberken"} 
className="profile-pic" alt="lamberken" className="profile-pic" align="middle" 
/

Re: [PR] [DOCS][MINOR] Update team and syncs pages [hudi]

2024-06-24 Thread via GitHub


bhasudha merged PR #11500:
URL: https://github.com/apache/hudi/pull/11500





Re: [PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11499:
URL: https://github.com/apache/hudi/pull/11499#issuecomment-2186586054

   
   ## CI report:
   
   * ca09e626015a153e6351c3d2dcc5e2f95fe3988c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24541)
 
   * b0a476fa5eae1ac5d3488e335d92f776305f2cdb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24545)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Assigned] (HUDI-7927) Secondary View should only initialize when required

2024-06-24 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7927:
---

Assignee: Timothy Brown

> Secondary View should only initialize when required
> ---
>
> Key: HUDI-7927
> URL: https://issues.apache.org/jira/browse/HUDI-7927
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> In the PriorityBasedFileSystemView, the secondary view will be initialized 
> eagerly causing extra overhead including file listing. We should avoid this 
> to reduce the cost for users.





[jira] [Created] (HUDI-7927) Secondary View should only initialize when required

2024-06-24 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7927:
---

 Summary: Secondary View should only initialize when required
 Key: HUDI-7927
 URL: https://issues.apache.org/jira/browse/HUDI-7927
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


In the PriorityBasedFileSystemView, the secondary view will be initialized 
eagerly causing extra overhead including file listing. We should avoid this to 
reduce the cost for users.
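
One way to picture the lazy initialization proposed above is the following minimal Java sketch. It is not Hudi's implementation: `FileView`, `Lazy`, and `PriorityView` are hypothetical stand-ins, and the sketch assumes the secondary view is only needed as a fallback when the primary view fails. The secondary view is passed in as a factory, so nothing is built (and no file listing happens) until the first fallback.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: names below are illustrative, not Hudi's actual classes. */
final class LazySecondaryViewSketch {

  /** Stand-in for a file system view; assumed for this example only. */
  interface FileView {
    List<String> listFiles(String partition);
  }

  /** Memoizes a supplier so the wrapped object is built at most once, on first use. */
  static final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private T value;

    Lazy(Supplier<T> factory) {
      this.factory = factory;
    }

    @Override
    public synchronized T get() {
      if (value == null) {
        value = factory.get();
      }
      return value;
    }
  }

  /** Serves from the primary view and builds the secondary only if the primary fails. */
  static final class PriorityView implements FileView {
    private final FileView primary;
    private final Lazy<FileView> secondary;

    PriorityView(FileView primary, Supplier<FileView> secondaryFactory) {
      this.primary = primary;
      this.secondary = new Lazy<>(secondaryFactory);
    }

    @Override
    public List<String> listFiles(String partition) {
      try {
        return primary.listFiles(partition);
      } catch (RuntimeException e) {
        // The first failure triggers construction of the secondary view.
        return secondary.get().listFiles(partition);
      }
    }
  }
}
```

With this shape, a caller whose primary view never fails pays nothing for the secondary view, which is the cost reduction the ticket asks for.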





Re: [PR] [HUDI-7906] improve the parallelism deduce in rdd write [hudi]

2024-06-24 Thread via GitHub


bibhu107 commented on PR #11470:
URL: https://github.com/apache/hudi/pull/11470#issuecomment-2186573812

   Hi @KnightChess, which release will this fix be in?





Re: [I] [SUPPORT]Performance degrade for migrating from Hudi 0.7 to Hudi 0.14 [hudi]

2024-06-24 Thread via GitHub


bibhu107 commented on issue #11274:
URL: https://github.com/apache/hudi/issues/11274#issuecomment-2186570524

   I am closing this issue. Thanks for the support, @KnightChess.





Re: [I] [SUPPORT] Unable to read Hudi table after hudi upgrade [hudi]

2024-06-24 Thread via GitHub


ad1happy2go commented on issue #11492:
URL: https://github.com/apache/hudi/issues/11492#issuecomment-2186511758

   Can you share the command you are using to submit jobs and the entire stack 
trace now? I hope you are no longer getting `java.lang.ClassNotFoundException: 
org.apache.spark.sql.adapter.Spark2Adapter`.
   If you are still getting that, then the Hudi version you are using is not correct.





Re: [PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11502:
URL: https://github.com/apache/hudi/pull/11502#issuecomment-2186494481

   
   ## CI report:
   
   * 2d80ec8e1a7df051c5e62423b21f29711b684630 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24543)
 
   * f85ebcae1fdf20ba678d5e1471285f7e4b8a2bd2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24544)
 
   
   



Re: [PR] [DOCS] Update Blogs [hudi]

2024-06-24 Thread via GitHub


bhasudha commented on PR #11503:
URL: https://github.com/apache/hudi/pull/11503#issuecomment-2186494468

   Tested locally!
   
   https://github.com/apache/hudi/assets/2179254/98887ed3-cf0d-4c63-97f7-32ff84ec64a9
   





Re: [PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11499:
URL: https://github.com/apache/hudi/pull/11499#issuecomment-2186494394

   
   ## CI report:
   
   * ca09e626015a153e6351c3d2dcc5e2f95fe3988c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24541)
 
   * b0a476fa5eae1ac5d3488e335d92f776305f2cdb UNKNOWN
   
   



[PR] [DOCS] Update Blogs [hudi]

2024-06-24 Thread via GitHub


bhasudha opened a new pull request, #11503:
URL: https://github.com/apache/hudi/pull/11503

   ### Change Logs
   
   Update blogs in the website
   
   ### Impact
   
   website changes
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [I] [SUPPORT] - Performance Variation in Hudi 0.14 [hudi]

2024-06-24 Thread via GitHub


RuyRoaV commented on issue #11481:
URL: https://github.com/apache/hudi/issues/11481#issuecomment-2186492501

   Hello @ad1happy2go 
   
   I have attached some screenshots of the Spark UI. Is there any specific 
screen that you'd like to see?
   
   ![Screenshot 2024-06-24 at 13 23 
33](https://github.com/apache/hudi/assets/173461014/47ff1f07-23df-41f2-b12f-c5befcfdfb85)
   
   ![Screenshot 2024-06-24 at 13 23 
53](https://github.com/apache/hudi/assets/173461014/68b61f60-c76c-4dab-9884-a5d677377997)
   
   
   Thanks for the input, I will take that into account. On some other GitHub 
issues I've also seen switching to an RLI index recommended. Would that work 
for a COW table, or would the SIMPLE index still be a better approach?
   
   Best regards,





Re: [PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11499:
URL: https://github.com/apache/hudi/pull/11499#issuecomment-2186479637

   
   ## CI report:
   
   * ca09e626015a153e6351c3d2dcc5e2f95fe3988c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24541)
 
   
   



Re: [PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11502:
URL: https://github.com/apache/hudi/pull/11502#issuecomment-2186479786

   
   ## CI report:
   
   * 2d80ec8e1a7df051c5e62423b21f29711b684630 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24543)
 
   * f85ebcae1fdf20ba678d5e1471285f7e4b8a2bd2 UNKNOWN
   
   



Re: [PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11502:
URL: https://github.com/apache/hudi/pull/11502#issuecomment-2186464201

   
   ## CI report:
   
   * 2d80ec8e1a7df051c5e62423b21f29711b684630 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24543)
 
   
   



Re: [PR] [HUDI-7709] ClassCastException while reading the data using `TimestampBasedKeyGenerator` [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11501:
URL: https://github.com/apache/hudi/pull/11501#issuecomment-2186464134

   
   ## CI report:
   
   * 84080accf11a859cea238d12025d69f7f3ce4269 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24542)
 
   
   



Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11490:
URL: https://github.com/apache/hudi/pull/11490#issuecomment-2186463946

   
   ## CI report:
   
   * 3fa4ba583278c67c62dd3c063b55f07301815c2c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24540)
 
   
   



Re: [PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11499:
URL: https://github.com/apache/hudi/pull/11499#issuecomment-2186379582

   
   ## CI report:
   
   * ca09e626015a153e6351c3d2dcc5e2f95fe3988c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24541)
 
   
   



Re: [PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11502:
URL: https://github.com/apache/hudi/pull/11502#issuecomment-2186379738

   
   ## CI report:
   
   * 2d80ec8e1a7df051c5e62423b21f29711b684630 UNKNOWN
   
   



Re: [PR] [HUDI-7709] ClassCastException while reading the data using `TimestampBasedKeyGenerator` [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11501:
URL: https://github.com/apache/hudi/pull/11501#issuecomment-2186379640

   
   ## CI report:
   
   * 84080accf11a859cea238d12025d69f7f3ce4269 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24542)
 
   
   



Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11490:
URL: https://github.com/apache/hudi/pull/11490#issuecomment-2186379443

   
   ## CI report:
   
   * 79a022c4a314465a3a313e3aafd5937cc673c9d6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24538)
 
   * 3fa4ba583278c67c62dd3c063b55f07301815c2c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24540)
 
   
   



Re: [PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]

2024-06-24 Thread via GitHub


KnightChess commented on code in PR #11502:
URL: https://github.com/apache/hudi/pull/11502#discussion_r1650888209


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
##
@@ -472,6 +473,11 @@ object HoodieFileIndex extends Logging {
   
properties.setProperty(DataSourceReadOptions.FILE_INDEX_LISTING_MODE_OVERRIDE.key,
 listingModeOverride)
 }
 
+if (tableConfig != null) {
+  properties.setProperty(RECORDKEY_FIELD.key, 
tableConfig.getRecordKeyFields.orElse(Array.empty).mkString(","))

Review Comment:
   The bucket query index needs this property to create the correct 
`KeyGenerator`; otherwise it creates an `AutoRecordGenWrapperKeyGenerator`.
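
As a rough illustration of the added line in the diff, the sketch below joins an optional list of record key fields into the comma-separated value stored under the record key property. It is plain Java, not Hudi code: the class name, both helpers and the property key string are assumptions made for this example.

```java
import java.util.Optional;
import java.util.Properties;

class RecordKeyPropertySketch {
    // Assumed property key for this example; check the real RECORDKEY_FIELD config in Hudi.
    static final String RECORDKEY_FIELD_KEY = "hoodie.datasource.write.recordkey.field";

    // Java counterpart of `getRecordKeyFields.orElse(Array.empty).mkString(",")` from the diff.
    static String joinRecordKeys(Optional<String[]> recordKeyFields) {
        return String.join(",", recordKeyFields.orElse(new String[0]));
    }

    // Stores the joined record key fields in the given properties and returns them.
    static Properties withRecordKey(Properties props, Optional<String[]> recordKeyFields) {
        props.setProperty(RECORDKEY_FIELD_KEY, joinRecordKeys(recordKeyFields));
        return props;
    }
}
```

When the table config has no record key fields, the property is set to an empty string rather than left unset.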
   
   






Re: [PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11499:
URL: https://github.com/apache/hudi/pull/11499#issuecomment-2186366377

   
   ## CI report:
   
   * ca09e626015a153e6351c3d2dcc5e2f95fe3988c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7709] ClassCastException while reading the data using `TimestampBasedKeyGenerator` [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11501:
URL: https://github.com/apache/hudi/pull/11501#issuecomment-2186366427

   
   ## CI report:
   
   * 84080accf11a859cea238d12025d69f7f3ce4269 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11490:
URL: https://github.com/apache/hudi/pull/11490#issuecomment-2186366222

   
   ## CI report:
   
   * 79a022c4a314465a3a313e3aafd5937cc673c9d6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24538)
 
   * 3fa4ba583278c67c62dd3c063b55f07301815c2c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-7926) dataskipping failure mode should be strict in test

2024-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7926:
-
Labels: pull-request-available  (was: )

> dataskipping failure mode should be strict in test
> --
>
> Key: HUDI-7926
> URL: https://issues.apache.org/jira/browse/HUDI-7926
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Reporter: KnightChess
>Assignee: KnightChess
>Priority: Critical
>  Labels: pull-request-available
>
> The data-skipping failure mode should be strict in tests. With the default 
> fallback mode, the query unit tests are meaningless.
> Other changes may already have introduced bugs that the tests cannot 
> detect.





[PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]

2024-06-24 Thread via GitHub


KnightChess opened a new pull request, #11502:
URL: https://github.com/apache/hudi/pull/11502

   ### Change Logs
   
   The data-skipping failure mode should be strict in tests. With the default 
fallback mode, the query unit tests are meaningless.
   
   Other changes may already have introduced bugs that the tests cannot 
detect.
   
   - query index uses data-skipping strict mode
   - bucket query index adds the required parameters
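
To show why the fallback mode can hide bugs in query tests, here is a minimal sketch of the two failure modes. It is not Hudi's implementation: `FailureMode`, `candidateFiles` and the file names are hypothetical, and the index lookup is modelled as a plain `Supplier`.

```java
import java.util.List;
import java.util.function.Supplier;

class DataSkippingModeSketch {
    enum FailureMode { FALLBACK, STRICT }

    // Returns the files selected by the index lookup. If the lookup fails,
    // FALLBACK silently returns all files, while STRICT rethrows the error.
    static List<String> candidateFiles(FailureMode mode,
                                       List<String> allFiles,
                                       Supplier<List<String>> indexLookup) {
        try {
            return indexLookup.get();
        } catch (RuntimeException e) {
            if (mode == FailureMode.STRICT) {
                throw e;
            }
            return allFiles;
        }
    }
}
```

Under `FALLBACK`, a broken lookup still yields correct query results because every file is scanned, so a test that only checks results passes. Under `STRICT`, the same test fails and exposes the bug.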
   
   ### Impact
   
   None
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7709) Class Cast Exception while reading the data using TimestampBasedKeyGenerator

2024-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7709:
-
Labels: pull-request-available  (was: )

> Class Cast Exception while reading the data using TimestampBasedKeyGenerator
> 
>
> Key: HUDI-7709
> URL: https://issues.apache.org/jira/browse/HUDI-7709
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: reader-core
>Reporter: Aditya Goenka
>Assignee: Geser Dugarov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Github Issue - [https://github.com/apache/hudi/issues/11140]





[PR] [HUDI-7709] ClassCastException while reading the data using `TimestampBasedKeyGenerator` [hudi]

2024-06-24 Thread via GitHub


geserdugarov opened a new pull request, #11501:
URL: https://github.com/apache/hudi/pull/11501

   ### Change Logs
   
   Before this MR, reading a table that uses `TimestampBasedKeyGenerator` failed with:
   ```
   Failed to cast value `2004-02-29 01` to `LongType` for partition column `ts`
   ```
   When Spark reads the data, `listPartitionPaths()` is called with 
`parsePartitionColumnValues()`, and parsing throws a ClassCastException. 
Partition column values cannot be reconstructed from partition paths when 
`TimestampBasedKeyGenerator` is used, because information is lost when the 
key generator processes the values.
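
The information loss can be sketched in plain Java. This is an illustration under assumptions, not the key generator's actual code: the class and method names are hypothetical, and the `yyyy-MM-dd HH` output format in UTC is chosen only because it matches the value shown in the error message.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class TimestampPartitionSketch {
    // Assumed output format; it matches the value `2004-02-29 01` from the error above.
    static final DateTimeFormatter PARTITION_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH");

    // Turns an epoch-millis `ts` value into a partition path segment (UTC).
    static String toPartitionPath(long epochMillis) {
        long epochSeconds = Math.floorDiv(epochMillis, 1000L);
        return LocalDateTime.ofEpochSecond(epochSeconds, 0, ZoneOffset.UTC).format(PARTITION_FORMAT);
    }

    // Tries to read the path segment back as the column's original type (long).
    static boolean parsesAsLong(String partitionValue) {
        try {
            Long.parseLong(partitionValue);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Two different `ts` values within the same hour map to the same path segment, and that segment is no longer a valid long, so the original column value cannot be recovered from the path.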
   
   This MR fixes the ClassCastException, but some older, separate tasks remain 
unfinished. They are mentioned in the added 
`TestSparkSqlWithTimestampKeyGenerator`:
   
   - The fix for [HUDI-3896] overwrites 
`shouldExtractPartitionValuesFromPartitionPath` in `BaseFileOnlyRelation`. 
While fixing this issue I could not determine whether that should be changed 
or left as it is.
   
   - There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in 
`HoodieBaseHadoopFsRelationFactory`. I could not find a corresponding task, so 
I created a new one, HUDI-7925.
   
   ### Impact
   
   Fixes ClassCastException.
   
   ### Risk level (write none, low medium or high below)
   
   Low. Only affects reads when `TimestampBasedKeyGenerator` is used. 
`TestSparkSqlWithTimestampKeyGenerator` was added to cover this.
   
   ### Documentation Update
   
   No need.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-7926) dataskipping failure mode should be strict in test

2024-06-24 Thread KnightChess (Jira)
KnightChess created HUDI-7926:
-

 Summary: dataskipping failure mode should be strict in test
 Key: HUDI-7926
 URL: https://issues.apache.org/jira/browse/HUDI-7926
 Project: Apache Hudi
  Issue Type: Bug
  Components: spark-sql
Reporter: KnightChess
Assignee: KnightChess


The data-skipping failure mode should be strict in tests. With the default 
fallback mode, the query unit tests are meaningless.

Other changes may already have introduced bugs that the tests cannot 
detect.





[jira] [Updated] (HUDI-7925) Implement logic for `shouldExtractPartitionValuesFromPartitionPath` in `HoodieHadoopFsRelationFactory`

2024-06-24 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov updated HUDI-7925:

Summary: Implement logic for 
`shouldExtractPartitionValuesFromPartitionPath` in 
`HoodieHadoopFsRelationFactory`  (was: Do not extract values from partition 
paths in `HoodieHadoopFsRelationFactory`)

> Implement logic for `shouldExtractPartitionValuesFromPartitionPath` in 
> `HoodieHadoopFsRelationFactory`
> --
>
> Key: HUDI-7925
> URL: https://issues.apache.org/jira/browse/HUDI-7925
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Geser Dugarov
>Priority: Major
>
> There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in 
> `HoodieHadoopFsRelationFactory`. As a result, reading data with 
> "hoodie.file.group.reader.enabled" = "true", which is the default, returns 
> null values.
> We need to implement logic similar to that in `HoodieBaseRelation`.





Re: [PR] [DOCS][MINOR] Update team and syncs pages [hudi]

2024-06-24 Thread via GitHub


bhasudha commented on PR #11500:
URL: https://github.com/apache/hudi/pull/11500#issuecomment-2186304536

   Tested locally!
   
   https://github.com/apache/hudi/assets/2179254/b8810a49-0586-45c0-8a29-e9c681af982e
   
   https://github.com/apache/hudi/assets/2179254/dc8cc157-6d8a-4514-a043-d8715d81a165
   





[PR] [DOCS][MINOR] Update team and syncs pages [hudi]

2024-06-24 Thread via GitHub


bhasudha opened a new pull request, #11500:
URL: https://github.com/apache/hudi/pull/11500

   ### Change Logs
   
   Update team page to remove pic/avatars. Update community sync page to 
replace upcoming calls image with table.
   
   ### Impact
   
   site changes
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]

2024-06-24 Thread via GitHub


wombatu-kun opened a new pull request, #11499:
URL: https://github.com/apache/hudi/pull/11499

   ### Change Logs
   
   Removed the null checks in the `apply` methods of SqlQueryBasedTransformer 
and SqlFileBasedTransformer.  
   `getStringWithAltKeys` never returns `null`: if the config is missing from 
the properties, it throws IllegalArgumentException (property xxx not found), so 
null-checking the result is unreachable (useless) here.
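
The reasoning can be sketched with a hypothetical stand-in for such a lookup. The `lookupOrThrow` helper below is not Hudi's `getStringWithAltKeys`; its name, signature and error message are assumptions used only to show why a null check on the result is dead code.

```java
import java.util.Properties;

class ConfigLookupSketch {
    // Hypothetical lookup: tries the main key, then each alternative key,
    // and throws instead of returning null when none of them is set.
    static String lookupOrThrow(Properties props, String key, String... altKeys) {
        String value = props.getProperty(key);
        if (value != null) {
            return value;
        }
        for (String alt : altKeys) {
            value = props.getProperty(alt);
            if (value != null) {
                return value;
            }
        }
        throw new IllegalArgumentException("Property " + key + " not found");
    }
}
```

Every path out of the method either returns a non-null string or throws, so a caller's `if (result == null)` branch can never run.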
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7925) Do not extract values from partition paths in `HoodieHadoopFsRelationFactory`

2024-06-24 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov updated HUDI-7925:

Description: 
There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in 
`HoodieHadoopFsRelationFactory`. As a result, reading data with 
"hoodie.file.group.reader.enabled" = "true", which is the default, returns 
null values.
We need to implement logic similar to that in `HoodieBaseRelation`.

  was:`shouldExtractPartitionValuesFromPartitionPath` is not used in 


> Do not extract values from partition paths in `HoodieHadoopFsRelationFactory`
> -
>
> Key: HUDI-7925
> URL: https://issues.apache.org/jira/browse/HUDI-7925
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Geser Dugarov
>Priority: Major
>
> There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in 
> `HoodieHadoopFsRelationFactory`. As a result, reading data with 
> "hoodie.file.group.reader.enabled" = "true", which is the default, returns 
> null values.
> We need to implement logic similar to that in `HoodieBaseRelation`.





[jira] [Updated] (HUDI-7925) Do not extract values from partition paths in `HoodieHadoopFsRelationFactory`

2024-06-24 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov updated HUDI-7925:

Description: `shouldExtractPartitionValuesFromPartitionPath` is not used in 

> Do not extract values from partition paths in `HoodieHadoopFsRelationFactory`
> -
>
> Key: HUDI-7925
> URL: https://issues.apache.org/jira/browse/HUDI-7925
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Geser Dugarov
>Priority: Major
>
> `shouldExtractPartitionValuesFromPartitionPath` is not used in 





[jira] [Created] (HUDI-7925) Do not extract values from partition paths in `HoodieHadoopFsRelationFactory`

2024-06-24 Thread Geser Dugarov (Jira)
Geser Dugarov created HUDI-7925:
---

 Summary: Do not extract values from partition paths in 
`HoodieHadoopFsRelationFactory`
 Key: HUDI-7925
 URL: https://issues.apache.org/jira/browse/HUDI-7925
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Geser Dugarov








Re: [PR] [DOCS] Update Roadmap page [hudi]

2024-06-24 Thread via GitHub


bhasudha commented on PR #11491:
URL: https://github.com/apache/hudi/pull/11491#issuecomment-2186196563

   Addressed the comments. Please take a look @yihua 





Re: [PR] [DOCS] Update Roadmap page [hudi]

2024-06-24 Thread via GitHub


bhasudha commented on code in PR #11491:
URL: https://github.com/apache/hudi/pull/11491#discussion_r1650752092


##
website/src/pages/roadmap.md:
##
@@ -8,64 +8,68 @@ Hudi community strives to deliver major releases every 3-4 
months, while offerin
 This page captures the forward-looking roadmap of ongoing & upcoming projects 
and when they are expected to land, broken
 down by areas on our 
[stack](blog/2021/07/21/streaming-data-lake-platform/#hudi-stack).
 
+## Recent Release
+[0.15.0](https://issues.apache.org/jira/projects/HUDI/versions/12353381) (June 
2024)
+
 ## Future Releases
 
-Next upcoming release : 
[0.14.1](https://issues.apache.org/jira/projects/HUDI/versions/12353493) (Dec 
2023)
+- 
[1.0.0-beta2](https://issues.apache.org/jira/projects/HUDI/versions/12354810) 
(July 2024)
+- [0.16.0](https://issues.apache.org/jira/projects/HUDI/versions/12354773) - 
Bridge release supporting reads of both 1.x and 0.x Hudi versions. (Q3, 2024)
+- [1.0.0](https://issues.apache.org/jira/projects/HUDI/versions/12353848) (Q3, 
2024)

Review Comment:
   Looks like they are accessible only after signing in. Let me remove them.






Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11490:
URL: https://github.com/apache/hudi/pull/11490#issuecomment-2186045435

   
   ## CI report:
   
   * 79a022c4a314465a3a313e3aafd5937cc673c9d6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24538)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7924] Capture Latency and Failure Metrics For Hive Table recreation [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11498:
URL: https://github.com/apache/hudi/pull/11498#issuecomment-2186045558

   
   ## CI report:
   
   * b95c4726bcf60bcf68bee0480069f24b1fb9ed15 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24539)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7924] Capture Latency and Failure Metrics For Hive Table recreation [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11498:
URL: https://github.com/apache/hudi/pull/11498#issuecomment-2185951989

   
   ## CI report:
   
   * b95c4726bcf60bcf68bee0480069f24b1fb9ed15 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24539)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11490:
URL: https://github.com/apache/hudi/pull/11490#issuecomment-2185951843

   
   ## CI report:
   
   * 964ce5513d52a12f592a19e0374506ab39453fca Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24535)
 
   * 79a022c4a314465a3a313e3aafd5937cc673c9d6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24538)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7924] Capture Latency and Failure Metrics For Hive Table recreation [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11498:
URL: https://github.com/apache/hudi/pull/11498#issuecomment-2185937360

   
   ## CI report:
   
   * b95c4726bcf60bcf68bee0480069f24b1fb9ed15 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]

2024-06-24 Thread via GitHub


hudi-bot commented on PR #11490:
URL: https://github.com/apache/hudi/pull/11490#issuecomment-2185937174

   
   ## CI report:
   
   * 964ce5513d52a12f592a19e0374506ab39453fca Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24535)
 
   * 79a022c4a314465a3a313e3aafd5937cc673c9d6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-7924) Capture Latency and Failure Metrics For Hive Table recreation

2024-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7924:
-
Labels: pull-request-available  (was: )

> Capture Latency and Failure Metrics For Hive Table recreation
> -
>
> Key: HUDI-7924
> URL: https://issues.apache.org/jira/browse/HUDI-7924
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Vamsi Karnika
>Priority: Major
>  Labels: pull-request-available
>
> As part of recreating the Glue and Hive table whenever schema or partition 
> sync fails, we want to capture and push a latency metric (time taken to 
> recreate and sync the table) and a failure metric (when recreating the 
> table fails).
>  * Push a latency metric to capture the time taken to recreate and sync the 
> table.
>  * Push a failure metric if recreate and sync fails.
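
The two metrics described above could be captured roughly as follows. This is a sketch under assumptions, not the code from the pull request: the class, field and method names are hypothetical, and in-memory counters stand in for whatever metrics reporter is actually used.

```java
import java.util.concurrent.atomic.AtomicLong;

class RecreateTableMetricsSketch {
    // Latency of the last successful recreate-and-sync in milliseconds, or -1 if none yet.
    final AtomicLong lastLatencyMs = new AtomicLong(-1);
    // Number of recreate-and-sync attempts that failed.
    final AtomicLong failureCount = new AtomicLong(0);

    // Runs the recreate-and-sync step, recording latency on success
    // and incrementing the failure counter when the step throws.
    boolean recreateAndSync(Runnable recreateAndSyncStep) {
        long start = System.nanoTime();
        try {
            recreateAndSyncStep.run();
            lastLatencyMs.set((System.nanoTime() - start) / 1_000_000);
            return true;
        } catch (RuntimeException e) {
            failureCount.incrementAndGet();
            return false;
        }
    }
}
```

Recording the latency only on success keeps failed attempts out of the latency metric, and the failure counter covers those attempts separately.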





[PR] [HUDI-7924] Capture Latency and Failure Metrics For Hive Table recreation [hudi]

2024-06-24 Thread via GitHub


vamsikarnika opened a new pull request, #11498:
URL: https://github.com/apache/hudi/pull/11498

   ### Change Logs
   
   Added latency and failure metrics for table recreation on meta sync failure.
   
   ### Impact
   
   - Pushes new metrics to Prometheus, which helps monitor the performance of 
table recreation.
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Comment Edited] (HUDI-7033) Fix read error for schema evolution + partition value extraction

2024-06-24 Thread Geser Dugarov (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859598#comment-17859598
 ] 

Geser Dugarov edited comment on HUDI-7033 at 6/24/24 7:50 AM:
--

Merged a4fa3451916de11dc082792076b62013586dadaf in linked MR 9994
refers to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889]


was (Author: JIRAUSER301110):
Merged a4fa3451916de11dc082792076b62013586dadaf
refers to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889]

> Fix read error for schema evolution + partition value extraction
> 
>
> Key: HUDI-7033
> URL: https://issues.apache.org/jira/browse/HUDI-7033
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: voon
>Priority: Major
>  Labels: pull-request-available
>
> After HUDI-6960 is merged, 
> *shouldExtractPartitionValuesFromPartitionPath* will correctly ignore 
> partition columns in requiredSchema.
>  
> When using the configs below, there will be read errors.
>  
> {code:java}
> hoodie.datasource.read.extract.partition.values.from.path = true {code}
>  
>  
> When the config above is added together with:
>  
> {code:java}
> hoodie.schema.on.read.enable = true {code}
>  
> The query schema will be pruned so that it does *NOT* contain any partition 
> columns.
>  
> When rebuilding parquet filters, the file schema's columns are scanned 
> against querySchema. However, Hudi files (file schema) might still contain 
> partition columns. When partition filters are rebuilt with this file schema 
> against the query schema, the partition columns are not found.
>  
> {code:java}
> Caused by: java.lang.IllegalArgumentException: cannot found filter col 
> name:region from querySchema: table {
>  5: id: optional int
>  6: name: optional string
>  7: ts: optional long
> }
> at 
> org.apache.hudi.internal.schema.utils.InternalSchemaUtils.reBuildFilterName(InternalSchemaUtils.java:180)
>  {code}
>  
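
The failure mode in the stack trace can be sketched as a lookup of a filter column in a pruned query schema. This is an illustration only, not `InternalSchemaUtils`: the class, the method and the list-based schema are hypothetical simplifications.

```java
import java.util.List;

class FilterColumnLookupSketch {
    // Resolves a filter column against the (pruned) query schema and
    // throws when the column is absent, as a partition column would be.
    static int resolveFilterColumn(String filterCol, List<String> querySchemaCols) {
        int idx = querySchemaCols.indexOf(filterCol);
        if (idx < 0) {
            throw new IllegalArgumentException(
                "filter column " + filterCol + " not found in query schema " + querySchemaCols);
        }
        return idx;
    }
}
```

With a query schema of `id`, `name` and `ts`, a filter on the partition column `region` has nothing to resolve against, which corresponds to the exception shown above.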





Re: [I] [SUPPORT] Hudi partitions not dropped by Hive sync after `insert_overwrite_table` operation [hudi]

2024-06-24 Thread via GitHub


Limess commented on issue #8114:
URL: https://github.com/apache/hudi/issues/8114#issuecomment-2185835901

   > @codope: As stated in the issue, the problem reproduces consistently. 
The version we are currently using is 0.14. @Limess: Have you encountered 
this problem again? May I ask how you avoided it? Thanks!
   
   We never pursued this and are still on 0.13.0 for now, so I can't verify 
either way, sorry!





[jira] [Reopened] (HUDI-7033) Fix read error for schema evolution + partition value extraction

2024-06-24 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov reopened HUDI-7033:
-

Merged a4fa3451916de11dc082792076b62013586dadaf
refer to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889]

> Fix read error for schema evolution + partition value extraction
> 
>
> Key: HUDI-7033
> URL: https://issues.apache.org/jira/browse/HUDI-7033
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: voon
>Priority: Major
>  Labels: pull-request-available
>
> After HUDI-6960 is merged, 
> *shouldExtractPartitionValuesFromPartitionPath* will correctly ignore 
> partition columns in requiredSchema.
>  
> When using the configs below, there will be read errors.
>  
> {code:java}
> hoodie.datasource.read.extract.partition.values.from.path = true {code}
>  
>  
> When the config above is added together with:
>  
> {code:java}
> hoodie.schema.on.read.enable = true {code}
>  
> The query schema will be pruned so that it does *NOT* contain any partition 
> columns.
>  
> When rebuilding parquet filters, the file schema's columns are scanned 
> against querySchema. However, Hudi files (file schema) might still contain 
> partition columns. When partition filters are rebuilt with this file schema 
> against the query schema, the partition columns are not found.
>  
> {code:java}
> Caused by: java.lang.IllegalArgumentException: cannot found filter col 
> name:region from querySchema: table {
>  5: id: optional int
>  6: name: optional string
>  7: ts: optional long
> }
> at 
> org.apache.hudi.internal.schema.utils.InternalSchemaUtils.reBuildFilterName(InternalSchemaUtils.java:180)
>  {code}
>  
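
The failure described above can be illustrated with a minimal, self-contained sketch. This is not Hudi's code and not the fix in a4fa3451916de11dc082792076b62013586dadaf: the class FilterRebuildSketch and the methods resolveStrict and resolveSkippingPartitionCols are hypothetical names, and schemas are modelled as plain sets of column names. The strict lookup only mimics the behaviour visible in the stack trace (a filter column missing from the pruned query schema raises IllegalArgumentException); the second method shows one possible way such columns could be skipped, as an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Hudi.
class FilterRebuildSketch {

    // Strict lookup: a filter column that is absent from the query schema
    // raises IllegalArgumentException, as in the stack trace above.
    static String resolveStrict(String filterCol, Set<String> querySchemaCols) {
        if (!querySchemaCols.contains(filterCol)) {
            throw new IllegalArgumentException(
                "cannot found filter col name:" + filterCol
                    + " from querySchema: " + querySchemaCols);
        }
        return filterCol;
    }

    // Assumed workaround: partition columns that were pruned from the query
    // schema are skipped instead of being resolved; everything else still
    // goes through the strict lookup.
    static List<String> resolveSkippingPartitionCols(List<String> filterCols,
                                                     Set<String> querySchemaCols,
                                                     Set<String> partitionCols) {
        List<String> resolved = new ArrayList<>();
        for (String col : filterCols) {
            boolean prunedPartitionCol =
                partitionCols.contains(col) && !querySchemaCols.contains(col);
            if (prunedPartitionCol) {
                continue; // the value comes from the partition path, not the file
            }
            resolved.add(resolveStrict(col, querySchemaCols));
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Query schema after pruning: no partition column "region".
        Set<String> querySchema = new LinkedHashSet<>(Arrays.asList("id", "name", "ts"));
        Set<String> partitionCols = new LinkedHashSet<>(Arrays.asList("region"));
        List<String> filterCols = Arrays.asList("id", "region");

        try {
            resolveStrict("region", querySchema);
        } catch (IllegalArgumentException e) {
            System.out.println("strict lookup failed: " + e.getMessage());
        }

        // prints [id]
        System.out.println(resolveSkippingPartitionCols(filterCols, querySchema, partitionCols));
    }
}
```

In this sketch a column that is neither in the query schema nor a partition column still fails the strict lookup, so genuine schema mismatches are not hidden.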



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HUDI-7033) Fix read error for schema evolution + partition value extraction

2024-06-24 Thread Geser Dugarov (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859598#comment-17859598
 ] 

Geser Dugarov edited comment on HUDI-7033 at 6/24/24 7:47 AM:
--

Merged a4fa3451916de11dc082792076b62013586dadaf
refers to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889]


was (Author: JIRAUSER301110):
Merged a4fa3451916de11dc082792076b62013586dadaf
refer to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889]






[jira] (HUDI-7033) Fix read error for schema evolution + partition value extraction

2024-06-24 Thread Geser Dugarov (Jira)


[ https://issues.apache.org/jira/browse/HUDI-7033 ]


Geser Dugarov deleted comment on HUDI-7033:
-

was (Author: JIRAUSER301110):
Fixed in master, a4fa3451916de11dc082792076b62013586dadaf






[jira] [Closed] (HUDI-7033) Fix read error for schema evolution + partition value extraction

2024-06-24 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov closed HUDI-7033.
---
Resolution: Fixed

Fixed in master, a4fa3451916de11dc082792076b62013586dadaf



