[jira] [Updated] (HUDI-7929) Add Flink Hudi Example for K8s
[ https://issues.apache.org/jira/browse/HUDI-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiyan Xu updated HUDI-7929: Component/s: flink > Add Flink Hudi Example for K8s > -- > > Key: HUDI-7929 > URL: https://issues.apache.org/jira/browse/HUDI-7929 > Project: Apache Hudi > Issue Type: New Feature > Components: flink >Reporter: Zhenqiu Huang >Assignee: Zhenqiu Huang >Priority: Major > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7929) Add Flink Hudi Example for K8s
[ https://issues.apache.org/jira/browse/HUDI-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiyan Xu updated HUDI-7929: Fix Version/s: 1.0.0 > Add Flink Hudi Example for K8s > -- > > Key: HUDI-7929 > URL: https://issues.apache.org/jira/browse/HUDI-7929 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Zhenqiu Huang >Assignee: Zhenqiu Huang >Priority: Major > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-7929) Add Flink Hudi Example for K8s
[ https://issues.apache.org/jira/browse/HUDI-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiyan Xu reassigned HUDI-7929: --- Assignee: Zhenqiu Huang > Add Flink Hudi Example for K8s > -- > > Key: HUDI-7929 > URL: https://issues.apache.org/jira/browse/HUDI-7929 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Zhenqiu Huang >Assignee: Zhenqiu Huang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7929) Add Flink Hudi Example for K8s
Zhenqiu Huang created HUDI-7929: --- Summary: Add Flink Hudi Example for K8s Key: HUDI-7929 URL: https://issues.apache.org/jira/browse/HUDI-7929 Project: Apache Hudi Issue Type: New Feature Reporter: Zhenqiu Huang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader
[ https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7928: Fix Version/s: 1.0.0-beta2 > Fix shared HFile reader in HoodieNativeAvroHFileReader > -- > > Key: HUDI-7928 > URL: https://issues.apache.org/jira/browse/HUDI-7928 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0-beta2, 1.0.0 > > > The shared HFile reader in HoodieNativeAvroHFileReader uses significant > memory for reading meta info from the HFile. We should avoid keeping the > reference to the shared HFile reader and cache the meta info only. -- This message was sent by Atlassian Jira (v8.20.10#820010)
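The fix described in HUDI-7928 — stop holding a reference to a shared HFile reader and cache only the extracted meta info — can be sketched as follows. This is a hypothetical, self-contained illustration of that shape only; none of these class or method names are Hudi's actual API (the real class is HoodieNativeAvroHFileReader).

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for an HFile reader that holds large buffers while open.
// All names here are illustrative, not Hudi's actual API.
class ToyHFileReader implements AutoCloseable {
    static int openCount = 0; // how many times a reader was opened

    ToyHFileReader(String path) {
        openCount++;
    }

    Map<String, String> readMetaBlock() {
        // Pretend these small values come from the file's meta/footer block.
        return Map.of("minRecordKey", "key-000", "maxRecordKey", "key-999");
    }

    @Override
    public void close() {}
}

// Sketch of the fix: read the (small) meta info once with a short-lived
// reader, cache only those values, and retain no reference to the reader
// or its buffers.
class MetaInfoCache {
    private final String filePath;
    private Map<String, String> metaInfo; // lazily populated, small

    MetaInfoCache(String filePath) {
        this.filePath = filePath;
    }

    synchronized Map<String, String> getMetaInfo() {
        if (metaInfo == null) {
            try (ToyHFileReader reader = new ToyHFileReader(filePath)) {
                metaInfo = new HashMap<>(reader.readMetaBlock());
            } // reader is closed here; only the copied map survives
        }
        return metaInfo;
    }
}

public class Hudi7928Sketch {
    public static void main(String[] args) {
        MetaInfoCache cache = new MetaInfoCache("/tmp/example.hfile");
        cache.getMetaInfo(); // opens a reader once
        cache.getMetaInfo(); // served from cache, no reopen
        System.out.println("readers opened: " + ToyHFileReader.openCount);
        System.out.println("min key: " + cache.getMetaInfo().get("minRecordKey"));
    }
}
```

The point of the design is that the expensive object lives only inside the try-with-resources block, so its memory can be reclaimed while the cheap cached map answers all later metadata queries.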
[jira] [Updated] (HUDI-7903) Partition Stats Index not getting created with SQL
[ https://issues.apache.org/jira/browse/HUDI-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7903: Status: In Progress (was: Open) > Partition Stats Index not getting created with SQL > -- > > Key: HUDI-7903 > URL: https://issues.apache.org/jira/browse/HUDI-7903 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Blocker > Fix For: 1.0.0-beta2, 1.0.0 > > > {code:java} > spark.sql( > s""" > | create table $tableName using hudi > | partitioned by (dt) > | tblproperties( > |primaryKey = 'id', > |preCombineField = 'ts', > |'hoodie.metadata.index.partition.stats.enable' = 'true' > | ) > | location '$tablePath' > | AS > | select 1 as id, 'a1' as name, 10 as price, 1000 as ts, > cast('2021-05-06' as date) as dt >""".stripMargin > ) {code} > Even when partition stats is enabled, index is not created with SQL. Works > for datasource. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HUDI-7395) Fix computation for metrics in HoodieMetadataMetrics
[ https://issues.apache.org/jira/browse/HUDI-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain resolved HUDI-7395. --- > Fix computation for metrics in HoodieMetadataMetrics > > > Key: HUDI-7395 > URL: https://issues.apache.org/jira/browse/HUDI-7395 > Project: Apache Hudi > Issue Type: Bug > Components: metadata, metrics >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Fix For: 0.16.0, 1.0.0 > > > For some metric types, such as duration, we are using incrementMetric > instead of setMetric. > Also, some redundant metrics are removed. For example, a count-type > metric has both a count and a duration metric pushed even though the duration > is not calculated. > A file lookup count metric is added for the bloom filter and column stats index. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7922) Add Hudi CLI bundle for Scala 2.13
[ https://issues.apache.org/jira/browse/HUDI-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7922: Reviewers: Jonathan Vexler > Add Hudi CLI bundle for Scala 2.13 > -- > > Key: HUDI-7922 > URL: https://issues.apache.org/jira/browse/HUDI-7922 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > Build of Hudi CLI bundle should succeed on Scala 2.13 and work on Spark 3.5 > and Scala 2.13. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7922) Add Hudi CLI bundle for Scala 2.13
[ https://issues.apache.org/jira/browse/HUDI-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7922: Status: In Progress (was: Open) > Add Hudi CLI bundle for Scala 2.13 > -- > > Key: HUDI-7922 > URL: https://issues.apache.org/jira/browse/HUDI-7922 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > Build of Hudi CLI bundle should succeed on Scala 2.13 and work on Spark 3.5 > and Scala 2.13. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader
[ https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7928: Status: In Progress (was: Open) > Fix shared HFile reader in HoodieNativeAvroHFileReader > -- > > Key: HUDI-7928 > URL: https://issues.apache.org/jira/browse/HUDI-7928 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > The shared HFile reader in HoodieNativeAvroHFileReader uses significant > memory for reading meta info from the HFile. We should avoid keeping the > reference to the shared HFile reader and cache the meta info only. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7922) Add Hudi CLI bundle for Scala 2.13
[ https://issues.apache.org/jira/browse/HUDI-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7922: Sprint: 2024/06/17-30 > Add Hudi CLI bundle for Scala 2.13 > -- > > Key: HUDI-7922 > URL: https://issues.apache.org/jira/browse/HUDI-7922 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > Build of Hudi CLI bundle should succeed on Scala 2.13 and work on Spark 3.5 > and Scala 2.13. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7922) Add Hudi CLI bundle for Scala 2.13
[ https://issues.apache.org/jira/browse/HUDI-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7922: Status: Patch Available (was: In Progress) > Add Hudi CLI bundle for Scala 2.13 > -- > > Key: HUDI-7922 > URL: https://issues.apache.org/jira/browse/HUDI-7922 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > Build of Hudi CLI bundle should succeed on Scala 2.13 and work on Spark 3.5 > and Scala 2.13. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6508) Java 11 compile time support
[ https://issues.apache.org/jira/browse/HUDI-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6508: Status: Patch Available (was: In Progress) > Java 11 compile time support > > > Key: HUDI-6508 > URL: https://issues.apache.org/jira/browse/HUDI-6508 > Project: Apache Hudi > Issue Type: Bug >Reporter: Udit Mehrotra >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > Certify Hudi with Java 11 runtime support -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader
[ https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7928: Description: The shared HFile reader in HoodieNativeAvroHFileReader uses significant memory for reading meta info from the HFile. We should avoid keeping the reference to the shared HFile reader and cache the meta info only. (was: The shared HFile reader in uses a significant memory ) > Fix shared HFile reader in HoodieNativeAvroHFileReader > -- > > Key: HUDI-7928 > URL: https://issues.apache.org/jira/browse/HUDI-7928 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > The shared HFile reader in HoodieNativeAvroHFileReader uses significant > memory for reading meta info from the HFile. We should avoid keeping the > reference to the shared HFile reader and cache the meta info only. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader
[ https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7928: Story Points: 4 > Fix shared HFile reader in HoodieNativeAvroHFileReader > -- > > Key: HUDI-7928 > URL: https://issues.apache.org/jira/browse/HUDI-7928 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > The shared HFile reader in HoodieNativeAvroHFileReader uses significant > memory for reading meta info from the HFile. We should avoid keeping the > reference to the shared HFile reader and cache the meta info only. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader
[ https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7928: Sprint: 2024/06/17-30 > Fix shared HFile reader in HoodieNativeAvroHFileReader > -- > > Key: HUDI-7928 > URL: https://issues.apache.org/jira/browse/HUDI-7928 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > The shared HFile reader in HoodieNativeAvroHFileReader uses significant > memory for reading meta info from the HFile. We should avoid keeping the > reference to the shared HFile reader and cache the meta info only. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader
[ https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7928: --- Assignee: Ethan Guo > Fix shared HFile reader in HoodieNativeAvroHFileReader > -- > > Key: HUDI-7928 > URL: https://issues.apache.org/jira/browse/HUDI-7928 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader
Ethan Guo created HUDI-7928: --- Summary: Fix shared HFile reader in HoodieNativeAvroHFileReader Key: HUDI-7928 URL: https://issues.apache.org/jira/browse/HUDI-7928 Project: Apache Hudi Issue Type: Improvement Reporter: Ethan Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader
[ https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7928: Fix Version/s: 1.0.0 > Fix shared HFile reader in HoodieNativeAvroHFileReader > -- > > Key: HUDI-7928 > URL: https://issues.apache.org/jira/browse/HUDI-7928 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader
[ https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7928: Description: The shared HFile reader in uses a significant memory > Fix shared HFile reader in HoodieNativeAvroHFileReader > -- > > Key: HUDI-7928 > URL: https://issues.apache.org/jira/browse/HUDI-7928 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > The shared HFile reader in uses a significant memory -- This message was sent by Atlassian Jira (v8.20.10#820010)
(hudi) branch master updated: [MINOR] Reduce logging volume (#11505)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
     new 6192cfb0e95  [MINOR] Reduce logging volume (#11505)
6192cfb0e95 is described below

commit 6192cfb0e95d23da462d01bb5b4c849dfbaa1f2a
Author: Tim Brown
AuthorDate: Mon Jun 24 19:16:53 2024 -0500

    [MINOR] Reduce logging volume (#11505)
---
 .../apache/hudi/client/timeline/HoodieTimelineArchiver.java    | 10 --
 .../plan/generators/BaseHoodieCompactionPlanGenerator.java     |  4 +++-
 .../common/table/log/BaseHoodieMergedLogRecordScanner.java     |  9 +++--
 .../apache/hudi/common/table/view/FileSystemViewManager.java   | 10 +-
 .../src/main/java/org/apache/hudi/hive/HiveSyncTool.java       |  2 +-
 5 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java
index 2f5ecb2816d..817c3f650d9 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java
@@ -112,14 +112,14 @@ public class HoodieTimelineArchiver {
     // Sort again because the cleaning and rollback instants could break the sequence.
     List instantsToArchive = getInstantsToArchive().sorted().collect(Collectors.toList());
     if (!instantsToArchive.isEmpty()) {
-      LOG.info("Archiving instants " + instantsToArchive);
+      LOG.info("Archiving and deleting instants {}", instantsToArchive);
       Consumer exceptionHandler = e -> {
         if (this.config.isFailOnTimelineArchivingEnabled()) {
           throw new HoodieException(e);
         }
       };
       this.timelineWriter.write(instantsToArchive, Option.of(action -> deleteAnyLeftOverMarkers(context, action)), Option.of(exceptionHandler));
-      LOG.info("Deleting archived instants " + instantsToArchive);
+      LOG.debug("Deleting archived instants");
       deleteArchivedInstants(instantsToArchive, context);
       // triggers compaction and cleaning only after archiving action
       this.timelineWriter.compactAndClean(context);
@@ -221,7 +221,7 @@ public class HoodieTimelineArchiver {
         LOG.info("Not archiving as there is no compaction yet on the metadata table");
         return Collections.emptyList();
       } else {
-        LOG.info("Limiting archiving of instants to latest compaction on metadata table at " + latestCompactionTime.get());
+        LOG.info("Limiting archiving of instants to latest compaction on metadata table at {}", latestCompactionTime.get());
         earliestInstantToRetainCandidates.add(
             completedCommitsTimeline.findInstantsModifiedAfterByCompletionTime(latestCompactionTime.get()).firstInstant());
       }
@@ -324,8 +324,6 @@ public class HoodieTimelineArchiver {
   }

   private boolean deleteArchivedInstants(List activeActions, HoodieEngineContext context) {
-    LOG.info("Deleting instants " + activeActions);
-
     List pendingInstants = new ArrayList<>();
     List completedInstants = new ArrayList<>();
@@ -365,7 +363,7 @@ public class HoodieTimelineArchiver {
   private void deleteAnyLeftOverMarkers(HoodieEngineContext context, ActiveAction activeAction) {
     WriteMarkers writeMarkers = WriteMarkersFactory.get(config.getMarkersType(), table, activeAction.getInstantTime());
     if (writeMarkers.deleteMarkerDir(context, config.getMarkersDeleteParallelism())) {
-      LOG.info("Cleaned up left over marker directory for instant :" + activeAction);
+      LOG.info("Cleaned up left over marker directory for instant: {}", activeAction);
     }
   }
 }
diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java
index f768004cbce..e5ac5af9f64 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java
@@ -91,7 +91,9 @@ public abstract class BaseHoodieCompactionPlanGenerator e
     this.numMergedRecordsInLog = records.size();
     if (LOG.isInfoEnabled()) {
-      LOG.info("Number of log files scanned => {}", logFilePaths.size());
-      LOG.info("MaxMemoryInBytes allowed for compaction => {}", maxMemorySizeInBytes);
-      LOG.info("Number of entries in Memory
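The commit above swaps eager string concatenation (`"Archiving instants " + instantsToArchive`) for SLF4J-style `{}` placeholders, which defer message construction until the log level is known to be enabled. Below is a toy, self-contained sketch of why that matters; Hudi uses org.slf4j.Logger, and the `LazyLogger` class here exists only to make the deferred formatting observable without a logging dependency.

```java
import java.util.List;

// Toy stand-in for the SLF4J pattern adopted in this commit: "{}" placeholders
// are substituted only when the level is enabled, so disabled levels pay no
// string-building cost. Not Hudi's actual logger.
class LazyLogger {
    private final boolean infoEnabled;
    private int messagesBuilt = 0; // how many messages were actually rendered

    LazyLogger(boolean infoEnabled) {
        this.infoEnabled = infoEnabled;
    }

    void info(String template, Object arg) {
        if (infoEnabled) {
            messagesBuilt++;
            System.out.println(template.replace("{}", String.valueOf(arg)));
        }
        // When INFO is off, String.valueOf(arg) is never called and no
        // concatenated message is allocated -- unlike "..." + arg, which
        // builds the full string before the level check can reject it.
    }

    int messagesBuilt() {
        return messagesBuilt;
    }
}

public class LazyLoggingDemo {
    public static void main(String[] args) {
        List<String> instants = List.of("20240624101500", "20240624102000");

        LazyLogger infoOn = new LazyLogger(true);
        LazyLogger infoOff = new LazyLogger(false);

        infoOn.info("Archiving and deleting instants {}", instants);
        infoOff.info("Archiving and deleting instants {}", instants);

        System.out.println("built with INFO on: " + infoOn.messagesBuilt());  // 1
        System.out.println("built with INFO off: " + infoOff.messagesBuilt()); // 0
    }
}
```

This is also why the commit wraps the multi-line compaction statistics in `if (LOG.isInfoEnabled())`: even parameterized calls evaluate their arguments (`logFilePaths.size()` etc.) at the call site, so a guard skips that work entirely when INFO is off.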
[jira] [Updated] (HUDI-7586) Use table or write schema instead of deducing schema per file group for clustering
[ https://issues.apache.org/jira/browse/HUDI-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7586: Description: Right now each clustering group derives the schema on its own. Conceptually we can use one schema for all clustering groups. This is a behavior change and needs revisiting clustering logic end-to-end. > Use table or write schema instead of deducing schema per file group for > clustering > -- > > Key: HUDI-7586 > URL: https://issues.apache.org/jira/browse/HUDI-7586 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Fix For: 0.16.0, 1.0.0 > > > Right now each clustering group derives the schema on its own. Conceptually > we can use one schema for all clustering groups. This is a behavior change > and needs revisiting clustering logic end-to-end. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7585) Avoid reading log files for resolving schema for _hoodie_operation field
[ https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7585: Description: The table schema resolver needs to read schema from the data files (base or log files) to see whether _hoodie_operation field is present for Flink CDC use cases. This can cause overhead of reading data file footers multiple times. We should see if we can store a table config to indicate if or simplify the Flink CDC format in Hudi 1.0 (thus no need of _hoodie_operation field and schema resolver). (was: The table schema resolver needs to read schema from the data files (base or log files) to see whether _hoodie_operation field is present for Flink CDC use cases. This can cause overhead of reading data file footers multiple times. We should see if we can store or simplify the Flink CDC format in Hudi 1.0 (thus no need of ).) > Avoid reading log files for resolving schema for _hoodie_operation field > > > Key: HUDI-7585 > URL: https://issues.apache.org/jira/browse/HUDI-7585 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Jing Zhang >Priority: Major > Fix For: 1.0.0 > > > The table schema resolver needs to read schema from the data files (base or > log files) to see whether _hoodie_operation field is present for Flink CDC > use cases. This can cause overhead of reading data file footers multiple > times. We should see if we can store a table config to indicate if or > simplify the Flink CDC format in Hudi 1.0 (thus no need of _hoodie_operation > field and schema resolver). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7585) Avoid reading log files for resolving schema for _hoodie_operation field
[ https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7585: Description: The table schema resolver needs to read schema from the data files (base or log files) to see whether _hoodie_operation field is present for Flink CDC use cases. This can cause overhead of reading data file footers multiple times. We should see if we can store a table config to indicate if _hoodie_operation field is present in the table, or simplify the Flink CDC format in Hudi 1.0 (thus no need of _hoodie_operation field and schema resolver). (was: The table schema resolver needs to read schema from the data files (base or log files) to see whether _hoodie_operation field is present for Flink CDC use cases. This can cause overhead of reading data file footers multiple times. We should see if we can store a table config to indicate if or simplify the Flink CDC format in Hudi 1.0 (thus no need of _hoodie_operation field and schema resolver).) > Avoid reading log files for resolving schema for _hoodie_operation field > > > Key: HUDI-7585 > URL: https://issues.apache.org/jira/browse/HUDI-7585 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Jing Zhang >Priority: Major > Fix For: 1.0.0 > > > The table schema resolver needs to read schema from the data files (base or > log files) to see whether _hoodie_operation field is present for Flink CDC > use cases. This can cause overhead of reading data file footers multiple > times. We should see if we can store a table config to indicate if > _hoodie_operation field is present in the table, or simplify the Flink CDC > format in Hudi 1.0 (thus no need of _hoodie_operation field and schema > resolver). -- This message was sent by Atlassian Jira (v8.20.10#820010)
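The first option HUDI-7585 floats — a table config indicating whether _hoodie_operation is present, consulted before any file footer is read — can be sketched as below. The config key `hoodie.table.operation.field.present` and all names here are hypothetical, invented for illustration; the ticket does not specify them.

```java
import java.util.Map;

// Sketch of the idea in HUDI-7585: prefer a cheap table-config lookup over
// scanning data-file footers to learn whether the _hoodie_operation field is
// present. The config key and class name are hypothetical, not Hudi's API.
public class OperationFieldResolverSketch {
    static int footerReads = 0; // counts expensive footer scans

    // Fallback: simulate reading a data file footer to inspect the schema.
    static boolean readFooterForOperationField() {
        footerReads++;
        return true; // pretend the footer's schema contains _hoodie_operation
    }

    static boolean hasOperationField(Map<String, String> tableConfig) {
        String flag = tableConfig.get("hoodie.table.operation.field.present"); // hypothetical key
        if (flag != null) {
            return Boolean.parseBoolean(flag); // cheap: no file I/O needed
        }
        return readFooterForOperationField(); // legacy path: footer scan
    }

    public static void main(String[] args) {
        // With the flag stored, repeated schema resolution does zero footer reads.
        Map<String, String> withFlag = Map.of("hoodie.table.operation.field.present", "true");
        hasOperationField(withFlag);
        hasOperationField(withFlag);
        System.out.println("footer reads with flag: " + footerReads);

        // Without the flag, resolution falls back to the footer scan.
        hasOperationField(Map.of());
        System.out.println("footer reads without flag: " + footerReads);
    }
}
```

The design trade-off is the usual one for denormalized flags: the config must be written once at table creation (or backfilled), after which every reader avoids repeated footer I/O.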
[jira] [Updated] (HUDI-7585) Avoid reading log files for resolving schema for _hoodie_operation field
[ https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7585: Description: The table schema resolver needs to read schema from the data files (base or log files) to see whether _hoodie_operation field is present for Flink CDC use cases. This can cause overhead of reading data file footers multiple times. We should see if we can store or simplify the Flink CDC format in Hudi 1.0 (thus no need of ). (was: The table schema resolver needs to read schema from the data files (base or log files) to see whether ) > Avoid reading log files for resolving schema for _hoodie_operation field > > > Key: HUDI-7585 > URL: https://issues.apache.org/jira/browse/HUDI-7585 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Jing Zhang >Priority: Major > Fix For: 1.0.0 > > > The table schema resolver needs to read schema from the data files (base or > log files) to see whether _hoodie_operation field is present for Flink CDC > use cases. This can cause overhead of reading data file footers multiple > times. We should see if we can store or simplify the Flink CDC format in > Hudi 1.0 (thus no need of ). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7585) Avoid reading log files for resolving schema for _hoodie_operation field
[ https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7585: Description: The table schema resolver needs to read schema from the data files (base or log files) to see whether > Avoid reading log files for resolving schema for _hoodie_operation field > > > Key: HUDI-7585 > URL: https://issues.apache.org/jira/browse/HUDI-7585 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Jing Zhang >Priority: Major > Fix For: 1.0.0 > > > The table schema resolver needs to read schema from the data files (base or > log files) to see whether -- This message was sent by Atlassian Jira (v8.20.10#820010)
(hudi) branch asf-site updated: [DOCS] Update video guides (#11504)
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 6f798f73561 [DOCS] Update video guides (#11504) 6f798f73561 is described below commit 6f798f7356194e805d385e882275ba7410236c2a Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com> AuthorDate: Mon Jun 24 10:51:38 2024 -0700 [DOCS] Update video guides (#11504) --- ...emental-etl-and-broadcast-joins-for-faster-etl.png | Bin 0 -> 117903 bytes ...-changing-dimension-and-query-that-using-trino.png | Bin 0 -> 126625 bytes ...ing-dimension-type-2-and-query-real-time-trino.png | Bin 0 -> 125492 bytes ...utes-with-spark-sql-minio-and-query-with-trino.png | Bin 0 -> 120015 bytes ...from-pulsar-topic-into-hudi-with-deltastreamer.png | Bin 0 -> 121697 bytes ...24-06-05-multiple-spark-writers-to-hudi-tables.png | Bin 0 -> 114232 bytes commits-and-hoodie.keep.max.commits-explained.png | Bin 0 -> 123320 bytes ...time-travel-query-to-investigate-bid-and-spend.png | Bin 0 -> 124663 bytes ...tes-delete-incremental-query-stored-procedures.png | Bin 0 -> 125426 bytes ...st-xml-files-with-aws-glue-into-hudi-datalakes.png | Bin 0 -> 122656 bytes ...-Apache-Hudi-Commit-time-in-Python-and-PySpark.png | Bin 0 -> 205616 bytes ...emental-etl-and-broadcast-joins-for-faster-etl.mdx | 17 + ...-changing-dimension-and-query-that-using-trino.mdx | 17 + ...ing-dimension-type-2-and-query-real-time-trino.mdx | 18 ++ ...utes-with-spark-sql-minio-and-query-with-trino.mdx | 18 ++ ...from-pulsar-topic-into-hudi-with-deltastreamer.mdx | 16 ...24-06-05-multiple-spark-writers-to-hudi-tables.mdx | 15 +++ commits-and-hoodie.keep.max.commits-explained.mdx | 14 ++ ...time-travel-query-to-investigate-bid-and-spend.mdx | 14 ++ ...tes-delete-incremental-query-stored-procedures.mdx | 18 ++ ...st-xml-files-with-aws-glue-into-hudi-datalakes.mdx 
| 15 +++ ...-Apache-Hudi-Commit-time-in-Python-and-PySpark.mdx | 16 22 files changed, 178 insertions(+) diff --git a/website/static/assets/images/video_blogs/2024-05-20-deltastreamer-with-incremental-etl-and-broadcast-joins-for-faster-etl.png b/website/static/assets/images/video_blogs/2024-05-20-deltastreamer-with-incremental-etl-and-broadcast-joins-for-faster-etl.png new file mode 100644 index 000..f35bccc20fb Binary files /dev/null and b/website/static/assets/images/video_blogs/2024-05-20-deltastreamer-with-incremental-etl-and-broadcast-joins-for-faster-etl.png differ diff --git a/website/static/assets/images/video_blogs/2024-05-22-hudi-delta-streamer-implementing-slowly-changing-dimension-and-query-that-using-trino.png b/website/static/assets/images/video_blogs/2024-05-22-hudi-delta-streamer-implementing-slowly-changing-dimension-and-query-that-using-trino.png new file mode 100644 index 000..b2a0c0e6cdc Binary files /dev/null and b/website/static/assets/images/video_blogs/2024-05-22-hudi-delta-streamer-implementing-slowly-changing-dimension-and-query-that-using-trino.png differ diff --git a/website/static/assets/images/video_blogs/2024-05-22-hudi-streamer-implementing-slowly-changing-dimension-type-2-and-query-real-time-trino.png b/website/static/assets/images/video_blogs/2024-05-22-hudi-streamer-implementing-slowly-changing-dimension-type-2-and-query-real-time-trino.png new file mode 100644 index 000..d80b0677277 Binary files /dev/null and b/website/static/assets/images/video_blogs/2024-05-22-hudi-streamer-implementing-slowly-changing-dimension-type-2-and-query-real-time-trino.png differ diff --git a/website/static/assets/images/video_blogs/2024-05-23-build-hudi-date-dimension-in-minutes-with-spark-sql-minio-and-query-with-trino.png b/website/static/assets/images/video_blogs/2024-05-23-build-hudi-date-dimension-in-minutes-with-spark-sql-minio-and-query-with-trino.png new file mode 100644 index 000..08424b3e2bd Binary files /dev/null and 
b/website/static/assets/images/video_blogs/2024-05-23-build-hudi-date-dimension-in-minutes-with-spark-sql-minio-and-query-with-trino.png differ diff --git a/website/static/assets/images/video_blogs/2024-05-25-learn-how-to-ingest-data-from-pulsar-topic-into-hudi-with-deltastreamer.png b/website/static/assets/images/video_blogs/2024-05-25-learn-how-to-ingest-data-from-pulsar-topic-into-hudi-with-deltastreamer.png new file mode 100644 index 000..4ba03a5692a Binary files /dev/null and b/website/static/assets/images/video_blogs/2024-05-25-learn-how-to-ingest-data-from-pulsar-topic-into-hudi-with-deltastreamer.png differ diff --git a/website/static/assets/images/video_blogs/2024-06-05-multiple-spark-writers-to-hudi-tables.png b/websit
(hudi) branch asf-site updated: [DOCS] Update Roadmap page (#11491)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new fac2aaaf5e2 [DOCS] Update Roadmap page (#11491) fac2aaaf5e2 is described below commit fac2aaaf5e2b4a293fa31543e0a6ab9e102c45c7 Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com> AuthorDate: Mon Jun 24 10:27:15 2024 -0700 [DOCS] Update Roadmap page (#11491) --- website/src/pages/roadmap.md | 89 1 file changed, 48 insertions(+), 41 deletions(-) diff --git a/website/src/pages/roadmap.md b/website/src/pages/roadmap.md index 32c2ad2c081..19dabef81ad 100644 --- a/website/src/pages/roadmap.md +++ b/website/src/pages/roadmap.md @@ -8,64 +8,71 @@ Hudi community strives to deliver major releases every 3-4 months, while offerin This page captures the forward-looking roadmap of ongoing & upcoming projects and when they are expected to land, broken down by areas on our [stack](blog/2021/07/21/streaming-data-lake-platform/#hudi-stack). 
+## Recent Release +[0.15.0](https://hudi.apache.org/releases/release-0.15.0) (June 2024) + ## Future Releases -Next upcoming release : [0.14.1](https://issues.apache.org/jira/projects/HUDI/versions/12353493) (Dec 2023) +| Release| Timeline | +||---| +| 1.0.0-beta2| July 2024 | +| 0.16.0 (Bridge release supporting reads of both 1.x and 0.x Hudi versions) | Q3, 2024 | +| 1.0.0 | Q3, 2024 | + ## Transactional Database Layer -| Feature| Target Release | Tracking | -|||--| -| Support for primary key-less table | 0.14.0 | [HUDI-4699](https://issues.apache.org/jira/browse/HUDI-4699) | -| Efficient bootstrap and migration of existing non-Hudi dataset | 0.14.0 | [HUDI-1265](https://issues.apache.org/jira/browse/HUDI-1265) | -| Record-level index to speed up UUID-based upserts and deletes | 0.14.0 | [RFC-08](https://cwiki.apache.org/confluence/display/HUDI/RFC-08++Record+level+indexing+mechanisms+for+Hudi+datasets), [HUDI-53](https://issues.apache.org/jira/browse/HUDI-53) | -|1.x Storage format | 1.0.0 | [HUDI-6242](https://issues.apache.org/jira/browse/HUDI-6242) | -| Writer performance improvements | 1.0.0 |[HUDI-3249](https://issues.apache.org/jira/browse/HUDI-3249) | -| Non-blocking concurrency control| 1.0.0 | [HUDI-3187](https://issues.apache.org/jira/browse/HUDI-3187), [HUDI-1042](https://issues.apache.org/jira/browse/HUDI-1042), [RFC-66](https://github.com/apache/hudi/pull/7907) | -| Time Travel updates, deletes | 1.0.0 || +| Feature| Target Release | Tracking | +|||| +| 1.x Storage
Re: [I] CDC data_before_after mode does not convert Spark DecimalType correctly [hudi]
phamvinh1712 commented on issue #8616: URL: https://github.com/apache/hudi/issues/8616#issuecomment-2186698563 Hi @danny0405, is there any news on this issue, or any new plan to solve it? We're planning to use the CDC format to handle some complex incremental processing use cases like the ones presented in this blog: https://www.onehouse.ai/blog/getting-started-incrementally-process-data-with-apache-hudi. However, with decimal values not being returned correctly, we can't make use of the CDC format. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
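For context on the DecimalType complaint above: Avro/Parquet decimal logical types store an unscaled integer plus a declared scale, and a conversion that drops the scale inflates the value. This is a minimal pure-Python sketch of the failure mode, not Hudi's actual conversion code:

```python
from decimal import Decimal

def decode_decimal(unscaled: int, scale: int) -> Decimal:
    """Rebuild a decimal from its unscaled integer and declared scale,
    as decimal logical types in Avro/Parquet require."""
    return Decimal(unscaled).scaleb(-scale)

# A value like 123.45 with scale 2 is stored as the unscaled integer 12345.
stored = 12345

# Naive conversion ignores the scale and inflates the value 100x:
naive = Decimal(stored)

# Correct conversion reapplies the scale:
correct = decode_decimal(stored, 2)

print(naive, correct)  # 12345 123.45
```

A CDC reader that surfaces the "before"/"after" images without applying the column's scale would produce exactly this kind of wrong decimal.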
Re: [PR] [HUDI-7906] improve the parallelism deduce in rdd write [hudi]
KnightChess commented on PR #11470: URL: https://github.com/apache/hudi/pull/11470#issuecomment-2186680135 @bibhu107 0.16.0 and 1.0.0, but you can cherry-pick or copy it into your version ![image](https://github.com/apache/hudi/assets/20125927/b706356b-063c-4305-8422-05a035820802)
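Back-porting a merged fix like the one above is usually a matter of cherry-picking its commit onto your own release branch. A self-contained sketch using a throwaway repository (branch names and the commit message are illustrative, not the actual Hudi history):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email "dev@example.com"
git config user.name "dev"

echo "base" > Writer.java
git add . && git commit -qm "initial"

# Simulate the fix landing on a development branch.
git checkout -qb fix-branch
echo "deduced parallelism" >> Writer.java
git commit -qam "[HUDI-7906] improve the parallelism deduce in rdd write"
fix_sha=$(git rev-parse HEAD)

# Back-port the fix onto a hypothetical release branch cut from the base.
git checkout -q main
git checkout -qb release-0.14
git cherry-pick -x "$fix_sha"
grep -q "deduced parallelism" Writer.java && echo "back-ported"
```

The `-x` flag records the original commit SHA in the back-ported commit message, which makes it easy to audit which fixes a custom release branch carries.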
Re: [PR] [DOCS] Update Blogs [hudi]
bhasudha merged PR #11503: URL: https://github.com/apache/hudi/pull/11503
(hudi) branch asf-site updated: [DOCS] Update Blogs (#11503)
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new c00568f2ade [DOCS] Update Blogs (#11503) c00568f2ade is described below commit c00568f2ade760b447e2b7d17bf7a4ef8b96ce02 Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com> AuthorDate: Mon Jun 24 07:11:36 2024 -0700 [DOCS] Update Blogs (#11503) --- ...s-apache-iceberg-a-comprehensive-comparison.mdx | 19 ++ ...he-hudi-tables-python-using-daft-spark-free.mdx | 19 ++ ...ead-hudi-data-aws-glue-ray-using-daft-spark.mdx | 20 +++ ...-lakehouse-using-apache-hudi-daft-streamlit.mdx | 21 .../blog/2024-05-19-apache-hudi-on-aws-glue.mdx| 18 + ...ge-to-seamlessly-share-apache-hudi-datasets.mdx | 22 + ...ng-the-right-tool-for-your-data-lake-on-aws.mdx | 19 ++ ...-hudi-a-deep-dive-with-python-code-examples.mdx | 19 ++ ...6-18-how-to-use-apache-hudi-with-databricks.mdx | 18 + website/src/pages/talks.md | 2 ++ ...s-apache-iceberg-a-comprehensive-comparison.png | Bin 0 -> 632131 bytes ...he-hudi-tables-python-using-daft-spark-free.png | Bin 0 -> 128244 bytes ...ead-hudi-data-aws-glue-ray-using-daft-spark.png | Bin 0 -> 251050 bytes ...-lakehouse-using-apache-hudi-daft-streamlit.png | Bin 0 -> 204137 bytes .../blog/2024-05-19-apache-hudi-on-aws-glue.png| Bin 0 -> 184116 bytes ...ge-to-seamlessly-share-apache-hudi-datasets.png | Bin 0 -> 115478 bytes ...ng-the-right-tool-for-your-data-lake-on-aws.png | Bin 0 -> 678081 bytes ...-hudi-a-deep-dive-with-python-code-examples.png | Bin 0 -> 103833 bytes ...-18-how-to-use-apache-hudi-with-databricks.jpeg | Bin 0 -> 563195 bytes 19 files changed, 177 insertions(+) diff --git a/website/blog/2024-04-25-apache-hudi-vs-apache-iceberg-a-comprehensive-comparison.mdx b/website/blog/2024-04-25-apache-hudi-vs-apache-iceberg-a-comprehensive-comparison.mdx new file mode 100644 
index 000..10a8864de4b --- /dev/null +++ b/website/blog/2024-04-25-apache-hudi-vs-apache-iceberg-a-comprehensive-comparison.mdx @@ -0,0 +1,19 @@ +--- +title: "Apache Hudi vs Apache Iceberg: A Comprehensive Comparison" +author: RisingWave marketing team +category: blog +image: /assets/images/blog/2024-04-25-apache-hudi-vs-apache-iceberg-a-comprehensive-comparison.png +tags: +- blog +- apache hudi +- apache iceberg +- comparison +- risingwave +--- + + + +import Redirect from '@site/src/components/Redirect'; + +https://risingwave.com/blog/apache-hudi-vs-apache-iceberg-a-comprehensive-comparison/";>Redirecting... please wait!! + diff --git a/website/blog/2024-05-02-how-query-apache-hudi-tables-python-using-daft-spark-free.mdx b/website/blog/2024-05-02-how-query-apache-hudi-tables-python-using-daft-spark-free.mdx new file mode 100644 index 000..31a0dfce34e --- /dev/null +++ b/website/blog/2024-05-02-how-query-apache-hudi-tables-python-using-daft-spark-free.mdx @@ -0,0 +1,19 @@ +--- +title: "How to Query Apache Hudi Tables with Python Using Daft: A Spark-Free Approach" +author: Soumil Shah +category: blog +image: /assets/images/blog/2024-05-02-how-query-apache-hudi-tables-python-using-daft-spark-free.png +tags: +- blog +- apache hudi +- python +- daft +- linkedin +--- + + + +import Redirect from '@site/src/components/Redirect'; + +https://www.linkedin.com/pulse/how-query-apache-hudi-tables-python-using-daft-spark-free-soumil-shah-hpdwf/";>Redirecting... please wait!! 
+ diff --git a/website/blog/2024-05-07-learn-how-read-hudi-data-aws-glue-ray-using-daft-spark.mdx b/website/blog/2024-05-07-learn-how-read-hudi-data-aws-glue-ray-using-daft-spark.mdx new file mode 100644 index 000..ce695f31fd9 --- /dev/null +++ b/website/blog/2024-05-07-learn-how-read-hudi-data-aws-glue-ray-using-daft-spark.mdx @@ -0,0 +1,20 @@ +--- +title: "Learn how to read Hudi data with AWS Glue Ray using Daft (No Spark)" +author: Soumil Shah +category: blog +image: /assets/images/blog/2024-05-07-learn-how-read-hudi-data-aws-glue-ray-using-daft-spark.png +tags: +- blog +- apache hudi +- aws glue +- ray +- daft +- linkedin +--- + + + +import Redirect from '@site/src/components/Redirect'; + +https://www.linkedin.com/pulse/learn-how-read-hudi-data-aws-glue-ray-using-daft-spark-soumil-shah-kycbe/";>Redirecting... please wait!! + diff --git a/website/blog/2024-05-10-building-analytical-apps-on-the-lakehouse-using-apache-hudi-daft-streamlit.mdx b/website/blog/2024-05-10-building-analytical-apps-on-the-lakehouse-using-apache-hudi-daft-streamlit.mdx new file mode 100644 index 000..15f1523053b --- /dev/null +++ b/website/blog/2024-05-10-building-analytical-apps-on-the-lakehouse-using-apache-hudi-daft-streamlit.mdx
(hudi) branch asf-site updated: [DOCS][MINOR] Update team and syncs pages (#11500)
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 3db29c96405 [DOCS][MINOR] Update team and syncs pages (#11500) 3db29c96405 is described below commit 3db29c9640574985a87935f4321f2d14b57b2415 Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com> AuthorDate: Mon Jun 24 07:11:52 2024 -0700 [DOCS][MINOR] Update team and syncs pages (#11500) --- website/community/syncs.md | 6 +- website/community/team.md | 96 - .../assets/images/upcoming-community-calls.png | Bin 273228 -> 0 bytes 3 files changed, 59 insertions(+), 43 deletions(-) diff --git a/website/community/syncs.md b/website/community/syncs.md index 4870ed4a4da..d8c148fc297 100644 --- a/website/community/syncs.md +++ b/website/community/syncs.md @@ -37,5 +37,9 @@ If you would like to present in one of the community calls, please fill out the Refer to the [Apache Hudi events calendar](https://calendar.google.com/calendar/embed?src=rgpb1ta2mgp5au38fr2834poa8%40group.calendar.google.com&ctz=America%2FLos_Angeles) to find upcoming Hudi events. 
Here's a quick view of the upcoming community calls: -![Upcoming calls](/assets/images/upcoming-community-calls.png) + - 24th Jul 2024, 9:00 - 10:00am pacific time + - 28th Aug 2024, 9:00 - 10:00am pacific time + - 25th Sep 2024, 9:00 - 10:00am pacific time + - 23rd Oct 2024, 9:00 - 10:00am pacific time + - 27th Nov 2024, 9:00 - 10:00am pacific time diff --git a/website/community/team.md b/website/community/team.md index 75c662f1b76..13e687873c2 100644 --- a/website/community/team.md +++ b/website/community/team.md @@ -5,46 +5,58 @@ toc: true last_modified_at: 2020-09-01T15:59:57-04:00 --- -### Active Team - -| Image| Name | Role| Apache ID | -| | | --- | | -| https://avatars.githubusercontent.com/alexeykudinkin"} className="profile-pic" alt="alexeykudinkin" align="middle" /> | [Alexey Kudinkin](https://github.com/alexeykudinkin) | Committer | akudinkin | -| https://avatars.githubusercontent.com/alunarbeach"} className="profile-pic" alt="alunarbeach" align="middle" /> | [Anbu Cheeralan](https://github.com/alunarbeach) | PMC, Committer | anchee | -| https://avatars.githubusercontent.com/bhasudha"} className="profile-pic" alt="bhasudha" align="middle" /> | [Bhavani Sudha](https://github.com/bhasudha) | PMC, Committer | bhavanisudha | -| https://avatars.githubusercontent.com/bvaradar"} className="profile-pic" alt="bvaradar" align="middle" /> | [Balaji Varadarajan](https://github.com/bvaradar)| PMC, Committer | vbalaji | -| https://avatars.githubusercontent.com/danny0405"} className="profile-pic" alt="danny0405" align="middle" /> | [Danny Chan](https://github.com/danny0405) | PMC, Committer | danny0405| -| https://avatars.githubusercontent.com/yihua"} className="profile-pic" alt="yihua" align="middle" /> | [Ethan Guo](https://github.com/yihua) | PMC, Committer| yihua | -| https://avatars.githubusercontent.com/XuQianJin-Stars"} className="profile-pic" alt="XuQianJin-Stars" align="middle" /> | [Forward Xu](https://github.com/XuQianJin-Stars) | Committer | forwardxu| -| 
https://avatars.githubusercontent.com/garyli1019"} className="profile-pic" alt="garyli1019" align="middle" /> | [Gary Li](https://github.com/garyli1019) | PMC, Committer | garyli| -| https://avatars.githubusercontent.com/boneanxs"} className="profile-pic" alt="boneanxs" align="middle" /> | [Hui An](https://github.com/boneanxs) | Committer | rexan | -| https://avatars.githubusercontent.com/jonvex"} className="profile-pic" alt="jonvex" align="middle" /> | [Jonathan Vexler](https://github.com/jonvex) | Committer | jonvex | -| https://avatars.githubusercontent.com/beyond1920"} className="profile-pic" alt="beyond1920" align="middle" /> | [Jing Zhang](https://github.com/beyond1920)| Committer | beyond1920 | -| https://avatars.githubusercontent.com/lresende"} className="profile-pic" alt="lresende" align="middle" /> | [Luciano Resende](https://github.com/lresende) | PMC, Committer | lresende | -| https://avatars.githubusercontent.com/lamberken"} className="profile-pic" alt="lamberken" className="profile-pic" align="middle" /
Re: [PR] [DOCS][MINOR] Update team and syncs pages [hudi]
bhasudha merged PR #11500: URL: https://github.com/apache/hudi/pull/11500
Re: [PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]
hudi-bot commented on PR #11499: URL: https://github.com/apache/hudi/pull/11499#issuecomment-2186586054 ## CI report: * ca09e626015a153e6351c3d2dcc5e2f95fe3988c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24541) * b0a476fa5eae1ac5d3488e335d92f776305f2cdb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24545) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Assigned] (HUDI-7927) Secondary View should only initialize when required
[ https://issues.apache.org/jira/browse/HUDI-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown reassigned HUDI-7927: --- Assignee: Timothy Brown > Secondary View should only initialize when required > --- > > Key: HUDI-7927 > URL: https://issues.apache.org/jira/browse/HUDI-7927 > Project: Apache Hudi > Issue Type: Bug >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Major > > In the PriorityBasedFileSystemView, the secondary view will be initialized > eagerly causing extra overhead including file listing. We should avoid this > to reduce the cost for users. -- This message was sent by Atlassian Jira (v8.20.10#820010)
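The eager-initialization overhead described in this ticket is the classic argument for lazy construction: defer building the expensive secondary view until the primary view actually fails. A minimal sketch of the pattern in Python (hypothetical names, not the actual PriorityBasedFileSystemView code):

```python
class PriorityView:
    """Try a primary view first; build the costly secondary view only on demand."""

    def __init__(self, make_primary, make_secondary):
        self._primary = make_primary()
        self._make_secondary = make_secondary
        self._secondary = None  # not built until first needed

    @property
    def secondary(self):
        if self._secondary is None:
            # This is where the expensive work (e.g. file listing) happens,
            # so it only runs if the primary view ever fails.
            self._secondary = self._make_secondary()
        return self._secondary

    def list_files(self):
        try:
            return self._primary.list_files()
        except ConnectionError:
            return self.secondary.list_files()

calls = []

class Fake:
    def __init__(self, name, fail=False):
        calls.append(f"init:{name}")
        self.name, self.fail = name, fail
    def list_files(self):
        if self.fail:
            raise ConnectionError
        return [f"{self.name}.parquet"]

view = PriorityView(lambda: Fake("remote"), lambda: Fake("local"))
view.list_files()
print(calls)  # ['init:remote'] -- the secondary view was never constructed
```

As long as the primary view keeps answering, the secondary view's constructor (and any file listing it triggers) never runs, which is exactly the cost reduction the ticket asks for.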
[jira] [Created] (HUDI-7927) Secondary View should only initialize when required
Timothy Brown created HUDI-7927: --- Summary: Secondary View should only initialize when required Key: HUDI-7927 URL: https://issues.apache.org/jira/browse/HUDI-7927 Project: Apache Hudi Issue Type: Bug Reporter: Timothy Brown In the PriorityBasedFileSystemView, the secondary view will be initialized eagerly, causing extra overhead including file listing. We should avoid this to reduce the cost for users.
Re: [PR] [HUDI-7906] improve the parallelism deduce in rdd write [hudi]
bibhu107 commented on PR #11470: URL: https://github.com/apache/hudi/pull/11470#issuecomment-2186573812 Hi @KnightChess, which release will this fix be in?
Re: [I] [SUPPORT]Performance degrade for migrating from Hudi 0.7 to Hudi 0.14 [hudi]
bibhu107 commented on issue #11274: URL: https://github.com/apache/hudi/issues/11274#issuecomment-2186570524 I am closing this issue. Thanks for the support, @KnightChess.
Re: [I] [SUPPORT] Unable to read Hudi table after hudi upgrade [hudi]
ad1happy2go commented on issue #11492: URL: https://github.com/apache/hudi/issues/11492#issuecomment-2186511758 Can you share the command you are using to submit the jobs, along with the entire stack trace? I hope you are no longer getting `java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark2Adapter`; if you are still seeing it, the Hudi version you are using is not correct.
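A `ClassNotFoundException` for `Spark2Adapter` typically means a Hudi bundle built for Spark 2 ended up on the classpath of a Spark 3 job. A sketch of a submit command with a matching bundle (the job class, jar names, and version below are placeholders, not taken from this thread):

```shell
# Match the bundle to the cluster's Spark major/minor and Scala version,
# e.g. hudi-spark3.4-bundle_2.12 for Spark 3.4 with Scala 2.12.
spark-submit \
  --jars hudi-spark3.4-bundle_2.12-0.14.1.jar \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.extensions=org.apache.hudi.spark.sql.HoodieSparkSessionExtension \
  --class com.example.MyHudiJob \
  my-job.jar
```

If a mismatched bundle is also baked into the cluster image, removing or overriding it matters as much as the `--jars` flag.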
Re: [PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]
hudi-bot commented on PR #11502: URL: https://github.com/apache/hudi/pull/11502#issuecomment-2186494481 ## CI report: * 2d80ec8e1a7df051c5e62423b21f29711b684630 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24543) * f85ebcae1fdf20ba678d5e1471285f7e4b8a2bd2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24544)
Re: [PR] [DOCS] Update Blogs [hudi]
bhasudha commented on PR #11503: URL: https://github.com/apache/hudi/pull/11503#issuecomment-2186494468 Tested locally! https://github.com/apache/hudi/assets/2179254/98887ed3-cf0d-4c63-97f7-32ff84ec64a9
Re: [PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]
hudi-bot commented on PR #11499: URL: https://github.com/apache/hudi/pull/11499#issuecomment-2186494394 ## CI report: * ca09e626015a153e6351c3d2dcc5e2f95fe3988c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24541) * b0a476fa5eae1ac5d3488e335d92f776305f2cdb UNKNOWN
[PR] [DOCS] Update Blogs [hudi]
bhasudha opened a new pull request, #11503: URL: https://github.com/apache/hudi/pull/11503 ### Change Logs Update blogs in the website ### Impact website changes ### Risk level (write none, low medium or high below) none ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
Re: [I] [SUPPORT] - Performance Variation in Hudi 0.14 [hudi]
RuyRoaV commented on issue #11481: URL: https://github.com/apache/hudi/issues/11481#issuecomment-2186492501 Hello @ad1happy2go, I have attached some screenshots of the Spark UI. Is there any specific screen that you'd like to see? ![Screenshot 2024-06-24 at 13 23 33](https://github.com/apache/hudi/assets/173461014/47ff1f07-23df-41f2-b12f-c5befcfdfb85) ![Screenshot 2024-06-24 at 13 23 53](https://github.com/apache/hudi/assets/173461014/68b61f60-c76c-4dab-9884-a5d677377997) Thanks for the input, I will take that into account. I've also seen switching to a record-level index (RLI) recommended in some other GitHub issues. Would that work for a COW table, or would the SIMPLE index still be a better approach? Best regards,
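For reference, the record-level index mentioned above is enabled through write configs in Hudi 0.14+ and is not restricted to MOR tables, so a COW table can use it. A sketch of the relevant properties (a config fragment, not a full write setup):

```properties
# Record-level index lives in the metadata table (Hudi 0.14+).
hoodie.metadata.enable=true
hoodie.metadata.record.index.enable=true
# Route index lookups through the record-level index instead of SIMPLE/BLOOM.
hoodie.index.type=RECORD_INDEX
```

Whether it beats SIMPLE depends on the workload: RLI shines for point upserts/deletes by key, while SIMPLE can still be competitive when most file groups are touched anyway.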
Re: [PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]
hudi-bot commented on PR #11499: URL: https://github.com/apache/hudi/pull/11499#issuecomment-2186479637 ## CI report: * ca09e626015a153e6351c3d2dcc5e2f95fe3988c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24541)
Re: [PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]
hudi-bot commented on PR #11502: URL: https://github.com/apache/hudi/pull/11502#issuecomment-2186479786 ## CI report: * 2d80ec8e1a7df051c5e62423b21f29711b684630 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24543) * f85ebcae1fdf20ba678d5e1471285f7e4b8a2bd2 UNKNOWN
Re: [PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]
hudi-bot commented on PR #11502: URL: https://github.com/apache/hudi/pull/11502#issuecomment-2186464201 ## CI report: * 2d80ec8e1a7df051c5e62423b21f29711b684630 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24543)
Re: [PR] [HUDI-7709] ClassCastException while reading the data using `TimestampBasedKeyGenerator` [hudi]
hudi-bot commented on PR #11501: URL: https://github.com/apache/hudi/pull/11501#issuecomment-2186464134 ## CI report: * 84080accf11a859cea238d12025d69f7f3ce4269 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24542)
Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]
hudi-bot commented on PR #11490: URL: https://github.com/apache/hudi/pull/11490#issuecomment-2186463946 ## CI report: * 3fa4ba583278c67c62dd3c063b55f07301815c2c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24540)
Re: [PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]
hudi-bot commented on PR #11499: URL: https://github.com/apache/hudi/pull/11499#issuecomment-2186379582 ## CI report: * ca09e626015a153e6351c3d2dcc5e2f95fe3988c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24541)
Re: [PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]
hudi-bot commented on PR #11502: URL: https://github.com/apache/hudi/pull/11502#issuecomment-2186379738 ## CI report: * 2d80ec8e1a7df051c5e62423b21f29711b684630 UNKNOWN
Re: [PR] [HUDI-7709] ClassCastException while reading the data using `TimestampBasedKeyGenerator` [hudi]
hudi-bot commented on PR #11501: URL: https://github.com/apache/hudi/pull/11501#issuecomment-2186379640 ## CI report: * 84080accf11a859cea238d12025d69f7f3ce4269 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24542)
Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]
hudi-bot commented on PR #11490: URL: https://github.com/apache/hudi/pull/11490#issuecomment-2186379443 ## CI report: * 79a022c4a314465a3a313e3aafd5937cc673c9d6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24538) * 3fa4ba583278c67c62dd3c063b55f07301815c2c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24540)
Re: [PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]
KnightChess commented on code in PR #11502: URL: https://github.com/apache/hudi/pull/11502#discussion_r1650888209 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala: ## @@ -472,6 +473,11 @@ object HoodieFileIndex extends Logging { properties.setProperty(DataSourceReadOptions.FILE_INDEX_LISTING_MODE_OVERRIDE.key, listingModeOverride) } +if (tableConfig != null) { + properties.setProperty(RECORDKEY_FIELD.key, tableConfig.getRecordKeyFields.orElse(Array.empty).mkString(",")) Review Comment: The bucket index query path needs this property to build the correct `KeyGenerator`; otherwise an `AutoRecordGenWrapperKeyGenerator` is created.
Re: [PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]
hudi-bot commented on PR #11499: URL: https://github.com/apache/hudi/pull/11499#issuecomment-2186366377 ## CI report: * ca09e626015a153e6351c3d2dcc5e2f95fe3988c UNKNOWN
Re: [PR] [HUDI-7709] ClassCastException while reading the data using `TimestampBasedKeyGenerator` [hudi]
hudi-bot commented on PR #11501: URL: https://github.com/apache/hudi/pull/11501#issuecomment-2186366427 ## CI report: * 84080accf11a859cea238d12025d69f7f3ce4269 UNKNOWN
Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]
hudi-bot commented on PR #11490: URL: https://github.com/apache/hudi/pull/11490#issuecomment-2186366222 ## CI report: * 79a022c4a314465a3a313e3aafd5937cc673c9d6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24538) * 3fa4ba583278c67c62dd3c063b55f07301815c2c UNKNOWN
[jira] [Updated] (HUDI-7926) dataskipping failure mode should be strict in test
[ https://issues.apache.org/jira/browse/HUDI-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7926: - Labels: pull-request-available (was: ) > dataskipping failure mode should be strict in test > -- > > Key: HUDI-7926 > URL: https://issues.apache.org/jira/browse/HUDI-7926 > Project: Apache Hudi > Issue Type: Bug > Components: spark-sql >Reporter: KnightChess >Assignee: KnightChess >Priority: Critical > Labels: pull-request-available > > The data skipping failure mode should be strict in tests. If the default fallback mode is used, the query UTs are meaningless. > Bugs may have been introduced in other code paths without being caught.
[PR] [HUDI-7926] dataskipping failure mode should be strict in query test [hudi]
KnightChess opened a new pull request, #11502: URL: https://github.com/apache/hudi/pull/11502 ### Change Logs The data skipping failure mode should be strict in tests; if the default fallback mode is used, the query UTs are meaningless, and bugs introduced elsewhere in the code may go undetected. - make query tests use the data skipping strict mode - add the required parameters for bucket index queries ### Impact None ### Risk level (write none, low medium or high below) none ### Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-7709) Class Cast Exception while reading the data using TimestampBasedKeyGenerator
[ https://issues.apache.org/jira/browse/HUDI-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7709: - Labels: pull-request-available (was: ) > Class Cast Exception while reading the data using TimestampBasedKeyGenerator > > > Key: HUDI-7709 > URL: https://issues.apache.org/jira/browse/HUDI-7709 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core >Reporter: Aditya Goenka >Assignee: Geser Dugarov >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.0 > > > Github Issue - [https://github.com/apache/hudi/issues/11140] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7709] ClassCastException while reading the data using `TimestampBasedKeyGenerator` [hudi]
geserdugarov opened a new pull request, #11501: URL: https://github.com/apache/hudi/pull/11501 ### Change Logs Before this PR, with `TimestampBasedKeyGenerator` we got ``` Failed to cast value `2004-02-29 01` to `LongType` for partition column `ts` ``` When Spark reads the data, `listPartitionPaths()` is called with `parsePartitionColumnValues()`, and we get a ClassCastException during parsing. Partition column values cannot be reconstructed from partition paths when `TimestampBasedKeyGenerator` is used, because information is lost when the values are formatted into paths. This PR fixes the ClassCastException, but there are still unfinished separate (older) tasks mentioned in the added `TestSparkSqlWithTimestampKeyGenerator`: - The fix for [HUDI-3896] overwrites `shouldExtractPartitionValuesFromPartitionPath` in `BaseFileOnlyRelation`. While working on this issue, I couldn't determine whether that should be fixed or left as is. - There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in `HoodieBaseHadoopFsRelationFactory`. I couldn't find a corresponding task, so I created a new one, HUDI-7925. ### Impact Fixes the ClassCastException. ### Risk level (write none, low medium or high below) Low. Affects only tables that use `TimestampBasedKeyGenerator`. `TestSparkSqlWithTimestampKeyGenerator` is added. ### Documentation Update No need. ### Contributor's checklist - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [x] Change Logs and Impact were stated clearly - [x] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
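The information loss described in the PR can be illustrated outside of Hudi/Spark. The helper names below are hypothetical stand-ins; only the quoted error message mirrors the one in the PR:

```python
from datetime import datetime, timezone

# A TimestampBasedKeyGenerator-style writer formats an epoch-millis value
# into a partition path segment such as "2004-02-29 01" (output format
# "yyyy-MM-dd HH"); the original Long is not recoverable from that string.
def make_partition_path(ts_millis, fmt="%Y-%m-%d %H"):
    return datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc).strftime(fmt)

# A reader that naively casts the path segment back to the column's Long
# type fails, which is the analogue of the reported ClassCastException.
def parse_partition_value(segment):
    try:
        return int(segment)
    except ValueError:
        raise TypeError(f"Failed to cast value `{segment}` to `LongType`")

path = make_partition_path(1078016400000)  # 2004-02-29 01:00:00 UTC
assert path == "2004-02-29 01"

try:
    parse_partition_value(path)
except TypeError as e:
    print(e)  # Failed to cast value `2004-02-29 01` to `LongType`
```

The fix therefore cannot be "parse the path back"; the reader has to stop assuming the path segment round-trips to the original column type.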
[jira] [Created] (HUDI-7926) dataskipping failure mode should be strict in test
KnightChess created HUDI-7926: - Summary: dataskipping failure mode should be strict in test Key: HUDI-7926 URL: https://issues.apache.org/jira/browse/HUDI-7926 Project: Apache Hudi Issue Type: Bug Components: spark-sql Reporter: KnightChess Assignee: KnightChess The data skipping failure mode should be strict in tests. If the fallback mode is used by default, the query UTs are meaningless: other code changes may have introduced bugs that cannot be detected. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7925) Implement logic for `shouldExtractPartitionValuesFromPartitionPath` in `HoodieHadoopFsRelationFactory`
[ https://issues.apache.org/jira/browse/HUDI-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov updated HUDI-7925: Summary: Implement logic for `shouldExtractPartitionValuesFromPartitionPath` in `HoodieHadoopFsRelationFactory` (was: Do not extract values from partition paths in `HoodieHadoopFsRelationFactory`) > Implement logic for `shouldExtractPartitionValuesFromPartitionPath` in > `HoodieHadoopFsRelationFactory` > -- > > Key: HUDI-7925 > URL: https://issues.apache.org/jira/browse/HUDI-7925 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Geser Dugarov >Priority: Major > > There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in > `HoodieHadoopFsRelationFactory`. Therefore, when reading data with > "hoodie.file.group.reader.enabled" = "true", which is the default behavior, > we get null values. > Logic similar to `HoodieBaseRelation` needs to be implemented. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [DOCS][MINOR] Update team and syncs pages [hudi]
bhasudha commented on PR #11500: URL: https://github.com/apache/hudi/pull/11500#issuecomment-2186304536 Tested locally! Screenshots: https://github.com/apache/hudi/assets/2179254/b8810a49-0586-45c0-8a29-e9c681af982e https://github.com/apache/hudi/assets/2179254/dc8cc157-6d8a-4514-a043-d8715d81a165 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [DOCS][MINOR] Update team and syncs pages [hudi]
bhasudha opened a new pull request, #11500: URL: https://github.com/apache/hudi/pull/11500 ### Change Logs Update team page to remove pic/avatars. Update community sync page to replace upcoming calls image with table. ### Impact site changes ### Risk level (write none, low medium or high below) none ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [MINOR] Removed useless checks from SqlBasedTransformers [hudi]
wombatu-kun opened a new pull request, #11499: URL: https://github.com/apache/hudi/pull/11499 ### Change Logs Removed the checks in the `apply` method of SqlQueryBasedTransformer and SqlFileBasedTransformer. `getStringWithAltKeys` never returns `null`: if the property is missing, it throws an IllegalArgumentException ("property xxx not found"), so null-checking its result is unreachable (dead) code. ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update none - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
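The dead-code argument generalizes: a getter whose contract is "return the value or throw" makes any subsequent null check unreachable. A minimal sketch, with `get_string_with_alt_keys` as a hypothetical stand-in for the Java helper (not Hudi's actual code):

```python
class MissingConfigError(Exception):
    """Stand-in for the IllegalArgumentException thrown by the Java helper."""

def get_string_with_alt_keys(props, key, alt_keys=()):
    # Contract described in the PR: return the configured value or raise;
    # this function can never return None.
    for k in (key, *alt_keys):
        if props.get(k) is not None:
            return props[k]
    raise MissingConfigError(f"Property {key} not found")

props = {"transformer.sql": "SELECT * FROM <SRC>"}
sql = get_string_with_alt_keys(props, "transformer.sql")

# The removed check was equivalent to this branch, which can never run:
# either `sql` holds a value, or the getter already raised above.
if sql is None:
    raise AssertionError("unreachable: the getter would have raised instead")
```

Removing such a branch changes no behavior; it only drops code that static analysis would flag as unreachable.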
[jira] [Updated] (HUDI-7925) Do not extract values from partition paths in `HoodieHadoopFsRelationFactory`
[ https://issues.apache.org/jira/browse/HUDI-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov updated HUDI-7925: Description: There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in `HoodieHadoopFsRelationFactory`. Therefore, when reading data with "hoodie.file.group.reader.enabled" = "true", which is the default behavior, we get null values. Logic similar to `HoodieBaseRelation` needs to be implemented. was:`shouldExtractPartitionValuesFromPartitionPath` is not used in > Do not extract values from partition paths in `HoodieHadoopFsRelationFactory` > - > > Key: HUDI-7925 > URL: https://issues.apache.org/jira/browse/HUDI-7925 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Geser Dugarov >Priority: Major > > There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in > `HoodieHadoopFsRelationFactory`. Therefore, when reading data with > "hoodie.file.group.reader.enabled" = "true", which is the default behavior, > we get null values. > Logic similar to `HoodieBaseRelation` needs to be implemented. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7925) Do not extract values from partition paths in `HoodieHadoopFsRelationFactory`
[ https://issues.apache.org/jira/browse/HUDI-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov updated HUDI-7925: Description: `shouldExtractPartitionValuesFromPartitionPath` is not used in > Do not extract values from partition paths in `HoodieHadoopFsRelationFactory` > - > > Key: HUDI-7925 > URL: https://issues.apache.org/jira/browse/HUDI-7925 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Geser Dugarov >Priority: Major > > `shouldExtractPartitionValuesFromPartitionPath` is not used in -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7925) Do not extract values from partition paths in `HoodieHadoopFsRelationFactory`
Geser Dugarov created HUDI-7925: --- Summary: Do not extract values from partition paths in `HoodieHadoopFsRelationFactory` Key: HUDI-7925 URL: https://issues.apache.org/jira/browse/HUDI-7925 Project: Apache Hudi Issue Type: Improvement Reporter: Geser Dugarov -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [DOCS] Update Roadmap page [hudi]
bhasudha commented on PR #11491: URL: https://github.com/apache/hudi/pull/11491#issuecomment-2186196563 Addressed the comments. Please take a look @yihua -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [DOCS] Update Roadmap page [hudi]
bhasudha commented on code in PR #11491: URL: https://github.com/apache/hudi/pull/11491#discussion_r1650752092 ## website/src/pages/roadmap.md: ## @@ -8,64 +8,68 @@ Hudi community strives to deliver major releases every 3-4 months, while offerin This page captures the forward-looking roadmap of ongoing & upcoming projects and when they are expected to land, broken down by areas on our [stack](blog/2021/07/21/streaming-data-lake-platform/#hudi-stack). +## Recent Release +[0.15.0](https://issues.apache.org/jira/projects/HUDI/versions/12353381) (June 2024) + ## Future Releases -Next upcoming release : [0.14.1](https://issues.apache.org/jira/projects/HUDI/versions/12353493) (Dec 2023) +- [1.0.0-beta2](https://issues.apache.org/jira/projects/HUDI/versions/12354810) (July 2024) +- [0.16.0](https://issues.apache.org/jira/projects/HUDI/versions/12354773) - Bridge release supporting reads of both 1.x and 0.x Hudi versions. (Q3, 2024) +- [1.0.0](https://issues.apache.org/jira/projects/HUDI/versions/12353848) (Q3, 2024) Review Comment: Looks like they are accessible only after signing in. Let me remove them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]
hudi-bot commented on PR #11490: URL: https://github.com/apache/hudi/pull/11490#issuecomment-2186045435 ## CI report: * 79a022c4a314465a3a313e3aafd5937cc673c9d6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24538) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7924] Capture Latency and Failure Metrics For Hive Table recreation [hudi]
hudi-bot commented on PR #11498: URL: https://github.com/apache/hudi/pull/11498#issuecomment-2186045558 ## CI report: * b95c4726bcf60bcf68bee0480069f24b1fb9ed15 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24539) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7924] Capture Latency and Failure Metrics For Hive Table recreation [hudi]
hudi-bot commented on PR #11498: URL: https://github.com/apache/hudi/pull/11498#issuecomment-2185951989 ## CI report: * b95c4726bcf60bcf68bee0480069f24b1fb9ed15 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24539) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]
hudi-bot commented on PR #11490: URL: https://github.com/apache/hudi/pull/11490#issuecomment-2185951843 ## CI report: * 964ce5513d52a12f592a19e0374506ab39453fca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24535) * 79a022c4a314465a3a313e3aafd5937cc673c9d6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24538) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7924] Capture Latency and Failure Metrics For Hive Table recreation [hudi]
hudi-bot commented on PR #11498: URL: https://github.com/apache/hudi/pull/11498#issuecomment-2185937360 ## CI report: * b95c4726bcf60bcf68bee0480069f24b1fb9ed15 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7911] Enable cdc log for MOR table [hudi]
hudi-bot commented on PR #11490: URL: https://github.com/apache/hudi/pull/11490#issuecomment-2185937174 ## CI report: * 964ce5513d52a12f592a19e0374506ab39453fca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24535) * 79a022c4a314465a3a313e3aafd5937cc673c9d6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7924) Capture Latency and Failure Metrics For Hive Table recreation
[ https://issues.apache.org/jira/browse/HUDI-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7924: - Labels: pull-request-available (was: ) > Capture Latency and Failure Metrics For Hive Table recreation > - > > Key: HUDI-7924 > URL: https://issues.apache.org/jira/browse/HUDI-7924 > Project: Apache Hudi > Issue Type: Task >Reporter: Vamsi Karnika >Priority: Major > Labels: pull-request-available > > As part of recreating the glue and hive table whenever sync schema or > partition fails, we want to capture and push metrics related to latency(time > taken to recreate and sync the table) and a failure metric(when recreating > the table fails). * Push Latency metric to capture time taken to recreate and > sync the table > * Push a failure metric if recreate and sync fails. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7924] Capture Latency and Failure Metrics For Hive Table recreation [hudi]
vamsikarnika opened a new pull request, #11498: URL: https://github.com/apache/hudi/pull/11498 ### Change Logs Added latency and failure metrics for recreating the table on meta sync failure. ### Impact - Pushes new metrics to Prometheus, which helps in monitoring the performance of table recreation. ### Risk level (write none, low medium or high below) None ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
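The two metrics the ticket calls for (a latency gauge on success, a failure counter on error) follow a common instrumentation pattern. A generic sketch; the metric names and the toy `Metrics` registry below are hypothetical, not Hudi's Metrics API:

```python
import time

class Metrics:
    """Toy in-memory registry standing in for a real metrics sink (e.g. Prometheus)."""
    def __init__(self):
        self.gauges, self.counters = {}, {}
    def set_gauge(self, name, value):
        self.gauges[name] = value
    def inc_counter(self, name):
        self.counters[name] = self.counters.get(name, 0) + 1

def recreate_and_sync(metrics, do_recreate):
    # Time the recreate-and-sync call; push a latency gauge on success
    # and a failure counter when it raises.
    start = time.monotonic()
    try:
        do_recreate()
        metrics.set_gauge("meta_sync.recreate_table.duration_ms",
                          (time.monotonic() - start) * 1000)
    except Exception:
        metrics.inc_counter("meta_sync.recreate_table.failures")
        raise

m = Metrics()
recreate_and_sync(m, lambda: None)  # success path records latency
assert "meta_sync.recreate_table.duration_ms" in m.gauges

def failing():
    raise RuntimeError("sync failed")

try:
    recreate_and_sync(m, failing)  # failure path bumps the counter
except RuntimeError:
    pass
assert m.counters["meta_sync.recreate_table.failures"] == 1
```

Re-raising after incrementing the failure counter keeps the caller's error handling intact while still recording the event.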
[jira] [Comment Edited] (HUDI-7033) Fix read error for schema evolution + partition value extraction
[ https://issues.apache.org/jira/browse/HUDI-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859598#comment-17859598 ] Geser Dugarov edited comment on HUDI-7033 at 6/24/24 7:50 AM: -- Merged a4fa3451916de11dc082792076b62013586dadaf in linked MR 9994 refers to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889] was (Author: JIRAUSER301110): Merged a4fa3451916de11dc082792076b62013586dadaf refers to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889] > Fix read error for schema evolution + partition value extraction > > > Key: HUDI-7033 > URL: https://issues.apache.org/jira/browse/HUDI-7033 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Priority: Major > Labels: pull-request-available > > After HUDI-6960 is merged, there > *shouldExtractPartitionValuesFromPartitionPath* will correctly ignore > partition columns in requiredSchema. > > When using the configs below, there will be read errors. > > {code:java} > hoodie.datasource.read.extract.partition.values.from.path = true {code} > > > When the config above is added together with: > > {code:java} > hoodie.schema.on.read.enable = true {code} > > The query schema will be pruned to **{*}NOT{*}** contain any partition > columns. > > When rebuilding parquet filters, file schema's columns are scanned against > querySchema. However, Hudi files (file schema) might still contain partition > columns. And when partition filters are being rebuilt with these file schema > against query schema, it will lead to partition columns not being found. > > {code:java} > Caused by: java.lang.IllegalArgumentException: cannot found filter col > name:region from querySchema: table { > 5: id: optional int > 6: name: optional string > 7: ts: optional long > } > at > org.apache.hudi.internal.schema.utils.InternalSchemaUtils.reBuildFilterName(InternalSchemaUtils.java:180) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [I] [SUPPORT] Hudi partitions not dropped by Hive sync after `insert_overwrite_table` operation [hudi]
Limess commented on issue #8114: URL: https://github.com/apache/hudi/issues/8114#issuecomment-2185835901 > @codope: As stated in the issue, the problem occurs consistently. The version we are currently using is 0.14. @Limess: Have you not encountered this problem again? May I ask how it was avoided? Thanks! We never pursued this and are still on 0.13.0 for now, so I can't verify either way, sorry! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Reopened] (HUDI-7033) Fix read error for schema evolution + partition value extraction
[ https://issues.apache.org/jira/browse/HUDI-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov reopened HUDI-7033: - Merged a4fa3451916de11dc082792076b62013586dadaf refer to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889] > Fix read error for schema evolution + partition value extraction > > > Key: HUDI-7033 > URL: https://issues.apache.org/jira/browse/HUDI-7033 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Priority: Major > Labels: pull-request-available > > After HUDI-6960 is merged, there > *shouldExtractPartitionValuesFromPartitionPath* will correctly ignore > partition columns in requiredSchema. > > When using the configs below, there will be read errors. > > {code:java} > hoodie.datasource.read.extract.partition.values.from.path = true {code} > > > When the config above is added together with: > > {code:java} > hoodie.schema.on.read.enable = true {code} > > The query schema will be pruned to **{*}NOT{*}** contain any partition > columns. > > When rebuilding parquet filters, file schema's columns are scanned against > querySchema. However, Hudi files (file schema) might still contain partition > columns. And when partition filters are being rebuilt with these file schema > against query schema, it will lead to partition columns not being found. > > {code:java} > Caused by: java.lang.IllegalArgumentException: cannot found filter col > name:region from querySchema: table { > 5: id: optional int > 6: name: optional string > 7: ts: optional long > } > at > org.apache.hudi.internal.schema.utils.InternalSchemaUtils.reBuildFilterName(InternalSchemaUtils.java:180) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HUDI-7033) Fix read error for schema evolution + partition value extraction
[ https://issues.apache.org/jira/browse/HUDI-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859598#comment-17859598 ] Geser Dugarov edited comment on HUDI-7033 at 6/24/24 7:47 AM: -- Merged a4fa3451916de11dc082792076b62013586dadaf refers to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889] was (Author: JIRAUSER301110): Merged a4fa3451916de11dc082792076b62013586dadaf refer to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889] > Fix read error for schema evolution + partition value extraction > > > Key: HUDI-7033 > URL: https://issues.apache.org/jira/browse/HUDI-7033 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Priority: Major > Labels: pull-request-available > > After HUDI-6960 is merged, there > *shouldExtractPartitionValuesFromPartitionPath* will correctly ignore > partition columns in requiredSchema. > > When using the configs below, there will be read errors. > > {code:java} > hoodie.datasource.read.extract.partition.values.from.path = true {code} > > > When the config above is added together with: > > {code:java} > hoodie.schema.on.read.enable = true {code} > > The query schema will be pruned to **{*}NOT{*}** contain any partition > columns. > > When rebuilding parquet filters, file schema's columns are scanned against > querySchema. However, Hudi files (file schema) might still contain partition > columns. And when partition filters are being rebuilt with these file schema > against query schema, it will lead to partition columns not being found. > > {code:java} > Caused by: java.lang.IllegalArgumentException: cannot found filter col > name:region from querySchema: table { > 5: id: optional int > 6: name: optional string > 7: ts: optional long > } > at > org.apache.hudi.internal.schema.utils.InternalSchemaUtils.reBuildFilterName(InternalSchemaUtils.java:180) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] (HUDI-7033) Fix read error for schema evolution + partition value extraction
[ https://issues.apache.org/jira/browse/HUDI-7033 ] Geser Dugarov deleted comment on HUDI-7033: - was (Author: JIRAUSER301110): Fixed in master, a4fa3451916de11dc082792076b62013586dadaf > Fix read error for schema evolution + partition value extraction > > > Key: HUDI-7033 > URL: https://issues.apache.org/jira/browse/HUDI-7033 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Priority: Major > Labels: pull-request-available > > After HUDI-6960 is merged, there > *shouldExtractPartitionValuesFromPartitionPath* will correctly ignore > partition columns in requiredSchema. > > When using the configs below, there will be read errors. > > {code:java} > hoodie.datasource.read.extract.partition.values.from.path = true {code} > > > When the config above is added together with: > > {code:java} > hoodie.schema.on.read.enable = true {code} > > The query schema will be pruned to **{*}NOT{*}** contain any partition > columns. > > When rebuilding parquet filters, file schema's columns are scanned against > querySchema. However, Hudi files (file schema) might still contain partition > columns. And when partition filters are being rebuilt with these file schema > against query schema, it will lead to partition columns not being found. > > {code:java} > Caused by: java.lang.IllegalArgumentException: cannot found filter col > name:region from querySchema: table { > 5: id: optional int > 6: name: optional string > 7: ts: optional long > } > at > org.apache.hudi.internal.schema.utils.InternalSchemaUtils.reBuildFilterName(InternalSchemaUtils.java:180) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-7033) Fix read error for schema evolution + partition value extraction
[ https://issues.apache.org/jira/browse/HUDI-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov closed HUDI-7033. --- Resolution: Fixed Fixed in master, a4fa3451916de11dc082792076b62013586dadaf > Fix read error for schema evolution + partition value extraction > > > Key: HUDI-7033 > URL: https://issues.apache.org/jira/browse/HUDI-7033 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Priority: Major > Labels: pull-request-available > > After HUDI-6960 is merged, there > *shouldExtractPartitionValuesFromPartitionPath* will correctly ignore > partition columns in requiredSchema. > > When using the configs below, there will be read errors. > > {code:java} > hoodie.datasource.read.extract.partition.values.from.path = true {code} > > > When the config above is added together with: > > {code:java} > hoodie.schema.on.read.enable = true {code} > > The query schema will be pruned to **{*}NOT{*}** contain any partition > columns. > > When rebuilding parquet filters, file schema's columns are scanned against > querySchema. However, Hudi files (file schema) might still contain partition > columns. And when partition filters are being rebuilt with these file schema > against query schema, it will lead to partition columns not being found. > > {code:java} > Caused by: java.lang.IllegalArgumentException: cannot found filter col > name:region from querySchema: table { > 5: id: optional int > 6: name: optional string > 7: ts: optional long > } > at > org.apache.hudi.internal.schema.utils.InternalSchemaUtils.reBuildFilterName(InternalSchemaUtils.java:180) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
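The failure mode in the HUDI-7033 stack trace reduces to rebuilding a filter's column name against a query schema that was pruned of partition columns. The function below is a hypothetical analogue of `InternalSchemaUtils.reBuildFilterName`, not the actual implementation:

```python
def rebuild_filter_name(col, query_schema):
    # Analogue of rebuilding a parquet filter against the (pruned) query
    # schema: a column missing from that schema fails the rebuild outright.
    if col not in query_schema:
        raise ValueError(
            f"cannot find filter col name:{col} from querySchema: {sorted(query_schema)}")
    return col

# The file schema still contains the partition column "region"...
file_schema = {"id", "name", "ts", "region"}
# ...but with extract-partition-values-from-path plus schema-on-read enabled,
# the query schema is pruned of partition columns.
query_schema = {"id", "name", "ts"}
assert "region" in file_schema and "region" not in query_schema

# Filters on regular columns rebuild fine...
assert rebuild_filter_name("ts", query_schema) == "ts"

# ...but a partition-column filter hits the reported error.
try:
    rebuild_filter_name("region", query_schema)
    failed = False
except ValueError:
    failed = True
assert failed
```

This is why the two configs only break together: pruning alone is harmless until a partition filter from the file schema is resolved against the pruned query schema.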