[GitHub] [hudi] silencily opened a new issue, #8096: [SUPPORT]Whether there are some problems when using hudi-trino-bundle of 0.13.0 version to query hudi table of 0.12.2 version
silencily opened a new issue, #8096: URL: https://github.com/apache/hudi/issues/8096 **Describe the problem you faced** I am using Trino 407 to query Hudi tables. The hudi-trino-bundle version is 0.13.0, while our Hudi tables were created with Hudi 0.12.2. Could the community confirm whether there are any known problems when the reader and the table use different Hudi versions? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
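A first step when investigating a reader/writer version mismatch like this is to check the table version recorded on disk: Hudi keeps table-level properties, including `hoodie.table.version`, in `.hoodie/hoodie.properties` under the table base path. The sketch below only parses that file on a local file system; it is not an official compatibility check. The helper names `parse_properties`, `read_table_version` and `demo`, and the sample property values, are hypothetical and made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple Java-style key=value properties, skipping comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def read_table_version(base_path: str) -> Optional[str]:
    """Return hoodie.table.version from <base_path>/.hoodie/hoodie.properties."""
    props_file = Path(base_path) / ".hoodie" / "hoodie.properties"
    return parse_properties(props_file.read_text()).get("hoodie.table.version")


def demo() -> Optional[str]:
    """Build a throwaway table directory with made-up properties and read it."""
    with tempfile.TemporaryDirectory() as base:
        meta = Path(base) / ".hoodie"
        meta.mkdir()
        (meta / "hoodie.properties").write_text(
            "# sample values, not taken from a real table\n"
            "hoodie.table.name=hudi_table\n"
            "hoodie.table.version=5\n"
        )
        return read_table_version(base)
```

Whether a particular reader bundle can handle the version found this way still has to be confirmed against the Hudi release notes or by the maintainers.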
[jira] [Closed] (HUDI-4372) Enable matadata table by default for flink
[ https://issues.apache.org/jira/browse/HUDI-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-4372.
Fix Version/s: 0.14.0 (was: 0.13.1)
Resolution: Fixed

Fixed via master branch: 9bb6b55440cf385844c757344f66148039e657e8

> Enable matadata table by default for flink
>
> Key: HUDI-4372
> URL: https://issues.apache.org/jira/browse/HUDI-4372
> Project: Apache Hudi
> Issue Type: New Feature
> Components: flink, metadata
> Reporter: Danny Chen
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[hudi] branch master updated (2ddcf96cddb -> 9bb6b55440c)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

from 2ddcf96cddb [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server (#8079)
add 9bb6b55440c [HUDI-4372] Enable matadata table by default for flink (#8070)

No new revisions were added by this update.

Summary of changes:
.../apache/hudi/client/BaseHoodieWriteClient.java | 2 +-
.../client/transaction/ConcurrentOperation.java | 12 +--
.../hudi/table/action/clean/CleanPlanner.java | 37 +++---
.../functional/TestHoodieBackedMetadata.java | 8 +++--
.../apache/hudi/configuration/FlinkOptions.java | 2 +-
.../sink/clustering/HoodieFlinkClusteringJob.java | 3 ++
.../hudi/sink/compact/HoodieFlinkCompactor.java | 3 ++
.../hudi/table/catalog/HoodieHiveCatalog.java | 2 +-
.../java/org/apache/hudi/util/CompactionUtil.java | 19 ++-
.../apache/hudi/table/ITTestHoodieDataSource.java | 20 ++--
.../org/apache/hudi/utils/TestClusteringUtil.java | 4 +++
.../org/apache/hudi/utils/TestCompactionUtil.java | 13
packaging/bundle-validation/flink/insert.sql | 1 +
packaging/hudi-flink-bundle/pom.xml | 3 ++
14 files changed, 107 insertions(+), 22 deletions(-)
[GitHub] [hudi] danny0405 merged pull request #8070: [HUDI-4372] Enable metadata table by default for flink
danny0405 merged PR #8070: URL: https://github.com/apache/hudi/pull/8070
[GitHub] [hudi] soumilshah1995 closed issue #8031: [SUPPORT] Hudi Timestamp Based Key Generator Need Assistance
soumilshah1995 closed issue #8031: [SUPPORT] Hudi Timestamp Based Key Generator Need Assistance URL: https://github.com/apache/hudi/issues/8031
[GitHub] [hudi] soumilshah1995 commented on issue #8031: [SUPPORT] Hudi Timestamp Based Key Generator Need Assistance
soumilshah1995 commented on issue #8031: URL: https://github.com/apache/hudi/issues/8031#issuecomment-1454933980

Adding the setting has resolved the issue:

```
import os
import sys

from pyspark.sql import SparkSession

# Pull in the Hudi Spark bundle when the PySpark shell starts.
SUBMIT_ARGS = "--packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.1 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

spark = SparkSession.builder \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('className', 'org.apache.hudi') \
    .config('spark.sql.hive.convertMetastoreParquet', 'false') \
    .getOrCreate()

db_name = "hudidb"
table_name = "hudi_table"
recordkey = 'uuid'
precombine = 'date'
path = f"file:///C:/tmp/{db_name}/{table_name}"
method = 'upsert'
table_type = "COPY_ON_WRITE"  # COPY_ON_WRITE | MERGE_ON_READ

hudi_options = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': recordkey,
    'hoodie.datasource.write.table.name': table_name,
    'hoodie.datasource.write.operation': method,
    'hoodie.datasource.write.precombine.field': precombine,
    'hoodie.datasource.write.partitionpath.field': 'date',
    "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled": "true",
    "hoodie-conf hoodie.datasource.write.partitionpath.field": "date",
    # Timestamp-based key generator: parse the 'date' string and re-format it
    # into the partition path.
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.TimestampBasedKeyGenerator',
    'hoodie.deltastreamer.keygen.timebased.timestamp.type': 'DATE_STRING',
    'hoodie.deltastreamer.keygen.timebased.timezone': "GMT+8:00",
    'hoodie.deltastreamer.keygen.timebased.input.dateformat': 'yyyy-MM-dd hh:mm:ss',
    'hoodie.deltastreamer.keygen.timebased.output.dateformat': 'yyyy/MM/dd'
}

# Input field value: "2020-01-06 12:12:12"
# Partition path generated by the key generator: "2020/01/06"
data_items = [
    (1, "mess 1", 111, "2020-01-06 12:12:12"),
    (2, "mes 2", 22, "2020-01-06 12:12:12"),
]
columns = ["uuid", "message", "precomb", "date"]

spark_df = spark.createDataFrame(data=data_items, schema=columns)
spark_df.show()
spark_df.printSchema()

spark_df.write.format("hudi"). \
    options(**hudi_options). \
    mode("append"). \
    save(path)
```
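For a `DATE_STRING` field, the timestamp-based key generator parses the value with the configured input date format and renders it again with the output date format to build the partition path. The following plain-Python sketch mimics that transformation for the sample rows; it is an illustration, not Hudi's implementation. The function name `partition_path` is hypothetical, and the `strptime`/`strftime` patterns are assumed equivalents of a `yyyy-MM-dd HH:mm:ss` input pattern and a `yyyy/MM/dd` output pattern.

```python
from datetime import datetime, timedelta, timezone


def partition_path(value: str,
                   input_format: str = "%Y-%m-%d %H:%M:%S",
                   output_format: str = "%Y/%m/%d",
                   utc_offset_hours: int = 8) -> str:
    """Parse a date string and re-format it as a partition path.

    The same fixed UTC offset (GMT+8 by default) is used for reading and
    for writing, so the calendar date is preserved.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    parsed = datetime.strptime(value, input_format).replace(tzinfo=tz)
    return parsed.astimezone(tz).strftime(output_format)


print(partition_path("2020-01-06 12:12:12"))  # prints 2020/01/06
```

Changing only `output_format` (for example to `"%Y-%m-%d %H"`) is enough to move from daily to hourly partition paths in this sketch.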
[GitHub] [hudi] soumilshah1995 commented on issue #8031: [SUPPORT] Hudi Timestamp Based Key Generator Need Assistance
soumilshah1995 commented on issue #8031: URL: https://github.com/apache/hudi/issues/8031#issuecomment-1454933866 Thank you very much for taking the time to answer my question. Looking forward to passing this on to the community as a tutorial.
[hudi] branch master updated (d40a6211f64 -> 2ddcf96cddb)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

from d40a6211f64 [HUDI-5796] Adding auto inferring partition from incoming df (#7951)
add 2ddcf96cddb [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server (#8079)

No new revisions were added by this update.

Summary of changes:
.../TestRemoteFileSystemViewWithMetadataTable.java | 275 +
.../table/view/AbstractTableFileSystemView.java | 34 ++-
.../IncrementalTimelineSyncFileSystemView.java | 20 +-
.../metadata/HoodieMetadataFileSystemView.java | 20 +-
.../HoodieBackedTestDelayedTableMetadata.java | 54
.../hudi/timeline/service/RequestHandler.java | 27 +-
6 files changed, 390 insertions(+), 40 deletions(-)
create mode 100644 hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestRemoteFileSystemViewWithMetadataTable.java
create mode 100644 hudi-common/src/test/java/org/apache/hudi/metadata/HoodieBackedTestDelayedTableMetadata.java
[GitHub] [hudi] yihua merged pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server
yihua merged PR #8079: URL: https://github.com/apache/hudi/pull/8079
[GitHub] [hudi] hudi-bot commented on pull request #8095: [HUDI-5866] Fix unnecessary log messages during bulk insert in Spark
hudi-bot commented on PR #8095: URL: https://github.com/apache/hudi/pull/8095#issuecomment-1454858086 ## CI report: * d84c2d5274ac3e5525996156d3057a77dfe200d3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15579) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8095: [HUDI-5866] Fix unnecessary log messages during bulk insert in Spark
hudi-bot commented on PR #8095: URL: https://github.com/apache/hudi/pull/8095#issuecomment-1454837270 ## CI report: * d84c2d5274ac3e5525996156d3057a77dfe200d3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15579)
[GitHub] [hudi] hudi-bot commented on pull request #8095: [HUDI-5866] Fix unnecessary log messages during bulk insert in Spark
hudi-bot commented on PR #8095: URL: https://github.com/apache/hudi/pull/8095#issuecomment-1454823902 ## CI report: * d84c2d5274ac3e5525996156d3057a77dfe200d3 UNKNOWN
[GitHub] [hudi] aajisaka commented on pull request #8095: [HUDI-5866] Fix unnecessary log messages during bulk insert in Spark
aajisaka commented on PR #8095: URL: https://github.com/apache/hudi/pull/8095#issuecomment-1454821323 This is my first contribution to Hudi and I don't have the Hudi contributor privilege. Can someone grant it to me?
[jira] [Updated] (HUDI-5866) Fix unnecessary log messages during bulk insert in Spark
[ https://issues.apache.org/jira/browse/HUDI-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5866:
Labels: pull-request-available (was: )

> Fix unnecessary log messages during bulk insert in Spark
>
> Key: HUDI-5866
> URL: https://issues.apache.org/jira/browse/HUDI-5866
> Project: Apache Hudi
> Issue Type: Bug
> Components: spark
> Affects Versions: 0.13.0
> Reporter: Akira Ajisaka
> Priority: Major
> Labels: pull-request-available
>
> HUDI-5544 fixed excessive log message issue in Flink, but it's not fixed in Spark. We need to make a similar fix in hudi-spark-client
> https://github.com/apache/hudi/blob/47356a57930687c1bdfa66d1a62421d8a5fc0b29/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BulkInsertDataInternalWriterHelper.java#L147
[GitHub] [hudi] aajisaka opened a new pull request, #8095: [HUDI-5866] Fix unnecessary log messages during bulk insert in Spark
aajisaka opened a new pull request, #8095: URL: https://github.com/apache/hudi/pull/8095

### Change Logs

Currently a log message that says "Creating new file for partition path" is generated every time the current partition changes, even when no new file is being created (which is confusing). This issue is fixed by #7658 in Flink, but it's not fixed in Spark.

### Impact

N/A

### Risk level (write none, low medium or high below)

low

### Documentation Update

N/A

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [ ] CI passed
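The behaviour described in the Change Logs can be pictured with a small sketch: the message is emitted only when a file handle is actually created, not on every switch of the current partition. This is a hypothetical Python illustration of the idea, not the code in the pull request; the class `PartitionedWriter` and its attributes are invented for this example.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("bulk_insert_sketch")


class PartitionedWriter:
    """Keeps one in-memory 'file handle' (a list) per partition path."""

    def __init__(self) -> None:
        self.handles: Dict[str, List[str]] = {}
        self.files_created = 0

    def write(self, partition_path: str, record: str) -> None:
        handle = self.handles.get(partition_path)
        if handle is None:
            # Log here, where a new file really is created, instead of
            # logging whenever the incoming partition differs from the last.
            logger.info("Creating new file for partition path %s", partition_path)
            handle = []
            self.handles[partition_path] = handle
            self.files_created += 1
        handle.append(record)


writer = PartitionedWriter()
for partition, record in [("p1", "a"), ("p2", "b"), ("p1", "c")]:
    writer.write(partition, record)
```

In this run the partition changes three times in a row, but only two handles are created, so the message would be logged twice.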
[GitHub] [hudi] hudi-bot commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink
hudi-bot commented on PR #8070: URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454803021 ## CI report: * 8bc5774747acac448ef96b036a4e38d832255441 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15578)
[jira] [Reopened] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit
[ https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang reopened HUDI-5728:

> HoodieTimelineArchiver archives the latest instant before inflight replacecommit
>
> Key: HUDI-5728
> URL: https://issues.apache.org/jira/browse/HUDI-5728
> Project: Apache Hudi
> Issue Type: Bug
> Components: table-service
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> When inline or async clustering is enabled, we need to ensure that there is a commit in the active timeline to check whether the file slice generated in pending clustering after archive isn't committed via {{HoodieFileGroup#isFileSliceCommitted(slice)}}. Therefore HoodieTimelineArchiver archive the latest instant before inflight replacecommit.
[jira] [Resolved] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit
[ https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang resolved HUDI-5728.

> HoodieTimelineArchiver archives the latest instant before inflight replacecommit
>
> Key: HUDI-5728
> URL: https://issues.apache.org/jira/browse/HUDI-5728
> Project: Apache Hudi
> Issue Type: Bug
> Components: table-service
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> When inline or async clustering is enabled, we need to ensure that there is a commit in the active timeline to check whether the file slice generated in pending clustering after archive isn't committed via {{HoodieFileGroup#isFileSliceCommitted(slice)}}. Therefore HoodieTimelineArchiver archive the latest instant before inflight replacecommit.
[jira] [Reopened] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig
[ https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang reopened HUDI-5772:

> Align Flink clustering configuration with HoodieClusteringConfig
>
> Key: HUDI-5772
> URL: https://issues.apache.org/jira/browse/HUDI-5772
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Affects Versions: 0.13.1
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
>
> In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are 'clustering.plan.strategy.cluster.begin.partition', 'clustering.plan.strategy.cluster.end.partition', 'clustering.plan.strategy.partition.regex.pattern', 'clustering.plan.strategy.partition.selected' options which do not align the clustering configuration of HoodieClusteringConfig. FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig should align Flink clustering configuration with HoodieClusteringConfig.
[jira] [Resolved] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig
[ https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang resolved HUDI-5772.

> Align Flink clustering configuration with HoodieClusteringConfig
>
> Key: HUDI-5772
> URL: https://issues.apache.org/jira/browse/HUDI-5772
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Affects Versions: 0.13.1
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
>
> In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are 'clustering.plan.strategy.cluster.begin.partition', 'clustering.plan.strategy.cluster.end.partition', 'clustering.plan.strategy.partition.regex.pattern', 'clustering.plan.strategy.partition.selected' options which do not align the clustering configuration of HoodieClusteringConfig. FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig should align Flink clustering configuration with HoodieClusteringConfig.
[jira] [Closed] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS
[ https://issues.apache.org/jira/browse/HUDI-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang closed HUDI-5531.
Resolution: Won't Fix

> RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS
>
> Key: HUDI-5531
> URL: https://issues.apache.org/jira/browse/HUDI-5531
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Minor
> Fix For: 0.13.1
>
> The javadoc of `ClusteringPlanPartitionFilter` mentions that RECENT DAYS: output recent partition given skip num and days lookback config, therefore the RECENT_DAYS strategy doesn't match the semantics because it assumes that Hudi partitions are partitioned by day, but partitioning by hour can also use this strategy. RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS for the semantics match.
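The point made in the description, that the selection logic works on partitions rather than days, can be seen in a small sketch. It assumes the filter sorts partition paths in descending order, skips a configured number of the latest ones, and keeps the next few; this is an assumption drawn from the quoted javadoc wording, not a copy of Hudi's code, and the function and parameter names are hypothetical.

```python
from typing import List


def select_recent_partitions(partitions: List[str],
                             skip_from_latest: int,
                             lookback: int) -> List[str]:
    """Return `lookback` partitions after skipping the `skip_from_latest` newest."""
    newest_first = sorted(partitions, reverse=True)
    return newest_first[skip_from_latest:skip_from_latest + lookback]


# Daily partitions: skip the newest day, keep the two before it.
daily = ["2023/03/01", "2023/03/02", "2023/03/03", "2023/03/04"]
print(select_recent_partitions(daily, skip_from_latest=1, lookback=2))

# Hourly partitions: the same call selects hours, not days.
hourly = ["2023/03/04/00", "2023/03/04/01", "2023/03/04/02"]
print(select_recent_partitions(hourly, skip_from_latest=0, lookback=2))
```

Nothing in the sketch depends on the partitions being days, which is the mismatch between name and behaviour that the ticket points out.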
[jira] [Closed] (HUDI-2503) HoodieFlinkWriteClient supports to allow parallel writing to tables using Locking service
[ https://issues.apache.org/jira/browse/HUDI-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang closed HUDI-2503.
Resolution: Fixed

> HoodieFlinkWriteClient supports to allow parallel writing to tables using Locking service
>
> Key: HUDI-2503
> URL: https://issues.apache.org/jira/browse/HUDI-2503
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> The strategy interface for conflict resolution with multiple writers is introduced and the SparkRDDWriteClient has integrated with the ConflictResolutionStrategy. HoodieFlinkWriteClient should also support to allow parallel writing to tables using Locking service based on ConflictResolutionStrategy.
[jira] [Closed] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig
[ https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang closed HUDI-5772.
Resolution: Fixed

> Align Flink clustering configuration with HoodieClusteringConfig
>
> Key: HUDI-5772
> URL: https://issues.apache.org/jira/browse/HUDI-5772
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Affects Versions: 0.13.1
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
>
> In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are 'clustering.plan.strategy.cluster.begin.partition', 'clustering.plan.strategy.cluster.end.partition', 'clustering.plan.strategy.partition.regex.pattern', 'clustering.plan.strategy.partition.selected' options which do not align the clustering configuration of HoodieClusteringConfig. FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig should align Flink clustering configuration with HoodieClusteringConfig.
[jira] [Closed] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit
[ https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang closed HUDI-5728.
Resolution: Fixed

> HoodieTimelineArchiver archives the latest instant before inflight replacecommit
>
> Key: HUDI-5728
> URL: https://issues.apache.org/jira/browse/HUDI-5728
> Project: Apache Hudi
> Issue Type: Bug
> Components: table-service
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> When inline or async clustering is enabled, we need to ensure that there is a commit in the active timeline to check whether the file slice generated in pending clustering after archive isn't committed via {{HoodieFileGroup#isFileSliceCommitted(slice)}}. Therefore HoodieTimelineArchiver archive the latest instant before inflight replacecommit.
[GitHub] [hudi] hudi-bot commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink
hudi-bot commented on PR #8070: URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454766654 ## CI report: * 10d3659dcc94bc069d0da83ee3b711bf4ff079fe Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15573) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15577) * 8bc5774747acac448ef96b036a4e38d832255441 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15578)
[GitHub] [hudi] hudi-bot commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink
hudi-bot commented on PR #8070: URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454763414 ## CI report: * 10d3659dcc94bc069d0da83ee3b711bf4ff079fe Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15573) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15577) * 8bc5774747acac448ef96b036a4e38d832255441 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink
hudi-bot commented on PR #8070: URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454761576 ## CI report: * 10d3659dcc94bc069d0da83ee3b711bf4ff079fe Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15573) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15577)
[GitHub] [hudi] danny0405 commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink
danny0405 commented on PR #8070: URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454757688 @hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server
hudi-bot commented on PR #8079: URL: https://github.com/apache/hudi/pull/8079#issuecomment-1454732191 ## CI report: * 103f3efa119c4de262544fd1ee412c5375bf55cf UNKNOWN * d2b11c5747266e7f3cb77dfa19193bdb89548e50 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15562)
[GitHub] [hudi] hudi-bot commented on pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server
hudi-bot commented on PR #8079: URL: https://github.com/apache/hudi/pull/8079#issuecomment-1454730347 ## CI report: * 103f3efa119c4de262544fd1ee412c5375bf55cf UNKNOWN * d660deb903eed17560554a0145464598089fb3ea Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15574) * d2b11c5747266e7f3cb77dfa19193bdb89548e50 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8094: [HUDI-5876] Remove usage of deprecated TableConfig.
hudi-bot commented on PR #8094: URL: https://github.com/apache/hudi/pull/8094#issuecomment-1454727473 ## CI report: * 0123e176853d34eabeb39c71f45061b927c0d93a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15575)
[GitHub] [hudi] hudi-bot commented on pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server
hudi-bot commented on PR #8079: URL: https://github.com/apache/hudi/pull/8079#issuecomment-1454712975 ## CI report: * 103f3efa119c4de262544fd1ee412c5375bf55cf UNKNOWN * d660deb903eed17560554a0145464598089fb3ea Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15574)
[GitHub] [hudi] hudi-bot commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink
hudi-bot commented on PR #8070: URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454700721 ## CI report: * 10d3659dcc94bc069d0da83ee3b711bf4ff079fe Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15573)
[GitHub] [hudi] hudi-bot commented on pull request #8076: Support bulk_insert for insert_overwrite and insert_overwrite_table
hudi-bot commented on PR #8076: URL: https://github.com/apache/hudi/pull/8076#issuecomment-1454680899 ## CI report: * 6a239ada8998fd440f19c0082b26d206ed589870 UNKNOWN * f384bbc843028360687903b3b6de835685235b68 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15570)
[GitHub] [hudi] hudi-bot commented on pull request #8094: [HUDI-5876] Remove usage of deprecated TableConfig.
hudi-bot commented on PR #8094: URL: https://github.com/apache/hudi/pull/8094#issuecomment-1454668571 ## CI report: * 0123e176853d34eabeb39c71f45061b927c0d93a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15575)
[GitHub] [hudi] hudi-bot commented on pull request #8094: [HUDI-5876] Remove usage of deprecated TableConfig.
hudi-bot commented on PR #8094: URL: https://github.com/apache/hudi/pull/8094#issuecomment-1454667399 ## CI report: * 0123e176853d34eabeb39c71f45061b927c0d93a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server
hudi-bot commented on PR #8079: URL: https://github.com/apache/hudi/pull/8079#issuecomment-1454667392 ## CI report: * 103f3efa119c4de262544fd1ee412c5375bf55cf UNKNOWN * 7fff406e74cdf3faf047634a2d596399fa49f059 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15572) * d660deb903eed17560554a0145464598089fb3ea Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15574)
[jira] [Updated] (HUDI-5876) Remove usage of deprecated TableConfig.
[ https://issues.apache.org/jira/browse/HUDI-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5876:
    Labels: pull-request-available  (was: )

> Remove usage of deprecated TableConfig.
> ---
>
> Key: HUDI-5876
> URL: https://issues.apache.org/jira/browse/HUDI-5876
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Major
> Labels: pull-request-available
>
> This is a small change: I found that SortOperatorGen initializes
> TableConfig using a deprecated constructor. Use the recommended method instead.
> TableConfig
> /** Please use {@link TableConfig#getDefault()} instead. */
> @Deprecated
> public TableConfig() {}
[GitHub] [hudi] slfan1989 opened a new pull request, #8094: [HUDI-5876] Remove usage of deprecated TableConfig.
slfan1989 opened a new pull request, #8094: URL: https://github.com/apache/hudi/pull/8094

### Change Logs

JIRA: HUDI-5876. Remove usage of deprecated TableConfig.

This is a small change: I found that SortOperatorGen initializes TableConfig using a deprecated constructor. Use the recommended method instead.

TableConfig
```
/** Please use {@link TableConfig#getDefault()} instead. */
@Deprecated
public TableConfig() {}
```

### Impact

none.

### Risk level (write none, low medium or high below)

none.

### Documentation Update

none.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
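The change described above swaps a deprecated no-argument constructor for the static factory named in its Javadoc. A minimal sketch of that pattern follows. `SketchTableConfig` and `SortOperatorSketch` are hypothetical stand-ins written for illustration; they are not Flink's `TableConfig` or Hudi's `SortOperatorGen`, and the actual diff in the PR is not shown in this message.

```java
/** Hypothetical stand-in that mirrors the quoted deprecation pattern; not Flink's TableConfig. */
final class SketchTableConfig {
  /** Please use {@link SketchTableConfig#getDefault()} instead. */
  @Deprecated
  SketchTableConfig() {}

  /** Recommended entry point: callers obtain a default instance through the factory. */
  static SketchTableConfig getDefault() {
    return new SketchTableConfig();
  }
}

/** Hypothetical caller standing in for the code that builds the config. */
final class SortOperatorSketch {
  private SortOperatorSketch() {}

  static SketchTableConfig createConfig() {
    // Before: new SketchTableConfig()  (triggers a deprecation warning at the call site)
    // After: go through the factory method the Javadoc points to.
    return SketchTableConfig.getDefault();
  }
}
```

Routing construction through the factory keeps call sites free of deprecation warnings and leaves the library free to change how defaults are built.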
[GitHub] [hudi] hudi-bot commented on pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server
hudi-bot commented on PR #8079: URL: https://github.com/apache/hudi/pull/8079#issuecomment-1454666283 ## CI report: * 103f3efa119c4de262544fd1ee412c5375bf55cf UNKNOWN * c162956f9f418b4603328c37f9e2babf59613d4b Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15571) * 7fff406e74cdf3faf047634a2d596399fa49f059 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15572) * d660deb903eed17560554a0145464598089fb3ea UNKNOWN
[jira] [Created] (HUDI-5876) Remove usage of deprecated TableConfig.
Shilun Fan created HUDI-5876:

Summary: Remove usage of deprecated TableConfig.
Key: HUDI-5876
URL: https://issues.apache.org/jira/browse/HUDI-5876
Project: Apache Hudi
Issue Type: Improvement
Reporter: Shilun Fan
Assignee: Shilun Fan

This is a small change: I found that SortOperatorGen initializes TableConfig using a deprecated constructor. Use the recommended method instead.

TableConfig
/** Please use {@link TableConfig#getDefault()} instead. */
@Deprecated
public TableConfig() {}
[GitHub] [hudi] danny0405 commented on a diff in pull request #7687: [HUDI-5606] Update to handle deletes in postgres debezium
danny0405 commented on code in PR #7687: URL: https://github.com/apache/hudi/pull/7687#discussion_r1125417796

## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/debezium/DebeziumSource.java:
##
@@ -86,21 +90,28 @@ public DebeziumSource(TypedProperties props, JavaSparkContext sparkContext,
     deserializerClassName = props.getString(DataSourceWriteOptions.KAFKA_AVRO_VALUE_DESERIALIZER_CLASS().key(),
         DataSourceWriteOptions.KAFKA_AVRO_VALUE_DESERIALIZER_CLASS().defaultValue());
+    // Currently, debezium source requires Confluent/Kafka schema-registry to fetch the latest schema.
+    if (schemaProvider == null || !(schemaProvider instanceof SchemaRegistryProvider)) {
+      schemaRegistryProvider = new SchemaRegistryProvider(props, sparkContext);
+      schemaProvider = schemaRegistryProvider;
+    } else {
+      schemaRegistryProvider = (SchemaRegistryProvider) schemaProvider;
+    }
+
     try {
       props.put(NATIVE_KAFKA_VALUE_DESERIALIZER_PROP, Class.forName(deserializerClassName).getName());
+      if (deserializerClassName.equals(KafkaAvroSchemaDeserializer.class.getName())) {
+        if (schemaProvider == null) {
+          throw new HoodieIOException("SchemaProvider has to be set to use KafkaAvroSchemaDeserializer");
+        }
+        props.put(KAFKA_AVRO_VALUE_DESERIALIZER_SCHEMA, schemaProvider.getSourceSchema().toString());
+      }

Review Comment:
Let's try to add some tests.
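The hunk quoted above reuses the supplied provider when it is already a `SchemaRegistryProvider` and creates a new one otherwise. Since Java's `instanceof` evaluates to false for `null`, the selection can be written without a separate null check. The sketch below illustrates only that selection logic; `SketchSchemaProvider`, `SketchRegistryProvider`, and `ProviderResolution` are hypothetical stand-ins, not Hudi classes, and the string returned by `sourceSchema()` is a placeholder.

```java
/** Hypothetical stand-in for a schema provider interface; not Hudi's SchemaProvider. */
interface SketchSchemaProvider {
  String sourceSchema();
}

/** Hypothetical stand-in for the registry-backed provider. */
final class SketchRegistryProvider implements SketchSchemaProvider {
  @Override
  public String sourceSchema() {
    return "registry-schema"; // placeholder value for illustration
  }
}

final class ProviderResolution {
  private ProviderResolution() {}

  /** Returns the supplied provider if it is registry-backed, otherwise a fresh one. */
  static SketchRegistryProvider resolve(SketchSchemaProvider supplied) {
    // instanceof is false for null, so null and "wrong type" share one branch.
    if (supplied instanceof SketchRegistryProvider) {
      return (SketchRegistryProvider) supplied;
    }
    return new SketchRegistryProvider();
  }
}
```

A test for this logic, as the review asks for, would likely cover three inputs: `null`, a provider of another type, and an existing registry-backed provider.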
[GitHub] [hudi] danny0405 commented on a diff in pull request #7687: [HUDI-5606] Update to handle deletes in postgres debezium
danny0405 commented on code in PR #7687: URL: https://github.com/apache/hudi/pull/7687#discussion_r1125417754

## hudi-common/src/main/java/org/apache/hudi/common/model/debezium/AbstractDebeziumAvroPayload.java:
##
@@ -55,19 +55,26 @@ public AbstractDebeziumAvroPayload(Option record) {
   @Override
   public Option getInsertValue(Schema schema) throws IOException {
-    IndexedRecord insertRecord = getInsertRecord(schema);
-    return handleDeleteOperation(insertRecord);
+    Option insertRecord = getInsertRecord(schema);
+    if (!insertRecord.isPresent()) {
+      return insertRecord;
+    }
+    return handleDeleteOperation(insertRecord.get());
   }

   @Override
   public Option combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
     // Step 1: If the time occurrence of the current record in storage is higher than the time occurrence of the
     // insert record (including a delete record), pick the current record.
-    if (shouldPickCurrentRecord(currentValue, getInsertRecord(schema), schema)) {
-      return Option.of(currentValue);
+    Option indexedRecordOption = getInsertValue(schema);
+    if (indexedRecordOption.isPresent()) {
+      if (shouldPickCurrentRecord(currentValue, getInsertRecord(schema).get(), schema)) {
+        return Option.of(currentValue);
+      }
+      // Step 2: Pick the insert record (as a delete record if its a deleted event)
+      return getInsertValue(schema);

Review Comment:
No need to invoke `getInsertValue(schema)` twice; we can fall back to line 77 directly.
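The review comment above asks that `getInsertValue(schema)` be evaluated only once. A minimal sketch of that shape follows, using `java.util.Optional` and plain strings as stand-ins for Hudi's `Option` and Avro's `IndexedRecord`. The class `PayloadSketch`, its fields, the ordering rule in `shouldPickCurrentRecord`, and the handling of an absent insert record are all assumptions made for illustration; the quoted hunk is truncated and does not show how the PR treats that last case.

```java
import java.util.Optional;

/** Hypothetical stand-in for a Debezium payload; not Hudi's AbstractDebeziumAvroPayload. */
final class PayloadSketch {
  private final String incoming;  // null models an absent insert record
  private final long incomingTs;  // event time of the incoming record
  int insertValueCalls = 0;       // counts evaluations, for demonstration only

  PayloadSketch(String incoming, long incomingTs) {
    this.incoming = incoming;
    this.incomingTs = incomingTs;
  }

  Optional<String> getInsertValue() {
    insertValueCalls++;
    return Optional.ofNullable(incoming);
  }

  // Assumed rule for this sketch: keep the stored record when it is strictly newer.
  static boolean shouldPickCurrentRecord(long currentTs, long incomingTs) {
    return currentTs > incomingTs;
  }

  Optional<String> combineAndGetUpdateValue(String current, long currentTs) {
    // Evaluate the insert value once and reuse the result on every path.
    Optional<String> insertValue = getInsertValue();
    if (insertValue.isPresent() && shouldPickCurrentRecord(currentTs, incomingTs)) {
      return Optional.of(current);
    }
    // Assumption: an absent insert value is returned as-is.
    return insertValue;
  }
}
```

Holding the result in a local variable avoids repeating the deserialization work and ensures that the presence check and the returned value refer to the same record.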
[GitHub] [hudi] nfarah86 closed pull request #8093: Docs update1
nfarah86 closed pull request #8093: Docs update1 URL: https://github.com/apache/hudi/pull/8093
[GitHub] [hudi] nfarah86 opened a new pull request, #8093: Docs update1
nfarah86 opened a new pull request, #8093: URL: https://github.com/apache/hudi/pull/8093 cc @yihua cc @danny0405 cc @bhasudha please review the PR for docs: Timeline ![Screenshot 2023-03-03 at 4 37 51 PM](https://user-images.githubusercontent.com/5392555/222884865-91878270-85e6-450d-ae53-cc68e87875b1.png) Flink ![Screenshot 2023-03-04 at 12 07 23 AM](https://user-images.githubusercontent.com/5392555/222884867-fb69a5bb-2b56-40bd-a8df-ba5073e0.png) File sizing ![Screenshot 2023-03-04 at 12 07 57 AM](https://user-images.githubusercontent.com/5392555/222884868-67aea232-7759-4bc2-8be9-bed098223457.png)