Re: [PR] [HUDI-7938] Broadcast `SerializableConfiguration` to avoid NullPointerException in Kryo SerDe [hudi]
hudi-bot commented on PR #11626:
URL: https://github.com/apache/hudi/pull/11626#issuecomment-2227217009

   ## CI report:

   * 256044ead7c3ab3a1c69f3fa46e36417965bb837 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24840)
   * 01da75c614c6a3a50a9ecca4e4a1ce315886355f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24852)

   Bot commands

   @hudi-bot supports the following commands:

   - `@hudi-bot run azure` re-run the last Azure build

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
Re: [PR] [HUDI-7938] Broadcast `SerializableConfiguration` to avoid NullPointerException in Kryo SerDe [hudi]
hudi-bot commented on PR #11626:
URL: https://github.com/apache/hudi/pull/11626#issuecomment-2227207604

   ## CI report:

   * 256044ead7c3ab3a1c69f3fa46e36417965bb837 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24840)
   * 01da75c614c6a3a50a9ecca4e4a1ce315886355f UNKNOWN

   Bot commands

   @hudi-bot supports the following commands:

   - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-7938) HadoopStorageConfiguration is not properly broadcasted with PySpark
[ https://issues.apache.org/jira/browse/HUDI-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geser Dugarov updated HUDI-7938:
--------------------------------
    Summary: HadoopStorageConfiguration is not properly broadcasted with PySpark  (was: Missed HoodieSparkKryoRegistrar in Hadoop config by default)

> HadoopStorageConfiguration is not properly broadcasted with PySpark
> -------------------------------------------------------------------
>
>                 Key: HUDI-7938
>                 URL: https://issues.apache.org/jira/browse/HUDI-7938
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Geser Dugarov
>            Assignee: Geser Dugarov
>            Priority: Major
>              Labels: pull-request-available
>
> HUDI-7567 added schema evolution to the filegroup reader (#10957),
> but broke integration with PySpark.
> When trying to call
> {quote}df_load = spark.read.format("org.apache.hudi").load(tmp_dir_path)
> df_load.collect()
> {quote}
> we got:
> {quote}24/06/28 11:22:06 WARN TaskSetManager: Lost task 1.0 in stage 27.0 (TID 31) (10.199.141.90 executor 0): java.lang.NullPointerException
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:842)
> at org.apache.hudi.storage.hadoop.HadoopStorageConfiguration.unwrapCopy(HadoopStorageConfiguration.java:73)
> at org.apache.hudi.storage.hadoop.HadoopStorageConfiguration.unwrapCopy(HadoopStorageConfiguration.java:36)
> at org.apache.spark.sql.execution.datasources.parquet.SparkParquetReaderBase.read(SparkParquetReaderBase.scala:58)
> at org.apache.spark.sql.execution.datasources.parquet.HoodieFileGroupReaderBasedParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(HoodieFileGroupReaderBasedParquetFileFormat.scala:197)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
> at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:594)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
> at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
> at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
> at org.apache.spark.scheduler.Task.run(Task.scala:139)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> {quote}
> Spark 3.4.3 was used.

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)
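The stack trace suggests the executor-side `Configuration` was never properly re-initialized after Kryo deserialization, and the fix in PR #11626 ships the config to executors as a broadcast `SerializableConfiguration` wrapper instead. The sketch below illustrates only the wrapper pattern, using plain `java.io` serialization and a hypothetical `FakeHadoopConf` standing in for Hadoop's `Configuration`; it is not the actual Hudi/Spark code.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration, which is not
// java.io.Serializable and misbehaves if its internal state is missing
// after deserialization.
class FakeHadoopConf {
  final Map<String, String> props = new HashMap<>();
  void set(String k, String v) { props.put(k, v); }
  String get(String k) { return props.get(k); }
}

// Mirrors the SerializableConfiguration idea: keep the config field
// transient and hand-serialize its entries, so the deserialized copy is
// fully rebuilt instead of left half-constructed (or null).
public class SerializableConfSketch implements Serializable {
  private transient FakeHadoopConf conf;

  SerializableConfSketch(FakeHadoopConf conf) { this.conf = conf; }

  FakeHadoopConf get() { return conf; }

  private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();
    out.writeInt(conf.props.size());
    for (Map.Entry<String, String> e : conf.props.entrySet()) {
      out.writeUTF(e.getKey());
      out.writeUTF(e.getValue());
    }
  }

  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    conf = new FakeHadoopConf(); // rebuild eagerly; never leave the field null
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      conf.set(in.readUTF(), in.readUTF());
    }
  }

  // Round-trips the wrapper through serialization, standing in for the
  // driver-to-executor ship that a Spark broadcast performs.
  static SerializableConfSketch roundTrip(SerializableConfSketch s) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
        oos.writeObject(s);
      }
      try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
        return (SerializableConfSketch) ois.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    FakeHadoopConf conf = new FakeHadoopConf();
    conf.set("fs.defaultFS", "hdfs://example:8020");
    SerializableConfSketch shipped = roundTrip(new SerializableConfSketch(conf));
    System.out.println(shipped.get().get("fs.defaultFS")); // hdfs://example:8020
  }
}
```

Because `readObject` reconstructs the wrapped config from the written entries, the receiving side never observes a null or partially initialized object, which is the failure mode the stack trace above shows.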
Re: [PR] [HUDI-7938] Broadcast `SerializableConfiguration` to avoid NullPointerException in Kryo SerDe [hudi]
geserdugarov commented on PR #11626:
URL: https://github.com/apache/hudi/pull/11626#issuecomment-2227205668

   Changes in this PR shouldn't affect the results of the Flink-related CI (`ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction`). Restarted CI by rebasing and force pushing.
[jira] [Updated] (HUDI-7976) Fix BUG introduced in HUDI-7955 due to usage of wrong class
[ https://issues.apache.org/jira/browse/HUDI-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-7976:
-----------------------------
    Fix Version/s: 1.1.0

> Fix BUG introduced in HUDI-7955 due to usage of wrong class
> -----------------------------------------------------------
>
>                 Key: HUDI-7976
>                 URL: https://issues.apache.org/jira/browse/HUDI-7976
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
> In the bugfix for HUDI-7955, the wrong class for invoking {{getTimestamp}} was used.
> # *Wrong*: org.apache.hadoop.hive.common.type.Timestamp
> # *Correct*: org.apache.hadoop.hive.serde2.io.TimestampWritableV2
>
> !https://git.garena.com/shopee/data-infra/hudi/uploads/eeff29b3e741c65eeb48f9901fa28da0/image.png|width=468,height=235!
>
> Submitting a bugfix to fix this bugfix...
> Log levels for the exception block are also changed to warn so errors will be printed out.
> On top of that, we have simplified the {{getMillis}} shim to remove the method that was added in HUDI-7955 to standardise it with how {{getDays}} is written.
[jira] [Closed] (HUDI-7976) Fix BUG introduced in HUDI-7955 due to usage of wrong class
[ https://issues.apache.org/jira/browse/HUDI-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-7976.
----------------------------
    Resolution: Fixed

Fixed via master branch: 918c2e0009c054f9fcd4ca19ba3258c491483708

> Fix BUG introduced in HUDI-7955 due to usage of wrong class
> -----------------------------------------------------------
>
>                 Key: HUDI-7976
>                 URL: https://issues.apache.org/jira/browse/HUDI-7976
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
> In the bugfix for HUDI-7955, the wrong class for invoking {{getTimestamp}} was used.
> # *Wrong*: org.apache.hadoop.hive.common.type.Timestamp
> # *Correct*: org.apache.hadoop.hive.serde2.io.TimestampWritableV2
>
> !https://git.garena.com/shopee/data-infra/hudi/uploads/eeff29b3e741c65eeb48f9901fa28da0/image.png|width=468,height=235!
>
> Submitting a bugfix to fix this bugfix...
> Log levels for the exception block are also changed to warn so errors will be printed out.
> On top of that, we have simplified the {{getMillis}} shim to remove the method that was added in HUDI-7955 to standardise it with how {{getDays}} is written.
(hudi) branch master updated: [HUDI-7976] Fix BUG introduced in HUDI-7955 due to usage of wrong class (#11612)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 918c2e0009c  [HUDI-7976] Fix BUG introduced in HUDI-7955 due to usage of wrong class (#11612)
918c2e0009c is described below

commit 918c2e0009c054f9fcd4ca19ba3258c491483708
Author: voonhous
AuthorDate: Sun Jul 14 11:29:44 2024 +0800

    [HUDI-7976] Fix BUG introduced in HUDI-7955 due to usage of wrong class (#11612)
---
 .../hudi/hadoop/utils/HiveAvroSerializer.java      |  3 +-
 .../apache/hudi/hadoop/utils/HoodieHiveUtils.java  |  8 +---
 .../apache/hudi/hadoop/utils/shims/Hive2Shim.java  |  9 +----
 .../apache/hudi/hadoop/utils/shims/Hive3Shim.java  | 45 ++
 .../apache/hudi/hadoop/utils/shims/HiveShim.java   |  4 +-
 5 files changed, 26 insertions(+), 43 deletions(-)

diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java
index 47d984c89c3..0c3362ba981 100644
--- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java
+++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java
@@ -304,8 +304,7 @@ public class HiveAvroSerializer {
       case DATE:
         return HoodieHiveUtils.getDays(structFieldData);
       case TIMESTAMP:
-        Object timestamp = HoodieHiveUtils.getTimestamp(structFieldData);
-        return HoodieHiveUtils.getMills(timestamp);
+        return HoodieHiveUtils.getMills(structFieldData);
       case INT:
         if (schema.getLogicalType() != null && schema.getLogicalType().getName().equals("date")) {
           return new WritableDateObjectInspector().getPrimitiveWritableObject(structFieldData).getDays();
diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java
index ced39ccf379..b4894c35d41 100644
--- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java
+++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java
@@ -182,15 +182,11 @@ public class HoodieHiveUtils {
     return HIVE_SHIM.getDateWriteable(value);
   }

-  public static Object getTimestamp(Object fieldData) {
-    return HIVE_SHIM.unwrapTimestampAsPrimitive(fieldData);
-  }
-
   public static int getDays(Object dateWritable) {
     return HIVE_SHIM.getDays(dateWritable);
   }

-  public static long getMills(Object timestamp) {
-    return HIVE_SHIM.getMills(timestamp);
+  public static long getMills(Object timestampWritable) {
+    return HIVE_SHIM.getMills(timestampWritable);
   }
 }
diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/Hive2Shim.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/Hive2Shim.java
index e2a4f36cb7f..7f4b683d246 100644
--- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/Hive2Shim.java
+++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/Hive2Shim.java
@@ -42,11 +42,6 @@ public class Hive2Shim implements HiveShim {
     return new TimestampWritable(timestamp);
   }

-  @Override
-  public Object unwrapTimestampAsPrimitive(Object o) {
-    return o == null ? null : ((TimestampWritable) o).getTimestamp();
-  }
-
   public Writable getDateWriteable(int value) {
     return new DateWritable(value);
   }
@@ -55,7 +50,7 @@ public class Hive2Shim implements HiveShim {
     return ((DateWritable) dateWritable).getDays();
   }

-  public long getMills(Object timestamp) {
-    return ((Timestamp) timestamp).getTime();
+  public long getMills(Object timestampWritable) {
+    return ((TimestampWritable) timestampWritable).getTimestamp().getTime();
   }
 }
diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/Hive3Shim.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/Hive3Shim.java
index 9d6dca4f2b3..bc5b7b3e124 100644
--- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/Hive3Shim.java
+++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/Hive3Shim.java
@@ -36,11 +36,12 @@ public class Hive3Shim implements HiveShim {

   public static final Logger LOG = LoggerFactory.getLogger(Hive3Shim.class);

-  public static final String HIVE_TIMESTAMP_TYPE_CLASS = "org.apache.hadoop.hive.common.type.Timestamp";
-  public static final String TIMESTAMP_WRITEABLE_V2_CLASS = "org.apache.hadoop.hive.serde2.io.TimestampWritableV2";
-  public static final String DATE_WRITEABLE_V2_CLASS = "org.apache.hadoop.hive.serde2.io.DateWritableV2";
+  public static final String TIMESTAMP_CLASS_NAME = "org.apache.hadoop.hive.common.type.Timestamp";
+  public static final String TIMESTAMP_WRITEABLE_V2_CLASS_NAME = "org.apache.hadoo
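The Hive3Shim portion of the diff resolves Hive classes by fully qualified name at runtime, which is exactly how the wrong-class bug slipped in: a valid-looking class name string that points at a type the value is not an instance of. A rough, self-contained illustration of that reflection-based shim pattern (not the actual Hive3Shim code; `java.util.Date` stands in for `TimestampWritableV2`, and the class/method names passed in are the caller's assumption):

```java
import java.lang.reflect.Method;

// Sketch of a name-based shim: resolve a class and accessor by name at
// runtime so the code compiles without that Hive version on the classpath.
// The instance check is what catches a wrong class name early instead of
// failing later with a confusing ClassCastException.
public class ShimSketch {
  static long getMillisViaReflection(Object value, String className, String methodName) {
    try {
      Class<?> clazz = Class.forName(className);
      if (!clazz.isInstance(value)) {
        throw new IllegalArgumentException(
            "expected " + className + " but got " + value.getClass().getName());
      }
      Method m = clazz.getMethod(methodName);
      return (Long) m.invoke(value); // accessor assumed to return a long epoch value
    } catch (ReflectiveOperationException e) {
      // Surfacing the failure loudly mirrors the HUDI-7976 change of the
      // shim's exception logging to warn, so bad class names are visible.
      throw new IllegalStateException("shim lookup failed for " + className, e);
    }
  }

  public static void main(String[] args) {
    java.util.Date d = new java.util.Date(123456789L);
    System.out.println(getMillisViaReflection(d, "java.util.Date", "getTime")); // 123456789
  }
}
```

With the wrong class name (the bug's pattern), the `isInstance` guard trips immediately, which is the behavior the bugfix restores by unwrapping via the writable type the serializer actually hands over.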
Re: [PR] [HUDI-7976] Fix BUG introduced in HUDI-7955 due to usage of wrong class [hudi]
danny0405 merged PR #11612:
URL: https://github.com/apache/hudi/pull/11612
Re: [PR] [DOCS] Release notes 1.0.0-beta2 [hudi]
codope commented on code in PR #11618:
URL: https://github.com/apache/hudi/pull/11618#discussion_r1676996631

##########
website/docs/metadata.md:
##########

@@ -90,6 +90,32 @@ Following are the different indices currently available under the metadata table
    Hudi release, this index aids in locating records faster than other existing indices and can provide a speedup orders of magnitude faster in large deployments where index lookup dominates write latencies.
+
+#### New Indexes in 1.0.0
+
+- ***Functional Index***:
+  A [functional index](https://github.com/apache/hudi/blob/3789840be3d041cbcfc6b24786740210e4e6d6ac/rfc/rfc-63/rfc-63.md)
+  is an index on a function of a column. If a query has a predicate on a function of a column, the functional index can
+  be used to speed up the query. Functional index is stored in *func_index_* prefixed partitions (one for each
+  function) under the metadata table. Functional index can be created using SQL syntax. Please check out the SQL DDL
+  docs [here](/docs/next/sql_ddl#create-functional-index) for more details.
+
+- ***Partition Stats Index***:
+  Partition stats index aggregates statistics at the partition level for the columns for which it is enabled. This helps
+  in efficient partition pruning even for non-partition fields. The partition stats index is stored in the *partition_stats*
+  partition under the metadata table. Partition stats index can be enabled using the following configs (note it is required
+  to specify the columns for which stats should be aggregated):
+  ```properties
+  hoodie.metadata.index.partition.stats.enable=true
+  hoodie.metadata.index.column.stats.columns=
+  ```
+
+- ***Secondary Index***:
+  Secondary indexes allow users to create indexes on columns that are not part of record key columns in Hudi tables (for
+  record key fields, Hudi supports [Record-level Index](/blog/2023/11/01/record-level-index)). Secondary indexes
+  can be used to speed up queries with predicates on columns other than record key columns.
+
+To try out these features, refer to the [SQL guide](/docs/next/sql_ddl#create-partition-stats-index).

Review Comment:
   Yes, I added it in the SQL guide; there is one section for partition stats and secondary index combined.
Re: [PR] [DOCS] Release notes 1.0.0-beta2 [hudi]
codope commented on code in PR #11618:
URL: https://github.com/apache/hudi/pull/11618#discussion_r1676996508

##########
website/releases/release-1.0.0-beta2.md:
##########

@@ -0,0 +1,80 @@
+---
+title: "Release 1.0.0-beta2"
+sidebar_position: 1
+layout: releases
+toc: true
+---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## [Release 1.0.0-beta2](https://github.com/apache/hudi/releases/tag/release-1.0.0-beta2) ([docs](/docs/next/quick-start-guide))
+
+Apache Hudi 1.0.0-beta2 is the second beta release of Apache Hudi. This release is meant for early adopters to try
+out the new features and provide feedback. The release is not meant for production use.
+
+## Migration Guide
+
+This release contains major format changes as we will see in the highlights below. We encourage users to try out the
+**1.0.0-beta2** features on new tables. The 1.0 general availability (GA) release will support automatic table upgrades
+from 0.x versions, while also ensuring full backward compatibility when reading 0.x Hudi tables using 1.0, ensuring a
+seamless migration experience.
+
+:::caution
+Given that the timeline format and log file format have changed in this **beta release**, it is recommended not to attempt
+rolling upgrades from older versions to this release.
+:::
+
+## Highlights
+
+### Format changes
+
+[HUDI-6242](https://issues.apache.org/jira/browse/HUDI-6242) is the main epic covering all the format change proposals,
+which are also partly covered in the [Hudi 1.0 tech specification](/tech-specs-1point0). The following are the main
+changes in this release:
+
+#### Timeline
+
+No major changes in this release. Refer to [1.0.0-beta1#timeline](release-1.0.0-beta1.md#timeline) for more details.
+
+#### Log File Format
+
+In addition to the fields in the log file header added in [1.0.0-beta1](release-1.0.0-beta1.md#log-file-format), we also
+store a flag, `IS_PARTIAL`, to indicate whether the log block contains partial updates or not.
+
+### Metadata indexes
+
+In 1.0.0-beta1, we added support for functional index. In 1.0.0-beta2, we have added support for secondary indexes and
+partition stats index to the [multi-modal indexing](/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi) subsystem.
+
+#### Secondary Indexes
+
+Secondary indexes allow users to create indexes on columns that are not part of record key columns in Hudi tables (for
+record key fields, Hudi supports [Record-level Index](/blog/2023/11/01/record-level-index)). Secondary indexes can be used to speed up
+queries with predicates on columns other than record key columns.
+
+#### Partition Stats Index
+
+Partition stats index aggregates statistics at the partition level for the columns for which it is enabled. This helps
+in efficient partition pruning even for non-partition fields.
+
+To try out these features, refer to the [SQL guide](/docs/next/sql_ddl#create-partition-stats-index).
+
+### API Changes
+
+#### Positional Merging
+
+In 1.0.0-beta1, we added a new [filegroup reader](/releases/release-1.0.0-beta1#new-filegroup-reader). The reader now
+provides position-based merging, as an alternative to the existing key-based merging, and skipping pages based on record
+positions. The new filegroup reader is integrated with Spark and Hive, and enabled by default. To enable positional
+merging, set the configs below:
+
+```properties

Review Comment:
   We should enable by default. I guess there are still a few gaps. I have enabled it in https://github.com/apache/hudi/pull/11620 and am tracking failures.
Re: [PR] [HUDI-7976] Fix BUG introduced in HUDI-7955 due to usage of wrong class [hudi]
hudi-bot commented on PR #11612:
URL: https://github.com/apache/hudi/pull/11612#issuecomment-2227167064

   ## CI report:

   * 108e890a065a78c91d0bf28457b0bf2ec888e78b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24845)
   * 1e812dabbe90feeca9bd902654e92e1f8fc2de10 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24850)

   Bot commands

   @hudi-bot supports the following commands:

   - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7976] Fix BUG introduced in HUDI-7955 due to usage of wrong class [hudi]
hudi-bot commented on PR #11612:
URL: https://github.com/apache/hudi/pull/11612#issuecomment-2227163520

   ## CI report:

   * 108e890a065a78c91d0bf28457b0bf2ec888e78b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24845)
   * 1e812dabbe90feeca9bd902654e92e1f8fc2de10 UNKNOWN

   Bot commands

   @hudi-bot supports the following commands:

   - `@hudi-bot run azure` re-run the last Azure build
(hudi) branch asf-site updated: [DOCS] Add doc update for HUDI-7962 (#11622)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 6ad04164dac  [DOCS] Add doc update for HUDI-7962 (#11622)
6ad04164dac is described below

commit 6ad04164dac86b2ee805845d77d28dd246130e40
Author: houyuting
AuthorDate: Sun Jul 14 09:08:04 2024 +0800

    [DOCS] Add doc update for HUDI-7962 (#11622)

    Co-authored-by: houyuting
---
 website/docs/sql_ddl.md | 12
 1 file changed, 12 insertions(+)

diff --git a/website/docs/sql_ddl.md b/website/docs/sql_ddl.md
index a85d8a7bb04..eebadfc580e 100644
--- a/website/docs/sql_ddl.md
+++ b/website/docs/sql_ddl.md
@@ -496,6 +496,18 @@ SHOW PARTITIONS hudi_table;
 --Drop partition:
 ALTER TABLE hudi_table DROP PARTITION (dt='2021-12-09', hh='10');
 ```
+### Show create table
+
+**Syntax**
+
+```sql
+SHOW CREATE TABLE tableIdentifier;
+```
+
+**Examples**
+```sql
+SHOW CREATE TABLE hudi_table;
+```

 ### Caveats
Re: [PR] [DOCS] Add doc update for HUDI-7962 [hudi]
danny0405 merged PR #11622:
URL: https://github.com/apache/hudi/pull/11622
[jira] [Closed] (HUDI-7980) Optimize the configuration content when performing clustering with row writer
[ https://issues.apache.org/jira/browse/HUDI-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-7980.
----------------------------
    Resolution: Fixed

Fixed via master branch: 98b3d3bac0f31219e5b93b7528516b27b87ea699

> Optimize the configuration content when performing clustering with row writer
> -----------------------------------------------------------------------------
>
>                 Key: HUDI-7980
>                 URL: https://issues.apache.org/jira/browse/HUDI-7980
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ma Jian
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
> Currently, the row writer defaults to snapshot reads for all tables. However,
> this method is relatively inefficient for MOR (Merge on Read) tables when
> there are no logs. Therefore, we should optimize this part of the
> configuration.
[jira] [Updated] (HUDI-7980) Optimize the configuration content when performing clustering with row writer
[ https://issues.apache.org/jira/browse/HUDI-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-7980:
-----------------------------
    Fix Version/s: 1.0.0

> Optimize the configuration content when performing clustering with row writer
> -----------------------------------------------------------------------------
>
>                 Key: HUDI-7980
>                 URL: https://issues.apache.org/jira/browse/HUDI-7980
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ma Jian
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
> Currently, the row writer defaults to snapshot reads for all tables. However,
> this method is relatively inefficient for MOR (Merge on Read) tables when
> there are no logs. Therefore, we should optimize this part of the
> configuration.
Re: [PR] [HUDI-7980] Optimize the configuration content when performing clustering with row writer [hudi]
danny0405 merged PR #11614:
URL: https://github.com/apache/hudi/pull/11614
(hudi) branch master updated: [HUDI-7980] Optimize the configuration content when performing clustering with row writer (#11614)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 98b3d3bac0f  [HUDI-7980] Optimize the configuration content when performing clustering with row writer (#11614)
98b3d3bac0f is described below

commit 98b3d3bac0f31219e5b93b7528516b27b87ea699
Author: majian <47964462+majian1...@users.noreply.github.com>
AuthorDate: Sun Jul 14 09:06:37 2024 +0800

    [HUDI-7980] Optimize the configuration content when performing clustering with row writer (#11614)
---
 .../run/strategy/MultipleSparkJobExecutionStrategy.java | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
index 75b42491eda..47ccd8700a8 100644
--- a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
+++ b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
@@ -93,7 +93,6 @@ import java.util.stream.Collectors;
 import java.util.stream.Stream;

 import static org.apache.hudi.client.utils.SparkPartitionUtils.getPartitionFieldVals;
-import static org.apache.hudi.common.config.HoodieCommonConfig.TIMESTAMP_AS_OF;
 import static org.apache.hudi.config.HoodieClusteringConfig.PLAN_STRATEGY_SORT_COLUMNS;
 import static org.apache.hudi.io.storage.HoodieSparkIOFactory.getHoodieSparkIOFactory;
@@ -438,8 +437,11 @@ public abstract class MultipleSparkJobExecutionStrategy
         .toArray(StoragePath[]::new);

     HashMap params = new HashMap<>();
-    params.put("hoodie.datasource.query.type", "snapshot");
-    params.put(TIMESTAMP_AS_OF.key(), instantTime);
+    if (hasLogFiles) {
+      params.put("hoodie.datasource.query.type", "snapshot");
+    } else {
+      params.put("hoodie.datasource.query.type", "read_optimized");
+    }

     StoragePath[] paths;
     if (hasLogFiles) {
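The commit's core decision can be reduced to the following sketch: pick a read-optimized scan of base files when the clustering input carries no log files, since a snapshot (merge-on-read) scan is only needed when there are logs to merge. The `QueryTypeSelector` class here is hypothetical; the real logic lives inline in `MultipleSparkJobExecutionStrategy`, and only the key/value strings mirror the diff.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the query-type selection introduced by
// HUDI-7980: snapshot reads merge base files with log files, so when no
// log files exist, the cheaper read-optimized path is sufficient.
public class QueryTypeSelector {
  static Map<String, String> readerParams(boolean hasLogFiles) {
    Map<String, String> params = new HashMap<>();
    params.put("hoodie.datasource.query.type", hasLogFiles ? "snapshot" : "read_optimized");
    return params;
  }

  public static void main(String[] args) {
    System.out.println(readerParams(true));  // {hoodie.datasource.query.type=snapshot}
    System.out.println(readerParams(false)); // {hoodie.datasource.query.type=read_optimized}
  }
}
```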
Re: [I] When querying a Hudi partitioned table with Hive SQL, if the partition field is not the last column of the table, the data parsed from the parquet file omits the queried partition field but auto-inserts the partition field's value at the partition column's position, causing type-cast errors on subsequent columns [hudi]
danny0405 commented on issue #11609:
URL: https://github.com/apache/hudi/issues/11609#issuecomment-2227155521

   Hive is a legacy repo, and I don't think forcing the partition fields to the end of the schema is the right behavior to follow.
Re: [PR] [HUDI-7986] Fix Duplicate handling behavior when Precombine value is not set [hudi]
hudi-bot commented on PR #11630:
URL: https://github.com/apache/hudi/pull/11630#issuecomment-2227123686

   ## CI report:

   * 544cb739d5fe30a5af0279a85d198167a85d0baf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24848)

   Bot commands

   @hudi-bot supports the following commands:

   - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7986] Fix Dupes behavior with Precombine [hudi]
hudi-bot commented on PR #11630:
URL: https://github.com/apache/hudi/pull/11630#issuecomment-2227103064

   ## CI report:

   * 544cb739d5fe30a5af0279a85d198167a85d0baf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24848)

   Bot commands

   @hudi-bot supports the following commands:

   - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7986] Fix Dupes behavior with Precombine [hudi]
hudi-bot commented on PR #11630:
URL: https://github.com/apache/hudi/pull/11630#issuecomment-2227101414

   ## CI report:

   * 544cb739d5fe30a5af0279a85d198167a85d0baf UNKNOWN

   Bot commands

   @hudi-bot supports the following commands:

   - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-7986) Make precombine field optional with Dedup feature for Mutable Streams
[ https://issues.apache.org/jira/browse/HUDI-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7986:
---------------------------------
    Labels: pull-request-available  (was: )

> Make precombine field optional with Dedup feature for Mutable Streams
> ---------------------------------------------------------------------
>
>                 Key: HUDI-7986
>                 URL: https://issues.apache.org/jira/browse/HUDI-7986
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Sivaguru Kannan
>            Priority: Major
>              Labels: pull-request-available
[PR] [HUDI-7986] Fix Dupes behavior with Precombine [hudi]
csivaguru opened a new pull request, #11630:
URL: https://github.com/apache/hudi/pull/11630

Opening a draft PR for OSS fix.
[jira] [Created] (HUDI-7986) Make precombine field optional with Dedup feature for Mutable Streams
Sivaguru Kannan created HUDI-7986:
-------------------------------------

             Summary: Make precombine field optional with Dedup feature for Mutable Streams
                 Key: HUDI-7986
                 URL: https://issues.apache.org/jira/browse/HUDI-7986
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Sivaguru Kannan
Re: [PR] [HUDI-7985] Support more formats in timestamp logical types in Json Avro converter [hudi]
hudi-bot commented on PR #11629:
URL: https://github.com/apache/hudi/pull/11629#issuecomment-2227076436

## CI report:

* 1f582f381e89945bce7b5b97e33fc2e66c0d7b5f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24847)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7976] Fix BUG introduced in HUDI-7955 due to usage of wrong class [hudi]
hudi-bot commented on PR #11612:
URL: https://github.com/apache/hudi/pull/11612#issuecomment-2227076299

## CI report:

* 108e890a065a78c91d0bf28457b0bf2ec888e78b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24845)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [I] [SUPPORT]Failed to update metadata(hudi 0.15.0) [hudi]
MrAladdin commented on issue #11587:
URL: https://github.com/apache/hudi/issues/11587#issuecomment-2227070835

@nsivabalan @ad1happy2go @danny0405 @codope A large number of metadata-related jobs are missing from the Spark UI because an anomalous deltacommit has remained stuck in the INFLIGHT state. Furthermore, running `commit showfiles --commit exception_deltacommit_id` does not reveal any file write information. Additionally, the `.hoodie/.temp/` directory contains folders and data corresponding to these exception_deltacommit_ids.

exception: ![Image](https://github.com/user-attachments/assets/e90f410d-89b9-40cb-b0d0-465f30c7ce57)

normal: ![Image](https://github.com/user-attachments/assets/fde4ce58-0d4e-4ee4-8f56-c6688c0175f6)
Re: [PR] [MINOR] fix the target location for auxlib download in hudi CLI [hudi]
hudi-bot commented on PR #11628:
URL: https://github.com/apache/hudi/pull/11628#issuecomment-2227047795

## CI report:

* d57782938c183d8d9ba3039e4be32c3a284fc89e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24846)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] docs: add release guide [hudi-rs]
codecov[bot] commented on PR #66:
URL: https://github.com/apache/hudi-rs/pull/66#issuecomment-2227038878

## [Codecov](https://app.codecov.io/gh/apache/hudi-rs/pull/66?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report

All modified and coverable lines are covered by tests :white_check_mark:

> Project coverage is 87.19%. Comparing base [(`2c59bf1`)](https://app.codecov.io/gh/apache/hudi-rs/commit/2c59bf100c5e77df002edecb2bef8defaa5f209e?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) to head [(`1c59693`)](https://app.codecov.io/gh/apache/hudi-rs/commit/1c596939be9ac6e3e2e8973e61bc14d670d6032d?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main      #66   +/-   ##
=======================================
  Coverage   87.19%   87.19%
=======================================
  Files          13       13
  Lines         687      687
=======================================
  Hits          599      599
  Misses         88       88
```

[:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/hudi-rs/pull/66?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).

:loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
[PR] docs: add release guide [hudi-rs]
xushiyan opened a new pull request, #66:
URL: https://github.com/apache/hudi-rs/pull/66

(no comment)
Re: [PR] [MINOR] fix the target location for auxlib download in hudi CLI [hudi]
hudi-bot commented on PR #11628:
URL: https://github.com/apache/hudi/pull/11628#issuecomment-2227033491

## CI report:

* d57782938c183d8d9ba3039e4be32c3a284fc89e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24846)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7985] Support more formats in timestamp logical types in Json Avro converter [hudi]
hudi-bot commented on PR #11629:
URL: https://github.com/apache/hudi/pull/11629#issuecomment-2227033499

## CI report:

* 1f582f381e89945bce7b5b97e33fc2e66c0d7b5f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24847)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7976] Fix BUG introduced in HUDI-7955 due to usage of wrong class [hudi]
hudi-bot commented on PR #11612:
URL: https://github.com/apache/hudi/pull/11612#issuecomment-2227033459

## CI report:

* 6ceca16530ca218d73a2624c18b09bd07b28b116 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24824)
* 108e890a065a78c91d0bf28457b0bf2ec888e78b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24845)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7985] Support more formats in timestamp logical types in Json Avro converter [hudi]
hudi-bot commented on PR #11629:
URL: https://github.com/apache/hudi/pull/11629#issuecomment-2227015816

## CI report:

* 1f582f381e89945bce7b5b97e33fc2e66c0d7b5f UNKNOWN

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [MINOR] fix the target location for auxlib download in hudi CLI [hudi]
hudi-bot commented on PR #11628:
URL: https://github.com/apache/hudi/pull/11628#issuecomment-2227015741

## CI report:

* d57782938c183d8d9ba3039e4be32c3a284fc89e UNKNOWN

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7976] Fix BUG introduced in HUDI-7955 due to usage of wrong class [hudi]
hudi-bot commented on PR #11612:
URL: https://github.com/apache/hudi/pull/11612#issuecomment-2227015499

## CI report:

* 6ceca16530ca218d73a2624c18b09bd07b28b116 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24824)
* 108e890a065a78c91d0bf28457b0bf2ec888e78b UNKNOWN

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [I] [SUPPORT] hudi-common 0.14.0 jar in mavenCentral appears to have corrupt generated avro classes [hudi]
lucasmo commented on issue #11602:
URL: https://github.com/apache/hudi/issues/11602#issuecomment-2227007832

Here is a reproducer script:

```bash
#!/usr/bin/env bash

MAVEN="https://repo1.maven.org/maven2"

ARTIFACTS="\
org/apache/avro/avro/1.11.3/avro-1.11.3.jar \
com/fasterxml/jackson/core/jackson-core/2.17.1/jackson-core-2.17.1.jar \
com/fasterxml/jackson/core/jackson-databind/2.17.1/jackson-databind-2.17.1.jar \
com/fasterxml/jackson/core/jackson-annotations/2.17.1/jackson-annotations-2.17.1.jar \
org/slf4j/slf4j-api/2.0.9/slf4j-api-2.0.9.jar \
org/apache/hudi/hudi-common/0.14.0/hudi-common-0.14.0.jar \
"

CLASSPATH=""
for artifact in $ARTIFACTS; do
  curl -O "${MAVEN}/${artifact}"
  jar=$(basename "$artifact")
  CLASSPATH="${CLASSPATH}:${jar}"
done

echo $CLASSPATH

echo 'org.apache.avro.Schema schema = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"HoodieCleanPartitionMetadata\",\"namespace\":\"org.apache.hudi.avro.model\",\"fields\":[{\"name\":\"partitionPath\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"policy\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"deletePathPatterns\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"successDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"failedDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"isPartitionDeleted\",\"type\":[\"null\",\"boolean\"],\"default\":null}]}"); System.out.println("Class for schema: " + org.apache.avro.specific.SpecificData.get().getClass(schema));' |\
  jshell --class-path "${CLASSPATH}"
```
[jira] [Updated] (HUDI-7985) Support more formats in timestamp logical types in Json Avro converter
[ https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7985: Description: Following error is thrown when using Json Kafka Source with transformer and decimal is in the schema: We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in timestamp logical type. * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and {{Z}} is the zone offset equivalent to {{+00:00}} or UTC ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]) * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the separation character * There are systems that use \{{ }} (space) instead of {{T}} as the separation (other parts are the same). References indicate that ISO-8601 used to allow this by _mutual agreement_ ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse timestamps like {{2024-05-13T23:53:36.004Z}} , already supported in {{{}MercifulJsonConverter{}}}, and additionally {{2011-12-03T10:15:30+01:00}} with zone offset (which is not supported in {{MercifulJsonConverter}} yet) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse the timestamp with space as the separator, like {{2011-12-03 10:15:30+01:00}} . But with a simple twist of the formatter, it can be easily supported. My take is we should change the formatter of the timestamp logical types to support zone offset and space character as the separator (which is backwards compatible), instead of introducing a new config of format (assuming that common use cases just have space character as the variant). was: We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in timestamp logical type. 
* ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and {{Z}} is the zone offset equivalent to {{+00:00}} or UTC ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]) * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the separation character * There are systems that use \{{ }} (space) instead of {{T}} as the separation (other parts are the same). References indicate that ISO-8601 used to allow this by _mutual agreement_ ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse timestamps like {{2024-05-13T23:53:36.004Z}} , already supported in {{{}MercifulJsonConverter{}}}, and additionally {{2011-12-03T10:15:30+01:00}} with zone offset (which is not supported in {{MercifulJsonConverter}} yet) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse the timestamp with space as the separator, like {{2011-12-03 10:15:30+01:00}} . But with a simple twist of the formatter, it can be easily supported. My take is we should change the formatter of the timestamp logical types to support zone offset and space character as the separator (which is backwards compatible), instead of introducing a new config of format (assuming that common use cases just have space character as the variant). 
> Support more formats in timestamp logical types in Json Avro converter > -- > > Key: HUDI-7985 > URL: https://issues.apache.org/jira/browse/HUDI-7985 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > Following error is thrown when using Json Kafka Source with transformer and > decimal is in the schema: > > > We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in > timestamp logical type. > * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and > {{Z}} is the zone offset equivalent to {{+00:00}} or UTC > ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]) > * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the > separation character > * There are systems that use \{{ }} (space) instead of {{T}} as the > separation (other parts are the same). References indicate that ISO-8601 > used to allow this by _mutual agreement_ > ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], > > [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) > * {{DateT
[jira] [Updated] (HUDI-7985) Support more formats in timestamp logical types in Json Avro converter
[ https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7985:
----------------------------
    Description: 
Following error is thrown when using Json Kafka Source with transformer and decimal is in the schema:
{code:java}
Caused by: Json to Avro Type conversion error for field loaded_at, 2024-06-03 13:42:34.951+00:00 for {"type":"long","logicalType":"timestamp-millis"}
	at org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil$JsonToAvroFieldProcessor.convertToAvro(MercifulJsonConverter.java:194)
	at org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil.convertToAvro(MercifulJsonConverter.java:204)
	at org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvroField(MercifulJsonConverter.java:182)
	at org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvro(MercifulJsonConverter.java:126)
	at org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:107)
	at org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:118)
	... 43 more {code}
We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in timestamp logical type.
* ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}}, and {{Z}} is the zone offset equivalent to {{+00:00}} or UTC ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators])
* {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the separation character
* There are systems that use {{ }} (space) instead of {{T}} as the separation (other parts are the same). 
References indicate that ISO-8601 used to allow this by _mutual agreement_ ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse timestamps like {{2024-05-13T23:53:36.004Z}} , already supported in {{{}MercifulJsonConverter{}}}, and additionally {{2011-12-03T10:15:30+01:00}} with zone offset (which is not supported in {{MercifulJsonConverter}} yet) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse the timestamp with space as the separator, like {{2011-12-03 10:15:30+01:00}} . But with a simple twist of the formatter, it can be easily supported. My take is we should change the formatter of the timestamp logical types to support zone offset and space character as the separator (which is backwards compatible), instead of introducing a new config of format (assuming that common use cases just have space character as the variant). was: Following error is thrown when using Json Kafka Source with transformer and decimal is in the schema: We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in timestamp logical type. * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and {{Z}} is the zone offset equivalent to {{+00:00}} or UTC ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]) * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the separation character * There are systems that use \{{ }} (space) instead of {{T}} as the separation (other parts are the same). 
References indicate that ISO-8601 used to allow this by _mutual agreement_ ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse timestamps like {{2024-05-13T23:53:36.004Z}} , already supported in {{{}MercifulJsonConverter{}}}, and additionally {{2011-12-03T10:15:30+01:00}} with zone offset (which is not supported in {{MercifulJsonConverter}} yet) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse the timestamp with space as the separator, like {{2011-12-03 10:15:30+01:00}} . But with a simple twist of the formatter, it can be easily supported. My take is we should change the formatter of the timestamp logical types to support zone offset and space character as the separator (which is backwards compatible), instead of introducing a new config of format (assuming that common use cases just have space character as the variant). > Support more formats in timestamp logical types in Json Avro converter > -- > > Key: HUDI-7985 > URL: https://issues.apache.org/jira/browse/HUDI-7985 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available >
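The "simple twist of the formatter" described above can be sketched with a `DateTimeFormatterBuilder` that treats both `T` and the space character as optional date/time separators. This is an illustrative sketch only, not the actual `MercifulJsonConverter` patch; the class name here is hypothetical:

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

public class FlexibleIsoParser {
    // Like DateTimeFormatter.ISO_OFFSET_DATE_TIME, but the date/time separator
    // may be either 'T' (strict ISO 8601) or ' ' (the "mutual agreement" variant).
    static final DateTimeFormatter FLEXIBLE_ISO_OFFSET_DATE_TIME =
            new DateTimeFormatterBuilder()
                    .append(DateTimeFormatter.ISO_LOCAL_DATE)
                    .optionalStart().appendLiteral('T').optionalEnd()
                    .optionalStart().appendLiteral(' ').optionalEnd()
                    .append(DateTimeFormatter.ISO_LOCAL_TIME)
                    // Accepts "+01:00"-style zone offsets, with "Z" for UTC.
                    .appendOffset("+HH:MM", "Z")
                    .toFormatter();

    public static void main(String[] args) {
        // Strict ISO 8601 with 'T' separator and zone offset.
        System.out.println(OffsetDateTime.parse("2011-12-03T10:15:30+01:00", FLEXIBLE_ISO_OFFSET_DATE_TIME));
        // Space-separated variant from the reported stack trace.
        System.out.println(OffsetDateTime.parse("2024-06-03 13:42:34.951+00:00", FLEXIBLE_ISO_OFFSET_DATE_TIME));
        // 'Z' offset, already supported by MercifulJsonConverter today.
        System.out.println(OffsetDateTime.parse("2024-05-13T23:53:36.004Z", FLEXIBLE_ISO_OFFSET_DATE_TIME));
    }
}
```

Because both separators are optional branches of the same formatter, the change stays backwards compatible with the `T`-separated inputs that already parse today.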
[jira] [Updated] (HUDI-7985) Support more formats in timestamp logical types in Json Avro converter
[ https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7985: - Labels: pull-request-available (was: ) > Support more formats in timestamp logical types in Json Avro converter > -- > > Key: HUDI-7985 > URL: https://issues.apache.org/jira/browse/HUDI-7985 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in > timestamp logical type. > * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and > {{Z}} is the zone offset equivalent to {{+00:00}} or UTC > ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]) > * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the > separation character > * There are systems that use \{{ }} (space) instead of {{T}} as the > separation (other parts are the same). References indicate that ISO-8601 > used to allow this by _mutual agreement_ > ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], > > [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) > * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse > timestamps like {{2024-05-13T23:53:36.004Z}} , already supported in > {{{}MercifulJsonConverter{}}}, and additionally {{2011-12-03T10:15:30+01:00}} > with zone offset (which is not supported in {{MercifulJsonConverter}} yet) > * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse the timestamp with > space as the separator, like {{2011-12-03 10:15:30+01:00}} . But with a > simple twist of the formatter, it can be easily supported. 
> My take is we should change the formatter of the timestamp logical types to
> support zone offset and space character as the separator (which is backwards
> compatible), instead of introducing a new config of format (assuming that
> common use cases just have space character as the variant).
[PR] [HUDI-7985] Support more formats in timestamp logical types in Json Avro converter [hudi]
yihua opened a new pull request, #11629:
URL: https://github.com/apache/hudi/pull/11629

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[PR] fix the target location for auxlib download in hudi CLI [hudi]
prabodh1194 opened a new pull request, #11628:
URL: https://github.com/apache/hudi/pull/11628

### Change Logs

Using Hudi CLI to access tables on S3 has some limitations, as the relevant `hadoop` jars are not on the path by default. I have updated the CLI utility to facilitate adding the hadoop S3 jars as well. For compatibility purposes, I have put this facility behind a flag called `IS_S3_ENABLED`, which can be set to `true`. Enabling this flag will add the hadoop jars to the `auxlib` as well.

### Impact

NA

### Risk level (write none, low medium or high below)

none

### Documentation Update

This page can be updated to highlight that the flag can be set to access `s3a` bucket paths: https://hudi.apache.org/docs/next/cli/#using-hudi-cli

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
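A minimal sketch of how such a flag gate might look in a CLI launch script. Only the `IS_S3_ENABLED` variable name comes from the PR; the helper function, jar name, and directory layout below are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the IS_S3_ENABLED flag gate; the function name,
# jar coordinates, and auxlib path are illustrative, not the PR's actual code.

should_fetch_s3_jars() {
  # Flag defaults to "false" to preserve existing behavior.
  [ "${IS_S3_ENABLED:-false}" = "true" ]
}

fetch_to_auxlib() {
  # Placeholder: a real script would download the hadoop S3 jars here.
  echo "downloading $1 into auxlib/"
}

if should_fetch_s3_jars; then
  fetch_to_auxlib "hadoop-aws"
fi
```

Defaulting the flag to `false` keeps existing installations unaffected, which matches the compatibility goal stated in the PR.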
Re: [PR] [DOCS] Release notes 1.0.0-beta2 [hudi]
nsivabalan commented on code in PR #11618: URL: https://github.com/apache/hudi/pull/11618#discussion_r1676856952 ## website/releases/release-1.0.0-beta2.md: ## @@ -0,0 +1,80 @@ +--- +title: "Release 1.0.0-beta2" +sidebar_position: 1 +layout: releases +toc: true +--- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +## [Release 1.0.0-beta2](https://github.com/apache/hudi/releases/tag/release-1.0.0-beta2) ([docs](/docs/next/quick-start-guide)) + +Apache Hudi 1.0.0-beta2 is the second beta release of Apache Hudi. This release is meant for early adopters to try +out the new features and provide feedback. The release is not meant for production use. + +## Migration Guide + +This release contains major format changes as we will see in highlights below. We encourage users to try out the +**1.0.0-beta2** features on new tables. The 1.0 general availability (GA) release will support automatic table upgrades +from 0.x versions, while also ensuring full backward compatibility when reading 0.x Hudi tables using 1.0, ensuring a +seamless migration experience. + +:::caution +Given that timeline format and log file format has changed in this **beta release**, it is recommended not to attempt to do +rolling upgrades from older versions to this release. +::: + +## Highlights + +### Format changes + +[HUDI-6242](https://issues.apache.org/jira/browse/HUDI-6242) is the main epic covering all the format changes proposals, +which are also partly covered in the [Hudi 1.0 tech specification](/tech-specs-1point0). The following are the main +changes in this release: + + Timeline + +No major changes in this release. Refer to [1.0.0-beta1#timeline](release-1.0.0-beta1.md#timeline) for more details. + + Log File Format + +In addition to the fields in the log file header added in [1.0.0-beta1](release-1.0.0-beta1.md#log-file-format), we also +store a flag, `IS_PARTIAL` to indicate whether the log block contains partial updates or not. 
+ +### Metadata indexes + +In 1.0.0-beta1, we added support for functional index. In 1.0.0-beta2, we have added support for secondary indexes and +partition stats index to the [multi-modal indexing](/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi) subsystem. + + Secondary Indexes + +Secondary indexes allow users to create indexes on columns that are not part of record key columns in Hudi tables (for +record key fields, Hudi supports [Record-level Index](/blog/2023/11/01/record-level-index). Secondary indexes can be used to speed up +queries with predicate on columns other than record key columns. + + Partition Stats Index + +Partition stats index aggregates statistics at the partition level for the columns for which it is enabled. This helps +in efficient partition pruning even for non-partition fields. + +To try out these features, refer to the [SQL guide](/docs/next/sql_ddl#create-partition-stats-index). + +### API Changes + + Positional Merging + +In 1.0.0-beta1, we added a new [filegroup reader](/releases/release-1.0.0-beta1#new-filegroup-reader). The reader now +provides position-based merging, as an alternative to existing key-based merging, and skipping pages based on record +positions. The new filegroup reader is integrated with Spark and Hive, and enabled by default. To enable positional +merging set below configs: + +```properties Review Comment: not related to this doc PR. curious in general. if we have fallback mechanism to do key based merges if positional based merges are not possible, why not we enable this by default? ## website/docs/metadata.md: ## @@ -90,6 +90,32 @@ Following are the different indices currently available under the metadata table Hudi release, this index aids in locating records faster than other existing indices and can provide a speedup orders of magnitude faster in large deployments where index lookup dominates write latencies. 
+#### New Indexes in 1.0.0 + +- ***Functional Index***: + A [functional index](https://github.com/apache/hudi/blob/3789840be3d041cbcfc6b24786740210e4e6d6ac/rfc/rfc-63/rfc-63.md) + is an index on a function of a column. If a query has a predicate on a function of a column, the functional index can + be used to speed up the query. A functional index is stored in *func_index_* prefixed partitions (one for each + function) under the metadata table. A functional index can be created using SQL syntax; please check out the SQL DDL + docs [here](/docs/next/sql_ddl#create-functional-index) for more details. + +- ***Partition Stats Index*** + Partition stats index aggregates statistics at the partition level for the columns for which it is enabled. This helps + in efficient partition pruning even for non-partition fields. The partition stats index is stored in the *partition_stats* + partition under the metadata table. The partition stats index can be enabled using the following configs (note it is require
(hudi) branch master updated: [MINOR] Update DOAP with 1.0.0-beta2 Release (#11627)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new afc91515983 [MINOR] Update DOAP with 1.0.0-beta2 Release (#11627) afc91515983 is described below commit afc91515983badd91f671c14f3737fe034d96b9c Author: Sagar Sumit AuthorDate: Sat Jul 13 22:25:34 2024 +0530 [MINOR] Update DOAP with 1.0.0-beta2 Release (#11627) --- doap_HUDI.rdf | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/doap_HUDI.rdf b/doap_HUDI.rdf index 981b2619fb1..1f2b45a4899 100644 --- a/doap_HUDI.rdf +++ b/doap_HUDI.rdf @@ -189,6 +189,13 @@ <revision>0.15.0</revision> +<release> +<Version> +<name>Apache Hudi 1.0.0-beta2</name> +<created>2024-07-14</created> +<revision>1.0.0-beta2</revision> +</Version> +</release> <location rdf:resource="https://github.com/apache/hudi.git"/>
Re: [PR] [MINOR] Update DOAP with 1.0.0-beta2 Release [hudi]
yihua merged PR #11627: URL: https://github.com/apache/hudi/pull/11627 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] Update DOAP with 1.0.0-beta2 Release [hudi]
hudi-bot commented on PR #11627: URL: https://github.com/apache/hudi/pull/11627#issuecomment-2226994880 ## CI report: * 2d98b0b15bc44671cb1087f955a301854308cd9e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7985) Support more formats in timestamp logical types in Json Avro converter
[ https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7985: Description: We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in timestamp logical type. * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and {{Z}} is the zone offset equivalent to {{+00:00}} or UTC ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]) * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the separation character * There are systems that use \{{ }} (space) instead of {{T}} as the separation (other parts are the same). References indicate that ISO-8601 used to allow this by _mutual agreement_ ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse timestamps like {{2024-05-13T23:53:36.004Z}} , already supported in {{{}MercifulJsonConverter{}}}, and additionally {{2011-12-03T10:15:30+01:00}} with zone offset (which is not supported in {{MercifulJsonConverter}} yet) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse the timestamp with space as the separator, like {{2011-12-03 10:15:30+01:00}} . But with a simple twist of the formatter, it can be easily supported. My take is we should change the formatter of the timestamp logical types to support zone offset and space character as the separator (which is backwards compatible), instead of introducing a new config of format (assuming that common use cases just have space character as the variant). was: We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in timestamp logical type. 
* ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and {{Z}} is the zone offset equivalent to {{+00:00}} or UTC ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]) * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the separation character * There are systems that use {{ }} (space) instead of {{T}} as the separation (other parts are the same). References indicate that ISO-8601 used to allow this by _mutual agreement_ ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse timestamps like {{2024-05-13T23:53:36.004Z}} , already supported in {{{}MercifulJsonConverter{}}}, and additionally {{2011-12-03T10:15:30+01:00}} with zone offset (which is not supported in {{MercifulJsonConverter}} yet) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse the timestamp with space as the separator, like {{2011-12-03 10:15:30+01:00}} . But with a simple twist of the formatter, it can be easily supported. My take is we should change the formatter of the timestamp logical types to support zone offset and space character as the separator (which is backwards compatible), instead of introducing a new config of format (for mitigating this incident this is no such need, and assuming that common use cases just have space character as the variant). > Support more formats in timestamp logical types in Json Avro converter > -- > > Key: HUDI-7985 > URL: https://issues.apache.org/jira/browse/HUDI-7985 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > > We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in > timestamp logical type. 
> * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and > {{Z}} is the zone offset equivalent to {{+00:00}} or UTC > ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]) > * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the > separation character > * There are systems that use \{{ }} (space) instead of {{T}} as the > separation (other parts are the same). References indicate that ISO-8601 > used to allow this by _mutual agreement_ > ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], > > [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) > * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse > timestamps like {{2024-05-13T23:53:36.004Z}} , already supported in > {{{}MercifulJsonConverter{}}}, and additionally {{2011-12-03T10:15:30+01:00}} > with zone offset (which is not supp
[jira] [Updated] (HUDI-7985) Support more formats in timestamp logical types in Json Avro converter
[ https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7985: Description: We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in timestamp logical type. * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and {{Z}} is the zone offset equivalent to {{+00:00}} or UTC ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]) * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the separation character * There are systems that use {{ }} (space) instead of {{T}} as the separation (other parts are the same). References indicate that ISO-8601 used to allow this by _mutual agreement_ ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse timestamps like {{2024-05-13T23:53:36.004Z}} , already supported in {{{}MercifulJsonConverter{}}}, and additionally {{2011-12-03T10:15:30+01:00}} with zone offset (which is not supported in {{MercifulJsonConverter}} yet) * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse the timestamp with space as the separator, like {{2011-12-03 10:15:30+01:00}} . But with a simple twist of the formatter, it can be easily supported. My take is we should change the formatter of the timestamp logical types to support zone offset and space character as the separator (which is backwards compatible), instead of introducing a new config of format (for mitigating this incident this is no such need, and assuming that common use cases just have space character as the variant). 
> Support more formats in timestamp logical types in Json Avro converter > -- > > Key: HUDI-7985 > URL: https://issues.apache.org/jira/browse/HUDI-7985 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > > We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in > timestamp logical type. > * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and > {{Z}} is the zone offset equivalent to {{+00:00}} or UTC > ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]) > * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the > separation character > * There are systems that use {{ }} (space) instead of {{T}} as the > separation (other parts are the same). References indicate that ISO-8601 > used to allow this by _mutual agreement_ > ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], > > [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) > * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse > timestamps like {{2024-05-13T23:53:36.004Z}} , already supported in > {{{}MercifulJsonConverter{}}}, and additionally {{2011-12-03T10:15:30+01:00}} > with zone offset (which is not supported in {{MercifulJsonConverter}} yet) > * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse the timestamp with > space as the separator, like {{2011-12-03 10:15:30+01:00}} . But with a > simple twist of the formatter, it can be easily supported. > My take is we should change the formatter of the timestamp logical types to > support zone offset and space character as the separator (which is backwards > compatible), instead of introducing a new config of format (for mitigating > this incident this is no such need, and assuming that common use cases just > have space character as the variant). 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
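The "simple twist of the formatter" described above can be sketched with `java.time` directly: compose the ISO date and time formatters with an optional `T` and an optional space as the separator, plus an offset that accepts `Z`. This is an illustrative, standalone snippet — the class and field names are made up and this is not the actual `MercifulJsonConverter` change.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

public class FlexibleTimestampFormat {

    // ISO local date, then an optional 'T' or ' ' separator, then ISO local time
    // and a zone offset. "Z" is accepted as the zero-offset designator per ISO 8601.
    public static final DateTimeFormatter FLEXIBLE = new DateTimeFormatterBuilder()
        .append(DateTimeFormatter.ISO_LOCAL_DATE)
        .optionalStart().appendLiteral('T').optionalEnd()
        .optionalStart().appendLiteral(' ').optionalEnd()
        .append(DateTimeFormatter.ISO_LOCAL_TIME)
        .appendOffset("+HH:MM", "Z")
        .toFormatter();

    public static void main(String[] args) {
        // Both separators parse to the same offset date-time.
        System.out.println(OffsetDateTime.parse("2011-12-03T10:15:30+01:00", FLEXIBLE));
        System.out.println(OffsetDateTime.parse("2011-12-03 10:15:30+01:00", FLEXIBLE));
        // The motivating example from the ticket: space separator, millis, +00:00 offset.
        System.out.println(OffsetDateTime.parse("2024-06-03 13:42:34.951+00:00", FLEXIBLE));
        // The already-supported 'Z' form still parses.
        System.out.println(OffsetDateTime.parse("2024-05-13T23:53:36.004Z", FLEXIBLE));
    }
}
```

Because the `T` and space literals are each wrapped in an optional section, a single formatter handles both variants, which is what makes the change backwards compatible.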
[jira] [Updated] (HUDI-7985) Support more formats in timestamp logical types in Json Avro converter
[ https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7985: Fix Version/s: 1.0.0 > Support more formats in timestamp logical types in Json Avro converter > -- > > Key: HUDI-7985 > URL: https://issues.apache.org/jira/browse/HUDI-7985 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in > timestamp logical type. > * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and > {{Z}} is the zone offset equivalent to {{+00:00}} or UTC > ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators]) > * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the > separation character > * There are systems that use \{{ }} (space) instead of {{T}} as the > separation (other parts are the same). References indicate that ISO-8601 > used to allow this by _mutual agreement_ > ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet], > > [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/]) > * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse > timestamps like {{2024-05-13T23:53:36.004Z}} , already supported in > {{{}MercifulJsonConverter{}}}, and additionally {{2011-12-03T10:15:30+01:00}} > with zone offset (which is not supported in {{MercifulJsonConverter}} yet) > * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse the timestamp with > space as the separator, like {{2011-12-03 10:15:30+01:00}} . But with a > simple twist of the formatter, it can be easily supported. 
> My take is we should change the formatter of the timestamp logical types to > support zone offset and space character as the separator (which is backwards > compatible), instead of introducing a new config of format (assuming that > common use cases just have space character as the variant). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-7985) Support more formats in timestamp logical types in Json Avro converter
[ https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7985: --- Assignee: Ethan Guo > Support more formats in timestamp logical types in Json Avro converter > -- > > Key: HUDI-7985 > URL: https://issues.apache.org/jira/browse/HUDI-7985 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7985) Support more formats in timestamp logical types in Json Avro converter
[ https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7985: Status: In Progress (was: Open) > Support more formats in timestamp logical types in Json Avro converter > -- > > Key: HUDI-7985 > URL: https://issues.apache.org/jira/browse/HUDI-7985 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7985) Support more formats in timestamp logical types in Json Avro converter
Ethan Guo created HUDI-7985: --- Summary: Support more formats in timestamp logical types in Json Avro converter Key: HUDI-7985 URL: https://issues.apache.org/jira/browse/HUDI-7985 Project: Apache Hudi Issue Type: Improvement Reporter: Ethan Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [MINOR] Update DOAP with 1.0.0-beta2 Release [hudi]
codope opened a new pull request, #11627: URL: https://github.com/apache/hudi/pull/11627 ### Change Logs This PR updates DOAP with 1.0.0-beta2 Release for record keeping. ### Impact Publish new release version ### Risk level (write none, low medium or high below) none ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-1698) Multiwriting for Flink / Java
[ https://issues.apache.org/jira/browse/HUDI-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-1698. - Resolution: Fixed > Multiwriting for Flink / Java > - > > Key: HUDI-1698 > URL: https://issues.apache.org/jira/browse/HUDI-1698 > Project: Apache Hudi > Issue Type: New Feature > Components: flink, writer-core >Reporter: Nishith Agarwal >Assignee: Danny Chen >Priority: Major > Fix For: 1.0.0-beta2, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-7950) Shade roaring bitmap dependency in root POM
[ https://issues.apache.org/jira/browse/HUDI-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-7950. - Resolution: Fixed > Shade roaring bitmap dependency in root POM > --- > > Key: HUDI-7950 > URL: https://issues.apache.org/jira/browse/HUDI-7950 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0, 0.15.1 > > > We should unify the shading rule of roaring bitmap dependency in the root POM > for consistency among bundles. -- This message was sent by Atlassian Jira (v8.20.10#820010)
(hudi) branch master updated (e2860cddf54 -> 11304fd93ba)
This is an automated email from the ASF dual-hosted git repository. codope pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from e2860cddf54 [HUDI-7709] Pass partition paths as partition column values if `TimestampBasedKeyGenerator` is used (#11615) add 11304fd93ba [HUDI-7950] Shade roaring bitmap dependency in root POM (#11561) No new revisions were added by this update. Summary of changes: packaging/hudi-spark-bundle/pom.xml | 5 - packaging/hudi-utilities-bundle/pom.xml | 5 - pom.xml | 6 ++ 3 files changed, 6 insertions(+), 10 deletions(-)
Re: [PR] [HUDI-7950] Shade roaring bitmap dependency in root POM [hudi]
codope merged PR #11561: URL: https://github.com/apache/hudi/pull/11561 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader
[ https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7928: -- Fix Version/s: (was: 1.0.0-beta2) > Fix shared HFile reader in HoodieNativeAvroHFileReader > -- > > Key: HUDI-7928 > URL: https://issues.apache.org/jira/browse/HUDI-7928 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > The shared HFile reader in HoodieNativeAvroHFileReader uses significant > memory for reading meta info from the HFile. We should avoid keeping the > reference to the shared HFile reader and cache the meta info only. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7950) Shade roaring bitmap dependency in root POM
[ https://issues.apache.org/jira/browse/HUDI-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7950: -- Fix Version/s: (was: 1.0.0-beta2) > Shade roaring bitmap dependency in root POM > --- > > Key: HUDI-7950 > URL: https://issues.apache.org/jira/browse/HUDI-7950 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0, 0.15.1 > > > We should unify the shading rule of roaring bitmap dependency in the root POM > for consistency among bundles. -- This message was sent by Atlassian Jira (v8.20.10#820010)
svn commit: r70286 - in /release/hudi/1.0.0-beta2: ./ hudi-1.0.0-beta2.src.tgz hudi-1.0.0-beta2.src.tgz.asc hudi-1.0.0-beta2.src.tgz.sha512
Author: codope Date: Sat Jul 13 15:46:12 2024 New Revision: 70286 Log: Adding source release for version 1.0.0-beta2 Added: release/hudi/1.0.0-beta2/ release/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz (with props) release/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.asc release/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.sha512 Added: release/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz == Binary file - no diff available. Propchange: release/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz -- svn:mime-type = application/octet-stream Added: release/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.asc == --- release/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.asc (added) +++ release/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.asc Sat Jul 13 15:46:12 2024 @@ -0,0 +1,14 @@ +-BEGIN PGP SIGNATURE- + +iQGzBAABCAAdFiEE/SFTQuMZlBmt+/Qd1GI+OqFtdbAFAmaR+iwACgkQ1GI+OqFt +dbAkywv8Ct3Nms+FpvKuXIv+0pFx+sw9264H4JJKK5ONtb25Rg4Dg7+OcZDVc8Q6 +KTdS68Ulf95fUQSuJmH+F9Lr6kRBjy478KPe9w4WswDd3b17gQeT9RHmUsRxYfY1 +w9CAS1bEkPeZyases+d4AebYpaoEEB3PZJ+9zXFbBts2GxwtGx4/m32qdJVdkqX6 +mUAWXUKg9eo8skOK78QFkopLqm1/yP/JOnLNG7uJ4X8j1pfXzr0e0ACvpbkb+UJu +nTTA4TJ9iBuPnV1GeF4kZlsKjRpep+qPrOKlGXNPl7nJOVZ0Ca5OeNG+zJEVD2Ql +Rsg0fUtLrlNh4UR+gzvDb2sQ+bTLWLJ+xsFJj+XP7FjkiJiG1JpH4lwZPXiKgxPz +qM/Xto0ufmpgjYlK+C8bGFjJ491/nBhGxqsT8IY6V0A0hssd1LXZUo9s95LA6Op5 +FQdaRkjNvOrhm6VnnP2aa1G/4fQ7Uxtu9da6rcVwYJxB7QotgIqI3K9GVy6A16UH +vtK1kqis +=4w25 +-END PGP SIGNATURE- Added: release/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.sha512 == --- release/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.sha512 (added) +++ release/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.sha512 Sat Jul 13 15:46:12 2024 @@ -0,0 +1 @@ +69e382e7415d2df60d66f9b2b9d30f310ae168d49c4e6f617188acd2e9246f66619b692d0d7b81c90407ac757b658719da303cf5d69c20289156ff64a9271271 hudi-1.0.0-beta2.src.tgz
Re: [PR] [HUDI-7938] Broadcast `SerializableConfiguration` to avoid NullPointerException in Kryo SerDe [hudi]
hudi-bot commented on PR #11626: URL: https://github.com/apache/hudi/pull/11626#issuecomment-2226825138 ## CI report: * 256044ead7c3ab3a1c69f3fa46e36417965bb837 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24840) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [DNM] Temp diff testing 1.x reads with 0.x branch [hudi]
hudi-bot commented on PR #11562: URL: https://github.com/apache/hudi/pull/11562#issuecomment-2226813999 ## CI report: * 086b1d836fd0af91e3cc2a41913bf3e92653bf78 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24841) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [DNM] Temp diff testing 1.x reads with 0.x branch [hudi]
hudi-bot commented on PR #11562: URL: https://github.com/apache/hudi/pull/11562#issuecomment-2226812409 ## CI report: * 013aef32a3ad3aa995beb626f5855d9a05234cbf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24689) * 086b1d836fd0af91e3cc2a41913bf3e92653bf78 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1676778486 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces a lot of differentiating features for Apache Hudi. Feel free to check out the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases, which were meant for +interested developers/users to give some of the advanced features a spin. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for a smoother migration for existing Hudi users. + +## Objectives +The goal is to have a smooth migration experience for users from 0.x to 1.0. We plan to have a 0.16.0 bridge release, asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines, i.e. bronze, silver and gold layers. +For this layout of pipelines, here is how a typical migration might look (w/o a bridge release): + +a. Existing pipelines are in 0.15.x (bronze, silver, gold). +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1.x first, because a 0.15.0 reader may not be able to read 1.x Hudi tables. So, if we migrate any of the silver pipelines to 1.x before migrating the entire gold layer, we might end up in a situation +where a 0.15.0 reader (gold) ends up reading a 1.x table (silver). This might lead to failures. 
So, we have to follow a certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of the gold and silver pipelines are migrated to 1.x, we can finally move all of bronze to 1.x. + +In the end, we would have migrated all existing Hudi pipelines from 0.15.0 to 1.x. +But as you can see, the migration requires some coordination. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating and orchestrating the entire migration workflow might be challenging. + +Hence, to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- A 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that were introduced in 1.x, we may not be able to support all of them. We will call out which new features may not work with a 0.16.x reader. +- In this case, we explicitly request users to not turn on these features until all readers are completely migrated to 1.x, so as to not break any readers as applicable. + +Connecting back to our example above, let's see how the migration might look for an existing user. + +a. Existing pipelines are in 0.15.x (bronze, silver, gold). +b. Migrate pipelines to 0.16.0 (in any order; we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are on 0.16.0 (both readers and writers). +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have a few pipelines on 1.x and a few pipelines on 0.16.0, but since 0.16.x +can read 1.x tables, we should be OK here. 
Just do not enable new features like non-blocking concurrency control yet. +e. Migrate all 0.16.0 pipelines to the 1.x version. +f. Once all readers and writers are on 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you can see, the company/org-wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. The only requirement to keep a tab on +is to ensure all pipelines are completely migrated to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release: +- A 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that are being introduced in 1.x, we may not be able to support all of them. Will be calling out which new featur
Re: [PR] [DNM] Temp diff testing 1.x reads with 0.x branch [hudi]
nsivabalan commented on code in PR #11562: URL: https://github.com/apache/hudi/pull/11562#discussion_r1676778058 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java: ## @@ -103,12 +106,30 @@ public void addBaseFile(HoodieBaseFile dataFile) { * Add a new log file into the group. */ public void addLogFile(HoodieLogFile logFile) { -if (!fileSlices.containsKey(logFile.getBaseCommitTime())) { - fileSlices.put(logFile.getBaseCommitTime(), new FileSlice(fileGroupId, logFile.getBaseCommitTime())); +String baseInstantTime = getBaseInstantTime(logFile); Review Comment: Changes to accommodate file slice determination for both 0.x and 1.x log files. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [DNM] Temp diff testing 1.x reads with 0.x branch [hudi]
nsivabalan commented on code in PR #11562: URL: https://github.com/apache/hudi/pull/11562#discussion_r1676775310 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstant.java: ## @@ -38,18 +39,22 @@ */ public class HoodieInstant implements Serializable, Comparable { - // Instant like 20230104152218702.commit.request, 20230104152218702.inflight + // Instant like 20230104152218702.commit.request, 20230104152218702.inflight and 20230104152218702_20230104152630238.commit private static final Pattern NAME_FORMAT = - Pattern.compile("^(\\d+)(\\.\\w+)(\\.\\D+)?$"); + Pattern.compile("^(\\d+(_\\d+)?)(\\.\\w+)(\\.\\D+)?$"); private static final String DELIMITER = "."; + private static final String UNDERSCORE = "_"; + private static final String FILE_NAME_FORMAT_ERROR = "The provided file name %s does not conform to the required format"; + private boolean completionTimeMissing = false; Review Comment: NTR: this will help deduce the completed commit file name. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
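The relaxed `NAME_FORMAT` pattern in the diff above accepts both the 0.x instant file names and the new names carrying a completion time. A standalone sketch (the class name is made up; only the regex and the example file names come from the diff):

```java
import java.util.regex.Pattern;

public class InstantNameFormat {

    // Same pattern as in the PR: an instant time, an optional "_<completionTime>",
    // an action extension (e.g. ".commit"), and an optional state extension (e.g. ".request").
    public static final Pattern NAME_FORMAT =
        Pattern.compile("^(\\d+(_\\d+)?)(\\.\\w+)(\\.\\D+)?$");

    public static void main(String[] args) {
        String[] names = {
            "20230104152218702.commit.request",           // 0.x style with a state suffix
            "20230104152218702.inflight",                  // 0.x inflight file
            "20230104152218702_20230104152630238.commit"   // 1.x style with completion time
        };
        for (String name : names) {
            // prints true for all three names
            System.out.println(name + " -> " + NAME_FORMAT.matcher(name).matches());
        }
    }
}
```

The optional `(_\d+)?` group is the only change relative to the 0.x pattern, which is why the same regex keeps matching legacy instant file names.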
Re: [PR] [DNM] Temp diff testing 1.x reads with 0.x branch [hudi]
nsivabalan commented on code in PR #11562: URL: https://github.com/apache/hudi/pull/11562#discussion_r1676775154 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java: ## @@ -554,21 +589,6 @@ public String toString() { return this.getClass().getName() + ": " + getInstantsAsStream().map(Object::toString).collect(Collectors.joining(",")); } - /** - * Merge this timeline with the given timeline. - */ - public HoodieDefaultTimeline mergeTimeline(HoodieDefaultTimeline timeline) { Review Comment: moved below. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org