[jira] [Comment Edited] (HUDI-7024) Null Pointer Exception for a flink streaming pipeline for Consistent Hashing
[ https://issues.apache.org/jira/browse/HUDI-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782432#comment-17782432 ] Jing Zhang edited comment on HUDI-7024 at 11/3/23 6:58 AM: --- [~adityagoenka] Could you please provide more information? For example: the exception stack, logs, Flink and Hudi versions, the Flink job script, and the Spark clustering job? was (Author: qingru zhang): [~adityagoenka] Could you please provide more information? For example: the exception stack, logs, Flink and Hudi versions? > Null Pointer Exception for a flink streaming pipeline for Consistent Hashing > > > Key: HUDI-7024 > URL: https://issues.apache.org/jira/browse/HUDI-7024 > Project: Apache Hudi > Issue Type: Bug > Components: flink > Reporter: Aditya Goenka > Priority: Critical > Fix For: 0.14.1 > > > When we run an offline clustering job with HoodieClusteringJob on a table with > consistent hashing enabled, the Flink pipeline fails with a > NullPointerException. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [I] [SUPPORT] Data loss in MOR table after clustering partition [hudi]
ad1happy2go commented on issue #9977: URL: https://github.com/apache/hudi/issues/9977#issuecomment-1791953907 @mzheng-plaid Thanks for raising this. A couple of things we can check to triage this: 1. Check the Spark UI and stages for any stage/task failures and retries. 2. Try the SIMPLE index instead of BLOOM and check whether you still see the data loss; this tells us whether the issue is BLOOM-index related. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
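The second triage step above is a writer-side configuration change. A minimal sketch of the relevant write options, assuming the standard Hudi Spark datasource option names; the table name is a placeholder and other required write options are omitted:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: writer options for switching from the default BLOOM index to
// SIMPLE, to check whether the reported data loss is BLOOM-index related.
public class IndexTriageOptions {
    public static Map<String, String> simpleIndexOptions() {
        Map<String, String> opts = new HashMap<>();
        opts.put("hoodie.table.name", "my_table");               // placeholder table name
        opts.put("hoodie.index.type", "SIMPLE");                 // default is BLOOM
        opts.put("hoodie.datasource.write.operation", "upsert");
        return opts;
    }

    public static void main(String[] args) {
        // These options would be passed to df.write().format("hudi").options(...)
        System.out.println(simpleIndexOptions());
    }
}
```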
Re: [PR] [HUDI-3304] Add support for selective partial update [hudi]
hudi-bot commented on PR #9979: URL: https://github.com/apache/hudi/pull/9979#issuecomment-1791950262 ## CI report: * b9e26b3d425f88f0599283a0e834e4581a8b1b64 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7009] Filtering out null values from avro kafka source [hudi]
hudi-bot commented on PR #9955: URL: https://github.com/apache/hudi/pull/9955#issuecomment-1791950173 ## CI report: * 8809ad5187203de0326cca32a3e59a4b1e1b9ca0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20589) * 7a24b91b83fef2b8b2bf278a1fafd9d1bb2a7d03 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20657) * 11a355c59b6c14ce8ba03cfbefcc5b6ab8ca422c UNKNOWN
Re: [PR] [HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]
hudi-bot commented on PR #9936: URL: https://github.com/apache/hudi/pull/9936#issuecomment-1791950055 ## CI report: * 2b2a290f4f9fe0693d331a331ba8e8fa882761dd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20551) * 92501c8473c95562c5158daebe08e3787282e6eb UNKNOWN
[jira] [Commented] (HUDI-7024) Null Pointer Exception for a flink streaming pipeline for Consistent Hashing
[ https://issues.apache.org/jira/browse/HUDI-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782432#comment-17782432 ] Jing Zhang commented on HUDI-7024: -- [~adityagoenka] Could you please provide more information? for example, the exception stack, logs, flink version and hudi version?
Re: [PR] [HUDI-7002] Fixing initializing RLI MDT partition for non-partitioned dataset [hudi]
hudi-bot commented on PR #9938: URL: https://github.com/apache/hudi/pull/9938#issuecomment-1791950105 ## CI report: * b534ff0015140dc9d338da2da4a1dfb1f6ebac66 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20545) * 0987e3c8d3a299311d32a9bd1243ce8e8b204419 UNKNOWN
Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]
hudi-bot commented on PR #9913: URL: https://github.com/apache/hudi/pull/9913#issuecomment-1791949944 ## CI report: * 5eb4bf14d826e60c412078762aa061f415bac51d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20630) * caefe9891b1eda36c04dfe6003b071bb813db7d7 UNKNOWN
Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]
hudi-bot commented on PR #9946: URL: https://github.com/apache/hudi/pull/9946#issuecomment-1791945036 ## CI report: * 6ffc26d3efacd14c5cab8574584e276149d29c6b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20639) * 5daa002dfd75ec233a9ad045ad0c32cfa673a933 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20658)
Re: [PR] [HUDI-7009] Filtering out null values from avro kafka source [hudi]
hudi-bot commented on PR #9955: URL: https://github.com/apache/hudi/pull/9955#issuecomment-1791945067 ## CI report: * 8809ad5187203de0326cca32a3e59a4b1e1b9ca0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20589) * 7a24b91b83fef2b8b2bf278a1fafd9d1bb2a7d03 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20657)
Re: [I] [SUPPORT]flink-sql write hudi use TIMESTAMP, when hive query, it get time+8h question, use TIMESTAMP_LTZ, the hive schema is bigint but timestamp [hudi]
GaoYaokun commented on issue #9864: URL: https://github.com/apache/hudi/issues/9864#issuecomment-1791940320 I also encountered this issue when I used Flink CDC to write data from MySQL to Hudi and synchronize it to Hive. The Timestamp(6) field in Hive is correctly displayed as Timestamp, but when I query it with Hive, an error is reported like this: SQL ERROR: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritableV2 I did not change any Hive schema; the synchronized Hive table is a new table. How can this be solved? hudi version: 0.13.1 flink version: 1.16.1 hive version: 3.1.2
Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]
hehuiyuan commented on code in PR #9936: URL: https://github.com/apache/hudi/pull/9936#discussion_r1381184378 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestSchemaEvolution.java: ## @@ -480,16 +480,16 @@ private ExpectedResult(String[] evolvedRows, String[] rowsWithMeta, String[] row "+I[Alice, 9.9, unknown, +I[9, 9, s9, 99, t9, drop_add9], {Alice=.99}, [.0, .0], +I[9, 9], [9], {k9=v9}]", }, new String[] { - "+I[uuid:id0, Indica, null, 12, null, {Indica=1212.0}, [12.0], null, null, null]", - "+I[uuid:id1, Danny, 1.1, 23, +I[1, 1, s1, 11, t1, drop_add1], {Danny=2323.23}, [23.0, 23.0, 23.0], +I[1, 1], [1], {k1=v1}]", - "+I[uuid:id2, Stephen, null, 33, +I[2, null, s2, 2, null, null], {Stephen=.0}, [33.0], null, null, null]", - "+I[uuid:id3, Julian, 3.3, 53, +I[3, 3, s3, 33, t3, drop_add3], {Julian=5353.53}, [53.0], +I[3, 3], [3], {k3=v3}]", - "+I[uuid:id4, Fabian, null, 31, +I[4, null, s4, 4, null, null], {Fabian=3131.0}, [31.0], null, null, null]", - "+I[uuid:id5, Sophia, null, 18, +I[5, null, s5, 5, null, null], {Sophia=1818.0}, [18.0, 18.0], null, null, null]", - "+I[uuid:id6, Emma, null, 20, +I[6, null, s6, 6, null, null], {Emma=2020.0}, [20.0], null, null, null]", - "+I[uuid:id7, Bob, null, 44, +I[7, null, s7, 7, null, null], {Bob=.0}, [44.0, 44.0], null, null, null]", Review Comment: @danny0405 , done
Re: [PR] [HUDI-3304] Allow selective partial update [hudi]
CTTY commented on PR #7359: URL: https://github.com/apache/hudi/pull/7359#issuecomment-1791922139 Hi, I've cherry-picked this commit and created a new PR to continue the work: #9979
[PR] [HUDI-3304] Add support for selective partial update [hudi]
CTTY opened a new pull request, #9979: URL: https://github.com/apache/hudi/pull/9979 ### Change Logs Allow selective partial update in Hudi Original PR: #7359 ### Impact None ### Risk level (write none, low medium or high below) Medium ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
Re: [PR] [HUDI-7009] Filtering out null values from avro kafka source [hudi]
hudi-bot commented on PR #9955: URL: https://github.com/apache/hudi/pull/9955#issuecomment-1791919103 ## CI report: * 8809ad5187203de0326cca32a3e59a4b1e1b9ca0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20589) * 7a24b91b83fef2b8b2bf278a1fafd9d1bb2a7d03 UNKNOWN
Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]
hudi-bot commented on PR #9946: URL: https://github.com/apache/hudi/pull/9946#issuecomment-1791919069 ## CI report: * 6ffc26d3efacd14c5cab8574584e276149d29c6b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20639) * 5daa002dfd75ec233a9ad045ad0c32cfa673a933 UNKNOWN
[jira] [Updated] (HUDI-7029) Enhance CREATE INDEX syntax for functional index
[ https://issues.apache.org/jira/browse/HUDI-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7029: -- Fix Version/s: 1.0.0 > Enhance CREATE INDEX syntax for functional index > > > Key: HUDI-7029 > URL: https://issues.apache.org/jira/browse/HUDI-7029 > Project: Apache Hudi > Issue Type: Task > Reporter: Sagar Sumit > Priority: Major > Fix For: 1.0.0 > > > Currently, a user can create an index using SQL as follows: > `create index idx_datestr on $tableName using column_stats(ts) > options(func='from_unixtime', format='-MM-dd')` > Ideally, we would like to simplify this further as follows: > `create index idx_datestr on $tableName using column_stats(from_unixtime(ts, > format='-MM-dd'))`
Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]
PrabhuJoseph commented on code in PR #9946: URL: https://github.com/apache/hudi/pull/9946#discussion_r1381174885 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java: ## @@ -75,7 +78,12 @@ private HiveSyncContext(Properties props, HiveConf hiveConf) { public HiveSyncTool hiveSyncTool() { HiveSyncMode syncMode = HiveSyncMode.of(props.getProperty(HIVE_SYNC_MODE.key())); if (syncMode == HiveSyncMode.GLUE) { - return new AwsGlueCatalogSyncTool(props, hiveConf); + if (ReflectionUtils.hasConstructor(AWS_GLUE_CATALOG_SYNC_TOOL_CLASS, + new Class[] {Properties.class, org.apache.hadoop.conf.Configuration.class})) { Review Comment: Thanks for pointing out the unnecessary if condition. I have fixed it in the latest commit.
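The diff above guards a reflective instantiation with `ReflectionUtils.hasConstructor`. A minimal sketch of what such a guard does using plain `java.lang.reflect` (the helper and class names here are illustrative, not Hudi's actual implementation):

```java
// Hedged sketch of a constructor-existence check before reflective instantiation,
// in the spirit of the ReflectionUtils.hasConstructor guard discussed above.
public class HasConstructorCheck {
    static boolean hasConstructor(String className, Class<?>... argTypes) {
        try {
            // Resolves the class and looks up a public constructor with the given
            // parameter types; either failure means the guard should return false.
            Class.forName(className).getConstructor(argTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList is used only for illustration:
        // it has an (int) constructor but no (String) constructor.
        System.out.println(hasConstructor("java.util.ArrayList", int.class));    // true
        System.out.println(hasConstructor("java.util.ArrayList", String.class)); // false
    }
}
```

Checking before instantiating avoids a hard `NoClassDefFoundError`/`NoSuchMethodException` at call time when an optional bundle (here, the Glue sync tool) is absent or shaded differently.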
[jira] [Created] (HUDI-7029) Enhance CREATE INDEX syntax for functional index
Sagar Sumit created HUDI-7029: - Summary: Enhance CREATE INDEX syntax for functional index Key: HUDI-7029 URL: https://issues.apache.org/jira/browse/HUDI-7029 Project: Apache Hudi Issue Type: Task Reporter: Sagar Sumit Currently, a user can create an index using SQL as follows: `create index idx_datestr on $tableName using column_stats(ts) options(func='from_unixtime', format='-MM-dd')` Ideally, we would like to simplify this further as follows: `create index idx_datestr on $tableName using column_stats(from_unixtime(ts, format='-MM-dd'))`
[jira] [Closed] (HUDI-5219) Support "CREATE INDEX" for index function through Spark SQL
[ https://issues.apache.org/jira/browse/HUDI-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-5219. - Fix Version/s: 1.0.0 Resolution: Done > Support "CREATE INDEX" for index function through Spark SQL > --- > > Key: HUDI-5219 > URL: https://issues.apache.org/jira/browse/HUDI-5219 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major > Fix For: 1.0.0
[jira] [Commented] (HUDI-5219) Support "CREATE INDEX" for index function through Spark SQL
[ https://issues.apache.org/jira/browse/HUDI-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782418#comment-17782418 ] Sagar Sumit commented on HUDI-5219: --- Landed via [https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292] Users can create an index using SQL: `create index idx_datestr on $tableName using column_stats(ts) options(func='from_unixtime', format='-MM-dd')` > Support "CREATE INDEX" for index function through Spark SQL > --- > > Key: HUDI-5219 > URL: https://issues.apache.org/jira/browse/HUDI-5219 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major
[jira] [Closed] (HUDI-5215) Support file pruning based on new index function in Spark
[ https://issues.apache.org/jira/browse/HUDI-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-5215. - Fix Version/s: 1.0.0 Resolution: Fixed > Support file pruning based on new index function in Spark > - > > Key: HUDI-5215 > URL: https://issues.apache.org/jira/browse/HUDI-5215 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major > Fix For: 1.0.0
[jira] [Commented] (HUDI-5215) Support file pruning based on new index function in Spark
[ https://issues.apache.org/jira/browse/HUDI-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782417#comment-17782417 ] Sagar Sumit commented on HUDI-5215: --- `HoodieFileIndex` can now skip files based on a functional index. Landed via https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292 > Support file pruning based on new index function in Spark > - > > Key: HUDI-5215 > URL: https://issues.apache.org/jira/browse/HUDI-5215 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major
[jira] [Closed] (HUDI-5214) Add functionality to create new MT partition for index function
[ https://issues.apache.org/jira/browse/HUDI-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-5214. - Fix Version/s: 1.0.0 Resolution: Fixed > Add functionality to create new MT partition for index function > --- > > Key: HUDI-5214 > URL: https://issues.apache.org/jira/browse/HUDI-5214 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major > Fix For: 1.0.0
[jira] [Commented] (HUDI-5214) Add functionality to create new MT partition for index function
[ https://issues.apache.org/jira/browse/HUDI-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782416#comment-17782416 ] Sagar Sumit commented on HUDI-5214: --- Landed as part of [https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292] A functional index can be created and updated via the metadata writer (as of 2023-11-03, only supported for Spark). > Add functionality to create new MT partition for index function > --- > > Key: HUDI-5214 > URL: https://issues.apache.org/jira/browse/HUDI-5214 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major
[jira] [Closed] (HUDI-5213) Support index function for Spark SQL built-in functions
[ https://issues.apache.org/jira/browse/HUDI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-5213. - Fix Version/s: 1.0.0 Resolution: Fixed > Support index function for Spark SQL built-in functions > > > Key: HUDI-5213 > URL: https://issues.apache.org/jira/browse/HUDI-5213 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major > Fix For: 1.0.0
[jira] [Commented] (HUDI-5213) Support index function for Spark SQL built-in functions
[ https://issues.apache.org/jira/browse/HUDI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782414#comment-17782414 ] Sagar Sumit commented on HUDI-5213: --- Landed as part of [https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292] Some common date/timestamp, string and identity functions are supported. > Support index function for Spark SQL built-in functions > > > Key: HUDI-5213 > URL: https://issues.apache.org/jira/browse/HUDI-5213 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major
[jira] [Closed] (HUDI-5212) Store index function in table properties
[ https://issues.apache.org/jira/browse/HUDI-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-5212. - Fix Version/s: 1.0.0 Resolution: Done > Store index function in table properties > > > Key: HUDI-5212 > URL: https://issues.apache.org/jira/browse/HUDI-5212 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major > Fix For: 1.0.0
[jira] [Commented] (HUDI-5212) Store index function in table properties
[ https://issues.apache.org/jira/browse/HUDI-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782413#comment-17782413 ] Sagar Sumit commented on HUDI-5212: --- Landed as part of https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292 > Store index function in table properties > > > Key: HUDI-5212 > URL: https://issues.apache.org/jira/browse/HUDI-5212 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major
[jira] [Comment Edited] (HUDI-5212) Store index function in table properties
[ https://issues.apache.org/jira/browse/HUDI-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782413#comment-17782413 ] Sagar Sumit edited comment on HUDI-5212 at 11/3/23 5:24 AM: Landed as part of [https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292] Currently, all index definitions are stored in a separate JSON file whose path can be specified by the user; that path is stored in hoodie.properties. was (Author: codope): Landed as part of https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292 > Store index function in table properties > > > Key: HUDI-5212 > URL: https://issues.apache.org/jira/browse/HUDI-5212 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major > Fix For: 1.0.0
[jira] [Closed] (HUDI-5211) Add abstraction to track a function defined on a column
[ https://issues.apache.org/jira/browse/HUDI-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-5211. - Fix Version/s: 1.0.0 Resolution: Done > Add abstraction to track a function defined on a column > --- > > Key: HUDI-5211 > URL: https://issues.apache.org/jira/browse/HUDI-5211 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major > Fix For: 1.0.0
Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]
hehuiyuan commented on code in PR #9936: URL: https://github.com/apache/hudi/pull/9936#discussion_r1381170499 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestSchemaEvolution.java: ## @@ -480,16 +480,16 @@ private ExpectedResult(String[] evolvedRows, String[] rowsWithMeta, String[] row "+I[Alice, 9.9, unknown, +I[9, 9, s9, 99, t9, drop_add9], {Alice=.99}, [.0, .0], +I[9, 9], [9], {k9=v9}]", }, new String[] { - "+I[uuid:id0, Indica, null, 12, null, {Indica=1212.0}, [12.0], null, null, null]", - "+I[uuid:id1, Danny, 1.1, 23, +I[1, 1, s1, 11, t1, drop_add1], {Danny=2323.23}, [23.0, 23.0, 23.0], +I[1, 1], [1], {k1=v1}]", - "+I[uuid:id2, Stephen, null, 33, +I[2, null, s2, 2, null, null], {Stephen=.0}, [33.0], null, null, null]", - "+I[uuid:id3, Julian, 3.3, 53, +I[3, 3, s3, 33, t3, drop_add3], {Julian=5353.53}, [53.0], +I[3, 3], [3], {k3=v3}]", - "+I[uuid:id4, Fabian, null, 31, +I[4, null, s4, 4, null, null], {Fabian=3131.0}, [31.0], null, null, null]", - "+I[uuid:id5, Sophia, null, 18, +I[5, null, s5, 5, null, null], {Sophia=1818.0}, [18.0, 18.0], null, null, null]", - "+I[uuid:id6, Emma, null, 20, +I[6, null, s6, 6, null, null], {Emma=2020.0}, [20.0], null, null, null]", - "+I[uuid:id7, Bob, null, 44, +I[7, null, s7, 7, null, null], {Bob=.0}, [44.0, 44.0], null, null, null]", Review Comment: Hi @danny0405 , the primary key field name has been removed from the record key value; there are some other issues in the UT: ``` Error: Failures: Error: TestWaitBasedTimeGenerator.testSlowerThreadLaterAcquiredLock:143 expected: but was: [INFO] ```
Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]
hehuiyuan commented on code in PR #9936: URL: https://github.com/apache/hudi/pull/9936#discussion_r1381144499 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestSchemaEvolution.java: ## @@ -480,16 +480,16 @@ private ExpectedResult(String[] evolvedRows, String[] rowsWithMeta, String[] row "+I[Alice, 9.9, unknown, +I[9, 9, s9, 99, t9, drop_add9], {Alice=.99}, [.0, .0], +I[9, 9], [9], {k9=v9}]", }, new String[] { - "+I[uuid:id0, Indica, null, 12, null, {Indica=1212.0}, [12.0], null, null, null]", - "+I[uuid:id1, Danny, 1.1, 23, +I[1, 1, s1, 11, t1, drop_add1], {Danny=2323.23}, [23.0, 23.0, 23.0], +I[1, 1], [1], {k1=v1}]", - "+I[uuid:id2, Stephen, null, 33, +I[2, null, s2, 2, null, null], {Stephen=.0}, [33.0], null, null, null]", - "+I[uuid:id3, Julian, 3.3, 53, +I[3, 3, s3, 33, t3, drop_add3], {Julian=5353.53}, [53.0], +I[3, 3], [3], {k3=v3}]", - "+I[uuid:id4, Fabian, null, 31, +I[4, null, s4, 4, null, null], {Fabian=3131.0}, [31.0], null, null, null]", - "+I[uuid:id5, Sophia, null, 18, +I[5, null, s5, 5, null, null], {Sophia=1818.0}, [18.0, 18.0], null, null, null]", - "+I[uuid:id6, Emma, null, 20, +I[6, null, s6, 6, null, null], {Emma=2020.0}, [20.0], null, null, null]", - "+I[uuid:id7, Bob, null, 44, +I[7, null, s7, 7, null, null], {Bob=.0}, [44.0, 44.0], null, null, null]", Review Comment: Hi @danny0405 , what is the problem here? The primary key field name has been removed.
Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]
hehuiyuan commented on PR #9936: URL: https://github.com/apache/hudi/pull/9936#issuecomment-1791899414

```
[INFO] Error: Failures:
Error: TestWaitBasedTimeGenerator.testSlowerThreadLaterAcquiredLock:143 expected: but was:
[INFO] Error: Tests run: 1026, Failures: 1, Errors: 0, Skipped: 2
```
Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]
hudi-bot commented on PR #9937: URL: https://github.com/apache/hudi/pull/9937#issuecomment-1791886212 ## CI report: * 392c1a3007e5d562be86a9c0096bbfd53988f5ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20641) * 1d5de86d295233edff138e9bfb8e9151a5b7ecae Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20655) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]
hudi-bot commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791886165 ## CI report: * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610) * abd9807817eb49458b1f8dd9f9d31157ba2b5a81 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20654)
Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]
hudi-bot commented on PR #9937: URL: https://github.com/apache/hudi/pull/9937#issuecomment-1791882241 ## CI report: * 392c1a3007e5d562be86a9c0096bbfd53988f5ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20641) * 1d5de86d295233edff138e9bfb8e9151a5b7ecae UNKNOWN
Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]
hudi-bot commented on PR #9936: URL: https://github.com/apache/hudi/pull/9936#issuecomment-1791882215 ## CI report: * 2b2a290f4f9fe0693d331a331ba8e8fa882761dd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20551)
Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]
hudi-bot commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791882158 ## CI report: * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610) * abd9807817eb49458b1f8dd9f9d31157ba2b5a81 UNKNOWN
Re: [PR] [MINOR] Re-enable a test that got fixed [hudi]
hudi-bot commented on PR #9978: URL: https://github.com/apache/hudi/pull/9978#issuecomment-1791878221 ## CI report: * 3b3a9f61789da9d0f6ac569e5c2a9b7c7be8961c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20653)
Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]
hudi-bot commented on PR #9936: URL: https://github.com/apache/hudi/pull/9936#issuecomment-1791878123 ## CI report: * 2b2a290f4f9fe0693d331a331ba8e8fa882761dd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20551)
Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]
hudi-bot commented on PR #9923: URL: https://github.com/apache/hudi/pull/9923#issuecomment-1791878072 ## CI report: * 2f1b6536c1456fd0211740c90542bf25f53d1010 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20599) * ff11f10133f07427df3d13df8393362a75004807 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20652)
Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]
hehuiyuan commented on PR #9936: URL: https://github.com/apache/hudi/pull/9936#issuecomment-1791871892 @hudi-bot run azure
Re: [I] [SUPPORT] Simple Bucket Index - discrepancy between Spark and Flink [hudi]
joeytman commented on issue #9971: URL: https://github.com/apache/hudi/issues/9971#issuecomment-1791871773 > Try to set up index.type as BUCKET instead. Thanks for the tip! I'm confused by the results. At first glance, using `index.type` seems to work correctly; files are now written with the same naming convention. But this log no longer appears:

```
2023-11-01 22:16:11,025 INFO org.apache.hudi.index.bucket.HoodieBucketIndex [] - Use bucket index, numBuckets = 113, indexFields: [redacted1, redacted2]
```

So, to be clear:
* `index.type=BUCKET` actually enables the bucket index, but without any logs indicating it's working
* `hoodie.index.type=BUCKET` produces logs that indicate it's working, but it doesn't actually do anything
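For reference, enabling the bucket index from Flink can be sketched as the table options below. This is a minimal, hypothetical DDL (table name, path, and schema are placeholders); option names follow the Hudi Flink documentation, and as the comment above notes, `index.type` and `hoodie.index.type` may behave differently depending on the Hudi version, so verify against your release.

```sql
-- Hypothetical Flink SQL table with the simple bucket index enabled.
CREATE TABLE hudi_bucketed (
  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
  name VARCHAR(10),
  ts   TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_bucketed',
  'table.type' = 'MERGE_ON_READ',
  'index.type' = 'BUCKET',                   -- Flink-side option that took effect per this thread
  'hoodie.bucket.index.num.buckets' = '113'  -- number of buckets, matching the log line above
);
```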
Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]
waitingF commented on PR #9937: URL: https://github.com/apache/hudi/pull/9937#issuecomment-1791862212 > @waitingF Can you rebase with the latest master to resolve the test failures, can you try in your local env that the compaction really works? Sure. I tested in my local env, and all is good. Attached is my test log: [local-hudi-cli-table-change-command-verify.txt](https://github.com/apache/hudi/files/13246562/local-hudi-cli-table-change-command-verify.txt)
Re: [PR] [MINOR] Re-enable a test that got fixed [hudi]
hudi-bot commented on PR #9978: URL: https://github.com/apache/hudi/pull/9978#issuecomment-1791855627 ## CI report: * 3b3a9f61789da9d0f6ac569e5c2a9b7c7be8961c UNKNOWN
Re: [PR] [HUDI-7022] RunClusteringProcedure support limit parameter [hudi]
hudi-bot commented on PR #9975: URL: https://github.com/apache/hudi/pull/9975#issuecomment-1791855608 ## CI report: * 30a00f1575934104714817b6b9243f3866f277d1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20637) * de92e35a38f3a42b425063cbd48f4cf2fb56f3e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20649) * 736f0a04fe805294a0d1722a62ad327636b86a5b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20651)
Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]
hudi-bot commented on PR #9923: URL: https://github.com/apache/hudi/pull/9923#issuecomment-1791855498 ## CI report: * 2f1b6536c1456fd0211740c90542bf25f53d1010 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20599) * ff11f10133f07427df3d13df8393362a75004807 UNKNOWN
Re: [PR] [HUDI-7022] RunClusteringProcedure support limit parameter [hudi]
hudi-bot commented on PR #9975: URL: https://github.com/apache/hudi/pull/9975#issuecomment-1791851892 ## CI report: * 30a00f1575934104714817b6b9243f3866f277d1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20637) * de92e35a38f3a42b425063cbd48f4cf2fb56f3e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20649) * 736f0a04fe805294a0d1722a62ad327636b86a5b UNKNOWN
Re: [PR] [HUDI-7012] The BootstrapOperator reduces the memory. [hudi]
hudi-bot commented on PR #9959: URL: https://github.com/apache/hudi/pull/9959#issuecomment-1791851848 ## CI report: * fd974dfa66aa2873ec0491212070db6845dd7877 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20603) * 608a35a71faf69830fde7796babb12c0c327cfe0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20650)
[PR] [MINOR] Re-enable a test that got fixed [hudi]
codope opened a new pull request, #9978: URL: https://github.com/apache/hudi/pull/9978 ### Change Logs `testSlowerThreadLaterAcquiredLock` was disabled and got fixed by #9972. This PR simply re-enables it. ### Impact none - test change ### Risk level (write none, low medium or high below) none ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]
zhuanshenbsj1 commented on PR #9923: URL: https://github.com/apache/hudi/pull/9923#issuecomment-1791846370 Resolved conflicts and rebased.
Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]
danny0405 commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791838790 You can rebase with the latest master to re-trigger it.
Re: [PR] [HUDI-7022] RunClusteringProcedure support limit parameter [hudi]
hudi-bot commented on PR #9975: URL: https://github.com/apache/hudi/pull/9975#issuecomment-1791830718 ## CI report: * 30a00f1575934104714817b6b9243f3866f277d1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20637) * de92e35a38f3a42b425063cbd48f4cf2fb56f3e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20649)
Re: [PR] [HUDI-7012] The BootstrapOperator reduces the memory. [hudi]
hudi-bot commented on PR #9959: URL: https://github.com/apache/hudi/pull/9959#issuecomment-1791830679 ## CI report: * fd974dfa66aa2873ec0491212070db6845dd7877 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20603) * 608a35a71faf69830fde7796babb12c0c327cfe0 UNKNOWN
Re: [PR] [HUDI-7022] RunClusteringProcedure support limit parameter [hudi]
hudi-bot commented on PR #9975: URL: https://github.com/apache/hudi/pull/9975#issuecomment-1791826407 ## CI report: * 30a00f1575934104714817b6b9243f3866f277d1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20637) * de92e35a38f3a42b425063cbd48f4cf2fb56f3e1 UNKNOWN
[jira] [Closed] (HUDI-5210) End-to-end PoC of functional indexes
[ https://issues.apache.org/jira/browse/HUDI-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-5210. - Resolution: Done > End-to-end PoC of functional indexes > > > Key: HUDI-5210 > URL: https://issues.apache.org/jira/browse/HUDI-5210 > Project: Apache Hudi > Issue Type: Task >Reporter: Ethan Guo >Assignee: Sagar Sumit >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5210) End-to-end PoC of functional indexes
[ https://issues.apache.org/jira/browse/HUDI-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-5210: -- Status: Patch Available (was: In Progress) > End-to-end PoC of functional indexes > > > Key: HUDI-5210 > URL: https://issues.apache.org/jira/browse/HUDI-5210 > Project: Apache Hudi > Issue Type: Task >Reporter: Ethan Guo >Assignee: Sagar Sumit >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > >
Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]
ksmou commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791784841 Azure seems to have some problems.
Re: [PR] [HUDI-5210] Implement functional indexes [hudi]
yihua merged PR #9872: URL: https://github.com/apache/hudi/pull/9872
Re: [PR] [HUDI-5210] Implement functional indexes [hudi]
yihua commented on code in PR #9872: URL: https://github.com/apache/hudi/pull/9872#discussion_r1380966712 ## hudi-common/src/main/java/org/apache/hudi/common/config/ConfigGroups.java: ## @@ -40,7 +40,8 @@ public enum Names { RECORD_PAYLOAD("Record Payload Config"), KAFKA_CONNECT("Kafka Connect Configs"), AWS("Amazon Web Services Configs"), -HUDI_STREAMER("Hudi Streamer Configs"); +HUDI_STREAMER("Hudi Streamer Configs"), +INDEXING("Indexing Configs"); Review Comment: In that case, let's remove the `INDEX` subgroup or rename it to something different in the follow-up.
Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]
hudi-bot commented on PR #9717: URL: https://github.com/apache/hudi/pull/9717#issuecomment-1791778235 ## CI report: * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN * b9c76842e4cdc5a6db43109dafa115109d287584 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20646)
[jira] [Assigned] (HUDI-7028) Fix Spark Quick Start
[ https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu reassigned HUDI-7028: - Assignee: Lin Liu > Fix Spark Quick Start > - > > Key: HUDI-7028 > URL: https://issues.apache.org/jira/browse/HUDI-7028 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lin Liu >Assignee: Lin Liu >Priority: Major > Fix For: 1.0.0 > > > Fix the bugs for Spark quick start when turning on file group reader and > positional merging flag. > > List some issues found so far: > # [compatibility]When no positions are stored in the header, the read query > failed. Ideal behavior: use key-based merging instead of failing. > # [compatibility]When a parquet file contains Avro records, the file group > reader of spark job will check if the payload is the expected type; > otherwise, it will throw.
Re: [PR] [HUDI-7012] The BootstrapOperator reduces the memory. [hudi]
danny0405 commented on PR #9959: URL: https://github.com/apache/hudi/pull/9959#issuecomment-1791774741 @cuibo01 you can rebase with the latest master to resolve the test failures.
Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]
hudi-bot commented on PR #9717: URL: https://github.com/apache/hudi/pull/9717#issuecomment-1791773116 ## CI report: * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN * b544b18820ae3fe8fbf1c50a34e561ad36bfbaba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20624) * b9c76842e4cdc5a6db43109dafa115109d287584 UNKNOWN
[jira] [Updated] (HUDI-7028) Fix Spark Quick Start
[ https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-7028: -- Description: Fix the bugs for Spark quick start when turning on file group reader and positional merging flag. List some issues found so far: # [compatibility]When no positions are stored in the header, the read query failed. Ideal behavior: use key-based merging instead of failing. # [compatibility]When a parquet file contains Avro records, the file group reader of spark job will check if the payload is the expected type; otherwise, it will throw. was: Fix the bugs for Spark quick start when turning on file group reader and positional merging flag. List some issues found so far: # [compatibility]When no positions are stored in the header, the read query failed. Ideal behavior: use key-based merging instead of failing. # [compatibility]When a parquet file contains Avro records, the file group reader of spark job will check if the payload is the expected type; otherwise, it will throw. # > Fix Spark Quick Start > - > > Key: HUDI-7028 > URL: https://issues.apache.org/jira/browse/HUDI-7028 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lin Liu >Priority: Major > Fix For: 1.0.0 > > > Fix the bugs for Spark quick start when turning on file group reader and > positional merging flag. > > List some issues found so far: > # [compatibility]When no positions are stored in the header, the read query > failed. Ideal behavior: use key-based merging instead of failing. > # [compatibility]When a parquet file contains Avro records, the file group > reader of spark job will check if the payload is the expected type; > otherwise, it will throw.
Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]
hudi-bot commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791773350 ## CI report: * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610)
Re: [PR] [HUDI-5210] Implement functional indexes [hudi]
yihua commented on code in PR #9872: URL: https://github.com/apache/hudi/pull/9872#discussion_r1380906029 ## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieFunctionalIndexConfig.java: ## @@ -0,0 +1,319 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.common.config; + +import org.apache.hudi.common.util.BinaryUtil; +import org.apache.hudi.common.util.ConfigUtils; +import org.apache.hudi.exception.HoodieIOException; +import org.apache.hudi.index.secondary.SecondaryIndexType; +import org.apache.hudi.metadata.MetadataPartitionType; + +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import javax.annotation.concurrent.Immutable; + +import java.io.File; +import java.io.FileReader; +import java.io.IOException; +import java.time.Instant; +import java.util.Map; +import java.util.Properties; +import java.util.Set; +import java.util.function.BiConsumer; + +import static org.apache.hudi.common.util.ConfigUtils.fetchConfigs; +import static org.apache.hudi.common.util.ConfigUtils.recoverIfNeeded; +import static org.apache.hudi.common.util.StringUtils.getUTF8Bytes; + +@Immutable +@ConfigClassProperty(name = "Common Index Configs", +groupName = ConfigGroups.Names.INDEXING, +subGroupName = ConfigGroups.SubGroupNames.FUNCTIONAL_INDEX, +areCommonConfigs = true, +description = "") +public class HoodieFunctionalIndexConfig extends HoodieConfig { + + private static final Logger LOG = LoggerFactory.getLogger(HoodieFunctionalIndexConfig.class); + + public static final String INDEX_DEFINITION_FILE = "index.properties"; + public static final String INDEX_DEFINITION_FILE_BACKUP = "index.properties.backup"; + public static final ConfigProperty INDEX_NAME = ConfigProperty Review Comment: Got it. ## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieFunctionalIndexConfig.java: ## @@ -0,0 +1,319 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.common.config; + +import org.apache.hudi.common.util.BinaryUtil; +import org.apache.hudi.common.util.ConfigUtils; +import org.apache.hudi.exception.HoodieIOException; +import org.apache.hudi.index.secondary.SecondaryIndexType; +import org.apache.hudi.metadata.MetadataPartitionType; + +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import javax.annotation.concurrent.Immutable; + +import java.io.File; +import java.io.FileReader; +import java.io.IOException; +import java.time.Instant; +import java.util.Map; +import java.util.Properties; +import java.util.Set; +import java.util.function.BiConsumer; + +import static org.apache.hudi.common.util.ConfigUtils.fetchConfigs; +import static org.apache.hudi.common.util.ConfigUtils.recoverIfNeeded; +import static org.apache.hudi.common.util.StringUtils.getUTF8Bytes; + +@Immutable +@ConfigClassProperty(name = "Common Index Configs", +groupName = ConfigGroups.Names.INDEXING, +subGroupName = ConfigGroups.SubGroupNames.FUNCTIONAL_INDEX, +areCommonConfigs = true, +description = "") +public class HoodieFunctionalIn
[jira] [Commented] (HUDI-7028) Fix Spark Quick Start
[ https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782369#comment-17782369 ] Lin Liu commented on HUDI-7028: --- To reproduce the second error:
{code:java}
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.common.model.HoodieRecord

val tableName = "hudi_trips_cow"
val basePath = "file:///tmp/hudi_trips_cow"
val dataGen = new DataGenerator
val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
df.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option(TABLE_NAME, tableName).
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", "org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  mode(Overwrite).
  save(basePath)

val tripsSnapshotDF = spark.
  read.
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", "org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  format("hudi").
  load(basePath)
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")

spark.sql("select fare, begin_lon, begin_lat, ts from hudi_trips_snapshot where fare > 20.0").show()
spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, fare from hudi_trips_snapshot").show()

val updates = convertToStringList(dataGen.generateUpdates(10))
val df = spark.read.json(spark.sparkContext.parallelize(updates, 2))
df.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option(TABLE_NAME, tableName).
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", "org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  mode(Append).
  save(basePath)

spark.
  read.
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", "org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  format("hudi").
  load(basePath).
  createOrReplaceTempView("hudi_trips_snapshot")

val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime from hudi_trips_snapshot order by commitTime").map(k => k.getString(0)).take(50)
{code}
> Fix Spark Quick Start > - > > Key: HUDI-7028 > URL: https://issues.apache.org/jira/browse/HUDI-7028 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lin Liu >Priority: Major > Fix For: 1.0.0 > > > Fix the bugs for Spark quick start when turning on file group reader and > positional merging flag. > > List some issues found so far: > # [compatibility] When no positions are stored in the header, the read query > failed. Ideal behavior: use key-based merging instead of failing. > # [compatibility] When a parquet file contains Avro records, the file group > reader of the Spark job will check if the payload is the expected type; > otherwise, it will throw. > # -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-7028) Fix Spark Quick Start
[ https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782367#comment-17782367 ] Lin Liu commented on HUDI-7028: --- To reproduce the first error:
{code:java}
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.common.table.HoodieTableConfig._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.keygen.constant.KeyGeneratorOptions._
import org.apache.hudi.common.model.HoodieRecord
import spark.implicits._

val tableName = "trips_table"
val basePath = "file:///tmp/trips_table_1"

val columns = Seq("ts","uuid","rider","driver","fare","city")
val data = Seq(
  (1695159649087L,"334e26e9-8355-45cc-97c6-c31daf0df330","rider-A","driver-K",19.10,"san_francisco"),
  (1695091554788L,"e96c4396-3fad-413a-a942-4cb36106d721","rider-C","driver-M",27.70,"san_francisco"),
  (1695046462179L,"9909a8b1-2d15-4d3d-8ec9-efc48c536a00","rider-D","driver-L",33.90,"san_francisco"),
  (1695516137016L,"e3cf430c-889d-4015-bc98-59bdce1e530c","rider-F","driver-P",34.15,"sao_paulo"),
  (169511511L,"c8abbe79-8d89-47ea-b4ce-4d224bae5bfa","rider-J","driver-T",17.85,"chennai"));

var inserts = spark.createDataFrame(data).toDF(columns:_*)
inserts.write.format("hudi").
  option(PARTITIONPATH_FIELD_NAME.key(), "city").
  option(TABLE_NAME, tableName).
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", "org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  mode(Overwrite).
  save(basePath)

val tripsDF = spark.read.
  option("hoodie.datasource.write.record.merger.impls", "org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  format("hudi").load(basePath)
tripsDF.createOrReplaceTempView("trips_table")

spark.sql("SELECT uuid, fare, ts, rider, driver, city FROM trips_table WHERE fare > 20.0").show()
spark.sql("SELECT _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, fare FROM trips_table").show(1000, false)

val updatesDf = spark.read.
  option("hoodie.datasource.write.record.merger.impls", "org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  format("hudi").load(basePath).filter($"rider" === "rider-D").withColumn("fare", col("fare") * 10)

updatesDf.write.format("hudi").
  option(OPERATION_OPT_KEY, "upsert").
  option(PARTITIONPATH_FIELD_NAME.key(), "city").
  option(TABLE_NAME, tableName).
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", "org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  mode(Append).
  save(basePath)

// spark-shell
val adjustedFareDF = spark.read.
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", "org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  format("hudi").
  load(basePath).limit(2).
  withColumn("fare", col("fare") * 10)

adjustedFareDF.write.format("hudi").
  option("hoodie.datasource.write.payload.class","com.payloads.CustomMergeIntoConnector").
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", "org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  mode(Append).
  save(basePath)
{code}
> Fix Spark Quick Start > - > > Key: HUDI-7028 > URL: https://issues.apache.org/jira/browse/HUDI-7028 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lin Liu >Priority: Major > Fix For: 1.0.0 > > > Fix the bugs for Spark quick start when turning on fil
[jira] [Updated] (HUDI-7028) Fix Spark Quick Start
[ https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-7028: -- Description: Fix the bugs for Spark quick start when turning on file group reader and positional merging flag. List some issues found so far: # [compatibility] When no positions are stored in the header, the read query failed. Ideal behavior: use key-based merging instead of failing. # [compatibility] When a parquet file contains Avro records, the file group reader of the Spark job will check if the payload is the expected type; otherwise, it will throw. # was: Fix the bugs for Spark quick start when turning on file group reader and positional merging flag. List some issues found so far: # When no positions are stored in the header, the read query failed. Ideal behavior: use key-based merging instead of failing. # > Fix Spark Quick Start > - > > Key: HUDI-7028 > URL: https://issues.apache.org/jira/browse/HUDI-7028 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lin Liu >Priority: Major > Fix For: 1.0.0 > > > Fix the bugs for Spark quick start when turning on file group reader and > positional merging flag. > > List some issues found so far: > # [compatibility] When no positions are stored in the header, the read query > failed. Ideal behavior: use key-based merging instead of failing. > # [compatibility] When a parquet file contains Avro records, the file group > reader of the Spark job will check if the payload is the expected type; > otherwise, it will throw. > # -- This message was sent by Atlassian Jira (v8.20.10#820010)
[I] [SUPPORT] Data loss in MOR table after clustering partition [hudi]
mzheng-plaid opened a new issue, #9977: URL: https://github.com/apache/hudi/issues/9977 **Describe the problem you faced** As background, due to https://github.com/apache/hudi/issues/9934 we're testing out clustering our table to have fewer base files in our MOR table. We set up a test by copying an existing table. This table only had base files (no log files) in its initial state. We wanted to verify the performance of clustering as well as data correctness. We clustered one partition and found that **261736 rows were missing after clustering**. We used the following clustering configuration (and the other configurations in "Additional Context"):
```
# Clustering configs
"hoodie.clustering.inline": "true",
"hoodie.clustering.inline.max.commits": 1,
"hoodie.clustering.plan.strategy.small.file.limit": 256 * 1024 * 1024,
"hoodie.clustering.plan.strategy.target.file.max.bytes": 512 * 1024 * 1024,
"hoodie.clustering.plan.strategy.sort.columns": "itemId.value",
"hoodie.clustering.plan.strategy.partition.selected": "dt=2022-08-29",
"hoodie.clustering.plan.strategy.max.num.groups": 30,
```
Our clustering code ran as follows:
1. Read one row from partition `dt=2022-08-29`
2. Write out the row (this is just a dummy way of triggering clustering inline); this update will be a no-op.
We set "hoodie.clustering.plan.strategy.partition.selected" to be `dt=2022-08-29` to only cluster the partition that was written to. After the write finished I compared the clustered/unclustered tables (we had another copy before running this). Before clustering we had 399896071 rows in that partition and after clustering 399634335 rows in that partition (261736 rows were lost). Joining the two tables, I saw that **all** the missing rows were from **one** base file that was clustered.
**This interestingly was the base file that received the update of 1 row**:
```
# Spark code to find the hoodie file and record key for each of the missing rows
meta_joined_df = unclustered_df.select(
    "_hoodie_file_name",
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_is_deleted",
).alias("a").join(
    clustered_df.select(
        "_hoodie_file_name",
        "_hoodie_commit_time",
        "_hoodie_commit_seqno",
        "_hoodie_record_key",
        "_hoodie_partition_path",
        "_hoodie_is_deleted",
    ).alias("b"),
    on=F.col("a._hoodie_record_key") == F.col("b._hoodie_record_key"),
    how="full_outer",
).cache()

meta_joined_df.filter(F.col("b._hoodie_record_key").isNull()).groupBy(
    F.col("a._hoodie_file_name"),
).count().alias("count").orderBy("count", ascending=False).show(
    n=10, truncate=False
)
```
Output:
```
+-----------------------------------------------------------------------------------+------+
|_hoodie_file_name                                                                  |count |
+-----------------------------------------------------------------------------------+------+
|f0b917f5-607e-47c4-96a4-092b4668c436-0_254-10835-21844023_20231016122622692.parquet|261736|
+-----------------------------------------------------------------------------------+------+
```
The `deltacommit` shows this file was the one that received the update:
```
{
  "partitionToWriteStats" : {
    "dt=2022-08-29" : [ {
      "fileId" : "f0b917f5-607e-47c4-96a4-092b4668c436-0",
      "path" : "dt=2022-08-29/.f0b917f5-607e-47c4-96a4-092b4668c436-0_20231016122622692.log.1_0-29-5280",
      "prevCommit" : "20231016122622692",
      "numWrites" : 1,
      "numDeletes" : 0,
      "numUpdateWrites" : 1,
      "numInserts" : 0,
      "totalWriteBytes" : 13402,
      "totalWriteErrors" : 0,
      "tempPath" : null,
      "partitionPath" : "dt=2022-08-29",
      "totalLogRecords" : 0,
      "totalLogFilesCompacted" : 0,
      "totalLogSizeCompacted" : 0,
      "totalUpdatedRecordsCompacted" : 0,
      "totalLogBlocks" : 0,
      "totalCorruptLogBlock" : 0,
      "totalRollbackBlocks" : 0,
      "fileSizeInBytes" : 13402,
      "minEventTime" : null,
      "maxEventTime" : null,
      "runtimeStats" : {
        "totalScanTime" : 0,
        "totalUpsertTime" : 2327,
        "totalCreateTime" : 0
      },
      "logVersion" : 1,
      "logOffset" : 0,
      "baseFile" : "f0b917f5-607e-47c4-96a4-092b4668c436-0_254-10835-21844023_20231016122622692.parquet",
      "logFiles" : [ ".f0b917f5-607e-47c4-96a4-092b4668c436-0_20231016122622692.log.1_0-29-5280" ],
      "recordsStats" : { "val" : null }
    } ]
  },
  "compacted" : false,
  "extraMetadata" : { "schema" : … },
  "operationType" : "UPSERT"
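As a quick sanity check on the numbers reported above, the partition-level row delta matches the per-file missing count exactly. This is a standalone arithmetic sketch by the editor, not part of the original report:

```python
# Row counts reported for partition dt=2022-08-29 in the issue.
rows_before_clustering = 399_896_071
rows_after_clustering = 399_634_335

# Rows lost across the whole partition...
missing_rows = rows_before_clustering - rows_after_clustering

# ...equal the count attributed by the full-outer join above to the single
# clustered base file f0b917f5-...parquet.
missing_in_single_base_file = 261_736
assert missing_rows == missing_in_single_base_file
print(missing_rows)  # 261736
```

This supports the report's claim that every lost row came from that one base file.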
Re: [PR] [HUDI-7022] RunClusteringProcedure support limit parameter [hudi]
danny0405 commented on PR #9975: URL: https://github.com/apache/hudi/pull/9975#issuecomment-1791762149 @ksmou Can you rebase with the latest master to fix the test failures? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7012] The BootstrapOperator reduces the memory. [hudi]
danny0405 commented on PR #9959: URL: https://github.com/apache/hudi/pull/9959#issuecomment-1791761007 Thanks for the contribution, I have reviewed and created a patch: [7012.patch.zip](https://github.com/apache/hudi/files/13245958/7012.patch.zip) You can rebase with the latest master and then apply the patch. The patch does not include your changes, so there might be conflicts if you apply it on your branch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] "OutOfMemoryError: Requested array size exceeds VM limit" on data ingestion to MOR table [hudi]
mzheng-plaid commented on issue #9934: URL: https://github.com/apache/hudi/issues/9934#issuecomment-1791751860 Sorry, we also have some other Hudi options set that I missed; the important points are that the metadata table is disabled and Hive sync is enabled.
```
"hoodie.table.name": self.name,
"hoodie.datasource.write.table.name": self.name,
"hoodie.datasource.write.operation": "upsert",
"hoodie.datasource.write.table.type": "MERGE_ON_READ",
"hoodie.datasource.write.partitionpath.field": "dt:SIMPLE",
"hoodie.datasource.write.recordkey.field": "id.value",
"hoodie.datasource.write.precombine.field": "ts",
"hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.CustomKeyGenerator",
"hoodie.datasource.write.hive_style_partitioning": "true",
# We disable the metadata table
"hoodie.metadata.enable": "false",
# We disable the bootstrap index because the table is not bootstrapped
"hoodie.bootstrap.index.enable": "false",
"hoodie.index.type": "BLOOM",
# Hive sync is enabled
"hoodie.datasource.hive_sync.enable": "true",
```
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
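For readers reassembling the settings quoted above, here is a minimal sketch of collecting them into a single options dict for a Spark writer. The `build_hudi_options` helper and its `table_name` parameter are illustrative names, not from the original comment; only the option keys and values come from the report:

```python
def build_hudi_options(table_name: str) -> dict:
    """Collect the writer options quoted in the issue (illustrative sketch)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.partitionpath.field": "dt:SIMPLE",
        "hoodie.datasource.write.recordkey.field": "id.value",
        "hoodie.datasource.write.precombine.field": "ts",
        "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.CustomKeyGenerator",
        "hoodie.datasource.write.hive_style_partitioning": "true",
        # Metadata table disabled, per the comment
        "hoodie.metadata.enable": "false",
        # Bootstrap index disabled because the table is not bootstrapped
        "hoodie.bootstrap.index.enable": "false",
        "hoodie.index.type": "BLOOM",
        # Hive sync enabled
        "hoodie.datasource.hive_sync.enable": "true",
    }

opts = build_hudi_options("my_table")
print(opts["hoodie.index.type"])  # BLOOM
```

Such a dict would typically be passed via `df.write.format("hudi").options(**opts)` in PySpark.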
Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]
danny0405 commented on PR #9937: URL: https://github.com/apache/hudi/pull/9937#issuecomment-1791749503 @waitingF Can you rebase with the latest master to resolve the test failures? And can you verify in your local env that the compaction really works? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]
danny0405 commented on code in PR #9936: URL: https://github.com/apache/hudi/pull/9936#discussion_r1380930087 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestSchemaEvolution.java: ## @@ -480,16 +480,16 @@ private ExpectedResult(String[] evolvedRows, String[] rowsWithMeta, String[] row "+I[Alice, 9.9, unknown, +I[9, 9, s9, 99, t9, drop_add9], {Alice=.99}, [.0, .0], +I[9, 9], [9], {k9=v9}]", }, new String[] { - "+I[uuid:id0, Indica, null, 12, null, {Indica=1212.0}, [12.0], null, null, null]", - "+I[uuid:id1, Danny, 1.1, 23, +I[1, 1, s1, 11, t1, drop_add1], {Danny=2323.23}, [23.0, 23.0, 23.0], +I[1, 1], [1], {k1=v1}]", - "+I[uuid:id2, Stephen, null, 33, +I[2, null, s2, 2, null, null], {Stephen=.0}, [33.0], null, null, null]", - "+I[uuid:id3, Julian, 3.3, 53, +I[3, 3, s3, 33, t3, drop_add3], {Julian=5353.53}, [53.0], +I[3, 3], [3], {k3=v3}]", - "+I[uuid:id4, Fabian, null, 31, +I[4, null, s4, 4, null, null], {Fabian=3131.0}, [31.0], null, null, null]", - "+I[uuid:id5, Sophia, null, 18, +I[5, null, s5, 5, null, null], {Sophia=1818.0}, [18.0, 18.0], null, null, null]", - "+I[uuid:id6, Emma, null, 20, +I[6, null, s6, 6, null, null], {Emma=2020.0}, [20.0], null, null, null]", - "+I[uuid:id7, Bob, null, 44, +I[7, null, s7, 7, null, null], {Bob=.0}, [44.0, 44.0], null, null, null]", Review Comment: ping me again if the PR is ready for reviewing. 
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Flink CDC to HUDI cannot handle rowKind correctly [hudi]
danny0405 commented on issue #9940: URL: https://github.com/apache/hudi/issues/9940#issuecomment-1791746776 Can you turn off the sink materializer? See the doc here for how to operate: https://www.yuque.com/yuzhao-my9fz/kb/hzosbb? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]
hudi-bot commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791746415 ## CI report: * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7028) Fix Spark Quick Start
[ https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-7028: -- Description: Fix the bugs for Spark quick start when turning on file group reader and positional merging flag. List some issues found so far: # When no positions are stored in the header, the read query failed. Ideal behavior: use key-based merging instead of failing. # was:Fix the bugs for Spark quick start when turning on file group reader and positional merging flag. > Fix Spark Quick Start > - > > Key: HUDI-7028 > URL: https://issues.apache.org/jira/browse/HUDI-7028 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lin Liu >Priority: Major > Fix For: 1.0.0 > > > Fix the bugs for Spark quick start when turning on file group reader and > positional merging flag. > > List some issues found so far: > # When no positions are stored in the header, the read query failed. Ideal > behavior: use key-based merging instead of failing. > # -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]
danny0405 commented on code in PR #9946: URL: https://github.com/apache/hudi/pull/9946#discussion_r1380926282
## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java:
## @@ -75,7 +78,12 @@ private HiveSyncContext(Properties props, HiveConf hiveConf) {
   public HiveSyncTool hiveSyncTool() {
     HiveSyncMode syncMode = HiveSyncMode.of(props.getProperty(HIVE_SYNC_MODE.key()));
     if (syncMode == HiveSyncMode.GLUE) {
-      return new AwsGlueCatalogSyncTool(props, hiveConf);
+      if (ReflectionUtils.hasConstructor(AWS_GLUE_CATALOG_SYNC_TOOL_CLASS,
+          new Class[] {Properties.class, org.apache.hadoop.conf.Configuration.class})) {
Review Comment: Do we need the if check? We cannot fall back to the Hive sync tool if the user expects GLUE. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-7028) Fix Spark Quick Start
Lin Liu created HUDI-7028: - Summary: Fix Spark Quick Start Key: HUDI-7028 URL: https://issues.apache.org/jira/browse/HUDI-7028 Project: Apache Hudi Issue Type: Bug Reporter: Lin Liu Fix For: 1.0.0 Fix the bugs for Spark quick start when turning on file group reader and positional merging flag. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-7011) a metric to indicate whether rollback has occurred in final compaction state
[ https://issues.apache.org/jira/browse/HUDI-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-7011. Resolution: Fixed Fixed via master branch: 9599d0f6b3766261753865bb796d124b27479642 > a metric to indicate whether rollback has occurred in final compaction state > -- > > Key: HUDI-7011 > URL: https://issues.apache.org/jira/browse/HUDI-7011 > Project: Apache Hudi > Issue Type: Improvement >Reporter: jack Lei >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > > Add a metric to indicate whether rollback has occurred in the final compaction > state, to warn people to check the Flink job. > Currently, when a Flink job starts async compaction on a MOR table, the metrics > in org.apache.hudi.metrics.FlinkCompactionMetrics > are updated, including pendingCompactionCount, compactionDelay, and compactionCost. > However, when a compaction fails, a metric is needed to > tell the user whether the final compaction for a specific instant has been > rolled back. > So we attempt to add a metric named compactionFailedState in > org.apache.hudi.sink.compact.CompactionCommitSink to record the instant where > rollback happened, which also means the compaction failed at that > time -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7011) a metric to indicate whether rollback has occurred in final compaction state
[ https://issues.apache.org/jira/browse/HUDI-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7011: - Fix Version/s: 1.0.0 > a metric to indicate whether rollback has occurred in final compaction state > -- > > Key: HUDI-7011 > URL: https://issues.apache.org/jira/browse/HUDI-7011 > Project: Apache Hudi > Issue Type: Improvement >Reporter: jack Lei >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > > Add a metric to indicate whether rollback has occurred in the final compaction > state, to warn people to check the Flink job. > Currently, when a Flink job starts async compaction on a MOR table, the metrics > in org.apache.hudi.metrics.FlinkCompactionMetrics > are updated, including pendingCompactionCount, compactionDelay, and compactionCost. > However, when a compaction fails, a metric is needed to > tell the user whether the final compaction for a specific instant has been > rolled back. > So we attempt to add a metric named compactionFailedState in > org.apache.hudi.sink.compact.CompactionCommitSink to record the instant where > rollback happened, which also means the compaction failed at that > time -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]
danny0405 merged PR #9956: URL: https://github.com/apache/hudi/pull/9956 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]
CTTY commented on code in PR #9717: URL: https://github.com/apache/hudi/pull/9717#discussion_r1380918121 ## hudi-spark-datasource/hudi-spark/pom.xml: ## @@ -245,6 +245,12 @@ org.apache.parquet parquet-avro + + org.apache.parquet + parquet-hadoop-bundle + ${parquet.version} + provided + Review Comment: Added parquet-hadoop-bundle to fix classpath issues ``` java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.execution.datasources.parquet.ParquetOptions$ at org.apache.spark.sql.execution.datasources.parquet.ParquetOptions.<init>(ParquetOptions.scala:50) at org.apache.spark.sql.execution.datasources.parquet.ParquetOptions.<init>(ParquetOptions.scala:40) at org.apache.spark.sql.execution.datasources.parquet.Spark34LegacyHoodieParquetFileFormat.buildReaderWithPartitionValues(Spark34LegacyHoodieParquetFileFormat.scala:150) ``` ## hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/BaseValidateDatasetNode.java: ## @@ -244,10 +239,6 @@ private Dataset getInputDf(ExecutionContext context, SparkSession session, } private ExpressionEncoder getEncoder(StructType schema) { -List attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream() -.map(Attribute::toAttribute).collect(Collectors.toList()); -return RowEncoder.apply(schema) - .resolveAndBind(JavaConverters.asScalaBufferConverter(attributes).asScala().toSeq(), -SimpleAnalyzer$.MODULE$); +return SparkAdapterSupport$.MODULE$.sparkAdapter().getEncoder(schema); Review Comment: [SPARK-44531](https://github.com/apache/spark/pull/42134) Encoder inference moved elsewhere in Spark 3.5.0 ## hudi-spark-datasource/hudi-spark3.5.x/src/main/scala/org/apache/spark/sql/parser/HoodieSpark3_5ExtendedSqlAstBuilder.scala: ## @@ -0,0 +1,3496 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.parser + +import org.antlr.v4.runtime.tree.{ParseTree, RuleNode, TerminalNode} +import org.antlr.v4.runtime.{ParserRuleContext, Token} +import org.apache.hudi.spark.sql.parser.HoodieSqlBaseParser._ +import org.apache.hudi.spark.sql.parser.{HoodieSqlBaseBaseVisitor, HoodieSqlBaseParser} +import org.apache.spark.internal.Logging +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.analysis._ +import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogStorageFormat} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.aggregate.{First, Last} +import org.apache.spark.sql.catalyst.parser.ParserUtils.{checkDuplicateClauses, checkDuplicateKeys, entry, escapedIdentifier, operationNotAllowed, source, string, stringWithoutUnescape, validate, withOrigin} +import org.apache.spark.sql.catalyst.parser.{EnhancedLogicalPlan, ParseException, ParserInterface} Review Comment: [SPARK-44333](https://github.com/apache/spark/pull/41890), EnhancedLogicalPlan moved to a different package ## hudi-spark-datasource/hudi-spark3.5.x/src/main/scala/org/apache/spark/sql/HoodieSpark35CatalystExpressionUtils.scala: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.sql.HoodieSparkTypeUtils.isCastPreservingOrdering +import org.apache.spark.sql.catalyst.expressi
[jira] [Closed] (HUDI-6969) Add speed limit for stream read
[ https://issues.apache.org/jira/browse/HUDI-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6969. Resolution: Fixed Fixed via master branch: 1bb1fd1dd60c0635df0827c986b958955c2de682 > Add speed limit for stream read > --- > > Key: HUDI-6969 > URL: https://issues.apache.org/jira/browse/HUDI-6969 > Project: Apache Hudi > Issue Type: Improvement > Components: reader-core >Reporter: zhuanshenbsj1 >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > >
Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]
danny0405 commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791740493 @hudi-bot run azure
[jira] [Updated] (HUDI-6969) Add speed limit for stream read
[ https://issues.apache.org/jira/browse/HUDI-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6969: - Fix Version/s: 1.0.0 > Add speed limit for stream read > --- > > Key: HUDI-6969 > URL: https://issues.apache.org/jira/browse/HUDI-6969 > Project: Apache Hudi > Issue Type: Improvement > Components: reader-core >Reporter: zhuanshenbsj1 >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > >
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
danny0405 merged PR #9904: URL: https://github.com/apache/hudi/pull/9904
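The speed limit merged above (HUDI-6969) is applied through the Flink reader options. A minimal usage sketch follows; the option key `read.rate.limit` and its semantics (records per second, `0` meaning unlimited) are assumptions inferred from the issue title, not confirmed by this thread:

```sql
-- Hypothetical: cap a Hudi streaming read at 10,000 records/sec.
-- 'read.rate.limit' is assumed to be the option key added by HUDI-6969.
SELECT * FROM hudi_tbl
/*+ OPTIONS('read.streaming.enabled' = 'true', 'read.rate.limit' = '10000') */;
```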
Re: [I] [SUPPORT] Simple Bucket Index - discrepancy between Spark and Flink [hudi]
danny0405 commented on issue #9971: URL: https://github.com/apache/hudi/issues/9971#issuecomment-1791736105 Try setting `index.type` to `BUCKET` instead.
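A minimal Flink SQL sketch of the suggestion above. The table name, path, and the bucket-count option key `hoodie.bucket.index.num.buckets` are illustrative assumptions; the essential setting is `'index.type' = 'BUCKET'`, and the bucketing configuration must match whatever the Spark writer uses so both engines route records to the same buckets:

```sql
CREATE TABLE hudi_tbl (
  uuid STRING PRIMARY KEY NOT ENFORCED,
  name STRING,
  ts   TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_tbl',          -- hypothetical path
  'table.type' = 'MERGE_ON_READ',
  'index.type' = 'BUCKET',                  -- simple bucket index instead of the default Flink state index
  'hoodie.bucket.index.num.buckets' = '4'   -- must equal the bucket count used on the Spark side
);
```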
Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]
CTTY commented on code in PR #9717: URL: https://github.com/apache/hudi/pull/9717#discussion_r1380917088 ## hudi-common/src/test/java/org/apache/hudi/common/util/TestClusteringUtils.java: ## @@ -107,6 +108,7 @@ public void testClusteringPlanMultipleInstants() throws Exception { // replacecommit.inflight doesn't have clustering plan. // Verify that getClusteringPlan fetches content from corresponding requested file. + @Disabled("Will fail due to avro issue AVRO-3789. This is fixed in avro 1.11.3") Review Comment: Avro 1.11.2 can't compare empty map types, so the test is disabled until the Avro upgrade.
Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]
CTTY commented on code in PR #9717: URL: https://github.com/apache/hudi/pull/9717#discussion_r1380916039 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/DataFrameUtil.scala: ## @@ -31,7 +33,7 @@ object DataFrameUtil { */ def createFromInternalRows(sparkSession: SparkSession, schema: StructType, rdd: RDD[InternalRow]): DataFrame = { -val logicalPlan = LogicalRDD(schema.toAttributes, rdd)(sparkSession) +val logicalPlan = LogicalRDD(SparkAdapterSupport.sparkAdapter.toAttributes(schema), rdd)(sparkSession) Review Comment: StructType.toAttributes was removed in Spark 3.5.0 by [SPARK-44353](https://github.com/apache/spark/pull/41925). The solution is to switch to DataTypeUtils.toAttributes.
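The version-adapter indirection in the diff above can be sketched as follows. This is a hypothetical illustration of the pattern, not Hudi's actual `SparkAdapter` interface, and it assumes Spark 3.5's `org.apache.spark.sql.catalyst.types.DataTypeUtils.toAttributes` signature:

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.types.DataTypeUtils
import org.apache.spark.sql.types.StructType

// Hypothetical adapter trait: one implementation per supported Spark
// version, so shared code never calls APIs that moved or were removed.
trait SparkVersionAdapter {
  def toAttributes(schema: StructType): Seq[AttributeReference]
}

// Spark 3.5.x: StructType.toAttributes was removed (SPARK-44353);
// the equivalent helper now lives in DataTypeUtils.
class Spark35Adapter extends SparkVersionAdapter {
  override def toAttributes(schema: StructType): Seq[AttributeReference] =
    DataTypeUtils.toAttributes(schema)
}
```

The design choice here is that callers depend only on the trait, and the concrete adapter is picked at runtime from the detected Spark version, keeping version-specific code confined to one module per Spark release.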
Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]
CTTY commented on code in PR #9717: URL: https://github.com/apache/hudi/pull/9717#discussion_r1380915008 ## .github/workflows/bot.yml: ## @@ -284,29 +294,33 @@ jobs: matrix: include: - flinkProfile: 'flink1.17' -sparkProfile: 'spark3.4' -sparkRuntime: 'spark3.4.0' - - flinkProfile: 'flink1.17' -sparkProfile: 'spark3.3' -sparkRuntime: 'spark3.3.2' - - flinkProfile: 'flink1.16' -sparkProfile: 'spark3.3' -sparkRuntime: 'spark3.3.2' - - flinkProfile: 'flink1.15' -sparkProfile: 'spark3.3' -sparkRuntime: 'spark3.3.1' - - flinkProfile: 'flink1.14' -sparkProfile: 'spark3.2' -sparkRuntime: 'spark3.2.3' - - flinkProfile: 'flink1.13' -sparkProfile: 'spark3.1' -sparkRuntime: 'spark3.1.3' - - flinkProfile: 'flink1.14' -sparkProfile: 'spark3.0' -sparkRuntime: 'spark3.0.2' - - flinkProfile: 'flink1.13' -sparkProfile: 'spark2.4' -sparkRuntime: 'spark2.4.8' +sparkProfile: 'spark3.5' +sparkRuntime: 'spark3.5.0' +# - flinkProfile: 'flink1.17' +#sparkProfile: 'spark3.4' +#sparkRuntime: 'spark3.4.0' +# - flinkProfile: 'flink1.17' +#sparkProfile: 'spark3.3' +#sparkRuntime: 'spark3.3.2' +# - flinkProfile: 'flink1.16' +#sparkProfile: 'spark3.3' +#sparkRuntime: 'spark3.3.2' +# - flinkProfile: 'flink1.15' +#sparkProfile: 'spark3.3' +#sparkRuntime: 'spark3.3.1' +# - flinkProfile: 'flink1.14' +#sparkProfile: 'spark3.2' +#sparkRuntime: 'spark3.2.3' +# - flinkProfile: 'flink1.13' +#sparkProfile: 'spark3.1' +#sparkRuntime: 'spark3.1.3' +# - flinkProfile: 'flink1.14' +#sparkProfile: 'spark3.0' +#sparkRuntime: 'spark3.0.2' +# - flinkProfile: 'flink1.13' +#sparkProfile: 'spark2.4' +#sparkRuntime: 'spark2.4.8' + Review Comment: Using my personal docker image to test Spark 3.5 specifically, will revert
Re: [PR] [HUDI-1623][FOLLOW_UP] Fix test TestWaitBasedTimeGenerator & refine codes [hudi]
danny0405 merged PR #9972: URL: https://github.com/apache/hudi/pull/9972
(hudi) branch master updated: [HUDI-1623][Tests] Fix test TestWaitBasedTimeGenerator (#9972)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new eeddac702ed [HUDI-1623][Tests] Fix test TestWaitBasedTimeGenerator (#9972) eeddac702ed is described below commit eeddac702ed3a97d6b08a699c506a1898de4af16 Author: Rex(Hui) An AuthorDate: Fri Nov 3 08:18:44 2023 +0800 [HUDI-1623][Tests] Fix test TestWaitBasedTimeGenerator (#9972) --- .../java/org/apache/hudi/config/DynamoDbBasedLockConfig.java | 2 +- .../main/java/org/apache/hudi/config/HoodieLockConfig.java | 3 ++- .../hudi/client/transaction/lock/InProcessLockProvider.java | 2 +- .../org/apache/hudi/common/config/LockConfiguration.java | 4 +--- .../org/apache/hudi/common/table/timeline/TimeGenerator.java | 8 .../apache/hudi/common/table/timeline/TimeGeneratorBase.java | 2 +- .../hudi/common/table/timeline/WaitBasedTimeGenerator.java | 12 ++-- .../common/table/timeline/TestWaitBasedTimeGenerator.java| 6 -- 8 files changed, 20 insertions(+), 19 deletions(-) diff --git a/hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java b/hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java index 5639db02582..0e884a6797f 100644 --- a/hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java +++ b/hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java @@ -127,7 +127,7 @@ public class DynamoDbBasedLockConfig extends HoodieConfig { public static final ConfigProperty LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY = ConfigProperty .key(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY) - .defaultValue(LockConfiguration.DEFAULT_ACQUIRE_LOCK_WAIT_TIMEOUT_MS) + .defaultValue(LockConfiguration.DEFAULT_LOCK_ACQUIRE_WAIT_TIMEOUT_MS) .markAdvanced() .sinceVersion("0.10.0") .withDocumentation("Lock Acquire Wait Timeout in milliseconds"); diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java index b24aecf46c1..fa38da8f8ab 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java @@ -36,6 +36,7 @@ import java.util.Properties; import static org.apache.hudi.common.config.LockConfiguration.DEFAULT_LOCK_ACQUIRE_NUM_RETRIES; import static org.apache.hudi.common.config.LockConfiguration.DEFAULT_LOCK_ACQUIRE_RETRY_WAIT_TIME_IN_MILLIS; +import static org.apache.hudi.common.config.LockConfiguration.DEFAULT_LOCK_ACQUIRE_WAIT_TIMEOUT_MS; import static org.apache.hudi.common.config.LockConfiguration.DEFAULT_ZK_CONNECTION_TIMEOUT_MS; import static org.apache.hudi.common.config.LockConfiguration.DEFAULT_ZK_SESSION_TIMEOUT_MS; import static org.apache.hudi.common.config.LockConfiguration.FILESYSTEM_LOCK_EXPIRE_PROP_KEY; @@ -106,7 +107,7 @@ public class HoodieLockConfig extends HoodieConfig { public static final ConfigProperty LOCK_ACQUIRE_WAIT_TIMEOUT_MS = ConfigProperty .key(LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY) - .defaultValue(60 * 1000) + .defaultValue(DEFAULT_LOCK_ACQUIRE_WAIT_TIMEOUT_MS) .markAdvanced() .sinceVersion("0.8.0") .withDocumentation("Timeout in ms, to wait on an individual lock acquire() call, at the lock provider."); diff --git a/hudi-common/src/main/java/org/apache/hudi/client/transaction/lock/InProcessLockProvider.java b/hudi-common/src/main/java/org/apache/hudi/client/transaction/lock/InProcessLockProvider.java index c3437f91c8c..c2edb1864b0 100644 --- a/hudi-common/src/main/java/org/apache/hudi/client/transaction/lock/InProcessLockProvider.java +++ b/hudi-common/src/main/java/org/apache/hudi/client/transaction/lock/InProcessLockProvider.java @@ -61,7 +61,7 @@ public class InProcessLockProvider implements LockProvider new 
ReentrantReadWriteLock()); maxWaitTimeMillis = typedProperties.getLong(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY, -LockConfiguration.DEFAULT_ACQUIRE_LOCK_WAIT_TIMEOUT_MS); +LockConfiguration.DEFAULT_LOCK_ACQUIRE_WAIT_TIMEOUT_MS); } @Override diff --git a/hudi-common/src/main/java/org/apache/hudi/common/config/LockConfiguration.java b/hudi-common/src/main/java/org/apache/hudi/common/config/LockConfiguration.java index 9e652c64efe..1171dcf3fce 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/config/LockConfiguration.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/config/LockConfiguration.java @@ -43,7 +43,7 @@ public class LockConfiguration implements Serializable { public static final String LOCK_ACQUIRE_CLIENT_NUM_RETRIES_PROP_KEY = LOCK_PREFIX + "client.num_retrie
Re: [PR] [HUDI-5210] Implement functional indexes [hudi]
hudi-bot commented on PR #9872: URL: https://github.com/apache/hudi/pull/9872#issuecomment-1791705100 ## CI report: * 0d2dace457162b24edaabe2c83b8d6d0c310050a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20643) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-5210] Implement functional indexes [hudi]
hudi-bot commented on PR #9872: URL: https://github.com/apache/hudi/pull/9872#issuecomment-1791439149 ## CI report: * d2eced526259327f5abfb8ac92d8b37b7a4b12c2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20640) * 0d2dace457162b24edaabe2c83b8d6d0c310050a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20643) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build