[jira] [Comment Edited] (HUDI-7024) Null Pointer Exception for a flink streaming pipeline for Consistent Hashing

2023-11-02 Thread Jing Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782432#comment-17782432
 ] 

Jing Zhang edited comment on HUDI-7024 at 11/3/23 6:58 AM:
---

[~adityagoenka]  Could you please provide more information? For example, the 
exception stack, logs, Flink version, Hudi version, Flink job scripts, and 
Spark clustering jobs?


was (Author: qingru zhang):
[~adityagoenka]  Could you please provide more information? for example, the 
exception stack, logs, flink version and hudi version?

> Null Pointer Exception for a flink streaming pipeline for Consistent Hashing
> 
>
> Key: HUDI-7024
> URL: https://issues.apache.org/jira/browse/HUDI-7024
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Aditya Goenka
>Priority: Critical
> Fix For: 0.14.1
>
>
> When we run an offline clustering job with HoodieClusteringJob for a table 
> with consistent hashing enabled, the Flink pipeline fails with a 
> NullPointerException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [I] [SUPPORT] Data loss in MOR table after clustering partition [hudi]

2023-11-02 Thread via GitHub


ad1happy2go commented on issue #9977:
URL: https://github.com/apache/hudi/issues/9977#issuecomment-1791953907

   @mzheng-plaid Thanks for raising this. A couple of things we can check to 
triage this - 
   
   1. Check the Spark UI and stages for any stage/task failures and retries.
   2. Try the SIMPLE index instead of BLOOM and see whether the data loss 
still occurs. This tells us whether the issue is BLOOM-index related.
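
   The second triage step amounts to a writer-config change. 
`hoodie.index.type` is the standard Hudi option for selecting the index type; 
the surrounding job settings are assumptions, so treat this as a sketch:
   
   ```
   # Hypothetical writer options for the triage run in step 2 --
   # switch the index type from BLOOM to SIMPLE and re-run the ingestion:
   hoodie.index.type=SIMPLE
   ```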


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-3304] Add support for selective partial update [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9979:
URL: https://github.com/apache/hudi/pull/9979#issuecomment-1791950262

   
   ## CI report:
   
   * b9e26b3d425f88f0599283a0e834e4581a8b1b64 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7009] Filtering out null values from avro kafka source [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9955:
URL: https://github.com/apache/hudi/pull/9955#issuecomment-1791950173

   
   ## CI report:
   
   * 8809ad5187203de0326cca32a3e59a4b1e1b9ca0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20589)
 
   * 7a24b91b83fef2b8b2bf278a1fafd9d1bb2a7d03 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20657)
 
   * 11a355c59b6c14ce8ba03cfbefcc5b6ab8ca422c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9936:
URL: https://github.com/apache/hudi/pull/9936#issuecomment-1791950055

   
   ## CI report:
   
   * 2b2a290f4f9fe0693d331a331ba8e8fa882761dd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20551)
 
   * 92501c8473c95562c5158daebe08e3787282e6eb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Commented] (HUDI-7024) Null Pointer Exception for a flink streaming pipeline for Consistent Hashing

2023-11-02 Thread Jing Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782432#comment-17782432
 ] 

Jing Zhang commented on HUDI-7024:
--

[~adityagoenka]  Could you please provide more information? For example, the 
exception stack, logs, Flink version, and Hudi version?

> Null Pointer Exception for a flink streaming pipeline for Consistent Hashing
> 
>
> Key: HUDI-7024
> URL: https://issues.apache.org/jira/browse/HUDI-7024
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Aditya Goenka
>Priority: Critical
> Fix For: 0.14.1
>
>
> When we run an offline clustering job with HoodieClusteringJob for a table 
> with consistent hashing enabled, the Flink pipeline fails with a 
> NullPointerException.





Re: [PR] [HUDI-7002] Fixing initializing RLI MDT partition for non-partitioned dataset [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9938:
URL: https://github.com/apache/hudi/pull/9938#issuecomment-1791950105

   
   ## CI report:
   
   * b534ff0015140dc9d338da2da4a1dfb1f6ebac66 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20545)
 
   * 0987e3c8d3a299311d32a9bd1243ce8e8b204419 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9913:
URL: https://github.com/apache/hudi/pull/9913#issuecomment-1791949944

   
   ## CI report:
   
   * 5eb4bf14d826e60c412078762aa061f415bac51d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20630)
 
   * caefe9891b1eda36c04dfe6003b071bb813db7d7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9946:
URL: https://github.com/apache/hudi/pull/9946#issuecomment-1791945036

   
   ## CI report:
   
   * 6ffc26d3efacd14c5cab8574584e276149d29c6b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20639)
 
   * 5daa002dfd75ec233a9ad045ad0c32cfa673a933 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20658)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7009] Filtering out null values from avro kafka source [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9955:
URL: https://github.com/apache/hudi/pull/9955#issuecomment-1791945067

   
   ## CI report:
   
   * 8809ad5187203de0326cca32a3e59a4b1e1b9ca0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20589)
 
   * 7a24b91b83fef2b8b2bf278a1fafd9d1bb2a7d03 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20657)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [I] [SUPPORT]flink-sql write hudi use TIMESTAMP, when hive query, it get time+8h question, use TIMESTAMP_LTZ, the hive schema is bigint but timestamp [hudi]

2023-11-02 Thread via GitHub


GaoYaokun commented on issue #9864:
URL: https://github.com/apache/hudi/issues/9864#issuecomment-1791940320

   I also encountered this issue when I used Flink CDC to write data from MySQL 
to Hudi and sync the table to Hive. The Timestamp(6) field showed up correctly 
as Timestamp in the Hive schema, but when I query it with Hive, an error like 
this is reported:
   
   SQL ERROR: java.io.IOException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to 
org.apache.hadoop.hive.serde2.io.TimestampWritableV2
   
   I did not change the Hive schema in any way; the synced Hive table is a new 
table. How can this problem be solved?
   
   Hudi version: 0.13.1
   Flink version: 1.16.1
   Hive version: 3.1.2
   
   





Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-02 Thread via GitHub


hehuiyuan commented on code in PR #9936:
URL: https://github.com/apache/hudi/pull/9936#discussion_r1381184378


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestSchemaEvolution.java:
##
@@ -480,16 +480,16 @@ private ExpectedResult(String[] evolvedRows, String[] 
rowsWithMeta, String[] row
   "+I[Alice, 9.9, unknown, +I[9, 9, s9, 99, t9, drop_add9], 
{Alice=.99}, [.0, .0], +I[9, 9], [9], {k9=v9}]",
   },
   new String[] {
-  "+I[uuid:id0, Indica, null, 12, null, {Indica=1212.0}, [12.0], null, 
null, null]",
-  "+I[uuid:id1, Danny, 1.1, 23, +I[1, 1, s1, 11, t1, drop_add1], 
{Danny=2323.23}, [23.0, 23.0, 23.0], +I[1, 1], [1], {k1=v1}]",
-  "+I[uuid:id2, Stephen, null, 33, +I[2, null, s2, 2, null, null], 
{Stephen=.0}, [33.0], null, null, null]",
-  "+I[uuid:id3, Julian, 3.3, 53, +I[3, 3, s3, 33, t3, drop_add3], 
{Julian=5353.53}, [53.0], +I[3, 3], [3], {k3=v3}]",
-  "+I[uuid:id4, Fabian, null, 31, +I[4, null, s4, 4, null, null], 
{Fabian=3131.0}, [31.0], null, null, null]",
-  "+I[uuid:id5, Sophia, null, 18, +I[5, null, s5, 5, null, null], 
{Sophia=1818.0}, [18.0, 18.0], null, null, null]",
-  "+I[uuid:id6, Emma, null, 20, +I[6, null, s6, 6, null, null], 
{Emma=2020.0}, [20.0], null, null, null]",
-  "+I[uuid:id7, Bob, null, 44, +I[7, null, s7, 7, null, null], 
{Bob=.0}, [44.0, 44.0], null, null, null]",

Review Comment:
   @danny0405 , done








Re: [PR] [HUDI-3304] Allow selective partial update [hudi]

2023-11-02 Thread via GitHub


CTTY commented on PR #7359:
URL: https://github.com/apache/hudi/pull/7359#issuecomment-1791922139

   Hi, I've cherry-picked this commit and created a new PR to continue the 
work: #9979 





[PR] [HUDI-3304] Add support for selective partial update [hudi]

2023-11-02 Thread via GitHub


CTTY opened a new pull request, #9979:
URL: https://github.com/apache/hudi/pull/9979

   ### Change Logs
   Allow selective partial update in Hudi
   Original PR: #7359 
   
   ### Impact
   
   None
   
   ### Risk level (write none, low medium or high below)
   
   Medium
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7009] Filtering out null values from avro kafka source [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9955:
URL: https://github.com/apache/hudi/pull/9955#issuecomment-1791919103

   
   ## CI report:
   
   * 8809ad5187203de0326cca32a3e59a4b1e1b9ca0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20589)
 
   * 7a24b91b83fef2b8b2bf278a1fafd9d1bb2a7d03 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9946:
URL: https://github.com/apache/hudi/pull/9946#issuecomment-1791919069

   
   ## CI report:
   
   * 6ffc26d3efacd14c5cab8574584e276149d29c6b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20639)
 
   * 5daa002dfd75ec233a9ad045ad0c32cfa673a933 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-7029) Enhance CREATE INDEX syntax for functional index

2023-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7029:
--
Fix Version/s: 1.0.0

> Enhance CREATE INDEX syntax for functional index
> 
>
> Key: HUDI-7029
> URL: https://issues.apache.org/jira/browse/HUDI-7029
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>
> Currently, a user can create an index using SQL as follows: 
> `create index idx_datestr on $tableName using column_stats(ts) 
> options(func='from_unixtime', format='yyyy-MM-dd')`
> Ideally, we would like to simplify this further:
> `create index idx_datestr on $tableName using column_stats(from_unixtime(ts, 
> format='yyyy-MM-dd'))`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-02 Thread via GitHub


PrabhuJoseph commented on code in PR #9946:
URL: https://github.com/apache/hudi/pull/9946#discussion_r1381174885


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java:
##
@@ -75,7 +78,12 @@ private HiveSyncContext(Properties props, HiveConf hiveConf) 
{
   public HiveSyncTool hiveSyncTool() {
 HiveSyncMode syncMode = 
HiveSyncMode.of(props.getProperty(HIVE_SYNC_MODE.key()));
 if (syncMode == HiveSyncMode.GLUE) {
-  return new AwsGlueCatalogSyncTool(props, hiveConf);
+  if (ReflectionUtils.hasConstructor(AWS_GLUE_CATALOG_SYNC_TOOL_CLASS,
+  new Class[] {Properties.class, 
org.apache.hadoop.conf.Configuration.class})) {

Review Comment:
   Thanks for pointing out the unnecessary if condition. I have fixed it in the 
latest commit.
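
   The pattern under discussion, probing for a constructor before choosing a 
code path, can be sketched with plain `java.lang.reflect`. This is an 
illustrative stand-in for Hudi's `ReflectionUtils.hasConstructor`, and the 
probed class names below are placeholders:

   ```java
   import java.util.Properties;

   public class ConstructorCheck {

       // Returns true only if the named class is on the classpath AND exposes
       // a public constructor with exactly the given parameter types.
       static boolean hasConstructor(String className, Class<?>... argTypes) {
           try {
               Class.forName(className).getConstructor(argTypes);
               return true;
           } catch (ClassNotFoundException | NoSuchMethodException e) {
               // A missing class or missing constructor both mean "fall back".
               return false;
           }
       }

       public static void main(String[] args) {
           // java.util.Properties has a no-arg and a Properties(defaults) ctor.
           System.out.println(hasConstructor("java.util.Properties"));
           System.out.println(hasConstructor("java.util.Properties", Properties.class));
           // An absent class reports false instead of throwing.
           System.out.println(hasConstructor("com.example.NoSuchTool"));
       }
   }
   ```

   In the quoted `HiveSyncContext` change, the same kind of probe selects the 
`AwsGlueCatalogSyncTool` constructor taking `(Properties, Configuration)` when 
it is available.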






[jira] [Created] (HUDI-7029) Enhance CREATE INDEX syntax for functional index

2023-11-02 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-7029:
-

 Summary: Enhance CREATE INDEX syntax for functional index
 Key: HUDI-7029
 URL: https://issues.apache.org/jira/browse/HUDI-7029
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit


Currently, a user can create an index using SQL as follows: 

`create index idx_datestr on $tableName using column_stats(ts) 
options(func='from_unixtime', format='yyyy-MM-dd')`

Ideally, we would like to simplify this further:

`create index idx_datestr on $tableName using column_stats(from_unixtime(ts, 
format='yyyy-MM-dd'))`





[jira] [Closed] (HUDI-5219) Support "CREATE INDEX" for index function through Spark SQL

2023-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-5219.
-
Fix Version/s: 1.0.0
   Resolution: Done

> Support "CREATE INDEX" for index function through Spark SQL
> ---
>
> Key: HUDI-5219
> URL: https://issues.apache.org/jira/browse/HUDI-5219
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Commented] (HUDI-5219) Support "CREATE INDEX" for index function through Spark SQL

2023-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782418#comment-17782418
 ] 

Sagar Sumit commented on HUDI-5219:
---

Landed via 
[https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292]

Users can now create an index using SQL: 

`create index idx_datestr on $tableName using column_stats(ts) 
options(func='from_unixtime', format='yyyy-MM-dd')`

> Support "CREATE INDEX" for index function through Spark SQL
> ---
>
> Key: HUDI-5219
> URL: https://issues.apache.org/jira/browse/HUDI-5219
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Closed] (HUDI-5215) Support file pruning based on new index function in Spark

2023-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-5215.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

> Support file pruning based on new index function in Spark
> -
>
> Key: HUDI-5215
> URL: https://issues.apache.org/jira/browse/HUDI-5215
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Commented] (HUDI-5215) Support file pruning based on new index function in Spark

2023-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782417#comment-17782417
 ] 

Sagar Sumit commented on HUDI-5215:
---

`HoodieFileIndex` can now skip files based on a functional index. Landed via 
https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292
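
As an illustration of the file skipping described above (table and column 
names are hypothetical, using the CREATE INDEX syntax from HUDI-5219): once a 
functional index over `from_unixtime(ts)` exists, a filter on the derived 
value can be answered by pruning files through `HoodieFileIndex`:

```sql
-- Hypothetical setup: a functional index created beforehand with
--   create index idx_datestr on hudi_table using column_stats(ts)
--     options(func='from_unixtime', format='yyyy-MM-dd');
-- A filter on the derived value can then prune files via column stats:
select * from hudi_table
where from_unixtime(ts, 'yyyy-MM-dd') = '2023-11-01';
```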

> Support file pruning based on new index function in Spark
> -
>
> Key: HUDI-5215
> URL: https://issues.apache.org/jira/browse/HUDI-5215
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Closed] (HUDI-5214) Add functionality to create new MT partition for index function

2023-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-5214.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

> Add functionality to create new MT partition for index function
> ---
>
> Key: HUDI-5214
> URL: https://issues.apache.org/jira/browse/HUDI-5214
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Commented] (HUDI-5214) Add functionality to create new MT partition for index function

2023-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782416#comment-17782416
 ] 

Sagar Sumit commented on HUDI-5214:
---

Landed as part of 
[https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292]

A functional index can now be created and updated via the metadata writer (as 
of 2023-11-03, supported only for Spark).

> Add functionality to create new MT partition for index function
> ---
>
> Key: HUDI-5214
> URL: https://issues.apache.org/jira/browse/HUDI-5214
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Closed] (HUDI-5213) Support index function for Spark SQL built-in functions

2023-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-5213.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

> Support index function for Spark SQL built-in functions 
> 
>
> Key: HUDI-5213
> URL: https://issues.apache.org/jira/browse/HUDI-5213
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Commented] (HUDI-5213) Support index function for Spark SQL built-in functions

2023-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782414#comment-17782414
 ] 

Sagar Sumit commented on HUDI-5213:
---

Landed as part of 
[https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292]

Some common date/timestamp, string and identity functions are supported.

> Support index function for Spark SQL built-in functions 
> 
>
> Key: HUDI-5213
> URL: https://issues.apache.org/jira/browse/HUDI-5213
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Closed] (HUDI-5212) Store index function in table properties

2023-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-5212.
-
Fix Version/s: 1.0.0
   Resolution: Done

> Store index function in table properties
> 
>
> Key: HUDI-5212
> URL: https://issues.apache.org/jira/browse/HUDI-5212
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Commented] (HUDI-5212) Store index function in table properties

2023-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782413#comment-17782413
 ] 

Sagar Sumit commented on HUDI-5212:
---

Landed as part of 
https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292

> Store index function in table properties
> 
>
> Key: HUDI-5212
> URL: https://issues.apache.org/jira/browse/HUDI-5212
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Comment Edited] (HUDI-5212) Store index function in table properties

2023-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782413#comment-17782413
 ] 

Sagar Sumit edited comment on HUDI-5212 at 11/3/23 5:24 AM:


Landed as part of 
[https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292]

Currently, all index definitions are stored in a separate JSON file whose path 
can be specified by the user; that path is recorded in hoodie.properties.
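
A minimal sketch of the layout this describes; the property name and paths are 
assumptions made for illustration, not taken from this thread:

```
# hoodie.properties (sketch)
hoodie.table.index.defs.path=file:/tmp/hudi_table/.hoodie/.index_defs/index.json

# index.json then holds the index definitions, e.g. each index's name,
# its source column, and the function applied to that column.
```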


was (Author: codope):
Landed as part of 
https://github.com/apache/hudi/commit/332f5d9eaa3b97c3132e995a9b405b9903b00292

> Store index function in table properties
> 
>
> Key: HUDI-5212
> URL: https://issues.apache.org/jira/browse/HUDI-5212
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Closed] (HUDI-5211) Add abstraction to track a function defined on a column

2023-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-5211.
-
Fix Version/s: 1.0.0
   Resolution: Done

> Add abstraction to track a function defined on a column
> ---
>
> Key: HUDI-5211
> URL: https://issues.apache.org/jira/browse/HUDI-5211
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-02 Thread via GitHub


hehuiyuan commented on code in PR #9936:
URL: https://github.com/apache/hudi/pull/9936#discussion_r1381170499


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestSchemaEvolution.java:
##
@@ -480,16 +480,16 @@ private ExpectedResult(String[] evolvedRows, String[] 
rowsWithMeta, String[] row
   "+I[Alice, 9.9, unknown, +I[9, 9, s9, 99, t9, drop_add9], 
{Alice=.99}, [.0, .0], +I[9, 9], [9], {k9=v9}]",
   },
   new String[] {
-  "+I[uuid:id0, Indica, null, 12, null, {Indica=1212.0}, [12.0], null, 
null, null]",
-  "+I[uuid:id1, Danny, 1.1, 23, +I[1, 1, s1, 11, t1, drop_add1], 
{Danny=2323.23}, [23.0, 23.0, 23.0], +I[1, 1], [1], {k1=v1}]",
-  "+I[uuid:id2, Stephen, null, 33, +I[2, null, s2, 2, null, null], 
{Stephen=.0}, [33.0], null, null, null]",
-  "+I[uuid:id3, Julian, 3.3, 53, +I[3, 3, s3, 33, t3, drop_add3], 
{Julian=5353.53}, [53.0], +I[3, 3], [3], {k3=v3}]",
-  "+I[uuid:id4, Fabian, null, 31, +I[4, null, s4, 4, null, null], 
{Fabian=3131.0}, [31.0], null, null, null]",
-  "+I[uuid:id5, Sophia, null, 18, +I[5, null, s5, 5, null, null], 
{Sophia=1818.0}, [18.0, 18.0], null, null, null]",
-  "+I[uuid:id6, Emma, null, 20, +I[6, null, s6, 6, null, null], 
{Emma=2020.0}, [20.0], null, null, null]",
-  "+I[uuid:id7, Bob, null, 44, +I[7, null, s7, 7, null, null], 
{Bob=.0}, [44.0, 44.0], null, null, null]",

Review Comment:
   Hi @danny0405, the primary key field name has been removed from the value; 
there are some other UT failures:
   
   ```
   Error:  Failures: 
   Error:TestWaitBasedTimeGenerator.testSlowerThreadLaterAcquiredLock:143 
expected:  but was: 
   [INFO] 
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-02 Thread via GitHub


hehuiyuan commented on code in PR #9936:
URL: https://github.com/apache/hudi/pull/9936#discussion_r1381144499


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestSchemaEvolution.java:
##
@@ -480,16 +480,16 @@ private ExpectedResult(String[] evolvedRows, String[] 
rowsWithMeta, String[] row
   "+I[Alice, 9.9, unknown, +I[9, 9, s9, 99, t9, drop_add9], 
{Alice=.99}, [.0, .0], +I[9, 9], [9], {k9=v9}]",
   },
   new String[] {
-  "+I[uuid:id0, Indica, null, 12, null, {Indica=1212.0}, [12.0], null, 
null, null]",
-  "+I[uuid:id1, Danny, 1.1, 23, +I[1, 1, s1, 11, t1, drop_add1], 
{Danny=2323.23}, [23.0, 23.0, 23.0], +I[1, 1], [1], {k1=v1}]",
-  "+I[uuid:id2, Stephen, null, 33, +I[2, null, s2, 2, null, null], 
{Stephen=.0}, [33.0], null, null, null]",
-  "+I[uuid:id3, Julian, 3.3, 53, +I[3, 3, s3, 33, t3, drop_add3], 
{Julian=5353.53}, [53.0], +I[3, 3], [3], {k3=v3}]",
-  "+I[uuid:id4, Fabian, null, 31, +I[4, null, s4, 4, null, null], 
{Fabian=3131.0}, [31.0], null, null, null]",
-  "+I[uuid:id5, Sophia, null, 18, +I[5, null, s5, 5, null, null], 
{Sophia=1818.0}, [18.0, 18.0], null, null, null]",
-  "+I[uuid:id6, Emma, null, 20, +I[6, null, s6, 6, null, null], 
{Emma=2020.0}, [20.0], null, null, null]",
-  "+I[uuid:id7, Bob, null, 44, +I[7, null, s7, 7, null, null], 
{Bob=.0}, [44.0, 44.0], null, null, null]",

Review Comment:
   Hi @danny0405, what is this problem? The primary key field name has been removed.
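
   The change under discussion can be sketched with a standalone method (illustrative only — made-up names, not Hudi's actual `ComplexAvroKeyGenerator` code): a composite key keeps the `field:value` pairs, while a single-field key emits just the value, so `uuid:id1` becomes `id1`, which is why the expected strings in the test above changed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class KeyFormatSketch {

    // Illustrative only: mirrors the proposed behavior, not Hudi's real code.
    // Single key field -> value only; composite key -> "name1:v1,name2:v2".
    static String formatRecordKey(Map<String, String> keyParts) {
        if (keyParts.size() == 1) {
            return keyParts.values().iterator().next();
        }
        return keyParts.entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        Map<String, String> single = new LinkedHashMap<>();
        single.put("uuid", "id1");
        System.out.println(formatRecordKey(single));      // prints: id1

        Map<String, String> composite = new LinkedHashMap<>();
        composite.put("uuid", "id1");
        composite.put("name", "Danny");
        System.out.println(formatRecordKey(composite));   // prints: uuid:id1,name:Danny
    }
}
```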



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-02 Thread via GitHub


hehuiyuan commented on PR #9936:
URL: https://github.com/apache/hudi/pull/9936#issuecomment-1791899414

   ```
   [INFO] 
   Error:  Failures: 
   Error:TestWaitBasedTimeGenerator.testSlowerThreadLaterAcquiredLock:143 expected:  but was: 
   [INFO] 
   Error:  Tests run: 1026, Failures: 1, Errors: 0, Skipped: 2
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9937:
URL: https://github.com/apache/hudi/pull/9937#issuecomment-1791886212

   
   ## CI report:
   
   * 392c1a3007e5d562be86a9c0096bbfd53988f5ca Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20641)
 
   * 1d5de86d295233edff138e9bfb8e9151a5b7ecae Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20655)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791886165

   
   ## CI report:
   
   * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610)
 
   * abd9807817eb49458b1f8dd9f9d31157ba2b5a81 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20654)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9937:
URL: https://github.com/apache/hudi/pull/9937#issuecomment-1791882241

   
   ## CI report:
   
   * 392c1a3007e5d562be86a9c0096bbfd53988f5ca Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20641)
 
   * 1d5de86d295233edff138e9bfb8e9151a5b7ecae UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9936:
URL: https://github.com/apache/hudi/pull/9936#issuecomment-1791882215

   
   ## CI report:
   
   * 2b2a290f4f9fe0693d331a331ba8e8fa882761dd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20551)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791882158

   
   ## CI report:
   
   * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610)
 
   * abd9807817eb49458b1f8dd9f9d31157ba2b5a81 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Re-enable a test that got fixed [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9978:
URL: https://github.com/apache/hudi/pull/9978#issuecomment-1791878221

   
   ## CI report:
   
   * 3b3a9f61789da9d0f6ac569e5c2a9b7c7be8961c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20653)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9936:
URL: https://github.com/apache/hudi/pull/9936#issuecomment-1791878123

   
   ## CI report:
   
   * 2b2a290f4f9fe0693d331a331ba8e8fa882761dd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20551)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9923:
URL: https://github.com/apache/hudi/pull/9923#issuecomment-1791878072

   
   ## CI report:
   
   * 2f1b6536c1456fd0211740c90542bf25f53d1010 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20599)
 
   * ff11f10133f07427df3d13df8393362a75004807 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20652)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-02 Thread via GitHub


hehuiyuan commented on code in PR #9936:
URL: https://github.com/apache/hudi/pull/9936#discussion_r1381144499


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestSchemaEvolution.java:
##
@@ -480,16 +480,16 @@ private ExpectedResult(String[] evolvedRows, String[] 
rowsWithMeta, String[] row
   "+I[Alice, 9.9, unknown, +I[9, 9, s9, 99, t9, drop_add9], 
{Alice=.99}, [.0, .0], +I[9, 9], [9], {k9=v9}]",
   },
   new String[] {
-  "+I[uuid:id0, Indica, null, 12, null, {Indica=1212.0}, [12.0], null, 
null, null]",
-  "+I[uuid:id1, Danny, 1.1, 23, +I[1, 1, s1, 11, t1, drop_add1], 
{Danny=2323.23}, [23.0, 23.0, 23.0], +I[1, 1], [1], {k1=v1}]",
-  "+I[uuid:id2, Stephen, null, 33, +I[2, null, s2, 2, null, null], 
{Stephen=.0}, [33.0], null, null, null]",
-  "+I[uuid:id3, Julian, 3.3, 53, +I[3, 3, s3, 33, t3, drop_add3], 
{Julian=5353.53}, [53.0], +I[3, 3], [3], {k3=v3}]",
-  "+I[uuid:id4, Fabian, null, 31, +I[4, null, s4, 4, null, null], 
{Fabian=3131.0}, [31.0], null, null, null]",
-  "+I[uuid:id5, Sophia, null, 18, +I[5, null, s5, 5, null, null], 
{Sophia=1818.0}, [18.0, 18.0], null, null, null]",
-  "+I[uuid:id6, Emma, null, 20, +I[6, null, s6, 6, null, null], 
{Emma=2020.0}, [20.0], null, null, null]",
-  "+I[uuid:id7, Bob, null, 44, +I[7, null, s7, 7, null, null], 
{Bob=.0}, [44.0, 44.0], null, null, null]",

Review Comment:
   Hi @danny0405, what is this problem?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-02 Thread via GitHub


hehuiyuan commented on PR #9936:
URL: https://github.com/apache/hudi/pull/9936#issuecomment-1791871892

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Simple Bucket Index - discrepancy between Spark and Flink [hudi]

2023-11-02 Thread via GitHub


joeytman commented on issue #9971:
URL: https://github.com/apache/hudi/issues/9971#issuecomment-1791871773

   > Try to set up index.type as BUCKET instead.
   
   Thanks for the tip! I'm confused by the results. 
   
   At first glance, using `index.type` seems to work correctly; files are now written with the same naming convention.
   
   But, this log no longer appears:
   ```
   2023-11-01 22:16:11,025 INFO  org.apache.hudi.index.bucket.HoodieBucketIndex 
  [] - Use bucket index, numBuckets = 113, indexFields: [redacted1, 
redacted2]
   ```
   
   So, to be clear:

   * `index.type=BUCKET` actually enables bucket index, but without any logs 
indicating it's working
   * `hoodie.index.type=BUCKET` produces logs that indicate it's working, but 
it doesn't actually do anything
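
   For reference, a minimal Flink SQL sketch of the configuration that worked (the table name, columns, and path are hypothetical, and the options shown are the ones under discussion, not verified against a specific Hudi version):

```sql
-- Hypothetical table; only the WITH options matter here.
CREATE TABLE hudi_bucket_demo (
  redacted1 STRING,
  redacted2 STRING,
  val BIGINT,
  PRIMARY KEY (redacted1, redacted2) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_bucket_demo',
  'table.type' = 'MERGE_ON_READ',
  -- the short Flink option key, not the hoodie.-prefixed one:
  'index.type' = 'BUCKET',
  'hoodie.bucket.index.num.buckets' = '113'
);
```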


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]

2023-11-02 Thread via GitHub


waitingF commented on PR #9937:
URL: https://github.com/apache/hudi/pull/9937#issuecomment-1791862212

   > @waitingF Can you rebase with the latest master to resolve the test 
failures, can you try in your local env that the compaction really works?
   
   Sure.
   
   I tested in my local env; all good.
   Attached is my test log:
   
[local-hudi-cli-table-change-command-verify.txt](https://github.com/apache/hudi/files/13246562/local-hudi-cli-table-change-command-verify.txt)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Re-enable a test that got fixed [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9978:
URL: https://github.com/apache/hudi/pull/9978#issuecomment-1791855627

   
   ## CI report:
   
   * 3b3a9f61789da9d0f6ac569e5c2a9b7c7be8961c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7022] RunClusteringProcedure support limit parameter [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9975:
URL: https://github.com/apache/hudi/pull/9975#issuecomment-1791855608

   
   ## CI report:
   
   * 30a00f1575934104714817b6b9243f3866f277d1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20637)
 
   * de92e35a38f3a42b425063cbd48f4cf2fb56f3e1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20649)
 
   * 736f0a04fe805294a0d1722a62ad327636b86a5b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20651)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9923:
URL: https://github.com/apache/hudi/pull/9923#issuecomment-1791855498

   
   ## CI report:
   
   * 2f1b6536c1456fd0211740c90542bf25f53d1010 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20599)
 
   * ff11f10133f07427df3d13df8393362a75004807 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7022] RunClusteringProcedure support limit parameter [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9975:
URL: https://github.com/apache/hudi/pull/9975#issuecomment-1791851892

   
   ## CI report:
   
   * 30a00f1575934104714817b6b9243f3866f277d1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20637)
 
   * de92e35a38f3a42b425063cbd48f4cf2fb56f3e1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20649)
 
   * 736f0a04fe805294a0d1722a62ad327636b86a5b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7012] The BootstrapOperator reduces the memory. [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9959:
URL: https://github.com/apache/hudi/pull/9959#issuecomment-1791851848

   
   ## CI report:
   
   * fd974dfa66aa2873ec0491212070db6845dd7877 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20603)
 
   * 608a35a71faf69830fde7796babb12c0c327cfe0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20650)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [MINOR] Re-enable a test that got fixed [hudi]

2023-11-02 Thread via GitHub


codope opened a new pull request, #9978:
URL: https://github.com/apache/hudi/pull/9978

   ### Change Logs
   
   `testSlowerThreadLaterAcquiredLock` was disabled and later got fixed by #9972. This PR simply re-enables it.
   
   ### Impact
   
   none - test change
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]

2023-11-02 Thread via GitHub


zhuanshenbsj1 commented on PR #9923:
URL: https://github.com/apache/hudi/pull/9923#issuecomment-1791846370

   Resolved conflicts and rebased.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-02 Thread via GitHub


danny0405 commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791838790

   You can rebase with the latest master to re-trigger it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7022] RunClusteringProcedure support limit parameter [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9975:
URL: https://github.com/apache/hudi/pull/9975#issuecomment-1791830718

   
   ## CI report:
   
   * 30a00f1575934104714817b6b9243f3866f277d1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20637)
 
   * de92e35a38f3a42b425063cbd48f4cf2fb56f3e1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20649)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7012] The BootstrapOperator reduces the memory. [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9959:
URL: https://github.com/apache/hudi/pull/9959#issuecomment-1791830679

   
   ## CI report:
   
   * fd974dfa66aa2873ec0491212070db6845dd7877 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20603)
 
   * 608a35a71faf69830fde7796babb12c0c327cfe0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7022] RunClusteringProcedure support limit parameter [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9975:
URL: https://github.com/apache/hudi/pull/9975#issuecomment-1791826407

   
   ## CI report:
   
   * 30a00f1575934104714817b6b9243f3866f277d1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20637)
 
   * de92e35a38f3a42b425063cbd48f4cf2fb56f3e1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-5210) End-to-end PoC of functional indexes

2023-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-5210.
-
Resolution: Done

> End-to-end PoC of functional indexes
> 
>
> Key: HUDI-5210
> URL: https://issues.apache.org/jira/browse/HUDI-5210
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5210) End-to-end PoC of functional indexes

2023-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-5210:
--
Status: Patch Available  (was: In Progress)

> End-to-end PoC of functional indexes
> 
>
> Key: HUDI-5210
> URL: https://issues.apache.org/jira/browse/HUDI-5210
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-02 Thread via GitHub


ksmou commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791784841

   Azure seems to be having some problems.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5210] Implement functional indexes [hudi]

2023-11-02 Thread via GitHub


yihua merged PR #9872:
URL: https://github.com/apache/hudi/pull/9872


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5210] Implement functional indexes [hudi]

2023-11-02 Thread via GitHub


yihua commented on code in PR #9872:
URL: https://github.com/apache/hudi/pull/9872#discussion_r1380966712


##
hudi-common/src/main/java/org/apache/hudi/common/config/ConfigGroups.java:
##
@@ -40,7 +40,8 @@ public enum Names {
 RECORD_PAYLOAD("Record Payload Config"),
 KAFKA_CONNECT("Kafka Connect Configs"),
 AWS("Amazon Web Services Configs"),
-HUDI_STREAMER("Hudi Streamer Configs");
+HUDI_STREAMER("Hudi Streamer Configs"),
+INDEXING("Indexing Configs");

Review Comment:
   In that case, let's remove the `INDEX` subgroup or rename it to something different in the follow-up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1791778235

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * b9c76842e4cdc5a6db43109dafa115109d287584 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20646)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Assigned] (HUDI-7028) Fix Spark Quick Start

2023-11-02 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu reassigned HUDI-7028:
-

Assignee: Lin Liu

> Fix Spark Quick Start
> -
>
> Key: HUDI-7028
> URL: https://issues.apache.org/jira/browse/HUDI-7028
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Fix the bugs for Spark quick start when turning on file group reader and 
> positional merging flag.
>  
> List some issues found so far:
>  # [compatibility] When no positions are stored in the header, the read query 
> fails. Ideal behavior: use key-based merging instead of failing.
>  # [compatibility] When a parquet file contains Avro records, the file group 
> reader of the Spark job will check whether the payload is the expected type; 
> otherwise, it will throw.
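The "ideal behavior" named in item 1 — falling back to key-based merging when a log block carries no record positions — can be modeled with a small sketch. This is illustrative only (plain Python, with a hypothetical `merge_file_group` helper), not Hudi's actual file group reader:

```python
# Illustrative model only -- NOT Hudi's file group reader implementation.
# Shows the desired fallback: merge log records into the base file by row
# position when positions were written, otherwise by record key.

def merge_file_group(base_records, log_records, positions=None):
    merged = list(base_records)
    if positions is not None:
        # Position-based merging: each log record overwrites the base row
        # at the position recorded in the log block header.
        for pos, rec in zip(positions, log_records):
            merged[pos] = rec
    else:
        # Key-based fallback: instead of failing when no positions exist,
        # locate each base row by its record key.
        index = {r["key"]: i for i, r in enumerate(merged)}
        for rec in log_records:
            if rec["key"] in index:
                merged[index[rec["key"]]] = rec
            else:
                merged.append(rec)  # unseen record key: treat as insert
    return merged
```

Either path yields the same merged view; the key-based fallback just trades the O(1) positional lookup for a key index built over the base records.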



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7012] The BootstrapOperator reduces the memory. [hudi]

2023-11-02 Thread via GitHub


danny0405 commented on PR #9959:
URL: https://github.com/apache/hudi/pull/9959#issuecomment-1791774741

   @cuibo01 you can rebase with the latest master to resolve the test failures.





Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1791773116

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * b544b18820ae3fe8fbf1c50a34e561ad36bfbaba Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20624)
 
   * b9c76842e4cdc5a6db43109dafa115109d287584 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-7028) Fix Spark Quick Start

2023-11-02 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7028:
--
Description: 
Fix the bugs for Spark quick start when turning on file group reader and 
positional merging flag.

 

List some issues found so far:
 # [compatibility] When no positions are stored in the header, the read query 
fails. Ideal behavior: use key-based merging instead of failing.
 # [compatibility] When a parquet file contains Avro records, the file group 
reader of the Spark job will check whether the payload is the expected type; otherwise, 
it will throw.

  was:
Fix the bugs for Spark quick start when turning on file group reader and 
positional merging flag.

 

List some issues found so far:
 # [compatibility] When no positions are stored in the header, the read query 
fails. Ideal behavior: use key-based merging instead of failing.
 # [compatibility] When a parquet file contains Avro records, the file group 
reader of the Spark job will check whether the payload is the expected type; otherwise, 
it will throw.
 #  


> Fix Spark Quick Start
> -
>
> Key: HUDI-7028
> URL: https://issues.apache.org/jira/browse/HUDI-7028
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Fix the bugs for Spark quick start when turning on file group reader and 
> positional merging flag.
>  
> List some issues found so far:
>  # [compatibility] When no positions are stored in the header, the read query 
> fails. Ideal behavior: use key-based merging instead of failing.
>  # [compatibility] When a parquet file contains Avro records, the file group 
> reader of the Spark job will check whether the payload is the expected type; 
> otherwise, it will throw.





Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791773350

   
   ## CI report:
   
   * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-5210] Implement functional indexes [hudi]

2023-11-02 Thread via GitHub


yihua commented on code in PR #9872:
URL: https://github.com/apache/hudi/pull/9872#discussion_r1380906029


##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieFunctionalIndexConfig.java:
##
@@ -0,0 +1,319 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.config;
+
+import org.apache.hudi.common.util.BinaryUtil;
+import org.apache.hudi.common.util.ConfigUtils;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.index.secondary.SecondaryIndexType;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.concurrent.Immutable;
+
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.time.Instant;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Set;
+import java.util.function.BiConsumer;
+
+import static org.apache.hudi.common.util.ConfigUtils.fetchConfigs;
+import static org.apache.hudi.common.util.ConfigUtils.recoverIfNeeded;
+import static org.apache.hudi.common.util.StringUtils.getUTF8Bytes;
+
+@Immutable
+@ConfigClassProperty(name = "Common Index Configs",
+groupName = ConfigGroups.Names.INDEXING,
+subGroupName = ConfigGroups.SubGroupNames.FUNCTIONAL_INDEX,
+areCommonConfigs = true,
+description = "")
+public class HoodieFunctionalIndexConfig extends HoodieConfig {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(HoodieFunctionalIndexConfig.class);
+
+  public static final String INDEX_DEFINITION_FILE = "index.properties";
+  public static final String INDEX_DEFINITION_FILE_BACKUP = 
"index.properties.backup";
+  public static final ConfigProperty INDEX_NAME = ConfigProperty

Review Comment:
   Got it.



##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieFunctionalIndexConfig.java:
##
@@ -0,0 +1,319 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.config;
+
+import org.apache.hudi.common.util.BinaryUtil;
+import org.apache.hudi.common.util.ConfigUtils;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.index.secondary.SecondaryIndexType;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.concurrent.Immutable;
+
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.time.Instant;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Set;
+import java.util.function.BiConsumer;
+
+import static org.apache.hudi.common.util.ConfigUtils.fetchConfigs;
+import static org.apache.hudi.common.util.ConfigUtils.recoverIfNeeded;
+import static org.apache.hudi.common.util.StringUtils.getUTF8Bytes;
+
+@Immutable
+@ConfigClassProperty(name = "Common Index Configs",
+groupName = ConfigGroups.Names.INDEXING,
+subGroupName = ConfigGroups.SubGroupNames.FUNCTIONAL_INDEX,
+areCommonConfigs = true,
+description = "")
+public class HoodieFunctionalIndexConfig extends HoodieConfig {

[jira] [Commented] (HUDI-7028) Fix Spark Quick Start

2023-11-02 Thread Lin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782369#comment-17782369
 ] 

Lin Liu commented on HUDI-7028:
---

To reproduce the second error:
{code:java}
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.common.model.HoodieRecord

val tableName = "hudi_trips_cow"
val basePath = "file:///tmp/hudi_trips_cow"
val dataGen = new DataGenerator
val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
df.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option(TABLE_NAME, tableName).
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", 
"org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  mode(Overwrite).
  save(basePath)

val tripsSnapshotDF = spark.
  read.
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", 
"org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  format("hudi").
  load(basePath)
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")

spark.sql("select fare, begin_lon, begin_lat, ts from  hudi_trips_snapshot where fare > 20.0").show()
spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, fare from  hudi_trips_snapshot").show()

val updates = convertToStringList(dataGen.generateUpdates(10))
val df = spark.read.json(spark.sparkContext.parallelize(updates, 2))
df.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option(TABLE_NAME, tableName).
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", 
"org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  mode(Append).
  save(basePath)
spark.
  read.
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", 
"org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  format("hudi").
  load(basePath).
  createOrReplaceTempView("hudi_trips_snapshot")

val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime from  hudi_trips_snapshot order by commitTime").map(k => k.getString(0)).take(50) {code}

> Fix Spark Quick Start
> -
>
> Key: HUDI-7028
> URL: https://issues.apache.org/jira/browse/HUDI-7028
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Fix the bugs for Spark quick start when turning on file group reader and 
> positional merging flag.
>  
> List some issues found so far:
>  # [compatibility] When no positions are stored in the header, the read query 
> fails. Ideal behavior: use key-based merging instead of failing.
>  # [compatibility] When a parquet file contains Avro records, the file group 
> reader of the Spark job will check whether the payload is the expected type; 
> otherwise, it will throw.
>  #  





[jira] [Commented] (HUDI-7028) Fix Spark Quick Start

2023-11-02 Thread Lin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782367#comment-17782367
 ] 

Lin Liu commented on HUDI-7028:
---

To reproduce the first error:
{code:java}
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.common.table.HoodieTableConfig._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.keygen.constant.KeyGeneratorOptions._
import org.apache.hudi.common.model.HoodieRecord
import spark.implicits._

val tableName = "trips_table"
val basePath = "file:///tmp/trips_table_1"

val columns = Seq("ts","uuid","rider","driver","fare","city")
val data =
  
Seq((1695159649087L,"334e26e9-8355-45cc-97c6-c31daf0df330","rider-A","driver-K",19.10,"san_francisco"),
    
(1695091554788L,"e96c4396-3fad-413a-a942-4cb36106d721","rider-C","driver-M",27.70
 ,"san_francisco"),
    
(1695046462179L,"9909a8b1-2d15-4d3d-8ec9-efc48c536a00","rider-D","driver-L",33.90
 ,"san_francisco"),
    
(1695516137016L,"e3cf430c-889d-4015-bc98-59bdce1e530c","rider-F","driver-P",34.15,"sao_paulo"
    ),
    
(169511511L,"c8abbe79-8d89-47ea-b4ce-4d224bae5bfa","rider-J","driver-T",17.85,"chennai"));

var inserts = spark.createDataFrame(data).toDF(columns:_*)
inserts.write.format("hudi").
  option(PARTITIONPATH_FIELD_NAME.key(), "city").
  option(TABLE_NAME, tableName).
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", 
"org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  mode(Overwrite).
  save(basePath)
val tripsDF = spark.read.
  option("hoodie.datasource.write.record.merger.impls", 
"org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  format("hudi").load(basePath)
tripsDF.createOrReplaceTempView("trips_table")

spark.sql("SELECT uuid, fare, ts, rider, driver, city FROM  trips_table WHERE fare > 20.0").show()

spark.sql("SELECT _hoodie_commit_time, _hoodie_record_key, 
_hoodie_partition_path, rider, driver, fare FROM  trips_table").show(1000, 
false)
val updatesDf = spark.read.
  option("hoodie.datasource.write.record.merger.impls", 
"org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  format("hudi").load(basePath).filter($"rider" === "rider-D").withColumn("fare", col("fare") * 10)

updatesDf.write.format("hudi").
  option(OPERATION_OPT_KEY, "upsert").
  option(PARTITIONPATH_FIELD_NAME.key(), "city").
  option(TABLE_NAME, tableName).
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", 
"org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  mode(Append).
  save(basePath)

// spark-shell
val adjustedFareDF = spark.read.
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", 
"org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  format("hudi").
  load(basePath).limit(2).
  withColumn("fare", col("fare") * 10)

adjustedFareDF.write.format("hudi").
  
option("hoodie.datasource.write.payload.class","com.payloads.CustomMergeIntoConnector").
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.logfile.data.block.format", "parquet").
  option("hoodie.datasource.write.record.merger.impls", 
"org.apache.hudi.HoodieSparkRecordMerger").
  option("hoodie.datasource.read.use.new.parquet.file.format", "true").
  option("hoodie.file.group.reader.enabled", "true").
  option("hoodie.write.record.positions", "true").
  mode(Append).
  save(basePath)
 {code}

> Fix Spark Quick Start
> -
>
> Key: HUDI-7028
> URL: https://issues.apache.org/jira/browse/HUDI-7028
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Fix the bugs for Spark quick start when turning on file group reader and 
> positional merging flag.

[jira] [Updated] (HUDI-7028) Fix Spark Quick Start

2023-11-02 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7028:
--
Description: 
Fix the bugs for Spark quick start when turning on file group reader and 
positional merging flag.

 

List some issues found so far:
 # [compatibility] When no positions are stored in the header, the read query 
fails. Ideal behavior: use key-based merging instead of failing.
 # [compatibility] When a parquet file contains Avro records, the file group 
reader of the Spark job will check whether the payload is the expected type; otherwise, 
it will throw.
 #  

  was:
Fix the bugs for Spark quick start when turning on file group reader and 
positional merging flag.

 

List some issues found so far:
 # When no positions are stored in the header, the read query fails. Ideal 
behavior: use key-based merging instead of failing.
 #  


> Fix Spark Quick Start
> -
>
> Key: HUDI-7028
> URL: https://issues.apache.org/jira/browse/HUDI-7028
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Fix the bugs for Spark quick start when turning on file group reader and 
> positional merging flag.
>  
> List some issues found so far:
>  # [compatibility] When no positions are stored in the header, the read query 
> fails. Ideal behavior: use key-based merging instead of failing.
>  # [compatibility] When a parquet file contains Avro records, the file group 
> reader of the Spark job will check whether the payload is the expected type; 
> otherwise, it will throw.
>  #  





[I] [SUPPORT] Data loss in MOR table after clustering partition [hudi]

2023-11-02 Thread via GitHub


mzheng-plaid opened a new issue, #9977:
URL: https://github.com/apache/hudi/issues/9977

   **Describe the problem you faced**
   
   As background, due to https://github.com/apache/hudi/issues/9934 we're 
testing out clustering our table to have fewer base files in our MOR table.
   
   We set up a test by copying an existing table. This table only had base 
files (no log files) in its initial state. We wanted to verify the performance 
of clustering as well as data correctness. We clustered one partition and found 
that **261736 rows were missing after clustering**.
   
   We used the following clustering configuration (and the other configurations 
in "Additional Context"):
   ```
   # Clustering configs
   "hoodie.clustering.inline": "true",
   "hoodie.clustering.inline.max.commits": 1,
   "hoodie.clustering.plan.strategy.small.file.limit": 256 * 1024 * 1024,
   "hoodie.clustering.plan.strategy.target.file.max.bytes": 512 * 1024 * 1024,
   "hoodie.clustering.plan.strategy.sort.columns": "itemId.value",
   "hoodie.clustering.plan.strategy.partition.selected": "dt=2022-08-29",
   "hoodie.clustering.plan.strategy.max.num.groups": 30,
   ```
   
   Our clustering code ran as follows:
   
   1. Read one row from partition `dt=2022-08-29`
   2. Write out the row (this is just a dummy way of triggering clustering inline); this update will be a no-op. We set "hoodie.clustering.plan.strategy.partition.selected" to `dt=2022-08-29` so that only the partition that was written to is clustered.
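As a concrete sketch of the two steps above: the dict below repeats the clustering configuration from this report (with the MiB limits expanded to bytes), and the commented-out write is a hypothetical PySpark trigger — `one_row_df` and `base_path` are placeholder names, not taken from the report:

```python
# Clustering options quoted from this report; Hudi option values are strings.
clustering_opts = {
    "hoodie.clustering.inline": "true",
    "hoodie.clustering.inline.max.commits": "1",
    # 256 MiB and 512 MiB expressed in bytes
    "hoodie.clustering.plan.strategy.small.file.limit": str(256 * 1024 * 1024),
    "hoodie.clustering.plan.strategy.target.file.max.bytes": str(512 * 1024 * 1024),
    "hoodie.clustering.plan.strategy.sort.columns": "itemId.value",
    # Restrict the plan to the single partition that receives the dummy write
    "hoodie.clustering.plan.strategy.partition.selected": "dt=2022-08-29",
    "hoodie.clustering.plan.strategy.max.num.groups": "30",
}

# Hypothetical trigger (needs a live SparkSession, so shown as a comment):
# one_row_df.write.format("hudi").options(**clustering_opts) \
#     .mode("append").save(base_path)
```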
   
   After the write finished I compared the clustered/unclustered tables (we had 
another copy before running this). Before clustering we had 399896071 rows in 
that partition and after clustering 399634335 rows in that partition (261736 
rows were lost). 
   
   Joining the two tables, I saw that **all** the missing rows were from 
**one** base file that was clustered. **This interestingly was the base file 
that received the update of 1 row**:
   
   ```
   # Spark code to find the hoodie file and record key for each of the missing 
rows
   meta_joined_df = unclustered_df.select(
   "_hoodie_file_name", 
   "_hoodie_commit_time", 
   "_hoodie_commit_seqno", 
   "_hoodie_record_key",
   "_hoodie_partition_path",
   "_hoodie_is_deleted",
   ).alias("a").join(
   clustered_df.select(
   "_hoodie_file_name", 
   "_hoodie_commit_time", 
   "_hoodie_commit_seqno", 
   "_hoodie_record_key",
   "_hoodie_partition_path",
   "_hoodie_is_deleted",
   ).alias("b"),
   on=F.col("a._hoodie_record_key") == F.col("b._hoodie_record_key"),
   how="full_outer",
   ).cache()
   meta_joined_df.filter(F.col("b._hoodie_record_key").isNull()).groupBy(
   F.col("a._hoodie_file_name"),
   ).count().alias("count").orderBy("count", ascending=False).show(
   n=10,
   truncate=False
   )
   ```
   
   Output:
   ```
   
   +-----------------------------------------------------------------------------------+------+
   |_hoodie_file_name                                                                  |count |
   +-----------------------------------------------------------------------------------+------+
   |f0b917f5-607e-47c4-96a4-092b4668c436-0_254-10835-21844023_20231016122622692.parquet|261736|
   +-----------------------------------------------------------------------------------+------+
   ```
   
   The `deltacommit` shows this file was the one that received the update:
   ```
   {
 "partitionToWriteStats" : {
   "dt=2022-08-29" : [ {
 "fileId" : "f0b917f5-607e-47c4-96a4-092b4668c436-0",
 "path" : 
"dt=2022-08-29/.f0b917f5-607e-47c4-96a4-092b4668c436-0_20231016122622692.log.1_0-29-5280",
 "prevCommit" : "20231016122622692",
 "numWrites" : 1,
 "numDeletes" : 0,
 "numUpdateWrites" : 1,
 "numInserts" : 0,
 "totalWriteBytes" : 13402,
 "totalWriteErrors" : 0,
 "tempPath" : null,
 "partitionPath" : "dt=2022-08-29",
 "totalLogRecords" : 0,
 "totalLogFilesCompacted" : 0,
 "totalLogSizeCompacted" : 0,
 "totalUpdatedRecordsCompacted" : 0,
 "totalLogBlocks" : 0,
 "totalCorruptLogBlock" : 0,
 "totalRollbackBlocks" : 0,
 "fileSizeInBytes" : 13402,
 "minEventTime" : null,
 "maxEventTime" : null,
 "runtimeStats" : {
   "totalScanTime" : 0,
   "totalUpsertTime" : 2327,
   "totalCreateTime" : 0
 },
 "logVersion" : 1,
 "logOffset" : 0,
 "baseFile" : 
"f0b917f5-607e-47c4-96a4-092b4668c436-0_254-10835-21844023_20231016122622692.parquet",
 "logFiles" : [ 
".f0b917f5-607e-47c4-96a4-092b4668c436-0_20231016122622692.log.1_0-29-5280" ],
 "recordsStats" : {
   "val" : null
 }
   } ]
 },
 "compacted" : false,
 "extraMetadata" : {
   "schema" : …
 },
  "operationType" : "UPSERT"
   }
   ```

Re: [PR] [HUDI-7022] RunClusteringProcedure support limit parameter [hudi]

2023-11-02 Thread via GitHub


danny0405 commented on PR #9975:
URL: https://github.com/apache/hudi/pull/9975#issuecomment-1791762149

   @ksmou Can you rebase with the latest master to fix the test failures?





Re: [PR] [HUDI-7012] The BootstrapOperator reduces the memory. [hudi]

2023-11-02 Thread via GitHub


danny0405 commented on PR #9959:
URL: https://github.com/apache/hudi/pull/9959#issuecomment-1791761007

   Thanks for the contribution. I have reviewed it and created a patch:
   
[7012.patch.zip](https://github.com/apache/hudi/files/13245958/7012.patch.zip)
   You can rebase with the latest master and then apply the patch; the patch does not include your changes, so there might be conflicts if you apply it on your branch.





Re: [I] [SUPPORT] "OutOfMemoryError: Requested array size exceeds VM limit" on data ingestion to MOR table [hudi]

2023-11-02 Thread via GitHub


mzheng-plaid commented on issue #9934:
URL: https://github.com/apache/hudi/issues/9934#issuecomment-1791751860

   Sorry, we also have some other Hudi options set that I missed; the important points are that the metadata table is disabled and Hive sync is enabled.
   
   ```
   "hoodie.table.name": self.name,
   "hoodie.datasource.write.table.name": self.name,
   "hoodie.datasource.write.operation": "upsert",
   "hoodie.datasource.write.table.type": "MERGE_ON_READ",
   "hoodie.datasource.write.partitionpath.field": "dt:SIMPLE",
   "hoodie.datasource.write.recordkey.field": "id.value",
   "hoodie.datasource.write.precombine.field": "ts",
   "hoodie.datasource.write.keygenerator.class": 
"org.apache.hudi.keygen.CustomKeyGenerator",
   "hoodie.datasource.write.hive_style_partitioning": "true",
   # We disable the metadata table
   "hoodie.metadata.enable": "false",
   # We disable the bootstrap index because the table is not 
bootstrapped
   "hoodie.bootstrap.index.enable": "false",
   "hoodie.index.type": "BLOOM",
   # Hive sync is enabled
   "hoodie.datasource.hive_sync.enable": "true", 
   ```





Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]

2023-11-02 Thread via GitHub


danny0405 commented on PR #9937:
URL: https://github.com/apache/hudi/pull/9937#issuecomment-1791749503

   @waitingF Can you rebase with the latest master to resolve the test failures? And can you verify in your local env that the compaction really works?





Re: [PR] [WIP][HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-02 Thread via GitHub


danny0405 commented on code in PR #9936:
URL: https://github.com/apache/hudi/pull/9936#discussion_r1380930087


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestSchemaEvolution.java:
##
@@ -480,16 +480,16 @@ private ExpectedResult(String[] evolvedRows, String[] 
rowsWithMeta, String[] row
   "+I[Alice, 9.9, unknown, +I[9, 9, s9, 99, t9, drop_add9], 
{Alice=.99}, [.0, .0], +I[9, 9], [9], {k9=v9}]",
   },
   new String[] {
-  "+I[uuid:id0, Indica, null, 12, null, {Indica=1212.0}, [12.0], null, 
null, null]",
-  "+I[uuid:id1, Danny, 1.1, 23, +I[1, 1, s1, 11, t1, drop_add1], 
{Danny=2323.23}, [23.0, 23.0, 23.0], +I[1, 1], [1], {k1=v1}]",
-  "+I[uuid:id2, Stephen, null, 33, +I[2, null, s2, 2, null, null], 
{Stephen=.0}, [33.0], null, null, null]",
-  "+I[uuid:id3, Julian, 3.3, 53, +I[3, 3, s3, 33, t3, drop_add3], 
{Julian=5353.53}, [53.0], +I[3, 3], [3], {k3=v3}]",
-  "+I[uuid:id4, Fabian, null, 31, +I[4, null, s4, 4, null, null], 
{Fabian=3131.0}, [31.0], null, null, null]",
-  "+I[uuid:id5, Sophia, null, 18, +I[5, null, s5, 5, null, null], 
{Sophia=1818.0}, [18.0, 18.0], null, null, null]",
-  "+I[uuid:id6, Emma, null, 20, +I[6, null, s6, 6, null, null], 
{Emma=2020.0}, [20.0], null, null, null]",
-  "+I[uuid:id7, Bob, null, 44, +I[7, null, s7, 7, null, null], 
{Bob=.0}, [44.0, 44.0], null, null, null]",

Review Comment:
   ping me again if the PR is ready for reviewing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Flink CDC to HUDI cannot handle rowKind correctly [hudi]

2023-11-02 Thread via GitHub


danny0405 commented on issue #9940:
URL: https://github.com/apache/hudi/issues/9940#issuecomment-1791746776

   Can you turn off the sink materializer? See the doc here for how to 
operate: https://www.yuque.com/yuzhao-my9fz/kb/hzosbb?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791746415

   
   ## CI report:
   
   * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7028) Fix Spark Quick Start

2023-11-02 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7028:
--
Description: 
Fix the bugs for Spark quick start when turning on file group reader and 
positional merging flag.

 

List some issues found so far:
 # When no positions are stored in the header, the read query fails. Ideal 
behavior: use key-based merging instead of failing.

  was:Fix the bugs for Spark quick start when turning on file group reader and 
positional merging flag.


> Fix Spark Quick Start
> -
>
> Key: HUDI-7028
> URL: https://issues.apache.org/jira/browse/HUDI-7028
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Fix the bugs for Spark quick start when turning on file group reader and 
> positional merging flag.
>  
> List some issues found so far:
>  # When no positions are stored in the header, the read query fails. Ideal 
> behavior: use key-based merging instead of failing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
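The key-based merging fallback described in this ticket can be sketched with simplified, hypothetical record types; this is an illustration of the idea only, not Hudi's actual file group reader API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyBasedMergeSketch {
  // Hypothetical simplified record: a key, an ordering value, and a payload.
  record Rec(String key, int ordering, String value) {}

  // Merge log records onto base records by record key: for a given key, the
  // record with the higher ordering value wins. This mimics key-based merging
  // as a fallback when positional information is unavailable.
  static Map<String, Rec> merge(List<Rec> base, List<Rec> log) {
    Map<String, Rec> merged = new LinkedHashMap<>();
    for (Rec r : base) {
      merged.put(r.key(), r);
    }
    for (Rec r : log) {
      merged.merge(r.key(), r,
          (oldR, newR) -> newR.ordering() >= oldR.ordering() ? newR : oldR);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Rec> base = List.of(new Rec("id1", 1, "a"), new Rec("id2", 1, "b"));
    List<Rec> log = List.of(new Rec("id2", 2, "b2"), new Rec("id3", 1, "c"));
    // The log record for id2 has a higher ordering value, so it wins.
    System.out.println(merge(base, log).get("id2").value()); // b2
  }
}
```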


Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-02 Thread via GitHub


danny0405 commented on code in PR #9946:
URL: https://github.com/apache/hudi/pull/9946#discussion_r1380926282


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java:
##
@@ -75,7 +78,12 @@ private HiveSyncContext(Properties props, HiveConf hiveConf) 
{
   public HiveSyncTool hiveSyncTool() {
 HiveSyncMode syncMode = 
HiveSyncMode.of(props.getProperty(HIVE_SYNC_MODE.key()));
 if (syncMode == HiveSyncMode.GLUE) {
-  return new AwsGlueCatalogSyncTool(props, hiveConf);
+  if (ReflectionUtils.hasConstructor(AWS_GLUE_CATALOG_SYNC_TOOL_CLASS,
+  new Class[] {Properties.class, 
org.apache.hadoop.conf.Configuration.class})) {

Review Comment:
   Do we need the if check? We cannot fall back to the Hive sync tool if the 
user expects GLUE.
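The reflection-guarded fallback being discussed can be illustrated with plain JDK reflection. A minimal, self-contained sketch — `hasConstructor` here is a hypothetical stand-in for Hudi's `ReflectionUtils.hasConstructor`, whose real signature may differ:

```java
import java.lang.reflect.Constructor;

public class ConstructorCheck {
  // Returns true only when the named class exists and declares a public
  // constructor with exactly the given parameter types. This is the pattern
  // used to probe, at runtime, which constructor variant a class exposes.
  static boolean hasConstructor(String className, Class<?>[] paramTypes) {
    try {
      Class<?> clazz = Class.forName(className);
      Constructor<?> ctor = clazz.getConstructor(paramTypes);
      return ctor != null;
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // java.util.Properties has a public no-arg constructor,
    // but no (int, long) constructor.
    System.out.println(hasConstructor("java.util.Properties", new Class[] {}));
    System.out.println(hasConstructor("java.util.Properties",
        new Class[] {int.class, long.class}));
  }
}
```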



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7028) Fix Spark Quick Start

2023-11-02 Thread Lin Liu (Jira)
Lin Liu created HUDI-7028:
-

 Summary: Fix Spark Quick Start
 Key: HUDI-7028
 URL: https://issues.apache.org/jira/browse/HUDI-7028
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Lin Liu
 Fix For: 1.0.0


Fix the bugs for Spark quick start when turning on file group reader and 
positional merging flag.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-7011) a metric to indicate whether rollback has occurred in final compaction state

2023-11-02 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7011.

Resolution: Fixed

Fixed via master branch: 9599d0f6b3766261753865bb796d124b27479642

>  a metric to indicate whether rollback has occurred in final compaction state 
> --
>
> Key: HUDI-7011
> URL: https://issues.apache.org/jira/browse/HUDI-7011
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: jack Lei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Add a metric to indicate whether rollback has occurred in the final 
> compaction state, to warn users to check the Flink job.
> Currently, when a Flink job starts async compaction on a MOR table, the 
> metrics in org.apache.hudi.metrics.FlinkCompactionMetrics 
> are updated, including pendingCompactionCount, compactionDelay, and 
> compactionCost.
> However, when a compaction fails, a metric is needed to
> tell the user whether the final compaction for a specific instant has been 
> rolled back.
> So this attempts to add a metric named compactionFailedState in 
> org.apache.hudi.sink.compact.CompactionCommitSink to record the instant
> where rollback happened, which also means the compaction failed at the 
> current time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7011) a metric to indicate whether rollback has occurred in final compaction state

2023-11-02 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7011:
-
Fix Version/s: 1.0.0

>  a metric to indicate whether rollback has occurred in final compaction state 
> --
>
> Key: HUDI-7011
> URL: https://issues.apache.org/jira/browse/HUDI-7011
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: jack Lei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Add a metric to indicate whether rollback has occurred in the final 
> compaction state, to warn users to check the Flink job.
> Currently, when a Flink job starts async compaction on a MOR table, the 
> metrics in org.apache.hudi.metrics.FlinkCompactionMetrics 
> are updated, including pendingCompactionCount, compactionDelay, and 
> compactionCost.
> However, when a compaction fails, a metric is needed to
> tell the user whether the final compaction for a specific instant has been 
> rolled back.
> So this attempts to add a metric named compactionFailedState in 
> org.apache.hudi.sink.compact.CompactionCommitSink to record the instant
> where rollback happened, which also means the compaction failed at the 
> current time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]

2023-11-02 Thread via GitHub


danny0405 merged PR #9956:
URL: https://github.com/apache/hudi/pull/9956


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-02 Thread via GitHub


CTTY commented on code in PR #9717:
URL: https://github.com/apache/hudi/pull/9717#discussion_r1380918121


##
hudi-spark-datasource/hudi-spark/pom.xml:
##
@@ -245,6 +245,12 @@
   org.apache.parquet
   parquet-avro
 
+
+  org.apache.parquet
+  parquet-hadoop-bundle
+  ${parquet.version}
+  provided
+

Review Comment:
   Added parquet-hadoop-bundle to fix classpath issues
   ```
   java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.spark.sql.execution.datasources.parquet.ParquetOptions$
   
   at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOptions.<init>(ParquetOptions.scala:50)
   at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOptions.<init>(ParquetOptions.scala:40)
   at 
org.apache.spark.sql.execution.datasources.parquet.Spark34LegacyHoodieParquetFileFormat.buildReaderWithPartitionValues(Spark34LegacyHoodieParquetFileFormat.scala:150)



##
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/BaseValidateDatasetNode.java:
##
@@ -244,10 +239,6 @@ private Dataset getInputDf(ExecutionContext context, 
SparkSession session,
   }
 
   private ExpressionEncoder getEncoder(StructType schema) {
-List attributes = 
JavaConversions.asJavaCollection(schema.toAttributes()).stream()
-.map(Attribute::toAttribute).collect(Collectors.toList());
-return RowEncoder.apply(schema)
-
.resolveAndBind(JavaConverters.asScalaBufferConverter(attributes).asScala().toSeq(),
-SimpleAnalyzer$.MODULE$);
+return SparkAdapterSupport$.MODULE$.sparkAdapter().getEncoder(schema);

Review Comment:
   [SPARK-44531](https://github.com/apache/spark/pull/42134) Encoder inference 
moved elsewhere in Spark 3.5.0



##
hudi-spark-datasource/hudi-spark3.5.x/src/main/scala/org/apache/spark/sql/parser/HoodieSpark3_5ExtendedSqlAstBuilder.scala:
##
@@ -0,0 +1,3496 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.parser
+
+import org.antlr.v4.runtime.tree.{ParseTree, RuleNode, TerminalNode}
+import org.antlr.v4.runtime.{ParserRuleContext, Token}
+import org.apache.hudi.spark.sql.parser.HoodieSqlBaseParser._
+import org.apache.hudi.spark.sql.parser.{HoodieSqlBaseBaseVisitor, 
HoodieSqlBaseParser}
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogStorageFormat}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.{First, Last}
+import 
org.apache.spark.sql.catalyst.parser.ParserUtils.{checkDuplicateClauses, 
checkDuplicateKeys, entry, escapedIdentifier, operationNotAllowed, source, 
string, stringWithoutUnescape, validate, withOrigin}
+import org.apache.spark.sql.catalyst.parser.{EnhancedLogicalPlan, 
ParseException, ParserInterface}

Review Comment:
   [SPARK-44333](https://github.com/apache/spark/pull/41890), 
EnhancedLogicalPlan moved to a different package



##
hudi-spark-datasource/hudi-spark3.5.x/src/main/scala/org/apache/spark/sql/HoodieSpark35CatalystExpressionUtils.scala:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.HoodieSparkTypeUtils.isCastPreservingOrdering
+import org.apache.spark.sql.catalyst.expressi

[jira] [Closed] (HUDI-6969) Add speed limit for stream read

2023-11-02 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6969.

Resolution: Fixed

Fixed via master branch: 1bb1fd1dd60c0635df0827c986b958955c2de682

> Add speed limit for stream read
> ---
>
> Key: HUDI-6969
> URL: https://issues.apache.org/jira/browse/HUDI-6969
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: zhuanshenbsj1
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
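A read speed limit of the kind this ticket adds is commonly implemented as a token bucket. A minimal, self-contained sketch of that general idea — an illustration only, not the actual mechanism in the merged PR:

```java
public class TokenBucketSketch {
  private final long capacity;        // max tokens the bucket can hold
  private final long refillPerSecond; // tokens added back per second
  private double tokens;
  private long lastRefillNanos;

  TokenBucketSketch(long capacity, long refillPerSecond) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillNanos = System.nanoTime();
  }

  // Returns true and consumes a token if one is available, else false.
  // A stream reader receiving false would back off before emitting more
  // records, capping the read rate at roughly refillPerSecond.
  synchronized boolean tryAcquire() {
    long now = System.nanoTime();
    tokens = Math.min(capacity,
        tokens + (now - lastRefillNanos) / 1e9 * refillPerSecond);
    lastRefillNanos = now;
    if (tokens >= 1) {
      tokens -= 1;
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    TokenBucketSketch limiter = new TokenBucketSketch(2, 1);
    System.out.println(limiter.tryAcquire()); // true
    System.out.println(limiter.tryAcquire()); // true
    System.out.println(limiter.tryAcquire()); // false (bucket drained)
  }
}
```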


Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-02 Thread via GitHub


danny0405 commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1791740493

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-6969) Add speed limit for stream read

2023-11-02 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6969:
-
Fix Version/s: 1.0.0

> Add speed limit for stream read
> ---
>
> Key: HUDI-6969
> URL: https://issues.apache.org/jira/browse/HUDI-6969
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: zhuanshenbsj1
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]

2023-11-02 Thread via GitHub


danny0405 merged PR #9904:
URL: https://github.com/apache/hudi/pull/9904


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Simple Bucket Index - discrepancy between Spark and Flink [hudi]

2023-11-02 Thread via GitHub


danny0405 commented on issue #9971:
URL: https://github.com/apache/hudi/issues/9971#issuecomment-1791736105

   Try to set up `index.type` as `BUCKET` instead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-02 Thread via GitHub


CTTY commented on code in PR #9717:
URL: https://github.com/apache/hudi/pull/9717#discussion_r1380917088


##
hudi-common/src/test/java/org/apache/hudi/common/util/TestClusteringUtils.java:
##
@@ -107,6 +108,7 @@ public void testClusteringPlanMultipleInstants() throws 
Exception {
 
   // replacecommit.inflight doesn't have clustering plan.
   // Verify that getClusteringPlan fetches content from corresponding 
requested file.
+  @Disabled("Will fail due to avro issue AVRO-3789. This is fixed in avro 
1.11.3")

Review Comment:
   avro 1.11.2 can't compare empty map types



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-02 Thread via GitHub


CTTY commented on code in PR #9717:
URL: https://github.com/apache/hudi/pull/9717#discussion_r1380916039


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/DataFrameUtil.scala:
##
@@ -31,7 +33,7 @@ object DataFrameUtil {
*/
   def createFromInternalRows(sparkSession: SparkSession, schema:
   StructType, rdd: RDD[InternalRow]): DataFrame = {
-val logicalPlan = LogicalRDD(schema.toAttributes, rdd)(sparkSession)
+val logicalPlan = 
LogicalRDD(SparkAdapterSupport.sparkAdapter.toAttributes(schema), 
rdd)(sparkSession)

Review Comment:
   StructType.toAttributes was removed in Spark 3.5.0 by 
[SPARK-44353](https://github.com/apache/spark/pull/41925)
   
   Solution is to switch to use DataTypeUtils.toAttributes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-02 Thread via GitHub


CTTY commented on code in PR #9717:
URL: https://github.com/apache/hudi/pull/9717#discussion_r1380915008


##
.github/workflows/bot.yml:
##
@@ -284,29 +294,33 @@ jobs:
   matrix:
 include:
   - flinkProfile: 'flink1.17'
-sparkProfile: 'spark3.4'
-sparkRuntime: 'spark3.4.0'
-  - flinkProfile: 'flink1.17'
-sparkProfile: 'spark3.3'
-sparkRuntime: 'spark3.3.2'
-  - flinkProfile: 'flink1.16'
-sparkProfile: 'spark3.3'
-sparkRuntime: 'spark3.3.2'
-  - flinkProfile: 'flink1.15'
-sparkProfile: 'spark3.3'
-sparkRuntime: 'spark3.3.1'
-  - flinkProfile: 'flink1.14'
-sparkProfile: 'spark3.2'
-sparkRuntime: 'spark3.2.3'
-  - flinkProfile: 'flink1.13'
-sparkProfile: 'spark3.1'
-sparkRuntime: 'spark3.1.3'
-  - flinkProfile: 'flink1.14'
-sparkProfile: 'spark3.0'
-sparkRuntime: 'spark3.0.2'
-  - flinkProfile: 'flink1.13'
-sparkProfile: 'spark2.4'
-sparkRuntime: 'spark2.4.8'
+sparkProfile: 'spark3.5'
+sparkRuntime: 'spark3.5.0'
+#  - flinkProfile: 'flink1.17'
+#sparkProfile: 'spark3.4'
+#sparkRuntime: 'spark3.4.0'
+#  - flinkProfile: 'flink1.17'
+#sparkProfile: 'spark3.3'
+#sparkRuntime: 'spark3.3.2'
+#  - flinkProfile: 'flink1.16'
+#sparkProfile: 'spark3.3'
+#sparkRuntime: 'spark3.3.2'
+#  - flinkProfile: 'flink1.15'
+#sparkProfile: 'spark3.3'
+#sparkRuntime: 'spark3.3.1'
+#  - flinkProfile: 'flink1.14'
+#sparkProfile: 'spark3.2'
+#sparkRuntime: 'spark3.2.3'
+#  - flinkProfile: 'flink1.13'
+#sparkProfile: 'spark3.1'
+#sparkRuntime: 'spark3.1.3'
+#  - flinkProfile: 'flink1.14'
+#sparkProfile: 'spark3.0'
+#sparkRuntime: 'spark3.0.2'
+#  - flinkProfile: 'flink1.13'
+#sparkProfile: 'spark2.4'
+#sparkRuntime: 'spark2.4.8'
+

Review Comment:
   Using my personal docker image to test Spark 3.5 specifically, will revert



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-1623][FOLLOW_UP] Fix test TestWaitBasedTimeGenerator & refine codes [hudi]

2023-11-02 Thread via GitHub


danny0405 merged PR #9972:
URL: https://github.com/apache/hudi/pull/9972


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated: [HUDI-1623][Tests] Fix test TestWaitBasedTimeGenerator (#9972)

2023-11-02 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new eeddac702ed [HUDI-1623][Tests] Fix test TestWaitBasedTimeGenerator 
(#9972)
eeddac702ed is described below

commit eeddac702ed3a97d6b08a699c506a1898de4af16
Author: Rex(Hui) An 
AuthorDate: Fri Nov 3 08:18:44 2023 +0800

[HUDI-1623][Tests] Fix test TestWaitBasedTimeGenerator (#9972)
---
 .../java/org/apache/hudi/config/DynamoDbBasedLockConfig.java |  2 +-
 .../main/java/org/apache/hudi/config/HoodieLockConfig.java   |  3 ++-
 .../hudi/client/transaction/lock/InProcessLockProvider.java  |  2 +-
 .../org/apache/hudi/common/config/LockConfiguration.java |  4 +---
 .../org/apache/hudi/common/table/timeline/TimeGenerator.java |  8 
 .../apache/hudi/common/table/timeline/TimeGeneratorBase.java |  2 +-
 .../hudi/common/table/timeline/WaitBasedTimeGenerator.java   | 12 ++--
 .../common/table/timeline/TestWaitBasedTimeGenerator.java|  6 --
 8 files changed, 20 insertions(+), 19 deletions(-)

diff --git 
a/hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java 
b/hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java
index 5639db02582..0e884a6797f 100644
--- a/hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java
+++ b/hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java
@@ -127,7 +127,7 @@ public class DynamoDbBasedLockConfig extends HoodieConfig {
 
   public static final ConfigProperty 
LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY = ConfigProperty
   .key(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY)
-  .defaultValue(LockConfiguration.DEFAULT_ACQUIRE_LOCK_WAIT_TIMEOUT_MS)
+  .defaultValue(LockConfiguration.DEFAULT_LOCK_ACQUIRE_WAIT_TIMEOUT_MS)
   .markAdvanced()
   .sinceVersion("0.10.0")
   .withDocumentation("Lock Acquire Wait Timeout in milliseconds");
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java
index b24aecf46c1..fa38da8f8ab 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java
@@ -36,6 +36,7 @@ import java.util.Properties;
 
 import static 
org.apache.hudi.common.config.LockConfiguration.DEFAULT_LOCK_ACQUIRE_NUM_RETRIES;
 import static 
org.apache.hudi.common.config.LockConfiguration.DEFAULT_LOCK_ACQUIRE_RETRY_WAIT_TIME_IN_MILLIS;
+import static 
org.apache.hudi.common.config.LockConfiguration.DEFAULT_LOCK_ACQUIRE_WAIT_TIMEOUT_MS;
 import static 
org.apache.hudi.common.config.LockConfiguration.DEFAULT_ZK_CONNECTION_TIMEOUT_MS;
 import static 
org.apache.hudi.common.config.LockConfiguration.DEFAULT_ZK_SESSION_TIMEOUT_MS;
 import static 
org.apache.hudi.common.config.LockConfiguration.FILESYSTEM_LOCK_EXPIRE_PROP_KEY;
@@ -106,7 +107,7 @@ public class HoodieLockConfig extends HoodieConfig {
 
   public static final ConfigProperty LOCK_ACQUIRE_WAIT_TIMEOUT_MS = 
ConfigProperty
   .key(LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY)
-  .defaultValue(60 * 1000)
+  .defaultValue(DEFAULT_LOCK_ACQUIRE_WAIT_TIMEOUT_MS)
   .markAdvanced()
   .sinceVersion("0.8.0")
   .withDocumentation("Timeout in ms, to wait on an individual lock 
acquire() call, at the lock provider.");
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/client/transaction/lock/InProcessLockProvider.java
 
b/hudi-common/src/main/java/org/apache/hudi/client/transaction/lock/InProcessLockProvider.java
index c3437f91c8c..c2edb1864b0 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/client/transaction/lock/InProcessLockProvider.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/client/transaction/lock/InProcessLockProvider.java
@@ -61,7 +61,7 @@ public class InProcessLockProvider implements 
LockProvider 
new ReentrantReadWriteLock());
 maxWaitTimeMillis = 
typedProperties.getLong(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY,
-LockConfiguration.DEFAULT_ACQUIRE_LOCK_WAIT_TIMEOUT_MS);
+LockConfiguration.DEFAULT_LOCK_ACQUIRE_WAIT_TIMEOUT_MS);
   }
 
   @Override
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/config/LockConfiguration.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/config/LockConfiguration.java
index 9e652c64efe..1171dcf3fce 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/config/LockConfiguration.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/config/LockConfiguration.java
@@ -43,7 +43,7 @@ public class LockConfiguration implements Serializable {
   public static final String LOCK_ACQUIRE_CLIENT_NUM_RETRIES_PROP_KEY = 
LOCK_PREFIX + "client.num_retrie

Re: [PR] [HUDI-5210] Implement functional indexes [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9872:
URL: https://github.com/apache/hudi/pull/9872#issuecomment-1791705100

   
   ## CI report:
   
   * 0d2dace457162b24edaabe2c83b8d6d0c310050a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20643)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5210] Implement functional indexes [hudi]

2023-11-02 Thread via GitHub


hudi-bot commented on PR #9872:
URL: https://github.com/apache/hudi/pull/9872#issuecomment-1791439149

   
   ## CI report:
   
   * d2eced526259327f5abfb8ac92d8b37b7a4b12c2 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20640)
 
   * 0d2dace457162b24edaabe2c83b8d6d0c310050a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20643)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


