[jira] [Updated] (HUDI-7221) Move Hudi Option class from hudi-common to hudi-io module

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7221:

Description: This lets classes in the hudi-io module use the Option class as 
well.

> Move Hudi Option class from hudi-common to hudi-io module
> -
>
> Key: HUDI-7221
> URL: https://issues.apache.org/jira/browse/HUDI-7221
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> This lets classes in the hudi-io module use the Option class as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7846.
---
Resolution: Fixed

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven 
> parallel build
> -
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
>
> The following warning is emitted when running a Maven parallel build with 
> `mvn -T 1C ...`:
> {code:java}
> [WARNING] Enable debug to see precisely which goals are not marked as 
> thread-safe.
> [WARNING] *
> [WARNING] * Your build is requesting parallel execution, but this         *
> [WARNING] * project contains the following plugin(s) that have goals not  *
> [WARNING] * marked as thread-safe to support parallel execution.          *
> [WARNING] * While this /may/ work fine, please look for plugin updates    *
> [WARNING] * and/or request plugins be made thread-safe.                   *
> [WARNING] * If reporting an issue, report it against the plugin in        *
> [WARNING] * question, not against Apache Maven.                           *
> [WARNING] *
> [WARNING] The following plugins are not marked as thread-safe in 
> hudi-hadoop-mr:
> [WARNING]   org.apache.rat:apache-rat-plugin:0.13 {code}





[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7846:

Story Points: 0

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven 
> parallel build
> -
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
>
> The following warning is emitted when running a Maven parallel build with 
> `mvn -T 1C ...`:
> {code:java}
> [WARNING] Enable debug to see precisely which goals are not marked as 
> thread-safe.
> [WARNING] *
> [WARNING] * Your build is requesting parallel execution, but this         *
> [WARNING] * project contains the following plugin(s) that have goals not  *
> [WARNING] * marked as thread-safe to support parallel execution.          *
> [WARNING] * While this /may/ work fine, please look for plugin updates    *
> [WARNING] * and/or request plugins be made thread-safe.                   *
> [WARNING] * If reporting an issue, report it against the plugin in        *
> [WARNING] * question, not against Apache Maven.                           *
> [WARNING] *
> [WARNING] The following plugins are not marked as thread-safe in 
> hudi-hadoop-mr:
> [WARNING]   org.apache.rat:apache-rat-plugin:0.13 {code}





[jira] [Updated] (HUDI-6798) Implement event-time-based merging mode in FileGroupReader

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6798:

Story Points: 10  (was: 3)

> Implement event-time-based merging mode in FileGroupReader
> --
>
> Key: HUDI-6798
> URL: https://issues.apache.org/jira/browse/HUDI-6798
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> To achieve this, we should add a new table config 
> {{hoodie.record.merge.mode}} to control the record merging mode and behavior 
> in the new file group reader ({{HoodieFileGroupReader}}) and implement 
> event-time ordering in it. The table config {{hoodie.record.merge.mode}} is 
> going to be the single config that determines how record merging happens 
> in release 1.0 and beyond.
>  
> Three merging modes to define:
>  * {{OVERWRITE_WITH_LATEST}}: uses transaction time to merge records, 
> i.e., the record from the later transaction overwrites the earlier record 
> with the same key. This corresponds to the behavior of the existing payload 
> class {{OverwriteWithLatestAvroPayload}}.
>  * {{EVENT_TIME_ORDERING}}: uses event time as the ordering to merge 
> records, i.e., the record with the larger event time overwrites the record 
> with the smaller event time on the same key, regardless of transaction time. 
> The event time or preCombine field needs to be specified by the user. This 
> corresponds to the behavior of the existing payload class 
> {{DefaultHoodieRecordPayload}}.
>  * {{CUSTOM}}: uses custom merging logic specified by the user. When a 
> user specifies a custom record merger strategy or a payload class with the 
> Avro record merger, this mode is going to be set so that record merging 
> follows the user-defined logic as before.
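
The three merging modes in the description can be sketched roughly as follows. This is a minimal illustration, not Hudi's implementation: only the mode names come from the issue, while the `Rec` class, its fields, the `CustomMerger` interface, and the rule that ties on event time go to the later transaction are assumptions made for this example.

```java
import java.util.Objects;

/** Illustrative sketch of the three merge modes; not Hudi's actual code. */
final class RecordMergeModeSketch {

    /** Mode names taken from the issue description. */
    enum RecordMergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

    /** Hypothetical record holding a key, an event time, and a transaction time. */
    static final class Rec {
        final String key;
        final long eventTime;
        final long transactionTime;
        final String value;

        Rec(String key, long eventTime, long transactionTime, String value) {
            this.key = key;
            this.eventTime = eventTime;
            this.transactionTime = transactionTime;
            this.value = value;
        }
    }

    /** Hypothetical hook standing in for user-defined merging logic. */
    interface CustomMerger {
        Rec merge(Rec older, Rec newer);
    }

    /** Picks the surviving record for two records with the same key. */
    static Rec merge(RecordMergeMode mode, Rec a, Rec b, CustomMerger custom) {
        if (!a.key.equals(b.key)) {
            throw new IllegalArgumentException("Records must share the same key");
        }
        // Order the pair by transaction time first.
        Rec older = a.transactionTime <= b.transactionTime ? a : b;
        Rec newer = (older == a) ? b : a;
        switch (mode) {
            case OVERWRITE_WITH_LATEST:
                // The record from the later transaction always wins.
                return newer;
            case EVENT_TIME_ORDERING:
                // The larger event time wins regardless of transaction time.
                // Assumption: on a tie, the later transaction wins.
                return older.eventTime > newer.eventTime ? older : newer;
            case CUSTOM:
                // Delegate to user-defined logic.
                return Objects.requireNonNull(custom, "custom merger").merge(older, newer);
            default:
                throw new IllegalStateException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Rec first = new Rec("k1", 200L, 1L, "written first, newer event");
        Rec second = new Rec("k1", 100L, 2L, "written second, older event");
        System.out.println(merge(RecordMergeMode.OVERWRITE_WITH_LATEST, first, second, null).value);
        System.out.println(merge(RecordMergeMode.EVENT_TIME_ORDERING, first, second, null).value);
    }
}
```

With a late-arriving record such as `second`, the two built-in modes disagree: transaction-time ordering keeps `second`, while event-time ordering keeps `first`.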





[jira] [Closed] (HUDI-6798) Implement event-time-based merging mode in FileGroupReader

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-6798.
---
Resolution: Fixed

> Implement event-time-based merging mode in FileGroupReader
> --
>
> Key: HUDI-6798
> URL: https://issues.apache.org/jira/browse/HUDI-6798
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> To achieve this, we should add a new table config 
> {{hoodie.record.merge.mode}} to control the record merging mode and behavior 
> in the new file group reader ({{HoodieFileGroupReader}}) and implement 
> event-time ordering in it. The table config {{hoodie.record.merge.mode}} is 
> going to be the single config that determines how record merging happens 
> in release 1.0 and beyond.
>  
> Three merging modes to define:
>  * {{OVERWRITE_WITH_LATEST}}: uses transaction time to merge records, 
> i.e., the record from the later transaction overwrites the earlier record 
> with the same key. This corresponds to the behavior of the existing payload 
> class {{OverwriteWithLatestAvroPayload}}.
>  * {{EVENT_TIME_ORDERING}}: uses event time as the ordering to merge 
> records, i.e., the record with the larger event time overwrites the record 
> with the smaller event time on the same key, regardless of transaction time. 
> The event time or preCombine field needs to be specified by the user. This 
> corresponds to the behavior of the existing payload class 
> {{DefaultHoodieRecordPayload}}.
>  * {{CUSTOM}}: uses custom merging logic specified by the user. When a 
> user specifies a custom record merger strategy or a payload class with the 
> Avro record merger, this mode is going to be set so that record merging 
> follows the user-defined logic as before.





[jira] [Updated] (HUDI-6798) Implement event-time-based merging mode in FileGroupReader

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6798:

Description: 
To achieve this, we should add a new table config {{hoodie.record.merge.mode}} 
to control the record merging mode and behavior in the new file group reader 
({{HoodieFileGroupReader}}) and implement event-time ordering in it. The 
table config {{hoodie.record.merge.mode}} is going to be the single config that 
determines how record merging happens in release 1.0 and beyond.

 

Three merging modes to define:
 * {{OVERWRITE_WITH_LATEST}}: uses transaction time to merge records, 
i.e., the record from the later transaction overwrites the earlier record with 
the same key. This corresponds to the behavior of the existing payload class 
{{OverwriteWithLatestAvroPayload}}.
 * {{EVENT_TIME_ORDERING}}: uses event time as the ordering to merge 
records, i.e., the record with the larger event time overwrites the record with 
the smaller event time on the same key, regardless of transaction time. The 
event time or preCombine field needs to be specified by the user. This 
corresponds to the behavior of the existing payload class 
{{DefaultHoodieRecordPayload}}.
 * {{CUSTOM}}: uses custom merging logic specified by the user. When a 
user specifies a custom record merger strategy or a payload class with the 
Avro record merger, this mode is going to be set so that record merging 
follows the user-defined logic as before.

  was:To achieve this, we should add a new table config 
{{hoodie.record.merge.mode}} to control the record merging mode and behavior in 
the new file group reader ({{HoodieFileGroupReader}}) and implement 
event-time ordering in it. The table config {{hoodie.record.merge.mode}} is 
going to be the single config that determines how record merging happens in 
release 1.0 and beyond.


> Implement event-time-based merging mode in FileGroupReader
> --
>
> Key: HUDI-6798
> URL: https://issues.apache.org/jira/browse/HUDI-6798
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> To achieve this, we should add a new table config 
> {{hoodie.record.merge.mode}} to control the record merging mode and behavior 
> in the new file group reader ({{HoodieFileGroupReader}}) and implement 
> event-time ordering in it. The table config {{hoodie.record.merge.mode}} is 
> going to be the single config that determines how record merging happens 
> in release 1.0 and beyond.
>  
> Three merging modes to define:
>  * {{OVERWRITE_WITH_LATEST}}: uses transaction time to merge records, 
> i.e., the record from the later transaction overwrites the earlier record 
> with the same key. This corresponds to the behavior of the existing payload 
> class {{OverwriteWithLatestAvroPayload}}.
>  * {{EVENT_TIME_ORDERING}}: uses event time as the ordering to merge 
> records, i.e., the record with the larger event time overwrites the record 
> with the smaller event time on the same key, regardless of transaction time. 
> The event time or preCombine field needs to be specified by the user. This 
> corresponds to the behavior of the existing payload class 
> {{DefaultHoodieRecordPayload}}.
>  * {{CUSTOM}}: uses custom merging logic specified by the user. When a 
> user specifies a custom record merger strategy or a payload class with the 
> Avro record merger, this mode is going to be set so that record merging 
> follows the user-defined logic as before.





[jira] [Updated] (HUDI-6798) Implement event-time-based merging mode in FileGroupReader

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6798:

Description: To achieve this, we should add a new table config 
{{hoodie.record.merge.mode}} to control the record merging mode and behavior in 
the new file group reader ({{HoodieFileGroupReader}}) and implement 
event-time ordering in it. The table config {{hoodie.record.merge.mode}} is 
going to be the single config that determines how record merging happens in 
release 1.0 and beyond.

> Implement event-time-based merging mode in FileGroupReader
> --
>
> Key: HUDI-6798
> URL: https://issues.apache.org/jira/browse/HUDI-6798
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> To achieve this, we should add a new table config 
> {{hoodie.record.merge.mode}} to control the record merging mode and behavior 
> in the new file group reader ({{HoodieFileGroupReader}}) and implement 
> event-time ordering in it. The table config {{hoodie.record.merge.mode}} is 
> going to be the single config that determines how record merging happens 
> in release 1.0 and beyond.





[jira] [Closed] (HUDI-7045) Fix new file format and reader for schema evolution

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7045.
---
Resolution: Fixed

> Fix new file format and reader for schema evolution
> ---
>
> Key: HUDI-7045
> URL: https://issues.apache.org/jira/browse/HUDI-7045
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> When this is implemented, parquet readers should not be created in 
> HoodieFileGroupReaderBasedParquetFileFormat. Additionally, we can 
> uncomment/add the code from this commit: 
> [https://github.com/apache/hudi/pull/10137/commits/b0b711e0c355320da652fa7f2d8669539873d4d6]





Re: [PR] [MINOR] Moving to 0.16.0-SNAPSHOT on branch-0.x [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11425:
URL: https://github.com/apache/hudi/pull/11425#issuecomment-2157467585

   
   ## CI report:
   
   * 47b890aa81f7f92f092da13ef7a7999f579f5d03 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7853) Fix missing serDe properties post migration from hiveSync to glueSync

2024-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7853:
-
Labels: pull-request-available  (was: )

> Fix missing serDe properties post migration from hiveSync to glueSync
> -
>
> Key: HUDI-7853
> URL: https://issues.apache.org/jira/browse/HUDI-7853
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Prathit Malik
>Assignee: Prathit Malik
>Priority: Major
>  Labels: pull-request-available
>
> More info : [https://github.com/apache/hudi/issues/11397]
>  
> After migration to 0.13.1, the Hudi table path is missing from the serde 
> properties, so reading from Spark throws the error below:
> - org.apache.hudi.exception.HoodieException: 'path' or 'Key: 
> 'hoodie.datasource.read.paths' , default: null description: Comma separated 
> list of file paths to read within a Hudi table. since version: version is not 
> defined deprecated after: version is not defined)' or both must be specified.





Re: [PR] [HUDI-7852] Constrain the comparison of different types of ordering values to limited cases [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11424:
URL: https://github.com/apache/hudi/pull/11424#issuecomment-2157467513

   
   ## CI report:
   
   * f90504d5f8ef99b4ea25dd5b05127c54d3f4252e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24328)
 
   
   



Re: [PR] [HUDI-7849] Reduce time spent on running testFiltersInFileFormat [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11423:
URL: https://github.com/apache/hudi/pull/11423#issuecomment-2157467452

   
   ## CI report:
   
   * 19caeb8d2270645aa6d0ddbdeaa08b31755d974b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24327)
 
   
   



Re: [PR] [HUDI-7853] Fix missing serDe properties post migration from hiveSync to glueSync [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11404:
URL: https://github.com/apache/hudi/pull/11404#issuecomment-2157467328

   
   ## CI report:
   
   * fcafb8766c4b27557d1c40398ce28d8de8aec724 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24243)
 
   * 9dcca40f6488ef301b93176293f0f46dcb8ae017 UNKNOWN
   
   



[jira] [Commented] (HUDI-7847) Infer record merge mode during table upgrade

2024-06-09 Thread Ethan Guo (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853562#comment-17853562
 ] 

Ethan Guo commented on HUDI-7847:
-

Note that HUDI-6798 has added the inference logic for the new record merge mode 
table config in HoodieTableMetaClient#inferRecordMergeMode (see 
[https://github.com/apache/hudi/pull/9894]).  We can reuse the same logic 
during the table upgrade from table version 7 to 8 (SevenToEightUpgradeHandler).

> Infer record merge mode during table upgrade
> 
>
> Key: HUDI-7847
> URL: https://issues.apache.org/jira/browse/HUDI-7847
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Geser Dugarov
>Priority: Major
> Fix For: 1.0.0
>
>
> Record merge mode is required to dictate the merging behavior in release 1.x, 
> playing the same role as the payload class config in release 0.x.  During 
> table upgrade, we need to infer the record merge mode from the payload 
> class so that it is set correctly.
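
One way to picture the inference is a mapping from the payload class to a merge mode, following the correspondence given in HUDI-6798. The sketch below is an assumption-laden illustration, not the logic in `HoodieTableMetaClient#inferRecordMergeMode`: the class `MergeModeInferenceSketch`, the method `inferFromPayloadClass`, matching on the simple class name, and rejecting an empty name are all hypothetical choices made for this example.

```java
/** Illustrative sketch of payload-class-based inference; not Hudi's actual code. */
final class MergeModeInferenceSketch {

    /** Mode names taken from HUDI-6798. */
    enum RecordMergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

    /**
     * Maps a payload class name to a merge mode.
     * Assumption: only the simple class name is compared, and any payload
     * class other than the two named in HUDI-6798 is treated as CUSTOM.
     */
    static RecordMergeMode inferFromPayloadClass(String payloadClassName) {
        if (payloadClassName == null || payloadClassName.trim().isEmpty()) {
            // Assumption: how an unset payload class is handled is not stated in the issue.
            throw new IllegalArgumentException("Payload class name is required");
        }
        String trimmed = payloadClassName.trim();
        String simpleName = trimmed.substring(trimmed.lastIndexOf('.') + 1);
        switch (simpleName) {
            case "OverwriteWithLatestAvroPayload":
                return RecordMergeMode.OVERWRITE_WITH_LATEST;
            case "DefaultHoodieRecordPayload":
                return RecordMergeMode.EVENT_TIME_ORDERING;
            default:
                return RecordMergeMode.CUSTOM;
        }
    }

    public static void main(String[] args) {
        // "com.example.MyPayload" is a made-up class name for illustration.
        System.out.println(inferFromPayloadClass("OverwriteWithLatestAvroPayload"));
        System.out.println(inferFromPayloadClass("DefaultHoodieRecordPayload"));
        System.out.println(inferFromPayloadClass("com.example.MyPayload"));
    }
}
```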





[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7846:

Sprint: 2024/06/03-16

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven 
> parallel build
> -
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
>
> The following warning is emitted when running a Maven parallel build with 
> `mvn -T 1C ...`:
> {code:java}
> [WARNING] Enable debug to see precisely which goals are not marked as 
> thread-safe.
> [WARNING] *
> [WARNING] * Your build is requesting parallel execution, but this         *
> [WARNING] * project contains the following plugin(s) that have goals not  *
> [WARNING] * marked as thread-safe to support parallel execution.          *
> [WARNING] * While this /may/ work fine, please look for plugin updates    *
> [WARNING] * and/or request plugins be made thread-safe.                   *
> [WARNING] * If reporting an issue, report it against the plugin in        *
> [WARNING] * question, not against Apache Maven.                           *
> [WARNING] *
> [WARNING] The following plugins are not marked as thread-safe in 
> hudi-hadoop-mr:
> [WARNING]   org.apache.rat:apache-rat-plugin:0.13 {code}





[jira] [Updated] (HUDI-7853) Fix missing serDe properties post migration from hiveSync to glueSync

2024-06-09 Thread Prathit Malik (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prathit Malik updated HUDI-7853:

Description: 
More info : [https://github.com/apache/hudi/issues/11397]

 

After migration to 0.13.1, the Hudi table path is missing from the serde 
properties, so reading from Spark throws the error below:

- org.apache.hudi.exception.HoodieException: 'path' or 'Key: 
'hoodie.datasource.read.paths' , default: null description: Comma separated 
list of file paths to read within a Hudi table. since version: version is not 
defined deprecated after: version is not defined)' or both must be specified.

  was:
More info : [https://github.com/apache/hudi/issues/11397]

 

After migration to 0.13.1, the Hudi table path is missing from the serde 
properties, so reading from Spark throws the error below:

```org.apache.hudi.exception.HoodieException: 'path' or 'Key: 
'hoodie.datasource.read.paths' , default: null description: Comma separated 
list of file paths to read within a Hudi table. since version: version is not 
defined deprecated after: version is not defined)' or both must be specified.```


> Fix missing serDe properties post migration from hiveSync to glueSync
> -
>
> Key: HUDI-7853
> URL: https://issues.apache.org/jira/browse/HUDI-7853
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Prathit Malik
>Assignee: Prathit Malik
>Priority: Major
>
> More info : [https://github.com/apache/hudi/issues/11397]
>  
> After migration to 0.13.1, the Hudi table path is missing from the serde 
> properties, so reading from Spark throws the error below:
> - org.apache.hudi.exception.HoodieException: 'path' or 'Key: 
> 'hoodie.datasource.read.paths' , default: null description: Comma separated 
> list of file paths to read within a Hudi table. since version: version is not 
> defined deprecated after: version is not defined)' or both must be specified.





Re: [PR] [HUDI-7852] Constrain the comparison of different types of ordering values to limited cases [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11424:
URL: https://github.com/apache/hudi/pull/11424#issuecomment-2157455424

   
   ## CI report:
   
   * f90504d5f8ef99b4ea25dd5b05127c54d3f4252e UNKNOWN
   
   



Re: [PR] [HUDI-7849] Reduce time spent on running testFiltersInFileFormat [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11423:
URL: https://github.com/apache/hudi/pull/11423#issuecomment-2157455359

   
   ## CI report:
   
   * 19caeb8d2270645aa6d0ddbdeaa08b31755d974b UNKNOWN
   
   



[jira] [Created] (HUDI-7853) Fix missing serDe properties post migration from hiveSync to glueSync

2024-06-09 Thread Prathit Malik (Jira)
Prathit Malik created HUDI-7853:
---

 Summary: Fix missing serDe properties post migration from hiveSync 
to glueSync
 Key: HUDI-7853
 URL: https://issues.apache.org/jira/browse/HUDI-7853
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Prathit Malik
Assignee: Prathit Malik


More info : [https://github.com/apache/hudi/issues/11397]

 

After migration to 0.13.1, the Hudi table path is missing from the serde 
properties, so reading from Spark throws the error below:

```org.apache.hudi.exception.HoodieException: 'path' or 'Key: 
'hoodie.datasource.read.paths' , default: null description: Comma separated 
list of file paths to read within a Hudi table. since version: version is not 
defined deprecated after: version is not defined)' or both must be specified.```





[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7852:

Sprint: 2024/06/03-16

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares numbers by casting them 
> to long values, which may not be safe for Float and Double because the cast 
> discards the fractional part.  We should limit the allowed cases to avoid 
> wrong results.





[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7852:

Status: Patch Available  (was: In Progress)

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares numbers by casting them 
> to long values, which may not be safe for Float and Double because the cast 
> discards the fractional part.  We should limit the allowed cases to avoid 
> wrong results.





[PR] [MINOR] Moving to 0.16.0-SNAPSHOT on branch-0.x [hudi]

2024-06-09 Thread via GitHub


yihua opened a new pull request, #11425:
URL: https://github.com/apache/hudi/pull/11425

   ### Change Logs
   
   This PR moves branch-0.x to version 0.16.0-SNAPSHOT.
   
   ### Impact
   
   Moves to the next 0.x version.
   
   ### Risk level
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7851] Fix java doc of DeltaWriteProfile [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11422:
URL: https://github.com/apache/hudi/pull/11422#issuecomment-2157444537

   
   ## CI report:
   
   * c7b9a3e72f987f3de9fa15917526fbb6f55d8d1b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24326)
 
   
   



[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7852:
-
Labels: pull-request-available  (was: )

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares numbers by casting them 
> to long values, which may not be safe for Float and Double because the cast 
> discards the fractional part.  We should limit the allowed cases to avoid 
> wrong results.





[PR] [HUDI-7852] Constrain the comparison of different types of ordering values to limited cases [hudi]

2024-06-09 Thread via GitHub


yihua opened a new pull request, #11424:
URL: https://github.com/apache/hudi/pull/11424

   ### Change Logs
   
   `HoodieBaseFileGroupRecordBuffer#compareTo` compares numbers by casting 
them to long values, which may not be safe for Float and Double because the 
cast discards the fractional part.  This PR limits the allowed cases of 
ordering value comparison to avoid wrong results.
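
   The problem with the long cast can be reproduced in isolation. The sketch below is not the code in this PR: `OrderingCompareSketch`, `compareByLongCast`, and `compareConstrained` are hypothetical names, and the particular set of allowed cases (integral types with each other, Float with Float, Double with Double) is an assumption chosen for illustration.

```java
/** Illustrative sketch of unsafe vs. constrained ordering-value comparison. */
final class OrderingCompareSketch {

    /** The unsafe pattern: compare any two numbers after casting to long. */
    static int compareByLongCast(Number a, Number b) {
        return Long.compare(a.longValue(), b.longValue());
    }

    /** True for the boxed integral types that convert to long without loss. */
    static boolean isIntegral(Number n) {
        return n instanceof Byte || n instanceof Short
            || n instanceof Integer || n instanceof Long;
    }

    /**
     * A constrained comparison (assumed set of allowed cases):
     * integral vs. integral, Float vs. Float, Double vs. Double.
     * Any other combination is rejected instead of silently compared.
     */
    static int compareConstrained(Number a, Number b) {
        if (isIntegral(a) && isIntegral(b)) {
            return Long.compare(a.longValue(), b.longValue());
        }
        if (a instanceof Float && b instanceof Float) {
            return Float.compare((Float) a, (Float) b);
        }
        if (a instanceof Double && b instanceof Double) {
            return Double.compare((Double) a, (Double) b);
        }
        throw new IllegalArgumentException(
            "Unsupported comparison: " + a.getClass().getSimpleName()
                + " vs " + b.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        // Both values truncate to 1L, so the long cast reports them as equal.
        System.out.println(compareByLongCast(1.2d, 1.7d)); // prints 0
        // Comparing as doubles preserves the ordering (negative result).
        System.out.println(compareConstrained(1.2d, 1.7d));
    }
}
```

   Rejecting mixed-type comparisons outright is one possible design; it trades flexibility for the guarantee that no comparison silently loses precision.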
   
   ### Impact
   
   Makes ordering value comparison safe.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat

2024-06-09 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov updated HUDI-7849:

Status: In Progress  (was: Open)

> Reduce time spent on running testFiltersInFileFormat
> 
>
> Key: HUDI-7849
> URL: https://issues.apache.org/jira/browse/HUDI-7849
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Below shows the top long-running tests in the job "UT flink & FT common & 
> flink & spark-client & hudi-spark" in Azure CI.  The time running 
> testFiltersInFileFormat should be reduced.
> {code:java}
> /usr/bin/bash --noprofile --norc 
> /home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh
> grep: */target/surefire-reports/*.xml: No such file or directory
> 366.474 boolean) [2] false(testFiltersInFileFormat
> 223.221 boolean) [1] true(testFiltersInFileFormat
> 80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat
> 65.48 boolean) [2] true(testDeletePartitionAndArchive
> 56.558 boolean) [1] false(testDeletePartitionAndArchive{code}





[jira] [Updated] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat

2024-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7849:
-
Labels: pull-request-available  (was: )

> Reduce time spent on running testFiltersInFileFormat
> 
>
> Key: HUDI-7849
> URL: https://issues.apache.org/jira/browse/HUDI-7849
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Below shows the top long-running tests in the job "UT flink & FT common & 
> flink & spark-client & hudi-spark" in Azure CI.  The time running 
> testFiltersInFileFormat should be reduced.
> {code:java}
> /usr/bin/bash --noprofile --norc 
> /home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh
> grep: */target/surefire-reports/*.xml: No such file or directory
> 366.474 boolean) [2] false(testFiltersInFileFormat
> 223.221 boolean) [1] true(testFiltersInFileFormat
> 80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat
> 65.48 boolean) [2] true(testDeletePartitionAndArchive
> 56.558 boolean) [1] false(testDeletePartitionAndArchive{code}





[PR] [HUDI-7849] Reduce time spent on running testFiltersInFileFormat [hudi]

2024-06-09 Thread via GitHub


wombatu-kun opened a new pull request, #11423:
URL: https://github.com/apache/hudi/pull/11423

   ### Change Logs
   
   - reduced number of inserts/updates (from 100/20 to 10/2);  
   - added lib `spark-fast-tests` (in test scope) and used 
`assertSmallDatasetEquality` for comparing dataframes.  
   
   ### Impact
   
   While running testFiltersInFileFormat locally:  
   before: [true] - 1,26min, [false] - 1,01min;  
   after: [true] - 33sec, [false] - 19 sec.  
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7852:

Story Points: 1

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting 
> them to the long value, which may not be safe for Float and Double.  We 
> should limit the allowed cases to avoid wrong results.





[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7852:

Status: In Progress  (was: Open)

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting 
> them to the long value, which may not be safe for Float and Double.  We 
> should limit the allowed cases to avoid wrong results.





Re: [PR] [HUDI-7851] Fix java doc of DeltaWriteProfile [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11422:
URL: https://github.com/apache/hudi/pull/11422#issuecomment-2157229627

   
   ## CI report:
   
   * c7b9a3e72f987f3de9fa15917526fbb6f55d8d1b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24326)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7851] Fix java doc of DeltaWriteProfile [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11422:
URL: https://github.com/apache/hudi/pull/11422#issuecomment-2157189600

   
   ## CI report:
   
   * c7b9a3e72f987f3de9fa15917526fbb6f55d8d1b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


yihua commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632554676


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -147,6 +153,37 @@ public void close() {
 records.clear();
   }
 
+  /**
+   * Compares two {@link Comparable}s.  If both are numbers, converts them to 
{@link Long} for comparison.
+   * If one of the {@link Comparable}s is a String, assumes that both are 
String values for comparison.
+   *
+   * @param o1 {@link Comparable} object.
+   * @param o2 other {@link Comparable} object to compare to.
+   * @return comparison result.
+   */
+  @VisibleForTesting
+  static int compareTo(Comparable o1, Comparable o2) {
+// TODO(HUDI-7848): fix the delete records to contain the correct ordering 
value type
+//  so this util with the number comparison is not necessary.
+try {
+  return o1.compareTo(o2);
+} catch (ClassCastException e) {
+  if (o1 instanceof Number && o2 instanceof Number) {
+Long o1LongValue = ((Number) o1).longValue();
+Long o2LongValue = ((Number) o2).longValue();
+return o1LongValue.compareTo(o2LongValue);

Review Comment:
   HUDI-7852 to track.






[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7852:

Description: HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers 
by casting them to the long value, which may not be safe for Float and Double.  
We should limit the allowed cases to avoid wrong results.

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting 
> them to the long value, which may not be safe for Float and Double.  We 
> should limit the allowed cases to avoid wrong results.





[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7852:

Fix Version/s: 1.0.0

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting 
> them to the long value, which may not be safe for Float and Double.  We 
> should limit the allowed cases to avoid wrong results.





[jira] [Assigned] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7852:
---

Assignee: Ethan Guo

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting 
> them to the long value, which may not be safe for Float and Double.  We 
> should limit the allowed cases to avoid wrong results.





[jira] [Created] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7852:
---

 Summary: Constrain the comparison of different types of ordering 
values to limited cases
 Key: HUDI-7852
 URL: https://issues.apache.org/jira/browse/HUDI-7852
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


yihua commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632553603


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -147,6 +153,37 @@ public void close() {
 records.clear();
   }
 
+  /**
+   * Compares two {@link Comparable}s.  If both are numbers, converts them to 
{@link Long} for comparison.
+   * If one of the {@link Comparable}s is a String, assumes that both are 
String values for comparison.
+   *
+   * @param o1 {@link Comparable} object.
+   * @param o2 other {@link Comparable} object to compare to.
+   * @return comparison result.
+   */
+  @VisibleForTesting
+  static int compareTo(Comparable o1, Comparable o2) {
+// TODO(HUDI-7848): fix the delete records to contain the correct ordering 
value type
+//  so this util with the number comparison is not necessary.
+try {
+  return o1.compareTo(o2);
+} catch (ClassCastException e) {
+  if (o1 instanceof Number && o2 instanceof Number) {
+Long o1LongValue = ((Number) o1).longValue();
+Long o2LongValue = ((Number) o2).longValue();
+return o1LongValue.compareTo(o2LongValue);

Review Comment:
   We can constrain the comparison to Long and Integer only to limit the 
possibility of wrong results.  I'll create a follow-up PR to fix this.
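
A rough sketch of what such a constraint could look like is below. This is an assumption for discussion, not the follow-up PR: the class name `ConstrainedOrderingCompare`, the helper `isIntegral`, and the choice to throw `IllegalArgumentException` for unsupported type pairs are all hypothetical.

```java
final class ConstrainedOrderingCompare {
    // Same-class values use their natural ordering. Mixed Integer/Long values
    // are widened to long, which is lossless. Everything else is rejected
    // rather than silently truncated.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int compare(Comparable o1, Comparable o2) {
        if (o1.getClass() == o2.getClass()) {
            return o1.compareTo(o2);
        }
        if (isIntegral(o1) && isIntegral(o2)) {
            return Long.compare(((Number) o1).longValue(), ((Number) o2).longValue());
        }
        throw new IllegalArgumentException(
            "Cannot compare ordering values of types "
                + o1.getClass().getName() + " and " + o2.getClass().getName());
    }

    // Hypothetical helper: only Integer and Long are treated as safely comparable.
    private static boolean isIntegral(Object o) {
        return o instanceof Integer || o instanceof Long;
    }
}
```

For example, `compare(1, 2L)` orders the values as expected, while `compare(1.5d, 1)` fails fast instead of comparing 1 against 1.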






Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


yihua commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632551875


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -147,6 +153,37 @@ public void close() {
 records.clear();
   }
 
+  /**
+   * Compares two {@link Comparable}s.  If both are numbers, converts them to 
{@link Long} for comparison.
+   * If one of the {@link Comparable}s is a String, assumes that both are 
String values for comparison.
+   *
+   * @param o1 {@link Comparable} object.
+   * @param o2 other {@link Comparable} object to compare to.
+   * @return comparison result.
+   */
+  @VisibleForTesting
+  static int compareTo(Comparable o1, Comparable o2) {
+// TODO(HUDI-7848): fix the delete records to contain the correct ordering 
value type

Review Comment:
   Yes, based on the test cases this only happens when the ordering field value 
is deserialized from the delete records.  We need to check if the existing 
Avro-based merging logic has done schema handling to make this work (which may 
also incur additional overhead).
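
For readers wondering why the hunk catches `ClassCastException` at all, the following standalone sketch (not Hudi code; `RawCompareDemo` is a made-up name) shows the mechanism: calling `compareTo` through the raw `Comparable` type with mismatched boxed types, such as an `Integer` against a `Long`, fails at runtime. Whether delete records are the only source of such a mismatch is the open question in the comment above; the sketch does not settle that.

```java
final class RawCompareDemo {
    // Returns true if comparing o1 to o2 through the raw Comparable type
    // throws ClassCastException, false if the comparison completes.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean throwsClassCast(Comparable o1, Comparable o2) {
        try {
            o1.compareTo(o2);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsClassCast(1, 2L)); // Integer vs Long
        System.out.println(throwsClassCast(1, 2));  // Integer vs Integer
    }
}
```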






Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


yihua commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157165840

   CI is green.
   https://github.com/apache/hudi/assets/2497195/6d8f4fa9-3e64-4914-9a46-05e8783cd458
   





[jira] [Updated] (HUDI-7851) Fix java doc of DeltaWriteProfile

2024-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7851:
-
Labels: pull-request-available  (was: )

> Fix java doc of DeltaWriteProfile
> -
>
> Key: HUDI-7851
> URL: https://issues.apache.org/jira/browse/HUDI-7851
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: bradley
>Priority: Major
>  Labels: pull-request-available
>






[PR] [HUDI-7851] Fix java doc of DeltaWriteProfile [hudi]

2024-06-09 Thread via GitHub


usberkeley opened a new pull request, #11422:
URL: https://github.com/apache/hudi/pull/11422

   ### Change Logs
   
   Fix java doc of DeltaWriteProfile
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed
   





(hudi) branch master updated: [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader (#9894)

2024-06-09 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c0576131759 [HUDI-6798] Add record merging mode and implement 
event-time ordering in the new file group reader (#9894)
c0576131759 is described below

commit c05761317596585a3c0c3cc69a34b4407843351c
Author: Y Ethan Guo 
AuthorDate: Sun Jun 9 20:48:09 2024 -0700

[HUDI-6798] Add record merging mode and implement event-time ordering in 
the new file group reader (#9894)

This PR adds a new table config `hoodie.record.merge.mode` to control the
record merging mode and behavior in the new file group reader
(`HoodieFileGroupReader`) and implements event-time ordering in it.
The config `hoodie.record.merge.mode` is going to be the single config that
determines how the record merging happens in release 1.0 and beyond.

-

Co-authored-by: Sagar Sumit 
---
 .../hudi/client/TestTableSchemaEvolution.java  |   3 +
 .../hudi/common/config/HoodieCommonConfig.java |   3 +
 .../apache/hudi/common/config/RecordMergeMode.java |  36 
 .../hudi/common/table/HoodieTableConfig.java   |  13 +-
 .../hudi/common/table/HoodieTableMetaClient.java   | 114 ++-
 .../table/log/BaseHoodieLogRecordReader.java   |   7 +
 .../table/log/HoodieMergedLogRecordReader.java |  13 +-
 .../read/HoodieBaseFileGroupRecordBuffer.java  | 209 -
 .../common/table/read/HoodieFileGroupReader.java   |  26 ++-
 .../table/read/TestHoodieFileGroupReaderBase.java  |  77 ++--
 .../common/table/TestHoodieTableMetaClient.java| 144 ++
 .../hudi/common/table/read/TestCustomMerger.java   |   4 +
 .../common/table/read/TestEventTimeMerging.java|   4 +
 ...stHoodiePositionBasedFileGroupRecordBuffer.java |   6 +-
 .../read/TestHoodieFileGroupReaderOnSpark.scala|  11 +-
 15 files changed, 588 insertions(+), 82 deletions(-)

diff --git 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
index f5fa70c6668..496b42c13d6 100644
--- 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
+++ 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
@@ -20,6 +20,7 @@ package org.apache.hudi.client;
 
 import org.apache.hudi.avro.AvroSchemaUtils;
 import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.config.RecordMergeMode;
 import org.apache.hudi.common.model.HoodieAvroRecord;
 import org.apache.hudi.common.model.HoodieKey;
 import org.apache.hudi.common.model.HoodieRecord;
@@ -48,6 +49,7 @@ import java.io.IOException;
 import java.util.List;
 import java.util.stream.Collectors;
 
+import static 
org.apache.hudi.common.config.HoodieCommonConfig.RECORD_MERGE_MODE;
 import static 
org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion.VERSION_1;
 import static 
org.apache.hudi.common.testutils.HoodieTestDataGenerator.EXTRA_TYPE_SCHEMA;
 import static 
org.apache.hudi.common.testutils.HoodieTestDataGenerator.FARE_NESTED_SCHEMA;
@@ -165,6 +167,7 @@ public class TestTableSchemaEvolution extends 
HoodieClientTestBase {
 HoodieTableMetaClient.withPropertyBuilder()
 .fromMetaClient(metaClient)
 .setTableType(HoodieTableType.MERGE_ON_READ)
+
.setRecordMergeMode(RecordMergeMode.valueOf(RECORD_MERGE_MODE.defaultValue()))
 .setTimelineLayoutVersion(VERSION_1)
 .initTable(metaClient.getStorageConf().newInstance(), 
metaClient.getBasePath());
 
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java
index 1a4c2e31780..c96b07ee4f0 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java
@@ -18,6 +18,7 @@
 
 package org.apache.hudi.common.config;
 
+import org.apache.hudi.common.table.HoodieTableConfig;
 import 
org.apache.hudi.common.table.timeline.TimelineUtils.HollowCommitHandling;
 import org.apache.hudi.common.util.collection.ExternalSpillableMap;
 
@@ -81,6 +82,8 @@ public class HoodieCommonConfig extends HoodieConfig {
   + " operation will fail schema compatibility check. Set this option 
to true will make the missing "
   + " column be filled with null values to successfully complete the 
write operation.");
 
+  public static final ConfigProperty RECORD_MERGE_MODE = 
HoodieTableConfig.RECORD_MERGE_MODE;
+
   public static final ConfigProperty 
SPILLABLE_DISK_MAP_TYPE = ConfigProperty
   .key("hoodie.common.spillable.diskmap.type")
  

Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


codope merged PR #9894:
URL: https://github.com/apache/hudi/pull/9894





[jira] [Created] (HUDI-7851) Fix java doc of DeltaWriteProfile

2024-06-09 Thread bradley (Jira)
bradley created HUDI-7851:
-

 Summary: Fix java doc of DeltaWriteProfile
 Key: HUDI-7851
 URL: https://issues.apache.org/jira/browse/HUDI-7851
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: bradley








Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157141857

   
   ## CI report:
   
   * 3a1ec4524a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24313)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157136586

   
   ## CI report:
   
   * ca01c48cd352583dbf024006de57c9f6827b237b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24324)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24323)
 
   * 3a1ec4524a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


codope commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632524037


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -147,6 +153,37 @@ public void close() {
 records.clear();
   }
 
+  /**
+   * Compares two {@link Comparable}s.  If both are numbers, converts them to 
{@link Long} for comparison.
+   * If one of the {@link Comparable}s is a String, assumes that both are 
String values for comparison.
+   *
+   * @param o1 {@link Comparable} object.
+   * @param o2 other {@link Comparable} object to compare to.
+   * @return comparison result.
+   */
+  @VisibleForTesting
+  static int compareTo(Comparable o1, Comparable o2) {
+// TODO(HUDI-7848): fix the delete records to contain the correct ordering 
value type

Review Comment:
   does this happen only for delete records?



##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -147,6 +153,37 @@ public void close() {
 records.clear();
   }
 
+  /**
+   * Compares two {@link Comparable}s.  If both are numbers, converts them to 
{@link Long} for comparison.
+   * If one of the {@link Comparable}s is a String, assumes that both are 
String values for comparison.
+   *
+   * @param o1 {@link Comparable} object.
+   * @param o2 other {@link Comparable} object to compare to.
+   * @return comparison result.
+   */
+  @VisibleForTesting
+  static int compareTo(Comparable o1, Comparable o2) {
+// TODO(HUDI-7848): fix the delete records to contain the correct ordering 
value type
+//  so this util with the number comparison is not necessary.
+try {
+  return o1.compareTo(o2);
+} catch (ClassCastException e) {
+  if (o1 instanceof Number && o2 instanceof Number) {
+Long o1LongValue = ((Number) o1).longValue();
+Long o2LongValue = ((Number) o2).longValue();
+return o1LongValue.compareTo(o2LongValue);

Review Comment:
   can possibly lead to wrong result with float/double?






Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]

2024-06-09 Thread via GitHub


the-other-tim-brown commented on code in PR #11381:
URL: https://github.com/apache/hudi/pull/11381#discussion_r1632525375


##
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/AvroSchemaEvolutionUtils.java:
##
@@ -113,6 +120,21 @@ public static InternalSchema reconcileSchema(Schema 
incomingSchema, InternalSche
   typeChange.updateColumnType(col, inComingInternalSchema.findType(col));
 });
 
+// mark columns missing from incoming schema as nullable
+Set visited = new HashSet<>();
+diffFromOldSchema.stream()
+// ignore meta fields
+.filter(col -> !META_FIELD_NAMES.contains(col))
+.sorted()
+.forEach(col -> {
+  // if parent is marked as nullable, only update the parent and not 
all the missing children field
+  String parent = TableChangesHelper.getParentName(col);
+  if (!visited.contains(parent)) {
+typeChange.updateColumnNullability(col, true);
+  }
+  visited.add(col);
+});

Review Comment:
   @nsivabalan I've updated the PR to include the boolean
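
For anyone following the hunk above, here is a simplified standalone sketch of the parent-first traversal it uses. It is not the PR's code: `parentOf` is a hypothetical stand-in for `TableChangesHelper.getParentName` (assumed here to return everything before the last dot, or an empty string for top-level columns), the meta-field filter is omitted, and the method returns the columns it would mark instead of calling `updateColumnNullability`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class MissingColumnNullability {
    // Hypothetical stand-in for the parent-name lookup: "a.b.c" -> "a.b", "a" -> "".
    static String parentOf(String col) {
        int idx = col.lastIndexOf('.');
        return idx < 0 ? "" : col.substring(0, idx);
    }

    // Walks the missing columns in sorted order so a parent is seen before its
    // children. A column is selected only if its parent was not itself missing.
    static List<String> columnsToMarkNullable(Set<String> missing) {
        Set<String> visited = new HashSet<>();
        List<String> toMark = new ArrayList<>();
        for (String col : missing.stream().sorted().collect(Collectors.toList())) {
            if (!visited.contains(parentOf(col))) {
                toMark.add(col);
            }
            visited.add(col);
        }
        return toMark;
    }
}
```

Given missing columns `a`, `a.b`, `a.b.c`, and `d`, only `a` and `d` are selected, since the nested fields are covered by their missing parent.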






[jira] [Assigned] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat

2024-06-09 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7849:
---

Assignee: Vova Kolmakov

> Reduce time spent on running testFiltersInFileFormat
> 
>
> Key: HUDI-7849
> URL: https://issues.apache.org/jira/browse/HUDI-7849
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
> Fix For: 1.0.0
>
>
> Below shows the top long-running tests in the job "UT flink & FT common & 
> flink & spark-client & hudi-spark" in Azure CI.  The time running 
> testFiltersInFileFormat should be reduced.
> {code:java}
> /usr/bin/bash --noprofile --norc 
> /home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh
> grep: */target/surefire-reports/*.xml: No such file or directory
> 366.474 boolean) [2] false(testFiltersInFileFormat
> 223.221 boolean) [1] true(testFiltersInFileFormat
> 80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat
> 65.48 boolean) [2] true(testDeletePartitionAndArchive
> 56.558 boolean) [1] false(testDeletePartitionAndArchive{code}





[jira] [Assigned] (HUDI-7847) Infer record merge mode during table upgrade

2024-06-09 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov reassigned HUDI-7847:
---

Assignee: Geser Dugarov

> Infer record merge mode during table upgrade
> 
>
> Key: HUDI-7847
> URL: https://issues.apache.org/jira/browse/HUDI-7847
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Geser Dugarov
>Priority: Major
> Fix For: 1.0.0
>
>
> Record merge mode is required to dictate the merging behavior in release 1.x, 
> playing the same role as the payload class config in release 0.x.  During 
> table upgrade, we need to infer the record merge mode based on the payload 
> class so it's correctly set.
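
The inference described above can be sketched as a lookup from payload class name to merge mode. This is only an illustration under stated assumptions: the class `MergeModeInference`, the enum values, and the mapping are hypothetical and are not taken from the Hudi upgrade code; only the two payload class names refer to existing Hudi classes.

```java
import java.util.Map;

/** Hypothetical sketch: infer a record merge mode from a payload class name. */
final class MergeModeInference {
  /** Illustrative merge modes; the actual enum in Hudi may use other names. */
  enum MergeMode { COMMIT_TIME_ORDERING, EVENT_TIME_ORDERING, CUSTOM }

  // Assumed mapping, for illustration only.
  private static final Map<String, MergeMode> KNOWN_PAYLOADS = Map.of(
      "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload", MergeMode.COMMIT_TIME_ORDERING,
      "org.apache.hudi.common.model.DefaultHoodieRecordPayload", MergeMode.EVENT_TIME_ORDERING);

  /** Returns the assumed merge mode, falling back to CUSTOM for unknown or missing payload classes. */
  static MergeMode infer(String payloadClassName) {
    if (payloadClassName == null) {
      return MergeMode.CUSTOM;
    }
    return KNOWN_PAYLOADS.getOrDefault(payloadClassName, MergeMode.CUSTOM);
  }
}
```

Falling back to a custom mode for unrecognized payload classes is a design choice of this sketch; an upgrade path could just as well fail fast and ask the user to set the mode explicitly.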





[jira] [Assigned] (HUDI-7838) Use Config hoodie.schema.cache.enable in HoodieBaseFileGroupRecordBuffer and AbstractHoodieLogRecordReader

2024-06-09 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7838:
---

Assignee: Vova Kolmakov

> Use Config hoodie.schema.cache.enable in HoodieBaseFileGroupRecordBuffer and  
> AbstractHoodieLogRecordReader
> ---
>
> Key: HUDI-7838
> URL: https://issues.apache.org/jira/browse/HUDI-7838
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: reader-core
>Reporter: Jonathan Vexler
>Assignee: Vova Kolmakov
>Priority: Major
>
> hoodie.schema.cache.enable should be used to decide if we want to use the 
> schema cache. Currently it is hardcoded to false.
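
A minimal sketch of reading the flag from configuration instead of hardcoding it, assuming the configs are available as `java.util.Properties`. The class `SchemaCacheConfig` and its method are hypothetical; the default of `false` is an assumption that simply mirrors the currently hardcoded value.

```java
import java.util.Properties;

/** Hypothetical helper: decide whether to use the schema cache based on configuration. */
final class SchemaCacheConfig {
  static final String SCHEMA_CACHE_ENABLE_KEY = "hoodie.schema.cache.enable";

  /** Returns the configured value, or false (assumed default) when the key is absent. */
  static boolean isSchemaCacheEnabled(Properties props) {
    return Boolean.parseBoolean(props.getProperty(SCHEMA_CACHE_ENABLE_KEY, "false"));
  }
}
```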





Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11381:
URL: https://github.com/apache/hudi/pull/11381#issuecomment-2157083778

   
   ## CI report:
   
   * 0d1802d42d4b67cc791cbd8d8c4619dd7a52d319 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24320)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Assigned] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write

2024-06-09 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov reassigned HUDI-7850:
---

Assignee: Geser Dugarov

> Makes hoodie.record.merge.mode mandatory upon creating the table and first 
> write
> 
>
> Key: HUDI-7850
> URL: https://issues.apache.org/jira/browse/HUDI-7850
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Geser Dugarov
>Priority: Major
> Fix For: 1.0.0
>
>
> Right now, "hoodie.record.merge.mode" is optional during writes as it is 
> inferred from the payload class name, payload type, and the record merger 
> strategy during the creation of the table properties.  We should make this 
> config mandatory in release 1.0 and make other merge configs optional to 
> simplify the configuration experience.
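
Making the config mandatory amounts to a validation step at table creation or first write. The sketch below shows one way such a check could look; `MergeModeValidator` and its error message are hypothetical, and the properties are assumed to be plain `java.util.Properties` rather than Hudi's own config classes.

```java
import java.util.Properties;

/** Hypothetical validation: fail fast when the record merge mode is not configured. */
final class MergeModeValidator {
  static final String RECORD_MERGE_MODE_KEY = "hoodie.record.merge.mode";

  /** Returns the configured merge mode, or throws if it is missing or blank. */
  static String requireMergeMode(Properties tableProps) {
    String mode = tableProps.getProperty(RECORD_MERGE_MODE_KEY);
    if (mode == null || mode.trim().isEmpty()) {
      throw new IllegalArgumentException(
          RECORD_MERGE_MODE_KEY + " must be set when creating the table or on the first write");
    }
    return mode.trim();
  }
}
```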





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157082849

   
   ## CI report:
   
   * ca01c48cd352583dbf024006de57c9f6827b237b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24324)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24323)
 
   
   



[jira] [Commented] (HUDI-7839) Can not find props file when using HoodieDeltaStreamer with Hudi 0.14.1

2024-06-09 Thread Vova Kolmakov (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853545#comment-17853545
 ] 

Vova Kolmakov commented on HUDI-7839:
-

Fixed via master branch: 9f9064761bac766cc7884027432568c06817ddd7

> Can not find props file when using HoodieDeltaStreamer with Hudi 0.14.1
> ---
>
> Key: HUDI-7839
> URL: https://issues.apache.org/jira/browse/HUDI-7839
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Xiaoxuan Li
>Assignee: Vova Kolmakov
>Priority: Major
>
> When using HoodieDeltaStreamer with Hudi 0.14.1, the following error is thrown:
> {noformat}
> Cannot read properties from dfs from file 
> file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.properties{noformat}
>  
> It works fine on Hudi 0.14.0. It might be related to a change introduced in 
> 0.14.1 -> [https://github.com/apache/hudi/pull/9913]
>  
> error log:
> {code:java}
> 24/06/06 22:42:09 INFO Client:
>      client token: N/A
>      diagnostics: User class threw exception: org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs from file file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.properties
>      at org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:166)
>      at org.apache.hudi.common.config.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:85)
>      at org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:232)
>      at org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:437)
>      at org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:656)
>      at org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:632)
>      at org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:525)
>      at org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:498)
>      at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:404)
>      at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:850)
>      at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>      at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
>      at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:207)
>      at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:592)
>      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>      at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>      at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>      at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:741)
> Caused by: java.io.FileNotFoundException: File file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.properties does not exist
>      at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:968)
>      at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1289)
>      at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:958)
>      at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:472)
>      at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:188)
>      at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:581)
>      at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:1004)
>      at org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:161)
>      ... 18 more
>      ApplicationMaster host: ip-172-31-75-55.ec2.internal
>      ApplicationMaster RPC port: 43905
>      queue: default
>      start time: 1717713711465
>      final status: FAILED
>      tracking URL: http://ip-172-31-69-122.ec2.internal:20888/proxy/application_1717399456895_0009/
>      user: hadoop
> 24/06/06 22:42:09 ERROR Client: Application diagnostics message: User class threw exception: org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs from file file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.properties
>      at org.

Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157034573

   
   ## CI report:
   
   * 7b6c9d86accaf976f4db0185fa1a203c82f04446 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24322)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24321)
 
   * ca01c48cd352583dbf024006de57c9f6827b237b UNKNOWN
   
   



Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157025978

   
   ## CI report:
   
   * a6ffe1240055d6135a517dfcada59edc95383423 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24318)
 
   * 7b6c9d86accaf976f4db0185fa1a203c82f04446 UNKNOWN
   
   



[jira] [Updated] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7850:

Fix Version/s: 1.0.0

> Makes hoodie.record.merge.mode mandatory upon creating the table and first 
> write
> 
>
> Key: HUDI-7850
> URL: https://issues.apache.org/jira/browse/HUDI-7850
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> Right now, "hoodie.record.merge.mode" is optional during writes as it is 
> inferred from the payload class name, payload type, and the record merger 
> strategy during the creation of the table properties.  We should make this 
> config mandatory in release 1.0 and make other merge configs optional to 
> simplify the configuration experience.





[jira] [Updated] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7850:

Description: Right now, "hoodie.record.merge.mode" is optional during 
writes as it is inferred from the payload class name, payload type, and the 
record merger strategy during the creation of the table properties.  We should 
make this config mandatory in release 1.0 and make other merge configs optional 
to simplify the configuration experience.  (was: Right now )

> Makes hoodie.record.merge.mode mandatory upon creating the table and first 
> write
> 
>
> Key: HUDI-7850
> URL: https://issues.apache.org/jira/browse/HUDI-7850
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
>
> Right now, "hoodie.record.merge.mode" is optional during writes as it is 
> inferred from the payload class name, payload type, and the record merger 
> strategy during the creation of the table properties.  We should make this 
> config mandatory in release 1.0 and make other merge configs optional to 
> simplify the configuration experience.





[jira] [Updated] (HUDI-7850) Makes `hoodie.record.merge.mode` mandatory upon creating the table and first write

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7850:

Description: Right now 

> Makes `hoodie.record.merge.mode` mandatory upon creating the table and first 
> write
> --
>
> Key: HUDI-7850
> URL: https://issues.apache.org/jira/browse/HUDI-7850
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
>
> Right now 





[jira] [Created] (HUDI-7850) Makes `hoodie.record.merge.mode` mandatory upon creating the table and first write

2024-06-09 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7850:
---

 Summary: Makes `hoodie.record.merge.mode` mandatory upon creating 
the table and first write
 Key: HUDI-7850
 URL: https://issues.apache.org/jira/browse/HUDI-7850
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Ethan Guo








[jira] [Updated] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7850:

Summary: Makes hoodie.record.merge.mode mandatory upon creating the table 
and first write  (was: Makes `hoodie.record.merge.mode` mandatory upon creating 
the table and first write)

> Makes hoodie.record.merge.mode mandatory upon creating the table and first 
> write
> 
>
> Key: HUDI-7850
> URL: https://issues.apache.org/jira/browse/HUDI-7850
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
>
> Right now 





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157018634

   
   ## CI report:
   
   * a6ffe1240055d6135a517dfcada59edc95383423 UNKNOWN
   
   



[jira] [Updated] (HUDI-7842) Update docs on the new record merge mode config

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7842:

Summary: Update docs on the new record merge mode config  (was: Update docs 
with the new record merge mode config)

> Update docs on the new record merge mode config
> ---
>
> Key: HUDI-7842
> URL: https://issues.apache.org/jira/browse/HUDI-7842
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> We should educate users on the new record merge mode config introduced by 
> HUDI-6798 that simplifies configs controlling the merging behavior.





[jira] [Updated] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7849:

Description: 
The following are the longest-running tests in the job "UT flink & FT common & flink 
& spark-client & hudi-spark" in Azure CI.  The time spent running 
testFiltersInFileFormat should be reduced.
{code:java}
/usr/bin/bash --noprofile --norc 
/home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh
grep: */target/surefire-reports/*.xml: No such file or directory
366.474 boolean) [2] false(testFiltersInFileFormat
223.221 boolean) [1] true(testFiltersInFileFormat
80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat
65.48 boolean) [2] true(testDeletePartitionAndArchive
56.558 boolean) [1] false(testDeletePartitionAndArchive{code}

> Reduce time spent on running testFiltersInFileFormat
> 
>
> Key: HUDI-7849
> URL: https://issues.apache.org/jira/browse/HUDI-7849
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> The following are the longest-running tests in the job "UT flink & FT common & 
> flink & spark-client & hudi-spark" in Azure CI.  The time spent running 
> testFiltersInFileFormat should be reduced.
> {code:java}
> /usr/bin/bash --noprofile --norc 
> /home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh
> grep: */target/surefire-reports/*.xml: No such file or directory
> 366.474 boolean) [2] false(testFiltersInFileFormat
> 223.221 boolean) [1] true(testFiltersInFileFormat
> 80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat
> 65.48 boolean) [2] true(testDeletePartitionAndArchive
> 56.558 boolean) [1] false(testDeletePartitionAndArchive{code}





Re: [I] [SUPPORT] Data deduplication caused by drawback in the delete invalid files before commit [hudi]

2024-06-09 Thread via GitHub


danny0405 commented on issue #11419:
URL: https://github.com/apache/hudi/issues/11419#issuecomment-2156988506

   Hmm, would you mind submitting a fix for it?





[jira] [Updated] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7849:

Fix Version/s: 1.0.0

> Reduce time spent on running testFiltersInFileFormat
> 
>
> Key: HUDI-7849
> URL: https://issues.apache.org/jira/browse/HUDI-7849
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Created] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat

2024-06-09 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7849:
---

 Summary: Reduce time spent on running testFiltersInFileFormat
 Key: HUDI-7849
 URL: https://issues.apache.org/jira/browse/HUDI-7849
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11381:
URL: https://github.com/apache/hudi/pull/11381#issuecomment-2156963475

   
   ## CI report:
   
   * 7ac5620ea218b34184ba918f6197339f2f695eb9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24317)
 
   * 0d1802d42d4b67cc791cbd8d8c4619dd7a52d319 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24320)
 
   
   



Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #11381:
URL: https://github.com/apache/hudi/pull/11381#issuecomment-2156937303

   
   ## CI report:
   
   * 7ac5620ea218b34184ba918f6197339f2f695eb9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24317)
 
   * 0d1802d42d4b67cc791cbd8d8c4619dd7a52d319 UNKNOWN
   
   



Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156933007

   
   ## CI report:
   
   * 3a1ec4524a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24313)
 
   
   



Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156929070

   
   ## CI report:
   
   * a6ffe1240055d6135a517dfcada59edc95383423 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24319)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24318)
 
   * 3a1ec4524a UNKNOWN
   
   



[jira] [Closed] (HUDI-7759) Remove Hadoop dependencies in hudi-common module

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7759.
---
Resolution: Fixed

> Remove Hadoop dependencies in hudi-common module
> 
>
> Key: HUDI-7759
> URL: https://issues.apache.org/jira/browse/HUDI-7759
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Closed] (HUDI-7752) Abstract serializeRecords for log writing

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7752.
---
Resolution: Fixed

> Abstract serializeRecords for log writing
> -
>
> Key: HUDI-7752
> URL: https://issues.apache.org/jira/browse/HUDI-7752
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Closed] (HUDI-7754) Remove AvroWriteSupport and ParquetReaderIterator from hudi-common

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7754.
---
Resolution: Fixed

> Remove AvroWriteSupport and ParquetReaderIterator from hudi-common
> --
>
> Key: HUDI-7754
> URL: https://issues.apache.org/jira/browse/HUDI-7754
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> These are two classes with Hadoop dependencies that can be moved to hadoop 
> common and aren't covered by other PRs.





[jira] [Closed] (HUDI-7750) Move HoodieLogFormatWriter class to hoodie-hadoop-common module

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7750.
---
Resolution: Fixed

> Move HoodieLogFormatWriter class to hoodie-hadoop-common module
> ---
>
> Key: HUDI-7750
> URL: https://issues.apache.org/jira/browse/HUDI-7750
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156876724

   
   ## CI report:
   
   * a6ffe1240055d6135a517dfcada59edc95383423 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24319)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24318)
 
   
   



[jira] [Closed] (HUDI-4732) Leverage Schema Registry for reading proto messages from kafka

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-4732.
---
Resolution: Fixed

> Leverage Schema Registry for reading proto messages from kafka
> --
>
> Key: HUDI-4732
> URL: https://issues.apache.org/jira/browse/HUDI-4732
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> If you use the Confluent Schema Registry, it provides a way to deserialize 
> the Kafka message value without providing the protobuf class name. The first 
> cut of ProtoKafkaSource requires users to specify a class name, but we want to 
> give users the flexibility to use this other method of deserializing the 
> message.
>  
> Docs: 
> https://docs.confluent.io/platform/current/schema-registry/serdes-develop/serdes-protobuf.html





[jira] [Updated] (HUDI-7739) Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7739:

Fix Version/s: 0.15.0

> Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy
> --
>
> Key: HUDI-7739
> URL: https://issues.apache.org/jira/browse/HUDI-7739
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xinyu Zou
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7699) Support STS external ids and configurable session names in the AWS StsAssumeRoleCredentialsProvider

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7699:

Fix Version/s: 0.15.0

> Support STS external ids and configurable session names in the AWS 
> StsAssumeRoleCredentialsProvider
> ---
>
> Key: HUDI-7699
> URL: https://issues.apache.org/jira/browse/HUDI-7699
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ian Streeter
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> [HUDI-6695|https://issues.apache.org/jira/browse/HUDI-6695] added an AWS 
> credentials provider to support assuming a role when syncing to Glue.
> 
> We use Hudi in a multi-tenant environment, and our customers give us 
> delegated access to their Glue catalog.  In this multi-tenant setup it is 
> important to use [an external 
> ID|https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html]
>  to improve security when assuming IAM roles.
> 
> Furthermore, the STS session name is currently hard-coded to "hoodie".  
> It is helpful for us to have configurable session names so we have better 
> traceability of which entities are creating STS sessions in the cloud.
> 
> Currently, the assumed role is configured with the 
> {{hoodie.aws.role.arn}} config property.  I would like to add the following 
> extra optional config properties, which will be used by the 
> {{HoodieConfigAWSAssumedRoleCredentialsProvider}}:
> 
> - {{hoodie.aws.role.external.id}}
> - {{hoodie.aws.role.session.name}}
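
For illustration, here is a minimal sketch of how the two proposed optional properties could be read next to the existing one. Only the three property names and the "hoodie" default come from the description above. The class, the helper methods, and the placeholder values are hypothetical and are not the Hudi implementation.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hudi: reads the assume-role settings from plain properties.
class AssumeRoleConfigSketch {
    // Property names as given in the issue description.
    static final String ROLE_ARN = "hoodie.aws.role.arn";
    static final String EXTERNAL_ID = "hoodie.aws.role.external.id";
    static final String SESSION_NAME = "hoodie.aws.role.session.name";

    // Falls back to the currently hard-coded session name when the property is unset.
    static String sessionName(Properties props) {
        return props.getProperty(SESSION_NAME, "hoodie");
    }

    // The external ID is optional, so this returns null when it is not configured.
    static String externalId(Properties props) {
        return props.getProperty(EXTERNAL_ID);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder values for illustration only.
        props.setProperty(ROLE_ARN, "arn:aws:iam::123456789012:role/example-glue-sync");
        props.setProperty(EXTERNAL_ID, "example-external-id");
        System.out.println("session name: " + sessionName(props));
        System.out.println("external id:  " + externalId(props));
    }
}
```

Keeping both new properties optional means existing configurations that set only the role ARN would behave as before.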





[jira] [Updated] (HUDI-7738) FileStreamReader need set Charset with UTF-8

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7738:

Fix Version/s: 0.15.0

> FileStreamReader need set Charset with UTF-8
> 
>
> Key: HUDI-7738
> URL: https://issues.apache.org/jira/browse/HUDI-7738
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> FileStreamReader needs to set the charset to UTF-8.
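
As a general illustration of the point, a reader constructed without an explicit charset uses the JVM default, which can differ between environments. The sketch below passes UTF-8 explicitly. The class and method names are hypothetical, and this is not the Hudi CLI code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example, not Hudi code: decode a byte stream with an explicit charset.
class Utf8ReaderSketch {
    // Reads the first line of the stream, always decoding the bytes as UTF-8.
    static String readFirstLine(InputStream in) {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo\nworld".getBytes(StandardCharsets.UTF_8);
        String line = readFirstLine(new ByteArrayInputStream(data));
        // The length counts characters, so the accented letter is one character, not two bytes.
        System.out.println(line.length());
    }
}
```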





[jira] [Updated] (HUDI-7737) Bump Spark 3.4 version to Spark 3.4.3

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7737:

Fix Version/s: 0.15.0

> Bump Spark 3.4 version to Spark 3.4.3
> -
>
> Key: HUDI-7737
> URL: https://issues.apache.org/jira/browse/HUDI-7737
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Geser Dugarov
>Assignee: Geser Dugarov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Spark 3.4.3 has been released: https://github.com/apache/spark/tree/v3.4.3





[jira] [Updated] (HUDI-7715) Partition TTL for Flink

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7715:

Fix Version/s: 1.0.0

> Partition TTL for Flink
> ---
>
> Key: HUDI-7715
> URL: https://issues.apache.org/jira/browse/HUDI-7715
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xi chaomin
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-7720) Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7720:

Fix Version/s: 0.15.0
   1.0.0

> Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups
> -
>
> Key: HUDI-7720
> URL: https://issues.apache.org/jira/browse/HUDI-7720
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: 1280X1280.PNG
>
>
> Job aborted due to stage failure: Task 3 in stage 35.0 failed 4 times, most 
> recent failure: Lost task 3.3 in stage 35.0 (TID 32175) (10-222-33-34.lan 
> executor 204): java.lang.NullPointerException
> at java.util.ArrayList.<init>(ArrayList.java:178)
> at org.apache.hudi.common.table.view.HoodieTableFileSystemView.fetchAllStoredFileGroups(HoodieTableFileSystemView.java:308)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroupsIncludingReplaced(AbstractTableFileSystemView.java:976)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getReplacedFileGroupsBefore(AbstractTableFileSystemView.java:989)
> at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:104)
> at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getReplacedFileGroupsBefore(PriorityBasedFileSystemView.java:232)
> at org.apache.hudi.table.action.clean.CleanPlanner.getReplacedFilesEligibleToClean(CleanPlanner.java:441)
> at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:330)
> at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:295)
> at org.apache.hudi.table.action.clean.CleanPlanner.getDeletePaths(CleanPlanner.java:493)
> at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.lambda$requestClean$af5da5d2$1(CleanPlanActionExecutor.java:122)
> at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> at scala.collection.Iterator.foreach(Iterator.scala:943)
> at scala.collection.Iterator.foreach$(Iterator.scala:943)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
> at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
> at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
> at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
> at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
> at scala.collection.AbstractIterator.to(Iterator.scala:1431)
> at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
> at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
> at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
> at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
> at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
> at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:131)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)





[jira] [Updated] (HUDI-7721) Fix broken build on master

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7721:

Fix Version/s: 0.15.0

> Fix broken build on master
> --
>
> Key: HUDI-7721
> URL: https://issues.apache.org/jira/browse/HUDI-7721
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> TestHoodieDeltaStreamer is invalid due to 
> [https://github.com/apache/hudi/pull/11099].





[jira] [Closed] (HUDI-7720) Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7720.
---
Resolution: Fixed

> Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups
> -
>
> Key: HUDI-7720
> URL: https://issues.apache.org/jira/browse/HUDI-7720
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: 1280X1280.PNG
>
>
> Job aborted due to stage failure: Task 3 in stage 35.0 failed 4 times, most 
> recent failure: Lost task 3.3 in stage 35.0 (TID 32175) (10-222-33-34.lan 
> executor 204): java.lang.NullPointerException
> at java.util.ArrayList.<init>(ArrayList.java:178)
> at org.apache.hudi.common.table.view.HoodieTableFileSystemView.fetchAllStoredFileGroups(HoodieTableFileSystemView.java:308)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroupsIncludingReplaced(AbstractTableFileSystemView.java:976)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getReplacedFileGroupsBefore(AbstractTableFileSystemView.java:989)
> at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:104)
> at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getReplacedFileGroupsBefore(PriorityBasedFileSystemView.java:232)
> at org.apache.hudi.table.action.clean.CleanPlanner.getReplacedFilesEligibleToClean(CleanPlanner.java:441)
> at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:330)
> at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:295)
> at org.apache.hudi.table.action.clean.CleanPlanner.getDeletePaths(CleanPlanner.java:493)
> at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.lambda$requestClean$af5da5d2$1(CleanPlanActionExecutor.java:122)
> at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> at scala.collection.Iterator.foreach(Iterator.scala:943)
> at scala.collection.Iterator.foreach$(Iterator.scala:943)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
> at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
> at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
> at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
> at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
> at scala.collection.AbstractIterator.to(Iterator.scala:1431)
> at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
> at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
> at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
> at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
> at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
> at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:131)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)





[jira] [Updated] (HUDI-7641) Add metrics to track what partitions are enabled in MDT

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7641:

Fix Version/s: 1.0.0

> Add metrics to track what partitions are enabled in MDT
> ---
>
> Key: HUDI-7641
> URL: https://issues.apache.org/jira/browse/HUDI-7641
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Closed] (HUDI-7467) TestHoodieDeltaStreamer. testAutoGenerateRecordKeys

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7467.
---
Resolution: Fixed

> TestHoodieDeltaStreamer. testAutoGenerateRecordKeys
> ---
>
> Key: HUDI-7467
> URL: https://issues.apache.org/jira/browse/HUDI-7467
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: tests-ci
>Reporter: Lin Liu
>Assignee: tao pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> This test is flaky and sometimes it fails in Azure CI.  We need to reproduce 
> it locally and check why it is flaky (if there is any bug causing it, or it's 
> due to test setup).
> [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=22725&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=9df7def4-004b-5fb7-f042-da5d723783ad&s=859b8d9a-8fd6-5a5c-6f5e-f84f1990894e]
> {code:java}
> [ERROR] Tests run: 131, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 2,459.289 s <<< FAILURE! - in org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer
> [ERROR] testAutoGenerateRecordKeys  Time elapsed: 14.248 s  <<< FAILURE!
> org.opentest4j.AssertionFailedError: expected: <300> but was: <500>
>   at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
>   at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
>   at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
>   at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
>   at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:611)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerTestBase.assertRecordCount(HoodieDeltaStreamerTestBase.java:486)
>   at org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer.testAutoGenerateRecordKeys(TestHoodieDeltaStreamer.java:2823)
>  {code}





[jira] [Updated] (HUDI-7467) TestHoodieDeltaStreamer. testAutoGenerateRecordKeys

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7467:

Fix Version/s: 0.15.0
   1.0.0

> TestHoodieDeltaStreamer. testAutoGenerateRecordKeys
> ---
>
> Key: HUDI-7467
> URL: https://issues.apache.org/jira/browse/HUDI-7467
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: tests-ci
>Reporter: Lin Liu
>Assignee: tao pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> This test is flaky and sometimes it fails in Azure CI.  We need to reproduce 
> it locally and check why it is flaky (if there is any bug causing it, or it's 
> due to test setup).
> [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=22725&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=9df7def4-004b-5fb7-f042-da5d723783ad&s=859b8d9a-8fd6-5a5c-6f5e-f84f1990894e]
> {code:java}
> [ERROR] Tests run: 131, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 2,459.289 s <<< FAILURE! - in org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer
> [ERROR] testAutoGenerateRecordKeys  Time elapsed: 14.248 s  <<< FAILURE!
> org.opentest4j.AssertionFailedError: expected: <300> but was: <500>
>   at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
>   at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
>   at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
>   at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
>   at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:611)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerTestBase.assertRecordCount(HoodieDeltaStreamerTestBase.java:486)
>   at org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer.testAutoGenerateRecordKeys(TestHoodieDeltaStreamer.java:2823)
>  {code}





[jira] [Closed] (HUDI-7641) Add metrics to track what partitions are enabled in MDT

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7641.
---
Resolution: Fixed

> Add metrics to track what partitions are enabled in MDT
> ---
>
> Key: HUDI-7641
> URL: https://issues.apache.org/jira/browse/HUDI-7641
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7710) BugFix: Remove compaction.inflight from conflict resolution

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7710:

Fix Version/s: 0.15.0
   1.0.0

> BugFix: Remove compaction.inflight from conflict resolution
> ---
>
> Key: HUDI-7710
> URL: https://issues.apache.org/jira/browse/HUDI-7710
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: compaction
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> During conflict resolution, compaction.inflight files are found; since they 
> don't contain any plan information, this could cause an NPE.





[jira] [Closed] (HUDI-7688) Avoid always repeated inflate when encounter InterruptedIOException

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7688.
---
Resolution: Fixed

> Avoid always repeated inflate when encounter InterruptedIOException
> ---
>
> Key: HUDI-7688
> URL: https://issues.apache.org/jira/browse/HUDI-7688
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jing Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: image-2024-04-30-11-25-41-671.png, 
> image-2024-04-30-11-27-59-572.png
>
>
> !image-2024-04-30-11-25-41-671.png!
> !image-2024-04-30-11-27-59-572.png!
> We should avoid repeatedly retrying inflate when an InterruptedIOException is 
> encountered.





[jira] [Updated] (HUDI-7688) Avoid always repeated inflate when encounter InterruptedIOException

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7688:

Fix Version/s: 0.15.0
   1.0.0

> Avoid always repeated inflate when encounter InterruptedIOException
> ---
>
> Key: HUDI-7688
> URL: https://issues.apache.org/jira/browse/HUDI-7688
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jing Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: image-2024-04-30-11-25-41-671.png, 
> image-2024-04-30-11-27-59-572.png
>
>
> !image-2024-04-30-11-25-41-671.png!
> !image-2024-04-30-11-27-59-572.png!
> We should avoid repeatedly retrying inflate when an InterruptedIOException is 
> encountered.





[jira] [Updated] (HUDI-7667) Create util method to get offset range for fetching new data in KafkaSource

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7667:

Fix Version/s: 0.15.0

> Create util method to get offset range for fetching new data in KafkaSource
> ---
>
> Key: HUDI-7667
> URL: https://issues.apache.org/jira/browse/HUDI-7667
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: deltastreamer
>Reporter: Vinish Reddy
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7684) Sort the records for Flink metadata table bulk_insert

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7684:

Fix Version/s: 0.15.0

> Sort the records for Flink metadata table bulk_insert
> -
>
> Key: HUDI-7684
> URL: https://issues.apache.org/jira/browse/HUDI-7684
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> The HFile write requires the input to be sorted; without the sort, 
> re-enabling MDT on an existing table could cause issues.
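
As a sketch of the constraint described above: a writer that only accepts keys in ascending order rejects unsorted input, so the records are sorted before they are appended. The `OrderedWriter` class below is a hypothetical stand-in for such a writer. It is not the HFile API and not the Flink bulk_insert code path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example, not Hudi code: sort keys before handing them to an order-sensitive writer.
class SortedAppendSketch {
    // Stand-in for a writer that fails when a key is smaller than the previous one.
    static final class OrderedWriter {
        private final List<String> written = new ArrayList<>();
        private String lastKey = null;

        void append(String key) {
            if (lastKey != null && key.compareTo(lastKey) < 0) {
                throw new IllegalStateException("key out of order: " + key + " after " + lastKey);
            }
            written.add(key);
            lastKey = key;
        }

        List<String> keys() {
            return written;
        }
    }

    // Sorts a copy of the input, then appends every key in ascending order.
    static List<String> writeSorted(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        OrderedWriter writer = new OrderedWriter();
        for (String key : sorted) {
            writer.append(key);
        }
        return writer.keys();
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("c");
        input.add("a");
        input.add("b");
        System.out.println(writeSorted(input));
    }
}
```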





[jira] [Updated] (HUDI-7682) Remove the files copy in Azure CI tests report

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7682:

Fix Version/s: 0.15.0

> Remove the files copy in Azure CI tests report
> --
>
> Key: HUDI-7682
> URL: https://issues.apache.org/jira/browse/HUDI-7682
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: compile
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Closed] (HUDI-7511) Offset range calculation in kafka should return all topic partitions

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7511.
---
Resolution: Fixed

> Offset range calculation in kafka should return all topic partitions 
> -
>
> Key: HUDI-7511
> URL: https://issues.apache.org/jira/browse/HUDI-7511
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> After [https://github.com/apache/hudi/pull/10869] landed, we no longer 
> return every topic partition in the final ranges. For checkpointing 
> purposes, however, the final ranges must include every Kafka topic 
> partition, even if we are not consuming anything from it.
>  
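
A minimal sketch of the requirement, assuming the checkpoint is a map from partition to offset: partitions with nothing to consume still get a range, an empty one whose start and end are the same offset. All names below are hypothetical, and this is not the KafkaSource implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example, not Hudi code: keep every partition in the final offset ranges.
class OffsetRangeSketch {
    // Returns partition -> {fromOffset, untilOffset} for every checkpointed partition.
    // Partitions missing from consumedUntil get an empty range (from == until).
    static Map<Integer, long[]> finalRanges(Map<Integer, Long> checkpointed,
                                            Map<Integer, Long> consumedUntil) {
        Map<Integer, long[]> ranges = new TreeMap<>();
        for (Map.Entry<Integer, Long> entry : checkpointed.entrySet()) {
            long from = entry.getValue();
            long until = consumedUntil.getOrDefault(entry.getKey(), from);
            ranges.put(entry.getKey(), new long[] {from, until});
        }
        return ranges;
    }

    public static void main(String[] args) {
        Map<Integer, Long> checkpointed = new HashMap<>();
        checkpointed.put(0, 100L);
        checkpointed.put(1, 250L);
        Map<Integer, Long> consumedUntil = new HashMap<>();
        consumedUntil.put(0, 180L); // partition 1 has nothing new to read

        for (Map.Entry<Integer, long[]> e : finalRanges(checkpointed, consumedUntil).entrySet()) {
            System.out.println("partition " + e.getKey() + ": " + e.getValue()[0] + " -> " + e.getValue()[1]);
        }
    }
}
```

With this shape, the next checkpoint can be derived from the end offsets of all ranges without losing partitions that were idle in the current batch.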





[jira] [Updated] (HUDI-7658) Log time taken when meta sync fails in stream sync

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7658:

Fix Version/s: 0.15.0
   1.0.0

> Log time taken when meta sync fails in stream sync
> --
>
> Key: HUDI-7658
> URL: https://issues.apache.org/jira/browse/HUDI-7658
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> The time taken is only printed in log statements on success, but it is 
> useful to see it on failure as well.
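
One way to picture the change is to measure the elapsed time once and include it in the message on both the success and the failure path. The sketch below is only an illustration with hypothetical names and message wording. It is not the stream sync code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical example, not Hudi code: report the elapsed time whether or not the task fails.
class TimedSyncSketch {
    // Runs the task and returns the message that would be logged.
    static String runTimed(String name, Runnable task) {
        long startNanos = System.nanoTime();
        try {
            task.run();
            return name + " succeeded after " + elapsedMs(startNanos) + " ms";
        } catch (RuntimeException e) {
            // The failure message carries the same timing information as the success message.
            return name + " failed after " + elapsedMs(startNanos) + " ms: " + e.getMessage();
        }
    }

    static long elapsedMs(long startNanos) {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) {
        System.out.println(runTimed("meta sync", () -> { }));
        System.out.println(runTimed("meta sync", () -> {
            throw new IllegalStateException("catalog unavailable");
        }));
    }
}
```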





[jira] [Closed] (HUDI-7658) Log time taken when meta sync fails in stream sync

2024-06-09 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7658.
---
Resolution: Fixed

> Log time taken when meta sync fails in stream sync
> --
>
> Key: HUDI-7658
> URL: https://issues.apache.org/jira/browse/HUDI-7658
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> The time taken is only printed in log statements on success, but it is 
> useful to see it on failure as well.




