[jira] [Updated] (HUDI-6295) Support schema evolution for the columns used in MERGE INTO update clause.

2023-06-29 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka updated HUDI-6295:

Fix Version/s: 0.15.0
   (was: 1.0.0)

> Support schema evolution for the columns used in MERGE INTO update clause.
> --
>
> Key: HUDI-6295
> URL: https://issues.apache.org/jira/browse/HUDI-6295
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: Aditya Goenka
>Priority: Major
> Fix For: 0.15.0
>
>
> Details in github issue - [https://github.com/apache/hudi/issues/8502]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6295) Support schema evolution for the columns used in MERGE INTO update clause.

2023-06-29 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka updated HUDI-6295:

Priority: Major  (was: Minor)

> Support schema evolution for the columns used in MERGE INTO update clause.
> --
>
> Key: HUDI-6295
> URL: https://issues.apache.org/jira/browse/HUDI-6295
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: Aditya Goenka
>Priority: Major
> Fix For: 1.0.0
>
>
> Details in github issue - [https://github.com/apache/hudi/issues/8502]
>  





[jira] [Updated] (HUDI-6441) Passing custom Headers with Hudi Callback URL

2023-06-29 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka updated HUDI-6441:

Priority: Major  (was: Minor)

> Passing custom Headers with Hudi Callback URL
> -
>
> Key: HUDI-6441
> URL: https://issues.apache.org/jira/browse/HUDI-6441
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Aditya Goenka
>Priority: Major
> Fix For: 1.0.0
>
>
> Hudi callback URLs don't currently support passing custom headers. 
> Implement a way to pass them and use them for the callback.
> Github Issue - [https://github.com/apache/hudi/issues/8834]
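
As a rough illustration of the request above, the sketch below parses a header string and attaches the result to an HTTP callback request. The `name:value;name:value` format, the class name, and both method names are assumptions made for this example; they are not taken from the issue or from Hudi's actual callback configuration.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the "name:value;name:value" format and all names here are
// assumptions, not Hudi's actual callback configuration.
class CallbackHeaderSketch {

  // Parses "k1:v1;k2:v2" into an ordered map, skipping blank entries.
  // Only the first ':' separates name from value, so values may contain ':'.
  static Map<String, String> parseHeaders(String raw) {
    Map<String, String> headers = new LinkedHashMap<>();
    if (raw == null || raw.trim().isEmpty()) {
      return headers;
    }
    for (String entry : raw.split(";")) {
      if (entry.trim().isEmpty()) {
        continue;
      }
      int sep = entry.indexOf(':');
      if (sep <= 0) {
        throw new IllegalArgumentException("Malformed header entry: " + entry);
      }
      headers.put(entry.substring(0, sep).trim(), entry.substring(sep + 1).trim());
    }
    return headers;
  }

  // Builds a JSON POST request that carries the custom headers.
  static HttpRequest buildRequest(String url, String jsonBody, Map<String, String> headers) {
    HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody));
    headers.forEach(builder::header);
    return builder.build();
  }
}
```

A caller would pass the configured string through `parseHeaders` once and reuse the resulting map for every callback request.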





[jira] [Updated] (HUDI-6059) Add JSON file reader support for Delta Streamer

2023-06-29 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka updated HUDI-6059:

Priority: Major  (was: Minor)

> Add JSON file reader support for Delta Streamer
> ---
>
> Key: HUDI-6059
> URL: https://issues.apache.org/jira/browse/HUDI-6059
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Aditya Goenka
>Priority: Major
> Fix For: 1.0.0
>
>
> Delta Streamer doesn't support a regular JSON file reader; the current 
> implementation of
> JsonDFSSource was created mostly to read a single JSON record from a file.
> Leverage the Spark JSON reader by overriding 
> org.apache.hudi.utilities.sources.RowSource, similar to CSVDFSSource.
> Github Issue - [https://github.com/apache/hudi/issues/8409]





[GitHub] [hudi] hudi-bot commented on pull request #9103: [MINOR]move hoodie hfile/orc reader/writer test cases from hudi-client-common to hudi-common

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9103:
URL: https://github.com/apache/hudi/pull/9103#issuecomment-1614152292

   
   ## CI report:
   
   * f26d06b7eb099e698fe7058f3ffba327d4ae5c7f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18228)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #9103: [MINOR]move hoodie hfile/orc reader/writer test cases from hudi-client-common to hudi-common

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9103:
URL: https://github.com/apache/hudi/pull/9103#issuecomment-1614146955

   
   ## CI report:
   
   * f26d06b7eb099e698fe7058f3ffba327d4ae5c7f UNKNOWN
   
   



[jira] [Updated] (HUDI-6346) Allow duplicates by default for insert operation type

2023-06-29 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka updated HUDI-6346:

Priority: Blocker  (was: Critical)

> Allow duplicates by default for insert operation type
> -
>
> Key: HUDI-6346
> URL: https://issues.apache.org/jira/browse/HUDI-6346
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Aditya Goenka
>Priority: Blocker
>  Labels: 0.14.0
>
> By default, the insert operation type can produce inconsistent data: it does 
> not preserve all duplicates, because some of them are deduplicated during 
> small file merging. 
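
The effect described above can be pictured with a toy model. This is only a sketch under stated assumptions: records are modeled as key/value pairs, and the class and method names are hypothetical and do not correspond to Hudi's merge code path. It contrasts a key-based merge, which collapses records sharing a key, with an append-style merge, which keeps every inserted record.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy model, not Hudi's small-file merge implementation.
class SmallFileMergeSketch {

  // Key-based merge: a later record with the same key replaces the earlier one,
  // so duplicate keys silently disappear.
  static List<Map.Entry<String, String>> mergeByKey(
      List<Map.Entry<String, String>> existing,
      List<Map.Entry<String, String>> incoming) {
    Map<String, String> byKey = new LinkedHashMap<>();
    for (Map.Entry<String, String> rec : existing) {
      byKey.put(rec.getKey(), rec.getValue());
    }
    for (Map.Entry<String, String> rec : incoming) {
      byKey.put(rec.getKey(), rec.getValue());
    }
    return new ArrayList<>(byKey.entrySet());
  }

  // Append-style merge: every inserted record is kept, duplicates included.
  static List<Map.Entry<String, String>> mergeKeepingDuplicates(
      List<Map.Entry<String, String>> existing,
      List<Map.Entry<String, String>> incoming) {
    List<Map.Entry<String, String>> out = new ArrayList<>(existing);
    out.addAll(incoming);
    return out;
  }
}
```

In this model, inserts routed into an existing small file behave like `mergeByKey`, while inserts written to a new file behave like `mergeKeepingDuplicates`, which is why only some duplicates survive.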





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1614141754

   
   ## CI report:
   
   * f154ee335eb307e2bcffd895cfd95bfb1f417a1e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18218)
 
   
   



[GitHub] [hudi] Mulavar commented on pull request #9103: [MINOR]move hoodie hfile/orc reader/writer test cases from hudi-client-common to hudi-common

2023-06-29 Thread via GitHub


Mulavar commented on PR #9103:
URL: https://github.com/apache/hudi/pull/9103#issuecomment-1614115757

   Since the hfile/orc readers/writers are implemented in the hudi-common module, it is 
better to keep their test cases in the same module.





[GitHub] [hudi] Mulavar opened a new pull request, #9103: [MINOR]move hoodie hfile/orc reader/writer test cases from hudi-client-common to hudi-common

2023-06-29 Thread via GitHub


Mulavar opened a new pull request, #9103:
URL: https://github.com/apache/hudi/pull/9103

   
   ### Change Logs
   Move hoodie hfile/orc reader/writer test cases from hudi-client-common to 
hudi-common.
   
   ### Impact
   None
   
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] Mulavar closed issue #9102: [MINOR]Refactor hfile/orc reader/test cases

2023-06-29 Thread via GitHub


Mulavar closed issue #9102: [MINOR]Refactor hfile/orc reader/test cases
URL: https://github.com/apache/hudi/issues/9102





[GitHub] [hudi] Mulavar opened a new issue, #9102: [MINOR]Refactor hfile/orc reader/test cases

2023-06-29 Thread via GitHub


Mulavar opened a new issue, #9102:
URL: https://github.com/apache/hudi/issues/9102

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   ## Change Logs
   
   A clear and concise description of the problem.
   The test cases for HoodieAvroHFileWriter/Reader, which are implemented in the hudi-common 
module, are placed in hudi-client-common; it may be better to move this 
test code into the hudi-common module.
   
   ## Risk level
   Low.
   
   ## Documentation Update
   NA
   
   ## Contributor's checklist
Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed
   
   





[jira] [Assigned] (HUDI-6444) Support delete and delete_partition with RLI

2023-06-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-6444:


Assignee: Raymond Xu

> Support delete and delete_partition with RLI
> 
>
> Key: HUDI-6444
> URL: https://issues.apache.org/jira/browse/HUDI-6444
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: index, metadata
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
> Fix For: 0.14.0
>
>






[GitHub] [hudi] hudi-bot commented on pull request #9100: [MINOR]Adjust annotation in HoodieLogFormatReader

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9100:
URL: https://github.com/apache/hudi/pull/9100#issuecomment-1614100501

   
   ## CI report:
   
   * d04a1916d065e33d78bc1f874558db39333b113a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18227)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9095: Test ci

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9095:
URL: https://github.com/apache/hudi/pull/9095#issuecomment-1614096296

   
   ## CI report:
   
   * 99475ffc62972ee49905fca98ea70f2096cfb135 UNKNOWN
   * ec568a0c309690a1b0931249aae1e4aab9eddc9b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18217)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9100: [MINOR]Adjust annotation in HoodieLogFormatReader

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9100:
URL: https://github.com/apache/hudi/pull/9100#issuecomment-1614096358

   
   ## CI report:
   
   * d04a1916d065e33d78bc1f874558db39333b113a UNKNOWN
   
   



[GitHub] [hudi] KnightChess opened a new issue, #9101: [SUPPORT] Transaction and task state inconsistency

2023-06-29 Thread via GitHub


KnightChess opened a new issue, #9101:
URL: https://github.com/apache/hudi/issues/9101

   code in:
   
https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java#L215-L272
   
   In this picture, step one commits the instant successfully. But in step two, when we 
trigger `mayBeCleanAndArchive`, it throws an exception and fails the job, which is then 
retried at the job level, even though the commit has already succeeded.
   So the final result is: instant commit succeeds -> job fails and retries, and 
the successful instant is not rolled back.
   
![image](https://github.com/apache/hudi/assets/20125927/eea732b5-62e3-4c0f-924f-06d36a1714c3)
   
   I think other places can cause the same problem. We need to either 
catch all exceptions after the instant has been committed successfully, or extend the scope of the 
transaction.
   I prefer the first option: catch all exceptions.
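
The first option (catching exceptions raised after a successful commit) could look roughly like the sketch below. All names here are hypothetical and the class does not mirror `BaseHoodieWriteClient`; a list append stands in for the real commit step, and the post-commit work is passed in as a `Runnable`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these names are hypothetical and do not mirror
// BaseHoodieWriteClient; the sketch shows the "catch after commit" option.
class PostCommitSketch {
  final List<String> committedInstants = new ArrayList<>();
  final List<String> warnings = new ArrayList<>();

  // Commits the instant, then runs best-effort maintenance.
  // Returns true once the commit itself has been recorded.
  boolean commit(String instantTime, Runnable cleanAndArchive) {
    committedInstants.add(instantTime); // stands in for the real commit step
    try {
      cleanAndArchive.run();
    } catch (RuntimeException e) {
      // The data is already committed, so record the failure instead of
      // failing the job and triggering a job-level retry.
      warnings.add("post-commit step failed for " + instantTime + ": " + e.getMessage());
    }
    return true;
  }
}
```

The trade-off is that cleaning and archiving failures become warnings that someone has to monitor, rather than errors that stop the job.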
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.13.1
   
   * Spark version : 3.2.0
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9099: [HUDI-6457]Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBase…

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9099:
URL: https://github.com/apache/hudi/pull/9099#issuecomment-1614067678

   
   ## CI report:
   
   * c61be845ddfc82ffcc107f8db437fc75d334eb58 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18226)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9097: [HUDI-6458]Scheduling jobs should not fail when there is no completed commits

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9097:
URL: https://github.com/apache/hudi/pull/9097#issuecomment-1614067654

   
   ## CI report:
   
   * fc76b6cf3bfb61e27eb1e8130b58ce9b5649fb5e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18221)
 
   * db92d6d09635496b22c27e1375057fed504e6c70 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18225)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1614067541

   
   ## CI report:
   
   * 2aafcc1737e74d9569531d5efc5faf8c5d1b33ec Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18208)
 
   * 045511c3843e115d0df5d97f5f38726b75c98be7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18224)
 
   
   



[GitHub] [hudi] ksmou opened a new pull request, #9100: [MINOR]Adjust annotation in HoodieLogFormatReader

2023-06-29 Thread via GitHub


ksmou opened a new pull request, #9100:
URL: https://github.com/apache/hudi/pull/9100

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] danny0405 commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


danny0405 commented on code in PR #9058:
URL: https://github.com/apache/hudi/pull/9058#discussion_r1247377008


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##
@@ -283,6 +285,8 @@ public HoodieMetadataPayload(Option recordOpt) {
 Integer.parseInt(recordIndexRecord.get(RECORD_INDEX_FIELD_FILE_INDEX).toString()),
 Long.parseLong(recordIndexRecord.get(RECORD_INDEX_FIELD_INSTANT_TIME).toString()));
   }
+} else {
+  this.isDeletedRecord = true;

Review Comment:
   The key is not set when the Avro data is not set, so I'm wondering whether 
`key == null` actually indicates a delete record.






[GitHub] [hudi] hudi-bot commented on pull request #9099: [HUDI-6457]Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBase…

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9099:
URL: https://github.com/apache/hudi/pull/9099#issuecomment-1614063613

   
   ## CI report:
   
   * c61be845ddfc82ffcc107f8db437fc75d334eb58 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9098: [MINOR] Reverting disabled tests for multiwriter archival

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9098:
URL: https://github.com/apache/hudi/pull/9098#issuecomment-1614063595

   
   ## CI report:
   
   * 120a4bcce84c866dfff254294f2a20a54a7d0b1e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18223)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9097: [HUDI-6458]Scheduling jobs should not fail when there is no completed commits

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9097:
URL: https://github.com/apache/hudi/pull/9097#issuecomment-1614063575

   
   ## CI report:
   
   * fc76b6cf3bfb61e27eb1e8130b58ce9b5649fb5e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18221)
 
   * db92d6d09635496b22c27e1375057fed504e6c70 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1614063336

   
   ## CI report:
   
   * 2aafcc1737e74d9569531d5efc5faf8c5d1b33ec Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18208)
 
   * 045511c3843e115d0df5d97f5f38726b75c98be7 UNKNOWN
   
   



[GitHub] [hudi] danny0405 commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


danny0405 commented on code in PR #9058:
URL: https://github.com/apache/hudi/pull/9058#discussion_r1246019736


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##
@@ -283,6 +285,8 @@ public HoodieMetadataPayload(Option recordOpt) {
 Integer.parseInt(recordIndexRecord.get(RECORD_INDEX_FIELD_FILE_INDEX).toString()),
 Long.parseLong(recordIndexRecord.get(RECORD_INDEX_FIELD_INSTANT_TIME).toString()));
   }
+} else {
+  this.isDeletedRecord = true;

Review Comment:
   BTW, I don't like how deletes are handled with the existing column stats. 
Just because we already do that does not mean it is correct; maintaining 
preCombining logic for every kind of meta info is disgusting and inefficient.






[GitHub] [hudi] hudi-bot commented on pull request #9097: [HUDI-6458]Scheduling jobs should not fail when there is no completed commits

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9097:
URL: https://github.com/apache/hudi/pull/9097#issuecomment-1614058800

   
   ## CI report:
   
   * fc76b6cf3bfb61e27eb1e8130b58ce9b5649fb5e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18221)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9098: [MINOR] Reverting disabled tests for multiwriter archival

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9098:
URL: https://github.com/apache/hudi/pull/9098#issuecomment-1614058835

   
   ## CI report:
   
   * 120a4bcce84c866dfff254294f2a20a54a7d0b1e UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8683: [HUDI-5533] Support spark columns comments

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8683:
URL: https://github.com/apache/hudi/pull/8683#issuecomment-1614058101

   
   ## CI report:
   
   * 8d6893fd9daf07c30524474cf9a4d39c66a37cba UNKNOWN
   * de250fbbcf1d16ba358dd08270eab5e11a5e3740 UNKNOWN
   * f41404ad3c5c399ee0243fe0ea9a5ee70b74f896 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18214)
 
   
   



[jira] [Updated] (HUDI-6457) Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned

2023-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6457:
-
Labels: pull-request-available  (was: )

> Keep JavaSizeBasedClusteringPlanStrategy and 
> SparkSizeBasedClusteringPlanStrategy aligned
> -
>
> Key: HUDI-6457
> URL: https://issues.apache.org/jira/browse/HUDI-6457
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[GitHub] [hudi] ksmou opened a new pull request, #9099: [HUDI-6457]Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBase…

2023-06-29 Thread via GitHub


ksmou opened a new pull request, #9099:
URL: https://github.com/apache/hudi/pull/9099

   …dClusteringPlanStrategy aligned
   
   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] nsivabalan merged pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


nsivabalan merged PR #9041:
URL: https://github.com/apache/hudi/pull/9041





[hudi] branch master updated: [HUDI-6431] Support update partition path in record-level index (#9041)

2023-06-29 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 0a3a58ee09c [HUDI-6431] Support update partition path in record-level 
index (#9041)
0a3a58ee09c is described below

commit 0a3a58ee09cff816e47bafc63d760ba4de60e5a0
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Thu Jun 29 19:47:56 2023 -0700

[HUDI-6431] Support update partition path in record-level index (#9041)
---
 .../org/apache/hudi/config/HoodieIndexConfig.java  |  13 ++-
 .../org/apache/hudi/config/HoodieWriteConfig.java  |   6 +-
 .../org/apache/hudi/index/HoodieIndexUtils.java| 128 -
 .../apache/hudi/index/bloom/HoodieBloomIndex.java  |   2 +-
 .../hudi/index/bloom/HoodieGlobalBloomIndex.java   |  51 ++--
 .../hudi/index/bucket/HoodieBucketIndex.java   |   5 +-
 .../hudi/index/simple/HoodieGlobalSimpleIndex.java |  94 ---
 .../hudi/index/simple/HoodieSimpleIndex.java   |   6 +-
 .../hudi/io/HoodieKeyLocationFetchHandle.java  |  23 ++--
 ...arkConsistentBucketDuplicateUpdateStrategy.java |   6 +-
 .../hudi/index/SparkMetadataTableRecordIndex.java  |  49 +---
 .../index/bloom/TestHoodieGlobalBloomIndex.java|   2 +-
 .../TestGlobalIndexEnableUpdatePartitions.java |  26 +++--
 13 files changed, 196 insertions(+), 215 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
index 7c730def11b..c77b9780548 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
@@ -258,6 +258,12 @@ public class HoodieIndexConfig extends HoodieConfig {
   .markAdvanced()
   .withDocumentation("Similar to " + 
BLOOM_INDEX_UPDATE_PARTITION_PATH_ENABLE + ", but for simple index.");
 
+  public static final ConfigProperty 
RECORD_INDEX_UPDATE_PARTITION_PATH_ENABLE = ConfigProperty
+  .key("hoodie.record.index.update.partition.path")
+  .defaultValue("false")
+  .markAdvanced()
+  .withDocumentation("Similar to " + 
BLOOM_INDEX_UPDATE_PARTITION_PATH_ENABLE + ", but for record index.");
+
   public static final ConfigProperty 
GLOBAL_INDEX_RECONCILE_PARALLELISM = ConfigProperty
   .key("hoodie.global.index.reconcile.parallelism")
   .defaultValue("60")
@@ -649,7 +655,7 @@ public class HoodieIndexConfig extends HoodieConfig {
   return this;
 }
 
-public Builder withBloomIndexUpdatePartitionPath(boolean 
updatePartitionPath) {
+public Builder withGlobalBloomIndexUpdatePartitionPath(boolean 
updatePartitionPath) {
   hoodieIndexConfig.setValue(BLOOM_INDEX_UPDATE_PARTITION_PATH_ENABLE, 
String.valueOf(updatePartitionPath));
   return this;
 }
@@ -679,6 +685,11 @@ public class HoodieIndexConfig extends HoodieConfig {
   return this;
 }
 
+public Builder withRecordIndexUpdatePartitionPath(boolean 
updatePartitionPath) {
+  hoodieIndexConfig.setValue(RECORD_INDEX_UPDATE_PARTITION_PATH_ENABLE, 
String.valueOf(updatePartitionPath));
+  return this;
+}
+
 public Builder withGlobalIndexReconcileParallelism(int parallelism) {
   hoodieIndexConfig.setValue(GLOBAL_INDEX_RECONCILE_PARALLELISM, 
String.valueOf(parallelism));
   return this;
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
index 7b672abf241..bc964b3cfe8 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
@@ -1913,7 +1913,7 @@ public class HoodieWriteConfig extends HoodieConfig {
 return getInt(HoodieIndexConfig.BLOOM_INDEX_KEYS_PER_BUCKET);
   }
 
-  public boolean getBloomIndexUpdatePartitionPath() {
+  public boolean getGlobalBloomIndexUpdatePartitionPath() {
 return 
getBoolean(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH_ENABLE);
   }
 
@@ -1969,6 +1969,10 @@ public class HoodieWriteConfig extends HoodieConfig {
 return getBoolean(HoodieIndexConfig.RECORD_INDEX_USE_CACHING);
   }
 
+  public boolean getRecordIndexUpdatePartitionPath() {
+return 
getBoolean(HoodieIndexConfig.RECORD_INDEX_UPDATE_PARTITION_PATH_ENABLE);
+  }
+
   /**
* storage properties.
*/
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java
index 46ad232022d..24a4dc05d10 100644
--- 

[GitHub] [hudi] boneanxs commented on a diff in pull request #9053: [HUDI-6369] Fix spacial curve with sample strategy fails when 0 or 1 rows only is incoming

2023-06-29 Thread via GitHub


boneanxs commented on code in PR #9053:
URL: https://github.com/apache/hudi/pull/9053#discussion_r1247346902


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/hudi/execution/RangeSample.scala:
##
@@ -316,6 +316,8 @@ object RangeSampleSort {
 
HoodieClusteringConfig.LAYOUT_OPTIMIZE_BUILD_CURVE_SAMPLE_SIZE.defaultValue.toString).toInt
   val sample = new RangeSample(zOrderBounds, sampleRdd)
   val rangeBounds = sample.getRangeBounds()
+  if (rangeBounds.size <= 1)

Review Comment:
   If the sort columns contain a complex type, `sortDataFrameBySampleSupportAllTypes` 
will be used.






[GitHub] [hudi] hudi-bot commented on pull request #9097: [HUDI-6458]Scheduling jobs should not fail when there is no completed commits

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9097:
URL: https://github.com/apache/hudi/pull/9097#issuecomment-1614030516

   
   ## CI report:
   
   * fc76b6cf3bfb61e27eb1e8130b58ce9b5649fb5e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9041:
URL: https://github.com/apache/hudi/pull/9041#issuecomment-1614030382

   
   ## CI report:
   
   * 2f139383c54f93669342539af77dca9b3a352be3 UNKNOWN
   * a1458e17e5749a89948be8f60387eeecd4c0f87c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18201)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] nsivabalan opened a new pull request, #9098: [MINOR] Reverting disabled tests for multiwriter archival

2023-06-29 Thread via GitHub


nsivabalan opened a new pull request, #9098:
URL: https://github.com/apache/hudi/pull/9098

   ### Change Logs
   
   [MINOR] Reverting disabled tests for multiwriter archival
   
   ### Impact
   
   [MINOR] Reverting disabled tests for multiwriter archival
   
   ### Risk level (write none, low medium or high below)
   
   low.
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] gamblewin commented on issue #9093: [SUPPORT] Is it allowed using Flink Table API sqlQuery() to read data from hudi tables?

2023-06-29 Thread via GitHub


gamblewin commented on issue #9093:
URL: https://github.com/apache/hudi/issues/9093#issuecomment-1614027795

   One more question: does Hudi have a Flink API for bulk insert?
   
![image](https://github.com/apache/hudi/assets/39117591/29652adf-3196-4109-91ba-6f5d176a7d05)
   The example on the official website inserts data into a Hudi table one record 
at a time. What if I want to split the source stream into windows and, when 
each window closes, bulk insert all of that window's data into the Hudi table?
   
   For now, the only way I can think of to do a bulk insert is to call the 
executeSql() method of StreamTableEnvironment with a SQL statement built by 
string concatenation. 
   





[jira] [Updated] (HUDI-6458) Scheduling jobs should not fail when there is no completed commits

2023-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6458:
-
Labels: pull-request-available  (was: )

> Scheduling jobs should not fail when there is no completed commits
> --
>
> Key: HUDI-6458
> URL: https://issues.apache.org/jira/browse/HUDI-6458
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[GitHub] [hudi] ksmou opened a new pull request, #9097: [HUDI-6458]Scheduling jobs should not fail when there is no completed commits

2023-06-29 Thread via GitHub


ksmou opened a new pull request, #9097:
URL: https://github.com/apache/hudi/pull/9097

   ### Change Logs
   
   Remove the unused commit-count check before executing the compactor/clustering.
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #9083: PKLess Merge Into

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9083:
URL: https://github.com/apache/hudi/pull/9083#issuecomment-1614021158

   
   ## CI report:
   
   * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN
   * ac4f2ce82babd0794dd73ec097ae79853978b5a5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18211)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8978:
URL: https://github.com/apache/hudi/pull/8978#issuecomment-1614020967

   
   ## CI report:
   
   * 85d6a980287b105a661025ed5aa45da319ad52a1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18213)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] ad1happy2go commented on issue #8160: [SUPPORT] Schema evolution wrt to datatype promotion isnt working. org.apache.avro.AvroRuntimeException: cannot support rewrite value for schema t

2023-06-29 Thread via GitHub


ad1happy2go commented on issue #8160:
URL: https://github.com/apache/hudi/issues/8160#issuecomment-1614001314

   @aajisaka Gentle ping. Were you able to check it?





[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8837:
URL: https://github.com/apache/hudi/pull/8837#issuecomment-1613992170

   
   ## CI report:
   
   * 50a92342798b808ebe521d82b99e4622eeb77ce8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18207)
 
   * c401984679350ad245c1b60d4f889b8a18715169 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18220)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8837:
URL: https://github.com/apache/hudi/pull/8837#issuecomment-1613987357

   
   ## CI report:
   
   * 50a92342798b808ebe521d82b99e4622eeb77ce8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18207)
 
   * c401984679350ad245c1b60d4f889b8a18715169 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] beyond1920 commented on issue #9090: [SUPPORT] FileNotFoundException would happen occasionally after cherrypick HUDI-1517

2023-06-29 Thread via GitHub


beyond1920 commented on issue #9090:
URL: https://github.com/apache/hudi/issues/9090#issuecomment-1613971350

   @ad1happy2go Thanks for the response. I will try it on master later. However, I 
believe the issue exists on the master branch too, because the related code path 
has not changed.
   I had an offline discussion with @guanziyue; he will try to fix this problem 
soon.





[GitHub] [hudi] CTTY commented on pull request #9071: [HUDI-6453] Cascade Glue schema changes to partitions

2023-06-29 Thread via GitHub


CTTY commented on PR #9071:
URL: https://github.com/apache/hudi/pull/9071#issuecomment-1613955565

   @danny0405 Hi Danny, thanks for taking a look
   
   We found that when Hudi uses `AwsGlueCatalogSyncTool` to sync schema changes to 
Glue, it only changes the table schema without cascading to the partition-level 
schema. This behavior is actually expected, because we never implemented 
cascading for `AWSGlueCatalogSyncClient` 
[LOC](https://github.com/apache/hudi/blob/dc3aa399ffc4875abba7be5833ebabca222eb6ff/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java#L333)
   
   This causes problems when users change their schema later on. Because 
the schema change is not cascaded, only newer partitions use the new 
schema, and older partitions still have the old schema in Glue. Then, when users 
query the Glue catalog with an engine such as Athena that is aware of 
partition-level schemas, the older partitions appear unreadable due to the 
failures described here: [Athena partition schema mismatch 
errors](https://docs.aws.amazon.com/athena/latest/ug/updates-and-partitions.html)
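
The gap described above can be illustrated with a minimal sketch. It uses a 
hypothetical in-memory `CatalogTable` model (the class, its fields, and the 
`cascade` flag are assumptions for illustration only, not the Glue API or the 
code in this PR) to show how partitions keep a stale schema unless the update 
is cascaded:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionSchemaCascade {

    /** Hypothetical in-memory stand-in for a catalog table; this is not the Glue API. */
    public static final class CatalogTable {
        public List<String> tableColumns = new ArrayList<>();
        // partition name -> columns recorded for that partition
        public final Map<String, List<String>> partitionColumns = new LinkedHashMap<>();
    }

    /** Updates the table-level schema; when cascade is true, also rewrites every partition's schema. */
    public static void updateSchema(CatalogTable table, List<String> newColumns, boolean cascade) {
        table.tableColumns = new ArrayList<>(newColumns);
        if (cascade) {
            table.partitionColumns.replaceAll((name, oldColumns) -> new ArrayList<>(newColumns));
        }
    }

    /** Returns the partitions whose recorded schema differs from the table-level schema. */
    public static List<String> mismatchedPartitions(CatalogTable table) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : table.partitionColumns.entrySet()) {
            if (!e.getValue().equals(table.tableColumns)) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    /** Builds a table with two existing partitions that share the original schema. */
    public static CatalogTable sample() {
        CatalogTable t = new CatalogTable();
        t.tableColumns = new ArrayList<>(Arrays.asList("id", "name"));
        t.partitionColumns.put("dt=2023-06-28", new ArrayList<>(Arrays.asList("id", "name")));
        t.partitionColumns.put("dt=2023-06-29", new ArrayList<>(Arrays.asList("id", "name")));
        return t;
    }

    public static void main(String[] args) {
        List<String> evolved = Arrays.asList("id", "name", "email");
        CatalogTable t = sample();
        updateSchema(t, evolved, false);
        System.out.println("stale without cascade: " + mismatchedPartitions(t));
        updateSchema(t, evolved, true);
        System.out.println("stale with cascade: " + mismatchedPartitions(t));
    }
}
```

In this toy model, the partitions reported by `mismatchedPartitions` are the 
ones a partition-schema-aware engine could fail to read.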
 





[GitHub] [hudi] hudi-bot commented on pull request #9053: [HUDI-6369] Fix spacial curve with sample strategy fails when 0 or 1 rows only is incoming

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9053:
URL: https://github.com/apache/hudi/pull/9053#issuecomment-1613948641

   
   ## CI report:
   
   * 2521fcee784d790c505bbdb7b1cd73a2e83da95a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18210)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613943924

   
   ## CI report:
   
   * 2aafcc1737e74d9569531d5efc5faf8c5d1b33ec Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18208)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] parisni commented on pull request #8683: [HUDI-5533] Support spark columns comments

2023-06-29 Thread via GitHub


parisni commented on PR #8683:
URL: https://github.com/apache/hudi/pull/8683#issuecomment-1613914438

   I have investigated a bit, and here is my current understanding:
   
   Reading a Hudi table with Spark has two paths:
   1. If 
`spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog`
 (which is what Hudi recommends in the documentation), then Hudi [will rely on 
the `HiveSessionCatalog` to get the 
schema](https://github.com/apache/hudi/blob/dc3aa399ffc4875abba7be5833ebabca222eb6ff/hudi-spark-datasource/hudi-spark3.2plus-common/src/main/scala/org/apache/spark/sql/hudi/catalog/HoodieCatalog.scala#L101-L109).
 Then, if it's a Hive metastore implementation, Spark will try to get the schema 
as case-sensitive and thus not get it from the Hive schema (which is 
case-insensitive), and fall back to fetching the table property 
`spark.sql.sources.schema` instead. If it's a Glue metastore, the same likely 
happens. BTW, [our hive_sync service currently doesn't propagate the 
comments](https://github.com/apache/hudi/blob/dc3aa399ffc4875abba7be5833ebabca222eb6ff/hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/util/SparkDataSourceTableUtils.java#L44-L97)
 in `spark.sql.sources.schema`, and that's why in this case 
`spark.sql("desc table")` or `spark.table("table").schema` won't return the 
comments. This behavior can currently be avoided by setting 
`hoodie.datasource.hive_sync.sync_as_datasource=false`, which forces Spark to 
grab the information from Hive (by leaving the Spark properties empty in the 
HMS), but in a case-insensitive way. I'm not sure what the consequences of 
relying on Hive only are.
   2. If `spark.sql.catalog.spark_catalog` is not set, or if the Hudi table is 
read by path with `spark.read.format("hudi").load("path")`, then Spark uses the 
path updated in this PR, that is, it gets the schema information from the Hudi 
Avro file. The exception is `spark.sql("desc table")`, because Spark falls back 
to `HiveSessionCatalog` in this case.
   
   So right now, using this PR and setting 
both `hoodie.datasource.hive_sync.sync_comment=true` and 
`hoodie.datasource.hive_sync.sync_as_datasource=false`, one will get the 
comments in any case (by identifier or by path). However, not setting the Spark 
datasource information within the HMS might have some bad effects (if not, why 
make so much effort to maintain two schemas within the HMS?). 
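
As a minimal sketch of the combination described above, the two option keys 
quoted in this comment can be collected into a plain map. The class and method 
names here are hypothetical helpers for illustration; only the two option keys 
and their values come from the discussion:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveSyncCommentOptions {

    /**
     * Returns the two sync options discussed above. The helper itself is a
     * hypothetical illustration; only the option keys and values are taken
     * from the comment.
     */
    public static Map<String, String> commentSyncOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        // Sync column comments to the metastore.
        opts.put("hoodie.datasource.hive_sync.sync_comment", "true");
        // Skip the Spark datasource properties so Spark reads the schema from Hive.
        opts.put("hoodie.datasource.hive_sync.sync_as_datasource", "false");
        return opts;
    }

    public static void main(String[] args) {
        commentSyncOptions().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Such a map would presumably be merged into the writer's existing Hudi options 
alongside the usual hive_sync settings; whether dropping the datasource 
properties is safe remains the open question raised above.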
   
   To fix this we could:
   1. make hive_sync [populate the comments in the 
properties](https://github.com/apache/hudi/blob/dc3aa399ffc4875abba7be5833ebabca222eb6ff/hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/util/SparkDataSourceTableUtils.java#L44-L97)
 
   2. make `HoodieCatalog` no longer use the `HiveSessionCatalog` to get the 
schema, but use the Hudi Avro schema instead and skip the HMS for this.
   
   I would go for `1` because it keeps the current logic intact, and also covers 
the case `spark.sql("desc tablename")` when `spark.sql.catalog.spark_catalog` is 
not set.
   
   Thoughts, @danny0405 @bhasudha @yihua?





[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8837:
URL: https://github.com/apache/hudi/pull/8837#issuecomment-1613909899

   
   ## CI report:
   
   * 50a92342798b808ebe521d82b99e4622eeb77ce8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18207)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9058:
URL: https://github.com/apache/hudi/pull/9058#issuecomment-1613881415

   
   ## CI report:
   
   * 5c71da28dab6c40b1937c2995cec0e96c1a27fb7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18216)
 
   * e4a46690c7c7fbb8bbccdfa34e7e591a3d8f4e1e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18219)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9058:
URL: https://github.com/apache/hudi/pull/9058#issuecomment-1613875595

   
   ## CI report:
   
   * 1697d1bfa095ca16a9361e3728a77331d3a28037 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18195)
 
   * 5c71da28dab6c40b1937c2995cec0e96c1a27fb7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18216)
 
   * e4a46690c7c7fbb8bbccdfa34e7e591a3d8f4e1e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] suryaprasanna commented on pull request #9006: [HUDI-6404] Implement ParquetToolsExecutionStrategy for clustering

2023-06-29 Thread via GitHub


suryaprasanna commented on PR #9006:
URL: https://github.com/apache/hudi/pull/9006#issuecomment-1613853964

   > > If there is a use case of pruning some columns to save storage, the 
current clustering approach will iterate over every record and remove the 
unused column, which is very time consuming.
   > 
   > Thanks @suryaprasanna, can you clarify the relationship between 
column pruning and clustering? In the regular notion of Hudi clustering, it only 
merges small file groups into larger ones, with optional sorting on columns; 
no pruning happens there. How does the user expect this patch to improve 
efficiency overall?
   
   Clustering was initially added to do sorting and stitching, but its framework 
is flexible enough to support a wide variety of rewrite use cases. The following 
are other rewrite use cases that can be handled with the clustering framework:
   1. Encryption. Async encryption of data files can be done on demand 
by restricting the clustering group to 1, which then becomes an update of the 
file.
   2. Column pruning. The current change can be used to run the parquet_tools 
prune command on unused columns to reduce the storage footprint.





[GitHub] [hudi] hudi-bot commented on pull request #9095: Test ci

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9095:
URL: https://github.com/apache/hudi/pull/9095#issuecomment-1613834591

   
   ## CI report:
   
   * 99475ffc62972ee49905fca98ea70f2096cfb135 UNKNOWN
   * f31f13745ac4e3a32455a52a61e15082afec51e3 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18215)
 
   * ec568a0c309690a1b0931249aae1e4aab9eddc9b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18217)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1613834275

   
   ## CI report:
   
   * 3b6d13a83efdae5e46eebe9ae168ba7e0d8e9f34 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18189)
 
   * f154ee335eb307e2bcffd895cfd95bfb1f417a1e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18218)
 
   
   



[GitHub] [hudi] prashantwason commented on pull request #9074: [MINOR] Upload coverage report to codecov.

2023-06-29 Thread via GitHub


prashantwason commented on PR #9074:
URL: https://github.com/apache/hudi/pull/9074#issuecomment-1613832280

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #9095: Test ci

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9095:
URL: https://github.com/apache/hudi/pull/9095#issuecomment-1613826837

   
   ## CI report:
   
   * b157b76dc4a7aa862f264b975a12b6212aca7138 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18212)
 
   * 99475ffc62972ee49905fca98ea70f2096cfb135 UNKNOWN
   * f31f13745ac4e3a32455a52a61e15082afec51e3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18215)
 
   * ec568a0c309690a1b0931249aae1e4aab9eddc9b UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1613826553

   
   ## CI report:
   
   * 3b6d13a83efdae5e46eebe9ae168ba7e0d8e9f34 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18189)
 
   * f154ee335eb307e2bcffd895cfd95bfb1f417a1e UNKNOWN
   
   



[GitHub] [hudi] prashantwason commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


prashantwason commented on code in PR #9058:
URL: https://github.com/apache/hudi/pull/9058#discussion_r1247188452


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##
@@ -283,6 +285,8 @@ public HoodieMetadataPayload(Option 
recordOpt) {
 
Integer.parseInt(recordIndexRecord.get(RECORD_INDEX_FIELD_FILE_INDEX).toString()),
 
Long.parseLong(recordIndexRecord.get(RECORD_INDEX_FIELD_INSTANT_TIME).toString()));
   }
+} else {
+  this.isDeletedRecord = true;

Review Comment:
   >> I would favor isDeleted field in HoodieRecordIndexInfo in the schema.
   With that design (an isDeleted field), for all deletes:
   1. We first have to read the existing record from the MDT
   2. Add an upsert to the log files
   3. Remove the deleted record at the time of compaction
   
   The above has performance implications for larger indexes like the RI.
   
   With the design in this PR, a DELETE from the RI is treated exactly like a 
DELETE from the dataset: we write a DELETE block to the log file and the existing 
MOR code takes care of it. This is simple.
   
   The reason we could not use this design for the MDT is that there is no 
use case in the MDT where we actually DELETE an entire record. Example:
   1. If a file is deleted (during clean), we need to modify the 
partition_file_list_record to remove that single file
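
   The difference between the two designs can be illustrated with a small, 
self-contained sketch. It is not Hudi code: `DeleteBlockMergeSketch`, `LogEntry` 
and `merge` are hypothetical names, and a `null` location is assumed to stand in 
for a DELETE block entry. The point it shows is that the writer only appends a 
delete keyed by record key, without reading the existing record first, and the 
merge step resolves it later.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeleteBlockMergeSketch {

  /** Hypothetical log entry; a null location models an entry of a DELETE block. */
  static final class LogEntry {
    final String recordKey;
    final String location;

    LogEntry(String recordKey, String location) {
      this.recordKey = recordKey;
      this.location = location;
    }
  }

  /** Applies log entries on top of the base records, in log order. */
  static Map<String, String> merge(Map<String, String> base, List<LogEntry> log) {
    Map<String, String> merged = new LinkedHashMap<>(base);
    for (LogEntry entry : log) {
      if (entry.location == null) {
        // Delete: drop the key; the writer never had to read the old value.
        merged.remove(entry.recordKey);
      } else {
        // Insert or update: the latest location wins.
        merged.put(entry.recordKey, entry.location);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> base = new LinkedHashMap<>();
    base.put("k1", "file-1");
    base.put("k2", "file-2");

    List<LogEntry> log = Arrays.asList(
        new LogEntry("k1", null),       // delete of k1
        new LogEntry("k3", "file-3"));  // new record

    // prints {k2=file-2, k3=file-3}
    System.out.println(merge(base, log));
  }
}
```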
   






[GitHub] [hudi] hudi-bot commented on pull request #9082: [HUDI-6445] Distribute spark ds func tests

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9082:
URL: https://github.com/apache/hudi/pull/9082#issuecomment-1613819442

   
   ## CI report:
   
   * 474ce7e9a78909fe90b0641f7be1b059084bb11a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18202)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9058:
URL: https://github.com/apache/hudi/pull/9058#issuecomment-1613819259

   
   ## CI report:
   
   * 1697d1bfa095ca16a9361e3728a77331d3a28037 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18195)
 
   * 5c71da28dab6c40b1937c2995cec0e96c1a27fb7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18216)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9095: Test ci

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9095:
URL: https://github.com/apache/hudi/pull/9095#issuecomment-1613773319

   
   ## CI report:
   
   * b157b76dc4a7aa862f264b975a12b6212aca7138 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18212)
 
   * 99475ffc62972ee49905fca98ea70f2096cfb135 UNKNOWN
   * f31f13745ac4e3a32455a52a61e15082afec51e3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18215)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9058:
URL: https://github.com/apache/hudi/pull/9058#issuecomment-1613773079

   
   ## CI report:
   
   * 1697d1bfa095ca16a9361e3728a77331d3a28037 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18195)
 
   * 5c71da28dab6c40b1937c2995cec0e96c1a27fb7 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8683: [HUDI-5533] Support spark columns comments

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8683:
URL: https://github.com/apache/hudi/pull/8683#issuecomment-1613772156

   
   ## CI report:
   
   * 7bdb94998ee2853e15de0b4ce6c20735f43a0f5c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17006)
 
   * 8d6893fd9daf07c30524474cf9a4d39c66a37cba UNKNOWN
   * de250fbbcf1d16ba358dd08270eab5e11a5e3740 UNKNOWN
   * f41404ad3c5c399ee0243fe0ea9a5ee70b74f896 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18214)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9095: Test ci

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9095:
URL: https://github.com/apache/hudi/pull/9095#issuecomment-1613763527

   
   ## CI report:
   
   * b157b76dc4a7aa862f264b975a12b6212aca7138 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18212)
 
   * 99475ffc62972ee49905fca98ea70f2096cfb135 UNKNOWN
   * f31f13745ac4e3a32455a52a61e15082afec51e3 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8683: [HUDI-5533] Support spark columns comments

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8683:
URL: https://github.com/apache/hudi/pull/8683#issuecomment-1613762346

   
   ## CI report:
   
   * 7bdb94998ee2853e15de0b4ce6c20735f43a0f5c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17006)
 
   * 8d6893fd9daf07c30524474cf9a4d39c66a37cba UNKNOWN
   * de250fbbcf1d16ba358dd08270eab5e11a5e3740 UNKNOWN
   * f41404ad3c5c399ee0243fe0ea9a5ee70b74f896 UNKNOWN
   
   



[jira] [Created] (HUDI-6460) Fix Hbase Index for deletes

2023-06-29 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-6460:
-

 Summary: Fix Hbase Index for deletes
 Key: HUDI-6460
 URL: https://issues.apache.org/jira/browse/HUDI-6460
 Project: Apache Hudi
  Issue Type: Improvement
  Components: index
Reporter: sivabalan narayanan


With delete support being added for RLI in 
[https://github.com/apache/hudi/pull/9058/files], the HBase index needs some 
fixes. 

The failing test is 
TestSparkHoodieHBaseIndex.testTagLocationAndPartitionPathUpdateWithExplicitRollback

Root cause:

When update partition path is set to true, the same batch contains both a 
deleted record and a new insert record. Both records are sent to HBase, and for 
some keys the insert takes precedence, while for others the delete takes 
precedence. 

We need to fix SparkHoodieHBaseIndex.updateLocation to do one pass over the 
WriteStatus and ensure we de-dup when we have two records where one of them is 
deleted and the other is inserted. 

However, there are cases where only deletes are present, and in those cases we 
need to ensure the deletes are still routed to HBase. 

 

 

 

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #9095: Test ci

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9095:
URL: https://github.com/apache/hudi/pull/9095#issuecomment-1613754049

   
   ## CI report:
   
   * 34835dfe0e6ff5d6145ffbccd4d52b55af0c3771 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18209)
 
   * b157b76dc4a7aa862f264b975a12b6212aca7138 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18212)
 
   * 99475ffc62972ee49905fca98ea70f2096cfb135 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8978:
URL: https://github.com/apache/hudi/pull/8978#issuecomment-1613753323

   
   ## CI report:
   
   * 30f67aed744512323e0ee8a04423ebb85c00c728 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18169)
 
   * 85d6a980287b105a661025ed5aa45da319ad52a1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18213)
 
   
   



[GitHub] [hudi] NewtonXu closed issue #9018: [SUPPORT] AmazonDynamoDBLockClientOptions failing to instantiate for Hudi AWS DynamoDB concurrency control

2023-06-29 Thread via GitHub


NewtonXu closed issue #9018: [SUPPORT] AmazonDynamoDBLockClientOptions failing 
to instantiate for Hudi AWS DynamoDB concurrency control
URL: https://github.com/apache/hudi/issues/9018





[GitHub] [hudi] NewtonXu commented on issue #9018: [SUPPORT] AmazonDynamoDBLockClientOptions failing to instantiate for Hudi AWS DynamoDB concurrency control

2023-06-29 Thread via GitHub


NewtonXu commented on issue #9018:
URL: https://github.com/apache/hudi/issues/9018#issuecomment-1613740158

   Yes, this worked for me. Thanks!





[GitHub] [hudi] ehurheap commented on issue #9079: [SUPPORT] Hudi delete not working when using UuidKeyGenerator

2023-06-29 Thread via GitHub


ehurheap commented on issue #9079:
URL: https://github.com/apache/hudi/issues/9079#issuecomment-1613712994

   Please post details on how to work around this problem, as we discussed in 
office hours. I believe it involved using the WriteClient, an empty RDD, and some 
hoodie metadata fields?





[GitHub] [hudi] parisni commented on a diff in pull request #9053: [HUDI-6369] Fix spacial curve with sample strategy fails when 0 or 1 rows only is incoming

2023-06-29 Thread via GitHub


parisni commented on code in PR #9053:
URL: https://github.com/apache/hudi/pull/9053#discussion_r1247037719


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/hudi/execution/RangeSample.scala:
##
@@ -316,6 +316,8 @@ object RangeSampleSort {
 
HoodieClusteringConfig.LAYOUT_OPTIMIZE_BUILD_CURVE_SAMPLE_SIZE.defaultValue.toString).toInt
   val sample = new RangeSample(zOrderBounds, sampleRdd)
   val rangeBounds = sample.getRangeBounds()
+  if (rangeBounds.size <= 1)

Review Comment:
   This one was likely not failing, but I still added a short path to avoid sorting.






[GitHub] [hudi] hudi-bot commented on pull request #9095: Test ci

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9095:
URL: https://github.com/apache/hudi/pull/9095#issuecomment-1613687266

   
   ## CI report:
   
   * 34835dfe0e6ff5d6145ffbccd4d52b55af0c3771 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18209)
 
   * b157b76dc4a7aa862f264b975a12b6212aca7138 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18212)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9083: PKLess Merge Into

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9083:
URL: https://github.com/apache/hudi/pull/9083#issuecomment-1613687172

   
   ## CI report:
   
   * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN
   * 767eb9cc26d98ed8e64632f98ab688aa4145e5aa Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18204)
 
   * ac4f2ce82babd0794dd73ec097ae79853978b5a5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18211)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8978:
URL: https://github.com/apache/hudi/pull/8978#issuecomment-1613686736

   
   ## CI report:
   
   * 30f67aed744512323e0ee8a04423ebb85c00c728 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18169)
 
   * 85d6a980287b105a661025ed5aa45da319ad52a1 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9053: [HUDI-6369] Fix spacial curve with sample strategy fails when 0 or 1 rows only is incoming

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9053:
URL: https://github.com/apache/hudi/pull/9053#issuecomment-1613677129

   
   ## CI report:
   
   * 074626f1eb9a809051016d8e4633f8487f477459 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18171)
 
   * 2521fcee784d790c505bbdb7b1cd73a2e83da95a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18210)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9095: Test ci

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9095:
URL: https://github.com/apache/hudi/pull/9095#issuecomment-1613677450

   
   ## CI report:
   
   * 34835dfe0e6ff5d6145ffbccd4d52b55af0c3771 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18209)
 
   * b157b76dc4a7aa862f264b975a12b6212aca7138 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9083: PKLess Merge Into

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9083:
URL: https://github.com/apache/hudi/pull/9083#issuecomment-1613677353

   
   ## CI report:
   
   * be6801e9ca41f00576a511c7d3ffe144e90717ee Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18179)
 
   * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN
   * 767eb9cc26d98ed8e64632f98ab688aa4145e5aa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18204)
 
   * ac4f2ce82babd0794dd73ec097ae79853978b5a5 UNKNOWN
   
   



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-29 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1247016425


##
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieSparkRecordMerger.java:
##
@@ -41,13 +42,30 @@ public Option> 
merge(HoodieRecord older, Schema oldSc
 ValidationUtils.checkArgument(older.getRecordType() == 
HoodieRecordType.SPARK);
 ValidationUtils.checkArgument(newer.getRecordType() == 
HoodieRecordType.SPARK);
 
-if (newer.getData() == null) {
-  // Delete record
-  return Option.empty();
+if (newer instanceof HoodieSparkRecord) {
+  HoodieSparkRecord newSparkRecord = (HoodieSparkRecord) newer;
+  if (newSparkRecord.isDeleted()) {
+// Delete record
+return Option.empty();
+  }
+} else {
+  if (newer.getData() == null) {

Review Comment:
   Test case failures occur in `TestMORDataSource` (`testPayloadDelete`, for 
example), where test cases fail with the following exception:
   
   ```
   1284819 [Executor task launch worker for task 2.0 in stage 107.0 (TID 136)] ERROR org.apache.spark.executor.Executor [] - Exception in task 2.0 in stage 107.0 (TID 136)
   java.lang.ClassCastException: org.apache.hudi.common.model.HoodieEmptyRecord cannot be cast to org.apache.hudi.common.model.HoodieSparkRecord
       at org.apache.hudi.HoodieSparkRecordMerger.merge(HoodieSparkRecordMerger.java:45) ~[classes/:?]
       at org.apache.hudi.RecordMergingFileIterator.merge(Iterators.scala:241) ~[classes/:?]
       at org.apache.hudi.RecordMergingFileIterator.hasNextInternal(Iterators.scala:218) ~[classes/:?]
       at org.apache.hudi.RecordMergingFileIterator.doHasNext(Iterators.scala:203) ~[classes/:?]
       at org.apache.hudi.util.CachingIterator.hasNext(CachingIterator.scala:36) ~[classes/:?]
       at org.apache.hudi.util.CachingIterator.hasNext$(CachingIterator.scala:36) ~[classes/:?]
       at org.apache.hudi.LogFileIterator.hasNext(Iterators.scala:61) ~[classes/:?]
       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithoutKey_0$(Unknown Source) ~[?:?]
       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
       at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[scala-library-2.12.15.jar:?]
       at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140) ~[spark-core_2.12-3.3.1.jar:3.3.1]
       at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) ~[spark-core_2.12-3.3.1.jar:3.3.1]
       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) ~[spark-core_2.12-3.3.1.jar:3.3.1]
       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) ~[spark-core_2.12-3.3.1.jar:3.3.1]
       at org.apache.spark.scheduler.Task.run(Task.scala:136) ~[spark-core_2.12-3.3.1.jar:3.3.1]
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) ~[spark-core_2.12-3.3.1.jar:3.3.1]
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) ~[spark-core_2.12-3.3.1.jar:3.3.1]
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) ~[spark-core_2.12-3.3.1.jar:3.3.1]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_372]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_372]
       at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_372]
   ```

   The `HoodieEmptyRecord` that leads to the `ClassCastException` is created in 
`HoodieMergedLogRecordScanner.java`, line 295:
   ```
   // Put the DELETE record
   if (recordType == HoodieRecordType.AVRO) {
     records.put(key, SpillableMapUtils.generateEmptyPayload(key,
         deleteRecord.getPartitionPath(), deleteRecord.getOrderingValue(), getPayloadClassFQN()));
   } else {
     HoodieEmptyRecord record = new HoodieEmptyRecord<>(new HoodieKey(key, deleteRecord.getPartitionPath()),
         null, deleteRecord.getOrderingValue(), recordType);
     records.put(key, record);
   }
   ```
   
   Based on an offline discussion, we decided to continue with the `instanceof` 
check before casting to `HoodieSparkRecord`.
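
   The guard can be illustrated in isolation with the sketch below. The types 
here (`SketchRecord`, `SparkLikeRecord`, `EmptyLikeRecord`) are hypothetical 
stand-ins, not the Hudi classes; the sketch only shows why checking `instanceof` 
before the cast avoids the `ClassCastException` when an empty (delete) record 
reaches the merger, while falling back to the null-data check otherwise.

```java
public class InstanceofGuardSketch {

  /** Hypothetical minimal record abstraction. */
  interface SketchRecord {
    Object getData();
  }

  /** Stand-in for a record type that knows whether it is a delete. */
  static final class SparkLikeRecord implements SketchRecord {
    private final Object data;
    private final boolean deleted;

    SparkLikeRecord(Object data, boolean deleted) {
      this.data = data;
      this.deleted = deleted;
    }

    @Override
    public Object getData() {
      return data;
    }

    boolean isDeleted() {
      return deleted;
    }
  }

  /** Stand-in for an empty record produced for a DELETE; it carries no data. */
  static final class EmptyLikeRecord implements SketchRecord {
    @Override
    public Object getData() {
      return null;
    }
  }

  /** Returns true when the newer record represents a delete. */
  static boolean isDelete(SketchRecord newer) {
    if (newer instanceof SparkLikeRecord) {
      // Safe cast: the type was checked on the line above.
      return ((SparkLikeRecord) newer).isDeleted();
    }
    // Any other record type: fall back to the null-data check.
    return newer.getData() == null;
  }

  public static void main(String[] args) {
    System.out.println(isDelete(new SparkLikeRecord("row", false))); // prints false
    System.out.println(isDelete(new SparkLikeRecord("row", true)));  // prints true
    System.out.println(isDelete(new EmptyLikeRecord()));             // prints true
  }
}
```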




[GitHub] [hudi] hudi-bot commented on pull request #9092: [MINOR] Enable log compaction by default for MDT

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9092:
URL: https://github.com/apache/hudi/pull/9092#issuecomment-1613667062

   
   ## CI report:
   
   * 408e9f946e0a0647b0fc9f8e220d55ad2fbde62d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18199)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9064:
URL: https://github.com/apache/hudi/pull/9064#issuecomment-1613666832

   
   ## CI report:
   
   * b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN
   * 9c6d2bf222b7247bc926302045123bad69157d39 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18198)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9053: [HUDI-6369] Fix spacial curve with sample strategy fails when 0 or 1 rows only is incoming

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9053:
URL: https://github.com/apache/hudi/pull/9053#issuecomment-1613666713

   
   ## CI report:
   
   * 074626f1eb9a809051016d8e4633f8487f477459 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18171)
 
   * 2521fcee784d790c505bbdb7b1cd73a2e83da95a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9095: Test ci

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9095:
URL: https://github.com/apache/hudi/pull/9095#issuecomment-1613611539

   
   ## CI report:
   
   * 34835dfe0e6ff5d6145ffbccd4d52b55af0c3771 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18209)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613611322

   
   ## CI report:
   
   * af66542fd96990611c79e90c943a18341442 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18203)
 
   * 2aafcc1737e74d9569531d5efc5faf8c5d1b33ec Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18208)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9095: Test ci

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9095:
URL: https://github.com/apache/hudi/pull/9095#issuecomment-1613601961

   
   ## CI report:
   
   * 34835dfe0e6ff5d6145ffbccd4d52b55af0c3771 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613601746

   
   ## CI report:
   
   * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191)
 
   * af66542fd96990611c79e90c943a18341442 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18203)
 
   * 2aafcc1737e74d9569531d5efc5faf8c5d1b33ec UNKNOWN
   
   



[GitHub] [hudi] MathurCodes1 opened a new issue, #9096: [SUPPORT] Unable to alter column name for a Hudi table.

2023-06-29 Thread via GitHub


MathurCodes1 opened a new issue, #9096:
URL: https://github.com/apache/hudi/issues/9096

   
   
   **Describe the problem you faced**
   I'm unable to rename a column of a Hudi table. 
   spark.sql("ALTER TABLE customer_db.customer RENAME COLUMN subid TO 
subidentifier") does not change the column name.
   
   I get the following error when running the statement above:
   **RENAME COLUMN is only supported with v2 tables**
   
   
   **To Reproduce**
   
   ```
   import com.amazonaws.services.glue.GlueContext
   import com.amazonaws.services.glue.util.{GlueArgParser, Job}
   import org.apache.hudi.DataSourceWriteOptions
   import org.apache.spark.sql.functions._
   import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
   import org.apache.spark.{SparkConf, SparkContext}
   
   import scala.collection.JavaConverters._
   import scala.collection.mutable
   
   object ReportingJob {
   
     var spark: SparkSession = _
     var glueContext: GlueContext = _
   
     def main(inputParams: Array[String]): Unit = {
   
       val args: Map[String, String] = GlueArgParser.getResolvedOptions(inputParams, Seq("JOB_NAME").toArray)
       val sysArgs: mutable.Map[String, String] = scala.collection.mutable.Map(args.toSeq: _*)
   
       implicit val glueContext: GlueContext = init(sysArgs)
       implicit val spark: SparkSession = glueContext.getSparkSession
   
       import spark.implicits._
   
       val partitionColumnName: String = "id"
       val hudiTableName: String = "Customer"
       val preCombineKey: String = "id"
       val recordKey = "id"
       val basePath = "s3://aws-amazon-uk/customer/production/"
   
       val df = Seq((123, "1", "seq1"), (124, "0", "seq2")).toDF("id", "subid", "subseq")
   
       val hudiCommonOptions: Map[String, String] = Map(
         "hoodie.table.name" -> hudiTableName,
         "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
         "hoodie.datasource.write.precombine.field" -> preCombineKey,
         "hoodie.datasource.write.recordkey.field" -> recordKey,
         "hoodie.datasource.write.operation" -> "bulk_insert",
         //"hoodie.datasource.write.operation" -> "upsert",
         "hoodie.datasource.write.row.writer.enable" -> "true",
         "hoodie.datasource.write.reconcile.schema" -> "true",
         "hoodie.datasource.write.partitionpath.field" -> partitionColumnName,
         "hoodie.datasource.write.hive_style_partitioning" -> "true",
         // "hoodie.bulkinsert.shuffle.parallelism" -> "2000",
         //  "hoodie.upsert.shuffle.parallelism" -> "400",
         "hoodie.datasource.hive_sync.enable" -> "true",
         "hoodie.datasource.hive_sync.table" -> hudiTableName,
         "hoodie.datasource.hive_sync.database" -> "customer_db",
         "hoodie.datasource.hive_sync.partition_fields" -> partitionColumnName,
         "hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
         "hoodie.datasource.hive_sync.use_jdbc" -> "false",
         "hoodie.combine.before.upsert" -> "true",
         "hoodie.avro.schema.external.transformation" -> "true",
         "hoodie.schema.on.read.enable" -> "true",
         "hoodie.datasource.write.schema.allow.auto.evolution.column.drop" -> "true",
         "hoodie.index.type" -> "BLOOM",
         "spark.hadoop.parquet.avro.write-old-list-structure" -> "false",
         DataSourceWriteOptions.TABLE_TYPE.key() -> "COPY_ON_WRITE"
       )
   
       df.write.format("org.apache.hudi")
         .options(hudiCommonOptions)
         .mode(SaveMode.Overwrite)
         .save(basePath + hudiTableName)
   
       spark.sql("ALTER TABLE customer_db.customer RENAME COLUMN subid TO subidentifier")
       commit()
     }
   
     def commit(): Unit = {
       Job.commit()
     }
   
     def init(sysArgs: mutable.Map[String, String]): GlueContext = {
   
       val conf = new SparkConf()
   
       conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "CORRECTED")
       conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")
       conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")
       conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
       conf.set("spark.sql.avro.datetimeRebaseModeInRead", "CORRECTED")
       val sparkContext = new SparkContext(conf)
       glueContext = new GlueContext(sparkContext)
       Job.init(sysArgs("JOB_NAME"), glueContext, sysArgs.asJava)
       glueContext
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8837:
URL: https://github.com/apache/hudi/pull/8837#issuecomment-1613593548

   
   ## CI report:
   
   * 9751b6399ebf6b629f3940d612bdfe2e2005a25f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18172)
 
   * 50a92342798b808ebe521d82b99e4622eeb77ce8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18207)
 
   
   



[GitHub] [hudi] nsivabalan commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


nsivabalan commented on code in PR #9058:
URL: https://github.com/apache/hudi/pull/9058#discussion_r1246951472


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##
@@ -209,9 +211,10 @@ public class HoodieMetadataPayload implements 
HoodieRecordPayload 
orderingVal) {
-this(Option.of(record));
+  public HoodieMetadataPayload(@Nullable GenericRecord record, Comparable 
orderingVal) {
+this(Option.ofNullable(record));

Review Comment:
   
https://github.com/apache/hudi/blob/dc3aa399ffc4875abba7be5833ebabca222eb6ff/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java#L292
 
   
   We issue deletes to RLI using EmptyRecordPayload, which goes in as a delete 
log block. When we deserialize this (read path), it goes here, where we try to 
instantiate the respective payload using reflection. 
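
To illustrate why the constructor in the diff has to tolerate a null record, here is a minimal sketch that uses `java.util.Optional` as a stand-in for Hudi's `Option`. `StrictPayload`, `NullTolerantPayload`, and `canInstantiateWithNullRecord` are hypothetical names, and the `(String, Comparable)` constructor shape is an assumption chosen only to mirror the `(record, orderingVal)` signature above.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.util.Optional;

public class ReflectivePayloadSketch {

  // Hypothetical payload that wraps the record with Optional.of, which rejects null.
  public static class StrictPayload {
    final Optional<String> record;
    public StrictPayload(String record, Comparable<?> orderingVal) {
      this.record = Optional.of(record); // throws NullPointerException for a null record
    }
  }

  // Hypothetical payload that wraps the record with Optional.ofNullable.
  public static class NullTolerantPayload {
    final Optional<String> record;
    public NullTolerantPayload(String record, Comparable<?> orderingVal) {
      this.record = Optional.ofNullable(record); // a null record becomes Optional.empty()
    }
  }

  // Mimics a read path that instantiates the payload class reflectively
  // for a delete entry, i.e. with no record body.
  public static boolean canInstantiateWithNullRecord(Class<?> payloadClass)
      throws ReflectiveOperationException {
    Constructor<?> ctor = payloadClass.getConstructor(String.class, Comparable.class);
    try {
      ctor.newInstance(null, Integer.valueOf(0));
      return true;
    } catch (InvocationTargetException e) {
      // The constructor itself threw (here: NullPointerException from Optional.of).
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(canInstantiateWithNullRecord(StrictPayload.class));
    System.out.println(canInstantiateWithNullRecord(NullTolerantPayload.class));
  }
}
```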
   






[GitHub] [hudi] nsivabalan commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


nsivabalan commented on code in PR #9058:
URL: https://github.com/apache/hudi/pull/9058#discussion_r1246951472


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##
@@ -209,9 +211,10 @@ public class HoodieMetadataPayload implements 
HoodieRecordPayload 
orderingVal) {
-this(Option.of(record));
+  public HoodieMetadataPayload(@Nullable GenericRecord record, Comparable 
orderingVal) {
+this(Option.ofNullable(record));

Review Comment:
   
https://github.com/apache/hudi/blob/dc3aa399ffc4875abba7be5833ebabca222eb6ff/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java#L292
 
   
   We issue deletes to RLI using EmptyRecordPayload, which goes in as a delete 
log block. When we deserialize this, it goes here, where we try to instantiate 
the respective payload using reflection. 
   






[GitHub] [hudi] nsivabalan commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


nsivabalan commented on code in PR #9058:
URL: https://github.com/apache/hudi/pull/9058#discussion_r1246949788


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##
@@ -283,6 +285,8 @@ public HoodieMetadataPayload(Option 
recordOpt) {
 
Integer.parseInt(recordIndexRecord.get(RECORD_INDEX_FIELD_FILE_INDEX).toString()),
 
Long.parseLong(recordIndexRecord.get(RECORD_INDEX_FIELD_INSTANT_TIME).toString()));
   }
+} else {
+  this.isDeletedRecord = true;

Review Comment:
   Hey folks, here is the context. I feel we should go this route. There may be 
opportunities to optimize col stats and bloom filter records as well. 
   
   Generally, every payload should have a key and, preferably, a top-level field 
that denotes isDeleted. 
   So, if an entire record needs to be deleted, we can rely on the top-level 
isDeleted field. This is unavoidable, since we write using 
EmptyHoodieRecordPayload in some flows (delete) but read back using the specific 
payload class. So, every payload has to support deserializing an 
EmptyRecordPayload. 
   
   Now, let's go into more specifics. 
   RLI:
   Commit1: 
   add key1 to RLI partition. 
   
   Rolling back commit1:
   delete key1 from RLI partition. 
   From a HoodieRecord standpoint, it's as simple as adding a new entry and 
deleting the same. It's simpler, and our getInsertValue or 
combineAndGetUpdateValue will be fast. If we push isDeleted into 
HoodieRecordIndexInfo, then we need to explicitly set the type, parse the 
HoodieRecordIndexInfo data, and only then deduce that it's deleted. 
   
   Again, w/ EmptyRecordPayload, this is not even doable, and we have to go with 
this. 
   
   Why did we not have this issue before? 
   With FILES, the keys are partitions, and hence, except for 
delete_partition, no record from FILES is deleted in its entirety. 
   
   W/ col stats, a delete, while writing to the MDT partition, is yet another 
upsert record with isDeleted within the ColumnStats metadata. So, our 
getInsertValue or combineAndGetUpdateValue needs to deserialize the entire 
record and then deduce that it's deleted. 
   The right fix here would also be to do what we are doing w/ RLI in this patch, 
   
   i.e. 
   in commit1, 
   add col1_part1_file1 : value to MDT
   
   in some X commit, when file1 is deleted:
   just delete col1_part1_file1 from col stats partition in MDT, by using 
EmptyRecordPayload. 
   
   So, log record reading and compaction will be fast. 
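
The add-then-delete flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `IndexPayload` and `merge` are hypothetical names, a plain `Map` stands in for the record body, and nothing here is the actual `HoodieMetadataPayload` logic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeleteAwarePayloadSketch {

  // Hypothetical payload: a key plus a top-level deleted flag.
  // A null record body marks the whole entry as deleted.
  public static final class IndexPayload {
    final String key;
    final Map<String, Object> record;
    final boolean isDeletedRecord;

    public IndexPayload(String key, Map<String, Object> record) {
      this.key = key;
      this.record = record;
      this.isDeletedRecord = (record == null);
    }
  }

  // Replays log records in commit order. The latest entry per key wins, and a
  // deleted entry removes the key without inspecting any record body.
  public static Map<String, IndexPayload> merge(List<IndexPayload> logRecords) {
    Map<String, IndexPayload> merged = new LinkedHashMap<>();
    for (IndexPayload payload : logRecords) {
      if (payload.isDeletedRecord) {
        merged.remove(payload.key);
      } else {
        merged.put(payload.key, payload);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<IndexPayload> log = new ArrayList<>();
    // Commit1: add key1.
    log.add(new IndexPayload("key1", Collections.<String, Object>singletonMap("fileId", "file1")));
    // Rolling back commit1: delete key1 with an empty (null) record.
    log.add(new IndexPayload("key1", null));
    System.out.println(merge(log).isEmpty()); // prints true
  }
}
```

Because the deleted flag sits at the top level, the merge loop decides what to do with an entry without parsing its contents, which is the speed argument made in the comment.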
   
   
   
   
   
   






[jira] [Updated] (HUDI-6426) Upgrade Spark 3.4.1

2023-06-29 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-6426:

Fix Version/s: 0.14.0
 Priority: Blocker  (was: Major)

> Upgrade Spark 3.4.1
> ---
>
> Key: HUDI-6426
> URL: https://issues.apache.org/jira/browse/HUDI-6426
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Rahil Chertara
>Priority: Blocker
> Fix For: 0.14.0
>
>
> Spark 3.4.1 rc1 is out [https://github.com/apache/spark/tree/v3.4.1-rc1]; we 
> should start the upgrade process for this. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8837:
URL: https://github.com/apache/hudi/pull/8837#issuecomment-1613571050

   
   ## CI report:
   
   * 9751b6399ebf6b629f3940d612bdfe2e2005a25f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18172)
 
   * 50a92342798b808ebe521d82b99e4622eeb77ce8 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8609: [HUDI-6154] Introduced retry while reading hoodie.properties to deal with parallel updates.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8609:
URL: https://github.com/apache/hudi/pull/8609#issuecomment-1613560816

   
   ## CI report:
   
   * a64034d612fa64c99dd8d319ac00680924773f53 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18197)
 
   
   



[GitHub] [hudi] splate commented on pull request #3391: [HUDI-83] Fix Timestamp/Date type read by Hive3

2023-06-29 Thread via GitHub


splate commented on PR #3391:
URL: https://github.com/apache/hudi/pull/3391#issuecomment-1613554365

   Would this bug also exist in the Spark Hudi libraries used in AWS Glue? I am 
trying to use Spark SQL to query a Hudi table and load the result into a 
Spark DataFrame, and I am getting a casting exception 
("java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be 
cast to org.apache.hadoop.hive.serde2.io.TimestampWritable"). Could that be 
related to this bug?





[GitHub] [hudi] hudi-bot commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9066:
URL: https://github.com/apache/hudi/pull/9066#issuecomment-1613551942

   
   ## CI report:
   
   * 60c1b8c5885fdda28e07f3ba79290f01dc60a9c4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18196)
 
   
   



[hudi] branch master updated (05435bb0344 -> dc3aa399ffc)

2023-06-29 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 05435bb0344 [MINOR] Increase timeout for Azure CI: UT spark-datasource 
to 240 minutes (#9089)
 add dc3aa399ffc [HUDI-6393] Enable MOR support for Record index with 
functional test cases (#9017)

No new revisions were added by this update.

Summary of changes:
 .../metadata/HoodieBackedTableMetadataWriter.java  |   5 -
 .../hudi/metadata/HoodieBackedTableMetadata.java   |   4 +
 .../hudi/functional/TestRecordLevelIndex.scala | 608 +
 .../org/apache/hudi/util/JavaConversions.scala |  23 +-
 4 files changed, 625 insertions(+), 15 deletions(-)
 create mode 100644 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestRecordLevelIndex.scala
 copy 
hudi-utilities/src/main/java/org/apache/hudi/utilities/exception/HoodieIncrementalPullException.java
 => 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/util/JavaConversions.scala
 (65%)


