[GitHub] [hudi] hudi-bot commented on pull request #7467: [MINOR] fixed Flink's DataStream does not support creating managed table

2022-12-14 Thread GitBox


hudi-bot commented on PR #7467:
URL: https://github.com/apache/hudi/pull/7467#issuecomment-1352682386

   
   ## CI report:
   
   * 825867864b847cb097b17f281824096a2ce41c42 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13753)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13755)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7468: [HUDI-5394] Fix RowCustomColumnsSortPartitioner

2022-12-14 Thread GitBox


hudi-bot commented on PR #7468:
URL: https://github.com/apache/hudi/pull/7468#issuecomment-1352682419

   
   ## CI report:
   
   * e42cf58d7a37de5f724673c71ea350937709041a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13754)
 
   
   



[GitHub] [hudi] qifanlili commented on pull request #7467: [MINOR] fixed Flink's DataStream does not support creating managed table

2022-12-14 Thread GitBox


qifanlili commented on PR #7467:
URL: https://github.com/apache/hudi/pull/7467#issuecomment-1352648693


   Exactly the same as [#7222](https://github.com/apache/hudi/pull/7222), 
please take another look.





[GitHub] [hudi] loukey-lj commented on pull request #6612: [RFC-58][HUDI-4790] a more effective HoodieMergeHandler for COW table with parquet

2022-12-14 Thread GitBox


loukey-lj commented on PR #6612:
URL: https://github.com/apache/hudi/pull/6612#issuecomment-1352637903

   I don't know if I can fully support schema evolution. I hope to improve this 
function with the help of the community. I will write a small demo as soon as 
possible.





[jira] [Updated] (HUDI-5386) Cleaning conflicts in occ mode

2022-12-14 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Summary: Cleaning conflicts in occ mode  (was: Rollback conflict in occ 
mode)

> Cleaning conflicts in occ mode
> --
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not locked, multiple writers can get the 
> same `instantsToRollback`, so the same `instant` is deleted multiple times 
> and the same `rollback.inflight` is created multiple times.
> !image-2022-12-14-11-26-37-252.png!
> !image-2022-12-14-11-26-21-995.png!
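
A minimal Java sketch of the fix direction the report above implies: read and claim the instants to roll back inside one critical section, so that two writers cannot obtain the same list. `RollbackPlannerSketch` and `claimInstantsToRollback` are hypothetical names, not Hudi APIs, and the in-process `ReentrantLock` only stands in for whatever table-level lock an OCC deployment would use across processes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the set of failed instants awaiting rollback;
// this is an illustration, not Hudi's timeline or write client.
final class RollbackPlannerSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> pendingFailedInstants = new ArrayList<>();

    RollbackPlannerSketch(List<String> failedInstants) {
        pendingFailedInstants.addAll(failedInstants);
    }

    // Reading and claiming happen under the same lock, so a second caller
    // sees an empty list instead of the instants the first caller took.
    List<String> claimInstantsToRollback() {
        lock.lock();
        try {
            List<String> claimed = new ArrayList<>(pendingFailedInstants);
            pendingFailedInstants.clear();
            return claimed;
        } finally {
            lock.unlock();
        }
    }
}
```

With this shape, each failed instant is handed to exactly one caller, which is the property the unlocked read in the report lacks.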



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] Zouxxyy commented on pull request #7468: [HUDI-5394] Fix RowCustomColumnsSortPartitioner

2022-12-14 Thread GitBox


Zouxxyy commented on PR #7468:
URL: https://github.com/apache/hudi/pull/7468#issuecomment-1352631288

   @boneanxs





[GitHub] [hudi] hudi-bot commented on pull request #7468: [HUDI-5394] Fix RowCustomColumnsSortPartitioner

2022-12-14 Thread GitBox


hudi-bot commented on PR #7468:
URL: https://github.com/apache/hudi/pull/7468#issuecomment-1352630472

   
   ## CI report:
   
   * e42cf58d7a37de5f724673c71ea350937709041a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7467: [MINOR] fixed Flink's DataStream does not support creating managed table

2022-12-14 Thread GitBox


hudi-bot commented on PR #7467:
URL: https://github.com/apache/hudi/pull/7467#issuecomment-1352630435

   
   ## CI report:
   
   * 825867864b847cb097b17f281824096a2ce41c42 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7377: [HUDI-4827] Rebase Azure Image on Ubuntu 22.04

2022-12-14 Thread GitBox


hudi-bot commented on PR #7377:
URL: https://github.com/apache/hudi/pull/7377#issuecomment-1352630075

   
   ## CI report:
   
   * c204b495505dd07be32da89de25e9bb19ceb19d2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13462)
 
   * 676c3d09549bd933a2f722989fa4e431a822418e UNKNOWN
   * 62c8ca2b5fb40e5dd62859b0437004f236535b11 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13751)
 
   
   



[jira] [Updated] (HUDI-5394) RowCustomColumnsSortPartitioner should not use sortWithinPartitions

2022-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5394:
-
Labels: pull-request-available  (was: )

> RowCustomColumnsSortPartitioner should not use sortWithinPartitions
> ---
>
> Key: HUDI-5394
> URL: https://issues.apache.org/jira/browse/HUDI-5394
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [hudi] qifanlili commented on pull request #7467: [MINOR] fixed Flink's DataStream does not support creating managed table

2022-12-14 Thread GitBox


qifanlili commented on PR #7467:
URL: https://github.com/apache/hudi/pull/7467#issuecomment-1352626937

   @hudi-bot run azure





[GitHub] [hudi] Zouxxyy opened a new pull request, #7468: [HUDI-5394] Fix RowCustomColumnsSortPartitioner

2022-12-14 Thread GitBox


Zouxxyy opened a new pull request, #7468:
URL: https://github.com/apache/hudi/pull/7468

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7455: [DO_NOT_MERGE] Release 0.12.2 blockers candidate

2022-12-14 Thread GitBox


hudi-bot commented on PR #7455:
URL: https://github.com/apache/hudi/pull/7455#issuecomment-1352626034

   
   ## CI report:
   
   * ee8c9dfe97b6f4fad9824244d93bd81718d56511 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13732)
 
   * 738f67333ccb21b6c96540ded3477cb379ce0a57 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13752)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7377: [HUDI-4827] Rebase Azure Image on Ubuntu 22.04

2022-12-14 Thread GitBox


hudi-bot commented on PR #7377:
URL: https://github.com/apache/hudi/pull/7377#issuecomment-1352625879

   
   ## CI report:
   
   * c204b495505dd07be32da89de25e9bb19ceb19d2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13462)
 
   * 676c3d09549bd933a2f722989fa4e431a822418e UNKNOWN
   * 62c8ca2b5fb40e5dd62859b0437004f236535b11 UNKNOWN
   
   



[GitHub] [hudi] qifanlili commented on pull request #7467: [MINOR] fixed Flink's DataStream does not support creating managed table

2022-12-14 Thread GitBox


qifanlili commented on PR #7467:
URL: https://github.com/apache/hudi/pull/7467#issuecomment-1352625050

   ![Uploading image.png…]()
   
   Setting `hoodie.datasource.hive_sync.create_managed_table = true` does not 
take effect.
   
   The cause is the following code, which always evaluates to true. This 
property was not set when the `HiveSyncContext` was created.
   ![Uploading image.png…]()
   
   
   
   





[jira] [Created] (HUDI-5394) RowCustomColumnsSortPartitioner should not use sortWithinPartitions

2022-12-14 Thread zouxxyy (Jira)
zouxxyy created HUDI-5394:
-

 Summary: RowCustomColumnsSortPartitioner should not use 
sortWithinPartitions
 Key: HUDI-5394
 URL: https://issues.apache.org/jira/browse/HUDI-5394
 Project: Apache Hudi
  Issue Type: Bug
Reporter: zouxxyy








[GitHub] [hudi] qifanlili opened a new pull request, #7467: [MINOR] fixed Flink's DataStream does not support creating managed table

2022-12-14 Thread GitBox


qifanlili opened a new pull request, #7467:
URL: https://github.com/apache/hudi/pull/7467

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-5090) throw runtime Exception when flink streming job checkpoint abort

2022-12-14 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-5090:
-
Fix Version/s: (was: 0.12.2)

> throw runtime Exception when flink streming job checkpoint abort
> 
>
> Key: HUDI-5090
> URL: https://issues.apache.org/jira/browse/HUDI-5090
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: chenfengLiu
>Assignee: chenfengLiu
>Priority: Major
>  Labels: pull-request-available
>
> When a write task in a Flink job wants to flush data, it first waits until a 
> new instant has been started. If there is no new instant, the TM waits until 
> the timeout.
> The code is at 
> [https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/common/AbstractStreamWriteFunction.java#L252].
> If the JM fails to start a new instant, it does not retry, so all the write 
> tasks hang.
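
The behaviour the issue title asks for (fail with a runtime exception rather than hang) can be sketched as a bounded wait. This is only an illustration under assumptions: `InstantWaitSketch`, `awaitNewInstant`, and the polling `Supplier` are hypothetical and do not reflect the actual code in `AbstractStreamWriteFunction`.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper: poll for a new instant and fail loudly on timeout
// instead of blocking the write task indefinitely.
final class InstantWaitSketch {
    static String awaitNewInstant(Supplier<String> currentInstant,
                                  String lastInstant,
                                  long timeoutMs,
                                  long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            String instant = currentInstant.get();
            if (instant != null && !instant.equals(lastInstant)) {
                return instant; // a new instant has been started
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(
                    "Timed out after " + timeoutMs + " ms waiting for a new instant");
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("Interrupted while waiting for a new instant", e);
            }
        }
    }
}
```

Throwing from the wait surfaces the failure to the job, which can then restart, rather than leaving every write task blocked.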





[GitHub] [hudi] hudi-bot commented on pull request #7466: [HUDI-5393] Remove the reuse of metadata table writer for flink write…

2022-12-14 Thread GitBox


hudi-bot commented on PR #7466:
URL: https://github.com/apache/hudi/pull/7466#issuecomment-1352621750

   
   ## CI report:
   
   * 699dea79216b47ec98c271b346b48be2ab112571 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13750)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7465: [HUDI-3661] Flink async compaction is not thread safe when use waterm…

2022-12-14 Thread GitBox


hudi-bot commented on PR #7465:
URL: https://github.com/apache/hudi/pull/7465#issuecomment-1352621725

   
   ## CI report:
   
   * 25fa9a09aa18a39692fc7885de2361fdd0057f7f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13749)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7464: [HUDI-5366] Closing metadata writer from within writeClient (0.12.2)

2022-12-14 Thread GitBox


hudi-bot commented on PR #7464:
URL: https://github.com/apache/hudi/pull/7464#issuecomment-1352621705

   
   ## CI report:
   
   * 76da506b6afafdd8138445333d988a6bc5a5cd0e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13748)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7455: [DO_NOT_MERGE] Release 0.12.2 blockers candidate

2022-12-14 Thread GitBox


hudi-bot commented on PR #7455:
URL: https://github.com/apache/hudi/pull/7455#issuecomment-1352621635

   
   ## CI report:
   
   * ee8c9dfe97b6f4fad9824244d93bd81718d56511 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13732)
 
   * 738f67333ccb21b6c96540ded3477cb379ce0a57 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7377: [HUDI-4827] Rebase Azure Image on Ubuntu 22.04

2022-12-14 Thread GitBox


hudi-bot commented on PR #7377:
URL: https://github.com/apache/hudi/pull/7377#issuecomment-1352621407

   
   ## CI report:
   
   * c204b495505dd07be32da89de25e9bb19ceb19d2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13462)
 
   * 676c3d09549bd933a2f722989fa4e431a822418e UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-14 Thread GitBox


hudi-bot commented on PR #7159:
URL: https://github.com/apache/hudi/pull/7159#issuecomment-1352621117

   
   ## CI report:
   
   * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN
   * 45cf7f3c242e20f49b95242a06efe1e24649edc7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13739)
 
   
   



[GitHub] [hudi] guanziyue commented on pull request #6612: [RFC-58][HUDI-4790] a more effective HoodieMergeHandler for COW table with parquet

2022-12-14 Thread GitBox


guanziyue commented on PR #6612:
URL: https://github.com/apache/hudi/pull/6612#issuecomment-1352617591

   > > @loukey-lj: can you respond to @guanziyue's comment above? I will review this patch this week.
   > 
   > Yes, this optimization is applicable to other frameworks. For Hudi, its advantage is that it can get row groups and store them in the index while updating the index. For schema evolution, we currently only support adding fields. Different row groups in a Parquet file can have different schemas, but this is unknown to the query side. If schema changes are not considered, I can submit a small demo.
   
   Thanks for your reply. I agree that this idea can, in theory, improve 
performance a lot. My worry is that the current Parquet implementation or 
interface cannot fully support it. Looking forward to this RFC!





[GitHub] [hudi] hudi-bot commented on pull request #7455: [DO_NOT_MERGE] Release 0.12.2 blockers candidate

2022-12-14 Thread GitBox


hudi-bot commented on PR #7455:
URL: https://github.com/apache/hudi/pull/7455#issuecomment-1352616734

   
   ## CI report:
   
   * ee8c9dfe97b6f4fad9824244d93bd81718d56511 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13732)
 
   
   



[hudi] branch release-0.12.2-blockers-candidate updated (ee8c9dfe97b -> 738f67333cc)

2022-12-14 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch release-0.12.2-blockers-candidate
in repository https://gitbox.apache.org/repos/asf/hudi.git


from ee8c9dfe97b Fixing schemas used for bootstrap reader
 add 738f67333cc [HUDI-5375] Fixing reusing file readers with Metadata 
reader within FileIndex (#7450)

No new revisions were added by this update.

Summary of changes:
 hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[GitHub] [hudi] nsivabalan merged pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-14 Thread GitBox


nsivabalan merged PR #7450:
URL: https://github.com/apache/hudi/pull/7450





[jira] [Updated] (HUDI-5365) Add TOS StorageScheme to support Volcengine Object Storage

2022-12-14 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-5365:
-
Fix Version/s: 0.13.0

> Add TOS StorageScheme to support Volcengine Object Storage
> --
>
> Key: HUDI-5365
> URL: https://issues.apache.org/jira/browse/HUDI-5365
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Zhiping Wu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> [TOS|https://www.volcengine.com/product/tos] is an object storage service from 
> Volcengine, and [CFS|https://www.volcengine.com/product/cfs] is a cloud HDFS 
> from Volcengine. Hudi's StorageSchemes doesn't currently support them, so we 
> cannot integrate any data processing engine with Hudi on TOS/CFS. I suggest 
> adding support for them.





[jira] [Resolved] (HUDI-5365) Add TOS StorageScheme to support Volcengine Object Storage

2022-12-14 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-5365.
--






[jira] [Commented] (HUDI-5365) Add TOS StorageScheme to support Volcengine Object Storage

2022-12-14 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17647845#comment-17647845
 ] 

Danny Chen commented on HUDI-5365:
--

Fixed via master branch: 6ef477238b4818b3a4da07f1426ea0dd296b7dbb






[hudi] branch master updated: [HUDI-5365] Add Volcengine Object Storage(tos) and Cloud HDFS(cfs) (#7425)

2022-12-14 Thread danny0405

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 6ef477238b4 [HUDI-5365] Add Volcengine Object Storage(tos) and Cloud 
HDFS(cfs) (#7425)
6ef477238b4 is described below

commit 6ef477238b4818b3a4da07f1426ea0dd296b7dbb
Author: stayrascal 
AuthorDate: Thu Dec 15 13:23:14 2022 +0800

[HUDI-5365] Add Volcengine Object Storage(tos) and Cloud HDFS(cfs) (#7425)

Co-authored-by: wuzhiping 
---
 .../src/main/java/org/apache/hudi/common/fs/StorageSchemes.java | 6 +-
 .../src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java | 2 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java b/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
index 10619f8b3af..9b5af8bc648 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
@@ -69,7 +69,11 @@ public enum StorageSchemes {
   // Baidu Object Storage
   BOS("bos", false),
   // Oracle Cloud Infrastructure Object Storage
-  OCI("oci", false);
+  OCI("oci", false),
+  // Volcengine Object Storage
+  TOS("tos", false),
+  // Volcengine Cloud HDFS
+  CFS("cfs", true);
 
   private String scheme;
   private boolean supportsAppend;
diff --git a/hudi-common/src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java b/hudi-common/src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java
index 354ad6d0cca..7f2e0c2f8de 100644
--- a/hudi-common/src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java
+++ b/hudi-common/src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java
@@ -52,6 +52,8 @@ public class TestStorageSchemes {
     assertFalse(StorageSchemes.isAppendSupported("ks3"));
     assertTrue(StorageSchemes.isAppendSupported("ofs"));
     assertFalse(StorageSchemes.isAppendSupported("oci"));
+    assertFalse(StorageSchemes.isAppendSupported("tos"));
+    assertTrue(StorageSchemes.isAppendSupported("cfs"));
     assertThrows(IllegalArgumentException.class, () -> {
       StorageSchemes.isAppendSupported("s2");
     }, "Should throw exception for unsupported schemes");
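
For readers without the full file, the pattern the diff above extends can be sketched as a standalone enum. This is a reduced, hypothetical reconstruction: `StorageSchemeSketch` lists only three of the schemes, and its lookup logic is an assumption that merely mirrors the behaviour the test asserts (a boolean per scheme, `IllegalArgumentException` for unknown ones), not the actual body of `StorageSchemes`.

```java
import java.util.Arrays;

// Reduced, hypothetical version of the enum touched by the commit:
// each constant pairs a URI scheme with whether the store supports append.
enum StorageSchemeSketch {
    OCI("oci", false),
    TOS("tos", false), // Volcengine Object Storage: no append
    CFS("cfs", true);  // Volcengine Cloud HDFS: append supported

    private final String scheme;
    private final boolean supportsAppend;

    StorageSchemeSketch(String scheme, boolean supportsAppend) {
        this.scheme = scheme;
        this.supportsAppend = supportsAppend;
    }

    // Looks up the constant by scheme name; unknown schemes are rejected.
    static boolean isAppendSupported(String scheme) {
        return Arrays.stream(values())
            .filter(s -> s.scheme.equals(scheme))
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException("Unsupported scheme: " + scheme))
            .supportsAppend;
    }
}
```

Adding a new store is then a one-line change to the constant list plus a matching assertion, which is the shape of the two hunks in the commit.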



[GitHub] [hudi] danny0405 merged pull request #7425: [HUDI-5365] Add Volcengine Object Storage(tos) and Cloud HDFS(cfs)

2022-12-14 Thread GitBox


danny0405 merged PR #7425:
URL: https://github.com/apache/hudi/pull/7425





[GitHub] [hudi] hudi-bot commented on pull request #7466: [HUDI-5393] Remove the reuse of metadata table writer for flink write…

2022-12-14 Thread GitBox


hudi-bot commented on PR #7466:
URL: https://github.com/apache/hudi/pull/7466#issuecomment-1352575059

   
   ## CI report:
   
   * 699dea79216b47ec98c271b346b48be2ab112571 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7462: [HUDI-5290] Remove the lock in HoodieFlinkWriteClient#writeTableMetatata (0.12.2)

2022-12-14 Thread GitBox


hudi-bot commented on PR #7462:
URL: https://github.com/apache/hudi/pull/7462#issuecomment-1352575027

   
   ## CI report:
   
   * b296d6ba677a4211e1c0927cd7228e8ff25a5d94 UNKNOWN
   * 448285015964bd681d1291cb1545ebdec605a3e8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13746)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7464: [HUDI-5366] Closing metadata writer from within writeClient (0.12.2)

2022-12-14 Thread GitBox


hudi-bot commented on PR #7464:
URL: https://github.com/apache/hudi/pull/7464#issuecomment-1352575036

   
   ## CI report:
   
   * 76da506b6afafdd8138445333d988a6bc5a5cd0e UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7465: [HUDI-3661] Flink async compaction is not thread safe when use waterm…

2022-12-14 Thread GitBox


hudi-bot commented on PR #7465:
URL: https://github.com/apache/hudi/pull/7465#issuecomment-1352575047

   
   ## CI report:
   
   * 25fa9a09aa18a39692fc7885de2361fdd0057f7f UNKNOWN
   
   



[GitHub] [hudi] danny0405 commented on pull request #7428: [HUDI-5368] decouple GlueCatalogSyncTool by using reflecting instead of import class directly.

2022-12-14 Thread GitBox


danny0405 commented on PR #7428:
URL: https://github.com/apache/hudi/pull/7428#issuecomment-1352573297

   @xushiyan Can you take a look if you have time :)





[GitHub] [hudi] hudi-bot commented on pull request #7462: [HUDI-5290] Remove the lock in HoodieFlinkWriteClient#writeTableMetatata (0.12.2)

2022-12-14 Thread GitBox


hudi-bot commented on PR #7462:
URL: https://github.com/apache/hudi/pull/7462#issuecomment-1352571760

   
   ## CI report:
   
   * b296d6ba677a4211e1c0927cd7228e8ff25a5d94 UNKNOWN
   * 448285015964bd681d1291cb1545ebdec605a3e8 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file

2022-12-14 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1352571674

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 391cc64f7aaabdc0f72c85fa3ac03036d09ef43a UNKNOWN
   * 63fe7e7c8a882d757cbea6a4d26b7aba4bdad748 UNKNOWN
   * 6158087d7e518a7bef01ba01b993029436bf429d Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13740)
 
   * a509172c60864820f6758716c4e832645a97a57f UNKNOWN
   * 155476af5ac3169cc11c9b4fab5057ca407995f8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13745)
 
   
   



[GitHub] [hudi] danny0405 commented on a diff in pull request #7419: [WIP][HUDI-5357] Optimize deployment of release artifacts

2022-12-14 Thread GitBox


danny0405 commented on code in PR #7419:
URL: https://github.com/apache/hudi/pull/7419#discussion_r1049222602


##
scripts/release/deploy_staging_jars.sh:
##
@@ -37,15 +37,26 @@ if [ "$#" -gt "1" ]; then
 fi
 
 declare -a ALL_VERSION_OPTS=(
-"-Dscala-2.11 -Dspark2 -Dflink1.13" # for legacy bundle name
-"-Dscala-2.12 -Dspark2 -Dflink1.13" # for legacy bundle name
-"-Dscala-2.12 -Dspark3 -Dflink1.14" # for legacy bundle name
-"-Dscala-2.11 -Dspark2.4 -Dflink1.13"
-"-Dscala-2.11 -Dspark2.4 -Dflink1.14"
-"-Dscala-2.12 -Dspark2.4 -Dflink1.13"
-"-Dscala-2.12 -Dspark3.3 -Dflink1.15"
-"-Dscala-2.12 -Dspark3.2 -Dflink1.14"
-"-Dscala-2.12 -Dspark3.1 -Dflink1.14" # run this last to make sure utilities bundle has spark 3.1
+# upload all module jars and bundle jars
+"-Dscala-2.11 -Dspark2.4"
+"-Dscala-2.12 -Dspark2.4"
+"-Dscala-2.12 -Dspark3.1"
+"-Dscala-2.12 -Dspark3.2"
+"-Dscala-2.12 -Dspark3.3"
+
+# spark bundles (legacy) (not overwriting previous uploads as these jar names are unique)
+"-Dscala-2.11 -Dspark2 -pl packaging/hudi-spark-bundle" # for legacy bundle name hudi-spark-bundle_2.11
+"-Dscala-2.12 -Dspark2 -pl packaging/hudi-spark-bundle" # for legacy bundle name hudi-spark-bundle_2.12
+"-Dscala-2.12 -Dspark3 -pl packaging/hudi-spark-bundle" # for legacy bundle name hudi-spark3-bundle_2.12
+
+# utilities bundles (legacy) (overwriting previous uploads)
+"-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-utilities-bundle" # utilities-bundle_2.11 is for spark 2.4 only
+"-Dscala-2.12 -Dspark3.1 -pl packaging/hudi-utilities-bundle" # utilities-bundle_2.12 is for spark 3.1 only
+
+# flink bundles (overwriting previous uploads)
+"-Dscala-2.12 -Dflink1.13 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle"
+"-Dscala-2.12 -Dflink1.14 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle"
+"-Dscala-2.12 -Dflink1.15 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle"

Review Comment:
   The hard-coded avro version is hard to maintain.






[GitHub] [hudi] danny0405 commented on pull request #7464: [HUDI-5366] Closing metadata writer from within writeClient (0.12.2)

2022-12-14 Thread GitBox


danny0405 commented on PR #7464:
URL: https://github.com/apache/hudi/pull/7464#issuecomment-1352568899

   Thanks for the cherry-pick; I have fired a follow-up fix: 
https://github.com/apache/hudi/pull/7466





[GitHub] [hudi] hudi-bot commented on pull request #7462: [HUDI-5290] Remove the lock in HoodieFlinkWriteClient#writeTableMetatata (0.12.2)

2022-12-14 Thread GitBox


hudi-bot commented on PR #7462:
URL: https://github.com/apache/hudi/pull/7462#issuecomment-1352567892

   
   ## CI report:
   
   * b296d6ba677a4211e1c0927cd7228e8ff25a5d94 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field

2022-12-14 Thread GitBox


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1352567402

   
   ## CI report:
   
   * c60978aaf0dd183b05139dda6bd741ea43877f42 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13715)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13736)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13744)
 
   
   



[jira] [Updated] (HUDI-5393) Remove the reuse of metadata table writer for flink write client

2022-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5393:
-
Labels: pull-request-available  (was: )

> Remove the reuse of metadata table writer for flink write client
> 
>
> Key: HUDI-5393
> URL: https://issues.apache.org/jira/browse/HUDI-5393
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.2, 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-14 Thread GitBox


hudi-bot commented on PR #7450:
URL: https://github.com/apache/hudi/pull/7450#issuecomment-1352567780

   
   ## CI report:
   
   * 3d4f6bf574764b5e8c962b94fa1b7fbfd6e735b5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13737)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7423: [HUDI-5384] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`

2022-12-14 Thread GitBox


hudi-bot commented on PR #7423:
URL: https://github.com/apache/hudi/pull/7423#issuecomment-1352567585

   
   ## CI report:
   
   * 09b901a56869b8282c92d6c05ad746f98f2d6a01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13735)
 
   
   



[GitHub] [hudi] danny0405 opened a new pull request, #7466: [HUDI-5393] Remove the reuse of metadata table writer for flink write…

2022-12-14 Thread GitBox


danny0405 opened a new pull request, #7466:
URL: https://github.com/apache/hudi/pull/7466

   … client
   
   ### Change Logs
   
   After HUDI-5366, the writer is closed after each write, so there is no need to 
reuse the writer anymore. Even though reuse can reduce some cost, the 
state is hard to keep correct.
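
The close-after-each-write pattern described in this change log can be sketched as follows. This is an illustrative example only, not code from the PR: `MetadataWriter`, `writeTableMetadata`, and the `created` list are hypothetical names introduced here, and the real Hudi metadata table writer has a different API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: create a fresh writer per write and close it right away,
// instead of caching one writer and tracking its state across writes.
class PerWriteWriterSketch {

  // Stand-in for a metadata table writer that holds open file handles.
  static class MetadataWriter implements AutoCloseable {
    private boolean closed = false;

    void update(String instant) {
      if (closed) {
        throw new IllegalStateException("writer already closed");
      }
      // A real writer would apply the commit metadata for `instant` here.
    }

    boolean isClosed() {
      return closed;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  // Records every writer created, so the example can show each one gets closed.
  static final List<MetadataWriter> created = new ArrayList<>();

  static void writeTableMetadata(String instant) {
    // try-with-resources closes the writer even if update() throws.
    try (MetadataWriter writer = new MetadataWriter()) {
      created.add(writer);
      writer.update(instant);
    }
  }

  public static void main(String[] args) {
    writeTableMetadata("001");
    writeTableMetadata("002");
    System.out.println("writers created: " + created.size());
  }
}
```

The trade-off in this sketch is the one the change log names: each write pays the cost of constructing a writer, but no writer state has to be kept consistent between writes.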
   
   ### Impact
   
   No
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-5393) Remove the reuse of metadata table writer for flink write client

2022-12-14 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-5393:
-
Fix Version/s: 0.12.2
   0.13.0

> Remove the reuse of metadata table writer for flink write client
> 
>
> Key: HUDI-5393
> URL: https://issues.apache.org/jira/browse/HUDI-5393
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Danny Chen
>Priority: Major
> Fix For: 0.12.2, 0.13.0
>
>






[GitHub] [hudi] xicm commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field

2022-12-14 Thread GitBox


xicm commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1352559067

   @hudi-bot run azure





[GitHub] [hudi] xushiyan commented on a diff in pull request #7175: [HUDI-5191] Fix compatibility with avro 1.10

2022-12-14 Thread GitBox


xushiyan commented on code in PR #7175:
URL: https://github.com/apache/hudi/pull/7175#discussion_r1049214021


##
.github/workflows/bot.yml:
##
@@ -73,6 +73,14 @@ jobs:
 run: |
   HUDI_VERSION=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout)
   ./packaging/bundle-validation/ci_run.sh $HUDI_VERSION
+  - name: Common Test

Review Comment:
   There is a deeper issue with this: hudi-common is tightly coupled with avro 
models, which vary with the spark profile. Currently the hudi-common jar won't be 
compatible across all engine profiles: e.g., if built with spark3.3 (avro 
1.11), it won't work with spark 2 (avro 1.8) or flink (avro 1.10). This needs 
to be decoupled first.






[jira] [Created] (HUDI-5393) Remove the reuse of metadata table writer for flink write client

2022-12-14 Thread Danny Chen (Jira)
Danny Chen created HUDI-5393:


 Summary: Remove the reuse of metadata table writer for flink write 
client
 Key: HUDI-5393
 URL: https://issues.apache.org/jira/browse/HUDI-5393
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Reporter: Danny Chen








[GitHub] [hudi] danny0405 opened a new pull request, #7465: [HUDI-3661] Flink async compaction is not thread safe when use waterm…

2022-12-14 Thread GitBox


danny0405 opened a new pull request, #7465:
URL: https://github.com/apache/hudi/pull/7465

   …ark (#7399)
   
   (cherry picked from commit 86d1e39fb4e971b11e8c6394f6611b7bd7089bd4)
   
   ### Change Logs
   
   This is a bug fix cherry pick for release 0.12.2.
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] nsivabalan opened a new pull request, #7464: [HUDI-5366] Closing metadata writer from within writeClient (0.12.2)

2022-12-14 Thread GitBox


nsivabalan opened a new pull request, #7464:
URL: https://github.com/apache/hudi/pull/7464

   ### Change Logs
   
   Re-applying https://github.com/apache/hudi/pull/7437 against 0.12.2 branch. 
   Closing metadata writer wherever possible. 
   Stacked on top of https://github.com/apache/hudi/pull/7462
   
   ### Impact
   
   Closing open file handles to MDT. 
   
   ### Risk level (write none, low medium or high below)
   
   low.
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] abhishekkh opened a new pull request, #7463: add jsontoavro converter

2022-12-14 Thread GitBox


abhishekkh opened a new pull request, #7463:
URL: https://github.com/apache/hudi/pull/7463

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] loukey-lj commented on a diff in pull request #7336: [HUDI-5297][HUDI-5298] Refactoring WriteStatus

2022-12-14 Thread GitBox


loukey-lj commented on code in PR #7336:
URL: https://github.com/apache/hudi/pull/7336#discussion_r1049198455


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -86,6 +88,7 @@
 
   protected HoodieTimer timer;
   protected WriteStatus writeStatus;
+  protected HoodieRecordLocation newRecordLocation;

Review Comment:
   I got it wrong. I thought that a WriteStatus only has one location instance. 
Now it seems that there is no great change from obtaining the location from 
HoodieRecord. But in some partition change scenarios, we may need to use 
HoodieRecord's operation to determine the CRUD of the index.






[GitHub] [hudi] qifanlili closed pull request #7222: [MINOR] fixed Flink's DataStream does not support creating managed table

2022-12-14 Thread GitBox


qifanlili closed pull request #7222: [MINOR] fixed Flink's DataStream does not 
support creating managed table
URL: https://github.com/apache/hudi/pull/7222





[GitHub] [hudi] nsivabalan opened a new pull request, #7462: [HUDI-5290] Remove the lock in HoodieFlinkWriteClient#writeTableMetad…

2022-12-14 Thread GitBox


nsivabalan opened a new pull request, #7462:
URL: https://github.com/apache/hudi/pull/7462

   …ata (#7320)
   
   ### Change Logs
   
   Re-applying https://github.com/apache/hudi/pull/7320 against 0.12.2 branch
   remove the lock in #writeTableMetadata
   
   ### Impact
   
   Support metadata table in Flink 
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file

2022-12-14 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1352527902

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 391cc64f7aaabdc0f72c85fa3ac03036d09ef43a UNKNOWN
   * 63fe7e7c8a882d757cbea6a4d26b7aba4bdad748 UNKNOWN
   * 6158087d7e518a7bef01ba01b993029436bf429d Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13740)
 
   * a509172c60864820f6758716c4e832645a97a57f UNKNOWN
   * 155476af5ac3169cc11c9b4fab5057ca407995f8 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays

2022-12-14 Thread GitBox


hudi-bot commented on PR #7461:
URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352524838

   
   ## CI report:
   
   * 8305782809d957b5fc7d280414a4e700a47138d6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13734)
 
   * f152865c372e5e57fbb0acd23b3f704b73c1cd5f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13743)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7456: [HUDI-4917][FOLLOW_UP]Optimize codes logic to not break the old class meaning

2022-12-14 Thread GitBox


hudi-bot commented on PR #7456:
URL: https://github.com/apache/hudi/pull/7456#issuecomment-1352524818

   
   ## CI report:
   
   * 2d48757d98f331de07db0796554bf8da73de1ffc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13716)
 
   * c4041c7446cd25b9809f616b640c069ef2959107 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13742)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file

2022-12-14 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1352524771

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 391cc64f7aaabdc0f72c85fa3ac03036d09ef43a UNKNOWN
   * 63fe7e7c8a882d757cbea6a4d26b7aba4bdad748 UNKNOWN
   * 65e21b863cdfec85ffc17beb3f0a6560796a0a09 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13729)
 
   * 6158087d7e518a7bef01ba01b993029436bf429d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13740)
 
   * a509172c60864820f6758716c4e832645a97a57f UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-14 Thread GitBox


hudi-bot commented on PR #7159:
URL: https://github.com/apache/hudi/pull/7159#issuecomment-1352524563

   
   ## CI report:
   
   * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN
   * 6fd8a8f6cd9907dfe4f25164f2e0240af65cab5e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13680)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13738)
 
   * 45cf7f3c242e20f49b95242a06efe1e24649edc7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13739)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s

2022-12-14 Thread GitBox


hudi-bot commented on PR #6361:
URL: https://github.com/apache/hudi/pull/6361#issuecomment-1352524227

   
   ## CI report:
   
   * a8dd96042f42ca74fa8789decdea7397072ec890 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13388)
 
   * a28a39f44afe5561fbf33a6381721e98911a01db Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13741)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays

2022-12-14 Thread GitBox


hudi-bot commented on PR #7461:
URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352520613

   
   ## CI report:
   
   * 8305782809d957b5fc7d280414a4e700a47138d6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13734)
 
   * f152865c372e5e57fbb0acd23b3f704b73c1cd5f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7456: [HUDI-4917][FOLLOW_UP]Optimize codes logic to not break the old class meaning

2022-12-14 Thread GitBox


hudi-bot commented on PR #7456:
URL: https://github.com/apache/hudi/pull/7456#issuecomment-1352520580

   
   ## CI report:
   
   * 2d48757d98f331de07db0796554bf8da73de1ffc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13716)
 
   * c4041c7446cd25b9809f616b640c069ef2959107 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] nsivabalan commented on a diff in pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-14 Thread GitBox


nsivabalan commented on code in PR #7450:
URL: https://github.com/apache/hudi/pull/7450#discussion_r1049189154


##
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##
@@ -1598,7 +1597,7 @@ public void 
testAvroLogRecordReaderWithMixedInsertsCorruptsAndRollback(ExternalS
 scanner.close();
   }
 
-  @ParameterizedTest
+  /*@ParameterizedTest

Review Comment:
   I based this patch on our release branch, which had this test 
unintentionally pulled in. But I guess with the latest state of the branch, it's 
not an issue. this test is not pulled in only 






[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file

2022-12-14 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1352520525

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 391cc64f7aaabdc0f72c85fa3ac03036d09ef43a UNKNOWN
   * 63fe7e7c8a882d757cbea6a4d26b7aba4bdad748 UNKNOWN
   * 65e21b863cdfec85ffc17beb3f0a6560796a0a09 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13729)
 
   * 6158087d7e518a7bef01ba01b993029436bf429d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-14 Thread GitBox


hudi-bot commented on PR #7159:
URL: https://github.com/apache/hudi/pull/7159#issuecomment-1352520269

   
   ## CI report:
   
   * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN
   * 6fd8a8f6cd9907dfe4f25164f2e0240af65cab5e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13680)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13738)
 
   * 45cf7f3c242e20f49b95242a06efe1e24649edc7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s

2022-12-14 Thread GitBox


hudi-bot commented on PR #6361:
URL: https://github.com/apache/hudi/pull/6361#issuecomment-1352519867

   
   ## CI report:
   
   * a8dd96042f42ca74fa8789decdea7397072ec890 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13388)
 
   * a28a39f44afe5561fbf33a6381721e98911a01db UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-14 Thread GitBox


hudi-bot commented on PR #7159:
URL: https://github.com/apache/hudi/pull/7159#issuecomment-1352517388

   
   ## CI report:
   
   * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN
   * 6fd8a8f6cd9907dfe4f25164f2e0240af65cab5e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13680)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13738)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] loukey-lj commented on a diff in pull request #7336: [HUDI-5297][HUDI-5298] Refactoring WriteStatus

2022-12-14 Thread GitBox


loukey-lj commented on code in PR #7336:
URL: https://github.com/apache/hudi/pull/7336#discussion_r1049177064


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -86,6 +88,7 @@
 
   protected HoodieTimer timer;
   protected WriteStatus writeStatus;
+  protected HoodieRecordLocation newRecordLocation;

Review Comment:
   I think it is also necessary to obtain the location from every record. The 
index can resolve not only to the file level, but also to the rowGroup level 
and even the row level.
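To make the granularity point concrete, here is a minimal Python sketch of a record location that can address a whole file, a row group inside it, or a single row. The `RecordLocation` class and its `row_group` and `row_index` fields are hypothetical names chosen for illustration; they are not Hudi's `HoodieRecordLocation` API and are not taken from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordLocation:
    """Hypothetical record location; NOT Hudi's HoodieRecordLocation."""
    instant_time: str
    file_id: str
    row_group: Optional[int] = None  # set when the index resolves to a row group
    row_index: Optional[int] = None  # set when the index resolves to a single row

    def granularity(self) -> str:
        """Return the finest level this location resolves to."""
        if self.row_index is not None:
            return "row"
        if self.row_group is not None:
            return "rowGroup"
        return "file"


# Illustrative values only.
file_level = RecordLocation("20221214101500", "file-0001")
row_level = RecordLocation("20221214101500", "file-0001", row_group=2, row_index=17)
print(file_level.granularity(), row_level.granularity())
```

Keeping the finer-grained fields optional means a file-level index can still produce a valid location, which is one possible way to support all three levels with one type.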






[GitHub] [hudi] melin commented on issue #7406: [SUPPORT] Support Debezium JSON

2022-12-14 Thread GitBox


melin commented on issue #7406:
URL: https://github.com/apache/hudi/issues/7406#issuecomment-1352506799

   > @melin can you elaborate the use case pls?
   
   The Avro format relies on the Kafka schema registry, which increases 
deployment and maintenance costs. JSON would be more convenient.
   





[GitHub] [hudi] zhuanshenbsj1 commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-12-14 Thread GitBox


zhuanshenbsj1 commented on PR #7159:
URL: https://github.com/apache/hudi/pull/7159#issuecomment-1352506224

   @hudi-bot run azure





[GitHub] [hudi] loukey-lj commented on pull request #6612: [RFC-58][HUDI-4790] a more effective HoodieMergeHandler for COW table with parquet

2022-12-14 Thread GitBox


loukey-lj commented on PR #6612:
URL: https://github.com/apache/hudi/pull/6612#issuecomment-1352505945

   > @loukey-lj : can you respond to @guanziyue 's comment above. I will review 
this patch by this week.
   
   Yes, this optimization is applicable to other frameworks. For Hudi, its 
advantage is that it can get the rowgroups and store them in the index while 
updating the index. For schema evolution, we currently only support adding 
fields. Different rowgroups in the Parquet file can have different schemas, but 
this is unknown to the query side. If schema changes are not considered, I can 
submit a small demo.





[jira] [Updated] (HUDI-3636) Clustering fails due to marker creation failure

2022-12-14 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-3636:
--
Fix Version/s: 0.13.0
   (was: 0.12.2)

> Clustering fails due to marker creation failure
> ---
>
> Key: HUDI-3636
> URL: https://issues.apache.org/jira/browse/HUDI-3636
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: multi-writer
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Scenario: multi-writer test, one writer doing ingesting with Deltastreamer 
> continuous mode, COW, inserts, async clustering and cleaning (partitions 
> under 2022/1, 2022/2), another writer with Spark datasource doing backfills 
> to different partitions (2021/12).  
> 0.10.0 no MT, clustering instant is inflight (failing it in the middle before 
> upgrade) ➝ 0.11 MT, with multi-writer configuration the same as before.
> The clustering/replace instant cannot make progress due to marker creation 
> failure, failing the DS ingestion as well.  Need to investigate if this is 
> timeline-server-based marker related or MT related.
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in 
> stage 46.0 failed 1 times, most recent failure: Lost task 2.0 in stage 46.0 
> (TID 277) (192.168.70.231 executor driver): java.lang.RuntimeException: 
> org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 
> 2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
> Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
> failed: Connection refused (Connection refused)
>     at 
> org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at 
> scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
>     at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>     at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
>     at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
>     at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
>     at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
>     at scala.collection.AbstractIterator.to(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
>     at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
>     at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
>     at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
>     at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
>     at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
>     at 
> org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 
> 2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
> Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
> failed: Connection refused (Connection refused)
>     at 
> 

[GitHub] [hudi] hudi-bot commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-14 Thread GitBox


hudi-bot commented on PR #7450:
URL: https://github.com/apache/hudi/pull/7450#issuecomment-1352477380

   
   ## CI report:
   
   * f11664234aaf6c74c98c1d75a364770931f9c00b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13706)
 
   * 3d4f6bf574764b5e8c962b94fa1b7fbfd6e735b5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13737)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-14 Thread GitBox


hudi-bot commented on PR #7450:
URL: https://github.com/apache/hudi/pull/7450#issuecomment-1352474318

   
   ## CI report:
   
   * f11664234aaf6c74c98c1d75a364770931f9c00b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13706)
 
   * 3d4f6bf574764b5e8c962b94fa1b7fbfd6e735b5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field

2022-12-14 Thread GitBox


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1352471026

   
   ## CI report:
   
   * c60978aaf0dd183b05139dda6bd741ea43877f42 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13715)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13736)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] xicm commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field

2022-12-14 Thread GitBox


xicm commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1352444013

   @hudi-bot run azure





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-14 Thread GitBox


alexeykudinkin commented on code in PR #7450:
URL: https://github.com/apache/hudi/pull/7450#discussion_r1049125968


##
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##
@@ -1598,7 +1597,7 @@ public void 
testAvroLogRecordReaderWithMixedInsertsCorruptsAndRollback(ExternalS
 scanner.close();
   }
 
-  @ParameterizedTest
+  /*@ParameterizedTest

Review Comment:
   Why change this one?






[GitHub] [hudi] hudi-bot commented on pull request #7423: [HUDI-5384] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`

2022-12-14 Thread GitBox


hudi-bot commented on PR #7423:
URL: https://github.com/apache/hudi/pull/7423#issuecomment-1352430240

   
   ## CI report:
   
   * 2905580eede076436b472c22da2f2d6af27d1e1e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13699)
 
   * 09b901a56869b8282c92d6c05ad746f98f2d6a01 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13735)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays

2022-12-14 Thread GitBox


hudi-bot commented on PR #7461:
URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352423852

   
   ## CI report:
   
   * 8305782809d957b5fc7d280414a4e700a47138d6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13734)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7423: [HUDI-5384] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`

2022-12-14 Thread GitBox


hudi-bot commented on PR #7423:
URL: https://github.com/apache/hudi/pull/7423#issuecomment-1352423665

   
   ## CI report:
   
   * 2905580eede076436b472c22da2f2d6af27d1e1e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13699)
 
   * 09b901a56869b8282c92d6c05ad746f98f2d6a01 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] kasured commented on issue #7246: [SUPPORT] Controlling the Archival process retention

2022-12-14 Thread GitBox


kasured commented on issue #7246:
URL: https://github.com/apache/hudi/issues/7246#issuecomment-1352420251

   Sorry for the late update. I should have followed up on this earlier. Right 
after the issue was created we decided to go with the presumably least risky 
option of decreasing the batch size 'hoodie.commits.archival.batch'. It helped 
to eliminate the issue with that particular table.
   
   However, the remaining concern for us at the moment is that, regardless of 
the options I have listed (unless there are others), either the number of 
archive files will keep increasing (if the archive merge is disabled) or the 
overall archival size will keep accumulating (if archival is enabled).
   
   Therefore we can close the issue, as there is no OOM anymore. On the other 
hand, there seems to be no way to control the growth of the archival 
files/overall size.
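As a rough illustration of the workaround described above, the following Python sketch builds a dictionary of writer options with a reduced archival batch size. Only `hoodie.commits.archival.batch` is named in the comment; the `hoodie.archive.merge.enable` key, the helper name `archival_options`, and the values are assumptions for illustration and should be checked against the configuration reference for the Hudi version in use.

```python
def archival_options(batch_size: int, merge_enabled: bool) -> dict:
    """Build writer options for the archival settings discussed above."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return {
        # Named in the comment above: fewer instants are archived per batch.
        "hoodie.commits.archival.batch": str(batch_size),
        # Assumed key for the archive-merge toggle; verify for your Hudi version.
        "hoodie.archive.merge.enable": str(merge_enabled).lower(),
    }


# Illustrative values, not a recommendation.
options = archival_options(batch_size=5, merge_enabled=False)
print(options)
```

Such a dictionary would typically be passed as options to the writer. It only changes how much work each archival run does and does not bound the total size of the archive, which is the open concern raised in the comment.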





[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays

2022-12-14 Thread GitBox


hudi-bot commented on PR #7461:
URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352419381

   
   ## CI report:
   
   * c74b3094a3b1cf632e569635ee570bd53ebcde1e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13733)
 
   * 8305782809d957b5fc7d280414a4e700a47138d6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays

2022-12-14 Thread GitBox


hudi-bot commented on PR #7461:
URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352415760

   
   ## CI report:
   
   * c74b3094a3b1cf632e569635ee570bd53ebcde1e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13733)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2022-12-14 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-5392:
--
Sprint: 2022/12/12

> Fix Bootstrap files reader to configure arrays to be read in the new format
> ---
>
> Key: HUDI-5392
> URL: https://issues.apache.org/jira/browse/HUDI-5392
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> When writing Bootstrap file we’re using Spark writer that writes arrays in 
> the new format, while Hudi reads it in the old (Avro compatible) format:
> {code:java}
>  // Old
>  optional group tip_history (LIST) {
>    repeated group array {
>      optional double amount;
>      optional binary currency (UTF8);
>    }
>  }
>  // New
>  optional group tip_history (LIST) {
>    repeated group list {
>      optional group element {
>        optional double amount;
>        optional binary currency (UTF8);
>      }
>    }
>  }
> {code}
>  
> To fix that we need to make sure that Bootstrap files are *always* read in a 
> new format (Spark default) unlike Hudi's Parquet files
> We also need to fix TestDataSourceForBootstrap, as it currently doesn't 
> actually assert that the records are written correctly.
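
The difference between the two layouts can be sketched in a few lines of Python. This is a simplified model, assuming schema nodes are plain dicts, of the backward-compatibility rules the Parquet format describes for LIST-annotated groups. The function name `list_element` and the dict shape are hypothetical; this is not Hudi, Spark, or parquet-mr code.

```python
def list_element(list_group: dict) -> tuple:
    """Return (layout, element_node) for a LIST-annotated group.

    Nodes are dicts: {"name": ..., "repetition": ..., "fields": [...]};
    "fields" is absent for primitive columns.
    """
    (repeated,) = list_group["fields"]  # a LIST group has exactly one child
    if repeated["repetition"] != "repeated":
        raise ValueError("child of a LIST group must be repeated")
    children = repeated.get("fields")
    # Primitive or multi-field repeated node: the node itself is the element.
    if children is None or len(children) > 1:
        return "legacy", repeated
    # Single-field group named `array` or `<list name>_tuple`: also legacy.
    if repeated["name"] in ("array", list_group["name"] + "_tuple"):
        return "legacy", repeated
    # Otherwise the single child of the repeated group is the element.
    return "standard", children[0]


amount = {"name": "amount", "repetition": "optional"}
currency = {"name": "currency", "repetition": "optional"}

old = {"name": "tip_history", "repetition": "optional", "fields": [
    {"name": "array", "repetition": "repeated", "fields": [amount, currency]}]}
new = {"name": "tip_history", "repetition": "optional", "fields": [
    {"name": "list", "repetition": "repeated", "fields": [
        {"name": "element", "repetition": "optional",
         "fields": [amount, currency]}]}]}

print(list_element(old)[0], list_element(new)[0])
```

Both schemas carry the same `amount` and `currency` fields; only the nesting differs. A reader that expects one layout and is handed the other will therefore look for the element one level too high or too low.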



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2022-12-14 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-5392:
--
Status: In Progress  (was: Open)

> Fix Bootstrap files reader to configure arrays to be read in the new format
> ---
>
> Key: HUDI-5392
> URL: https://issues.apache.org/jira/browse/HUDI-5392
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> When writing Bootstrap file we’re using Spark writer that writes arrays in 
> the new format, while Hudi reads it in the old (Avro compatible) format:
> {code:java}
>  // Old
>  optional group tip_history (LIST) {
>    repeated group array {
>      optional double amount;
>      optional binary currency (UTF8);
>    }
>  }
>  // New
>  optional group tip_history (LIST) {
>    repeated group list {
>      optional group element {
>        optional double amount;
>        optional binary currency (UTF8);
>      }
>    }
>  }
> {code}
>  
> To fix that we need to make sure that Bootstrap files are *always* read in a 
> new format (Spark default) unlike Hudi's Parquet files
> We also need to fix TestDataSourceForBootstrap, as it currently doesn't 
> actually assert that the records are written correctly.





[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays

2022-12-14 Thread GitBox


hudi-bot commented on PR #7461:
URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352368542

   
   ## CI report:
   
   * c74b3094a3b1cf632e569635ee570bd53ebcde1e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2022-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5392:
-
Labels: pull-request-available  (was: )

> Fix Bootstrap files reader to configure arrays to be read in the new format
> ---
>
> Key: HUDI-5392
> URL: https://issues.apache.org/jira/browse/HUDI-5392
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> When writing a Bootstrap file we're using the Spark writer, which writes arrays in 
> the new format, while Hudi reads them in the old (Avro-compatible) format:
> {code:java}
> // Old
> optional group tip_history (LIST) {
>   repeated group array {
>     optional double amount;
>     optional binary currency (UTF8);
>   }
> }
> // New
> optional group tip_history (LIST) {
>   repeated group list {
>     optional group element {
>       optional double amount;
>       optional binary currency (UTF8);
>     }
>   }
> }
> {code}
>
> To fix that we need to make sure that Bootstrap files are *always* read in the 
> new format (the Spark default), unlike Hudi's Parquet files.
> We also need to fix TestDataSourceForBootstrap, as it currently doesn't 
> actually assert that the records are written correctly.
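
Editorial illustration (not part of the original issue): in the two layouts
quoted above, the same leaf values sit under different column paths, which is
one way to see why a reader expecting one layout cannot resolve the columns of
the other. The sketch below is a minimal, self-contained Python model of that
path difference. The function name `to_three_level_path` is hypothetical, and
the code deliberately uses no Hudi, Spark, or Parquet APIs.

```python
def to_three_level_path(path: str, list_field: str) -> str:
    """Rewrite a leaf column path from the old 2-level list layout
    (<list_field>.array.<leaf>) to the new 3-level layout
    (<list_field>.list.element.<leaf>). Other paths are returned unchanged.

    Hypothetical helper for illustration only.
    """
    old_prefix = f"{list_field}.array."
    if path.startswith(old_prefix):
        return f"{list_field}.list.element." + path[len(old_prefix):]
    return path


# Leaf paths as they appear under the old (Avro-compatible) layout.
old_paths = ["tip_history.array.amount", "tip_history.array.currency"]

# The same leaves as they appear under the new (Spark default) layout.
new_paths = [to_three_level_path(p, "tip_history") for p in old_paths]
print(new_paths)
```

None of the old paths exists in a file written with the new layout, so the
reader has to be told which layout to expect rather than inferring it.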





[GitHub] [hudi] alexeykudinkin opened a new pull request, #7461: [HUDI-5392] Fixing Bootstrapping flow'

2022-12-14 Thread GitBox


alexeykudinkin opened a new pull request, #7461:
URL: https://github.com/apache/hudi/pull/7461

   





[GitHub] [hudi] hudi-bot commented on pull request #7455: [DO_NOT_MERGE] Release 0.12.2 blockers candidate

2022-12-14 Thread GitBox


hudi-bot commented on PR #7455:
URL: https://github.com/apache/hudi/pull/7455#issuecomment-1352364791

   
   ## CI report:
   
   * 292630b480861b993951eca862f25f0c9b861ec1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13731)
 
   * ee8c9dfe97b6f4fad9824244d93bd81718d56511 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13732)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7455: [DO_NOT_MERGE] Release 0.12.2 blockers candidate

2022-12-14 Thread GitBox


hudi-bot commented on PR #7455:
URL: https://github.com/apache/hudi/pull/7455#issuecomment-1352361142

   
   ## CI report:
   
   * 292630b480861b993951eca862f25f0c9b861ec1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13731)
 
   * ee8c9dfe97b6f4fad9824244d93bd81718d56511 UNKNOWN
   
   



[jira] [Assigned] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2022-12-14 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin reassigned HUDI-5392:
-

Assignee: Alexey Kudinkin  (was: Ethan Guo)

> Fix Bootstrap files reader to configure arrays to be read in the new format
> ---
>
> Key: HUDI-5392
> URL: https://issues.apache.org/jira/browse/HUDI-5392
> Project: Apache Hudi
> Issue Type: Bug
> Components: bootstrap
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.13.0
>
>
> When writing a Bootstrap file we're using the Spark writer, which writes arrays in 
> the new format, while Hudi reads them in the old (Avro-compatible) format:
> {code:java}
> // Old
> optional group tip_history (LIST) {
>   repeated group array {
>     optional double amount;
>     optional binary currency (UTF8);
>   }
> }
> // New
> optional group tip_history (LIST) {
>   repeated group list {
>     optional group element {
>       optional double amount;
>       optional binary currency (UTF8);
>     }
>   }
> }
> {code}
>
> To fix that we need to make sure that Bootstrap files are *always* read in the 
> new format (the Spark default), unlike Hudi's Parquet files.
> We also need to fix TestDataSourceForBootstrap, as it currently doesn't 
> actually assert that the records are written correctly.





[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2022-12-14 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-5392:
--
Story Points: 8  (was: 2)

> Fix Bootstrap files reader to configure arrays to be read in the new format
> ---
>
> Key: HUDI-5392
> URL: https://issues.apache.org/jira/browse/HUDI-5392
> Project: Apache Hudi
> Issue Type: Bug
> Components: bootstrap
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.13.0
>
>
> When writing a Bootstrap file we're using the Spark writer, which writes arrays in 
> the new format, while Hudi reads them in the old (Avro-compatible) format:
> {code:java}
> // Old
> optional group tip_history (LIST) {
>   repeated group array {
>     optional double amount;
>     optional binary currency (UTF8);
>   }
> }
> // New
> optional group tip_history (LIST) {
>   repeated group list {
>     optional group element {
>       optional double amount;
>       optional binary currency (UTF8);
>     }
>   }
> }
> {code}
>
> To fix that we need to make sure that Bootstrap files are *always* read in the 
> new format (the Spark default), unlike Hudi's Parquet files.
> We also need to fix TestDataSourceForBootstrap, as it currently doesn't 
> actually assert that the records are written correctly.





[hudi] branch release-0.12.2-blockers-candidate updated (292630b4808 -> ee8c9dfe97b)

2022-12-14 Thread akudinkin
This is an automated email from the ASF dual-hosted git repository.

akudinkin pushed a change to branch release-0.12.2-blockers-candidate
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 292630b4808 Avoiding costly lookups into the schema cache in 
`SqlTypedRecord`
 add ee8c9dfe97b Fixing schemas used for bootstrap reader

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/table/action/commit/HoodieMergeHelper.java | 10 --
 .../org/apache/hudi/table/action/commit/FlinkMergeHelper.java  |  9 -
 .../org/apache/hudi/table/action/commit/JavaMergeHelper.java   |  9 -
 3 files changed, 24 insertions(+), 4 deletions(-)



[jira] [Commented] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2022-12-14 Thread Alexey Kudinkin (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17647735#comment-17647735
 ] 

Alexey Kudinkin commented on HUDI-5392:
---

Another contributing issue is that when reading a Bootstrap file we don't specify 
the expected schema, so records from the Bootstrap file are read using the 
schema decoded from the file's Parquet schema. This is problematic because, when we 
validate the Avro schemas, their corresponding names are checked, and this 
creates mismatches since Parquet schemas don't carry names/namespaces (of the 
structs).
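
Editorial illustration (not part of the original comment): the name mismatch
described above can be reproduced with two Avro-style record schemas that are
structurally identical but carry different record names. The sketch below is a
self-contained Python model using plain dictionaries in Avro's JSON schema
shape. The helper `strip_names`, the record names, and the namespace are all
hypothetical, and this is only a demonstration of the mismatch, not the fix
made in the PR.

```python
def strip_names(schema):
    """Return a copy of an Avro-style schema (parsed JSON) with the 'name' and
    'namespace' attributes of record types removed, keeping field names.

    Hypothetical helper for illustration only.
    """
    if isinstance(schema, dict):
        is_record = schema.get("type") == "record"
        return {
            key: strip_names(value)
            for key, value in schema.items()
            if not (is_record and key in ("name", "namespace"))
        }
    if isinstance(schema, list):
        return [strip_names(item) for item in schema]
    return schema


fields = [
    {"name": "amount", "type": ["null", "double"]},
    {"name": "currency", "type": ["null", "string"]},
]

# Schema the table expects: the struct has a name and namespace (made up here).
expected = {"type": "record", "name": "tip_history", "namespace": "example.ns",
            "fields": fields}

# Schema derived from the Parquet file: the struct name is synthesized (made up here).
from_parquet = {"type": "record", "name": "element", "fields": fields}

# A name-sensitive comparison reports a mismatch for identical structures.
print(expected == from_parquet)
# Ignoring record names/namespaces, the two structures match.
print(strip_names(expected) == strip_names(from_parquet))
```

Passing the expected schema to the reader avoids relying on names synthesized
from the Parquet file in the first place.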

> Fix Bootstrap files reader to configure arrays to be read in the new format
> ---
>
> Key: HUDI-5392
> URL: https://issues.apache.org/jira/browse/HUDI-5392
> Project: Apache Hudi
> Issue Type: Bug
> Components: bootstrap
> Reporter: Alexey Kudinkin
> Assignee: Ethan Guo
> Priority: Blocker
> Fix For: 0.13.0
>
>
> When writing a Bootstrap file we're using the Spark writer, which writes arrays in 
> the new format, while Hudi reads them in the old (Avro-compatible) format:
> {code:java}
> // Old
> optional group tip_history (LIST) {
>   repeated group array {
>     optional double amount;
>     optional binary currency (UTF8);
>   }
> }
> // New
> optional group tip_history (LIST) {
>   repeated group list {
>     optional group element {
>       optional double amount;
>       optional binary currency (UTF8);
>     }
>   }
> }
> {code}
>
> To fix that we need to make sure that Bootstrap files are *always* read in the 
> new format (the Spark default), unlike Hudi's Parquet files.
> We also need to fix TestDataSourceForBootstrap, as it currently doesn't 
> actually assert that the records are written correctly.





[jira] [Assigned] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2022-12-14 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin reassigned HUDI-5392:
-

Assignee: Ethan Guo  (was: Alexey Kudinkin)

> Fix Bootstrap files reader to configure arrays to be read in the new format
> ---
>
> Key: HUDI-5392
> URL: https://issues.apache.org/jira/browse/HUDI-5392
> Project: Apache Hudi
> Issue Type: Bug
> Components: bootstrap
> Reporter: Alexey Kudinkin
> Assignee: Ethan Guo
> Priority: Blocker
> Fix For: 0.13.0
>
>
> When writing a Bootstrap file we're using the Spark writer, which writes arrays in 
> the new format, while Hudi reads them in the old (Avro-compatible) format:
> {code:java}
> // Old
> optional group tip_history (LIST) {
>   repeated group array {
>     optional double amount;
>     optional binary currency (UTF8);
>   }
> }
> // New
> optional group tip_history (LIST) {
>   repeated group list {
>     optional group element {
>       optional double amount;
>       optional binary currency (UTF8);
>     }
>   }
> }
> {code}
>
> To fix that we need to make sure that Bootstrap files are *always* read in the 
> new format (the Spark default), unlike Hudi's Parquet files.
> We also need to fix TestDataSourceForBootstrap, as it currently doesn't 
> actually assert that the records are written correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2022-12-14 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin reassigned HUDI-5392:
-

Assignee: Alexey Kudinkin

> Fix Bootstrap files reader to configure arrays to be read in the new format
> ---
>
> Key: HUDI-5392
> URL: https://issues.apache.org/jira/browse/HUDI-5392
> Project: Apache Hudi
> Issue Type: Bug
> Components: bootstrap
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.13.0
>
>
> When writing a Bootstrap file we're using the Spark writer, which writes arrays in 
> the new format, while Hudi reads them in the old (Avro-compatible) format:
> {code:java}
> // Old
> optional group tip_history (LIST) {
>   repeated group array {
>     optional double amount;
>     optional binary currency (UTF8);
>   }
> }
> // New
> optional group tip_history (LIST) {
>   repeated group list {
>     optional group element {
>       optional double amount;
>       optional binary currency (UTF8);
>     }
>   }
> }
> {code}
>
> To fix that we need to make sure that Bootstrap files are *always* read in the 
> new format (the Spark default), unlike Hudi's Parquet files.
> We also need to fix TestDataSourceForBootstrap, as it currently doesn't 
> actually assert that the records are written correctly.




