[GitHub] [hudi] TengHuo commented on pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime
TengHuo commented on PR #6000: URL: https://github.com/apache/hudi/pull/6000#issuecomment-1226840804

@hudi-bot run azure
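For context on the bug named in the PR title, here is an illustrative sketch using `java.time` — not the PR's actual fix. Hudi instant times are compact timestamps, and parsing one against a pattern it does not match throws `DateTimeParseException`; the pattern below is an assumption for illustration, with `HoodieInstantTimeGenerator` holding the authoritative formats.

```java
// Illustrative sketch only (not the PR's fix): parsing a Hudi-style instant time.
// The "yyyyMMddHHmmss" pattern is an assumption for illustration; check
// HoodieInstantTimeGenerator for the formats Hudi actually uses.
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class InstantTimeParseDemo {
  private static final DateTimeFormatter SECOND_GRANULARITY =
      DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

  public static void main(String[] args) {
    // A well-formed second-granularity instant parses cleanly.
    System.out.println(LocalDateTime.parse("20220825143015", SECOND_GRANULARITY));
    try {
      // A longer (e.g. millisecond-suffixed) instant leaves unparsed text behind
      // and fails with the DateTimeParseException named in the PR title.
      LocalDateTime.parse("20220825143015123", SECOND_GRANULARITY);
    } catch (DateTimeParseException e) {
      System.out.println("not parsable: " + e.getMessage());
    }
  }
}
```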
[GitHub] [hudi] hudi-bot commented on pull request #6493: [HUDI-4715] Needed To ReSync Hive In StreamWriteOperatorCoordinator's…
hudi-bot commented on PR #6493: URL: https://github.com/apache/hudi/pull/6493#issuecomment-1226819031

## CI report:

* 1168aaac3b5f8339c7f366ea486d5bf7f8ca0259 UNKNOWN
[GitHub] [hudi] xushiyan opened a new pull request, #6494: Hudi 4696 fix flaky
xushiyan opened a new pull request, #6494: URL: https://github.com/apache/hudi/pull/6494

### Change Logs

Fix flakiness:

```text
[ERROR] org.apache.hudi.hadoop.functional.TestHoodieCombineHiveInputFormat  Time elapsed: 0.675 s <<< ERROR!
java.lang.NullPointerException
	at org.apache.hudi.common.testutils.minicluster.HdfsTestService.configureDFSCluster(HdfsTestService.java:135)
	at org.apache.hudi.common.testutils.minicluster.HdfsTestService.start(HdfsTestService.java:87)
	at org.apache.hudi.common.testutils.minicluster.MiniClusterUtil.setUp(MiniClusterUtil.java:41)
```

### Impact

**Risk level: none**

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
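The stack trace points at a null dereference during shared mini-cluster configuration. As a general illustration of this class of fix — an assumption, not the actual patch in #6494 — the usual remedy is to resolve required settings up front with an explicit default instead of dereferencing a possibly-null lookup mid-setup:

```java
// Minimal sketch (hypothetical names; NOT the actual patch in #6494) of the usual
// remedy for NPE flakiness in test-service setup: never hand a possibly-null
// lookup result to later configuration code.
import java.util.Map;

public class DfsTestServiceSketch {
  private final Map<String, String> env;

  public DfsTestServiceSketch(Map<String, String> env) {
    this.env = env;
  }

  public String bindAddress() {
    // getOrDefault never returns null here, so later string operations are safe
    // even when the environment omits the key (the situation an NPE hints at).
    return env.getOrDefault("HDFS_BIND_ADDRESS", "127.0.0.1");
  }

  public static void main(String[] args) {
    System.out.println(new DfsTestServiceSketch(Map.of()).bindAddress());
  }
}
```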
[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6484: URL: https://github.com/apache/hudi/pull/6484#issuecomment-1226815507

## CI report:

* cab8551200693c9633c54b3e2acc9ab68ddc44cf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10930)
* 5fbf569007c321799dcc80b827c0a2417e1dbf49 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10936)
[jira] [Updated] (HUDI-4715) Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
[ https://issues.apache.org/jira/browse/HUDI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4715:
    Labels: pull-request-available  (was: )

> Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
>
> Key: HUDI-4715
> URL: https://issues.apache.org/jira/browse/HUDI-4715
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: yuemeng
> Priority: Major
> Labels: pull-request-available
>
> Currently, the coordinator recommits the last inflight instant if the write metadata checkpoint succeeded but the instant was not committed due to some rare cases. When we recommit the instant, we also need to re-sync Hive, if Hive sync is enabled.
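The change the issue describes is easiest to see as code. Below is a self-contained sketch with hypothetical interfaces standing in for the real Hudi classes (the actual change would live in StreamWriteOperatorCoordinator#initInstant): after recommitting a dangling inflight instant, also trigger Hive sync when it is enabled.

```java
// Sketch of the proposed behavior, with hypothetical stand-in interfaces — the
// real code path is StreamWriteOperatorCoordinator#initInstant in hudi-flink.
public class RecommitSketch {
  interface WriteClient { void commit(String instant); }
  interface HiveSyncer { void sync(); }

  private final WriteClient client;
  private final HiveSyncer hiveSyncer;
  private final boolean hiveSyncEnabled;

  RecommitSketch(WriteClient client, HiveSyncer hiveSyncer, boolean hiveSyncEnabled) {
    this.client = client;
    this.hiveSyncer = hiveSyncer;
    this.hiveSyncEnabled = hiveSyncEnabled;
  }

  void recommitIfNeeded(String inflightInstant) {
    client.commit(inflightInstant);  // existing behavior: recommit the dangling instant
    if (hiveSyncEnabled) {
      hiveSyncer.sync();             // proposed addition: re-sync Hive as well
    }
  }

  public static void main(String[] args) {
    new RecommitSketch(
        instant -> System.out.println("recommitted " + instant),
        () -> System.out.println("hive re-synced"),
        true).recommitIfNeeded("20220825143015");
  }
}
```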
[jira] [Assigned] (HUDI-4715) Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
[ https://issues.apache.org/jira/browse/HUDI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuemeng reassigned HUDI-4715:
    Assignee: yuemeng

> Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
>
> Key: HUDI-4715
> URL: https://issues.apache.org/jira/browse/HUDI-4715
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: yuemeng
> Assignee: yuemeng
> Priority: Major
> Labels: pull-request-available
>
> Currently, the coordinator recommits the last inflight instant if the write metadata checkpoint succeeded but the instant was not committed due to some rare cases. When we recommit the instant, we also need to re-sync Hive, if Hive sync is enabled.
[GitHub] [hudi] JerryYue-M opened a new pull request, #6493: [HUDI-4715] Needed To ReSync Hive In StreamWriteOperatorCoordinator's…
JerryYue-M opened a new pull request, #6493: URL: https://github.com/apache/hudi/pull/6493

… initInstant method

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

**Risk level: none | low | medium | high**

_Choose one. If medium or high, explain what verification was done to mitigate the risks._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6484: URL: https://github.com/apache/hudi/pull/6484#issuecomment-1226812308

## CI report:

* cab8551200693c9633c54b3e2acc9ab68ddc44cf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10930)
* 5fbf569007c321799dcc80b827c0a2417e1dbf49 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6490: [HUDI-4713] Fix flaky ITTestHoodieDataSource#testAppendWrite
hudi-bot commented on PR #6490: URL: https://github.com/apache/hudi/pull/6490#issuecomment-1226806258

## CI report:

* 9d140233d69c7b5d3c7e68d31c3824e7082e40cb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10931)
[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6484: URL: https://github.com/apache/hudi/pull/6484#issuecomment-1226806206

## CI report:

* cab8551200693c9633c54b3e2acc9ab68ddc44cf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10930)
[jira] [Closed] (HUDI-4665) Flip default for "ignore.failed.batch" for streaming sink
[ https://issues.apache.org/jira/browse/HUDI-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu closed HUDI-4665.
    Fix Version/s: 0.12.1
       Resolution: Fixed

> Flip default for "ignore.failed.batch" for streaming sink
>
> Key: HUDI-4665
> URL: https://issues.apache.org/jira/browse/HUDI-4665
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.1
[hudi] branch master updated (5f92221655 -> c188852f49)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 5f92221655 [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies (#6170)
     add c188852f49 [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false (#6450)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/configuration/FlinkOptions.java | 3 ++-
 .../src/main/scala/org/apache/hudi/DataSourceOptions.scala | 7 ---
 .../src/main/scala/org/apache/hudi/HoodieStreamingSink.scala | 2 +-
 .../hudi-spark/src/test/java/HoodieJavaStreamingApp.java | 1 +
 4 files changed, 8 insertions(+), 5 deletions(-)
[GitHub] [hudi] xushiyan merged pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
xushiyan merged PR #6450: URL: https://github.com/apache/hudi/pull/6450
[jira] [Updated] (HUDI-4716) Avoid bundle parquet in hadoop-mr
[ https://issues.apache.org/jira/browse/HUDI-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4716:
    Fix Version/s: 0.13.0
           (was: 0.12.1)

> Avoid bundle parquet in hadoop-mr
>
> Key: HUDI-4716
> URL: https://issues.apache.org/jira/browse/HUDI-4716
> Project: Apache Hudi
> Issue Type: Improvement
> Components: dependencies
> Reporter: Raymond Xu
> Priority: Blocker
> Fix For: 0.13.0
>
> As per discussion in https://github.com/apache/hudi/pull/5250#discussion_r930144788
> This will reduce the bundle size and uphold the principle of not bundling file storage format.
[jira] [Created] (HUDI-4716) Avoid bundle parquet in hadoop-mr
Raymond Xu created HUDI-4716:

Summary: Avoid bundle parquet in hadoop-mr
Key: HUDI-4716
URL: https://issues.apache.org/jira/browse/HUDI-4716
Project: Apache Hudi
Issue Type: Improvement
Components: dependencies
Reporter: Raymond Xu
Fix For: 0.12.1

As per discussion in https://github.com/apache/hudi/pull/5250#discussion_r930144788

This will reduce the bundle size and uphold the principle of not bundling file storage format.
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226773397

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
* e5b86079b7405e9bb1604c9960d85349855db23a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10929)
* 5613f14b3d5f1c8aaf8de1730e2f21b78a657150 UNKNOWN
* b26294c07ac06186c66a10444e7677656be94037 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10934)
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226770952

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
* e5b86079b7405e9bb1604c9960d85349855db23a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10929)
* 5613f14b3d5f1c8aaf8de1730e2f21b78a657150 UNKNOWN
* b26294c07ac06186c66a10444e7677656be94037 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6135: [HUDI-4418] Add support for ProtoKafkaSource
hudi-bot commented on PR #6135: URL: https://github.com/apache/hudi/pull/6135#issuecomment-1226770645

## CI report:

* d36fed637603d9959e8d049ac0815b9c729eb246 UNKNOWN
* 194396032e698ac4210d8652a969c7e58832d5db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10925)
* 9a4df85fd5b1787af587817cba959c536d904fde Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10933)
[GitHub] [hudi] zjuwangg opened a new pull request, #6492: Update compaction.md
zjuwangg opened a new pull request, #6492: URL: https://github.com/apache/hudi/pull/6492

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

**Risk level: none | low | medium | high**

_Choose one. If medium or high, explain what verification was done to mitigate the risks._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226768249

## CI report:

* 7deae45265d4ade6c324d4477e6144289f1714fd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10927)
[jira] [Created] (HUDI-4715) Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
yuemeng created HUDI-4715:

Summary: Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
Key: HUDI-4715
URL: https://issues.apache.org/jira/browse/HUDI-4715
Project: Apache Hudi
Issue Type: Bug
Reporter: yuemeng

Currently, the coordinator recommits the last inflight instant if the write metadata checkpoint succeeded but the instant was not committed due to some rare cases. When we recommit the instant, we also need to re-sync Hive, if Hive sync is enabled.
[GitHub] [hudi] hudi-bot commented on pull request #6491: [HUDI-4714] HoodieFlinkWriteClient can't load callback config to Hood…
hudi-bot commented on PR #6491: URL: https://github.com/apache/hudi/pull/6491#issuecomment-1226737763

## CI report:

* 25604fb76beaec486a84061f29f63260b3412521 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10932)
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226737727

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
* e5b86079b7405e9bb1604c9960d85349855db23a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10929)
* 5613f14b3d5f1c8aaf8de1730e2f21b78a657150 UNKNOWN
[GitHub] [hudi] WangCHX commented on issue #6487: [SUPPORT] Primary key check in deltastreamer
WangCHX commented on issue #6487: URL: https://github.com/apache/hudi/issues/6487#issuecomment-1226735410

thanks. @yihua
[GitHub] [hudi] hudi-bot commented on pull request #6491: [HUDI-4714] HoodieFlinkWriteClient can't load callback config to Hood…
hudi-bot commented on PR #6491: URL: https://github.com/apache/hudi/pull/6491#issuecomment-1226735211

## CI report:

* 25604fb76beaec486a84061f29f63260b3412521 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6490: [HUDI-4713] Fix flaky ITTestHoodieDataSource#testAppendWrite
hudi-bot commented on PR #6490: URL: https://github.com/apache/hudi/pull/6490#issuecomment-1226735186

## CI report:

* 9d140233d69c7b5d3c7e68d31c3824e7082e40cb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10931)
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226735169

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
* e5b86079b7405e9bb1604c9960d85349855db23a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10929)
[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6484: URL: https://github.com/apache/hudi/pull/6484#issuecomment-1226735140

## CI report:

* 5fa2c9cab3ed92e80292666043cbbd71cb24dc23 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10915)
* cab8551200693c9633c54b3e2acc9ab68ddc44cf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10930)
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226735073

## CI report:

* 7deae45265d4ade6c324d4477e6144289f1714fd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10927)
* Unknown: [CANCELED](TBD)
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226732509

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
* e5b86079b7405e9bb1604c9960d85349855db23a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6484: URL: https://github.com/apache/hudi/pull/6484#issuecomment-1226732486

## CI report:

* 5fa2c9cab3ed92e80292666043cbbd71cb24dc23 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10915)
* cab8551200693c9633c54b3e2acc9ab68ddc44cf UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6490: [HUDI-4713] Fix flaky ITTestHoodieDataSource#testAppendWrite
hudi-bot commented on PR #6490: URL: https://github.com/apache/hudi/pull/6490#issuecomment-1226732541

## CI report:

* 9d140233d69c7b5d3c7e68d31c3824e7082e40cb UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226732418

## CI report:

* 7deae45265d4ade6c324d4477e6144289f1714fd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10927)
* Unknown: [CANCELED](TBD)
[GitHub] [hudi] hudi-bot commented on pull request #6135: [HUDI-4418] Add support for ProtoKafkaSource
hudi-bot commented on PR #6135: URL: https://github.com/apache/hudi/pull/6135#issuecomment-1226732135

## CI report:

* d36fed637603d9959e8d049ac0815b9c729eb246 UNKNOWN
* 194396032e698ac4210d8652a969c7e58832d5db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10925)
* 9a4df85fd5b1787af587817cba959c536d904fde UNKNOWN
[jira] [Updated] (HUDI-4714) HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
[ https://issues.apache.org/jira/browse/HUDI-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4714:
    Labels: pull-request-available  (was: )

> HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
>
> Key: HUDI-4714
> URL: https://issues.apache.org/jira/browse/HUDI-4714
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: yuemeng
> Assignee: yuemeng
> Priority: Major
> Labels: pull-request-available
>
> Currently, StreamerUtil's getHoodieClientConfig method does not load the callback config into the write config, so in the Hudi Flink write client the callback never works:
>
> {code}
> HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder()
>     .withEngineType(EngineType.FLINK)
>     .withPath(conf.getString(FlinkOptions.PATH))
>     .combineInput(conf.getBoolean(FlinkOptions.PRE_COMBINE), true)
>     .withMergeAllowDuplicateOnInserts(OptionsResolver.insertClustering(conf))
>     .withClusteringConfig(
>         HoodieClusteringConfig.newBuilder()
>             .withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED))
>             .withClusteringPlanStrategyClass(conf.getString(FlinkOptions.CLUSTERING_PLAN_STRATEGY_CLASS))
>             .withClusteringPlanPartitionFilterMode(
>                 ClusteringPlanPartitionFilterMode.valueOf(conf.getString(FlinkOptions.CLUSTERING_PLAN_PARTITION_FILTER_MODE_NAME)))
>             .withClusteringTargetPartitions(conf.getInteger(FlinkOptions.CLUSTERING_TARGET_PARTITIONS))
>             .withClusteringMaxNumGroups(conf.getInteger(FlinkOptions.CLUSTERING_MAX_NUM_GROUPS))
>             .withClusteringTargetFileMaxBytes(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_TARGET_FILE_MAX_BYTES))
>             .withClusteringPlanSmallFileLimit(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT) * 1024 * 1024L)
>             .withClusteringSkipPartitionsFromLatest(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SKIP_PARTITIONS_FROM_LATEST))
>             .withAsyncClusteringMaxCommits(conf.getInteger(FlinkOptions.CLUSTERING_DELTA_COMMITS))
>             .build())
>     .withCleanConfig(HoodieCleanConfig.newBuilder()
>         .withAsyncClean(conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED))
>         .retainCommits(conf.getInteger(FlinkOptions.CLEAN_RETAIN_COMMITS))
>         .cleanerNumHoursRetained(conf.getInteger(FlinkOptions.CLEAN_RETAIN_HOURS))
>         .retainFileVersions(conf.getInteger(FlinkOptions.CLEAN_RETAIN_FILE_VERSIONS))
>         // override and hardcode to 20,
>         // actually Flink cleaning is always with parallelism 1 now
>         .withCleanerParallelism(20)
>         .withCleanerPolicy(HoodieCleaningPolicy.valueOf(conf.getString(FlinkOptions.CLEAN_POLICY)))
>         .build())
>     .withArchivalConfig(HoodieArchivalConfig.newBuilder()
>         .archiveCommitsWith(conf.getInteger(FlinkOptions.ARCHIVE_MIN_COMMITS), conf.getInteger(FlinkOptions.ARCHIVE_MAX_COMMITS))
>         .build())
>     .withCompactionConfig(HoodieCompactionConfig.newBuilder()
>         .withTargetIOPerCompactionInMB(conf.getLong(FlinkOptions.COMPACTION_TARGET_IO))
>         .withInlineCompactionTriggerStrategy(
>             CompactionTriggerStrategy.valueOf(conf.getString(FlinkOptions.COMPACTION_TRIGGER_STRATEGY).toUpperCase(Locale.ROOT)))
>         .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_COMMITS))
>         .withMaxDeltaSecondsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_SECONDS))
>         .build())
>     .withMemoryConfig(
>         HoodieMemoryConfig.newBuilder()
>             .withMaxMemoryMaxSize(
>                 conf.getInteger(FlinkOptions.WRITE_MERGE_MAX_MEMORY) * 1024 * 1024L,
>                 conf.getInteger(FlinkOptions.COMPACTION_MAX_MEMORY) * 1024 * 1024L
>             ).build())
>     .forTable(conf.getString(FlinkOptions.TABLE_NAME))
>     .withStorageConfig(HoodieStorageConfig.newBuilder()
>         .logFileDataBlockMaxSize(conf.getInteger(FlinkOptions.WRITE_LOG_BLOCK_SIZE) * 1024 * 1024)
>         .logFileMaxSize(conf.getLong(FlinkOptions.WRITE_LOG_MAX_SIZE) * 1024 * 1024)
>         .parquetBlockSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_BLOCK_SIZE) * 1024 * 1024)
>         .parquetPageSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_PAGE_SIZE) * 1024 * 1024)
>         .parquetMaxFileSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_MAX_FILE_SIZE) * 1024 * 1024L)
>         .build())
>     .withMetadataConfig(HoodieMetadataConfig.newBuilder()
>         .enable(conf.getBoolean(FlinkOptions.METADATA_ENABLED))
>         .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.METADATA_COMPACTION_DELTA_COMMITS))
>         .build())
>     .withLockConfig(HoodieLockConfig.newBuilder()
>         .withLockProvider(FileSystemBasedLockProvider.class)
>         .withLockWaitTimeInMillis(2000L) // 2s
>         .withFileSystemLockExpire(1) // 1 minute
>         .withClientNumRetries(30)
>         .withFileSystemLockPath(StreamerUtil.getAuxiliaryPath(conf))
>         .build())
>     .withPayloadConfig(getPayloadConfig(conf))
>     .withEmbeddedTimelineServerEnabled(enableEmbeddedTimelineService)
>     .withEmbeddedTimelineServerReuseEnabled(true) // make write client embedded timeline service singleton
>     .withAutoCommit(false)
>     .withAllowOperationMetadataField(conf.getBoolean(FlinkOptions.CHANGELOG_ENABLED))
>     .withProps(flinkConf2TypedProperties(conf))
>     .withSchema(getSourceSchema(conf).toString());
> {code}
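A hedged sketch of the kind of fix the issue implies: pass the commit-callback settings through to HoodieWriteConfig so the Flink write client sees them. The config keys below match Hudi's documented commit-callback options and HoodieWriteCommitHttpCallback is a real built-in implementation, but treat the exact wiring (and whether `withProps` alone suffices here) as an assumption, not the merged patch.

```java
// Sketch, not the merged patch: carry the callback properties into
// HoodieWriteConfig. Requires hudi-client-common on the classpath; verify the
// key names against your Hudi version.
import java.util.Properties;
import org.apache.hudi.config.HoodieWriteConfig;

public class CallbackConfigSketch {
  public static HoodieWriteConfig buildWithCallback(String basePath) {
    Properties props = new Properties();
    // Documented commit-callback options (assumed unchanged in your version):
    props.setProperty("hoodie.write.commit.callback.on", "true");
    props.setProperty("hoodie.write.commit.callback.class",
        "org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback");
    return HoodieWriteConfig.newBuilder()
        .withPath(basePath)
        .forTable("demo_table")       // hypothetical table name for illustration
        .withProps(props)             // carries the callback settings into the config
        .build();
  }
}
```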
[GitHub] [hudi] JerryYue-M opened a new pull request, #6491: [HUDI-4714] HoodieFlinkWriteClient can't load callback config to Hood…
JerryYue-M opened a new pull request, #6491: URL: https://github.com/apache/hudi/pull/6491

…ieWriteConfig

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

**Risk level: none | low | medium | high**

_Choose one. If medium or high, explain what verification was done to mitigate the risks._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226729451

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226729350

## CI report:

* 7deae45265d4ade6c324d4477e6144289f1714fd UNKNOWN
[GitHub] [hudi] xushiyan commented on a diff in pull request #6409: [HUDI-4629] Create hive table from existing hoodie table failed when the table schema is not defined
xushiyan commented on code in PR #6409: URL: https://github.com/apache/hudi/pull/6409#discussion_r954466771

hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala:

@@ -129,7 +129,17 @@ class HoodieCatalogTable(val spark: SparkSession, var table: CatalogTable) exten
   /**
    * Table schema
    */
-  lazy val tableSchema: StructType = table.schema
+  lazy val tableSchema: StructType = if (table.schema.nonEmpty) {
+    table.schema
+  } else {
+    val schemaFromMetaOpt = loadTableSchemaByMetaClient()

Review Comment:
@jinxing64 can you also have a look pls? given this is based on your previous change
[GitHub] [hudi] xushiyan commented on a diff in pull request #6409: [HUDI-4629] Create hive table from existing hoodie table failed when the table schema is not defined
xushiyan commented on code in PR #6409: URL: https://github.com/apache/hudi/pull/6409#discussion_r954466051

hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala:

@@ -129,7 +129,17 @@ class HoodieCatalogTable(val spark: SparkSession, var table: CatalogTable) exten
   /**
    * Table schema
    */
-  lazy val tableSchema: StructType = table.schema
+  lazy val tableSchema: StructType = if (table.schema.nonEmpty) {
+    table.schema
+  } else {
+    val schemaFromMetaOpt = loadTableSchemaByMetaClient()

Review Comment:
this is already handled in `parseSchemaAndConfigs()`, isn't it?
[jira] [Created] (HUDI-4714) HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
yuemeng created HUDI-4714:

Summary: HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
Key: HUDI-4714
URL: https://issues.apache.org/jira/browse/HUDI-4714
Project: Apache Hudi
Issue Type: Bug
Reporter: yuemeng

Currently, StreamerUtil's getHoodieClientConfig method does not load the callback config into the write config, so in the Hudi Flink write client the callback never works:

{code} HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder() .withEngineType(EngineType.FLINK) .withPath(conf.getString(FlinkOptions.PATH)) .combineInput(conf.getBoolean(FlinkOptions.PRE_COMBINE), true) .withMergeAllowDuplicateOnInserts(OptionsResolver.insertClustering(conf)) .withClusteringConfig( HoodieClusteringConfig.newBuilder() .withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED)) .withClusteringPlanStrategyClass(conf.getString(FlinkOptions.CLUSTERING_PLAN_STRATEGY_CLASS)) .withClusteringPlanPartitionFilterMode( ClusteringPlanPartitionFilterMode.valueOf(conf.getString(FlinkOptions.CLUSTERING_PLAN_PARTITION_FILTER_MODE_NAME))) .withClusteringTargetPartitions(conf.getInteger(FlinkOptions.CLUSTERING_TARGET_PARTITIONS)) .withClusteringMaxNumGroups(conf.getInteger(FlinkOptions.CLUSTERING_MAX_NUM_GROUPS)) .withClusteringTargetFileMaxBytes(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_TARGET_FILE_MAX_BYTES)) .withClusteringPlanSmallFileLimit(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT) * 1024 * 1024L) .withClusteringSkipPartitionsFromLatest(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SKIP_PARTITIONS_FROM_LATEST)) .withAsyncClusteringMaxCommits(conf.getInteger(FlinkOptions.CLUSTERING_DELTA_COMMITS)) .build()) .withCleanConfig(HoodieCleanConfig.newBuilder() .withAsyncClean(conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) .retainCommits(conf.getInteger(FlinkOptions.CLEAN_RETAIN_COMMITS)) .cleanerNumHoursRetained(conf.getInteger(FlinkOptions.CLEAN_RETAIN_HOURS)) .retainFileVersions(conf.getInteger(FlinkOptions.CLEAN_RETAIN_FILE_VERSIONS)) // override and hardcode to 20, // actually Flink cleaning is always with parallelism 1 now .withCleanerParallelism(20) .withCleanerPolicy(HoodieCleaningPolicy.valueOf(conf.getString(FlinkOptions.CLEAN_POLICY))) .build()) .withArchivalConfig(HoodieArchivalConfig.newBuilder() .archiveCommitsWith(conf.getInteger(FlinkOptions.ARCHIVE_MIN_COMMITS), conf.getInteger(FlinkOptions.ARCHIVE_MAX_COMMITS)) .build()) .withCompactionConfig(HoodieCompactionConfig.newBuilder() .withTargetIOPerCompactionInMB(conf.getLong(FlinkOptions.COMPACTION_TARGET_IO)) .withInlineCompactionTriggerStrategy( CompactionTriggerStrategy.valueOf(conf.getString(FlinkOptions.COMPACTION_TRIGGER_STRATEGY).toUpperCase(Locale.ROOT))) .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_COMMITS)) .withMaxDeltaSecondsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_SECONDS)) .build()) .withMemoryConfig( HoodieMemoryConfig.newBuilder() .withMaxMemoryMaxSize( conf.getInteger(FlinkOptions.WRITE_MERGE_MAX_MEMORY) * 1024 * 1024L, conf.getInteger(FlinkOptions.COMPACTION_MAX_MEMORY) * 1024 * 1024L ).build()) .forTable(conf.getString(FlinkOptions.TABLE_NAME)) .withStorageConfig(HoodieStorageConfig.newBuilder() .logFileDataBlockMaxSize(conf.getInteger(FlinkOptions.WRITE_LOG_BLOCK_SIZE) * 1024 * 1024) .logFileMaxSize(conf.getLong(FlinkOptions.WRITE_LOG_MAX_SIZE) * 1024 * 1024) .parquetBlockSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_BLOCK_SIZE) * 1024 * 1024) .parquetPageSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_PAGE_SIZE) * 1024 * 1024) .parquetMaxFileSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_MAX_FILE_SIZE) * 1024 * 1024L) .build()) .withMetadataConfig(HoodieMetadataConfig.newBuilder() .enable(conf.getBoolean(FlinkOptions.METADATA_ENABLED)) .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.METADATA_COMPACTION_DELTA_COMMITS)) .build()) .withLockConfig(HoodieLockConfig.newBuilder() .withLockProvider(FileSystemBasedLockProvider.class) .withLockWaitTimeInMillis(2000L) // 2s .withFileSystemLockExpire(1) // 1 minute .withClientNumRetries(30) .withFileSystemLockPath(StreamerUtil.getAuxiliaryPath(conf)) .build()) .withPayloadConfig(getPayloadConfig(conf)) .withEmbeddedTimelineServerEnabled(enableEmbeddedTimelineService) .withEmbeddedTimelineServerReuseEnabled(true) // make write client embedded timeline service singleton .withAutoCommit(false) .withAllowOperationMetadataField(conf.getBoolean(FlinkOptions.CHANGELOG_ENABLED)) .withProps(flinkConf2TypedProperties(conf)) .withSchema(getSourceSchema(conf).toString()); {code}
[jira] [Assigned] (HUDI-4714) HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
[ https://issues.apache.org/jira/browse/HUDI-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuemeng reassigned HUDI-4714:
    Assignee: yuemeng

> HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
>
> Key: HUDI-4714
> URL: https://issues.apache.org/jira/browse/HUDI-4714
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: yuemeng
> Assignee: yuemeng
> Priority: Major
>
> Currently, StreamerUtil's getHoodieClientConfig method does not load the callback config into the write config, so in the Hudi Flink write client the callback never works:
>
> {code} HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder() .withEngineType(EngineType.FLINK) .withPath(conf.getString(FlinkOptions.PATH)) .combineInput(conf.getBoolean(FlinkOptions.PRE_COMBINE), true) .withMergeAllowDuplicateOnInserts(OptionsResolver.insertClustering(conf)) .withClusteringConfig( HoodieClusteringConfig.newBuilder() .withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED)) .withClusteringPlanStrategyClass(conf.getString(FlinkOptions.CLUSTERING_PLAN_STRATEGY_CLASS)) .withClusteringPlanPartitionFilterMode( ClusteringPlanPartitionFilterMode.valueOf(conf.getString(FlinkOptions.CLUSTERING_PLAN_PARTITION_FILTER_MODE_NAME))) .withClusteringTargetPartitions(conf.getInteger(FlinkOptions.CLUSTERING_TARGET_PARTITIONS)) .withClusteringMaxNumGroups(conf.getInteger(FlinkOptions.CLUSTERING_MAX_NUM_GROUPS)) .withClusteringTargetFileMaxBytes(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_TARGET_FILE_MAX_BYTES)) .withClusteringPlanSmallFileLimit(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT) * 1024 * 1024L) .withClusteringSkipPartitionsFromLatest(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SKIP_PARTITIONS_FROM_LATEST)) .withAsyncClusteringMaxCommits(conf.getInteger(FlinkOptions.CLUSTERING_DELTA_COMMITS)) .build()) .withCleanConfig(HoodieCleanConfig.newBuilder() .withAsyncClean(conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) .retainCommits(conf.getInteger(FlinkOptions.CLEAN_RETAIN_COMMITS)) .cleanerNumHoursRetained(conf.getInteger(FlinkOptions.CLEAN_RETAIN_HOURS)) .retainFileVersions(conf.getInteger(FlinkOptions.CLEAN_RETAIN_FILE_VERSIONS)) // override and hardcode to 20, // actually Flink cleaning is always with parallelism 1 now .withCleanerParallelism(20) .withCleanerPolicy(HoodieCleaningPolicy.valueOf(conf.getString(FlinkOptions.CLEAN_POLICY))) .build()) .withArchivalConfig(HoodieArchivalConfig.newBuilder() .archiveCommitsWith(conf.getInteger(FlinkOptions.ARCHIVE_MIN_COMMITS), conf.getInteger(FlinkOptions.ARCHIVE_MAX_COMMITS)) .build()) .withCompactionConfig(HoodieCompactionConfig.newBuilder() .withTargetIOPerCompactionInMB(conf.getLong(FlinkOptions.COMPACTION_TARGET_IO)) .withInlineCompactionTriggerStrategy( CompactionTriggerStrategy.valueOf(conf.getString(FlinkOptions.COMPACTION_TRIGGER_STRATEGY).toUpperCase(Locale.ROOT))) .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_COMMITS)) .withMaxDeltaSecondsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_SECONDS)) .build()) .withMemoryConfig( HoodieMemoryConfig.newBuilder() .withMaxMemoryMaxSize( conf.getInteger(FlinkOptions.WRITE_MERGE_MAX_MEMORY) * 1024 * 1024L, conf.getInteger(FlinkOptions.COMPACTION_MAX_MEMORY) * 1024 * 1024L ).build()) .forTable(conf.getString(FlinkOptions.TABLE_NAME)) .withStorageConfig(HoodieStorageConfig.newBuilder() .logFileDataBlockMaxSize(conf.getInteger(FlinkOptions.WRITE_LOG_BLOCK_SIZE) * 1024 * 1024) .logFileMaxSize(conf.getLong(FlinkOptions.WRITE_LOG_MAX_SIZE) * 1024 * 1024) .parquetBlockSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_BLOCK_SIZE) * 1024 * 1024) .parquetPageSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_PAGE_SIZE) * 1024 * 1024) .parquetMaxFileSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_MAX_FILE_SIZE) * 1024 * 1024L) .build()) .withMetadataConfig(HoodieMetadataConfig.newBuilder() .enable(conf.getBoolean(FlinkOptions.METADATA_ENABLED)) .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.METADATA_COMPACTION_DELTA_COMMITS)) .build()) .withLockConfig(HoodieLockConfig.newBuilder() .withLockProvider(FileSystemBasedLockProvider.class) .withLockWaitTimeInMillis(2000L) // 2s .withFileSystemLockExpire(1) // 1 minute .withClientNumRetries(30) .withFileSystemLockPath(StreamerUtil.getAuxiliaryPath(conf)) .build()) .withPayloadConfig(getPayloadConfig(conf)) .withEmbeddedTimelineServerEnabled(enableEmbeddedTimelineService) .withEmbeddedTimelineServerReuseEnabled(true) // make write client embedded timeline service singleton .withAutoCommit(false) .withAllowOperationMetadataField(conf.getBoolean(FlinkOptions.CHANGELOG_ENABLED)) .withProps(flinkConf2TypedProperties(conf)) .withSchema(getSourceSchema(conf).toString()); {code}
[jira] [Updated] (HUDI-4713) Fix flaky ITTestHoodieDataSource#testAppendWrite
[ https://issues.apache.org/jira/browse/HUDI-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4713:
    Labels: pull-request-available  (was: )

> Fix flaky ITTestHoodieDataSource#testAppendWrite
>
> Key: HUDI-4713
> URL: https://issues.apache.org/jira/browse/HUDI-4713
> Project: Apache Hudi
> Issue Type: Test
> Components: flink
> Reporter: Danny Chen
> Priority: Major
> Labels: pull-request-available
[GitHub] [hudi] danny0405 opened a new pull request, #6490: [HUDI-4713] Fix flaky ITTestHoodieDataSource#testAppendWrite
danny0405 opened a new pull request, #6490: URL: https://github.com/apache/hudi/pull/6490

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

**Risk level: none | low | medium | high**

_Choose one. If medium or high, explain what verification was done to mitigate the risks._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Created] (HUDI-4713) Fix flaky ITTestHoodieDataSource#testAppendWrite
Danny Chen created HUDI-4713:

Summary: Fix flaky ITTestHoodieDataSource#testAppendWrite
Key: HUDI-4713
URL: https://issues.apache.org/jira/browse/HUDI-4713
Project: Apache Hudi
Issue Type: Test
Components: flink
Reporter: Danny Chen
[GitHub] [hudi] wzx140 commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
wzx140 commented on PR #6486: URL: https://github.com/apache/hudi/pull/6486#issuecomment-1226708070

@xiarixiaoyao Could you please review it? Really thanks!
[GitHub] [hudi] xushiyan commented on a diff in pull request #6240: [HUDI-4482][HUDI-4483] fix checkstyle error and remove guava
xushiyan commented on code in PR #6240: URL: https://github.com/apache/hudi/pull/6240#discussion_r954451365

hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowHoodieLogFileMetadataProcedure.scala:

@@ -30,6 +29,7 @@
 import org.apache.parquet.avro.AvroSchemaConverter
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{DataTypes, Metadata, StructField, StructType}

+import java.util

Review Comment:
to be consistent with other code like in hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowFileSystemViewProcedure.scala we'd prefer to use `java.util.HashMap` (explicit) over `util.HashMap`.
[jira] [Assigned] (HUDI-4392) Flink MOR table inline compaction plan execution sequence should be configurable
[ https://issues.apache.org/jira/browse/HUDI-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuemeng reassigned HUDI-4392:
    Assignee: yuemeng

> Flink MOR table inline compaction plan execution sequence should be configurable
>
> Key: HUDI-4392
> URL: https://issues.apache.org/jira/browse/HUDI-4392
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink
> Reporter: yuemeng
> Assignee: yuemeng
> Priority: Major
> Labels: pull-request-available
>
> In some cases too many pending compactions pile up. Flink inline compaction always executes the earliest plan first, one by one, so the job can fail under the weight of too many compact operations, and we can do nothing about it.
> The execution sequence of Flink MOR table inline compaction plans should therefore be configurable, so that a backlog of pending compactions does not fail the job.
> When there are a large number of compaction plans to execute, the inline compact operation could handle only the latest plan to ensure stability, leaving some external job (offline, or a compaction server) to handle the remaining plans.
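A self-contained sketch of the configurable ordering the issue asks for; all names here are hypothetical stand-ins, not the actual Flink option or Hudi API. Hudi instants sort lexicographically because they are fixed-width timestamps, so a plain string comparator suffices.

```java
// Hypothetical sketch of the proposed knob: choose which pending compaction plan
// the inline compactor executes next — earliest-first (current behavior) or
// latest-first (the issue's proposal). Names are illustrative only.
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class CompactionPlanSelector {
  enum Order { EARLIEST_FIRST, LATEST_FIRST }

  static Optional<String> nextPlanInstant(List<String> pendingInstants, Order order) {
    Comparator<String> byInstant = Comparator.naturalOrder(); // instants sort lexicographically
    if (order == Order.LATEST_FIRST) {
      byInstant = byInstant.reversed(); // min under the reversed order = latest instant
    }
    return pendingInstants.stream().min(byInstant);
  }

  public static void main(String[] args) {
    List<String> pending = List.of("20220825120000", "20220825130000", "20220825140000");
    System.out.println(nextPlanInstant(pending, Order.EARLIEST_FIRST)); // Optional[20220825120000]
    System.out.println(nextPlanInstant(pending, Order.LATEST_FIRST));   // Optional[20220825140000]
  }
}
```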
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4485:
    Labels: pull-request-available  (was: )

> Hudi cli got empty result for command show fsview all
>
> Key: HUDI-4485
> URL: https://issues.apache.org/jira/browse/HUDI-4485
> Project: Apache Hudi
> Issue Type: Bug
> Components: cli
> Affects Versions: 0.11.1
> Environment: Hudi version : 0.11.1
>   Spark version : 3.1.1
>   Hive version : 3.1.0
>   Hadoop version : 3.1.1
> Reporter: Yao Zhang
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.13.0
> Attachments: spring-shell-1.2.0.RELEASE.jar
>
> This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177]
>
> **Describe the problem you faced**
>
> Hudi cli got an empty result after running the command `show fsview all`.
> (screenshot: https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png)
>
> The type of table t1 is COW and I am sure that the parquet files are actually generated inside the data folder. Also, the parquet files are not damaged, as the data could be retrieved correctly by reading as a Hudi table or by directly reading each parquet file (using Spark).
>
> **To Reproduce**
>
> Steps to reproduce the behavior:
> 1. Enter Flink SQL client.
> 2. Execute the SQL and check the data was written successfully.
>
> ```sql
> CREATE TABLE t1(
>   uuid VARCHAR(20),
>   name VARCHAR(10),
>   age INT,
>   ts TIMESTAMP(3),
>   `partition` VARCHAR(20)
> )
> PARTITIONED BY (`partition`)
> WITH (
>   'connector' = 'hudi',
>   'path' = 'hdfs:///path/to/table/',
>   'table.type' = 'COPY_ON_WRITE'
> );
>
> -- insert data using values
> INSERT INTO t1 VALUES
> ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
> ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
> ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
> ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
> ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
> ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
> ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
> ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
> ```
>
> 3. Enter Hudi cli and execute `show fsview all`.
>
> **Expected behavior**
>
> `show fsview all` in Hudi cli should return all file slices.
>
> **Environment Description**
> * Hudi version : 0.11.1
> * Spark version : 3.1.1
> * Hive version : 3.1.0
> * Hadoop version : 3.1.1
> * Storage (HDFS/S3/GCS..) : HDFS
> * Running on Docker? (yes/no) : no
>
> **Additional context**: No.
>
> **Stacktrace**: N/A
>
> Temporary solution:
> I modified and recompiled spring-shell 1.2.0.RELEASE. Please download the attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/.
[GitHub] [hudi] paul8263 opened a new pull request, #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
paul8263 opened a new pull request, #6489: URL: https://github.com/apache/hudi/pull/6489

…value for show fsview all pathRegex parameter.

### Change Logs

In order to fix [HUDI-4485](https://issues.apache.org/jira/projects/HUDI/issues/HUDI-4485), we bumped spring shell to 2.1.1 and updated the default value for the `show fsview all` pathRegex parameter.

### Impact

Public API and user-facing features are not affected, but it may have a performance impact.

**Risk level: medium**

Updated the unit test; all hudi-cli tests pass. Its functionality has also been verified in a real-world environment.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226691384 ## CI report: * 7deae45265d4ade6c324d4477e6144289f1714fd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10927) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan merged pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
xushiyan merged PR #6170: URL: https://github.com/apache/hudi/pull/6170 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua commented on issue #6487: [SUPPORT] Primary key check in deltastreamer
yihua commented on issue #6487: URL: https://github.com/apache/hudi/issues/6487#issuecomment-1226658157 @WangCHX Let me check the code. There should be validation of the write config against the table config to make sure they are consistent. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
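For illustration, a minimal sketch of the kind of guard being discussed, comparing the record key in the write config against the table config and failing fast on a mismatch; the class and method names are hypothetical, though the two property keys are standard Hudi config names:

```java
// Hypothetical fail-fast check: reject writes whose record key config
// disagrees with the key the table was created with.
import java.util.Objects;
import java.util.Properties;

public class RecordKeyGuard {

  /** Throws if the writer's record key differs from hoodie.properties. */
  public static void validateRecordKey(Properties tableConfig, Properties writeConfig) {
    String tableKey = tableConfig.getProperty("hoodie.table.recordkey.fields");
    String writeKey = writeConfig.getProperty("hoodie.datasource.write.recordkey.field");
    if (tableKey != null && writeKey != null && !Objects.equals(tableKey, writeKey)) {
      throw new IllegalArgumentException(
          "Record key mismatch: table uses [" + tableKey + "], write config uses [" + writeKey + "]");
    }
  }
}
```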
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226656369 ## CI report: * 214c313d89e0d8abc5ea356d0fc10c475b138ad2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10923) * 7deae45265d4ade6c324d4477e6144289f1714fd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10927) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226621598 ## CI report: * 214c313d89e0d8abc5ea356d0fc10c475b138ad2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10923) * 7deae45265d4ade6c324d4477e6144289f1714fd UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (1e162bb73a -> e5584b3735)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 1e162bb73a HUDI-4687 add show_invalid_parquet procedure (#6480) add e5584b3735 [HUDI-4584] Fixing `SQLConf` not being propagated to executor (#6352) No new revisions were added by this update. Summary of changes: .../scala/org/apache/hudi/HoodieSparkUtils.scala | 15 +- .../spark/sql/execution/SQLConfInjectingRDD.scala | 61 ++ 2 files changed, 74 insertions(+), 2 deletions(-) create mode 100644 hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/SQLConfInjectingRDD.scala
[GitHub] [hudi] yihua merged pull request #6352: [HUDI-4584] Fixing `SQLConf` not being propagated to executor
yihua merged PR #6352: URL: https://github.com/apache/hudi/pull/6352 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"
yihua commented on issue #5765: URL: https://github.com/apache/hudi/issues/5765#issuecomment-1226572795 @nsivabalan We can document the workaround, yet it's still not ideal for users relying on releases. I'll check if we can fix it via dependency management within Hudi. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch asf-site updated: [DOCS] Fix youtube image (#6488)
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 37aeb19151 [DOCS] Fix youtube image (#6488) 37aeb19151 is described below commit 37aeb19151447e0a3905b092b2ba93ffe408447e Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com> AuthorDate: Thu Aug 25 03:32:19 2022 +0530 [DOCS] Fix youtube image (#6488) --- website/src/css/custom.css| 2 +- website/static/assets/images/youtube.jpeg | Bin 8825 -> 0 bytes website/static/assets/images/youtube.png | Bin 0 -> 970 bytes 3 files changed, 1 insertion(+), 1 deletion(-) diff --git a/website/src/css/custom.css b/website/src/css/custom.css index 21c02d3bbb..2d9bd50881 100644 --- a/website/src/css/custom.css +++ b/website/src/css/custom.css @@ -94,7 +94,7 @@ html[data-theme='dark'] .docusaurus-highlight-code-line { } .header-youtube-link:before { - background: url(/assets/images/youtube.jpeg) no-repeat; + background: url(/assets/images/youtube.png) no-repeat; content: ""; display: flex; height: 30px; diff --git a/website/static/assets/images/youtube.jpeg b/website/static/assets/images/youtube.jpeg deleted file mode 100644 index 0b5cf9732a..00 Binary files a/website/static/assets/images/youtube.jpeg and /dev/null differ diff --git a/website/static/assets/images/youtube.png b/website/static/assets/images/youtube.png new file mode 100644 index 00..0700bfb1b6 Binary files /dev/null and b/website/static/assets/images/youtube.png differ
[GitHub] [hudi] xushiyan merged pull request #6488: [DOCS] Fix youtube image in header
xushiyan merged PR #6488: URL: https://github.com/apache/hudi/pull/6488 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance
alexeykudinkin commented on code in PR #6046: URL: https://github.com/apache/hudi/pull/6046#discussion_r951930671 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -98,10 +110,18 @@ public HoodieWriteMetadata> performClustering(final Hood // execute clustering for each group async and collect WriteStatus Stream> writeStatusesStream = FutureUtils.allOf( clusteringPlan.getInputGroups().stream() -.map(inputGroup -> runClusteringForGroupAsync(inputGroup, -clusteringPlan.getStrategy().getStrategyParams(), - Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false), -instantTime)) +.map(inputGroup -> { + if (Boolean.parseBoolean(getWriteConfig().getString(HoodieClusteringConfig.CLUSTERING_AS_ROW))) { Review Comment: Let's abstract this as a method in `WriteConfig` ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RowSpatialCurveSortPartitioner.java: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.execution.bulkinsert; + +import org.apache.hudi.config.HoodieClusteringConfig; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.sort.SpaceCurveSortingHelper; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; + +import java.util.Arrays; +import java.util.List; + +public class RowSpatialCurveSortPartitioner extends RowCustomColumnsSortPartitioner { Review Comment: Why do we inherit from `RowCustomColumnsSortPartitioner` ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -273,6 +398,62 @@ private HoodieData> readRecordsForGroupBaseFiles(JavaSparkContex .map(record -> transform(record, writeConfig))); } + /** + * Get dataset of all records for the group. This includes all records from file slice (Apply updates from log files, if any). 
+ */ + private Dataset readRecordsForGroupAsRow(JavaSparkContext jsc, + HoodieClusteringGroup clusteringGroup, + String instantTime) { +List clusteringOps = clusteringGroup.getSlices().stream() +.map(ClusteringOperation::create).collect(Collectors.toList()); +boolean hasLogFiles = clusteringOps.stream().anyMatch(op -> op.getDeltaFilePaths().size() > 0); +SQLContext sqlContext = new SQLContext(jsc.sc()); + +String[] baseFilePaths = clusteringOps +.stream() +.map(op -> { + ArrayList pairs = new ArrayList<>(); + if (op.getBootstrapFilePath() != null) { +pairs.add(op.getBootstrapFilePath()); + } + if (op.getDataFilePath() != null) { +pairs.add(op.getDataFilePath()); + } + return pairs; +}) +.flatMap(Collection::stream) +.filter(path -> !path.isEmpty()) +.toArray(String[]::new); +String[] deltaPaths = clusteringOps +.stream() +.filter(op -> !op.getDeltaFilePaths().isEmpty()) +.flatMap(op -> op.getDeltaFilePaths().stream()) +.toArray(String[]::new); + +Dataset inputRecords; +if (hasLogFiles) { + String compactionFractor = Option.ofNullable(getWriteConfig().getString("compaction.memory.fraction")) + .orElse("0.75"); + String[] paths = new String[baseFilePaths.length + deltaPaths.length]; + System.arraycopy(baseFilePaths, 0, paths, 0, baseFilePaths.length); + System.arraycopy(deltaPaths, 0, paths, baseFilePaths.length, deltaPaths.length); Review Comment: You can use `CollectionUtils.combine` ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieInternalWriteStatusCoordinator.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional inf
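For reference, the `CollectionUtils.combine` suggestion in the review above points at the standard array-concatenation pattern; a self-contained generic sketch follows (this is not necessarily the exact signature of Hudi's utility):

```java
// Generic array concatenation, the pattern the reviewer suggests extracting
// instead of repeating System.arraycopy inline.
import java.util.Arrays;

public class ArrayCombine {

  /** Returns a new array containing all elements of left followed by right. */
  public static <T> T[] combine(T[] left, T[] right) {
    T[] combined = Arrays.copyOf(left, left.length + right.length);
    System.arraycopy(right, 0, combined, left.length, right.length);
    return combined;
  }

  public static void main(String[] args) {
    String[] base = {"base1.parquet", "base2.parquet"};
    String[] logs = {"file.log.1", "file.log.2"};
    // Prints [base1.parquet, base2.parquet, file.log.1, file.log.2]
    System.out.println(Arrays.toString(combine(base, logs)));
  }
}
```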
[GitHub] [hudi] hudi-bot commented on pull request #6135: [HUDI-4418] Add support for ProtoKafkaSource
hudi-bot commented on PR #6135: URL: https://github.com/apache/hudi/pull/6135#issuecomment-1226410197 ## CI report: * d36fed637603d9959e8d049ac0815b9c729eb246 UNKNOWN * 194396032e698ac4210d8652a969c7e58832d5db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10925) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] bhasudha opened a new pull request, #6488: [DOCS] Fix youtube image in header
bhasudha opened a new pull request, #6488: URL: https://github.com/apache/hudi/pull/6488 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: none | low | medium | high** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on pull request #6216: [HUDI-4475] fix create table with not exists hoodie properties file
xushiyan commented on PR #6216: URL: https://github.com/apache/hudi/pull/6216#issuecomment-1226338378 > > > What kind of operations would need to cause this case? > > > > > > Because now the production environment finds that re-deleting the table (non-purge) and some other abnormal compaction scenarios will cause the hudi properties file to be deleted, and then the re-creation of the table will fail. > > I think we need to figure out the root cause of why compaction deletes `hoodie.properties`. Also, can we add a UT for deleting the `hoodie.properties` manually and see whether the writing works well? @leesf @XuQianJin-Stars I agree that we should fix the root cause. This patch is more like treating the symptom. @XuQianJin-Stars let's close this and try to reproduce the problem you had? If there is any corner case where the properties file was unexpectedly deleted, it should most likely be fixed around the code in https://github.com/apache/hudi/blob/a75cc02273ae87c383ae1ed46f95006c366f70fc/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java#L344 as mentioned by @nsivabalan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
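As an illustration of the kind of hardening being discussed, a minimal sketch of a backup-then-rename update for `hoodie.properties`, so a reader never observes the file as missing; the paths and helper names here are assumptions, not the actual `HoodieTableConfig` code:

```java
// Sketch: update hoodie.properties without a window where the file is absent.
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

public class SafePropertiesUpdate {

  public static void update(Path metaDir, Properties props) throws IOException {
    Path target = metaDir.resolve("hoodie.properties");
    Path backup = metaDir.resolve("hoodie.properties.backup");
    // Keep a backup first so a crashed writer leaves something to recover from.
    if (Files.exists(target)) {
      Files.copy(target, backup, StandardCopyOption.REPLACE_EXISTING);
    }
    // Write the new content to a temp file, then swap it in with a rename,
    // which is atomic on POSIX file systems.
    Path tmp = metaDir.resolve("hoodie.properties.tmp");
    try (OutputStream out = Files.newOutputStream(tmp)) {
      props.store(out, "updated table config");
    }
    Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    Files.deleteIfExists(backup);
  }
}
```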
[GitHub] [hudi] alexeykudinkin commented on pull request #6432: [HUDI-4586] Improve metadata fetching in bloom index
alexeykudinkin commented on PR #6432: URL: https://github.com/apache/hudi/pull/6432#issuecomment-1226334636 Following up on my previous comment: taking a deeper look, I see the following issues in our code at the moment: 1. When creating the `HoodieBackedTableMetadata` instance within `HoodieTable`, we don't specify that it should reuse MT readers. 2. `HoodieMetadataMergedLogRecordReader.getRecordsByKeys` always clears previously computed `records` and always scans from scratch, while instead we should NOT re-process records that have already been processed, and should just incrementally process the missing ones. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
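A minimal sketch of the incremental-lookup idea in point 2: cache records that were already materialized and scan only for keys not seen before, instead of clearing and re-reading everything on each call; all names below are hypothetical stand-ins, not the `HoodieMetadataMergedLogRecordReader` API:

```java
// Sketch: memoize looked-up records; each call scans only the missing keys.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class IncrementalKeyLookup<V> {

  private final Map<String, V> cached = new HashMap<>();
  // Stand-in for the expensive log-block scan that resolves keys to records.
  private final Function<List<String>, Map<String, V>> scanner;

  public IncrementalKeyLookup(Function<List<String>, Map<String, V>> scanner) {
    this.scanner = scanner;
  }

  public Map<String, V> getRecordsByKeys(List<String> keys) {
    List<String> missing = new ArrayList<>();
    for (String key : keys) {
      if (!cached.containsKey(key)) {
        missing.add(key);
      }
    }
    if (!missing.isEmpty()) {
      cached.putAll(scanner.apply(missing)); // scan only the delta
    }
    Map<String, V> result = new HashMap<>();
    for (String key : keys) {
      V value = cached.get(key);
      if (value != null) {
        result.put(key, value);
      }
    }
    return result;
  }
}
```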
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6432: [HUDI-4586] Improve metadata fetching in bloom index
alexeykudinkin commented on code in PR #6432: URL: https://github.com/apache/hudi/pull/6432#discussion_r954187229 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/BloomIndexFileInfo.java: ## @@ -27,19 +27,20 @@ public class BloomIndexFileInfo implements Serializable { private final String fileId; - + private final String filename; private final String minRecordKey; - private final String maxRecordKey; - public BloomIndexFileInfo(String fileId, String minRecordKey, String maxRecordKey) { + public BloomIndexFileInfo(String fileId, String filename, String minRecordKey, String maxRecordKey) { this.fileId = fileId; +this.filename = filename; Review Comment: nit: `fileName` ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/BloomIndexFileInfo.java: ## @@ -27,19 +27,20 @@ public class BloomIndexFileInfo implements Serializable { private final String fileId; - + private final String filename; Review Comment: Do we really need to store both file-id and file-name? I think we can just store the file-name, and then convert it to file-id wherever necessary ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bloom/HoodieMetadataBloomIndexCheckFunction.java: ## @@ -83,37 +89,64 @@ protected void start() { @Override protected List computeNext() { // Partition path and file name pair to list of keys - final Map, List> fileToKeysMap = new HashMap<>(); - final Map fileIDBaseFileMap = new HashMap<>(); + final Map, List> batchFileToKeysMap = new HashMap<>(); final List resultList = new ArrayList<>(); + String lastFileId = null; + + try { +// Here we batch process the lookup of bloom filters in metadata table +// assuming the partition path and file name pairs are already sorted by the corresponding key +while (inputItr.hasNext()) { + Tuple2, HoodieKey> entry = inputItr.next(); + final String partitionPath = entry._2.getPartitionPath(); + final String fileId = entry._1._1(); + final String filename = entry._1._2(); + + if (lastFileId == null || !lastFileId.equals(fileId)) { +if (processedFileIdSet.contains(fileId)) { + LOG.warn(String.format("Fetching the bloom filter for file ID %s again. 
" + + " The input pairs of file ID and record key are not sorted.", fileId)); +} +lastFileId = fileId; +processedFileIdSet.add(fileId); + } + + batchFileToKeysMap.computeIfAbsent(Pair.of(partitionPath, filename), k -> new ArrayList<>()).add(entry._2); - while (inputItr.hasNext()) { -Tuple2 entry = inputItr.next(); -final String partitionPath = entry._2.getPartitionPath(); -final String fileId = entry._1; -if (!fileIDBaseFileMap.containsKey(fileId)) { - Option baseFile = hoodieTable.getBaseFileOnlyView().getLatestBaseFile(partitionPath, fileId); - if (!baseFile.isPresent()) { -throw new HoodieIndexException("Failed to find the base file for partition: " + partitionPath -+ ", fileId: " + fileId); + if (batchFileToKeysMap.size() == batchSize) { +resultList.addAll(lookupKeysInBloomFilters(batchFileToKeysMap)); +batchFileToKeysMap.clear(); } - fileIDBaseFileMap.put(fileId, baseFile.get()); } -fileToKeysMap.computeIfAbsent(Pair.of(partitionPath, fileIDBaseFileMap.get(fileId).getFileName()), -k -> new ArrayList<>()).add(entry._2); -if (fileToKeysMap.size() > BLOOM_FILTER_CHECK_MAX_FILE_COUNT_PER_BATCH) { - break; + +if (batchFileToKeysMap.size() > 0) { + resultList.addAll(lookupKeysInBloomFilters(batchFileToKeysMap)); + batchFileToKeysMap.clear(); } + +return resultList; + } catch (Throwable e) { +if (e instanceof HoodieException) { + throw e; +} +throw new HoodieIndexException("Error checking bloom filter using metadata table.", e); } - if (fileToKeysMap.isEmpty()) { -return Collections.emptyList(); - } +} + +@Override +protected void end() { +} - List> partitionNameFileNameList = new ArrayList<>(fileToKeysMap.keySet()); +private List lookupKeysInBloomFilters( +Map, List> fileToKeysMap) { + List resultList = new ArrayList<>(); + List> partitionPathFileNameList = new ArrayList<>(fileToKeysMap.keySet()); + HoodieTimer timer = HoodieTimer.start(); Map, BloomFilter> fileToBloomFilterMap = - hoodieTable.getMetadataTable().getBloomFilters(partitionNameFileNameList); + hoodieTable.getMetadataTable().getBloomFilters(partitionPathFileNameList); + LOG.error(String.format("Took %d ms to look up %s bloom
[jira] [Updated] (HUDI-4712) Flaky Tests w/ azure CI
[ https://issues.apache.org/jira/browse/HUDI-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4712: -- Epic Link: HUDI-4302 > Flaky Tests w/ azure CI > --- > > Key: HUDI-4712 > URL: https://issues.apache.org/jira/browse/HUDI-4712 > Project: Apache Hudi > Issue Type: Test >Reporter: sivabalan narayanan >Priority: Major > > In our azure CI runs, the last 4 to 5 runs have failed. Mostly it's related to > Flink IT tests. > > Tracking the failures here. > [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10911/logs/24] > {code:java} > 2022-08-24T02:57:19.7513911Z [INFO] Running > org.apache.hudi.table.ITTestHoodieDataSource > 2022-08-24T03:13:47.5967799Z [INFO] Tests run: 96, Failures: 0, Errors: 0, > Skipped: 0, Time elapsed: 987.826 s - in > org.apache.hudi.table.ITTestHoodieDataSource > 2022-08-24T03:13:47.5997817Z [INFO] Running > org.apache.hudi.sink.compact.ITTestHoodieFlinkCompactor > 2022-08-24T03:15:01.9378742Z [ERROR] Tests run: 7, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 74.337 s <<< FAILURE! - in > org.apache.hudi.sink.compact.ITTestHoodieFlinkCompactor > 2022-08-24T03:15:01.9381479Z [ERROR] > testHoodieFlinkCompactorWithPlanSelectStrategy{boolean}[1] Time elapsed: > 7.976 s <<< ERROR! > 2022-08-24T03:15:01.9382198Z > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > 2022-08-24T03:15:01.9383171Z Caused by: > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > 2022-08-24T03:15:01.9383730Z Caused by: > org.apache.hudi.exception.HoodieIOException: IOException when reading log > file > 2022-08-24T03:15:01.9384878Z Caused by: java.io.FileNotFoundException: File > file:/tmp/junit4630651021120836419/par4/.d6675c02-f0f4-40ba-9f5e-986b84f73cb6_20220824031447463.log.1_0-4-0 > does not exist > 2022-08-24T03:15:01.9650641Z > 2022-08-24T03:15:01.9651540Z [INFO] Running > org.apache.hudi.sink.cluster.ITTestHoodieFlinkClustering > 2022-08-24T03:15:21.1311853Z [INFO] Tests run: 2, Failures: 0, Errors: 0, > Skipped: 0, Time elapsed: 19.189 s - in > org.apache.hudi.sink.cluster.ITTestHoodieFlinkClustering > 2022-08-24T03:15:21.1324486Z [INFO] Running > org.apache.hudi.sink.ITTestDataStreamWrite > 2022-08-24T03:17:40.5148801Z [ERROR] Tests run: 9, Failures: 1, Errors: 0, > Skipped: 0, Time elapsed: 139.379 s <<< FAILURE! - in > org.apache.hudi.sink.ITTestDataStreamWrite > 2022-08-24T03:17:40.5149895Z [ERROR] > testWriteMergeOnReadWithCompaction{String}[1] Time elapsed: 21.725 s <<< > FAILURE!
> 2022-08-24T03:17:40.5150555Z org.opentest4j.AssertionFailedError: expected: > but was: > 2022-08-24T03:17:40.5151262Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:252) > 2022-08-24T03:17:40.5152067Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:182) > 2022-08-24T03:17:40.5153086Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction(ITTestDataStreamWrite.java:156) > 2022-08-24T03:17:40.5153593Z > 2022-08-24T03:17:41.0381772Z [INFO] > 2022-08-24T03:17:41.0382573Z [INFO] Results: > 2022-08-24T03:17:41.0383977Z [INFO] > 2022-08-24T03:17:41.0384447Z [ERROR] Failures: > 2022-08-24T03:17:41.0385833Z [ERROR] > ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction:156->testWriteToHoodie:182->testWriteToHoodie:252 > expected: but was: > 2022-08-24T03:17:41.0386560Z [ERROR] Errors: > 2022-08-24T03:17:41.0387330Z [ERROR] > ITTestHoodieFlinkCompactor.testHoodieFlinkCompactorWithPlanSelectStrategy » > JobExecution > 2022-08-24T03:17:41.0387896Z [INFO] {code}
[jira] [Created] (HUDI-4712) Flaky Tests w/ azure CI
sivabalan narayanan created HUDI-4712: - Summary: Flaky Tests w/ azure CI Key: HUDI-4712 URL: https://issues.apache.org/jira/browse/HUDI-4712 Project: Apache Hudi Issue Type: Test Reporter: sivabalan narayanan In our azure CI runs, the last 4 to 5 runs have failed. Mostly it's related to Flink IT tests. Tracking the failures here. [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10911/logs/24] {code:java} 2022-08-24T02:57:19.7513911Z [INFO] Running org.apache.hudi.table.ITTestHoodieDataSource 2022-08-24T03:13:47.5967799Z [INFO] Tests run: 96, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 987.826 s - in org.apache.hudi.table.ITTestHoodieDataSource 2022-08-24T03:13:47.5997817Z [INFO] Running org.apache.hudi.sink.compact.ITTestHoodieFlinkCompactor 2022-08-24T03:15:01.9378742Z [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 74.337 s <<< FAILURE! - in org.apache.hudi.sink.compact.ITTestHoodieFlinkCompactor 2022-08-24T03:15:01.9381479Z [ERROR] testHoodieFlinkCompactorWithPlanSelectStrategy{boolean}[1] Time elapsed: 7.976 s <<< ERROR! 2022-08-24T03:15:01.9382198Z org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 2022-08-24T03:15:01.9383171Z Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy 2022-08-24T03:15:01.9383730Z Caused by: org.apache.hudi.exception.HoodieIOException: IOException when reading log file 2022-08-24T03:15:01.9384878Z Caused by: java.io.FileNotFoundException: File file:/tmp/junit4630651021120836419/par4/.d6675c02-f0f4-40ba-9f5e-986b84f73cb6_20220824031447463.log.1_0-4-0 does not exist 2022-08-24T03:15:01.9650641Z 2022-08-24T03:15:01.9651540Z [INFO] Running org.apache.hudi.sink.cluster.ITTestHoodieFlinkClustering 2022-08-24T03:15:21.1311853Z [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.189 s - in org.apache.hudi.sink.cluster.ITTestHoodieFlinkClustering 2022-08-24T03:15:21.1324486Z [INFO] Running org.apache.hudi.sink.ITTestDataStreamWrite 2022-08-24T03:17:40.5148801Z [ERROR] Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 139.379 s <<< FAILURE! - in org.apache.hudi.sink.ITTestDataStreamWrite 2022-08-24T03:17:40.5149895Z [ERROR] testWriteMergeOnReadWithCompaction{String}[1] Time elapsed: 21.725 s <<< FAILURE!
2022-08-24T03:17:40.5150555Z org.opentest4j.AssertionFailedError: expected: but was: 2022-08-24T03:17:40.5151262Z at org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:252) 2022-08-24T03:17:40.5152067Z at org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:182) 2022-08-24T03:17:40.5153086Z at org.apache.hudi.sink.ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction(ITTestDataStreamWrite.java:156) 2022-08-24T03:17:40.5153593Z 2022-08-24T03:17:41.0381772Z [INFO] 2022-08-24T03:17:41.0382573Z [INFO] Results: 2022-08-24T03:17:41.0383977Z [INFO] 2022-08-24T03:17:41.0384447Z [ERROR] Failures: 2022-08-24T03:17:41.0385833Z [ERROR] ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction:156->testWriteToHoodie:182->testWriteToHoodie:252 expected: but was: 2022-08-24T03:17:41.0386560Z [ERROR] Errors: 2022-08-24T03:17:41.0387330Z [ERROR] ITTestHoodieFlinkCompactor.testHoodieFlinkCompactorWithPlanSelectStrategy » JobExecution 2022-08-24T03:17:41.0387896Z [INFO] {code}
[GitHub] [hudi] hudi-bot commented on pull request #6135: [HUDI-4418] Add support for ProtoKafkaSource
hudi-bot commented on PR #6135: URL: https://github.com/apache/hudi/pull/6135#issuecomment-1226143907 ## CI report: * d36fed637603d9959e8d049ac0815b9c729eb246 UNKNOWN * f70abbc3b45005d40e74252814edc0078a50030e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10909) * 194396032e698ac4210d8652a969c7e58832d5db Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10925) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] WangCHX opened a new issue, #6487: [SUPPORT] Primary key check in deltastreamer
WangCHX opened a new issue, #6487: URL: https://github.com/apache/hudi/issues/6487 **Describe the problem you faced** We accidentally configured the wrong primary key in the Spark write config, which caused duplicate data. Wondering if there is a way to avoid this. **To Reproduce** Change the primary key config in the write config and run the Spark job. **Expected behavior** The Spark job should probably be blocked from writing data if the primary key config is different from the primary key of the original table. **Environment Description** * Hudi version : 0.11.0 * Spark version : 3.2.1 * Storage (HDFS/S3/GCS..) : GCS * Running on Docker? (yes/no) : yes, on k8s. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1265) Improving bootstrap and efficient migration of existing non-Hudi dataset
[ https://issues.apache.org/jira/browse/HUDI-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1265: - Due Date: 30/Sep/22 > Improving bootstrap and efficient migration of existing non-Hudi dataset > > > Key: HUDI-1265 > URL: https://issues.apache.org/jira/browse/HUDI-1265 > Project: Apache Hudi > Issue Type: Epic > Components: bootstrap >Reporter: Balaji Varadarajan >Assignee: Ethan Guo >Priority: Blocker > Labels: hudi-umbrellas > Fix For: 0.13.0 > > > This is an EPIC to revisit the logic of bootstrap for efficient migration of > existing non-Hudi datasets, bridging any gaps with new features such as the > metadata table. > Here are the two modes of bootstrap and migration we are supposed to support: > # Onboard for new partitions alone: Given an existing non-Hudi partitioned > dataset (/path/parquet), Hudi manages new partitions under the same table > path (/path/parquet) while keeping non-Hudi partitions untouched in place. > The query engine treats non-Hudi partitions differently when reading the data. > This works perfectly for immutable data where there are no updates to old > partitions and new data is only appended to the new partition. > # Metadata-only and full-record bootstrap: Given an existing parquet dataset > (/path/parquet), Hudi generates the record-level metadata (Hudi meta columns) > during the bootstrap process in a new table path (/path/parquet_hudi) > different from the parquet dataset. There are two modes; they can be chosen > at the granularity of partition in a single bootstrap action. This unlocks > the ability for Hudi to do upsert for all partitions. > ## Metadata-only: generates record-level metadata only per parquet file and > a bootstrap index for mapping, without rewriting the actual data records. > During query execution, the source data is merged with Hudi metadata to > return the results. This is the default mode. > ## Full-record: use bulk insert to generate record-level metadata, copy over > and rewrite the source data with bulk insert. During query execution, > record-level metadata, i.e., meta columns, and the data columns are read from > the same parquet, improving the read performance. 
> Phase 1: Testing and verification of status-quo (1~1.5 week) > Writing: > * Two migration modes above > * COW and MOR > * 1 additional commit after bootstrap doing upsert for metadata-only and > full-record bootstrap > * Spark datasource, Deltastreamer > * Partitioned and non-partitioned table > * Simple/complex key gen > * Hive-style partition > * w/ and w/o metadata table enabled > * Meta sync > Reading: > * Hive QL, Spark SQL, Spark datasource, Presto/Trino > * Snapshot, read-optimized, incremental query > * Queries in the original query testing plan: > [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684] > Need to develop a validation tool for automated validation > * Metadata, i.e., meta columns and index in metadata table, is properly > populated > * Data queried from Hudi table matches the parquet data > Add tests when needed > * HUDI-4125 Add integration tests around bootstrapped Hudi table > Phase 2: Functionality and correctness fix, (2~3 weeks) > Known and possible issues: > * Spark cannot see non-Hudi partitions in first onboarding mode > * Bootstrap Relation does not support MOR; HUDI-2071 Support Reading > Bootstrap MOR RT Table In Spark DataSource Table > * HUDI-915 Partition Columns missing in files upserted after Metadata > Bootstrap > * HUDI-992 For hive-style partitioned source data, partition columns synced > with Hive will always have String type > * HUDI-1369 Bootstrap Datasource jobs from hanging via spark-submit > * HUDI-3122 Presto query failed for bootstrap tables > * HUDI-1779 Fail to bootstrap/upsert a table which contains timestamp column > Phase 3: Performance (1~2 weeks) > * HUDI-1157 Optimization whether to query Bootstrapped table using > HoodieBootstrapRelation vs Sparks Parquet datasource > * HUDI-4453 Support partition pruning for tables Bootstrapped from Source > Hive Style partitioned tables > * HUDI-619 Avoid stitching meta columns and only load data columns for > improving read performance > * HUDI-1158 Optimizations in parallelized listing behaviour for markers and > bootstrap source files > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6438: [HUDI-4642] Adding support to hudi-cli to repair depcrated partition
hudi-bot commented on PR #6438: URL: https://github.com/apache/hudi/pull/6438#issuecomment-1226132397 ## CI report: * fea65135a8035ef70929759594da64dc985a2d0a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10924) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch asf-site updated: [DOCS] Add YouTube channel and Office hours page (#6482)
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 50937a8c70 [DOCS] Add YouTube channel and Office hours page (#6482) 50937a8c70 is described below commit 50937a8c7014af79b30c067a4641c6df47cc6889 Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com> AuthorDate: Wed Aug 24 23:58:37 2022 +0530 [DOCS] Add YouTube channel and Office hours page (#6482) --- website/community/office_hours.md | 13 + website/community/syncs.md| 9 - website/community/team.md | 2 +- website/docusaurus.config.js | 14 ++ website/src/css/custom.css| 10 +- website/static/assets/images/youtube.jpeg | Bin 0 -> 8825 bytes 6 files changed, 37 insertions(+), 11 deletions(-) diff --git a/website/community/office_hours.md b/website/community/office_hours.md new file mode 100644 index 00..4ca6efae1c --- /dev/null +++ b/website/community/office_hours.md @@ -0,0 +1,13 @@ +--- +sidebar_position: 3 +title: "Office Hours" +toc: true +--- + +# Weekly Office Hours + +**[ZOOM LINK TO JOIN](https://zoom.us/j/95710395048)** + +Office hours are held every week on Thu, 08:00 AM Pacific Time (US and Canada)([translate to other time zones](https://www.worldtimebuddy.com/?qm=1&lid=5368361,2643743,1264527,1796236&h=5368361&date=2022-8-25&sln=8-9&hf=1)) + +One of the PMC members/committers will hold office hours to help answer questions interactively, on a first-come first-serve basis. diff --git a/website/community/syncs.md b/website/community/syncs.md index b1373e8b33..7cb89ea0f0 100644 --- a/website/community/syncs.md +++ b/website/community/syncs.md @@ -37,12 +37,3 @@ If you would like to present in one of the community calls, please fill out a [f Here are some upcoming calls for convenience. ![Upcoming calls](/assets/images/upcoming-community-calls.png) - - -## Weekly Office Hours - -**[ZOOM LINK TO JOIN](https://zoom.us/j/95710395048)** - -When every week on Thu, 08:00 AM Pacific Time (US and Canada)([translate to other time zones](https://www.worldtimebuddy.com/?qm=1&lid=5368361,2643743,1264527,1796236&h=5368361&date=2021-11-24&sln=8-9&hf=1)) - -One of the PMC members/committers will hold office hours to help answer questions interactively, on a first-come first-serve basis. 
diff --git a/website/community/team.md b/website/community/team.md index a69306e68d..062a6d2cd8 100644 --- a/website/community/team.md +++ b/website/community/team.md @@ -1,5 +1,5 @@ --- -sidebar_position: 3 +sidebar_position: 4 title: "Team" toc: true last_modified_at: 2020-09-01T15:59:57-04:00 diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js index d67bde016f..f3bff36771 100644 --- a/website/docusaurus.config.js +++ b/website/docusaurus.config.js @@ -185,6 +185,10 @@ module.exports = { label: 'Community Syncs', to: '/community/syncs', }, +{ + label: 'Office Hours', + to: '/community/office_hours', +}, { label: 'Team', to: '/community/team', @@ -231,6 +235,12 @@ module.exports = { className: 'header-slack-link', 'aria-label': 'Hudi Slack Channel', }, +{ + href: 'https://www.youtube.com/channel/UCs7AhE0BWaEPZSChrBR-Muw', + position: 'right', + className: 'header-youtube-link', + 'aria-label': 'Hudi YouTube Channel', +}, ], }, footer: { @@ -342,6 +352,10 @@ module.exports = { label: 'Twitter', href: 'https://twitter.com/ApacheHudi', }, +{ + label: 'YouTube', + href: 'https://www.youtube.com/channel/UCs7AhE0BWaEPZSChrBR-Muw', +}, { label: 'Mailing List', to: 'mailto:dev-subscr...@hudi.apache.org?Subject=SubscribeToHudi', diff --git a/website/src/css/custom.css b/website/src/css/custom.css index 94f10971e9..21c02d3bbb 100644 --- a/website/src/css/custom.css +++ b/website/src/css/custom.css @@ -39,7 +39,7 @@ html[data-theme='dark'] .docusaurus-highlight-code-line { } @media (max-width: 767px) { - .hero__img, .header-github-link, .header-slack-link, .header-twitter-link { + .hero__img, .header-github-link, .header-slack-link, .header-twitter-link, .header-youtube-link { display: none; } .hero__title { @@ -93,6 +93,14 @@ html[data-theme='dark'] .docusaurus-highlight-code-line { width: 30px; } +.header-youtube-link:before { + background: url(/assets/images/youtube.jpeg) no-repeat; + content: ""; + display: flex; + height: 30px; + width: 30px; +} + .hero__title { font-size: 4rem; text-ali
[GitHub] [hudi] xushiyan merged pull request #6482: [DOCS] Add youtube channel and Office hours page
xushiyan merged PR #6482: URL: https://github.com/apache/hudi/pull/6482 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6438: [HUDI-4642] Adding support to hudi-cli to repair depcrated partition
hudi-bot commented on PR #6438: URL: https://github.com/apache/hudi/pull/6438#issuecomment-1226069262 ## CI report: * 6e3fa8ca9ca7f5a72bfd9d8c4874183b9ff64586 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10810) * fea65135a8035ef70929759594da64dc985a2d0a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10924) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6438: [HUDI-4642] Adding support to hudi-cli to repair depcrated partition
hudi-bot commented on PR #6438: URL: https://github.com/apache/hudi/pull/6438#issuecomment-1226064753 ## CI report: * 6e3fa8ca9ca7f5a72bfd9d8c4874183b9ff64586 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10810) * fea65135a8035ef70929759594da64dc985a2d0a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6135: [HUDI-4418] Add support for ProtoKafkaSource
hudi-bot commented on PR #6135: URL: https://github.com/apache/hudi/pull/6135#issuecomment-1226064289 ## CI report: * d36fed637603d9959e8d049ac0815b9c729eb246 UNKNOWN * f70abbc3b45005d40e74252814edc0078a50030e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10909) * 194396032e698ac4210d8652a969c7e58832d5db UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-4711) Fix flaky: ITTestHoodieDataSource#testAppendWrite (false)
sivabalan narayanan created HUDI-4711: - Summary: Fix flaky: ITTestHoodieDataSource#testAppendWrite (false) Key: HUDI-4711 URL: https://issues.apache.org/jira/browse/HUDI-4711 Project: Apache Hudi Issue Type: Improvement Reporter: sivabalan narayanan occurrences: # Aug 24th: [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10923/logs/22] # -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4710) Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
sivabalan narayanan created HUDI-4710: - Summary: Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue Key: HUDI-4710 URL: https://issues.apache.org/jira/browse/HUDI-4710 Project: Apache Hudi Issue Type: Improvement Reporter: sivabalan narayanan Instance occurrence: Aug 24th: [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10923/logs/22] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4709) [RFC-48] Log Compaction Code review
[ https://issues.apache.org/jira/browse/HUDI-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4709: -- Story Points: 2 > [RFC-48] Log Compaction Code review > --- > > Key: HUDI-4709 > URL: https://issues.apache.org/jira/browse/HUDI-4709 > Project: Apache Hudi > Issue Type: Task >Reporter: Prasanna Rajaperumal >Assignee: sivabalan narayanan >Priority: Major > > Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4709) [RFC-48] Log Compaction Code review
[ https://issues.apache.org/jira/browse/HUDI-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4709: -- Sprint: 2022/08/22 > [RFC-48] Log Compaction Code review > --- > > Key: HUDI-4709 > URL: https://issues.apache.org/jira/browse/HUDI-4709 > Project: Apache Hudi > Issue Type: Task >Reporter: Prasanna Rajaperumal >Assignee: sivabalan narayanan >Priority: Major > > Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-4709) [RFC-48] Log Compaction Code review
[ https://issues.apache.org/jira/browse/HUDI-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-4709: - Assignee: sivabalan narayanan > [RFC-48] Log Compaction Code review > --- > > Key: HUDI-4709 > URL: https://issues.apache.org/jira/browse/HUDI-4709 > Project: Apache Hudi > Issue Type: Task >Reporter: Prasanna Rajaperumal >Assignee: sivabalan narayanan >Priority: Major > > Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
hudi-bot commented on PR #6486: URL: https://github.com/apache/hudi/pull/6486#issuecomment-1225990562 ## CI report: * d6b7c487e76c46460a2fb0c9647aeea901d17995 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10921) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
hudi-bot commented on PR #6486: URL: https://github.com/apache/hudi/pull/6486#issuecomment-1225985254 ## CI report: * d6b7c487e76c46460a2fb0c9647aeea901d17995 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10921) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-4709) [RFC-48] Log Compaction Code review
[ https://issues.apache.org/jira/browse/HUDI-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Rajaperumal updated HUDI-4709: --- Summary: [RFC-48] Log Compaction Code review (was: [RFC-48] Log Compaction Review Code) > [RFC-48] Log Compaction Code review > --- > > Key: HUDI-4709 > URL: https://issues.apache.org/jira/browse/HUDI-4709 > Project: Apache Hudi > Issue Type: Task >Reporter: Prasanna Rajaperumal >Priority: Major > > Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4709) [RFC-48] Log Compaction Review Code
[ https://issues.apache.org/jira/browse/HUDI-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Rajaperumal updated HUDI-4709: --- Summary: [RFC-48] Log Compaction Review Code (was: [RFC-48] Review Code) > [RFC-48] Log Compaction Review Code > --- > > Key: HUDI-4709 > URL: https://issues.apache.org/jira/browse/HUDI-4709 > Project: Apache Hudi > Issue Type: Task >Reporter: Prasanna Rajaperumal >Priority: Major > > Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4709) [RFC-48] Review Code
Prasanna Rajaperumal created HUDI-4709: -- Summary: [RFC-48] Review Code Key: HUDI-4709 URL: https://issues.apache.org/jira/browse/HUDI-4709 Project: Apache Hudi Issue Type: Task Reporter: Prasanna Rajaperumal Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] wzx140 commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
wzx140 commented on PR #6486: URL: https://github.com/apache/hudi/pull/6486#issuecomment-1225940143 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1225917271 ## CI report: * 214c313d89e0d8abc5ea356d0fc10c475b138ad2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10923) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] 15663671003 commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"
15663671003 commented on issue #5765: URL: https://github.com/apache/hudi/issues/5765#issuecomment-1225911236 > I created a ticket to track the fix: [HUDI-4341](https://issues.apache.org/jira/browse/HUDI-4341). Will the next version consider fixing this problem? It bothers newbies like me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-4696) Flaky: TestHoodieCombineHiveInputFormat.setUpClass:86 » NullPointer
[ https://issues.apache.org/jira/browse/HUDI-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4696: - Status: In Progress (was: Open) > Flaky: TestHoodieCombineHiveInputFormat.setUpClass:86 » NullPointer > > > Key: HUDI-4696 > URL: https://issues.apache.org/jira/browse/HUDI-4696 > Project: Apache Hudi > Issue Type: Task >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Major > Fix For: 0.12.1 > > > https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=10720&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=746585d8-b50a-55c3-26c5-517d93af9934 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-2673) Add integration/e2e test for kafka-connect functionality
[ https://issues.apache.org/jira/browse/HUDI-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2673: - Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 2022/09/05 (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16) > Add integration/e2e test for kafka-connect functionality > > > Key: HUDI-2673 > URL: https://issues.apache.org/jira/browse/HUDI-2673 > Project: Apache Hudi > Issue Type: Task > Components: kafka-connect, tests-ci >Reporter: Ethan Guo >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > The integration test should use bundle jar and run in docker setup. This can > prevent any issue in the bundle, like HUDI-3903, that is not covered by unit > and functional tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-2673) Add integration/e2e test for kafka-connect functionality
[ https://issues.apache.org/jira/browse/HUDI-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2673: - Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16 (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 2022/08/22) > Add integration/e2e test for kafka-connect functionality > > > Key: HUDI-2673 > URL: https://issues.apache.org/jira/browse/HUDI-2673 > Project: Apache Hudi > Issue Type: Task > Components: kafka-connect, tests-ci >Reporter: Ethan Guo >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > The integration test should use bundle jar and run in docker setup. This can > prevent any issue in the bundle, like HUDI-3903, that is not covered by unit > and functional tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4650) Commits Command: Include both active and archive timeline for a given range of instants
[ https://issues.apache.org/jira/browse/HUDI-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4650:
-----------------------------
    Reviewers: sivabalan narayanan

> Commits Command: Include both active and archive timeline for a given range
> of instants
> ----------------------------------------------------------------------------
>
>                 Key: HUDI-4650
>                 URL: https://issues.apache.org/jira/browse/HUDI-4650
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cli
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>             Fix For: 0.12.1
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4528) Diff tool to compare metadata across snapshots in a given time range
[ https://issues.apache.org/jira/browse/HUDI-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4528:
-----------------------------
    Reviewers: sivabalan narayanan

> Diff tool to compare metadata across snapshots in a given time range
> ---------------------------------------------------------------------
>
>                 Key: HUDI-4528
>                 URL: https://issues.apache.org/jira/browse/HUDI-4528
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: cli
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.12.1
>
>
> A tool that diffs two snapshots at table and partition level and can give
> info about what new file ids got created, deleted, updated and track other
> changes that are captured in write stats.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
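The description amounts to a set difference over file ids between two snapshots. A minimal standalone sketch of that idea (hypothetical names and types, not the Hudi CLI implementation):

```java
// Hypothetical sketch of the diff idea: compare the file ids present in two
// snapshots and classify them as created, deleted, or updated.
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SnapshotDiffSketch {

  // Each snapshot maps file id -> a version token such as the commit time.
  static void diff(Map<String, String> before, Map<String, String> after) {
    Set<String> created = new HashSet<>(after.keySet());
    created.removeAll(before.keySet());

    Set<String> deleted = new HashSet<>(before.keySet());
    deleted.removeAll(after.keySet());

    Set<String> updated = new HashSet<>();
    for (Map.Entry<String, String> e : after.entrySet()) {
      String prev = before.get(e.getKey());
      if (prev != null && !prev.equals(e.getValue())) {
        updated.add(e.getKey());
      }
    }

    System.out.println("created: " + created);
    System.out.println("deleted: " + deleted);
    System.out.println("updated: " + updated);
  }

  public static void main(String[] args) {
    diff(Map.of("f1", "c1", "f2", "c1"),
         Map.of("f2", "c2", "f3", "c2"));
    // prints created: [f3], deleted: [f1], updated: [f2]
  }
}
```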
[jira] [Updated] (HUDI-4633) Add command to trace partition through a range of commits
[ https://issues.apache.org/jira/browse/HUDI-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4633:
-----------------------------
    Reviewers: sivabalan narayanan

> Add command to trace partition through a range of commits
> ----------------------------------------------------------
>
>                 Key: HUDI-4633
>                 URL: https://issues.apache.org/jira/browse/HUDI-4633
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cli
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.12.1
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4389) Make HoodieStreamingSink idempotent
[ https://issues.apache.org/jira/browse/HUDI-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4389:
-----------------------------
    Status: In Progress  (was: Open)

> Make HoodieStreamingSink idempotent
> -----------------------------------
>
>                 Key: HUDI-4389
>                 URL: https://issues.apache.org/jira/browse/HUDI-4389
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available, streaming
>             Fix For: 0.13.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4389) Make HoodieStreamingSink idempotent
[ https://issues.apache.org/jira/browse/HUDI-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4389:
-----------------------------
    Status: Patch Available  (was: In Progress)

> Make HoodieStreamingSink idempotent
> -----------------------------------
>
>                 Key: HUDI-4389
>                 URL: https://issues.apache.org/jira/browse/HUDI-4389
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available, streaming
>             Fix For: 0.13.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4389) Make HoodieStreamingSink idempotent
[ https://issues.apache.org/jira/browse/HUDI-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4389:
-----------------------------
    Story Points: 1

> Make HoodieStreamingSink idempotent
> -----------------------------------
>
>                 Key: HUDI-4389
>                 URL: https://issues.apache.org/jira/browse/HUDI-4389
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available, streaming
>             Fix For: 0.13.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
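For context on what "idempotent" means for a streaming sink: Spark hands each micro-batch a monotonically increasing batch id, and an idempotent sink skips any batch id it has already committed so that replays after a failure do not double-write. A standalone sketch of that pattern (hypothetical code, not HoodieStreamingSink's actual implementation):

```java
// Hypothetical sketch of batch-level idempotence: remember the last
// committed batch id and skip any batch at or below it, making replays
// after a failure a no-op.
import java.util.List;

public class IdempotentSinkSketch {
  private long lastCommittedBatchId = -1L; // would be persisted in practice

  void addBatch(long batchId, List<String> rows) {
    if (batchId <= lastCommittedBatchId) {
      return; // this batch was already written; replaying it is safe
    }
    // ... write rows atomically together with the batch id ...
    lastCommittedBatchId = batchId; // record the id only after the write
  }

  public static void main(String[] args) {
    IdempotentSinkSketch sink = new IdempotentSinkSketch();
    sink.addBatch(0, List.of("a"));
    sink.addBatch(0, List.of("a")); // replay of batch 0: skipped
    sink.addBatch(1, List.of("b"));
  }
}
```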
[jira] [Commented] (HUDI-2673) Add integration/e2e test for kafka-connect functionality
[ https://issues.apache.org/jira/browse/HUDI-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584323#comment-17584323 ]

Raymond Xu commented on HUDI-2673:
----------------------------------

Pivot to make KC into docker demo

> Add integration/e2e test for kafka-connect functionality
> ---------------------------------------------------------
>
>                 Key: HUDI-2673
>                 URL: https://issues.apache.org/jira/browse/HUDI-2673
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: kafka-connect, tests-ci
>            Reporter: Ethan Guo
>            Assignee: Raymond Xu
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> The integration test should use the bundle jar and run in a docker setup. This can
> prevent any issue in the bundle, like HUDI-3903, that is not covered by unit
> and functional tests.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
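The thrust of the ticket, per the comment above, is to exercise the kafka-connect bundle jar inside a docker setup. As a loose sketch of the shape such a setup could take (the images, versions, and bundle path below are illustrative assumptions, not the Hudi demo's actual configuration):

```yaml
# Illustrative compose sketch only. Broker settings are abbreviated; a real
# setup also needs the usual Kafka Connect worker configuration (group id,
# storage topics, converters, and so on).
services:
  kafka:
    image: confluentinc/cp-kafka:7.0.1
  connect:
    image: confluentinc/cp-kafka-connect:7.0.1
    depends_on:
      - kafka
    volumes:
      # Mount the Hudi kafka-connect bundle so the worker can load the sink
      # connector (the local jar path is hypothetical).
      - ./hudi-kafka-connect-bundle.jar:/usr/share/java/kafka-connect-hudi/hudi-kafka-connect-bundle.jar
```

Running the e2e test against the mounted bundle, rather than individual module jars, is what would catch bundle-packaging regressions of the kind the ticket cites.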