[GitHub] [hudi] codecov-commenter edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870526141






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381051#comment-17381051
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

codecov-commenter edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870526141






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add INSERT_OVERWRITE support to DeltaStreamer
> -
>
> Key: HUDI-1860
> URL: https://issues.apache.org/jira/browse/HUDI-1860
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As discussed in [this 
> RFC|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller],
>  having the full-fetch mode use insert_overwrite when writing to the sink would be 
> better, as it can handle schema changes. 
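
A minimal Java sketch of what driving DeltaStreamer in overwrite mode could look like once this lands. The class and field names (HoodieDeltaStreamer.Config, WriteOperationType.INSERT_OVERWRITE) come from Hudi's utilities and common modules, but treat the exact configuration surface as an assumption rather than the patch's final API; the paths and table names are hypothetical.

```java
import org.apache.hudi.common.model.WriteOperationType;
import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer;
import org.apache.spark.api.java.JavaSparkContext;

public class InsertOverwriteDeltaStreamerSketch {
  public static void main(String[] args) throws Exception {
    JavaSparkContext jssc = new JavaSparkContext("local[2]", "insert-overwrite-sketch");

    // Assumed setup: a full-fetch source whose every run should replace the
    // previously written data rather than upsert into it.
    HoodieDeltaStreamer.Config cfg = new HoodieDeltaStreamer.Config();
    cfg.targetBasePath = "file:///tmp/hudi/target_table";   // hypothetical location
    cfg.targetTableName = "target_table";                    // hypothetical table name
    cfg.tableType = "COPY_ON_WRITE";
    // The operation this PR adds; INSERT_OVERWRITE_TABLE would replace the
    // whole table instead of only the partitions touched by the batch.
    cfg.operation = WriteOperationType.INSERT_OVERWRITE;

    new HoodieDeltaStreamer(cfg, jssc).sync();
  }
}
```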



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2181) Refine doc for FlinkCreateHandle

2021-07-14 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-2181:
---
Summary: Refine doc for FlinkCreateHandle  (was: Refine the doc for 
FlinkCreateHandle)

> Refine doc for FlinkCreateHandle
> 
>
> Key: HUDI-2181
> URL: https://issues.apache.org/jira/browse/HUDI-2181
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: zhangminglei
>Priority: Major
>
> FlinkCreateHandle does not append to the original file for subsequent 
> mini-batches; instead, every insert batch creates a new file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2181) Refine doc for FlinkCreateHandle

2021-07-14 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei reassigned HUDI-2181:
--

Assignee: zhangminglei

> Refine doc for FlinkCreateHandle
> 
>
> Key: HUDI-2181
> URL: https://issues.apache.org/jira/browse/HUDI-2181
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Major
>
> FlinkCreateHandle does not append to the original file for subsequent 
> mini-batches; instead, every insert batch creates a new file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2181) Refine the doc for FlinkCreateHandle

2021-07-14 Thread zhangminglei (Jira)
zhangminglei created HUDI-2181:
--

 Summary: Refine the doc for FlinkCreateHandle
 Key: HUDI-2181
 URL: https://issues.apache.org/jira/browse/HUDI-2181
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: zhangminglei


FlinkCreateHandle does not append to the original file for subsequent 
mini-batches; instead, every insert batch creates a new file.
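
A rough sketch of what the refined class-level doc could say, based only on the one-line description above; the wording is illustrative and the class body is elided, so this is not the actual doc proposed in the patch.

```java
/**
 * A create handle used by the Flink writer.
 *
 * <p>Unlike an append-style handle, this handle does not append to the file
 * written for an earlier mini-batch: every incoming insert batch opens and
 * writes a brand-new file.
 */
public class FlinkCreateHandle {
  // Implementation elided; this is only a documentation sketch, not the real class.
}
```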



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381045#comment-17381045
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

codecov-commenter edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870526141


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3184](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (814e45c) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `16.79%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3184/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3184       +/-   ##
   =============================================
   - Coverage     44.10%   27.30%   -16.80%     
   + Complexity     5157     1292     -3865     
   =============================================
     Files           936      386      -550     
     Lines         41629    15343    -26286     
     Branches       4189     1339     -2850     
   =============================================
   - Hits          18362     4190    -14172     
   + Misses        21638    10849    -10789     
   + Partials       1629      304     -1325     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.91% <0.00%> (-13.56%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <0.00%> (+50.14%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870526141


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3184](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (814e45c) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `16.79%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3184/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3184       +/-   ##
   =============================================
   - Coverage     44.10%   27.30%   -16.80%     
   + Complexity     5157     1292     -3865     
   =============================================
     Files           936      386      -550     
     Lines         41629    15343    -26286     
     Branches       4189     1339     -2850     
   =============================================
   - Hits          18362     4190    -14172     
   + Misses        21638    10849    -10789     
   + Partials       1629      304     -1325     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.91% <0.00%> (-13.56%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <0.00%> (+50.14%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Updated] (HUDI-2044) Extend support for RocksDB and compression for Spillable map to all consumers of ExternalSpillableMap

2021-07-14 Thread Rajesh Mahindra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Mahindra updated HUDI-2044:
--
Description: 
# HUDI-2028 only implements RocksDB support for the spillable map in 
HoodieMergeHandle, since we are blocked on the configuration refactor PR landing
 # This ticket will track the implementation to extend RocksDB (and compression 
for bitcask) support for the spillable map to all consumers of 
ExternalSpillableMap.java

  was:
# HUDI-2028 only implements RocksDB support for the spillable map in 
HoodieMergeHandle, since we are blocked on the configuration refactor PR landing
 # This ticket will track the implementation to extend RocksDB support for 
the spillable map to all consumers of ExternalSpillableMap.java


> Extend support for RocksDB and compression for Spillable map to all consumers 
> of ExternalSpillableMap
> 
>
> Key: HUDI-2044
> URL: https://issues.apache.org/jira/browse/HUDI-2044
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Rajesh Mahindra
>Assignee: Rajesh Mahindra
>Priority: Major
>
> # HUDI-2028 only implements RocksDB support for the spillable map in 
> HoodieMergeHandle, since we are blocked on the configuration refactor PR 
> landing
>  # This ticket will track the implementation to extend RocksDB (and 
> compression for bitcask) support for the spillable map to all consumers of 
> ExternalSpillableMap.java
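
A hedged Java sketch of what a consumer switching over might look like. It assumes the disk-map type enum and compression toggle introduced around HUDI-2028 keep their current shape (an ExternalSpillableMap.DiskMapType with BITCASK and ROCKS_DB values plus a boolean compression flag); the exact constructor signature, size estimators, and the close() call should be checked against ExternalSpillableMap.java before relying on them.

```java
import org.apache.hudi.common.util.DefaultSizeEstimator;
import org.apache.hudi.common.util.collection.ExternalSpillableMap;

public class SpillableMapSketch {
  public static void main(String[] args) throws Exception {
    // Assumed constructor shape: max in-memory bytes, spill directory,
    // key/value size estimators, disk map type, and a compression toggle.
    ExternalSpillableMap<String, String> records = new ExternalSpillableMap<>(
        16 * 1024 * 1024L,                          // spill to disk beyond ~16 MB in memory
        "/tmp/hudi-spillable",                      // base path for the on-disk map
        new DefaultSizeEstimator<>(),               // key size estimator
        new DefaultSizeEstimator<>(),               // value size estimator
        ExternalSpillableMap.DiskMapType.ROCKS_DB,  // RocksDB-backed disk map instead of BITCASK
        true);                                      // compression flag for the bitcask-backed case

    records.put("record-key-1", "payload-1");
    System.out.println(records.get("record-key-1"));
    records.close();
  }
}
```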



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381044#comment-17381044
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * b4aa7869d8343a16b225a81844e907fbee63b576 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=919)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=922)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, Hudi lets users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to:
>  # Submit a HoodieClusteringJob to build a clustering plan through the 
> --schedule config.
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again to execute the created clustering 
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> workflow cannot be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m for short:
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, which means 
> --instant-time is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a clustering plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a clustering plan and 
> execute it at once using HoodieClusteringJob.
>  
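
A hypothetical Java sketch of the single-step flow described above. The base-path, table-name, and retry options are existing HoodieClusteringJob settings, but the `runningMode` field standing in for the proposed --mode / -m flag is assumed from this description and may not match the final patch.

```java
import org.apache.hudi.utilities.HoodieClusteringJob;
import org.apache.spark.api.java.JavaSparkContext;

public class ScheduleAndExecuteClusteringSketch {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext("local[2]", "clustering-sketch");

    HoodieClusteringJob.Config cfg = new HoodieClusteringJob.Config();
    cfg.basePath = "file:///tmp/hudi/target_table";   // hypothetical table location
    cfg.tableName = "target_table";                    // hypothetical table name
    // Proposed one-shot mode: schedule the clustering plan and run it at once,
    // instead of --schedule followed by a second submission with --instant-time.
    cfg.runningMode = "scheduleAndExecute";            // hypothetical field backing --mode / -m

    int exitCode = new HoodieClusteringJob(jsc, cfg).cluster(cfg.retry);
    System.out.println("Clustering finished with exit code " + exitCode);
  }
}
```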



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * b4aa7869d8343a16b225a81844e907fbee63b576 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=919)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=922)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381043#comment-17381043
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-880409359


   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, Hudi lets users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to:
>  # Submit a HoodieClusteringJob to build a clustering plan through the 
> --schedule config.
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again to execute the created clustering 
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> workflow cannot be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m for short:
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, which means 
> --instant-time is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a clustering plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a clustering plan and 
> execute it at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhangyue19921010 commented on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


zhangyue19921010 commented on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-880409359


   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381039#comment-17381039
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

codecov-commenter edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870526141


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3184](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (814e45c) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `28.35%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3184/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3184       +/-   ##
   =============================================
   - Coverage     44.10%   15.74%   -28.36%     
   + Complexity     5157      493     -4664     
   =============================================
     Files           936      284      -652     
     Lines         41629    11835    -29794     
     Branches       4189      982     -3207     
   =============================================
   - Hits          18362     1864    -16498     
   + Misses        21638     9808    -11830     
   + Partials       1629      163     -1466     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.47%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <0.00%> (+50.14%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870526141


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3184](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (814e45c) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `28.35%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3184/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3184       +/-   ##
   =============================================
   - Coverage     44.10%   15.74%   -28.36%     
   + Complexity     5157      493     -4664     
   =============================================
     Files           936      284      -652     
     Lines         41629    11835    -29794     
     Branches       4189      982     -3207     
   =============================================
   - Hits          18362     1864    -16498     
   + Misses        21638     9808    -11830     
   + Partials       1629      163     -1466     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.47%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <0.00%> (+50.14%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381037#comment-17381037
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * c0063ddcc875e3e13348861ebaf21ef47126a691 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=920)
 
   * cf901c664f7baaab834b3f02a819144b5558f952 UNKNOWN
   * 814e45c99f54bb11ed54263cc4077e0a08689b48 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=921)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add INSERT_OVERWRITE support to DeltaStreamer
> -
>
> Key: HUDI-1860
> URL: https://issues.apache.org/jira/browse/HUDI-1860
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As discussed in [this 
> RFC|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller],
>  having the full-fetch mode use insert_overwrite when writing to the sink would be 
> better, as it can handle schema changes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-14 Thread GitBox


hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * c0063ddcc875e3e13348861ebaf21ef47126a691 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=920)
 
   * cf901c664f7baaab834b3f02a819144b5558f952 UNKNOWN
   * 814e45c99f54bb11ed54263cc4077e0a08689b48 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=921)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381035#comment-17381035
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

codecov-commenter edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870526141


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3184](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (814e45c) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `41.27%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3184/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff             @@
   ##             master   #3184       +/-   ##
   ============================================
   - Coverage     44.10%   2.83%   -41.28%     
   + Complexity     5157      85     -5072     
   ============================================
     Files           936     284      -652     
     Lines         41629   11835    -29794     
     Branches       4189     982     -3207     
   ============================================
   - Hits          18362     335    -18027     
   + Misses        21638   11474    -10164     
   + Partials       1629      26     -1603     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.47%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <0.00%> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870526141


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3184](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (814e45c) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `41.27%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3184/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff             @@
   ##             master   #3184       +/-   ##
   ============================================
   - Coverage     44.10%   2.83%   -41.28%     
   + Complexity     5157      85     -5072     
   ============================================
     Files           936     284      -652     
     Lines         41629   11835    -29794     
     Branches       4189     982     -3207     
   ============================================
   - Hits          18362     335    -18027     
   + Misses        21638   11474    -10164     
   + Partials       1629      26     -1603     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.47%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <0.00%> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381034#comment-17381034
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

codecov-commenter edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870526141


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3184](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (cf901c6) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `41.27%`.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head cf901c6 differs from pull request most recent 
head 814e45c. Consider uploading reports for the commit 814e45c to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3184/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff             @@
   ##             master   #3184       +/-   ##
   ============================================
   - Coverage     44.10%   2.83%   -41.28%     
   + Complexity     5157      85     -5072     
   ============================================
     Files           936     284      -652     
     Lines         41629   11835    -29794     
     Branches       4189     982     -3207     
   ============================================
   - Hits          18362     335    -18027     
   + Misses        21638   11474    -10164     
   + Partials       1629      26     -1603     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.47%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <0.00%> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870526141


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3184](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (cf901c6) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `41.27%`.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head cf901c6 differs from pull request most recent 
head 814e45c. Consider uploading reports for the commit 814e45c to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3184/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff             @@
   ##             master   #3184       +/-   ##
   ============================================
   - Coverage     44.10%   2.83%   -41.28%     
   + Complexity     5157      85     -5072     
   ============================================
     Files           936     284      -652     
     Lines         41629   11835    -29794     
     Branches       4189     982     -3207     
   ============================================
   - Hits          18362     335    -18027     
   + Misses        21638   11474    -10164     
   + Partials       1629      26     -1603     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.47%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <0.00%> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3184?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3184/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381031#comment-17381031
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * c0063ddcc875e3e13348861ebaf21ef47126a691 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=920)
 
   * cf901c664f7baaab834b3f02a819144b5558f952 UNKNOWN
   * 814e45c99f54bb11ed54263cc4077e0a08689b48 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add INSERT_OVERWRITE support to DeltaStreamer
> -
>
> Key: HUDI-1860
> URL: https://issues.apache.org/jira/browse/HUDI-1860
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As discussed in [this 
> RFC|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller],
>  having the full-fetch mode use insert_overwrite when writing to the sink would be 
> better, as it can handle schema changes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-14 Thread GitBox


hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * c0063ddcc875e3e13348861ebaf21ef47126a691 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=920)
 
   * cf901c664f7baaab834b3f02a819144b5558f952 UNKNOWN
   * 814e45c99f54bb11ed54263cc4077e0a08689b48 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381030#comment-17381030
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * f15a539b1ea1ed7fb9a5b8a31cc9b88d68a6710f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=911)
 
   * c0063ddcc875e3e13348861ebaf21ef47126a691 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=920)
 
   * cf901c664f7baaab834b3f02a819144b5558f952 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add INSERT_OVERWRITE support to DeltaStreamer
> -
>
> Key: HUDI-1860
> URL: https://issues.apache.org/jira/browse/HUDI-1860
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As discussed in [this 
> RFC|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller],
>  having full fetch mode use insert_overwrite to write to the sink would be 
> better, as it can handle schema changes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-14 Thread GitBox


hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * f15a539b1ea1ed7fb9a5b8a31cc9b88d68a6710f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=911)
 
   * c0063ddcc875e3e13348861ebaf21ef47126a691 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=920)
 
   * cf901c664f7baaab834b3f02a819144b5558f952 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381027#comment-17381027
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

codecov-commenter edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878091821


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3259](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b4aa786) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **increase** coverage by `3.65%`.
   > The diff coverage is `38.88%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3259/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#3259  +/-   ##
   
   + Coverage 44.10%   47.76%   +3.65% 
   - Complexity 5157 5566 +409 
   
 Files   936  936  
 Lines 4162941653  +24 
 Branches   4189 4195   +6 
   
   + Hits  1836219897+1535 
   + Misses2163819987-1651 
   - Partials   1629 1769 +140 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.47% <ø> (+<0.01%)` | :arrow_up: |
   | hudicommon | `48.69% <ø> (ø)` | |
   | hudiflink | `59.68% <ø> (ø)` | |
   | hudihadoopmr | `52.02% <ø> (ø)` | |
   | hudisparkdatasource | `67.21% <ø> (ø)` | |
   | hudisync | `55.73% <ø> (ø)` | |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `58.96% <38.88%> (+49.84%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...org/apache/hudi/utilities/HoodieClusteringJob.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZUNsdXN0ZXJpbmdKb2IuamF2YQ==)
 | `51.06% <38.88%> (+51.06%)` | :arrow_up: |
   | 
[...e/hudi/client/heartbeat/HoodieHeartbeatClient.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9oZWFydGJlYXQvSG9vZGllSGVhcnRiZWF0Q2xpZW50LmphdmE=)
 | `69.15% <0.00%> (+0.93%)` | :arrow_up: |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `88.79% <0.00%> (+5.17%)` | :arrow_up: |
   | 
[...e/hudi/utilities/transform/ChainedTransformer.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9DaGFpbmVkVHJhbnNmb3JtZXIuamF2YQ==)
 | `100.00% <0.00%> (+11.11%)` | :arrow_up: |
   | 
[...g/apache/hudi/utilities/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlci5qYXZh)
 | `71.42% <0.00%> 

[GitHub] [hudi] vinothchandar commented on pull request #3155: [Do-No-Merge][WIP] Running TestCleaner tests repeatedly

2021-07-14 Thread GitBox


vinothchandar commented on pull request #3155:
URL: https://github.com/apache/hudi/pull/3155#issuecomment-880387121


   closing this for now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar closed pull request #3155: [Do-No-Merge][WIP] Running TestCleaner tests repeatedly

2021-07-14 Thread GitBox


vinothchandar closed pull request #3155:
URL: https://github.com/apache/hudi/pull/3155


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878091821


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3259](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b4aa786) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **increase** coverage by `3.65%`.
   > The diff coverage is `38.88%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3259/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#3259  +/-   ##
   
   + Coverage 44.10%   47.76%   +3.65% 
   - Complexity 5157 5566 +409 
   
 Files   936  936  
 Lines 4162941653  +24 
 Branches   4189 4195   +6 
   
   + Hits  1836219897+1535 
   + Misses2163819987-1651 
   - Partials   1629 1769 +140 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.47% <ø> (+<0.01%)` | :arrow_up: |
   | hudicommon | `48.69% <ø> (ø)` | |
   | hudiflink | `59.68% <ø> (ø)` | |
   | hudihadoopmr | `52.02% <ø> (ø)` | |
   | hudisparkdatasource | `67.21% <ø> (ø)` | |
   | hudisync | `55.73% <ø> (ø)` | |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `58.96% <38.88%> (+49.84%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...org/apache/hudi/utilities/HoodieClusteringJob.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZUNsdXN0ZXJpbmdKb2IuamF2YQ==)
 | `51.06% <38.88%> (+51.06%)` | :arrow_up: |
   | 
[...e/hudi/client/heartbeat/HoodieHeartbeatClient.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9oZWFydGJlYXQvSG9vZGllSGVhcnRiZWF0Q2xpZW50LmphdmE=)
 | `69.15% <0.00%> (+0.93%)` | :arrow_up: |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `88.79% <0.00%> (+5.17%)` | :arrow_up: |
   | 
[...e/hudi/utilities/transform/ChainedTransformer.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9DaGFpbmVkVHJhbnNmb3JtZXIuamF2YQ==)
 | `100.00% <0.00%> (+11.11%)` | :arrow_up: |
   | 
[...g/apache/hudi/utilities/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlci5qYXZh)
 | `71.42% <0.00%> (+14.28%)` | :arrow_up: |
   | 

[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381025#comment-17381025
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

codecov-commenter edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878091821






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the 
> --schedule config
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that there are too many steps when triggering a clustering, 
> and the instant time has to be copied and pasted from the log file manually, 
> so the process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode (or -m for short): 
> ||--mode||remarks||
> |execute|Execute a cluster plan at the given instant, which means 
> --instant-time is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> this plan at once using HoodieClusteringJob.
>  
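
A minimal usage sketch of the new mode, mirroring how the PR's test (quoted further down in this thread) drives HoodieClusteringJob: runningMode and basePath are the config fields the test sets, the path is a made-up placeholder, and doScheduleAndCluster() is the test-only hook shown in the PR's diff rather than a finalized public API. In practice the same flow is reached by submitting the job with --mode scheduleAndExecute.

```java
import org.apache.hudi.utilities.HoodieClusteringJob;
import org.apache.spark.api.java.JavaSparkContext;

class ScheduleAndExecuteClusteringSketch {
  // Builds a clustering plan and executes it in a single HoodieClusteringJob invocation.
  static int run(JavaSparkContext jsc) throws Exception {
    HoodieClusteringJob.Config cfg = new HoodieClusteringJob.Config();
    cfg.basePath = "/tmp/hudi/clustering_demo";   // placeholder table path
    cfg.runningMode = "scheduleAndExecute";       // new mode: schedule a plan, then run it
    HoodieClusteringJob job = new HoodieClusteringJob(jsc, cfg);
    return job.doScheduleAndCluster();            // 0 on success, -1 if no plan was produced
  }
}
```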



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878091821






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381022#comment-17381022
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * b4aa7869d8343a16b225a81844e907fbee63b576 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=919)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the 
> --schedule config
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that there are too many steps when triggering a clustering, 
> and the instant time has to be copied and pasted from the log file manually, 
> so the process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode (or -m for short): 
> ||--mode||remarks||
> |execute|Execute a cluster plan at the given instant, which means 
> --instant-time is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * b4aa7869d8343a16b225a81844e907fbee63b576 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=919)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381014#comment-17381014
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

codecov-commenter edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878091821


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3259](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b4aa786) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `16.80%`.
   > The diff coverage is `38.88%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3259/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3259   +/-   ##
   =
   - Coverage 44.10%   27.29%   -16.81% 
   + Complexity 5157 1292 -3865 
   =
 Files   936  386  -550 
 Lines 4162915367-26262 
 Branches   4189 1345 -2844 
   =
   - Hits  18362 4195-14167 
   + Misses2163810864-10774 
   + Partials   1629  308 -1321 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.91% <ø> (-13.56%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <ø> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `58.96% <38.88%> (+49.84%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...org/apache/hudi/utilities/HoodieClusteringJob.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZUNsdXN0ZXJpbmdKb2IuamF2YQ==)
 | `51.06% <38.88%> (+51.06%)` | :arrow_up: |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878091821


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3259](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b4aa786) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `16.80%`.
   > The diff coverage is `38.88%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3259/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3259   +/-   ##
   =
   - Coverage 44.10%   27.29%   -16.81% 
   + Complexity 5157 1292 -3865 
   =
 Files   936  386  -550 
 Lines 4162915367-26262 
 Branches   4189 1345 -2844 
   =
   - Hits  18362 4195-14167 
   + Misses2163810864-10774 
   + Partials   1629  308 -1321 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.91% <ø> (-13.56%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <ø> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `58.96% <38.88%> (+49.84%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...org/apache/hudi/utilities/HoodieClusteringJob.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZUNsdXN0ZXJpbmdKb2IuamF2YQ==)
 | `51.06% <38.88%> (+51.06%)` | :arrow_up: |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381011#comment-17381011
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * f15a539b1ea1ed7fb9a5b8a31cc9b88d68a6710f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=911)
 
   * c0063ddcc875e3e13348861ebaf21ef47126a691 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=920)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add INSERT_OVERWRITE support to DeltaStreamer
> -
>
> Key: HUDI-1860
> URL: https://issues.apache.org/jira/browse/HUDI-1860
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As discussed in [this 
> RFC|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller],
>  having full fetch mode use insert_overwrite to write to the sink would be 
> better, as it can handle schema changes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-14 Thread GitBox


hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * f15a539b1ea1ed7fb9a5b8a31cc9b88d68a6710f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=911)
 
   * c0063ddcc875e3e13348861ebaf21ef47126a691 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=920)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381009#comment-17381009
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * f15a539b1ea1ed7fb9a5b8a31cc9b88d68a6710f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=911)
 
   * c0063ddcc875e3e13348861ebaf21ef47126a691 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add INSERT_OVERWRITE support to DeltaStreamer
> -
>
> Key: HUDI-1860
> URL: https://issues.apache.org/jira/browse/HUDI-1860
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As discussed in [this 
> RFC|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller],
>  having full fetch mode use insert_overwrite to write to the sink would be 
> better, as it can handle schema changes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-14 Thread GitBox


hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * f15a539b1ea1ed7fb9a5b8a31cc9b88d68a6710f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=911)
 
   * c0063ddcc875e3e13348861ebaf21ef47126a691 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381007#comment-17381007
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

codecov-commenter edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878091821


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3259](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b4aa786) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `28.34%`.
   > The diff coverage is `38.88%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3259/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3259   +/-   ##
   =
   - Coverage 44.10%   15.76%   -28.35% 
   + Complexity 5157  493 -4664 
   =
 Files   936  284  -652 
 Lines 4162911859-29770 
 Branches   4189  988 -3201 
   =
   - Hits  18362 1869-16493 
   + Misses21638 9823-11815 
   + Partials   1629  167 -1462 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.47%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <ø> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `58.96% <38.88%> (+49.84%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...org/apache/hudi/utilities/HoodieClusteringJob.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZUNsdXN0ZXJpbmdKb2IuamF2YQ==)
 | `51.06% <38.88%> (+51.06%)` | :arrow_up: |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878091821


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3259](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b4aa786) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `28.34%`.
   > The diff coverage is `38.88%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3259/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3259   +/-   ##
   =
   - Coverage 44.10%   15.76%   -28.35% 
   + Complexity 5157  493 -4664 
   =
 Files   936  284  -652 
 Lines 4162911859-29770 
 Branches   4189  988 -3201 
   =
   - Hits  18362 1869-16493 
   + Misses21638 9823-11815 
   + Partials   1629  167 -1462 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.47%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <ø> (-50.88%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `58.96% <38.88%> (+49.84%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...org/apache/hudi/utilities/HoodieClusteringJob.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZUNsdXN0ZXJpbmdKb2IuamF2YQ==)
 | `51.06% <38.88%> (+51.06%)` | :arrow_up: |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381001#comment-17381001
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-880370120


   > @zhangyue19921010 hello, Which company are you from? Can we add wechat? My 
wechat is lw19900302
   
   It's my pleasure. I'm coming from freewheel :)
   
   Also all the changes are done. PTAL and thanks a lot for your review. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the 
> --schedule config
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that there are too many steps when triggering a clustering, 
> and the instant time has to be copied and pasted from the log file manually, 
> so the process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode (or -m for short): 
> ||--mode||remarks||
> |execute|Execute a cluster plan at the given instant, which means 
> --instant-time is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhangyue19921010 edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


zhangyue19921010 edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-880370120


   > @zhangyue19921010 hello, Which company are you from? Can we add wechat? My 
wechat is lw19900302
   
   It's my pleasure. I'm coming from freewheel :)
   
   Also all the changes are done. PTAL and thanks a lot for your review. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381000#comment-17381000
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-880370120


   > @zhangyue19921010 hello, Which company are you from? Can we add wechat? My 
wechat is lw19900302
   
   It's my pleasure. I'm coming from freewheel :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the 
> --schedule config
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that there are too many steps when triggering a clustering, 
> and the instant time has to be copied and pasted from the log file manually, 
> so the process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode (or -m for short): 
> ||--mode||remarks||
> |execute|Execute a cluster plan at the given instant, which means 
> --instant-time is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhangyue19921010 commented on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


zhangyue19921010 commented on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-880370120


   > @zhangyue19921010 hello, Which company are you from? Can we add wechat? My 
wechat is lw19900302
   
   It's my pleasure. I'm coming from freewheel :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380999#comment-17380999
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670111508



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -1059,6 +1059,50 @@ public void testHoodieAsyncClusteringJob() throws 
Exception {
 assertEquals(1, 
metaClient.getActiveTimeline().getCompletedReplaceTimeline().getInstants().toArray().length);
   }
 
+  @Test
+  public void testHoodieAsyncClusteringJobWithScheduleAndExecute() throws 
Exception {
+String tableBasePath = dfsBasePath + "/asyncClustering2";
+// Keep it higher than batch-size to test continuous mode
+int totalRecords = 3000;
+
+// Initial bulk insert
+HoodieDeltaStreamer.Config cfg = TestHelpers.makeConfig(tableBasePath, 
WriteOperationType.INSERT);
+cfg.continuousMode = true;
+cfg.tableType = HoodieTableType.COPY_ON_WRITE.name();
+cfg.configs.add(String.format("%s=%d", 
SourceConfigs.MAX_UNIQUE_RECORDS_PROP, totalRecords));
+cfg.configs.add(String.format("%s=false", 
HoodieCompactionConfig.AUTO_CLEAN_PROP.key()));
+cfg.configs.add(String.format("%s=true", 
HoodieClusteringConfig.ASYNC_CLUSTERING_ENABLE_OPT_KEY.key()));
+HoodieDeltaStreamer ds = new HoodieDeltaStreamer(cfg, jsc);
+deltaStreamerTestRunner(ds, cfg, (r) -> {
+  TestHelpers.assertAtLeastNCommits(2, tableBasePath, dfs);
+  HoodieClusteringJob.Config scheduleClusteringConfig = 
buildHoodieClusteringUtilConfig(tableBasePath,
+  null, true);
+  scheduleClusteringConfig.runningMode = "scheduleAndExecute";
+  HoodieClusteringJob scheduleClusteringJob = new HoodieClusteringJob(jsc, 
scheduleClusteringConfig);
+
+  try {
+int result = scheduleClusteringJob.doScheduleAndCluster();
+if (result == 0) {
+  LOG.info("Cluster success");
+} else {
+  LOG.warn("Import failed");
+  return false;
+}
+  } catch (Exception e) {
+LOG.warn("ScheduleAndExecute clustering failed", e);
+return false;
+  }
+
+  HoodieTableMetaClient metaClient = 
HoodieTableMetaClient.builder().setConf(this.dfs.getConf()).setBasePath(tableBasePath).setLoadActiveTimelineOnLoad(true).build();
+  int pendingReplaceSize = 
metaClient.getActiveTimeline().filterPendingReplaceTimeline().getInstants().toArray().length;

Review comment:
   Nice catching. Changed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the 
> --schedule config
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that there are too many steps when triggering a clustering, 
> and the instant time has to be copied and pasted from the log file manually, 
> so the process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode (or -m for short): 
> ||--mode||remarks||
> |execute|Execute a cluster plan at the given instant, which means 
> --instant-time is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380998#comment-17380998
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670111402



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -171,4 +200,38 @@ private int doCluster(JavaSparkContext jsc) throws 
Exception {
   return client.scheduleClustering(Option.empty());
 }
   }
+
+  @TestOnly
+  public int doScheduleAndCluster() throws Exception {
+return this.doScheduleAndCluster(jsc);
+  }
+
+  public int doScheduleAndCluster(JavaSparkContext jsc) throws Exception {
+LOG.info("Step 1: Do schedule");
+String schemaStr = getSchemaFromLatestInstant();
+try (SparkRDDWriteClient client = UtilHelpers.createHoodieClient(jsc, 
cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+
+  Option<String> instantTime;
+  if (cfg.clusteringInstantTime != null) {
+client.scheduleClusteringAtInstant(cfg.clusteringInstantTime, 
Option.empty());
+instantTime = Option.of(cfg.clusteringInstantTime);
+  } else {
+instantTime = client.scheduleClustering(Option.empty());
+  }
+
+  int result = instantTime.isPresent() ? 0 : -1;

Review comment:
   Actually, there are already doSchedule() and doCluster() 
functions. But if we let doScheduleAndCluster() call doSchedule() and 
doCluster() directly, it will start and stop the SparkRDDWriteClient twice, which 
is an expensive and unnecessary action. 
   
   It is probably better to let the schedule action and the cluster action share 
a common SparkRDDWriteClient.
   
   For example, the Timeline service would otherwise be started and stopped twice:
   ```
   21/07/15 11:05:11 INFO EmbeddedTimelineService: Starting Timeline service !!
   21/07/15 11:05:11 INFO EmbeddedTimelineService: Overriding hostIp to 
(localhost) found in spark-conf. It was null
   21/07/15 11:05:11 INFO FileSystemViewManager: Creating View Manager with 
storage type :MEMORY
   21/07/15 11:05:11 INFO FileSystemViewManager: Creating in-memory based Table 
View
   21/07/15 11:05:11 INFO log: Logging initialized @4500ms to 
org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
   21/07/15 11:05:11 INFO Javalin: 
  __  __ _
 / / _ _   __  _ / /(_)
__  / // __ `/| | / // __ `// // // __ \
   / /_/ // /_/ / | |/ // /_/ // // // / / /
   \/ \__,_/  |___/ \__,_//_//_//_/ /_/
   
   https://javalin.io/documentation
   
   21/07/15 11:05:11 INFO Javalin: Starting Javalin ...
   ```
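
   To make the trade-off concrete, here is a minimal sketch of the shared-client approach (illustration only, not the exact PR code; `client.cluster(instant, true)` is assumed here as the execute entry point, and the method reuses the surrounding class members `cfg`, `props` and `getSchemaFromLatestInstant()`):

```java
// Sketch only: schedule and execute clustering with one shared SparkRDDWriteClient,
// so the embedded timeline service is started and stopped a single time.
public int doScheduleAndClusterSketch(JavaSparkContext jsc) throws Exception {
  String schemaStr = getSchemaFromLatestInstant();
  try (SparkRDDWriteClient client = UtilHelpers.createHoodieClient(
      jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
    // Step 1: build the clustering plan with the shared client.
    Option<String> instantTime = client.scheduleClustering(Option.empty());
    if (!instantTime.isPresent()) {
      return -1; // nothing to cluster
    }
    // Step 2: execute the plan with the same client (assumed API).
    client.cluster(instantTime.get(), true);
    return 0;
  }
}
```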




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi can let users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m for short 
> ||--mode||remarks||
> |execute|Execute a cluster plan at a given instant, which means --instant-time 
> is needed here. Default value. |
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670111508



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -1059,6 +1059,50 @@ public void testHoodieAsyncClusteringJob() throws 
Exception {
 assertEquals(1, 
metaClient.getActiveTimeline().getCompletedReplaceTimeline().getInstants().toArray().length);
   }
 
+  @Test
+  public void testHoodieAsyncClusteringJobWithScheduleAndExecute() throws 
Exception {
+String tableBasePath = dfsBasePath + "/asyncClustering2";
+// Keep it higher than batch-size to test continuous mode
+int totalRecords = 3000;
+
+// Initial bulk insert
+HoodieDeltaStreamer.Config cfg = TestHelpers.makeConfig(tableBasePath, 
WriteOperationType.INSERT);
+cfg.continuousMode = true;
+cfg.tableType = HoodieTableType.COPY_ON_WRITE.name();
+cfg.configs.add(String.format("%s=%d", 
SourceConfigs.MAX_UNIQUE_RECORDS_PROP, totalRecords));
+cfg.configs.add(String.format("%s=false", 
HoodieCompactionConfig.AUTO_CLEAN_PROP.key()));
+cfg.configs.add(String.format("%s=true", 
HoodieClusteringConfig.ASYNC_CLUSTERING_ENABLE_OPT_KEY.key()));
+HoodieDeltaStreamer ds = new HoodieDeltaStreamer(cfg, jsc);
+deltaStreamerTestRunner(ds, cfg, (r) -> {
+  TestHelpers.assertAtLeastNCommits(2, tableBasePath, dfs);
+  HoodieClusteringJob.Config scheduleClusteringConfig = 
buildHoodieClusteringUtilConfig(tableBasePath,
+  null, true);
+  scheduleClusteringConfig.runningMode = "scheduleAndExecute";
+  HoodieClusteringJob scheduleClusteringJob = new HoodieClusteringJob(jsc, 
scheduleClusteringConfig);
+
+  try {
+int result = scheduleClusteringJob.doScheduleAndCluster();
+if (result == 0) {
+  LOG.info("Cluster success");
+} else {
+  LOG.warn("Import failed");
+  return false;
+}
+  } catch (Exception e) {
+LOG.warn("ScheduleAndExecute clustering failed", e);
+return false;
+  }
+
+  HoodieTableMetaClient metaClient = 
HoodieTableMetaClient.builder().setConf(this.dfs.getConf()).setBasePath(tableBasePath).setLoadActiveTimelineOnLoad(true).build();
+  int pendingReplaceSize = 
metaClient.getActiveTimeline().filterPendingReplaceTimeline().getInstants().toArray().length;

Review comment:
   Nice catch. Changed.
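
   For context, the assertion under discussion counts pending replace instants via `toArray().length`. The exact change applied in the PR is not quoted in this thread; a plausible tidier form, assuming `HoodieTimeline#countInstants()` is available, would be:

```java
// Sketch only: count pending replace instants without materializing an array.
// countInstants() is an assumed convenience method on the timeline.
HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
    .setConf(this.dfs.getConf())
    .setBasePath(tableBasePath)
    .setLoadActiveTimelineOnLoad(true)
    .build();
int pendingReplaceSize = metaClient.getActiveTimeline()
    .filterPendingReplaceTimeline()
    .countInstants();
```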




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670111402



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -171,4 +200,38 @@ private int doCluster(JavaSparkContext jsc) throws 
Exception {
   return client.scheduleClustering(Option.empty());
 }
   }
+
+  @TestOnly
+  public int doScheduleAndCluster() throws Exception {
+return this.doScheduleAndCluster(jsc);
+  }
+
+  public int doScheduleAndCluster(JavaSparkContext jsc) throws Exception {
+LOG.info("Step 1: Do schedule");
+String schemaStr = getSchemaFromLatestInstant();
+try (SparkRDDWriteClient client = UtilHelpers.createHoodieClient(jsc, 
cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+
+  Option instantTime;
+  if (cfg.clusteringInstantTime != null) {
+client.scheduleClusteringAtInstant(cfg.clusteringInstantTime, 
Option.empty());
+instantTime = Option.of(cfg.clusteringInstantTime);
+  } else {
+instantTime = client.scheduleClustering(Option.empty());
+  }
+
+  int result = instantTime.isPresent() ? 0 : -1;

Review comment:
   Actually, there are already doSchedule() and doCluster() 
functions. But if we let doScheduleAndCluster() call doSchedule() and 
doCluster() directly, it will start and stop the SparkRDDWriteClient twice, which 
is an expensive and unnecessary action. 
   
   It is probably better to let the schedule action and the cluster action share 
a common SparkRDDWriteClient.
   
   For example, the Timeline service would otherwise be started and stopped twice:
   ```
   21/07/15 11:05:11 INFO EmbeddedTimelineService: Starting Timeline service !!
   21/07/15 11:05:11 INFO EmbeddedTimelineService: Overriding hostIp to 
(localhost) found in spark-conf. It was null
   21/07/15 11:05:11 INFO FileSystemViewManager: Creating View Manager with 
storage type :MEMORY
   21/07/15 11:05:11 INFO FileSystemViewManager: Creating in-memory based Table 
View
   21/07/15 11:05:11 INFO log: Logging initialized @4500ms to 
org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
   21/07/15 11:05:11 INFO Javalin: 
  __  __ _
 / / _ _   __  _ / /(_)
__  / // __ `/| | / // __ `// // // __ \
   / /_/ // /_/ / | |/ // /_/ // // // / / /
   \/ \__,_/  |___/ \__,_//_//_//_/ /_/
   
   https://javalin.io/documentation
   
   21/07/15 11:05:11 INFO Javalin: Starting Javalin ...
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380997#comment-17380997
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

codecov-commenter edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878091821


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3259](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b4aa786) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `0.00%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3259/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff @@
   ## master   #3259  +/-   ##
   ===
   - Coverage  2.83%   2.82%   -0.01% 
 Complexity   85  85  
   ===
 Files   284 284  
 Lines 11835   11859  +24 
 Branches982 988   +6 
   ===
 Hits335 335  
   - Misses11474   11498  +24 
 Partials 26  26  
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudiclient | `0.00% <ø> (ø)` | |
   | hudisync | `4.85% <ø> (ø)` | |
   | hudiutilities | `9.04% <0.00%> (-0.08%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...org/apache/hudi/utilities/HoodieClusteringJob.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZUNsdXN0ZXJpbmdKb2IuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=footer_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
 Last update 
[d024439...b4aa786](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=lastupdated_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
 Read the [comment 
docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi can let users submit a 

[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380996#comment-17380996
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670110077



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -171,4 +200,38 @@ private int doCluster(JavaSparkContext jsc) throws 
Exception {
   return client.scheduleClustering(Option.empty());
 }
   }
+
+  @TestOnly
+  public int doScheduleAndCluster() throws Exception {

Review comment:
   Sure thing. Changed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi can let users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m for short 
> ||--mode||remarks||
> |execute|Execute a cluster plan at a given instant, which means --instant-time 
> is needed here. Default value. |
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878091821


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3259](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b4aa786) into 
[master](https://codecov.io/gh/apache/hudi/commit/d024439764ceeca6366cb33689b729a1c69a6272?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d024439) will **decrease** coverage by `0.00%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3259/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff @@
   ## master   #3259  +/-   ##
   ===
   - Coverage  2.83%   2.82%   -0.01% 
 Complexity   85  85  
   ===
 Files   284 284  
 Lines 11835   11859  +24 
 Branches982 988   +6 
   ===
 Hits335 335  
   - Misses11474   11498  +24 
 Partials 26  26  
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudiclient | `0.00% <ø> (ø)` | |
   | hudisync | `4.85% <ø> (ø)` | |
   | hudiutilities | `9.04% <0.00%> (-0.08%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...org/apache/hudi/utilities/HoodieClusteringJob.java](https://codecov.io/gh/apache/hudi/pull/3259/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZUNsdXN0ZXJpbmdKb2IuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=footer_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
 Last update 
[d024439...b4aa786](https://codecov.io/gh/apache/hudi/pull/3259?src=pr=lastupdated_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
 Read the [comment 
docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670110077



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -171,4 +200,38 @@ private int doCluster(JavaSparkContext jsc) throws 
Exception {
   return client.scheduleClustering(Option.empty());
 }
   }
+
+  @TestOnly
+  public int doScheduleAndCluster() throws Exception {

Review comment:
   Sure thing. Changed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380995#comment-17380995
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670109986



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -121,17 +141,26 @@ public static void main(String[] args) {
   public int cluster(int retry) {
 this.fs = FSUtils.getFs(cfg.basePath, jsc.hadoopConfiguration());
 int ret = UtilHelpers.retry(retry, () -> {
-  if (cfg.runSchedule) {
-LOG.info("Do schedule");
-Option instantTime = doSchedule(jsc);
-int result = instantTime.isPresent() ? 0 : -1;
-if (result == 0) {
-  LOG.info("The schedule instant time is " + instantTime.get());
+  String runningMode = cfg.runningMode == null ? "" : 
cfg.runningMode.toLowerCase();
+  switch (runningMode) {
+case SCHEDULE: {
+  LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+  Option instantTime = doSchedule(jsc);
+  int result = instantTime.isPresent() ? 0 : -1;
+  if (result == 0) {
+LOG.info("The schedule instant time is " + instantTime.get());
+  }
+  return result;
+}
+case SCHEDULE_AND_EXECUTE: {
+  LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+  return doScheduleAndCluster(jsc);
+}
+case EXECUTE:
+default: {
+  LOG.info("Running Mode: [" + EXECUTE + "]; Do cluster");

Review comment:
   Nice catch. I changed the default behavior to `LOG.info("Unsupported 
running mode [" + runningMode + "], quit the job directly");` in case users set 
a wrong value for --mode, such as `--mode abcd`.
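
   A rough sketch of what the revised switch could look like under that behavior (illustration only, not the exact PR code; the -1 return code for an invalid mode is an assumption):

```java
// Sketch only: reject unsupported --mode values instead of silently
// falling through to the execute path.
switch (runningMode) {
  case SCHEDULE: {
    Option<String> instantTime = doSchedule(jsc);
    return instantTime.isPresent() ? 0 : -1;
  }
  case SCHEDULE_AND_EXECUTE:
    return doScheduleAndCluster(jsc);
  case EXECUTE:
    return doCluster(jsc);
  default:
    LOG.info("Unsupported running mode [" + runningMode + "], quit the job directly");
    return -1; // assumed exit code for an invalid mode
}
```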




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi can let users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m for short 
> ||--mode||remarks||
> |execute|Execute a cluster plan at a given instant, which means --instant-time 
> is needed here. Default value. |
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670109986



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -121,17 +141,26 @@ public static void main(String[] args) {
   public int cluster(int retry) {
 this.fs = FSUtils.getFs(cfg.basePath, jsc.hadoopConfiguration());
 int ret = UtilHelpers.retry(retry, () -> {
-  if (cfg.runSchedule) {
-LOG.info("Do schedule");
-Option instantTime = doSchedule(jsc);
-int result = instantTime.isPresent() ? 0 : -1;
-if (result == 0) {
-  LOG.info("The schedule instant time is " + instantTime.get());
+  String runningMode = cfg.runningMode == null ? "" : 
cfg.runningMode.toLowerCase();
+  switch (runningMode) {
+case SCHEDULE: {
+  LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+  Option instantTime = doSchedule(jsc);
+  int result = instantTime.isPresent() ? 0 : -1;
+  if (result == 0) {
+LOG.info("The schedule instant time is " + instantTime.get());
+  }
+  return result;
+}
+case SCHEDULE_AND_EXECUTE: {
+  LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+  return doScheduleAndCluster(jsc);
+}
+case EXECUTE:
+default: {
+  LOG.info("Running Mode: [" + EXECUTE + "]; Do cluster");

Review comment:
   Nice catch. I changed the default behavior to `LOG.info("Unsupported 
running mode [" + runningMode + "], quit the job directly");` in case users set 
a wrong value for --mode, such as `--mode abcd`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380994#comment-17380994
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670109400



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -121,17 +141,26 @@ public static void main(String[] args) {
   public int cluster(int retry) {
 this.fs = FSUtils.getFs(cfg.basePath, jsc.hadoopConfiguration());
 int ret = UtilHelpers.retry(retry, () -> {
-  if (cfg.runSchedule) {
-LOG.info("Do schedule");
-Option instantTime = doSchedule(jsc);
-int result = instantTime.isPresent() ? 0 : -1;
-if (result == 0) {
-  LOG.info("The schedule instant time is " + instantTime.get());
+  String runningMode = cfg.runningMode == null ? "" : 
cfg.runningMode.toLowerCase();

Review comment:
   When developers call `public int cluster(int retry)` internally, like 
https://github.com/apache/hudi/blob/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52/hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java#L1069
   they may not set the running-mode config, so we need to check this value to avoid 
an NPE.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi can let users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m for short 
> ||--mode||remarks||
> |execute|Execute a cluster plan at a given instant, which means --instant-time 
> is needed here. Default value. |
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670109400



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -121,17 +141,26 @@ public static void main(String[] args) {
   public int cluster(int retry) {
 this.fs = FSUtils.getFs(cfg.basePath, jsc.hadoopConfiguration());
 int ret = UtilHelpers.retry(retry, () -> {
-  if (cfg.runSchedule) {
-LOG.info("Do schedule");
-Option instantTime = doSchedule(jsc);
-int result = instantTime.isPresent() ? 0 : -1;
-if (result == 0) {
-  LOG.info("The schedule instant time is " + instantTime.get());
+  String runningMode = cfg.runningMode == null ? "" : 
cfg.runningMode.toLowerCase();

Review comment:
   When developers call `public int cluster(int retry)` internally, like 
https://github.com/apache/hudi/blob/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52/hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java#L1069
   they may not set the running-mode config, so we need to check this value to avoid 
an NPE.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380993#comment-17380993
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875)
 
   * b4aa7869d8343a16b225a81844e907fbee63b576 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=919)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi can let users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m for short 
> ||--mode||remarks||
> |execute|Execute a cluster plan at a given instant, which means --instant-time 
> is needed here. Default value. |
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875)
 
   * b4aa7869d8343a16b225a81844e907fbee63b576 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=919)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380992#comment-17380992
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875)
 
   * b4aa7869d8343a16b225a81844e907fbee63b576 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi can let users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m for short 
> ||--mode||remarks||
> |execute|Execute a cluster plan at a given instant, which means --instant-time 
> is needed here. Default value. |
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-14 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875)
 
   * b4aa7869d8343a16b225a81844e907fbee63b576 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2059) When log exists in mor table, clustering is triggered. The query result shows that the update record in log is lost

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380986#comment-17380986
 ] 

ASF GitHub Bot commented on HUDI-2059:
--

xiarixiaoyao closed pull request #3181:
URL: https://github.com/apache/hudi/pull/3181


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> When log exists in mor table,  clustering is triggered. The query result 
> shows that the update record in log is lost
> 
>
> Key: HUDI-2059
> URL: https://issues.apache.org/jira/browse/HUDI-2059
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.8.0
> Environment: hadoop 3.1.1
> spark3.1.1/spark2.4.5
> hive3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> When a log file exists in a MOR table and clustering is triggered, the query 
> result shows that the update record in the log is lost.
> The reason for this problem is that Hudi uses HoodieFileSliceReader to read 
> table data and then do clustering. HoodieFileSliceReader calls 
> HoodieMergedLogRecordScanner.processNextRecord to merge the update values with 
> the old values; when that function is called, the old values are kept and the 
> update values are discarded, which is wrong.
> Test steps:
> // step1 : create hudi mor table
> val df = spark.range(0, 1000).toDF("keyid")
>  .withColumn("col3", expr("keyid"))
>  .withColumn("age", lit(1))
>  .withColumn("p", lit(2))
> df.write.format("hudi").
>  option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL).
>  option(PRECOMBINE_FIELD_OPT_KEY, "col3").
>  option(RECORDKEY_FIELD_OPT_KEY, "keyid").
>  option(PARTITIONPATH_FIELD_OPT_KEY, "p").
>  option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert").
>  option(HoodieWriteConfig.KEYGENERATOR_CLASS_PROP, 
> classOf[org.apache.hudi.keygen.ComplexKeyGenerator].getName).
>  option("hoodie.insert.shuffle.parallelism", "4").
>  option("hoodie.upsert.shuffle.parallelism", "4").
>  option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
>  .mode(SaveMode.Overwrite).save(basePath)
> // step2, update age where keyid < 5 to produce log files
> val df1 = spark.range(0, 5).toDF("keyid")
>  .withColumn("col3", expr("keyid"))
>  .withColumn("age", lit(1 + 1000))
>  .withColumn("p", lit(2))
> df1.write.format("hudi").
>  option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL).
>  option(PRECOMBINE_FIELD_OPT_KEY, "col3").
>  option(RECORDKEY_FIELD_OPT_KEY, "keyid").
>  option(PARTITIONPATH_FIELD_OPT_KEY, "p").
>  option(DataSourceWriteOptions.OPERATION_OPT_KEY, "upsert").
>  option(HoodieWriteConfig.KEYGENERATOR_CLASS_PROP, 
> classOf[org.apache.hudi.keygen.ComplexKeyGenerator].getName).
>  option("hoodie.insert.shuffle.parallelism", "4").
>  option("hoodie.upsert.shuffle.parallelism", "4").
>  option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
>  .mode(SaveMode.Append).save(basePath)
> // step3, do cluster inline
> val df2 = spark.range(6, 10).toDF("keyid")
>  .withColumn("col3", expr("keyid"))
>  .withColumn("age", lit(1 + 2000))
>  .withColumn("p", lit(2))
> df2.write.format("hudi").
>  option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL).
>  option(PRECOMBINE_FIELD_OPT_KEY, "col3").
>  option(RECORDKEY_FIELD_OPT_KEY, "keyid").
>  option(PARTITIONPATH_FIELD_OPT_KEY, "p").
>  option(DataSourceWriteOptions.OPERATION_OPT_KEY, "upsert").
>  option(HoodieWriteConfig.KEYGENERATOR_CLASS_PROP, 
> classOf[org.apache.hudi.keygen.ComplexKeyGenerator].getName).
>  option("hoodie.insert.shuffle.parallelism", "4").
>  option("hoodie.upsert.shuffle.parallelism", "4").
>  option("hoodie.parquet.small.file.limit", "0").
>  option("hoodie.clustering.inline", "true").
>  option("hoodie.clustering.inline.max.commits", "1").
>  option("hoodie.clustering.plan.strategy.target.file.max.bytes", 
> "1073741824").
>  option("hoodie.clustering.plan.strategy.small.file.limit", "629145600").
>  option("hoodie.clustering.plan.strategy.max.bytes.per.group", 
> Long.MaxValue.toString)
>  .option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
>  .mode(SaveMode.Append).save(basePath)
> spark.read.format("hudi")
>  .load(basePath).select("age").where("keyid = 0").show(100, false)
> +---+
> |age|
> +---+
> |1 |
> +---+
> The result is wrong, since we updated the value of age to 1001 in step 2.
>  
>  
>  
>  
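
To make the described merge bug concrete, here is a minimal, hedged sketch of the intended semantics (illustration only, not the actual Hudi fix, which ultimately landed via HUDI-2170): when two records share a key, the one with the larger precombine/ordering value ("col3" in the example above) should win, so the update from the log survives clustering.

```java
import org.apache.avro.generic.GenericRecord;

// Sketch only: pick the record whose ordering (precombine) value is latest,
// so a log update is not discarded in favor of the stale base-file value.
final class MergeSketch {
  @SuppressWarnings({"unchecked", "rawtypes"})
  static GenericRecord pickLatest(GenericRecord oldValue, GenericRecord updateValue,
                                  String orderingField) {
    Comparable oldOrdering = (Comparable) oldValue.get(orderingField);
    Comparable newOrdering = (Comparable) updateValue.get(orderingField);
    // keep the update when its ordering value is not smaller than the old one
    return newOrdering.compareTo(oldOrdering) >= 0 ? updateValue : oldValue;
  }
}
```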



--
This message was sent by Atlassian Jira

[jira] [Commented] (HUDI-2059) When log exists in mor table, clustering is triggered. The query result shows that the update record in log is lost

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380985#comment-17380985
 ] 

ASF GitHub Bot commented on HUDI-2059:
--

xiarixiaoyao commented on pull request #3181:
URL: https://github.com/apache/hudi/pull/3181#issuecomment-880359721


   @garyli1019 thanks for your review. Closing this PR, since HUDI-2170 
solved this problem. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> When log exists in mor table,  clustering is triggered. The query result 
> shows that the update record in log is lost
> 
>
> Key: HUDI-2059
> URL: https://issues.apache.org/jira/browse/HUDI-2059
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.8.0
> Environment: hadoop 3.1.1
> spark3.1.1/spark2.4.5
> hive3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> When a log file exists in a MOR table and clustering is triggered, the query 
> result shows that the update record in the log is lost.
> The reason for this problem is that Hudi uses HoodieFileSliceReader to read 
> table data and then do clustering. HoodieFileSliceReader calls 
> HoodieMergedLogRecordScanner.processNextRecord to merge the update values with 
> the old values; when that function is called, the old values are kept and the 
> update values are discarded, which is wrong.
> Test steps:
> // step1 : create hudi mor table
> val df = spark.range(0, 1000).toDF("keyid")
>  .withColumn("col3", expr("keyid"))
>  .withColumn("age", lit(1))
>  .withColumn("p", lit(2))
> df.write.format("hudi").
>  option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL).
>  option(PRECOMBINE_FIELD_OPT_KEY, "col3").
>  option(RECORDKEY_FIELD_OPT_KEY, "keyid").
>  option(PARTITIONPATH_FIELD_OPT_KEY, "p").
>  option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert").
>  option(HoodieWriteConfig.KEYGENERATOR_CLASS_PROP, 
> classOf[org.apache.hudi.keygen.ComplexKeyGenerator].getName).
>  option("hoodie.insert.shuffle.parallelism", "4").
>  option("hoodie.upsert.shuffle.parallelism", "4").
>  option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
>  .mode(SaveMode.Overwrite).save(basePath)
> // step2, update age where keyid < 5 to produce log files
> val df1 = spark.range(0, 5).toDF("keyid")
>  .withColumn("col3", expr("keyid"))
>  .withColumn("age", lit(1 + 1000))
>  .withColumn("p", lit(2))
> df1.write.format("hudi").
>  option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL).
>  option(PRECOMBINE_FIELD_OPT_KEY, "col3").
>  option(RECORDKEY_FIELD_OPT_KEY, "keyid").
>  option(PARTITIONPATH_FIELD_OPT_KEY, "p").
>  option(DataSourceWriteOptions.OPERATION_OPT_KEY, "upsert").
>  option(HoodieWriteConfig.KEYGENERATOR_CLASS_PROP, 
> classOf[org.apache.hudi.keygen.ComplexKeyGenerator].getName).
>  option("hoodie.insert.shuffle.parallelism", "4").
>  option("hoodie.upsert.shuffle.parallelism", "4").
>  option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
>  .mode(SaveMode.Append).save(basePath)
> // step3, do cluster inline
> val df2 = spark.range(6, 10).toDF("keyid")
>  .withColumn("col3", expr("keyid"))
>  .withColumn("age", lit(1 + 2000))
>  .withColumn("p", lit(2))
> df2.write.format("hudi").
>  option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL).
>  option(PRECOMBINE_FIELD_OPT_KEY, "col3").
>  option(RECORDKEY_FIELD_OPT_KEY, "keyid").
>  option(PARTITIONPATH_FIELD_OPT_KEY, "p").
>  option(DataSourceWriteOptions.OPERATION_OPT_KEY, "upsert").
>  option(HoodieWriteConfig.KEYGENERATOR_CLASS_PROP, 
> classOf[org.apache.hudi.keygen.ComplexKeyGenerator].getName).
>  option("hoodie.insert.shuffle.parallelism", "4").
>  option("hoodie.upsert.shuffle.parallelism", "4").
>  option("hoodie.parquet.small.file.limit", "0").
>  option("hoodie.clustering.inline", "true").
>  option("hoodie.clustering.inline.max.commits", "1").
>  option("hoodie.clustering.plan.strategy.target.file.max.bytes", 
> "1073741824").
>  option("hoodie.clustering.plan.strategy.small.file.limit", "629145600").
>  option("hoodie.clustering.plan.strategy.max.bytes.per.group", 
> Long.MaxValue.toString)
>  .option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
>  .mode(SaveMode.Append).save(basePath)
> spark.read.format("hudi")
>  .load(basePath).select("age").where("keyid = 0").show(100, false)
> +---+
> |age|
> +---+
> |1 |
> +---+
> the result is 

[GitHub] [hudi] xiarixiaoyao closed pull request #3181: [HUDI-2059] When log exists in mor table, clustering is triggered. The query result shows that the update record in log is lost

2021-07-14 Thread GitBox


xiarixiaoyao closed pull request #3181:
URL: https://github.com/apache/hudi/pull/3181


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xiarixiaoyao commented on pull request #3181: [HUDI-2059] When log exists in mor table, clustering is triggered. The query result shows that the update record in log is lost

2021-07-14 Thread GitBox


xiarixiaoyao commented on pull request #3181:
URL: https://github.com/apache/hudi/pull/3181#issuecomment-880359721


   @garyli1019 thanks for your review. Closing this PR, since HUDI-2170 
solved this problem. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2139) MergeInto MOR Table May Result InCorrect Result

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380982#comment-17380982
 ] 

ASF GitHub Bot commented on HUDI-2139:
--

pengzhiwei2018 commented on a change in pull request #3230:
URL: https://github.com/apache/hudi/pull/3230#discussion_r670098428



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##
@@ -189,8 +191,14 @@ protected boolean isUpdateRecord(HoodieRecord 
hoodieRecord) {
   private Option getIndexedRecord(HoodieRecord hoodieRecord) 
{
 Option> recordMetadata = 
hoodieRecord.getData().getMetadata();
 try {
-  Option avroRecord = 
hoodieRecord.getData().getInsertValue(tableSchema,
-  config.getProps());
+  // Pass the isUpdateRecord to the props for HoodieRecordPayload to judge
+  // Whether it is a update or insert record.
+  boolean isUpdateRecord = isUpdateRecord(hoodieRecord);

Review comment:
   Here I just pass the `isUpdateRecord` flag to the `ExpressionPayload`, 
so it can know whether the current record is a matched or a not-matched record. A 
matched record will execute the matched-clause in merge-into, while a 
not-matched record will execute the not-matched-clause. Without this 
information, the result of merge-into would be incorrect.
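
   A minimal sketch of how such a flag could be threaded through to the payload (illustration only; the property key below is a hypothetical name, not the actual constant used in the PR, and the fragment reuses the surrounding handle's `hoodieRecord`, `config` and `tableSchema`):

```java
// Sketch only: tag the record as matched/not-matched via the payload properties
// so ExpressionPayload can choose between the matched and not-matched clauses.
boolean isUpdateRecord = isUpdateRecord(hoodieRecord);
Properties payloadProps = new Properties();
payloadProps.putAll(config.getProps());
// hypothetical key; the real PR may use a different constant
payloadProps.put("hoodie.payload.is.update.record", String.valueOf(isUpdateRecord));
Option<IndexedRecord> avroRecord =
    hoodieRecord.getData().getInsertValue(tableSchema, payloadProps);
```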

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala
##
@@ -126,48 +140,62 @@ class ExpressionPayload(record: GenericRecord,
 }
   }
 
+  /**
+   * Process the not-matched record. Test if the record matched any of 
insert-conditions,
+   * if matched then return the result of insert-assignment. Or else return a
+   * {@link HoodieWriteHandle.IGNORE_RECORD} which will be ignored by 
HoodieWriteHandle.
+   *
+   * @param inputRecord The input record to process.
+   * @param properties  The properties.
+   * @return The result of the record to insert.
+   */
+  private def processNotMatchedRecord(inputRecord: SqlTypedRecord, properties: 
Properties): HOption[IndexedRecord] = {
+val insertConditionAndAssignmentsText =
+  
properties.get(ExpressionPayload.PAYLOAD_INSERT_CONDITION_AND_ASSIGNMENTS)
+// Get the evaluator for each condition and insert assignment.
+initWriteSchemaIfNeed(properties)
+val insertConditionAndAssignments =
+  
ExpressionPayload.getEvaluator(insertConditionAndAssignmentsText.toString, 
writeSchema)
+var resultRecordOpt: HOption[IndexedRecord] = null
+for ((conditionEvaluator, assignmentEvaluator) <- 
insertConditionAndAssignments
+ if resultRecordOpt == null) {
+  val conditionVal = evaluate(conditionEvaluator, 
inputRecord).head.asInstanceOf[Boolean]
+  // If matched the insert condition then execute the assignment 
expressions to compute the
+  // result record. We will return the first matched record.
+  if (conditionVal) {
+val results = evaluate(assignmentEvaluator, inputRecord)
+resultRecordOpt = HOption.of(convertToRecord(results, writeSchema))
+  }
+}
+if (resultRecordOpt != null) {
+  resultRecordOpt
+} else {
+  // If there is no condition matched, just filter this record.
+  // Here we return a IGNORE_RECORD, HoodieCreateHandle will not handle it.
+  HOption.of(HoodieWriteHandle.IGNORE_RECORD)
+}
+  }
+
   override def getInsertValue(schema: Schema, properties: Properties): 
HOption[IndexedRecord] = {
 val incomingRecord = bytesToAvro(recordBytes, schema)
 if (isDeleteRecord(incomingRecord)) {
   HOption.empty[IndexedRecord]()
 } else {
-  val insertConditionAndAssignmentsText =
-
properties.get(ExpressionPayload.PAYLOAD_INSERT_CONDITION_AND_ASSIGNMENTS)
-  // Process insert
   val sqlTypedRecord = new SqlTypedRecord(incomingRecord)
-  // Get the evaluator for each condition and insert assignment.
-  initWriteSchemaIfNeed(properties)
-  val insertConditionAndAssignments =
-
ExpressionPayload.getEvaluator(insertConditionAndAssignmentsText.toString, 
writeSchema)
-  var resultRecordOpt: HOption[IndexedRecord] = null
-  for ((conditionEvaluator, assignmentEvaluator) <- 
insertConditionAndAssignments
-   if resultRecordOpt == null) {
-val conditionVal = evaluate(conditionEvaluator, 
sqlTypedRecord).head.asInstanceOf[Boolean]
-// If matched the insert condition then execute the assignment 
expressions to compute the
-// result record. We will return the first matched record.
-if (conditionVal) {
-  val results = evaluate(assignmentEvaluator, sqlTypedRecord)
-  resultRecordOpt = HOption.of(convertToRecord(results, writeSchema))
-}
-  }
-
-  // Process delete for MOR
-  if (resultRecordOpt == null && isMORTable(properties)) {
-val deleteConditionText = 

[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3230: [HUDI-2139] MergeInto MOR Table May Result InCorrect Result

2021-07-14 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3230:
URL: https://github.com/apache/hudi/pull/3230#discussion_r670098428



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##
@@ -189,8 +191,14 @@ protected boolean isUpdateRecord(HoodieRecord 
hoodieRecord) {
   private Option getIndexedRecord(HoodieRecord hoodieRecord) 
{
 Option> recordMetadata = 
hoodieRecord.getData().getMetadata();
 try {
-  Option avroRecord = 
hoodieRecord.getData().getInsertValue(tableSchema,
-  config.getProps());
+  // Pass the isUpdateRecord to the props for HoodieRecordPayload to judge
+  // Whether it is a update or insert record.
+  boolean isUpdateRecord = isUpdateRecord(hoodieRecord);

Review comment:
   Here I just pass the `isUpdateRecord` flag to the `ExpressionPayload`,
so it can know whether the current record is a matched or a not-matched record.
A matched record will execute the matched clause of MERGE INTO, while a
not-matched record will execute the not-matched clause. Without this
information, the result of MERGE INTO would be incorrect.
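
   A minimal sketch (editor's illustration, not the actual code in this PR) of the idea
described above: the write handle records whether the incoming record matched an
existing record by putting a flag into the payload properties, and the payload reads
that flag to decide which MERGE INTO clause to evaluate. The property key and class
names below are hypothetical.

```java
import java.util.Properties;

public class MatchedFlagSketch {

  // Hypothetical property key; the key actually used by the PR may differ.
  static final String IS_UPDATE_RECORD_KEY = "hoodie.payload.is.update.record";

  // Write-handle side: mark whether the incoming record matched an existing record.
  static Properties withMatchedFlag(Properties base, boolean isUpdateRecord) {
    Properties props = new Properties();
    props.putAll(base);
    props.setProperty(IS_UPDATE_RECORD_KEY, String.valueOf(isUpdateRecord));
    return props;
  }

  // Payload side: pick which set of MERGE INTO assignments to evaluate.
  static String chooseClause(Properties props) {
    boolean matched = Boolean.parseBoolean(props.getProperty(IS_UPDATE_RECORD_KEY, "false"));
    return matched ? "matched clause (UPDATE/DELETE assignments)"
                   : "not-matched clause (INSERT assignments)";
  }

  public static void main(String[] args) {
    Properties props = withMatchedFlag(new Properties(), true);
    System.out.println(chooseClause(props)); // prints the matched clause choice
  }
}
```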

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala
##
@@ -126,48 +140,62 @@ class ExpressionPayload(record: GenericRecord,
 }
   }
 
+  /**
+   * Process the not-matched record. Test if the record matched any of 
insert-conditions,
+   * if matched then return the result of insert-assignment. Or else return a
+   * {@link HoodieWriteHandle.IGNORE_RECORD} which will be ignored by 
HoodieWriteHandle.
+   *
+   * @param inputRecord The input record to process.
+   * @param properties  The properties.
+   * @return The result of the record to insert.
+   */
+  private def processNotMatchedRecord(inputRecord: SqlTypedRecord, properties: 
Properties): HOption[IndexedRecord] = {
+val insertConditionAndAssignmentsText =
+  
properties.get(ExpressionPayload.PAYLOAD_INSERT_CONDITION_AND_ASSIGNMENTS)
+// Get the evaluator for each condition and insert assignment.
+initWriteSchemaIfNeed(properties)
+val insertConditionAndAssignments =
+  
ExpressionPayload.getEvaluator(insertConditionAndAssignmentsText.toString, 
writeSchema)
+var resultRecordOpt: HOption[IndexedRecord] = null
+for ((conditionEvaluator, assignmentEvaluator) <- 
insertConditionAndAssignments
+ if resultRecordOpt == null) {
+  val conditionVal = evaluate(conditionEvaluator, 
inputRecord).head.asInstanceOf[Boolean]
+  // If matched the insert condition then execute the assignment 
expressions to compute the
+  // result record. We will return the first matched record.
+  if (conditionVal) {
+val results = evaluate(assignmentEvaluator, inputRecord)
+resultRecordOpt = HOption.of(convertToRecord(results, writeSchema))
+  }
+}
+if (resultRecordOpt != null) {
+  resultRecordOpt
+} else {
+  // If there is no condition matched, just filter this record.
+  // Here we return a IGNORE_RECORD, HoodieCreateHandle will not handle it.
+  HOption.of(HoodieWriteHandle.IGNORE_RECORD)
+}
+  }
+
   override def getInsertValue(schema: Schema, properties: Properties): 
HOption[IndexedRecord] = {
 val incomingRecord = bytesToAvro(recordBytes, schema)
 if (isDeleteRecord(incomingRecord)) {
   HOption.empty[IndexedRecord]()
 } else {
-  val insertConditionAndAssignmentsText =
-
properties.get(ExpressionPayload.PAYLOAD_INSERT_CONDITION_AND_ASSIGNMENTS)
-  // Process insert
   val sqlTypedRecord = new SqlTypedRecord(incomingRecord)
-  // Get the evaluator for each condition and insert assignment.
-  initWriteSchemaIfNeed(properties)
-  val insertConditionAndAssignments =
-
ExpressionPayload.getEvaluator(insertConditionAndAssignmentsText.toString, 
writeSchema)
-  var resultRecordOpt: HOption[IndexedRecord] = null
-  for ((conditionEvaluator, assignmentEvaluator) <- 
insertConditionAndAssignments
-   if resultRecordOpt == null) {
-val conditionVal = evaluate(conditionEvaluator, 
sqlTypedRecord).head.asInstanceOf[Boolean]
-// If matched the insert condition then execute the assignment 
expressions to compute the
-// result record. We will return the first matched record.
-if (conditionVal) {
-  val results = evaluate(assignmentEvaluator, sqlTypedRecord)
-  resultRecordOpt = HOption.of(convertToRecord(results, writeSchema))
-}
-  }
-
-  // Process delete for MOR
-  if (resultRecordOpt == null && isMORTable(properties)) {
-val deleteConditionText = 
properties.get(ExpressionPayload.PAYLOAD_DELETE_CONDITION)
-if (deleteConditionText != null) {
-  val deleteCondition = getEvaluator(deleteConditionText.toString, 
writeSchema).head._1
-  val deleteConditionVal = 

[jira] [Commented] (HUDI-2086) redo the logical of mor_incremental_view for hive

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380980#comment-17380980
 ] 

ASF GitHub Bot commented on HUDI-2086:
--

xiarixiaoyao opened a new pull request #3203:
URL: https://github.com/apache/hudi/pull/3203


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   Redo the logic of mor_incremental_view for Hive to fix some bugs in the
MOR incremental view for Hive/Spark SQL.
   
   Purpose of the pull request:
   
   1) Support reading the latest incremental data that is stored in log files
   2) Support reading incremental data committed before a replacecommit
   3) Support reading file groups that contain only log files
   4) Keep the logic of mor_incremental_view consistent with the Spark
DataSource
   
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   new UT added
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> redo the logical of mor_incremental_view for hive
> -
>
> Key: HUDI-2086
> URL: https://issues.apache.org/jira/browse/HUDI-2086
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
> Environment: spark3.1.1
> hive3.1.1
> hadoop3.1.1
> os: suse
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
>
> Currently there are some problems with mor_incremental_view for Hive.
> For example:
> 1) *Hudi cannot read the latest incremental data that is stored in log files.*
> Consider this: create a MOR table with bulk_insert, and then do an upsert on
> this table.
> Now we want to query the latest incremental data through Hive/Spark SQL;
> however, the latest incremental data is stored in log files, so the query
> returns nothing.
> step1: prepare data
> val df = spark.sparkContext.parallelize(0 to 20, 2).map(x => testCase(x, 
> x+"jack", Random.nextInt(2))).toDF()
>  .withColumn("col3", expr("keyid + 3000"))
>  .withColumn("p", lit(1))
> step2: do bulk_insert
> mergePartitionTable(df, 4, "default", "inc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> step3: do upsert
> mergePartitionTable(df, 4, "default", "inc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
> step4: check the latest commit time and do the query
> spark.sql("set hoodie.inc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.inc.consume.max.commits=1")
> spark.sql("set hoodie.inc.consume.start.timestamp=20210628103935")
> spark.sql("select keyid, col3 from inc_rt where `_hoodie_commit_time` > 
> '20210628103935' order by keyid").show(100, false)
> +-++
> |keyid|col3|
> +-++
> +-++
>  
> 2) *If we do insert_overwrite/insert_overwrite_table on a Hudi MOR table,
> the incremental query result is wrong when we query the data written before
> the insert_overwrite/insert_overwrite_table.*
> step1: do bulk_insert 
> mergePartitionTable(df, 4, "default", "overInc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> now the commits is
> [20210628160614.deltacommit ]
> step2: do insert_overwrite_table
> mergePartitionTable(df, 4, "default", "overInc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert_overwrite_table")
> now the commits is
> [20210628160614.deltacommit, 20210628160923.replacecommit ]
> step3: query the data before insert_overwrite_table
> spark.sql("set hoodie.overInc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.overInc.consume.max.commits=1")
> spark.sql("set hoodie.overInc.consume.start.timestamp=0")
> spark.sql("select keyid, col3 from overInc_rt where `_hoodie_commit_time` > 
> '0' order by keyid").show(100, false)
> +-++
> |keyid|col3|
> +-++
> +-++
>  
> 3) *Hive/Presto/Flink cannot read file groups that contain only log files.*
> When we use the HBase/in-memory index, a MOR table will produce log files
> instead of parquet files, but currently Hive/Presto cannot read those files 

[GitHub] [hudi] xiarixiaoyao opened a new pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-07-14 Thread GitBox


xiarixiaoyao opened a new pull request #3203:
URL: https://github.com/apache/hudi/pull/3203


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   Redo the logic of mor_incremental_view for Hive to fix some bugs in the
MOR incremental view for Hive/Spark SQL.
   
   Purpose of the pull request:
   
   1) Support reading the latest incremental data that is stored in log files
   2) Support reading incremental data committed before a replacecommit
   3) Support reading file groups that contain only log files
   4) Keep the logic of mor_incremental_view consistent with the Spark
DataSource
   
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   new UT added
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2086) redo the logical of mor_incremental_view for hive

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380979#comment-17380979
 ] 

ASF GitHub Bot commented on HUDI-2086:
--

xiarixiaoyao closed pull request #3203:
URL: https://github.com/apache/hudi/pull/3203


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> redo the logical of mor_incremental_view for hive
> -
>
> Key: HUDI-2086
> URL: https://issues.apache.org/jira/browse/HUDI-2086
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
> Environment: spark3.1.1
> hive3.1.1
> hadoop3.1.1
> os: suse
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
>
> Currently there are some problems with mor_incremental_view for Hive.
> For example:
> 1) *Hudi cannot read the latest incremental data that is stored in log files.*
> Consider this: create a MOR table with bulk_insert, and then do an upsert on
> this table.
> Now we want to query the latest incremental data through Hive/Spark SQL;
> however, the latest incremental data is stored in log files, so the query
> returns nothing.
> step1: prepare data
> val df = spark.sparkContext.parallelize(0 to 20, 2).map(x => testCase(x, 
> x+"jack", Random.nextInt(2))).toDF()
>  .withColumn("col3", expr("keyid + 3000"))
>  .withColumn("p", lit(1))
> step2: do bulk_insert
> mergePartitionTable(df, 4, "default", "inc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> step3: do upsert
> mergePartitionTable(df, 4, "default", "inc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
> step4: check the latest commit time and do the query
> spark.sql("set hoodie.inc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.inc.consume.max.commits=1")
> spark.sql("set hoodie.inc.consume.start.timestamp=20210628103935")
> spark.sql("select keyid, col3 from inc_rt where `_hoodie_commit_time` > 
> '20210628103935' order by keyid").show(100, false)
> +-++
> |keyid|col3|
> +-++
> +-++
>  
> 2) *If we do insert_overwrite/insert_overwrite_table on a Hudi MOR table,
> the incremental query result is wrong when we query the data written before
> the insert_overwrite/insert_overwrite_table.*
> step1: do bulk_insert 
> mergePartitionTable(df, 4, "default", "overInc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> now the commits is
> [20210628160614.deltacommit ]
> step2: do insert_overwrite_table
> mergePartitionTable(df, 4, "default", "overInc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert_overwrite_table")
> now the commits is
> [20210628160614.deltacommit, 20210628160923.replacecommit ]
> step3: query the data before insert_overwrite_table
> spark.sql("set hoodie.overInc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.overInc.consume.max.commits=1")
> spark.sql("set hoodie.overInc.consume.start.timestamp=0")
> spark.sql("select keyid, col3 from overInc_rt where `_hoodie_commit_time` > 
> '0' order by keyid").show(100, false)
> +-++
> |keyid|col3|
> +-++
> +-++
>  
> 3) *Hive/Presto/Flink cannot read file groups that contain only log files.*
> When we use the HBase/in-memory index, a MOR table will produce log files
> instead of parquet files, but currently Hive/Presto cannot read those files,
> since those files are log files.
> *HUDI-2048* mentions this problem.
>  
> However, when we use the Spark DataSource to execute an incremental query,
> there is no such problem, so keeping the logic of mor_incremental_view for
> Hive consistent with the Spark DataSource is necessary.
> We redo the logic of mor_incremental_view for Hive to solve the problems
> above and keep the logic of mor_incremental_view consistent with the Spark
> DataSource.
>  
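
A short illustration (editor's sketch, not part of the JIRA) of the Spark DataSource
incremental query referred to above. The option keys are the standard Hudi DataSource
read options; the base path and begin instant are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class IncrementalReadSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("hudi-incremental-read").getOrCreate();

    // Incremental query: only records committed after the given instant are returned.
    Dataset<Row> incremental = spark.read()
        .format("hudi")
        .option("hoodie.datasource.query.type", "incremental")
        .option("hoodie.datasource.read.begin.instanttime", "20210628103935") // placeholder instant
        .load("/tmp/hudi/inc"); // placeholder base path

    incremental.select("keyid", "col3").orderBy("keyid").show(100, false);
  }
}
```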



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xiarixiaoyao closed pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-07-14 Thread GitBox


xiarixiaoyao closed pull request #3203:
URL: https://github.com/apache/hudi/pull/3203


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1676) Support SQL with spark3

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380977#comment-17380977
 ] 

ASF GitHub Bot commented on HUDI-1676:
--

xiarixiaoyao closed pull request #2761:
URL: https://github.com/apache/hudi/pull/2761


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support SQL with spark3
> ---
>
> Key: HUDI-1676
> URL: https://issues.apache.org/jira/browse/HUDI-1676
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.9.0
>
>
> 1. Support CTAS for Spark 3
> 3. Support INSERT for Spark 3
> 4. Support MERGE, UPDATE, and DELETE without the RowKey constraint for Spark 3
> 5. Support DataSource V2 for Spark 3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2029) Implement compression for DiskBasedMap in Spillable Map

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380978#comment-17380978
 ] 

ASF GitHub Bot commented on HUDI-2029:
--

nsivabalan merged pull request #3128:
URL: https://github.com/apache/hudi/pull/3128


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement compression for DiskBasedMap in Spillable Map
> ---
>
> Key: HUDI-2029
> URL: https://issues.apache.org/jira/browse/HUDI-2029
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Rajesh Mahindra
>Assignee: Rajesh Mahindra
>Priority: Major
>  Labels: pull-request-available
>
> Implement compression for DiskBasedMap in Spillable Map 
> Without compression, DiskBasedMap causes more spilling to disk than
> RocksDB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (75040ee -> d024439)

2021-07-14 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 75040ee  [HUDI-2149] Ensure and Audit docs for every configuration 
class in the codebase (#3272)
 add d024439  [HUDI-2029] Implement compression for DiskBasedMap in 
Spillable Map (#3128)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/config/HoodieWriteConfig.java  | 14 
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |  2 +-
 .../common/util/collection/BitCaskDiskMap.java | 93 ++
 .../util/collection/ExternalSpillableMap.java  | 10 ++-
 .../common/util/collection/LazyFileIterable.java   |  9 ++-
 .../common/util/collection/TestBitCaskDiskMap.java | 40 ++
 .../util/collection/TestExternalSpillableMap.java  | 53 +++-
 7 files changed, 167 insertions(+), 54 deletions(-)


[GitHub] [hudi] nsivabalan merged pull request #3128: [HUDI-2029] Implement compression for DiskBasedMap in Spillable Map

2021-07-14 Thread GitBox


nsivabalan merged pull request #3128:
URL: https://github.com/apache/hudi/pull/3128


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xiarixiaoyao closed pull request #2761: [HUDI-1676] Support SQL with spark3

2021-07-14 Thread GitBox


xiarixiaoyao closed pull request #2761:
URL: https://github.com/apache/hudi/pull/2761


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2029) Implement compression for DiskBasedMap in Spillable Map

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380976#comment-17380976
 ] 

ASF GitHub Bot commented on HUDI-2029:
--

nsivabalan commented on a change in pull request #3128:
URL: https://github.com/apache/hudi/pull/3128#discussion_r669909750



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/collection/BitCaskDiskMap.java
##
@@ -188,21 +204,25 @@ public R get(Object key) {
   }
 
   private R get(ValueMetadata entry) {
-return get(entry, getRandomAccessFile());
+return get(entry, getRandomAccessFile(), isCompressionEnabled);
   }
 
-  public static  R get(ValueMetadata entry, RandomAccessFile file) {
+  public static  R get(ValueMetadata entry, RandomAccessFile file, boolean 
isCompressionEnabled) {
 try {
-  return SerializationUtils
-  .deserialize(SpillableMapUtils.readBytesFromDisk(file, 
entry.getOffsetOfValue(), entry.getSizeOfValue()));
+  byte[] bytesFromDisk = SpillableMapUtils.readBytesFromDisk(file, 
entry.getOffsetOfValue(), entry.getSizeOfValue());
+  if (isCompressionEnabled) {
+return 
SerializationUtils.deserialize(DISK_COMPRESSION_REF.get().decompressBytes(bytesFromDisk));
+  }

Review comment:
   Not required to fix in this patch, but something to keep in mind: it would
be good to have an explicit else block for line 216. This "if" block is just
one line, so it is fine here. But if it were a large "if" block, a reader or
developer might wonder whether some code path fails to return from within the
"if" block and that is why there is a return outside of it.
   So, whenever you have "if"/"else" branches, try to always add an explicit else block. 
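
   To illustrate the suggestion (a generic sketch by the editor, not the code under
review): with an explicit else branch, every return path of the compression check is
visible at a glance. The helper methods here are stand-ins for the real BitCaskDiskMap
utilities.

```java
public class ExplicitElseSketch {

  // Stand-in for the real decompression helper; a no-op for this sketch.
  static byte[] decompress(byte[] bytes) {
    return bytes;
  }

  // Stand-in for SerializationUtils.deserialize.
  static String deserialize(byte[] bytes) {
    return new String(bytes);
  }

  // Both branches return explicitly, so no return statement sits "outside" the if.
  static String get(byte[] bytesFromDisk, boolean compressionEnabled) {
    if (compressionEnabled) {
      return deserialize(decompress(bytesFromDisk));
    } else {
      return deserialize(bytesFromDisk);
    }
  }

  public static void main(String[] args) {
    System.out.println(get("hello".getBytes(), true));
  }
}
```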




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement compression for DiskBasedMap in Spillable Map
> ---
>
> Key: HUDI-2029
> URL: https://issues.apache.org/jira/browse/HUDI-2029
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Rajesh Mahindra
>Assignee: Rajesh Mahindra
>Priority: Major
>  Labels: pull-request-available
>
> Implement compression for DiskBasedMap in Spillable Map 
> Without compression, DiskBasedMap causes more spilling to disk than
> RocksDB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380974#comment-17380974
 ] 

ASF GitHub Bot commented on HUDI-2170:
--

nsivabalan commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670095143



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##
@@ -49,7 +49,7 @@ public OverwriteWithLatestAvroPayload(Option 
record) {
   @Override
   public OverwriteWithLatestAvroPayload 
preCombine(OverwriteWithLatestAvroPayload another) {
 // pick the payload with greatest ordering value
-if (another.orderingVal.compareTo(orderingVal) > 0) {
+if (another.orderingVal.compareTo(orderingVal) >= 0) {

Review comment:
   Looks good to me. 
   @vinothchandar : Can you think of any particular reason why it was done this 
way? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Always choose the latest record for HoodieRecordPayload
> ---
>
> Key: HUDI-2170
> URL: https://issues.apache.org/jira/browse/HUDI-2170
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Now in {{OverwriteWithLatestAvroPayload.preCombine}}, we still choose the old
> record when the new record has the same preCombine field value as the old one;
> actually, it is more natural to keep the new incoming record instead. The
> {{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already does
> that.
> See issue: https://github.com/apache/hudi/issues/3266.
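
A small sketch (editor's illustration of the intended semantics, not the exact Hudi
implementation) of the change discussed above: comparing with `>= 0` instead of `> 0`
makes a tie on the ordering value resolve in favor of the new incoming record.

```java
public class PreCombineTieSketch {

  // Returns which record survives preCombine for the given ordering values.
  static <T extends Comparable<T>> String preCombine(T storedOrdering, T incomingOrdering) {
    if (incomingOrdering.compareTo(storedOrdering) >= 0) {
      // ">= 0": on a tie, keep the new incoming record ("> 0" kept the stored one).
      return "incoming";
    } else {
      return "stored";
    }
  }

  public static void main(String[] args) {
    System.out.println(preCombine(5L, 5L)); // incoming: ties now favor the newer record
    System.out.println(preCombine(7L, 5L)); // stored: a smaller ordering value loses
  }
}
```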



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on a change in pull request #3128: [HUDI-2029] Implement compression for DiskBasedMap in Spillable Map

2021-07-14 Thread GitBox


nsivabalan commented on a change in pull request #3128:
URL: https://github.com/apache/hudi/pull/3128#discussion_r669909750



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/collection/BitCaskDiskMap.java
##
@@ -188,21 +204,25 @@ public R get(Object key) {
   }
 
   private R get(ValueMetadata entry) {
-return get(entry, getRandomAccessFile());
+return get(entry, getRandomAccessFile(), isCompressionEnabled);
   }
 
-  public static  R get(ValueMetadata entry, RandomAccessFile file) {
+  public static  R get(ValueMetadata entry, RandomAccessFile file, boolean 
isCompressionEnabled) {
 try {
-  return SerializationUtils
-  .deserialize(SpillableMapUtils.readBytesFromDisk(file, 
entry.getOffsetOfValue(), entry.getSizeOfValue()));
+  byte[] bytesFromDisk = SpillableMapUtils.readBytesFromDisk(file, 
entry.getOffsetOfValue(), entry.getSizeOfValue());
+  if (isCompressionEnabled) {
+return 
SerializationUtils.deserialize(DISK_COMPRESSION_REF.get().decompressBytes(bytesFromDisk));
+  }

Review comment:
   Not required to fix in this patch, but something to keep in mind: it would
be good to have an explicit else block for line 216. This "if" block is just
one line, so it is fine here. But if it were a large "if" block, a reader or
developer might wonder whether some code path fails to return from within the
"if" block and that is why there is a return outside of it.
   So, whenever you have "if"/"else" branches, try to always add an explicit else block. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread GitBox


nsivabalan commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670095143



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##
@@ -49,7 +49,7 @@ public OverwriteWithLatestAvroPayload(Option 
record) {
   @Override
   public OverwriteWithLatestAvroPayload 
preCombine(OverwriteWithLatestAvroPayload another) {
 // pick the payload with greatest ordering value
-if (another.orderingVal.compareTo(orderingVal) > 0) {
+if (another.orderingVal.compareTo(orderingVal) >= 0) {

Review comment:
   Looks good to me. 
   @vinothchandar : Can you think of any particular reason why it was done this 
way? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on issue #3266: [SUPPORT] Upsert data with an identical record key and pre-combine field

2021-07-14 Thread GitBox


danny0405 commented on issue #3266:
URL: https://github.com/apache/hudi/issues/3266#issuecomment-880349759


   Yes, the PR would be merged soon once the CI tests pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2029) Implement compression for DiskBasedMap in Spillable Map

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380973#comment-17380973
 ] 

ASF GitHub Bot commented on HUDI-2029:
--

rmahindra123 commented on a change in pull request #3128:
URL: https://github.com/apache/hudi/pull/3128#discussion_r670094851



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##
@@ -200,7 +200,7 @@ protected void initializeIncomingRecordsMap() {
   LOG.info("MaxMemoryPerPartitionMerge => " + memoryForMerge);
   this.keyToNewRecords = new ExternalSpillableMap<>(memoryForMerge, 
config.getSpillableMapBasePath(),
   new DefaultSizeEstimator(), new 
HoodieRecordSizeEstimator(tableSchema),
-  config.getSpillableDiskMapType());
+  config.getSpillableDiskMapType(), 
config.isBitCaskDiskMapCompressionEnabled());

Review comment:
   Good point, this will be done in a follow-up PR: 
https://issues.apache.org/jira/browse/HUDI-2044




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement compression for DiskBasedMap in Spillable Map
> ---
>
> Key: HUDI-2029
> URL: https://issues.apache.org/jira/browse/HUDI-2029
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Rajesh Mahindra
>Assignee: Rajesh Mahindra
>Priority: Major
>  Labels: pull-request-available
>
> Implement compression for DiskBasedMap in Spillable Map 
> Without compression, DiskBasedMap causes more spilling to disk than
> RocksDB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2044) Extend support for rockDB and compression for Spillable map to all consumers of ExternalSpillableMap

2021-07-14 Thread Rajesh Mahindra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Mahindra updated HUDI-2044:
--
Summary: Extend support for rockDB and compression for Spillable map to all 
consumers of ExternalSpillableMap  (was: Extend support for rocked for 
spoilable map to all consumers of ExternalSpillableMap)

> Extend support for rockDB and compression for Spillable map to all consumers 
> of ExternalSpillableMap
> 
>
> Key: HUDI-2044
> URL: https://issues.apache.org/jira/browse/HUDI-2044
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Rajesh Mahindra
>Assignee: Rajesh Mahindra
>Priority: Major
>
> # HUDI-2028 only implements RocksDB support for the spillable map in
> HoodieMergeHandle, since we are blocked on the configuration refactor PR
> landing
>  # This ticket will track the work to extend RocksDB support for the
> Spillable Map to all consumers of ExternalSpillableMap.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] rmahindra123 commented on a change in pull request #3128: [HUDI-2029] Implement compression for DiskBasedMap in Spillable Map

2021-07-14 Thread GitBox


rmahindra123 commented on a change in pull request #3128:
URL: https://github.com/apache/hudi/pull/3128#discussion_r670094851



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##
@@ -200,7 +200,7 @@ protected void initializeIncomingRecordsMap() {
   LOG.info("MaxMemoryPerPartitionMerge => " + memoryForMerge);
   this.keyToNewRecords = new ExternalSpillableMap<>(memoryForMerge, 
config.getSpillableMapBasePath(),
   new DefaultSizeEstimator(), new 
HoodieRecordSizeEstimator(tableSchema),
-  config.getSpillableDiskMapType());
+  config.getSpillableDiskMapType(), 
config.isBitCaskDiskMapCompressionEnabled());

Review comment:
   Good point, this will be done in a follow-up PR: 
https://issues.apache.org/jira/browse/HUDI-2044




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2029) Implement compression for DiskBasedMap in Spillable Map

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380972#comment-17380972
 ] 

ASF GitHub Bot commented on HUDI-2029:
--

nsivabalan commented on a change in pull request #3128:
URL: https://github.com/apache/hudi/pull/3128#discussion_r669909750



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/collection/BitCaskDiskMap.java
##
@@ -188,21 +204,25 @@ public R get(Object key) {
   }
 
   private R get(ValueMetadata entry) {
-return get(entry, getRandomAccessFile());
+return get(entry, getRandomAccessFile(), isCompressionEnabled);
   }
 
-  public static  R get(ValueMetadata entry, RandomAccessFile file) {
+  public static  R get(ValueMetadata entry, RandomAccessFile file, boolean 
isCompressionEnabled) {
 try {
-  return SerializationUtils
-  .deserialize(SpillableMapUtils.readBytesFromDisk(file, 
entry.getOffsetOfValue(), entry.getSizeOfValue()));
+  byte[] bytesFromDisk = SpillableMapUtils.readBytesFromDisk(file, 
entry.getOffsetOfValue(), entry.getSizeOfValue());
+  if (isCompressionEnabled) {
+return 
SerializationUtils.deserialize(DISK_COMPRESSION_REF.get().decompressBytes(bytesFromDisk));
+  }

Review comment:
   Not required to fix in this patch, but something to keep in mind: it would
be good to have an explicit else block for line 216. This "if" block is just
one line, so it is fine here. But if it were a large "if" block, a reader or
developer might wonder whether some code path fails to return from within the
"if" block and that is why there is a return outside of it.
   So, whenever you have "if"/"else" branches, try to always add an explicit else block. 

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##
@@ -200,7 +200,7 @@ protected void initializeIncomingRecordsMap() {
   LOG.info("MaxMemoryPerPartitionMerge => " + memoryForMerge);
   this.keyToNewRecords = new ExternalSpillableMap<>(memoryForMerge, 
config.getSpillableMapBasePath(),
   new DefaultSizeEstimator(), new 
HoodieRecordSizeEstimator(tableSchema),
-  config.getSpillableDiskMapType());
+  config.getSpillableDiskMapType(), 
config.isBitCaskDiskMapCompressionEnabled());

Review comment:
   Do you want to make the change in HoodieMergedLogRecordScanner as well? Or
is that planned for a follow-up PR?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement compression for DiskBasedMap in Spillable Map
> ---
>
> Key: HUDI-2029
> URL: https://issues.apache.org/jira/browse/HUDI-2029
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Rajesh Mahindra
>Assignee: Rajesh Mahindra
>Priority: Major
>  Labels: pull-request-available
>
> Implement compression for DiskBasedMap in Spillable Map 
> Without compression, DiskBasedMap causes more spilling to disk than
> RocksDB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380971#comment-17380971
 ] 

ASF GitHub Bot commented on HUDI-2170:
--

codecov-commenter edited a comment on pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#issuecomment-878977860


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3267](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (fd664b5) into 
[master](https://codecov.io/gh/apache/hudi/commit/b0089b894ad12da11fbd6a0fb08508c7adee68e6?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b0089b8) will **decrease** coverage by `21.38%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3267/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3267   +/-   ##
   =
   - Coverage 47.72%   26.34%   -21.39% 
   + Complexity 5529 1303 -4226 
   =
 Files   934  386  -548 
 Lines 4145716006-25451 
 Branches   4167 1379 -2788 
   =
   - Hits  19787 4217-15570 
   + Misses1990811486 -8422 
   + Partials   1762  303 -1459 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.09% <ø> (-14.37%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.57% <ø> (-49.94%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.16% <ø> (-0.10%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: 

[GitHub] [hudi] nsivabalan commented on a change in pull request #3128: [HUDI-2029] Implement compression for DiskBasedMap in Spillable Map

2021-07-14 Thread GitBox


nsivabalan commented on a change in pull request #3128:
URL: https://github.com/apache/hudi/pull/3128#discussion_r669909750



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/collection/BitCaskDiskMap.java
##
@@ -188,21 +204,25 @@ public R get(Object key) {
   }
 
   private R get(ValueMetadata entry) {
-return get(entry, getRandomAccessFile());
+return get(entry, getRandomAccessFile(), isCompressionEnabled);
   }
 
-  public static  R get(ValueMetadata entry, RandomAccessFile file) {
+  public static  R get(ValueMetadata entry, RandomAccessFile file, boolean 
isCompressionEnabled) {
 try {
-  return SerializationUtils
-  .deserialize(SpillableMapUtils.readBytesFromDisk(file, 
entry.getOffsetOfValue(), entry.getSizeOfValue()));
+  byte[] bytesFromDisk = SpillableMapUtils.readBytesFromDisk(file, 
entry.getOffsetOfValue(), entry.getSizeOfValue());
+  if (isCompressionEnabled) {
+return 
SerializationUtils.deserialize(DISK_COMPRESSION_REF.get().decompressBytes(bytesFromDisk));
+  }

Review comment:
   Not required to fix in this patch, but something to keep in mind: it would
be good to have an explicit else block for line 216. This "if" block is just
one line, so it is fine here. But if it were a large "if" block, a reader or
developer might wonder whether some code path fails to return from within the
"if" block and that is why there is a return outside of it.
   So, whenever you have "if"/"else" branches, try to always add an explicit else block. 

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##
@@ -200,7 +200,7 @@ protected void initializeIncomingRecordsMap() {
   LOG.info("MaxMemoryPerPartitionMerge => " + memoryForMerge);
   this.keyToNewRecords = new ExternalSpillableMap<>(memoryForMerge, 
config.getSpillableMapBasePath(),
   new DefaultSizeEstimator(), new 
HoodieRecordSizeEstimator(tableSchema),
-  config.getSpillableDiskMapType());
+  config.getSpillableDiskMapType(), 
config.isBitCaskDiskMapCompressionEnabled());

Review comment:
   Do you want to make the change in HoodieMergedLogRecordScanner as well? Or
is that planned for a follow-up PR?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#issuecomment-878977860


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3267](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (fd664b5) into 
[master](https://codecov.io/gh/apache/hudi/commit/b0089b894ad12da11fbd6a0fb08508c7adee68e6?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b0089b8) will **decrease** coverage by `21.38%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3267/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3267   +/-   ##
   =
   - Coverage 47.72%   26.34%   -21.39% 
   + Complexity 5529 1303 -4226 
   =
 Files   934  386  -548 
 Lines 4145716006-25451 
 Branches   4167 1379 -2788 
   =
   - Hits  19787 4217-15570 
   + Misses1990811486 -8422 
   + Partials   1762  303 -1459 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.09% <ø> (-14.37%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.57% <ø> (-49.94%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.16% <ø> (-0.10%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Commented] (HUDI-1676) Support SQL with spark3

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380966#comment-17380966
 ] 

ASF GitHub Bot commented on HUDI-1676:
--

xiarixiaoyao commented on pull request #2761:
URL: https://github.com/apache/hudi/pull/2761#issuecomment-880347459


   @lw309637554
   Thank you for paying attention to this PR. Since #2645 has been merged, I
will close this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support SQL with spark3
> ---
>
> Key: HUDI-1676
> URL: https://issues.apache.org/jira/browse/HUDI-1676
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.9.0
>
>
> 1. Support CTAS for Spark 3
> 3. Support INSERT for Spark 3
> 4. Support MERGE, UPDATE, and DELETE without the RowKey constraint for Spark 3
> 5. Support DataSource V2 for Spark 3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xiarixiaoyao commented on pull request #2761: [HUDI-1676] Support SQL with spark3

2021-07-14 Thread GitBox


xiarixiaoyao commented on pull request #2761:
URL: https://github.com/apache/hudi/pull/2761#issuecomment-880347459


   @lw309637554
   Thank you for paying attention to this PR. Since #2645 has been merged, I
will close this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380954#comment-17380954
 ] 

ASF GitHub Bot commented on HUDI-2170:
--

codecov-commenter edited a comment on pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#issuecomment-878977860


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3267](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (fd664b5) into 
[master](https://codecov.io/gh/apache/hudi/commit/b0089b894ad12da11fbd6a0fb08508c7adee68e6?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b0089b8) will **decrease** coverage by `21.51%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3267/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3267   +/-   ##
   =
   - Coverage 47.72%   26.21%   -21.52% 
   + Complexity 5529 1291 -4238 
   =
 Files   934  386  -548 
 Lines 4145715977-25480 
 Branches   4167 1378 -2789 
   =
   - Hits  19787 4189-15598 
   + Misses1990811485 -8423 
   + Partials   1762  303 -1459 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `19.90% <ø> (-14.56%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.57% <ø> (-49.94%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.16% <ø> (-0.10%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#issuecomment-878977860


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3267](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (fd664b5) into 
[master](https://codecov.io/gh/apache/hudi/commit/b0089b894ad12da11fbd6a0fb08508c7adee68e6?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b0089b8) will **decrease** coverage by `21.51%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3267/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3267   +/-   ##
   =
   - Coverage 47.72%   26.21%   -21.52% 
   + Complexity 5529 1291 -4238 
   =
 Files   934  386  -548 
 Lines 4145715977-25480 
 Branches   4167 1378 -2789 
   =
   - Hits  19787 4189-15598 
   + Misses1990811485 -8423 
   + Partials   1762  303 -1459 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `19.90% <ø> (-14.56%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.57% <ø> (-49.94%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.16% <ø> (-0.10%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Resolved] (HUDI-2180) Fix Compile Error For Spark3

2021-07-14 Thread pengzhiwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengzhiwei resolved HUDI-2180.
--
Resolution: Fixed

> Fix Compile Error For Spark3
> 
>
> Key: HUDI-2180
> URL: https://issues.apache.org/jira/browse/HUDI-2180
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2180) Fix Compile Error For Spark3

2021-07-14 Thread pengzhiwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengzhiwei reassigned HUDI-2180:


Assignee: pengzhiwei

> Fix Compile Error For Spark3
> 
>
> Key: HUDI-2180
> URL: https://issues.apache.org/jira/browse/HUDI-2180
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhangyue19921010 commented on pull request #3271: [Minor] Correct the logs of enable/not-enable async cleaner service.

2021-07-14 Thread GitBox


zhangyue19921010 commented on pull request #3271:
URL: https://github.com/apache/hudi/pull/3271#issuecomment-880336702


   Hi @leesf, thanks a lot for your review and merge :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380947#comment-17380947
 ] 

ASF GitHub Bot commented on HUDI-2170:
--

codecov-commenter edited a comment on pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#issuecomment-878977860


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3267](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (fd664b5) into 
[master](https://codecov.io/gh/apache/hudi/commit/b0089b894ad12da11fbd6a0fb08508c7adee68e6?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b0089b8) will **decrease** coverage by `21.51%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3267/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #3267       +/-   ##
   =============================================
   - Coverage     47.72%   26.21%   -21.52%
   + Complexity     5529     1291     -4238
   =============================================
     Files           934      386      -548
     Lines         41457    15977    -25480
     Branches       4167     1378     -2789
   =============================================
   - Hits          19787     4189    -15598
   + Misses        19908    11485     -8423
   + Partials       1762      303     -1459
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `19.90% <ø> (-14.56%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.57% <ø> (-49.94%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.16% <ø> (-0.10%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread GitBox


codecov-commenter edited a comment on pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#issuecomment-878977860


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3267](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (fd664b5) into 
[master](https://codecov.io/gh/apache/hudi/commit/b0089b894ad12da11fbd6a0fb08508c7adee68e6?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b0089b8) will **decrease** coverage by `21.51%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3267/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #3267       +/-   ##
   =============================================
   - Coverage     47.72%   26.21%   -21.52%
   + Complexity     5529     1291     -4238
   =============================================
     Files           934      386      -548
     Lines         41457    15977    -25480
     Branches       4167     1378     -2789
   =============================================
   - Hits          19787     4189    -15598
   + Misses        19908    11485     -8423
   + Partials       1762      303     -1459
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `19.90% <ø> (-14.56%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.57% <ø> (-49.94%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.16% <ø> (-0.10%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3267?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3267/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Commented] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380945#comment-17380945
 ] 

ASF GitHub Bot commented on HUDI-2170:
--

danny0405 opened a new pull request #3267:
URL: https://github.com/apache/hudi/pull/3267


   Now in OverwriteWithLatestAvroPayload.preCombine, we still choose the
   old record when the new record has the same preCombine field value as the
   old one; it is more natural to keep the new incoming record
   instead. The DefaultHoodieRecordPayload.combineAndGetUpdateValue method
   already does that.
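   
   To make the intended semantics concrete, here is a minimal sketch of a
   `preCombine` that keeps the latest arrival when ordering values tie. It is
   an illustration only, assuming a payload type that carries the record plus
   an explicitly supplied ordering value (as `OverwriteWithLatestAvroPayload`
   does); the class name and exact logic below are not the code in this PR.
   
   ```java
   import org.apache.avro.generic.GenericRecord;
   
   // Sketch: prefer the incoming (new) record when ordering values tie.
   public class LatestWinsPayloadSketch {
   
     private final GenericRecord record;
     private final Comparable orderingVal;
   
     public LatestWinsPayloadSketch(GenericRecord record, Comparable orderingVal) {
       this.record = record;
       this.orderingVal = orderingVal;
     }
   
     public GenericRecord getRecord() {
       return record;
     }
   
     // `this` is the incoming record, `oldValue` the record already held.
     // Returning `this` on equal ordering values keeps the new record, which
     // mirrors what DefaultHoodieRecordPayload.combineAndGetUpdateValue does.
     public LatestWinsPayloadSketch preCombine(LatestWinsPayloadSketch oldValue) {
       if (oldValue.orderingVal.compareTo(orderingVal) > 0) {
         // The stored record has a strictly greater ordering value: keep it.
         return oldValue;
       }
       // Equal or smaller ordering value: keep the new incoming record.
       return this;
     }
   }
   ```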
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Always choose the latest record for HoodieRecordPayload
> ---
>
> Key: HUDI-2170
> URL: https://issues.apache.org/jira/browse/HUDI-2170
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Now in {{OverwriteWithLatestAvroPayload.preCombine}}, we still choose the old 
> record when the new record has the same preCombine field value as the old one; 
> it is more natural to keep the new incoming record instead. The 
> {{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already does 
> that.
> See issue: https://github.com/apache/hudi/issues/3266.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380944#comment-17380944
 ] 

ASF GitHub Bot commented on HUDI-2170:
--

danny0405 closed pull request #3267:
URL: https://github.com/apache/hudi/pull/3267


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Always choose the latest record for HoodieRecordPayload
> ---
>
> Key: HUDI-2170
> URL: https://issues.apache.org/jira/browse/HUDI-2170
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Now in {{OverwriteWithLatestAvroPayload.preCombine}}, we still choose the old 
> record when the new record has the same preCombine field value as the old one; 
> it is more natural to keep the new incoming record instead. The 
> {{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already does 
> that.
> See issue: https://github.com/apache/hudi/issues/3266.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380943#comment-17380943
 ] 

ASF GitHub Bot commented on HUDI-2170:
--

danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670069995



##
File path: 
hudi-common/src/test/java/org/apache/hudi/common/model/TestOverwriteWithLatestAvroPayload.java
##
@@ -72,6 +72,17 @@ public void testActiveRecords() throws IOException {
 
 assertEquals(payload1.combineAndGetUpdateValue(record2, schema).get(), 
record1);
 assertEquals(payload2.combineAndGetUpdateValue(record1, schema).get(), 
record2);
+
+GenericRecord record3 = new GenericData.Record(schema);
+record3.put("id", "3");
+record3.put("partition", "partition2");

Review comment:
   `ts` is actually not the `preCombine` field; the `preCombine` value was
passed explicitly.
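   
   For context, "passed explicitly" refers to the ordering value being handed
   to the payload constructor directly rather than being read from a `ts`
   column. A hypothetical line in the style of that test (the literal ordering
   value here is illustrative, not the exact test code):
   
   ```java
   // The second constructor argument is the preCombine/ordering value; it is
   // supplied explicitly instead of being derived from record3's fields.
   OverwriteWithLatestAvroPayload payload3 =
       new OverwriteWithLatestAvroPayload(record3, 2);
   ```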




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Always choose the latest record for HoodieRecordPayload
> ---
>
> Key: HUDI-2170
> URL: https://issues.apache.org/jira/browse/HUDI-2170
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Now in {{OverwriteWithLatestAvroPayload.preCombine}}, we still choose the old 
> record when the new record has the same preCombine field value as the old one; 
> it is more natural to keep the new incoming record instead. The 
> {{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already does 
> that.
> See issue: https://github.com/apache/hudi/issues/3266.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] danny0405 closed pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread GitBox


danny0405 closed pull request #3267:
URL: https://github.com/apache/hudi/pull/3267


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



