[GitHub] [hudi] danny0405 commented on a change in pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…

2021-06-14 Thread GitBox


danny0405 commented on a change in pull request #3025:
URL: https://github.com/apache/hudi/pull/3025#discussion_r651468804



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java
##
@@ -63,7 +63,7 @@ public BaseScheduleCompactionActionExecutor(HoodieEngineContext context,
   + ", Compaction scheduled at " + instantTime));
   // Committed and pending compaction instants should have strictly lower timestamps
   List<HoodieInstant> conflictingInstants = table.getActiveTimeline()
-  .getWriteTimeline().getInstants()
+  .getWriteTimeline().filterCompletedAndCompactionInstants().getInstants()

Review comment:
   Take a look at the comments
   ```java
   // Committed and pending compaction instants should have strictly lower 
timestamps
   ```
   I think the earlier code that used `commitsAndCompactionTimeline()` was already wrong: it adds a restriction that we cannot generate a compaction plan while there are inflight commits, which in fact we can.
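   
   For context, a minimal sketch of the conflict check this change aims at (hedged: the predicates below illustrate the semantics under discussion and are not a quote of the Hudi source):
   ```java
   // Keep completed instants plus compaction instants, so plain inflight
   // commits / delta_commits no longer block scheduling a compaction plan.
   // Instant timestamps are lexicographically ordered strings in Hudi.
   List<HoodieInstant> conflicting = table.getActiveTimeline().getWriteTimeline()
       .getInstants() // Stream<HoodieInstant>
       .filter(i -> i.isCompleted() || HoodieTimeline.COMPACTION_ACTION.equals(i.getAction()))
       .filter(i -> i.getTimestamp().compareTo(instantTime) >= 0)
       .collect(Collectors.toList());
   // a new compaction plan is scheduled only if `conflicting` is empty
   ```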




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] swuferhong commented on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…

2021-06-14 Thread GitBox


swuferhong commented on pull request #3050:
URL: https://github.com/apache/hudi/pull/3050#issuecomment-861190938


   > @swuferhong Thanks for opening this. Looks like it might be a duplicate. 
Can you check why the CI is failing ?
   
   Yes, we reopened this PR, and the CI is passing now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2819: [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe.

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #2819:
URL: https://github.com/apache/hudi/pull/2819#issuecomment-860933035


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2819](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (199e377) into 
[master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (769dd2d) will **decrease** coverage by `41.60%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2819/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##           master    #2819       +/-   ##
   ============================================
   - Coverage     50.04%    8.43%   -41.61%
   + Complexity     3685       62     -3623
   ============================================
     Files           526       70      -456
     Lines         25466     2880    -22586
     Branches       2886      359     -2527
   ============================================
   - Hits          12744      243    -12501
   + Misses        11454     2616     -8838
   + Partials       1268       21     -1247
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.09% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-97.83%)` | :arrow_down: |
   | 
[.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=)
 | `0.00% <0.00%> (-90.91%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh)
 | `0.00% <0.00%> (-84.85%)` | :arrow_down: |
   | 

[GitHub] [hudi] n3nash commented on a change in pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…

2021-06-14 Thread GitBox


n3nash commented on a change in pull request #3025:
URL: https://github.com/apache/hudi/pull/3025#discussion_r651456965



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java
##
@@ -63,7 +63,7 @@ public BaseScheduleCompactionActionExecutor(HoodieEngineContext context,
   + ", Compaction scheduled at " + instantTime));
   // Committed and pending compaction instants should have strictly lower timestamps
   List<HoodieInstant> conflictingInstants = table.getActiveTimeline()
-  .getWriteTimeline().getInstants()
+  .getWriteTimeline().filterCompletedAndCompactionInstants().getInstants()

Review comment:
   @danny0405 This class was NOT using 
`filterCompletedAndCompactionInstants` before. It was using 
`commitsAndCompactionTimeline()`. See this -> 
https://github.com/apache/hudi/blob/release-0.7.0/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java#L65
   Let me know if there is any confusion. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on a change in pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…

2021-06-14 Thread GitBox


n3nash commented on a change in pull request #3025:
URL: https://github.com/apache/hudi/pull/3025#discussion_r651456965



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java
##
@@ -63,7 +63,7 @@ public BaseScheduleCompactionActionExecutor(HoodieEngineContext context,
   + ", Compaction scheduled at " + instantTime));
   // Committed and pending compaction instants should have strictly lower timestamps
   List<HoodieInstant> conflictingInstants = table.getActiveTimeline()
-  .getWriteTimeline().getInstants()
+  .getWriteTimeline().filterCompletedAndCompactionInstants().getInstants()

Review comment:
   You are right, I looked at the wrong API. This is the correct one -> 
https://github.com/apache/hudi/blob/3e71c915271d77c7306ca0325b212f71ce723fc0/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java#L104.
 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #3050:
URL: https://github.com/apache/hudi/pull/3050#issuecomment-856664630


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3050](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1f85ab3) into 
[master](https://codecov.io/gh/apache/hudi/commit/f760ec543ec9ea23b7d4c9f61c76a283bd737f27?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (f760ec5) will **decrease** coverage by `0.32%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3050/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##           master    #3050      +/-   ##
   ============================================
   - Coverage     55.31%   54.99%   -0.33%
   - Complexity     4026     4044      +18
   ============================================
     Files           520      526       +6
     Lines         25295    25668     +373
     Branches       2872     2950      +78
   ============================================
   + Hits          13993    14117     +124
   - Misses         9914    10164     +250
   + Partials       1388     1387       -1
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.95% <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `50.03% <ø> (+0.05%)` | :arrow_up: |
   | hudiflink | `60.58% <ø> (-2.26%)` | :arrow_down: |
   | hudihadoopmr | `51.43% <ø> (ø)` | |
   | hudisparkdatasource | `66.53% <ø> (+0.01%)` | :arrow_up: |
   | hudisync | `47.94% <ø> (-3.51%)` | :arrow_down: |
   | huditimelineservice | `64.36% <ø> (ø)` | |
   | hudiutilities | `71.79% <ø> (+0.73%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZUNsaWVudC5qYXZh)
 | `52.56% <0.00%> (-19.04%)` | :arrow_down: |
   | 
[...apache/hudi/sink/compact/CompactionCommitSink.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdFNpbmsuamF2YQ==)
 | `65.00% <0.00%> (-10.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/table/format/FilePathUtils.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvRmlsZVBhdGhVdGlscy5qYXZh)
 | `66.17% <0.00%> (-0.75%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/sink/utils/HiveSyncContext.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3V0aWxzL0hpdmVTeW5jQ29udGV4dC5qYXZh)
 | `91.66% <0.00%> (-0.23%)` | :arrow_down: |
   | 
[...java/org/apache/hudi/common/fs/StorageSchemes.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL1N0b3JhZ2VTY2hlbWVzLmphdmE=)
 | `100.00% <0.00%> (ø)` | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #3050:
URL: https://github.com/apache/hudi/pull/3050#issuecomment-856664630


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3050](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1f85ab3) into 
[master](https://codecov.io/gh/apache/hudi/commit/f760ec543ec9ea23b7d4c9f61c76a283bd737f27?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (f760ec5) will **decrease** coverage by `0.32%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3050/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##           master    #3050      +/-   ##
   ============================================
   - Coverage     55.31%   54.99%   -0.33%
   - Complexity     4026     4044      +18
   ============================================
     Files           520      526       +6
     Lines         25295    25668     +373
     Branches       2872     2950      +78
   ============================================
   + Hits          13993    14115     +122
   - Misses         9914    10165     +251
     Partials       1388     1388
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.95% <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `50.01% <ø> (+0.03%)` | :arrow_up: |
   | hudiflink | `60.58% <ø> (-2.26%)` | :arrow_down: |
   | hudihadoopmr | `51.43% <ø> (ø)` | |
   | hudisparkdatasource | `66.53% <ø> (+0.01%)` | :arrow_up: |
   | hudisync | `47.94% <ø> (-3.51%)` | :arrow_down: |
   | huditimelineservice | `64.36% <ø> (ø)` | |
   | hudiutilities | `71.79% <ø> (+0.73%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZUNsaWVudC5qYXZh)
 | `52.56% <0.00%> (-19.04%)` | :arrow_down: |
   | 
[...apache/hudi/sink/compact/CompactionCommitSink.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdFNpbmsuamF2YQ==)
 | `65.00% <0.00%> (-10.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/table/format/FilePathUtils.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvRmlsZVBhdGhVdGlscy5qYXZh)
 | `66.17% <0.00%> (-0.75%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/sink/utils/HiveSyncContext.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3V0aWxzL0hpdmVTeW5jQ29udGV4dC5qYXZh)
 | `91.66% <0.00%> (-0.23%)` | :arrow_down: |
   | 
[...java/org/apache/hudi/common/fs/StorageSchemes.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL1N0b3JhZ2VTY2hlbWVzLmphdmE=)
 | `100.00% <0.00%> (ø)` | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #3067:
URL: https://github.com/apache/hudi/pull/3067#issuecomment-859459755


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3067](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (a25ebc2) into 
[master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (769dd2d) will **increase** coverage by `5.02%`.
   > The diff coverage is `73.80%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3067/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##           master    #3067      +/-   ##
   ============================================
   + Coverage     50.04%   55.07%   +5.02%
   - Complexity     3685     4035     +350
   ============================================
     Files           526      526
     Lines         25466    25479      +13
     Branches       2886     2886
   ============================================
   + Hits          12744    14032    +1288
   + Misses        11454    10057    -1397
   - Partials       1268     1390     +122
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.95% <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `50.01% <ø> (ø)` | |
   | hudiflink | `60.73% <73.80%> (+0.15%)` | :arrow_up: |
   | hudihadoopmr | `51.43% <ø> (ø)` | |
   | hudisparkdatasource | `66.53% <ø> (ø)` | |
   | hudisync | `51.45% <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | |
   | hudiutilities | `71.06% <ø> (+61.96%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...e/hudi/sink/partitioner/profile/WriteProfiles.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL3Byb2ZpbGUvV3JpdGVQcm9maWxlcy5qYXZh)
 | `55.88% <54.16%> (-4.12%)` | :arrow_down: |
   | 
[...ache/hudi/sink/StreamWriteOperatorCoordinator.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JDb29yZGluYXRvci5qYXZh)
 | `70.28% <100.00%> (ø)` | |
   | 
[...di/sink/partitioner/profile/DeltaWriteProfile.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL3Byb2ZpbGUvRGVsdGFXcml0ZVByb2ZpbGUuamF2YQ==)
 | `69.23% <100.00%> (ø)` | |
   | 
[...he/hudi/sink/partitioner/profile/WriteProfile.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL3Byb2ZpbGUvV3JpdGVQcm9maWxlLmphdmE=)
 | `88.00% <100.00%> (+3.62%)` | :arrow_up: |
   | 
[...ache/hudi/source/StreamReadMonitoringFunction.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zb3VyY2UvU3RyZWFtUmVhZE1vbml0b3JpbmdGdW5jdGlvbi5qYXZh)
 | `80.48% <100.00%> (+4.62%)` | :arrow_up: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #3067:
URL: https://github.com/apache/hudi/pull/3067#issuecomment-859459755


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3067](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (a25ebc2) into 
[master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (769dd2d) will **increase** coverage by `5.02%`.
   > The diff coverage is `73.80%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3067/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##           master    #3067      +/-   ##
   ============================================
   + Coverage     50.04%   55.06%   +5.02%
   - Complexity     3685     4034     +349
   ============================================
     Files           526      526
     Lines         25466    25479      +13
     Branches       2886     2886
   ============================================
   + Hits          12744    14031    +1287
   + Misses        11454    10057    -1397
   - Partials       1268     1391     +123
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.95% <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `50.01% <ø> (ø)` | |
   | hudiflink | `60.73% <73.80%> (+0.15%)` | :arrow_up: |
   | hudihadoopmr | `51.43% <ø> (ø)` | |
   | hudisparkdatasource | `66.53% <ø> (ø)` | |
   | hudisync | `51.45% <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | |
   | hudiutilities | `71.01% <ø> (+61.91%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...e/hudi/sink/partitioner/profile/WriteProfiles.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL3Byb2ZpbGUvV3JpdGVQcm9maWxlcy5qYXZh)
 | `55.88% <54.16%> (-4.12%)` | :arrow_down: |
   | 
[...ache/hudi/sink/StreamWriteOperatorCoordinator.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JDb29yZGluYXRvci5qYXZh)
 | `70.28% <100.00%> (ø)` | |
   | 
[...di/sink/partitioner/profile/DeltaWriteProfile.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL3Byb2ZpbGUvRGVsdGFXcml0ZVByb2ZpbGUuamF2YQ==)
 | `69.23% <100.00%> (ø)` | |
   | 
[...he/hudi/sink/partitioner/profile/WriteProfile.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL3Byb2ZpbGUvV3JpdGVQcm9maWxlLmphdmE=)
 | `88.00% <100.00%> (+3.62%)` | :arrow_up: |
   | 
[...ache/hudi/source/StreamReadMonitoringFunction.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zb3VyY2UvU3RyZWFtUmVhZE1vbml0b3JpbmdGdW5jdGlvbi5qYXZh)
 | `80.48% <100.00%> (+4.62%)` | :arrow_up: |
   | 

[GitHub] [hudi] fengjian428 commented on issue #3054: [SUPPORT] Point query at hudi tables

2021-06-14 Thread GitBox


fengjian428 commented on issue #3054:
URL: https://github.com/apache/hudi/issues/3054#issuecomment-861154782


   @n3nash can this new data skipping index improve incremental query performance? It seems that when using an incremental query one needs to set INCR_PATH_GLOB_OPT_KEY to a pattern that filters on path; otherwise the query will pull all the data in the commit time range.
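   
   For reference, a minimal sketch of the incremental read being described (hedged: raw option keys as of Hudi 0.8.x; the base path, begin instant time and glob are placeholders, not values from this thread):
   ```java
   String basePath = "/tmp/hudi/trips"; // placeholder table path
   Dataset<Row> incr = spark.read().format("hudi")
       .option("hoodie.datasource.query.type", "incremental")
       .option("hoodie.datasource.read.begin.instanttime", "20210601000000")
       // INCR_PATH_GLOB_OPT_KEY: restrict the listing to matching partition paths
       .option("hoodie.datasource.read.incr.path.glob", "/year=2021/month=06/*")
       .load(basePath);
   ```
   Without the glob option, every file group touched in the commit time range is listed, which is what makes the unfiltered incremental query expensive.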


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 closed pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile

2021-06-14 Thread GitBox


danny0405 closed pull request #3067:
URL: https://github.com/apache/hudi/pull/3067


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #3067:
URL: https://github.com/apache/hudi/pull/3067#issuecomment-859459755


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3067](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (a25ebc2) into 
[master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (769dd2d) will **increase** coverage by `2.59%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3067/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##           master    #3067      +/-   ##
   ============================================
   + Coverage     50.04%   52.63%   +2.59%
   + Complexity     3685      407    -3278
   ============================================
     Files           526       70     -456
     Lines         25466     2880   -22586
     Branches       2886      359    -2527
   ============================================
   - Hits          12744     1516   -11228
   + Misses        11454     1220   -10234
   + Partials       1268      144    -1124
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `71.01% <ø> (+61.91%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3067?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-97.83%)` | :arrow_down: |
   | 
[.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=)
 | `0.00% <0.00%> (-90.91%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/3067/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh)
 | `0.00% <0.00%> (-84.85%)` | :arrow_down: |
   | 

[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-14 Thread GitBox


nsivabalan commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r651418719



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.SqlQueryBuilder;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Reads data from RDBMS data sources.
+ */
+
+public class JdbcSource extends RowSource {
+
+  private static final Logger LOG = LogManager.getLogger(JdbcSource.class);
+  private static final List<String> DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2");
+  private static final String URI_JDBC_PREFIX = "jdbc:";
+
+  public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, SparkSession sparkSession,
+                    SchemaProvider schemaProvider) {
+    super(props, sparkContext, sparkSession, schemaProvider);
+  }
+
+  /**
+   * Validates all user properties and prepares the {@link DataFrameReader} to read from RDBMS.
+   *
+   * @param session    The {@link SparkSession}.
+   * @param properties The JDBC connection properties and data source options.
+   * @return The {@link DataFrameReader} to read from RDBMS
+   * @throws HoodieException
+   */
+  private static DataFrameReader validatePropsAndGetDataFrameReader(final SparkSession session,
+                                                                    final TypedProperties properties)
+      throws HoodieException {
+    DataFrameReader dataFrameReader;
+    FSDataInputStream passwordFileStream = null;
+    try {
+      dataFrameReader = session.read().format("jdbc");
+      dataFrameReader = dataFrameReader.option(Config.URL_PROP, properties.getString(Config.URL));
+      dataFrameReader = dataFrameReader.option(Config.USER_PROP, properties.getString(Config.USER));
+      dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, properties.getString(Config.DRIVER_CLASS));
+      dataFrameReader = dataFrameReader
+          .option(Config.RDBMS_TABLE_PROP, properties.getString(Config.RDBMS_TABLE_NAME));
+
+      if (properties.containsKey(Config.PASSWORD)) {
+        LOG.info("Reading JDBC password from properties file");
+        dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, properties.getString(Config.PASSWORD));
+      } else if (properties.containsKey(Config.PASSWORD_FILE)
+          && !StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) {
+        LOG.info(String.format("Reading JDBC password from password file %s", properties.getString(Config.PASSWORD_FILE)));
+        FileSystem fileSystem = FileSystem.get(session.sparkContext().hadoopConfiguration());
+        passwordFileStream = fileSystem.open(new Path(properties.getString(Config.PASSWORD_FILE)));
+        byte[] bytes = new byte[passwordFileStream.available()];
+        passwordFileStream.read(bytes);
+        dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new String(bytes));
+      } else {
+        throw new 

[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-14 Thread GitBox


nsivabalan commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r651418549



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestJdbcSource.java
##
@@ -0,0 +1,442 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.testutils.UtilitiesTestBase;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.SQLException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.utilities.testutils.JdbcTestUtils.clearAndInsert;
+import static org.apache.hudi.utilities.testutils.JdbcTestUtils.close;
+import static org.apache.hudi.utilities.testutils.JdbcTestUtils.count;
+import static org.apache.hudi.utilities.testutils.JdbcTestUtils.insert;
+import static org.apache.hudi.utilities.testutils.JdbcTestUtils.update;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.fail;
+
+/**
+ * Tests {@link JdbcSource}.
+ */
+public class TestJdbcSource extends UtilitiesTestBase {
+
+  private static final TypedProperties PROPS = new TypedProperties();
+  private static final HoodieTestDataGenerator DATA_GENERATOR = new HoodieTestDataGenerator();
+  private static Connection connection;
+
+  @BeforeEach
+  public void setup() throws Exception {
+    super.setup();
+    PROPS.setProperty("hoodie.deltastreamer.jdbc.url", "jdbc:h2:mem:test_mem");
+    PROPS.setProperty("hoodie.deltastreamer.jdbc.driver.class", "org.h2.Driver");
+    PROPS.setProperty("hoodie.deltastreamer.jdbc.user", "test");
+    PROPS.setProperty("hoodie.deltastreamer.jdbc.password", "jdbc");
+    PROPS.setProperty("hoodie.deltastreamer.jdbc.table.name", "triprec");
+    connection = DriverManager.getConnection("jdbc:h2:mem:test_mem", "test", "jdbc");
+  }
+
+  @AfterEach
+  public void teardown() throws Exception {
+    super.teardown();
+    close(connection);
+  }
+
+  @Test
+  public void testSingleCommit() {
+    PROPS.setProperty("hoodie.deltastreamer.jdbc.incremental.pull", "true");
+    PROPS.setProperty("hoodie.deltastreamer.jdbc.table.incremental.column.name", "last_insert");
+
+    try {
+      int numRecords = 100;
+      String commitTime = "000";
+
+      // Insert 100 records with commit time
+      clearAndInsert(commitTime, numRecords, connection, DATA_GENERATOR, PROPS);
+
+      // Validate if we have specified records in db
+      assertEquals(numRecords, count(connection, "triprec"));
+
+      // Start JdbcSource
+      Dataset<Row> rowDataset = runSource(Option.empty(), numRecords).getBatch().get();
+      assertEquals(numRecords, rowDataset.count());
+    } catch (SQLException e) {
+      fail(e.getMessage());
+    }
+  }
+
+  @Test
+  public void testInsertAndUpdate() {
+    PROPS.setProperty("hoodie.deltastreamer.jdbc.incremental.pull", "true");
+    PROPS.setProperty("hoodie.deltastreamer.jdbc.table.incremental.column.name", "last_insert");
+
+    try {
+      final String commitTime = "000";
+      final int numRecords = 100;
+
+      // Add 100 records. Update half of them with commit time "007".
+      update("007",
+          clearAndInsert(commitTime, numRecords, connection, DATA_GENERATOR, PROPS)
+              .stream()
+              .limit(50)

[GitHub] [hudi] danny0405 commented on a change in pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile

2021-06-14 Thread GitBox


danny0405 commented on a change in pull request #3067:
URL: https://github.com/apache/hudi/pull/3067#discussion_r651408416



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/profile/WriteProfiles.java
##
@@ -58,4 +76,61 @@ private static WriteProfile getWriteProfile(
   public static void clean(String path) {
 PROFILES.remove(path);
   }
+
+  /**
+   * Returns all the incremental write file path statuses with the given commits metadata.
+   *
+   * @param basePath     Table base path
+   * @param hadoopConf   The hadoop conf
+   * @param metadataList The commits metadata
+   * @return the file statuses array
+   */
+  public static FileStatus[] getWritePathsOfInstants(
+      Path basePath,
+      Configuration hadoopConf,
+      List<HoodieCommitMetadata> metadataList) {
+    FileSystem fs = FSUtils.getFs(basePath.toString(), hadoopConf);
+    return metadataList.stream().map(metadata -> getWritePathsOfInstant(basePath, metadata, fs))
+        .flatMap(Collection::stream).toArray(FileStatus[]::new);
+  }
+
+  private static List<FileStatus> getWritePathsOfInstant(Path basePath, HoodieCommitMetadata metadata, FileSystem fs) {
+    return metadata.getFileIdAndFullPaths(basePath.toString()).values().stream()
+        .map(org.apache.hadoop.fs.Path::new)
+        // filter out the file paths that does not exist, some files may be cleaned by
+        // the cleaner.
+        .filter(path -> {
+          try {
+            return fs.exists(path);
+          } catch (IOException e) {
+            LOG.error("Checking exists of path: {} error", path);
+            throw new HoodieException(e);
+          }
+        }).map(path -> {
+          try {
+            return fs.getFileStatus(path);
+          } catch (IOException e) {
+            LOG.error("Get write status of path: {} error", path);
+            throw new HoodieException(e);
+          }
+        })
+        // filter out crushed files

Review comment:
   The write should not affect the read. The code was added a long time ago: a committed file (merge handle) could later be modified by subsequent modification instants. The first-version write handle was not closed until the checkpoint success event was received (this has since been changed), so a merge handle could be empty if `close` was never invoked.
   
   We can still keep the filtering to make the read robust.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] garyli1019 commented on a change in pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile

2021-06-14 Thread GitBox


garyli1019 commented on a change in pull request #3067:
URL: https://github.com/apache/hudi/pull/3067#discussion_r651401752



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/profile/WriteProfiles.java
##
@@ -58,4 +76,61 @@ private static WriteProfile getWriteProfile(
   public static void clean(String path) {
 PROFILES.remove(path);
   }
+
+  /**
+   * Returns all the incremental write file path statuses with the given commits metadata.
+   *
+   * @param basePath     Table base path
+   * @param hadoopConf   The hadoop conf
+   * @param metadataList The commits metadata
+   * @return the file statuses array
+   */
+  public static FileStatus[] getWritePathsOfInstants(
+      Path basePath,
+      Configuration hadoopConf,
+      List<HoodieCommitMetadata> metadataList) {
+    FileSystem fs = FSUtils.getFs(basePath.toString(), hadoopConf);
+    return metadataList.stream().map(metadata -> getWritePathsOfInstant(basePath, metadata, fs))
+        .flatMap(Collection::stream).toArray(FileStatus[]::new);
+  }
+
+  private static List<FileStatus> getWritePathsOfInstant(Path basePath, HoodieCommitMetadata metadata, FileSystem fs) {
+    return metadata.getFileIdAndFullPaths(basePath.toString()).values().stream()
+        .map(org.apache.hadoop.fs.Path::new)
+        // filter out the file paths that does not exist, some files may be cleaned by
+        // the cleaner.
+        .filter(path -> {
+          try {
+            return fs.exists(path);
+          } catch (IOException e) {
+            LOG.error("Checking exists of path: {} error", path);
+            throw new HoodieException(e);
+          }
+        }).map(path -> {
+          try {
+            return fs.getFileStatus(path);
+          } catch (IOException e) {
+            LOG.error("Get write status of path: {} error", path);
+            throw new HoodieException(e);
+          }
+        })
+        // filter out crushed files

Review comment:
   Crushed files might cause errors on the query side. How are those crushed files produced?

##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/profile/WriteProfiles.java
##
@@ -58,4 +76,61 @@ private static WriteProfile getWriteProfile(
   public static void clean(String path) {
 PROFILES.remove(path);
   }
+
+  /**
+   * Returns all the incremental write file path statuses with the given commits metadata.
+   *
+   * @param basePath     Table base path
+   * @param hadoopConf   The hadoop conf
+   * @param metadataList The commits metadata
+   * @return the file statuses array
+   */
+  public static FileStatus[] getWritePathsOfInstants(
+      Path basePath,
+      Configuration hadoopConf,
+      List<HoodieCommitMetadata> metadataList) {
+    FileSystem fs = FSUtils.getFs(basePath.toString(), hadoopConf);
+    return metadataList.stream().map(metadata -> getWritePathsOfInstant(basePath, metadata, fs))
+        .flatMap(Collection::stream).toArray(FileStatus[]::new);
+  }
+
+  private static List<FileStatus> getWritePathsOfInstant(Path basePath, HoodieCommitMetadata metadata, FileSystem fs) {
+    return metadata.getFileIdAndFullPaths(basePath.toString()).values().stream()
+        .map(org.apache.hadoop.fs.Path::new)
+        // filter out the file paths that does not exist, some files may be cleaned by
+        // the cleaner.
+        .filter(path -> {
+          try {
+            return fs.exists(path);
+          } catch (IOException e) {
+            LOG.error("Checking exists of path: {} error", path);
+            throw new HoodieException(e);
+          }
+        }).map(path -> {
+          try {
+            return fs.getFileStatus(path);
+          } catch (IOException e) {
+            LOG.error("Get write status of path: {} error", path);
+            throw new HoodieException(e);
+          }
+        })
+        // filter out crushed files
+        .filter(fileStatus -> fileStatus.getLen() > 0)
+        .collect(Collectors.toList());
+  }
+
+  public static HoodieCommitMetadata getCommitMetadata(

Review comment:
   ditto

##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/profile/WriteProfiles.java
##
@@ -58,4 +76,61 @@ private static WriteProfile getWriteProfile(
   public static void clean(String path) {
 PROFILES.remove(path);
   }
+
+  /**
+   * Returns all the incremental write file path statuses with the given commits metadata.
+   *
+   * @param basePath     Table base path
+   * @param hadoopConf   The hadoop conf
+   * @param metadataList The commits metadata
+   * @return the file statuses array
+   */
+  public static FileStatus[] getWritePathsOfInstants(
+      Path basePath,
+      Configuration hadoopConf,
+      List<HoodieCommitMetadata> metadataList) {
+    FileSystem fs = FSUtils.getFs(basePath.toString(), hadoopConf);
+    return metadataList.stream().map(metadata -> getWritePathsOfInstant(basePath, metadata, fs))
+        .flatMap(Collection::stream).toArray(FileStatus[]::new);
+  }
+
+  private 

[GitHub] [hudi] danny0405 commented on a change in pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…

2021-06-14 Thread GitBox


danny0405 commented on a change in pull request #3025:
URL: https://github.com/apache/hudi/pull/3025#discussion_r651402324



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java
##
@@ -63,7 +63,7 @@ public BaseScheduleCompactionActionExecutor(HoodieEngineContext context,
           + ", Compaction scheduled at " + instantTime));
       // Committed and pending compaction instants should have strictly lower timestamps
       List<HoodieInstant> conflictingInstants = table.getActiveTimeline()
-          .getWriteTimeline().getInstants()
+          .getWriteTimeline().filterCompletedAndCompactionInstants().getInstants()

Review comment:
   No, the method `getWriteTimeline()` does not really follow the behavior of `filterCompletedAndCompactionInstants()`: `getWriteTimeline()` may include any `INFLIGHT` instants, while `filterCompletedAndCompactionInstants()` only includes `COMPACTION` `INFLIGHT` instants.
   
   We should allow scheduling compaction even if there are inflight commits or inflight delta_commits.
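   
   A minimal sketch of the distinction, assuming the 0.9-era timeline API where `getInstants()` returns a `Stream` (the printing is only for illustration):
   
   ```java
   import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
   import org.apache.hudi.common.table.timeline.HoodieTimeline;
   
   public class TimelineFilterSketch {
   
     static void showConflictCandidates(HoodieActiveTimeline activeTimeline) {
       // May contain ANY inflight instant: commits, delta_commits, compactions, ...
       HoodieTimeline writeTimeline = activeTimeline.getWriteTimeline();
   
       // Keeps completed instants plus compaction instants only, so plain inflight
       // commits / delta_commits no longer block scheduling a new compaction plan.
       HoodieTimeline conflictCandidates = writeTimeline.filterCompletedAndCompactionInstants();
   
       conflictCandidates.getInstants().forEach(instant ->
           System.out.println(instant.getTimestamp() + " " + instant.getState()));
     }
   }
   ```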




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-06-14 Thread liwei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363315#comment-17363315
 ] 

liwei commented on HUDI-1138:
-

[~guoyihua] thanks.

“We may consider blocking the requests for batching so that the timeline server sends the actual responses only after MARKERS are overwritten / updated.”

If we wait for the batched requests to be overwritten/updated successfully, the create-marker request from a Spark task will wait a long time: e.g. the 200ms batching interval plus the MARKERS file read and overwrite.

Do you have a plan for how to update the marker file?

> Re-implement marker files via timeline server
> -
>
> Key: HUDI-1138
> URL: https://issues.apache.org/jira/browse/HUDI-1138
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Even as you can argue that RFC-15/consolidated metadata, removes the need for 
> deleting partial files written due to spark task failures/stage retries. It 
> will still leave extra files inside the table (and users will pay for it 
> every month) and we need the marker mechanism to be able to delete these 
> partial files. 
> Here we explore if we can improve the current marker file mechanism, that 
> creates one marker file per data file written, by 
> Delegating the createMarker() call to the driver/timeline server, and have it 
> create marker metadata into a single file handle, that is flushed for 
> durability guarantees
>  
> P.S: I was tempted to think Spark listener mechanism can help us deal with 
> failed tasks, but it has no guarantees. the writer job could die without 
> deleting a partial file. i.e it can improve things, but cant provide 
> guarantees 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter edited a comment on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #3050:
URL: https://github.com/apache/hudi/pull/3050#issuecomment-856664630


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3050](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1f85ab3) into 
[master](https://codecov.io/gh/apache/hudi/commit/f760ec543ec9ea23b7d4c9f61c76a283bd737f27?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (f760ec5) will **decrease** coverage by `2.95%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3050/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3050      +/-   ##
   ============================================
   - Coverage     55.31%    52.36%     -2.96%
   + Complexity     4026       422      -3604
   ============================================
     Files           520        70       -450
     Lines         25295      3082     -22213
     Branches       2872       423      -2449
   ============================================
   - Hits          13993      1614     -12379
   + Misses         9914      1327      -8587
   + Partials       1388       141      -1247
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.14% <ø> (-45.32%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `71.79% <ø> (+0.73%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-97.83%)` | :arrow_down: |
   | 
[.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=)
 | `0.00% <0.00%> (-90.91%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh)
 | `0.00% <0.00%> (-84.85%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #3050:
URL: https://github.com/apache/hudi/pull/3050#issuecomment-856664630


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3050](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1f85ab3) into 
[master](https://codecov.io/gh/apache/hudi/commit/f760ec543ec9ea23b7d4c9f61c76a283bd737f27?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (f760ec5) will **decrease** coverage by `6.48%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3050/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3050      +/-   ##
   ============================================
   - Coverage     55.31%    48.83%     -6.49%
   + Complexity     4026       404      -3622
   ============================================
     Files           520        70       -450
     Lines         25295      3082     -22213
     Branches       2872       423      -2449
   ============================================
   - Hits          13993      1505     -12488
   + Misses         9914      1434      -8480
   + Partials       1388       143      -1245
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.14% <ø> (-45.32%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `66.77% <ø> (-4.30%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-97.83%)` | :arrow_down: |
   | 
[.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=)
 | `0.00% <0.00%> (-90.91%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh)
 | `0.00% <0.00%> (-84.85%)` | :arrow_down: |
   | 

[GitHub] [hudi] swuferhong closed pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…

2021-06-14 Thread GitBox


swuferhong closed pull request #3050:
URL: https://github.com/apache/hudi/pull/3050


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on a change in pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…

2021-06-14 Thread GitBox


n3nash commented on a change in pull request #3025:
URL: https://github.com/apache/hudi/pull/3025#discussion_r651297449



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java
##
@@ -63,7 +63,7 @@ public BaseScheduleCompactionActionExecutor(HoodieEngineContext context,
           + ", Compaction scheduled at " + instantTime));
       // Committed and pending compaction instants should have strictly lower timestamps
       List<HoodieInstant> conflictingInstants = table.getActiveTimeline()
-          .getWriteTimeline().getInstants()
+          .getWriteTimeline().filterCompletedAndCompactionInstants().getInstants()

Review comment:
   @swuferhong @danny0405 If you take a look at the previous version of this file, the method called before was `commitsAndCompactionTimeline` -> https://github.com/apache/hudi/blob/3e71c915271d77c7306ca0325b212f71ce723fc0/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java#L110
   
   This follows the same behavior as `getWriteTimeline().getInstants()`. Can you please explain what the possible bug is here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…

2021-06-14 Thread GitBox


n3nash commented on pull request #3050:
URL: https://github.com/apache/hudi/pull/3050#issuecomment-861011193


   @swuferhong Thanks for opening this. Looks like it might be a duplicate. Can 
you check why the CI is failing ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on a change in pull request #3074: [HUDI-2007] Fixing hudi_test_suite for spark nodes and adding spark bulk_insert node

2021-06-14 Thread GitBox


n3nash commented on a change in pull request #3074:
URL: https://github.com/apache/hudi/pull/3074#discussion_r651289505



##
File path: 
hudi-integ-test/src/main/scala/org/apache/hudi/integ/testsuite/dag/nodes/SparkBulkInsertNode.scala
##
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.integ.testsuite.dag.nodes
+
+import org.apache.hudi.client.WriteStatus
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.integ.testsuite.configuration.DeltaConfig.Config
+import org.apache.hudi.integ.testsuite.dag.ExecutionContext
+import org.apache.hudi.{AvroConversionUtils, DataSourceWriteOptions}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SaveMode
+
+import scala.collection.JavaConverters._
+
+/**
+ * Spark datasource based bulk insert node
+ * @param config1
+ */
+class SparkBulkInsertNode(config1: Config) extends DagNode[RDD[WriteStatus]] {
+
+  config = config1
+
+  /**
+   * Execute the {@link DagNode}.
+   *
+   * @param context The context needed for an execution of a node.
+   * @param curItrCount iteration count for executing the node.
+   * @throws Exception Thrown if the execution failed.
+   */
+  override def execute(context: ExecutionContext, curItrCount: Int): Unit = {
+if (!config.isDisableGenerate) {
+  //println("Generating input data for node {}", this.getName)

Review comment:
   please remove the print comment




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on a change in pull request #3074: [HUDI-2007] Fixing hudi_test_suite for spark nodes and adding spark bulk_insert node

2021-06-14 Thread GitBox


n3nash commented on a change in pull request #3074:
URL: https://github.com/apache/hudi/pull/3074#discussion_r651289412



##
File path: 
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/configuration/DeltaConfig.java
##
@@ -189,6 +190,10 @@ public boolean validateClean() {
      return Boolean.valueOf(configsMap.getOrDefault(VALIDATE_CLEAN, false).toString());
 }
 
+public boolean doEnableRowWriting() {

Review comment:
   doEnableRowWriting -> doRowWriting or enableRowWriting 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on a change in pull request #3074: [HUDI-2007] Fixing hudi_test_suite for spark nodes and adding spark bulk_insert node

2021-06-14 Thread GitBox


n3nash commented on a change in pull request #3074:
URL: https://github.com/apache/hudi/pull/3074#discussion_r651289200



##
File path: hudi-integ-test/pom.xml
##
@@ -407,7 +407,46 @@
   
 
   
+

Review comment:
   Can you explain the need for this plugin ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2819: [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe.

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #2819:
URL: https://github.com/apache/hudi/pull/2819#issuecomment-860933035


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2819](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3deb5e7) into 
[master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (769dd2d) will **increase** coverage by `4.98%`.
   > The diff coverage is `56.52%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2819/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2819      +/-   ##
   ============================================
   + Coverage     50.04%    55.03%     +4.98%
   - Complexity     3685      4033       +348
   ============================================
     Files           526       527         +1
     Lines         25466     25477        +11
     Branches       2886      2886
   ============================================
   + Hits          12744     14020      +1276
   + Misses        11454     10067      -1387
   - Partials       1268      1390       +122
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.95% <0.00%> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `49.99% <61.11%> (-0.03%)` | :arrow_down: |
   | hudiflink | `60.58% <ø> (ø)` | |
   | hudihadoopmr | `51.43% <ø> (ø)` | |
   | hudisparkdatasource | `66.53% <100.00%> (ø)` | |
   | hudisync | `51.45% <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | |
   | hudiutilities | `71.01% <ø> (+61.91%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ain/java/org/apache/hudi/cli/utils/CommitUtil.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL3V0aWxzL0NvbW1pdFV0aWwuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...di/common/table/timeline/HoodieActiveTimeline.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFjdGl2ZVRpbWVsaW5lLmphdmE=)
 | `65.21% <40.00%> (-1.60%)` | :arrow_down: |
   | 
[...mon/table/timeline/HoodieInstantTimeGenerator.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUluc3RhbnRUaW1lR2VuZXJhdG9yLmphdmE=)
 | `69.23% <69.23%> (ø)` | |
   | 
[.../spark/sql/hudi/streaming/HoodieStreamSource.scala](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9zcGFyay9zcWwvaHVkaS9zdHJlYW1pbmcvSG9vZGllU3RyZWFtU291cmNlLnNjYWxh)
 | `67.46% <100.00%> (ø)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `79.31% <0.00%> (-10.35%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2819: [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe.

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #2819:
URL: https://github.com/apache/hudi/pull/2819#issuecomment-860933035


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2819](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3deb5e7) into 
[master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (769dd2d) will **increase** coverage by `3.37%`.
   > The diff coverage is `56.52%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2819/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2819      +/-   ##
   ============================================
   + Coverage     50.04%    53.42%     +3.37%
   - Complexity     3685      3827       +142
   ============================================
     Files           526       517         -9
     Lines         25466     24649       -817
     Branches       2886      2833        -53
   ============================================
   + Hits          12744     13168       +424
   + Misses        11454     10173      -1281
   - Partials       1268      1308        +40
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.95% <0.00%> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `49.99% <61.11%> (-0.03%)` | :arrow_down: |
   | hudiflink | `60.58% <ø> (ø)` | |
   | hudihadoopmr | `51.43% <ø> (ø)` | |
   | hudisparkdatasource | `66.53% <100.00%> (ø)` | |
   | hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `71.01% <ø> (+61.91%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ain/java/org/apache/hudi/cli/utils/CommitUtil.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL3V0aWxzL0NvbW1pdFV0aWwuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...di/common/table/timeline/HoodieActiveTimeline.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFjdGl2ZVRpbWVsaW5lLmphdmE=)
 | `65.21% <40.00%> (-1.60%)` | :arrow_down: |
   | 
[...mon/table/timeline/HoodieInstantTimeGenerator.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUluc3RhbnRUaW1lR2VuZXJhdG9yLmphdmE=)
 | `69.23% <69.23%> (ø)` | |
   | 
[.../spark/sql/hudi/streaming/HoodieStreamSource.scala](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9zcGFyay9zcWwvaHVkaS9zdHJlYW1pbmcvSG9vZGllU3RyZWFtU291cmNlLnNjYWxh)
 | `67.46% <100.00%> (ø)` | |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter commented on pull request #2819: [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe.

2021-06-14 Thread GitBox


codecov-commenter commented on pull request #2819:
URL: https://github.com/apache/hudi/pull/2819#issuecomment-860933035


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2819](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3deb5e7) into 
[master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (769dd2d) will **decrease** coverage by `41.60%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2819/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff               @@
   ##             master     #2819      +/-   ##
   =============================================
   - Coverage     50.04%     8.43%    -41.61%
   + Complexity     3685        62      -3623
   =============================================
     Files           526        70       -456
     Lines         25466      2880     -22586
     Branches       2886       359      -2527
   =============================================
   - Hits          12744       243     -12501
   + Misses        11454      2616      -8838
   + Partials       1268        21      -1247
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.09% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2819?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-97.83%)` | :arrow_down: |
   | 
[.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=)
 | `0.00% <0.00%> (-90.91%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/2819/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh)
 | `0.00% <0.00%> (-84.85%)` | :arrow_down: |
   | 

[GitHub] [hudi] prashantwason commented on a change in pull request #2819: [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe.

2021-06-14 Thread GitBox


prashantwason commented on a change in pull request #2819:
URL: https://github.com/apache/hudi/pull/2819#discussion_r651210372



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java
##
@@ -73,6 +71,16 @@
   private static final Logger LOG = LogManager.getLogger(HoodieActiveTimeline.class);
   protected HoodieTableMetaClient metaClient;
   private static AtomicReference<String> lastInstantTime = new AtomicReference<>(String.valueOf(Integer.MIN_VALUE));
+  private static ThreadLocal<SimpleDateFormat> commitFormatHolder = new ThreadLocal<SimpleDateFormat>() {

Review comment:
   Created a new class.
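   
   For context, a minimal sketch of the thread-local pattern being applied (the class name and the commit-time format below are illustrative, not necessarily the PR's actual new class):
   
   ```java
   import java.text.ParseException;
   import java.text.SimpleDateFormat;
   import java.util.Date;
   
   // SimpleDateFormat is not thread safe, so each thread gets its own instance
   // instead of sharing one static formatter across threads.
   public class CommitTimeFormat {
   
     private static final ThreadLocal<SimpleDateFormat> FORMAT_HOLDER =
         ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyyMMddHHmmss"));
   
     public static String format(Date date) {
       return FORMAT_HOLDER.get().format(date);
     }
   
     public static Date parse(String instantTime) throws ParseException {
       return FORMAT_HOLDER.get().parse(instantTime);
     }
   }
   ```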




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] dkapupara-te removed a comment on issue #2856: [SUPPORT] Metrics Prometheus pushgateway

2021-06-14 Thread GitBox


dkapupara-te removed a comment on issue #2856:
URL: https://github.com/apache/hudi/issues/2856#issuecomment-858026721


   I am experiencing the same issue using simpleclient with Spring boot for one 
of our cron jobs.
   
   ```Caused by: java.net.UnknownHostException: https```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] tandonraghav commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

2021-06-14 Thread GitBox


tandonraghav commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-860846241


   @n3nash @vinothchandar I am seeing that the entire schema is getting persisted in Glue **TBLPROPERTIES**. This was not the behaviour previously. Do we need the schema there as well, or can we have a config to switch it off?
   
   Hudi version - 0.9.0-SNAPSHOT
   
   
   hive> show create table max_ro;
   OK
   CREATE EXTERNAL TABLE `max_ro`(
 `_hoodie_commit_time` string, 
 `_hoodie_commit_seqno` string, 
 `_hoodie_record_key` string, 
 `_hoodie_partition_path` string, 
 `_hoodie_file_name` string, 
 `string_pincode_113` string, 
 `double_pincode_113` double, 
 `string_availability_157` string, 
 `string_availability2_169` string, 
 `string_availability3_150` string, 
 `string_availability4_158` string, 
 `string_availability5_187` string, 
 `string_availability6_150` string, 
 `string_availability7_778` string, 
 `string_availability8_192` string, 
 `string_availability9_700` string, 
 `string_availability10_131` string, 
 `string_availability11_186` string, 
 `string_availability12_878` string, 
 `string_availability13_466` string, 
 `id` string, 
 `product_id` string, 
 `catalog_id` string, 
 `feed_id` string, 
 `ts_ms` double, 
 `op` string)
   PARTITIONED BY ( 
 `db_name` string)
   ROW FORMAT SERDE 
 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
   WITH SERDEPROPERTIES ( 
 'path'='file:/tmp/test/hudi-user-data/max') 
   STORED AS INPUTFORMAT 
 'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
   OUTPUTFORMAT 
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
   LOCATION
 'file:/tmp/test/hudi-user-data/max'
   TBLPROPERTIES (
 'last_commit_time_sync'='20210614221425', 
 'last_modified_by'='raghav', 
 'last_modified_time'='1623689081', 
 'spark.sql.sources.provider'='hudi', 
 'spark.sql.sources.schema.numPartCols'='1', 
 'spark.sql.sources.schema.numParts'='1', 
 
'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"string_pincode_113","type":"string","nullable":true,"metadata":{}},{"name":"double_pincode_113","type":"double","nullable":true,"metadata":{}},{"name":"string_availability_157","type":"string","nullable":true,"metadata":{}},{"name":"string_availability2_169","type":"string","nullable":true,"metadata":{}},{"name":"string_availability3_150","type":"string","nullable":true,"metadata":{}},{"name":"string_availability4_158","type":"string","nullable":true,"metadata":{}},{"name":"string_availability5_187","type":"string","nullable":true
 
,"metadata":{}},{"name":"string_availability6_150","type":"string","nullable":true,"metadata":{}},{"name":"string_availability7_778","type":"string","nullable":true,"metadata":{}},{"name":"string_availability8_192","type":"string","nullable":true,"metadata":{}},{"name":"string_availability9_700","type":"string","nullable":true,"metadata":{}},{"name":"string_availability10_131","type":"string","nullable":true,"metadata":{}},{"name":"string_availability11_186","type":"string","nullable":true,"metadata":{}},{"name":"string_availability12_878","type":"string","nullable":true,"metadata":{}},{"name":"string_availability13_466","type":"string","nullable":true,"metadata":{}},{"name":"id","type":"string","nullable":true,"metadata":{}},{"name":"product_id","type":"string","nullable":true,"metadata":{}},{"name":"catalog_id","type":"string","nullable":true,"metadata":{}},{"name":"feed_id","type":"string","nullable":true,"metadata":{}},{"name":"ts_ms","type":"double","nullable":true,"metadata":{
 
}},{"name":"op","type":"string","nullable":true,"metadata":{}},{"name":"db_name","type":"string","nullable":true,"metadata":{}}]}',
 
 'spark.sql.sources.schema.partCol.0'='db_name', 
 'transient_lastDdlTime'='1623689081')
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] tandonraghav opened a new issue #3078: [SUPPORT] combineAndGetUpdateValue is not getting called when Schema evolution happens

2021-06-14 Thread GitBox


tandonraghav opened a new issue #3078:
URL: https://github.com/apache/hudi/issues/3078


   
   **Describe the problem you faced**
   
   We have a stream of partial records (Mongo CDC oplogs) and want to update only the keys that are present in the partial record, keeping the other values intact.
   
   A sample record in our CDC Kafka:-
   
   
{"op":"u","doc_id":"606ffc3c10f9138e2a6b6csdc","shard_id":"shard01","ts":{"$timestamp":{"i":1,"t":1617951883}},"db_name":"test","collection":"Users","o":{"$v":1,"$set":{"key2":"value2"}},"o2":{"_id":{"$oid":"606ffc3c10f9138e2a6b6csdc"}}}
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a custom `PAYLOAD_CLASS_OPT_KEY` as described below.
   2. Push some records (partial records) using Spark DF and persist in Hudi.
   3. Evolve the schema, add a new field to the existing Schema and persist via 
Spark DF
   4. `combineAndGetUpdateValue` is not called when the schema is evolved, which leaves the other values as NULL, since only the partial record is passed and the combine logic lives in the custom class. However, this behaviour is not observed when the schema remains constant.
   
   **Expected behavior**
   
   Irrespective of Schema evolution, when compaction happens it should always 
go through `combineAndGetUpdateValue` of the class provided.
   
   **Environment Description**
   
   * Hudi version : 0.9.0-SNAPSHOT
   
   * Spark version : 2.4
   
   * Hive version : 
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Custom Payload class-
   
   
   
   
   ```java
   import org.apache.avro.Schema;
   import org.apache.avro.generic.GenericRecord;
   import org.apache.avro.generic.IndexedRecord;
   import org.apache.hudi.avro.HoodieAvroUtils;
   import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload;
   import org.apache.hudi.common.util.Option;
   
   import java.io.IOException;
   import java.util.List;
   import java.util.Properties;
   
   public class MongoHudiCDCPayload extends OverwriteWithLatestAvroPayload {
   
       public MongoHudiCDCPayload(GenericRecord record, Comparable orderingVal) {
           super(record, orderingVal);
       }
   
       public MongoHudiCDCPayload(Option<GenericRecord> record) {
           super(record);
       }
   
       @Override
       public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
               throws IOException {
           if (this.recordBytes.length == 0) {
               return Option.empty();
           }
           GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(this.recordBytes, schema);
           GenericRecord currentRecord = (GenericRecord) currentValue;
   
           // Copy only the non-null fields of the partial incoming record onto the
           // current stored record, leaving the remaining columns intact.
           List<Schema.Field> fields = incomingRecord.getSchema().getFields();
           fields.forEach((field) -> {
               Object value = incomingRecord.get(field.name());
               if (value != null) {
                   currentRecord.put(field.name(), value);
               }
           });
           return Option.of(currentRecord);
       }
   }
   ```
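   
   For completeness, a hedged sketch of wiring such a payload class into a Spark datasource write (the option keys are the standard Hudi datasource keys; the table name, fields, and path are illustrative):
   
   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SaveMode;
   
   public class WriteWithCustomPayload {
     // df holds the partial CDC records to upsert.
     static void upsert(Dataset<Row> df) {
       df.write().format("hudi")
           .option("hoodie.table.name", "users_cdc")                     // illustrative
           .option("hoodie.datasource.write.recordkey.field", "doc_id")  // illustrative
           .option("hoodie.datasource.write.precombine.field", "ts_ms")  // illustrative
           // the class whose combineAndGetUpdateValue() should merge partial records
           .option("hoodie.datasource.write.payload.class", "MongoHudiCDCPayload")
           .mode(SaveMode.Append)
           .save("/tmp/test/hudi-user-data/users_cdc");                  // illustrative
     }
   }
   ```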
   
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] satishkotha commented on a change in pull request #2542: Add ability to provide multi-region (global) data consistency across HMS in different regions

2021-06-14 Thread GitBox


satishkotha commented on a change in pull request #2542:
URL: https://github.com/apache/hudi/pull/2542#discussion_r651116518



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableGloballyConsistentMetaClient.java
##
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.table;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hudi.common.fs.ConsistencyGuardConfig;
+import org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.TableNotFoundException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+/*
+ * Uber-specific version of HoodieTableMetaClient: when a table-level property is set
+ * to indicate a commit timestamp that is present across DCs, limit the local .hoodie
+ * timeline up to that commit timestamp.
+ *
+ * Note: There is an assumption that every other commit present up to this commit is
+ * present globally. This assumption makes it easier to just trim the commit timeline
+ * at the head; otherwise we would have to store the valid commit timeline in the
+ * table as a property.
+ *
+ * Note: This object should not be cached between MapReduce jobs since the jobConf can change.
+ */
+public class HoodieTableGloballyConsistentMetaClient extends 
HoodieTableMetaClient {

Review comment:
   @jsbali good question. I think it makes sense for read global to take 
precedence. include_pending is only used for dev testing, so we can add 
instructions to explicitly disable read global for these cases.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] jsbali commented on a change in pull request #2542: Add ability to provide multi-region (global) data consistency across HMS in different regions

2021-06-14 Thread GitBox


jsbali commented on a change in pull request #2542:
URL: https://github.com/apache/hudi/pull/2542#discussion_r651114060



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableGloballyConsistentMetaClient.java
##
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.table;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hudi.common.fs.ConsistencyGuardConfig;
+import org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.TableNotFoundException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+/*
+ * Uber-specific version of HoodieTableMetaClient: when a table-level property is set
+ * to indicate a commit timestamp that is present across DCs, limit the local .hoodie
+ * timeline up to that commit timestamp.
+ *
+ * Note: There is an assumption that every other commit present up to this commit is
+ * present globally. This assumption makes it easier to just trim the commit timeline
+ * at the head; otherwise we would have to store the valid commit timeline in the
+ * table as a property.
+ *
+ * Note: This object should not be cached between MapReduce jobs since the jobConf can change.
+ */
+public class HoodieTableGloballyConsistentMetaClient extends 
HoodieTableMetaClient {

Review comment:
   The file can be removed, which I have done. There are a few issues if hoodie.consume is used for trimming the timeline. For example, suppose the following confs are set:
   1. read global as true
   2. last_rep time as 100
   3. include_pending commit true
   4. commit time 200
   
   Now the question is what takes precedence over what. Do we check whether 3 and 4 are set before using consume.commit for global reads? What should the order be? For now I have kept both separate and global read takes precedence, but happy to hear your thoughts. A sketch of that precedence is below.
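   
   A minimal sketch of that precedence rule (the helper name, signature, and the consume-commit key are hypothetical, purely for illustration):
   
   ```java
   import org.apache.hadoop.mapred.JobConf;
   import org.apache.hudi.common.util.Option;
   
   public class TimelineBoundResolver {
   
     // Hypothetical resolution: when global reads are enabled, the last commit
     // known to be replicated across all regions bounds the local timeline and
     // wins over any hoodie.consume-style setting.
     static Option<String> resolveUpperBound(JobConf job, boolean readGlobal, String lastReplicatedTime) {
       if (readGlobal && lastReplicatedTime != null) {
         return Option.of(lastReplicatedTime);
       }
       // Otherwise fall back to the consume-commit bound, if one was set.
       String consumeCommit = job.get("hoodie.consume.max.commit"); // hypothetical key
       return Option.ofNullable(consumeCommit);
     }
   }
   ```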




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2003) Auto Compute Compression ratio for input data to output parquet/orc file size

2021-06-14 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-2003:
--
Summary: Auto Compute Compression ratio for input data to output 
parquet/orc file size  (was: Auto Compute Compression)

> Auto Compute Compression ratio for input data to output parquet/orc file size
> -
>
> Key: HUDI-2003
> URL: https://issues.apache.org/jira/browse/HUDI-2003
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Vinay
>Priority: Major
>
> Context : 
> Submitted a Spark job to read 3-4B ORC records and write them in Hudi format. 
> The following table captures all the runs I carried out with different options:
>  
> ||CONFIG ||Number of Files Created||Size of each file||
> |PARQUET_FILE_MAX_BYTES=DEFAULT|30K|21MB|
> |PARQUET_FILE_MAX_BYTES=1GB|3700|178MB|
> |PARQUET_FILE_MAX_BYTES=1GB
> COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE=110|Same as before|Same as before|
> |PARQUET_FILE_MAX_BYTES=1GB
> BULKINSERT_PARALLELISM=100|Same as before|Same as before|
> |PARQUET_FILE_MAX_BYTES=4GB|1600|675MB|
> |PARQUET_FILE_MAX_BYTES=6GB|669|1012MB|
> Based on these runs, it feels that the compression ratio is off. 
>  
>  
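
As a hedged aside on the numbers quoted above (pure arithmetic on the reported 1GB run; nothing here is Hudi-specific code):

```java
public class CompressionRatioEstimate {
  public static void main(String[] args) {
    // From the PARQUET_FILE_MAX_BYTES = 1GB run in the table above.
    double configuredMaxMb = 1024.0;
    double observedFileMb = 178.0;
    // The writer filled each file until its *estimated* size reached ~1GB,
    // but the files landed at ~178MB, so the size estimate overshot by:
    double overshoot = configuredMaxMb / observedFileMb; // ~5.75x
    // Scaling the assumed input-to-output compression ratio by 1/overshoot
    // (~0.17x) would bring the estimate in line, which is what automatically
    // computing the ratio from previous commits would achieve.
    System.out.printf("overshoot ~= %.2fx, ratio scale ~= %.3f%n", overshoot, 1.0 / overshoot);
  }
}
```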



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2003) Auto Compute Compression ratio for input data to output parquet/orc file size

2021-06-14 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-2003:
--
Issue Type: Improvement  (was: Bug)

> Auto Compute Compression ratio for input data to output parquet/orc file size
> -
>
> Key: HUDI-2003
> URL: https://issues.apache.org/jira/browse/HUDI-2003
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Vinay
>Priority: Major
>
> Context : 
> Submitted a Spark job to read 3-4B ORC records and write them in Hudi format. 
> The following table captures all the runs I carried out with different options:
>  
> ||CONFIG ||Number of Files Created||Size of each file||
> |PARQUET_FILE_MAX_BYTES=DEFAULT|30K|21MB|
> |PARQUET_FILE_MAX_BYTES=1GB|3700|178MB|
> |PARQUET_FILE_MAX_BYTES=1GB
> COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE=110|Same as before|Same as before|
> |PARQUET_FILE_MAX_BYTES=1GB
> BULKINSERT_PARALLELISM=100|Same as before|Same as before|
> |PARQUET_FILE_MAX_BYTES=4GB|1600|675MB|
> |PARQUET_FILE_MAX_BYTES=6GB|669|1012MB|
> Based on these runs, it feels that the compression ratio is off. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer

2021-06-14 Thread Nishith Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363067#comment-17363067
 ] 

Nishith Agarwal commented on HUDI-1910:
---

[~vinaypatil18] Yes, that makes sense, please go ahead.

> Supporting Kafka based checkpointing for HoodieDeltaStreamer
> 
>
> Key: HUDI-1910
> URL: https://issues.apache.org/jira/browse/HUDI-1910
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
>  Labels: sev:normal, triaged
>
> HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some 
> users have requested support for Kafka based checkpoints for freshness 
> auditing purposes. This ticket tracks any implementation for that. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-06-14 Thread Ethan Guo (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363053#comment-17363053
 ] 

Ethan Guo commented on HUDI-1138:
-

We may consider blocking the requests for batching so that the timeline server 
sends the actual responses only after MARKERS are overwritten / updated.  In 
this case, those files have the correct markers at the timeline server and 
rollback can be properly done.  [~shivnarayan] Let me know if I miss anything 
regarding the marker-based rollback.
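
A minimal sketch of the blocking-batched handling described above, under stated assumptions (the 200ms interval, class and method names are illustrative, not the actual timeline-server code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BatchedMarkerCreator {

  private final List<String> pendingMarkers = new ArrayList<>();
  private final List<CompletableFuture<Boolean>> pendingResponses = new ArrayList<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public BatchedMarkerCreator() {
    // Flush the accumulated marker names every 200ms (interval illustrative).
    scheduler.scheduleWithFixedDelay(this::flush, 200, 200, TimeUnit.MILLISECONDS);
  }

  // Each request blocks (via the returned future) until its batch is durably
  // written, so the server responds only after the MARKERS file is updated.
  public synchronized CompletableFuture<Boolean> createMarker(String markerName) {
    pendingMarkers.add(markerName);
    CompletableFuture<Boolean> response = new CompletableFuture<>();
    pendingResponses.add(response);
    return response;
  }

  private void flush() {
    List<String> markers;
    List<CompletableFuture<Boolean>> responses;
    synchronized (this) {
      if (pendingMarkers.isEmpty()) {
        return;
      }
      markers = new ArrayList<>(pendingMarkers);
      responses = new ArrayList<>(pendingResponses);
      pendingMarkers.clear();
      pendingResponses.clear();
    }
    boolean success = writeMarkersFile(markers);
    responses.forEach(future -> future.complete(success));
  }

  private boolean writeMarkersFile(List<String> markers) {
    // Placeholder: overwrite/append the consolidated MARKERS file here.
    return true;
  }
}
```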

> Re-implement marker files via timeline server
> -
>
> Key: HUDI-1138
> URL: https://issues.apache.org/jira/browse/HUDI-1138
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Even as you can argue that RFC-15/consolidated metadata, removes the need for 
> deleting partial files written due to spark task failures/stage retries. It 
> will still leave extra files inside the table (and users will pay for it 
> every month) and we need the marker mechanism to be able to delete these 
> partial files. 
> Here we explore if we can improve the current marker file mechanism, that 
> creates one marker file per data file written, by 
> Delegating the createMarker() call to the driver/timeline server, and have it 
> create marker metadata into a single file handle, that is flushed for 
> durability guarantees
>  
> P.S: I was tempted to think Spark listener mechanism can help us deal with 
> failed tasks, but it has no guarantees. the writer job could die without 
> deleting a partial file. i.e it can improve things, but cant provide 
> guarantees 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] FelixKJose commented on issue #3054: [SUPPORT] Point query at hudi tables

2021-06-14 Thread GitBox


FelixKJose commented on issue #3054:
URL: https://github.com/apache/hudi/issues/3054#issuecomment-860685598


   @n3nash Could you please give more details on how it is supported in Presto 
and Spark? I mean, do I have to provide some specific configurations etc., and 
is it supported for both MOR and COW table types? The reason I am asking is that RFC-7 
(https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table)
 seems inactive and I haven't seen any documentation regarding point-in-time 
query support in HUDI.
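
   In the meantime, the closest mechanism I know of is the incremental query with an instant-time range, which bounds reads between two commit times. A hedged sketch (option keys as I understand the Hudi datasource; the table path and instant times are illustrative, and `spark` is an existing SparkSession with the usual org.apache.spark.sql imports):

```java
// Illustrative only: read the commits that fall between two instant times.
Dataset<Row> df = spark.read()
    .format("hudi")
    .option("hoodie.datasource.query.type", "incremental")
    .option("hoodie.datasource.read.begin.instanttime", "20210601000000")
    .option("hoodie.datasource.read.end.instanttime", "20210614000000")
    .load("s3://bucket/path/to/hudi_table");
```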


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] karan867 opened a new issue #3077: [SUPPORT] Large latencies in hudi writes using upsert mode.

2021-06-14 Thread GitBox


karan867 opened a new issue #3077:
URL: https://github.com/apache/hudi/issues/3077


   **Describe the problem you faced**
   I am currently working on a POC to integrate Hudi with our existing Data 
lake. I am seeing large latencies in hudi writes, almost 7x that of the 
partitioned parquet writes we perform now.
   
   I am writing around 2.5 million rows (3.9 GB) in two batches with upsert 
mode. The first write completes in 2-3 mins. For the second batch, the 
latency is around 12-14 mins, while writing with our existing system takes 
around 1.6-2 mins. The data contains negligible updates (>99% inserts and <1% 
updates). However, in the rare case of duplicated trips we want to override the 
old data points with the new ones. In our write use-case, most of the data will 
impact the recent partitions. Currently, for testing, I am creating and writing 
to 5 partitions according to the probability distribution [10, 10, 10, 10, 60]
   
   PySpark configs:
   conf = conf.set("spark.driver.memory", "6g")
   conf = conf.set("spark.executor.instances", 8)
   conf = conf.set("spark.executor.memory", "4g")
   conf = conf.set("spark.executor.cores", 4)
   
   Hudi options:
   hudi_options = {
   'hoodie.table.name': table_name,
   'hoodie.datasource.write.recordkey.field': 
'applicationId,userId,driverId,timestamp',
   'hoodie.datasource.write.partitionpath.field': 'packet_date',
   'hoodie.datasource.write.keygenerator.class': 
'org.apache.hudi.keygen.ComplexKeyGenerator',
   'hoodie.datasource.write.hive_style_partitioning': 'true',
   'hoodie.datasource.write.table.name': table_name,
   'hoodie.datasource.write.operation': 'upsert',
   'hoodie.datasource.write.precombine.field': 'created_at_date',
   'hoodie.upsert.shuffle.parallelism': 200,
   'hoodie.insert.shuffle.parallelism': 200,
   'hoodie.bloom.index.prune.by.ranges': 'false',
   'hoodie.bloom.index.filter.type': 'DYNAMIC_V0',
   'hoodie.index.bloom.num_entries': 3,
   'hoodie.bloom.index.filter.dynamic.max.entries': 12,
   }
   
   
   
   **Environment Description**
   
   * Hudi version : 0.7.0 
   
   * Spark version : 2.4.7
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   In addition I have experimented with the following
   * Tried decreasing the file sizes
   * Increasing 'hoodie.bloom.index.parallelism'
   *  Setting 'hoodie.metadata.enable' true. 
   
   You can see the jobs taking the most time from the screen shot attached  
   
![image](https://user-images.githubusercontent.com/85880633/121893160-489dd580-cd3b-11eb-8845-0459484a0406.png)
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] fengjian428 commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-06-14 Thread GitBox


fengjian428 commented on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-860650106


   @ssdong is there any quick fix we can do in version 0.7.0?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2003) Auto Compute Compression

2021-06-14 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362925#comment-17362925
 ] 

Vinay commented on HUDI-2003:
-

[~nishith29] Please do update the description if I have missed anything here

> Auto Compute Compression
> 
>
> Key: HUDI-2003
> URL: https://issues.apache.org/jira/browse/HUDI-2003
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Vinay
>Priority: Major
>
> Context : 
> Submitted a Spark job to read 3-4B ORC records and write them out in Hudi 
> format. The following table summarizes all the runs that I carried out with 
> different options.
>  
> ||CONFIG ||Number of Files Created||Size of each file||
> |PARQUET_FILE_MAX_BYTES=DEFAULT|30K|21MB|
> |PARQUET_FILE_MAX_BYTES=1GB|3700|178MB|
> |PARQUET_FILE_MAX_BYTES=1GB
> COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE=110|Same as before|Same as before|
> |PARQUET_FILE_MAX_BYTES=1GB
> BULKINSERT_PARALLELISM=100|Same as before|Same as before|
> |PARQUET_FILE_MAX_BYTES=4GB|1600|675MB|
> |PARQUET_FILE_MAX_BYTES=6GB|669|1012MB|
> Based on these runs, it feels that the compression ratio is off. 
>  
>  
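
One hypothesis to check against the runs above (my assumption, not a confirmed diagnosis): Hudi sizes new parquet files from an estimated on-disk size scaled by hoodie.parquet.compression.ratio (default 0.1), so if the data compresses far better than that estimate, files are closed well before the configured limit:

```
expected file size = PARQUET_FILE_MAX_BYTES = ~120 MB (default)
observed file size = ~21 MB
implied real ratio ≈ 0.1 × 21 / 120 ≈ 0.0175
```

Auto-computing the ratio from observed output sizes would then remove the need to inflate PARQUET_FILE_MAX_BYTES manually.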



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pratyakshsharma commented on pull request #3003: [HUDI-1939][WIP] Replace joda-time api with java8 new time api

2021-06-14 Thread GitBox


pratyakshsharma commented on pull request #3003:
URL: https://github.com/apache/hudi/pull/3003#issuecomment-860627743


   Will circle back on this by tomorrow. :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#issuecomment-832697076


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2915](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (877103f) into 
[master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (769dd2d) will **increase** coverage by `4.36%`.
   > The diff coverage is `91.17%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2915/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #2915      +/-   ##
    ============================================
    + Coverage     50.04%   54.40%    +4.36%     
    + Complexity     3685      444     -3241     
    ============================================
      Files           526       72      -454     
      Lines         25466     3016    -22450     
      Branches       2886      375     -2511     
    ============================================
    - Hits          12744     1641    -11103     
    + Misses        11454     1221    -10233     
    + Partials       1268      154     -1114     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `72.30% <91.17%> (+63.21%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../org/apache/hudi/utilities/sources/JdbcSource.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSmRiY1NvdXJjZS5qYXZh)
 | `90.62% <90.62%> (ø)` | |
   | 
[...ava/org/apache/hudi/utilities/SqlQueryBuilder.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1NxbFF1ZXJ5QnVpbGRlci5qYXZh)
 | `92.50% <92.50%> (ø)` | |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-97.83%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#issuecomment-832697076


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2915](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (877103f) into 
[master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (769dd2d) will **increase** coverage by `4.36%`.
   > The diff coverage is `91.17%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2915/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #2915      +/-   ##
    ============================================
    + Coverage     50.04%   54.40%    +4.36%     
    + Complexity     3685      444     -3241     
    ============================================
      Files           526       72      -454     
      Lines         25466     3016    -22450     
      Branches       2886      375     -2511     
    ============================================
    - Hits          12744     1641    -11103     
    + Misses        11454     1221    -10233     
    + Partials       1268      154     -1114     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `72.30% <91.17%> (+63.21%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../org/apache/hudi/utilities/sources/JdbcSource.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSmRiY1NvdXJjZS5qYXZh)
 | `90.62% <90.62%> (ø)` | |
   | 
[...ava/org/apache/hudi/utilities/SqlQueryBuilder.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1NxbFF1ZXJ5QnVpbGRlci5qYXZh)
 | `92.50% <92.50%> (ø)` | |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-97.83%)` | :arrow_down: |
   | 

[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-14 Thread GitBox


codope commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r650843643



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.SqlQueryBuilder;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Reads data from RDBMS data sources.
+ */
+
+public class JdbcSource extends RowSource {
+
+  private static final Logger LOG = LogManager.getLogger(JdbcSource.class);
+  private static final List<String> DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2");
+  private static final String URI_JDBC_PREFIX = "jdbc:";
+
+  public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, 
SparkSession sparkSession,
+SchemaProvider schemaProvider) {
+super(props, sparkContext, sparkSession, schemaProvider);
+  }
+
+  /**
+   * Validates all user properties and prepares the {@link DataFrameReader} to 
read from RDBMS.
+   *
+   * @param session    The {@link SparkSession}.
+   * @param properties The JDBC connection properties and data source options.
+   * @return The {@link DataFrameReader} to read from RDBMS
+   * @throws HoodieException
+   */
+  private static DataFrameReader validatePropsAndGetDataFrameReader(final 
SparkSession session,
+final 
TypedProperties properties)
+  throws HoodieException {
+DataFrameReader dataFrameReader;
+FSDataInputStream passwordFileStream = null;
+try {
+  dataFrameReader = session.read().format("jdbc");
+  dataFrameReader = dataFrameReader.option(Config.URL_PROP, 
properties.getString(Config.URL));
+  dataFrameReader = dataFrameReader.option(Config.USER_PROP, 
properties.getString(Config.USER));
+  dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, 
properties.getString(Config.DRIVER_CLASS));
+  dataFrameReader = dataFrameReader
+  .option(Config.RDBMS_TABLE_PROP, 
properties.getString(Config.RDBMS_TABLE_NAME));
+
+  if (properties.containsKey(Config.PASSWORD)) {
+LOG.info("Reading JDBC password from properties file");
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, 
properties.getString(Config.PASSWORD));
+  } else if (properties.containsKey(Config.PASSWORD_FILE)
+  && 
!StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) {
+LOG.info(String.format("Reading JDBC password from password file %s", 
properties.getString(Config.PASSWORD_FILE)));
+FileSystem fileSystem = 
FileSystem.get(session.sparkContext().hadoopConfiguration());
+passwordFileStream = fileSystem.open(new 
Path(properties.getString(Config.PASSWORD_FILE)));
+byte[] bytes = new byte[passwordFileStream.available()];
+passwordFileStream.read(bytes);
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new 
String(bytes));
+  } else {
+throw new 

[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-14 Thread GitBox


codope commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r650815907



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/SqlQueryBuilder.java
##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.common.util.StringUtils;
+
+/**
+ * Fluent SQL query builder.
+ * Current support for: SELECT, FROM, JOIN, ON, WHERE, ORDER BY, LIMIT clauses.
+ */
+public class SqlQueryBuilder {
+
+  private StringBuilder sqlBuilder;
+
+  private SqlQueryBuilder(StringBuilder sqlBuilder) {
+this.sqlBuilder = sqlBuilder;
+  }
+
+  /**
+   * Creates a SELECT query.
+   *
+   * @param columns The column names to select.
+   * @return The new {@link SqlQueryBuilder} instance.
+   */
+  public static SqlQueryBuilder select(String... columns) {
+if (columns == null || columns.length == 0) {
+  throw new IllegalArgumentException();
+}
+
+StringBuilder sqlBuilder = new StringBuilder();
+sqlBuilder.append("select ");
+sqlBuilder.append(String.join(", ", columns));
+
+return new SqlQueryBuilder(sqlBuilder);
+  }
+
+  /**
+   * Appends a FROM clause to a query.
+   *
+   * @param tables The table names to select from.
+   * @return The {@link SqlQueryBuilder} instance.
+   */
+  public SqlQueryBuilder from(String... tables) {
+if (tables == null || tables.length == 0) {
+  throw new IllegalArgumentException();
+}
+
+sqlBuilder.append(" from ");
+sqlBuilder.append(String.join(", ", tables));

Review comment:
   Added a subtask to take it up after we land this PR.
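
   For readers of this thread, a hedged usage sketch of the builder above; it assumes the built SQL is exposed via toString() (or similar), which is not part of this excerpt:

```java
// Hypothetical usage; column and table names are illustrative.
String sql = SqlQueryBuilder
    .select("id", "rider", "driver")
    .from("triprec")
    .toString();
// -> "select id, rider, driver from triprec"
```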




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-14 Thread GitBox


codope commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r650811408



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.SqlQueryBuilder;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+import org.jetbrains.annotations.NotNull;
+
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Reads data from RDBMS data sources.
+ */
+
+public class JdbcSource extends RowSource {
+
+  private static final Logger LOG = LogManager.getLogger(JdbcSource.class);
+  private static final List<String> DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2");
+  private static final String URI_JDBC_PREFIX = "jdbc:";
+
+  public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, 
SparkSession sparkSession,
+SchemaProvider schemaProvider) {
+super(props, sparkContext, sparkSession, schemaProvider);
+  }
+
+  /**
+   * Validates all user properties and prepares the {@link DataFrameReader} to 
read from RDBMS.
+   *
+   * @param session    The {@link SparkSession}.
+   * @param properties The JDBC connection properties and data source options.
+   * @return The {@link DataFrameReader} to read from RDBMS
+   * @throws HoodieException
+   */
+  private static DataFrameReader validatePropsAndGetDataFrameReader(final 
SparkSession session,
+final 
TypedProperties properties)
+  throws HoodieException {
+DataFrameReader dataFrameReader;
+FSDataInputStream passwordFileStream = null;
+try {
+  dataFrameReader = session.read().format("jdbc");
+  dataFrameReader = dataFrameReader.option(Config.URL_PROP, 
properties.getString(Config.URL));
+  dataFrameReader = dataFrameReader.option(Config.USER_PROP, 
properties.getString(Config.USER));
+  dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, 
properties.getString(Config.DRIVER_CLASS));
+  dataFrameReader = dataFrameReader
+  .option(Config.RDBMS_TABLE_PROP, 
properties.getString(Config.RDBMS_TABLE_NAME));
+
+  if (properties.containsKey(Config.PASSWORD)) {
+LOG.info("Reading JDBC password from properties file");
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, 
properties.getString(Config.PASSWORD));
+  } else if (properties.containsKey(Config.PASSWORD_FILE)
+  && 
!StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) {
+LOG.info(String.format("Reading JDBC password from password file %s", 
properties.getString(Config.PASSWORD_FILE)));
+FileSystem fileSystem = 
FileSystem.get(session.sparkContext().hadoopConfiguration());
+passwordFileStream = fileSystem.open(new 
Path(properties.getString(Config.PASSWORD_FILE)));
+byte[] bytes = new byte[passwordFileStream.available()];
+passwordFileStream.read(bytes);
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new 
String(bytes));
+  } else {
+

[jira] [Created] (HUDI-2012) Add source table or column validations.

2021-06-14 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-2012:
-

 Summary: Add source table or column validations.
 Key: HUDI-2012
 URL: https://issues.apache.org/jira/browse/HUDI-2012
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: Sagar Sumit
Assignee: Sagar Sumit


Based on the comment 
https://github.com/apache/hudi/pull/2915#discussion_r627851195

We need to validate the incremental column's datatype: for example, what 
happens in case a byte[] column is chosen as the incremental column? Another 
validation is to check that the column exists in the table.
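
A minimal sketch of what such a validation could look like (illustrative only, not the actual Hudi code; class and method names are made up):

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.BinaryType;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.StructType;

class IncrementalColumnValidator {
  // Fail fast if the incremental column is missing or cannot be ordered,
  // since it is compared against the stored checkpoint in a WHERE clause.
  static void validate(Dataset<Row> source, String column) {
    StructType schema = source.schema();
    if (!Arrays.asList(schema.fieldNames()).contains(column)) {
      throw new IllegalArgumentException("Incremental column not found: " + column);
    }
    DataType type = schema.apply(column).dataType();
    // a byte[] column surfaces as BinaryType and is not orderable
    if (type instanceof BinaryType) {
      throw new IllegalArgumentException(
          "Incremental column must be of an orderable type, got: " + type);
    }
  }
}
```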



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2011) Parallel data sync for JDBC source

2021-06-14 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-2011:
-

 Summary: Parallel data sync for JDBC source
 Key: HUDI-2011
 URL: https://issues.apache.org/jira/browse/HUDI-2011
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: Sagar Sumit
Assignee: Sagar Sumit


Compute upsert/insert/bulk_insert parallelism according to table size.
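
One possible heuristic, sketched below with made-up names (not a committed design):

```java
class ParallelismHeuristic {
  // Size each task at roughly targetBytesPerTask of input, with a floor of 1.
  static int computeParallelism(long tableSizeBytes, long targetBytesPerTask) {
    return (int) Math.max(1L, (tableSizeBytes + targetBytesPerTask - 1) / targetBytesPerTask);
  }
}
// e.g. a ~3.9 GB table at 128 MB per task -> computeParallelism(...) ≈ 31
```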



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2010) Add support for multi table sync

2021-06-14 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-2010:
-

 Summary: Add support for multi table sync
 Key: HUDI-2010
 URL: https://issues.apache.org/jira/browse/HUDI-2010
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: Sagar Sumit
Assignee: Sagar Sumit


In the first phase we added single table sync for JDBC source: 
[https://github.com/apache/hudi/pull/2915]

With HoodieMultiTableDeltaStreamer, we need to support multi table sync.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-14 Thread GitBox


codope commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r650804454



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.SqlQueryBuilder;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+import org.jetbrains.annotations.NotNull;
+
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Reads data from RDBMS data sources.
+ */
+
+public class JdbcSource extends RowSource {
+
+  private static final Logger LOG = LogManager.getLogger(JdbcSource.class);
+  private static final List<String> DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2");
+  private static final String URI_JDBC_PREFIX = "jdbc:";
+
+  public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, 
SparkSession sparkSession,
+SchemaProvider schemaProvider) {
+super(props, sparkContext, sparkSession, schemaProvider);
+  }
+
+  /**
+   * Validates all user properties and prepares the {@link DataFrameReader} to 
read from RDBMS.
+   *
+   * @param session    The {@link SparkSession}.
+   * @param properties The JDBC connection properties and data source options.
+   * @return The {@link DataFrameReader} to read from RDBMS
+   * @throws HoodieException
+   */
+  private static DataFrameReader validatePropsAndGetDataFrameReader(final 
SparkSession session,
+final 
TypedProperties properties)
+  throws HoodieException {
+DataFrameReader dataFrameReader;
+FSDataInputStream passwordFileStream = null;
+try {
+  dataFrameReader = session.read().format("jdbc");
+  dataFrameReader = dataFrameReader.option(Config.URL_PROP, 
properties.getString(Config.URL));
+  dataFrameReader = dataFrameReader.option(Config.USER_PROP, 
properties.getString(Config.USER));
+  dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, 
properties.getString(Config.DRIVER_CLASS));
+  dataFrameReader = dataFrameReader
+  .option(Config.RDBMS_TABLE_PROP, 
properties.getString(Config.RDBMS_TABLE_NAME));
+
+  if (properties.containsKey(Config.PASSWORD)) {
+LOG.info("Reading JDBC password from properties file");
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, 
properties.getString(Config.PASSWORD));
+  } else if (properties.containsKey(Config.PASSWORD_FILE)
+  && 
!StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) {
+LOG.info(String.format("Reading JDBC password from password file %s", 
properties.getString(Config.PASSWORD_FILE)));
+FileSystem fileSystem = 
FileSystem.get(session.sparkContext().hadoopConfiguration());
+passwordFileStream = fileSystem.open(new 
Path(properties.getString(Config.PASSWORD_FILE)));
+byte[] bytes = new byte[passwordFileStream.available()];
+passwordFileStream.read(bytes);
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new 
String(bytes));
+  } else {
+

[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-14 Thread GitBox


codope commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r650797408



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.SqlQueryBuilder;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Reads data from RDBMS data sources.
+ */
+
+public class JdbcSource extends RowSource {
+
+  private static final Logger LOG = LogManager.getLogger(JdbcSource.class);
+  private static final List<String> DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2");
+  private static final String URI_JDBC_PREFIX = "jdbc:";
+
+  public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, 
SparkSession sparkSession,
+SchemaProvider schemaProvider) {
+super(props, sparkContext, sparkSession, schemaProvider);
+  }
+
+  /**
+   * Validates all user properties and prepares the {@link DataFrameReader} to 
read from RDBMS.
+   *
+   * @param session    The {@link SparkSession}.
+   * @param properties The JDBC connection properties and data source options.
+   * @return The {@link DataFrameReader} to read from RDBMS
+   * @throws HoodieException
+   */
+  private static DataFrameReader validatePropsAndGetDataFrameReader(final 
SparkSession session,
+final 
TypedProperties properties)
+  throws HoodieException {
+DataFrameReader dataFrameReader;
+FSDataInputStream passwordFileStream = null;
+try {
+  dataFrameReader = session.read().format("jdbc");
+  dataFrameReader = dataFrameReader.option(Config.URL_PROP, 
properties.getString(Config.URL));
+  dataFrameReader = dataFrameReader.option(Config.USER_PROP, 
properties.getString(Config.USER));
+  dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, 
properties.getString(Config.DRIVER_CLASS));
+  dataFrameReader = dataFrameReader
+  .option(Config.RDBMS_TABLE_PROP, 
properties.getString(Config.RDBMS_TABLE_NAME));
+
+  if (properties.containsKey(Config.PASSWORD)) {
+LOG.info("Reading JDBC password from properties file");
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, 
properties.getString(Config.PASSWORD));
+  } else if (properties.containsKey(Config.PASSWORD_FILE)
+  && 
!StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) {
+LOG.info(String.format("Reading JDBC password from password file %s", 
properties.getString(Config.PASSWORD_FILE)));
+FileSystem fileSystem = 
FileSystem.get(session.sparkContext().hadoopConfiguration());
+passwordFileStream = fileSystem.open(new 
Path(properties.getString(Config.PASSWORD_FILE)));
+byte[] bytes = new byte[passwordFileStream.available()];
+passwordFileStream.read(bytes);
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new 
String(bytes));
+  } else {
+throw new 

[GitHub] [hudi] leesf commented on a change in pull request #3075: [HUDI-2009] Fixing extra commit metadata in row writer path

2021-06-14 Thread GitBox


leesf commented on a change in pull request #3075:
URL: https://github.com/apache/hudi/pull/3075#discussion_r650783735



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java
##
@@ -105,4 +110,9 @@ public HoodieTable getHoodieTable() {
   public WriteOperationType getWriteOperationType() {
 return operationType;
   }
+
+  @VisibleForTesting

Review comment:
   Is this annotation really needed, or should we change `private` to 
`protected/public`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-14 Thread GitBox


codope commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r650783802



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestJdbcSource.java
##
@@ -0,0 +1,442 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.testutils.UtilitiesTestBase;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.SQLException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.utilities.testutils.JdbcTestUtils.clearAndInsert;
+import static org.apache.hudi.utilities.testutils.JdbcTestUtils.close;
+import static org.apache.hudi.utilities.testutils.JdbcTestUtils.count;
+import static org.apache.hudi.utilities.testutils.JdbcTestUtils.insert;
+import static org.apache.hudi.utilities.testutils.JdbcTestUtils.update;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.fail;
+
+/**
+ * Tests {@link JdbcSource}.
+ */
+public class TestJdbcSource extends UtilitiesTestBase {
+
+  private static final TypedProperties PROPS = new TypedProperties();
+  private static final HoodieTestDataGenerator DATA_GENERATOR = new 
HoodieTestDataGenerator();
+  private static Connection connection;
+
+  @BeforeEach
+  public void setup() throws Exception {
+super.setup();
+PROPS.setProperty("hoodie.deltastreamer.jdbc.url", "jdbc:h2:mem:test_mem");
+PROPS.setProperty("hoodie.deltastreamer.jdbc.driver.class", 
"org.h2.Driver");
+PROPS.setProperty("hoodie.deltastreamer.jdbc.user", "test");
+PROPS.setProperty("hoodie.deltastreamer.jdbc.password", "jdbc");
+PROPS.setProperty("hoodie.deltastreamer.jdbc.table.name", "triprec");
+connection = DriverManager.getConnection("jdbc:h2:mem:test_mem", "test", 
"jdbc");
+  }
+
+  @AfterEach
+  public void teardown() throws Exception {
+super.teardown();
+close(connection);
+  }
+
+  @Test
+  public void testSingleCommit() {
+PROPS.setProperty("hoodie.deltastreamer.jdbc.incremental.pull", "true");
+
PROPS.setProperty("hoodie.deltastreamer.jdbc.table.incremental.column.name", 
"last_insert");
+
+try {
+  int numRecords = 100;
+  String commitTime = "000";
+
+  // Insert 100 records with commit time
+  clearAndInsert(commitTime, numRecords, connection, DATA_GENERATOR, 
PROPS);
+
+  // Validate if we have specified records in db
+  assertEquals(numRecords, count(connection, "triprec"));
+
+  // Start JdbcSource
+  Dataset<Row> rowDataset = runSource(Option.empty(), numRecords).getBatch().get();
+  assertEquals(numRecords, rowDataset.count());
+} catch (SQLException e) {
+  fail(e.getMessage());
+}
+  }
+
+  @Test
+  public void testInsertAndUpdate() {
+PROPS.setProperty("hoodie.deltastreamer.jdbc.incremental.pull", "true");
+
PROPS.setProperty("hoodie.deltastreamer.jdbc.table.incremental.column.name", 
"last_insert");
+
+try {
+  final String commitTime = "000";
+  final int numRecords = 100;
+
+  // Add 100 records. Update half of them with commit time "007".
+  update("007",
+  clearAndInsert(commitTime, numRecords, connection, DATA_GENERATOR, 
PROPS)
+  .stream()
+  .limit(50)
+  

[GitHub] [hudi] leesf commented on a change in pull request #3075: [HUDI-2009] Fixing extra commit metadata in row writer path

2021-06-14 Thread GitBox


leesf commented on a change in pull request #3075:
URL: https://github.com/apache/hudi/pull/3075#discussion_r650783735



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java
##
@@ -105,4 +110,9 @@ public HoodieTable getHoodieTable() {
   public WriteOperationType getWriteOperationType() {
 return operationType;
   }
+
+  @VisibleForTesting

Review comment:
   is this annotation really needed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (HUDI-2004) Move KafkaOffsetGen.CheckpointUtils test cases to independent class and improve coverage

2021-06-14 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HUDI-2004.
-
Resolution: Done

Done - 769dd2d7c98558146eb4accb75b6d8e339ae6e0f

> Move KafkaOffsetGen.CheckpointUtils test cases to independent class and 
> improve coverage
> 
>
> Key: HUDI-2004
> URL: https://issues.apache.org/jira/browse/HUDI-2004
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>  Labels: pull-request-available
>
> Currently the KafkaOffsetGen.CheckpointUtils test cases are present in 
> TestKafkaSource, which starts up HDFS, Hive, and ZK services locally. This is 
> not required for the CheckpointUtils test cases, hence they should be moved 
> to an independent test class of their own.
>  
> Also, CheckpointUtils.strToOffsets and CheckpointUtils.offsetsToStr are not 
> unit tested currently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (7d9f9d7 -> 769dd2d)

2021-06-14 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 7d9f9d7  [HUDI-1991] Fixing drop dups exception in bulk insert row 
writer path (#3055)
 add 769dd2d  [HUDI-2004] Move CheckpointUtils test cases to independant 
class (#3072)

No new revisions were added by this update.

Summary of changes:
 .../utilities/sources/helpers/KafkaOffsetGen.java  |   4 +-
 .../hudi/utilities/sources/TestKafkaSource.java|  66 ---
 .../sources/helpers/TestCheckpointUtils.java   | 126 +
 3 files changed, 128 insertions(+), 68 deletions(-)
 create mode 100644 
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/helpers/TestCheckpointUtils.java


[GitHub] [hudi] leesf merged pull request #3072: [HUDI-2004] Move CheckpointUtils test cases to independant class

2021-06-14 Thread GitBox


leesf merged pull request #3072:
URL: https://github.com/apache/hudi/pull/3072


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer

2021-06-14 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362688#comment-17362688
 ] 

Vinay edited comment on HUDI-1910 at 6/14/21, 9:09 AM:
---

[~nishith29] Makes sense, so you are suggesting to include the 
COMMIT_OFFSET_TO_KAFKA config in the KafkaOffsetGen.Config class so that users 
can include it in the property file, like we pass the topic name.

And then use it here -  
[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474]
 and call the commitOffsetToKafka function. Is that correct?

 

If this approach looks good, I can test this change out and create a PR


was (Author: vinaypatil18):
[~nishith29] Make sense, so you suggesting to include COMMIT_OFFSET_TO_KAFKA 
config in KafkaOffsetGen.Config class so that users can include it in property 
file like we pass topic name.

And then use it here -  
[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474]
 and call commitOffsetToKafka function.

 

If this approach looks good, I can test this change out and create a PR
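
For the record, a hedged sketch of what the commit step could look like (hypothetical wrapper, not actual Hudi code; KafkaConsumer.commitSync and OffsetAndMetadata are the standard Kafka client APIs):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

class KafkaCheckpointCommitter {
  // offsets: partition -> next offset, e.g. parsed from the checkpoint string
  // stored in the Hudi commit metadata (CheckpointUtils.strToOffsets in
  // KafkaOffsetGen handles that parsing).
  static void commitOffsetToKafka(KafkaConsumer<?, ?> consumer,
                                  Map<TopicPartition, Long> offsets) {
    Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
    offsets.forEach((tp, offset) -> toCommit.put(tp, new OffsetAndMetadata(offset)));
    consumer.commitSync(toCommit); // consumer-group lag now mirrors what Hudi ingested
  }
}
```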

> Supporting Kafka based checkpointing for HoodieDeltaStreamer
> 
>
> Key: HUDI-1910
> URL: https://issues.apache.org/jira/browse/HUDI-1910
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
>  Labels: sev:normal, triaged
>
> HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some 
> users have requested support for Kafka based checkpoints for freshness 
> auditing purposes. This ticket tracks any implementation for that. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] deep-teliacompany commented on issue #2970: [SUPPORT] Failed to upsert for commit time

2021-06-14 Thread GitBox


deep-teliacompany commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-860487690


   Hi, does Hudi 0.8.0 support concurrency, or from which version is 
concurrency supported?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan opened a new pull request #3074: [HUDI-2007] Fixing hudi_test_suite for spark nodes and adding spark bulk_insert node

2021-06-14 Thread GitBox


nsivabalan opened a new pull request #3074:
URL: https://github.com/apache/hudi/pull/3074


   ## What is the purpose of the pull request
   
   - Fixing hudi test suite for spark nodes and adding spark bulk_insert node
   - Fixed spark nodes in hudi test suite infra
   
   ## Brief change log
   
   - Fixing hudi test suite for spark nodes and adding spark bulk_insert node
   - Fixed spark nodes in hudi test suite infra
   - Added config to enable row writing
   
   ## Verify this pull request
   
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter commented on pull request #3073: [HUDI-2006] Adding more yaml templates to test suite

2021-06-14 Thread GitBox


codecov-commenter commented on pull request #3073:
URL: https://github.com/apache/hudi/pull/3073#issuecomment-860281795


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3073?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3073](https://codecov.io/gh/apache/hudi/pull/3073?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e039f49) into 
[master](https://codecov.io/gh/apache/hudi/commit/0d0dc6fb07e0c5496224c75052ab4f43d57b40f6?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (0d0dc6f) will **decrease** coverage by `46.70%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3073/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3073?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #3073       +/-   ##
    =============================================
    - Coverage      55.14%    8.43%     -46.71%     
    + Complexity      3866       62       -3804     
    =============================================
      Files            488       70        -418     
      Lines          23619     2880      -20739     
      Branches        2528      359       -2169     
    =============================================
    - Hits           13024      243      -12781     
    + Misses          9437     2616       -6821     
    + Partials        1158       21       -1137     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.79% <ø> (-39.81%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.09% <ø> (-61.79%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3073?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | 

[GitHub] [hudi] codecov-commenter commented on pull request #3070: [HUDI-2002] Modify the log level to ERROR

2021-06-14 Thread GitBox


codecov-commenter commented on pull request #3070:
URL: https://github.com/apache/hudi/pull/3070#issuecomment-860067614


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3070?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3070](https://codecov.io/gh/apache/hudi/pull/3070?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1368d1c) into 
[master](https://codecov.io/gh/apache/hudi/commit/673d62f3c3ab07abb3fcd319607e657339bc0682?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (673d62f) will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3070/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3070?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@            Coverage Diff            @@
    ##             master    #3070   +/-   ##
    =========================================
      Coverage      8.43%    8.43%
      Complexity       62       62
    =========================================
      Files            70       70
      Lines          2880     2880
      Branches        359      359
    =========================================
      Hits            243      243
      Misses         2616     2616
      Partials         21       21
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudiclient | `?` | |
   | hudisync | `6.79% <ø> (ø)` | |
   | hudiutilities | `9.09% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xushiyan edited a comment on pull request #3003: [HUDI-1939][WIP] Replace joda-time api with java8 new time api

2021-06-14 Thread GitBox


xushiyan edited a comment on pull request #3003:
URL: https://github.com/apache/hudi/pull/3003#issuecomment-860084194






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan merged pull request #2967: [HUDI-1766] Added blog for Hudi cleaner service

2021-06-14 Thread GitBox


nsivabalan merged pull request #2967:
URL: https://github.com/apache/hudi/pull/2967


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] arun990 closed issue #3069: [SUPPORT] presto query error on hudi table - hudi.hadoop.HoodieParquetInputFormat

2021-06-14 Thread GitBox


arun990 closed issue #3069:
URL: https://github.com/apache/hudi/issues/3069


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] veenaypatil commented on a change in pull request #3066: [HUDI-1997] Adding Note for explicitly setting HIVE_AUTO_CREATE_DATABASE

2021-06-14 Thread GitBox


veenaypatil commented on a change in pull request #3066:
URL: https://github.com/apache/hudi/pull/3066#discussion_r650388442



##
File path: content/docs/configurations.html
##
@@ -524,7 +524,7 @@ HIVE_USE_JDBC_OPT_KEY
 
 HIVE_AUTO_CREATE_DATABASE_OPT_KEY
 Property: hoodie.datasource.hive_sync.auto_create_database
 Default: true 
- Auto create hive database if does not exists 

+ Auto create hive database if does not exists. 
Note: for versions 0.7 and 0.8 you will have to explicitly set this to 
true 

Review comment:
   @leesf  oh ok, updated the .md file now, pls check
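
   As context, a minimal sketch of setting this option explicitly on a Spark 
datasource write (the class name, table name and path below are hypothetical, 
not from this PR):
   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SaveMode;

   public class HiveSyncExample {
     // df is a caller-provided Dataset<Row>; option values are illustrative
     static void writeWithHiveSync(Dataset<Row> df) {
       df.write().format("hudi")
           .option("hoodie.table.name", "foo_tbl")
           .option("hoodie.datasource.hive_sync.enable", "true")
           // per the note above, set explicitly on 0.7 and 0.8
           .option("hoodie.datasource.hive_sync.auto_create_database", "true")
           .mode(SaveMode.Append)
           .save("/tmp/hudi/foo_tbl");
     }
   }
   ```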




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] arun990 opened a new issue #3069: [SUPPORT] presto query error on hudi table - hudi.hadoop.HoodieParquetInputFormat

2021-06-14 Thread GitBox


arun990 opened a new issue #3069:
URL: https://github.com/apache/hudi/issues/3069


   Hi, 
   The Hudi table works with Spark and Hive,
   but when the table is queried using Presto it gives this error:
   error: "Unable to create input format 
org.apache.hudi.hadoop.HoodieParquetInputFormat"
   
   Presto version is: Presto CLI dataproc-tag-340-3
   
   The hudi presto bundle jar is kept at /usr/lib/presto/plugin/hive-hadoop2 as 
well, and hive-site.xml has the input format set as 
org.apache.hudi.hadoop.HoodieParquetInputFormat.
   Please advise.
   
   Regards
   Arun
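
   As background, Hudi tables registered in Hive typically declare this input 
format in the table definition itself; a hedged sketch (table, columns and 
location are hypothetical):
   ```sql
   CREATE EXTERNAL TABLE hudi_trips (
     _hoodie_commit_time string,
     uuid string,
     fare double
   )
   PARTITIONED BY (datestr string)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
   STORED AS
     INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
     OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
   LOCATION '/path/to/hudi_trips';
   ```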


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #3072: [HUDI-2004] Move CheckpointUtils test cases to independant class

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #3072:
URL: https://github.com/apache/hudi/pull/3072#issuecomment-860107100






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1766) Write a detailed blog for HoodieCleaner

2021-06-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1766:
-
Labels: pull-request-available  (was: )

> Write a detailed blog for HoodieCleaner
> ---
>
> Key: HUDI-1766
> URL: https://issues.apache.org/jira/browse/HUDI-1766
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
>
> Cleaner plays a very critical role in helping user achieve writer and reader 
> isolation. We need a blog to highlight its configurations properly. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2006) Add more yamls to test suite

2021-06-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2006:
-
Labels: pull-request-available  (was: )

> Add more yamls to test suite 
> -
>
> Key: HUDI-2006
> URL: https://issues.apache.org/jira/browse/HUDI-2006
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Add more yaml files to test suite job suite. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xushiyan merged pull request #3070: [HUDI-2002] Modify HiveIncrementalPuller log level to ERROR

2021-06-14 Thread GitBox


xushiyan merged pull request #3070:
URL: https://github.com/apache/hudi/pull/3070


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter commented on pull request #3072: [HUDI-2004] Move CheckpointUtils test cases to independant class

2021-06-14 Thread GitBox


codecov-commenter commented on pull request #3072:
URL: https://github.com/apache/hudi/pull/3072#issuecomment-860107100


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3072?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3072](https://codecov.io/gh/apache/hudi/pull/3072?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (da5450a) into 
[master](https://codecov.io/gh/apache/hudi/commit/ba728d822f733cf1978b95c6b6af793fdf041088?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (ba728d8) will **decrease** coverage by `1.45%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3072/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3072?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #3072      +/-   ##
    ============================================
    - Coverage     55.37%   53.92%     -1.46%
    + Complexity     4029     3416       -613
    ============================================
      Files           521      441        -80
      Lines         25312    21604      -3708
      Branches       2873     2461       -412
    ============================================
    - Hits          14017    11649      -2368
    + Misses         9907     8793      -1114
    + Partials       1388     1162       -226
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.95% <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `50.03% <ø> (ø)` | |
   | hudiflink | `63.03% <ø> (ø)` | |
   | hudihadoopmr | `51.43% <ø> (ø)` | |
   | hudisparkdatasource | `66.51% <ø> (ø)` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3072?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../src/main/java/org/apache/hudi/dla/util/Utils.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL3V0aWwvVXRpbHMuamF2YQ==)
 | | |
   | 
[...a/org/apache/hudi/utilities/sources/SqlSource.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvU3FsU291cmNlLmphdmE=)
 | | |
   | 
[...i/hive/SlashEncodedDayPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkRGF5UGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==)
 | | |
   | 
[...java/org/apache/hudi/hive/util/HiveSchemaUtil.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9IaXZlU2NoZW1hVXRpbC5qYXZh)
 | | |
   | 
[.../hudi/utilities/schema/RowBasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9Sb3dCYXNlZFNjaGVtYVByb3ZpZGVyLmphdmE=)
 | | |
   | 
[...g/apache/hudi/timeline/service/RequestHandler.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvUmVxdWVzdEhhbmRsZXIuamF2YQ==)
 | | |
   | 

[GitHub] [hudi] nsivabalan opened a new pull request #3073: [HUDI-2006] Adding more yaml templates to test suite

2021-06-14 Thread GitBox


nsivabalan opened a new pull request #3073:
URL: https://github.com/apache/hudi/pull/3073


   
   ## What is the purpose of the pull request
   
   Added more yamls to the test suite framework. 
   Default: sanity.yaml
   Optional yamls:
   medium_test_suite.yaml: a medium-sized test suite that validates the entire 
input for N rounds.
   long_test_suite.yaml: a long-running test suite that validates the input 
after every round and then deletes it, so the test can be scaled to larger 
iteration counts if required. 
   clustering.yaml: tests clustering. 
   
   ## Brief change log
   - Added more yamls to test suite framework. 
   
   ## Verify this pull request
   
   ```
   ./generate_test_suite.sh --execute_test_suite false 
--include_medium_test_suite_yaml true --include_long_test_suite_yaml true 
--include_cluster_yaml true
   Include Medium test suite true
   Medium test suite iterations = 20
   Include Long test suite true
   Long test suite iterations = 50
   Intermittent delay in mins = 1
   Table type   = COPY_ON_WRITE
   Include cluster yaml true
   Cluster total itr count  30
   Cluster delay mins 1
   Cluster exec itr count 15
   Jar name hudi-integ-test-bundle-0.9.0-SNAPSHOT.jar
   Input path \/user\/hive\/warehouse\/hudi-integ-test-suite\/input\/
   Output path \/user\/hive\/warehouse\/hudi-integ-test-suite\/output\/
   Cleaning up staging dir
   Creating staging dir
   ```
   Once the above command is executed, the generated files can be found in the 
staging folder. 
   ```
   ls demo/config/test-suite/staging   
   clustering.yaml
   clustering_spark_command.sh
   long_test_suite.yaml
   long_test_suite_spark_command.sh
   medium_test_suite.yaml
   medium_test_suite_spark_command.sh
   sanity.yaml
   sanity_spark_command.sh
   test.properties
   ```
   
   You can run the same command without "--execute_test_suite false", which 
will go ahead and execute the included yamls. 
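
   For illustration, a hedged sketch of the general shape these yamls take 
(node names, types and config keys are assumptions drawn from the existing 
test suite templates, not the exact contents of the new files):
   ```yaml
   dag_name: medium-test-suite-example
   dag_rounds: 20
   dag_intermittent_delay_mins: 1
   dag_content:
     first_insert:
       config:
         record_size: 1000
         num_insert_partitions: 1
         num_records_insert: 10000
       type: InsertNode
       deps: none
     first_validate:
       type: ValidateDatasetNode
       deps: first_insert
   ```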
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] garyli1019 merged pull request #3046: [HUDI-1984] Support independent flink hudi compaction function

2021-06-14 Thread GitBox


garyli1019 merged pull request #3046:
URL: https://github.com/apache/hudi/pull/3046


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] veenaypatil commented on pull request #3072: [HUDI-2004] Move CheckpointUtils test cases to independant class

2021-06-14 Thread GitBox


veenaypatil commented on pull request #3072:
URL: https://github.com/apache/hudi/pull/3072#issuecomment-860343956


   @n3nash @yanghua can you please review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #3073: [HUDI-2006] Adding more yaml templates to test suite

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #3073:
URL: https://github.com/apache/hudi/pull/3073#issuecomment-860281795






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] arun990 commented on issue #3069: [SUPPORT] presto query error on hudi table - hudi.hadoop.HoodieParquetInputFormat

2021-06-14 Thread GitBox


arun990 commented on issue #3069:
URL: https://github.com/apache/hudi/issues/3069#issuecomment-860143649


   Hi, a restart helped after setting HoodieParquetInputFormat.
   Closing this.
   Thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter commented on pull request #3075: [HUDI-2009] Fixing extra commit metadata in row writer path

2021-06-14 Thread GitBox


codecov-commenter commented on pull request #3075:
URL: https://github.com/apache/hudi/pull/3075#issuecomment-860357081


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3075?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3075](https://codecov.io/gh/apache/hudi/pull/3075?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (ed81ede) into 
[master](https://codecov.io/gh/apache/hudi/commit/7d9f9d7d8241bfb70d50c557b0194cc8a87b6ee7?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (7d9f9d7) will **decrease** coverage by `46.60%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3075/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3075?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master   #3075       +/-   ##
    ============================================
    - Coverage     55.04%   8.43%     -46.61%
    + Complexity     4029      62       -3967
    ============================================
      Files           526      70        -456
      Lines         25466    2880      -22586
      Branches       2886     359       -2527
    ============================================
    - Hits          14018     243      -13775
    + Misses        10057    2616       -7441
    + Partials       1391      21       -1370
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.09% <ø> (-61.87%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3075?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | 

[GitHub] [hudi] calleo commented on issue #2975: [SUPPORT] Read record using index

2021-06-14 Thread GitBox


calleo commented on issue #2975:
URL: https://github.com/apache/hudi/issues/2975#issuecomment-860014912


   Will give this a try. Thanks all for helping out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #3075: [HUDI-2009] Fixing extra commit metadata in row writer path

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #3075:
URL: https://github.com/apache/hudi/pull/3075#issuecomment-860357081






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability

2021-06-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1976:
-
Labels: pull-request-available  (was: )

> Upgrade hive, jackson, log4j, hadoop to remove vulnerability
> 
>
> Key: HUDI-1976
> URL: https://issues.apache.org/jira/browse/HUDI-1976
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> [https://github.com/apache/hudi/issues/2827]
> [https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2824|https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2823|https://github.com/apache/hudi/issues/2826]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on pull request #3010: Improving Hudi CLI tool docs

2021-06-14 Thread GitBox


nsivabalan commented on pull request #3010:
URL: https://github.com/apache/hudi/pull/3010#issuecomment-860282275


   do ping me here once the patch is ready to be reviewed again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR

2021-06-14 Thread GitBox


hudi-bot edited a comment on pull request #2984:
URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] jintaoguan commented on pull request #2999: [HUDI-764] [HUDI-765] ORC reader writer Implementation

2021-06-14 Thread GitBox


jintaoguan commented on pull request #2999:
URL: https://github.com/apache/hudi/pull/2999#issuecomment-859974391


   @leesf We have an umbrella ticket 
[HUDI-57](https://issues.apache.org/jira/browse/HUDI-57 ) that contains all the 
subtasks. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] veenaypatil opened a new pull request #3071: [WIP] [HUDI-1976] Resolve vulnerability

2021-06-14 Thread GitBox


veenaypatil opened a new pull request #3071:
URL: https://github.com/apache/hudi/pull/3071


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xushiyan commented on pull request #3003: [HUDI-1939][WIP] Replace joda-time api with java8 new time api

2021-06-14 Thread GitBox


xushiyan commented on pull request #3003:
URL: https://github.com/apache/hudi/pull/3003#issuecomment-860084194


   Good to see this effort revived! We had some earlier discussion that is 
worth considering while migrating this.
   
   
https://lists.apache.org/thread.html/rdfd91fc5e8e76a7434da0975141b0629411d507ce804236596b69ede%40%3Cdev.hudi.apache.org%3E
   
   cc @pratyakshsharma do you also want to review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2967: Added blog for Hudi cleaner service

2021-06-14 Thread GitBox


nsivabalan commented on pull request #2967:
URL: https://github.com/apache/hudi/pull/2967#issuecomment-860282110


   awesome, thanks for your contribution. This will definitely benefit the 
community. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] leesf merged pull request #3066: [HUDI-1997] Adding Note for explicitly setting HIVE_AUTO_CREATE_DATABASE

2021-06-14 Thread GitBox


leesf merged pull request #3066:
URL: https://github.com/apache/hudi/pull/3066


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] veenaypatil commented on pull request #3071: [WIP] [HUDI-1976] Resolve vulnerability

2021-06-14 Thread GitBox


veenaypatil commented on pull request #3071:
URL: https://github.com/apache/hudi/pull/3071#issuecomment-860088720


   The build passes locally, but there are many overlapping-classes warnings:
   
   ```
   jackson-annotations-2.6.7.jar, hive-exec-2.3.9.jar define 58 overlapping 
classes:
   
   parquet-avro-1.11.1.jar, parquet-column-1.11.1.jar define 145 overlapping 
classes: 
   
   joda-time-2.9.9.jar, hive-exec-2.3.9.jar define 246 overlapping classes: 
   
   parquet-format-structures-1.11.1.jar, hive-exec-2.3.9.jar define 81 
overlapping classes:
   
   parquet-encoding-1.11.1.jar, hive-exec-2.3.9.jar define 169 overlapping 
classes: 
   
   parquet-common-1.11.1.jar, hive-exec-2.3.9.jar define 44 overlapping 
classes: 
   
   avro-1.10.0.jar, hive-exec-2.3.9.jar define 346 overlapping classes: 
   
   jackson-core-2.6.7.jar, hive-exec-2.3.9.jar define 93 overlapping classes: 
   
   parquet-hadoop-1.11.1.jar, parquet-avro-1.11.1.jar, 
parquet-column-1.11.1.jar define 50 overlapping classes:
   
   hive-exec-2.3.9.jar, jackson-databind-2.9.10.8.jar define 549 overlapping 
classes:
   ```
   
   Also, this needs to be tested thoroughly to ensure that no dependency 
conflict issues arise from this change.
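
   As a general note, overlaps like these are usually handled in the bundle 
modules by relocating or filtering the offending packages in the 
maven-shade-plugin configuration; a hedged sketch (the patterns are 
illustrative, not the bundle's actual config):
   ```xml
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-shade-plugin</artifactId>
     <configuration>
       <relocations>
         <!-- shade the bundled jackson so it cannot clash with hive-exec's copy -->
         <relocation>
           <pattern>com.fasterxml.jackson</pattern>
           <shadedPattern>org.apache.hudi.com.fasterxml.jackson</shadedPattern>
         </relocation>
       </relocations>
       <filters>
         <!-- drop classes that hive-exec re-bundles from other artifacts -->
         <filter>
           <artifact>org.apache.hive:hive-exec</artifact>
           <excludes>
             <exclude>org/joda/time/**</exclude>
           </excludes>
         </filter>
       </filters>
     </configuration>
   </plugin>
   ```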


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] leesf commented on a change in pull request #3066: [HUDI-1997] Adding Note for explicitly setting HIVE_AUTO_CREATE_DATABASE

2021-06-14 Thread GitBox


leesf commented on a change in pull request #3066:
URL: https://github.com/apache/hudi/pull/3066#discussion_r650379858



##
File path: content/docs/configurations.html
##
@@ -524,7 +524,7 @@ HIVE_USE_JDBC_OPT_KEY
 
 HIVE_AUTO_CREATE_DATABASE_OPT_KEY
 Property: hoodie.datasource.hive_sync.auto_create_database
 Default: true 
- Auto create hive database if does not exists 

+ Auto create hive database if does not exists. 
Note: for versions 0.7 and 0.8 you will have to explicitly set this to 
true 

Review comment:
   @veenaypatil you should update the .md file instead of html file.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #3055: [HUDI-1991] Fixing drop dups exception in bulk insert row writer path

2021-06-14 Thread GitBox


nsivabalan commented on a change in pull request #3055:
URL: https://github.com/apache/hudi/pull/3055#discussion_r650395149



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala
##
@@ -181,8 +181,45 @@ class HoodieSparkSqlWriterSuite extends FunSuite with 
Matchers {
 val path = java.nio.file.Files.createTempDirectory("hoodie_test_path")
 try {
 
-  val sqlContext = session.sqlContext
-  val sc = session.sparkContext
+  val hoodieFooTableName = "hoodie_foo_tbl"
+
+  //create a new table
+  val fooTableModifier = Map("path" -> path.toAbsolutePath.toString,
+HoodieWriteConfig.TABLE_NAME -> hoodieFooTableName,
+DataSourceWriteOptions.TABLE_TYPE_OPT_KEY -> 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
+"hoodie.bulkinsert.shuffle.parallelism" -> "4",
+DataSourceWriteOptions.OPERATION_OPT_KEY -> 
DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL,
+DataSourceWriteOptions.ENABLE_ROW_WRITER_OPT_KEY -> "true",
+INSERT_DROP_DUPS_OPT_KEY -> "true",
+DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "_row_key",
+DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition",
+DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> 
"org.apache.hudi.keygen.SimpleKeyGenerator")
+  val fooTableParams = 
HoodieWriterUtils.parametersWithWriteDefaults(fooTableModifier)
+
+  // generate the inserts
+  val schema = DataSourceTestUtils.getStructTypeExampleSchema
+  val structType = 
AvroConversionUtils.convertAvroSchemaToStructType(schema)
+  val records = DataSourceTestUtils.generateRandomRows(100)
+  val recordsSeq = convertRowListToSeq(records)
+  val df = spark.createDataFrame(sc.parallelize(recordsSeq), structType)
+  // write to Hudi
+  HoodieSparkSqlWriter.write(sqlContext, SaveMode.Append, fooTableParams, 
df)
+  fail("Drop duplicates with bulk insert in row writing should have thrown 
exception")
+} catch {
+  case e: HoodieException => println("Dropping duplicates with bulk_insert 
in row writer path is not supported yet")

Review comment:
   my bad, some copy paste mistake. will fix it.
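
   For reference, a hedged sketch of the corrected branch, asserting on the 
exception instead of printing (the expected message text is an assumption):
   ```scala
   case e: HoodieException =>
     assert(e.getMessage.contains(
       "Dropping duplicates with bulk_insert in row writer path is not supported yet"))
   ```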




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2008) Add an annotation to suppress the compiler warnings

2021-06-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2008:
-
Labels: pull-request-available  (was: )

> Add an annotation to suppress the compiler warnings
> ---
>
> Key: HUDI-2008
> URL: https://issues.apache.org/jira/browse/HUDI-2008
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Wei
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x

2021-06-14 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362691#comment-17362691
 ] 

Vinay edited comment on HUDI-1975 at 6/14/21, 7:54 AM:
---

[~nishith29] Updated the metrics.version in the pom to 3.1.2; the build fails 
with
{code:java}
/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java:[128,49]
 cannot find symbol
{code}
MetricsRegistry does not have a gauge method in version 3.1.2; it is part of 
the metrics-core dependency. There is a workaround described here - 
[https://github.com/eclipse/microprofile-metrics/issues/244] 


was (Author: vinaypatil18):
Updated the metrics.version in pom to 3.1.2 , the build fails with
{code:java}
/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java:[128,49]
 cannot find symbol
{code}
MetricsRegistry does not gauge method in 3.1.2 version, this is part of 
metrics-core dependency. There is a workaround of doing so here - 
[https://github.com/eclipse/microprofile-metrics/issues/244] 
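
For reference, in metrics-core 3.x a gauge has to be registered via 
register(...), since the gauge(...) convenience method only arrived in 4.x; a 
minimal sketch of that workaround (metric name and value are illustrative):
{code:java}
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class GaugeWorkaround {
  public static void main(String[] args) {
    MetricRegistry registry = new MetricRegistry();
    // 3.x has no MetricRegistry#gauge(String), so register a Gauge directly
    registry.register("hoodie.example.value", (Gauge<Long>) () -> 42L);
    System.out.println(registry.getGauges().get("hoodie.example.value").getValue());
  }
}
{code}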

> Upgrade java-prometheus-client from 3.1.2 to 4.x
> 
>
> Key: HUDI-1975
> URL: https://issues.apache.org/jira/browse/HUDI-1975
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Nishith Agarwal
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Find more details here -> https://github.com/apache/hudi/issues/2774



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter edited a comment on pull request #3076: [HUDI-2008] Add an annotation to suppress the compiler warnings

2021-06-14 Thread GitBox


codecov-commenter edited a comment on pull request #3076:
URL: https://github.com/apache/hudi/pull/3076#issuecomment-860358006






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



