[GitHub] [hudi] danny0405 commented on a change in pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…
danny0405 commented on a change in pull request #3025: URL: https://github.com/apache/hudi/pull/3025#discussion_r651468804

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java

## @@ -63,7 +63,7 @@ public BaseScheduleCompactionActionExecutor(HoodieEngineContext context,

```diff
         + ", Compaction scheduled at " + instantTime));
     // Committed and pending compaction instants should have strictly lower timestamps
     List<HoodieInstant> conflictingInstants = table.getActiveTimeline()
-        .getWriteTimeline().getInstants()
+        .getWriteTimeline().filterCompletedAndCompactionInstants().getInstants()
```

Review comment: Take a look at the comment:
```java
// Committed and pending compaction instants should have strictly lower timestamps
```
I think the earlier code, which used `commitsAndCompactionTimeline()`, was already wrong: it added a restriction that we cannot generate a compaction plan while there are inflight commits, when in fact we can.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] swuferhong commented on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…
swuferhong commented on pull request #3050: URL: https://github.com/apache/hudi/pull/3050#issuecomment-861190938

> @swuferhong Thanks for opening this. Looks like it might be a duplicate. Can you check why the CI is failing?

Yes, we reopened this PR, and the CI is passing now.
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2819: [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe.
codecov-commenter edited a comment on pull request #2819: URL: https://github.com/apache/hudi/pull/2819#issuecomment-860933035

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2819) Report
> Merging [#2819](https://codecov.io/gh/apache/hudi/pull/2819) (199e377) into [master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f) (769dd2d) will **decrease** coverage by `41.60%`.
> The diff coverage is `n/a`.

```diff
@@             Coverage Diff              @@
##             master   #2819       +/-   ##
============================================
- Coverage     50.04%   8.43%   -41.61%
+ Complexity     3685      62     -3623
  Files           526      70      -456
  Lines         25466    2880    -22586
  Branches       2886     359     -2527
- Hits          12744     243    -12501
+ Misses        11454    2616     -8838
+ Partials       1268      21     -1247
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `9.09% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2819) | Coverage Δ | |
|---|---|---|
| [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==) | `0.00% <0.00%> (-97.83%)` | :arrow_down: |
| [.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=) | `0.00% <0.00%> (-90.91%)` | :arrow_down: |
| [...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh) | `0.00% <0.00%> (-84.85%)` | :arrow_down: |
[GitHub] [hudi] n3nash commented on a change in pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…
n3nash commented on a change in pull request #3025: URL: https://github.com/apache/hudi/pull/3025#discussion_r651456965

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java

## @@ -63,7 +63,7 @@ public BaseScheduleCompactionActionExecutor(HoodieEngineContext context,

```diff
         + ", Compaction scheduled at " + instantTime));
     // Committed and pending compaction instants should have strictly lower timestamps
     List<HoodieInstant> conflictingInstants = table.getActiveTimeline()
-        .getWriteTimeline().getInstants()
+        .getWriteTimeline().filterCompletedAndCompactionInstants().getInstants()
```

Review comment: @danny0405 This class was NOT using `filterCompletedAndCompactionInstants` before; it was using `commitsAndCompactionTimeline()`. See https://github.com/apache/hudi/blob/release-0.7.0/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java#L65 Let me know if there is any confusion.
[GitHub] [hudi] n3nash commented on a change in pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…
n3nash commented on a change in pull request #3025: URL: https://github.com/apache/hudi/pull/3025#discussion_r651456965

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java

## @@ -63,7 +63,7 @@ public BaseScheduleCompactionActionExecutor(HoodieEngineContext context,

```diff
         + ", Compaction scheduled at " + instantTime));
     // Committed and pending compaction instants should have strictly lower timestamps
     List<HoodieInstant> conflictingInstants = table.getActiveTimeline()
-        .getWriteTimeline().getInstants()
+        .getWriteTimeline().filterCompletedAndCompactionInstants().getInstants()
```

Review comment: You are right, I looked at the wrong API. This is the correct one: https://github.com/apache/hudi/blob/3e71c915271d77c7306ca0325b212f71ce723fc0/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java#L104
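[Editor's note] The filtering semantics discussed in this thread can be sketched with a simplified, hypothetical model in plain Java. This is not the real `HoodieInstant`/`HoodieTimeline` API, just an illustration of the intent: an instant survives the filter if it is completed, or if it is a compaction in any state, so inflight non-compaction commits no longer block scheduling a compaction plan.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TimelineFilterSketch {
    // Simplified, hypothetical stand-in for Hudi's HoodieInstant.
    static final class Instant {
        final String action;      // e.g. "commit", "deltacommit", "compaction"
        final boolean completed;  // false => requested/inflight
        final String timestamp;

        Instant(String action, boolean completed, String timestamp) {
            this.action = action;
            this.completed = completed;
            this.timestamp = timestamp;
        }
    }

    // Keep completed instants of any action, plus compaction instants in any
    // state; inflight non-compaction commits are dropped, so they no longer
    // count as conflicts when scheduling a new compaction plan.
    static List<Instant> filterCompletedAndCompactionInstants(List<Instant> timeline) {
        return timeline.stream()
                .filter(i -> i.completed || "compaction".equals(i.action))
                .collect(Collectors.toList());
    }
}
```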
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…
codecov-commenter edited a comment on pull request #3050: URL: https://github.com/apache/hudi/pull/3050#issuecomment-856664630

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3050) Report
> Merging [#3050](https://codecov.io/gh/apache/hudi/pull/3050) (1f85ab3) into [master](https://codecov.io/gh/apache/hudi/commit/f760ec543ec9ea23b7d4c9f61c76a283bd737f27) (f760ec5) will **decrease** coverage by `0.32%`.
> The diff coverage is `n/a`.

```diff
@@             Coverage Diff              @@
##             master    #3050      +/-   ##
============================================
- Coverage     55.31%   54.99%   -0.33%
- Complexity     4026     4044      +18
  Files           520      526       +6
  Lines         25295    25668     +373
  Branches       2872     2950      +78
+ Hits          13993    14117     +124
- Misses         9914    10164     +250
+ Partials       1388     1387       -1
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.95% <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `50.03% <ø> (+0.05%)` | :arrow_up: |
| hudiflink | `60.58% <ø> (-2.26%)` | :arrow_down: |
| hudihadoopmr | `51.43% <ø> (ø)` | |
| hudisparkdatasource | `66.53% <ø> (+0.01%)` | :arrow_up: |
| hudisync | `47.94% <ø> (-3.51%)` | :arrow_down: |
| huditimelineservice | `64.36% <ø> (ø)` | |
| hudiutilities | `71.79% <ø> (+0.73%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3050) | Coverage Δ | |
|---|---|---|
| [...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/hudi/pull/3050/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZUNsaWVudC5qYXZh) | `52.56% <0.00%> (-19.04%)` | :arrow_down: |
| [...apache/hudi/sink/compact/CompactionCommitSink.java](https://codecov.io/gh/apache/hudi/pull/3050/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdFNpbmsuamF2YQ==) | `65.00% <0.00%> (-10.00%)` | :arrow_down: |
| [...va/org/apache/hudi/table/format/FilePathUtils.java](https://codecov.io/gh/apache/hudi/pull/3050/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvRmlsZVBhdGhVdGlscy5qYXZh) | `66.17% <0.00%> (-0.75%)` | :arrow_down: |
| [...va/org/apache/hudi/sink/utils/HiveSyncContext.java](https://codecov.io/gh/apache/hudi/pull/3050/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3V0aWxzL0hpdmVTeW5jQ29udGV4dC5qYXZh) | `91.66% <0.00%> (-0.23%)` | :arrow_down: |
| [...java/org/apache/hudi/common/fs/StorageSchemes.java](https://codecov.io/gh/apache/hudi/pull/3050/diff#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL1N0b3JhZ2VTY2hlbWVzLmphdmE=) | `100.00% <0.00%> (ø)` | |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile
codecov-commenter edited a comment on pull request #3067: URL: https://github.com/apache/hudi/pull/3067#issuecomment-859459755

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3067) Report
> Merging [#3067](https://codecov.io/gh/apache/hudi/pull/3067) (a25ebc2) into [master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f) (769dd2d) will **increase** coverage by `5.02%`.
> The diff coverage is `73.80%`.

```diff
@@             Coverage Diff              @@
##             master    #3067      +/-   ##
============================================
+ Coverage     50.04%   55.07%   +5.02%
- Complexity     3685     4035     +350
  Files           526      526
  Lines         25466    25479      +13
  Branches       2886     2886
+ Hits          12744    14032    +1288
+ Misses        11454    10057    -1397
- Partials       1268     1390     +122
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.95% <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `50.01% <ø> (ø)` | |
| hudiflink | `60.73% <73.80%> (+0.15%)` | :arrow_up: |
| hudihadoopmr | `51.43% <ø> (ø)` | |
| hudisparkdatasource | `66.53% <ø> (ø)` | |
| hudisync | `51.45% <ø> (ø)` | |
| huditimelineservice | `64.36% <ø> (ø)` | |
| hudiutilities | `71.06% <ø> (+61.96%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3067) | Coverage Δ | |
|---|---|---|
| [...e/hudi/sink/partitioner/profile/WriteProfiles.java](https://codecov.io/gh/apache/hudi/pull/3067/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL3Byb2ZpbGUvV3JpdGVQcm9maWxlcy5qYXZh) | `55.88% <54.16%> (-4.12%)` | :arrow_down: |
| [...ache/hudi/sink/StreamWriteOperatorCoordinator.java](https://codecov.io/gh/apache/hudi/pull/3067/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JDb29yZGluYXRvci5qYXZh) | `70.28% <100.00%> (ø)` | |
| [...di/sink/partitioner/profile/DeltaWriteProfile.java](https://codecov.io/gh/apache/hudi/pull/3067/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL3Byb2ZpbGUvRGVsdGFXcml0ZVByb2ZpbGUuamF2YQ==) | `69.23% <100.00%> (ø)` | |
| [...he/hudi/sink/partitioner/profile/WriteProfile.java](https://codecov.io/gh/apache/hudi/pull/3067/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL3Byb2ZpbGUvV3JpdGVQcm9maWxlLmphdmE=) | `88.00% <100.00%> (+3.62%)` | :arrow_up: |
| [...ache/hudi/source/StreamReadMonitoringFunction.java](https://codecov.io/gh/apache/hudi/pull/3067/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zb3VyY2UvU3RyZWFtUmVhZE1vbml0b3JpbmdGdW5jdGlvbi5qYXZh) | `80.48% <100.00%> (+4.62%)` | :arrow_up: |
[GitHub] [hudi] fengjian428 commented on issue #3054: [SUPPORT] Point query at hudi tables
fengjian428 commented on issue #3054: URL: https://github.com/apache/hudi/issues/3054#issuecomment-861154782 @n3nash can this new data skipping index improve the performance of incremental queries? It seems that when using an incremental query, one needs to use INCR_PATH_GLOB_OPT_KEY to set a glob pattern that filters on path; otherwise the query will pull all the data in the commit time range. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
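For context, an incremental read in this era of Hudi is typically configured through Spark datasource options along these lines; the key names match the `DataSourceReadOptions` constants referenced above, while the instant time and glob values are purely illustrative:

```properties
# Hedged sketch: typical options for a Hudi incremental query (values illustrative).
# Without the path glob, every file touched in the commit time range is scanned.
hoodie.datasource.query.type=incremental
hoodie.datasource.read.begin.instanttime=20210601000000
# INCR_PATH_GLOB_OPT_KEY: prune the scan to matching partitions/files
hoodie.datasource.read.incr.path.glob=/year=2021/month=06/*
```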
[GitHub] [hudi] danny0405 closed pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile
danny0405 closed pull request #3067: URL: https://github.com/apache/hudi/pull/3067
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile
codecov-commenter edited a comment on pull request #3067: URL: https://github.com/apache/hudi/pull/3067#issuecomment-859459755
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile
codecov-commenter edited a comment on pull request #3067: URL: https://github.com/apache/hudi/pull/3067#issuecomment-859459755

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3067) Report
> Merging [#3067](https://codecov.io/gh/apache/hudi/pull/3067) (a25ebc2) into [master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f) (769dd2d) will **increase** coverage by `2.59%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3067/graphs/tree.svg)](https://codecov.io/gh/apache/hudi/pull/3067)

```diff
@@             Coverage Diff              @@
##             master    #3067      +/-  ##
===========================================
+ Coverage     50.04%   52.63%    +2.59%
+ Complexity     3685      407     -3278
===========================================
  Files           526       70      -456
  Lines         25466     2880   -22586
  Branches       2886      359     -2527
===========================================
- Hits          12744     1516   -11228
+ Misses        11454     1220   -10234
+ Partials       1268      144     -1124
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `71.01% <ø> (+61.91%)` | :arrow_up: |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3067) | Coverage Δ | |
|---|---|---|
| [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3067/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3067/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3067/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==) | `0.00% <0.00%> (-97.83%)` | :arrow_down: |
| [.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3067/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=) | `0.00% <0.00%> (-90.91%)` | :arrow_down: |
| [...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/3067/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh) | `0.00% <0.00%> (-84.85%)` | :arrow_down: |
[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
nsivabalan commented on a change in pull request #2915: URL: https://github.com/apache/hudi/pull/2915#discussion_r651418719 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java ## @@ -0,0 +1,339 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.sources; + +import org.apache.hudi.DataSourceUtils; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.utilities.SqlQueryBuilder; +import org.apache.hudi.utilities.schema.SchemaProvider; + +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.IOUtils; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.sql.Column; +import org.apache.spark.sql.DataFrameReader; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.SparkSession; +import org.apache.spark.sql.functions; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.storage.StorageLevel; + +import java.net.URI; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Set; + +/** + * Reads data from RDBMS data sources. + */ + +public class JdbcSource extends RowSource { + + private static final Logger LOG = LogManager.getLogger(JdbcSource.class); + private static final List DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2"); + private static final String URI_JDBC_PREFIX = "jdbc:"; + + public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, SparkSession sparkSession, +SchemaProvider schemaProvider) { +super(props, sparkContext, sparkSession, schemaProvider); + } + + /** + * Validates all user properties and prepares the {@link DataFrameReader} to read from RDBMS. + * + * @param sessionThe {@link SparkSession}. + * @param properties The JDBC connection properties and data source options. 
+ * @return The {@link DataFrameReader} to read from RDBMS + * @throws HoodieException + */ + private static DataFrameReader validatePropsAndGetDataFrameReader(final SparkSession session, +final TypedProperties properties) + throws HoodieException { +DataFrameReader dataFrameReader; +FSDataInputStream passwordFileStream = null; +try { + dataFrameReader = session.read().format("jdbc"); + dataFrameReader = dataFrameReader.option(Config.URL_PROP, properties.getString(Config.URL)); + dataFrameReader = dataFrameReader.option(Config.USER_PROP, properties.getString(Config.USER)); + dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, properties.getString(Config.DRIVER_CLASS)); + dataFrameReader = dataFrameReader + .option(Config.RDBMS_TABLE_PROP, properties.getString(Config.RDBMS_TABLE_NAME)); + + if (properties.containsKey(Config.PASSWORD)) { +LOG.info("Reading JDBC password from properties file"); +dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, properties.getString(Config.PASSWORD)); + } else if (properties.containsKey(Config.PASSWORD_FILE) + && !StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) { +LOG.info(String.format("Reading JDBC password from password file %s", properties.getString(Config.PASSWORD_FILE))); +FileSystem fileSystem = FileSystem.get(session.sparkContext().hadoopConfiguration()); +passwordFileStream = fileSystem.open(new Path(properties.getString(Config.PASSWORD_FILE))); +byte[] bytes = new byte[passwordFileStream.available()]; +passwordFileStream.read(bytes); +dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new String(bytes)); + } else { +throw new
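One subtle issue in the password-file branch above: `InputStream.available()` only reports how many bytes can be read without blocking, so a single `read(bytes)` call can truncate the password. A hedged sketch of a defensive alternative is shown below with plain JDK streams; the `readAll` helper is hypothetical and not part of the PR:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Reads an entire stream to EOF instead of trusting available(),
// which only reports bytes readable without blocking.
public class ReadFully {

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        // Loop until read() signals end-of-stream with -1.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = readAll(new ByteArrayInputStream("s3cr3t".getBytes()));
        System.out.println(new String(data)); // s3cr3t
    }
}
```

The same full-read loop works for an `FSDataInputStream`, since it is just an `InputStream` subclass.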
[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
nsivabalan commented on a change in pull request #2915: URL: https://github.com/apache/hudi/pull/2915#discussion_r651418549 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestJdbcSource.java ## @@ -0,0 +1,442 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.sources; + +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.common.testutils.HoodieTestDataGenerator; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.utilities.testutils.UtilitiesTestBase; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.spark.sql.Column; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.functions; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.storage.StorageLevel; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.IOException; +import java.sql.Connection; +import java.sql.DriverManager; +import java.sql.SQLException; +import java.util.stream.Collectors; + +import static org.apache.hudi.utilities.testutils.JdbcTestUtils.clearAndInsert; +import static org.apache.hudi.utilities.testutils.JdbcTestUtils.close; +import static org.apache.hudi.utilities.testutils.JdbcTestUtils.count; +import static org.apache.hudi.utilities.testutils.JdbcTestUtils.insert; +import static org.apache.hudi.utilities.testutils.JdbcTestUtils.update; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertThrows; +import static org.junit.jupiter.api.Assertions.fail; + +/** + * Tests {@link JdbcSource}. 
+ */ +public class TestJdbcSource extends UtilitiesTestBase { + + private static final TypedProperties PROPS = new TypedProperties(); + private static final HoodieTestDataGenerator DATA_GENERATOR = new HoodieTestDataGenerator(); + private static Connection connection; + + @BeforeEach + public void setup() throws Exception { +super.setup(); +PROPS.setProperty("hoodie.deltastreamer.jdbc.url", "jdbc:h2:mem:test_mem"); +PROPS.setProperty("hoodie.deltastreamer.jdbc.driver.class", "org.h2.Driver"); +PROPS.setProperty("hoodie.deltastreamer.jdbc.user", "test"); +PROPS.setProperty("hoodie.deltastreamer.jdbc.password", "jdbc"); +PROPS.setProperty("hoodie.deltastreamer.jdbc.table.name", "triprec"); +connection = DriverManager.getConnection("jdbc:h2:mem:test_mem", "test", "jdbc"); + } + + @AfterEach + public void teardown() throws Exception { +super.teardown(); +close(connection); + } + + @Test + public void testSingleCommit() { +PROPS.setProperty("hoodie.deltastreamer.jdbc.incremental.pull", "true"); + PROPS.setProperty("hoodie.deltastreamer.jdbc.table.incremental.column.name", "last_insert"); + +try { + int numRecords = 100; + String commitTime = "000"; + + // Insert 100 records with commit time + clearAndInsert(commitTime, numRecords, connection, DATA_GENERATOR, PROPS); + + // Validate if we have specified records in db + assertEquals(numRecords, count(connection, "triprec")); + + // Start JdbcSource + Dataset rowDataset = runSource(Option.empty(), numRecords).getBatch().get(); + assertEquals(numRecords, rowDataset.count()); +} catch (SQLException e) { + fail(e.getMessage()); +} + } + + @Test + public void testInsertAndUpdate() { +PROPS.setProperty("hoodie.deltastreamer.jdbc.incremental.pull", "true"); + PROPS.setProperty("hoodie.deltastreamer.jdbc.table.incremental.column.name", "last_insert"); + +try { + final String commitTime = "000"; + final int numRecords = 100; + + // Add 100 records. Update half of them with commit time "007". 
+ update("007", + clearAndInsert(commitTime, numRecords, connection, DATA_GENERATOR, PROPS) + .stream() + .limit(50) +
[GitHub] [hudi] danny0405 commented on a change in pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile
danny0405 commented on a change in pull request #3067: URL: https://github.com/apache/hudi/pull/3067#discussion_r651408416

## File path: hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/profile/WriteProfiles.java

```diff
@@ -58,4 +76,61 @@ private static WriteProfile getWriteProfile(
   public static void clean(String path) {
     PROFILES.remove(path);
   }
+
+  /**
+   * Returns all the incremental write file path statuses with the given commits metadata.
+   *
+   * @param basePath     Table base path
+   * @param hadoopConf   The hadoop conf
+   * @param metadataList The commits metadata
+   * @return the file statuses array
+   */
+  public static FileStatus[] getWritePathsOfInstants(
+      Path basePath,
+      Configuration hadoopConf,
+      List<HoodieCommitMetadata> metadataList) {
+    FileSystem fs = FSUtils.getFs(basePath.toString(), hadoopConf);
+    return metadataList.stream().map(metadata -> getWritePathsOfInstant(basePath, metadata, fs))
+        .flatMap(Collection::stream).toArray(FileStatus[]::new);
+  }
+
+  private static List<FileStatus> getWritePathsOfInstant(Path basePath, HoodieCommitMetadata metadata, FileSystem fs) {
+    return metadata.getFileIdAndFullPaths(basePath.toString()).values().stream()
+        .map(org.apache.hadoop.fs.Path::new)
+        // filter out the file paths that do not exist; some files may be cleaned by the cleaner
+        .filter(path -> {
+          try {
+            return fs.exists(path);
+          } catch (IOException e) {
+            LOG.error("Checking exists of path: {} error", path);
+            throw new HoodieException(e);
+          }
+        }).map(path -> {
+          try {
+            return fs.getFileStatus(path);
+          } catch (IOException e) {
+            LOG.error("Get write status of path: {} error", path);
+            throw new HoodieException(e);
+          }
+        })
+        // filter out crushed files
```

Review comment: The write should not affect the read. That code was added a long time ago, when a committed file (merge handle) could later be modified by following modification instants. In the first version, the write handle was not closed until the checkpoint success event was received (this has since been changed), so a merge handle could leave an empty file if close was never invoked. We can still keep the filtering to make the read robust.
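The behavior discussed above, tolerating both cleaned-away paths and zero-length files left by an unclosed merge handle, can be modeled with a small self-contained sketch. The types here are plain-Java stand-ins, not the real `FileStatus`/`WriteProfiles` classes:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified model of the two filters in getWritePathsOfInstant():
// a null length means the file no longer exists (removed by the cleaner);
// a zero length means an empty ("crushed") file from a merge handle that
// was never closed. Readers must skip both.
public class WritePathFilterSketch {

    static List<String> readablePaths(Map<String, Long> pathToLength) {
        return pathToLength.entrySet().stream()
                .filter(e -> e.getValue() != null)  // stands in for fs.exists(path)
                .filter(e -> e.getValue() > 0)      // stands in for fileStatus.getLen() > 0
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("part-001.parquet", 1024L); // healthy committed file
        files.put("part-002.parquet", 0L);    // empty: handle never closed
        files.put("part-003.parquet", null);  // already cleaned
        System.out.println(readablePaths(files)); // [part-001.parquet]
    }
}
```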
[GitHub] [hudi] garyli1019 commented on a change in pull request #3067: [HUDI-1999] Refresh the base file view cache for WriteProfile
garyli1019 commented on a change in pull request #3067: URL: https://github.com/apache/hudi/pull/3067#discussion_r651401752 ## File path: hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/profile/WriteProfiles.java ## @@ -58,4 +76,61 @@ private static WriteProfile getWriteProfile( public static void clean(String path) { PROFILES.remove(path); } + + /** + * Returns all the incremental write file path statuses with the given commits metadata. + * + * @param basePath Table base path + * @param hadoopConf The hadoop conf + * @param metadataList The commits metadata + * @return the file statuses array + */ + public static FileStatus[] getWritePathsOfInstants( + Path basePath, + Configuration hadoopConf, + List metadataList) { +FileSystem fs = FSUtils.getFs(basePath.toString(), hadoopConf); +return metadataList.stream().map(metadata -> getWritePathsOfInstant(basePath, metadata, fs)) +.flatMap(Collection::stream).toArray(FileStatus[]::new); + } + + private static List getWritePathsOfInstant(Path basePath, HoodieCommitMetadata metadata, FileSystem fs) { +return metadata.getFileIdAndFullPaths(basePath.toString()).values().stream() +.map(org.apache.hadoop.fs.Path::new) +// filter out the file paths that does not exist, some files may be cleaned by +// the cleaner. +.filter(path -> { + try { +return fs.exists(path); + } catch (IOException e) { +LOG.error("Checking exists of path: {} error", path); +throw new HoodieException(e); + } +}).map(path -> { + try { +return fs.getFileStatus(path); + } catch (IOException e) { +LOG.error("Get write status of path: {} error", path); +throw new HoodieException(e); + } +}) +// filter out crushed files Review comment: crushed files might cause errors on the query side. How are those crushed files produced? 
## File path: hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/profile/WriteProfiles.java ## @@ -58,4 +76,61 @@ private static WriteProfile getWriteProfile( public static void clean(String path) { PROFILES.remove(path); } + + /** + * Returns all the incremental write file path statuses with the given commits metadata. + * + * @param basePath Table base path + * @param hadoopConf The hadoop conf + * @param metadataList The commits metadata + * @return the file statuses array + */ + public static FileStatus[] getWritePathsOfInstants( + Path basePath, + Configuration hadoopConf, + List metadataList) { +FileSystem fs = FSUtils.getFs(basePath.toString(), hadoopConf); +return metadataList.stream().map(metadata -> getWritePathsOfInstant(basePath, metadata, fs)) +.flatMap(Collection::stream).toArray(FileStatus[]::new); + } + + private static List getWritePathsOfInstant(Path basePath, HoodieCommitMetadata metadata, FileSystem fs) { +return metadata.getFileIdAndFullPaths(basePath.toString()).values().stream() +.map(org.apache.hadoop.fs.Path::new) +// filter out the file paths that does not exist, some files may be cleaned by +// the cleaner. 
+.filter(path -> { + try { +return fs.exists(path); + } catch (IOException e) { +LOG.error("Checking exists of path: {} error", path); +throw new HoodieException(e); + } +}).map(path -> { + try { +return fs.getFileStatus(path); + } catch (IOException e) { +LOG.error("Get write status of path: {} error", path); +throw new HoodieException(e); + } +}) +// filter out crushed files +.filter(fileStatus -> fileStatus.getLen() > 0) +.collect(Collectors.toList()); + } + + public static HoodieCommitMetadata getCommitMetadata( Review comment: ditto ## File path: hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/profile/WriteProfiles.java ## @@ -58,4 +76,61 @@ private static WriteProfile getWriteProfile( public static void clean(String path) { PROFILES.remove(path); } + + /** + * Returns all the incremental write file path statuses with the given commits metadata. + * + * @param basePath Table base path + * @param hadoopConf The hadoop conf + * @param metadataList The commits metadata + * @return the file statuses array + */ + public static FileStatus[] getWritePathsOfInstants( + Path basePath, + Configuration hadoopConf, + List metadataList) { +FileSystem fs = FSUtils.getFs(basePath.toString(), hadoopConf); +return metadataList.stream().map(metadata -> getWritePathsOfInstant(basePath, metadata, fs)) +.flatMap(Collection::stream).toArray(FileStatus[]::new); + } + + private
[GitHub] [hudi] danny0405 commented on a change in pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…
danny0405 commented on a change in pull request #3025: URL: https://github.com/apache/hudi/pull/3025#discussion_r651402324

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java

```diff
@@ -63,7 +63,7 @@ public BaseScheduleCompactionActionExecutor(HoodieEngineContext context,
       + ", Compaction scheduled at " + instantTime));
   // Committed and pending compaction instants should have strictly lower timestamps
   List<HoodieInstant> conflictingInstants = table.getActiveTimeline()
-      .getWriteTimeline().getInstants()
+      .getWriteTimeline().filterCompletedAndCompactionInstants().getInstants()
```

Review comment: No, `getWriteTimeline()` does not really match the behavior of `filterCompletedAndCompactionInstants()`: `getWriteTimeline()` may include INFLIGHT instants of any action, while `filterCompletedAndCompactionInstants()` only includes INFLIGHT instants of the COMPACTION action. We should allow scheduling a compaction even when there are inflight commits or inflight delta commits.
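The distinction can be sketched with a tiny stand-in model (not Hudi's real `HoodieTimeline`/`HoodieInstant` classes): the conflict check should keep completed instants of any action plus compaction instants in any state, so an inflight delta commit no longer blocks scheduling a compaction plan.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal stand-in illustrating why filterCompletedAndCompactionInstants()
// is the right guard: completed instants and pending compactions conflict
// with a new compaction plan, but inflight (delta) commits do not.
public class TimelineFilterSketch {

    enum State { REQUESTED, INFLIGHT, COMPLETED }

    static final class Instant {
        final String action;   // e.g. "commit", "deltacommit", "compaction"
        final State state;
        final String timestamp;
        Instant(String action, State state, String timestamp) {
            this.action = action;
            this.state = state;
            this.timestamp = timestamp;
        }
    }

    // Keep completed instants of any action, plus compaction instants in any state.
    static List<Instant> filterCompletedAndCompactionInstants(List<Instant> timeline) {
        return timeline.stream()
                .filter(i -> i.state == State.COMPLETED || i.action.equals("compaction"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instant> timeline = List.of(
                new Instant("commit", State.COMPLETED, "001"),
                new Instant("deltacommit", State.INFLIGHT, "002"),  // must NOT block scheduling
                new Instant("compaction", State.REQUESTED, "003")); // must block later timestamps

        // The inflight delta commit is excluded from the conflict set.
        System.out.println(filterCompletedAndCompactionInstants(timeline).size()); // 2
    }
}
```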
[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server
[ https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363315#comment-17363315 ]

liwei commented on HUDI-1138:
-----------------------------

[~guoyihua] thanks. “We may consider blocking the requests for batching so that the timeline server sends the actual responses only after MARKERS are overwritten / updated.” If we wait for the batched requests to be overwritten/updated successfully, the create-marker request from a Spark task will wait a long time (for example, a 200 ms batching interval plus the marker-file read and overwrite). Do you have a plan for how the marker file will be updated?

> Re-implement marker files via timeline server
> ---------------------------------------------
>
> Key: HUDI-1138
> URL: https://issues.apache.org/jira/browse/HUDI-1138
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Writer Core
> Affects Versions: 0.9.0
> Reporter: Vinoth Chandar
> Assignee: Ethan Guo
> Priority: Blocker
> Fix For: 0.9.0
>
> Even as you can argue that RFC-15/consolidated metadata removes the need for deleting partial files written due to Spark task failures/stage retries, it will still leave extra files inside the table (and users will pay for them every month), so we need the marker mechanism to be able to delete these partial files.
> Here we explore if we can improve the current marker file mechanism, which creates one marker file per data file written, by delegating the createMarker() call to the driver/timeline server and having it write marker metadata into a single file handle that is flushed for durability guarantees.
>
> P.S: I was tempted to think the Spark listener mechanism can help us deal with failed tasks, but it has no guarantees: the writer job could die without deleting a partial file, i.e. it can improve things but can't provide guarantees.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
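The latency concern raised above (each create-marker call waiting for the next batch flush) can be sketched with a minimal request-batching model. This is a hypothetical illustration of the trade-off only, not Hudi's actual timeline-server implementation:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of batched marker creation on a timeline server:
// createMarker() only enqueues the request, and callers see their future
// complete when flush() persists the whole batch in one write. Each request
// pays up to one flush interval of latency in exchange for one file write
// per batch instead of one file per marker.
public class MarkerBatcherSketch {

    private final List<CompletableFuture<Boolean>> pending = new ArrayList<>();
    private final Set<String> buffered = new LinkedHashSet<>();
    private final Set<String> persisted = new LinkedHashSet<>(); // stands in for the single marker file

    public synchronized CompletableFuture<Boolean> createMarker(String markerName) {
        buffered.add(markerName);
        CompletableFuture<Boolean> response = new CompletableFuture<>();
        pending.add(response);
        return response; // not complete until the next flush
    }

    // Called periodically by the server, e.g. once per batch interval.
    public synchronized void flush() {
        persisted.addAll(buffered); // one overwrite of the marker file per batch
        buffered.clear();
        pending.forEach(f -> f.complete(true));
        pending.clear();
    }

    public synchronized Set<String> persistedMarkers() {
        return new LinkedHashSet<>(persisted);
    }

    public static void main(String[] args) {
        MarkerBatcherSketch server = new MarkerBatcherSketch();
        CompletableFuture<Boolean> r1 = server.createMarker("f1.parquet.marker.CREATE");
        CompletableFuture<Boolean> r2 = server.createMarker("f2.parquet.marker.CREATE");
        System.out.println(r1.isDone());                      // false: still waiting for the batch
        server.flush();
        System.out.println(r1.isDone() && r2.isDone());       // true: both answered after one write
        System.out.println(server.persistedMarkers().size()); // 2
    }
}
```

Under this model, answering only after the flush gives the durability guarantee discussed in the comment, at the cost of the per-request wait liwei describes.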
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…
codecov-commenter edited a comment on pull request #3050: URL: https://github.com/apache/hudi/pull/3050#issuecomment-856664630

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3050) Report
> Merging [#3050](https://codecov.io/gh/apache/hudi/pull/3050) (1f85ab3) into [master](https://codecov.io/gh/apache/hudi/commit/f760ec543ec9ea23b7d4c9f61c76a283bd737f27) (f760ec5) will **decrease** coverage by `2.95%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3050/graphs/tree.svg)](https://codecov.io/gh/apache/hudi/pull/3050)

```diff
@@             Coverage Diff              @@
##             master    #3050      +/-  ##
===========================================
- Coverage     55.31%   52.36%    -2.96%
+ Complexity     4026      422     -3604
===========================================
  Files           520       70      -450
  Lines         25295     3082   -22213
  Branches       2872      423     -2449
===========================================
- Hits          13993     1614   -12379
+ Misses         9914     1327    -8587
+ Partials       1388      141    -1247
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `6.14% <ø> (-45.32%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `71.79% <ø> (+0.73%)` | :arrow_up: |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3050) | Coverage Δ | |
|---|---|---|
| [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3050/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==) | `0.00% <0.00%> (-97.83%)` | :arrow_down: |
| [.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=) | `0.00% <0.00%> (-90.91%)` | :arrow_down: |
| [...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/3050/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh) | `0.00% <0.00%> (-84.85%)` | :arrow_down: |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…
codecov-commenter edited a comment on pull request #3050: URL: https://github.com/apache/hudi/pull/3050#issuecomment-856664630

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3050) Report
> Merging [#3050](https://codecov.io/gh/apache/hudi/pull/3050) (1f85ab3) into [master](https://codecov.io/gh/apache/hudi/commit/f760ec543ec9ea23b7d4c9f61c76a283bd737f27) (f760ec5) will **decrease** coverage by `6.48%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3050/graphs/tree.svg)](https://codecov.io/gh/apache/hudi/pull/3050)

```diff
@@             Coverage Diff              @@
##             master    #3050      +/-  ##
===========================================
- Coverage     55.31%   48.83%    -6.49%
+ Complexity     4026      404     -3622
===========================================
  Files           520       70      -450
  Lines         25295     3082   -22213
  Branches       2872      423     -2449
===========================================
- Hits          13993     1505   -12488
+ Misses         9914     1434    -8480
+ Partials       1388      143    -1245
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `6.14% <ø> (-45.32%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `66.77% <ø> (-4.30%)` | :arrow_down: |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3050?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==) | `0.00% <0.00%> (-97.83%)` | :arrow_down: | | [.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=) | `0.00% <0.00%> 
(-90.91%)` | :arrow_down: | | [...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/3050/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh) | `0.00% <0.00%> (-84.85%)` | :arrow_down: | |
[GitHub] [hudi] swuferhong closed pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…
swuferhong closed pull request #3050: URL: https://github.com/apache/hudi/pull/3050 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on a change in pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…
n3nash commented on a change in pull request #3025: URL: https://github.com/apache/hudi/pull/3025#discussion_r651297449

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java

```diff
@@ -63,7 +63,7 @@ public BaseScheduleCompactionActionExecutor(HoodieEngineContext context,
             + ", Compaction scheduled at " + instantTime));
     // Committed and pending compaction instants should have strictly lower timestamps
     List<HoodieInstant> conflictingInstants = table.getActiveTimeline()
-        .getWriteTimeline().getInstants()
+        .getWriteTimeline().filterCompletedAndCompactionInstants().getInstants()
```

Review comment: @swuferhong @danny0405 If you take a look at the previous version of this file, the method called before was `commitsAndCompactionTimeline` -> https://github.com/apache/hudi/blob/3e71c915271d77c7306ca0325b212f71ce723fc0/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java#L110 This follows the same behavior as `getWriteTimeline().getInstants()`. Can you please explain what is a possible bug here?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
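The behavior difference the two reviewers are debating can be illustrated with a small standalone model. Note that the types and method names below (`Instant`, `filterCompletedAndCompaction`, `conflicting`) are illustrative stand-ins, not Hudi's actual timeline API: filtering to completed and compaction instants means an inflight delta commit no longer blocks scheduling a compaction at a lower timestamp.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model of the timeline-filtering question in this review thread.
// These types are illustrative, not Hudi's real API.
public class TimelineSketch {

    public static final class Instant {
        final String timestamp;
        final String action;     // e.g. "commit" or "compaction"
        final boolean completed; // false => inflight/requested

        public Instant(String timestamp, String action, boolean completed) {
            this.timestamp = timestamp;
            this.action = action;
            this.completed = completed;
        }
    }

    // Mirrors the intent of filterCompletedAndCompactionInstants(): keep
    // completed instants plus compaction instants (pending or not), and
    // drop other inflight instants such as an inflight delta commit.
    public static List<Instant> filterCompletedAndCompaction(List<Instant> timeline) {
        return timeline.stream()
            .filter(i -> i.completed || "compaction".equals(i.action))
            .collect(Collectors.toList());
    }

    // "Committed and pending compaction instants should have strictly lower
    // timestamps": anything at or above the proposed instant time conflicts.
    public static List<Instant> conflicting(List<Instant> timeline, String instantTime) {
        return filterCompletedAndCompaction(timeline).stream()
            .filter(i -> i.timestamp.compareTo(instantTime) >= 0)
            .collect(Collectors.toList());
    }
}
```

In this model, an inflight commit at `"030"` does not conflict with a compaction scheduled at `"025"` once the filter is applied, which is the behavioral change danny0405 argues for; without the filter it would.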
[GitHub] [hudi] n3nash commented on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…
n3nash commented on pull request #3050: URL: https://github.com/apache/hudi/pull/3050#issuecomment-861011193 @swuferhong Thanks for opening this. Looks like it might be a duplicate. Can you check why the CI is failing ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on a change in pull request #3074: [HUDI-2007] Fixing hudi_test_suite for spark nodes and adding spark bulk_insert node
n3nash commented on a change in pull request #3074: URL: https://github.com/apache/hudi/pull/3074#discussion_r651289505

## File path: hudi-integ-test/src/main/scala/org/apache/hudi/integ/testsuite/dag/nodes/SparkBulkInsertNode.scala

```diff
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.integ.testsuite.dag.nodes
+
+import org.apache.hudi.client.WriteStatus
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.integ.testsuite.configuration.DeltaConfig.Config
+import org.apache.hudi.integ.testsuite.dag.ExecutionContext
+import org.apache.hudi.{AvroConversionUtils, DataSourceWriteOptions}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SaveMode
+
+import scala.collection.JavaConverters._
+
+/**
+ * Spark datasource based bulk insert node
+ * @param config1
+ */
+class SparkBulkInsertNode(config1: Config) extends DagNode[RDD[WriteStatus]] {
+
+  config = config1
+
+  /**
+   * Execute the {@link DagNode}.
+   *
+   * @param context The context needed for an execution of a node.
+   * @param curItrCount iteration count for executing the node.
+   * @throws Exception Thrown if the execution failed.
+   */
+  override def execute(context: ExecutionContext, curItrCount: Int): Unit = {
+    if (!config.isDisableGenerate) {
+      //println("Generating input data for node {}", this.getName)
```

Review comment: please remove the print comment

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on a change in pull request #3074: [HUDI-2007] Fixing hudi_test_suite for spark nodes and adding spark bulk_insert node
n3nash commented on a change in pull request #3074: URL: https://github.com/apache/hudi/pull/3074#discussion_r651289412

## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/configuration/DeltaConfig.java

```diff
@@ -189,6 +190,10 @@ public boolean validateClean() {
       return Boolean.valueOf(configsMap.getOrDefault(VALIDATE_CLEAN, false).toString());
     }
 
+    public boolean doEnableRowWriting() {
```

Review comment: doEnableRowWriting -> doRowWriting or enableRowWriting

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on a change in pull request #3074: [HUDI-2007] Fixing hudi_test_suite for spark nodes and adding spark bulk_insert node
n3nash commented on a change in pull request #3074: URL: https://github.com/apache/hudi/pull/3074#discussion_r651289200

## File path: hudi-integ-test/pom.xml

```diff
@@ -407,7 +407,46 @@
+
```

Review comment: Can you explain the need for this plugin?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2819: [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe.
codecov-commenter edited a comment on pull request #2819: URL: https://github.com/apache/hudi/pull/2819#issuecomment-860933035

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2819) Report
> Merging [#2819](https://codecov.io/gh/apache/hudi/pull/2819) (3deb5e7) into [master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f) (769dd2d) will **increase** coverage by `4.98%`.
> The diff coverage is `56.52%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2819/graphs/tree.svg)](https://codecov.io/gh/apache/hudi/pull/2819)

```diff
@@             Coverage Diff              @@
##             master    #2819      +/-   ##
============================================
+ Coverage     50.04%   55.03%    +4.98%
- Complexity     3685     4033     +348
============================================
  Files           526      527       +1
  Lines         25466    25477      +11
  Branches       2886     2886
============================================
+ Hits          12744    14020    +1276
+ Misses        11454    10067    -1387
- Partials       1268     1390     +122
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.95% <0.00%> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `49.99% <61.11%> (-0.03%)` | :arrow_down: |
| hudiflink | `60.58% <ø> (ø)` | |
| hudihadoopmr | `51.43% <ø> (ø)` | |
| hudisparkdatasource | `66.53% <100.00%> (ø)` | |
| hudisync | `51.45% <ø> (ø)` | |
| huditimelineservice | `64.36% <ø> (ø)` | |
| hudiutilities | `71.01% <ø> (+61.91%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2819) | Coverage Δ | |
|---|---|---|
| [...ain/java/org/apache/hudi/cli/utils/CommitUtil.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL3V0aWxzL0NvbW1pdFV0aWwuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [...di/common/table/timeline/HoodieActiveTimeline.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFjdGl2ZVRpbWVsaW5lLmphdmE=) | `65.21% <40.00%> (-1.60%)` | :arrow_down: |
| [...mon/table/timeline/HoodieInstantTimeGenerator.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUluc3RhbnRUaW1lR2VuZXJhdG9yLmphdmE=) | `69.23% <69.23%> (ø)` | |
| [.../spark/sql/hudi/streaming/HoodieStreamSource.scala](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9zcGFyay9zcWwvaHVkaS9zdHJlYW1pbmcvSG9vZGllU3RyZWFtU291cmNlLnNjYWxh) | `67.46% <100.00%> (ø)` | |
| [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | :arrow_down: |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2819: [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe.
codecov-commenter edited a comment on pull request #2819: URL: https://github.com/apache/hudi/pull/2819#issuecomment-860933035

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2819) Report
> Merging [#2819](https://codecov.io/gh/apache/hudi/pull/2819) (3deb5e7) into [master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f) (769dd2d) will **increase** coverage by `3.37%`.
> The diff coverage is `56.52%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2819/graphs/tree.svg)](https://codecov.io/gh/apache/hudi/pull/2819)

```diff
@@             Coverage Diff              @@
##             master    #2819      +/-   ##
============================================
+ Coverage     50.04%   53.42%    +3.37%
- Complexity     3685     3827     +142
============================================
  Files           526      517       -9
  Lines         25466    24649     -817
  Branches       2886     2833      -53
============================================
+ Hits          12744    13168     +424
+ Misses        11454    10173    -1281
- Partials       1268     1308      +40
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.95% <0.00%> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `49.99% <61.11%> (-0.03%)` | :arrow_down: |
| hudiflink | `60.58% <ø> (ø)` | |
| hudihadoopmr | `51.43% <ø> (ø)` | |
| hudisparkdatasource | `66.53% <100.00%> (ø)` | |
| hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `71.01% <ø> (+61.91%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2819) | Coverage Δ | |
|---|---|---|
| [...ain/java/org/apache/hudi/cli/utils/CommitUtil.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL3V0aWxzL0NvbW1pdFV0aWwuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [...di/common/table/timeline/HoodieActiveTimeline.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFjdGl2ZVRpbWVsaW5lLmphdmE=) | `65.21% <40.00%> (-1.60%)` | :arrow_down: |
| [...mon/table/timeline/HoodieInstantTimeGenerator.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUluc3RhbnRUaW1lR2VuZXJhdG9yLmphdmE=) | `69.23% <69.23%> (ø)` | |
| [.../spark/sql/hudi/streaming/HoodieStreamSource.scala](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9zcGFyay9zcWwvaHVkaS9zdHJlYW1pbmcvSG9vZGllU3RyZWFtU291cmNlLnNjYWxh) | `67.46% <100.00%> (ø)` | |
| [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
[GitHub] [hudi] codecov-commenter commented on pull request #2819: [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe.
codecov-commenter commented on pull request #2819: URL: https://github.com/apache/hudi/pull/2819#issuecomment-860933035

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2819) Report
> Merging [#2819](https://codecov.io/gh/apache/hudi/pull/2819) (3deb5e7) into [master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f) (769dd2d) will **decrease** coverage by `41.60%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2819/graphs/tree.svg)](https://codecov.io/gh/apache/hudi/pull/2819)

```diff
@@             Coverage Diff              @@
##             master    #2819      +/-   ##
============================================
- Coverage     50.04%    8.43%   -41.61%
+ Complexity     3685       62     -3623
============================================
  Files           526       70      -456
  Lines         25466     2880    -22586
  Branches       2886      359     -2527
============================================
- Hits          12744      243    -12501
+ Misses        11454     2616     -8838
+ Partials       1268       21     -1247
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `9.09% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2819) | Coverage Δ | |
|---|---|---|
| [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==) | `0.00% <0.00%> (-97.83%)` | :arrow_down: |
| [.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=) | `0.00% <0.00%> (-90.91%)` | :arrow_down: |
| [...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/2819/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh) | `0.00% <0.00%> (-84.85%)` | :arrow_down: |
[GitHub] [hudi] prashantwason commented on a change in pull request #2819: [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe.
prashantwason commented on a change in pull request #2819: URL: https://github.com/apache/hudi/pull/2819#discussion_r651210372

## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java

```diff
@@ -73,6 +71,16 @@
   private static final Logger LOG = LogManager.getLogger(HoodieActiveTimeline.class);
   protected HoodieTableMetaClient metaClient;
   private static AtomicReference<String> lastInstantTime = new AtomicReference<>(String.valueOf(Integer.MIN_VALUE));
+  private static ThreadLocal<SimpleDateFormat> commitFormatHolder = new ThreadLocal<SimpleDateFormat>() {
```

Review comment: Created a new class.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
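The fix under review moves the shared `SimpleDateFormat` into a `ThreadLocal`. A minimal self-contained sketch of the pattern (the class and field names here are illustrative, not the new Hudi class):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// SimpleDateFormat keeps mutable parse/format state internally, so a single
// static instance shared across writer threads can silently produce corrupted
// instant times. A ThreadLocal gives each thread its own copy while keeping
// the convenient static-accessor code shape.
public class CommitTimeFormat {

    private static final ThreadLocal<SimpleDateFormat> COMMIT_FORMATTER =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyyMMddHHmmss"));

    public static String format(Date date) {
        return COMMIT_FORMATTER.get().format(date);
    }

    public static Date parse(String instantTime) {
        try {
            return COMMIT_FORMATTER.get().parse(instantTime);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad instant time: " + instantTime, e);
        }
    }
}
```

On Java 8+ an immutable, thread-safe `java.time.format.DateTimeFormatter` avoids the problem entirely; the `ThreadLocal` approach is the smaller change when existing code is already built around `SimpleDateFormat`.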
[GitHub] [hudi] dkapupara-te removed a comment on issue #2856: [SUPPORT] Metrics Prometheus pushgateway
dkapupara-te removed a comment on issue #2856: URL: https://github.com/apache/hudi/issues/2856#issuecomment-858026721 I am experiencing the same issue using simpleclient with Spring boot for one of our cron jobs. ```Caused by: java.net.UnknownHostException: https``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
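A `UnknownHostException: https` is typically a symptom of the gateway address being configured with its scheme included (e.g. `https://gateway:9091`), so the literal string `https` gets resolved as a hostname. A small normalizer sketch illustrates the idea; `PushGatewayAddress` is a hypothetical helper, not part of Hudi or the Prometheus client:

```java
import java.net.URI;

// Hypothetical helper: strip a scheme from a configured pushgateway address
// so that only "host:port" is handed to the metrics client, which expects a
// bare authority rather than a full URL.
public class PushGatewayAddress {

    public static String normalize(String configured) {
        if (configured.contains("://")) {
            URI uri = URI.create(configured);
            // Fall back to the conventional pushgateway port when none is given.
            int port = uri.getPort() == -1 ? 9091 : uri.getPort();
            return uri.getHost() + ":" + port;
        }
        return configured;
    }
}
```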
[GitHub] [hudi] tandonraghav commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields
tandonraghav commented on issue #2919: URL: https://github.com/apache/hudi/issues/2919#issuecomment-860846241

@n3nash @vinothchandar I am seeing the entire schema is getting persisted in Glue **TBLPROPERTIES**. This was not the behaviour previously. Do we need the schema there as well, or can we have a config to switch it off?

Hudi version - 0.9.0-SNAPSHOT

```sql
hive> show create table max_ro;
OK
CREATE EXTERNAL TABLE `max_ro`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `string_pincode_113` string,
  `double_pincode_113` double,
  `string_availability_157` string,
  `string_availability2_169` string,
  `string_availability3_150` string,
  `string_availability4_158` string,
  `string_availability5_187` string,
  `string_availability6_150` string,
  `string_availability7_778` string,
  `string_availability8_192` string,
  `string_availability9_700` string,
  `string_availability10_131` string,
  `string_availability11_186` string,
  `string_availability12_878` string,
  `string_availability13_466` string,
  `id` string,
  `product_id` string,
  `catalog_id` string,
  `feed_id` string,
  `ts_ms` double,
  `op` string)
PARTITIONED BY (
  `db_name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='file:/tmp/test/hudi-user-data/max')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/test/hudi-user-data/max'
TBLPROPERTIES (
  'last_commit_time_sync'='20210614221425',
  'last_modified_by'='raghav',
  'last_modified_time'='1623689081',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numPartCols'='1',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"string_pincode_113","type":"string","nullable":true,"metadata":{}},{"name":"double_pincode_113","type":"double","nullable":true,"metadata":{}},{"name":"string_availability_157","type":"string","nullable":true,"metadata":{}},{"name":"string_availability2_169","type":"string","nullable":true,"metadata":{}},{"name":"string_availability3_150","type":"string","nullable":true,"metadata":{}},{"name":"string_availability4_158","type":"string","nullable":true,"metadata":{}},{"name":"string_availability5_187","type":"string","nullable":true,"metadata":{}},{"name":"string_availability6_150","type":"string","nullable":true,"metadata":{}},{"name":"string_availability7_778","type":"string","nullable":true,"metadata":{}},{"name":"string_availability8_192","type":"string","nullable":true,"metadata":{}},{"name":"string_availability9_700","type":"string","nullable":true,"metadata":{}},{"name":"string_availability10_131","type":"string","nullable":true,"metadata":{}},{"name":"string_availability11_186","type":"string","nullable":true,"metadata":{}},{"name":"string_availability12_878","type":"string","nullable":true,"metadata":{}},{"name":"string_availability13_466","type":"string","nullable":true,"metadata":{}},{"name":"id","type":"string","nullable":true,"metadata":{}},{"name":"product_id","type":"string","nullable":true,"metadata":{}},{"name":"catalog_id","type":"string","nullable":true,"metadata":{}},{"name":"feed_id","type":"string","nullable":true,"metadata":{}},{"name":"ts_ms","type":"double","nullable":true,"metadata":{}},{"name":"op","type":"string","nullable":true,"metadata":{}},{"name":"db_name","type":"string","nullable":true,"metadata":{}}]}',
  'spark.sql.sources.schema.partCol.0'='db_name',
  'transient_lastDdlTime'='1623689081')
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] tandonraghav opened a new issue #3078: [SUPPORT] combineAndGetUpdateValue is not getting called when Schema evolution happens
tandonraghav opened a new issue #3078: URL: https://github.com/apache/hudi/issues/3078

**Describe the problem you faced**

We have a stream of partial records (Mongo CDC oplogs) and want to update the keys which are passed in the partial records while keeping the other values intact. A sample record in our CDC Kafka:

```json
{"op":"u","doc_id":"606ffc3c10f9138e2a6b6csdc","shard_id":"shard01","ts":{"$timestamp":{"i":1,"t":1617951883}},"db_name":"test","collection":"Users","o":{"$v":1,"$set":{"key2":"value2"}},"o2":{"_id":{"$oid":"606ffc3c10f9138e2a6b6csdc"}}}
```

**To Reproduce**

Steps to reproduce the behavior:
1. Create a custom `PAYLOAD_CLASS_OPT_KEY` as described below.
2. Push some records (partial records) using Spark DF and persist in Hudi.
3. Evolve the schema, add a new field to the existing Schema and persist via Spark DF.
4. `combineAndGetUpdateValue` is not getting called when the Schema is evolved, which makes the other values NULL, as only the partial record is getting passed and the combine logic is present in the custom class. However, this behaviour is not observed when the Schema remains constant.

**Expected behavior**

Irrespective of Schema evolution, when compaction happens it should always go through `combineAndGetUpdateValue` of the class provided.

**Environment Description**

* Hudi version : 0.9.0-SNAPSHOT
* Spark version : 2.4
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no

**Additional context**

Custom Payload class:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hudi.avro.HoodieAvroUtils;
import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload;
import org.apache.hudi.common.util.Option;

import java.io.IOException;
import java.util.List;
import java.util.Properties;

public class MongoHudiCDCPayload extends OverwriteWithLatestAvroPayload {

  public MongoHudiCDCPayload(GenericRecord record, Comparable orderingVal) {
    super(record, orderingVal);
  }

  public MongoHudiCDCPayload(Option<GenericRecord> record) {
    super(record);
  }

  @Override
  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties) throws IOException {
    if (this.recordBytes.length == 0) {
      return Option.empty();
    }
    GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(this.recordBytes, schema);
    GenericRecord currentRecord = (GenericRecord) currentValue;
    // Overlay each non-null field from the partial incoming record onto the
    // current stored record, preserving the untouched columns.
    List<Schema.Field> fields = incomingRecord.getSchema().getFields();
    fields.forEach((field) -> {
      Object value = incomingRecord.get(field.name());
      if (value != null) {
        currentRecord.put(field.name(), value);
      }
    });
    return Option.of(currentRecord);
  }
}
```

**Stacktrace**

```Add the stacktrace of the error.```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] satishkotha commented on a change in pull request #2542: Add ability to provide multi-region (global) data consistency across HMS in different regions
satishkotha commented on a change in pull request #2542: URL: https://github.com/apache/hudi/pull/2542#discussion_r651116518

## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableGloballyConsistentMetaClient.java

```diff
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.table;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hudi.common.fs.ConsistencyGuardConfig;
+import org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.TableNotFoundException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+/*
+ * Uber specific version of HoodieTableMetaClient to make sure when a table level property is set
+ * to indicate a commit timestamp that is present across DC make sure to limit the local .hoodie
+ * timeline to upto that commit timestamp.
+ *
+ * Note: There is an assumption that this means every other commit
+ * that is present upto this commit is present globally. This assumption makes it easier to just
+ * trim the commit timeline at the head. Otherwise we will have to store the valid commit timeline
+ * in the table as a property.
+ *
+ * Note: This object should not be cached between mapreduce jobs since the jobConf can change
+ */
+public class HoodieTableGloballyConsistentMetaClient extends HoodieTableMetaClient {
```

Review comment: @jsbali good question. I think it makes sense for read global to take precedence. include_pending is only used for dev testing, so we can add instructions to explicitly disable read global for these cases.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] jsbali commented on a change in pull request #2542: Add ability to provide multi-region (global) data consistency across HMS in different regions
jsbali commented on a change in pull request #2542: URL: https://github.com/apache/hudi/pull/2542#discussion_r651114060 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableGloballyConsistentMetaClient.java ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi.common.table; + +import java.io.IOException; +import java.util.List; +import java.util.Set; +import java.util.stream.Collectors; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hudi.common.fs.ConsistencyGuardConfig; +import org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion; +import org.apache.hudi.common.table.timeline.HoodieTimeline; +import org.apache.hudi.common.table.timeline.HoodieInstant; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.exception.TableNotFoundException; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; + +/* + * Uber specific version of HoodieTableMetaClient to make sure when a table level property is set + * to indicate a commit timestamp that is present across DC make sure to limit the local .hoodie + * timeline to upto that commit timestamp. + * + * Note: There is an assumption that this means every other commit + * that is present upto this commit is present globally. This assumption makes it easier to just + * trim the commit timeline at the head. Otherwise we will have to store the valid commit timeline + * in the table as a property. + * + * Note: This object should not be cached between mapreduce jobs since the jobConf can change + */ +public class HoodieTableGloballyConsistentMetaClient extends HoodieTableMetaClient { Review comment: The file can be removed which I have done. There are few issues which are there if hoodie.consume is used for trimming the timeline. For example suppose the following confs are set. 1. read global as true 2. last_rep time as 100 3. include_pending commit true 4. commit time 200. Now the question is what takes precedence over what. Do we check if 3 and 4 are set before using consume.commit for global reads. What should the order be. 
For now I have kept both separate, and global read takes precedence, but I am happy to hear your thoughts. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
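The precedence question above can be sketched as a single resolution step. This is a hypothetical helper, not code from the PR; the names and the rule (global read wins over any `hoodie.consume`-style commit override) just mirror the behavior jsbali describes.

```java
public class TimelineTrimSketch {

  // Hypothetical resolution of the config precedence discussed above:
  // when the global-read flag is set, trim the timeline at the globally
  // replicated commit timestamp and ignore consume-commit overrides;
  // otherwise honor the consume commit if one is set. `includePending`
  // is carried only to mirror the four configs in the scenario.
  static String resolveTrimInstant(boolean readGlobal, String lastReplicatedTs,
                                   boolean includePending, String consumeCommitTs) {
    if (readGlobal) {
      return lastReplicatedTs; // global read takes precedence
    }
    return consumeCommitTs; // may be null: no trimming requested
  }

  public static void main(String[] args) {
    // The exact scenario from the comment: configs 1-4 all set at once.
    System.out.println(resolveTrimInstant(true, "100", true, "200")); // 100
  }
}
```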
[jira] [Updated] (HUDI-2003) Auto Compute Compression ratio for input data to output parquet/orc file size
[ https://issues.apache.org/jira/browse/HUDI-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-2003: -- Summary: Auto Compute Compression ratio for input data to output parquet/orc file size (was: Auto Compute Compression) > Auto Compute Compression ratio for input data to output parquet/orc file size > - > > Key: HUDI-2003 > URL: https://issues.apache.org/jira/browse/HUDI-2003 > Project: Apache Hudi > Issue Type: Bug > Components: Writer Core >Reporter: Vinay >Priority: Major > > Context : > Submitted a spark job to read 3-4B ORC records and wrote to Hudi format. > Creating the following table with all the runs that I had carried out based > on different options > > ||CONFIG ||Number of Files Created||Size of each file|| > |PARQUET_FILE_MAX_BYTES=DEFAULT|30K|21MB| > |PARQUET_FILE_MAX_BYTES=1GB|3700|178MB| > |PARQUET_FILE_MAX_BYTES=1GB > COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE=110|Same as before|Same as before| > |PARQUET_FILE_MAX_BYTES=1GB > BULKINSERT_PARALLELISM=100|Same as before|Same as before| > |PARQUET_FILE_MAX_BYTES=4GB|1600|675MB| > |PARQUET_FILE_MAX_BYTES=6GB|669|1012MB| > Based on these runs, it seems that the compression ratio is off. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
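The output-to-limit fraction implied by the table above can be checked with quick arithmetic (not Hudi code): across the 1GB, 4GB, and 6GB runs the observed file size is a steady ~16-17% of the configured max, which is consistent with the report that the assumed compression ratio is off.

```java
public class CompressionRatioCheck {
  public static void main(String[] args) {
    long gib = 1024L * 1024 * 1024;
    long mib = 1024L * 1024;

    // {configured PARQUET_FILE_MAX_BYTES, observed average file size},
    // taken from the table in the issue.
    double[][] runs = {
        {1.0 * gib, 178.0 * mib},
        {4.0 * gib, 675.0 * mib},
        {6.0 * gib, 1012.0 * mib},
    };
    for (double[] run : runs) {
      // With a correctly estimated compression ratio this fraction would be
      // near 1.0; a stable value well below 1.0 points at the estimate.
      System.out.printf("limit=%.0fMB actual=%.0fMB fraction=%.2f%n",
          run[0] / mib, run[1] / mib, run[1] / run[0]);
    }
  }
}
```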
[jira] [Updated] (HUDI-2003) Auto Compute Compression ratio for input data to output parquet/orc file size
[ https://issues.apache.org/jira/browse/HUDI-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-2003: -- Issue Type: Improvement (was: Bug) > Auto Compute Compression ratio for input data to output parquet/orc file size > - > > Key: HUDI-2003 > URL: https://issues.apache.org/jira/browse/HUDI-2003 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: Vinay >Priority: Major > > Context : > Submitted a spark job to read 3-4B ORC records and wrote to Hudi format. > Creating the following table with all the runs that I had carried out based > on different options > > ||CONFIG ||Number of Files Created||Size of each file|| > |PARQUET_FILE_MAX_BYTES=DEFAULT|30K|21MB| > |PARQUET_FILE_MAX_BYTES=1GB|3700|178MB| > |PARQUET_FILE_MAX_BYTES=1GB > COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE=110|Same as before|Same as before| > |PARQUET_FILE_MAX_BYTES=1GB > BULKINSERT_PARALLELISM=100|Same as before|Same as before| > |PARQUET_FILE_MAX_BYTES=4GB|1600|675MB| > |PARQUET_FILE_MAX_BYTES=6GB|669|1012MB| > Based on this runs, it feels that the compression ratio is off. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363067#comment-17363067 ] Nishith Agarwal commented on HUDI-1910: --- [~vinaypatil18] Yes, that makes sense, please go ahead. > Supporting Kafka based checkpointing for HoodieDeltaStreamer > > > Key: HUDI-1910 > URL: https://issues.apache.org/jira/browse/HUDI-1910 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Labels: sev:normal, triaged > > HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some > users have requested support for Kafka based checkpoints for freshness > auditing purposes. This ticket tracks any implementation for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server
[ https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363053#comment-17363053 ] Ethan Guo commented on HUDI-1138: - We may consider blocking the requests for batching so that the timeline server sends the actual responses only after MARKERS are overwritten / updated. In this case, those files have the correct markers at the timeline server and rollback can be properly done. [~shivnarayan] Let me know if I miss anything regarding the marker-based rollback. > Re-implement marker files via timeline server > - > > Key: HUDI-1138 > URL: https://issues.apache.org/jira/browse/HUDI-1138 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Affects Versions: 0.9.0 >Reporter: Vinoth Chandar >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.9.0 > > > Even as you can argue that RFC-15/consolidated metadata, removes the need for > deleting partial files written due to spark task failures/stage retries. It > will still leave extra files inside the table (and users will pay for it > every month) and we need the marker mechanism to be able to delete these > partial files. > Here we explore if we can improve the current marker file mechanism, that > creates one marker file per data file written, by > Delegating the createMarker() call to the driver/timeline server, and have it > create marker metadata into a single file handle, that is flushed for > durability guarantees > > P.S: I was tempted to think Spark listener mechanism can help us deal with > failed tasks, but it has no guarantees. the writer job could die without > deleting a partial file. i.e it can improve things, but cant provide > guarantees -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] FelixKJose commented on issue #3054: [SUPPORT] Point query at hudi tables
FelixKJose commented on issue #3054: URL: https://github.com/apache/hudi/issues/3054#issuecomment-860685598 @n3nash Could you please give more details on how it is supported in Presto and Spark? I mean, do I have to provide some specific configurations, and is it supported for both MOR and COW table types? I am asking because RFC-7 (https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table) seems inactive and I haven't seen any documentation regarding point-in-time query support in Hudi. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] karan867 opened a new issue #3077: [SUPPORT] Large latencies in hudi writes using upsert mode.
karan867 opened a new issue #3077: URL: https://github.com/apache/hudi/issues/3077 **Describe the problem you faced** I am currently working on a POC to integrate Hudi with our existing data lake. I am seeing large latencies in Hudi writes: almost 7x compared to the partitioned parquet writes we perform now. I am writing around 2.5 million rows (3.9 GB) in two batches with upsert mode. The first write completes in 2-3 mins. For the second batch, the latency is around 12-14 mins, while writing with our existing system takes around 1.6-2 mins. The data contains negligible updates (>99% inserts and <1% updates). However, in the rare case of duplicated trips we want to override the old data points with the new ones. In our write use case, most of the data lands in the recent partitions. Currently, for testing, I am creating and writing to 5 partitions according to the probability distribution [10, 10, 10, 10, 60].

PySpark configs:

```python
conf = conf.set("spark.driver.memory", "6g")
conf = conf.set("spark.executor.instances", 8)
conf = conf.set("spark.executor.memory", "4g")
conf = conf.set("spark.executor.cores", 4)
```

Hudi options:

```python
hudi_options = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': 'applicationId,userId,driverId,timestamp',
    'hoodie.datasource.write.partitionpath.field': 'packet_date',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.datasource.write.table.name': table_name,
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.precombine.field': 'created_at_date',
    'hoodie.upsert.shuffle.parallelism': 200,
    'hoodie.insert.shuffle.parallelism': 200,
    'hoodie.bloom.index.prune.by.ranges': 'false',
    'hoodie.bloom.index.filter.type': 'DYNAMIC_V0',
    'hoodie.index.bloom.num_entries': 3,
    'hoodie.bloom.index.filter.dynamic.max.entries': 12,
}
```

**Environment Description**

* Hudi version : 0.7.0
* Spark version : 2.4.7
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context** In addition I have experimented with the following:

* Tried decreasing the file sizes
* Increasing 'hoodie.bloom.index.parallelism'
* Setting 'hoodie.metadata.enable' to true.

You can see the jobs taking the most time in the screenshot attached ![image](https://user-images.githubusercontent.com/85880633/121893160-489dd580-cd3b-11eb-8845-0459484a0406.png) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
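One tuning angle worth checking against the options above: `hoodie.index.bloom.num_entries` is set to 3 while each data file will hold far more keys, and an undersized bloom filter saturates. The sketch below uses standard bloom-filter math (this is generic math, not Hudi's implementation) to show how the realized false-positive rate collapses to ~1 in that case, which may be one contributor to slow upserts, since false positives force file reads during index lookup.

```java
public class BloomFppSketch {

  // Standard bloom-filter sizing: bits = -n*ln(p)/(ln 2)^2 for `configured`
  // keys at `targetFpp`, with k ≈ (bits/n)*ln 2 hash functions. Then compute
  // the realized false-positive rate once `actual` keys are inserted:
  // fpp = (1 - e^(-k*actual/bits))^k.
  static double realizedFpp(long configured, long actual, double targetFpp) {
    double bits = -configured * Math.log(targetFpp) / (Math.log(2) * Math.log(2));
    long k = Math.max(1, Math.round(bits / configured * Math.log(2)));
    return Math.pow(1 - Math.exp(-k * (double) actual / bits), k);
  }

  public static void main(String[] args) {
    // A filter sized for 3 entries but holding 10,000 keys: essentially
    // every negative lookup becomes a false positive.
    System.out.println(realizedFpp(3, 10_000, 1e-9));
    // Sized for 60,000 entries, 10,000 keys inserted: fpp stays tiny.
    System.out.println(realizedFpp(60_000, 10_000, 1e-9));
  }
}
```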
[GitHub] [hudi] fengjian428 commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
fengjian428 commented on pull request #2784: URL: https://github.com/apache/hudi/pull/2784#issuecomment-860650106 @ssdong is there any quick fix we can do in version 0.7.0? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2003) Auto Compute Compression
[ https://issues.apache.org/jira/browse/HUDI-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362925#comment-17362925 ] Vinay commented on HUDI-2003: - [~nishith29] Please do update the description if I have missed anything here > Auto Compute Compression > > > Key: HUDI-2003 > URL: https://issues.apache.org/jira/browse/HUDI-2003 > Project: Apache Hudi > Issue Type: Bug > Components: Writer Core >Reporter: Vinay >Priority: Major > > Context : > Submitted a spark job to read 3-4B ORC records and wrote to Hudi format. > Creating the following table with all the runs that I had carried out based > on different options > > ||CONFIG ||Number of Files Created||Size of each file|| > |PARQUET_FILE_MAX_BYTES=DEFAULT|30K|21MB| > |PARQUET_FILE_MAX_BYTES=1GB|3700|178MB| > |PARQUET_FILE_MAX_BYTES=1GB > COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE=110|Same as before|Same as before| > |PARQUET_FILE_MAX_BYTES=1GB > BULKINSERT_PARALLELISM=100|Same as before|Same as before| > |PARQUET_FILE_MAX_BYTES=4GB|1600|675MB| > |PARQUET_FILE_MAX_BYTES=6GB|669|1012MB| > Based on this runs, it feels that the compression ratio is off. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] pratyakshsharma commented on pull request #3003: [HUDI-1939][WIP] Replace joda-time api with java8 new time api
pratyakshsharma commented on pull request #3003: URL: https://github.com/apache/hudi/pull/3003#issuecomment-860627743 Will circle back on this by tomorrow. :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
codecov-commenter edited a comment on pull request #2915: URL: https://github.com/apache/hudi/pull/2915#issuecomment-832697076 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#2915](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (877103f) into [master](https://codecov.io/gh/apache/hudi/commit/769dd2d7c98558146eb4accb75b6d8e339ae6e0f?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (769dd2d) will **increase** coverage by `4.36%`. > The diff coverage is `91.17%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2915/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@ Coverage Diff @@ ## master#2915 +/- ## + Coverage 50.04% 54.40% +4.36% + Complexity 3685 444-3241 Files 526 72 -454 Lines 25466 3016 -22450 Branches 2886 375-2511 - Hits 12744 1641 -11103 + Misses11454 1221 -10233 + Partials 1268 154-1114 ``` | Flag | Coverage Δ | | |---|---|---| | hudicli | `?` | | | hudiclient | `∅ <ø> (∅)` | | | hudicommon | `?` | | | hudiflink | `?` | | | hudihadoopmr | `?` | | | hudisparkdatasource | `?` | | | hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: | | huditimelineservice | `?` | | | hudiutilities | `72.30% <91.17%> (+63.21%)` | :arrow_up: | Flags with carried forward coverage won't be shown. 
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2915?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [.../org/apache/hudi/utilities/sources/JdbcSource.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSmRiY1NvdXJjZS5qYXZh) | `90.62% <90.62%> (ø)` | | | [...ava/org/apache/hudi/utilities/SqlQueryBuilder.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1NxbFF1ZXJ5QnVpbGRlci5qYXZh) | `92.50% <92.50%> (ø)` | | | [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2915/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==) | `0.00% <0.00%> (-97.83%)` | :arrow_down: | |
[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
codope commented on a change in pull request #2915: URL: https://github.com/apache/hudi/pull/2915#discussion_r650843643 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java ## @@ -0,0 +1,339 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.sources; + +import org.apache.hudi.DataSourceUtils; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.utilities.SqlQueryBuilder; +import org.apache.hudi.utilities.schema.SchemaProvider; + +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.IOUtils; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.sql.Column; +import org.apache.spark.sql.DataFrameReader; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.SparkSession; +import org.apache.spark.sql.functions; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.storage.StorageLevel; + +import java.net.URI; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Set; + +/** + * Reads data from RDBMS data sources. + */ + +public class JdbcSource extends RowSource { + + private static final Logger LOG = LogManager.getLogger(JdbcSource.class); + private static final List DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2"); + private static final String URI_JDBC_PREFIX = "jdbc:"; + + public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, SparkSession sparkSession, +SchemaProvider schemaProvider) { +super(props, sparkContext, sparkSession, schemaProvider); + } + + /** + * Validates all user properties and prepares the {@link DataFrameReader} to read from RDBMS. + * + * @param sessionThe {@link SparkSession}. + * @param properties The JDBC connection properties and data source options. 
+ * @return The {@link DataFrameReader} to read from RDBMS + * @throws HoodieException + */ + private static DataFrameReader validatePropsAndGetDataFrameReader(final SparkSession session, +final TypedProperties properties) + throws HoodieException { +DataFrameReader dataFrameReader; +FSDataInputStream passwordFileStream = null; +try { + dataFrameReader = session.read().format("jdbc"); + dataFrameReader = dataFrameReader.option(Config.URL_PROP, properties.getString(Config.URL)); + dataFrameReader = dataFrameReader.option(Config.USER_PROP, properties.getString(Config.USER)); + dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, properties.getString(Config.DRIVER_CLASS)); + dataFrameReader = dataFrameReader + .option(Config.RDBMS_TABLE_PROP, properties.getString(Config.RDBMS_TABLE_NAME)); + + if (properties.containsKey(Config.PASSWORD)) { +LOG.info("Reading JDBC password from properties file"); +dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, properties.getString(Config.PASSWORD)); + } else if (properties.containsKey(Config.PASSWORD_FILE) + && !StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) { +LOG.info(String.format("Reading JDBC password from password file %s", properties.getString(Config.PASSWORD_FILE))); +FileSystem fileSystem = FileSystem.get(session.sparkContext().hadoopConfiguration()); +passwordFileStream = fileSystem.open(new Path(properties.getString(Config.PASSWORD_FILE))); +byte[] bytes = new byte[passwordFileStream.available()]; +passwordFileStream.read(bytes); +dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new String(bytes)); + } else { +throw new
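The validation above resolves the JDBC password in a fixed order: an inline password property wins, then a non-empty password file, otherwise the source fails fast. A standalone sketch of that order follows; the property key names are assumptions chosen for illustration, not copied from the PR's `Config` constants.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class JdbcPasswordSketch {

  // Mirrors the resolution order in validatePropsAndGetDataFrameReader:
  // 1) inline password property, 2) password file, 3) fail fast.
  // Key names below are illustrative placeholders.
  static String resolvePassword(Properties props) {
    String inline = props.getProperty("hoodie.deltastreamer.jdbc.password");
    if (inline != null) {
      return inline;
    }
    String file = props.getProperty("hoodie.deltastreamer.jdbc.password.file");
    if (file != null && !file.isEmpty()) {
      try {
        return new String(Files.readAllBytes(Paths.get(file))).trim();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    throw new IllegalArgumentException("No JDBC password or password file configured");
  }
}
```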
[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
codope commented on a change in pull request #2915: URL: https://github.com/apache/hudi/pull/2915#discussion_r650815907 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/SqlQueryBuilder.java ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.utilities; + +import org.apache.hudi.common.util.StringUtils; + +/** + * Fluent SQL query builder. + * Current support for: SELECT, FROM, JOIN, ON, WHERE, ORDER BY, LIMIT clauses. + */ +public class SqlQueryBuilder { + + private StringBuilder sqlBuilder; + + private SqlQueryBuilder(StringBuilder sqlBuilder) { +this.sqlBuilder = sqlBuilder; + } + + /** + * Creates a SELECT query. + * + * @param columns The column names to select. + * @return The new {@link SqlQueryBuilder} instance. + */ + public static SqlQueryBuilder select(String... columns) { +if (columns == null || columns.length == 0) { + throw new IllegalArgumentException(); +} + +StringBuilder sqlBuilder = new StringBuilder(); +sqlBuilder.append("select "); +sqlBuilder.append(String.join(", ", columns)); + +return new SqlQueryBuilder(sqlBuilder); + } + + /** + * Appends a FROM clause to a query. + * + * @param tables The table names to select from. 
+ * @return The {@link SqlQueryBuilder} instance. + */ + public SqlQueryBuilder from(String... tables) { +if (tables == null || tables.length == 0) { + throw new IllegalArgumentException(); +} + +sqlBuilder.append(" from "); +sqlBuilder.append(String.join(", ", tables)); Review comment: Added a subtask to take it up after we land this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
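The fluent builder sketched in the diff above concatenates clauses into a plain SQL string. A minimal self-contained illustration of the same pattern is below; it is a trimmed-down stand-in for the PR's `SqlQueryBuilder`, and the `where`/`limit` methods are assumptions added for the example (only `select`/`from` appear in the quoted diff):

```java
// Minimal sketch of a fluent SQL builder in the style of the PR's SqlQueryBuilder.
// Illustration only, not the Hudi class.
public class MiniSqlQueryBuilder {
    private final StringBuilder sql;

    private MiniSqlQueryBuilder(StringBuilder sql) {
        this.sql = sql;
    }

    public static MiniSqlQueryBuilder select(String... columns) {
        if (columns == null || columns.length == 0) {
            throw new IllegalArgumentException("at least one column required");
        }
        return new MiniSqlQueryBuilder(
            new StringBuilder("select ").append(String.join(", ", columns)));
    }

    public MiniSqlQueryBuilder from(String... tables) {
        if (tables == null || tables.length == 0) {
            throw new IllegalArgumentException("at least one table required");
        }
        sql.append(" from ").append(String.join(", ", tables));
        return this;
    }

    // Hypothetical extensions for the example; the real class lists
    // WHERE/ORDER BY/LIMIT among its supported clauses.
    public MiniSqlQueryBuilder where(String predicate) {
        sql.append(" where ").append(predicate);
        return this;
    }

    public MiniSqlQueryBuilder limit(long n) {
        sql.append(" limit ").append(n);
        return this;
    }

    @Override
    public String toString() {
        return sql.toString();
    }

    public static void main(String[] args) {
        String q = MiniSqlQueryBuilder.select("id", "last_insert")
            .from("triprec")
            .where("last_insert > '000'")
            .limit(100)
            .toString();
        System.out.println(q);
        // prints: select id, last_insert from triprec where last_insert > '000' limit 100
    }
}
```

Each method mutates one shared `StringBuilder` and returns `this`, which is what makes the call chain read like the SQL it produces.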
[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
codope commented on a change in pull request #2915: URL: https://github.com/apache/hudi/pull/2915#discussion_r650811408 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java ## @@ -0,0 +1,326 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.sources; + +import org.apache.hudi.DataSourceUtils; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.utilities.SqlQueryBuilder; +import org.apache.hudi.utilities.schema.SchemaProvider; + +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.IOUtils; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.sql.Column; +import org.apache.spark.sql.DataFrameReader; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.SparkSession; +import org.apache.spark.sql.functions; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.storage.StorageLevel; +import org.jetbrains.annotations.NotNull; + +import java.net.URI; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Set; + +/** + * Reads data from RDBMS data sources. + */ + +public class JdbcSource extends RowSource { + + private static final Logger LOG = LogManager.getLogger(JdbcSource.class); + private static final List DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2"); + private static final String URI_JDBC_PREFIX = "jdbc:"; + + public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, SparkSession sparkSession, +SchemaProvider schemaProvider) { +super(props, sparkContext, sparkSession, schemaProvider); + } + + /** + * Validates all user properties and prepares the {@link DataFrameReader} to read from RDBMS. + * + * @param sessionThe {@link SparkSession}. + * @param properties The JDBC connection properties and data source options. 
+ * @return The {@link DataFrameReader} to read from RDBMS + * @throws HoodieException + */ + private static DataFrameReader validatePropsAndGetDataFrameReader(final SparkSession session, +final TypedProperties properties) + throws HoodieException { +DataFrameReader dataFrameReader; +FSDataInputStream passwordFileStream = null; +try { + dataFrameReader = session.read().format("jdbc"); + dataFrameReader = dataFrameReader.option(Config.URL_PROP, properties.getString(Config.URL)); + dataFrameReader = dataFrameReader.option(Config.USER_PROP, properties.getString(Config.USER)); + dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, properties.getString(Config.DRIVER_CLASS)); + dataFrameReader = dataFrameReader + .option(Config.RDBMS_TABLE_PROP, properties.getString(Config.RDBMS_TABLE_NAME)); + + if (properties.containsKey(Config.PASSWORD)) { +LOG.info("Reading JDBC password from properties file"); +dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, properties.getString(Config.PASSWORD)); + } else if (properties.containsKey(Config.PASSWORD_FILE) + && !StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) { +LOG.info(String.format("Reading JDBC password from password file %s", properties.getString(Config.PASSWORD_FILE))); +FileSystem fileSystem = FileSystem.get(session.sparkContext().hadoopConfiguration()); +passwordFileStream = fileSystem.open(new Path(properties.getString(Config.PASSWORD_FILE))); +byte[] bytes = new byte[passwordFileStream.available()]; +passwordFileStream.read(bytes); +dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new String(bytes)); + } else { +
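The password-file branch above sizes its buffer with `passwordFileStream.available()` and issues a single `read(bytes)`. `available()` only reports bytes readable without blocking, so on a remote filesystem this can silently truncate the password. A safer pattern drains the stream in a loop; the sketch below shows it against a plain `java.io.InputStream` and a local temp file (an assumption made so the example is self-contained, whereas the PR uses Hadoop's `FSDataInputStream`):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PasswordFileRead {

    // Drain the whole stream instead of trusting available():
    // "new byte[in.available()]" + one read() is not guaranteed to
    // return the full file contents.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n = in.read(buf); n != -1; n = in.read(buf)) {
            out.write(buf, 0, n);
        }
        // Trim the trailing newline editors usually leave in password files.
        return new String(out.toByteArray(), StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("jdbc-password", ".txt");
        Files.write(tmp, "s3cr3t\n".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = Files.newInputStream(tmp)) {
            System.out.println(readFully(in)); // prints s3cr3t
        } finally {
            Files.delete(tmp);
        }
    }
}
```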
[jira] [Created] (HUDI-2012) Add source table or column validations.
Sagar Sumit created HUDI-2012: - Summary: Add source table or column validations. Key: HUDI-2012 URL: https://issues.apache.org/jira/browse/HUDI-2012 Project: Apache Hudi Issue Type: Sub-task Reporter: Sagar Sumit Assignee: Sagar Sumit Based on the comment https://github.com/apache/hudi/pull/2915#discussion_r627851195 we need to validate the datatype of the incremental column. For example, what if a byte[] column is chosen as the incremental column? Another validation is to check that the column exists in the table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
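The validation HUDI-2012 asks for has two parts: the incremental column must exist in the source table, and its type must define a meaningful ordering (a `byte[]`/BINARY column does not). A sketch of such a check is below; the schema map and the set of orderable type names are assumptions for illustration, since a real implementation would go through JDBC metadata:

```java
import java.util.Map;
import java.util.Set;

public class IncrementalColumnCheck {

    // Hypothetical allow-list of JDBC type names usable as an incremental
    // column; BINARY is excluded because byte[] carries no meaningful order.
    private static final Set<String> ORDERABLE_TYPES =
        Set.of("TINYINT", "SMALLINT", "INTEGER", "BIGINT",
               "DECIMAL", "DATE", "TIME", "TIMESTAMP", "VARCHAR");

    static void validate(Map<String, String> columnTypes, String incrementalColumn) {
        String type = columnTypes.get(incrementalColumn);
        if (type == null) {
            throw new IllegalArgumentException(
                "Incremental column does not exist in source table: " + incrementalColumn);
        }
        if (!ORDERABLE_TYPES.contains(type)) {
            throw new IllegalArgumentException(
                "Incremental column has non-orderable type " + type + ": " + incrementalColumn);
        }
    }

    public static void main(String[] args) {
        Map<String, String> schema = Map.of(
            "id", "BIGINT",
            "payload", "BINARY",
            "last_insert", "TIMESTAMP");
        validate(schema, "last_insert");      // passes
        try {
            validate(schema, "payload");      // BINARY -> rejected
        } catch (IllegalArgumentException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```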
[jira] [Created] (HUDI-2011) Parallel data sync for JDBC source
Sagar Sumit created HUDI-2011: - Summary: Parallel data sync for JDBC source Key: HUDI-2011 URL: https://issues.apache.org/jira/browse/HUDI-2011 Project: Apache Hudi Issue Type: Sub-task Reporter: Sagar Sumit Assignee: Sagar Sumit Compute upsert/insert/bulk_insert parallelism according to table size.
[jira] [Created] (HUDI-2010) Add support for multi table sync
Sagar Sumit created HUDI-2010: - Summary: Add support for multi table sync Key: HUDI-2010 URL: https://issues.apache.org/jira/browse/HUDI-2010 Project: Apache Hudi Issue Type: Sub-task Reporter: Sagar Sumit Assignee: Sagar Sumit In the first phase we added single table sync for JDBC source: [https://github.com/apache/hudi/pull/2915] With HoodieMultiTableDeltaStreamer, we need to support multi table sync.
[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
codope commented on a change in pull request #2915: URL: https://github.com/apache/hudi/pull/2915#discussion_r650804454 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java ## @@ -0,0 +1,326 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.sources; + +import org.apache.hudi.DataSourceUtils; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.utilities.SqlQueryBuilder; +import org.apache.hudi.utilities.schema.SchemaProvider; + +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.IOUtils; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.sql.Column; +import org.apache.spark.sql.DataFrameReader; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.SparkSession; +import org.apache.spark.sql.functions; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.storage.StorageLevel; +import org.jetbrains.annotations.NotNull; + +import java.net.URI; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Set; + +/** + * Reads data from RDBMS data sources. + */ + +public class JdbcSource extends RowSource { + + private static final Logger LOG = LogManager.getLogger(JdbcSource.class); + private static final List DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2"); + private static final String URI_JDBC_PREFIX = "jdbc:"; + + public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, SparkSession sparkSession, +SchemaProvider schemaProvider) { +super(props, sparkContext, sparkSession, schemaProvider); + } + + /** + * Validates all user properties and prepares the {@link DataFrameReader} to read from RDBMS. + * + * @param sessionThe {@link SparkSession}. + * @param properties The JDBC connection properties and data source options. 
+ * @return The {@link DataFrameReader} to read from RDBMS + * @throws HoodieException + */ + private static DataFrameReader validatePropsAndGetDataFrameReader(final SparkSession session, +final TypedProperties properties) + throws HoodieException { +DataFrameReader dataFrameReader; +FSDataInputStream passwordFileStream = null; +try { + dataFrameReader = session.read().format("jdbc"); + dataFrameReader = dataFrameReader.option(Config.URL_PROP, properties.getString(Config.URL)); + dataFrameReader = dataFrameReader.option(Config.USER_PROP, properties.getString(Config.USER)); + dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, properties.getString(Config.DRIVER_CLASS)); + dataFrameReader = dataFrameReader + .option(Config.RDBMS_TABLE_PROP, properties.getString(Config.RDBMS_TABLE_NAME)); + + if (properties.containsKey(Config.PASSWORD)) { +LOG.info("Reading JDBC password from properties file"); +dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, properties.getString(Config.PASSWORD)); + } else if (properties.containsKey(Config.PASSWORD_FILE) + && !StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) { +LOG.info(String.format("Reading JDBC password from password file %s", properties.getString(Config.PASSWORD_FILE))); +FileSystem fileSystem = FileSystem.get(session.sparkContext().hadoopConfiguration()); +passwordFileStream = fileSystem.open(new Path(properties.getString(Config.PASSWORD_FILE))); +byte[] bytes = new byte[passwordFileStream.available()]; +passwordFileStream.read(bytes); +dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new String(bytes)); + } else { +
[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
codope commented on a change in pull request #2915: URL: https://github.com/apache/hudi/pull/2915#discussion_r650797408 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java ## @@ -0,0 +1,339 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.sources; + +import org.apache.hudi.DataSourceUtils; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.utilities.SqlQueryBuilder; +import org.apache.hudi.utilities.schema.SchemaProvider; + +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.IOUtils; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.sql.Column; +import org.apache.spark.sql.DataFrameReader; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.SparkSession; +import org.apache.spark.sql.functions; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.storage.StorageLevel; + +import java.net.URI; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Set; + +/** + * Reads data from RDBMS data sources. + */ + +public class JdbcSource extends RowSource { + + private static final Logger LOG = LogManager.getLogger(JdbcSource.class); + private static final List DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2"); + private static final String URI_JDBC_PREFIX = "jdbc:"; + + public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, SparkSession sparkSession, +SchemaProvider schemaProvider) { +super(props, sparkContext, sparkSession, schemaProvider); + } + + /** + * Validates all user properties and prepares the {@link DataFrameReader} to read from RDBMS. + * + * @param sessionThe {@link SparkSession}. + * @param properties The JDBC connection properties and data source options. 
+ * @return The {@link DataFrameReader} to read from RDBMS + * @throws HoodieException + */ + private static DataFrameReader validatePropsAndGetDataFrameReader(final SparkSession session, +final TypedProperties properties) + throws HoodieException { +DataFrameReader dataFrameReader; +FSDataInputStream passwordFileStream = null; +try { + dataFrameReader = session.read().format("jdbc"); + dataFrameReader = dataFrameReader.option(Config.URL_PROP, properties.getString(Config.URL)); + dataFrameReader = dataFrameReader.option(Config.USER_PROP, properties.getString(Config.USER)); + dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, properties.getString(Config.DRIVER_CLASS)); + dataFrameReader = dataFrameReader + .option(Config.RDBMS_TABLE_PROP, properties.getString(Config.RDBMS_TABLE_NAME)); + + if (properties.containsKey(Config.PASSWORD)) { +LOG.info("Reading JDBC password from properties file"); +dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, properties.getString(Config.PASSWORD)); + } else if (properties.containsKey(Config.PASSWORD_FILE) + && !StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) { +LOG.info(String.format("Reading JDBC password from password file %s", properties.getString(Config.PASSWORD_FILE))); +FileSystem fileSystem = FileSystem.get(session.sparkContext().hadoopConfiguration()); +passwordFileStream = fileSystem.open(new Path(properties.getString(Config.PASSWORD_FILE))); +byte[] bytes = new byte[passwordFileStream.available()]; +passwordFileStream.read(bytes); +dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new String(bytes)); + } else { +throw new
[GitHub] [hudi] leesf commented on a change in pull request #3075: [HUDI-2009] Fixing extra commit metadata in row writer path
leesf commented on a change in pull request #3075: URL: https://github.com/apache/hudi/pull/3075#discussion_r650783735 ## File path: hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java ## @@ -105,4 +110,9 @@ public HoodieTable getHoodieTable() { public WriteOperationType getWriteOperationType() { return operationType; } + + @VisibleForTesting Review comment: is this annotation really needed or change `private` to `protected/public`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
codope commented on a change in pull request #2915: URL: https://github.com/apache/hudi/pull/2915#discussion_r650783802 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestJdbcSource.java ## @@ -0,0 +1,442 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.sources; + +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.common.testutils.HoodieTestDataGenerator; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.utilities.testutils.UtilitiesTestBase; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.spark.sql.Column; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.functions; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.storage.StorageLevel; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.IOException; +import java.sql.Connection; +import java.sql.DriverManager; +import java.sql.SQLException; +import java.util.stream.Collectors; + +import static org.apache.hudi.utilities.testutils.JdbcTestUtils.clearAndInsert; +import static org.apache.hudi.utilities.testutils.JdbcTestUtils.close; +import static org.apache.hudi.utilities.testutils.JdbcTestUtils.count; +import static org.apache.hudi.utilities.testutils.JdbcTestUtils.insert; +import static org.apache.hudi.utilities.testutils.JdbcTestUtils.update; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertThrows; +import static org.junit.jupiter.api.Assertions.fail; + +/** + * Tests {@link JdbcSource}. 
+ */ +public class TestJdbcSource extends UtilitiesTestBase { + + private static final TypedProperties PROPS = new TypedProperties(); + private static final HoodieTestDataGenerator DATA_GENERATOR = new HoodieTestDataGenerator(); + private static Connection connection; + + @BeforeEach + public void setup() throws Exception { +super.setup(); +PROPS.setProperty("hoodie.deltastreamer.jdbc.url", "jdbc:h2:mem:test_mem"); +PROPS.setProperty("hoodie.deltastreamer.jdbc.driver.class", "org.h2.Driver"); +PROPS.setProperty("hoodie.deltastreamer.jdbc.user", "test"); +PROPS.setProperty("hoodie.deltastreamer.jdbc.password", "jdbc"); +PROPS.setProperty("hoodie.deltastreamer.jdbc.table.name", "triprec"); +connection = DriverManager.getConnection("jdbc:h2:mem:test_mem", "test", "jdbc"); + } + + @AfterEach + public void teardown() throws Exception { +super.teardown(); +close(connection); + } + + @Test + public void testSingleCommit() { +PROPS.setProperty("hoodie.deltastreamer.jdbc.incremental.pull", "true"); + PROPS.setProperty("hoodie.deltastreamer.jdbc.table.incremental.column.name", "last_insert"); + +try { + int numRecords = 100; + String commitTime = "000"; + + // Insert 100 records with commit time + clearAndInsert(commitTime, numRecords, connection, DATA_GENERATOR, PROPS); + + // Validate if we have specified records in db + assertEquals(numRecords, count(connection, "triprec")); + + // Start JdbcSource + Dataset rowDataset = runSource(Option.empty(), numRecords).getBatch().get(); + assertEquals(numRecords, rowDataset.count()); +} catch (SQLException e) { + fail(e.getMessage()); +} + } + + @Test + public void testInsertAndUpdate() { +PROPS.setProperty("hoodie.deltastreamer.jdbc.incremental.pull", "true"); + PROPS.setProperty("hoodie.deltastreamer.jdbc.table.incremental.column.name", "last_insert"); + +try { + final String commitTime = "000"; + final int numRecords = 100; + + // Add 100 records. Update half of them with commit time "007". 
+ update("007", + clearAndInsert(commitTime, numRecords, connection, DATA_GENERATOR, PROPS) + .stream() + .limit(50) +
[GitHub] [hudi] leesf commented on a change in pull request #3075: [HUDI-2009] Fixing extra commit metadata in row writer path
leesf commented on a change in pull request #3075: URL: https://github.com/apache/hudi/pull/3075#discussion_r650783735 ## File path: hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java ## @@ -105,4 +110,9 @@ public HoodieTable getHoodieTable() { public WriteOperationType getWriteOperationType() { return operationType; } + + @VisibleForTesting Review comment: is this annotation really needed? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (HUDI-2004) Move KafkaOffsetGen.CheckpointUtils test cases to independent class and improve coverage
[ https://issues.apache.org/jira/browse/HUDI-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay resolved HUDI-2004. - Resolution: Done Done - 769dd2d7c98558146eb4accb75b6d8e339ae6e0f > Move KafkaOffsetGen.CheckpointUtils test cases to independent class and > improve coverage > > > Key: HUDI-2004 > URL: https://issues.apache.org/jira/browse/HUDI-2004 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing > Reporter: Vinay > Assignee: Vinay > Priority: Minor > Labels: pull-request-available > > Currently, the KafkaOffsetGen.CheckpointUtils test cases are present in > TestKafkaSource, which starts up HDFS, Hive, and ZK services locally. This is not > required for the CheckpointUtils test cases, so they should be moved to an independent > test class of their own. > > Also, CheckpointUtils.strToOffsets and CheckpointUtils.offsetsToStr are > currently not unit tested. -- This message was sent by Atlassian Jira (v8.3.4#803005)
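The two helpers the ticket mentions convert between a Kafka checkpoint string and per-partition offsets. The serialized shape is roughly `topic,partition:offset,partition:offset`; the sketch below implements a round-trip under that assumption (an illustration of the idea, not the actual CheckpointUtils code):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CheckpointStrings {

    // Serialize as "topic,0:100,1:200"; TreeMap sorts partitions so the
    // output string is deterministic.
    static String offsetsToStr(String topic, Map<Integer, Long> offsets) {
        return topic + "," + new TreeMap<>(offsets).entrySet().stream()
            .map(e -> e.getKey() + ":" + e.getValue())
            .collect(Collectors.joining(","));
    }

    // Parse "topic,0:100,1:200" back into a partition -> offset map.
    static Map<Integer, Long> strToOffsets(String checkpoint) {
        String[] parts = checkpoint.split(",");
        Map<Integer, Long> offsets = new TreeMap<>();
        for (int i = 1; i < parts.length; i++) { // parts[0] is the topic name
            String[] kv = parts[i].split(":");
            offsets.put(Integer.parseInt(kv[0]), Long.parseLong(kv[1]));
        }
        return offsets;
    }

    public static void main(String[] args) {
        Map<Integer, Long> offsets = Map.of(0, 100L, 1, 200L);
        String s = offsetsToStr("triprec", offsets);
        System.out.println(s); // prints triprec,0:100,1:200
        System.out.println(strToOffsets(s).equals(offsets)); // prints true
    }
}
```

Pure string-to-map conversions like these are exactly the kind of logic the ticket argues can be unit tested without spinning up HDFS, Hive, or ZK.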
[hudi] branch master updated (7d9f9d7 -> 769dd2d)
This is an automated email from the ASF dual-hosted git repository. leesf pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 7d9f9d7 [HUDI-1991] Fixing drop dups exception in bulk insert row writer path (#3055) add 769dd2d [HUDI-2004] Move CheckpointUtils test cases to independant class (#3072) No new revisions were added by this update. Summary of changes: .../utilities/sources/helpers/KafkaOffsetGen.java | 4 +- .../hudi/utilities/sources/TestKafkaSource.java| 66 --- .../sources/helpers/TestCheckpointUtils.java | 126 + 3 files changed, 128 insertions(+), 68 deletions(-) create mode 100644 hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/helpers/TestCheckpointUtils.java
[GitHub] [hudi] leesf merged pull request #3072: [HUDI-2004] Move CheckpointUtils test cases to independant class
leesf merged pull request #3072: URL: https://github.com/apache/hudi/pull/3072
[jira] [Comment Edited] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362688#comment-17362688 ] Vinay edited comment on HUDI-1910 at 6/14/21, 9:09 AM: --- [~nishith29] Makes sense, so you are suggesting to include the COMMIT_OFFSET_TO_KAFKA config in the KafkaOffsetGen.Config class so that users can include it in the property file like we pass the topic name. And then use it here - [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474] and call the commitOffsetToKafka function. Is that correct? If this approach looks good, I can test this change out and create a PR was (Author: vinaypatil18): [~nishith29] Make sense, so you suggesting to include COMMIT_OFFSET_TO_KAFKA config in KafkaOffsetGen.Config class so that users can include it in property file like we pass topic name. And then use it here - [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474] and call commitOffsetToKafka function. If this approach looks good, I can test this change out and create a PR > Supporting Kafka based checkpointing for HoodieDeltaStreamer > > > Key: HUDI-1910 > URL: https://issues.apache.org/jira/browse/HUDI-1910 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer > Reporter: Nishith Agarwal > Assignee: Vinay > Priority: Major > Labels: sev:normal, triaged > > HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some > users have requested support for Kafka based checkpoints for freshness > auditing purposes. This ticket tracks any implementation for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] deep-teliacompany commented on issue #2970: [SUPPORT] Failed to upsert for commit time
deep-teliacompany commented on issue #2970: URL: https://github.com/apache/hudi/issues/2970#issuecomment-860487690 Hi, does Hudi 0.8.0 support concurrency, and from which version is concurrency supported? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan opened a new pull request #3074: [HUDI-2007] Fixing hudi_test_suite for spark nodes and adding spark bulk_insert node
nsivabalan opened a new pull request #3074: URL: https://github.com/apache/hudi/pull/3074 ## What is the purpose of the pull request - Fixing hudi test suite for spark nodes and adding spark bulk_insert node - Fixed spark nodes in hudi test suite infra ## Brief change log - Fixing hudi test suite for spark nodes and adding spark bulk_insert node - Fixed spark nodes in hudi test suite infra - Added config to enable row writing ## Verify this pull request - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-commenter commented on pull request #3073: [HUDI-2006] Adding more yaml templates to test suite
codecov-commenter commented on pull request #3073: URL: https://github.com/apache/hudi/pull/3073#issuecomment-860281795 # [Codecov](https://codecov.io/gh/apache/hudi/pull/3073?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#3073](https://codecov.io/gh/apache/hudi/pull/3073?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (e039f49) into [master](https://codecov.io/gh/apache/hudi/commit/0d0dc6fb07e0c5496224c75052ab4f43d57b40f6?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (0d0dc6f) will **decrease** coverage by `46.70%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3073/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3073?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@ Coverage Diff @@ ## master #3073 +/- ## - Coverage 55.14% 8.43% -46.71% + Complexity 3866 62 -3804 Files 488 70 -418 Lines 236192880-20739 Branches 2528 359 -2169 - Hits 13024 243-12781 + Misses 94372616 -6821 + Partials 1158 21 -1137 ``` | Flag | Coverage Δ | | |---|---|---| | hudicli | `?` | | | hudiclient | `?` | | | hudicommon | `?` | | | hudiflink | `?` | | | hudihadoopmr | `?` | | | hudisparkdatasource | `?` | | | hudisync | `6.79% <ø> (-39.81%)` | :arrow_down: | | huditimelineservice | `?` | | | hudiutilities | `9.09% <ø> (-61.79%)` | :arrow_down: | Flags with carried forward coverage won't be shown. 
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3073?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3073/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` |
[GitHub] [hudi] codecov-commenter commented on pull request #3070: [HUDI-2002] Modify the log level to ERROR
codecov-commenter commented on pull request #3070: URL: https://github.com/apache/hudi/pull/3070#issuecomment-860067614 # [Codecov](https://codecov.io/gh/apache/hudi/pull/3070?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#3070](https://codecov.io/gh/apache/hudi/pull/3070?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (1368d1c) into [master](https://codecov.io/gh/apache/hudi/commit/673d62f3c3ab07abb3fcd319607e657339bc0682?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (673d62f) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3070/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3070?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@            Coverage Diff            @@
##           master    #3070    +/-   ##
========================================
  Coverage    8.43%    8.43%
  Complexity     62       62
========================================
  Files          70       70
  Lines        2880     2880
  Branches      359      359
========================================
  Hits          243      243
  Misses       2616     2616
  Partials       21       21
```

| Flag | Coverage Δ | |
|---|---|---|
| hudiclient | `?` | |
| hudisync | `6.79% <ø> (ø)` | |
| hudiutilities | `9.09% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan edited a comment on pull request #3003: [HUDI-1939][WIP] Replace joda-time api with java8 new time api
xushiyan edited a comment on pull request #3003: URL: https://github.com/apache/hudi/pull/3003#issuecomment-860084194
[GitHub] [hudi] nsivabalan merged pull request #2967: [HUDI-1766] Added blog for Hudi cleaner service
nsivabalan merged pull request #2967: URL: https://github.com/apache/hudi/pull/2967
[GitHub] [hudi] arun990 closed issue #3069: [SUPPORT] presto query error on hudi table - hudi.hadoop.HoodieParquetInputFormat
arun990 closed issue #3069: URL: https://github.com/apache/hudi/issues/3069
[GitHub] [hudi] veenaypatil commented on a change in pull request #3066: [HUDI-1997] Adding Note for explicitly setting HIVE_AUTO_CREATE_DATABASE
veenaypatil commented on a change in pull request #3066: URL: https://github.com/apache/hudi/pull/3066#discussion_r650388442 ## File path: content/docs/configurations.html ## @@ -524,7 +524,7 @@ HIVE_USE_JDBC_OPT_KEY HIVE_AUTO_CREATE_DATABASE_OPT_KEY Property: hoodie.datasource.hive_sync.auto_create_database Default: true - Auto create hive database if does not exists + Auto create hive database if does not exists. Note: for versions 0.7 and 0.8 you will have to explicitly set this to true Review comment: @leesf oh ok, updated the .md file now, pls check
[GitHub] [hudi] arun990 opened a new issue #3069: [SUPPORT] presto query error on hudi table - hudi.hadoop.HoodieParquetInputFormat
arun990 opened a new issue #3069: URL: https://github.com/apache/hudi/issues/3069 Hi, the Hudi table works with Spark and Hive, but querying it with Presto gives the error: "Unable to create input format org.apache.hudi.hadoop.HoodieParquetInputFormat". The Presto version is: Presto CLI dataproc-tag-340-3. The hudi presto bundle jar is also placed at /usr/lib/presto/plugin/hive-hadoop2, and hive-site.xml has the input format set as org.apache.hudi.hadoop.HoodieParquetInputFormat. Please advise. Regards, Arun
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3072: [HUDI-2004] Move CheckpointUtils test cases to independant class
codecov-commenter edited a comment on pull request #3072: URL: https://github.com/apache/hudi/pull/3072#issuecomment-860107100
[jira] [Updated] (HUDI-1766) Write a detailed blog for HoodieCleaner
[ https://issues.apache.org/jira/browse/HUDI-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1766: - Labels: pull-request-available (was: ) > Write a detailed blog for HoodieCleaner > --- > > Key: HUDI-1766 > URL: https://issues.apache.org/jira/browse/HUDI-1766 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Pratyaksh Sharma >Assignee: Pratyaksh Sharma >Priority: Major > Labels: pull-request-available > > Cleaner plays a very critical role in helping user achieve writer and reader > isolation. We need a blog to highlight its configurations properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2006) Add more yamls to test suite
[ https://issues.apache.org/jira/browse/HUDI-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2006: - Labels: pull-request-available (was: ) > Add more yamls to test suite > - > > Key: HUDI-2006 > URL: https://issues.apache.org/jira/browse/HUDI-2006 > Project: Apache Hudi > Issue Type: Improvement > Reporter: sivabalan narayanan > Assignee: sivabalan narayanan > Priority: Major > Labels: pull-request-available > > Add more yaml files to the test suite job.
[GitHub] [hudi] xushiyan merged pull request #3070: [HUDI-2002] Modify HiveIncrementalPuller log level to ERROR
xushiyan merged pull request #3070: URL: https://github.com/apache/hudi/pull/3070
[GitHub] [hudi] codecov-commenter commented on pull request #3072: [HUDI-2004] Move CheckpointUtils test cases to independant class
codecov-commenter commented on pull request #3072: URL: https://github.com/apache/hudi/pull/3072#issuecomment-860107100 # [Codecov](https://codecov.io/gh/apache/hudi/pull/3072?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#3072](https://codecov.io/gh/apache/hudi/pull/3072?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (da5450a) into [master](https://codecov.io/gh/apache/hudi/commit/ba728d822f733cf1978b95c6b6af793fdf041088?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (ba728d8) will **decrease** coverage by `1.45%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3072/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3072?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #3072      +/-   ##
============================================
- Coverage     55.37%   53.92%    -1.46%
+ Complexity     4029     3416      -613
============================================
  Files           521      441       -80
  Lines         25312    21604     -3708
  Branches       2873     2461      -412
============================================
- Hits          14017    11649     -2368
+ Misses         9907     8793     -1114
+ Partials       1388     1162      -226
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.95% <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `50.03% <ø> (ø)` | |
| hudiflink | `63.03% <ø> (ø)` | |
| hudihadoopmr | `51.43% <ø> (ø)` | |
| hudisparkdatasource | `66.51% <ø> (ø)` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `?` | |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3072?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [.../src/main/java/org/apache/hudi/dla/util/Utils.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL3V0aWwvVXRpbHMuamF2YQ==) | | | | [...a/org/apache/hudi/utilities/sources/SqlSource.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvU3FsU291cmNlLmphdmE=) | | | | [...i/hive/SlashEncodedDayPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkRGF5UGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==) | | | | [...java/org/apache/hudi/hive/util/HiveSchemaUtil.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9IaXZlU2NoZW1hVXRpbC5qYXZh) | | | | 
[.../hudi/utilities/schema/RowBasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9Sb3dCYXNlZFNjaGVtYVByb3ZpZGVyLmphdmE=) | | | | [...g/apache/hudi/timeline/service/RequestHandler.java](https://codecov.io/gh/apache/hudi/pull/3072/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvUmVxdWVzdEhhbmRsZXIuamF2YQ==) | | | |
[GitHub] [hudi] nsivabalan opened a new pull request #3073: [HUDI-2006] Adding more yaml templates to test suite
nsivabalan opened a new pull request #3073: URL: https://github.com/apache/hudi/pull/3073

## What is the purpose of the pull request

Added more yamls to the test suite framework. Default: sanity.yml. Optional yamls:
- medium_test_suite.yaml: medium-sized test suite which validates the entire input for N rounds
- long_test_suite.yaml: long-running test suite which validates the input after every round and then deletes the input data, so the test can scale to larger iteration counts if required
- clustering.yaml: tests clustering

## Brief change log

- Added more yamls to the test suite framework.

## Verify this pull request

```
./generate_test_suite.sh --execute_test_suite false --include_medium_test_suite_yaml true --include_long_test_suite_yaml true --include_cluster_yaml true
Include Medium test suite true
Medium test suite iterations = 20
Include Long test suite true
Long test suite iterations = 50
Intermittent delay in mins = 1
Table type = COPY_ON_WRITE
Include cluster yaml true
Cluster total itr count 30
Cluster delay mins 1
Cluster exec itr count 15
Jar name hudi-integ-test-bundle-0.9.0-SNAPSHOT.jar
Input path \/user\/hive\/warehouse\/hudi-integ-test-suite\/input\/
Output path \/user\/hive\/warehouse\/hudi-integ-test-suite\/output\/
Cleaning up staging dir
Creating staging dir
```

Once the above command has been executed, the generated files can be found in the staging folder:

```
ls demo/config/test-suite/staging
clustering.yaml clustering_spark_command.sh long_test_suite.yaml long_test_suite_spark_command.sh medium_test_suite.yaml medium_test_suite_spark_command.sh sanity.yaml sanity_spark_command.sh test.properties
```

You can run the same command without "--execute_test_suite false", which will go ahead and execute the included yamls.
## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] garyli1019 merged pull request #3046: [HUDI-1984] Support independent flink hudi compaction function
garyli1019 merged pull request #3046: URL: https://github.com/apache/hudi/pull/3046
[GitHub] [hudi] veenaypatil commented on pull request #3072: [HUDI-2004] Move CheckpointUtils test cases to independant class
veenaypatil commented on pull request #3072: URL: https://github.com/apache/hudi/pull/3072#issuecomment-860343956 @n3nash @yanghua can you please review?
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3073: [HUDI-2006] Adding more yaml templates to test suite
codecov-commenter edited a comment on pull request #3073: URL: https://github.com/apache/hudi/pull/3073#issuecomment-860281795
[GitHub] [hudi] arun990 commented on issue #3069: [SUPPORT] presto query error on hudi table - hudi.hadoop.HoodieParquetInputFormat
arun990 commented on issue #3069: URL: https://github.com/apache/hudi/issues/3069#issuecomment-860143649 Hi, a restart helped after setting HoodieParquetInputFormat. Closing this. Thank you.
[GitHub] [hudi] codecov-commenter commented on pull request #3075: [HUDI-2009] Fixing extra commit metadata in row writer path
codecov-commenter commented on pull request #3075: URL: https://github.com/apache/hudi/pull/3075#issuecomment-860357081 # [Codecov](https://codecov.io/gh/apache/hudi/pull/3075?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#3075](https://codecov.io/gh/apache/hudi/pull/3075?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (ed81ede) into [master](https://codecov.io/gh/apache/hudi/commit/7d9f9d7d8241bfb70d50c557b0194cc8a87b6ee7?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (7d9f9d7) will **decrease** coverage by `46.60%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3075/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3075?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #3075      +/-   ##
============================================
- Coverage     55.04%    8.43%   -46.61%
+ Complexity     4029       62     -3967
============================================
  Files           526       70      -456
  Lines         25466     2880    -22586
  Branches       2886      359     -2527
============================================
- Hits          14018      243    -13775
+ Misses        10057     2616     -7441
+ Partials       1391       21     -1370
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `6.79% <ø> (-44.66%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `9.09% <ø> (-61.87%)` | :arrow_down: |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3075?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3075/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` |
[GitHub] [hudi] calleo commented on issue #2975: [SUPPORT] Read record using index
calleo commented on issue #2975: URL: https://github.com/apache/hudi/issues/2975#issuecomment-860014912 Will give this a try. Thanks all for helping out.
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3075: [HUDI-2009] Fixing extra commit metadata in row writer path
codecov-commenter edited a comment on pull request #3075: URL: https://github.com/apache/hudi/pull/3075#issuecomment-860357081
[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability
[ https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1976: - Labels: pull-request-available (was: ) > Upgrade hive, jackson, log4j, hadoop to remove vulnerability > > > Key: HUDI-1976 > URL: https://issues.apache.org/jira/browse/HUDI-1976 > Project: Apache Hudi > Issue Type: Task > Components: Hive Integration > Reporter: Nishith Agarwal > Priority: Blocker > Labels: pull-request-available > Fix For: 0.9.0 > > > [https://github.com/apache/hudi/issues/2827] > [https://github.com/apache/hudi/issues/2826] > [https://github.com/apache/hudi/issues/2824|https://github.com/apache/hudi/issues/2826] > [https://github.com/apache/hudi/issues/2823|https://github.com/apache/hudi/issues/2826]
[GitHub] [hudi] nsivabalan commented on pull request #3010: Improving Hudi CLI tool docs
nsivabalan commented on pull request #3010: URL: https://github.com/apache/hudi/pull/3010#issuecomment-860282275 Do ping me here once the patch is ready to be reviewed again.
[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR
hudi-bot edited a comment on pull request #2984: URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102
[GitHub] [hudi] jintaoguan commented on pull request #2999: [HUDI-764] [HUDI-765] ORC reader writer Implementation
jintaoguan commented on pull request #2999: URL: https://github.com/apache/hudi/pull/2999#issuecomment-859974391 @leesf We have an umbrella ticket [HUDI-57](https://issues.apache.org/jira/browse/HUDI-57) that contains all the subtasks.
[GitHub] [hudi] veenaypatil opened a new pull request #3071: [WIP] [HUDI-1976] Resolve vulnerability
veenaypatil opened a new pull request #3071: URL: https://github.com/apache/hudi/pull/3071

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] xushiyan commented on pull request #3003: [HUDI-1939][WIP] Replace joda-time api with java8 new time api
xushiyan commented on pull request #3003: URL: https://github.com/apache/hudi/pull/3003#issuecomment-860084194 Good to see this effort revived! We had some earlier discussion that is worth considering while migrating this: https://lists.apache.org/thread.html/rdfd91fc5e8e76a7434da0975141b0629411d507ce804236596b69ede%40%3Cdev.hudi.apache.org%3E cc @pratyakshsharma do you also want to review this PR?
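For readers following this migration, here is a minimal sketch of what a joda-time to java.time change typically looks like (the class name, pattern, and values below are illustrative, not taken from the Hudi codebase):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimeMigrationSketch {
    // joda-time style:  DateTimeFormat.forPattern("yyyyMMddHHmmss").withZoneUTC().print(millis)
    // java.time style:  DateTimeFormatter is immutable and thread-safe, so it can be shared.
    static final DateTimeFormatter TIMESTAMP_FMT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    static String format(Instant instant) {
        return TIMESTAMP_FMT.format(instant);
    }

    public static void main(String[] args) {
        // 2021-06-14T07:54:00Z formats to the compact timestamp 20210614075400
        System.out.println(format(Instant.parse("2021-06-14T07:54:00Z")));
    }
}
```

One common migration pitfall: unlike joda-time's `DateTimeFormat`, a java.time `DateTimeFormatter` built without `withZone(...)` cannot format an `Instant` directly, because an `Instant` carries no date fields until a zone is supplied.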
[GitHub] [hudi] nsivabalan commented on pull request #2967: Added blog for Hudi cleaner service
nsivabalan commented on pull request #2967: URL: https://github.com/apache/hudi/pull/2967#issuecomment-860282110 Awesome, thanks for your contribution. This will definitely benefit the community.
[GitHub] [hudi] leesf merged pull request #3066: [HUDI-1997] Adding Note for explicitly setting HIVE_AUTO_CREATE_DATABASE
leesf merged pull request #3066: URL: https://github.com/apache/hudi/pull/3066
[GitHub] [hudi] veenaypatil commented on pull request #3071: [WIP] [HUDI-1976] Resolve vulnerability
veenaypatil commented on pull request #3071: URL: https://github.com/apache/hudi/pull/3071#issuecomment-860088720 The build is passing locally but there are many overlapping-class warnings:

```
jackson-annotations-2.6.7.jar, hive-exec-2.3.9.jar define 58 overlapping classes:
parquet-avro-1.11.1.jar, parquet-column-1.11.1.jar define 145 overlapping classes:
joda-time-2.9.9.jar, hive-exec-2.3.9.jar define 246 overlapping classes:
parquet-format-structures-1.11.1.jar, hive-exec-2.3.9.jar define 81 overlapping classes:
parquet-encoding-1.11.1.jar, hive-exec-2.3.9.jar define 169 overlapping classes:
parquet-common-1.11.1.jar, hive-exec-2.3.9.jar define 44 overlapping classes:
avro-1.10.0.jar, hive-exec-2.3.9.jar define 346 overlapping classes:
jackson-core-2.6.7.jar, hive-exec-2.3.9.jar define 93 overlapping classes:
parquet-hadoop-1.11.1.jar, parquet-avro-1.11.1.jar, parquet-column-1.11.1.jar define 50 overlapping classes:
hive-exec-2.3.9.jar, jackson-databind-2.9.10.8.jar define 549 overlapping classes:
```

Also, this needs to be tested properly to ensure that no dependency-conflict issues arise from this change.
[GitHub] [hudi] leesf commented on a change in pull request #3066: [HUDI-1997] Adding Note for explicitly setting HIVE_AUTO_CREATE_DATABASE
leesf commented on a change in pull request #3066: URL: https://github.com/apache/hudi/pull/3066#discussion_r650379858 ## File path: content/docs/configurations.html ## @@ -524,7 +524,7 @@ HIVE_USE_JDBC_OPT_KEY HIVE_AUTO_CREATE_DATABASE_OPT_KEY Property: hoodie.datasource.hive_sync.auto_create_database Default: true - Auto create hive database if does not exists + Auto create hive database if does not exists. Note: for versions 0.7 and 0.8 you will have to explicitly set this to true Review comment: @veenaypatil you should update the .md file instead of html file.
[GitHub] [hudi] nsivabalan commented on a change in pull request #3055: [HUDI-1991] Fixing drop dups exception in bulk insert row writer path
nsivabalan commented on a change in pull request #3055: URL: https://github.com/apache/hudi/pull/3055#discussion_r650395149

File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala

```diff
@@ -181,8 +181,45 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
   val path = java.nio.file.Files.createTempDirectory("hoodie_test_path")
   try {
-    val sqlContext = session.sqlContext
-    val sc = session.sparkContext
+    val hoodieFooTableName = "hoodie_foo_tbl"
+
+    // create a new table
+    val fooTableModifier = Map("path" -> path.toAbsolutePath.toString,
+      HoodieWriteConfig.TABLE_NAME -> hoodieFooTableName,
+      DataSourceWriteOptions.TABLE_TYPE_OPT_KEY -> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
+      "hoodie.bulkinsert.shuffle.parallelism" -> "4",
+      DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL,
+      DataSourceWriteOptions.ENABLE_ROW_WRITER_OPT_KEY -> "true",
+      INSERT_DROP_DUPS_OPT_KEY -> "true",
+      DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "_row_key",
+      DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition",
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> "org.apache.hudi.keygen.SimpleKeyGenerator")
+    val fooTableParams = HoodieWriterUtils.parametersWithWriteDefaults(fooTableModifier)
+
+    // generate the inserts
+    val schema = DataSourceTestUtils.getStructTypeExampleSchema
+    val structType = AvroConversionUtils.convertAvroSchemaToStructType(schema)
+    val records = DataSourceTestUtils.generateRandomRows(100)
+    val recordsSeq = convertRowListToSeq(records)
+    val df = spark.createDataFrame(sc.parallelize(recordsSeq), structType)
+    // write to Hudi
+    HoodieSparkSqlWriter.write(sqlContext, SaveMode.Append, fooTableParams, df)
+    fail("Drop duplicates with bulk insert in row writing should have thrown exception")
+  } catch {
+    case e: HoodieException => println("Dropping duplicates with bulk_insert in row writer path is not supported yet")
```

Review comment: my bad, some copy paste mistake. will fix it.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
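The pattern the test relies on, calling fail() at the end of the try block and treating the expected exception in the catch as success, can be sketched in plain Java. All names below are hypothetical stand-ins for illustration, not Hudi APIs:

```java
public class ExpectExceptionSketch {

    // Hypothetical stand-in for a write call that is expected to reject the
    // combination of options (here: drop-dups with the row writer path).
    static void writeWithDropDups() {
        throw new IllegalStateException(
            "Dropping duplicates with bulk_insert in row writer path is not supported yet");
    }

    public static void main(String[] args) {
        boolean thrown = false;
        try {
            writeWithDropDups();
            // Reaching this line means the expected exception was NOT thrown,
            // so the test must fail here.
            throw new AssertionError("expected an exception, but none was thrown");
        } catch (IllegalStateException e) {
            // Expected path: the unsupported combination was rejected.
            thrown = true;
        }
        System.out.println(thrown); // prints "true"
    }
}
```

The fail-inside-try guard matters: without it, a write that silently succeeds would make the test pass even though the unsupported combination was never rejected.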
[jira] [Updated] (HUDI-2008) Add an annotation to suppress the compiler warnings
[ https://issues.apache.org/jira/browse/HUDI-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2008: Labels: pull-request-available (was: )

> Add an annotation to suppress the compiler warnings
> Key: HUDI-2008
> URL: https://issues.apache.org/jira/browse/HUDI-2008
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Utilities
> Reporter: Wei
> Priority: Minor
> Labels: pull-request-available

-- This message was sent by Atlassian Jira (v8.3.4#803005)
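For context, the kind of change this issue describes is applying Java's @SuppressWarnings annotation at the narrowest scope that triggers the warning. A minimal sketch (class and method names are illustrative, not from the Hudi codebase):

```java
import java.util.ArrayList;
import java.util.List;

public class SuppressExample {

    // Without the annotation, javac emits an "unchecked" warning for this
    // cast, because the element type cannot be verified at runtime.
    @SuppressWarnings("unchecked")
    static <T> List<T> castList(Object raw) {
        return (List<T>) raw;
    }

    public static void main(String[] args) {
        List<String> src = new ArrayList<>();
        src.add("hudi");
        // The cast is safe here because we know the list holds Strings.
        List<String> out = castList(src);
        System.out.println(out.get(0)); // prints "hudi"
    }
}
```

Annotating the single method (rather than the whole class) keeps the suppression from hiding unrelated warnings elsewhere in the file.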
[jira] [Comment Edited] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x
[ https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362691#comment-17362691 ] Vinay edited comment on HUDI-1975 at 6/14/21, 7:54 AM:

[~nishith29] Updated the metrics.version in the pom to 3.1.2; the build fails with

{code:java}
/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java:[128,49] cannot find symbol
{code}

MetricRegistry does not have a gauge method in version 3.1.2; it is part of the metrics-core dependency. There is a workaround for this described here: [https://github.com/eclipse/microprofile-metrics/issues/244]

> Upgrade java-prometheus-client from 3.1.2 to 4.x
> Key: HUDI-1975
> URL: https://issues.apache.org/jira/browse/HUDI-1975
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Nishith Agarwal
> Priority: Blocker
> Fix For: 0.9.0
>
> Find more details here -> https://github.com/apache/hudi/issues/2774
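The workaround amounts to registering a Gauge explicitly via MetricRegistry.register(...), which exists in metrics-core 3.x, instead of the gauge(...) convenience method that only arrived in 4.x. A minimal sketch against the Dropwizard metrics API (the metric name is illustrative; assumes com.codahale.metrics is on the classpath):

```java
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class GaugeWorkaround {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        // metrics-core 3.1.2 has no MetricRegistry.gauge(name) helper, but a
        // Gauge can still be registered explicitly; Gauge is a single-method
        // interface, so a lambda works:
        registry.register("commit.duration", (Gauge<Long>) () -> 42L);
        System.out.println(registry.getGauges().get("commit.duration").getValue());
    }
}
```

Because register(...) throws if the name is already taken, callers upgrading later to 4.x can switch to gauge(...), which is idempotent, without changing the reported metric.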
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3076: [HUDI-2008] Add an annotation to suppress the compiler warnings
codecov-commenter edited a comment on pull request #3076: URL: https://github.com/apache/hudi/pull/3076#issuecomment-860358006