[GitHub] [hudi] yuzhaojing edited a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
yuzhaojing edited a comment on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-853583886

> > @wangxianghu please review this pr
>
> thanks, will review soon

Thanks for review

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yuzhaojing commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
yuzhaojing commented on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-853583886

> > @wangxianghu please review this pr
>
> thanks, will review soon

Thanks for review
[GitHub] [hudi] wangxianghu commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
wangxianghu commented on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-853581984

> @wangxianghu please review this pr

thanks, will review soon
[GitHub] [hudi] yuzhaojing commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
yuzhaojing commented on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-853579575

@wangxianghu please review this pr
[GitHub] [hudi] yuzhaojing removed a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
yuzhaojing removed a comment on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-853539076

> 1. TTL-expired keys should not be loaded into the index state again.

I will add a check comparing the instant against the TTL.
[jira] [Commented] (HUDI-1148) Revisit log messages seen when writing or reading through Hudi
[ https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356174#comment-17356174 ] Vinay commented on HUDI-1148:

[~vbalaji] I can take a look at this. Looking at the code, we print the entire Hadoop conf:

```
LOG.info(String.format("Hadoop Configuration: fs.defaultFS: [%s], Config:[%s], FileSystem: [%s]",
    conf.getRaw("fs.defaultFS"), conf.toString(), fs.toString()));
```

I will make this a debug log. I will also check the logs while running HoodieDeltaStreamer and writing to CoW/MoR tables.

> Revisit log messages seen when writing or reading through Hudi
> ---
>
> Key: HUDI-1148
> URL: https://issues.apache.org/jira/browse/HUDI-1148
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Writer Core
> Affects Versions: 0.9.0
> Reporter: Balaji Varadarajan
> Assignee: Vinay
> Priority: Minor
> Fix For: 0.9.0
>
> [https://github.com/apache/hudi/issues/1906]
>
> Some of these log messages can be made debug. We need to generally review the verbosity of log messages when running Hudi operations.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
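The change discussed in the comment above can be sketched as follows. This is a minimal illustration only: it uses `java.util.logging` in place of Hudi's actual logger, and the class name and argument values are made up, not taken from the real patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class HadoopConfLogging {
  private static final Logger LOG = Logger.getLogger(HadoopConfLogging.class.getName());

  // Builds the (potentially very large) message that used to be logged at INFO.
  static String describe(String defaultFs, String confDump, String fsDump) {
    return String.format("Hadoop Configuration: fs.defaultFS: [%s], Config:[%s], FileSystem: [%s]",
        defaultFs, confDump, fsDump);
  }

  public static void main(String[] args) {
    // Guarding with isLoggable avoids building the expensive string unless
    // debug-level logging is actually enabled (FINE is off by default).
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(describe("hdfs://nn:8020", "<full conf dump>", "<fs>"));
    }
  }
}
```

Besides demoting the level, the guard matters because `conf.toString()` serializes the whole configuration even when the message is ultimately discarded.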
[jira] [Updated] (HUDI-1148) Revisit log messages seen when writing or reading through Hudi
[ https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1148: Status: In Progress (was: Open)
[jira] [Assigned] (HUDI-1148) Revisit log messages seen when writing or reading through Hudi
[ https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-1148: --- Assignee: Vinay
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3026: [HUDI-1931] BucketAssignFunction use ValueState instead of MapState
codecov-commenter edited a comment on pull request #3026: URL: https://github.com/apache/hudi/pull/3026#issuecomment-853541709
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3026: [HUDI-1931] BucketAssignFunction use ValueState instead of MapState
codecov-commenter edited a comment on pull request #3026: URL: https://github.com/apache/hudi/pull/3026#issuecomment-853541709

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report

> Merging [#3026](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (1d0a065) into [master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (05a9830) will **increase** coverage by `15.69%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3026/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #3026       +/-  ##
============================================
+ Coverage     55.13%   70.83%   +15.69%
+ Complexity     3864      385     -3479
============================================
  Files           487       54      -433
  Lines         23608     2016    -21592
  Branches       2527      241     -2286
============================================
- Hits          13016     1428    -11588
+ Misses         9437      454     -8983
+ Partials       1155      134     -1021
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.84% <0.00%> (-0.34%)` | :arrow_down: |
| [...e/hudi/common/table/log/block/HoodieDataBlock.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVEYXRhQmxvY2suamF2YQ==) | | |
| [...di/hadoop/realtime/HoodieRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVJlYWx0aW1lUmVjb3JkUmVhZGVyLmphdmE=) | | |
| [...rg/apache/hudi/common/model/HoodieAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUF2cm9QYXlsb2FkLmphdmE=) | | |
| [...hudi/common/fs/inline/InLineFsDataInputStream.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9JbkxpbmVGc0RhdGFJbnB1dFN0cmVhbS5qYXZh) | | |
| [...a/org/apache/hudi/common/util/collection/Pair.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9QYWlyLmphdmE=) | | |
[GitHub] [hudi] loukey-lj removed a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
loukey-lj removed a comment on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-853546766

2. The task must perform a data load each time it recovers
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3026: [HUDI-1931] BucketAssignFunction use ValueState instead of MapState
codecov-commenter edited a comment on pull request #3026: URL: https://github.com/apache/hudi/pull/3026#issuecomment-853541709

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report

> Merging [#3026](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (1d0a065) into [master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (05a9830) will **increase** coverage by `15.69%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3026/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #3026       +/-  ##
============================================
+ Coverage     55.13%   70.83%   +15.69%
+ Complexity     3864      385     -3479
============================================
  Files           487       54      -433
  Lines         23608     2016    -21592
  Branches       2527      241     -2286
============================================
- Hits          13016     1428    -11588
+ Misses         9437      454     -8983
+ Partials       1155      134     -1021
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.84% <0.00%> (-0.34%)` | :arrow_down: |
| [...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==) | | |
| [...java/org/apache/hudi/util/AvroSchemaConverter.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS91dGlsL0F2cm9TY2hlbWFDb252ZXJ0ZXIuamF2YQ==) | | |
| [...udi/timeline/service/handlers/BaseFileHandler.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvQmFzZUZpbGVIYW5kbGVyLmphdmE=) | | |
| [...ache/hudi/common/table/timeline/HoodieInstant.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUluc3RhbnQuamF2YQ==) | | |
| [...g/apache/hudi/timeline/service/RequestHandler.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvUmVxdWVzdEhhbmRsZXIuamF2YQ==) | | |
[GitHub] [hudi] loukey-lj commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
loukey-lj commented on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-853546766

2. The task must perform a data load each time it recovers
[GitHub] [hudi] loukey-lj removed a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
loukey-lj removed a comment on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-853538379

1. TTL-expired keys should not be loaded into the index state again.
[GitHub] [hudi] codecov-commenter commented on pull request #3026: [HUDI-1931] BucketAssignFunction use ValueState instead of MapState
codecov-commenter commented on pull request #3026: URL: https://github.com/apache/hudi/pull/3026#issuecomment-853541709

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report

> Merging [#3026](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (1d0a065) into [master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (05a9830) will **decrease** coverage by `45.85%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3026/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #3026       +/-  ##
============================================
- Coverage     55.13%    9.27%   -45.86%
+ Complexity     3864       48     -3816
============================================
  Files           487       54      -433
  Lines         23608     2016    -21592
  Branches       2527      241     -2286
============================================
- Hits          13016      187    -12829
+ Misses         9437     1816     -7621
+ Partials       1155       13     -1142
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `9.27% <ø> (-61.61%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
[jira] [Closed] (HUDI-1956) BucketAssignFunction use ValueState instead of MapState
[ https://issues.apache.org/jira/browse/HUDI-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-1956. Resolution: Duplicate

> BucketAssignFunction use ValueState instead of MapState
> ---
>
> Key: HUDI-1956
> URL: https://issues.apache.org/jira/browse/HUDI-1956
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Flink Integration
> Reporter: Danny Chen
> Assignee: Danny Chen
> Priority: Major
> Fix For: 0.9.0
>
> Use the value state to reduce the memory footprint.
[GitHub] [hudi] danny0405 opened a new pull request #3026: [HUDI-1931] BucketAssignFunction use ValueState instead of MapState
danny0405 opened a new pull request #3026: URL: https://github.com/apache/hudi/pull/3026

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] yuzhaojing commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
yuzhaojing commented on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-853539076

> 1. TTL-expired keys should not be loaded into the index state again.

I will add a check comparing the instant against the TTL.
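The "judgment between instant and ttl" mentioned above could look roughly like the sketch below. This is a hypothetical illustration, not the actual patch: the class and method names are invented, and commit times are modeled as plain epoch seconds for simplicity. The idea is that during index bootstrap, a record whose commit instant is already older than the configured state TTL would be expired by Flink right after loading, so it should be skipped.

```java
public class TtlBootstrapCheck {
  // Returns true if a record committed at commitTimeSec is still inside the
  // TTL window and is therefore worth loading into the index state.
  static boolean shouldLoad(long commitTimeSec, long nowSec, long ttlSec) {
    // Keys committed before (now - ttl) are already expired; loading them
    // into the index state would only waste memory.
    return commitTimeSec > nowSec - ttlSec;
  }

  public static void main(String[] args) {
    long now = 1_622_678_400L;           // 2021-06-03T00:00:00Z
    long sevenDays = 7 * 24 * 3600L;     // example TTL of one week
    System.out.println(shouldLoad(now - 2 * 24 * 3600L, now, sevenDays));  // 2 days old: load
    System.out.println(shouldLoad(now - 30 * 24 * 3600L, now, sevenDays)); // 30 days old: skip
  }
}
```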
[GitHub] [hudi] loukey-lj edited a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
loukey-lj edited a comment on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-853538379

1. TTL-expired keys should not be loaded into the index state again.
[GitHub] [hudi] loukey-lj commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
loukey-lj commented on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-853538379

1. TTL-expired keys should not be loaded into the index state again.
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…
codecov-commenter edited a comment on pull request #3025: URL: https://github.com/apache/hudi/pull/3025#issuecomment-853523365
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…
codecov-commenter edited a comment on pull request #3025: URL: https://github.com/apache/hudi/pull/3025#issuecomment-853523365

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report

> Merging [#3025](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (0ec9a07) into [master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (05a9830) will **increase** coverage by `15.69%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3025/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #3025       +/-  ##
============================================
+ Coverage     55.13%   70.83%   +15.69%
+ Complexity     3864      385     -3479
============================================
  Files           487       54      -433
  Lines         23608     2016    -21592
  Branches       2527      241     -2286
============================================
- Hits          13016     1428    -11588
+ Misses         9437      454     -8983
+ Partials       1155      134     -1021
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.84% <0.00%> (-0.34%)` | :arrow_down: |
| [...n/java/org/apache/hudi/internal/DefaultSource.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2ludGVybmFsL0RlZmF1bHRTb3VyY2UuamF2YQ==) | | |
| [...va/org/apache/hudi/table/format/FilePathUtils.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvRmlsZVBhdGhVdGlscy5qYXZh) | | |
| [...he/hudi/exception/HoodieNotSupportedException.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZU5vdFN1cHBvcnRlZEV4Y2VwdGlvbi5qYXZh) | | |
| [...spark/src/main/scala/org/apache/hudi/package.scala](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL3BhY2thZ2Uuc2NhbGE=) | | |
| [...g/apache/hudi/common/table/log/LogReaderUtils.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Mb2dSZWFkZXJVdGlscy5qYXZh) | | |
[jira] [Created] (HUDI-1956) BucketAssignFunction use ValueState instead of MapState
Danny Chen created HUDI-1956:

Summary: BucketAssignFunction use ValueState instead of MapState
Key: HUDI-1956
URL: https://issues.apache.org/jira/browse/HUDI-1956
Project: Apache Hudi
Issue Type: Improvement
Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
Fix For: 0.9.0

Use the value state to reduce the memory footprint.
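The memory argument behind this issue can be illustrated with a plain-Java sketch (deliberately not the Flink API, so the names below are illustrative): in a keyed Flink operator, state is already scoped to the current record key, so a `MapState<recordKey, location>` effectively stores every record key twice, once in the state backend's key scope and once as the map key, while a `ValueState<location>` keeps a single value per key-scope entry.

```java
import java.util.HashMap;
import java.util.Map;

public class StateFootprintSketch {
  // MapState-style layout: key scope -> (record key -> location).
  // Each record contributes a scope key, a duplicate map key, and a value.
  static int mapStateStoredStrings(String[] keys) {
    Map<String, Map<String, String>> state = new HashMap<>();
    for (String k : keys) {
      state.computeIfAbsent(k, x -> new HashMap<>()).put(k, "partition/fileId");
    }
    return state.size()
        + state.values().stream().mapToInt(m -> 2 * m.size()).sum();
  }

  // ValueState-style layout: key scope -> location.
  // Each record contributes only a scope key and a value.
  static int valueStateStoredStrings(String[] keys) {
    Map<String, String> state = new HashMap<>();
    for (String k : keys) {
      state.put(k, "partition/fileId");
    }
    return 2 * state.size();
  }

  public static void main(String[] args) {
    String[] keys = {"uuid-1", "uuid-2", "uuid-3"};
    System.out.println(mapStateStoredStrings(keys));   // 9 stored strings
    System.out.println(valueStateStoredStrings(keys)); // 6 stored strings
  }
}
```

The counts are a toy model, but they show the direction of the saving: dropping the redundant map entry removes one stored copy of every record key, which matters when the index holds millions of keys.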
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-commenter edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-commenter edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report

> Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (fd115f8) into [master](https://codecov.io/gh/apache/hudi/commit/dcd7c331dc72df9ab10e4867a3592faf89f1480b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (dcd7c33) will **increase** coverage by `15.67%`.
> The diff coverage is `n/a`.
> :exclamation: Current head fd115f8 differs from pull request most recent head f2af1e0. Consider uploading reports for the commit f2af1e0 to get more accurate results

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #2645       +/-  ##
============================================
+ Coverage     55.15%   70.83%   +15.67%
+ Complexity     3851      385     -3466
============================================
  Files           485       54      -431
  Lines         23542     2016    -21526
  Branches       2522      241     -2281
============================================
- Hits          12985     1428    -11557
+ Misses         9405      454     -8951
+ Partials       1152      134     -1018
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.84% <0.00%> (-0.34%)` | :arrow_down: |
| [...rg/apache/hudi/metadata/MetadataPartitionType.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvTWV0YWRhdGFQYXJ0aXRpb25UeXBlLmphdmE=) | | |
| [...on/table/view/SpillableMapBasedFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvU3BpbGxhYmxlTWFwQmFzZWRGaWxlU3lzdGVtVmlldy5qYXZh) | | |
| [...spark/src/main/scala/org/apache/hudi/package.scala](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL3BhY2thZ2Uuc2NhbGE=) | | |
| [...pache/hudi/common/model/HoodieMetadataWrapper.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZU1ldGFkYXRhV3JhcHBlci5qYXZh) | | |
[GitHub] [hudi] loukey-lj closed pull request #2994: [HUDI-1931] BucketAssignFunction use wrong state
loukey-lj closed pull request #2994: URL: https://github.com/apache/hudi/pull/2994 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…
codecov-commenter edited a comment on pull request #3025: URL: https://github.com/apache/hudi/pull/3025#issuecomment-853523365

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#3025](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (0ec9a07) into [master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (05a9830) will **increase** coverage by `15.69%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3025/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@              Coverage Diff              @@
##             master    #3025       +/-   ##
=============================================
+ Coverage     55.13%   70.83%   +15.69%
+ Complexity     3864      385     -3479
=============================================
  Files           487       54      -433
  Lines         23608     2016    -21592
  Branches       2527      241     -2286
=============================================
- Hits          13016     1428    -11588
+ Misses         9437      454     -8983
+ Partials       1155      134     -1021
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.84% <0.00%> (-0.34%)` | :arrow_down: | | [...java/org/apache/hudi/common/util/NumericUtils.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvTnVtZXJpY1V0aWxzLmphdmE=) | | | | [...src/main/java/org/apache/hudi/QuickstartUtils.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvUXVpY2tzdGFydFV0aWxzLmphdmE=) | | | | [...va/org/apache/hudi/configuration/FlinkOptions.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9jb25maWd1cmF0aW9uL0ZsaW5rT3B0aW9ucy5qYXZh) | | | | 
[...udi/common/table/timeline/dto/CompactionOpDTO.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Db21wYWN0aW9uT3BEVE8uamF2YQ==) | | | | [...i/src/main/java/org/apache/hudi/cli/HoodieCLI.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZUNMSS5qYXZh) | | | |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-commenter edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (fd115f8) into [master](https://codecov.io/gh/apache/hudi/commit/dcd7c331dc72df9ab10e4867a3592faf89f1480b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (dcd7c33) will **increase** coverage by `15.67%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@              Coverage Diff              @@
##             master    #2645       +/-   ##
=============================================
+ Coverage     55.15%   70.83%   +15.67%
+ Complexity     3851      385     -3466
=============================================
  Files           485       54      -431
  Lines         23542     2016    -21526
  Branches       2522      241     -2281
=============================================
- Hits          12985     1428    -11557
+ Misses         9405      454     -8951
+ Partials       1152      134     -1018
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.84% <0.00%> (-0.34%)` | :arrow_down: | | [.../org/apache/hudi/common/model/HoodieWriteStat.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVdyaXRlU3RhdC5qYXZh) | | | | [...g/apache/hudi/cli/utils/SparkTempViewProvider.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL3V0aWxzL1NwYXJrVGVtcFZpZXdQcm92aWRlci5qYXZh) | | | | [...pache/hudi/common/model/HoodieMetadataWrapper.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZU1ldGFkYXRhV3JhcHBlci5qYXZh) | | | | 
[...apache/hudi/table/format/cow/RunLengthDecoder.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvY293L1J1bkxlbmd0aERlY29kZXIuamF2YQ==) | | | | [...pache/hudi/sink/compact/CompactionCommitEvent.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdEV2ZW50LmphdmE=) | | | |
[GitHub] [hudi] codecov-commenter commented on pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…
codecov-commenter commented on pull request #3025: URL: https://github.com/apache/hudi/pull/3025#issuecomment-853523365

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#3025](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (0ec9a07) into [master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (05a9830) will **decrease** coverage by `45.85%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3025/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master   #3025       +/-   ##
============================================
- Coverage     55.13%   9.27%    -45.86%
+ Complexity     3864      48      -3816
============================================
  Files           487      54       -433
  Lines         23608    2016     -21592
  Branches       2527     241      -2286
============================================
- Hits          13016     187     -12829
+ Misses         9437    1816      -7621
+ Partials       1155      13      -1142
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `9.27% <ø> (-61.61%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-commenter edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (fd115f8) into [master](https://codecov.io/gh/apache/hudi/commit/dcd7c331dc72df9ab10e4867a3592faf89f1480b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (dcd7c33) will **decrease** coverage by `45.88%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master   #2645       +/-   ##
============================================
- Coverage     55.15%   9.27%    -45.89%
+ Complexity     3851      48      -3803
============================================
  Files           485      54       -431
  Lines         23542    2016     -21526
  Branches       2522     241      -2281
============================================
- Hits          12985     187     -12798
+ Misses         9405    1816      -7589
+ Partials       1152      13      -1139
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `9.27% <ø> (-61.61%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | |
[jira] [Updated] (HUDI-1955) The filter condition is missing in the judgment condition of compaction instance
[ https://issues.apache.org/jira/browse/HUDI-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-1955:
    Labels: pull-request-available  (was: )

> The filter condition is missing in the judgment condition of compaction
> instance
>
>                 Key: HUDI-1955
>                 URL: https://issues.apache.org/jira/browse/HUDI-1955
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Zheng yunhong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
> The filter condition is missing in the judgment condition of compaction
> instance in BaseScheduleCompactionActionExecutor.java.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] swuferhong opened a new pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…
swuferhong opened a new pull request #3025: URL: https://github.com/apache/hudi/pull/3025

…action instance

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

The filter condition is missing in the judgment condition of the compaction instance in BaseScheduleCompactionActionExecutor.java. In method execute(), a filter condition is needed, but currently it does not have one.

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
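The actual fix is not shown in this thread; as a rough illustration of the kind of check the PR description refers to, the sketch below filters a timeline to compaction instants before scheduling. All names here (`CompactionInstantFilter`, `Instant`, `earliestPendingCompaction`) are hypothetical and do not match Hudi's real classes or the PR's code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch only: filtering timeline instants before scheduling a
// compaction. Names are illustrative and are NOT Hudi's actual API.
public class CompactionInstantFilter {

    // Minimal stand-in for a timeline instant: an action name plus a timestamp.
    public record Instant(String action, String timestamp) {}

    // Return the earliest compaction instant, ignoring other actions --
    // the kind of filter condition the PR says was missing.
    public static Optional<Instant> earliestPendingCompaction(List<Instant> timeline) {
        return timeline.stream()
                .filter(i -> "compaction".equals(i.action()))      // the filter step
                .min(Comparator.comparing(Instant::timestamp));    // earliest first
    }
}
```

Without the filter step, a `commit` instant could be picked up in place of a compaction instant, which is the shape of bug the PR title describes.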
[jira] [Updated] (HUDI-1955) The filter condition is missing in the judgment condition of compaction instance
[ https://issues.apache.org/jira/browse/HUDI-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng yunhong updated HUDI-1955:
    Fix Version/s: 0.9.0

> The filter condition is missing in the judgment condition of compaction
> instance
>
>                 Key: HUDI-1955
>                 URL: https://issues.apache.org/jira/browse/HUDI-1955
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Zheng yunhong
>            Priority: Major
>             Fix For: 0.9.0
>
> The filter condition is missing in the judgment condition of compaction
> instance in BaseScheduleCompactionActionExecutor.java.
[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR
hudi-bot edited a comment on pull request #2984: URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102

## CI report:

* fa95b4448c260e28d0aa7506c9ad71234c154d4f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=137)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Created] (HUDI-1955) The filter condition is missing in the judgment condition of compaction instance
Zheng yunhong created HUDI-1955:

            Summary: The filter condition is missing in the judgment condition of compaction instance
                Key: HUDI-1955
                URL: https://issues.apache.org/jira/browse/HUDI-1955
            Project: Apache Hudi
         Issue Type: Bug
         Components: Compaction
           Reporter: Zheng yunhong

The filter condition is missing in the judgment condition of compaction instance in BaseScheduleCompactionActionExecutor.java.
[GitHub] [hudi] hushenmin commented on issue #3005: [SUPPORT]How to query history snapshot by given one history partition?
hushenmin commented on issue #3005: URL: https://github.com/apache/hudi/issues/3005#issuecomment-853508788 In short: in the case of non-global index writes, backtrack to the historical snapshot and then remove duplicates.
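The deduplication step mentioned above can be sketched generically: for each record key, keep only the version with the latest commit time. This is an illustrative standalone sketch, not Hudi code; `SnapshotDedup`, `Row`, and `latestPerKey` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (NOT Hudi's implementation) of deduplicating a snapshot
// read: for each record key, keep only the row with the latest commit time.
public class SnapshotDedup {

    // Minimal record: a key, the commit time that produced it, and a payload.
    public record Row(String key, String commitTime, String payload) {}

    public static Map<String, Row> latestPerKey(Iterable<Row> rows) {
        Map<String, Row> latest = new HashMap<>();
        for (Row r : rows) {
            // Replace the stored row only if this one has a newer commit time.
            latest.merge(r.key(), r,
                    (oldRow, newRow) -> newRow.commitTime().compareTo(oldRow.commitTime()) > 0 ? newRow : oldRow);
        }
        return latest;
    }
}
```

Comparing commit times as strings works here because Hudi-style instant timestamps are fixed-width and lexicographically ordered.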
[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR
hudi-bot edited a comment on pull request #2984: URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102

## CI report:

* ccdcaa2112f0292593f19b7c77f23a8be8bd6fa7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=136)
* fa95b4448c260e28d0aa7506c9ad71234c154d4f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=137)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR
hudi-bot edited a comment on pull request #2984: URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102

## CI report:

* 389da6052b5f7735d81d6de8b7118ff8da4c4df9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=135)
* ccdcaa2112f0292593f19b7c77f23a8be8bd6fa7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=136)
* fa95b4448c260e28d0aa7506c9ad71234c154d4f UNKNOWN
[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR
hudi-bot edited a comment on pull request #2984: URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102

## CI report:

* 389da6052b5f7735d81d6de8b7118ff8da4c4df9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=135)
* ccdcaa2112f0292593f19b7c77f23a8be8bd6fa7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=136)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR
hudi-bot edited a comment on pull request #2984: URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102

## CI report:

* 389da6052b5f7735d81d6de8b7118ff8da4c4df9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=135)
* ccdcaa2112f0292593f19b7c77f23a8be8bd6fa7 UNKNOWN
[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR
hudi-bot edited a comment on pull request #2984: URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102

## CI report:

* 389da6052b5f7735d81d6de8b7118ff8da4c4df9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=135)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR
hudi-bot edited a comment on pull request #2984: URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102

## CI report:

* 49373be61ff4e802fdb794411ef1b3e17d547aff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=134)
* 389da6052b5f7735d81d6de8b7118ff8da4c4df9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=135)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR
hudi-bot edited a comment on pull request #2984: URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102

## CI report:

* 49373be61ff4e802fdb794411ef1b3e17d547aff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=134)
* 389da6052b5f7735d81d6de8b7118ff8da4c4df9 UNKNOWN
[GitHub] [hudi] jtmzheng commented on issue #2995: [SUPPORT] Upserts creating duplicates after enabling metadata table in Hudi 0.7 indexing pipeline
jtmzheng commented on issue #2995: URL: https://github.com/apache/hudi/issues/2995#issuecomment-853465482 Update: I was able to run the metadata commands from the CLI and check `list-partitions` and `list-files`. Every partition seems to be present, but spot-checking some partitions it's clear that not all of the files are:
```
21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Number of log files scanned => 0
21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: MaxMemoryInBytes allowed for compaction => 0
21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Number of entries in MemoryBasedMap in ExternalSpillableMap => 0
21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Total size in bytes of MemoryBasedMap in ExternalSpillableMap => 0
21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Number of entries in DiskBasedMap in ExternalSpillableMap => 0
21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Size of file spilled to disk => 0
21/06/03 00:10:32 INFO metadata.HoodieBackedTableMetadata: Opened metadata log files from [] at instant 20210524060712(dataset instant=20210524060712, metadata instant=20201216222013)
21/06/03 00:10:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
21/06/03 00:10:32 INFO metadata.HoodieBackedTableMetadata: Metadata read for key 2020/9/4 took [open, baseFileRead, logMerge] [94, 71, 0] ms
21/06/03 00:10:32 INFO metadata.BaseTableMetadata: Listed file in partition from metadata: partition=2020/9/4, #files=121
.f8a8f054-6d0e-43e8-9412-3dfec79d7d53-0_20210509151344.log.1_10765-6377-69863781
.f75e9845-a3a2-4b41-b59a-8effc2ee049a-0_20210429061318.log.1_12603-1679-22268308
.f745a448-31fa-4a95-a3c4-5f88f6c95bb2-0_20210515000505.log.2_11136-1920-26157888
.f745a448-31fa-4a95-a3c4-5f88f6c95bb2-0_20210515000505.log.1_12428-158-1954195
...
```
A direct listing shows 1,539 files under that partition, so the metadata table's 121 falls far short. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
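The mismatch described above reduces to a set difference between two listings. A minimal sketch of that consistency check, with hypothetical file names — in practice the two inputs would come from the Hudi CLI's `metadata list-files` output and a raw storage listing of the same partition:

```python
# Compare a partition's file listing from the Hudi metadata table against a
# direct storage listing, to spot files the metadata table is missing.
# Both listings are modeled as plain collections of file names here.

def missing_from_metadata(metadata_files, storage_files):
    """Return files present in storage but absent from the metadata listing."""
    return sorted(set(storage_files) - set(metadata_files))

# Hypothetical example mirroring the report: the metadata table lists fewer
# files than actually exist under the partition.
metadata_listing = ["f8a8f054-0_001.parquet", "f75e9845-0_001.parquet"]
storage_listing = ["f8a8f054-0_001.parquet", "f75e9845-0_001.parquet",
                   "f745a448-0_002.parquet"]

print(missing_from_metadata(metadata_listing, storage_listing))
```

A non-empty result for any partition, as in the issue above, indicates the metadata table has fallen out of sync with storage.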
[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server
[ https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356054#comment-17356054 ] Vinoth Chandar commented on HUDI-1138: -- [~guoyihua] Your approach looks good. cc [~shivnarayan] who can help you out if you are blocked. > Re-implement marker files via timeline server > - > > Key: HUDI-1138 > URL: https://issues.apache.org/jira/browse/HUDI-1138 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Affects Versions: 0.9.0 >Reporter: Vinoth Chandar >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.9.0 > > > Even as you can argue that RFC-15/consolidated metadata removes the need for > deleting partial files written due to Spark task failures/stage retries, it > will still leave extra files inside the table (and users will pay for them > every month), so we need the marker mechanism to be able to delete these > partial files. > Here we explore whether we can improve the current marker file mechanism, which > creates one marker file per data file written, by > delegating the createMarker() call to the driver/timeline server, and having it > write marker metadata into a single file handle that is flushed for > durability guarantees. > > P.S.: I was tempted to think the Spark listener mechanism can help us deal with > failed tasks, but it has no guarantees: the writer job could die without > deleting a partial file. i.e. it can improve things, but can't provide > guarantees -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server
[ https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356053#comment-17356053 ] Vinoth Chandar commented on HUDI-1138: -- We cannot, right? It fundamentally relies on the timeline server, as the single writer for the `MARKERS` file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server
[ https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355933#comment-17355933 ] Nishith Agarwal commented on HUDI-1138: --- [~guoyihua] Thanks for the explanation. Is there a way to decouple this marker file implementation from the timeline service? cc [~vinoth] -- This message was sent by Atlassian Jira (v8.3.4#803005)
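The single-writer design discussed in these comments — all createMarker() calls funneled through the timeline server, which appends marker entries to one flushed file handle instead of creating one marker file per data file — can be sketched as follows. This is an illustrative model only; `MarkerCoordinator` and its methods are invented names for the sketch, not Hudi's actual classes:

```python
import os
import tempfile

class MarkerCoordinator:
    """Single writer that records all markers in one append-only file,
    flushing after each append for durability, instead of creating one
    marker file per data file written."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a")

    def create_marker(self, partition, file_name):
        # Writers delegate here rather than touching storage themselves.
        self._fh.write(f"{partition}/{file_name}\n")
        self._fh.flush()               # flush for durability guarantees
        os.fsync(self._fh.fileno())

    def list_markers(self):
        # On rollback, read the single file to find partial files to delete.
        self._fh.flush()
        with open(self.path) as fh:
            return [line.strip() for line in fh]

# All write tasks route marker creation through the one coordinator.
markers_file = os.path.join(tempfile.mkdtemp(), "MARKERS")
coord = MarkerCoordinator(markers_file)
coord.create_marker("2020/9/4", "f8a8f054-0_001.parquet")
coord.create_marker("2020/9/4", "f8a8f054-0_002.parquet")
print(len(coord.list_markers()))
```

Because exactly one process owns the file handle, this scheme does depend on the coordinator (the timeline server) being up, which is the coupling Nishith asks about above.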
[GitHub] [hudi] jtmzheng commented on issue #2470: [SUPPORT] Heavy skew in ListingBasedRollbackHelper
jtmzheng commented on issue #2470: URL: https://github.com/apache/hudi/issues/2470#issuecomment-853177797 Unfortunately no, ran into https://github.com/apache/hudi/issues/2995 I think this issue is fine to close out, https://github.com/apache/hudi/issues/2470#issuecomment-769948718 got our rollback performance to an acceptable state even without the metadata table -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2994: [HUDI-1931] BucketAssignFunction use wrong state
codecov-commenter edited a comment on pull request #2994: URL: https://github.com/apache/hudi/pull/2994#issuecomment-848024173 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2994) Report > Merging [#2994](https://codecov.io/gh/apache/hudi/pull/2994) (8eef75f) into [master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08) (05a9830) will **increase** coverage by `7.91%`. > The diff coverage is `n/a`.
```diff
@@             Coverage Diff              @@
##             master    #2994      +/-  ##
+ Coverage     55.13%   63.04%    +7.91%
+ Complexity     3864      346     -3518
  Files           487       54      -433
  Lines         23608     2016    -21592
  Branches       2527      241     -2286
- Hits          13016     1271    -11745
+ Misses         9437      621     -8816
+ Partials       1155      124     -1031
```
| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `63.04% <ø> (-7.84%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2994) | Coverage Δ | |
|---|---|---|
| [...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/2994/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2994/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
| [...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2994/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh) | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
| [...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/2994/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
| [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2994/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `40.69% <0.00%> (-23.84%)` | :arrow_down: |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized
codecov-commenter edited a comment on pull request #3020: URL: https://github.com/apache/hudi/pull/3020#issuecomment-851899487 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized
codecov-commenter edited a comment on pull request #3020: URL: https://github.com/apache/hudi/pull/3020#issuecomment-851899487 # [Codecov](https://codecov.io/gh/apache/hudi/pull/3020) Report > Merging [#3020](https://codecov.io/gh/apache/hudi/pull/3020) (24ec6b0) into [master](https://codecov.io/gh/apache/hudi/commit/e6a71ea544f3dd1ab5227b4c89a1540df5f7891a) (e6a71ea) will **decrease** coverage by `0.00%`. > The diff coverage is `n/a`.
```diff
@@             Coverage Diff              @@
##             master    #3020      +/-  ##
- Coverage     55.14%   55.13%    -0.01%
- Complexity     3850     3865       +15
  Files           485      487        +2
  Lines         23542    23608      +66
  Branches       2522     2527       +5
+ Hits          12982    13017      +35
- Misses         9406     9435      +29
- Partials       1154     1156       +2
```
| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.55% <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `50.31% <ø> (+0.02%)` | :arrow_up: |
| hudiflink | `63.34% <ø> (-0.30%)` | :arrow_down: |
| hudihadoopmr | `51.54% <ø> (ø)` | |
| hudisparkdatasource | `74.28% <ø> (ø)` | |
| hudisync | `46.44% <ø> (ø)` | |
| huditimelineservice | `64.36% <ø> (ø)` | |
| hudiutilities | `70.88% <ø> (+0.04%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3020) | Coverage Δ | |
|---|---|---|
| [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/3020/diff#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | `50.00% <ø> (ø)` | |
| [...g/apache/hudi/sink/partitioner/BucketAssigner.java](https://codecov.io/gh/apache/hudi/pull/3020/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbmVyLmphdmE=) | `82.29% <0.00%> (-6.24%)` | :arrow_down: |
| [...org/apache/hudi/common/config/TypedProperties.java](https://codecov.io/gh/apache/hudi/pull/3020/diff#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9UeXBlZFByb3BlcnRpZXMuamF2YQ==) | `87.09% <0.00%> (-2.56%)` | :arrow_down: |
| [...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/3020/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=) | `76.41% <0.00%> (-1.47%)` | :arrow_down: |
| [.../org/apache/hudi/streamer/HoodieFlinkStreamer.java](https://codecov.io/gh/apache/hudi/pull/3020/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zdHJlYW1lci9Ib29kaWVGbGlua1N0cmVhbWVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
[jira] [Updated] (HUDI-1941) Fix callers of HoodieRecordPayload.preCombine() to use new api with props arg
[ https://issues.apache.org/jira/browse/HUDI-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1941: -- Labels: sev:high (was: sev:critical) > Fix callers of HoodieRecordPayload.preCombine() to use new api with props arg > - > > Key: HUDI-1941 > URL: https://issues.apache.org/jira/browse/HUDI-1941 > Project: Apache Hudi > Issue Type: Task > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: sev:high > > We deprecated old api for preCombine and introduced new one. But haven't > fixed the callers to use the new api. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HUDI-1719) hive on spark/mr,Incremental query of the mor table, the partition field is incorrect
[ https://issues.apache.org/jira/browse/HUDI-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reopened HUDI-1719: --- > hive on spark/mr,Incremental query of the mor table, the partition field is > incorrect > - > > Key: HUDI-1719 > URL: https://issues.apache.org/jira/browse/HUDI-1719 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Affects Versions: 0.7.0, 0.8.0 > Environment: spark2.4.5, hadoop 3.1.1, hive 3.1.1 >Reporter: tao meng >Assignee: tao meng >Priority: Major > Labels: pull-request-available, sev:critical, user-support-issues > Fix For: 0.9.0 > > > now hudi use HoodieCombineHiveInputFormat to achieve Incremental query of the > mor table. > when we have some small files in different partitions, > HoodieCombineHiveInputFormat will combine those small file readers. > HoodieCombineHiveInputFormat build partition field base on the first file > reader in it, however now HoodieCombineHiveInputFormat holds other file > readers which come from different partitions. 
> When switching readers, we should update ioctx.
> test env:
> spark2.4.5, hadoop 3.1.1, hive 3.1.1
> test steps:
> step 1:
> val df = spark.range(0, 1).toDF("keyid")
>   .withColumn("col3", expr("keyid + 1000"))
>   .withColumn("p", lit(0))
>   .withColumn("p1", lit(0))
>   .withColumn("p2", lit(6))
>   .withColumn("a1", lit(Array[String]("sb1", "rz")))
>   .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // create hudi table which has three-level partitions p,p1,p2
> merge(df, 4, "default", "hive_8b", DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> step 2:
> val df = spark.range(0, 1).toDF("keyid")
>   .withColumn("col3", expr("keyid + 1000"))
>   .withColumn("p", lit(0))
>   .withColumn("p1", lit(0))
>   .withColumn("p2", lit(7))
>   .withColumn("a1", lit(Array[String]("sb1", "rz")))
>   .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // upsert current table
> merge(df, 4, "default", "hive_8b", DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
> hive beeline:
> set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
> set hoodie.hive_8b.consume.mode=INCREMENTAL;
> set hoodie.hive_8b.consume.max.commits=3;
> set hoodie.hive_8b.consume.start.timestamp=20210325141300; // this timestamp is smaller than the earliest commit, so we can query all commits
> select `p`, `p1`, `p2`, `keyid` from hive_8b_rt where `_hoodie_commit_time` > '20210325141300' and `keyid` < 5;
> query result:
> | p | p1 | p2 | keyid |
> |---|----|----|-------|
> | 0 | 0 | 6 | 0 |
> | 0 | 0 | 6 | 1 |
> | 0 | 0 | 6 | 2 |
> | 0 | 0 | 6 | 3 |
> | 0 | 0 | 6 | 4 |
> | 0 | 0 | 6 | 4 |
> | 0 | 0 | 6 | 0 |
> | 0 | 0 | 6 | 3 |
> | 0 | 0 | 6 | 2 |
> | 0 | 0 | 6 | 1 |
> This result is wrong: in the second step we inserted new data with p2=7, yet the query result contains no p2=7 rows; every row shows p2=6.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
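The failure mode in this report — the combined input format stamping every row with the first underlying reader's partition — can be modeled outside Hive. This toy sketch is not the actual HoodieCombineHiveInputFormat code; it only illustrates why refreshing the partition context when switching readers fixes the result:

```python
def read_combined(splits, update_on_switch):
    """Each split is (partition_values, rows). The combined reader either
    keeps the first split's partition for every row (the bug) or refreshes
    it when switching to the next file's reader (the fix)."""
    out = []
    current_partition = splits[0][0]
    for partition, rows in splits:
        if update_on_switch:
            current_partition = partition  # refresh partition ctx on reader switch
        for keyid in rows:
            out.append((*current_partition, keyid))
    return out

# Two small files combined into one split: one from partition p2=6, one from p2=7.
splits = [((0, 0, 6), [0, 1]), ((0, 0, 7), [0, 1])]
print(read_combined(splits, update_on_switch=False))  # p2=7 rows mislabeled as p2=6
print(read_combined(splits, update_on_switch=True))   # partitions come out correct
```

With `update_on_switch=False` the second file's rows inherit p2=6, exactly the wrong beeline output above; with it enabled, the p2=7 rows appear.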
[jira] [Resolved] (HUDI-1719) hive on spark/mr,Incremental query of the mor table, the partition field is incorrect
[ https://issues.apache.org/jira/browse/HUDI-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-1719. --- Resolution: Fixed -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1719) hive on spark/mr,Incremental query of the mor table, the partition field is incorrect
[ https://issues.apache.org/jira/browse/HUDI-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1719: -- Status: Closed (was: Patch Available) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HUDI-1800) Incorrect HoodieTableFileSystem API usage for pending slices causing issues
[ https://issues.apache.org/jira/browse/HUDI-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reopened HUDI-1800: --- > Incorrect HoodieTableFileSystem API usage for pending slices causing issues > --- > > Key: HUDI-1800 > URL: https://issues.apache.org/jira/browse/HUDI-1800 > Project: Apache Hudi > Issue Type: Bug > Components: Writer Core >Reporter: Nishith Agarwal >Assignee: Ryan Pifer >Priority: Major > Labels: pull-request-available, sev:critical > > From [~vbalaji] > > We are using wrong API of FileSystemView here > [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85] > We don't include file groups that are in pending compaction but with Hbase > Index we are including them. With the current state of code, Including files > in pending compaction is an issue. > This API "getLatestFileSlicesBeforeOrOn" is originally intended to be used by > CompactionAdminClient to figure out log files that were added after pending > compaction and rename them such that we can undo the effects of compaction > scheduling. There is a different API "getLatestMergedFileSlicesBeforeOrOn" > which gives a consolidated view of the latest file slice and includes all > data both before and after compaction. This is what should be used in > [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85] > The other workaround would be excluding file slices in pending compaction > when we select small files to avoid the interaction between compactor and > ingestion in this case. But, I think we can go with the first option > > More details can be found here -> https://github.com/apache/hudi/issues/2633 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1800) Incorrect HoodieTableFileSystem API usage for pending slices causing issues
[ https://issues.apache.org/jira/browse/HUDI-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-1800. --- Fix Version/s: 0.9.0 Resolution: Fixed -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1800) Incorrect HoodieTableFileSystem API usage for pending slices causing issues
[ https://issues.apache.org/jira/browse/HUDI-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1800: -- Status: Closed (was: Patch Available) -- This message was sent by Atlassian Jira (v8.3.4#803005)
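The workaround mentioned in HUDI-1800 — excluding file slices in pending compaction when selecting small files for upserts — amounts to a filter like the one below. The threshold, names, and data shapes are illustrative assumptions for the sketch, not the actual UpsertDeltaCommitPartitioner logic:

```python
SMALL_FILE_LIMIT = 100 * 1024 * 1024  # 100 MB: an assumed small-file threshold

def pick_small_files(file_slices, pending_compaction_ids):
    """file_slices: list of (file_group_id, base_file_size_bytes).
    Skip any slice whose file group has a compaction scheduled, so
    ingestion never bin-packs records into a file the compactor is
    about to rewrite."""
    return [fg for fg, size in file_slices
            if size < SMALL_FILE_LIMIT and fg not in pending_compaction_ids]

slices = [("fg-1", 10 * 1024 * 1024),    # small, no pending compaction
          ("fg-2", 20 * 1024 * 1024),    # small, but compaction pending
          ("fg-3", 200 * 1024 * 1024)]   # too large to be a small-file target
print(pick_small_files(slices, pending_compaction_ids={"fg-2"}))
```

The ticket prefers the other option (using `getLatestMergedFileSlicesBeforeOrOn` for a consolidated view), but this filter captures the interaction between the compactor and ingestion that both options are avoiding.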
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized
codecov-commenter edited a comment on pull request #3020:
URL: https://github.com/apache/hudi/pull/3020#issuecomment-851899487

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3020) Report
> Merging [#3020](https://codecov.io/gh/apache/hudi/pull/3020) (24ec6b0) into [master](https://codecov.io/gh/apache/hudi/commit/e6a71ea544f3dd1ab5227b4c89a1540df5f7891a) (e6a71ea) will **decrease** coverage by `0.00%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3020/graphs/tree.svg)](https://codecov.io/gh/apache/hudi/pull/3020)

```diff
@@             Coverage Diff              @@
##             master    #3020      +/-   ##
============================================
- Coverage     55.14%   55.13%   -0.01%
- Complexity     3850     3865      +15
============================================
  Files           485      487       +2
  Lines         23542    23608      +66
  Branches       2522     2527       +5
============================================
+ Hits          12982    13017      +35
- Misses         9406     9435      +29
- Partials       1154     1156       +2
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.55% <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `50.31% <ø> (+0.02%)` | :arrow_up: |
| hudiflink | `63.34% <ø> (-0.30%)` | :arrow_down: |
| hudihadoopmr | `51.54% <ø> (ø)` | |
| hudisparkdatasource | `74.28% <ø> (ø)` | |
| hudisync | `46.44% <ø> (ø)` | |
| huditimelineservice | `64.36% <ø> (ø)` | |
| hudiutilities | `70.88% <ø> (+0.04%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3020?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/3020/diff#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | `50.00% <ø> (ø)` | |
| [...g/apache/hudi/sink/partitioner/BucketAssigner.java](https://codecov.io/gh/apache/hudi/pull/3020/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbmVyLmphdmE=) | `82.29% <0.00%> (-6.24%)` | :arrow_down: |
| [...org/apache/hudi/common/config/TypedProperties.java](https://codecov.io/gh/apache/hudi/pull/3020/diff#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9UeXBlZFByb3BlcnRpZXMuamF2YQ==) | `87.09% <0.00%> (-2.56%)` | :arrow_down: |
| [...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/3020/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=) | `76.41% <0.00%> (-1.47%)` | :arrow_down: |
| [.../org/apache/hudi/streamer/HoodieFlinkStreamer.java](https://codecov.io/gh/apache/hudi/pull/3020/diff#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zdHJlYW1lci9Ib29kaWVGbGlua1N0cmVhbWVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
[GitHub] [hudi] yanghua commented on pull request #2994: [HUDI-1931] BucketAssignFunction use wrong state
yanghua commented on pull request #2994:
URL: https://github.com/apache/hudi/pull/2994#issuecomment-853049109

Busy recently, will review it tomorrow.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua closed pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized
yanghua closed pull request #3020:
URL: https://github.com/apache/hudi/pull/3020
[GitHub] [hudi] yanghua commented on pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized
yanghua commented on pull request #3020:
URL: https://github.com/apache/hudi/pull/3020#issuecomment-853047795

> @yanghua hi, I have modified the code to prevent it from failing due to code style verification, but the CI still fails to build. It should not be caused by my modification. Can you help me look at it? What should I do?

Yes, I checked; it's not due to your change. But making sure the CI succeeds before merging is good practice. I will take over it and merge it soon. You can move your focus to other things.
[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
nsivabalan commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r643976142

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##########
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.SqlQueryBuilder;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+import org.jetbrains.annotations.NotNull;
+
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Reads data from RDBMS data sources.
+ */
+public class JdbcSource extends RowSource {
+
+  private static final Logger LOG = LogManager.getLogger(JdbcSource.class);
+  private static final List<String> DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2");
+  private static final String URI_JDBC_PREFIX = "jdbc:";
+
+  public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, SparkSession sparkSession,
+                    SchemaProvider schemaProvider) {
+    super(props, sparkContext, sparkSession, schemaProvider);
+  }
+
+  /**
+   * Validates all user properties and prepares the {@link DataFrameReader} to read from RDBMS.
+   *
+   * @param session    The {@link SparkSession}.
+   * @param properties The JDBC connection properties and data source options.
+   * @return The {@link DataFrameReader} to read from RDBMS
+   * @throws HoodieException
+   */
+  private static DataFrameReader validatePropsAndGetDataFrameReader(final SparkSession session,
+                                                                    final TypedProperties properties)
+      throws HoodieException {
+    DataFrameReader dataFrameReader;
+    FSDataInputStream passwordFileStream = null;
+    try {
+      dataFrameReader = session.read().format("jdbc");
+      dataFrameReader = dataFrameReader.option(Config.URL_PROP, properties.getString(Config.URL));
+      dataFrameReader = dataFrameReader.option(Config.USER_PROP, properties.getString(Config.USER));
+      dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, properties.getString(Config.DRIVER_CLASS));
+      dataFrameReader = dataFrameReader
+          .option(Config.RDBMS_TABLE_PROP, properties.getString(Config.RDBMS_TABLE_NAME));
+
+      if (properties.containsKey(Config.PASSWORD)) {
+        LOG.info("Reading JDBC password from properties file");
+        dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, properties.getString(Config.PASSWORD));
+      } else if (properties.containsKey(Config.PASSWORD_FILE)
+          && !StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) {
+        LOG.info(String.format("Reading JDBC password from password file %s", properties.getString(Config.PASSWORD_FILE)));
+        FileSystem fileSystem = FileSystem.get(session.sparkContext().hadoopConfiguration());
+        passwordFileStream = fileSystem.open(new Path(properties.getString(Config.PASSWORD_FILE)));
+        byte[] bytes = new byte[passwordFileStream.available()];
+        passwordFileStream.read(bytes);
+        dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new String(bytes));
+      } else {
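One thing worth noting about the quoted diff: it sizes the password buffer with `InputStream.available()`, which the Java I/O contract does not guarantee to be the full file length, and a single `read(bytes)` may also return before filling the buffer. A sketch of a safer read loop, using plain `java.io` instead of Hadoop's `FileSystem` (the class name `ReadFully` is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadFully {

  // Reads a stream to EOF instead of trusting available()/a single read(),
  // neither of which is guaranteed to cover the whole file.
  static String readAll(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1) {  // loop until EOF
      out.write(buf, 0, n);
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream("s3cr3t\n".getBytes(StandardCharsets.UTF_8));
    System.out.println(readAll(in).trim());  // s3cr3t
  }
}
```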
[GitHub] [hudi] wangxianghu commented on a change in pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type
wangxianghu commented on a change in pull request #2993:
URL: https://github.com/apache/hudi/pull/2993#discussion_r643974052

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestCustomKeyGenerator.java
##########
@@ -220,5 +425,21 @@ public void testComplexRecordKeysWithComplexPartitionPath() {
     Row row = KeyGeneratorTestUtilities.getRow(record);
     Assertions.assertEquals(keyGenerator.getRecordKey(row), "_row_key:key1,pii_col:pi");
     Assertions.assertEquals(keyGenerator.getPartitionPath(row), "timestamp=4357686/ts_ms=20200321");
+
+    // Test config with HoodieWriteConfig.KEYGENERATOR_TYPE_PROP
+    complexRecordKeyAndPartitionPathProps = getComplexRecordKeyAndPartitionPathProps();
+    complexRecordKeyAndPartitionPathProps
+        .put(HoodieWriteConfig.KEYGENERATOR_TYPE_PROP, KeyGeneratorType.CUSTOM.name());
+
+    keyGenerator = HoodieSparkKeyGeneratorFactory.createKeyGenerator(complexRecordKeyAndPartitionPathProps);
+
+    record = getRecord();

Review comment:
       > would be nice to avoid copy pasting code. Let's try to parametrize tests.

       sure
[GitHub] [hudi] wangxianghu commented on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type
wangxianghu commented on pull request #2993:
URL: https://github.com/apache/hudi/pull/2993#issuecomment-853037788

> May I know what you mean by "Yes, several places. I will change them one by one"? Do you mean that you plan to address it in a follow-up patch? IMO, it makes sense to do it in this patch itself.

Ok, I'll update this PR to change all of them at once.
[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
nsivabalan commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r643894403

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##########
[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-852970149

I am currently facing a problem and would like to hear your opinion. After we add this type (hoodie.deltastreamer.source.kafka.checkpoint.type=timestamp), should deltastreamer.checkpoint.key maintain the status quo? The format is still: topicName,0:123,1:456

If we continue to maintain the above format, then when we specify, for example, --checkpoint 1622635064, we need to determine the relationship between commitMetadata.getMetadata(CHECKPOINT_KEY) and --checkpoint 1622635064 in org.apache.hudi.utilities.deltastreamer.DeltaSync#readFromSource. This seems to be contrary to the result of our discussion: do not add Kafka-dependent code in DeltaSync.

Do you have any suggestions for this? Thanks @nsivabalan
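For illustration, the two checkpoint shapes discussed above can be told apart and parsed with a few lines of string handling. Only the `topicName,0:123,1:456` format comes from the comment; the class and method names here are hypothetical sketches, not DeltaStreamer's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KafkaCheckpointSketch {

  // Splits "topic,partition:offset,partition:offset,..." into a
  // partition -> offset map; parts[0] is the topic name.
  static Map<Integer, Long> parseOffsets(String checkpoint) {
    String[] parts = checkpoint.split(",");
    Map<Integer, Long> offsets = new LinkedHashMap<>();
    for (int i = 1; i < parts.length; i++) {
      String[] kv = parts[i].split(":");
      offsets.put(Integer.parseInt(kv[0]), Long.parseLong(kv[1]));
    }
    return offsets;
  }

  // A plain epoch-seconds checkpoint ("--checkpoint 1622635064") has no
  // partition:offset pairs, which is one way to distinguish the two forms
  // without pulling Kafka-specific code into the caller.
  static boolean looksLikeTimestamp(String checkpoint) {
    return !checkpoint.contains(",");
  }

  public static void main(String[] args) {
    Map<Integer, Long> offsets = parseOffsets("topicName,0:123,1:456");
    System.out.println(offsets);                          // {0=123, 1=456}
    System.out.println(looksLikeTimestamp("1622635064")); // true
  }
}
```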
[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
nsivabalan commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r643892144

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##########
[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
nsivabalan commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r643884770

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -1591,6 +1596,45 @@ public void testCsvDFSSourceNoHeaderWithSchemaProviderAndTransformer() throws Ex
     testCsvDFSSource(false, '\t', true, Collections.singletonList(TripsWithDistanceTransformer.class.getName()));
   }

+  @Test
+  public void testIncrementalFetchInContinuousMode() {

Review comment:
       minor. rename to "testJDBCSourceIncremental."

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##########
[GitHub] [hudi] nsivabalan commented on a change in pull request #2963: [HUDI-1904] Make SchemaProvider spark free and move it to hudi-client-common
nsivabalan commented on a change in pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#discussion_r643879466

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/schema/SchemaProvider.java
##########
@@ -34,18 +32,9 @@
 @PublicAPIClass(maturity = ApiMaturityLevel.STABLE)
 public abstract class SchemaProvider implements Serializable {

-  protected TypedProperties config;
+  protected Schema sourceSchema;

-  protected JavaSparkContext jssc;
-
-  public SchemaProvider(TypedProperties props) {

Review comment:
       Just now read the other comments. I understand the intent to make it agnostic to engines, but it's not gonna be easy to make it backwards compatible. One more thought: we might need to make the base abstract class generic with two types (a config class and an engine context, maybe). But this definitely needs more thought.
[GitHub] [hudi] nsivabalan commented on a change in pull request #2963: [HUDI-1904] Make SchemaProvider spark free and move it to hudi-client-common
nsivabalan commented on a change in pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#discussion_r643874947

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/schema/SchemaProvider.java
##########
@@ -34,18 +32,9 @@
 @PublicAPIClass(maturity = ApiMaturityLevel.STABLE)
 public abstract class SchemaProvider implements Serializable {

-  protected TypedProperties config;
+  protected Schema sourceSchema;

-  protected JavaSparkContext jssc;
-
-  public SchemaProvider(TypedProperties props) {

Review comment:
       Can we think about making this backwards compatible? If a user defined their own SchemaProvider, super(...) calls may start to fail if we remove these 2 constructors. If this were a private interface, we could evolve it without much consideration, but since this is a public API, we have to be careful.
[GitHub] [hudi] nsivabalan commented on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type
nsivabalan commented on pull request #2993:
URL: https://github.com/apache/hudi/pull/2993#issuecomment-852948354

May I know what you mean by "Yes, several places. I will change them one by one"? Do you mean that you plan to address it in a follow-up patch? IMO, it makes sense to do it in this patch itself.
[GitHub] [hudi] nsivabalan commented on a change in pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type
nsivabalan commented on a change in pull request #2993: URL: https://github.com/apache/hudi/pull/2993#discussion_r643861744 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/constant/KeyGeneratorType.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.keygen.constant; + +/** + * Types of {@link org.apache.hudi.keygen.KeyGenerator}. + */ +public enum KeyGeneratorType { + /** + * Simple key generator, which takes names of fields to be used for recordKey and partitionPath as configs. + */ + SIMPLE, + + /** + * Complex key generator, which takes names of fields to be used for recordKey and partitionPath as configs. + */ + COMPLEX, + + /** + * Key generator, that relies on timestamps for partitioning field. Still picks record key by name. + */ + TIMESTAMP, + + /** + * A generic implementation of KeyGenerator where users can configure record key as a single field or a combination Review comment: Can we please add an example for the custom key gen type? Users might be confused between COMPLEX and CUSTOM.
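For context on the distinction the reviewer asks about: with the CUSTOM generator each partition-path field carries its own type tag (e.g. `field:simple` or `field:timestamp`), whereas COMPLEX only lists field names. A hedged sketch of parsing such a `field:type` spec (the exact property keys and accepted type names should be checked against Hudi's key generator docs):

```java
import java.util.ArrayList;
import java.util.List;

// Parses a CustomKeyGenerator-style partition spec such as
// "ts:timestamp,region:simple" into (field, type) pairs.
class PartitionSpecSketch {
  static List<String[]> parse(String spec) {
    List<String[]> fields = new ArrayList<>();
    for (String part : spec.split(",")) {
      String[] kv = part.trim().split(":");
      if (kv.length != 2) {
        throw new IllegalArgumentException("expected field:type, got " + part);
      }
      fields.add(kv); // kv[0] = field name, kv[1] = per-field generator type
    }
    return fields;
  }
}
```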
[GitHub] [hudi] nsivabalan commented on a change in pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type
nsivabalan commented on a change in pull request #2993: URL: https://github.com/apache/hudi/pull/2993#discussion_r643863514 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestCustomKeyGenerator.java ## @@ -98,53 +103,145 @@ private TypedProperties getPropertiesForNonPartitionedKeyGen() { } @Test - public void testSimpleKeyGenerator() { -BuiltinKeyGenerator keyGenerator = new CustomKeyGenerator(getPropertiesForSimpleKeyGen()); + public void testSimpleKeyGenerator() throws IOException { +TypedProperties propertiesForSimpleKeyGen = getPropertiesForSimpleKeyGen(); + +// Test config with HoodieWriteConfig.KEYGENERATOR_CLASS_PROP +propertiesForSimpleKeyGen.put(HoodieWriteConfig.KEYGENERATOR_CLASS_PROP, CustomKeyGenerator.class.getName()); + +BuiltinKeyGenerator keyGenerator = HoodieSparkKeyGeneratorFactory.createKeyGenerator(propertiesForSimpleKeyGen); GenericRecord record = getRecord(); HoodieKey key = keyGenerator.getKey(record); Assertions.assertEquals(key.getRecordKey(), "key1"); Assertions.assertEquals(key.getPartitionPath(), "timestamp=4357686"); Row row = KeyGeneratorTestUtilities.getRow(record); Assertions.assertEquals(keyGenerator.getRecordKey(row), "key1"); Assertions.assertEquals(keyGenerator.getPartitionPath(row), "timestamp=4357686"); + +// Test config with HoodieWriteConfig.KEYGENERATOR_TYPE_PROP +propertiesForSimpleKeyGen = getPropertiesForSimpleKeyGen(); +propertiesForSimpleKeyGen.put(HoodieWriteConfig.KEYGENERATOR_TYPE_PROP, KeyGeneratorType.CUSTOM.name()); + +keyGenerator = HoodieSparkKeyGeneratorFactory.createKeyGenerator(propertiesForSimpleKeyGen); Review comment: We could try moving this repetitive code to a common private method and re-use it. I mean lines 125 to 132. Similarly for all key types. Review comment: Or, a better option would be to go with parameterized tests.
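The parameterized-test suggestion above can be sketched without any test framework as a table-driven loop; in the actual PR this would map to JUnit 5's `@ParameterizedTest` with a `MethodSource` supplying the two config styles. The property keys and expected value below are illustrative, mirroring the diff, not a runnable Hudi test:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Table-driven sketch: one assertion body executed for many configurations.
class KeyGenTestSketch {
  // Stand-in for "create a key generator from props and compute a partition path".
  // In the real test this would call HoodieSparkKeyGeneratorFactory.createKeyGenerator.
  static String partitionPath(Map<String, String> props) {
    return "timestamp=4357686"; // both config styles should resolve identically
  }

  static int runAll() {
    Map<String, Map<String, String>> cases = new LinkedHashMap<>();

    Map<String, String> byClass = new LinkedHashMap<>();
    byClass.put("hoodie.datasource.write.keygenerator.class", "CustomKeyGenerator");
    cases.put("configured by class", byClass);

    Map<String, String> byType = new LinkedHashMap<>();
    byType.put("hoodie.datasource.write.keygenerator.type", "CUSTOM");
    cases.put("configured by type", byType);

    int passed = 0;
    for (Map.Entry<String, Map<String, String>> c : cases.entrySet()) {
      if ("timestamp=4357686".equals(partitionPath(c.getValue()))) {
        passed++; // same assertions, no copy-pasted test body per config style
      }
    }
    return passed;
  }
}
```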
[GitHub] [hudi] arun990 commented on issue #3009: Dependency error when attempt to build Hudi from git source ..
arun990 commented on issue #3009: URL: https://github.com/apache/hudi/issues/3009#issuecomment-852930223 Hi, I tried the mvn build using the settings.xml and got the same error, pasted below.
[INFO] Building hudi-hadoop-mr 0.8.0 [6/42]
[INFO] ------------------------[ jar ]------------------------
[INFO] Reactor Summary for Hudi 0.8.0:
[INFO] Hudi ....................... SUCCESS [02:09 min]
[INFO] hudi-common ................ SUCCESS [01:00 min]
[INFO] hudi-timeline-service ...... SUCCESS [  5.223 s]
[INFO] hudi-client ................ SUCCESS [  0.229 s]
[INFO] hudi-client-common ......... SUCCESS [ 31.635 s]
[INFO] hudi-hadoop-mr ............. FAILURE [  0.327 s]
[INFO] hudi-spark-client .......... SKIPPED
[INFO] hudi-sync-common ........... SKIPPED
[INFO] hudi-hive-sync ............. SKIPPED
(remaining modules SKIPPED)
[INFO] hudi-flink_2.12 ............ SKIPPED
[INFO] hudi-flink-bundle_2.12 ..... SKIPPED
[INFO] BUILD FAILURE
[INFO] Total time: 03:48 min
[INFO] Finished at: 2021-06-02T10:53:56Z
[ERROR] Failed to execute goal on project hudi-hadoop-mr: Could not resolve dependencies for project org.apache.hudi:hudi-hadoop-mr:jar:0.8.0: Failed to collect dependencies at org.apache.hive:hive-exec:jar:core:2.3.1 -> org.apache.calcite:calcite-core:jar:1.10.0 -> org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Failed to read artifact descriptor for org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Could not transfer artifact org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde from/to maven-default-http-blocker (http://0.0.0.0/): Blocked mirror for repositories: [datanucleus (http://www.datanucleus.org/downloads/maven2, default, releases), glassfish-repository (http://maven.glassfish.org/content/groups/glassfish, default, disabled), glassfish-repo-archive (http://maven.glassfish.org/content/groups/glassfish, default, disabled), apache.snapshots (http://repository.apache.org/snapshots, default, snapshots), conjars (http://conjars.org/repo, default, releases+snapshots)] -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :hudi-hadoop-mr
[GitHub] [hudi] Tandoy commented on issue #3009: Dependency error when attempt to build Hudi from git source ..
Tandoy commented on issue #3009: URL: https://github.com/apache/hudi/issues/3009#issuecomment-852891970 [settings.md](https://github.com/apache/hudi/files/6583594/settings.md) You can try this maven configuration file.
[jira] [Updated] (HUDI-1953) No set the output type of the operator, Throw java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1953: - Labels: pull-request-available (was: ) > No set the output type of the operator, Throw java.lang.NullPointerException > > > Key: HUDI-1953 > URL: https://issues.apache.org/jira/browse/HUDI-1953 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: taylor liao >Assignee: taylor liao >Priority: Blocker > Labels: pull-request-available > Fix For: 0.9.0 > > > Need to set the output type of the operator, Otherwise throw > java.lang.NullPointerException. > java.lang.NullPointerException > at java.util.Objects.requireNonNull(Objects.java:203) > at > org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.(StreamElementSerializer.java:65) > at > org.apache.flink.streaming.runtime.io.RecordWriterOutput.(RecordWriterOutput.java:70) > at > org.apache.flink.streaming.runtime.tasks.OperatorChain.createStreamOutput(OperatorChain.java:709) > at > org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainOutputs(OperatorChain.java:270) -- This message was sent by Atlassian Jira (v8.3.4#803005)
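The stack trace in HUDI-1953 comes from `StreamElementSerializer`'s constructor rejecting a null nested serializer via `Objects.requireNonNull`; in Flink the fix is to declare the operator's output type (for example via `SingleOutputStreamOperator#returns` or by passing `TypeInformation` to the transform). A minimal stand-alone reproduction of the failure mode, using illustrative stand-in classes rather than the real Flink ones:

```java
import java.util.Objects;

// Mimics StreamElementSerializer: construction fails fast if no
// output-type serializer was configured on the upstream operator.
class ElementSerializerSketch {
  private final Object typeSerializer;

  ElementSerializerSketch(Object typeSerializer) {
    this.typeSerializer = Objects.requireNonNull(typeSerializer,
        "no output type was set on the operator");
  }

  Object serializer() {
    return typeSerializer;
  }
}

class OutputTypeCheck {
  static boolean failsWithoutType() {
    try {
      new ElementSerializerSketch(null); // operator output type never declared
      return false;
    } catch (NullPointerException expected) {
      return true; // same NullPointerException as in the reported stack trace
    }
  }
}
```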
[GitHub] [hudi] taylorliao commented on pull request #3023: [HUDI-1953] No set the output type of the operator, Throw java.lang.NullPointerException
taylorliao commented on pull request #3023: URL: https://github.com/apache/hudi/pull/3023#issuecomment-852852811 @yanghua hi, I have created a JIRA ticket and fixed it.
[jira] [Created] (HUDI-1954) StreamWriterFunction only reset when flush success
yuzhaojing created HUDI-1954: Summary: StreamWriterFunction only reset when flush success Key: HUDI-1954 URL: https://issues.apache.org/jira/browse/HUDI-1954 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Currently, StreamWriterFunction's bucket flush is unsafe: when the instant is null, flushBucket returns immediately, but the bucket is then reset anyway, resulting in data loss.
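The fix described in HUDI-1954 amounts to resetting a write bucket only after a successful flush. A hedged sketch of that invariant; the real StreamWriteFunction differs in detail:

```java
import java.util.ArrayList;
import java.util.List;

// Buffers records and clears them only when the flush actually happened.
class BucketSketch {
  private final List<String> buffer = new ArrayList<>();

  void add(String record) {
    buffer.add(record);
  }

  // Returns true iff the bucket was flushed. "instant == null" models the
  // unsafe case where flushing must be skipped WITHOUT dropping buffered data.
  boolean flush(String instant) {
    if (instant == null) {
      return false; // do NOT reset here: clearing the buffer would lose data
    }
    // ... write buffer out under `instant` ...
    buffer.clear(); // reset only on success
    return true;
  }

  int buffered() {
    return buffer.size();
  }
}
```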
[jira] [Updated] (HUDI-1953) No set the output type of the operator, Throw java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] taylor liao updated HUDI-1953: Summary: No set the output type of the operator, Throw java.lang.NullPointerException (was: Don't set the output type of the operator, Throw java.lang.NullPointerException)
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
codecov-commenter edited a comment on pull request #3024: URL: https://github.com/apache/hudi/pull/3024#issuecomment-852820330 # [Codecov](https://codecov.io/gh/apache/hudi/pull/3024) Report
> Merging [#3024](https://codecov.io/gh/apache/hudi/pull/3024) (89e90c5) into [master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08) (05a9830) will **increase** coverage by `7.91%`.
> The diff coverage is `n/a`.
> :exclamation: Current head 89e90c5 differs from pull request most recent head 3935834. Consider uploading reports for the commit 3935834 to get more accurate results.

```diff
@@             Coverage Diff              @@
##             master    #3024      +/-  ##
============================================
+ Coverage     55.13%   63.04%    +7.91%
+ Complexity     3864      346     -3518
============================================
  Files           487       54      -433
  Lines         23608     2016    -21592
  Branches       2527      241     -2286
============================================
- Hits          13016     1271    -11745
+ Misses         9437      621     -8816
+ Partials       1155      124     -1031
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `63.04% <ø> (-7.84%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3024) | Coverage Δ | |
|---|---|---|
| [...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/3024/diff) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3024/diff) | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
| [...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3024/diff) | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
| [...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3024/diff) | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
| [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/3024/diff) | `40.69% <0.00%> (-23.84%)` | :arrow_down: |
[GitHub] [hudi] chaplinthink commented on pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized
chaplinthink commented on pull request #3020: URL: https://github.com/apache/hudi/pull/3020#issuecomment-852832226 @yanghua hi, I have modified the code to prevent it from failing code style verification, but the CI still fails to build. It should not be caused by my modification. Can you help me take a look? What should I do?
[jira] [Issue Comment Deleted] (HUDI-1953) Don't set the output type of the operator, Throw java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] taylor liao updated HUDI-1953: Comment: was deleted (was: PR: [https://github.com/apache/hudi/pull/3023 ])
[jira] [Updated] (HUDI-1953) Don't set the output type of the operator, Throw java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] taylor liao updated HUDI-1953: Status: In Progress (was: Open)
[GitHub] [hudi] hk-lrzy commented on pull request #2994: [HUDI-1931] BucketAssignFunction use wrong state
hk-lrzy commented on pull request #2994: URL: https://github.com/apache/hudi/pull/2994#issuecomment-852818090 > @hk-lrzy thanks for the review. Please read the master code carefully: using MapState causes state bloat, which is a major bug. Sorry, I get your point now. As you said, `MapState` stores unnecessary `recordKey` entries compared to `ValueState`.
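The state-bloat point can be seen with a plain-Java analogy: a `MapState` keyed by record key stores one entry per record key ever seen, while a `ValueState` on a keyed stream needs only one value for the current key, since the record key already partitions the stream. A hedged sketch (Flink's actual state API is not used here):

```java
import java.util.HashMap;
import java.util.Map;

// Analogy of the two state shapes used for the bucket-assignment index.
class IndexStateSketch {
  // MapState-like: every record key accumulates an entry -> unbounded growth.
  final Map<String, String> mapStyle = new HashMap<>();
  // ValueState-like: one slot, because the record key is implicit in the
  // keyed-stream partitioning.
  String valueStyle;

  void seenViaMap(String recordKey, String fileLocation) {
    mapStyle.put(recordKey, fileLocation); // redundantly re-stores the key
  }

  void seenViaValue(String fileLocation) {
    valueStyle = fileLocation; // overwrites; no per-record-key duplication
  }
}
```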
[GitHub] [hudi] hk-lrzy removed a comment on pull request #2994: [HUDI-1931] BucketAssignFunction use wrong state
hk-lrzy removed a comment on pull request #2994: URL: https://github.com/apache/hudi/pull/2994#issuecomment-851997089 `indexState` is a MapState, and MapState also belongs to keyed state, so I think `indexState` need not change.
[jira] [Commented] (HUDI-1953) Don't set the output type of the operator, Throw java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355532#comment-17355532 ] taylor liao commented on HUDI-1953: PR: [https://github.com/apache/hudi/pull/3023 ]
[jira] [Created] (HUDI-1953) Don't set the output type of the operator, Throw java.lang.NullPointerException
taylor liao created HUDI-1953: - Summary: Don't set the output type of the operator, Throw java.lang.NullPointerException Key: HUDI-1953 URL: https://issues.apache.org/jira/browse/HUDI-1953 Project: Apache Hudi Issue Type: Bug Components: Flink Integration Reporter: taylor liao Assignee: taylor liao Fix For: 0.9.0 Need to set the output type of the operator, Otherwise throw java.lang.NullPointerException. java.lang.NullPointerException at java.util.Objects.requireNonNull(Objects.java:203) at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.(StreamElementSerializer.java:65) at org.apache.flink.streaming.runtime.io.RecordWriterOutput.(RecordWriterOutput.java:70) at org.apache.flink.streaming.runtime.tasks.OperatorChain.createStreamOutput(OperatorChain.java:709) at org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainOutputs(OperatorChain.java:270)
[GitHub] [hudi] taylorliao commented on pull request #3023: [MINOR] Don't set the output type of the operator, Throw java.lang.NullPointerException
taylorliao commented on pull request #3023: URL: https://github.com/apache/hudi/pull/3023#issuecomment-852803307 > > @yanghua Can you help to review this PR? > > OK, no problem. IMO, it's a bug, can we file a jira ticket to track it? > > In addition, another case like it in `StreamWriteITCase` can we also fix it together? OK, I will file a JIRA ticket to track it.
[GitHub] [hudi] yanghua commented on pull request #3023: [MINOR] Don't set the output type of the operator, Throw java.lang.NullPointerException
yanghua commented on pull request #3023: URL: https://github.com/apache/hudi/pull/3023#issuecomment-852800181 > @yanghua Can you help to review this PR? OK, no problem. IMO, it's a bug; can we file a JIRA ticket to track it? In addition, there is another case like it in `StreamWriteITCase`; can we also fix it together?
[GitHub] [hudi] n3nash closed issue #2437: deltastreamer fails due to "Error upserting bucketType UPDATE for partition" and ArrayIndexOutOfBoundsException
n3nash closed issue #2437: URL: https://github.com/apache/hudi/issues/2437
[GitHub] [hudi] n3nash commented on issue #2437: deltastreamer fails due to "Error upserting bucketType UPDATE for partition" and ArrayIndexOutOfBoundsException
n3nash commented on issue #2437: URL: https://github.com/apache/hudi/issues/2437#issuecomment-852798103 @jiangok2006 Closing this issue since the logs are not enough to reproduce the issue. Please re-open if you need further help.
[GitHub] [hudi] taylorliao commented on pull request #3023: [MINOR] Don't set the output type of the operator, Throw java.lang.NullPointerException
taylorliao commented on pull request #3023: URL: https://github.com/apache/hudi/pull/3023#issuecomment-852797354 @yanghua Can you help to review this PR?
[GitHub] [hudi] n3nash commented on issue #2448: [SUPPORT] deltacommit for client 172.16.116.102 already exists
n3nash commented on issue #2448: URL: https://github.com/apache/hudi/issues/2448#issuecomment-852796345 @peng-xin Since we haven't heard from you in a while and this issue has not been reported by anyone else, I'm assuming this is a transient issue with some of your settings. Let me know if you need further help. @root18039532923 Please feel free to re-open if you are still confused about how to use async compaction.
[GitHub] [hudi] n3nash closed issue #2448: [SUPPORT] deltacommit for client 172.16.116.102 already exists
n3nash closed issue #2448: URL: https://github.com/apache/hudi/issues/2448
[GitHub] [hudi] yuzhaojing opened a new pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap
yuzhaojing opened a new pull request #3024: URL: https://github.com/apache/hudi/pull/3024

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

*Currently the Flink streamer loads the index in the BucketAssigner operator, where every task has to load all base files. To make index loading more efficient, this PR adds a BootstrapFunction that loads the index and shuffles the records to the BucketAssigner, so that each base file can be assigned to a specific subtask. issue: https://issues.apache.org/jira/browse/HUDI-1924*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Updated] (HUDI-1924) Support bootstrap operator to load index from hoodieTable
[ https://issues.apache.org/jira/browse/HUDI-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1924:
- Labels: pull-request-available (was: )

> Support bootstrap operator to load index from hoodieTable
> Key: HUDI-1924
> URL: https://issues.apache.org/jira/browse/HUDI-1924
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Flink Integration
> Reporter: yuzhaojing
> Assignee: yuzhaojing
> Priority: Major
> Labels: pull-request-available
>
> Currently we load the index in BucketAssign, but the HoodieRecords in a base file may belong to many tasks, so every BucketAssign task has to load all files.
> If we add an operator before BucketAssign and key the index records by record key before they reach BucketAssign, we can assign only part of the files to each task.
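The partitioning idea in the description can be sketched without Flink: once records are keyed by record key before reaching BucketAssign, each parallel subtask receives a deterministic slice of the key space, so it only needs the index entries for the base files covering that slice. A hedged sketch of such a key-to-subtask assignment (all names below are hypothetical, not Hudi classes):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexBootstrapSketch {

    // A keyBy-style assignment: the same record key always maps to the same
    // subtask, so each subtask only ever sees its own slice of the index.
    static int subtaskFor(String recordKey, int parallelism) {
        return Math.floorMod(recordKey.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        Map<Integer, List<String>> keysBySubtask = new HashMap<>();
        for (String key : List.of("uuid-001", "uuid-002", "uuid-003", "uuid-004")) {
            keysBySubtask
                .computeIfAbsent(subtaskFor(key, parallelism), t -> new ArrayList<>())
                .add(key);
        }
        // Each subtask would bootstrap only the base files containing its keys.
        System.out.println(keysBySubtask);
    }
}
```

Without such a shuffle, every BucketAssign task must scan every base file during index bootstrap; with it, the bootstrap work is split across the operator's parallelism.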
[GitHub] [hudi] n3nash closed issue #2461: All records are present in athena query result on glue crawled Hudi tables
n3nash closed issue #2461: URL: https://github.com/apache/hudi/issues/2461
[GitHub] [hudi] n3nash commented on issue #2461: All records are present in athena query result on glue crawled Hudi tables
n3nash commented on issue #2461: URL: https://github.com/apache/hudi/issues/2461#issuecomment-852791741 @vrtrepp @noobarcitect Closing this issue since the proposed solution is straightforward. If you still need help making your glue connector work, please feel free to re-open.