[GitHub] [hudi] yuzhaojing edited a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


yuzhaojing edited a comment on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-853583886


   > > @wangxianghu please review this pr
   > 
   > thanks, will review soon
   
   Thanks for review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yuzhaojing commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


yuzhaojing commented on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-853583886


   > > @wangxianghu please review this pr
   > 
   > thanks, will review soon
   
   Thanks for review






[GitHub] [hudi] wangxianghu commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


wangxianghu commented on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-853581984


   > @wangxianghu please review this pr
   
   thanks, will review soon






[GitHub] [hudi] yuzhaojing commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


yuzhaojing commented on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-853579575


   @wangxianghu please review this pr






[GitHub] [hudi] yuzhaojing removed a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


yuzhaojing removed a comment on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-853539076


   > 1. TTL-expired keys should not be loaded into index state again.
   
   I will add a judgment between the instant and the TTL






[jira] [Commented] (HUDI-1148) Revisit log messages seen when writing or reading through Hudi

2021-06-02 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356174#comment-17356174
 ] 

Vinay commented on HUDI-1148:
-

[~vbalaji] I can take a look at this. Looking at the code, we are printing the 
entire Hadoop conf:

```java
LOG.info(String.format("Hadoop Configuration: fs.defaultFS: [%s], Config:[%s], FileSystem: [%s]",
    conf.getRaw("fs.defaultFS"), conf.toString(), fs.toString()));
```

I will make this a debug log, and will also check the logs while running 
HoodieDeltaStreamer and writing to CoW/MoR tables.
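A minimal sketch of the guarded debug-logging pattern being proposed (using java.util.logging in place of Hudi's actual logger, with made-up values, purely for illustration — the point is that the expensive conf dump string is only built when debug output is enabled):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class HadoopConfLogging {

    private static final Logger LOG = Logger.getLogger(HadoopConfLogging.class.getName());

    // Builds the same message the INFO line above produces.
    static String describeConf(String defaultFs, String confDump, String fsDump) {
        return String.format("Hadoop Configuration: fs.defaultFS: [%s], Config:[%s], FileSystem: [%s]",
            defaultFs, confDump, fsDump);
    }

    public static void main(String[] args) {
        // Guarding with isLoggable avoids building the (large) conf dump string
        // unless debug-level (FINE) output is actually enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describeConf("hdfs://namenode:8020", "<full conf dump>", "<fs dump>"));
        }
        System.out.println(describeConf("hdfs://namenode:8020", "...", "..."));
    }
}
```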

> Revisit log messages seen when writing or reading through Hudi
> ---
>
> Key: HUDI-1148
> URL: https://issues.apache.org/jira/browse/HUDI-1148
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Vinay
>Priority: Minor
> Fix For: 0.9.0
>
>
> [https://github.com/apache/hudi/issues/1906]
>  
> Some of these Log messages can be made debug. We need to generally see the 
> verbosity of log messages when running hudi operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1148) Revisit log messages seen when writing or reading through Hudi

2021-06-02 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1148:

Status: In Progress  (was: Open)






[jira] [Assigned] (HUDI-1148) Revisit log messages seen when writing or reading through Hudi

2021-06-02 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-1148:
---

Assignee: Vinay






[GitHub] [hudi] codecov-commenter edited a comment on pull request #3026: [HUDI-1931] BucketAssignFunction use ValueState instead of MapState

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #3026:
URL: https://github.com/apache/hudi/pull/3026#issuecomment-853541709










[GitHub] [hudi] codecov-commenter edited a comment on pull request #3026: [HUDI-1931] BucketAssignFunction use ValueState instead of MapState

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #3026:
URL: https://github.com/apache/hudi/pull/3026#issuecomment-853541709


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3026](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1d0a065) into 
[master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (05a9830) will **increase** coverage by `15.69%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3026/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #3026       +/-   ##
   =============================================
   + Coverage     55.13%   70.83%   +15.69%     
   + Complexity     3864      385     -3479     
   =============================================
     Files           487       54      -433     
     Lines         23608     2016    -21592     
     Branches       2527      241     -2286     
   =============================================
   - Hits          13016     1428    -11588     
   + Misses         9437      454     -8983     
   + Partials       1155      134     -1021     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.84% <0.00%> (-0.34%)` | :arrow_down: |
   | 
[...e/hudi/common/table/log/block/HoodieDataBlock.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVEYXRhQmxvY2suamF2YQ==)
 | | |
   | 
[...di/hadoop/realtime/HoodieRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVJlYWx0aW1lUmVjb3JkUmVhZGVyLmphdmE=)
 | | |
   | 
[...rg/apache/hudi/common/model/HoodieAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUF2cm9QYXlsb2FkLmphdmE=)
 | | |
   | 
[...hudi/common/fs/inline/InLineFsDataInputStream.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9JbkxpbmVGc0RhdGFJbnB1dFN0cmVhbS5qYXZh)
 | | |
   | 
[...a/org/apache/hudi/common/util/collection/Pair.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9QYWlyLmphdmE=)
 | | |
   | 

[GitHub] [hudi] loukey-lj removed a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


loukey-lj removed a comment on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-853546766


   2. The task must perform a data load each time it recovers






[GitHub] [hudi] codecov-commenter edited a comment on pull request #3026: [HUDI-1931] BucketAssignFunction use ValueState instead of MapState

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #3026:
URL: https://github.com/apache/hudi/pull/3026#issuecomment-853541709


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3026](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1d0a065) into 
[master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (05a9830) will **increase** coverage by `15.69%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3026/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #3026       +/-   ##
   =============================================
   + Coverage     55.13%   70.83%   +15.69%     
   + Complexity     3864      385     -3479     
   =============================================
     Files           487       54      -433     
     Lines         23608     2016    -21592     
     Branches       2527      241     -2286     
   =============================================
   - Hits          13016     1428    -11588     
   + Misses         9437      454     -8983     
   + Partials       1155      134     -1021     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.84% <0.00%> (-0.34%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==)
 | | |
   | 
[...java/org/apache/hudi/util/AvroSchemaConverter.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS91dGlsL0F2cm9TY2hlbWFDb252ZXJ0ZXIuamF2YQ==)
 | | |
   | 
[...udi/timeline/service/handlers/BaseFileHandler.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvQmFzZUZpbGVIYW5kbGVyLmphdmE=)
 | | |
   | 
[...ache/hudi/common/table/timeline/HoodieInstant.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUluc3RhbnQuamF2YQ==)
 | | |
   | 
[...g/apache/hudi/timeline/service/RequestHandler.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvUmVxdWVzdEhhbmRsZXIuamF2YQ==)
 | | |
   | 

[GitHub] [hudi] loukey-lj commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


loukey-lj commented on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-853546766


   2. The task must perform a data load each time it recovers






[GitHub] [hudi] loukey-lj removed a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


loukey-lj removed a comment on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-853538379


   1. TTL-expired keys should not be loaded into index state again.






[GitHub] [hudi] codecov-commenter commented on pull request #3026: [HUDI-1931] BucketAssignFunction use ValueState instead of MapState

2021-06-02 Thread GitBox


codecov-commenter commented on pull request #3026:
URL: https://github.com/apache/hudi/pull/3026#issuecomment-853541709


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3026](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1d0a065) into 
[master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (05a9830) will **decrease** coverage by `45.85%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3026/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3026       +/-   ##
   ============================================
   - Coverage     55.13%    9.27%   -45.86%    
   + Complexity     3864       48     -3816    
   ============================================
     Files           487       54      -433    
     Lines         23608     2016    -21592    
     Branches       2527      241     -2286    
   ============================================
   - Hits          13016      187    -12829    
   + Misses         9437     1816     -7621    
   + Partials       1155       13     -1142    
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.27% <ø> (-61.61%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3026?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3026/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Closed] (HUDI-1956) BucketAssignFunction use ValueState instead of MapState

2021-06-02 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-1956.

Resolution: Duplicate

> BucketAssignFunction use ValueState instead of MapState
> ---
>
> Key: HUDI-1956
> URL: https://issues.apache.org/jira/browse/HUDI-1956
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>
> Use the value state to reduce the memory footprint.





[GitHub] [hudi] danny0405 opened a new pull request #3026: [HUDI-1931] BucketAssignFunction use ValueState instead of MapState

2021-06-02 Thread GitBox


danny0405 opened a new pull request #3026:
URL: https://github.com/apache/hudi/pull/3026


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.






[GitHub] [hudi] yuzhaojing commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


yuzhaojing commented on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-853539076


   > 1. TTL-expired keys should not be loaded into index state again.
   
   I will add a judgment between the instant and the TTL
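One possible shape for that instant-vs-TTL judgment (hypothetical helper, not the actual PR code; it assumes Hudi's yyyyMMddHHmmss instant-time format): a key is only worth loading into the index state if its commit instant still falls inside the state TTL window.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TtlCheck {

    // Hudi instant times use the yyyyMMddHHmmss format.
    private static final DateTimeFormatter INSTANT_FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /**
     * Returns true when the commit instant is still within the state TTL,
     * i.e. the key is worth loading into the Flink index state during bootstrap.
     */
    static boolean withinTtl(String instantTime, Duration ttl, LocalDateTime now) {
        LocalDateTime commitTime = LocalDateTime.parse(instantTime, INSTANT_FORMAT);
        return commitTime.isAfter(now.minus(ttl));
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2021, 6, 2, 12, 0, 0);
        // Commit from one hour ago, TTL of one day -> load it.
        System.out.println(withinTtl("20210602110000", Duration.ofDays(1), now)); // true
        // Commit from a week ago, TTL of one day -> skip it.
        System.out.println(withinTtl("20210526120000", Duration.ofDays(1), now)); // false
    }
}
```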






[GitHub] [hudi] loukey-lj edited a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


loukey-lj edited a comment on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-853538379


   1. TTL-expired keys should not be loaded into index state again.






[GitHub] [hudi] loukey-lj commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


loukey-lj commented on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-853538379


   1. TTL-expired keys should not be loaded into index state again.






[GitHub] [hudi] codecov-commenter edited a comment on pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #3025:
URL: https://github.com/apache/hudi/pull/3025#issuecomment-853523365










[GitHub] [hudi] codecov-commenter edited a comment on pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #3025:
URL: https://github.com/apache/hudi/pull/3025#issuecomment-853523365


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3025](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (0ec9a07) into 
[master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (05a9830) will **increase** coverage by `15.69%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3025/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #3025       +/-   ##
   =============================================
   + Coverage     55.13%   70.83%   +15.69%     
   + Complexity     3864      385     -3479     
   =============================================
     Files           487       54      -433     
     Lines         23608     2016    -21592     
     Branches       2527      241     -2286     
   =============================================
   - Hits          13016     1428    -11588     
   + Misses         9437      454     -8983     
   + Partials       1155      134     -1021     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.84% <0.00%> (-0.34%)` | :arrow_down: |
   | 
[...n/java/org/apache/hudi/internal/DefaultSource.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2ludGVybmFsL0RlZmF1bHRTb3VyY2UuamF2YQ==)
 | | |
   | 
[...va/org/apache/hudi/table/format/FilePathUtils.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvRmlsZVBhdGhVdGlscy5qYXZh)
 | | |
   | 
[...he/hudi/exception/HoodieNotSupportedException.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZU5vdFN1cHBvcnRlZEV4Y2VwdGlvbi5qYXZh)
 | | |
   | 
[...spark/src/main/scala/org/apache/hudi/package.scala](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL3BhY2thZ2Uuc2NhbGE=)
 | | |
   | 
[...g/apache/hudi/common/table/log/LogReaderUtils.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Mb2dSZWFkZXJVdGlscy5qYXZh)
 | | |
   | 

[jira] [Created] (HUDI-1956) BucketAssignFunction use ValueState instead of MapState

2021-06-02 Thread Danny Chen (Jira)
Danny Chen created HUDI-1956:


 Summary: BucketAssignFunction use ValueState instead of MapState
 Key: HUDI-1956
 URL: https://issues.apache.org/jira/browse/HUDI-1956
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Use the value state to reduce the memory footprint.
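The rationale can be sketched in plain Java (an illustrative model only, not the Flink API — the class and field names below are made up for illustration): in a keyed operator such as BucketAssignFunction, state is already scoped to the current record key, so a ValueState needs to hold just one location per key, whereas a MapState keyed by the record key again stores every key twice plus one extra map object per key.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model -- not the Flink API. Shows why a per-key ValueState is
// leaner than a MapState when only one value per record key is ever stored.
public class BucketStateSketch {

    // ValueState-style layout: recordKey -> location (key stored once).
    static final Map<String, String> valueStyle = new HashMap<>();

    // MapState-style layout: recordKey -> {recordKey -> location}
    // (key stored twice, plus an inner map object per record key).
    static final Map<String, Map<String, String>> mapStyle = new HashMap<>();

    static void putValueStyle(String recordKey, String location) {
        valueStyle.put(recordKey, location);
    }

    static void putMapStyle(String recordKey, String location) {
        mapStyle.computeIfAbsent(recordKey, k -> new HashMap<>())
                .put(recordKey, location);
    }

    static String getValueStyle(String recordKey) {
        return valueStyle.get(recordKey);
    }
}
```

Both layouts answer the same lookup, but the second pays the redundant inner-map overhead for every record key, which is the memory footprint this issue aims to reduce.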



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (fd115f8) into 
[master](https://codecov.io/gh/apache/hudi/commit/dcd7c331dc72df9ab10e4867a3592faf89f1480b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (dcd7c33) will **increase** coverage by `15.67%`.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head fd115f8 differs from pull request most recent 
head f2af1e0. Consider uploading reports for the commit f2af1e0 to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@              Coverage Diff              @@
    ##             master    #2645       +/-   ##
    =============================================
    + Coverage     55.15%   70.83%   +15.67%     
    + Complexity     3851      385     -3466     
    =============================================
      Files           485       54      -431     
      Lines         23542     2016    -21526     
      Branches       2522      241     -2281     
    =============================================
    - Hits          12985     1428    -11557     
    + Misses         9405      454     -8951     
    + Partials       1152      134     -1018     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.84% <0.00%> (-0.34%)` | :arrow_down: |
   | 
[...rg/apache/hudi/metadata/MetadataPartitionType.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvTWV0YWRhdGFQYXJ0aXRpb25UeXBlLmphdmE=)
 | | |
   | 
[...on/table/view/SpillableMapBasedFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvU3BpbGxhYmxlTWFwQmFzZWRGaWxlU3lzdGVtVmlldy5qYXZh)
 | | |
   | 
[...spark/src/main/scala/org/apache/hudi/package.scala](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL3BhY2thZ2Uuc2NhbGE=)
 | | |
   | 
[...pache/hudi/common/model/HoodieMetadataWrapper.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZU1ldGFkYXRhV3JhcHBlci5qYXZh)
 | | |
   | 

[GitHub] [hudi] loukey-lj closed pull request #2994: [HUDI-1931] BucketAssignFunction use wrong state

2021-06-02 Thread GitBox


loukey-lj closed pull request #2994:
URL: https://github.com/apache/hudi/pull/2994


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #3025:
URL: https://github.com/apache/hudi/pull/3025#issuecomment-853523365


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3025](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (0ec9a07) into 
[master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (05a9830) will **increase** coverage by `15.69%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3025/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@              Coverage Diff              @@
    ##             master    #3025       +/-   ##
    =============================================
    + Coverage     55.13%   70.83%   +15.69%     
    + Complexity     3864      385     -3479     
    =============================================
      Files           487       54      -433     
      Lines         23608     2016    -21592     
      Branches       2527      241     -2286     
    =============================================
    - Hits          13016     1428    -11588     
    + Misses         9437      454     -8983     
    + Partials       1155      134     -1021     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.84% <0.00%> (-0.34%)` | :arrow_down: |
   | 
[...java/org/apache/hudi/common/util/NumericUtils.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvTnVtZXJpY1V0aWxzLmphdmE=)
 | | |
   | 
[...src/main/java/org/apache/hudi/QuickstartUtils.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvUXVpY2tzdGFydFV0aWxzLmphdmE=)
 | | |
   | 
[...va/org/apache/hudi/configuration/FlinkOptions.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9jb25maWd1cmF0aW9uL0ZsaW5rT3B0aW9ucy5qYXZh)
 | | |
   | 
[...udi/common/table/timeline/dto/CompactionOpDTO.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Db21wYWN0aW9uT3BEVE8uamF2YQ==)
 | | |
   | 
[...i/src/main/java/org/apache/hudi/cli/HoodieCLI.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZUNMSS5qYXZh)
 | | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (fd115f8) into 
[master](https://codecov.io/gh/apache/hudi/commit/dcd7c331dc72df9ab10e4867a3592faf89f1480b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (dcd7c33) will **increase** coverage by `15.67%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@              Coverage Diff              @@
    ##             master    #2645       +/-   ##
    =============================================
    + Coverage     55.15%   70.83%   +15.67%     
    + Complexity     3851      385     -3466     
    =============================================
      Files           485       54      -431     
      Lines         23542     2016    -21526     
      Branches       2522      241     -2281     
    =============================================
    - Hits          12985     1428    -11557     
    + Misses         9405      454     -8951     
    + Partials       1152      134     -1018     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `70.83% <ø> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.84% <0.00%> (-0.34%)` | :arrow_down: |
   | 
[.../org/apache/hudi/common/model/HoodieWriteStat.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVdyaXRlU3RhdC5qYXZh)
 | | |
   | 
[...g/apache/hudi/cli/utils/SparkTempViewProvider.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL3V0aWxzL1NwYXJrVGVtcFZpZXdQcm92aWRlci5qYXZh)
 | | |
   | 
[...pache/hudi/common/model/HoodieMetadataWrapper.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZU1ldGFkYXRhV3JhcHBlci5qYXZh)
 | | |
   | 
[...apache/hudi/table/format/cow/RunLengthDecoder.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvY293L1J1bkxlbmd0aERlY29kZXIuamF2YQ==)
 | | |
   | 
[...pache/hudi/sink/compact/CompactionCommitEvent.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdEV2ZW50LmphdmE=)
 | | |
   | 

[GitHub] [hudi] codecov-commenter commented on pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…

2021-06-02 Thread GitBox


codecov-commenter commented on pull request #3025:
URL: https://github.com/apache/hudi/pull/3025#issuecomment-853523365


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3025](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (0ec9a07) into 
[master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (05a9830) will **decrease** coverage by `45.85%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3025/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master   #3025       +/-   ##
    ============================================
    - Coverage     55.13%   9.27%    -45.86%    
    + Complexity     3864      48      -3816    
    ============================================
      Files           487      54       -433    
      Lines         23608    2016     -21592    
      Branches       2527     241      -2286    
    ============================================
    - Hits          13016     187     -12829    
    + Misses         9437    1816      -7621    
    + Partials       1155      13      -1142    
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.27% <ø> (-61.61%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3025?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3025/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (fd115f8) into 
[master](https://codecov.io/gh/apache/hudi/commit/dcd7c331dc72df9ab10e4867a3592faf89f1480b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (dcd7c33) will **decrease** coverage by `45.88%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master   #2645       +/-   ##
    ============================================
    - Coverage     55.15%   9.27%    -45.89%    
    + Complexity     3851      48      -3803    
    ============================================
      Files           485      54       -431    
      Lines         23542    2016     -21526    
      Branches       2522     241      -2281    
    ============================================
    - Hits          12985     187     -12798    
    + Misses         9405    1816      -7589    
    + Partials       1152      13      -1139    
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.27% <ø> (-61.61%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Updated] (HUDI-1955) The filter condition is missing in the judgment condition of compaction instance

2021-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1955:
-
Labels: pull-request-available  (was: )

> The filter condition is missing in the judgment condition of compaction 
> instance
> 
>
> Key: HUDI-1955
> URL: https://issues.apache.org/jira/browse/HUDI-1955
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> The filter condition is missing in the judgment condition of compaction 
> instance in BaseScheduleCompactionActionExecutor.java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] swuferhong opened a new pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…

2021-06-02 Thread GitBox


swuferhong opened a new pull request #3025:
URL: https://github.com/apache/hudi/pull/3025


   …action instance
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   A filter condition is missing from the judgment of the compaction 
instance in BaseScheduleCompactionActionExecutor.java. The execute() method 
needs a filter condition, but it currently does not apply one.
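   A minimal sketch of the kind of filter being described (plain Java with hypothetical names — not the actual `BaseScheduleCompactionActionExecutor` code): when picking the latest instant from the timeline, only completed instants should be candidates, so in-flight and requested instants must be filtered out first.
   
   ```java
   import java.util.List;
   import java.util.Optional;
   
   // Hypothetical sketch, not Hudi code: demonstrates why the latest instant
   // must be chosen only among COMPLETED instants on the timeline.
   public class InstantFilterSketch {
   
       enum State { REQUESTED, INFLIGHT, COMPLETED }
   
       record Instant(String timestamp, State state) {}
   
       // Without the filter, the latest instant may still be in flight.
       static Optional<Instant> latestUnfiltered(List<Instant> timeline) {
           return timeline.stream().reduce((a, b) -> b); // last = latest
       }
   
       // With the filter, only completed instants are considered.
       static Optional<Instant> latestCompleted(List<Instant> timeline) {
           return timeline.stream()
                   .filter(i -> i.state() == State.COMPLETED)
                   .reduce((a, b) -> b);
       }
   }
   ```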
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1955) The filter condition is missing in the judgment condition of compaction instance

2021-06-02 Thread Zheng yunhong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng yunhong updated HUDI-1955:

Fix Version/s: 0.9.0

> The filter condition is missing in the judgment condition of compaction 
> instance
> 
>
> Key: HUDI-1955
> URL: https://issues.apache.org/jira/browse/HUDI-1955
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Zheng yunhong
>Priority: Major
> Fix For: 0.9.0
>
>
> The filter condition is missing in the judgment condition of compaction 
> instance in BaseScheduleCompactionActionExecutor.java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR

2021-06-02 Thread GitBox


hudi-bot edited a comment on pull request #2984:
URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102


   
   ## CI report:
   
   * fa95b4448c260e28d0aa7506c9ad71234c154d4f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=137)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1955) The filter condition is missing in the judgment condition of compaction instance

2021-06-02 Thread Zheng yunhong (Jira)
Zheng yunhong created HUDI-1955:
---

 Summary: The filter condition is missing in the judgment condition 
of compaction instance
 Key: HUDI-1955
 URL: https://issues.apache.org/jira/browse/HUDI-1955
 Project: Apache Hudi
  Issue Type: Bug
  Components: Compaction
Reporter: Zheng yunhong


The filter condition is missing in the judgment condition of compaction 
instance in BaseScheduleCompactionActionExecutor.java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hushenmin commented on issue #3005: [SUPPORT]How to query history snapshot by given one history partition?

2021-06-02 Thread GitBox


hushenmin commented on issue #3005:
URL: https://github.com/apache/hudi/issues/3005#issuecomment-853508788


   In short, for writes with a non-global index, backtrack the historical 
snapshot and then remove duplicates.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR

2021-06-02 Thread GitBox


hudi-bot edited a comment on pull request #2984:
URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102


   
   ## CI report:
   
   * ccdcaa2112f0292593f19b7c77f23a8be8bd6fa7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=136)
 
   * fa95b4448c260e28d0aa7506c9ad71234c154d4f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=137)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR

2021-06-02 Thread GitBox


hudi-bot edited a comment on pull request #2984:
URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102


   
   ## CI report:
   
   * 389da6052b5f7735d81d6de8b7118ff8da4c4df9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=135)
 
   * ccdcaa2112f0292593f19b7c77f23a8be8bd6fa7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=136)
 
   * fa95b4448c260e28d0aa7506c9ad71234c154d4f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR

2021-06-02 Thread GitBox


hudi-bot edited a comment on pull request #2984:
URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102


   
   ## CI report:
   
   * 389da6052b5f7735d81d6de8b7118ff8da4c4df9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=135)
 
   * ccdcaa2112f0292593f19b7c77f23a8be8bd6fa7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=136)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR

2021-06-02 Thread GitBox


hudi-bot edited a comment on pull request #2984:
URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102


   
   ## CI report:
   
   * 389da6052b5f7735d81d6de8b7118ff8da4c4df9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=135)
 
   * ccdcaa2112f0292593f19b7c77f23a8be8bd6fa7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR

2021-06-02 Thread GitBox


hudi-bot edited a comment on pull request #2984:
URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102


   
   ## CI report:
   
   * 389da6052b5f7735d81d6de8b7118ff8da4c4df9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=135)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR

2021-06-02 Thread GitBox


hudi-bot edited a comment on pull request #2984:
URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102


   
   ## CI report:
   
   * 49373be61ff4e802fdb794411ef1b3e17d547aff Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=134)
 
   * 389da6052b5f7735d81d6de8b7118ff8da4c4df9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=135)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #2984: (Azure CI) test PR

2021-06-02 Thread GitBox


hudi-bot edited a comment on pull request #2984:
URL: https://github.com/apache/hudi/pull/2984#issuecomment-846794102


   
   ## CI report:
   
   * 49373be61ff4e802fdb794411ef1b3e17d547aff Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=134)
 
   * 389da6052b5f7735d81d6de8b7118ff8da4c4df9 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] jtmzheng commented on issue #2995: [SUPPORT] Upserts creating duplicates after enabling metadata table in Hudi 0.7 indexing pipeline

2021-06-02 Thread GitBox


jtmzheng commented on issue #2995:
URL: https://github.com/apache/hudi/issues/2995#issuecomment-853465482


   Update: I was able to run the metadata commands from the CLI and check 
`list-partitions` and `list-files`. It seems like every partition is there, but 
spot-checking some partitions makes it clear that not all of the files are there:
   
   ```
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Number of log files 
scanned => 0
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: MaxMemoryInBytes 
allowed for compaction => 0
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Number of entries 
in MemoryBasedMap in ExternalSpillableMap => 0
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Total size in bytes 
of MemoryBasedMap in ExternalSpillableMap => 0
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Number of entries 
in DiskBasedMap in ExternalSpillableMap => 0
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Size of file 
spilled to disk => 0
   21/06/03 00:10:32 INFO metadata.HoodieBackedTableMetadata: Opened metadata 
log files from [] at instant 20210524060712(dataset instant=20210524060712, 
metadata instant=20201216222013)
   21/06/03 00:10:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
   21/06/03 00:10:32 INFO metadata.HoodieBackedTableMetadata: Metadata read for 
key 2020/9/4 took [open, baseFileRead, logMerge] [94, 71, 0] ms
   21/06/03 00:10:32 INFO metadata.BaseTableMetadata: Listed file in partition 
from metadata: partition=2020/9/4, #files=121
   
   
.f8a8f054-6d0e-43e8-9412-3dfec79d7d53-0_20210509151344.log.1_10765-6377-69863781
   
.f75e9845-a3a2-4b41-b59a-8effc2ee049a-0_20210429061318.log.1_12603-1679-22268308
   
.f745a448-31fa-4a95-a3c4-5f88f6c95bb2-0_20210515000505.log.2_11136-1920-26157888
   
.f745a448-31fa-4a95-a3c4-5f88f6c95bb2-0_20210515000505.log.1_12428-158-1954195
   ...
   ```
   
   Listing shows 1,539 files under that partition.
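   
   To pin down exactly which files the metadata table is missing, the two 
listings can simply be diffed; a small illustrative Python sketch (not a Hudi 
CLI command):
   
   ```python
   def missing_from_metadata(metadata_files, listed_files):
       """Files present in a direct storage listing but absent from the
       metadata table's listing for the same partition."""
       return sorted(set(listed_files) - set(metadata_files))

   metadata_listing = {"a.parquet", "b.log.1"}   # from `metadata list-files`
   direct_listing = {"a.parquet", "b.log.1", "c.log.2"}  # from a raw listing
   print(missing_from_metadata(metadata_listing, direct_listing))  # ['c.log.2']
   ```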






[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-06-02 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356054#comment-17356054
 ] 

Vinoth Chandar commented on HUDI-1138:
--

[~guoyihua] Your approach looks good. cc [~shivnarayan] who can help you out if 
you are blocked.

> Re-implement marker files via timeline server
> -
>
> Key: HUDI-1138
> URL: https://issues.apache.org/jira/browse/HUDI-1138
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Even as you can argue that RFC-15/consolidated metadata, removes the need for 
> deleting partial files written due to spark task failures/stage retries. It 
> will still leave extra files inside the table (and users will pay for it 
> every month) and we need the marker mechanism to be able to delete these 
> partial files. 
> Here we explore if we can improve the current marker file mechanism, that 
> creates one marker file per data file written, by 
> Delegating the createMarker() call to the driver/timeline server, and have it 
> create marker metadata into a single file handle, that is flushed for 
> durability guarantees
>  
> P.S: I was tempted to think Spark listener mechanism can help us deal with 
> failed tasks, but it has no guarantees. the writer job could die without 
> deleting a partial file. i.e it can improve things, but cant provide 
> guarantees 
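
The proposal above — one consolidated `MARKERS` file written by a single process
instead of one marker file per data file — can be sketched roughly as follows.
Names and structure are assumptions for illustration, not Hudi's actual
timeline-server implementation:

```python
import os
import tempfile
import threading

class MarkerRegistry:
    """Hypothetical single-writer marker log: every create_marker() request
    is serialized through one lock and appended to a single MARKERS file,
    then flushed to disk before acknowledging, for durability."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()

    def create_marker(self, marker_name):
        with self.lock:
            with open(self.path, "a") as f:
                f.write(marker_name + "\n")
                f.flush()
                os.fsync(f.fileno())  # durable before the ack goes back

    def all_markers(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [line.strip() for line in f if line.strip()]

path = os.path.join(tempfile.mkdtemp(), "MARKERS")
reg = MarkerRegistry(path)
reg.create_marker("part=0/file1.parquet.marker.CREATE")
reg.create_marker("part=0/file2.parquet.marker.CREATE")
print(len(reg.all_markers()))  # 2
```

Rollback can then read this single file to find all partially written data
files, instead of listing a marker directory.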





[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-06-02 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356053#comment-17356053
 ] 

Vinoth Chandar commented on HUDI-1138:
--

We cannot, right? It fundamentally relies on the timeline server as the single 
writer for the `MARKERS` file.






[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-06-02 Thread Nishith Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355933#comment-17355933
 ] 

Nishith Agarwal commented on HUDI-1138:
---

[~guoyihua] Thanks for the explanation. Is there a way to decouple this marker 
file implementation from the timeline service?

cc [~vinoth]






[GitHub] [hudi] jtmzheng commented on issue #2470: [SUPPORT] Heavy skew in ListingBasedRollbackHelper

2021-06-02 Thread GitBox


jtmzheng commented on issue #2470:
URL: https://github.com/apache/hudi/issues/2470#issuecomment-853177797


   Unfortunately no, ran into https://github.com/apache/hudi/issues/2995
   
   I think this issue is fine to close out, 
https://github.com/apache/hudi/issues/2470#issuecomment-769948718 got our 
rollback performance to an acceptable state even without the metadata table






[GitHub] [hudi] codecov-commenter edited a comment on pull request #2994: [HUDI-1931] BucketAssignFunction use wrong state

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #2994:
URL: https://github.com/apache/hudi/pull/2994#issuecomment-848024173


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2994?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2994](https://codecov.io/gh/apache/hudi/pull/2994?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8eef75f) into 
[master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (05a9830) will **increase** coverage by `7.91%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2994/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2994?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2994      +/-   ##
   ============================================
   + Coverage     55.13%   63.04%   +7.91%
   + Complexity     3864      346    -3518
   ============================================
     Files           487       54     -433
     Lines         23608     2016   -21592
     Branches       2527      241    -2286
   ============================================
   - Hits          13016     1271   -11745
   + Misses         9437      621    -8816
   + Partials       1155      124    -1031
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `63.04% <ø> (-7.84%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2994?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/2994/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2994/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
   | 
[...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2994/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh)
 | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
   | 
[...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/2994/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh)
 | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2994/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `40.69% <0.00%> (-23.84%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #3020:
URL: https://github.com/apache/hudi/pull/3020#issuecomment-851899487










[GitHub] [hudi] codecov-commenter edited a comment on pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #3020:
URL: https://github.com/apache/hudi/pull/3020#issuecomment-851899487


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3020?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3020](https://codecov.io/gh/apache/hudi/pull/3020?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (24ec6b0) into 
[master](https://codecov.io/gh/apache/hudi/commit/e6a71ea544f3dd1ab5227b4c89a1540df5f7891a?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e6a71ea) will **decrease** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3020/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3020?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3020      +/-   ##
   ============================================
   - Coverage     55.14%   55.13%   -0.01%
   - Complexity     3850     3865      +15
   ============================================
     Files           485      487       +2
     Lines        23542    23608      +66
     Branches      2522     2527       +5
   ============================================
   + Hits         12982    13017      +35
   - Misses        9406     9435      +29
   - Partials      1154     1156       +2
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.55% <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `50.31% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `63.34% <ø> (-0.30%)` | :arrow_down: |
   | hudihadoopmr | `51.54% <ø> (ø)` | |
   | hudisparkdatasource | `74.28% <ø> (ø)` | |
   | hudisync | `46.44% <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | |
   | hudiutilities | `70.88% <ø> (+0.04%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3020?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/3020/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh)
 | `50.00% <ø> (ø)` | |
   | 
[...g/apache/hudi/sink/partitioner/BucketAssigner.java](https://codecov.io/gh/apache/hudi/pull/3020/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbmVyLmphdmE=)
 | `82.29% <0.00%> (-6.24%)` | :arrow_down: |
   | 
[...org/apache/hudi/common/config/TypedProperties.java](https://codecov.io/gh/apache/hudi/pull/3020/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9UeXBlZFByb3BlcnRpZXMuamF2YQ==)
 | `87.09% <0.00%> (-2.56%)` | :arrow_down: |
   | 
[...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/3020/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=)
 | `76.41% <0.00%> (-1.47%)` | :arrow_down: |
   | 
[.../org/apache/hudi/streamer/HoodieFlinkStreamer.java](https://codecov.io/gh/apache/hudi/pull/3020/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zdHJlYW1lci9Ib29kaWVGbGlua1N0cmVhbWVyLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   | 

[jira] [Updated] (HUDI-1941) Fix callers of HoodieRecordPayload.preCombine() to use new api with props arg

2021-06-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1941:
--
Labels: sev:high  (was: sev:critical)

> Fix callers of HoodieRecordPayload.preCombine() to use new api with props arg
> -
>
> Key: HUDI-1941
> URL: https://issues.apache.org/jira/browse/HUDI-1941
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: sev:high
>
> We deprecated old api for preCombine and introduced new one. But haven't 
> fixed the callers to use the new api. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-1719) hive on spark/mr,Incremental query of the mor table, the partition field is incorrect

2021-06-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-1719:
---

> hive on spark/mr,Incremental query of the mor table, the partition field is 
> incorrect
> -
>
> Key: HUDI-1719
> URL: https://issues.apache.org/jira/browse/HUDI-1719
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.7.0, 0.8.0
> Environment: spark2.4.5, hadoop 3.1.1, hive 3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> Hudi currently uses HoodieCombineHiveInputFormat for incremental queries of 
> the MOR table.
> When there are small files in different partitions, 
> HoodieCombineHiveInputFormat combines those small-file readers. It builds the 
> partition field based on its first file reader, even though the other readers 
> it holds may come from different partitions.
> When switching readers, we should update the ioctx.
> test env:
> spark2.4.5, hadoop 3.1.1, hive 3.1.1
> test step:
> step1:
> val df = spark.range(0, 1).toDF("keyid")
>  .withColumn("col3", expr("keyid + 1000"))
>  .withColumn("p", lit(0))
>  .withColumn("p1", lit(0))
>  .withColumn("p2", lit(6))
>  .withColumn("a1", lit(Array[String]("sb1", "rz")))
>  .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // create hudi table which has three  level partitions p,p1,p2
> merge(df, 4, "default", "hive_8b", 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
>  
> step2:
> val df = spark.range(0, 1).toDF("keyid")
>  .withColumn("col3", expr("keyid + 1000"))
>  .withColumn("p", lit(0))
>  .withColumn("p1", lit(0))
>  .withColumn("p2", lit(7))
>  .withColumn("a1", lit(Array[String]("sb1", "rz")))
>  .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // upsert current table
> merge(df, 4, "default", "hive_8b", 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
> hive beeline:
> set 
> hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
> set hoodie.hive_8b.consume.mode=INCREMENTAL;
> set hoodie.hive_8b.consume.max.commits=3;
> set hoodie.hive_8b.consume.start.timestamp=20210325141300; // this timestamp 
> is smaller than the earliest commit, so we can query all commits
> select `p`, `p1`, `p2`,`keyid` from hive_8b_rt where 
> `_hoodie_commit_time`>'20210325141300' and `keyid` < 5;
> query result:
> +---+----+----+-------+
> | p | p1 | p2 | keyid |
> +---+----+----+-------+
> | 0 | 0  | 6  | 0     |
> | 0 | 0  | 6  | 1     |
> | 0 | 0  | 6  | 2     |
> | 0 | 0  | 6  | 3     |
> | 0 | 0  | 6  | 4     |
> | 0 | 0  | 6  | 4     |
> | 0 | 0  | 6  | 0     |
> | 0 | 0  | 6  | 3     |
> | 0 | 0  | 6  | 2     |
> | 0 | 0  | 6  | 1     |
> +---+----+----+-------+
> This result is wrong: in the second step we inserted new data with p2=7, yet 
> the query result contains no rows with p2=7; every row has p2=6.
>  
>  
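
The root cause can be reduced to a small sketch (illustrative Python with
hypothetical names, not Hudi's code): a combined reader must re-derive the
partition values each time it switches to the next underlying file reader,
instead of reusing the values of the first reader.

```python
def partition_of(path):
    """Derive partition values from a path like 'p=0/p1=0/p2=7/f.parquet'."""
    return tuple(path.split("/")[:-1])

def read_combined(file_paths):
    rows = []
    for path in file_paths:             # each iteration "switches" readers
        partition = partition_of(path)  # refresh per file: this is the fix
        rows.append(partition)
    return rows

paths = ["p=0/p1=0/p2=6/a.parquet", "p=0/p1=0/p2=7/b.parquet"]
print(read_combined(paths))
# the buggy behavior would have reported p2=6 for both files
```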





[jira] [Resolved] (HUDI-1719) hive on spark/mr,Incremental query of the mor table, the partition field is incorrect

2021-06-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1719.
---
Resolution: Fixed






[jira] [Updated] (HUDI-1719) hive on spark/mr,Incremental query of the mor table, the partition field is incorrect

2021-06-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1719:
--
Status: Closed  (was: Patch Available)

> hive on spark/mr,Incremental query of the mor table, the partition field is 
> incorrect
> -
>
> Key: HUDI-1719
> URL: https://issues.apache.org/jira/browse/HUDI-1719
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.7.0, 0.8.0
> Environment: spark2.4.5, hadoop 3.1.1, hive 3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> Hudi currently uses HoodieCombineHiveInputFormat to implement incremental 
> queries on the MOR table.
> When we have some small files in different partitions, 
> HoodieCombineHiveInputFormat will combine those small-file readers. 
> HoodieCombineHiveInputFormat builds the partition field based on the first 
> file reader it holds; however, HoodieCombineHiveInputFormat also holds other 
> file readers that come from different partitions.
> When switching readers, we should update the ioctx.
> test env:
> spark2.4.5, hadoop 3.1.1, hive 3.1.1
> test step:
> step1:
> val df = spark.range(0, 1).toDF("keyid")
>  .withColumn("col3", expr("keyid + 1000"))
>  .withColumn("p", lit(0))
>  .withColumn("p1", lit(0))
>  .withColumn("p2", lit(6))
>  .withColumn("a1", lit(Array[String]("sb1", "rz")))
>  .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // create hudi table which has three  level partitions p,p1,p2
> merge(df, 4, "default", "hive_8b", 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
>  
> step2:
> val df = spark.range(0, 1).toDF("keyid")
>  .withColumn("col3", expr("keyid + 1000"))
>  .withColumn("p", lit(0))
>  .withColumn("p1", lit(0))
>  .withColumn("p2", lit(7))
>  .withColumn("a1", lit(Array[String]("sb1", "rz")))
>  .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // upsert current table
> merge(df, 4, "default", "hive_8b", 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
> hive beeline:
> set 
> hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
> set hoodie.hive_8b.consume.mode=INCREMENTAL;
> set hoodie.hive_8b.consume.max.commits=3;
> set hoodie.hive_8b.consume.start.timestamp=20210325141300; // this timestamp 
> is smaller than the earliest commit, so we can query all commits
> select `p`, `p1`, `p2`,`keyid` from hive_8b_rt where 
> `_hoodie_commit_time`>'20210325141300' and `keyid` < 5;
> query result:
> +---+----+----+-------+
> | p | p1 | p2 | keyid |
> +---+----+----+-------+
> | 0 | 0  | 6  | 0     |
> | 0 | 0  | 6  | 1     |
> | 0 | 0  | 6  | 2     |
> | 0 | 0  | 6  | 3     |
> | 0 | 0  | 6  | 4     |
> | 0 | 0  | 6  | 4     |
> | 0 | 0  | 6  | 0     |
> | 0 | 0  | 6  | 3     |
> | 0 | 0  | 6  | 2     |
> | 0 | 0  | 6  | 1     |
> +---+----+----+-------+
> This result is wrong: in the second step we inserted new data with p2=7, but 
> the query result contains no rows where p2=7; every row has p2=6.
>  
>  
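The reader-switch bug described above can be illustrated with a minimal, self-contained sketch (plain Java with hypothetical names — this is not Hudi's actual HoodieCombineHiveInputFormat code): a combined reader must refresh its per-reader partition context every time it advances to the next delegate reader, instead of caching the value taken from the first reader.

```java
import java.util.Iterator;
import java.util.List;

// Minimal model of one file reader bound to a partition value (e.g. "p2=6").
class PartitionedReader {
    final String partitionValue;
    private final Iterator<String> rows;

    PartitionedReader(String partitionValue, List<String> rows) {
        this.partitionValue = partitionValue;
        this.rows = rows.iterator();
    }

    boolean hasNext() { return rows.hasNext(); }
    String next() { return rows.next(); }
}

// Combines several readers; the fix is to update the partition context
// (the "ioctx" in the issue) whenever we switch to the next delegate reader.
class CombinedReader {
    private final Iterator<PartitionedReader> readers;
    private PartitionedReader current;
    private String currentPartition;  // refreshed on every reader switch

    CombinedReader(List<PartitionedReader> readerList) {
        this.readers = readerList.iterator();
        advance();
    }

    // Move to the next non-empty reader, updating the partition context
    // rather than keeping the first reader's value.
    private void advance() {
        while ((current == null || !current.hasNext()) && readers.hasNext()) {
            current = readers.next();
            currentPartition = current.partitionValue;
        }
    }

    /** Returns "row@partition", or null when all readers are exhausted. */
    String next() {
        if (current == null || !current.hasNext()) {
            return null;
        }
        String row = current.next() + "@" + currentPartition;
        advance();
        return row;
    }
}
```

Without the `currentPartition` update inside `advance()`, every row would be reported against the first reader's partition — exactly the behavior in the query result above, where rows written with p2=7 come back labeled p2=6.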



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-1800) Incorrect HoodieTableFileSystem API usage for pending slices causing issues

2021-06-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-1800:
---

> Incorrect HoodieTableFileSystem API usage for pending slices causing issues
> ---
>
> Key: HUDI-1800
> URL: https://issues.apache.org/jira/browse/HUDI-1800
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Nishith Agarwal
>Assignee: Ryan Pifer
>Priority: Major
>  Labels: pull-request-available, sev:critical
>
> From [~vbalaji]
>  
> We are using the wrong API of FileSystemView here:
> [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85]
> We should not include file groups that are in pending compaction, but with 
> the HBase index we are including them. With the current state of the code, 
> including files in pending compaction is an issue.
> The API "getLatestFileSlicesBeforeOrOn" is originally intended to be used by 
> CompactionAdminClient to figure out log files that were added after pending 
> compaction and rename them so that we can undo the effects of compaction 
> scheduling. There is a different API, "getLatestMergedFileSlicesBeforeOrOn", 
> which gives a consolidated view of the latest file slice and includes all 
> data both before and after compaction. This is what should be used in
> [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85]
> The other workaround would be to exclude file slices in pending compaction 
> when we select small files, avoiding the interaction between the compactor 
> and ingestion in this case. But I think we can go with the first option.
>  
> More details can be found here -> https://github.com/apache/hudi/issues/2633
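The second workaround mentioned above can be sketched with a toy model (plain Java, hypothetical names — this is not Hudi's FileSystemView or partitioner API): when picking small files for ingestion to expand, simply skip any file slice whose file group has a compaction pending, so ingestion and the compactor never touch the same file group.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a file slice: its base-file size and whether its file group
// currently has a compaction scheduled (pending).
class Slice {
    final String fileId;
    final long baseFileBytes;
    final boolean pendingCompaction;

    Slice(String fileId, long baseFileBytes, boolean pendingCompaction) {
        this.fileId = fileId;
        this.baseFileBytes = baseFileBytes;
        this.pendingCompaction = pendingCompaction;
    }
}

class SmallFilePicker {
    // Workaround described in the issue: skip slices in pending compaction so
    // ingestion never races with the compactor on the same file group.
    static List<String> pickSmallFiles(List<Slice> slices, long smallFileLimit) {
        List<String> picked = new ArrayList<>();
        for (Slice s : slices) {
            if (s.pendingCompaction) {
                continue;  // avoid compactor/ingestion interaction
            }
            if (s.baseFileBytes < smallFileLimit) {
                picked.add(s.fileId);
            }
        }
        return picked;
    }
}
```

The first option (using the merged-slice view) is preferred in the issue because it still lets ingestion see a consolidated latest slice; this sketch only illustrates the exclusion fallback.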



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1800) Incorrect HoodieTableFileSystem API usage for pending slices causing issues

2021-06-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1800.
---
Fix Version/s: 0.9.0
   Resolution: Fixed

> Incorrect HoodieTableFileSystem API usage for pending slices causing issues
> ---
>
> Key: HUDI-1800
> URL: https://issues.apache.org/jira/browse/HUDI-1800
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Nishith Agarwal
>Assignee: Ryan Pifer
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.9.0
>
>
> From [~vbalaji]
>  
> We are using the wrong API of FileSystemView here:
> [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85]
> We should not include file groups that are in pending compaction, but with 
> the HBase index we are including them. With the current state of the code, 
> including files in pending compaction is an issue.
> The API "getLatestFileSlicesBeforeOrOn" is originally intended to be used by 
> CompactionAdminClient to figure out log files that were added after pending 
> compaction and rename them so that we can undo the effects of compaction 
> scheduling. There is a different API, "getLatestMergedFileSlicesBeforeOrOn", 
> which gives a consolidated view of the latest file slice and includes all 
> data both before and after compaction. This is what should be used in
> [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85]
> The other workaround would be to exclude file slices in pending compaction 
> when we select small files, avoiding the interaction between the compactor 
> and ingestion in this case. But I think we can go with the first option.
>  
> More details can be found here -> https://github.com/apache/hudi/issues/2633



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1800) Incorrect HoodieTableFileSystem API usage for pending slices causing issues

2021-06-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1800:
--
Status: Closed  (was: Patch Available)

> Incorrect HoodieTableFileSystem API usage for pending slices causing issues
> ---
>
> Key: HUDI-1800
> URL: https://issues.apache.org/jira/browse/HUDI-1800
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Nishith Agarwal
>Assignee: Ryan Pifer
>Priority: Major
>  Labels: pull-request-available, sev:critical
>
> From [~vbalaji]
>  
> We are using the wrong API of FileSystemView here:
> [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85]
> We should not include file groups that are in pending compaction, but with 
> the HBase index we are including them. With the current state of the code, 
> including files in pending compaction is an issue.
> The API "getLatestFileSlicesBeforeOrOn" is originally intended to be used by 
> CompactionAdminClient to figure out log files that were added after pending 
> compaction and rename them so that we can undo the effects of compaction 
> scheduling. There is a different API, "getLatestMergedFileSlicesBeforeOrOn", 
> which gives a consolidated view of the latest file slice and includes all 
> data both before and after compaction. This is what should be used in
> [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85]
> The other workaround would be to exclude file slices in pending compaction 
> when we select small files, avoiding the interaction between the compactor 
> and ingestion in this case. But I think we can go with the first option.
>  
> More details can be found here -> https://github.com/apache/hudi/issues/2633



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter edited a comment on pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #3020:
URL: https://github.com/apache/hudi/pull/3020#issuecomment-851899487


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3020?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3020](https://codecov.io/gh/apache/hudi/pull/3020?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (24ec6b0) into 
[master](https://codecov.io/gh/apache/hudi/commit/e6a71ea544f3dd1ab5227b4c89a1540df5f7891a?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e6a71ea) will **decrease** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3020/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3020?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#3020  +/-   ##
   
   - Coverage 55.14%   55.13%   -0.01% 
   - Complexity 3850 3865  +15 
   
 Files   485  487   +2 
 Lines 2354223608  +66 
 Branches   2522 2527   +5 
   
   + Hits  1298213017  +35 
   - Misses 9406 9435  +29 
   - Partials   1154 1156   +2 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.55% <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `50.31% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `63.34% <ø> (-0.30%)` | :arrow_down: |
   | hudihadoopmr | `51.54% <ø> (ø)` | |
   | hudisparkdatasource | `74.28% <ø> (ø)` | |
   | hudisync | `46.44% <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | |
   | hudiutilities | `70.88% <ø> (+0.04%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3020?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/3020/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh)
 | `50.00% <ø> (ø)` | |
   | 
[...g/apache/hudi/sink/partitioner/BucketAssigner.java](https://codecov.io/gh/apache/hudi/pull/3020/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbmVyLmphdmE=)
 | `82.29% <0.00%> (-6.24%)` | :arrow_down: |
   | 
[...org/apache/hudi/common/config/TypedProperties.java](https://codecov.io/gh/apache/hudi/pull/3020/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9UeXBlZFByb3BlcnRpZXMuamF2YQ==)
 | `87.09% <0.00%> (-2.56%)` | :arrow_down: |
   | 
[...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/3020/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=)
 | `76.41% <0.00%> (-1.47%)` | :arrow_down: |
   | 
[.../org/apache/hudi/streamer/HoodieFlinkStreamer.java](https://codecov.io/gh/apache/hudi/pull/3020/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zdHJlYW1lci9Ib29kaWVGbGlua1N0cmVhbWVyLmphdmE=)
 | `0.00% <0.00%> (ø)` | |

[GitHub] [hudi] yanghua commented on pull request #2994: [HUDI-1931] BucketAssignFunction use wrong state

2021-06-02 Thread GitBox


yanghua commented on pull request #2994:
URL: https://github.com/apache/hudi/pull/2994#issuecomment-853049109


   Busy recently; will review it tomorrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua closed pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized

2021-06-02 Thread GitBox


yanghua closed pull request #3020:
URL: https://github.com/apache/hudi/pull/3020


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized

2021-06-02 Thread GitBox


yanghua commented on pull request #3020:
URL: https://github.com/apache/hudi/pull/3020#issuecomment-853047795


   > @yanghua hi, I have modified the code to prevent it from failing due to 
code style verification, but the CI still fails to build the code. It should 
not be caused by my modification. Can you help me look into it? What should I do?
   
   Yes, I checked; it's not due to your change. But making sure the CI succeeds 
before merging is good practice.
   
   I will take it over and merge it soon. You can move your focus to other things.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-02 Thread GitBox


nsivabalan commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r643976142



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.SqlQueryBuilder;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+import org.jetbrains.annotations.NotNull;
+
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Reads data from RDBMS data sources.
+ */
+
+public class JdbcSource extends RowSource {
+
+  private static final Logger LOG = LogManager.getLogger(JdbcSource.class);
+  private static final List<String> DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2");
+  private static final String URI_JDBC_PREFIX = "jdbc:";
+
+  public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, 
SparkSession sparkSession,
+SchemaProvider schemaProvider) {
+super(props, sparkContext, sparkSession, schemaProvider);
+  }
+
+  /**
+   * Validates all user properties and prepares the {@link DataFrameReader} to 
read from RDBMS.
+   *
+   * @param sessionThe {@link SparkSession}.
+   * @param properties The JDBC connection properties and data source options.
+   * @return The {@link DataFrameReader} to read from RDBMS
+   * @throws HoodieException
+   */
+  private static DataFrameReader validatePropsAndGetDataFrameReader(final 
SparkSession session,
+final 
TypedProperties properties)
+  throws HoodieException {
+DataFrameReader dataFrameReader;
+FSDataInputStream passwordFileStream = null;
+try {
+  dataFrameReader = session.read().format("jdbc");
+  dataFrameReader = dataFrameReader.option(Config.URL_PROP, 
properties.getString(Config.URL));
+  dataFrameReader = dataFrameReader.option(Config.USER_PROP, 
properties.getString(Config.USER));
+  dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, 
properties.getString(Config.DRIVER_CLASS));
+  dataFrameReader = dataFrameReader
+  .option(Config.RDBMS_TABLE_PROP, 
properties.getString(Config.RDBMS_TABLE_NAME));
+
+  if (properties.containsKey(Config.PASSWORD)) {
+LOG.info("Reading JDBC password from properties file");
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, 
properties.getString(Config.PASSWORD));
+  } else if (properties.containsKey(Config.PASSWORD_FILE)
+  && 
!StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) {
+LOG.info(String.format("Reading JDBC password from password file %s", 
properties.getString(Config.PASSWORD_FILE)));
+FileSystem fileSystem = 
FileSystem.get(session.sparkContext().hadoopConfiguration());
+passwordFileStream = fileSystem.open(new 
Path(properties.getString(Config.PASSWORD_FILE)));
+byte[] bytes = new byte[passwordFileStream.available()];
+passwordFileStream.read(bytes);
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new 
String(bytes));
+  } else {
+

[GitHub] [hudi] wangxianghu commented on a change in pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-06-02 Thread GitBox


wangxianghu commented on a change in pull request #2993:
URL: https://github.com/apache/hudi/pull/2993#discussion_r643974052



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestCustomKeyGenerator.java
##
@@ -220,5 +425,21 @@ public void 
testComplexRecordKeysWithComplexPartitionPath() {
 Row row = KeyGeneratorTestUtilities.getRow(record);
 Assertions.assertEquals(keyGenerator.getRecordKey(row), 
"_row_key:key1,pii_col:pi");
 Assertions.assertEquals(keyGenerator.getPartitionPath(row), 
"timestamp=4357686/ts_ms=20200321");
+
+// Test config with HoodieWriteConfig.KEYGENERATOR_TYPE_PROP
+complexRecordKeyAndPartitionPathProps = 
getComplexRecordKeyAndPartitionPathProps();
+complexRecordKeyAndPartitionPathProps
+.put(HoodieWriteConfig.KEYGENERATOR_TYPE_PROP, 
KeyGeneratorType.CUSTOM.name());
+
+keyGenerator = 
HoodieSparkKeyGeneratorFactory.createKeyGenerator(complexRecordKeyAndPartitionPathProps);
+
+record = getRecord();

Review comment:
   > would be nice to avoid copy pasting code. Let's try to parametrize 
tests.
   
   sure 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] wangxianghu commented on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-06-02 Thread GitBox


wangxianghu commented on pull request #2993:
URL: https://github.com/apache/hudi/pull/2993#issuecomment-853037788


   > May I know what do you mean by "Yes, several places. I will change them 
one by one" ? Do you mean to say, you plan to address in a follow up patch ? 
IMO, it makes sense to do it in this patch itself.
   
   OK, I'll update this PR to change all of them at once.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-02 Thread GitBox


nsivabalan commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r643894403



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.SqlQueryBuilder;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Reads data from RDBMS data sources.
+ */
+
+public class JdbcSource extends RowSource {
+
+  private static final Logger LOG = LogManager.getLogger(JdbcSource.class);
+  private static final List<String> DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2");
+  private static final String URI_JDBC_PREFIX = "jdbc:";
+
+  public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, 
SparkSession sparkSession,
+SchemaProvider schemaProvider) {
+super(props, sparkContext, sparkSession, schemaProvider);
+  }
+
+  /**
+   * Validates all user properties and prepares the {@link DataFrameReader} to 
read from RDBMS.
+   *
+   * @param sessionThe {@link SparkSession}.
+   * @param properties The JDBC connection properties and data source options.
+   * @return The {@link DataFrameReader} to read from RDBMS
+   * @throws HoodieException
+   */
+  private static DataFrameReader validatePropsAndGetDataFrameReader(final 
SparkSession session,
+final 
TypedProperties properties)
+  throws HoodieException {
+DataFrameReader dataFrameReader;
+FSDataInputStream passwordFileStream = null;
+try {
+  dataFrameReader = session.read().format("jdbc");
+  dataFrameReader = dataFrameReader.option(Config.URL_PROP, 
properties.getString(Config.URL));
+  dataFrameReader = dataFrameReader.option(Config.USER_PROP, 
properties.getString(Config.USER));
+  dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, 
properties.getString(Config.DRIVER_CLASS));
+  dataFrameReader = dataFrameReader
+  .option(Config.RDBMS_TABLE_PROP, 
properties.getString(Config.RDBMS_TABLE_NAME));
+
+  if (properties.containsKey(Config.PASSWORD)) {
+LOG.info("Reading JDBC password from properties file");
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, 
properties.getString(Config.PASSWORD));
+  } else if (properties.containsKey(Config.PASSWORD_FILE)
+  && 
!StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) {
+LOG.info(String.format("Reading JDBC password from password file %s", 
properties.getString(Config.PASSWORD_FILE)));
+FileSystem fileSystem = 
FileSystem.get(session.sparkContext().hadoopConfiguration());
+passwordFileStream = fileSystem.open(new 
Path(properties.getString(Config.PASSWORD_FILE)));
+byte[] bytes = new byte[passwordFileStream.available()];
+passwordFileStream.read(bytes);
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new 
String(bytes));
+  } else {
+throw new 

[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-06-02 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-852970149


   I am currently facing a problem and would like to hear your opinion.
   After we add this type, 
hoodie.deltastreamer.source.kafka.checkpoint.type=timestamp, should 
deltastreamer.checkpoint.key keep the status quo? That is, the format stays: 
topicName,0:123,1:456.
   If we keep that format, then when we specify, for example, --checkpoint 
1622635064, we need to determine the relationship between 
commitMetadata.getMetadata(CHECKPOINT_KEY) and --checkpoint 1622635064 in 
org.apache.hudi.utilities.deltastreamer.DeltaSync#readFromSource. This seems 
contrary to the result of our discussion: do not add Kafka-dependent code 
in DeltaSync.
   
   Do you have any suggestions for this? thanks 
   @nsivabalan 
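   The checkpoint format mentioned above (topicName,0:123,1:456) can be parsed 
without any Kafka dependency, which is one way to keep DeltaSync free of 
Kafka-specific code: a small helper (hypothetical, not part of Hudi) turns the 
stored string into per-partition offsets, so the timestamp-vs-offset comparison 
can happen inside the Kafka source rather than in DeltaSync.

```java
import java.util.HashMap;
import java.util.Map;

class KafkaCheckpoint {
    // Parses "topicName,0:123,1:456" into {0 -> 123L, 1 -> 456L}.
    static Map<Integer, Long> parse(String checkpoint) {
        String[] parts = checkpoint.split(",");
        Map<Integer, Long> offsets = new HashMap<>();
        for (int i = 1; i < parts.length; i++) {  // parts[0] is the topic name
            String[] kv = parts[i].split(":");
            offsets.put(Integer.parseInt(kv[0]), Long.parseLong(kv[1]));
        }
        return offsets;
    }

    // Extracts the topic name, i.e. everything before the first comma.
    static String topic(String checkpoint) {
        return checkpoint.substring(0, checkpoint.indexOf(','));
    }
}
```

   With a helper like this, a raw timestamp checkpoint (e.g. 1622635064) can be 
distinguished from the offset-map format by whether it parses as a single 
number, and the conversion to offsets can stay in the source implementation.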


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-02 Thread GitBox


nsivabalan commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r643892144



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.SqlQueryBuilder;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+import org.jetbrains.annotations.NotNull;
+
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Reads data from RDBMS data sources.
+ */
+
+public class JdbcSource extends RowSource {
+
+  private static final Logger LOG = LogManager.getLogger(JdbcSource.class);
+  private static final List<String> DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2");
+  private static final String URI_JDBC_PREFIX = "jdbc:";
+
+  public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, 
SparkSession sparkSession,
+SchemaProvider schemaProvider) {
+super(props, sparkContext, sparkSession, schemaProvider);
+  }
+
+  /**
+   * Validates all user properties and prepares the {@link DataFrameReader} to 
read from RDBMS.
+   *
+   * @param sessionThe {@link SparkSession}.
+   * @param properties The JDBC connection properties and data source options.
+   * @return The {@link DataFrameReader} to read from RDBMS
+   * @throws HoodieException
+   */
+  private static DataFrameReader validatePropsAndGetDataFrameReader(final 
SparkSession session,
+final 
TypedProperties properties)
+  throws HoodieException {
+DataFrameReader dataFrameReader;
+FSDataInputStream passwordFileStream = null;
+try {
+  dataFrameReader = session.read().format("jdbc");
+  dataFrameReader = dataFrameReader.option(Config.URL_PROP, 
properties.getString(Config.URL));
+  dataFrameReader = dataFrameReader.option(Config.USER_PROP, 
properties.getString(Config.USER));
+  dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, 
properties.getString(Config.DRIVER_CLASS));
+  dataFrameReader = dataFrameReader
+  .option(Config.RDBMS_TABLE_PROP, 
properties.getString(Config.RDBMS_TABLE_NAME));
+
+  if (properties.containsKey(Config.PASSWORD)) {
+LOG.info("Reading JDBC password from properties file");
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, 
properties.getString(Config.PASSWORD));
+  } else if (properties.containsKey(Config.PASSWORD_FILE)
+  && 
!StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) {
+LOG.info(String.format("Reading JDBC password from password file %s", 
properties.getString(Config.PASSWORD_FILE)));
+FileSystem fileSystem = 
FileSystem.get(session.sparkContext().hadoopConfiguration());
+passwordFileStream = fileSystem.open(new 
Path(properties.getString(Config.PASSWORD_FILE)));
+byte[] bytes = new byte[passwordFileStream.available()];
+passwordFileStream.read(bytes);
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, new 
String(bytes));
+  } else {
+
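A side note on the quoted snippet: `InputStream.available()` only reports how many bytes can be read without blocking, not the total file length, so reading the password file this way can silently truncate. A stdlib-only sketch of a robust full read (plain `InputStream`; the Hadoop `FSDataInputStream` used in the PR extends it, and the class name here is hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: read an entire (small) stream to a byte array without
// relying on InputStream.available().
public class FullRead {

  public static byte[] readFully(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    int n;
    // Loop until end-of-stream instead of trusting available().
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n); // append exactly the bytes read this round
    }
    return out.toByteArray();
  }

  // Convenience wrapper returning the content as a String.
  public static String readAll(InputStream in) {
    try {
      return new String(readFully(in));
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}
```

On JDK 9+, `InputStream.readAllBytes()` achieves the same thing in one call.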

[GitHub] [hudi] nsivabalan commented on a change in pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer

2021-06-02 Thread GitBox


nsivabalan commented on a change in pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#discussion_r643884770



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -1591,6 +1596,45 @@ public void 
testCsvDFSSourceNoHeaderWithSchemaProviderAndTransformer() throws Ex
 testCsvDFSSource(false, '\t', true, 
Collections.singletonList(TripsWithDistanceTransformer.class.getName()));
   }
 
+  @Test
+  public void testIncrementalFetchInContinuousMode() {

Review comment:
   minor. rename to "testJDBCSourceIncremental." 

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.SqlQueryBuilder;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.storage.StorageLevel;
+
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Reads data from RDBMS data sources.
+ */
+
+public class JdbcSource extends RowSource {
+
+  private static final Logger LOG = LogManager.getLogger(JdbcSource.class);
+  private static final List<String> DB_LIMIT_CLAUSE = Arrays.asList("mysql", "postgresql", "h2");
+  private static final String URI_JDBC_PREFIX = "jdbc:";
+
+  public JdbcSource(TypedProperties props, JavaSparkContext sparkContext, 
SparkSession sparkSession,
+SchemaProvider schemaProvider) {
+super(props, sparkContext, sparkSession, schemaProvider);
+  }
+
+  /**
+   * Validates all user properties and prepares the {@link DataFrameReader} to 
read from RDBMS.
+   *
+   * @param session    The {@link SparkSession}.
+   * @param properties The JDBC connection properties and data source options.
+   * @return The {@link DataFrameReader} to read from RDBMS
+   * @throws HoodieException
+   */
+  private static DataFrameReader validatePropsAndGetDataFrameReader(final 
SparkSession session,
+final 
TypedProperties properties)
+  throws HoodieException {
+DataFrameReader dataFrameReader;
+FSDataInputStream passwordFileStream = null;
+try {
+  dataFrameReader = session.read().format("jdbc");
+  dataFrameReader = dataFrameReader.option(Config.URL_PROP, 
properties.getString(Config.URL));
+  dataFrameReader = dataFrameReader.option(Config.USER_PROP, 
properties.getString(Config.USER));
+  dataFrameReader = dataFrameReader.option(Config.DRIVER_PROP, 
properties.getString(Config.DRIVER_CLASS));
+  dataFrameReader = dataFrameReader
+  .option(Config.RDBMS_TABLE_PROP, 
properties.getString(Config.RDBMS_TABLE_NAME));
+
+  if (properties.containsKey(Config.PASSWORD)) {
+LOG.info("Reading JDBC password from properties file");
+dataFrameReader = dataFrameReader.option(Config.PASSWORD_PROP, 
properties.getString(Config.PASSWORD));
+  } else if (properties.containsKey(Config.PASSWORD_FILE)
+  && 
!StringUtils.isNullOrEmpty(properties.getString(Config.PASSWORD_FILE))) {
+LOG.info(String.format("Reading JDBC password from password file %s", 

[GitHub] [hudi] nsivabalan commented on a change in pull request #2963: [HUDI-1904] Make SchemaProvider spark free and move it to hudi-client-common

2021-06-02 Thread GitBox


nsivabalan commented on a change in pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#discussion_r643879466



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/schema/SchemaProvider.java
##
@@ -34,18 +32,9 @@
 @PublicAPIClass(maturity = ApiMaturityLevel.STABLE)
 public abstract class SchemaProvider implements Serializable {
 
-  protected TypedProperties config;
+  protected Schema sourceSchema;
 
-  protected JavaSparkContext jssc;
-
-  public SchemaProvider(TypedProperties props) {

Review comment:
   Just now read the other comments. I understand the intent to make it 
agnostic to engines, but it's not going to be easy to make it backwards compatible. 
   
   One more thought: we might need to make the base abstract class generic with 
two type parameters (a config class and an engine context, maybe). But this 
definitely needs more thought. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2963: [HUDI-1904] Make SchemaProvider spark free and move it to hudi-client-common

2021-06-02 Thread GitBox


nsivabalan commented on a change in pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#discussion_r643874947



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/schema/SchemaProvider.java
##
@@ -34,18 +32,9 @@
 @PublicAPIClass(maturity = ApiMaturityLevel.STABLE)
 public abstract class SchemaProvider implements Serializable {
 
-  protected TypedProperties config;
+  protected Schema sourceSchema;
 
-  protected JavaSparkContext jssc;
-
-  public SchemaProvider(TypedProperties props) {

Review comment:
   Can we think about making this backwards compatible? If a user defined 
their own SchemaProvider, super(...) calls may start to fail if we remove these 
2 constructors. If this were a private interface, we could evolve it w/o much 
consideration, but since this is a public API, we have got to be careful. 
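One way to keep old subclasses compiling is to retain the removed constructors as deprecated delegating overloads. A hedged, stdlib-only sketch of that idea (the class names and the engine-context argument below are hypothetical stand-ins, not Hudi code):

```java
import java.io.Serializable;
import java.util.Properties;

// Sketch: an engine-agnostic abstract base that keeps the old
// two-argument constructor shape so existing user subclasses
// still compile after the engine dependency is removed.
public abstract class SchemaProviderSketch implements Serializable {

  protected final Properties config;

  // New engine-agnostic constructor.
  protected SchemaProviderSketch(Properties props) {
    this.config = props;
  }

  // Old signature kept so existing super(props, context) calls still
  // compile; the engine-specific argument is simply no longer stored.
  @Deprecated
  protected SchemaProviderSketch(Properties props, Object engineContext) {
    this(props);
  }

  public String configValue(String key) {
    return config.getProperty(key);
  }
}

// A user subclass written against the old API keeps working unchanged.
class LegacySchemaProvider extends SchemaProviderSketch {
  LegacySchemaProvider(Properties props) {
    super(props, null); // old-style super(...) call
  }
}
```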




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-06-02 Thread GitBox


nsivabalan commented on pull request #2993:
URL: https://github.com/apache/hudi/pull/2993#issuecomment-852948354


   May I know what you mean by "Yes, several places. I will change them one 
by one"? Do you mean that you plan to address it in a follow-up patch? IMO, 
it makes sense to do it in this patch itself. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-06-02 Thread GitBox


nsivabalan commented on a change in pull request #2993:
URL: https://github.com/apache/hudi/pull/2993#discussion_r643861744



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/constant/KeyGeneratorType.java
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen.constant;
+
+/**
+ * Types of {@link org.apache.hudi.keygen.KeyGenerator}.
+ */
+public enum KeyGeneratorType {
+  /**
+   * Simple key generator, which takes names of fields to be used for 
recordKey and partitionPath as configs.
+   */
+  SIMPLE,
+
+  /**
+   * Complex key generator, which takes names of fields to be used for 
recordKey and partitionPath as configs.
+   */
+  COMPLEX,
+
+  /**
+   * Key generator, that relies on timestamps for partitioning field. Still 
picks record key by name.
+   */
+  TIMESTAMP,
+
+  /**
+   * A generic implementation of KeyGenerator where users can configure record 
key as a single field or a combination

Review comment:
   Can we please add an example for the custom key gen type? Users might 
confuse complex with custom. 
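An illustrative sketch of the class-vs-type configuration idea under discussion (the property keys and generator class names below are assumptions for illustration, not the exact Hudi config constants):

```java
import java.util.Properties;

// Sketch: resolve a key generator either from an explicit class name
// or, failing that, from a KeyGeneratorType enum value.
public class KeyGenConfigSketch {

  enum KeyGeneratorType { SIMPLE, COMPLEX, TIMESTAMP, CUSTOM }

  static String resolve(Properties props) {
    // An explicitly configured class always wins.
    String clazz = props.getProperty("keygenerator.class");
    if (clazz != null) {
      return clazz;
    }
    // Otherwise fall back to the type, defaulting to SIMPLE.
    String type = props.getProperty("keygenerator.type", "SIMPLE");
    switch (KeyGeneratorType.valueOf(type.toUpperCase())) {
      case COMPLEX:   return "ComplexKeyGenerator";
      case TIMESTAMP: return "TimestampBasedKeyGenerator";
      case CUSTOM:    return "CustomKeyGenerator";
      default:        return "SimpleKeyGenerator";
    }
  }
}
```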




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-06-02 Thread GitBox


nsivabalan commented on a change in pull request #2993:
URL: https://github.com/apache/hudi/pull/2993#discussion_r643863514



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestCustomKeyGenerator.java
##
@@ -98,53 +103,145 @@ private TypedProperties 
getPropertiesForNonPartitionedKeyGen() {
   }
 
   @Test
-  public void testSimpleKeyGenerator() {
-BuiltinKeyGenerator keyGenerator = new 
CustomKeyGenerator(getPropertiesForSimpleKeyGen());
+  public void testSimpleKeyGenerator() throws IOException {
+TypedProperties propertiesForSimpleKeyGen = getPropertiesForSimpleKeyGen();
+
+// Test config with HoodieWriteConfig.KEYGENERATOR_CLASS_PROP
+propertiesForSimpleKeyGen.put(HoodieWriteConfig.KEYGENERATOR_CLASS_PROP, 
CustomKeyGenerator.class.getName());
+
+BuiltinKeyGenerator keyGenerator = 
HoodieSparkKeyGeneratorFactory.createKeyGenerator(propertiesForSimpleKeyGen);
 GenericRecord record = getRecord();
 HoodieKey key = keyGenerator.getKey(record);
 Assertions.assertEquals(key.getRecordKey(), "key1");
 Assertions.assertEquals(key.getPartitionPath(), "timestamp=4357686");
 Row row = KeyGeneratorTestUtilities.getRow(record);
 Assertions.assertEquals(keyGenerator.getRecordKey(row), "key1");
 Assertions.assertEquals(keyGenerator.getPartitionPath(row), 
"timestamp=4357686");
+
+// Test config with HoodieWriteConfig.KEYGENERATOR_TYPE_PROP
+propertiesForSimpleKeyGen = getPropertiesForSimpleKeyGen();
+propertiesForSimpleKeyGen.put(HoodieWriteConfig.KEYGENERATOR_TYPE_PROP, 
KeyGeneratorType.CUSTOM.name());
+
+keyGenerator = 
HoodieSparkKeyGeneratorFactory.createKeyGenerator(propertiesForSimpleKeyGen);

Review comment:
   we could try moving this repetitive code to a common private method 
and re-use it. I mean lines 125 to 132. Similarly for all key types. 

##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestCustomKeyGenerator.java
##
@@ -98,53 +103,145 @@ private TypedProperties 
getPropertiesForNonPartitionedKeyGen() {
   }
 
   @Test
-  public void testSimpleKeyGenerator() {
-BuiltinKeyGenerator keyGenerator = new 
CustomKeyGenerator(getPropertiesForSimpleKeyGen());
+  public void testSimpleKeyGenerator() throws IOException {
+TypedProperties propertiesForSimpleKeyGen = getPropertiesForSimpleKeyGen();
+
+// Test config with HoodieWriteConfig.KEYGENERATOR_CLASS_PROP
+propertiesForSimpleKeyGen.put(HoodieWriteConfig.KEYGENERATOR_CLASS_PROP, 
CustomKeyGenerator.class.getName());
+
+BuiltinKeyGenerator keyGenerator = 
HoodieSparkKeyGeneratorFactory.createKeyGenerator(propertiesForSimpleKeyGen);
 GenericRecord record = getRecord();
 HoodieKey key = keyGenerator.getKey(record);
 Assertions.assertEquals(key.getRecordKey(), "key1");
 Assertions.assertEquals(key.getPartitionPath(), "timestamp=4357686");
 Row row = KeyGeneratorTestUtilities.getRow(record);
 Assertions.assertEquals(keyGenerator.getRecordKey(row), "key1");
 Assertions.assertEquals(keyGenerator.getPartitionPath(row), 
"timestamp=4357686");
+
+// Test config with HoodieWriteConfig.KEYGENERATOR_TYPE_PROP
+propertiesForSimpleKeyGen = getPropertiesForSimpleKeyGen();
+propertiesForSimpleKeyGen.put(HoodieWriteConfig.KEYGENERATOR_TYPE_PROP, 
KeyGeneratorType.CUSTOM.name());
+
+keyGenerator = 
HoodieSparkKeyGeneratorFactory.createKeyGenerator(propertiesForSimpleKeyGen);

Review comment:
   or a better option would be to go with parameterized tests. 
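The suggestion above can be sketched in plain Java as one data-driven check over all configuration variants. JUnit 5's `@ParameterizedTest` would be the idiomatic form; this stdlib sketch with a hypothetical `keyOf` resolver just shows the shape:

```java
import java.util.List;
import java.util.function.Function;

// Sketch: run the same assertion for every config variant instead of
// copying the assertion block once per key generator type.
public class ParamCheck {

  public static int runAll(List<String> configs,
                           Function<String, String> keyOf,
                           String expectedKey) {
    int passed = 0;
    for (String config : configs) {
      // keyOf stands in for "build a key generator from this config
      // style and extract the record key".
      String actual = keyOf.apply(config);
      if (!expectedKey.equals(actual)) {
        throw new AssertionError(config + " produced " + actual);
      }
      passed++;
    }
    return passed;
  }
}
```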

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/constant/KeyGeneratorType.java
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen.constant;
+
+/**
+ * Types of {@link org.apache.hudi.keygen.KeyGenerator}.
+ */
+public enum KeyGeneratorType {
+  /**
+   * Simple key generator, which takes names of fields to be used for 
recordKey and partitionPath as configs.
+   */
+  SIMPLE,
+
+  /**
+   * Complex key generator, which takes names of fields to be used for 
recordKey and partitionPath as configs.
+   */
+  COMPLEX,
+
+  /**
+ 

[GitHub] [hudi] arun990 commented on issue #3009: Dependency error when attempt to build Hudi from git source ..

2021-06-02 Thread GitBox


arun990 commented on issue #3009:
URL: https://github.com/apache/hudi/issues/3009#issuecomment-852930223


   Hi, I tried to build with mvn using the settings.xml and am getting the same 
error, as pasted below.
   
   [INFO] Building hudi-hadoop-mr 0.8.0 
[6/42]
   [INFO] [ jar 
]-
   [INFO] 

   [INFO] Reactor Summary for Hudi 0.8.0:
   [INFO] 
   [INFO] Hudi ... SUCCESS [02:09 
min]
   [INFO] hudi-common  SUCCESS [01:00 
min]
   [INFO] hudi-timeline-service .. SUCCESS [  5.223 
s]
   [INFO] hudi-client  SUCCESS [  0.229 
s]
   [INFO] hudi-client-common . SUCCESS [ 31.635 
s]
   [INFO] hudi-hadoop-mr . FAILURE [  0.327 
s]
   [INFO] hudi-spark-client .. SKIPPED
   [INFO] hudi-sync-common ... SKIPPED
   [INFO] hudi-hive-sync . SKIPPED
   ---do --skipped
   [INFO] hudi-flink_2.12  SKIPPED
   [INFO] hudi-flink-bundle_2.12 . SKIPPED
   [INFO] 

   [INFO] BUILD FAILURE
   [INFO] 

   [INFO] Total time:  03:48 min
   [INFO] Finished at: 2021-06-02T10:53:56Z
   [INFO] 

   [ERROR] Failed to execute goal on project hudi-hadoop-mr: Could not resolve 
dependencies for project org.apache.hudi
   :hudi-hadoop-mr:jar:0.8.0: Failed to collect dependencies at 
org.apache.hive:hive-exec:jar:core:2.3.1 -> org.apache.
   calcite:calcite-core:jar:1.10.0 -> 
org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Failed to read 
artifac
   t descriptor for org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: 
Could not transfer artifact org.pentaho:
   pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde from/to 
maven-default-http-blocker (http://0.0.0.0/): Blocked mirror f
   or repositories: [datanucleus (http://www.datanucleus.org/downloads/maven2, 
default, releases), glassfish-repository
(http://maven.glassfish.org/content/groups/glassfish, default, disabled), 
glassfish-repo-archive (http://maven.glas
   sfish.org/content/groups/glassfish, default, disabled), apache.snapshots 
(http://repository.apache.org/snapshots, de
   fault, snapshots), conjars (http://conjars.org/repo, default, 
releases+snapshots)] -> [Help 1]
   [ERROR] 
   [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
   [ERROR] Re-run Maven using the -X switch to enable full debug logging.
   [ERROR] 
   [ERROR] For more information about the errors and possible solutions, please 
read the following articles:
   [ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
   [ERROR] 
   [ERROR] After correcting the problems, you can resume the build with the 
command
   [ERROR]   mvn  -rf :hudi-hadoop-mr


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] Tandoy commented on issue #3009: Dependency error when attempt to build Hudi from git source ..

2021-06-02 Thread GitBox


Tandoy commented on issue #3009:
URL: https://github.com/apache/hudi/issues/3009#issuecomment-852891970


   [settings.md](https://github.com/apache/hudi/files/6583594/settings.md)
   You can try this maven configuration file.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1953) No set the output type of the operator, Throw java.lang.NullPointerException

2021-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1953:
-
Labels: pull-request-available  (was: )

> No set the output type of the operator, Throw java.lang.NullPointerException
> 
>
> Key: HUDI-1953
> URL: https://issues.apache.org/jira/browse/HUDI-1953
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: taylor liao
>Assignee: taylor liao
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Need to set the output type of the operator; otherwise it throws 
> java.lang.NullPointerException:
> java.lang.NullPointerException
> at java.util.Objects.requireNonNull(Objects.java:203)
> at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.<init>(StreamElementSerializer.java:65)
> at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.<init>(RecordWriterOutput.java:70)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createStreamOutput(OperatorChain.java:709)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainOutputs(OperatorChain.java:270)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] taylorliao commented on pull request #3023: [HUDI-1953] No set the output type of the operator, Throw java.lang.NullPointerException

2021-06-02 Thread GitBox


taylorliao commented on pull request #3023:
URL: https://github.com/apache/hudi/pull/3023#issuecomment-852852811


   @yanghua hi, i have created a jira ticket and fixed it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1954) StreamWriterFunction only reset when flush success

2021-06-02 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-1954:


 Summary: StreamWriterFunction only reset when flush success
 Key: HUDI-1954
 URL: https://issues.apache.org/jira/browse/HUDI-1954
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Currently, StreamWriterFunction's bucket flush is unsafe: when the instant is 
null, flushBucket returns immediately, and the bucket is then reset anyway, 
resulting in data loss.
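The safe ordering described above, resetting the buffer only after a successful flush, can be sketched as follows (illustrative names, not the actual Flink write function):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the HUDI-1954 fix: clear buffered records only on the
// success path, so a skipped flush (e.g. null instant) keeps the data.
public class BucketSketch {

  private final List<String> buffer = new ArrayList<>();

  public void add(String record) {
    buffer.add(record);
  }

  /** @return true if the bucket was flushed and reset. */
  public boolean flushBucket(String instant) {
    if (instant == null) {
      return false; // nothing committed: keep the buffer intact
    }
    // ... write the buffered records under `instant` ...
    buffer.clear(); // reset only after the flush succeeded
    return true;
  }

  public int size() {
    return buffer.size();
  }
}
```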



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1953) No set the output type of the operator, Throw java.lang.NullPointerException

2021-06-02 Thread taylor liao (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

taylor liao updated HUDI-1953:
--
Summary: No set the output type of the operator, Throw 
java.lang.NullPointerException  (was: Don't set the output type of the 
operator, Throw java.lang.NullPointerException)

> No set the output type of the operator, Throw java.lang.NullPointerException
> 
>
> Key: HUDI-1953
> URL: https://issues.apache.org/jira/browse/HUDI-1953
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: taylor liao
>Assignee: taylor liao
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Need to set the output type of the operator; otherwise it throws 
> java.lang.NullPointerException:
> java.lang.NullPointerException
> at java.util.Objects.requireNonNull(Objects.java:203)
> at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.<init>(StreamElementSerializer.java:65)
> at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.<init>(RecordWriterOutput.java:70)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createStreamOutput(OperatorChain.java:709)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainOutputs(OperatorChain.java:270)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter edited a comment on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


codecov-commenter edited a comment on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-852820330


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3024?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3024](https://codecov.io/gh/apache/hudi/pull/3024?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (89e90c5) into 
[master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (05a9830) will **increase** coverage by `7.91%`.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head 89e90c5 differs from pull request most recent 
head 3935834. Consider uploading reports for the commit 3935834 to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3024/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3024?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#3024  +/-   ##
   
   + Coverage 55.13%   63.04%   +7.91% 
   + Complexity 3864  346-3518 
   
 Files   487   54 -433 
 Lines 23608 2016   -21592 
 Branches   2527  241-2286 
   
   - Hits  13016 1271   -11745 
   + Misses 9437  621-8816 
   + Partials   1155  124-1031 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `63.04% <ø> (-7.84%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3024?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/3024/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3024/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
   | 
[...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3024/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh)
 | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
   | 
[...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3024/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh)
 | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/3024/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `40.69% <0.00%> (-23.84%)` | :arrow_down: |
   | 

[GitHub] [hudi] chaplinthink commented on pull request #3020: [MINOR] The super class has been serialized, the subclass does not need to be serialized

2021-06-02 Thread GitBox


chaplinthink commented on pull request #3020:
URL: https://github.com/apache/hudi/pull/3020#issuecomment-852832226


   @yanghua hi, I have modified the code so it no longer fails code style 
verification, but CI still fails to build. It should not be caused by my 
modification. Can you help me take a look? What should I do?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Issue Comment Deleted] (HUDI-1953) Don't set the output type of the operator, Throw java.lang.NullPointerException

2021-06-02 Thread taylor liao (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

taylor liao updated HUDI-1953:
--
Comment: was deleted

(was: PR: [https://github.com/apache/hudi/pull/3023
])

> Don't set the output type of the operator, Throw 
> java.lang.NullPointerException
> ---
>
> Key: HUDI-1953
> URL: https://issues.apache.org/jira/browse/HUDI-1953
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: taylor liao
>Assignee: taylor liao
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Need to set the output type of the operator; otherwise it throws 
> java.lang.NullPointerException:
> java.lang.NullPointerException
> at java.util.Objects.requireNonNull(Objects.java:203)
> at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.<init>(StreamElementSerializer.java:65)
> at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.<init>(RecordWriterOutput.java:70)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createStreamOutput(OperatorChain.java:709)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainOutputs(OperatorChain.java:270)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter commented on pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


codecov-commenter commented on pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#issuecomment-852820330


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3024?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3024](https://codecov.io/gh/apache/hudi/pull/3024?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (89e90c5) into 
[master](https://codecov.io/gh/apache/hudi/commit/05a9830e861f21f74d0674edbed10f8fff853c08?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (05a9830) will **increase** coverage by `7.91%`.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head 89e90c5 differs from pull request most recent 
head 3935834. Consider uploading reports for the commit 3935834 to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3024/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3024?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3024      +/-   ##
   ============================================
   + Coverage     55.13%   63.04%   +7.91%     
   + Complexity     3864      346    -3518     
   ============================================
     Files           487       54     -433     
     Lines         23608     2016   -21592     
     Branches       2527      241    -2286     
   ============================================
   - Hits         13016     1271   -11745     
   + Misses        9437      621    -8816     
   + Partials      1155      124    -1031     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `63.04% <ø> (-7.84%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3024?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/3024/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3024/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
   | 
[...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3024/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh)
 | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
   | 
[...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3024/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh)
 | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/3024/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `40.69% <0.00%> (-23.84%)` | :arrow_down: |
   | 

[jira] [Updated] (HUDI-1953) Don't set the output type of the operator, Throw java.lang.NullPointerException

2021-06-02 Thread taylor liao (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

taylor liao updated HUDI-1953:
--
Status: In Progress  (was: Open)

> Don't set the output type of the operator, Throw 
> java.lang.NullPointerException
> ---
>
> Key: HUDI-1953
> URL: https://issues.apache.org/jira/browse/HUDI-1953
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: taylor liao
>Assignee: taylor liao
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Need to set the output type of the operator; otherwise it throws 
> java.lang.NullPointerException.
> java.lang.NullPointerException
> at java.util.Objects.requireNonNull(Objects.java:203)
> at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.<init>(StreamElementSerializer.java:65)
> at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.<init>(RecordWriterOutput.java:70)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createStreamOutput(OperatorChain.java:709)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainOutputs(OperatorChain.java:270)





[GitHub] [hudi] hk-lrzy commented on pull request #2994: [HUDI-1931] BucketAssignFunction use wrong state

2021-06-02 Thread GitBox


hk-lrzy commented on pull request #2994:
URL: https://github.com/apache/hudi/pull/2994#issuecomment-852818090


   > @hk-lrzy thanks for the review. Please read the master code carefully: using MapState causes state bloat, which is a major bug.
   
   Sorry, I get your point now. As you said, `MapState` stores an unnecessary `recordKey` compared with `ValueState`.
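
   The state-size point above can be sketched with plain Java maps standing in for Flink keyed state (a hypothetical illustration with made-up names, not Hudi's actual state code): when the stream is already keyed by record key, a MapState-style layout stores the record key a second time inside each key's state, while a ValueState-style layout stores only the bucket location.

   ```java
   import java.util.Map;

   // Hypothetical sketch: plain maps stand in for Flink keyed state.
   public class StateFootprintSketch {

       // MapState-style layout: the outer map models Flink's per-key scoping,
       // the inner map is the MapState -- the record key is stored twice.
       static Map<String, Map<String, String>> mapStateLayout(String recordKey, String location) {
           return Map.of(recordKey, Map.of(recordKey, location));
       }

       // ValueState-style layout: per key, only the bucket location is stored.
       static Map<String, String> valueStateLayout(String recordKey, String location) {
           return Map.of(recordKey, location);
       }

       public static void main(String[] args) {
           System.out.println("MapState entry:   " + mapStateLayout("uuid-1", "fileId-0").get("uuid-1"));
           System.out.println("ValueState entry: " + valueStateLayout("uuid-1", "fileId-0").get("uuid-1"));
       }
   }
   ```

   With millions of record keys, that duplicated key per entry is the state bloat discussed above.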


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hk-lrzy removed a comment on pull request #2994: [HUDI-1931] BucketAssignFunction use wrong state

2021-06-02 Thread GitBox


hk-lrzy removed a comment on pull request #2994:
URL: https://github.com/apache/hudi/pull/2994#issuecomment-851997089


   `indexState` is a MapState, and MapState also belongs to keyed state, so I think `indexState` does not need to change.






[jira] [Commented] (HUDI-1953) Don't set the output type of the operator, Throw java.lang.NullPointerException

2021-06-02 Thread taylor liao (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355532#comment-17355532
 ] 

taylor liao commented on HUDI-1953:
---

PR: [https://github.com/apache/hudi/pull/3023
]

> Don't set the output type of the operator, Throw 
> java.lang.NullPointerException
> ---
>
> Key: HUDI-1953
> URL: https://issues.apache.org/jira/browse/HUDI-1953
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: taylor liao
>Assignee: taylor liao
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Need to set the output type of the operator; otherwise it throws 
> java.lang.NullPointerException.
> java.lang.NullPointerException
> at java.util.Objects.requireNonNull(Objects.java:203)
> at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.<init>(StreamElementSerializer.java:65)
> at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.<init>(RecordWriterOutput.java:70)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createStreamOutput(OperatorChain.java:709)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainOutputs(OperatorChain.java:270)





[jira] [Created] (HUDI-1953) Don't set the output type of the operator, Throw java.lang.NullPointerException

2021-06-02 Thread taylor liao (Jira)
taylor liao created HUDI-1953:
-

 Summary: Don't set the output type of the operator, Throw 
java.lang.NullPointerException
 Key: HUDI-1953
 URL: https://issues.apache.org/jira/browse/HUDI-1953
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: taylor liao
Assignee: taylor liao
 Fix For: 0.9.0


Need to set the output type of the operator; otherwise it throws 
java.lang.NullPointerException.
java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at 
org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.<init>(StreamElementSerializer.java:65)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.<init>(RecordWriterOutput.java:70)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createStreamOutput(OperatorChain.java:709)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainOutputs(OperatorChain.java:270)
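
The trace bottoms out in a null check: the StreamElementSerializer constructor calls Objects.requireNonNull on the operator's output-type serializer, so an operator whose output type was never set hands it null. A minimal, hypothetical stand-alone reproduction of that failure mode (stand-in names, not the Flink classes themselves):

```java
import java.util.Objects;

// Hypothetical stand-in for the failing call chain in the stack trace above.
public class NullOutputTypeSketch {

    static String buildSerializer(String outputTypeSerializer) {
        // Mirrors the requireNonNull at StreamElementSerializer.java:65.
        return Objects.requireNonNull(outputTypeSerializer, "operator output type not set");
    }

    public static void main(String[] args) {
        try {
            buildSerializer(null); // operator output type missing
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, matching the reported trace");
        }
    }
}
```

Setting the operator's output type gives the serializer a non-null type to work with, which is the fix this ticket proposes.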





[GitHub] [hudi] taylorliao commented on pull request #3023: [MINOR] Don't set the output type of the operator, Throw java.lang.NullPointerException

2021-06-02 Thread GitBox


taylorliao commented on pull request #3023:
URL: https://github.com/apache/hudi/pull/3023#issuecomment-852803307


   > > @yanghua Can you help to review this PR?
   > 
   > OK, no problem. IMO, it's a bug, can we file a jira ticket to track it?
   > 
   > In addition, another case like it in `StreamWriteITCase` can we also fix 
it together?
   
   OK, I will file a Jira ticket to track it.






[GitHub] [hudi] yanghua commented on pull request #3023: [MINOR] Don't set the output type of the operator, Throw java.lang.NullPointerException

2021-06-02 Thread GitBox


yanghua commented on pull request #3023:
URL: https://github.com/apache/hudi/pull/3023#issuecomment-852800181


   > @yanghua Can you help to review this PR?
   
   OK, no problem. IMO, it's a bug; can we file a Jira ticket to track it?
   
   In addition, another case like it in `StreamWriteITCase` can we also fix it 
together?






[GitHub] [hudi] n3nash closed issue #2437: deltastreamer fails due to "Error upserting bucketType UPDATE for partition" and ArrayIndexOutOfBoundsException

2021-06-02 Thread GitBox


n3nash closed issue #2437:
URL: https://github.com/apache/hudi/issues/2437


   






[GitHub] [hudi] n3nash commented on issue #2437: deltastreamer fails due to "Error upserting bucketType UPDATE for partition" and ArrayIndexOutOfBoundsException

2021-06-02 Thread GitBox


n3nash commented on issue #2437:
URL: https://github.com/apache/hudi/issues/2437#issuecomment-852798103


   @jiangok2006 Closing this issue since the logs are not enough to reproduce 
the issue. Please re-open if you need further help.






[GitHub] [hudi] taylorliao commented on pull request #3023: [MINOR] Don't set the output type of the operator, Throw java.lang.NullPointerException

2021-06-02 Thread GitBox


taylorliao commented on pull request #3023:
URL: https://github.com/apache/hudi/pull/3023#issuecomment-852797354


   @yanghua Can you help to review this PR?






[GitHub] [hudi] n3nash commented on issue #2448: [SUPPORT] deltacommit for client 172.16.116.102 already exists

2021-06-02 Thread GitBox


n3nash commented on issue #2448:
URL: https://github.com/apache/hudi/issues/2448#issuecomment-852796345


   @peng-xin Since we haven't heard from you in a while and this issue has not 
been reported by anyone else, I'm assuming this to be a transient issue with 
some of your settings. Let me know if you need further help. 
   
   @root18039532923 Please feel free to re-open if you are still confused about 
how to use async compaction. 






[GitHub] [hudi] n3nash closed issue #2448: [SUPPORT] deltacommit for client 172.16.116.102 already exists

2021-06-02 Thread GitBox


n3nash closed issue #2448:
URL: https://github.com/apache/hudi/issues/2448


   






[GitHub] [hudi] yuzhaojing opened a new pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

2021-06-02 Thread GitBox


yuzhaojing opened a new pull request #3024:
URL: https://github.com/apache/hudi/pull/3024


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *Currently the Flink streamer loads the index in the BucketAssigner operator, but in that operator every task loads all base files.
   To improve index-loading efficiency, add a BootstrapFunction that loads the index and shuffles index records to the BucketAssigner, so that base files can be assigned to individual subtasks.
   issue: https://issues.apache.org/jira/browse/HUDI-1924*
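
   The intended shuffle can be sketched as follows (a hypothetical illustration with made-up names, not the actual BootstrapFunction/BucketAssigner code): index records are keyed by record key, so each downstream subtask receives, and therefore needs to load, only its own slice of the bootstrapped index.

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;
   import java.util.stream.Collectors;

   // Hypothetical sketch of the index-bootstrap shuffle: instead of every
   // bucket-assigner task scanning every base file, a bootstrap step emits
   // record keys and routes each one to a single subtask by a stable hash,
   // mimicking Flink's keyBy in spirit.
   public class IndexBootstrapSketch {

       // A stable hash of the record key decides which subtask receives it.
       static int subtaskFor(String recordKey, int parallelism) {
           return Math.floorMod(recordKey.hashCode(), parallelism);
       }

       public static void main(String[] args) {
           List<String> recordKeys = Arrays.asList("uuid-1", "uuid-2", "uuid-3", "uuid-4");
           int parallelism = 2;

           // Group the bootstrapped index records by target subtask.
           Map<Integer, List<String>> bySubtask = recordKeys.stream()
                   .collect(Collectors.groupingBy(
                           k -> subtaskFor(k, parallelism), TreeMap::new, Collectors.toList()));

           bySubtask.forEach((task, keys) ->
                   System.out.println("subtask " + task + " loads index for " + keys));
       }
   }
   ```

   Because the routing is deterministic per key, no two subtasks ever load the same slice, which is what removes the "every task loads all base files" cost described above.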
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.






[jira] [Updated] (HUDI-1924) Support bootstrap operator to load index from hoodieTable

2021-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1924:
-
Labels: pull-request-available  (was: )

> Support bootstrap operator to load index from hoodieTable 
> --
>
> Key: HUDI-1924
> URL: https://issues.apache.org/jira/browse/HUDI-1924
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> Now we load the index in BucketAssign, but the HoodieRecords in a base file may 
> belong to many tasks, so we have to load all files in every BucketAssign task.
> If we add an operator before BucketAssign and then key the index records to 
> BucketAssign, we can assign only part of the files to each task.





[GitHub] [hudi] n3nash closed issue #2461: All records are present in athena query result on glue crawled Hudi tables

2021-06-02 Thread GitBox


n3nash closed issue #2461:
URL: https://github.com/apache/hudi/issues/2461


   






[GitHub] [hudi] n3nash commented on issue #2461: All records are present in athena query result on glue crawled Hudi tables

2021-06-02 Thread GitBox


n3nash commented on issue #2461:
URL: https://github.com/apache/hudi/issues/2461#issuecomment-852791741


   @vrtrepp @noobarcitect Closing this issue since the proposed solution is 
straightforward. If you still need help making your glue connector work, please 
feel free to re-open. 





