[GitHub] [hudi] hudi-bot edited a comment on pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#issuecomment-869653867


   
   ## CI report:
   
   * 7aef86d8976d44c1427ccf944d3ecfa00b629159 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=763)
 
   * 5afbbaabe333b8d290c79dfdeb10d8d3aaca11c7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378064#comment-17378064
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * 15ea5785bc8556cd17b6dc4da5cce7d542fbd896 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=674)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=682)
 
   * 4f3a389384a1fbf0deb654caa490f3b32c3b7e41 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=821)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add INSERT_OVERWRITE support to DeltaStreamer
> -
>
> Key: HUDI-1860
> URL: https://issues.apache.org/jira/browse/HUDI-1860
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As discussed in [this 
> RFC|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller],
>  having full fetch mode use the inser_overwrite to write to sync would be 
> better as it can handle schema changes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * 15ea5785bc8556cd17b6dc4da5cce7d542fbd896 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=674)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=682)
 
   * 4f3a389384a1fbf0deb654caa490f3b32c3b7e41 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=821)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter commented on pull request #3248: [MINOR] Fix some wrong assert reasons

2021-07-09 Thread GitBox


codecov-commenter commented on pull request #3248:
URL: https://github.com/apache/hudi/pull/3248#issuecomment-877187703


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3248?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3248](https://codecov.io/gh/apache/hudi/pull/3248?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (64ad40a) into 
[master](https://codecov.io/gh/apache/hudi/commit/371526789d663dee85041eb31c27c52c81ef87ef?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (3715267) will **decrease** coverage by `24.52%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3248/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3248?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #3248   +/-   ##
   
   - Coverage 27.40%   2.88%   -24.53% 
   + Complexity 1287  85 -1202 
   
 Files   381 281  -100 
 Lines 15108   11620 -3488 
 Branches   1305 952  -353 
   
   - Hits   4141 335 -3806 
   - Misses10667   11259  +592 
   + Partials300  26  -274 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudiclient | `0.00% <ø> (-21.06%)` | :arrow_down: |
   | hudisync | `5.37% <ø> (ø)` | |
   | hudiutilities | `9.25% <ø> (-49.32%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3248?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3248/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3248/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3248/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3248/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3248/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3248/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff

[jira] [Created] (HUDI-2157) Spark write the bucket index table

2021-07-09 Thread XiaoyuGeng (Jira)
XiaoyuGeng created HUDI-2157:


 Summary: Spark write the bucket index table
 Key: HUDI-2157
 URL: https://issues.apache.org/jira/browse/HUDI-2157
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: XiaoyuGeng
Assignee: XiaoyuGeng






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3184: [HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * 15ea5785bc8556cd17b6dc4da5cce7d542fbd896 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=674)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=682)
 
   * 4f3a389384a1fbf0deb654caa490f3b32c3b7e41 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1860) Add INSERT_OVERWRITE support to DeltaStreamer

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378063#comment-17378063
 ] 

ASF GitHub Bot commented on HUDI-1860:
--

hudi-bot edited a comment on pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#issuecomment-870410669


   
   ## CI report:
   
   * 15ea5785bc8556cd17b6dc4da5cce7d542fbd896 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=674)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=682)
 
   * 4f3a389384a1fbf0deb654caa490f3b32c3b7e41 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add INSERT_OVERWRITE support to DeltaStreamer
> -
>
> Key: HUDI-1860
> URL: https://issues.apache.org/jira/browse/HUDI-1860
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As discussed in [this 
> RFC|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller],
>  having full fetch mode use the inser_overwrite to write to sync would be 
> better as it can handle schema changes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3250: [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3250:
URL: https://github.com/apache/hudi/pull/3250#issuecomment-877145194


   
   ## CI report:
   
   * 0420f87e4e013d2c6c1fdff56b82e26d464c635c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=820)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-2156) Cluster the table with bucket index

2021-07-09 Thread XiaoyuGeng (Jira)
XiaoyuGeng created HUDI-2156:


 Summary: Cluster the table with bucket index
 Key: HUDI-2156
 URL: https://issues.apache.org/jira/browse/HUDI-2156
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: XiaoyuGeng






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2155) Bulk insert support bucket index

2021-07-09 Thread XiaoyuGeng (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaoyuGeng updated HUDI-2155:
-
Summary: Bulk insert support bucket index  (was: Bulk insert support Hive 
index)

> Bulk insert support bucket index
> 
>
> Key: HUDI-2155
> URL: https://issues.apache.org/jira/browse/HUDI-2155
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Index
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Major
>
> when index key is String type and the index config is empty



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2155) Bulk insert support Hive index

2021-07-09 Thread XiaoyuGeng (Jira)
XiaoyuGeng created HUDI-2155:


 Summary: Bulk insert support Hive index
 Key: HUDI-2155
 URL: https://issues.apache.org/jira/browse/HUDI-2155
 Project: Apache Hudi
  Issue Type: New Feature
  Components: Index
Reporter: XiaoyuGeng
Assignee: XiaoyuGeng


when index key is String type and the index config is empty



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2141) Integration flink metric in flink stream

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378037#comment-17378037
 ] 

ASF GitHub Bot commented on HUDI-2141:
--

hudi-bot edited a comment on pull request #3235:
URL: https://github.com/apache/hudi/pull/3235#issuecomment-875512974


   
   ## CI report:
   
   * 12f9fc5e0391242aaddc4e0c09ada8ee3c745a47 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=819)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Integration flink metric in flink stream
> 
>
> Key: HUDI-2141
> URL: https://issues.apache.org/jira/browse/HUDI-2141
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> Now hoodie metrics can't work in flink stream because Designed for batch 
> processing,  integration flink metric in flink stream.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3235: [HUDI-2141] Integration flink metric in flink stream

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3235:
URL: https://github.com/apache/hudi/pull/3235#issuecomment-875512974


   
   ## CI report:
   
   * 12f9fc5e0391242aaddc4e0c09ada8ee3c745a47 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=819)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3250: [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3250:
URL: https://github.com/apache/hudi/pull/3250#issuecomment-877145194


   
   ## CI report:
   
   * 0420f87e4e013d2c6c1fdff56b82e26d464c635c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=820)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #3250: [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config

2021-07-09 Thread GitBox


hudi-bot commented on pull request #3250:
URL: https://github.com/apache/hudi/pull/3250#issuecomment-877145194


   
   ## CI report:
   
   * 0420f87e4e013d2c6c1fdff56b82e26d464c635c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3142: [HUDI-1483] Support async clustering for deltastreamer and Spark streaming

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-866996072


   
   ## CI report:
   
   * 890e9822855fcd45c8387f83740975f43474cddc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=814)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=818)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1483) async clustering for deltastreamer

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378033#comment-17378033
 ] 

ASF GitHub Bot commented on HUDI-1483:
--

hudi-bot edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-866996072


   
   ## CI report:
   
   * 890e9822855fcd45c8387f83740975f43474cddc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=814)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=818)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> async clustering for deltastreamer
> --
>
> Key: HUDI-1483
> URL: https://issues.apache.org/jira/browse/HUDI-1483
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: liwei
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codope opened a new pull request #3250: [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config

2021-07-09 Thread GitBox


codope opened a new pull request #3250:
URL: https://github.com/apache/hudi/pull/3250


   ## What is the purpose of the pull request
   
   `EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION` config key depends on 
`AVRO_SCHEMA`, which is no longer a static string. This PR sets the key 
correctly for the aforementioned config.
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as 
`TestHoodieWriteConfig`. Added assert for this particular config to verify the 
fix.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] jazzyMin opened a new issue #3249: execute insert into ... error:Unexpected exception. This is a bug. Please consider filing an issue.

2021-07-09 Thread GitBox


jazzyMin opened a new issue #3249:
URL: https://github.com/apache/hudi/issues/3249


   **_Tips before filing an issue_**
   
   - Have you gone through our 
[FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version :0.9.0
   
   * Flink version : 1.12.2
   
   * Hive version : 2.3.6
   
   * Hadoop version :3.0
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   `Exception in thread "main" 
org.apache.flink.table.client.SqlClientException: Unexpected exception. This is 
a bug. Please consider filing an issue.
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:215)
   Caused by: java.lang.RuntimeException: Error running SQL job.
at 
org.apache.flink.table.client.gateway.local.LocalExecutor.lambda$executeUpdateInternal$4(LocalExecutor.java:514)
at 
org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:256)
at 
org.apache.flink.table.client.gateway.local.LocalExecutor.executeUpdateInternal(LocalExecutor.java:507)
at 
org.apache.flink.table.client.gateway.local.LocalExecutor.executeUpdate(LocalExecutor.java:428)
at 
org.apache.flink.table.client.cli.CliClient.callInsert(CliClient.java:690)
at 
org.apache.flink.table.client.cli.CliClient.callCommand(CliClient.java:327)
at java.util.Optional.ifPresent(Optional.java:159)
at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:214)
at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:144)
at org.apache.flink.table.client.SqlClient.start(SqlClient.java:115)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201)
   Caused by: java.util.concurrent.ExecutionException: 
org.apache.flink.runtime.client.JobSubmissionException: Failed to submit 
JobGraph.
at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.flink.table.client.gateway.local.LocalExecutor.lambda$executeUpdateInternal$4(LocalExecutor.java:511)
... 10 more
   Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
submit JobGraph.
at 
org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:400)
at 
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at 
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:390)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at 
org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
at 
org.apache.fli

[jira] [Commented] (HUDI-2141) Integration flink metric in flink stream

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378018#comment-17378018
 ] 

ASF GitHub Bot commented on HUDI-2141:
--

hudi-bot edited a comment on pull request #3235:
URL: https://github.com/apache/hudi/pull/3235#issuecomment-875512974


   
   ## CI report:
   
   * fbf4eb53e9c96782796612e8bbdaed4aa22d8caf Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=788)
 
   * 12f9fc5e0391242aaddc4e0c09ada8ee3c745a47 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=819)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Integration flink metric in flink stream
> 
>
> Key: HUDI-2141
> URL: https://issues.apache.org/jira/browse/HUDI-2141
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> Now hoodie metrics can't work in flink stream because Designed for batch 
> processing,  integration flink metric in flink stream.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3235: [HUDI-2141] Integration flink metric in flink stream

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3235:
URL: https://github.com/apache/hudi/pull/3235#issuecomment-875512974


   
   ## CI report:
   
   * fbf4eb53e9c96782796612e8bbdaed4aa22d8caf Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=788)
 
   * 12f9fc5e0391242aaddc4e0c09ada8ee3c745a47 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=819)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2141) Integration flink metric in flink stream

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378016#comment-17378016
 ] 

ASF GitHub Bot commented on HUDI-2141:
--

hudi-bot edited a comment on pull request #3235:
URL: https://github.com/apache/hudi/pull/3235#issuecomment-875512974


   
   ## CI report:
   
   * fbf4eb53e9c96782796612e8bbdaed4aa22d8caf Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=788)
 
   * 12f9fc5e0391242aaddc4e0c09ada8ee3c745a47 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Integration flink metric in flink stream
> 
>
> Key: HUDI-2141
> URL: https://issues.apache.org/jira/browse/HUDI-2141
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> Now hoodie metrics can't work in flink stream because Designed for batch 
> processing,  integration flink metric in flink stream.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3235: [HUDI-2141] Integration flink metric in flink stream

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3235:
URL: https://github.com/apache/hudi/pull/3235#issuecomment-875512974


   
   ## CI report:
   
   * fbf4eb53e9c96782796612e8bbdaed4aa22d8caf Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=788)
 
   * 12f9fc5e0391242aaddc4e0c09ada8ee3c745a47 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2107) Support Read Log Only MOR Table For Spark

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378011#comment-17378011
 ] 

ASF GitHub Bot commented on HUDI-2107:
--

codecov-commenter edited a comment on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-871511739


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3193?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3193](https://codecov.io/gh/apache/hudi/pull/3193?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (4cb7cc5) into 
[master](https://codecov.io/gh/apache/hudi/commit/c50c24908a4eb1ed769efb8878e6075d3e162d55?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (c50c249) will **decrease** coverage by `30.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3193/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3193?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3193   +/-   ##
   =
   - Coverage 47.62%   17.60%   -30.02% 
   + Complexity 5504  883 -4621 
   =
 Files   930  381  -549 
 Lines 4127515108-26167 
 Branches   4138 1305 -2833 
   =
   - Hits  19657 2660-16997 
   + Misses1987012283 -7587 
   + Partials   1748  165 -1583 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `21.05% <ø> (-13.54%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.37% <ø> (-49.15%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.25% <ø> (-49.32%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3193?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3193/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3193/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3193/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3193/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3193/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3193: [HUDI-2107] Support Read Log Only MOR Table For Spark

2021-07-09 Thread GitBox


codecov-commenter edited a comment on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-871511739


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3193?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3193](https://codecov.io/gh/apache/hudi/pull/3193?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (4cb7cc5) into 
[master](https://codecov.io/gh/apache/hudi/commit/c50c24908a4eb1ed769efb8878e6075d3e162d55?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (c50c249) will **decrease** coverage by `30.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3193/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3193?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3193   +/-   ##
   =
   - Coverage 47.62%   17.60%   -30.02% 
   + Complexity 5504  883 -4621 
   =
 Files   930  381  -549 
 Lines 4127515108-26167 
 Branches   4138 1305 -2833 
   =
   - Hits  19657 2660-16997 
   + Misses1987012283 -7587 
   + Partials   1748  165 -1583 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `21.05% <ø> (-13.54%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.37% <ø> (-49.15%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.25% <ø> (-49.32%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3193?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3193/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3193/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3193/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3193/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3193/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...rg/apache/hudi/utilities/source

[jira] [Commented] (HUDI-1483) async clustering for deltastreamer

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378009#comment-17378009
 ] 

ASF GitHub Bot commented on HUDI-1483:
--

hudi-bot edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-866996072


   
   ## CI report:
   
   * 890e9822855fcd45c8387f83740975f43474cddc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=814)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=818)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> async clustering for deltastreamer
> --
>
> Key: HUDI-1483
> URL: https://issues.apache.org/jira/browse/HUDI-1483
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: liwei
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3142: [HUDI-1483] Support async clustering for deltastreamer and Spark streaming

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-866996072


   
   ## CI report:
   
   * 890e9822855fcd45c8387f83740975f43474cddc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=814)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=818)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1483) async clustering for deltastreamer

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378008#comment-17378008
 ] 

ASF GitHub Bot commented on HUDI-1483:
--

codope commented on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-877110964


   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> async clustering for deltastreamer
> --
>
> Key: HUDI-1483
> URL: https://issues.apache.org/jira/browse/HUDI-1483
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: liwei
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codope commented on pull request #3142: [HUDI-1483] Support async clustering for deltastreamer and Spark streaming

2021-07-09 Thread GitBox


codope commented on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-877110964


   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2147) Remove unused class AvroConvertor in hudi-flink

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377991#comment-17377991
 ] 

ASF GitHub Bot commented on HUDI-2147:
--

wangxianghu commented on pull request #3243:
URL: https://github.com/apache/hudi/pull/3243#issuecomment-877088985


   hi @yanghua  please take a look when free.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove unused class AvroConvertor in hudi-flink
> ---
>
> Key: HUDI-2147
> URL: https://issues.apache.org/jira/browse/HUDI-2147
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] wangxianghu commented on pull request #3243: [HUDI-2147] Remove unused class AvroConvertor in hudi-flink

2021-07-09 Thread GitBox


wangxianghu commented on pull request #3243:
URL: https://github.com/apache/hudi/pull/3243#issuecomment-877088985


   hi @yanghua  please take a look when free.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2143) Tweak the default compaction target IO to 500GB when flink async compaction is off

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377988#comment-17377988
 ] 

ASF GitHub Bot commented on HUDI-2143:
--

danny0405 commented on a change in pull request #3238:
URL: https://github.com/apache/hudi/pull/3238#discussion_r666847718



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java
##
@@ -199,6 +199,12 @@ private static void setupCleaningOptions(Configuration 
conf) {
   conf.setInteger(FlinkOptions.ARCHIVE_MIN_COMMITS, commitsToRetain + 10);
   conf.setInteger(FlinkOptions.ARCHIVE_MAX_COMMITS, commitsToRetain + 20);
 }
+if (conf.getBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED)

Review comment:
   There is already one, the default value is 5GB for online compaction, 
this patch fix the offline  compaction with default value as 500GB.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Tweak the default compaction target IO to 500GB when flink async compaction 
> is off
> --
>
> Key: HUDI-2143
> URL: https://issues.apache.org/jira/browse/HUDI-2143
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] danny0405 commented on a change in pull request #3238: [HUDI-2143] Tweak the default compaction target IO to 500GB when flin…

2021-07-09 Thread GitBox


danny0405 commented on a change in pull request #3238:
URL: https://github.com/apache/hudi/pull/3238#discussion_r666847718



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java
##
@@ -199,6 +199,12 @@ private static void setupCleaningOptions(Configuration 
conf) {
   conf.setInteger(FlinkOptions.ARCHIVE_MIN_COMMITS, commitsToRetain + 10);
   conf.setInteger(FlinkOptions.ARCHIVE_MAX_COMMITS, commitsToRetain + 20);
 }
+if (conf.getBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED)

Review comment:
   There is already one, the default value is 5GB for online compaction, 
this patch fix the offline  compaction with default value as 500GB.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3248: [MINOR] Fix some wrong assert reasons

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3248:
URL: https://github.com/apache/hudi/pull/3248#issuecomment-877033209


   
   ## CI report:
   
   * 64ad40af7f34ebdc4f3fac3331fbad8523051591 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=817)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1468) incremental read support with clustering

2021-07-09 Thread liwei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377966#comment-17377966
 ] 

liwei commented on HUDI-1468:
-

[~vinoth] hello , is [https://github.com/apache/hudi/pull/3139/files] land it , 
this issue can close?

> incremental read support with clustering
> 
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Affects Versions: 0.9.0
>Reporter: satish
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> As part of clustering, metadata such as hoodie_commit_time changes for 
> records that are clustered. This is specific to 
> SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to 
> carry commit_time from original record to support incremental queries.
> Also, incremental queries dont work with 'replacecommit' used by clustering 
> HUDI-1264. Change incremental query to work for replacecommits created by 
> Clustering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3248: [MINOR] Fix some wrong assert reasons

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3248:
URL: https://github.com/apache/hudi/pull/3248#issuecomment-877033209


   
   ## CI report:
   
   * 64ad40af7f34ebdc4f3fac3331fbad8523051591 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=817)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #3248: [MINOR] Fix some wrong assert reasons

2021-07-09 Thread GitBox


hudi-bot commented on pull request #3248:
URL: https://github.com/apache/hudi/pull/3248#issuecomment-877033209


   
   ## CI report:
   
   * 64ad40af7f34ebdc4f3fac3331fbad8523051591 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua opened a new pull request #3248: [MINOR] Fix some wrong assert reasons

2021-07-09 Thread GitBox


yanghua opened a new pull request #3248:
URL: https://github.com/apache/hudi/pull/3248


   ## What is the purpose of the pull request
   
   *Fix some wrong assert reasons*
   
   ## Brief change log
   
 - *Fix some wrong assert reasons*
   
   ## Verify this pull request
   
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2143) Tweak the default compaction target IO to 500GB when flink async compaction is off

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377947#comment-17377947
 ] 

ASF GitHub Bot commented on HUDI-2143:
--

yanghua commented on a change in pull request #3238:
URL: https://github.com/apache/hudi/pull/3238#discussion_r666787064



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java
##
@@ -199,6 +199,12 @@ private static void setupCleaningOptions(Configuration 
conf) {
   conf.setInteger(FlinkOptions.ARCHIVE_MIN_COMMITS, commitsToRetain + 10);
   conf.setInteger(FlinkOptions.ARCHIVE_MAX_COMMITS, commitsToRetain + 20);
 }
+if (conf.getBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED)

Review comment:
   The rewritten value is a constant value. Why not provide an 
`SYNC_COMPACTION_TARGET_IO` option and make it to be `500GB`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Tweak the default compaction target IO to 500GB when flink async compaction 
> is off
> --
>
> Key: HUDI-2143
> URL: https://issues.apache.org/jira/browse/HUDI-2143
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yanghua commented on a change in pull request #3238: [HUDI-2143] Tweak the default compaction target IO to 500GB when flin…

2021-07-09 Thread GitBox


yanghua commented on a change in pull request #3238:
URL: https://github.com/apache/hudi/pull/3238#discussion_r666787064



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java
##
@@ -199,6 +199,12 @@ private static void setupCleaningOptions(Configuration 
conf) {
   conf.setInteger(FlinkOptions.ARCHIVE_MIN_COMMITS, commitsToRetain + 10);
   conf.setInteger(FlinkOptions.ARCHIVE_MAX_COMMITS, commitsToRetain + 20);
 }
+if (conf.getBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED)

Review comment:
   The rewritten value is a constant value. Why not provide an 
`SYNC_COMPACTION_TARGET_IO` option and make it to be `500GB`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2107) Support Read Log Only MOR Table For Spark

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377940#comment-17377940
 ] 

ASF GitHub Bot commented on HUDI-2107:
--

hudi-bot edited a comment on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-871363205


   
   ## CI report:
   
   * 864dff7a0cc4389905067abee96046d5f72b004f UNKNOWN
   * f8bac2f4e7133eb3f9cbe4c15a20da49e30dd6eb UNKNOWN
   * 2a1ce1b4b826344ec024bd51b8af5ee5543a0986 UNKNOWN
   * ce51b2d836504936b23368492c071fdfe4d94594 UNKNOWN
   * a619116de9d59eff189e05f593b917cc0b762f25 UNKNOWN
   * 4cb7cc5952b6aa3cd81cd3367c6c974bba104d00 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=815)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Log Only MOR Table For Spark
> -
>
> Key: HUDI-2107
> URL: https://issues.apache.org/jira/browse/HUDI-2107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we cannot support read log-only mor table(which is generated by 
> index like InMemeoryIndex, HbaseIndex and FlinkIndex which support indexing 
> log file) for spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3193: [HUDI-2107] Support Read Log Only MOR Table For Spark

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-871363205


   
   ## CI report:
   
   * 864dff7a0cc4389905067abee96046d5f72b004f UNKNOWN
   * f8bac2f4e7133eb3f9cbe4c15a20da49e30dd6eb UNKNOWN
   * 2a1ce1b4b826344ec024bd51b8af5ee5543a0986 UNKNOWN
   * ce51b2d836504936b23368492c071fdfe4d94594 UNKNOWN
   * a619116de9d59eff189e05f593b917cc0b762f25 UNKNOWN
   * 4cb7cc5952b6aa3cd81cd3367c6c974bba104d00 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=815)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-2154) Add index key field into HoodieKey

2021-07-09 Thread XiaoyuGeng (Jira)
XiaoyuGeng created HUDI-2154:


 Summary: Add index key field into HoodieKey
 Key: HUDI-2154
 URL: https://issues.apache.org/jira/browse/HUDI-2154
 Project: Apache Hudi
  Issue Type: New Feature
  Components: Index
Reporter: XiaoyuGeng
Assignee: XiaoyuGeng






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2143) Tweak the default compaction target IO to 500GB when flink async compaction is off

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377930#comment-17377930
 ] 

ASF GitHub Bot commented on HUDI-2143:
--

danny0405 commented on a change in pull request #3238:
URL: https://github.com/apache/hudi/pull/3238#discussion_r666775074



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java
##
@@ -199,6 +199,12 @@ private static void setupCleaningOptions(Configuration 
conf) {
   conf.setInteger(FlinkOptions.ARCHIVE_MIN_COMMITS, commitsToRetain + 10);
   conf.setInteger(FlinkOptions.ARCHIVE_MAX_COMMITS, commitsToRetain + 20);
 }
+if (conf.getBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED)

Review comment:
   Why the user needs to know that ? They can set up the target IO by 
themself if they want BTW.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Tweak the default compaction target IO to 500GB when flink async compaction 
> is off
> --
>
> Key: HUDI-2143
> URL: https://issues.apache.org/jira/browse/HUDI-2143
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] danny0405 commented on a change in pull request #3238: [HUDI-2143] Tweak the default compaction target IO to 500GB when flin…

2021-07-09 Thread GitBox


danny0405 commented on a change in pull request #3238:
URL: https://github.com/apache/hudi/pull/3238#discussion_r666775074



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java
##
@@ -199,6 +199,12 @@ private static void setupCleaningOptions(Configuration 
conf) {
   conf.setInteger(FlinkOptions.ARCHIVE_MIN_COMMITS, commitsToRetain + 10);
   conf.setInteger(FlinkOptions.ARCHIVE_MAX_COMMITS, commitsToRetain + 20);
 }
+if (conf.getBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED)

Review comment:
   Why the user needs to know that ? They can set up the target IO by 
themself if they want BTW.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2143) Tweak the default compaction target IO to 500GB when flink async compaction is off

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377926#comment-17377926
 ] 

ASF GitHub Bot commented on HUDI-2143:
--

yanghua commented on a change in pull request #3238:
URL: https://github.com/apache/hudi/pull/3238#discussion_r666768715



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java
##
@@ -199,6 +199,12 @@ private static void setupCleaningOptions(Configuration 
conf) {
   conf.setInteger(FlinkOptions.ARCHIVE_MIN_COMMITS, commitsToRetain + 10);
   conf.setInteger(FlinkOptions.ARCHIVE_MAX_COMMITS, commitsToRetain + 20);
 }
+if (conf.getBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED)

Review comment:
   Yes, but the user may not know it would be rewritten under some 
condition like this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Tweak the default compaction target IO to 500GB when flink async compaction 
> is off
> --
>
> Key: HUDI-2143
> URL: https://issues.apache.org/jira/browse/HUDI-2143
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yanghua commented on a change in pull request #3238: [HUDI-2143] Tweak the default compaction target IO to 500GB when flin…

2021-07-09 Thread GitBox


yanghua commented on a change in pull request #3238:
URL: https://github.com/apache/hudi/pull/3238#discussion_r666768715



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java
##
@@ -199,6 +199,12 @@ private static void setupCleaningOptions(Configuration 
conf) {
   conf.setInteger(FlinkOptions.ARCHIVE_MIN_COMMITS, commitsToRetain + 10);
   conf.setInteger(FlinkOptions.ARCHIVE_MAX_COMMITS, commitsToRetain + 20);
 }
+if (conf.getBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED)

Review comment:
   Yes, but the user may not know it would be rewritten under some 
condition like this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377918#comment-17377918
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

yanghua commented on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-877001823


   > Hi @yanghua , I have write a util to convert the parquet schema to spark 
schema json without the spark dependencies. Please take a look again~
   
   OK, sounds good. Will review it soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only support reading hoodie table as datasource table for spark 
> since [https://github.com/apache/hudi/pull/2283]
> In order to support this feature for flink and DeltaStreamer, we need to sync 
> the spark table properties needed by datasource table to the meta store in 
> HiveSyncTool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yanghua commented on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-09 Thread GitBox


yanghua commented on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-877001823


   > Hi @yanghua , I have write a util to convert the parquet schema to spark 
schema json without the spark dependencies. Please take a look again~
   
   OK, sounds good. Will review it soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (HUDI-2087) Support Append only in Flink stream

2021-07-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2087.
--
Fix Version/s: 0.9.0
   Resolution: Done

371526789d663dee85041eb31c27c52c81ef87ef

> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: image-2021-07-08-22-04-30-039.png, 
> image-2021-07-08-22-04-40-018.png
>
>
> It is necessary to support append mode in flink stream, as the data lake 
> should be able to write log type data as parquet high performance without 
> merge.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2087) Support Append only in Flink stream

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377914#comment-17377914
 ] 

ASF GitHub Bot commented on HUDI-2087:
--

yanghua merged pull request #3174:
URL: https://github.com/apache/hudi/pull/3174


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-07-08-22-04-30-039.png, 
> image-2021-07-08-22-04-40-018.png
>
>
> It is necessary to support append mode in flink stream, as the data lake 
> should be able to write log type data as parquet high performance without 
> merge.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2087) Support Append only in Flink stream

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377913#comment-17377913
 ] 

ASF GitHub Bot commented on HUDI-2087:
--

yanghua commented on pull request #3174:
URL: https://github.com/apache/hudi/pull/3174#issuecomment-876999303


   The failed test cases ran successfully in the contributor's local env. So 
merge it now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-07-08-22-04-30-039.png, 
> image-2021-07-08-22-04-40-018.png
>
>
> It is necessary to support append mode in flink stream, as the data lake 
> should be able to write log type data as parquet high performance without 
> merge.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated: [HUDI-2087] Support Append only in Flink stream (#3174)

2021-07-09 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 3715267  [HUDI-2087] Support Append only in Flink stream (#3174)
3715267 is described below

commit 371526789d663dee85041eb31c27c52c81ef87ef
Author: yuzhaojing <32435329+yuzhaoj...@users.noreply.github.com>
AuthorDate: Fri Jul 9 16:06:32 2021 +0800

[HUDI-2087] Support Append only in Flink stream (#3174)

Co-authored-by: 喻兆靖 
---
 .../hudi/table/action/commit/BucketType.java   |  2 +-
 .../apache/hudi/client/HoodieFlinkWriteClient.java | 10 ++-
 .../java/org/apache/hudi/io/FlinkAppendHandle.java |  3 +-
 .../java/org/apache/hudi/io/FlinkCreateHandle.java | 12 
 .../commit/BaseFlinkCommitActionExecutor.java  |  5 +-
 .../apache/hudi/configuration/FlinkOptions.java|  6 ++
 .../sink/partitioner/BucketAssignFunction.java | 44 +---
 .../hudi/sink/partitioner/BucketAssigner.java  | 10 ++-
 .../apache/hudi/streamer/FlinkStreamerConfig.java  | 11 ++-
 .../org/apache/hudi/table/HoodieTableFactory.java  |  6 ++
 .../org/apache/hudi/sink/TestWriteCopyOnWrite.java | 79 ++
 .../org/apache/hudi/sink/TestWriteMergeOnRead.java | 15 
 .../hudi/sink/TestWriteMergeOnReadWithCompact.java | 15 ++--
 .../test/java/org/apache/hudi/utils/TestData.java  | 61 +
 14 files changed, 246 insertions(+), 33 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BucketType.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BucketType.java
index 70ee473..e1fd161 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BucketType.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BucketType.java
@@ -19,5 +19,5 @@
 package org.apache.hudi.table.action.commit;
 
 public enum BucketType {
-  UPDATE, INSERT
+  UPDATE, INSERT, APPEND_ONLY
 }
diff --git 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java
 
b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java
index 05e4481..71ca1b6 100644
--- 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java
+++ 
b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java
@@ -56,6 +56,7 @@ import org.apache.hudi.table.HoodieTable;
 import org.apache.hudi.table.HoodieTimelineArchiveLog;
 import org.apache.hudi.table.MarkerFiles;
 import org.apache.hudi.table.action.HoodieWriteMetadata;
+import org.apache.hudi.table.action.commit.BucketType;
 import org.apache.hudi.table.action.compact.FlinkCompactHelpers;
 import org.apache.hudi.table.upgrade.FlinkUpgradeDowngrade;
 import org.apache.hudi.util.FlinkClientUtil;
@@ -408,6 +409,12 @@ public class HoodieFlinkWriteClient extends
 final HoodieRecordLocation loc = record.getCurrentLocation();
 final String fileID = loc.getFileId();
 final String partitionPath = record.getPartitionPath();
+// append only mode always use FlinkCreateHandle
+if (loc.getInstantTime().equals(BucketType.APPEND_ONLY.name())) {
+  return new FlinkCreateHandle<>(config, instantTime, table, partitionPath,
+  fileID, table.getTaskContextSupplier());
+}
+
 if (bucketToHandles.containsKey(fileID)) {
   MiniBatchHandle lastHandle = (MiniBatchHandle) 
bucketToHandles.get(fileID);
   if (lastHandle.shouldReplace()) {
@@ -424,7 +431,8 @@ public class HoodieFlinkWriteClient extends
 if (isDelta) {
   writeHandle = new FlinkAppendHandle<>(config, instantTime, table, 
partitionPath, fileID, recordItr,
   table.getTaskContextSupplier());
-} else if (loc.getInstantTime().equals("I")) {
+} else if (loc.getInstantTime().equals(BucketType.INSERT.name()) || 
loc.getInstantTime().equals(BucketType.APPEND_ONLY.name())) {
+  // use the same handle for insert bucket
   writeHandle = new FlinkCreateHandle<>(config, instantTime, table, 
partitionPath,
   fileID, table.getTaskContextSupplier());
 } else {
diff --git 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkAppendHandle.java
 
b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkAppendHandle.java
index 987f335..41d0666 100644
--- 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkAppendHandle.java
+++ 
b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkAppendHandle.java
@@ -25,6 +25,7 @@ import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.config.HoodieWriteConfig;
 import org.apache.hudi.table.HoodieTable;
 import org.apache.hudi.table.MarkerFiles;
+import org.apache.hudi.table.action.commit.BucketTy

[GitHub] [hudi] yanghua merged pull request #3174: [HUDI-2087] Support Append only in Flink stream

2021-07-09 Thread GitBox


yanghua merged pull request #3174:
URL: https://github.com/apache/hudi/pull/3174


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on pull request #3174: [HUDI-2087] Support Append only in Flink stream

2021-07-09 Thread GitBox


yanghua commented on pull request #3174:
URL: https://github.com/apache/hudi/pull/3174#issuecomment-876999303


   The failed test cases ran successfully in the contributor's local env. So 
merge it now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2087) Support Append only in Flink stream

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377911#comment-17377911
 ] 

ASF GitHub Bot commented on HUDI-2087:
--

yanghua commented on pull request #3174:
URL: https://github.com/apache/hudi/pull/3174#issuecomment-876998662


   > This CI failed doesn't seem to be due to me, I can run these test case 
success in my local.
   > ![181625816106_ 
pic_hd](https://user-images.githubusercontent.com/32435329/125041520-6e649300-e0cb-11eb-9cc5-def181ac4de2.jpg)
   
   @vinothchandar @xushiyan weird scene, we doubt that Travis may cache 
something here. Please chime in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-07-08-22-04-30-039.png, 
> image-2021-07-08-22-04-40-018.png
>
>
> It is necessary to support append mode in flink stream, as the data lake 
> should be able to write log type data as parquet high performance without 
> merge.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yanghua commented on pull request #3174: [HUDI-2087] Support Append only in Flink stream

2021-07-09 Thread GitBox


yanghua commented on pull request #3174:
URL: https://github.com/apache/hudi/pull/3174#issuecomment-876998662


   > This CI failed doesn't seem to be due to me, I can run these test case 
success in my local.
   > ![181625816106_ 
pic_hd](https://user-images.githubusercontent.com/32435329/125041520-6e649300-e0cb-11eb-9cc5-def181ac4de2.jpg)
   
   @vinothchandar @xushiyan weird scene, we doubt that Travis may cache 
something here. Please chime in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2107) Support Read Log Only MOR Table For Spark

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377902#comment-17377902
 ] 

ASF GitHub Bot commented on HUDI-2107:
--

hudi-bot edited a comment on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-871363205


   
   ## CI report:
   
   * 864dff7a0cc4389905067abee96046d5f72b004f UNKNOWN
   * f8bac2f4e7133eb3f9cbe4c15a20da49e30dd6eb UNKNOWN
   * 2a1ce1b4b826344ec024bd51b8af5ee5543a0986 UNKNOWN
   * ce51b2d836504936b23368492c071fdfe4d94594 UNKNOWN
   * a619116de9d59eff189e05f593b917cc0b762f25 UNKNOWN
   * dafeb649db949d16c20837ee173933bc5ebe7bf2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=807)
 
   * 4cb7cc5952b6aa3cd81cd3367c6c974bba104d00 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=815)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Log Only MOR Table For Spark
> -
>
> Key: HUDI-2107
> URL: https://issues.apache.org/jira/browse/HUDI-2107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we cannot support read log-only mor table(which is generated by 
> index like InMemeoryIndex, HbaseIndex and FlinkIndex which support indexing 
> log file) for spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3193: [HUDI-2107] Support Read Log Only MOR Table For Spark

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-871363205


   
   ## CI report:
   
   * 864dff7a0cc4389905067abee96046d5f72b004f UNKNOWN
   * f8bac2f4e7133eb3f9cbe4c15a20da49e30dd6eb UNKNOWN
   * 2a1ce1b4b826344ec024bd51b8af5ee5543a0986 UNKNOWN
   * ce51b2d836504936b23368492c071fdfe4d94594 UNKNOWN
   * a619116de9d59eff189e05f593b917cc0b762f25 UNKNOWN
   * dafeb649db949d16c20837ee173933bc5ebe7bf2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=807)
 
   * 4cb7cc5952b6aa3cd81cd3367c6c974bba104d00 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=815)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2087) Support Append only in Flink stream

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377887#comment-17377887
 ] 

ASF GitHub Bot commented on HUDI-2087:
--

yuzhaojing commented on pull request #3174:
URL: https://github.com/apache/hudi/pull/3174#issuecomment-876982342


   This CI failed doesn't seem to be due to me, I can run these test case 
success in my local.
   ![181625816106_ 
pic_hd](https://user-images.githubusercontent.com/32435329/125041520-6e649300-e0cb-11eb-9cc5-def181ac4de2.jpg)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-07-08-22-04-30-039.png, 
> image-2021-07-08-22-04-40-018.png
>
>
> It is necessary to support append mode in flink stream, as the data lake 
> should be able to write log type data as parquet high performance without 
> merge.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yuzhaojing commented on pull request #3174: [HUDI-2087] Support Append only in Flink stream

2021-07-09 Thread GitBox


yuzhaojing commented on pull request #3174:
URL: https://github.com/apache/hudi/pull/3174#issuecomment-876982342


   This CI failed doesn't seem to be due to me, I can run these test case 
success in my local.
   ![181625816106_ 
pic_hd](https://user-images.githubusercontent.com/32435329/125041520-6e649300-e0cb-11eb-9cc5-def181ac4de2.jpg)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2147) Remove unused class AvroConvertor in hudi-flink

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377883#comment-17377883
 ] 

ASF GitHub Bot commented on HUDI-2147:
--

codecov-commenter edited a comment on pull request #3243:
URL: https://github.com/apache/hudi/pull/3243#issuecomment-876425618


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3243](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (6ff9539) into 
[master](https://codecov.io/gh/apache/hudi/commit/c50c24908a4eb1ed769efb8878e6075d3e162d55?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (c50c249) will **increase** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3243/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#3243  +/-   ##
   
   + Coverage 47.62%   47.65%   +0.02% 
   - Complexity 5504 5505   +1 
   
 Files   930  929   -1 
 Lines 4127541258  -17 
 Branches   4138 4138  
   
   + Hits  1965719660   +3 
   + Misses1987019850  -20 
 Partials   1748 1748  
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.58% <ø> (ø)` | |
   | hudicommon | `48.59% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `59.83% <ø> (+0.24%)` | :arrow_up: |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `67.32% <ø> (ø)` | |
   | hudisync | `54.51% <ø> (ø)` | |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `58.57% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3243/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.14% <0.00%> (ø)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/3243/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `89.65% <0.00%> (+10.34%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
 Last update 
[c50c249...6ff9539](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3243: [HUDI-2147] Remove unused class AvroConvertor in hudi-flink

2021-07-09 Thread GitBox


codecov-commenter edited a comment on pull request #3243:
URL: https://github.com/apache/hudi/pull/3243#issuecomment-876425618


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3243](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (6ff9539) into 
[master](https://codecov.io/gh/apache/hudi/commit/c50c24908a4eb1ed769efb8878e6075d3e162d55?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (c50c249) will **increase** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3243/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#3243  +/-   ##
   
   + Coverage 47.62%   47.65%   +0.02% 
   - Complexity 5504 5505   +1 
   
 Files   930  929   -1 
 Lines 4127541258  -17 
 Branches   4138 4138  
   
   + Hits  1965719660   +3 
   + Misses1987019850  -20 
 Partials   1748 1748  
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.58% <ø> (ø)` | |
   | hudicommon | `48.59% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `59.83% <ø> (+0.24%)` | :arrow_up: |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `67.32% <ø> (ø)` | |
   | hudisync | `54.51% <ø> (ø)` | |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `58.57% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3243/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.14% <0.00%> (ø)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/3243/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `89.65% <0.00%> (+10.34%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
 Last update 
[c50c249...6ff9539](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
 Read the [comment 
docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the A

[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377882#comment-17377882
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

pengzhiwei2018 commented on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-876979104


   Hi @yanghua , I have write a util to convert the parquet schema to spark 
schema json without the spark dependencies. Please take a look again~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only support reading hoodie table as datasource table for spark 
> since [https://github.com/apache/hudi/pull/2283]
> In order to support this feature for flink and DeltaStreamer, we need to sync 
> the spark table properties needed by datasource table to the meta store in 
> HiveSyncTool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pengzhiwei2018 commented on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-09 Thread GitBox


pengzhiwei2018 commented on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-876979104


   Hi @yanghua , I have write a util to convert the parquet schema to spark 
schema json without the spark dependencies. Please take a look again~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1425) Performance loss with the additional hoodieRecords.isEmpty() in HoodieSparkSqlWriter#write

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377881#comment-17377881
 ] 

ASF GitHub Bot commented on HUDI-1425:
--

pengzhiwei2018 commented on pull request #2296:
URL: https://github.com/apache/hudi/pull/2296#issuecomment-876977979


   > if you are still interested in pursuing this. or happy to take over
   
   Yes, it is a long time for this PR. I will spend some time to review this 
issue again and solve the comments you have left. Maybe next week~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Performance loss with the additional hoodieRecords.isEmpty() in 
> HoodieSparkSqlWriter#write
> --
>
> Key: HUDI-1425
> URL: https://issues.apache.org/jira/browse/HUDI-1425
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: 截屏2020-11-30 下午9.47.55.png
>
>
> Currently in HoodieSparkSqlWriter#write, there is a _isEmpty()_ test for 
> _hoodieRecords._ This may be a heavy operator in the case when the 
> _hoodieRecords_ contains complex RDD operate.
> !截屏2020-11-30 下午9.47.55.png|width=1255,height=161!
> IMO this test does nothing to do with the performance improve,but rather 
> affects performance.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pengzhiwei2018 commented on pull request #2296: [HUDI-1425] Performance loss with the additional hoodieRecords.isEmpty() in HoodieSparkSqlWriter#write

2021-07-09 Thread GitBox


pengzhiwei2018 commented on pull request #2296:
URL: https://github.com/apache/hudi/pull/2296#issuecomment-876977979


   > if you are still interested in pursuing this. or happy to take over
   
   Yes, it is a long time for this PR. I will spend some time to review this 
issue again and solve the comments you have left. Maybe next week~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2107) Support Read Log Only MOR Table For Spark

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377877#comment-17377877
 ] 

ASF GitHub Bot commented on HUDI-2107:
--

hudi-bot edited a comment on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-871363205


   
   ## CI report:
   
   * 864dff7a0cc4389905067abee96046d5f72b004f UNKNOWN
   * f8bac2f4e7133eb3f9cbe4c15a20da49e30dd6eb UNKNOWN
   * 2a1ce1b4b826344ec024bd51b8af5ee5543a0986 UNKNOWN
   * ce51b2d836504936b23368492c071fdfe4d94594 UNKNOWN
   * a619116de9d59eff189e05f593b917cc0b762f25 UNKNOWN
   * dafeb649db949d16c20837ee173933bc5ebe7bf2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=807)
 
   * 4cb7cc5952b6aa3cd81cd3367c6c974bba104d00 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Log Only MOR Table For Spark
> -
>
> Key: HUDI-2107
> URL: https://issues.apache.org/jira/browse/HUDI-2107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we cannot support read log-only mor table(which is generated by 
> index like InMemeoryIndex, HbaseIndex and FlinkIndex which support indexing 
> log file) for spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3193: [HUDI-2107] Support Read Log Only MOR Table For Spark

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-871363205


   
   ## CI report:
   
   * 864dff7a0cc4389905067abee96046d5f72b004f UNKNOWN
   * f8bac2f4e7133eb3f9cbe4c15a20da49e30dd6eb UNKNOWN
   * 2a1ce1b4b826344ec024bd51b8af5ee5543a0986 UNKNOWN
   * ce51b2d836504936b23368492c071fdfe4d94594 UNKNOWN
   * a619116de9d59eff189e05f593b917cc0b762f25 UNKNOWN
   * dafeb649db949d16c20837ee173933bc5ebe7bf2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=807)
 
   * 4cb7cc5952b6aa3cd81cd3367c6c974bba104d00 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2147) Remove unused class AvroConvertor in hudi-flink

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377875#comment-17377875
 ] 

ASF GitHub Bot commented on HUDI-2147:
--

codecov-commenter edited a comment on pull request #3243:
URL: https://github.com/apache/hudi/pull/3243#issuecomment-876425618


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3243](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (6ff9539) into 
[master](https://codecov.io/gh/apache/hudi/commit/c50c24908a4eb1ed769efb8878e6075d3e162d55?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (c50c249) will **increase** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3243/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#3243  +/-   ##
   
   + Coverage 47.62%   47.65%   +0.02% 
   - Complexity 5504 5505   +1 
   
 Files   930  929   -1 
 Lines 4127541258  -17 
 Branches   4138 4138  
   
   + Hits  1965719660   +3 
   + Misses1987019850  -20 
 Partials   1748 1748  
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.58% <ø> (ø)` | |
   | hudicommon | `48.59% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `59.83% <ø> (+0.24%)` | :arrow_up: |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `67.32% <ø> (ø)` | |
   | hudisync | `54.51% <ø> (ø)` | |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `58.57% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3243/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.14% <0.00%> (ø)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/3243/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `89.65% <0.00%> (+10.34%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
 Last update 
[c50c249...6ff9539](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3243: [HUDI-2147] Remove unused class AvroConvertor in hudi-flink

2021-07-09 Thread GitBox


codecov-commenter edited a comment on pull request #3243:
URL: https://github.com/apache/hudi/pull/3243#issuecomment-876425618


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3243](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (6ff9539) into 
[master](https://codecov.io/gh/apache/hudi/commit/c50c24908a4eb1ed769efb8878e6075d3e162d55?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (c50c249) will **increase** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3243/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#3243  +/-   ##
   
   + Coverage 47.62%   47.65%   +0.02% 
   - Complexity 5504 5505   +1 
   
 Files   930  929   -1 
 Lines 4127541258  -17 
 Branches   4138 4138  
   
   + Hits  1965719660   +3 
   + Misses1987019850  -20 
 Partials   1748 1748  
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.58% <ø> (ø)` | |
   | hudicommon | `48.59% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `59.83% <ø> (+0.24%)` | :arrow_up: |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `67.32% <ø> (ø)` | |
   | hudisync | `54.51% <ø> (ø)` | |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `58.57% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3243/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.14% <0.00%> (ø)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/3243/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `89.65% <0.00%> (+10.34%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
 Last update 
[c50c249...6ff9539](https://codecov.io/gh/apache/hudi/pull/3243?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
 Read the [comment 
docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the A

[jira] [Commented] (HUDI-1483) async clustering for deltastreamer

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377874#comment-17377874
 ] 

ASF GitHub Bot commented on HUDI-1483:
--

hudi-bot edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-866996072


   
   ## CI report:
   
   * 890e9822855fcd45c8387f83740975f43474cddc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=814)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> async clustering for deltastreamer
> --
>
> Key: HUDI-1483
> URL: https://issues.apache.org/jira/browse/HUDI-1483
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: liwei
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3142: [HUDI-1483] Support async clustering for deltastreamer and Spark streaming

2021-07-09 Thread GitBox


hudi-bot edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-866996072


   
   ## CI report:
   
   * 890e9822855fcd45c8387f83740975f43474cddc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=814)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2107) Support Read Log Only MOR Table For Spark

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377871#comment-17377871
 ] 

ASF GitHub Bot commented on HUDI-2107:
--

pengzhiwei2018 commented on a change in pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#discussion_r666725447



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##
@@ -137,11 +136,14 @@ class MergeOnReadSnapshotRelation(val sqlContext: 
SQLContext,
   }
 
   def buildFileIndex(filters: Array[Filter]): List[HoodieMergeOnReadFileSplit] 
= {
-
-val fileStatuses = if (globPaths.isDefined) {
+// Get all partition paths
+val partitionPaths = if (globPaths.isDefined) {
   // Load files from the global paths if it has defined to be compatible 
with the original mode
   val inMemoryFileIndex = 
HoodieSparkUtils.createInMemoryFileIndex(sqlContext.sparkSession, globPaths.get)
-  inMemoryFileIndex.allFiles()
+  val fsView = new HoodieTableFileSystemView(metaClient,
+metaClient.getActiveTimeline.getCommitsTimeline
+  .filterCompletedInstants, inMemoryFileIndex.allFiles().toArray)
+  
fsView.getLatestBaseFiles.iterator().asScala.toList.map(_.getFileStatus.getPath.getParent)

Review comment:
   Currently we have already support the `non-globalPaths ` query way by 
https://github.com/apache/hudi/pull/2651, I will add some doc to introduce it 
in 0.9 release.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Log Only MOR Table For Spark
> -
>
> Key: HUDI-2107
> URL: https://issues.apache.org/jira/browse/HUDI-2107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we cannot support read log-only mor table(which is generated by 
> index like InMemeoryIndex, HbaseIndex and FlinkIndex which support indexing 
> log file) for spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2107) Support Read Log Only MOR Table For Spark

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377872#comment-17377872
 ] 

ASF GitHub Bot commented on HUDI-2107:
--

pengzhiwei2018 commented on a change in pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#discussion_r666725674



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestMergeIntoTable2.scala
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import org.apache.hudi.testutils.DataSourceTestUtils
+
+class TestMergeIntoTable2 extends TestHoodieSqlBase {

Review comment:
   Yes, I will rename it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Log Only MOR Table For Spark
> -
>
> Key: HUDI-2107
> URL: https://issues.apache.org/jira/browse/HUDI-2107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we cannot support read log-only mor table(which is generated by 
> index like InMemeoryIndex, HbaseIndex and FlinkIndex which support indexing 
> log file) for spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3193: [HUDI-2107] Support Read Log Only MOR Table For Spark

2021-07-09 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#discussion_r666725674



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestMergeIntoTable2.scala
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import org.apache.hudi.testutils.DataSourceTestUtils
+
+class TestMergeIntoTable2 extends TestHoodieSqlBase {

Review comment:
   Yes, I will rename it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3193: [HUDI-2107] Support Read Log Only MOR Table For Spark

2021-07-09 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#discussion_r666725447



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##
@@ -137,11 +136,14 @@ class MergeOnReadSnapshotRelation(val sqlContext: 
SQLContext,
   }
 
   def buildFileIndex(filters: Array[Filter]): List[HoodieMergeOnReadFileSplit] 
= {
-
-val fileStatuses = if (globPaths.isDefined) {
+// Get all partition paths
+val partitionPaths = if (globPaths.isDefined) {
   // Load files from the global paths if it has defined to be compatible 
with the original mode
   val inMemoryFileIndex = 
HoodieSparkUtils.createInMemoryFileIndex(sqlContext.sparkSession, globPaths.get)
-  inMemoryFileIndex.allFiles()
+  val fsView = new HoodieTableFileSystemView(metaClient,
+metaClient.getActiveTimeline.getCommitsTimeline
+  .filterCompletedInstants, inMemoryFileIndex.allFiles().toArray)
+  
fsView.getLatestBaseFiles.iterator().asScala.toList.map(_.getFileStatus.getPath.getParent)

Review comment:
   Currently we have already support the `non-globalPaths ` query way by 
https://github.com/apache/hudi/pull/2651, I will add some doc to introduce it 
in 0.9 release.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1483) async clustering for deltastreamer

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377870#comment-17377870
 ] 

ASF GitHub Bot commented on HUDI-1483:
--

codecov-commenter edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-867078369


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3142?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3142](https://codecov.io/gh/apache/hudi/pull/3142?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (890e982) into 
[master](https://codecov.io/gh/apache/hudi/commit/047d956e01b6d7c92320686d8321b2bbe9d2188e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (047d956) will **decrease** coverage by `28.11%`.
   > The diff coverage is `40.35%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3142/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3142?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3142   +/-   ##
   =
   - Coverage 44.03%   15.91%   -28.12% 
   + Complexity 5100  493 -4607 
   =
 Files   930  283  -647 
 Lines 4127511710-29565 
 Branches   4138  961 -3177 
   =
   - Hits  18174 1864-16310 
   + Misses21487 9683-11804 
   + Partials   1614  163 -1451 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.59%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.37% <ø> (-49.15%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <90.19%> (+50.00%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3142?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../org/apache/hudi/async/AsyncClusteringService.java](https://codecov.io/gh/apache/hudi/pull/3142/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2FzeW5jL0FzeW5jQ2x1c3RlcmluZ1NlcnZpY2UuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...ava/org/apache/hudi/async/AsyncCompactService.java](https://codecov.io/gh/apache/hudi/pull/3142/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2FzeW5jL0FzeW5jQ29tcGFjdFNlcnZpY2UuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...java/org/apache/hudi/async/HoodieAsyncService.java](https://codecov.io/gh/apache/hudi/pull/3142/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2FzeW5jL0hvb2RpZUFzeW5jU2VydmljZS5qYXZh)
 | `0.00% <0.00%> (ø)` | |
   | 
[...g/apache/hudi/client/AbstractClusteringClient.java](https://codecov.io/gh/apache/hudi/pull/3142/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9BYnN0cmFjdENsdXN0ZXJpbmdDbGllbnQuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...org/apache/hudi/config/HoodieClusteringConfig.java](https://codecov.io/gh/apache/hudi/pull/3142/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=co

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3142: [HUDI-1483] Support async clustering for deltastreamer and Spark streaming

2021-07-09 Thread GitBox


codecov-commenter edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-867078369


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3142?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3142](https://codecov.io/gh/apache/hudi/pull/3142?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (890e982) into 
[master](https://codecov.io/gh/apache/hudi/commit/047d956e01b6d7c92320686d8321b2bbe9d2188e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (047d956) will **decrease** coverage by `28.11%`.
   > The diff coverage is `40.35%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3142/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3142?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#3142   +/-   ##
   =
   - Coverage 44.03%   15.91%   -28.12% 
   + Complexity 5100  493 -4607 
   =
 Files   930  283  -647 
 Lines 4127511710-29565 
 Branches   4138  961 -3177 
   =
   - Hits  18174 1864-16310 
   + Misses21487 9683-11804 
   + Partials   1614  163 -1451 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.59%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.37% <ø> (-49.15%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <90.19%> (+50.00%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3142?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../org/apache/hudi/async/AsyncClusteringService.java](https://codecov.io/gh/apache/hudi/pull/3142/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2FzeW5jL0FzeW5jQ2x1c3RlcmluZ1NlcnZpY2UuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...ava/org/apache/hudi/async/AsyncCompactService.java](https://codecov.io/gh/apache/hudi/pull/3142/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2FzeW5jL0FzeW5jQ29tcGFjdFNlcnZpY2UuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...java/org/apache/hudi/async/HoodieAsyncService.java](https://codecov.io/gh/apache/hudi/pull/3142/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2FzeW5jL0hvb2RpZUFzeW5jU2VydmljZS5qYXZh)
 | `0.00% <0.00%> (ø)` | |
   | 
[...g/apache/hudi/client/AbstractClusteringClient.java](https://codecov.io/gh/apache/hudi/pull/3142/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9BYnN0cmFjdENsdXN0ZXJpbmdDbGllbnQuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...org/apache/hudi/config/HoodieClusteringConfig.java](https://codecov.io/gh/apache/hudi/pull/3142/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVDbHVzdGVyaW5nQ29uZmlnLmphdmE=)
 | `0.00% <0.00%> (-71.28%)` | :arrow_down: |

[jira] [Commented] (HUDI-2107) Support Read Log Only MOR Table For Spark

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377868#comment-17377868
 ] 

ASF GitHub Bot commented on HUDI-2107:
--

pengzhiwei2018 commented on a change in pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#discussion_r666723325



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##
@@ -161,28 +162,18 @@
 .map(instant -> 
fsView.getLatestMergedFileSlicesBeforeOrOn(relPartitionPath, 
instant.getTimestamp()))
 .orElse(Stream.empty());
 
-// subgroup splits again by file id & match with log files.
-Map> groupedInputSplits = 
partitionsToParquetSplits.get(partitionPath).stream()
-.collect(Collectors.groupingBy(file -> 
FSUtils.getFileId(file.getFileStatus().getPath().getName(;
 latestFileSlices.forEach(fileSlice -> {
-  List dataFileSplits = 
groupedInputSplits.get(fileSlice.getFileId());
-  dataFileSplits.forEach(split -> {
-try {
-  List logFilePaths = 
fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+  List logFilePaths = 
fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
   .map(logFile -> 
logFile.getPath().toString()).collect(Collectors.toList());
-  resultMap.put(split, logFilePaths);
-} catch (Exception e) {
-  throw new HoodieException("Error creating hoodie real time split 
", e);
-}
-  });
+  baseAndLogsList.add(Pair.of(fileSlice.getBaseFile(), logFilePaths));

Review comment:
   No, for log only file slice, the fileSlice.getBaseFile() is an empty 
Option.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Log Only MOR Table For Spark
> -
>
> Key: HUDI-2107
> URL: https://issues.apache.org/jira/browse/HUDI-2107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we cannot support read log-only mor table(which is generated by 
> index like InMemeoryIndex, HbaseIndex and FlinkIndex which support indexing 
> log file) for spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3193: [HUDI-2107] Support Read Log Only MOR Table For Spark

2021-07-09 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#discussion_r666723325



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##
@@ -161,28 +162,18 @@
 .map(instant -> 
fsView.getLatestMergedFileSlicesBeforeOrOn(relPartitionPath, 
instant.getTimestamp()))
 .orElse(Stream.empty());
 
-// subgroup splits again by file id & match with log files.
-Map> groupedInputSplits = 
partitionsToParquetSplits.get(partitionPath).stream()
-.collect(Collectors.groupingBy(file -> 
FSUtils.getFileId(file.getFileStatus().getPath().getName(;
 latestFileSlices.forEach(fileSlice -> {
-  List dataFileSplits = 
groupedInputSplits.get(fileSlice.getFileId());
-  dataFileSplits.forEach(split -> {
-try {
-  List logFilePaths = 
fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+  List logFilePaths = 
fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
   .map(logFile -> 
logFile.getPath().toString()).collect(Collectors.toList());
-  resultMap.put(split, logFilePaths);
-} catch (Exception e) {
-  throw new HoodieException("Error creating hoodie real time split 
", e);
-}
-  });
+  baseAndLogsList.add(Pair.of(fileSlice.getBaseFile(), logFilePaths));

Review comment:
   No, for log only file slice, the fileSlice.getBaseFile() is an empty 
Option.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377865#comment-17377865
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only support reading hoodie table as datasource table for spark 
> since [https://github.com/apache/hudi/pull/2283]
> In order to support this feature for flink and DeltaStreamer, we need to sync 
> the spark table properties needed by datasource table to the meta store in 
> HiveSyncTool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-09 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




<    1   2   3