[jira] [Commented] (HUDI-1951) Hash Index for HUDI

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378984#comment-17378984
 ] 

ASF GitHub Bot commented on HUDI-1951:
--

minihippo commented on a change in pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#discussion_r667671074



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/SimpleAvroKeyGenerator.java
##
@@ -30,19 +33,36 @@
 
   public SimpleAvroKeyGenerator(TypedProperties props) {
     this(props, props.getString(KeyGeneratorOptions.RECORDKEY_FIELD_OPT_KEY.key()),
-        props.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_OPT_KEY.key()));
+        props.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_OPT_KEY.key()),
+        props.getString(KeyGeneratorOptions.INDEXKEY_FILED_OPT.key(),
+            KeyGeneratorOptions.INDEXKEY_FILED_OPT.defaultValue()));
   }
 
   SimpleAvroKeyGenerator(TypedProperties props, String partitionPathField) {
-    this(props, null, partitionPathField);
+    this(props, null, partitionPathField, null);
   }
 
   SimpleAvroKeyGenerator(TypedProperties props, String recordKeyField, String partitionPathField) {
+    this(props, recordKeyField, partitionPathField, null);
+  }
+
+  SimpleAvroKeyGenerator(TypedProperties props, String recordKeyField, String partitionPathField,
+      String indexKeyField) {
     super(props);
     this.recordKeyFields = recordKeyField == null
         ? Collections.emptyList()
         : Collections.singletonList(recordKeyField);
     this.partitionPathFields = Collections.singletonList(partitionPathField);
+    if (!StringUtils.isNullOrEmpty(indexKeyField) && !indexKeyField.equals(recordKeyField)) {

Review comment:
   The check here is incorrect. For the bucket index, indexKeyField can be a 
subset of the record key fields. There is a one-to-one mapping between bucketId 
and file group ID. Therefore, a record indexed by `colA` always goes to the 
same bucket, and an update keyed by `colA` and `colB` is matched against the 
old record stored in that bucket.
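
The point about the subset can be sketched as follows. This is a minimal illustration, not code from the PR: the class `BucketRoutingSketch`, the helper `bucketOf`, and the use of `Objects.hashCode` as the hash are assumptions made for the example (the PR itself uses a Hive-compatible hash). The record key is (`colA`, `colB`), but only `colA` is hashed to choose the bucket, so two records with the same `colA` land in the same bucket.

```java
import java.util.Objects;

// Hypothetical sketch: the names and the hash function are illustrative, not Hudi's API.
final class BucketRoutingSketch {

  // Picks a bucket from the index key only; floorMod keeps the result in [0, numBuckets).
  static int bucketOf(String indexKey, int numBuckets) {
    return Math.floorMod(Objects.hashCode(indexKey), numBuckets);
  }

  public static void main(String[] args) {
    int numBuckets = 8;
    // Record key is (colA, colB); the index key is the subset colA.
    int first = bucketOf("user-42", numBuckets);   // record ("user-42", "2021-07-10")
    int second = bucketOf("user-42", numBuckets);  // record ("user-42", "2021-07-11")
    // Same colA, different colB: both records map to the same bucket.
    System.out.println(first == second);
  }
}
```

Because the bucket depends only on `colA`, a later write for the same `colA` is routed to the bucket (and, given the one-to-one mapping, the file group) that already holds the earlier record.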




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Hash Index for HUDI
> ---
>
> Key: HUDI-1951
> URL: https://issues.apache.org/jira/browse/HUDI-1951
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: XiaoyuGeng
> Assignee: XiaoyuGeng
> Priority: Major
> Labels: pull-request-available
>
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] minihippo commented on a change in pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket

2021-07-11 Thread GitBox






[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378981#comment-17378981
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * 0fb47c68d425f44a4b3a0be43a047215f2f37a83 UNKNOWN
   * ffa934182ad9abe78c0592772e9328dd6b0d27e6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=855)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Hive Integration
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only support reading hoodie table as datasource table for spark 
> since [https://github.com/apache/hudi/pull/2283]
> In order to support this feature for flink and DeltaStreamer, we need to sync 
> the spark table properties needed by datasource table to the meta store in 
> HiveSyncTool.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox






[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378975#comment-17378975
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

codecov-commenter edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-850284847


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2438](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5022f1d) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `44.87%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2438/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##            master   #2438       +/-   ##
   
   - Coverage    47.72%   2.85%   -44.88%
   + Complexity    5528      85     -5443
   
     Files          934     283      -651
     Lines        41457   11751    -29706
     Branches      4166     966     -3200
   
   - Hits         19786     335    -19451
   + Misses       19914   11390     -8524
   + Partials      1757      26     -1731
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.37% <ø> (-49.15%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `8.99% <0.00%> (-50.27%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `0.00% <0.00%> (-71.57%)` | :arrow_down: |
   | 
[...apache/hudi/utilities/sources/AvroKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb0thZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...hudi/utilities/sources/helpers/KafkaOffsetGen.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9LYWZrYU9mZnNldEdlbi5qYXZh)
 | `0.00% <0.00%> (-87.69%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&ut

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-850284847


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2438](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5022f1d) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `44.87%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2438/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##            master   #2438       +/-   ##
   
   - Coverage    47.72%   2.85%   -44.88%
   + Complexity    5528      85     -5443
   
     Files          934     283      -651
     Lines        41457   11751    -29706
     Branches      4166     966     -3200
   
   - Hits         19786     335    -19451
   + Misses       19914   11390     -8524
   + Partials      1757      26     -1731
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.37% <ø> (-49.15%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `8.99% <0.00%> (-50.27%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `0.00% <0.00%> (-71.57%)` | :arrow_down: |
   | 
[...apache/hudi/utilities/sources/AvroKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb0thZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...hudi/utilities/sources/helpers/KafkaOffsetGen.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9LYWZrYU9mZnNldEdlbi5qYXZh)
 | `0.00% <0.00%> (-87.69%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/ut

[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378974#comment-17378974
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 8bc0333e4fc14158b126da1f7b14f6c43a3abfb8 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=856)
 
   * 5022f1d97e4e9b140d8e41b5b49c034ceb9ae601 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> DeltaStreamer kafka source supports consuming from specified timestamp
> --
>
> Key: HUDI-1447
> URL: https://issues.apache.org/jira/browse/HUDI-1447
> Project: Apache Hudi
> Issue Type: New Feature
> Components: DeltaStreamer
> Reporter: wangxianghu#1
> Assignee: liujinhui
> Priority: Major
> Labels: pull-request-available, sev:high, user-support-issues
>






[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-11 Thread GitBox






[jira] [Commented] (HUDI-1483) async clustering for deltastreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378972#comment-17378972
 ] 

ASF GitHub Bot commented on HUDI-1483:
--

zhangyue19921010 edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-878004113


   Hi @codope, I'd like to know whether this async clustering function can 
handle the following scenario without losing data:
   
   There are three small file groups, fg1, fg2 and fg3, containing file slice1, 
file slice2 and file slice3 respectively.
   
   While the async scheduler **has started building a clustering plan but has 
not finished**, there is an inflight or requested commit for fg1 that will 
create file slice11 based on file slice1. In other words, **file slice11 is 
being created but is not yet committed**. I believe this case is similar to the 
multi-writer case.
   
   What will the async clustering function do here? 
   Will the clustering plan contain file slice1? If it does, I think the new 
data in file slice11 will be lost.
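   
   The concern can be made concrete with a small model. This is only a sketch of one possible safeguard, assuming a planner can see which file groups have pending (inflight or requested) writes. It does not describe how the PR actually behaves, and `ClusteringPlanSketch` and `plan` are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model of the scenario; none of these names are Hudi APIs.
final class ClusteringPlanSketch {

  // Returns the file groups a planner could include: those with no pending write.
  static List<String> plan(List<String> fileGroups, Set<String> groupsWithPendingWrites) {
    List<String> planned = new ArrayList<>();
    for (String fg : fileGroups) {
      if (!groupsWithPendingWrites.contains(fg)) {
        planned.add(fg);
      }
    }
    return planned;
  }

  public static void main(String[] args) {
    // fg1 has an inflight commit that is creating file slice11.
    List<String> groups = Arrays.asList("fg1", "fg2", "fg3");
    System.out.println(plan(groups, Collections.singleton("fg1")));
  }
}
```

   In this model fg1 is left out of the plan, so file slice1 is not rewritten while file slice11 is still being written.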
   
   Looking forward to your reply, thanks a lot.
   




> async clustering for deltastreamer
> --
>
> Key: HUDI-1483
> URL: https://issues.apache.org/jira/browse/HUDI-1483
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: liwei
> Assignee: liwei
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.9.0
>
>






[GitHub] [hudi] zhangyue19921010 edited a comment on pull request #3142: [HUDI-1483] Support async clustering for deltastreamer and Spark streaming

2021-07-11 Thread GitBox






[jira] [Commented] (HUDI-1483) async clustering for deltastreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378970#comment-17378970
 ] 

ASF GitHub Bot commented on HUDI-1483:
--

zhangyue19921010 edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-878004113


   Hi @codope, I'd like to know whether this async clustering function can 
handle the following scenario without losing data:
   
   There are three small file groups, fg1, fg2 and fg3, containing file slice1, 
file slice2 and file slice3 respectively.
   
   While the async scheduler **has started building a clustering plan but has 
not finished**, there is an inflight or requested commit for fg1 that will 
create file slice11 based on file slice1. In other words, **file slice11 is 
being created but is not yet committed**. I believe this case is similar to the 
multi-writer case.
   
   What will the async clustering function do here? 
   Will the clustering plan contain file slice1? If it does, I think the new 
data in file slice11 will be lost.
   
   Looking forward to your reply, thanks a lot.
   




> async clustering for deltastreamer
> --
>
> Key: HUDI-1483
> URL: https://issues.apache.org/jira/browse/HUDI-1483
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: liwei
> Assignee: liwei
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.9.0
>
>






[GitHub] [hudi] zhangyue19921010 edited a comment on pull request #3142: [HUDI-1483] Support async clustering for deltastreamer and Spark streaming

2021-07-11 Thread GitBox






[jira] [Commented] (HUDI-1951) Hash Index for HUDI

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378967#comment-17378967
 ] 

ASF GitHub Bot commented on HUDI-1951:
--

minihippo commented on a change in pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#discussion_r667652419



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/utils/HiveBucketUtils.java
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utils;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.regex.Pattern;
+
+import org.apache.hudi.common.fs.FSUtils;
+
+public class HiveBucketUtils {
+  private static final Pattern SPARK_BUCKET_NAME = Pattern.compile(".*_(\\d+)(?:\\..*)?$");
+
+  public static int mod(int x, int y) {
+    int r = x % y;
+    if (r < 0) {
+      return (r + y) % y;
+    } else {
+      return r;
+    }
+  }
+
+  public static int bucketId(String key, int numBuckets) {
+    return bucketId(Collections.singletonList(key), numBuckets);
+  }
+
+  // copied from spark HiveHashFunction class
+  // see https://code.byted.org/bytedance/spark/blob/branch-3.0-bd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala

Review comment:
   done
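
For reference, the `mod` helper in the hunk above returns a non-negative remainder. The following is a minimal sketch, assuming a positive `y`; the class name `ModSketch` is hypothetical and the method body is a condensed copy of the quoted logic. Under that assumption it agrees with the JDK's `Math.floorMod`.

```java
// Hypothetical standalone copy of the helper, for illustration only.
final class ModSketch {

  // Same logic as the quoted mod(): shift a negative remainder into [0, y).
  static int mod(int x, int y) {
    int r = x % y;
    return r < 0 ? (r + y) % y : r;
  }

  public static void main(String[] args) {
    // Java's % keeps the sign of the dividend, so -7 % 4 is -3; mod() maps it to 1.
    System.out.println(mod(-7, 4));
    // Math.floorMod gives the same result for a positive divisor.
    System.out.println(Math.floorMod(-7, 4));
  }
}
```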

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/utils/HiveBucketUtils.java
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utils;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.regex.Pattern;
+
+import org.apache.hudi.common.fs.FSUtils;
+
+public class HiveBucketUtils {
+  private static final Pattern SPARK_BUCKET_NAME = Pattern.compile(".*_(\\d+)(?:\\..*)?$");
+
+  public static int mod(int x, int y) {
+    int r = x % y;
+    if (r < 0) {
+      return (r + y) % y;
+    } else {
+      return r;
+    }
+  }
+
+  public static int bucketId(String key, int numBuckets) {
+    return bucketId(Collections.singletonList(key), numBuckets);
+  }
+
+  // copied from spark HiveHashFunction class
+  // see https://code.byted.org/bytedance/spark/blob/branch-3.0-bd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
+  public static int bucketId(List values, int numBuckets) {
+    int hash = 0;
+    for (Object value : values) {
+      hash = 31 * hash;
+      if (value == null) {
+        hash += 0;
+      } else if (value instanceof Boolean) {
+        hash += HiveHasher.hashInt(value.equals(Boolean.TRUE) ? 1 : 0);
+      } else if (value instanceof Integer) {
+        hash += HiveHasher.hashInt((Integer) value);
+      } else if (value instanceof Long) {
+        hash += HiveHasher.hashLong((Long) value);
+      } else if (value instanceof Float) {
+        hash += HiveHasher.hashInt(Float.floatToIntBits((Float) value));
+      } else if (value instanceof Double) {
+        hash += HiveHasher.hashLong(Double.doubleToLongBits((Double) value));
+      } else if (value instanceof String) {
+        byte[] a = value.toString().getBytes();
+        hash += HiveHasher.hashUnsafeBytes(a, HiveHasher.Platform.BYTE_ARRAY_OFFSET, a.length);
+      } else {
+        throw new RuntimeException("Unsupported type " + value.getClass().getName());
+      }
+    }
+    return mod(hash & Integer.MAX_VALUE, numBu

[jira] [Commented] (HUDI-1483) async clustering for deltastreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378969#comment-17378969
 ] 

ASF GitHub Bot commented on HUDI-1483:
--

zhangyue19921010 edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-878004113


   Hi @codope, I'd like to know whether this async clustering function can 
handle the following scenario without losing data:
   
   There are 3 small file groups named fg1, fg2 and fg3, containing file 
slice1, file slice2 and file slice3 respectively.
   
   While the async scheduler **has started building a clustering plan but has 
not finished**, there is an inflight or requested commit for fg1 that will 
create file slice11 based on file slice1. In other words, **file slice11 is 
being created but is not yet committed**  ---> I believe this scenario is 
similar to the multi-writer case.
   
   What will the async clustering function do here? 
   Will the clustering plan contain file slice1? If it does, I think the 
new data in file slice11 will be lost.
   
   Looking forward to your reply, thanks a lot.
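
The scenario in the question can be modelled in a few lines. The sketch below is not Hudi's implementation and makes no claim about what PR #3142 actually does; it only illustrates one possible safeguard, in which the planner skips any file group that has a pending (inflight or requested) write at the time the plan is built. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the scenario in the question; not Hudi code.
class ClusteringPlanDemo {
  // Returns the file groups considered safe to cluster: those without a pending write.
  static List<String> planClustering(List<String> smallFileGroups, Set<String> groupsWithPendingWrites) {
    List<String> plan = new ArrayList<>();
    for (String fg : smallFileGroups) {
      if (!groupsWithPendingWrites.contains(fg)) {
        plan.add(fg);
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    List<String> small = Arrays.asList("fg1", "fg2", "fg3");
    // fg1 has an inflight commit that is creating file slice11.
    Set<String> pending = new HashSet<>(Arrays.asList("fg1"));
    System.out.println(planClustering(small, pending)); // [fg2, fg3]
  }
}
```

A filter like this only helps if the pending-write check and the plan creation see a consistent view of the timeline; a write that starts after the check would still need some form of conflict detection.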
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> async clustering for deltastreamer
> --
>
> Key: HUDI-1483
> URL: https://issues.apache.org/jira/browse/HUDI-1483
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: liwei
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhangyue19921010 edited a comment on pull request #3142: [HUDI-1483] Support async clustering for deltastreamer and Spark streaming

2021-07-11 Thread GitBox


zhangyue19921010 edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-878004113


   Hi @codope, I'd like to know whether this async clustering function can 
handle the following scenario without losing data:
   
   There are 3 small file groups named fg1, fg2 and fg3, containing file 
slice1, file slice2 and file slice3 respectively.
   
   While the async scheduler **has started building a clustering plan but has 
not finished**, there is an inflight or requested commit for fg1 that will 
create file slice11 based on file slice1. In other words, **file slice11 is 
being created but is not yet committed**  ---> I believe this scenario is 
similar to the multi-writer case.
   
   What will the async clustering function do here? 
   Will the clustering plan contain file slice1? If it does, I think the 
new data in file slice11 will be lost.
   
   Looking forward to your reply, thanks a lot.
   






[jira] [Commented] (HUDI-1951) Hash Index for HUDI

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378965#comment-17378965
 ] 

ASF GitHub Bot commented on HUDI-1951:
--

minihippo commented on a change in pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#discussion_r667652076



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BucketInfo.java
##
@@ -30,6 +30,10 @@
   String fileIdPrefix;
   String partitionPath;
 
+  public BucketInfo() {
+    new BucketInfo(null, null, null);

Review comment:
   done
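
The no-arg constructor in the hunk above calls `new BucketInfo(null, null, null)`, which in Java builds a second object and discards it rather than initializing the instance being constructed; delegating to another constructor requires `this(...)`. A minimal sketch of the difference, using hypothetical stand-in classes `Info` and `BrokenInfo` with a single field rather than the real `BucketInfo`:

```java
class ConstructorDelegationDemo {
  // Hypothetical stand-in; not the real BucketInfo.
  static class Info {
    String fileIdPrefix;

    Info(String fileIdPrefix) {
      this.fileIdPrefix = fileIdPrefix;
    }

    // Delegates to the one-arg constructor: this instance receives the value.
    Info() {
      this("default");
    }
  }

  static class BrokenInfo {
    String fileIdPrefix;

    BrokenInfo(String fileIdPrefix) {
      this.fileIdPrefix = fileIdPrefix;
    }

    // Creates and discards a second object; this instance's field stays null.
    BrokenInfo() {
      new BrokenInfo("default");
    }
  }

  public static void main(String[] args) {
    System.out.println(new Info().fileIdPrefix);       // default
    System.out.println(new BrokenInfo().fileIdPrefix); // null
  }
}
```

With all-null arguments, as in the hunk, the observable state happens to be the same either way, so the original mostly wastes an allocation; the pattern breaks as soon as a non-null default is passed.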

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/utils/HiveBucketUtils.java
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utils;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.regex.Pattern;
+
+import org.apache.hudi.common.fs.FSUtils;
+
+public class HiveBucketUtils {

Review comment:
   done






> Hash Index for HUDI
> ---
>
> Key: HUDI-1951
> URL: https://issues.apache.org/jira/browse/HUDI-1951
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Major
>  Labels: pull-request-available
>
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index





[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378966#comment-17378966
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 67041c2d836e61355aea26bd24f91548ec5e92ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=839)
   * 8bc0333e4fc14158b126da1f7b14f6c43a3abfb8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=856)
   * 5022f1d97e4e9b140d8e41b5b49c034ceb9ae601 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> DeltaStreamer kafka source supports consuming from specified timestamp
> --
>
> Key: HUDI-1447
> URL: https://issues.apache.org/jira/browse/HUDI-1447
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: wangxianghu#1
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
>






[GitHub] [hudi] minihippo commented on a change in pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket

2021-07-11 Thread GitBox


minihippo commented on a change in pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#discussion_r667652419



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/utils/HiveBucketUtils.java
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utils;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.regex.Pattern;
+
+import org.apache.hudi.common.fs.FSUtils;
+
+public class HiveBucketUtils {
+  private static final Pattern SPARK_BUCKET_NAME = Pattern.compile(".*_(\\d+)(?:\\..*)?$");
+
+  public static int mod(int x, int y) {
+    int r = x % y;
+    if (r < 0) {
+      return (r + y) % y;
+    } else {
+      return r;
+    }
+  }
+
+  public static int bucketId(String key, int numBuckets) {
+    return bucketId(Collections.singletonList(key), numBuckets);
+  }
+
+  // copied from spark HiveHashFunction class
+  // see https://code.byted.org/bytedance/spark/blob/branch-3.0-bd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala

Review comment:
   done

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/utils/HiveBucketUtils.java
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utils;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.regex.Pattern;
+
+import org.apache.hudi.common.fs.FSUtils;
+
+public class HiveBucketUtils {
+  private static final Pattern SPARK_BUCKET_NAME = Pattern.compile(".*_(\\d+)(?:\\..*)?$");
+
+  public static int mod(int x, int y) {
+    int r = x % y;
+    if (r < 0) {
+      return (r + y) % y;
+    } else {
+      return r;
+    }
+  }
+
+  public static int bucketId(String key, int numBuckets) {
+    return bucketId(Collections.singletonList(key), numBuckets);
+  }
+
+  // copied from spark HiveHashFunction class
+  // see https://code.byted.org/bytedance/spark/blob/branch-3.0-bd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
+  public static int bucketId(List values, int numBuckets) {
+    int hash = 0;
+    for (Object value : values) {
+      hash = 31 * hash;
+      if (value == null) {
+        hash += 0;
+      } else if (value instanceof Boolean) {
+        hash += HiveHasher.hashInt(value.equals(Boolean.TRUE) ? 1 : 0);
+      } else if (value instanceof Integer) {
+        hash += HiveHasher.hashInt((Integer) value);
+      } else if (value instanceof Long) {
+        hash += HiveHasher.hashLong((Long) value);
+      } else if (value instanceof Float) {
+        hash += HiveHasher.hashInt(Float.floatToIntBits((Float) value));
+      } else if (value instanceof Double) {
+        hash += HiveHasher.hashLong(Double.doubleToLongBits((Double) value));
+      } else if (value instanceof String) {
+        byte[] a = value.toString().getBytes();
+        hash += HiveHasher.hashUnsafeBytes(a, HiveHasher.Platform.BYTE_ARRAY_OFFSET, a.length);
+      } else {
+        throw new RuntimeException("Unsupported type " + value.getClass().getName());
+      }
+    }
+    return mod(hash & Integer.MAX_VALUE, numBuckets);
+  }
+
+  // uuid 00499982-0f0d-473e-965d-e8a7476ec429

Review comment:
   done
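
The `mod` helper and the final `mod(hash & Integer.MAX_VALUE, numBuckets)` call in the diff above can be checked in isolation. The sketch below uses hypothetical names (`ModDemo`, `toBucket`) and assumes `numBuckets` is positive. It shows that `mod` then agrees with the JDK's `Math.floorMod`, and that masking with `Integer.MAX_VALUE` clears the sign bit, so the negative branch of `mod` is never reached on that call path.

```java
class ModDemo {
  // Same logic as the mod helper in the quoted diff.
  static int mod(int x, int y) {
    int r = x % y;
    return r < 0 ? (r + y) % y : r;
  }

  // Hypothetical wrapper mirroring the last line of bucketId in the diff.
  static int toBucket(int hash, int numBuckets) {
    return mod(hash & Integer.MAX_VALUE, numBuckets);
  }

  public static void main(String[] args) {
    System.out.println(mod(-7, 3));                     // 2
    System.out.println(Math.floorMod(-7, 3));           // 2
    System.out.println(toBucket(Integer.MIN_VALUE, 8)); // 0, since MIN_VALUE & MAX_VALUE == 0
  }
}
```

Masking and floor-mod are not interchangeable on the hash itself: for a hash of -2 and 3 buckets, the masked form gives 0 while `Math.floorMod(-2, 3)` gives 1, so swapping one for the other would change bucket assignment.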





[jira] [Commented] (HUDI-1951) Hash Index for HUDI

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378964#comment-17378964
 ] 

ASF GitHub Bot commented on HUDI-1951:
--

minihippo commented on a change in pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#discussion_r667651995



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
##
@@ -189,6 +189,13 @@
   .defaultValue("false")
   .withDocumentation("");
 
+  // * Hive Bucket Index Configs *

Review comment:
   done






> Hash Index for HUDI
> ---
>
> Key: HUDI-1951
> URL: https://issues.apache.org/jira/browse/HUDI-1951
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Major
>  Labels: pull-request-available
>
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index





[jira] [Commented] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378958#comment-17378958
 ] 

ASF GitHub Bot commented on HUDI-1896:
--

satishmittal closed pull request #3256:
URL: https://github.com/apache/hudi/pull/3256


   




> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a new 
> `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.
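
To make the timestamp-based selection mentioned in the issue description concrete, here is a simplified sketch. It is not the actual `DFSPathSelector` code, and the `FileEntry` type and `selectNewFiles` method are hypothetical. It also shows one way such a scheme can skip a file, namely when a file becomes visible late with a modification time at or before the stored checkpoint. Whether this is one of the corner cases discussed in HUDI-1723 is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TimestampSelectorDemo {
  // Hypothetical file listing entry: path plus modification time in epoch millis.
  static final class FileEntry {
    final String path;
    final long modTime;

    FileEntry(String path, long modTime) {
      this.path = path;
      this.modTime = modTime;
    }
  }

  // Selects files modified strictly after the checkpoint.
  static List<String> selectNewFiles(List<FileEntry> listing, long checkpoint) {
    List<String> out = new ArrayList<>();
    for (FileEntry f : listing) {
      if (f.modTime > checkpoint) {
        out.add(f.path);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<FileEntry> firstListing = Arrays.asList(new FileEntry("a", 100L), new FileEntry("b", 200L));
    System.out.println(selectNewFiles(firstListing, 0L)); // [a, b]; checkpoint advances to 200

    // "c" shows up in a later listing but carries modTime 200, so it is skipped.
    List<FileEntry> secondListing = Arrays.asList(
        new FileEntry("a", 100L), new FileEntry("b", 200L), new FileEntry("c", 200L));
    System.out.println(selectNewFiles(secondListing, 200L)); // []
  }
}
```

A source driven by change notifications avoids this class of problem because each new object is reported as an event instead of being inferred from a timestamp comparison.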





[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378959#comment-17378959
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 67041c2d836e61355aea26bd24f91548ec5e92ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=839)
   * 8bc0333e4fc14158b126da1f7b14f6c43a3abfb8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=856)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> DeltaStreamer kafka source supports consuming from specified timestamp
> --
>
> Key: HUDI-1447
> URL: https://issues.apache.org/jira/browse/HUDI-1447
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: wangxianghu#1
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
>






[GitHub] [hudi] satishmittal1111 closed pull request #3256: [HUDI-1896] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread GitBox


satishmittal closed pull request #3256:
URL: https://github.com/apache/hudi/pull/3256


   






[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378956#comment-17378956
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 67041c2d836e61355aea26bd24f91548ec5e92ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=839)
   * 8bc0333e4fc14158b126da1f7b14f6c43a3abfb8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> DeltaStreamer kafka source supports consuming from specified timestamp
> --
>
> Key: HUDI-1447
> URL: https://issues.apache.org/jira/browse/HUDI-1447
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: wangxianghu#1
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
>






[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378951#comment-17378951
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ffa9341) into [master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5804ad8) will **increase** coverage by `0.01%`.
   > The diff coverage is `70.67%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
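   @@             Coverage Diff              @@
   ##             master    #3120      +/-   ##
   ==========================================
   + Coverage     47.72%   47.74%    +0.01%
   - Complexity     5528                +27
   ==========================================
     Files           934      935        +1
     Lines         41457    41536       +79
     Branches       4166     4180       +14
   ==========================================
   + Hits          19786    19830       +44
   - Misses        19914    19943       +29
   - Partials       1757     1763        +6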
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.45% <ø> (ø)` | |
   | hudicommon | `48.58% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `60.03% <ø> (ø)` | |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `67.20% <55.55%> (-0.47%)` | :arrow_down: |
   | hudisync | `55.73% <71.77%> (+1.22%)` | :arrow_up: |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `73.91% <ø> (ø)` | |
   | 
[...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh)
 | `71.51% <40.00%> (-0.53%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `56.94% <56.94%> (ø)` | |
   | 
[...src/main/scala/org/apache/hudi/DefaultSource.scala](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0RlZmF1bHRTb3VyY2Uuc2NhbGE=)
 | `74.77% <75.00%> (-0.46%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?sr

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ffa9341) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **increase** coverage by `0.01%`.
   > The diff coverage is `70.67%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
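   @@             Coverage Diff              @@
   ##             master    #3120      +/-   ##
   ==========================================
   + Coverage     47.72%   47.74%    +0.01%
   - Complexity     5528                +27
   ==========================================
     Files           934      935        +1
     Lines         41457    41536       +79
     Branches       4166     4180       +14
   ==========================================
   + Hits          19786    19830       +44
   - Misses        19914    19943       +29
   - Partials       1757     1763        +6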
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.45% <ø> (ø)` | |
   | hudicommon | `48.58% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `60.03% <ø> (ø)` | |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `67.20% <55.55%> (-0.47%)` | :arrow_down: |
   | hudisync | `55.73% <71.77%> (+1.22%)` | :arrow_up: |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `73.91% <ø> (ø)` | |
   | 
[...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh)
 | `71.51% <40.00%> (-0.53%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `56.94% <56.94%> (ø)` | |
   | 
[...src/main/scala/org/apache/hudi/DefaultSource.scala](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0RlZmF1bHRTb3VyY2Uuc2NhbGE=)
 | `74.77% <75.00%> (-0.46%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `7

[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378950#comment-17378950
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ffa9341) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `3.77%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
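   @@             Coverage Diff              @@
   ##             master    #3120      +/-   ##
   ==========================================
   - Coverage     47.72%   43.95%    -3.78%
   + Complexity     5528     4867      -661
   ==========================================
     Files           934      855       -79
     Lines         41457    37084     -4373
     Branches       4166     3493      -673
   ==========================================
   - Hits          19786    16299     -3487
   + Misses        19914    19533      -381
   + Partials       1757     1252      -505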
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.45% <ø> (ø)` | |
   | hudicommon | `48.58% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `60.03% <ø> (ø)` | |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ffa9341) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `3.77%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
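   @@             Coverage Diff              @@
   ##             master    #3120      +/-   ##
   ==========================================
   - Coverage     47.72%   43.95%    -3.78%
   + Complexity     5528     4867      -661
   ==========================================
     Files           934      855       -79
     Lines         41457    37084     -4373
     Branches       4166     3493      -673
   ==========================================
   - Hits          19786    16299     -3487
   + Misses        19914    19533      -381
   + Partials       1757     1252      -505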
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.45% <ø> (ø)` | |
   | hudicommon | `48.58% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `60.03% <ø> (ø)` | |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down:

[jira] [Commented] (HUDI-2162) Instant is null cause flushBuffer failed in casual

2021-07-11 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378949#comment-17378949
 ] 

Danny Chen commented on HUDI-2162:
--

You should set up the timeout correctly.

> Instant is null cause flushBuffer failed in casual
> --
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Blocker
>
> Committing the instant and fetching the instant are asynchronous, so the 
> instant can be null. The default waiting time is 0, although it must be 
> greater than ckpTimeout, which causes the exception shown below. 
> WRITE_COMMIT_ACK_TIMEOUT is meant for internal use, so it is not suitable for 
> Java API users under exactly-once semantics; relying on it in this context is 
> too fragile.
> Timeout(0ms) while waiting for instant null to commit
>  at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
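
To illustrate why a zero timeout fails on the very first check, here is a minimal sketch of a polling wait. It is not the code in StreamWriteFunction; the class InstantWaitSketch, the method waitForInstant, and its parameters are hypothetical names chosen for this example, and the instant is modelled as a plain String.

```java
import java.util.function.Supplier;

public class InstantWaitSketch {

  /**
   * Polls the supplier until it returns a non-null instant or the timeout elapses.
   * Hypothetical helper for illustration only, not a Hudi API.
   */
  public static String waitForInstant(Supplier<String> instantSupplier, long timeoutMs, long pollMs) {
    long waitedMs = 0;
    String instant = instantSupplier.get();
    while (instant == null) {
      // With timeoutMs == 0 this check already fails on the first pass,
      // before the asynchronous commit has had any chance to publish an instant.
      if (waitedMs >= timeoutMs) {
        throw new IllegalStateException(
            "Timeout(" + timeoutMs + "ms) while waiting for instant " + instant + " to commit");
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for instant", e);
      }
      waitedMs += pollMs;
      instant = instantSupplier.get();
    }
    return instant;
  }

  public static void main(String[] args) {
    // Simulates an instant that only becomes visible on the third poll.
    int[] calls = {0};
    Supplier<String> delayed = () -> ++calls[0] < 3 ? null : "20210711120000";

    System.out.println(waitForInstant(delayed, 1000, 10));

    calls[0] = 0;
    try {
      waitForInstant(delayed, 0, 10);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a positive timeout the loop gets a chance to observe the instant once it is published; with a timeout of 0 the first null read is already treated as a timeout.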



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2073) The sparkJob of hoodieClusteringJob running through sparkSubmit will not quit even it is finished or failed.

2021-07-11 Thread Yue Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yue Zhang resolved HUDI-2073.
-
Resolution: Fixed

> The sparkJob of hoodieClusteringJob running through sparkSubmit will not quit 
> even it is finished or failed.
> 
>
> Key: HUDI-2073
> URL: https://issues.apache.org/jira/browse/HUDI-2073
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Users can launch HoodieClusteringJob through sparkSubmit to 
>  # schedule clustering
>  # execute clustering
> However, these Spark jobs never finish, and SparkSubmit never quits, even 
> after the job has completed or failed.
> This is because the clustering job initializes a SparkRDDWriteClient to 
> schedule or run clustering but does not close the client afterwards. As a 
> result, `jsc.stop();` cannot kill the Spark job, and it hangs forever.
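
The pattern behind this kind of fix can be sketched with try-with-resources. This is an illustration only, not the actual patch: FakeWriteClient is a hypothetical stand-in for a write client that owns background resources, and the "context stopped" event stands in for the `jsc.stop();` call.

```java
import java.util.ArrayList;
import java.util.List;

public class ClusteringJobCloseSketch {

  /** Hypothetical stand-in for a client that holds resources until closed. */
  public static final class FakeWriteClient implements AutoCloseable {
    private final List<String> events;

    public FakeWriteClient(List<String> events) {
      this.events = events;
      events.add("client opened");
    }

    public void cluster(boolean fail) {
      if (fail) {
        throw new IllegalStateException("clustering failed");
      }
      events.add("clustering done");
    }

    @Override
    public void close() {
      events.add("client closed");
    }
  }

  /** Runs the job and records the order of lifecycle events. */
  public static List<String> runJob(boolean fail) {
    List<String> events = new ArrayList<>();
    // try-with-resources closes the client on success and on failure,
    // before the catch and finally blocks run.
    try (FakeWriteClient client = new FakeWriteClient(events)) {
      client.cluster(fail);
    } catch (IllegalStateException e) {
      events.add("job failed: " + e.getMessage());
    } finally {
      // Stands in for stopping the Spark context once the client is released.
      events.add("context stopped");
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(runJob(false));
    System.out.println(runJob(true));
  }
}
```

In both the success and the failure path the client is closed before the context is stopped, which is the ordering the description says was missing.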





[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378948#comment-17378948
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * ec60b9cc2356004d237127c4807806b51564543f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=854)
   * 0fb47c68d425f44a4b3a0be43a047215f2f37a83 UNKNOWN
   * ffa934182ad9abe78c0592772e9328dd6b0d27e6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=855)
   
   


> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only support reading hoodie table as datasource table for spark 
> since [https://github.com/apache/hudi/pull/2283]
> In order to support this feature for flink and DeltaStreamer, we need to sync 
> the spark table properties needed by datasource table to the meta store in 
> HiveSyncTool.
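
As a rough illustration of the table properties involved, the sketch below assembles the keys that Spark itself uses to describe a datasource table (`spark.sql.sources.provider`, `spark.sql.sources.schema.numParts`, `spark.sql.sources.schema.part.N`). Whether the pull request writes exactly these keys is an assumption; the class name, method name, and the partLength parameter are hypothetical, and the schema is passed in as an opaque JSON string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SparkTablePropertiesSketch {

  /**
   * Builds datasource-table properties for a schema JSON string, splitting the
   * schema into chunks of at most partLength characters.
   * Hypothetical helper for illustration only.
   */
  public static Map<String, String> sparkTableProperties(String schemaJson, int partLength) {
    if (partLength <= 0) {
      throw new IllegalArgumentException("partLength must be positive");
    }
    Map<String, String> props = new LinkedHashMap<>();
    props.put("spark.sql.sources.provider", "hudi");

    // Ceiling division: number of chunks needed to hold the whole schema string.
    int numParts = (schemaJson.length() + partLength - 1) / partLength;
    props.put("spark.sql.sources.schema.numParts", String.valueOf(numParts));

    for (int i = 0; i < numParts; i++) {
      int start = i * partLength;
      int end = Math.min(start + partLength, schemaJson.length());
      props.put("spark.sql.sources.schema.part." + i, schemaJson.substring(start, end));
    }
    return props;
  }

  public static void main(String[] args) {
    String schema = "{\"type\":\"struct\",\"fields\":[]}";
    sparkTableProperties(schema, 16).forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```

Splitting the schema into numbered parts keeps each property value short, which matters for metastores that limit the length of a single table property.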





[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * ec60b9cc2356004d237127c4807806b51564543f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=854)
   * 0fb47c68d425f44a4b3a0be43a047215f2f37a83 UNKNOWN
   * ffa934182ad9abe78c0592772e9328dd6b0d27e6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=855)
   
   




[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378947#comment-17378947
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ffa9341) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `20.39%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
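   @@             Coverage Diff              @@
   ##             master    #3120      +/-   ##
   ==========================================
   - Coverage     47.72%   27.33%   -20.40%
   + Complexity     5528     1291     -4237
   ==========================================
     Files           934      386      -548
     Lines         41457    15326    -26131
     Branches       4166     1337     -2829
   ==========================================
   - Hits          19786     4189    -15597
   + Misses        19914    10834     -9080
   + Partials       1757      303     -1454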
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.93% <ø> (-13.52%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+com

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ffa9341) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `20.39%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
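   @@              Coverage Diff              @@
   ##             master    #3120       +/-   ##
   =============================================
   - Coverage     47.72%   27.33%   -20.40%
   + Complexity     5528     1291     -4237
   =============================================
     Files           934      386      -548
     Lines         41457    15326    -26131
     Branches       4166     1337     -2829
   =============================================
   - Hits          19786     4189    -15597
   + Misses        19914    10834     -9080
   + Partials       1757      303     -1454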
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.93% <ø> (-13.52%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonP

[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378940#comment-17378940
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ffa9341) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `31.95%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
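   @@              Coverage Diff              @@
   ##             master    #3120       +/-   ##
   =============================================
   - Coverage     47.72%   15.76%   -31.96%
   + Complexity     5528      493     -5035
   =============================================
     Files           934      284      -650
     Lines         41457    11822    -29635
     Branches       4166      982     -3184
   =============================================
   - Hits          19786     1864    -17922
   + Misses        19914     9795    -10119
   + Partials       1757      163     -1594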
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comm

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ffa9341) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `31.95%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
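   @@              Coverage Diff              @@
   ##             master    #3120       +/-   ##
   =============================================
   - Coverage     47.72%   15.76%   -31.96%
   + Complexity     5528      493     -5035
   =============================================
     Files           934      284      -650
     Lines         41457    11822    -29635
     Branches       4166      982     -3184
   =============================================
   - Hits          19786     1864    -17922
   + Misses        19914     9795    -10119
   + Partials       1757      163     -1594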
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPa

[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378933#comment-17378933
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ffa9341) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `44.89%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
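   @@              Coverage Diff              @@
   ##             master    #3120       +/-   ##
   =============================================
   - Coverage     47.72%    2.83%   -44.90%
   + Complexity     5528       85     -5443
   =============================================
     Files           934      284      -650
     Lines         41457    11822    -29635
     Branches       4166      982     -3184
   =============================================
   - Hits          19786      335    -19451
   + Misses        19914    11461     -8453
   + Partials       1757       26     -1731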
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <ø> (-50.15%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ffa9341) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `44.89%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
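   @@              Coverage Diff              @@
   ##             master    #3120       +/-   ##
   =============================================
   - Coverage     47.72%    2.83%   -44.90%
   + Complexity     5528       85     -5443
   =============================================
     Files           934      284      -650
     Lines         41457    11822    -29635
     Branches       4166      982     -3184
   =============================================
   - Hits          19786      335    -19451
   + Misses        19914    11461     -8453
   + Partials       1757       26     -1731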
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <ø> (-50.15%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schem

[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378931#comment-17378931
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * ec60b9cc2356004d237127c4807806b51564543f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=854)
 
   * 0fb47c68d425f44a4b3a0be43a047215f2f37a83 UNKNOWN
   * ffa934182ad9abe78c0592772e9328dd6b0d27e6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Hive Integration
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only support reading a Hoodie table as a datasource table for Spark 
> (since [https://github.com/apache/hudi/pull/2283]).
> To support this feature for Flink and DeltaStreamer, we need to sync 
> the Spark table properties required by the datasource table to the metastore in 
> HiveSyncTool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * ec60b9cc2356004d237127c4807806b51564543f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=854)
 
   * 0fb47c68d425f44a4b3a0be43a047215f2f37a83 UNKNOWN
   * ffa934182ad9abe78c0592772e9328dd6b0d27e6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378930#comment-17378930
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * b5bf84aaa8e74abeee0ecffc9c3966350f727673 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813)
 
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * ec60b9cc2356004d237127c4807806b51564543f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=854)
 
   * 0fb47c68d425f44a4b3a0be43a047215f2f37a83 UNKNOWN
   * ffa934182ad9abe78c0592772e9328dd6b0d27e6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378926#comment-17378926
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805










[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * b5bf84aaa8e74abeee0ecffc9c3966350f727673 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813)
 
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * ec60b9cc2356004d237127c4807806b51564543f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=854)
 
   * 0fb47c68d425f44a4b3a0be43a047215f2f37a83 UNKNOWN
   * ffa934182ad9abe78c0592772e9328dd6b0d27e6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378929#comment-17378929
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * b5bf84aaa8e74abeee0ecffc9c3966350f727673 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813)
 
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * ec60b9cc2356004d237127c4807806b51564543f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=854)
 
   * 0fb47c68d425f44a4b3a0be43a047215f2f37a83 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * b5bf84aaa8e74abeee0ecffc9c3966350f727673 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813)
 
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * ec60b9cc2356004d237127c4807806b51564543f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=854)
 
   * 0fb47c68d425f44a4b3a0be43a047215f2f37a83 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378928#comment-17378928
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

pengzhiwei2018 commented on a change in pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#discussion_r667622950



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##
@@ -110,6 +110,12 @@
   @Parameter(names = {"--batch-sync-num"}, description = "The number of 
partitions one batch when synchronous partitions to hive")
   public Integer batchSyncNum = 1000;
 
+  @Parameter(names = {"--spark-datasource"}, description = "Whether sync this 
table as spark data source table.")
+  public Boolean syncAsSparkDataSourceTable = true;
+
+  @Parameter(names = {"--spark-schema-length-threshold"}, description = "The 
maximum length allowed in a single cell when storing additional schema 
information in Hive's metastore.")
+  public int sparkSchemaLengthThreshold = 4000;

Review comment:
   4000 is the default value of the spark conf 
`spark.sql.sources.schemaStringLengthThreshold`.
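
A minimal sketch of why such a threshold matters, assuming the sync follows Spark's own convention for data source tables: the schema JSON is cut into chunks no longer than the threshold and stored under `spark.sql.sources.schema.numParts` and `spark.sql.sources.schema.part.N`. The class and method names below are hypothetical and are not taken from this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not the code from this PR. */
final class SchemaPartsSketch {

  /**
   * Splits a schema JSON string into chunks of at most {@code threshold} characters
   * and stores them under the table property keys Spark uses for data source tables.
   */
  static Map<String, String> toTableProperties(String schemaJson, int threshold) {
    if (threshold <= 0) {
      throw new IllegalArgumentException("threshold must be positive");
    }
    Map<String, String> props = new LinkedHashMap<>();
    int numParts = 0;
    for (int start = 0; start < schemaJson.length(); start += threshold) {
      int end = Math.min(start + threshold, schemaJson.length());
      props.put("spark.sql.sources.schema.part." + numParts, schemaJson.substring(start, end));
      numParts++;
    }
    // The reader needs the part count to reassemble the schema in order.
    props.put("spark.sql.sources.schema.numParts", String.valueOf(numParts));
    return props;
  }

  public static void main(String[] args) {
    String schemaJson = "{\"type\":\"struct\",\"fields\":[]}";
    // A tiny threshold is used here only to make the splitting visible.
    toTableProperties(schemaJson, 10).forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```

With the default of 4000, most schemas would fit in a single part; only wide tables would be split.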

##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/ConfigUtils.java
##
@@ -23,12 +23,11 @@
 import org.apache.hudi.common.util.StringUtils;
 
 public class ConfigUtils {
-
-  public static final String SPARK_QUERY_TYPE_KEY = "spark.query.type.key";
-
-  public static final String SPARK_QUERY_AS_RO_KEY = "spark.query.as.ro.key";
-
-  public static final String SPARK_QUERY_AS_RT_KEY = "spark.query.as.rt.key";
+  /**
+   * Config stored in hive serde properties to tell query engine (spark/flink) 
to
+   * read the table as a read-optimized table when this config is true.
+   */
+  public static final String IS_QUERY_AS_RO_TABLE = "hoodie.query.as.ro.table";
 

Review comment:
   SPARK_QUERY_AS_RO_KEY was introduced by 
https://github.com/apache/hudi/pull/2925 for the spark sql writer to pass some 
params. It can only be used for the spark engine. In this PR we no longer need 
it; we use IS_QUERY_AS_RO_TABLE, which can be used for both spark & flink.
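
As a rough sketch of how an engine could consume this flag: the key string comes from the diff above, while the helper class, the method name, and the choice to treat a missing key as `false` are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reader-side helper; not part of this PR. */
final class QueryTypeSketch {
  // Key string as defined in ConfigUtils in the diff.
  static final String IS_QUERY_AS_RO_TABLE = "hoodie.query.as.ro.table";

  /** Returns true when the serde properties ask for a read-optimized read. A missing key is treated as false (assumption). */
  static boolean isReadOptimized(Map<String, String> serdeProperties) {
    return Boolean.parseBoolean(serdeProperties.getOrDefault(IS_QUERY_AS_RO_TABLE, "false"));
  }

  public static void main(String[] args) {
    Map<String, String> roTable = new HashMap<>();
    roTable.put(IS_QUERY_AS_RO_TABLE, "true");
    System.out.println(isReadOptimized(roTable));
    System.out.println(isReadOptimized(new HashMap<>()));
  }
}
```

Because the property lives in the hive serde properties rather than in a Spark-specific option, any engine that can read the metastore entry can apply the same check.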

##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/Parquet2SparkSchemaUtils.java
##
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hive.util;
+
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.parquet.schema.GroupType;
+import org.apache.parquet.schema.OriginalType;
+import org.apache.parquet.schema.PrimitiveType;
+import org.apache.parquet.schema.Type;
+
+import static org.apache.parquet.schema.Type.Repetition.OPTIONAL;
+
+/**
+ * Convert the parquet schema to spark schema' json string.
+ * This code is refer to 
org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter
+ * in spark project.
+ */

Review comment:
   Using `ParquetToSparkSchemaConverter` directly would require the spark 
dependencies in `hive-sync`, and the flink-bundle would then need spark as well. 
To avoid the spark dependencies, I wrote this util.
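
To illustrate the idea of producing the Spark schema JSON without Spark on the classpath, here is a small sketch. It is not the `Parquet2SparkSchemaUtils` implementation: the `Field` class is a hypothetical stand-in for an already-converted primitive column, and nested types and name escaping are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration; handles flat, primitive fields only. */
final class SparkSchemaJsonSketch {

  /** Stand-in for a primitive column: a name, a Spark type name such as "long", and nullability. */
  static final class Field {
    final String name;
    final String sparkType;
    final boolean nullable;

    Field(String name, String sparkType, boolean nullable) {
      this.name = name;
      this.sparkType = sparkType;
      this.nullable = nullable;
    }
  }

  /** Builds the JSON form of a Spark struct type by plain string concatenation (no escaping of names). */
  static String toStructJson(List<Field> fields) {
    String body = fields.stream()
        .map(f -> "{\"name\":\"" + f.name + "\",\"type\":\"" + f.sparkType
            + "\",\"nullable\":" + f.nullable + ",\"metadata\":{}}")
        .collect(Collectors.joining(","));
    return "{\"type\":\"struct\",\"fields\":[" + body + "]}";
  }

  public static void main(String[] args) {
    List<Field> fields = Arrays.asList(
        new Field("id", "long", false),
        new Field("name", "string", true));
    System.out.println(toStructJson(fields));
  }
}
```

The trade-off is that the JSON layout is reproduced by hand, so it has to be kept in step with what Spark expects when it reads the table properties back.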

##
File path: 
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##
@@ -70,6 +69,10 @@
 return Arrays.asList(new Object[][] {{true, true, true}, {true, false, 
false}, {false, true, true}, {false, false, false}});
   }
 
+  private static Iterable syncDataSourceTableParams() {
+return Arrays.asList(new Object[][] {{true, true, true}, {true, false, 
false}, {false, true, true}, {false, false, false}});

Review comment:
   done





[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#discussion_r667622950



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##
@@ -110,6 +110,12 @@
   @Parameter(names = {"--batch-sync-num"}, description = "The number of 
partitions one batch when synchronous partitions to hive")
   public Integer batchSyncNum = 1000;
 
+  @Parameter(names = {"--spark-datasource"}, description = "Whether sync this 
table as spark data source table.")
+  public Boolean syncAsSparkDataSourceTable = true;
+
+  @Parameter(names = {"--spark-schema-length-threshold"}, description = "The 
maximum length allowed in a single cell when storing additional schema 
information in Hive's metastore.")
+  public int sparkSchemaLengthThreshold = 4000;

Review comment:
   4000 is the default value of the spark conf 
`spark.sql.sources.schemaStringLengthThreshold`.

##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/ConfigUtils.java
##
@@ -23,12 +23,11 @@
 import org.apache.hudi.common.util.StringUtils;
 
 public class ConfigUtils {
-
-  public static final String SPARK_QUERY_TYPE_KEY = "spark.query.type.key";
-
-  public static final String SPARK_QUERY_AS_RO_KEY = "spark.query.as.ro.key";
-
-  public static final String SPARK_QUERY_AS_RT_KEY = "spark.query.as.rt.key";
+  /**
+   * Config stored in hive serde properties to tell query engine (spark/flink) 
to
+   * read the table as a read-optimized table when this config is true.
+   */
+  public static final String IS_QUERY_AS_RO_TABLE = "hoodie.query.as.ro.table";
 

Review comment:
   SPARK_QUERY_AS_RO_KEY was introduced by 
https://github.com/apache/hudi/pull/2925 for the spark sql writer to pass some 
params. It can only be used for the spark engine. In this PR we no longer need 
it; we use IS_QUERY_AS_RO_TABLE, which can be used for both spark & flink.

##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/Parquet2SparkSchemaUtils.java
##
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hive.util;
+
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.parquet.schema.GroupType;
+import org.apache.parquet.schema.OriginalType;
+import org.apache.parquet.schema.PrimitiveType;
+import org.apache.parquet.schema.Type;
+
+import static org.apache.parquet.schema.Type.Repetition.OPTIONAL;
+
+/**
+ * Convert the parquet schema to spark schema' json string.
+ * This code is refer to 
org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter
+ * in spark project.
+ */

Review comment:
   Using `ParquetToSparkSchemaConverter` directly would require the spark 
dependencies in `hive-sync`, and the flink-bundle would then need spark as well. 
To avoid the spark dependencies, I wrote this util.

##
File path: 
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##
@@ -70,6 +69,10 @@
 return Arrays.asList(new Object[][] {{true, true, true}, {true, false, 
false}, {false, true, true}, {false, false, false}});
   }
 
+  private static Iterable syncDataSourceTableParams() {
+return Arrays.asList(new Object[][] {{true, true, true}, {true, false, 
false}, {false, true, true}, {false, false, false}});

Review comment:
   done








[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805










[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378925#comment-17378925
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805








[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805










[jira] [Resolved] (HUDI-2051) Enable Hive Sync When Spark Enable Hive Meta For Spark Sql

2021-07-11 Thread pengzhiwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengzhiwei resolved HUDI-2051.
--
Resolution: Fixed

> Enable Hive Sync When Spark Enable Hive Meta For Spark Sql
> --
>
> Key: HUDI-2051
> URL: https://issues.apache.org/jira/browse/HUDI-2051
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we enable the meta sync by default for spark sql. It depends on the 
> hive environment, so it will not work if spark has not enabled the hive meta.





[jira] [Commented] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378924#comment-17378924
 ] 

ASF GitHub Bot commented on HUDI-1896:
--

hudi-bot edited a comment on pull request #3256:
URL: https://github.com/apache/hudi/pull/3256#issuecomment-877963467


   
   ## CI report:
   
   * 74b342890e833e84bf1f8e163465df46b325845a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=853)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage like AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a new 
> `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases as mentioned in HUDI-1723.
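
A simplified sketch of a timestamp-based selection, to make the kind of corner case concrete. The class, the `FileEntry` type, and the strict "modified after checkpoint" rule are assumptions for illustration; this is not the existing `DFSPathSelector` code, and the corner case shown is not necessarily one of those listed in HUDI-1723.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of checkpointing by modification time. */
final class TimestampSelectorSketch {

  static final class FileEntry {
    final String path;
    final long modificationTime;

    FileEntry(String path, long modificationTime) {
      this.path = path;
      this.modificationTime = modificationTime;
    }
  }

  /** Selects files modified strictly after the checkpoint, oldest first, up to maxFiles. */
  static List<FileEntry> selectNewFiles(List<FileEntry> listing, long checkpoint, int maxFiles) {
    List<FileEntry> eligible = new ArrayList<>();
    for (FileEntry f : listing) {
      if (f.modificationTime > checkpoint) {
        eligible.add(f);
      }
    }
    eligible.sort(Comparator.comparingLong((FileEntry f) -> f.modificationTime));
    return new ArrayList<>(eligible.subList(0, Math.min(maxFiles, eligible.size())));
  }

  /** The next checkpoint is the largest modification time among the selected files. */
  static long nextCheckpoint(List<FileEntry> selected, long previous) {
    long max = previous;
    for (FileEntry f : selected) {
      max = Math.max(max, f.modificationTime);
    }
    return max;
  }

  public static void main(String[] args) {
    List<FileEntry> listing = new ArrayList<>();
    listing.add(new FileEntry("a", 100L));
    listing.add(new FileEntry("b", 200L));
    listing.add(new FileEntry("c", 200L));

    // Round 1: a limit of one file picks "b" and moves the checkpoint to 200.
    List<FileEntry> first = selectNewFiles(listing, 100L, 1);
    long checkpoint = nextCheckpoint(first, 100L);

    // Round 2: "c" shares the timestamp 200, so it is never selected.
    System.out.println(selectNewFiles(listing, checkpoint, 10).size());
  }
}
```

In this sketch a file that shares a modification time with the last selected file is skipped once the batch limit cuts between them, which is the sort of gap a change-notification-based source would avoid.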





[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378923#comment-17378923
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * b5bf84aaa8e74abeee0ecffc9c3966350f727673 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813)
 
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * ec60b9cc2356004d237127c4807806b51564543f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=854)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3256: [HUDI-1896] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread GitBox


hudi-bot edited a comment on pull request #3256:
URL: https://github.com/apache/hudi/pull/3256#issuecomment-877963467


   
   ## CI report:
   
   * 74b342890e833e84bf1f8e163465df46b325845a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=853)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * b5bf84aaa8e74abeee0ecffc9c3966350f727673 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813)
 
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * ec60b9cc2356004d237127c4807806b51564543f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=854)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378922#comment-17378922
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ec60b9c) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `20.39%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
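   @@             Coverage Diff              @@
   ##             master    #3120       +/-   ##
   =============================================
   - Coverage     47.72%   27.33%   -20.40%     
   + Complexity     5528     1291     -4237     
   =============================================
     Files           934      386      -548     
     Lines         41457    15326    -26131     
     Branches       4166     1337     -2829     
   =============================================
   - Hits          19786     4189    -15597     
   + Misses        19914    10834     -9080     
   + Partials       1757      303     -1454     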
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.93% <ø> (-13.52%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+com

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ec60b9c) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `20.39%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
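   @@             Coverage Diff              @@
   ##           master    #3120      +/-     ##
   ============================================
   - Coverage   47.72%   27.33%   -20.40%
   + Complexity   5528     1291     -4237
   ============================================
     Files         934      386      -548
     Lines       41457    15326    -26131
     Branches     4166     1337     -2829
   ============================================
   - Hits        19786     4189    -15597
   + Misses      19914    10834     -9080
   + Partials     1757      303     -1454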
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.93% <ø> (-13.52%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonP

[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378921#comment-17378921
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (09020d6) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `20.39%`.
   > The diff coverage is `0.00%`.
   
   > :exclamation: Current head 09020d6 differs from pull request most recent 
head ec60b9c. Consider uploading reports for the commit ec60b9c to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
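   @@             Coverage Diff              @@
   ##           master    #3120      +/-     ##
   ============================================
   - Coverage   47.72%   27.33%   -20.40%
   + Complexity   5528     1291     -4237
   ============================================
     Files         934      386      -548
     Lines       41457    15326    -26131
     Branches     4166     1337     -2829
   ============================================
   - Hits        19786     4189    -15597
   + Misses      19914    10834     -9080
   + Partials     1757      303     -1454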
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.93% <ø> (-13.52%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (09020d6) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `20.39%`.
   > The diff coverage is `0.00%`.
   
   > :exclamation: Current head 09020d6 differs from pull request most recent 
head ec60b9c. Consider uploading reports for the commit ec60b9c to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
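   @@             Coverage Diff              @@
   ##           master    #3120      +/-     ##
   ============================================
   - Coverage   47.72%   27.33%   -20.40%
   + Complexity   5528     1291     -4237
   ============================================
     Files         934      386      -548
     Lines       41457    15326    -26131
     Branches     4166     1337     -2829
   ============================================
   - Hits        19786     4189    -15597
   + Misses      19914    10834     -9080
   + Partials     1757      303     -1454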
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `20.93% <ø> (-13.52%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGll

[jira] [Commented] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378920#comment-17378920
 ] 

ASF GitHub Bot commented on HUDI-1896:
--

hudi-bot edited a comment on pull request #3256:
URL: https://github.com/apache/hudi/pull/3256#issuecomment-877963467


   
   ## CI report:
   
   * 74b342890e833e84bf1f8e163465df46b325845a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=853)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage like AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a new 
> `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3256: [HUDI-1896] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread GitBox


hudi-bot edited a comment on pull request #3256:
URL: https://github.com/apache/hudi/pull/3256#issuecomment-877963467


   
   ## CI report:
   
   * 74b342890e833e84bf1f8e163465df46b325845a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=853)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378919#comment-17378919
 ] 

ASF GitHub Bot commented on HUDI-1896:
--

hudi-bot commented on pull request #3256:
URL: https://github.com/apache/hudi/pull/3256#issuecomment-877963467


   
   ## CI report:
   
   * 74b342890e833e84bf1f8e163465df46b325845a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




[GitHub] [hudi] hudi-bot commented on pull request #3256: [HUDI-1896] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread GitBox


hudi-bot commented on pull request #3256:
URL: https://github.com/apache/hudi/pull/3256#issuecomment-877963467


   
   ## CI report:
   
   * 74b342890e833e84bf1f8e163465df46b325845a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378918#comment-17378918
 ] 

ASF GitHub Bot commented on HUDI-1896:
--

satishmittal opened a new pull request #3256:
URL: https://github.com/apache/hudi/pull/3256


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1896:
-
Labels: pull-request-available  (was: )



[GitHub] [hudi] satishmittal1111 opened a new pull request #3256: [HUDI-1896] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread GitBox


satishmittal opened a new pull request #3256:
URL: https://github.com/apache/hudi/pull/3256


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.






[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378917#comment-17378917
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ec60b9c) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `31.95%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
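   @@             Coverage Diff              @@
   ##           master    #3120      +/-     ##
   ============================================
   - Coverage   47.72%   15.76%   -31.96%
   + Complexity   5528      493     -5035
   ============================================
     Files         934      284      -650
     Lines       41457    11822    -29635
     Branches     4166      982     -3184
   ============================================
   - Hits        19786     1864    -17922
   + Misses      19914     9795    -10119
   + Partials     1757      163     -1594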
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comm

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ec60b9c) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `31.95%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
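   @@              Coverage Diff              @@
   ##             master    #3120       +/-   ##
   =============================================
   - Coverage     47.72%   15.76%   -31.96%
   + Complexity     5528      493     -5035
   =============================================
     Files           934      284      -650
     Lines         41457    11822    -29635
     Branches       4166      982     -3184
   =============================================
   - Hits          19786     1864    -17922
   + Misses        19914     9795    -10119
   + Partials       1757      163     -1594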
   ```
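
The coverage percentages in this report are consistent with hits divided by the sum of hits, misses, and partials (which equals the Lines count in both columns). A minimal Java sketch of that arithmetic follows. The class and method names are illustrative only, and truncating to two decimals is an assumption inferred from these numbers, not a statement about how Codecov is implemented.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative helper (hypothetical name): recomputes the coverage
// percentages shown in the report from its raw counters.
class CoverageMath {

  // Coverage = hits / (hits + misses + partials), as a percentage.
  // Truncated (not rounded) to two decimals; this is an assumption
  // chosen because it reproduces the figures in the report.
  static BigDecimal coveragePercent(long hits, long misses, long partials) {
    long lines = hits + misses + partials;
    if (lines == 0) {
      return BigDecimal.ZERO.setScale(2);
    }
    return BigDecimal.valueOf(hits)
        .multiply(BigDecimal.valueOf(100))
        .divide(BigDecimal.valueOf(lines), 2, RoundingMode.DOWN);
  }

  public static void main(String[] args) {
    // master (5804ad8): 19786 hits, 19914 misses, 1757 partials
    System.out.println(coveragePercent(19786, 19914, 1757));
    // pull request head: 1864 hits, 9795 misses, 163 partials
    System.out.println(coveragePercent(1864, 9795, 163));
  }
}
```

With the counters above this yields 47.72 for master and 15.76 for the pull request head, matching the Coverage row.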
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPa

[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378915#comment-17378915
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (09020d6) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `31.95%`.
   > The diff coverage is `0.00%`.
   
   > :exclamation: Current head 09020d6 differs from pull request most recent 
head ec60b9c. Consider uploading reports for the commit ec60b9c to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
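   @@              Coverage Diff              @@
   ##             master    #3120       +/-   ##
   =============================================
   - Coverage     47.72%   15.76%   -31.96%
   + Complexity     5528      493     -5035
   =============================================
     Files           934      284      -650
     Lines         41457    11822    -29635
     Branches       4166      982     -3184
   =============================================
   - Hits          19786     1864    -17922
   + Misses        19914     9795    -10119
   + Partials       1757      163     -1594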
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache/

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (09020d6) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `31.95%`.
   > The diff coverage is `0.00%`.
   
   > :exclamation: Current head 09020d6 differs from pull request most recent 
head ec60b9c. Consider uploading reports for the commit ec60b9c to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
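   @@              Coverage Diff              @@
   ##             master    #3120       +/-   ##
   =============================================
   - Coverage     47.72%   15.76%   -31.96%
   + Complexity     5528      493     -5035
   =============================================
     Files           934      284      -650
     Lines         41457    11822    -29635
     Branches       4166      982     -3184
   =============================================
   - Hits          19786     1864    -17922
   + Misses        19914     9795    -10119
   + Partials       1757      163     -1594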
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllb

[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378914#comment-17378914
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

danny0405 commented on a change in pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#discussion_r667613444



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/Parquet2SparkSchemaUtils.java
##
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hive.util;
+
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.parquet.schema.GroupType;
+import org.apache.parquet.schema.OriginalType;
+import org.apache.parquet.schema.PrimitiveType;
+import org.apache.parquet.schema.Type;
+
+import static org.apache.parquet.schema.Type.Repetition.OPTIONAL;
+
+/**
+ * Convert the parquet schema to spark schema' json string.
+ * This code is refer to org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter
+ * in spark project.
+ */

Review comment:
   Why not just use `ParquetToSparkSchemaConverter` directly?
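
For context on what such a conversion produces: Spark serializes a table schema as a JSON struct type with a list of named, typed, nullable fields. The sketch below is a deliberately simplified, dependency-free illustration of mapping Parquet primitive type names to that JSON. It is neither the `Parquet2SparkSchemaUtils` from this pull request nor Spark's `ParquetToSparkSchemaConverter`; the class and method names are hypothetical, and annotated types (UTF8 strings, decimals, timestamps) and nested groups are left out.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical, simplified stand-in for a Parquet-to-Spark schema
// conversion. Handles only flat schemas of a few primitive types.
class SimpleParquetToSparkJson {

  // Maps a Parquet primitive type name to a Spark SQL type name.
  static String toSparkType(String parquetPrimitive) {
    switch (parquetPrimitive) {
      case "BOOLEAN": return "boolean";
      case "INT32":   return "integer";
      case "INT64":   return "long";
      case "FLOAT":   return "float";
      case "DOUBLE":  return "double";
      case "BINARY":  return "binary";
      default:
        throw new IllegalArgumentException("Unsupported type: " + parquetPrimitive);
    }
  }

  // Builds the struct-type JSON for the given field-name to Parquet-type
  // map, in the map's iteration order. Every field is emitted as nullable,
  // and field names are assumed not to need JSON escaping.
  static String toSparkSchemaJson(Map<String, String> parquetFields) {
    StringJoiner fields = new StringJoiner(",", "[", "]");
    for (Map.Entry<String, String> e : parquetFields.entrySet()) {
      fields.add("{\"name\":\"" + e.getKey() + "\",\"type\":\""
          + toSparkType(e.getValue()) + "\",\"nullable\":true,\"metadata\":{}}");
    }
    return "{\"type\":\"struct\",\"fields\":" + fields + "}";
  }
}
```

Reusing Spark's own converter, as the comment suggests, would avoid maintaining a type mapping like this by hand, at the cost of a Spark dependency in the sync module.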

##
File path: 
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##
@@ -70,6 +69,10 @@
     return Arrays.asList(new Object[][] {{true, true, true}, {true, false, false}, {false, true, true}, {false, false, false}});
   }
 
+  private static Iterable syncDataSourceTableParams() {
+    return Arrays.asList(new Object[][] {{true, true, true}, {true, false, false}, {false, true, true}, {false, false, false}});

Review comment:
   Can we add comments explaining what each flag means?

##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/ConfigUtils.java
##
@@ -23,12 +23,11 @@
 import org.apache.hudi.common.util.StringUtils;
 
 public class ConfigUtils {
-
-  public static final String SPARK_QUERY_TYPE_KEY = "spark.query.type.key";
-
-  public static final String SPARK_QUERY_AS_RO_KEY = "spark.query.as.ro.key";
-
-  public static final String SPARK_QUERY_AS_RT_KEY = "spark.query.as.rt.key";
+  /**
+   * Config stored in hive serde properties to tell query engine (spark/flink) to
+   * read the table as a read-optimized table when this config is true.
+   */
+  public static final String IS_QUERY_AS_RO_TABLE = "hoodie.query.as.ro.table";
 

Review comment:
   What is the relationship between the key `IS_QUERY_AS_RO_TABLE` and the keys
`SPARK_QUERY_AS_RO_KEY` and `SPARK_QUERY_AS_RT_KEY`?
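
To make the Javadoc in the diff concrete, a query engine could consult the serde properties roughly as follows. This is a sketch under assumptions: the key string is taken from the diff, but the helper class, its method, and the choice to treat a missing key as `false` are hypothetical and not taken from the pull request.

```java
import java.util.Map;

// Hypothetical reader-side helper: decides whether to read a table as
// read-optimized based on a flag stored in the Hive serde properties.
class QueryModeResolver {

  // Key string as it appears in the diff above.
  static final String IS_QUERY_AS_RO_TABLE = "hoodie.query.as.ro.table";

  // Returns true only when the flag is present and equals "true"
  // (case-insensitive). Treating a missing key as false is an assumption.
  static boolean isReadOptimized(Map<String, String> serdeProperties) {
    return Boolean.parseBoolean(
        serdeProperties.getOrDefault(IS_QUERY_AS_RO_TABLE, "false"));
  }
}
```

A single boolean key like this replaces a lookup across several engine-specific keys, which is presumably why the reviewer asks how the old and new keys relate.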




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Hive Integration
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently, reading a Hoodie table as a datasource table is supported only for
> Spark, since [https://github.com/apache/hudi/pull/2283].
> To support this feature for Flink and DeltaStreamer, we need to sync
> the Spark table properties required by a datasource table to the metastore in
> HiveSyncTool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3193: [HUDI-2107] Support Read Log Only MOR Table For Spark

2021-07-11 Thread GitBox


hudi-bot edited a comment on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-871363205


   
   ## CI report:
   
   * 864dff7a0cc4389905067abee96046d5f72b004f UNKNOWN
   * f8bac2f4e7133eb3f9cbe4c15a20da49e30dd6eb UNKNOWN
   * 2a1ce1b4b826344ec024bd51b8af5ee5543a0986 UNKNOWN
   * ce51b2d836504936b23368492c071fdfe4d94594 UNKNOWN
   * a619116de9d59eff189e05f593b917cc0b762f25 UNKNOWN
   * 831cccb61d0c87866f0fd2d2c602ec25eca4ab9c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=852)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2107) Support Read Log Only MOR Table For Spark

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378900#comment-17378900
 ] 

ASF GitHub Bot commented on HUDI-2107:
--

hudi-bot edited a comment on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-871363205


   

> Support Read Log Only MOR Table For Spark
> -
>
> Key: HUDI-2107
> URL: https://issues.apache.org/jira/browse/HUDI-2107
> Project: Apache Hudi
> Issue Type: Bug
> Components: Spark Integration
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently, Spark cannot read a log-only MOR table (one generated by an index
> such as InMemoryIndex, HbaseIndex, or FlinkIndex, which support indexing
> log files).





[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378908#comment-17378908
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ec60b9c) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `44.89%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
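   @@              Coverage Diff              @@
   ##             master    #3120       +/-   ##
   =============================================
   - Coverage     47.72%    2.83%   -44.90%
   + Complexity     5528       85     -5443
   =============================================
     Files           934      284      -650
     Lines         41457    11822    -29635
     Branches       4166      982     -3184
   =============================================
   - Hits          19786      335    -19451
   + Misses        19914    11461     -8453
   + Partials       1757       26     -1731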
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <ø> (-50.15%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (ec60b9c) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `44.89%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
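   @@              Coverage Diff              @@
   ##             master    #3120       +/-   ##
   =============================================
   - Coverage     47.72%    2.83%   -44.90%
   + Complexity     5528       85     -5443
   =============================================
     Files           934      284      -650
     Lines         41457    11822    -29635
     Branches       4166      982     -3184
   =============================================
   - Hits          19786      335    -19451
   + Misses        19914    11461     -8453
   + Partials       1757       26     -1731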
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <ø> (-50.15%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schem

[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378907#comment-17378907
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (09020d6) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `44.89%`.
   > The diff coverage is `0.00%`.
   
   > :exclamation: Current head 09020d6 differs from pull request most recent 
head ec60b9c. Consider uploading reports for the commit ec60b9c to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3120      +/-   ##
   
   - Coverage     47.72%    2.83%   -44.90%
   + Complexity     5528       85     -5443
   
     Files           934      284      -650
     Lines         41457    11822    -29635
     Branches       4166      982     -3184
   
   - Hits          19786      335    -19451
   + Misses        19914    11461     -8453
   + Partials       1757       26     -1731
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <ø> (-50.15%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...va/org/apache/hu

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864770805


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3120](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (09020d6) into 
[master](https://codecov.io/gh/apache/hudi/commit/5804ad8e32ae05758ebc5e47f5d4fb4db371ab52?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5804ad8) will **decrease** coverage by `44.89%`.
   > The diff coverage is `0.00%`.
   
   > :exclamation: Current head 09020d6 differs from pull request most recent 
head ec60b9c. Consider uploading reports for the commit ec60b9c to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3120/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3120      +/-   ##
   
   - Coverage     47.72%    2.83%   -44.90%
   + Complexity     5528       85     -5443
   
     Files           934      284      -650
     Lines         41457    11822    -29635
     Branches       4166      982     -3184
   
   - Hits          19786      335    -19451
   + Misses        19914    11461     -8453
   + Partials       1757       26     -1731
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.85% <0.00%> (-49.66%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <ø> (-50.15%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | `0.00% <0.00%> (-98.08%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | `0.00% <0.00%> (-72.36%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/hive/util/ConfigUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db25maWdVdGlscy5qYXZh)
 | `0.00% <ø> (-73.92%)` | :arrow_down: |
   | 
[...pache/hudi/hive/util/Parquet2SparkSchemaUtils.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9QYXJxdWV0MlNwYXJrU2NoZW1hVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3120/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3Jj

[jira] [Commented] (HUDI-2107) Support Read Log Only MOR Table For Spark

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378906#comment-17378906
 ] 

ASF GitHub Bot commented on HUDI-2107:
--

pengzhiwei2018 commented on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-877951736


   > LGTM. we can land this once CI passed.
   
   Great! The CI has passed now. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Log Only MOR Table For Spark
> -
>
> Key: HUDI-2107
> URL: https://issues.apache.org/jira/browse/HUDI-2107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently, Spark cannot read log-only MOR tables (which are generated by 
> indexes such as InMemoryIndex, HbaseIndex and FlinkIndex that support indexing 
> log files).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378905#comment-17378905
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * b5bf84aaa8e74abeee0ecffc9c3966350f727673 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813)
 
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   * ec60b9cc2356004d237127c4807806b51564543f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   



> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently, reading a Hoodie table as a datasource table is supported only for 
> Spark, since [https://github.com/apache/hudi/pull/2283].
> To support this feature for Flink and DeltaStreamer, we need to sync 
> the Spark table properties needed by the datasource table to the metastore in 
> HiveSyncTool.
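
For illustration only, the sync step described above can be sketched roughly as follows. This is not the code from pull request #3120: the class and method names are hypothetical, and the property keys are assumed from Spark's HiveExternalCatalog, which stores a datasource table's schema JSON in Hive table properties, split into chunks no longer than a length threshold.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of Hudi or Spark.
class SparkTablePropertiesSketch {

  // Property keys assumed from Spark's HiveExternalCatalog.
  static final String NUM_PARTS = "spark.sql.sources.schema.numParts";
  static final String PART_PREFIX = "spark.sql.sources.schema.part.";
  static final String NUM_PART_COLS = "spark.sql.sources.schema.numPartCols";
  static final String PART_COL_PREFIX = "spark.sql.sources.schema.partCol.";

  /**
   * Builds table properties from a schema JSON string and partition column names.
   * The schema string is split into chunks of at most {@code threshold} characters.
   */
  static Map<String, String> schemaProperties(String schemaJson, List<String> partitionCols,
                                              int threshold) {
    if (threshold <= 0) {
      throw new IllegalArgumentException("threshold must be positive");
    }
    Map<String, String> props = new LinkedHashMap<>();
    int numParts = 0;
    for (int start = 0; start < schemaJson.length(); start += threshold) {
      int end = Math.min(start + threshold, schemaJson.length());
      props.put(PART_PREFIX + numParts, schemaJson.substring(start, end));
      numParts++;
    }
    props.put(NUM_PARTS, String.valueOf(numParts));

    // Record partition columns only when the table is partitioned.
    if (!partitionCols.isEmpty()) {
      props.put(NUM_PART_COLS, String.valueOf(partitionCols.size()));
      for (int i = 0; i < partitionCols.size(); i++) {
        props.put(PART_COL_PREFIX + i, partitionCols.get(i));
      }
    }
    return props;
  }

  public static void main(String[] args) {
    // A made-up schema string, only to show the chunking.
    String schemaJson = "{\"type\":\"struct\",\"fields\":[]}";
    Map<String, String> props = schemaProperties(schemaJson, Arrays.asList("dt"), 16);
    props.forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```

A reader concatenates the `part.N` values in index order to recover the full schema string, which is why the number of parts is stored alongside them.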




[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378903#comment-17378903
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

pengzhiwei2018 commented on a change in pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#discussion_r667609527



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##
@@ -236,6 +247,71 @@ private void syncSchema(String tableName, boolean tableExists, boolean useRealTi
     }
   }
 
+  /**
+   * Get Spark Sql related table properties. This is used for spark datasource table.
+   * @param schema  The schema to write to the table.
+   * @return The table parameters with Spark's table properties added.
+   */
+  private Map<String, String> getSparkTableProperties(int schemaLengthThreshold, MessageType schema)  {
+    // Convert the schema and partition info used by spark sql to hive table properties.
+    // The following code refers to the spark code in
+    // https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
+    GroupType originGroupType = schema.asGroupType();
+    List<String> partitionNames = cfg.partitionFields;
+    List<Type> partitionCols = new ArrayList<>();
+    List<Type> dataCols = new ArrayList<>();
+    Map<String, Type> column2Field = new HashMap<>();
+
+    for (Type field : originGroupType.getFields()) {
+      column2Field.put(field.getName(), field);
+    }
+    // Get partition columns and data columns.
+    for (String partitionName : partitionNames) {
+      // Default the unknown partition fields to be String.
+      // Keep the same logic as HiveSchemaUtil#getPartitionKeyType.
+      partitionCols.add(column2Field.getOrDefault(partitionName,
+          new PrimitiveType(Type.Repetition.REQUIRED, BINARY, partitionName, UTF8)));
+    }
+    for (Type field : originGroupType.getFields()) {

Review comment:
   fixed

##
File path: 
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestParquet2SparkSchemaUtils.java
##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hive;
+
+import org.apache.hudi.hive.util.Parquet2SparkSchemaUtils;
+import org.apache.spark.sql.execution.SparkSqlParser;
+import org.apache.spark.sql.execution.datasources.parquet.SparkToParquetSchemaConverter;
+import org.apache.spark.sql.internal.SQLConf;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.MapType;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.IntegerType$;
+import org.apache.spark.sql.types.StringType$;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+public class TestParquet2SparkSchemaUtils {

Review comment:
   done!

##
File path: 
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##
@@ -70,6 +69,10 @@
     return Arrays.asList(new Object[][] {{true, true, true}, {true, false, false}, {false, true, true}, {false, false, false}});
   }
 
+  private static Iterable<Object[]> useJdbcAndSchemaFromCommitMetadataAndSyncAsDataSource() {

Review comment:
   Yes, will rename it.





> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-re

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-850284847


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2438](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (67041c2) into 
[master](https://codecov.io/gh/apache/hudi/commit/990820476a41b318017ba63dd446911141c929ce?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (9908204) will **decrease** coverage by `1.15%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2438/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2438      +/-   ##
   
   - Coverage     47.61%   46.46%    -1.16%
   + Complexity     5487     5030      -457
   
     Files           924      866       -58
     Lines         41206    38565     -2641
     Branches       4133     3837      -296
   
   - Hits          19619    17918     -1701
   + Misses        19844    19060      -784
   + Partials       1743     1587      -156
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.20% <ø> (-0.38%)` | :arrow_down: |
   | hudicommon | `48.58% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `60.03% <ø> (+0.44%)` | :arrow_up: |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `67.32% <ø> (-0.01%)` | :arrow_down: |
   | hudisync | `50.55% <ø> (-3.93%)` | :arrow_down: |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ache/hudi/hive/HiveMetastoreBasedLockProvider.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZU1ldGFzdG9yZUJhc2VkTG9ja1Byb3ZpZGVyLmphdmE=)
 | `0.00% <0.00%> (-60.22%)` | :arrow_down: |
   | 
[...n/java/org/apache/hudi/index/SparkHoodieIndex.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvU3BhcmtIb29kaWVJbmRleC5qYXZh)
 | `56.52% <0.00%> (-30.15%)` | :arrow_down: |
   | 
[...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==)
 | `84.61% <0.00%> (-7.06%)` | :arrow_down: |
   | 
[...org/apache/hudi/config/HoodieClusteringConfig.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVDbHVzdGVyaW5nQ29uZmlnLmphdmE=)
 | `67.74% <0.00%> (-3.54%)` | :arrow_down: |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3J

[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378896#comment-17378896
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * 3bb76014c4a7c7eb58a4f2c382f83bde474995c7 UNKNOWN
   * 5bbcd6a3d7460f76fee4c539c5b8bb9aeb1dcdd8 UNKNOWN
   * 1742e1831691ef9ebbf98d3fa29fe24aa1077072 UNKNOWN
   * b5bf84aaa8e74abeee0ecffc9c3966350f727673 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813)
 
   * 09020d66b59cb051cccacd894203ee7c6859ee3e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently, reading a Hoodie table as a datasource table is supported only for Spark, 
> since [https://github.com/apache/hudi/pull/2283].
> To support this feature for Flink and DeltaStreamer, we need to sync 
> the Spark table properties required by the datasource table to the metastore in 
> HiveSyncTool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-11 Thread GitBox


codecov-commenter edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-850284847


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2438](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (67041c2) into 
[master](https://codecov.io/gh/apache/hudi/commit/990820476a41b318017ba63dd446911141c929ce?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (9908204) will **decrease** coverage by `0.92%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2438/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##              master    #2438      +/-   ##
   
   - Coverage       47.61%   46.68%   -0.93%
   + Complexity       5487     5039     -448
   
     Files             924      867      -57
     Lines           41206    38791    -2415
     Branches         4133     3927     -206
   
   - Hits            19619    18110    -1509
   + Misses          19844    19079     -765
   + Partials         1743     1602     -141
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.20% <ø> (-0.38%)` | :arrow_down: |
   | hudicommon | `48.58% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `60.03% <ø> (+0.44%)` | :arrow_up: |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `68.36% <ø> (+1.03%)` | :arrow_up: |
   | hudisync | `50.55% <ø> (-3.93%)` | :arrow_down: |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ache/hudi/hive/HiveMetastoreBasedLockProvider.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZU1ldGFzdG9yZUJhc2VkTG9ja1Byb3ZpZGVyLmphdmE=)
 | `0.00% <0.00%> (-60.22%)` | :arrow_down: |
   | 
[...n/java/org/apache/hudi/index/SparkHoodieIndex.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvU3BhcmtIb29kaWVJbmRleC5qYXZh)
 | `56.52% <0.00%> (-30.15%)` | :arrow_down: |
   | 
[...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==)
 | `84.61% <0.00%> (-7.06%)` | :arrow_down: |
   | 
[...org/apache/hudi/config/HoodieClusteringConfig.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVDbHVzdGVyaW5nQ29uZmlnLmphdmE=)
 | `67.74% <0.00%> (-3.54%)` | :arrow_down: |
   | 
[...main/scala/org/apache/hudi/HoodieWriterUtils.scala](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb

[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378889#comment-17378889
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

codecov-commenter edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-850284847


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2438](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (67041c2) into 
[master](https://codecov.io/gh/apache/hudi/commit/990820476a41b318017ba63dd446911141c929ce?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (9908204) will **decrease** coverage by `1.15%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2438/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##              master    #2438      +/-   ##
   
   - Coverage       47.61%   46.46%   -1.16%
   + Complexity       5487     5030     -457
   
     Files             924      866      -58
     Lines           41206    38565    -2641
     Branches         4133     3837     -296
   
   - Hits            19619    17918    -1701
   + Misses          19844    19060     -784
   + Partials         1743     1587     -156
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.20% <ø> (-0.38%)` | :arrow_down: |
   | hudicommon | `48.58% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `60.03% <ø> (+0.44%)` | :arrow_up: |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `67.32% <ø> (-0.01%)` | :arrow_down: |
   | hudisync | `50.55% <ø> (-3.93%)` | :arrow_down: |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ache/hudi/hive/HiveMetastoreBasedLockProvider.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZU1ldGFzdG9yZUJhc2VkTG9ja1Byb3ZpZGVyLmphdmE=)
 | `0.00% <0.00%> (-60.22%)` | :arrow_down: |
   | 
[...n/java/org/apache/hudi/index/SparkHoodieIndex.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvU3BhcmtIb29kaWVJbmRleC5qYXZh)
 | `56.52% <0.00%> (-30.15%)` | :arrow_down: |
   | 
[...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==)
 | `84.61% <0.00%> (-7.06%)` | :arrow_down: |
   | 
[...org/apache/hudi/config/HoodieClusteringConfig.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVDbHVzdGVyaW5nQ29uZmlnLmphdmE=)
 | `67.74% <0.00%> (-3.54%)` | :arrow_down: |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](htt

[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378884#comment-17378884
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

codecov-commenter edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-850284847


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2438](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (67041c2) into 
[master](https://codecov.io/gh/apache/hudi/commit/990820476a41b318017ba63dd446911141c929ce?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (9908204) will **decrease** coverage by `0.94%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2438/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##              master    #2438      +/-   ##
   
   - Coverage       47.61%   46.66%   -0.95%
   + Complexity       5487     5027     -460
   
     Files             924      864      -60
     Lines           41206    38317    -2889
     Branches         4133     3824     -309
   
   - Hits            19619    17880    -1739
   + Misses          19844    18850     -994
   + Partials         1743     1587     -156
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.55% <ø> (-0.03%)` | :arrow_down: |
   | hudicommon | `48.59% <ø> (+0.02%)` | :arrow_up: |
   | hudiflink | `60.03% <ø> (+0.44%)` | :arrow_up: |
   | hudihadoopmr | `51.29% <ø> (ø)` | |
   | hudisparkdatasource | `67.32% <ø> (-0.01%)` | :arrow_down: |
   | hudisync | `50.55% <ø> (-3.93%)` | :arrow_down: |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2438?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ache/hudi/hive/HiveMetastoreBasedLockProvider.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZU1ldGFzdG9yZUJhc2VkTG9ja1Byb3ZpZGVyLmphdmE=)
 | `0.00% <0.00%> (-60.22%)` | :arrow_down: |
   | 
[...n/java/org/apache/hudi/index/SparkHoodieIndex.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvU3BhcmtIb29kaWVJbmRleC5qYXZh)
 | `56.52% <0.00%> (-30.15%)` | :arrow_down: |
   | 
[...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==)
 | `84.61% <0.00%> (-7.06%)` | :arrow_down: |
   | 
[...n/java/org/apache/hudi/common/model/FileSlice.java](https://codecov.io/gh/apache/hudi/pull/2438/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0ZpbGVTbGljZS5qYXZh)
 | `73.80% <0.00%> (-2.39%)` | :arrow_down: |
   | 
[.../org/apache/hudi/common/model/HoodieFileGroup.java](https://codecov.io/gh/apache/hudi/pull/

[jira] [Commented] (HUDI-2151) Make performant out-of-box configs

2021-07-11 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378873#comment-17378873
 ] 

Vinoth Chandar commented on HUDI-2151:
--

Should this be turned to true?

 
{code:java}
public static final ConfigProperty MERGE_ALLOW_DUPLICATE_ON_INSERTS = ConfigProperty
    .key("hoodie.merge.allow.duplicate.on.inserts")
    .defaultValue("false")
    .withDocumentation("When enabled, we allow duplicate keys even if inserts are routed to merge with an existing file (for ensuring file sizing)."
        + " This is only relevant for insert operation, since upsert, delete operations will ensure unique key constraints are maintained.");
{code}
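
For illustration only, the effect of flipping this default can be sketched with plain `java.util.Properties`. This is a minimal sketch, not Hudi code: only the key name and the `"false"` default come from the snippet above, while the class `DuplicateInsertFlagSketch` and the helper `allowDuplicatesOnInsert` are hypothetical names introduced here.

```java
import java.util.Properties;

// Hypothetical sketch; not part of Hudi. Only KEY and DEFAULT_VALUE come from the snippet above.
class DuplicateInsertFlagSketch {
    static final String KEY = "hoodie.merge.allow.duplicate.on.inserts";
    static final String DEFAULT_VALUE = "false";

    // Resolves the effective flag value, falling back to the default when the key is unset.
    static boolean allowDuplicatesOnInsert(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, DEFAULT_VALUE));
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();      // key not set: the default applies
        Properties overridden = new Properties();
        overridden.setProperty(KEY, "true");         // explicit user override

        System.out.println("default:    " + allowDuplicatesOnInsert(defaults));
        System.out.println("overridden: " + allowDuplicatesOnInsert(overridden));
    }
}
```

With the current default, a writer that never sets the key resolves the flag to false. Changing the default to true would change behavior only for such writers, which is why the backwards-compatibility concern in the issue applies.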
 

> Make performant out-of-box configs
> --
>
> Key: HUDI-2151
> URL: https://issues.apache.org/jira/browse/HUDI-2151
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup, Docs
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> We have quite a few configs that deliver better performance or usability 
> but are guarded by flags. 
>  This task is to identify them, change them, test them (functionally and for performance), and make 
> them the default.
>  
> We also need to capture any backwards-compatibility issues that 
> can arise.




[jira] [Commented] (HUDI-2107) Support Read Log Only MOR Table For Spark

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378868#comment-17378868
 ] 

ASF GitHub Bot commented on HUDI-2107:
--

hudi-bot edited a comment on pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#issuecomment-871363205


   
   ## CI report:
   
   * 864dff7a0cc4389905067abee96046d5f72b004f UNKNOWN
   * f8bac2f4e7133eb3f9cbe4c15a20da49e30dd6eb UNKNOWN
   * 2a1ce1b4b826344ec024bd51b8af5ee5543a0986 UNKNOWN
   * ce51b2d836504936b23368492c071fdfe4d94594 UNKNOWN
   * a619116de9d59eff189e05f593b917cc0b762f25 UNKNOWN
   * 8296acbf4bb8423eb0ae25a57327b5b9009cc63a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=848)
 
   * 831cccb61d0c87866f0fd2d2c602ec25eca4ab9c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=852)
 
   
   


> Support Read Log Only MOR Table For Spark
> -
>
> Key: HUDI-2107
> URL: https://issues.apache.org/jira/browse/HUDI-2107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently, Spark cannot read a log-only MOR table (one generated by an 
> index that supports indexing log files, such as InMemoryIndex, HbaseIndex, 
> or FlinkIndex).




