[jira] [Resolved] (HUDI-1695) Deltastreamer HoodieIncrSource exception error messaging is incorrect

2021-03-15 Thread Vinoth Govindarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Govindarajan resolved HUDI-1695.
---
Resolution: Fixed

> Deltastreamer HoodieIncrSource exception error messaging is incorrect
> -
>
> Key: HUDI-1695
> URL: https://issues.apache.org/jira/browse/HUDI-1695
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Trivial
>  Labels: beginner, pull-request-available
> Fix For: 0.8.0
>
>
> When you set your source_class as HoodieIncrSource and invoke deltastreamer 
> without any checkpoint, it throws the following Exception:
>  
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin 
> instant for incremental pull. For reading from latest committed instant set 
> hoodie.deltastreamer.source.hoodie.read_latest_on_midding_ckpt to true{code}
>  
> The error messaging is wrong and misleading; the correct parameter is:
> {code:java}
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt
> {code}
> Check out the correct parameter in this 
> [file|https://github.com/apache/hudi/blob/release-0.7.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java#L78]
>  
> The correct messaging should be:
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin 
> instant for incremental pull. For reading from latest committed instant set 
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt to true
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1695) Deltastreamer HoodieIncrSource exception error messaging is incorrect

2021-03-15 Thread Vinoth Govindarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Govindarajan closed HUDI-1695.
-

> Deltastreamer HoodieIncrSource exception error messaging is incorrect
> -
>
> Key: HUDI-1695
> URL: https://issues.apache.org/jira/browse/HUDI-1695
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Trivial
>  Labels: beginner, pull-request-available
> Fix For: 0.8.0
>
>
> When you set your source_class as HoodieIncrSource and invoke deltastreamer 
> without any checkpoint, it throws the following Exception:
>  
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin 
> instant for incremental pull. For reading from latest committed instant set 
> hoodie.deltastreamer.source.hoodie.read_latest_on_midding_ckpt to true{code}
>  
> The error messaging is wrong and misleading; the correct parameter is:
> {code:java}
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt
> {code}
> Check out the correct parameter in this 
> [file|https://github.com/apache/hudi/blob/release-0.7.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java#L78]
>  
> The correct messaging should be:
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin 
> instant for incremental pull. For reading from latest committed instant set 
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt to true
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-03-15 Thread GitBox


liujinhui1994 commented on a change in pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#discussion_r594877281



##
File path: 
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieDeltaStreamerWrapper.java
##
@@ -65,7 +65,7 @@ public void scheduleCompact() throws Exception {
 return upsert(WriteOperationType.UPSERT);
   }
 
-  public Pair>> 
fetchSource() throws Exception {
+  public Pair>, Pair> fetchSource() throws Exception {

Review comment:
   After your PR is merged, shall I continue with the next PR?
   @nsivabalan 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1695) Deltastreamer HoodieIncrSource exception error messaging is incorrect

2021-03-15 Thread Vinoth Govindarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302235#comment-17302235
 ] 

Vinoth Govindarajan commented on HUDI-1695:
---

PR has been merged.

> Deltastreamer HoodieIncrSource exception error messaging is incorrect
> -
>
> Key: HUDI-1695
> URL: https://issues.apache.org/jira/browse/HUDI-1695
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Trivial
>  Labels: beginner, pull-request-available
> Fix For: 0.8.0
>
>
> When you set your source_class as HoodieIncrSource and invoke deltastreamer 
> without any checkpoint, it throws the following Exception:
>  
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin 
> instant for incremental pull. For reading from latest committed instant set 
> hoodie.deltastreamer.source.hoodie.read_latest_on_midding_ckpt to true{code}
>  
> The error messaging is wrong and misleading; the correct parameter is:
> {code:java}
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt
> {code}
> Check out the correct parameter in this 
> [file|https://github.com/apache/hudi/blob/release-0.7.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java#L78]
>  
> The correct messaging should be:
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin 
> instant for incremental pull. For reading from latest committed instant set 
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt to true
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on a change in pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-03-15 Thread GitBox


nsivabalan commented on a change in pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#discussion_r594471195



##
File path: 
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieDeltaStreamerWrapper.java
##
@@ -65,7 +65,7 @@ public void scheduleCompact() throws Exception {
 return upsert(WriteOperationType.UPSERT);
   }
 
-  public Pair>> 
fetchSource() throws Exception {
+  public Pair>, Pair> fetchSource() throws Exception {

Review comment:
   this is getting out of hand (two pairs within a pair); we can't keep 
adding more Pairs here. I am adding a class to hold the return value in one of 
my PRs. Let's see if we can rebase once the other PR lands.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[hudi] branch master updated (3b36cb8 -> 16864ae)

2021-03-15 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 3b36cb8  [HUDI-1552] Improve performance of key lookups from base file 
in Metadata Table. (#2494)
 add 16864ae  [HUDI-1695] Fixed the error messaging (#2679)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[GitHub] [hudi] yanghua merged pull request #2679: [MINOR] Fixed the error messaging

2021-03-15 Thread GitBox


yanghua merged pull request #2679:
URL: https://github.com/apache/hudi/pull/2679


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xiarixiaoyao commented on pull request #2673: [HUDI-1688] Uncache Rdd once write operation is complete

2021-03-15 Thread GitBox


xiarixiaoyao commented on pull request #2673:
URL: https://github.com/apache/hudi/pull/2673#issuecomment-799905602


   @nsivabalan yes, due to company information security constraints, I 
cannot paste screenshots of the test results and the dump.
   Before the fix
   env: (executor: 4 cores, 8 GB) * 50
   step1: merge(df, 800  , "hudikey", "testOOM", 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert")   time cost: 616s
   step2: merge(df, 800  , "hudikey", "testOOM1", 
DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert")  time cost: 710s
   step3: merge(df, 800  , "hudikey", "testOOM2", 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert")  time cost: 676s
   step4: merge(df, 800  , "hudikey", "testOOM3", 
DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert")  time cost: 1077s
   step5: merge(df, 800  , "hudikey", "testOOM4", 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert")  time cost: 1154s
   step6: merge(df, 800  , "hudikey", "testOOM5", 
DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert")  time cost: 2055s 
(some executor oom)
   
   
   Analyzing the dump, we found that more than 90 percent of memory was consumed by 
cached RDDs.
   
   After the fix
   step1: merge(df, 800  , "hudikey", "testOOM", 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert")   time cost: 632s
   step2: merge(df, 800  , "hudikey", "testOOM1", 
DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert")  time cost: 710s
   step3: merge(df, 800  , "hudikey", "testOOM2", 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert")  time cost: 698s
   step4: merge(df, 800  , "hudikey", "testOOM3", 
DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert")  time cost: 723s
   step5: merge(df, 800  , "hudikey", "testOOM4", 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert")  time cost: 616s
   step6: merge(df, 800  , "hudikey", "testOOM5", 
DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert")  time cost: 703s
   
   One last point: when we cache RDDs, we should uncache them promptly once they 
are no longer used. Spark can uncache RDDs automatically, but the timing of that 
process is nondeterministic.
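   A minimal sketch of the cache-then-release pattern being described, with a 
hypothetical `writeRecords` callback standing in for the Hudi write path:
   
   ```java
   import java.util.function.Consumer;
   
   import org.apache.spark.api.java.JavaRDD;
   import org.apache.spark.storage.StorageLevel;
   
   public class ExplicitUnpersist {
     // Cache the input for the duration of the write, then release it deterministically.
     static <T> void writeAndRelease(JavaRDD<T> input, Consumer<JavaRDD<T>> writeRecords) {
       input.persist(StorageLevel.MEMORY_AND_DISK());
       try {
         writeRecords.accept(input); // may run several Spark actions over the cached data
       } finally {
         // Do not wait for Spark's ContextCleaner, whose timing is unpredictable.
         input.unpersist();
       }
     }
   }
   ```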



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pengzhiwei2018 commented on issue #2656: HUDI insert operation is working same as upsert

2021-03-15 Thread GitBox


pengzhiwei2018 commented on issue #2656:
URL: https://github.com/apache/hudi/issues/2656#issuecomment-799888195


   Hi @shivabansal1046, no need to add an extra column; just set 
`DataSourceWriteOptions#KEYGENERATOR_CLASS_OPT_KEY` to 
`classOf[UuidKeyGenKeyGenerator]`, which generates a UUID key.
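   For reference, a sketch of wiring such a key generator into a datasource write 
(option keys as of Hudi 0.8.x; `com.example.UuidKeyGenKeyGenerator` below is a 
placeholder class name):
   
   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SaveMode;
   
   public class InsertWithUuidKeys {
     static void write(Dataset<Row> df, String basePath) {
       df.write().format("hudi")
           .option("hoodie.table.name", "my_table")
           // custom key generator that assigns a random UUID to every record
           .option("hoodie.datasource.write.keygenerator.class", "com.example.UuidKeyGenKeyGenerator")
           .option("hoodie.datasource.write.operation", "insert")
           .mode(SaveMode.Append)
           .save(basePath);
     }
   }
   ```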



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1526) Translate the spark api partitionBy to hoodie.datasource.write.partitionpath.field

2021-03-15 Thread Xianghu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianghu Wang updated HUDI-1526:
---
Fix Version/s: 0.8.0

> Translate the spark api partitionBy to 
> hoodie.datasource.write.partitionpath.field
> --
>
> Key: HUDI-1526
> URL: https://issues.apache.org/jira/browse/HUDI-1526
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: teeyog
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Currently, if you want to set the partition of a Hudi table, you must configure it 
> with the parameter hoodie.datasource.write.partitionpath.field; the Spark 
> DataFrame API partitionBy does not take effect. We can automatically translate 
> the parameter of partitionBy into the partition field of Hudi.
> [https://github.com/apache/hudi/pull/2431|https://github.com/apache/hudi/pull/2431/commits/fa597aa31b5af5ceea651af32bc163911137552c]
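For illustration, a sketch of the before/after usage this improvement targets 
(assuming `df` is a `Dataset<Row>` and `basePath` a target path; the option key is 
the one named in the summary):
{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

class PartitionByTranslation {
  static void write(Dataset<Row> df, String basePath) {
    // Today: the partition field must be spelled out as a Hudi option.
    df.write().format("hudi")
        .option("hoodie.datasource.write.partitionpath.field", "dt")
        .mode(SaveMode.Append)
        .save(basePath);

    // With the proposed translation, the standard Spark API would suffice:
    // partitionBy("dt") is mapped to the option above.
    df.write().format("hudi")
        .partitionBy("dt")
        .mode(SaveMode.Append)
        .save(basePath);
  }
}
{code}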



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1526) Translate the spark api partitionBy to hoodie.datasource.write.partitionpath.field

2021-03-15 Thread Xianghu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianghu Wang resolved HUDI-1526.

Resolution: Resolved

Resolved via master branch : 26da4f546275e8ab6496537743efe73510cb723d

> Translate the spark api partitionBy to 
> hoodie.datasource.write.partitionpath.field
> --
>
> Key: HUDI-1526
> URL: https://issues.apache.org/jira/browse/HUDI-1526
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: teeyog
>Priority: Major
>  Labels: pull-request-available
>
> Currently, if you want to set the partition of a Hudi table, you must configure it 
> with the parameter hoodie.datasource.write.partitionpath.field; the Spark 
> DataFrame API partitionBy does not take effect. We can automatically translate 
> the parameter of partitionBy into the partition field of Hudi.
> [https://github.com/apache/hudi/pull/2431|https://github.com/apache/hudi/pull/2431/commits/fa597aa31b5af5ceea651af32bc163911137552c]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] satishkotha commented on a change in pull request #2678: Added support for replace commits in commit showpartitions, commit sh…

2021-03-15 Thread GitBox


satishkotha commented on a change in pull request #2678:
URL: https://github.com/apache/hudi/pull/2678#discussion_r594753875



##
File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
##
@@ -431,4 +442,20 @@ public String syncCommits(@CliOption(key = {"path"}, help 
= "Path of the table t
 return "Load sync state between " + 
HoodieCLI.getTableMetaClient().getTableConfig().getTableName() + " and "
 + HoodieCLI.syncTableMetadata.getTableConfig().getTableName();
   }
+
+  /**
+   * Checks whether a commit or replacecommit action exists in the timeline.
+   */
+  private Option getCommitOrReplaceCommitInstant(HoodieTimeline 
timeline, String instantTime) {

Review comment:
   consider changing signature to return Option and 
deserialize instant details inside this method. This would avoid repetition to 
get instant details in multiple places. You can also do additional validation. 
for example: for replace commit, deserialize  using HoodieReplaceCommitMetadata 
class 

##
File path: 
hudi-cli/src/test/java/org/apache/hudi/cli/testutils/HoodieTestReplaceCommitMetadatGenerator.java
##
@@ -0,0 +1,74 @@
+package org.apache.hudi.cli.testutils;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hudi.common.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.model.HoodieWriteStat;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.testutils.FileCreateUtils;
+import org.apache.hudi.common.util.Option;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.UUID;
+
+import static org.apache.hudi.common.testutils.FileCreateUtils.baseFileName;
+import static org.apache.hudi.common.util.CollectionUtils.createImmutableList;
+
+public class HoodieTestReplaceCommitMetadatGenerator extends 
HoodieTestCommitMetadataGenerator{
+public static void createReplaceCommitFileWithMetadata(String basePath, 
String commitTime, Configuration configuration,
+   Option 
writes, Option updates) throws Exception {
+createReplaceCommitFileWithMetadata(basePath, commitTime, 
configuration, UUID.randomUUID().toString(),
+UUID.randomUUID().toString(), writes, updates);
+}
+
+private static void createReplaceCommitFileWithMetadata(String basePath, 
String commitTime, Configuration configuration,
+String fileId1, 
String fileId2, Option writes,
+Option 
updates) throws Exception {
+List commitFileNames = 
Arrays.asList(HoodieTimeline.makeCommitFileName(commitTime),

Review comment:
   Can we reuse the replace commit generator from other places? HoodieTestTable, 
for example?

##
File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
##
@@ -266,12 +267,15 @@ public String showCommitPartitions(
 
 HoodieActiveTimeline activeTimeline = 
HoodieCLI.getTableMetaClient().getActiveTimeline();
 HoodieTimeline timeline = 
activeTimeline.getCommitsTimeline().filterCompletedInstants();
-HoodieInstant commitInstant = new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, instantTime);
 
-if (!timeline.containsInstant(commitInstant)) {
+Option hoodieInstantOptional = 
getCommitOrReplaceCommitInstant(timeline, instantTime);
+if (!hoodieInstantOptional.isPresent()) {
   return "Commit " + instantTime + " not found in Commits " + timeline;
 }
-HoodieCommitMetadata meta = 
HoodieCommitMetadata.fromBytes(activeTimeline.getInstantDetails(commitInstant).get(),
+
+HoodieInstant hoodieInstant = hoodieInstantOptional.get();
+
+HoodieCommitMetadata meta = 
HoodieCommitMetadata.fromBytes(activeTimeline.getInstantDetails(hoodieInstant).get(),
 HoodieCommitMetadata.class);
 List rows = new ArrayList<>();
 for (Map.Entry> entry : 
meta.getPartitionToWriteStats().entrySet()) {

Review comment:
   it'd be nice to  compute totalfFilesReplaced and show it in the table. 
It could be 0 for regular commits.

##
File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
##
@@ -431,4 +442,20 @@ public String syncCommits(@CliOption(key = {"path"}, help 
= "Path of the table t
 return "Load sync state between " + 
HoodieCLI.getTableMetaClient().getTableConfig().getTableName() + " and "
 + HoodieCLI.syncTableMetadata.getTableConfig().getTableName();
   }
+
+  /**
+   * Checks whether a commit or replacecommit action exists in the timeline.
+   */
+  private Option getCommitOrReplaceCommitInstant(HoodieTimeline 
timeline, String instantTime) {
+HoodieInstant hoodieInstant = new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, instantTime);
+
+if (!timeline.containsInstant(hoodieInstant)) {
+  hoodieInstant = new H
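The hunk is cut off above. A plausible completion of this helper, falling back to 
the replacecommit action, sketched from the surrounding comment rather than the 
verbatim PR code:

```java
// assumes org.apache.hudi.common.table.timeline.{HoodieInstant, HoodieTimeline}
// and org.apache.hudi.common.util.Option are imported
private Option<HoodieInstant> getCommitOrReplaceCommitInstant(HoodieTimeline timeline, String instantTime) {
  HoodieInstant hoodieInstant = new HoodieInstant(false, HoodieTimeline.COMMIT_ACTION, instantTime);
  if (!timeline.containsInstant(hoodieInstant)) {
    // Not a regular commit; try a replacecommit with the same instant time.
    hoodieInstant = new HoodieInstant(false, HoodieTimeline.REPLACE_COMMIT_ACTION, instantTime);
    if (!timeline.containsInstant(hoodieInstant)) {
      return Option.empty();
    }
  }
  return Option.of(hoodieInstant);
}
```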

[GitHub] [hudi] codecov-io commented on pull request #2679: [MINOR] Fixed the error messaging

2021-03-15 Thread GitBox


codecov-io commented on pull request #2679:
URL: https://github.com/apache/hudi/pull/2679#issuecomment-799814201


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2679?src=pr&el=h1) Report
   > Merging 
[#2679](https://codecov.io/gh/apache/hudi/pull/2679?src=pr&el=desc) (818439b) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/3b36cb805d066a3112e3a355ef502dbe4b2c1824?el=desc)
 (3b36cb8) will **increase** coverage by `17.44%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2679/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2679?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2679       +/-   ##
   =============================================
   + Coverage     51.98%   69.43%   +17.44%     
   + Complexity     3580      363     -3217     
   =============================================
     Files           466       53      -413     
     Lines         22318     1963    -20355     
     Branches       2377      235     -2142     
   =============================================
   - Hits          11603     1363    -10240     
   + Misses         9706      466     -9240     
   + Partials       1009      134      -875     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2679?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...di/utilities/sources/helpers/IncrSourceHelper.java](https://codecov.io/gh/apache/hudi/pull/2679/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9JbmNyU291cmNlSGVscGVyLmphdmE=)
 | `54.54% <ø> (ø)` | `4.00 <0.00> (ø)` | |
   | 
[...he/hudi/common/model/BootstrapBaseFileMapping.java](https://codecov.io/gh/apache/hudi/pull/2679/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jvb3RzdHJhcEJhc2VGaWxlTWFwcGluZy5qYXZh)
 | | | |
   | 
[...n/java/org/apache/hudi/internal/DefaultSource.java](https://codecov.io/gh/apache/hudi/pull/2679/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2ludGVybmFsL0RlZmF1bHRTb3VyY2UuamF2YQ==)
 | | | |
   | 
[...java/org/apache/hudi/table/format/FormatUtils.java](https://codecov.io/gh/apache/hudi/pull/2679/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvRm9ybWF0VXRpbHMuamF2YQ==)
 | | | |
   | 
[...a/org/apache/hudi/common/util/CompactionUtils.java](https://codecov.io/gh/apache/hudi/pull/2679/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQ29tcGFjdGlvblV0aWxzLmphdmE=)
 | | | |
   | 
[...ache/hudi/hadoop/utils/HoodieInputFormatUtils.java](https://codecov.io/gh/apache/hudi/pull/2679/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3V0aWxzL0hvb2RpZUlucHV0Rm9ybWF0VXRpbHMuamF2YQ==)
 | | | |
   | 
[...ache/hudi/common/table/timeline/TimelineUtils.java](https://codecov.io/gh/apache/hudi/pull/2679/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lVXRpbHMuamF2YQ==)
 | | | |
   | 
[.../hudi/common/util/collection/LazyFileIterable.java](https://codecov.io/gh/apache/hudi/pull/2679/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9MYXp5RmlsZUl0ZXJhYmxlLmphdmE=)
 | | | |
   | 
[...ava/org/apache/hudi/cli/commands/TableCommand.java](https://codecov.io/gh/apache/hudi/pull/2679/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL1RhYmxlQ29tbWFuZC5qYXZh)
 | | | |
   | 
[...org/apache/hudi/hadoop/HoodieHFileInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2679/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZUhGaWxlSW5wdXRGb3JtYXQuamF2YQ==)
 | | | |
   | ... and [404 
more](https://codecov.io/gh/apache/hudi/pull/2679/diff?src=pr&el=tree-more) | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please cont

[jira] [Updated] (HUDI-1695) Deltastreamer HoodieIncrSource exception error messaging is incorrect

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1695:
-
Labels: beginner pull-request-available  (was: beginner)

> Deltastreamer HoodieIncrSource exception error messaging is incorrect
> -
>
> Key: HUDI-1695
> URL: https://issues.apache.org/jira/browse/HUDI-1695
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Trivial
>  Labels: beginner, pull-request-available
> Fix For: 0.8.0
>
>
> When you set your source_class as HoodieIncrSource and invoke deltastreamer 
> without any checkpoint, it throws the following Exception:
>  
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin 
> instant for incremental pull. For reading from latest committed instant set 
> hoodie.deltastreamer.source.hoodie.read_latest_on_midding_ckpt to true{code}
>  
> The error messaging is wrong and misleading; the correct parameter is:
> {code:java}
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt
> {code}
> Check out the correct parameter in this 
> [file|https://github.com/apache/hudi/blob/release-0.7.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java#L78]
>  
> The correct messaging should be:
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin 
> instant for incremental pull. For reading from latest committed instant set 
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt to true
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vingov opened a new pull request #2679: [HUDI-1695] Fixed the error messaging

2021-03-15 Thread GitBox


vingov opened a new pull request #2679:
URL: https://github.com/apache/hudi/pull/2679


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   This pull request fixes the error messaging to reference the correct hoodie conf 
parameter.
   
   ## Brief change log
   
 - *Updated the error messaging when using HoodieIncrSource class*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1695) Deltastreamer HoodieIncrSource exception error messaging is incorrect

2021-03-15 Thread Vinoth Govindarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Govindarajan updated HUDI-1695:
--
Summary: Deltastreamer HoodieIncrSource exception error messaging is 
incorrect  (was: Deltastream HoodieIncrSource exception error messaging is 
incorrect)

> Deltastreamer HoodieIncrSource exception error messaging is incorrect
> -
>
> Key: HUDI-1695
> URL: https://issues.apache.org/jira/browse/HUDI-1695
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Trivial
>  Labels: beginner
> Fix For: 0.8.0
>
>
> When you set your source_class as HoodieIncrSource and invoke deltastreamer 
> without any checkpoint, it throws the following Exception:
>  
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin 
> instant for incremental pull. For reading from latest committed instant set 
> hoodie.deltastreamer.source.hoodie.read_latest_on_midding_ckpt to true{code}
>  
> The error messaging is wrong and misleading; the correct parameter is:
> {code:java}
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt
> {code}
> Check out the correct parameter in this 
> [file|https://github.com/apache/hudi/blob/release-0.7.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java#L78]
>  
> The correct messaging should be:
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin 
> instant for incremental pull. For reading from latest committed instant set 
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt to true
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1695) Deltastream HoodieIncrSource exception error messaging is incorrect

2021-03-15 Thread Vinoth Govindarajan (Jira)
Vinoth Govindarajan created HUDI-1695:
-

 Summary: Deltastream HoodieIncrSource exception error messaging is 
incorrect
 Key: HUDI-1695
 URL: https://issues.apache.org/jira/browse/HUDI-1695
 Project: Apache Hudi
  Issue Type: Bug
  Components: DeltaStreamer
Reporter: Vinoth Govindarajan
Assignee: Vinoth Govindarajan
 Fix For: 0.8.0


When you set your source_class as HoodieIncrSource and invoke deltastreamer 
without any checkpoint, it throws the following Exception:

 
{code:java}
User class threw exception: java.lang.IllegalArgumentException: Missing begin 
instant for incremental pull. For reading from latest committed instant set 
hoodie.deltastreamer.source.hoodie.read_latest_on_midding_ckpt to true{code}
 

The error messaging is wrong and misleading; the correct parameter is:
{code:java}
hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt
{code}
Check out the correct parameter in this 
[file|https://github.com/apache/hudi/blob/release-0.7.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java#L78]

 

The correct messaging should be:
{code:java}
User class threw exception: java.lang.IllegalArgumentException: Missing begin 
instant for incremental pull. For reading from latest committed instant set 
hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt to true
{code}
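For context, a minimal sketch of what the corrected guard inside IncrSourceHelper 
could look like. This is an assumption based on the stack trace above, not the 
verbatim patch; ValidationUtils.checkArgument throws the IllegalArgumentException 
seen in the logs:
{code:java}
import org.apache.hudi.common.util.Option;
import org.apache.hudi.common.util.ValidationUtils;

class IncrSourceGuardSketch {
  // beginInstant is the checkpoint carried over from the previous run, if any.
  static void validateBeginInstant(Option<String> beginInstant) {
    ValidationUtils.checkArgument(beginInstant.isPresent(),
        "Missing begin instant for incremental pull. For reading from latest committed instant set "
            + "hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt to true");
  }
}
{code}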



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] prashantwason commented on pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-03-15 Thread GitBox


prashantwason commented on pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#issuecomment-799740584


   Looks good @vinothchandar 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] prashantwason commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-03-15 Thread GitBox


prashantwason commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r594671546



##
File path: 
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
##
@@ -147,82 +150,91 @@ private void initIfNeeded() {
 }
   }
   timings.add(timer.endTimer());
-  LOG.info(String.format("Metadata read for key %s took [open, 
baseFileRead, logMerge] %s ms", key, timings));
+  LOG.info(String.format("Metadata read for key %s took [baseFileRead, 
logMerge] %s ms", key, timings));
   return Option.ofNullable(hoodieRecord);
 } catch (IOException ioe) {
   throw new HoodieIOException("Error merging records from metadata table 
for key :" + key, ioe);
-} finally {

Review comment:
   Yep. Thanks for fixing it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar merged pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-03-15 Thread GitBox


vinothchandar merged pull request #2494:
URL: https://github.com/apache/hudi/pull/2494


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (76bf2cc -> 3b36cb8)

2021-03-15 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 76bf2cc  [HUDI-1692] Bounded source for stream writer (#2674)
 add 3b36cb8  [HUDI-1552] Improve performance of key lookups from base file 
in Metadata Table. (#2494)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/cli/commands/MetadataCommand.java  |   6 +-
 .../java/org/apache/hudi/table/HoodieTable.java|   3 +-
 .../hudi/metadata/TestHoodieBackedMetadata.java|   1 -
 .../hudi/common/config/HoodieMetadataConfig.java   |  15 ---
 .../common/table/view/FileSystemViewManager.java   |   2 +-
 .../apache/hudi/io/storage/HoodieHFileReader.java  |  35 +++--
 .../hudi/metadata/HoodieBackedTableMetadata.java   | 142 +
 .../apache/hudi/metadata/HoodieTableMetadata.java  |   7 +-
 8 files changed, 125 insertions(+), 86 deletions(-)



[GitHub] [hudi] vinothchandar commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-03-15 Thread GitBox


vinothchandar commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r594656644



##
File path: 
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##
@@ -232,12 +246,13 @@ public long getTotalRecords() {
   }
 
   @Override
-  public void close() {
+  public synchronized void close() {
 try {
   reader.close();
   reader = null;
+  keyScanner = null;
 } catch (IOException e) {
-  e.printStackTrace();

Review comment:
   @prashantwason fixed this as well. 
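   For context, the cleaned-up `close()` suggested by this hunk could look roughly 
as follows (a sketch; the line that replaced `printStackTrace()` is not shown in 
this excerpt):
   
   ```java
   // assumes the reader/keyScanner fields of HoodieHFileReader and
   // org.apache.hudi.exception.HoodieIOException are in scope
   @Override
   public synchronized void close() {
     try {
       if (reader != null) {
         reader.close();
         reader = null;
       }
       keyScanner = null;
     } catch (IOException e) {
       // Propagate instead of swallowing the failure with printStackTrace().
       throw new HoodieIOException("Error closing the HFile reader", e);
     }
   }
   ```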





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-03-15 Thread GitBox


codecov-io edited a comment on pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#issuecomment-767956391


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2494?src=pr&el=h1) Report
   > Merging 
[#2494](https://codecov.io/gh/apache/hudi/pull/2494?src=pr&el=desc) (59b919a) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/d8af24d8a2fdbead4592a36df1bd9dda333f1513?el=desc)
 (d8af24d) will **increase** coverage by `17.89%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2494/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2494?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2494       +/-   ##
   =============================================
   + Coverage     51.53%   69.43%   +17.89%     
   + Complexity     3491      363     -3128     
   =============================================
     Files           462       53      -409     
     Lines         21881     1963    -19918     
     Branches       2327      235     -2092     
   =============================================
   - Hits          11277     1363     -9914     
   + Misses         9624      466     -9158     
   + Partials        980      134      -846     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <ø> (-0.06%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2494?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...ies/sources/helpers/DatePartitionPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2494/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9EYXRlUGFydGl0aW9uUGF0aFNlbGVjdG9yLmphdmE=)
 | `54.68% <0.00%> (-1.57%)` | `13.00% <0.00%> (ø%)` | |
   | 
[...e/hudi/common/util/queue/BoundedInMemoryQueue.java](https://codecov.io/gh/apache/hudi/pull/2494/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvQm91bmRlZEluTWVtb3J5UXVldWUuamF2YQ==)
 | | | |
   | 
[...udi/operator/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/2494/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9vcGVyYXRvci9wYXJ0aXRpb25lci9CdWNrZXRBc3NpZ25GdW5jdGlvbi5qYXZh)
 | | | |
   | 
[...pache/hudi/operator/KeyedWriteProcessOperator.java](https://codecov.io/gh/apache/hudi/pull/2494/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9vcGVyYXRvci9LZXllZFdyaXRlUHJvY2Vzc09wZXJhdG9yLmphdmE=)
 | | | |
   | 
[...i/common/model/OverwriteWithLatestAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2494/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL092ZXJ3cml0ZVdpdGhMYXRlc3RBdnJvUGF5bG9hZC5qYXZh)
 | | | |
   | 
[...til/jvm/HotSpotMemoryLayoutSpecification64bit.java](https://codecov.io/gh/apache/hudi/pull/2494/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvanZtL0hvdFNwb3RNZW1vcnlMYXlvdXRTcGVjaWZpY2F0aW9uNjRiaXQuamF2YQ==)
 | | | |
   | 
[...e/hudi/common/model/HoodieRollingStatMetadata.java](https://codecov.io/gh/apache/hudi/pull/2494/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJvbGxpbmdTdGF0TWV0YWRhdGEuamF2YQ==)
 | | | |
   | 
[...he/hudi/exception/HoodieNotSupportedException.java](https://codecov.io/gh/apache/hudi/pull/2494/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZU5vdFN1cHBvcnRlZEV4Y2VwdGlvbi5qYXZh)
 | | | |
   | 
[...udi/common/table/timeline/dto/ClusteringOpDTO.java](https://codecov.io/gh/apache/hudi/pull/2494/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9DbHVzdGVyaW5nT3BEVE8uamF2YQ==)
 | | | |
   | 
[.../apache/hudi/common/model/ClusteringOperation.java](https://codecov.io/gh/apache/hudi/pull/2494/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0NsdXN0ZXJpbmdPcGVyYXRpb24uamF2YQ==)
 | | | |
   | ... and [394 
more](https://codecov.io/gh/apache/hudi/pull/2494/diff?src=pr&el=tree-more) | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please l

[GitHub] [hudi] nsivabalan edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-03-15 Thread GitBox


nsivabalan edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-799571386


   Thanks for your contribution. This is going to be useful to the community. 
   A few high-level questions:
   1. Why not leverage DeltaStreamerConfig.checkpoint to pass in a timestamp 
for the Kafka source? Or do we expect the format of this config to be 
"topic_name,partition_num:offset,partition_num:offset," and hence need a new 
config for a timestamp-based checkpoint? 
   2. If yes to (1), did we think about parsing the checkpoint config, determining 
whether it is the above format or a timestamp, and then proceeding from there? 
Just trying to avoid introducing new configs if possible. 
   3. Checkpointing in deltastreamer in general is getting too complicated. I 
definitely see a benefit in this patch, but is there a way we can abstract it 
out based on the source? The new config introduced as part of this PR is very 
specific to Kafka, so I am trying to see if we can keep it abstracted out from 
deltastreamer if possible. 
   4. I see KafkaConsumer.offsetsForTimes() could return null for partitions with 
messages of the old format. What's the expected behavior for such partitions? Do 
we resume from the earliest offset? 
   
   @n3nash @vinothchandar: open to hearing your thoughts. One of my 
suggestions above could potentially add APIs to Source, hence CCing you. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] jsbali opened a new pull request #2678: Added support for replace commits in commit showpartitions, commit sh…

2021-03-15 Thread GitBox


jsbali opened a new pull request #2678:
URL: https://github.com/apache/hudi/pull/2678


   …ow_write_stats, commit showfiles
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   Add support for replace commit in hudi-cli
   
   ## Brief change log
   
   Currently hudi-cli doesn't support replace commits in the commit show* 
functions. This adds the foundation for that.
   This PR still doesn't support the extraMetadata of the replace commit; that 
will be added in subsequent PRs.
   
   ## Verify this pull request
   
   This PR is one part of adding replace commit support in hudi-cli.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] jsbali opened a new pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…

2021-03-15 Thread GitBox


jsbali opened a new pull request #2677:
URL: https://github.com/apache/hudi/pull/2677


   …eted clean and rollback actions.
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   This pull request adds test cases for the TestHoodieTimelineArchiveLog class.
   
   ## Brief change log
   
   This pull request adds test cases for the TestHoodieTimelineArchiveLog class, 
specifically for the getCleanInstantsToArchive function.
   
   ## Verify this pull request
   
   This change added tests.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch asf-site updated: Travis CI build asf-site

2021-03-15 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 565dac6  Travis CI build asf-site
565dac6 is described below

commit 565dac65cff779d03f3a133314b26b2bb7b341aa
Author: CI 
AuthorDate: Mon Mar 15 16:12:28 2021 +

Travis CI build asf-site
---
 content/activity.html  |  24 ++
 .../blog/hudi-file-sizing/adding_new_files.png | Bin 0 -> 44237 bytes
 .../bin_packing_existing_data_files.png| Bin 0 -> 23955 bytes
 .../blog/hudi-file-sizing/initial_layout.png   | Bin 0 -> 34742 bytes
 content/assets/js/lunr/lunr-store.js   |   5 +
 content/blog.html  |  24 ++
 content/blog/hudi-file-sizing/index.html   | 331 +
 content/cn/activity.html   |  24 ++
 content/sitemap.xml|   4 +
 9 files changed, 412 insertions(+)

[diff hunks: the generated pages content/activity.html, content/blog.html, content/cn/activity.html, content/assets/js/lunr/lunr-store.js, and the new content/blog/hudi-file-sizing/index.html gain entries for the blog post "Streaming Responsibly - How Apache Hudi maintains optimum sized files" (Sivabalan Narayanan, March 1, 2021; teaser: "Maintaining well-sized files can improve query performance significantly"); three images are added under content/assets/images/blog/hudi-file-sizing/]

[GitHub] [hudi] shivabansal1046 commented on issue #2656: HUDI insert operation is working same as upsert

2021-03-15 Thread GitBox


shivabansal1046 commented on issue #2656:
URL: https://github.com/apache/hudi/issues/2656#issuecomment-799528507


   Hi pengzhiwei2018,
   
   Are you suggesting to add an extra column which is a generated key?
   Is this a workaround, or is this how it should be?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vburenin commented on pull request #2619: [HUDI-1650] Custom avro kafka deserializer.

2021-03-15 Thread GitBox


vburenin commented on pull request #2619:
URL: https://github.com/apache/hudi/pull/2619#issuecomment-799525731


   @nsivabalan I am very strapped for time; I will be able to get back to it only 
next quarter. The overall change is trivial, so if you could continue, it would be 
great. As soon as this one is done, I will publish another PR for 
SchemaRegistryProvider.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-03-15 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-799512747


   no problem
   
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pengzhiwei2018 commented on issue #2656: HUDI insert operation is working same as upsert

2021-03-15 Thread GitBox


pengzhiwei2018 commented on issue #2656:
URL: https://github.com/apache/hudi/issues/2656#issuecomment-799511860


   Hi @shivabansal1046, currently Hudi drives the insert through the row key you 
have specified. You can use a `UuidKeyGenKeyGenerator` to generate the row key 
and work around this.
   
   First define a `UuidKeyGenKeyGenerator` class:
   
   // imports assume Hudi 0.8.x package locations
   import java.util.UUID
   import org.apache.avro.generic.GenericRecord
   import org.apache.hudi.common.config.TypedProperties
   import org.apache.hudi.keygen.ComplexKeyGenerator
   
   class UuidKeyGenKeyGenerator(props: TypedProperties) extends ComplexKeyGenerator(props) {
     // Assign a random UUID as the record key, so every insert is a new record.
     override def getRecordKey(record: GenericRecord): String = {
       UUID.randomUUID().toString
     }
   }
   
   Then configure `DataSourceWriteOptions#KEYGENERATOR_CLASS_OPT_KEY` to 
`classOf[UuidKeyGenKeyGenerator]`.
   You can have a try, hope it can help you~
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-03-15 Thread GitBox


nsivabalan commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-799511127


   hey folks, may I know what's the status of this PR? I see this could benefit 
others in the community as well. Do you think we can take it across the finish 
line by this weekend, so that we have it for the upcoming release? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2619: [HUDI-1650] Custom avro kafka deserializer.

2021-03-15 Thread GitBox


nsivabalan commented on pull request #2619:
URL: https://github.com/apache/hudi/pull/2619#issuecomment-799507450


   @vburenin: Did you get a chance to work on this PR? We would like to have 
this in before our next release. If you are strapped for time, let me know and I 
will try to squeeze in some time this week. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch asf-site updated: [HUDI-1563] Adding hudi file sizing/ small file management blog (#2612)

2021-03-15 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 601f54f  [HUDI-1563] Adding hudi file sizing/ small file management 
blog (#2612)
601f54f is described below

commit 601f54f1ea215281ede51125872d5c2455077dba
Author: Sivabalan Narayanan 
AuthorDate: Mon Mar 15 11:18:57 2021 -0400

[HUDI-1563] Adding hudi file sizing/ small file management blog (#2612)


Co-authored-by: Vinoth Chandar 
---
 docs/_posts/2021-03-01-hudi-file-sizing.md |  85 +
 .../blog/hudi-file-sizing/adding_new_files.png | Bin 0 -> 44237 bytes
 .../bin_packing_existing_data_files.png| Bin 0 -> 23955 bytes
 .../blog/hudi-file-sizing/initial_layout.png   | Bin 0 -> 34742 bytes
 4 files changed, 85 insertions(+)

diff --git a/docs/_posts/2021-03-01-hudi-file-sizing.md 
b/docs/_posts/2021-03-01-hudi-file-sizing.md
new file mode 100644
index 000..c79ea80
--- /dev/null
+++ b/docs/_posts/2021-03-01-hudi-file-sizing.md
@@ -0,0 +1,85 @@
+---
+title: "Streaming Responsibly - How Apache Hudi maintains optimum sized files"
+excerpt: "Maintaining well-sized files can improve query performance 
significantly"
+author: shivnarayan
+category: blog
+---
+
+Apache Hudi is a data lake platform technology that provides several functionalities needed to build and manage data lakes.
+One such key feature is self-managing file sizing, so that users don't need to worry about manual table maintenance.
+Having a lot of small files makes it harder to achieve good query performance, because query engines have to open, read, and close files far too many times while planning and executing queries.
+But streaming data lake use-cases inherently ingest smaller volumes per write, which can result in lots of small files if no special handling is done.
+
+# During Write vs After Write
+
+A common approach is to write very small files and later stitch them together; this solves the system scalability problems posed by small files, but it can violate query SLAs by exposing those small files to queries in the meantime. In fact, you can easily do such stitching on a Hudi table by running a clustering operation, as detailed in a [previous blog](/blog/hudi-clustering-intro/).
+
+In this blog, we discuss the file sizing optimizations Hudi performs during the initial write, so that data does not have to be effectively rewritten again just for file sizing. If you want both (a) self-managed file sizing and (b) no small files exposed to queries, the automatic file sizing feature saves the day.
+
+Hudi has the ability to maintain a configured target file size when performing insert/upsert operations. (Note: the bulk_insert operation does not provide this functionality; it is designed as a simpler replacement for plain `spark.write.parquet`.)
+
+## Configs
+
+For illustration purposes, we will consider only a COPY_ON_WRITE table.
+
+Configs of interest before we dive into the algorithm:
+
+- [Max file size](/docs/configurations.html#limitFileSize): Maximum size for a given data file. Hudi will try to maintain file sizes at this configured value.
+- [Soft file limit](/docs/configurations.html#compactionSmallFileSize): Maximum file size below which a given data file is considered a small file.
+- [Insert split size](/docs/configurations.html#insertSplitSize): Number of inserts grouped into a single partition. This value should match the number of records that fit in a single file (which you can determine from the max file size and the average record size).
+
+For instance, if the max file size is 120MB and the soft file limit is set to 100MB, any file whose size is < 100MB would be considered a small file.
+
+If you wish to turn off this feature, set the soft file limit config value to 0.
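+
+For a rough illustration, a minimal Spark write that sets these configs could look like the sketch below (the property keys are assumed to correspond to the config links above as of Hudi 0.7; the table name, path, and values are placeholders):
+
+```scala
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.config.HoodieWriteConfig.TABLE_NAME
+
+df.write.format("org.apache.hudi").
+  option(OPERATION_OPT_KEY, "insert").
+  option("hoodie.parquet.max.file.size", String.valueOf(120 * 1024 * 1024)).    // max file size: 120MB
+  option("hoodie.parquet.small.file.limit", String.valueOf(100 * 1024 * 1024)). // soft file limit: 100MB
+  option("hoodie.copyonwrite.insert.split.size", "500000").                     // insert split size
+  option(TABLE_NAME, "file_sizing_demo").
+  mode("append").
+  save("/tmp/hudi/file_sizing_demo")
+```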
+
+## Example
+
+Let’s say this is the layout of data files for a given partition.
+
+![Initial layout](/assets/images/blog/hudi-file-sizing/initial_layout.png)
+_Figure: Initial data file sizes for a given partition of interest_
+
+Let’s assume the configured values for max file size and soft file limit are 120MB and 100MB. File_1’s current size is 40MB, File_2’s size is 80MB, File_3’s size is 90MB, File_4’s size is 130MB and File_5’s size is 105MB. Let’s see what happens when a new write comes in.
+
+**Step 1:** Assign updates to files. In this step, we look up the index to find the tagged locations, and records are assigned to their respective files. Note that we assume updates will only increase a file's size, which simply results in a bigger file. When updates lower the file size (by, say, nulling out a lot of fields), a subsequent write will deem it a small file.
+
+**Step 2:** Determine small files for each partition path. The soft file limit config value is leveraged here to identify the small files.

[GitHub] [hudi] nsivabalan merged pull request #2612: [HUDI-1563] Adding hudi file sizing/ small file management blog

2021-03-15 Thread GitBox


nsivabalan merged pull request #2612:
URL: https://github.com/apache/hudi/pull/2612


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1688) hudi write should uncache rdd, when the write operation is finished

2021-03-15 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1688:
--
Labels: pull-request-available sev:critical user-support-issues  (was: 
pull-request-available)

> hudi write should uncache rdd, when the write operation is finished
> 
>
> Key: HUDI-1688
> URL: https://issues.apache.org/jira/browse/HUDI-1688
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.7.0
>Reporter: tao meng
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.8.0
>
>
> Currently, hudi improves write performance by caching the necessary RDDs; however, 
> when the write operation is finished, those cached RDDs are not uncached, which 
> wastes lots of memory.
> [https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L115]
> https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L214
> In our environment:
> step1: insert 100GB of data into a hudi table by spark   (ok)
> step2: insert another 100GB of data into the hudi table by spark again (OOM) 
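> A minimal sketch of the idea (the persist/unpersist calls are real Spark API; 
> the surrounding names are illustrative stand-ins, not the actual patch):
> {code:scala}
> import org.apache.spark.storage.StorageLevel
> 
> val cached = inputRecords.persist(StorageLevel.MEMORY_AND_DISK_SER)
> try {
>   runCommitAction(cached)  // hypothetical stand-in for the write path that needs the cache
> } finally {
>   cached.unpersist()       // release executor memory before the next write
> }
> {code}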



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan edited a comment on pull request #2673: [HUDI-1688] Uncache Rdd once write operation is complete

2021-03-15 Thread GitBox


nsivabalan edited a comment on pull request #2673:
URL: https://github.com/apache/hudi/pull/2673#issuecomment-799497945


   @xiarixiaoyao : thanks for your contribution. Were you able to test the 
fix in your env and confirm that subsequent writes don't incur OOMs?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2673: [HUDI-1688] Uncache Rdd once write operation is complete

2021-03-15 Thread GitBox


nsivabalan commented on pull request #2673:
URL: https://github.com/apache/hudi/pull/2673#issuecomment-799497945


   @xiarixiaoyao : Were you able to test the fix in your env and confirm that 
subsequent writes don't incur OOMs?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-03-15 Thread GitBox


nsivabalan commented on pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#issuecomment-799494981


   @liujinhui1994 : Thanks for the contribution. There are 2 to 3 PRs with a 
similar goal. Did you happen to check out the existing ones before putting this 
up? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] shivabansal1046 commented on issue #2656: HUDI insert operation is working same as upsert

2021-03-15 Thread GitBox


shivabansal1046 commented on issue #2656:
URL: https://github.com/apache/hudi/issues/2656#issuecomment-799469658


   Hi,
   
   Below are the configs I am using:
   
   ```scala
   df.write
     .format("org.apache.hudi")
     .options(getQuickstartWriteConfigs)
     .option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ")
     .option(OPERATION_OPT_KEY, "INSERT")
     .option(PRECOMBINE_FIELD_OPT_KEY, "last_update_time")
     .option(RECORDKEY_FIELD_OPT_KEY, "id")
     .option(PARTITIONPATH_FIELD_OPT_KEY, "creation_date")
     .option(TABLE_NAME, "my_hudi_table")
     .mode(SaveMode.Append)
     .save(args(1))
   ```
   
   And to your other question: I already have the record in HUDI, and during 
another run it is overwriting the record with a record having the same key. 
With the insert option I expect it to simply insert the new record without 
checking whether a record with the same key is present.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] wosow opened a new issue #2676: [SUPPORT] When I used 100,000 records to update 100 million records, the program got stuck

2021-03-15 Thread GitBox


wosow opened a new issue #2676:
URL: https://github.com/apache/hudi/issues/2676


   **Environment Description**
   
   * Hudi version : 0.7.0/0.6.0
   
   * Spark version : 2.4.4 
   
   * Hive version :2.3.1
   
   * Hadoop version : 2.7.5
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   When I used 100,000 records to update 100 million records, the program got 
stuck and could not make further progress. The table type used was MOR. The 
program execution diagram is as follows:
   
![image](https://user-images.githubusercontent.com/34565079/67633-48772800-85dc-11eb-9072-1f4f7a3a2c54.png)
   
   Hudi parameters are as follows:
   
   ```scala
   TABLE_TYPE_OPT_KEY -> MOR_TABLE_TYPE_OPT_VAL,
   // OPERATION_OPT_KEY -> WriteOperationType.UPSERT.value,
   OPERATION_OPT_KEY -> "upsert",
   RECORDKEY_FIELD_OPT_KEY -> pkCol,
   PRECOMBINE_FIELD_OPT_KEY -> preCombineCol,
   "hoodie.embed.timeline.server" -> "false",
   "hoodie.cleaner.commits.retained" -> "1",
   "hoodie.cleaner.fileversions.retained" -> "1",
   "hoodie.cleaner.policy" -> HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS.name(),
   "hoodie.keep.min.commits" -> "3",
   "hoodie.keep.max.commits" -> "4",
   "hoodie.compact.inline" -> "true",
   "hoodie.compact.inline.max.delta.commits" -> "1",
   // "hoodie.copyonwrite.record.size.estimate" -> String.valueOf(500),
   PARTITIONPATH_FIELD_OPT_KEY -> "dt",
   HIVE_PARTITION_FIELDS_OPT_KEY -> "dt",
   HIVE_URL_OPT_KEY -> "jdbc:hive2:/0.0.0.0:1",
   HIVE_USER_OPT_KEY -> "",
   HIVE_PASS_OPT_KEY -> "",
   HIVE_DATABASE_OPT_KEY -> hiveDatabaseName,
   HIVE_TABLE_OPT_KEY -> hiveTableName,
   HIVE_SYNC_ENABLED_OPT_KEY -> "true",
   HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH -> "true",
   HoodieWriteConfig.TABLE_NAME -> hiveTableName,
   HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
   HoodieIndexConfig.INDEX_TYPE_PROP -> HoodieIndex.IndexType.GLOBAL_BLOOM.name(),
   "hoodie.insert.shuffle.parallelism" -> parallelism,
   "hoodie.upsert.shuffle.parallelism" -> parallelism
   ```
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (HUDI-1692) Bounded source for stream writer

2021-03-15 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-1692.
--
Resolution: Done

76bf2cc790edc0be10dfa2454c42687c38e7e5fc

> Bounded source for stream writer
> 
>
> Key: HUDI-1692
> URL: https://issues.apache.org/jira/browse/HUDI-1692
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Supports bounded source such as VALUES for stream mode writer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (fc6c5f4 -> 76bf2cc)

2021-03-15 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from fc6c5f4  [HUDI-1684] Tweak hudi-flink-bundle module pom and reorganize 
the packages for hudi-flink module (#2669)
 add 76bf2cc  [HUDI-1692] Bounded source for stream writer (#2674)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/sink/StreamWriteFunction.java  | 18 +++--
 .../hudi/sink/StreamWriteOperatorCoordinator.java  | 26 +++
 .../hudi/sink/StreamWriteOperatorFactory.java  | 13 +---
 .../hudi/sink/event/BatchWriteSuccessEvent.java| 79 +++---
 .../org/apache/hudi/table/HoodieTableFactory.java  |  2 +-
 .../org/apache/hudi/table/HoodieTableSink.java |  6 +-
 .../sink/TestStreamWriteOperatorCoordinator.java   | 30 ++--
 .../sink/utils/StreamWriteFunctionWrapper.java |  2 +-
 .../apache/hudi/table/HoodieDataSourceITCase.java  | 21 --
 9 files changed, 135 insertions(+), 62 deletions(-)



[GitHub] [hudi] yanghua merged pull request #2674: [HUDI-1692] Bounded source for stream writer

2021-03-15 Thread GitBox


yanghua merged pull request #2674:
URL: https://github.com/apache/hudi/pull/2674


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2650: [HUDI-1694] Preparation for Avro update

2021-03-15 Thread GitBox


codecov-io commented on pull request #2650:
URL: https://github.com/apache/hudi/pull/2650#issuecomment-799292031


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2650?src=pr&el=h1) Report
   > Merging 
[#2650](https://codecov.io/gh/apache/hudi/pull/2650?src=pr&el=desc) (1a5cb70) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/899ae70fdb70c1511c099a64230fd91b2fe8d4ee?el=desc)
 (899ae70) will **increase** coverage by `0.39%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2650/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2650?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2650  +/-   ##
   
   + Coverage 51.58%   51.97%   +0.39% 
   - Complexity 3285 3579 +294 
   
 Files   446  466  +20 
 Lines 2040922275+1866 
 Branches   2116 2374 +258 
   
   + Hits  1052811578+1050 
   - Misses 9003 9689 +686 
   - Partials878 1008 +130 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.01% <ø> (+0.14%)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.46% <0.00%> (+0.04%)` | `0.00 <0.00> (ø)` | |
   | hudiflink | `53.57% <ø> (+2.28%)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (+0.28%)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `69.84% <ø> (+0.17%)` | `0.00 <ø> (ø)` | |
   | hudisync | `49.62% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.48% <ø> (+0.04%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2650?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/hudi/common/table/TableSchemaResolver.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL1RhYmxlU2NoZW1hUmVzb2x2ZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `68.31% <0.00%> (-2.46%)` | `43.00% <0.00%> (-1.00%)` | |
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `82.66% <0.00%> (-2.27%)` | `59.00% <0.00%> (ø%)` | |
   | 
[...che/hudi/common/table/log/HoodieLogFileReader.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGaWxlUmVhZGVyLmphdmE=)
 | `66.09% <0.00%> (-1.77%)` | `23.00% <0.00%> (+1.00%)` | :arrow_down: |
   | 
[.../hadoop/utils/HoodieRealtimeRecordReaderUtils.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3V0aWxzL0hvb2RpZVJlYWx0aW1lUmVjb3JkUmVhZGVyVXRpbHMuamF2YQ==)
 | `71.79% <0.00%> (-1.25%)` | `30.00% <0.00%> (ø%)` | |
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.34% <0.00%> (-0.37%)` | `53.00% <0.00%> (+1.00%)` | :arrow_down: |
   | 
[...ies/sources/helpers/DatePartitionPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9EYXRlUGFydGl0aW9uUGF0aFNlbGVjdG9yLmphdmE=)
 | `54.68% <0.00%> (-0.16%)` | `13.00% <0.00%> (ø%)` | |
   | 
[...src/main/java/org/apache/hudi/sink/CommitSink.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL0NvbW1pdFNpbmsuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../org/apache/hudi/util/RowDataToAvroConverters.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS91dGlsL1Jvd0RhdGFUb0F2cm9Db252ZXJ0ZXJzLmphdmE=)
 | `42.05% <0.00%> (ø)` | `8.00

[GitHub] [hudi] codecov-io edited a comment on pull request #2674: [HUDI-1692] Bounded source for stream writer

2021-03-15 Thread GitBox


codecov-io edited a comment on pull request #2674:
URL: https://github.com/apache/hudi/pull/2674#issuecomment-799291092


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=h1) Report
   > Merging 
[#2674](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=desc) (8419357) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/fc6c5f4285098d18cd7f6e81785f59e68a3b6862?el=desc)
 (fc6c5f4) will **increase** coverage by `0.06%`.
   > The diff coverage is `83.33%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2674/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2674  +/-   ##
   
   + Coverage 51.96%   52.03%   +0.06% 
   - Complexity 3579 3580   +1 
   
 Files   466  466  
 Lines 2227522294  +19 
 Branches   2374 2374  
   
   + Hits  1157611601  +25 
   + Misses 9690 9685   -5 
   + Partials   1009 1008   -1 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.46% <ø> (+0.01%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `53.96% <83.33%> (+0.39%)` | `0.00 <4.00> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `69.84% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `49.62% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...g/apache/hudi/sink/StreamWriteOperatorFactory.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JGYWN0b3J5LmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ache/hudi/sink/StreamWriteOperatorCoordinator.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JDb29yZGluYXRvci5qYXZh)
 | `69.37% <20.00%> (+0.23%)` | `32.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==)
 | `14.28% <50.00%> (-2.39%)` | `2.00 <1.00> (ø)` | |
   | 
[...java/org/apache/hudi/sink/StreamWriteFunction.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlRnVuY3Rpb24uamF2YQ==)
 | `85.04% <90.90%> (+1.04%)` | `22.00 <0.00> (ø)` | |
   | 
[...apache/hudi/sink/event/BatchWriteSuccessEvent.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2V2ZW50L0JhdGNoV3JpdGVTdWNjZXNzRXZlbnQuamF2YQ==)
 | `92.30% <100.00%> (+6.59%)` | `9.00 <3.00> (+1.00)` | |
   | 
[...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==)
 | `76.92% <100.00%> (ø)` | `5.00 <0.00> (ø)` | |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `79.68% <0.00%> (+1.56%)` | `26.00% <0.00%> (ø%)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2674: [HUDI-1692] Bounded source for stream writer

2021-03-15 Thread GitBox


codecov-io commented on pull request #2674:
URL: https://github.com/apache/hudi/pull/2674#issuecomment-799291092


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=h1) Report
   > Merging 
[#2674](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=desc) (8419357) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/fc6c5f4285098d18cd7f6e81785f59e68a3b6862?el=desc)
 (fc6c5f4) will **decrease** coverage by `0.07%`.
   > The diff coverage is `83.33%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2674/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2674  +/-   ##
   
   - Coverage 51.96%   51.89%   -0.08% 
   + Complexity 3579 3390 -189 
   
 Files   466  445  -21 
 Lines 2227520783-1492 
 Branches   2374 2229 -145 
   
   - Hits  1157610785 -791 
   + Misses 9690 9065 -625 
   + Partials   1009  933  -76 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.46% <ø> (+0.01%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `53.96% <83.33%> (+0.39%)` | `0.00 <4.00> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `69.84% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...g/apache/hudi/sink/StreamWriteOperatorFactory.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JGYWN0b3J5LmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ache/hudi/sink/StreamWriteOperatorCoordinator.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JDb29yZGluYXRvci5qYXZh)
 | `69.37% <20.00%> (+0.23%)` | `32.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==)
 | `14.28% <50.00%> (-2.39%)` | `2.00 <1.00> (ø)` | |
   | 
[...java/org/apache/hudi/sink/StreamWriteFunction.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlRnVuY3Rpb24uamF2YQ==)
 | `85.04% <90.90%> (+1.04%)` | `22.00 <0.00> (ø)` | |
   | 
[...apache/hudi/sink/event/BatchWriteSuccessEvent.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2V2ZW50L0JhdGNoV3JpdGVTdWNjZXNzRXZlbnQuamF2YQ==)
 | `92.30% <100.00%> (+6.59%)` | `9.00 <3.00> (+1.00)` | |
   | 
[...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==)
 | `76.92% <100.00%> (ø)` | `5.00 <0.00> (ø)` | |
   | 
[.../org/apache/hudi/hive/HoodieHiveSyncException.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZVN5bmNFeGNlcHRpb24uamF2YQ==)
 | | | |
   | 
[...java/org/apache/hudi/hive/util/HiveSchemaUtil.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9IaXZlU2NoZW1hVXRpbC5qYXZh)
 | | | |
   | 
[...src/main/java/org/apache/hudi/dla/DLASyncTool.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL0RMQVN5bmNUb29sLmphdmE=)
 | | | |
   | 
[...main/java/org/apache/hudi/dla/HoodieDLAClient.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL0hvb2RpZURMQUNsaWVudC5qYXZh)
 | | | |
   | ... and [18 
more](https://codecov.io/gh/apache/

[jira] [Created] (HUDI-1694) Preparation for Avro update

2021-03-15 Thread Sebastian Bernauer (Jira)
Sebastian Bernauer created HUDI-1694:


 Summary: Preparation for Avro update
 Key: HUDI-1694
 URL: https://issues.apache.org/jira/browse/HUDI-1694
 Project: Apache Hudi
  Issue Type: Task
  Components: Code Cleanup
Reporter: Sebastian Bernauer


We need to upgrade to at least Avro 1.9.x in production, so I tried upgrading 
the Avro version in the pom.xml of Hudi. Doing so, I noticed some problems:

Upgrade to Avro 1.9.2:
 * Renamed method defaultValue to defaultVal (see the sketch below)
 * Moved NullNode.getInstance() to JsonProperties.NULL_VALUE
 * Avro complains about invalid schemas/default values in 
hudi-common/src/main/avro/
 * The shaded guava libs from Avro have been removed
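
A sketch of the first two renames (illustrative only; the schema and field name are placeholders):
{code:scala}
import org.apache.avro.{JsonProperties, Schema}

// Given some parsed Avro schema (schemaJson is a placeholder string in scope):
val schema: Schema = new Schema.Parser().parse(schemaJson)

// Avro 1.8.x: field.defaultValue() returned a Jackson JsonNode,
//             with null defaults represented as NullNode.getInstance().
// Avro 1.9.x: field.defaultVal() returns a plain Object,
//             with null defaults represented as JsonProperties.NULL_VALUE.
val field = schema.getField("someField")   // placeholder field name
val isNullDefault = field.defaultVal() == JsonProperties.NULL_VALUE
{code}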

Upgrade to Avro 1.10.1:
 * Some more stuff (not handled in this PR)

Spark 3.2.0 (we currently use 3.1.1) will contain Avro 1.10.1 
(https://issues.apache.org/jira/browse/SPARK-27733).
 In order to reduce the effort of switching to a newer Avro version in the future, 
I provided a patch that fixes the above-mentioned issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] aditiwari01 opened a new issue #2675: [SUPPORT] Unable to query MOR table after schema evolution

2021-03-15 Thread GitBox


aditiwari01 opened a new issue #2675:
URL: https://github.com/apache/hudi/issues/2675


   As per the HUDI confluence FAQ 
(https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-What'sHudi'sschemaevolutionstory),
   as long as the schema is backward compatible, hudi supports seamless 
read/writes.
   
   However, when I try to add a new column to my MOR table, I can keep writing 
successfully, but I can only read in a read_optimized manner and not in a 
snapshot manner. 
   
   The snapshot query fails with **org.apache.avro.AvroTypeException: missing 
required field newCol**.
   
   Attaching sample spark-shell commands to reproduce the issue on dummy data:
   
[Hudi_sample_commands.txt](https://github.com/apache/hudi/files/6139970/Hudi_sample_commands.txt)
   
   With some debugging, the issue seems to be in:
   
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java#L127
   
   When we try to deserialize the older payloads into the newer schema (with a 
nullable new column), it fails with the above error.
   
   I tried a workaround wherein, if (readerSchema != writerSchema), we read 
with the writerSchema and then convert the payload to the readerSchema. This 
approach is working fine for me in my POCs; roughly like the sketch below.
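   
   For clarity, a minimal sketch of that workaround (my own illustration, not 
the actual Hudi code path):
   
   ```scala
   import org.apache.avro.Schema
   import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
   import org.apache.avro.io.DecoderFactory
   
   // Passing both schemas makes Avro perform schema resolution, so fields that
   // exist only in the reader schema are filled from their declared defaults.
   def readWithEvolution(bytes: Array[Byte], writerSchema: Schema, readerSchema: Schema): GenericRecord = {
     val reader = new GenericDatumReader[GenericRecord](writerSchema, readerSchema)
     val decoder = DecoderFactory.get.binaryDecoder(bytes, null)
     reader.read(null, decoder)
   }
   ```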
   
   However, since Hudi guarantees schema evolution, I would like to know if I'm 
missing some config or if this is a bug? And how does my workaround fit, in case 
it is a bug? We have a use case where we do not want to be constrained to 
backward-compatible schema changes, and we see MOR as a viable fit.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1692) Bounded source for stream writer

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1692:
-
Labels: pull-request-available  (was: )

> Bounded source for stream writer
> 
>
> Key: HUDI-1692
> URL: https://issues.apache.org/jira/browse/HUDI-1692
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Supports bounded source such as VALUES for stream mode writer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] danny0405 opened a new pull request #2674: [HUDI-1692] Bounded source for stream writer

2021-03-15 Thread GitBox


danny0405 opened a new pull request #2674:
URL: https://github.com/apache/hudi/pull/2674


   Supports bounded source such as VALUES for stream mode writer.
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] sbernauer commented on a change in pull request #2650: Preparation for Avro update

2021-03-15 Thread GitBox


sbernauer commented on a change in pull request #2650:
URL: https://github.com/apache/hudi/pull/2650#discussion_r594155813



##
File path: 
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/index/simple/FlinkHoodieSimpleIndex.java
##
@@ -135,8 +133,8 @@ public boolean isImplicitWithStorage() {
 context.map(latestBaseFiles, partitionPathBaseFile -> new 
HoodieKeyLocationFetchHandle<>(config, hoodieTable, partitionPathBaseFile), 
parallelism);
 Map recordLocations = new HashMap<>();
 hoodieKeyLocationFetchHandles.stream()
-.flatMap(handle -> Lists.newArrayList(handle.locations()).stream())
-.forEach(x -> x.forEach(y -> recordLocations.put(y.getKey(), 
y.getRight(;
+.flatMap(handle -> handle.locations())

Review comment:
   This was changed because of the removal of the used guava libs ;)

##
File path: hudi-common/src/main/avro/HoodieRestoreMetadata.avsc
##
@@ -38,7 +38,6 @@
  /* overlaps with 'instantsToRollback' field. Adding this to track action 
type for all the instants being rolled back. */
  {
"name": "restoreInstantInfo",
-   "default": null,

Review comment:
   The default value of null doesn't match a field with type array. 
Instead of removing the default value of `null`, I changed it to an empty 
array `[]`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1693) Add document about HUDI Flink integration

2021-03-15 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-1693:
-
Summary: Add document about HUDI Flink integration  (was: Bounded source 
for stream writer)

> Add document about HUDI Flink integration
> -
>
> Key: HUDI-1693
> URL: https://issues.apache.org/jira/browse/HUDI-1693
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.8.0
>
>
> Supports bounded source such as VALUES for stream mode writer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1693) Bounded source for stream writer

2021-03-15 Thread Danny Chen (Jira)
Danny Chen created HUDI-1693:


 Summary: Bounded source for stream writer
 Key: HUDI-1693
 URL: https://issues.apache.org/jira/browse/HUDI-1693
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.8.0


Supports bounded source such as VALUES for stream mode writer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1692) Bounded source for stream writer

2021-03-15 Thread Danny Chen (Jira)
Danny Chen created HUDI-1692:


 Summary: Bounded source for stream writer
 Key: HUDI-1692
 URL: https://issues.apache.org/jira/browse/HUDI-1692
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.8.0


Supports bounded source such as VALUES for stream mode writer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #2673: [HUDI-1688]hudi write should uncache rdd, when the write operation is finished

2021-03-15 Thread GitBox


xiarixiaoyao edited a comment on pull request #2673:
URL: https://github.com/apache/hudi/pull/2673#issuecomment-799059684


   cc @garyli1019 @nsivabalan, could you help review this PR? Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] Sugamber commented on issue #2637: [SUPPORT] - Partial Update : update few columns of a table

2021-03-15 Thread GitBox


Sugamber commented on issue #2637:
URL: https://github.com/apache/hudi/issues/2637#issuecomment-799211234


   @nsivabalan  Yes, 
   I have added the jar file to both the driver and executor classpaths.
   `spark-submit --jars 
/path/lib/orders-poc-1.0.41-SNAPSHOT-shaded.jar,/path/hudi-support-jars/org.apache.avro_avro-1.8.2.jar,/path/hudi-support-jars/spark-avro_2.11-2.4.4.jar,/path/hudi-support-jars/hudi-spark-bundle_2.11-0.7.0.jar
 --master yarn --deploy-mode cluster --num-executors 2 --executor-cores 4 
--executor-memory 8g --driver-memory=8g --queue=default --conf 
spark.serializer=org.apache.spark.serializer.KryoSerializer --conf 
spark.driver.extraClassPath=org.apache.avro_avro-1.8.2.jar:spark-avro_2.11-2.4.4.jar:hudi-spark-bundle_2.11-0.7.0.jar:/path/lib/orders-poc-1.0.41-SNAPSHOT-shaded.jar
 --conf 
spark.executor.extraClassPath=org.apache.avro_avro-1.8.2.jar:spark-avro_2.11-2.4.4.jar:hudi-spark-bundle_2.11-0.7.0.jar:/path/lib/orders-poc-1.0.41-SNAPSHOT-shaded.jar
 --files /path/hive-site.xml,/path/resources/hudiConf.conf --class 
com.app.workflows.RecordPartialUpdate 
lib/orders-poc-1.0.41-SNAPSHOT-shaded.jar/`
   
   I'm able to find the class name in the jar using a Linux command:
   `find /path/orders-poc-1.0.41-SNAPSHOT-shaded.jar|xargs grep 
CustomRecordUpdate
   Binary file /path/orders-poc-1.0.41-SNAPSHOT-shaded.jar matches`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (HUDI-1684) Tweak hudi-flink-bundle module pom and re-organize the packages for hudi-flink module

2021-03-15 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-1684.
--
Resolution: Done

fc6c5f4285098d18cd7f6e81785f59e68a3b6862

> Tweak hudi-flink-bundle module pom and re-organize the packages for 
> hudi-flink module
> -
>
> Key: HUDI-1684
> URL: https://issues.apache.org/jira/browse/HUDI-1684
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> - Add required dependencies for hudi-flink-bundle module
> - Some package reorganization of the hudi-flink module



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1684) Tweak hudi-flink-bundle module pom and re-organize the packages for hudi-flink module

2021-03-15 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-1684:
--

Assignee: Danny Chen

> Tweak hudi-flink-bundle module pom and re-organize the packages for 
> hudi-flink module
> -
>
> Key: HUDI-1684
> URL: https://issues.apache.org/jira/browse/HUDI-1684
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> - Add required dependencies for hudi-flink-bundle module
> - Some package reorganization of the hudi-flink module



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (e93c6a5 -> fc6c5f4)

2021-03-15 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from e93c6a5  [HUDI-1496] Fixing input stream detection of GCS FileSystem 
(#2500)
 add fc6c5f4  [HUDI-1684] Tweak hudi-flink-bundle module pom and reorganize 
the packages for hudi-flink module (#2669)

No new revisions were added by this update.

Summary of changes:
 .../{operator => configuration}/FlinkOptions.java  |   6 +-
 .../hudi/schema/FilebasedSchemaProvider.java   |   2 +-
 .../main/java/org/apache/hudi/sink/CommitSink.java |   1 -
 .../InstantGenerateOperator.java   |   8 +-
 .../KeyedWriteProcessFunction.java |   6 +-
 .../KeyedWriteProcessOperator.java |   3 +-
 .../{operator => sink}/StreamWriteFunction.java|   5 +-
 .../{operator => sink}/StreamWriteOperator.java|   2 +-
 .../StreamWriteOperatorCoordinator.java|   5 +-
 .../StreamWriteOperatorFactory.java|   2 +-
 .../compact/CompactFunction.java   |   2 +-
 .../compact/CompactionCommitEvent.java |   2 +-
 .../compact/CompactionCommitSink.java  |   2 +-
 .../compact/CompactionPlanEvent.java   |   2 +-
 .../compact/CompactionPlanOperator.java|   2 +-
 .../event/BatchWriteSuccessEvent.java  |   6 +-
 .../partitioner/BucketAssignFunction.java  |   4 +-
 .../partitioner/BucketAssigner.java|   2 +-
 .../partitioner/BucketAssigners.java   |   4 +-
 .../partitioner/delta/DeltaBucketAssigner.java |   4 +-
 .../JsonStringToHoodieRecordMapFunction.java   |   2 +-
 .../transform/RowDataToHoodieFunction.java |   4 +-
 .../StreamReadMonitoringFunction.java  |  11 +-
 .../{operator => source}/StreamReadOperator.java   |   6 +-
 .../apache/hudi/streamer/HoodieFlinkStreamer.java  |  12 +-
 .../hudi/streamer/HoodieFlinkStreamerV2.java   |   8 +-
 .../{factory => table}/HoodieTableFactory.java |   6 +-
 .../hudi/{sink => table}/HoodieTableSink.java  |  20 +--
 .../hudi/{source => table}/HoodieTableSource.java  |  20 +--
 .../{source => table}/format/FilePathUtils.java|   4 +-
 .../hudi/{source => table}/format/FormatUtils.java |   6 +-
 .../format/cow/AbstractColumnReader.java   |   2 +-
 .../format/cow/CopyOnWriteInputFormat.java |   2 +-
 .../format/cow/Int64TimestampColumnReader.java |   2 +-
 .../format/cow/ParquetColumnarRowSplitReader.java  |   6 +-
 .../format/cow/ParquetDecimalVector.java   |   2 +-
 .../format/cow/ParquetSplitReaderUtil.java |   2 +-
 .../format/cow/RunLengthDecoder.java   |   2 +-
 .../{source => table}/format/mor/InstantRange.java |   2 +-
 .../format/mor/MergeOnReadInputFormat.java |  14 +-
 .../format/mor/MergeOnReadInputSplit.java  |   2 +-
 .../format/mor/MergeOnReadTableState.java  |   2 +-
 .../apache/hudi/util/RowDataToAvroConverters.java  |   9 +-
 .../java/org/apache/hudi/util/StreamerUtil.java|   2 +-
 .../org.apache.flink.table.factories.TableFactory  |   2 +-
 .../hudi/{operator => sink}/StreamWriteITCase.java |  22 +--
 .../TestStreamWriteOperatorCoordinator.java}   |   8 +-
 .../{operator => sink}/TestWriteCopyOnWrite.java   |  11 +-
 .../{operator => sink}/TestWriteMergeOnRead.java   |   4 +-
 .../TestWriteMergeOnReadWithCompact.java   |   3 +-
 .../partitioner/TestBucketAssigner.java|   4 +-
 .../TestJsonStringToHoodieRecordMapFunction.java   |   2 +-
 .../utils/CompactFunctionWrapper.java  |  14 +-
 .../utils/MockFunctionInitializationContext.java   |   2 +-
 .../{operator => sink}/utils/MockMapState.java |   2 +-
 .../utils/MockOperatorStateStore.java  |   2 +-
 .../utils/MockStreamingRuntimeContext.java |   2 +-
 .../utils/StreamWriteFunctionWrapper.java  |  15 +-
 .../source/TestStreamReadMonitoringFunction.java   |   9 +-
 .../apache/hudi/source/TestStreamReadOperator.java |  16 +-
 .../{source => table}/HoodieDataSourceITCase.java  |  10 +-
 .../{factory => table}/TestHoodieTableFactory.java |   8 +-
 .../{source => table}/TestHoodieTableSource.java   |  10 +-
 .../{source => table}/format/TestInputFormat.java  |  10 +-
 .../{operator => }/utils/TestConfigurations.java   |   4 +-
 .../apache/hudi/{operator => }/utils/TestData.java |   5 +-
 .../test/java/org/apache/hudi/utils/TestUtils.java |   6 +-
 .../utils/factory/CollectSinkTableFactory.java |   2 +-
 .../utils/factory/ContinuousFileSourceFactory.java |   2 +-
 .../hudi/utils/source/ContinuousFileSource.java|   2 +-
 .../org.apache.flink.table.factories.TableFactory  |   2 +-
 packaging/hudi-flink-bundle/pom.xml| 163 -
 72 files changed, 357 insertions(+), 203 deletions(-)
 rename hudi-flink/src/main/java/org/apache/hudi/{operator => 
configuration}/Fli

[GitHub] [hudi] yanghua merged pull request #2669: [HUDI-1684] Tweak hudi-flink-bundle module pom and reorganize the pa…

2021-03-15 Thread GitBox


yanghua merged pull request #2669:
URL: https://github.com/apache/hudi/pull/2669


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] shenbinglife closed issue #2652: [SUPPORT] I have some questions for hudi clustering

2021-03-15 Thread GitBox


shenbinglife closed issue #2652:
URL: https://github.com/apache/hudi/issues/2652


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] shenbinglife commented on issue #2652: [SUPPORT] I have some questions for hudi clustering

2021-03-15 Thread GitBox


shenbinglife commented on issue #2652:
URL: https://github.com/apache/hudi/issues/2652#issuecomment-799201945


   Thanks a lot



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-03-15 Thread GitBox


liujinhui1994 commented on pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#issuecomment-799193666


   @n3nash @satishkotha @yanghua  



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-03-15 Thread GitBox


liujinhui1994 commented on pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#issuecomment-799193044


   The review comments from PR https://github.com/apache/hudi/pull/1929 have 
been addressed here; please review.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-03-15 Thread GitBox


n3nash commented on pull request #2374:
URL: https://github.com/apache/hudi/pull/2374#issuecomment-799186819


   @vinothchandar Build succeeds locally and should pass on jenkins (will check 
tomorrow morning), ready for review. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2673: [HUDI-1688]hudi write should uncache rdd, when the write operation is finished

2021-03-15 Thread GitBox


codecov-io edited a comment on pull request #2673:
URL: https://github.com/apache/hudi/pull/2673#issuecomment-799061373


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=h1) Report
   > Merging 
[#2673](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=desc) (2ecca39) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/e93c6a569310ce55c5a0fc0655328e7fd32a9da2?el=desc)
 (e93c6a5) will **increase** coverage by `0.01%`.
   > The diff coverage is `85.71%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2673/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2673  +/-   ##
   
   + Coverage 51.99%   52.00%   +0.01% 
 Complexity 3580 3580  
   
 Files   466  466  
 Lines 2227522282   +7 
 Branches   2374 2375   +1 
   
   + Hits  1158111587   +6 
   - Misses 9686 9687   +1 
 Partials   1008 1008  
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiflink | `53.57% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `69.91% <85.71%> (+0.07%)` | `0.00 <0.00> (ø)` | |
   | hudisync | `49.62% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh)
 | `52.15% <85.71%> (+0.79%)` | `0.00 <0.00> (ø)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2673: [HUDI-1688]hudi write should uncache rdd, when the write operation is finished

2021-03-15 Thread GitBox


codecov-io edited a comment on pull request #2673:
URL: https://github.com/apache/hudi/pull/2673#issuecomment-799061373


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=h1) Report
   > Merging 
[#2673](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=desc) (2ecca39) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/e93c6a569310ce55c5a0fc0655328e7fd32a9da2?el=desc)
 (e93c6a5) will **decrease** coverage by `0.13%`.
   > The diff coverage is `85.71%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2673/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2673  +/-   ##
   
   - Coverage 51.99%   51.85%   -0.14% 
   + Complexity 3580 3390 -190 
   
 Files   466  445  -21 
 Lines 2227520771-1504 
 Branches   2374 2230 -144 
   
   - Hits  1158110771 -810 
   + Misses 9686 9067 -619 
   + Partials   1008  933  -75 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiflink | `53.57% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `69.91% <85.71%> (+0.07%)` | `0.00 <0.00> (ø)` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh)
 | `52.15% <85.71%> (+0.79%)` | `0.00 <0.00> (ø)` | |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | | | |
   | 
[.../apache/hudi/timeline/service/TimelineService.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvVGltZWxpbmVTZXJ2aWNlLmphdmE=)
 | | | |
   | 
[...va/org/apache/hudi/hive/util/ColumnNameXLator.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db2x1bW5OYW1lWExhdG9yLmphdmE=)
 | | | |
   | 
[...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZUNsaWVudC5qYXZh)
 | | | |
   | 
[.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=)
 | | | |
   | 
[...udi/timeline/service/handlers/TimelineHandler.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvVGltZWxpbmVIYW5kbGVyLmphdmE=)
 | | | |
   | 
[...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==)
 | | | |
   | 
[...c/main/java/org/apache/hudi/dla/DLASyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL0RMQVN5bmNDb25maWcuamF2YQ==)
 | | | |
   | 
[...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh)
 | | | |
   | ... and [12 
more](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree-more) | | | |
   



[GitHub] [hudi] liujinhui1994 commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

2021-03-15 Thread GitBox


liujinhui1994 commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-799175366


   @nsivabalan This PR has been updated based on the review comments; see
   https://github.com/apache/hudi/pull/2666
   Please review.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2669: [HUDI-1684] Tweak hudi-flink-bundle module pom and reorganize the pa…

2021-03-15 Thread GitBox


codecov-io edited a comment on pull request #2669:
URL: https://github.com/apache/hudi/pull/2669#issuecomment-797515929


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2669?src=pr&el=h1) Report
   > Merging 
[#2669](https://codecov.io/gh/apache/hudi/pull/2669?src=pr&el=desc) (50d4722) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/20786ab8a2a1e7735ab846e92802fb9f4449adc9?el=desc)
 (20786ab) will **decrease** coverage by `42.48%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2669/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2669?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff               @@
   ##             master    #2669       +/-   ##
   =============================================
   - Coverage     52.00%    9.52%    -42.49%
   + Complexity     3579       48      -3531
   =============================================
     Files           465       53       -412
     Lines         22268     1963     -20305
     Branches       2375      235      -2140
   =============================================
   - Hits          11581      187     -11394
   + Misses         9676     1763      -7913
   + Partials       1011       13       -998
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.52% <ø> (-60.02%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2669?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm

[jira] [Created] (HUDI-1691) Enrich HDFS data importer

2021-03-15 Thread vinoyang (Jira)
vinoyang created HUDI-1691:
--

 Summary: Enrich HDFS data importer
 Key: HUDI-1691
 URL: https://issues.apache.org/jira/browse/HUDI-1691
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Utilities
Reporter: vinoyang


Currently, Hudi has a utility class named {{HDFSParquetImporter}} that imports a 
Parquet dataset from HDFS as a Hudi dataset. The class exposes a {{format}} config 
option, but the option currently has no effect. We should enhance this importer, or 
introduce additional importers, to support multiple HDFS input formats.
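
For illustration, a minimal sketch of driving the importer programmatically. Class 
and field names follow the 0.7.x hudi-utilities sources and should be treated as 
assumptions (they may differ across versions); the paths and table names are 
placeholders:

{code:java}
// Minimal sketch, assuming the 0.7.x org.apache.hudi.utilities.HDFSParquetImporter
// API; field names may differ in other versions. The "format" option is accepted
// but currently has no effect, which is what this issue proposes to fix.
import org.apache.hudi.utilities.HDFSParquetImporter;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ParquetImportExample {
  public static void main(String[] args) {
    HDFSParquetImporter.Config cfg = new HDFSParquetImporter.Config();
    cfg.srcPath = "hdfs:///data/events_parquet";      // input Parquet dataset
    cfg.targetPath = "hdfs:///warehouse/events_hudi"; // target Hudi base path
    cfg.tableName = "events";
    cfg.tableType = "COPY_ON_WRITE";
    cfg.rowKey = "event_id";                          // record key field
    cfg.partitionKey = "event_date";                  // partition path field
    cfg.schemaFile = "hdfs:///schemas/events.avsc";   // Avro schema of the source
    cfg.format = "parquet";                           // the (currently unused) option
    cfg.parallelism = 100;

    JavaSparkContext jsc =
        new JavaSparkContext(new SparkConf().setAppName("hdfs-parquet-import"));
    int exitCode = new HDFSParquetImporter(cfg).dataImport(jsc, 0 /* retries */);
    jsc.stop();
    System.exit(exitCode);
  }
}
{code}

The same fields are reachable from the command line (e.g. {{--src-path}}, 
{{--format}}), which is where supporting formats beyond Parquet would surface 
for end users.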



--
This message was sent by Atlassian Jira
(v8.3.4#803005)