[GitHub] [incubator-hudi] codecov-io commented on issue #1543: [HUDI-821]:Fix the wrong annotation of JCommander IStringConverter

2020-04-20 Thread GitBox


codecov-io commented on issue #1543:
URL: https://github.com/apache/incubator-hudi/pull/1543#issuecomment-616990309


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1543?src=pr&el=h1) 
Report
   > Merging 
[#1543](https://codecov.io/gh/apache/incubator-hudi/pull/1543?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/ddd105bb3119174b613c6917ee25795f2939f430&el=desc)
 will **decrease** coverage by `0.69%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1543/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1543?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1543      +/-   ##
   ============================================
   - Coverage     72.35%   71.66%    -0.70%
     Complexity      294      294
   ============================================
     Files           374      378        +4
     Lines         16377    16535      +158
     Branches       1650     1672       +22
   ============================================
     Hits          11849    11849
   - Misses         3797     3954      +157
   - Partials        731      732        +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1543?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1543/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `92.30% <ø> (-0.12%)` | `0.00 <0.00> (ø)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1543/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `78.13% <ø> (ø)` | `11.00 <0.00> (ø)` | |
   | 
[...s/deltastreamer/HoodieMultiTableDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1543/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllTXVsdGlUYWJsZURlbHRhU3RyZWFtZXIuamF2YQ==)
 | `78.39% <ø> (ø)` | `18.00 <0.00> (ø)` | |
   | 
[...di/hadoop/realtime/HoodieRealtimeRecordReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1543/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVJlYWx0aW1lUmVjb3JkUmVhZGVyLmphdmE=)
 | `70.00% <0.00%> (-14.22%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...hudi/hadoop/hive/HoodieCombineHiveInputFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1543/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2hpdmUvSG9vZGllQ29tYmluZUhpdmVJbnB1dEZvcm1hdC5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../hadoop/realtime/RealtimeUnmergedRecordReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1543/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lVW5tZXJnZWRSZWNvcmRSZWFkZXIuamF2YQ==)
 | `96.96% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...apache/hudi/common/util/collection/ArrayUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1543/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9BcnJheVV0aWxzLmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 
[...di/hadoop/hive/HoodieCombineRealtimeFileSplit.java](https://codecov.io/gh/apache/incubator-hudi/pull/1543/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2hpdmUvSG9vZGllQ29tYmluZVJlYWx0aW1lRmlsZVNwbGl0LmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 
[...op/realtime/HoodieCombineRealtimeRecordReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1543/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZUNvbWJpbmVSZWFsdGltZVJlY29yZFJlYWRlci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 
[...di/hadoop/hive/HoodieCombineRealtimeHiveSplit.java](https://codecov.io/gh/apache/incubator-hudi/pull/1543/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2hpdmUvSG9vZGllQ29tYmluZVJlYWx0aW1lSGl2ZVNwbGl0LmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | ... and [1 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1543/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1543?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)

[jira] [Closed] (HUDI-789) Adjust logic of upsert in HDFSParquetImporter

2020-04-20 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-789.
-
Resolution: Fixed

Fixed via master branch: 84dd9047d3902650d7ff5bc95b9789d6880ca8e2

> Adjust logic of upsert in HDFSParquetImporter
> -
>
> Key: HUDI-789
> URL: https://issues.apache.org/jira/browse/HUDI-789
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In HDFSParquetImporter, upsert is currently equivalent to insert (it removes 
> the old metadata, then inserts). But upsert should mean updating and inserting 
> on top of the existing data. 
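
A condensed Java sketch of the merged fix (commit 84dd9047, whose full diff appears later in this digest): the target-path cleanup and table initialization are guarded so that an upsert reuses the existing table instead of wiping it.

```java
// Condensed from the merged HDFSParquetImporter patch (84dd9047).
boolean isUpsert = "upsert".equals(cfg.command.toLowerCase());
Path target = new Path(cfg.targetPath);
if (fs.exists(target) && !isUpsert) {
  fs.delete(target, true);  // plain insert: start from a clean slate
}
if (!fs.exists(target)) {
  // First write (or post-cleanup): initialize a fresh hoodie table.
  Properties properties = new Properties();
  properties.put(HoodieTableConfig.HOODIE_TABLE_NAME_PROP_NAME, cfg.tableName);
  properties.put(HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME, cfg.tableType);
  HoodieTableMetaClient.initTableAndGetMetaClient(jsc.hadoopConfiguration(), cfg.targetPath, properties);
}
```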



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-823) Typo in quick start guide

2020-04-20 Thread Lisheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Wang updated HUDI-823:
--
Status: In Progress  (was: Open)

> Typo in quick start guide
> -
>
> Key: HUDI-823
> URL: https://issues.apache.org/jira/browse/HUDI-823
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Docs, docs-chinese
>Affects Versions: 0.5.2
>Reporter: Lisheng Wang
>Assignee: Lisheng Wang
>Priority: Minor
>  Labels: documentation
>
> I found there is a typo in both the Chinese and English docs of the quick start 
> guide: "partition field ({{region/county/city}}) and combine logic ({{ts}} in 
> [schema|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58])
>  to ensure trip records are unique within each partition."
> It should be {{region/country/city}}.
> Following are the URLs with the typo:
> [https://hudi.apache.org/docs/quick-start-guide.html#insert-data]
> [https://hudi.apache.org/cn/docs/quick-start-guide.html#inserts]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-823) Typo in quick start guide

2020-04-20 Thread Lisheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Wang updated HUDI-823:
--
Status: Open  (was: New)

> Typo in quick start guide
> -
>
> Key: HUDI-823
> URL: https://issues.apache.org/jira/browse/HUDI-823
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Docs, docs-chinese
>Affects Versions: 0.5.2
>Reporter: Lisheng Wang
>Assignee: Lisheng Wang
>Priority: Minor
>  Labels: documentation
>
> I found there is a typo in both the Chinese and English docs of the quick start 
> guide: "partition field ({{region/county/city}}) and combine logic ({{ts}} in 
> [schema|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58])
>  to ensure trip records are unique within each partition."
> It should be {{region/country/city}}.
> Following are the URLs with the typo:
> [https://hudi.apache.org/docs/quick-start-guide.html#insert-data]
> [https://hudi.apache.org/cn/docs/quick-start-guide.html#inserts]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-789) Adjust logic of upsert in HDFSParquetImporter

2020-04-20 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-789:
--
Fix Version/s: 0.6.0

> Adjust logic of upsert in HDFSParquetImporter
> -
>
> Key: HUDI-789
> URL: https://issues.apache.org/jira/browse/HUDI-789
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In HDFSParquetImporter, upsert is currently equivalent to insert (it removes 
> the old metadata, then inserts). But upsert should mean updating and inserting 
> on top of the existing data. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-789) Adjust logic of upsert in HDFSParquetImporter

2020-04-20 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-789:
--
Status: Open  (was: New)

> Adjust logic of upsert in HDFSParquetImporter
> -
>
> Key: HUDI-789
> URL: https://issues.apache.org/jira/browse/HUDI-789
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In HDFSParquetImporter, upsert is currently equivalent to insert (it removes 
> the old metadata, then inserts). But upsert should mean updating and inserting 
> on top of the existing data. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-823) Typo in quick start guide

2020-04-20 Thread Lisheng Wang (Jira)
Lisheng Wang created HUDI-823:
-

 Summary: Typo in quick start guide
 Key: HUDI-823
 URL: https://issues.apache.org/jira/browse/HUDI-823
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Docs, docs-chinese
Affects Versions: 0.5.2
Reporter: Lisheng Wang
Assignee: Lisheng Wang


I found there is a typo in both the Chinese and English docs of the quick start guide: 

"partition field ({{region/county/city}}) and combine logic ({{ts}} in 
[schema|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58])
 to ensure trip records are unique within each partition."

It should be {{region/country/city}}.

Following are the URLs with the typo:

[https://hudi.apache.org/docs/quick-start-guide.html#insert-data]

[https://hudi.apache.org/cn/docs/quick-start-guide.html#inserts]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[incubator-hudi] branch master updated: [HUDI-789]Adjust logic of upsert in HDFSParquetImporter (#1511)

2020-04-20 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 84dd904  [HUDI-789]Adjust logic of upsert in HDFSParquetImporter 
(#1511)
84dd904 is described below

commit 84dd9047d3902650d7ff5bc95b9789d6880ca8e2
Author: hongdd 
AuthorDate: Tue Apr 21 14:21:30 2020 +0800

[HUDI-789]Adjust logic of upsert in HDFSParquetImporter (#1511)
---
 .../apache/hudi/utilities/HDFSParquetImporter.java |  22 +-
 .../hudi/utilities/TestHDFSParquetImporter.java| 255 -
 2 files changed, 217 insertions(+), 60 deletions(-)

diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HDFSParquetImporter.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HDFSParquetImporter.java
index f389c58..4befaec 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HDFSParquetImporter.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HDFSParquetImporter.java
@@ -100,6 +100,10 @@ public class HDFSParquetImporter implements Serializable {
 
   }
 
+  private boolean isUpsert() {
+    return "upsert".equals(cfg.command.toLowerCase());
+  }
+
   public int dataImport(JavaSparkContext jsc, int retry) {
     this.fs = FSUtils.getFs(cfg.targetPath, jsc.hadoopConfiguration());
     this.props = cfg.propsFilePath == null ? UtilHelpers.buildProperties(cfg.configs)
@@ -108,7 +112,7 @@ public class HDFSParquetImporter implements Serializable {
     int ret = -1;
     try {
       // Verify that targetPath is not present.
-      if (fs.exists(new Path(cfg.targetPath))) {
+      if (fs.exists(new Path(cfg.targetPath)) && !isUpsert()) {
         throw new HoodieIOException(String.format("Make sure %s is not present.", cfg.targetPath));
       }
       do {
@@ -122,20 +126,22 @@ public class HDFSParquetImporter implements Serializable {
 
   protected int dataImport(JavaSparkContext jsc) throws IOException {
     try {
-      if (fs.exists(new Path(cfg.targetPath))) {
+      if (fs.exists(new Path(cfg.targetPath)) && !isUpsert()) {
         // cleanup target directory.
         fs.delete(new Path(cfg.targetPath), true);
       }
 
+      if (!fs.exists(new Path(cfg.targetPath))) {
+        // Initialize target hoodie table.
+        Properties properties = new Properties();
+        properties.put(HoodieTableConfig.HOODIE_TABLE_NAME_PROP_NAME, cfg.tableName);
+        properties.put(HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME, cfg.tableType);
+        HoodieTableMetaClient.initTableAndGetMetaClient(jsc.hadoopConfiguration(), cfg.targetPath, properties);
+      }
+
       // Get schema.
       String schemaStr = UtilHelpers.parseSchema(fs, cfg.schemaFile);
 
-      // Initialize target hoodie table.
-      Properties properties = new Properties();
-      properties.put(HoodieTableConfig.HOODIE_TABLE_NAME_PROP_NAME, cfg.tableName);
-      properties.put(HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME, cfg.tableType);
-      HoodieTableMetaClient.initTableAndGetMetaClient(jsc.hadoopConfiguration(), cfg.targetPath, properties);
-
       HoodieWriteClient client =
           UtilHelpers.createHoodieClient(jsc, cfg.targetPath, schemaStr, cfg.parallelism, Option.empty(), props);
 
diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHDFSParquetImporter.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHDFSParquetImporter.java
index c94edf3..a4711b5 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHDFSParquetImporter.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHDFSParquetImporter.java
@@ -20,6 +20,7 @@ package org.apache.hudi.utilities;
 
 import org.apache.hudi.client.HoodieReadClient;
 import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.HoodieClientTestUtils;
 import org.apache.hudi.common.HoodieTestDataGenerator;
 import org.apache.hudi.common.minicluster.HdfsTestService;
 import org.apache.hudi.common.model.HoodieTestUtils;
@@ -37,8 +38,13 @@ import org.apache.parquet.avro.AvroParquetWriter;
 import org.apache.parquet.hadoop.ParquetWriter;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SQLContext;
+
+import org.junit.After;
 import org.junit.AfterClass;
+import org.junit.Before;
 import org.junit.BeforeClass;
 import org.junit.Test;
 
@@ -50,8 +56,10 @@ import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Map.Entry;
+import java.util.Objects;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
+import java.util.stream.Collectors;
 
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertTrue;
@@ -75,34 +83,43 @@ pu

[GitHub] [incubator-hudi] jenu9417 commented on issue #1528: [SUPPORT] Issue while writing to HDFS via hudi. Only `/.hoodie` folder is written.

2020-04-20 Thread GitBox


jenu9417 commented on issue #1528:
URL: https://github.com/apache/incubator-hudi/issues/1528#issuecomment-616968792


   @vinothchandar 
   Thanks for replying in detail.
   As you pointed out, premature termination of the job seems to be the problem. 
Since this was a POC dry run, I was using a timer to close the job after 
x seconds, which closed it before the write phase had finished.
   
   But now the question is why the write takes more than 40 seconds, even for as 
few as 10 records, where the average record size is less than a KB.
   
   ```
   91975 [dispatcher-event-loop-0] INFO  
org.apache.spark.scheduler.TaskSetManager  - Starting task 625.0 in stage 19.0 
(TID 11130, localhost, executor driver, partition 625, PROCESS_LOCAL, 7193 
bytes)
   91975 [Executor task launch worker for task 11130] INFO  
org.apache.spark.executor.Executor  - Running task 625.0 in stage 19.0 (TID 
11130)
   91975 [task-result-getter-0] INFO  org.apache.spark.scheduler.TaskSetManager 
 - Finished task 624.0 in stage 19.0 (TID 11129) in 16 ms on localhost 
(executor driver) (624/1500)
   ```
   From the logs, the above set of lines kept repeating many times: the stage 
number kept increasing and the same 1500 tasks were run again and again. I 
presume these 1500 are the partitions of the RDD? If so, is it 
possible/advisable to reduce the number of partitions in the RDD?
   
   Also, what would be the general suggestions to speed up the write here?
   
   Happy to provide any other supporting data, if needed.
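
   For what it's worth, 1500 matches Hudi's default write-shuffle parallelism, 
which would explain the repeated 1500-task stages. A minimal sketch of dialing 
it down for a small POC (`df` is the Dataset being written; the table name, 
record-key field, and path are placeholders, while the `hoodie.*` keys are 
standard Hudi write configs):

   ```java
// Sketch: lower Hudi's default 1500-partition write shuffle for a tiny dataset.
df.write().format("org.apache.hudi")
    .option("hoodie.table.name", "poc_table")                  // placeholder
    .option("hoodie.datasource.write.recordkey.field", "id")   // placeholder
    .option("hoodie.insert.shuffle.parallelism", "2")
    .option("hoodie.upsert.shuffle.parallelism", "2")
    .option("hoodie.bulkinsert.shuffle.parallelism", "2")
    .mode(org.apache.spark.sql.SaveMode.Append)
    .save("hdfs:///tmp/poc_table");                            // placeholder
   ```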



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1511: [HUDI-789]Adjust logic of upsert in HDFSParquetImporter

2020-04-20 Thread GitBox


codecov-io edited a comment on issue #1511:
URL: https://github.com/apache/incubator-hudi/pull/1511#issuecomment-612848674


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1511?src=pr&el=h1) 
Report
   > Merging 
[#1511](https://codecov.io/gh/apache/incubator-hudi/pull/1511?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/f5f34bb1c16e6d070668486eba2a29f554c0bbc7&el=desc)
 will **decrease** coverage by `0.49%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1511/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1511?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1511      +/-   ##
   ============================================
   - Coverage     72.15%   71.66%    -0.50%
   - Complexity      290      294        +4
   ============================================
     Files           338      378       +40
     Lines         15929    16535      +606
     Branches       1625     1672       +47
   ============================================
   + Hits          11494    11849      +355
   - Misses         3704     3954      +250
   - Partials        731      732        +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1511?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1511/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh)
 | `61.62% <0.00%> (-27.66%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../org/apache/hudi/table/HoodieMergeOnReadTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1511/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllTWVyZ2VPblJlYWRUYWJsZS5qYXZh)
 | `60.00% <0.00%> (-23.13%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...hudi/common/fs/inline/InLineFsDataInputStream.java](https://codecov.io/gh/apache/incubator-hudi/pull/1511/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9JbkxpbmVGc0RhdGFJbnB1dFN0cmVhbS5qYXZh)
 | `38.46% <0.00%> (-15.39%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...di/hadoop/realtime/HoodieRealtimeRecordReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1511/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVJlYWx0aW1lUmVjb3JkUmVhZGVyLmphdmE=)
 | `70.00% <0.00%> (-14.22%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1511/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `79.31% <0.00%> (-10.35%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../apache/hudi/client/AbstractHoodieWriteClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1511/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0Fic3RyYWN0SG9vZGllV3JpdGVDbGllbnQuamF2YQ==)
 | `68.33% <0.00%> (-5.83%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...lities/checkpointing/KafkaConnectHdfsProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1511/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvS2Fma2FDb25uZWN0SGRmc1Byb3ZpZGVyLmphdmE=)
 | `89.28% <0.00%> (-3.03%)` | `14.00% <0.00%> (+2.00%)` | :arrow_down: |
   | 
[...n/java/org/apache/hudi/index/hbase/HBaseIndex.java](https://codecov.io/gh/apache/incubator-hudi/pull/1511/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvaGJhc2UvSEJhc2VJbmRleC5qYXZh)
 | `83.25% <0.00%> (-0.96%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...oop/realtime/HoodieParquetRealtimeInputFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1511/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVBhcnF1ZXRSZWFsdGltZUlucHV0Rm9ybWF0LmphdmE=)
 | `72.34% <0.00%> (-0.78%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1511/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `64.70% <0.00%> (-0.71%)` | `22.00% <0.00%> (+1.00%)` | :arrow_down: |
   | ... and [77 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1511/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1511?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)

[GitHub] [incubator-hudi] shenh062326 commented on issue #1544: [Minor] Update docs for oss_filesystem

2020-04-20 Thread GitBox


shenh062326 commented on issue #1544:
URL: https://github.com/apache/incubator-hudi/pull/1544#issuecomment-616952742


   @leesf  please help review this patch.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] shenh062326 opened a new pull request #1544: [Minor] Update docs for oss_filesystem

2020-04-20 Thread GitBox


shenh062326 opened a new pull request #1544:
URL: https://github.com/apache/incubator-hudi/pull/1544


   ## What is the purpose of the pull request
   
   * Update docs for oss_filesystem
   
   ## Brief change log
   
 - Modify docs/_docs/0_5_oss_filesystem.cn.md
 - Modify docs/_docs/0_5_oss_filesystem.md
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] bvaradar edited a comment on issue #1543: [HUDI-821]:Fix the wrong annotation of JCommander IStringConverter

2020-04-20 Thread GitBox


bvaradar edited a comment on issue #1543:
URL: https://github.com/apache/incubator-hudi/pull/1543#issuecomment-616939586


   @dengziming : Thanks for your contribution. This is addressing the same 
issue as https://github.com/apache/incubator-hudi/pull/1525. Do you have any 
specific comments/concerns about 
https://github.com/apache/incubator-hudi/pull/1525? Feel free to review 
https://github.com/apache/incubator-hudi/pull/1525 too.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] bvaradar commented on issue #1543: [HUDI-821]:Fix the wrong annotation of JCommander IStringConverter

2020-04-20 Thread GitBox


bvaradar commented on issue #1543:
URL: https://github.com/apache/incubator-hudi/pull/1543#issuecomment-616939586


   @dengziming : This is addressing the same issue as 
https://github.com/apache/incubator-hudi/pull/1525. Do you have any specific 
comments/concerns about https://github.com/apache/incubator-hudi/pull/1525? 
Feel free to review https://github.com/apache/incubator-hudi/pull/1525 too.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] yanghua commented on issue #1541: [Minor] Add ability to specify time unit for TimestampBasedKeyGenerator

2020-04-20 Thread GitBox


yanghua commented on issue #1541:
URL: https://github.com/apache/incubator-hudi/pull/1541#issuecomment-616938485


   Hi @afilipchik, thanks for your contribution. The correct prefix is "MINOR". 
IMHO, the change in this PR should also be filed as a JIRA issue.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-821) Fix the wrong annotation of JCommander IStringConverter

2020-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-821:

Labels: pull-request-available  (was: )

> Fix the wrong annotation of JCommander IStringConverter
> ---
>
> Key: HUDI-821
> URL: https://issues.apache.org/jira/browse/HUDI-821
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: dengziming
>Assignee: dengziming
>Priority: Minor
>  Labels: pull-request-available
>
> Please refer to https://github.com/cbeust/jcommander/issues/253.
> If you define a list as an argument to be parsed with an IStringConverter, 
> JCommander will create a List<List<String>> instead of a List<String>.
> We should change `converter = TransformersConverter.class` to `converter = 
> StringConverter.class, listConverter = TransformersConverter.class`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] dengziming opened a new pull request #1543: HUDI-821:Fix the wrong annotation of JCommander IStringConverter

2020-04-20 Thread GitBox


dengziming opened a new pull request #1543:
URL: https://github.com/apache/incubator-hudi/pull/1543


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Please refer to https://github.com/cbeust/jcommander/issues/253.
   If you define a list as an argument to be parsed with an IStringConverter, 
JCommander will create a List<List<String>> instead of a List<String>.
   
   
   ## Brief change log
   
   We should change `converter = TransformersConverter.class` to `converter = 
StringConverter.class, listConverter = TransformersConverter.class`.
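
   A minimal sketch of the fix (the field and option names are assumed from 
HoodieDeltaStreamer's transformer parameter; `TransformersConverter` is the 
Hudi converter this PR touches):

   ```java
import com.beust.jcommander.Parameter;
import com.beust.jcommander.converters.StringConverter;
import java.util.List;

// Before (buggy): the list-producing converter runs once per token, so
// JCommander ends up building a List<List<String>> instead of a List<String>.
//   @Parameter(names = {"--transformer-class"}, converter = TransformersConverter.class)

// After (fixed): converter handles each token; listConverter builds the list once.
@Parameter(names = {"--transformer-class"},
    converter = StringConverter.class,
    listConverter = TransformersConverter.class)
public List<String> transformerClassNames;
   ```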
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-766) Update Apache Hudi website with usage info about HoodieMultiTableDeltaStreamer

2020-04-20 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-766:
--
Status: In Progress  (was: Open)

> Update Apache Hudi website with usage info about HoodieMultiTableDeltaStreamer
> --
>
> Key: HUDI-766
> URL: https://issues.apache.org/jira/browse/HUDI-766
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs, docs-chinese
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Major
> Fix For: 0.6.0
>
>
> Relevant Section : 
> [https://hudi.apache.org/docs/writing_data.html#deltastreamer]
> Add high-level description about this tool 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-803) Improve Unit test coverage of HoodieAvroUtils around default values

2020-04-20 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-803:
--
Status: In Progress  (was: Open)

> Improve Unit test coverage of HoodieAvroUtils around default values
> ---
>
> Key: HUDI-803
> URL: https://issues.apache.org/jira/browse/HUDI-803
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Recently there has been a lot of work and improvement around schema evolution 
> and the HoodieAvroUtils class in particular. A few bugs have already been fixed 
> around this. With the version bump of avro from 1.7.7 to 1.8.2, the flow 
> around default values of Schema.Field has changed significantly. This Jira 
> aims to improve the test coverage of the HoodieAvroUtils class so that our 
> functionality remains intact with respect to default values and schema 
> evolution. 
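
A tiny illustration of the API shift behind this Jira (the record schema below is assumed): Avro 1.8.x exposes a field's default as a plain Object via Schema.Field#defaultVal(), where 1.7.7 exposed a Jackson JsonNode via defaultValue().

```java
import org.apache.avro.Schema;

public class AvroDefaultDemo {
  public static void main(String[] args) {
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":"
            + "[{\"name\":\"city\",\"type\":\"string\",\"default\":\"unknown\"}]}");
    // Avro 1.8 returns the default as a plain Object; 1.7.7 returned a JsonNode.
    System.out.println(schema.getField("city").defaultVal());  // prints: unknown
  }
}
```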



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-769) Write blog about HoodieMultiTableDeltaStreamer in cwiki

2020-04-20 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-769:
--
Status: In Progress  (was: Open)

> Write blog about HoodieMultiTableDeltaStreamer in cwiki
> ---
>
> Key: HUDI-769
> URL: https://issues.apache.org/jira/browse/HUDI-769
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs, docs-chinese
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Major
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-796) Rewrite DedupeSparkJob.scala without considering the _hoodie_commit_time

2020-04-20 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-796:
--
Status: In Progress  (was: Open)

> Rewrite DedupeSparkJob.scala without considering the _hoodie_commit_time
> 
>
> Key: HUDI-796
> URL: https://issues.apache.org/jira/browse/HUDI-796
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> `_hoodie_commit_time` can only be used for deduping a partition path if the 
> duplicates happened due to an INSERT operation. In the case of updates, the 
> bloom filter tags both files where a record is present for update, and all 
> such files will have the same `_hoodie_commit_time` for a duplicate record 
> henceforth. 
> Hence it makes sense to rewrite this class without considering the metadata 
> field. 
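
A sketch (not the actual DedupeSparkJob code) of key-based dedupe that ignores the commit time; `_hoodie_record_key` and `_hoodie_file_name` are Hudi's standard metadata columns, and `df` stands for the partition's rows loaded into a Dataset:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.Window;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.row_number;

// Keep exactly one copy per record key within the partition path, choosing
// deterministically by file name rather than by _hoodie_commit_time.
Dataset<Row> deduped = df
    .withColumn("rn", row_number().over(
        Window.partitionBy("_hoodie_record_key").orderBy("_hoodie_file_name")))
    .filter(col("rn").equalTo(1))
    .drop("rn");
```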



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-480) Support a querying delete data method in incremental view

2020-04-20 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088263#comment-17088263
 ] 

vinoyang commented on HUDI-480:
---

[~chenxiang] Glad to hear this. We are busy with other things. Please go ahead!

> Support a querying delete data method in incremental view
> --
>
> Key: HUDI-480
> URL: https://issues.apache.org/jira/browse/HUDI-480
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Incremental Pull
>Reporter: cdmikechen
>Priority: Minor
>
> As we know, Hudi supports many methods to query data in Spark, Hive, 
> and Presto. It also provides a very good timeline concept to trace changes 
> in data, which can be used to query incremental data in the incremental view.
> Originally we only had insert and update functions to upsert data, and now 
> we have added new functions to delete existing data:
> *[HUDI-328] Adding delete api to HoodieWriteClient* 
> https://github.com/apache/incubator-hudi/pull/1004
> *[HUDI-377] Adding Delete() support to DeltaStreamer* 
> https://github.com/apache/incubator-hudi/pull/1073
> So, now that we have a delete api, should we add another method to get deleted 
> data in the incremental view?
> I've looked at the methods for generating new parquet files. I think the main 
> idea is to combine old and new data, and then filter out the data which needs 
> to be deleted, so that the deleted data does not exist in the new dataset. 
> However, this way the deleted data is not retained in the new dataset, so 
> only the inserted or modified data can be found via the existing timestamp 
> field when tracing data in the incremental view.
> If we do this, I feel there are two ideas to consider:
> 1. Trace the dataset in the same file at different time checkpoints according 
> to the timeline, compare the two datasets by key, and filter out the deleted 
> data. This method adds no extra cost when writing, but it needs to run the 
> comparison for each request at query time, which is expensive.
> 2. When writing data, record any deleted data in a file named like 
> *.delete_filename_version_timestamp*, so that we can answer immediately 
> based on the time. But this does additional processing at write time.
>  
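
A minimal sketch of idea 1, under assumptions: `snapshotAtT1` and `snapshotAtT2` are point-in-time views of the same table obtained by some point-in-time read, and `_hoodie_record_key` is Hudi's key column; keys present at t1 but absent at t2 were deleted in between.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Set difference of the key columns yields the keys deleted between t1 and t2.
Dataset<Row> deletedKeys = snapshotAtT1.select("_hoodie_record_key")
    .except(snapshotAtT2.select("_hoodie_record_key"));
```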



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #254

2020-04-20 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.38 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
or

[incubator-hudi] branch master updated: [HUDI-371] Supporting hive combine input format for realtime tables (#1503)

2020-04-20 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 332072b  [HUDI-371] Supporting hive combine input format for realtime 
tables (#1503)
332072b is described below

commit 332072bc6d5fd04e9325d21e55dfac66bc8f848a
Author: n3nash 
AuthorDate: Mon Apr 20 20:40:06 2020 -0700

[HUDI-371] Supporting hive combine input format for realtime tables (#1503)
---
 .../hudi/common/HoodieMergeOnReadTestUtils.java|   4 +-
 .../hudi/common/util/collection/ArrayUtils.java|  62 ++
 hudi-hadoop-mr/pom.xml |   2 +-
 .../hadoop/hive/HoodieCombineHiveInputFormat.java  | 626 -
 .../hive/HoodieCombineRealtimeFileSplit.java   | 169 ++
 .../hive/HoodieCombineRealtimeHiveSplit.java   |  44 ++
 .../realtime/AbstractRealtimeRecordReader.java |   3 +
 .../HoodieCombineRealtimeRecordReader.java | 103 
 .../realtime/HoodieParquetRealtimeInputFormat.java |   2 +-
 .../realtime/HoodieRealtimeRecordReader.java   |   1 +
 .../realtime/RealtimeUnmergedRecordReader.java |  22 +-
 .../apache/hudi/hadoop/InputFormatTestUtil.java| 100 
 .../hudi/hadoop/TestHoodieParquetInputFormat.java  |  28 +-
 .../realtime/TestHoodieCombineHiveInputFormat.java | 160 ++
 .../realtime/TestHoodieRealtimeRecordReader.java   |  91 +--
 15 files changed, 1045 insertions(+), 372 deletions(-)

diff --git a/hudi-client/src/test/java/org/apache/hudi/common/HoodieMergeOnReadTestUtils.java b/hudi-client/src/test/java/org/apache/hudi/common/HoodieMergeOnReadTestUtils.java
index 24430fb..1a65a46 100644
--- a/hudi-client/src/test/java/org/apache/hudi/common/HoodieMergeOnReadTestUtils.java
+++ b/hudi-client/src/test/java/org/apache/hudi/common/HoodieMergeOnReadTestUtils.java
@@ -50,7 +50,7 @@ public class HoodieMergeOnReadTestUtils {
   }
 
   public static List<GenericRecord> getRecordsUsingInputFormat(List<String> inputPaths, String basePath,
-                                                               Configuration conf) {
+      Configuration conf) {
     JobConf jobConf = new JobConf(conf);
     return getRecordsUsingInputFormat(inputPaths, basePath, jobConf, new HoodieParquetRealtimeInputFormat());
   }
@@ -125,4 +125,4 @@ public class HoodieMergeOnReadTestUtils {
     jobConf.set("mapreduce.input.fileinputformat.inputdir", inputPath);
     jobConf.set("map.input.dir", inputPath);
   }
-}
+}
\ No newline at end of file
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/util/collection/ArrayUtils.java b/hudi-common/src/main/java/org/apache/hudi/common/util/collection/ArrayUtils.java
new file mode 100644
index 0000000..cc76c9d
--- /dev/null
+++ b/hudi-common/src/main/java/org/apache/hudi/common/util/collection/ArrayUtils.java
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util.collection;
+
+/**
+ * Operations on arrays, primitive arrays (like {@code int[]}) and
+ * primitive wrapper arrays (like {@code Integer[]}).
+ *
+ * This class tries to handle {@code null} input gracefully.
+ * An exception will not be thrown for a {@code null}
+ * array input. However, an Object array that contains a {@code null}
+ * element may throw an exception. Each method documents its behaviour.
+ *
+ * NOTE : Adapted from org.apache.commons.lang3.ArrayUtils
+ */
+public class ArrayUtils {
+
+  /**
+   * An empty immutable {@code long} array.
+   */
+  public static final long[] EMPTY_LONG_ARRAY = new long[0];
+
+  // Long array converters
+  // --
+  /**
+   * Converts an array of object Longs to primitives.
+   *
+   * This method returns {@code null} for a {@code null} input array.
+   *
+   * @param array  a {@code Long} array, may be {@code null}
+   * @return a {@code long} array, {@code null} if null array input
+   * @throws NullPointerException if array content is {@code null}
+   */
+  public static long[] toPrimitive(Long[] array) {
+    if (array == null) {
+      return null;
+    } els

[jira] [Commented] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2020-04-20 Thread cdmikechen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088258#comment-17088258
 ] 

cdmikechen commented on HUDI-83:


[~vinoth] [~arw357] [~uditme] [~xleesf] I have built a custom hudi serde by 
creating a new ObjectInspector which can transform the parquet-avro timestamp 
(primitive type long, logical type timestamp-micros) into the right 
TimestampWritable class, so that hive can read the column correctly. 

There are still some tests to run on hive and spark. Once I finish those, I will 
let you know :)
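
A sketch of the core conversion such an ObjectInspector has to perform (the value is a made-up example; TimestampWritable is Hive's timestamp writable):

```java
import java.sql.Timestamp;
import org.apache.hadoop.hive.serde2.io.TimestampWritable;

long micros = 1587340800123456L;                      // timestamp-micros from parquet-avro
Timestamp ts = new Timestamp(micros / 1_000L);        // whole milliseconds
ts.setNanos((int) ((micros % 1_000_000L) * 1_000L));  // restore sub-second precision
TimestampWritable tw = new TimestampWritable(ts);
```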

> Map Timestamp type in spark to corresponding Timestamp type in Hive during 
> Hive sync
> 
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Hive Integration, Usability
>Reporter: Vinoth Chandar
>Assignee: cdmikechen
>Priority: Major
> Fix For: 0.6.0
>
>
> [https://github.com/apache/incubator-hudi/issues/543] & related issues 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2020-04-20 Thread cdmikechen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cdmikechen reassigned HUDI-83:
--

Assignee: cdmikechen

> Map Timestamp type in spark to corresponding Timestamp type in Hive during 
> Hive sync
> 
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Hive Integration, Usability
>Reporter: Vinoth Chandar
>Assignee: cdmikechen
>Priority: Major
> Fix For: 0.6.0
>
>
> [https://github.com/apache/incubator-hudi/issues/543] & related issues 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-822) Decouple hoodie related methods with Hoodie Input Formats

2020-04-20 Thread Yanjia Gary Li (Jira)
Yanjia Gary Li created HUDI-822:
---

 Summary: Decouple hoodie related methods with Hoodie Input Formats
 Key: HUDI-822
 URL: https://issues.apache.org/jira/browse/HUDI-822
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: Yanjia Gary Li
Assignee: Yanjia Gary Li


In order to support multiple query engines, we need to generalize the Hudi 
input format and the Hudi record merging logic, and decouple them from 
MapredParquetInputFormat, which depends on Hive. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] pratyakshsharma commented on issue #765: [WIP] Fix KafkaAvroSource to use the latest schema

2020-04-20 Thread GitBox


pratyakshsharma commented on issue #765:
URL: https://github.com/apache/incubator-hudi/pull/765#issuecomment-616917937


   @haiminh87 Still working on this? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-821) Fix the wrong annotation of JCommander IStringConverter

2020-04-20 Thread dengziming (Jira)
dengziming created HUDI-821:
---

 Summary: Fix the wrong annotation of JCommander IStringConverter
 Key: HUDI-821
 URL: https://issues.apache.org/jira/browse/HUDI-821
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: DeltaStreamer
Reporter: dengziming
Assignee: dengziming


Please refer to https://github.com/cbeust/jcommander/issues/253.
If you define a list as an argument to be parsed with an IStringConverter, 
JCommander will create a List<List<String>> instead of a List<String>.

We should change `converter = TransformersConverter.class` to `converter = 
StringConverter.class, listConverter = TransformersConverter.class`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-351) Implement Range + Bloom Filter checking in one go to improve speed of index

2020-04-20 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088208#comment-17088208
 ] 

sivabalan narayanan commented on HUDI-351:
--

[~vinoth]: Do you think we still need to work on this given that we have 
BloomIndexV2? 

> Implement Range + Bloom Filter checking in one go to improve speed of index
> ---
>
> Key: HUDI-351
> URL: https://issues.apache.org/jira/browse/HUDI-351
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Index, Performance
>Reporter: Vinoth Chandar
>Priority: Major
>
> Currently, we read the min/max ranges once for range pruning and again read 
> the footer metadata to check for bloom filter..
> Once spark 2.4 support and the 2GB limitations are gone.. worth revisiting if 
> we could do this in a single pass for cases where the bloom filters could fit 
> into memory or implement this check as a RDD operation.. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-04-20 Thread GitBox


codecov-io edited a comment on issue #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538#issuecomment-616904151


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=h1) 
Report
   > Merging 
[#1538](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/09fd6f64c527e6a822c4e17dc4e61b8fdee28189&el=desc)
 will **increase** coverage by `0.07%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1538/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1538      +/-   ##
   ============================================
   + Coverage     72.32%   72.39%    +0.07%
     Complexity      294      294
   ============================================
     Files           374      374
     Lines         16366    16379       +13
     Branches       1649     1651        +2
   ============================================
   + Hits          11836    11858       +22
   + Misses         3798     3791        -7
   + Partials        732      730        -2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `95.19% <100.00%> (+2.05%)` | `0.00 <0.00> (ø)` | |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=)
 | `84.84% <0.00%> (+0.19%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...src/main/java/org/apache/hudi/DataSourceUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9EYXRhU291cmNlVXRpbHMuamF2YQ==)
 | `56.70% <0.00%> (+6.13%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `89.65% <0.00%> (+10.34%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...e/hudi/exception/SchemaCompatabilityException.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1NjaGVtYUNvbXBhdGFiaWxpdHlFeGNlcHRpb24uamF2YQ==)
 | `33.33% <0.00%> (+33.33%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=footer).
 Last update 
[09fd6f6...b13db4d](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-io commented on issue #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-04-20 Thread GitBox


codecov-io commented on issue #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538#issuecomment-616904151


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=h1) 
Report
   > Merging 
[#1538](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/09fd6f64c527e6a822c4e17dc4e61b8fdee28189&el=desc)
 will **increase** coverage by `0.07%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1538/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1538      +/-   ##
   ============================================
   + Coverage     72.32%   72.39%    +0.07%
     Complexity      294      294
   ============================================
     Files           374      374
     Lines         16366    16379       +13
     Branches       1649     1651        +2
   ============================================
   + Hits          11836    11858       +22
   + Misses         3798     3791        -7
   + Partials        732      730        -2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `95.19% <100.00%> (+2.05%)` | `0.00 <0.00> (ø)` | |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=)
 | `84.84% <0.00%> (+0.19%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...src/main/java/org/apache/hudi/DataSourceUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9EYXRhU291cmNlVXRpbHMuamF2YQ==)
 | `56.70% <0.00%> (+6.13%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `89.65% <0.00%> (+10.34%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...e/hudi/exception/SchemaCompatabilityException.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1NjaGVtYUNvbXBhdGFiaWxpdHlFeGNlcHRpb24uamF2YQ==)
 | `33.33% <0.00%> (+33.33%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=footer).
 Last update 
[09fd6f6...b13db4d](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   







[incubator-hudi] branch master updated (ddd105b -> 2a2f31d)

2020-04-20 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from ddd105b  [HUDI-772] Make UserDefinedBulkInsertPartitioner configurable 
for DataSource (#1500)
 add 2a2f31d  [MINOR] Remove redundant code and fix typo in 
HoodieDefaultTimeline (#1535)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/common/table/timeline/HoodieDefaultTimeline.java| 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)



[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1512: [HUDI-763] Add hoodie.table.base.file.format option to hoodie.properties file

2020-04-20 Thread GitBox


vinothchandar commented on a change in pull request #1512:
URL: https://github.com/apache/incubator-hudi/pull/1512#discussion_r411783820



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileFormat.java
##
@@ -22,7 +22,7 @@
  * Hoodie file format.
  */
 public enum HoodieFileFormat {
-  PARQUET(".parquet"), HOODIE_LOG(".log");
+  PARQUET(".parquet"), HOODIE_LOG(".log"), ORC(".orc");

Review comment:
   again, this can happen in a PR that actually adds ORC support.. not now? 

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
##
@@ -190,6 +190,9 @@ public Operation convert(String value) throws 
ParameterException {
 @Parameter(names = {"--table-type"}, description = "Type of table. 
COPY_ON_WRITE (or) MERGE_ON_READ", required = true)
 public String tableType;
 
+@Parameter(names = {"--table-file-format"}, description = "BaseFileFormat 
of table. PARQUET (or) ORC")

Review comment:
   adding something like this without actual ORC support feels a bit premature 
and misleading to me.. e.g. if we release the code in a few weeks, the 
deltastreamer help output will be very misleading.. let's wait till we have the 
datasource or at least some level of progress with ORC? 

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HDFSParquetImporter.java
##
@@ -251,6 +252,8 @@ public void validate(String name, String value) {
 public String tableName = null;
 @Parameter(names = {"--table-type", "-tt"}, description = "Table type", 
required = true)
 public String tableType = null;
+@Parameter(names = {"--table-file-format", "-tff"}, description = "The 
base file storage format")

Review comment:
   same here.. 









[jira] [Updated] (HUDI-820) Fix bug in repair corrupted clean files command

2020-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-820:

Labels: pull-request-available  (was: )

> Fix bug in repair corrupted clean files command
> ---
>
> Key: HUDI-820
> URL: https://issues.apache.org/jira/browse/HUDI-820
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [incubator-hudi] bvaradar opened a new pull request #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files

2020-04-20 Thread GitBox


bvaradar opened a new pull request #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542


   @lamber-ken : This is something I missed when reviewing the cleaner repair 
code changes. The repair command has a serious bug in that it might delete 
inflight instants of other actions.  
   
   cc @vinothchandar 
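
   For illustration, a minimal sketch of the intended guard (hypothetical 
class and method names; see the PR itself for the real fix):

   ```java
   import java.util.List;
   import java.util.stream.Collectors;

   import org.apache.hudi.common.table.timeline.HoodieInstant;
   import org.apache.hudi.common.table.timeline.HoodieTimeline;

   class CleanerRepairSketch {
     // Only clean metadata files should be inspected by the repair command;
     // filtering by action leaves inflight instants of other actions untouched.
     static List<HoodieInstant> cleanInstantsOnly(List<HoodieInstant> allInstants) {
       return allInstants.stream()
           .filter(instant -> HoodieTimeline.CLEAN_ACTION.equals(instant.getAction()))
           .collect(Collectors.toList());
     }
   }
   ```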







[jira] [Commented] (HUDI-480) Support a querying delete data method in incremental view

2020-04-20 Thread cdmikechen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088183#comment-17088183
 ] 

cdmikechen commented on HUDI-480:
-

[~vinoth] [~yanghua] Maybe I can open an RFC and write down my own thoughts, and 
then we can discuss the feasibility of the plan.

> Support a querying delete data method in incremental view
> --
>
> Key: HUDI-480
> URL: https://issues.apache.org/jira/browse/HUDI-480
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Incremental Pull
>Reporter: cdmikechen
>Priority: Minor
>
> As we know, Hudi supports many methods to query data in Spark, Hive, 
> and Presto. It also provides a very good timeline idea to trace changes 
> in data, which can be used to query incremental data in the incremental view.
> Previously we just had insert and update functions to upsert data, and now 
> we have added new functions to delete some existing data:
> *[HUDI-328] Adding delete api to HoodieWriteClient* 
> https://github.com/apache/incubator-hudi/pull/1004
> *[HUDI-377] Adding Delete() support to DeltaStreamer* 
> https://github.com/apache/incubator-hudi/pull/1073
> So, given that we have a delete API, should we add another method to get 
> deleted data in the incremental view?
> I've looked at the methods for generating new parquet files. The main idea 
> is to combine old and new data and then filter out the data that needs to 
> be deleted, so that the deleted data does not exist in the new dataset. 
> However, this way the deleted data is not retained in the new dataset, so 
> only inserted or modified data can be found via the existing timestamp field 
> when tracing data in the incremental view.
> If we do this, I feel there are two ideas to consider:
> 1. Trace the dataset in the same file at different time checkpoints 
> according to the timeline, compare the two datasets by key, and filter out 
> the deleted data. This method adds no extra cost when writing, but it must 
> run the analysis per request at query time, which is expensive.
> 2. When writing data, record any deleted data in a file named something 
> like *.delete_filename_version_timestamp*, so that feedback can be given 
> immediately by time. But this adds extra processing at write time.
>  





[GitHub] [incubator-hudi] vinothchandar commented on issue #1528: [SUPPORT] Issue while writing to HDFS via hudi. Only `/.hoodie` folder is written.

2020-04-20 Thread GitBox


vinothchandar commented on issue #1528:
URL: https://github.com/apache/incubator-hudi/issues/1528#issuecomment-616877781


   @jenu9417 Thanks for taking the time to report this. 
   
   a) is weird.. The logs do indicate that tasks got scheduled at least.. but I 
think the job died before getting to write any data.. Do you have access to the 
Spark UI to see how the jobs are doing?
   
   b) So `.parquet()` does not use hudi at all (I suspect).. It uses the Spark 
parquet datasource, and you can look at the official Spark docs to understand 
how to partition that write (I think `.partitionBy("batch")`). `.save()` will 
invoke the save method of the datasource you configured using `format(...)`.. 
Spark docs will do a better job of explaining this than me :) 
   
   >The query was throwing error that there are no such field called 
_hoodie_commit_time
   
   parquet and hudi are different things.. Only hudi datasets have this field 
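
   For illustration, a rough sketch of the two write paths (the table name, 
key column, and paths are placeholders, not from this issue):

   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SaveMode;

   class WriteSketch {
     static void write(Dataset<Row> df) {
       // Plain Spark parquet datasource: Hudi is not involved, so no _hoodie_*
       // metadata fields are added; partitioning comes from partitionBy().
       df.write().partitionBy("batch").mode(SaveMode.Append).parquet("/tmp/plain_parquet");

       // Hudi datasource: format(...) routes save() to Hudi, which stamps
       // metadata fields such as _hoodie_commit_time on every record.
       df.write().format("org.apache.hudi")
           .option("hoodie.table.name", "my_table")
           .option("hoodie.datasource.write.recordkey.field", "key")
           .mode(SaveMode.Append)
           .save("/tmp/hudi_table");
     }
   }
   ```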
   
   c) `.hoodie` will contain all the metadata
   
   d) You can find more on compaction here 
https://cwiki.apache.org/confluence/display/HUDI/Design+And+Architecture#DesignAndArchitecture-Compaction
 







[jira] [Created] (HUDI-820) Fix bug in repair corrupted clean files command

2020-04-20 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-820:
---

 Summary: Fix bug in repair corrupted clean files command
 Key: HUDI-820
 URL: https://issues.apache.org/jira/browse/HUDI-820
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: CLI
Reporter: Balaji Varadarajan








[GitHub] [incubator-hudi] afilipchik opened a new pull request #1541: [Minor] Add ability to specify time unit for TimestampBasedKeyGenerator

2020-04-20 Thread GitBox


afilipchik opened a new pull request #1541:
URL: https://github.com/apache/incubator-hudi/pull/1541


   ## What is the purpose of the pull request
   
   Adding a way to specify any source time unit for TimestampBasedKeyGenerator. 
   The properties probably need some refactoring; I kept the unix timestamp 
default for backward compatibility.
   ## Brief change log
   
   *(for example:)*
 - updated TimestampBasedKeyGenerator
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.







[jira] [Commented] (HUDI-760) Remove Rolling Stat management from Hudi Writer

2020-04-20 Thread renyi.bao (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088169#comment-17088169
 ] 

renyi.bao commented on HUDI-760:


[~vbalaji] thanks for your guidance. If I understand it correctly, this issue's 
main purpose is to clean up the rolling-stat-related code from the existing 
logic. I'm interested in trying to solve it.

> Remove Rolling Stat management from Hudi Writer
> ---
>
> Key: HUDI-760
> URL: https://issues.apache.org/jira/browse/HUDI-760
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: help-wanted, newbie,
> Fix For: 0.6.0
>
>
> Current implementation of rolling stat is not scalable. As Consolidated 
> Metadata will be implemented eventually, we can have one design to manage 
> file-level stats too.





[incubator-hudi] branch asf-site updated: Travis CI build asf-site

2020-04-20 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 9b2b8d4  Travis CI build asf-site
9b2b8d4 is described below

commit 9b2b8d4fff6b6181a49a71c3799ed50fc0ef6bf5
Author: CI 
AuthorDate: Mon Apr 20 23:20:22 2020 +

Travis CI build asf-site
---
 content/cn/docs/docker_demo.html | 2 +-
 content/docs/docker_demo.html| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/cn/docs/docker_demo.html b/content/cn/docs/docker_demo.html
index 42abb0d..c74a272 100644
--- a/content/cn/docs/docker_demo.html
+++ b/content/cn/docs/docker_demo.html
@@ -455,7 +455,7 @@ This should pull the docker images from docker hub and 
setup docker cluster.
   HDFS Services (NameNode, DataNode)
   Spark Master and Worker
   Hive Services (Metastore, HiveServer2 along with PostgresDB)
-  Kafka Broker and a Zookeeper Node (Kakfa will be used as upstream source 
for the demo)
+  Kafka Broker and a Zookeeper Node (Kafka will be used as upstream source 
for the demo)
   Adhoc containers to run Hudi/Hive CLI commands
 
 
diff --git a/content/docs/docker_demo.html b/content/docs/docker_demo.html
index e08bfc4..c167fec 100644
--- a/content/docs/docker_demo.html
+++ b/content/docs/docker_demo.html
@@ -460,7 +460,7 @@ This should pull the docker images from docker hub and 
setup docker cluster.
   HDFS Services (NameNode, DataNode)
   Spark Master and Worker
   Hive Services (Metastore, HiveServer2 along with PostgresDB)
-  Kafka Broker and a Zookeeper Node (Kakfa will be used as upstream source 
for the demo)
+  Kafka Broker and a Zookeeper Node (Kafka will be used as upstream source 
for the demo)
   Adhoc containers to run Hudi/Hive CLI commands
 
 



[jira] [Updated] (HUDI-316) Improve performance of HbaseIndex puts by repartitioning WriteStatus and using rate limiter instead of sleep()

2020-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-316:

Labels: pull-request-available  (was: )

> Improve performance of HbaseIndex puts by repartitioning WriteStatus and 
> using rate limiter instead of sleep()
> --
>
> Key: HUDI-316
> URL: https://issues.apache.org/jira/browse/HUDI-316
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Index
>Reporter: Venkatesh Rudraraju
>Assignee: Venkatesh Rudraraju
>Priority: Major
>  Labels: pull-request-available
>
> * Repartition WriteStatus before index writes, in a way that each WriteStatus 
> with new records are not clubbed together.
>  * This repartition will improve parallelism for this hbase index operation.
>  * In HBaseIndex puts call, there is a sleep of 100 millis for each batch of 
> puts. This implementation assumes negligible time for puts, but for large 
> batches of puts it is inefficient.
>  * Using rate limiter will be efficient compared to sleep as it accounts for 
> the time taken for puts as well.
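
For illustration, a minimal sketch of the sleep-vs-rate-limiter difference 
described above, using Guava's RateLimiter (class shape, batch type, and the 
QPS value are placeholders, not Hudi's actual code):

{code:java}
import com.google.common.util.concurrent.RateLimiter;
import java.util.List;

class PutThrottleSketch {
  // Old approach: a fixed 100 ms sleep per batch, paid in full even when the
  // puts themselves already took most of the second.
  static void putWithSleep(List<String> batch) throws InterruptedException {
    doPuts(batch);
    Thread.sleep(100);
  }

  // Rate-limited approach: one permit per mutation. Time spent inside doPuts()
  // counts against the per-second budget, so slow batches wait less, or not at all.
  private static final RateLimiter LIMITER = RateLimiter.create(1000); // assumed max puts/sec
  static void putWithRateLimiter(List<String> batch) {
    LIMITER.acquire(batch.size());
    doPuts(batch);
  }

  static void doPuts(List<String> batch) { /* HBase mutations elided */ }
}
{code}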





[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1484: [HUDI-316] : Hbase qps repartition writestatus

2020-04-20 Thread GitBox


satishkotha commented on a change in pull request #1484:
URL: https://github.com/apache/incubator-hudi/pull/1484#discussion_r411749174



##
File path: hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java
##
@@ -322,66 +347,94 @@ private boolean checkIfValidCommit(HoodieTableMetaClient 
metaClient, String comm
   /**
* Helper method to facilitate performing mutations (including puts and 
deletes) in Hbase.
*/
-  private void doMutations(BufferedMutator mutator, List<Mutation> mutations) throws IOException {
+  private void doMutations(BufferedMutator mutator, List<Mutation> mutations, RateLimiter limiter) throws IOException {
 if (mutations.isEmpty()) {
   return;
 }
+    // Report the number of operations to account per second with the rate limiter.
+    // If #limiter.getRate() operations are acquired within 1 second, the rate limiter
+    // will throttle the remaining calls within that second.
+    limiter.acquire(mutations.size());
 mutator.mutate(mutations);
 mutator.flush();
 mutations.clear();

Review comment:
   another question: what is the typical latency of these mutate 
operations? If the time taken here, combined with the time taken to collect 
'multiPutBatchSize' mutations, is > 1 second, then it seems like the limiter 
would generate enough tokens for the next run and would not wait at all. 









[jira] [Assigned] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable

2020-04-20 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish reassigned HUDI-819:
---

Assignee: satish

> missing write status in MergeOnReadLazyInsertIterable
> -
>
> Key: HUDI-819
> URL: https://issues.apache.org/jira/browse/HUDI-819
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Major
>  Labels: pull-request-available
>
> Variable declared 
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
>  masks the protected statuses variable. 
> So although Hudi writes the data, it will not include the WriteStatus in the 
> completed section. This can cause duplicates to be written.





[jira] [Updated] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable

2020-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-819:

Labels: pull-request-available  (was: )

> missing write status in MergeOnReadLazyInsertIterable
> -
>
> Key: HUDI-819
> URL: https://issues.apache.org/jira/browse/HUDI-819
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Priority: Major
>  Labels: pull-request-available
>
> Variable declared 
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
>  masks the protected statuses variable. 
> So although Hudi writes the data, it will not include the WriteStatus in the 
> completed section. This can cause duplicates to be written.





[GitHub] [incubator-hudi] satishkotha opened a new pull request #1540: [HUDI-819] Fix a bug with MergeOnReadLazyInsertIterable.

2020-04-20 Thread GitBox


satishkotha opened a new pull request #1540:
URL: https://github.com/apache/incubator-hudi/pull/1540


   ## What is the purpose of the pull request
   
   Variable declared here [1] masks the protected statuses variable. So although 
Hudi writes the data, it will not include the WriteStatus in the completed 
section. This can cause duplicates to be written.
   
   [1] 
https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53
   
   ## Brief change log
   
   - Delete MergeOnReadLazyInsertIterable because it is exactly the same as COW 
except for the type of handle created (HoodieCreateHandle vs HoodieAppendHandle)
   - Added new 'HandleCreator' classes and reused code in COWLazyInsertIterable
   
   Let me know if you have any other suggestions to improve this code. This 
refactoring also helps me with the implementation of 'insert overwrite' features.
   
   ## Verify this pull request
   This pull request is already covered by existing tests in hudi-client 
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.







[jira] [Created] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable

2020-04-20 Thread satish (Jira)
satish created HUDI-819:
---

 Summary: missing write status in MergeOnReadLazyInsertIterable
 Key: HUDI-819
 URL: https://issues.apache.org/jira/browse/HUDI-819
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: satish


Variable declared 
[here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
 masks the protected statuses variable.

So although Hudi writes the data, it will not include the WriteStatus in the 
completed section. This can cause duplicates to be written.
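
A stripped-down illustration of this class of bug (hypothetical names, not 
the actual Hudi code):

{code:java}
import java.util.ArrayList;
import java.util.List;

class LazyInsertIterableSketch {
  protected List<String> statuses = new ArrayList<>(); // what the caller reads back
}

class MergeOnReadIterableSketch extends LazyInsertIterableSketch {
  void consume() {
    // BUG: this declaration shadows the inherited 'statuses', so the write
    // status recorded here never reaches the completed section.
    List<String> statuses = new ArrayList<>();
    statuses.add("WriteStatus");

    // FIX: write to the inherited field instead:
    // this.statuses.add("WriteStatus");
  }
}
{code}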





[GitHub] [incubator-hudi] vingov commented on issue #1526: [HUDI-1526] Add pyspark example in quickstart

2020-04-20 Thread GitBox


vingov commented on issue #1526:
URL: https://github.com/apache/incubator-hudi/pull/1526#issuecomment-616768714


   @vinothchandar - This is similar to the blog post draft I have prepared, 
which explains the usage of the hudi reader/writer with pyspark. I will review 
the example code.
   
   @EdwinGuo - If adding a python code tab is difficult with markdown, let's go 
with a separate page for explaining the python usage.
   
   
   







[GitHub] [incubator-hudi] lamber-ken commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-20 Thread GitBox


lamber-ken commented on issue #1491:
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-616729550


   > @lamber-ken since this has come up a few times, worth tracking a jira for 
0.6 that can help get a better default for this?
   
   Agree, https://issues.apache.org/jira/browse/HUDI-818







[jira] [Updated] (HUDI-818) Optimize the default value of hoodie.memory.merge.max.size option

2020-04-20 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-818:

Fix Version/s: 0.6.0

> Optimize the default value of hoodie.memory.merge.max.size option
> -
>
> Key: HUDI-818
> URL: https://issues.apache.org/jira/browse/HUDI-818
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Performance
>Reporter: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
>
> The default value of the hoodie.memory.merge.max.size option is incapable of 
> meeting some users' performance requirements:
> [https://github.com/apache/incubator-hudi/issues/1491]





[jira] [Updated] (HUDI-818) Optimize the default value of hoodie.memory.merge.max.size option

2020-04-20 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-818:

Status: Open  (was: New)

> Optimize the default value of hoodie.memory.merge.max.size option
> -
>
> Key: HUDI-818
> URL: https://issues.apache.org/jira/browse/HUDI-818
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Performance
>Reporter: lamber-ken
>Priority: Major
>
> The default value of the hoodie.memory.merge.max.size option is incapable of 
> meeting some users' performance requirements:
> [https://github.com/apache/incubator-hudi/issues/1491]





[jira] [Created] (HUDI-818) Optimize the default value of hoodie.memory.merge.max.size option

2020-04-20 Thread lamber-ken (Jira)
lamber-ken created HUDI-818:
---

 Summary: Optimize the default value of 
hoodie.memory.merge.max.size option
 Key: HUDI-818
 URL: https://issues.apache.org/jira/browse/HUDI-818
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Performance
Reporter: lamber-ken


The default value of the hoodie.memory.merge.max.size option is incapable of 
meeting some users' performance requirements:

[https://github.com/apache/incubator-hudi/issues/1491]
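
For illustration, a hedged example of overriding the default on a datasource 
write (the 2 GB value and paths are placeholders, not a recommendation from 
this issue):

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

class MergeMemorySketch {
  static void write(Dataset<Row> df) {
    df.write().format("org.apache.hudi")
        // Raise the per-task merge memory budget beyond the default:
        .option("hoodie.memory.merge.max.size", String.valueOf(2L * 1024 * 1024 * 1024))
        .mode(SaveMode.Append)
        .save("/tmp/hudi_table");
  }
}
{code}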





[jira] [Commented] (HUDI-773) Hudi On Azure Data Lake Storage V2

2020-04-20 Thread Yanjia Gary Li (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087994#comment-17087994
 ] 

Yanjia Gary Li commented on HUDI-773:
-

[~sasikumar.venkat] I haven't tried Databricks Spark myself, but one of my 
colleagues tried it before and had some issues with the Hudi write, probably 
related to yours. As Vinoth mentioned, any debugging info would be helpful. I 
will also try it myself later.

> Hudi On Azure Data Lake Storage V2
> --
>
> Key: HUDI-773
> URL: https://issues.apache.org/jira/browse/HUDI-773
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Usability
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Minor
> Fix For: 0.6.0
>
>






[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1515: [HUDI-795] Ignoring missing aux folder

2020-04-20 Thread GitBox


pratyakshsharma commented on a change in pull request #1515:
URL: https://github.com/apache/incubator-hudi/pull/1515#discussion_r411592119



##
File path: 
hudi-client/src/main/java/org/apache/hudi/table/HoodieCommitArchiveLog.java
##
@@ -219,14 +220,29 @@ private boolean 
deleteArchivedInstants(List archivedInstants) thr
* @throws IOException in case of error
*/
   private boolean deleteAllInstantsOlderorEqualsInAuxMetaFolder(HoodieInstant 
thresholdInstant) throws IOException {
-    List<HoodieInstant> instants = metaClient.scanHoodieInstantsFromFileSystem(
-        new Path(metaClient.getMetaAuxiliaryPath()), HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE, false);
+    List<HoodieInstant> instants = null;
+    boolean success = true;
+    try {
+      instants =
+          metaClient.scanHoodieInstantsFromFileSystem(
+              new Path(metaClient.getMetaAuxiliaryPath()),
+              HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE,
+              false);
+    } catch (FileNotFoundException e) {
+      // On some FSs, deletion of all files in a directory can auto-remove the directory itself.
+      // GCS is one example, as it doesn't have real directories and subdirectories. When a client
+      // removes all the files from a "folder" on GCS, it has to create a special "/" to keep the
+      // folder around. If this doesn't happen (timeout, misconfigured client, ...) the folder will
+      // be deleted, and in this case we should not break when the aux folder is not found.
+      // GCS information: https://cloud.google.com/storage/docs/gsutil/addlhelp/HowSubdirectoriesWork

Review comment:
   Guess it would be better to use multi line comments here like 
   /*
   *
   */









[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1512: [HUDI-763] Add hoodie.table.base.file.format option to hoodie.properties file

2020-04-20 Thread GitBox


lamber-ken commented on a change in pull request #1512:
URL: https://github.com/apache/incubator-hudi/pull/1512#discussion_r411589627



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -131,6 +131,9 @@ public static void createHoodieProperties(FileSystem fs, 
Path metadataFolder, Pr
 // Use latest Version as default unless forced by client
 properties.setProperty(HOODIE_TIMELINE_LAYOUT_VERSION, 
TimelineLayoutVersion.CURR_VERSION.toString());
   }
+  if (!properties.containsKey(HOODIE_BASE_FILE_FORMAT_PROP_NAME)) {
+properties.setProperty(HOODIE_BASE_FILE_FORMAT_PROP_NAME, 
DEFAULT_BASE_FILE_FORMAT.name());
+  }

Review comment:
   hi @bvaradar, it already exists in master branch.
   ```
   // HoodieTableConfig#getBaseFileFormat
   
   /**
    * Get the base file storage format.
    *
    * @return HoodieFileFormat for the base file storage format
    */
   public HoodieFileFormat getBaseFileFormat() {
     if (props.containsKey(HOODIE_BASE_FILE_FORMAT_PROP_NAME)) {
       return HoodieFileFormat.valueOf(props.getProperty(HOODIE_BASE_FILE_FORMAT_PROP_NAME));
     }
     if (props.containsKey(HOODIE_RO_FILE_FORMAT_PROP_NAME)) {
       return HoodieFileFormat.valueOf(props.getProperty(HOODIE_RO_FILE_FORMAT_PROP_NAME));
     }
     return DEFAULT_BASE_FILE_FORMAT;
   }
   ```









[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1516: [HUDI-784] Addressing issue with log reader on GCS

2020-04-20 Thread GitBox


afilipchik commented on a change in pull request #1516:
URL: https://github.com/apache/incubator-hudi/pull/1516#discussion_r411560192



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##
@@ -79,6 +79,7 @@
   this.inputStream = fsDataInputStream;
 }
 
+fsDataInputStream.seek(0);

Review comment:
   The magicBuffer check was failing on file open, as it couldn't find the 
beginning of the HUDI block in the log file. 
   That was throwing an exception that was killing compaction. When debugging, 
it appeared that the content of magicBuffer was incorrect and the stream 
offsets were off. It only happened when more than one file was scheduled to be 
processed. I didn't test all the variations originally (non-static magicBuffer 
without seek, and seek with static) as I was in a hurry to fix it, so I'm not 
sure which one actually fixes the issue. Agree that seek(0) is weird and that 
non-static is probably the fix. Added a comment and an additional check..
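
   For illustration, a reduction of the static-vs-instance buffer hazard (the 
reader shape and the `#HUDI#` magic literal are assumptions of this sketch, 
not the actual reader code):

   ```java
   import java.io.IOException;
   import java.io.InputStream;
   import java.util.Arrays;

   class LogReaderSketch {
     // Shared by ALL reader instances: two readers filling this buffer
     // concurrently (or interleaved) corrupt each other's magic check.
     static final byte[] SHARED_MAGIC_BUFFER = new byte[6];

     // Per-instance buffer: each reader owns its scratch space, so the check
     // depends only on this reader's own stream position.
     private final byte[] magicBuffer = new byte[6];

     boolean readMagic(InputStream in) throws IOException {
       int n = in.read(magicBuffer);
       return n == 6 && Arrays.equals(magicBuffer, "#HUDI#".getBytes());
     }
   }
   ```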









[jira] [Updated] (HUDI-801) Add a way to postprocess schema after it is loaded from the schema provider

2020-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-801:

Labels: pull-request-available  (was: )

> Add a way to postprocess schema after it is loaded from the schema provider
> ---
>
> Key: HUDI-801
> URL: https://issues.apache.org/jira/browse/HUDI-801
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Alexander Filipchik
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Sometimes it is needed to postprocess schemas after they are fetched from the 
> external sources. Some examples of postprocessing:
>  * make sure all the defaults are set correctly, and update schema if not.
>  * insert marker columns into records with no fields (no writable as parquest)
>  * ...
> Would be great to have a way to plug in custom post processors.





[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1524: [HUDI-801] Adding a way to post process schema after it is fetched

2020-04-20 Thread GitBox


afilipchik commented on a change in pull request #1524:
URL: https://github.com/apache/incubator-hudi/pull/1524#discussion_r411546036



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestSchemaPostProcessor.java
##
@@ -0,0 +1,61 @@
+package org.apache.hudi.utilities;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+import java.io.IOException;
+import org.apache.avro.Schema;
+import org.apache.avro.Schema.Type;
+import org.apache.avro.SchemaBuilder;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.utilities.schema.SchemaPostProcessor;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+import org.apache.hudi.utilities.schema.SchemaProvider.Config;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.junit.Test;
+
+public class TestSchemaPostProcessor {
+
+  private TypedProperties properties = new TypedProperties();
+
+  @Test
+  public void testPostProcessor() throws IOException {
+properties.put(Config.SCHEMA_POST_PROCESSOR_PROP, 
DummySchemaPostProcessor.class.getName());
+
+JavaSparkContext jsc =
+UtilHelpers.buildSparkContext(this.getClass().getName() + "-hoodie", 
"local[2]");
+SchemaProvider provider =
+UtilHelpers.createSchemaProvider(DummySchemaProvider.class.getName(), 
properties, jsc);
+
+Schema schema = provider.getSourceSchema();
+assertEquals(schema.getType(), Type.RECORD);
+assertEquals(schema.getName(), "test");
+assertNotNull(schema.getField("testString"));
+  }
+
+  public static class DummySchemaProvider extends SchemaProvider {

Review comment:
   done









[jira] [Updated] (HUDI-817) Wrong index filter condition check in HoodieGlobalBloomIndex

2020-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-817:

Labels: pull-request-available  (was: )

> Wrong index filter condition check in HoodieGlobalBloomIndex
> 
>
> Key: HUDI-817
> URL: https://issues.apache.org/jira/browse/HUDI-817
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Index
>Reporter: sivabalan narayanan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> In HoodieGlobalBloomIndex, wrong condition is checked.
>  
> {code:java}
> IndexFileFilter indexFileFilter = config.getBloomIndexPruneByRanges()
> ? new IntervalTreeBasedGlobalIndexFileFilter(partitionToFileIndexInfo)
> : new ListBasedGlobalIndexFileFilter(partitionToFileIndexInfo);
> {code}
>  Instead of config.getBloomIndexPruneByRanges(), it should be 
> config.useBloomIndexTreebasedFilter().





[GitHub] [incubator-hudi] nsivabalan commented on issue #1537: [HUDI-817] fixed building IndexFileFilter with a wrong condition in Hood…

2020-04-20 Thread GitBox


nsivabalan commented on issue #1537:
URL: https://github.com/apache/incubator-hudi/pull/1537#issuecomment-616687122


   LGTM. 







[GitHub] [incubator-hudi] vinothchandar commented on issue #1526: [HUDI-1526] Add pyspark example in quickstart

2020-04-20 Thread GitBox


vinothchandar commented on issue #1526:
URL: https://github.com/apache/incubator-hudi/pull/1526#issuecomment-616686836


   @vingov does this supersede your work? Or could you add more on top? Trying 
to understand how these two are related.. 
   
   In any case, do you mind reviewing this since you have this working at uber 
anyway. 







[jira] [Created] (HUDI-817) Wrong index filter condition check in HoodieGlobalBloomIndex

2020-04-20 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-817:


 Summary: Wrong index filter condition check in 
HoodieGlobalBloomIndex
 Key: HUDI-817
 URL: https://issues.apache.org/jira/browse/HUDI-817
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Index
Reporter: sivabalan narayanan
 Fix For: 0.6.0


In HoodieGlobalBloomIndex, wrong condition is checked.

 
{code:java}
IndexFileFilter indexFileFilter = config.getBloomIndexPruneByRanges()
    ? new IntervalTreeBasedGlobalIndexFileFilter(partitionToFileIndexInfo)
    : new ListBasedGlobalIndexFileFilter(partitionToFileIndexInfo);
{code}
 Instead of config.getBloomIndexPruneByRanges(), it should be 
config.useBloomIndexTreebasedFilter().
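
For reference, the intended construction per the description above:

{code:java}
IndexFileFilter indexFileFilter = config.useBloomIndexTreebasedFilter()
    ? new IntervalTreeBasedGlobalIndexFileFilter(partitionToFileIndexInfo)
    : new ListBasedGlobalIndexFileFilter(partitionToFileIndexInfo);
{code}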





[GitHub] [incubator-hudi] vinothchandar commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-20 Thread GitBox


vinothchandar commented on issue #1491:
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-616684234


   @lamber-ken  since this has come up a few times, worth tracking a jira for 
0.6 that can help get a better default for this?







[GitHub] [incubator-hudi] n3nash commented on issue #1503: [HUDI-371] : Supporting combine input format RT tables

2020-04-20 Thread GitBox


n3nash commented on issue #1503:
URL: https://github.com/apache/incubator-hudi/pull/1503#issuecomment-616684125


   @vinothchandar no new class, for the existing class -> 
https://github.com/apache/incubator-hudi/blob/master/LICENSE#L206
   
   







[GitHub] [incubator-hudi] tooptoop4 opened a new issue #1539: [SUPPORT] Migration new inputformat for hive?

2020-04-20 Thread GitBox


tooptoop4 opened a new issue #1539:
URL: https://github.com/apache/incubator-hudi/issues/1539


   
https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi
 mentions two conflicting names for the Read Optimized view:
   
   
   View Type | Pre v0.5.0 Input Format Class | v0.5.0 Input Format Class
   -- | -- | --
   Read Optimized View | com.uber.hoodie.hadoop.HoodieInputFormat | 
org.apache.hudi.hadoop.HoodieParquetInputFormat
   Realtime View | com.uber.hoodie.hadoop.HoodieRealtimeInputFormat | 
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat
   
   
   
   
   For Read Optimized Tables: ALTER TABLE table_name SET FILEFORMAT 
org.apache.hudi.hadoop.HoodieInputFormat;
   For Realtime Tables : ALTER TABLE table_name SET FILEFORMAT 
org.apache.hudi.hadoop.HoodieRealtimeInputFormat;
   
   
   
   
   So my question: for a COW table, should it be 
org.apache.hudi.hadoop.HoodieParquetInputFormat or 
org.apache.hudi.hadoop.HoodieInputFormat?
   







[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1515: [HUDI-795] Ignoring missing aux folder

2020-04-20 Thread GitBox


afilipchik commented on a change in pull request #1515:
URL: https://github.com/apache/incubator-hudi/pull/1515#discussion_r411536924



##
File path: 
hudi-client/src/main/java/org/apache/hudi/table/HoodieCommitArchiveLog.java
##
@@ -219,14 +220,23 @@ private boolean 
deleteArchivedInstants(List archivedInstants) thr
* @throws IOException in case of error
*/
   private boolean deleteAllInstantsOlderorEqualsInAuxMetaFolder(HoodieInstant 
thresholdInstant) throws IOException {
-    List<HoodieInstant> instants = metaClient.scanHoodieInstantsFromFileSystem(
-        new Path(metaClient.getMetaAuxiliaryPath()), HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE, false);
+    List<HoodieInstant> instants = null;
+    boolean success = true;
+    try {
+      instants =
+          metaClient.scanHoodieInstantsFromFileSystem(
+              new Path(metaClient.getMetaAuxiliaryPath()),
+              HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE,
+              false);
+    } catch (FileNotFoundException e) {

Review comment:
   Will add comment. On create folder -> no, as GCS will create "folder" 
automatically. 









[incubator-hudi] branch hudi_test_suite_refactor updated (d29e41e -> 6465dc4)

2020-04-20 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard d29e41e  [HUDI-394] Provide a basic implementation of test suite
 add 6465dc4  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (d29e41e)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (6465dc4)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 hudi-hadoop-mr/pom.xml  | 4 
 hudi-test-suite/pom.xml | 6 ++
 .../java/org/apache/hudi/testsuite/dag/ComplexDagGenerator.java | 1 -
 3 files changed, 10 insertions(+), 1 deletion(-)



[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-04-20 Thread GitBox


pratyakshsharma commented on issue #1513:
URL: https://github.com/apache/incubator-hudi/pull/1513#issuecomment-616657717


   > Sure. https://issues.apache.org/jira/browse/HUDI-803 tracks this.
   
   https://github.com/apache/incubator-hudi/pull/1538 is raised for this.







[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-04-20 Thread GitBox


pratyakshsharma commented on issue #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538#issuecomment-616650031


   @vinothchandar Please take a pass. 







[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-04-20 Thread GitBox


pratyakshsharma commented on issue #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538#issuecomment-616649362


   A few observations related to issues we have faced recently: 
   
   1. If we specify \"default\": null in the string schema for a field, or 
specify NullNode.getInstance() as the default value when defining a field, then 
invoking field.defaultValue() returns a NullNode instance, which in turn gives 
a JsonProperties.Null instance when field.defaultVal() is invoked. 
   When we try to validate such a record in the rewrite() function of the 
HoodieAvroUtils class, the validate function internally tries to resolve a 
union-type schema. At that point a JsonProperties.Null value is not handled; 
all other data types are. 
   
   In all other cases, defaultVal() either returns a proper data type or null, 
i.e. the defaultValue variable of the Field class is null.
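
   A small self-contained sketch of the behavior described above (assuming 
Avro 1.8.x; class and field names are placeholders):

   ```java
   import org.apache.avro.Schema;

   public class NullDefaultSketch {
     public static void main(String[] args) {
       Schema schema = new Schema.Parser().parse(
           "{\"type\":\"record\",\"name\":\"test\",\"fields\":["
               + "{\"name\":\"f\",\"type\":[\"null\",\"string\"],\"default\":null}]}");
       Schema.Field field = schema.getField("f");
       // Avro 1.8: defaultValue() yields a NullNode; defaultVal() yields a
       // JsonProperties.Null -- the case rewrite()/validate() must handle when
       // resolving the union schema.
       System.out.println(field.defaultValue().getClass().getSimpleName()); // NullNode
       System.out.println(field.defaultVal().getClass().getSimpleName());   // Null
     }
   }
   ```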







[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1516: [HUDI-784] Addressing issue with log reader on GCS

2020-04-20 Thread GitBox


bvaradar commented on a change in pull request #1516:
URL: https://github.com/apache/incubator-hudi/pull/1516#discussion_r411494903



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##
@@ -59,7 +59,7 @@
 
   private final FSDataInputStream inputStream;
   private final HoodieLogFile logFile;
-  private static final byte[] MAGIC_BUFFER = new byte[6];
+  private final byte[] magicBuffer = new byte[6];

Review comment:
   makes sense to be non-static.

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##
@@ -79,6 +79,7 @@
   this.inputStream = fsDataInputStream;
 }
 
+fsDataInputStream.seek(0);

Review comment:
   What was the original exception you were seeing? 
   
   This seek() call is immediately after open(), so it should be innocuous for 
other file systems, right?
   
   It would be better to inline-document the reasoning behind this change for 
future understanding.









[GitHub] [incubator-hudi] pratyakshsharma opened a new pull request #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-04-20 Thread GitBox


pratyakshsharma opened a new pull request #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.







[jira] [Updated] (HUDI-803) Improve Unit test coverage of HoodieAvroUtils around default values

2020-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-803:

Labels: pull-request-available  (was: )

> Improve Unit test coverage of HoodieAvroUtils around default values
> ---
>
> Key: HUDI-803
> URL: https://issues.apache.org/jira/browse/HUDI-803
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Recently there has been lot of work and improvements around schema evolution 
> and HoodieAvroUtils class in particular. Few bugs have already been fixed 
> around this. With the version bump of avro from 1.7.7 to 1.8.2, the flow 
> around default values of Schema.Field has changed significantly. This Jira 
> aims to improve the test coverage of HoodieAvroUtils class so that our 
> functionality remains intact with respect to default values and schema 
> evolution. 





[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1515: [HUDI-795] Ignoring missing aux folder

2020-04-20 Thread GitBox


bvaradar commented on a change in pull request #1515:
URL: https://github.com/apache/incubator-hudi/pull/1515#discussion_r411490140



##
File path: 
hudi-client/src/main/java/org/apache/hudi/table/HoodieCommitArchiveLog.java
##
@@ -219,14 +220,23 @@ private boolean 
deleteArchivedInstants(List archivedInstants) thr
* @throws IOException in case of error
*/
   private boolean deleteAllInstantsOlderorEqualsInAuxMetaFolder(HoodieInstant 
thresholdInstant) throws IOException {
-    List<HoodieInstant> instants = metaClient.scanHoodieInstantsFromFileSystem(
-        new Path(metaClient.getMetaAuxiliaryPath()), HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE, false);
+    List<HoodieInstant> instants = null;
+    boolean success = true;
+    try {
+      instants =
+          metaClient.scanHoodieInstantsFromFileSystem(
+              new Path(metaClient.getMetaAuxiliaryPath()),
+              HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE,
+              false);
+    } catch (FileNotFoundException e) {

Review comment:
   @afilipchik : Can you add the above comment in the exception code block? It 
makes the reason for this code easy to understand. Also, does this mean we need 
to create the "aux" folder the next time we create a file under aux?









[jira] [Commented] (HUDI-773) Hudi On Azure Data Lake Storage V2

2020-04-20 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087869#comment-17087869
 ] 

Vinoth Chandar commented on HUDI-773:
-

[~sasikumar.venkat] Happy to work with you and get this ironed out.. Could you 
please paste the entire stack trace for the error? Not super familiar with 
Azure, but that can help start some troubleshooting.

> Hudi On Azure Data Lake Storage V2
> --
>
> Key: HUDI-773
> URL: https://issues.apache.org/jira/browse/HUDI-773
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Usability
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Minor
> Fix For: 0.6.0
>
>






[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1512: [HUDI-763] Add hoodie.table.base.file.format option to hoodie.properties file

2020-04-20 Thread GitBox


bvaradar commented on a change in pull request #1512:
URL: https://github.com/apache/incubator-hudi/pull/1512#discussion_r411480163



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -131,6 +131,9 @@ public static void createHoodieProperties(FileSystem fs, 
Path metadataFolder, Pr
 // Use latest Version as default unless forced by client
 properties.setProperty(HOODIE_TIMELINE_LAYOUT_VERSION, 
TimelineLayoutVersion.CURR_VERSION.toString());
   }
+  if (!properties.containsKey(HOODIE_BASE_FILE_FORMAT_PROP_NAME)) {
+properties.setProperty(HOODIE_BASE_FILE_FORMAT_PROP_NAME, 
DEFAULT_BASE_FILE_FORMAT.name());
+  }

Review comment:
   Can you also add a getBaseFileFormat() API which returns PARQUET if not 
set in hoodie.properties.









[GitHub] [incubator-hudi] vinothchandar commented on issue #1537: [MINOR] fixed building IndexFileFilter with a wrong condition in Hood…

2020-04-20 Thread GitBox


vinothchandar commented on issue #1537:
URL: https://github.com/apache/incubator-hudi/pull/1537#issuecomment-616637515


   thanks for the catch @Jecarm ..
   
   @nsivabalan can you please review this. This deserves a JIRA since it's an 
actual bug fix.. (performance should improve, correctness should be the same)







[GitHub] [incubator-hudi] vinothchandar commented on issue #1536: [HUDI-816] Fix MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678

2020-04-20 Thread GitBox


vinothchandar commented on issue #1536:
URL: https://github.com/apache/incubator-hudi/pull/1536#issuecomment-616635927


   an accompanying test case would be great!







[GitHub] [incubator-hudi] vinothchandar commented on issue #1503: [HUDI-371] : Supporting combine input format RT tables

2020-04-20 Thread GitBox


vinothchandar commented on issue #1503:
URL: https://github.com/apache/incubator-hudi/pull/1503#issuecomment-616634319


   @bvaradar @n3nash any code that is reused from other projects here? (asking 
since this is Hive and combine input splits)..







[incubator-hudi] branch master updated: [HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource (#1500)

2020-04-20 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new ddd105b  [HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource (#1500)
ddd105b is described below

commit ddd105bb3119174b613c6917ee25795f2939f430
Author: Dongwook 
AuthorDate: Mon Apr 20 08:38:18 2020 -0700

[HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource (#1500)
---
 .../org/apache/hudi/config/HoodieWriteConfig.java  | 10 +++
 .../main/java/org/apache/hudi/DataSourceUtils.java | 27 ++-
 hudi-spark/src/test/java/DataSourceTestUtils.java  | 13 
 hudi-spark/src/test/java/DataSourceUtilsTest.java  | 86 ++
 4 files changed, 134 insertions(+), 2 deletions(-)

diff --git a/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java b/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
index 5ac87da..50af725 100644
--- a/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
+++ b/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
@@ -57,6 +57,7 @@ public class HoodieWriteConfig extends DefaultHoodieConfig {
   private static final String DEFAULT_PARALLELISM = "1500";
   private static final String INSERT_PARALLELISM = "hoodie.insert.shuffle.parallelism";
   private static final String BULKINSERT_PARALLELISM = "hoodie.bulkinsert.shuffle.parallelism";
+  private static final String BULKINSERT_USER_DEFINED_PARTITIONER_CLASS = "hoodie.bulkinsert.user.defined.partitioner.class";
   private static final String UPSERT_PARALLELISM = "hoodie.upsert.shuffle.parallelism";
   private static final String DELETE_PARALLELISM = "hoodie.delete.shuffle.parallelism";
   private static final String DEFAULT_ROLLBACK_PARALLELISM = "100";
@@ -157,6 +158,10 @@ public class HoodieWriteConfig extends DefaultHoodieConfig {
     return Integer.parseInt(props.getProperty(BULKINSERT_PARALLELISM));
   }
 
+  public String getUserDefinedBulkInsertPartitionerClass() {
+    return props.getProperty(BULKINSERT_USER_DEFINED_PARTITIONER_CLASS);
+  }
+
   public int getInsertShuffleParallelism() {
     return Integer.parseInt(props.getProperty(INSERT_PARALLELISM));
   }
@@ -603,6 +608,11 @@ public class HoodieWriteConfig extends DefaultHoodieConfig {
       return this;
     }
 
+    public Builder withUserDefinedBulkInsertPartitionerClass(String className) {
+      props.setProperty(BULKINSERT_USER_DEFINED_PARTITIONER_CLASS, className);
+      return this;
+    }
+
     public Builder withParallelism(int insertShuffleParallelism, int upsertShuffleParallelism) {
       props.setProperty(INSERT_PARALLELISM, String.valueOf(insertShuffleParallelism));
       props.setProperty(UPSERT_PARALLELISM, String.valueOf(upsertShuffleParallelism));
diff --git a/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java b/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
index 7a4caac..34f2ef2 100644
--- a/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
+++ b/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
@@ -25,7 +25,9 @@ import org.apache.hudi.common.config.TypedProperties;
 import org.apache.hudi.common.model.HoodieKey;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ReflectionUtils;
+import org.apache.hudi.common.util.StringUtils;
 import org.apache.hudi.config.HoodieCompactionConfig;
 import org.apache.hudi.config.HoodieIndexConfig;
 import org.apache.hudi.config.HoodieWriteConfig;
@@ -36,6 +38,7 @@ import org.apache.hudi.hive.HiveSyncConfig;
 import org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor;
 import org.apache.hudi.index.HoodieIndex;
 import org.apache.hudi.keygen.KeyGenerator;
+import org.apache.hudi.table.UserDefinedBulkInsertPartitioner;
 
 import org.apache.avro.LogicalTypes;
 import org.apache.avro.Schema;
@@ -153,6 +156,24 @@ public class DataSourceUtils {
   }
 
   /**
+   * Create a UserDefinedBulkInsertPartitioner class via reflection,
+   * 
+   * if the class name of UserDefinedBulkInsertPartitioner is configured through the HoodieWriteConfig.
+   * @see HoodieWriteConfig#getUserDefinedBulkInsertPartitionerClass()
+   */
+  private static Option<UserDefinedBulkInsertPartitioner> createUserDefinedBulkInsertPartitioner(HoodieWriteConfig config)
+      throws HoodieException {
+    String bulkInsertPartitionerClass = config.getUserDefinedBulkInsertPartitionerClass();
+    try {
+      return StringUtils.isNullOrEmpty(bulkInsertPartitionerClass)
+          ? Option.empty() :
+          Option.of((UserDefinedBulkInsertPartitioner) ReflectionUtils.loadClass(bulkInsertPartitionerClass));
+    } catch (Throwable e) {
+      throw new HoodieException("C
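
For illustration, a minimal sketch of how the new setting could be used end to
end. CoalescingPartitioner is hypothetical, and the repartitionRecords
signature is assumed from the UserDefinedBulkInsertPartitioner interface in
this repository; the builder method and the
hoodie.bulkinsert.user.defined.partitioner.class key come from the commit
above.

import org.apache.hudi.common.model.HoodieRecord;
import org.apache.hudi.common.model.HoodieRecordPayload;
import org.apache.hudi.config.HoodieWriteConfig;
import org.apache.hudi.table.UserDefinedBulkInsertPartitioner;
import org.apache.spark.api.java.JavaRDD;

// Hypothetical partitioner: coalesce to the requested bulk-insert parallelism
// instead of globally sorting the records.
public class CoalescingPartitioner<T extends HoodieRecordPayload>
    implements UserDefinedBulkInsertPartitioner<T> {

  @Override
  public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> records,
      int outputSparkPartitions) {
    return records.coalesce(outputSparkPartitions);
  }
}

// Wiring it in: DataSourceUtils loads the class reflectively, so a no-arg
// constructor is required. withPath(...) is only here to make the builder valid.
class Example {
  static HoodieWriteConfig buildConfig() {
    return HoodieWriteConfig.newBuilder()
        .withPath("/tmp/hudi/sample_table")
        .withUserDefinedBulkInsertPartitionerClass(CoalescingPartitioner.class.getName())
        .build();
  }
}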

[GitHub] [incubator-hudi] bvaradar commented on issue #1512: [HUDI-763] Add hoodie.table.base.file.format option to hoodie.properties file

2020-04-20 Thread GitBox


bvaradar commented on issue #1512:
URL: https://github.com/apache/incubator-hudi/pull/1512#issuecomment-616631319


   @lamber-ken : Storing per-partition metadata in hoodie.properties won't 
work, as we are not versioning hoodie.properties and there are no atomicity 
guarantees across the different cloud storages for writers.
   
   I think supporting different file formats within a table is not a priority, 
but if we have to do it, we could instead store the format in 
.hoodie_partition_metadata the first time we create each partition. Hudi Hive 
Sync would then need to read this for each new partition getting added to 
Hive, to register the correct input format for that partition. I am not sure 
how Spark, Presto and Impala would work in this case; we need to evaluate that 
before venturing out to support it.
   
   Coming back to the original objective of this PR: since hoodie.properties 
is effectively write-once, we can use it to store the default file format of 
the hoodie table. The file format should be set once, in 
HoodieTableMetaClient.initTableType alone. To support existing tables, we can 
keep the default storage layout as "PARQUET": if the setting is not present in 
hoodie.properties (for already created tables), we should use "PARQUET" as the 
default.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-760) Remove Rolling Stat management from Hudi Writer

2020-04-20 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087835#comment-17087835
 ] 

Balaji Varadarajan commented on HUDI-760:
-

 Hi [~baobaoyeye]: Sorry for the delay. Here is some extra information.

Hudi had an earlier implementation that kept all consolidated metadata in 
every "commit" file. We disabled it by no longer writing this data, but the 
code still has references to rolling stats:

1. org.apache.hudi.common.model.HoodieRollingStatMetadata

2. hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java

3. hudi-client/src/test/java/org/apache/hudi/table/TestMergeOnReadTable.java

4. hudi-client/src/test/java/org/apache/hudi/client/TestHoodieClientOnCopyOnWriteStorage.java

Hope this helps! Let us know if you plan to take it up.

Thanks for offering to contribute.

Balaji.V

 

> Remove Rolling Stat management from Hudi Writer
> ---
>
> Key: HUDI-760
> URL: https://issues.apache.org/jira/browse/HUDI-760
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: help-wanted, newbie,
> Fix For: 0.6.0
>
>
> Current implementation of rolling stat is not scalable. As Consolidated 
> Metadata will be implemented eventually, we can have one design to manage 
> file-level stats too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-760) Remove Rolling Stat management from Hudi Writer

2020-04-20 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-760:

Status: Open  (was: New)

> Remove Rolling Stat management from Hudi Writer
> ---
>
> Key: HUDI-760
> URL: https://issues.apache.org/jira/browse/HUDI-760
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: help-wanted, newbie,
> Fix For: 0.6.0
>
>
> Current implementation of rolling stat is not scalable. As Consolidated 
> Metadata will be implemented eventually, we can have one design to manage 
> file-level stats too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] Jecarm opened a new pull request #1537: [MINOR] fixed building IndexFileFilter with a wrong condition in Hood…

2020-04-20 Thread GitBox


Jecarm opened a new pull request #1537:
URL: https://github.com/apache/incubator-hudi/pull/1537


   ## What is the purpose of the pull request
   Fixed a bug (#672): IndexFileFilter was built with a wrong condition in the 
HoodieGlobalBloomIndex class.
   
   ## Brief change log
   Fixed the wrong config parameter used when building IndexFileFilter in the 
HoodieGlobalBloomIndex class; a sketch of the fix pattern follows below.
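   
   A sketch of the fix pattern rather than the verbatim change (the filter 
classes exist in Hudi's bloom index package, but the exact flag that was mixed 
up is assumed here):
   
   ```java
   // The tree-based filter should be chosen based on the tree-filter flag
   // itself, not on an unrelated config parameter.
   IndexFileFilter fileFilter = config.useBloomIndexTreebasedFilter()
       ? new IntervalTreeBasedGlobalIndexFileFilter(partitionToFileIndexInfo)
       : new ListBasedGlobalIndexFileFilter(partitionToFileIndexInfo);
   ```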
   
   ## Verify this pull request
   No validation required
   
   ## Committer checklist



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-815) Typo in Demo's document

2020-04-20 Thread Lisheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Wang reassigned HUDI-815:
-

Assignee: Lisheng Wang

> Typo in Demo's document
> ---
>
> Key: HUDI-815
> URL: https://issues.apache.org/jira/browse/HUDI-815
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Docs
>Reporter: Lisheng Wang
>Assignee: Lisheng Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> I found a typo in 
> [https://hudi.apache.org/docs/docker_demo.html#bringing-up-demo-cluster]: in 
> "Kafka Broker and a Zookeeper Node (Kakfa will be used as upstream source for 
> the demo)", "Kakfa" should be "Kafka".
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-816) Fix MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678

2020-04-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-816:
---
Summary: Fix MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP 
do not work  due to HUDI-678  (was: Fixed MAX_MEMORY_FOR_MERGE_PROP and 
MAX_MEMORY_FOR_COMPACTION_PROP do not work  due to HUDI-678)

> Fix MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work  
> due to HUDI-678
> -
>
> Key: HUDI-816
> URL: https://issues.apache.org/jira/browse/HUDI-816
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-815) Typo in Demo's document

2020-04-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-815.

Fix Version/s: 0.6.0
   Resolution: Fixed

> Typo in Demo's document
> ---
>
> Key: HUDI-815
> URL: https://issues.apache.org/jira/browse/HUDI-815
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Docs
>Reporter: Lisheng Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> I found a typo in 
> [https://hudi.apache.org/docs/docker_demo.html#bringing-up-demo-cluster]: in 
> "Kafka Broker and a Zookeeper Node (Kakfa will be used as upstream source for 
> the demo)", "Kakfa" should be "Kafka".
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-816) Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678

2020-04-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-816:
---
Status: Open  (was: New)

> Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not 
> work  due to HUDI-678
> ---
>
> Key: HUDI-816
> URL: https://issues.apache.org/jira/browse/HUDI-816
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-816) Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678

2020-04-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-816:
---
Fix Version/s: 0.6.0

> Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not 
> work  due to HUDI-678
> ---
>
> Key: HUDI-816
> URL: https://issues.apache.org/jira/browse/HUDI-816
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf commented on issue #1536: [HUDI-816] Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678

2020-04-20 Thread GitBox


leesf commented on issue #1536:
URL: https://github.com/apache/incubator-hudi/pull/1536#issuecomment-616533827


   @lamber-ken Thanks for reporting this; please take a look when you are free.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-816) Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678

2020-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-816:

Labels: pull-request-available  (was: )

> Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not 
> work  due to HUDI-678
> ---
>
> Key: HUDI-816
> URL: https://issues.apache.org/jira/browse/HUDI-816
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf opened a new pull request #1536: [HUDI-816] Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678

2020-04-20 Thread GitBox


leesf opened a new pull request #1536:
URL: https://github.com/apache/incubator-hudi/pull/1536


   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not 
work due to HUDI-678
   
   ## Brief change log
   
   Modified SparkConfigUtils.java; a sketch of the fix pattern follows below.
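   
   A sketch of the fix pattern (the property key matches Hudi's 
hoodie.memory.merge.max.size; the default value, class and method names below 
are assumed stand-ins, not the verbatim change): read the user-supplied value 
first and fall back to the default only when it is absent, which stopped 
happening after the HUDI-678 refactor.
   
   ```java
   import java.util.Properties;
   
   class SparkConfigUtilsSketch {
     // Property key as reported; the default below is an assumed stand-in.
     static final String MAX_MEMORY_FOR_MERGE_PROP = "hoodie.memory.merge.max.size";
     static final long DEFAULT_MAX_MEMORY_FOR_MERGE = 1024L * 1024 * 1024;
   
     // Honor an explicitly configured value; fall back only when absent.
     static long getMaxMemoryAllowedForMerge(Properties props) {
       return Long.parseLong(props.getProperty(
           MAX_MEMORY_FOR_MERGE_PROP, String.valueOf(DEFAULT_MAX_MEMORY_FOR_MERGE)));
     }
   }
   ```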
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-816) Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678

2020-04-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-816:
---
Component/s: (was: Writer Core)

> Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not 
> work  due to HUDI-678
> ---
>
> Key: HUDI-816
> URL: https://issues.apache.org/jira/browse/HUDI-816
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-816) Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678

2020-04-20 Thread leesf (Jira)
leesf created HUDI-816:
--

 Summary: Fixed MAX_MEMORY_FOR_MERGE_PROP and 
MAX_MEMORY_FOR_COMPACTION_PROP do not work  due to HUDI-678
 Key: HUDI-816
 URL: https://issues.apache.org/jira/browse/HUDI-816
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Writer Core
Reporter: leesf
Assignee: leesf






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] wangxianghu commented on issue #1535: [MINOR] Remove redundant code and fix typo in HoodieDefaultTimeline

2020-04-20 Thread GitBox


wangxianghu commented on issue #1535:
URL: https://github.com/apache/incubator-hudi/pull/1535#issuecomment-616526186


   hi @yanghua, could you please take a look, thanks 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-773) Hudi On Azure Data Lake Storage V2

2020-04-20 Thread Sasikumar Venkatesh (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087672#comment-17087672
 ] 

Sasikumar Venkatesh commented on HUDI-773:
--

My cluster is set up on Databricks, and I have attached my storage account to 
the cluster.

I have tried:
 # Adding my container in ADLS as a mount point in the Databricks cluster.
 # Configuring a Service Principal in Azure to access it via OAuth.

I think both methods go through the API.
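
For reference, a minimal sketch of the service-principal (OAuth) settings the 
hadoop-azure ABFS connector expects when reading such a table directly with 
Spark; every account, tenant, client, and path value below is a placeholder:

{code:java}
import org.apache.spark.sql.SparkSession;

public class AdlsGen2OAuthExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("hudi-on-adls").getOrCreate();
    // Placeholders: <storage-account>, <tenant-id>, <client-id>, <client-secret>, <container>.
    String account = "<storage-account>.dfs.core.windows.net";
    spark.conf().set("fs.azure.account.auth.type." + account, "OAuth");
    spark.conf().set("fs.azure.account.oauth.provider.type." + account,
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
    spark.conf().set("fs.azure.account.oauth2.client.id." + account, "<client-id>");
    spark.conf().set("fs.azure.account.oauth2.client.secret." + account, "<client-secret>");
    spark.conf().set("fs.azure.account.oauth2.client.endpoint." + account,
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token");

    // Read a Hudi table stored on ADLS Gen2 through the abfss scheme.
    spark.read().format("org.apache.hudi")
        .load("abfss://<container>@" + account + "/path/to/table/*/*")
        .show();
  }
}
{code}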

 

> Hudi On Azure Data Lake Storage V2
> --
>
> Key: HUDI-773
> URL: https://issues.apache.org/jira/browse/HUDI-773
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Usability
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Minor
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1535: [MINOR] Remove redundant code and fix typo in HoodieDefaultTimeline

2020-04-20 Thread GitBox


wangxianghu opened a new pull request #1535:
URL: https://github.com/apache/incubator-hudi/pull/1535


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *Remove redundant code and fix typo in HoodieDefaultTimeline*
   
   ## Brief change log
   
   *Remove redundant code and fix typo in HoodieDefaultTimeline*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[incubator-hudi] branch asf-site updated: [HUDI-815] Fix typo(Kakfa -> Kafka) (#1534)

2020-04-20 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 2bba463  [HUDI-815] Fix typo(Kakfa -> Kafka) (#1534)
2bba463 is described below

commit 2bba4638ba2dc1b89ff27cf06aa38ba66bcf94ba
Author: wanglisheng81 <37138788+wanglishen...@users.noreply.github.com>
AuthorDate: Mon Apr 20 20:22:34 2020 +0800

[HUDI-815] Fix typo(Kakfa -> Kafka) (#1534)
---
 docs/_docs/0_4_docker_demo.cn.md | 2 +-
 docs/_docs/0_4_docker_demo.md| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/_docs/0_4_docker_demo.cn.md b/docs/_docs/0_4_docker_demo.cn.md
index b50e3b4..e313f52 100644
--- a/docs/_docs/0_4_docker_demo.cn.md
+++ b/docs/_docs/0_4_docker_demo.cn.md
@@ -90,7 +90,7 @@ At this point, the docker cluster will be up and running. The demo cluster bring
* HDFS Services (NameNode, DataNode)
* Spark Master and Worker
* Hive Services (Metastore, HiveServer2 along with PostgresDB)
-   * Kafka Broker and a Zookeeper Node (Kakfa will be used as upstream source for the demo)
+   * Kafka Broker and a Zookeeper Node (Kafka will be used as upstream source for the demo)
* Adhoc containers to run Hudi/Hive CLI commands
 
 ## Demo
diff --git a/docs/_docs/0_4_docker_demo.md b/docs/_docs/0_4_docker_demo.md
index 448b5c1..f69b72e 100644
--- a/docs/_docs/0_4_docker_demo.md
+++ b/docs/_docs/0_4_docker_demo.md
@@ -91,7 +91,7 @@ At this point, the docker cluster will be up and running. The demo cluster bring
* HDFS Services (NameNode, DataNode)
* Spark Master and Worker
* Hive Services (Metastore, HiveServer2 along with PostgresDB)
-   * Kafka Broker and a Zookeeper Node (Kakfa will be used as upstream source for the demo)
+   * Kafka Broker and a Zookeeper Node (Kafka will be used as upstream source for the demo)
* Adhoc containers to run Hudi/Hive CLI commands
 
 ## Demo



[jira] [Updated] (HUDI-815) Typo in Demo's document

2020-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-815:

Labels: pull-request-available  (was: )

> Typo in Demo's document
> ---
>
> Key: HUDI-815
> URL: https://issues.apache.org/jira/browse/HUDI-815
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Docs
>Reporter: Lisheng Wang
>Priority: Minor
>  Labels: pull-request-available
>
> I found a typo in 
> [https://hudi.apache.org/docs/docker_demo.html#bringing-up-demo-cluster]: in 
> "Kafka Broker and a Zookeeper Node (Kakfa will be used as upstream source for 
> the demo)", "Kakfa" should be "Kafka".
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

