[incubator-hudi] branch master updated: [HUDI-852] adding check for table name for Append Save mode (#1580)

2020-05-03 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 5e0f5e5  [HUDI-852] adding check for table name for Append Save mode  (#1580)
5e0f5e5 is described below

commit 5e0f5e5521c52a08c284a92a3a6a00e34805cce5
Author: AakashPradeep 
AuthorDate: Sun May 3 23:09:17 2020 -0700

[HUDI-852] adding check for table name for Append Save mode  (#1580)

* adding check for table name for Append Save mode

* adding existing table validation for delete and upsert operation

Co-authored-by: Aakash Pradeep 
---
 .../org/apache/hudi/HoodieSparkSqlWriter.scala |  9 +++-
 .../apache/hudi/HoodieSparkSqlWriterSuite.scala| 60 --
 2 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala b/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
index 66b6145..5456782 100644
--- a/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
+++ b/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
@@ -25,11 +25,11 @@ import org.apache.hadoop.fs.{FileSystem, Path}
 import org.apache.hadoop.hive.conf.HiveConf
 import org.apache.hudi.DataSourceWriteOptions._
 import org.apache.hudi.client.{HoodieWriteClient, WriteStatus}
+import org.apache.hudi.common.config.TypedProperties
 import org.apache.hudi.common.fs.FSUtils
 import org.apache.hudi.common.model.HoodieRecordPayload
 import org.apache.hudi.common.table.HoodieTableMetaClient
 import org.apache.hudi.common.table.timeline.HoodieActiveTimeline
-import org.apache.hudi.common.config.TypedProperties
 import org.apache.hudi.config.HoodieWriteConfig
 import org.apache.hudi.exception.HoodieException
 import org.apache.hudi.hive.{HiveSyncConfig, HiveSyncTool}
@@ -83,6 +83,13 @@ private[hudi] object HoodieSparkSqlWriter {
     val fs = basePath.getFileSystem(sparkContext.hadoopConfiguration)
     var exists = fs.exists(new Path(basePath, HoodieTableMetaClient.METAFOLDER_NAME))
 
+    if (exists && mode == SaveMode.Append) {
+      val existingTableName = new HoodieTableMetaClient(sparkContext.hadoopConfiguration, path.get).getTableConfig.getTableName
+      if (!existingTableName.equals(tblName.get)) {
+        throw new HoodieException(s"hoodie table with name $existingTableName already exist at $basePath")
+      }
+    }
+
     val (writeStatuses, writeClient: HoodieWriteClient[HoodieRecordPayload[Nothing]]) =
       if (!operation.equalsIgnoreCase(DELETE_OPERATION_OPT_VAL)) {
         // register classes & schemas
diff --git a/hudi-spark/src/test/scala/org/apache/hudi/HoodieSparkSqlWriterSuite.scala b/hudi-spark/src/test/scala/org/apache/hudi/HoodieSparkSqlWriterSuite.scala
index 58ca984..bb82f8d 100644
--- a/hudi-spark/src/test/scala/org/apache/hudi/HoodieSparkSqlWriterSuite.scala
+++ b/hudi-spark/src/test/scala/org/apache/hudi/HoodieSparkSqlWriterSuite.scala
@@ -17,6 +17,9 @@
 
 package org.apache.hudi
 
+import java.util.{Date, UUID}
+
+import org.apache.commons.io.FileUtils
 import org.apache.hudi.DataSourceWriteOptions._
 import org.apache.hudi.config.HoodieWriteConfig
 import org.apache.hudi.exception.HoodieException
@@ -43,10 +46,59 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
 
   test("throw hoodie exception when invalid serializer") {
     val session = SparkSession.builder().appName("hoodie_test").master("local").getOrCreate()
-    val sqlContext = session.sqlContext
-    val options = Map("path" -> "hoodie/test/path", HoodieWriteConfig.TABLE_NAME -> "hoodie_test_tbl")
-    val e = intercept[HoodieException](HoodieSparkSqlWriter.write(sqlContext, SaveMode.ErrorIfExists, options, session.emptyDataFrame))
-    assert(e.getMessage.contains("spark.serializer"))
+    try {
+      val sqlContext = session.sqlContext
+      val options = Map("path" -> "hoodie/test/path", HoodieWriteConfig.TABLE_NAME -> "hoodie_test_tbl")
+      val e = intercept[HoodieException](HoodieSparkSqlWriter.write(sqlContext, SaveMode.ErrorIfExists, options, session.emptyDataFrame))
+      assert(e.getMessage.contains("spark.serializer"))
+    } finally {
+      session.stop()
+    }
+  }
+
+
+  test("throw hoodie exception when there already exist a table with different name with Append Save mode") {
+
+    val session = SparkSession.builder()
+      .appName("test_append_mode")
+      .master("local[2]")
+      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
+      .getOrCreate()
+    val path = java.nio.file.Files.createTempDirectory("hoodie_test_path")
+    try {
+
+      val sqlContext = session.sqlContext
+      val hoodieFooTableName = "hoodie_foo_tbl"
+
+      //create a new table
+      val fooTableModifier = Map("path" -> path.toAbsolutePath.toString,
+H
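
The diff above is cut off by the archive, but the guard's effect is easy to reproduce. Below is a minimal sketch (not part of the commit) of how the new check surfaces to a Spark user; it assumes the quickstart setup (`dataGen`, `getQuickstartWriteConfigs`, `basePath`) and a table already written at `basePath`, and mirrors the spark-shell session quoted later in this digest:

```scala
// spark-shell, with the Hudi spark bundle on the classpath (hedged sketch)
import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig.TABLE_NAME
import org.apache.spark.sql.SaveMode.Append
import scala.collection.JavaConversions._

val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
df.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option(TABLE_NAME, "foo_table"). // differs from the table name already recorded at basePath
  mode(Append).
  save(basePath)
// => org.apache.hudi.exception.HoodieException:
//    hoodie table with name hudi_trips_cow already exist at file:/tmp/hudi_trips_cow
```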

[GitHub] [incubator-hudi] bhasudha commented on pull request #1580: [HUDI-852] adding check for table name for Append Save mode

2020-05-03 Thread GitBox


bhasudha commented on pull request #1580:
URL: https://github.com/apache/incubator-hudi/pull/1580#issuecomment-623275628


   LGTM. Merging.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (HUDI-813) Migrate hudi-utilities tests to JUnit 5

2020-05-03 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-813.
-
Resolution: Done

Done via master branch: 096f7f55b2553265c0b72f42a1eb7f291e5626ad

> Migrate hudi-utilities tests to JUnit 5
> ---
>
> Key: HUDI-813
> URL: https://issues.apache.org/jira/browse/HUDI-813
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[incubator-hudi] branch master updated: [HUDI-813] Migrate hudi-utilities tests to JUnit 5 (#1589)

2020-05-03 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 096f7f5  [HUDI-813] Migrate hudi-utilities tests to JUnit 5 (#1589)
096f7f5 is described below

commit 096f7f55b2553265c0b72f42a1eb7f291e5626ad
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Sun May 3 21:43:42 2020 -0700

[HUDI-813] Migrate hudi-utilities tests to JUnit 5 (#1589)
---
 .../TestAWSDatabaseMigrationServiceSource.java | 16 ++--
 .../hudi/utilities/TestHDFSParquetImporter.java| 33 
 .../hudi/utilities/TestHiveIncrementalPuller.java  | 15 ++--
 .../hudi/utilities/TestHoodieDeltaStreamer.java| 88 ++
 .../TestHoodieMultiTableDeltaStreamer.java | 78 +--
 .../utilities/TestJdbcbasedSchemaProvider.java | 12 +--
 .../hudi/utilities/TestSchedulerConfGenerator.java | 16 ++--
 .../utilities/TestTimestampBasedKeyGenerator.java  | 16 ++--
 .../org/apache/hudi/utilities/TestUtilHelpers.java | 48 ++--
 .../apache/hudi/utilities/UtilitiesTestBase.java   | 16 ++--
 .../utilities/inline/fs/TestParquetInLining.java   | 10 +--
 .../sources/AbstractDFSSourceTestBase.java | 22 +++---
 .../hudi/utilities/sources/TestCsvDFSSource.java   |  4 +-
 .../hudi/utilities/sources/TestJsonDFSSource.java  |  4 +-
 .../hudi/utilities/sources/TestKafkaSource.java| 20 ++---
 .../utilities/sources/TestParquetDFSSource.java|  4 +-
 .../transform/TestFlatteningTransformer.java   |  4 +-
 17 files changed, 192 insertions(+), 214 deletions(-)

diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestAWSDatabaseMigrationServiceSource.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestAWSDatabaseMigrationServiceSource.java
index d015a42..1fb45f0 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestAWSDatabaseMigrationServiceSource.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestAWSDatabaseMigrationServiceSource.java
@@ -28,29 +28,29 @@ import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.sql.Dataset;
 import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SparkSession;
-import org.junit.AfterClass;
-import org.junit.BeforeClass;
-import org.junit.Test;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
 
 import java.io.IOException;
 import java.io.Serializable;
 import java.util.Arrays;
 
-import static org.junit.Assert.assertFalse;
-import static org.junit.Assert.assertTrue;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
 
 public class TestAWSDatabaseMigrationServiceSource {
 
   private static JavaSparkContext jsc;
   private static SparkSession spark;
 
-  @BeforeClass
+  @BeforeAll
   public static void setupTest() {
 jsc = UtilHelpers.buildSparkContext("aws-dms-test", "local[2]");
 spark = SparkSession.builder().config(jsc.getConf()).getOrCreate();
   }
 
-  @AfterClass
+  @AfterAll
   public static void tearDownTest() {
 if (jsc != null) {
   jsc.stop();
@@ -99,7 +99,7 @@ public class TestAWSDatabaseMigrationServiceSource {
 new Record("2", 3433L)), Record.class);
 
     Dataset<Row> outputFrame = transformer.apply(jsc, spark, inputFrame, null);
-    assertTrue(Arrays.asList(outputFrame.schema().fields()).stream()
+    assertTrue(Arrays.stream(outputFrame.schema().fields())
         .map(f -> f.name()).anyMatch(n -> n.equals(AWSDmsAvroPayload.OP_FIELD)));
 
     assertTrue(outputFrame.select(AWSDmsAvroPayload.OP_FIELD).collectAsList().stream()
         .allMatch(r -> r.getString(0).equals("")));
diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHDFSParquetImporter.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHDFSParquetImporter.java
index a4711b5..cf6cf75 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHDFSParquetImporter.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHDFSParquetImporter.java
@@ -41,12 +41,11 @@ import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.sql.Dataset;
 import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SQLContext;
-
-import org.junit.After;
-import org.junit.AfterClass;
-import org.junit.Before;
-import org.junit.BeforeClass;
-import org.junit.Test;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
 
 import java.io.IOException;
 import java.io.Serializable;
@@ -61,8 +60,8 @@ import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
 import java.util.stream.
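
The mechanical change applied throughout this commit is the standard JUnit 4 to JUnit 5 (Jupiter) rename. A minimal sketch of that mapping, written in Scala for consistency with the rest of this digest (class and member names are illustrative, not from the commit):

```scala
// JUnit 4 -> JUnit 5 mapping used in this commit:
//   org.junit.Test              -> org.junit.jupiter.api.Test
//   @Before / @After            -> @BeforeEach / @AfterEach
//   @BeforeClass / @AfterClass  -> @BeforeAll / @AfterAll
//   org.junit.Assert.*          -> org.junit.jupiter.api.Assertions.*
import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}

class JUnit5MappingExample {
  private var items: List[Int] = Nil

  @BeforeEach // per-test setup, like JUnit 4's @Before
  def setUp(): Unit = items = List(1, 2, 3)

  @AfterEach // per-test teardown, like JUnit 4's @After
  def tearDown(): Unit = items = Nil

  @Test
  def sumIsComputed(): Unit = {
    assertEquals(6, items.sum)
    assertTrue(items.nonEmpty)
  }
}
```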

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #267

2020-05-03 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.34 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 'HUDI_home= 0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ org.apache.hudi:hudi-timeline-service:[unknown-version], line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ or

[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1590: [HUDI-812] Migrate hudi common tests to JUnit 5

2020-05-03 Thread GitBox


codecov-io edited a comment on pull request #1590:
URL: https://github.com/apache/incubator-hudi/pull/1590#issuecomment-623236291


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=h1) Report
   > Merging [#1590](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/506447fd4fde4cd922f7aa8f4e17a7f0dc97&el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `n/a`.

   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1590/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=tree)

   ```diff
   @@             Coverage Diff              @@
   ##             master    #1590      +/-   ##
   ============================================
   + Coverage     71.82%   71.84%   +0.01%
     Complexity      294      294
   ============================================
     Files           385      385
     Lines         16549    16549
     Branches       1661     1661
   ============================================
   + Hits          11886    11889       +3
   + Misses         3931     3928       -3
     Partials        732      732
   ```

   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1590/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `89.65% <0.00%> (+10.34%)` | `0.00% <0.00%> (ø%)` | |

   --

   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=footer). Last update [506447f...717bddb](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1590: [HUDI-812] Migrate hudi common tests to JUnit 5

2020-05-03 Thread GitBox


codecov-io edited a comment on pull request #1590:
URL: https://github.com/apache/incubator-hudi/pull/1590#issuecomment-623236291


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=h1) Report
   > Merging [#1590](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/506447fd4fde4cd922f7aa8f4e17a7f0dc97&el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `n/a`.

   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1590/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=tree)

   ```diff
   @@             Coverage Diff              @@
   ##             master    #1590      +/-   ##
   ============================================
   + Coverage     71.82%   71.83%   +0.01%
     Complexity      294      294
   ============================================
     Files           385      385
     Lines         16549    16549
     Branches       1661     1661
   ============================================
   + Hits          11886    11888       +2
   + Misses         3931     3929       -2
     Partials        732      732
   ```

   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1590/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=) | `54.38% <0.00%> (-0.88%)` | `0.00% <0.00%> (ø%)` | |
   | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1590/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `89.65% <0.00%> (+10.34%)` | `0.00% <0.00%> (ø%)` | |

   --

   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=footer). Last update [506447f...717bddb](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-io commented on pull request #1590: [HUDI-812] Migrate hudi common tests to JUnit 5

2020-05-03 Thread GitBox


codecov-io commented on pull request #1590:
URL: https://github.com/apache/incubator-hudi/pull/1590#issuecomment-623236291


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=h1) Report
   > Merging [#1590](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/506447fd4fde4cd922f7aa8f4e17a7f0dc97&el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `n/a`.

   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1590/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=tree)

   ```diff
   @@             Coverage Diff              @@
   ##             master    #1590      +/-   ##
   ============================================
   + Coverage     71.82%   71.83%   +0.01%
     Complexity      294      294
   ============================================
     Files           385      385
     Lines         16549    16549
     Branches       1661     1661
   ============================================
   + Hits          11886    11888       +2
   + Misses         3931     3929       -2
     Partials        732      732
   ```

   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1590/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=) | `54.38% <0.00%> (-0.88%)` | `0.00% <0.00%> (ø%)` | |
   | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1590/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `89.65% <0.00%> (+10.34%)` | `0.00% <0.00%> (ø%)` | |

   --

   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=footer). Last update [506447f...717bddb](https://codecov.io/gh/apache/incubator-hudi/pull/1590?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on pull request #1584: fix schema provider issue

2020-05-03 Thread GitBox


xushiyan commented on pull request #1584:
URL: https://github.com/apache/incubator-hudi/pull/1584#issuecomment-623232249


   @pratyakshsharma thanks for checking.. will circle back to your comments later. :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on pull request #1590: [HUDI-812] Migrate hudi common tests to JUnit 5

2020-05-03 Thread GitBox


xushiyan commented on pull request #1590:
URL: https://github.com/apache/incubator-hudi/pull/1590#issuecomment-623231117


   @yanghua This marks completion of the API migration. Once #1589 is merged, I'll rebase this on master and update checkstyle to ban JUnit 4 imports.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan opened a new pull request #1590: [HUDI-812] Migrate hudi common tests

2020-05-03 Thread GitBox


xushiyan opened a new pull request #1590:
URL: https://github.com/apache/incubator-hudi/pull/1590


   Migrate the test cases in hudi-common to JUnit 5.
   
   Follows #1589 
   
   ### Migration status (after merging)
   
   | Package | JUnit 5 lib | API migration | Restructure packages |
   | --- | --- | --- | --- |
   | `hudi-cli` | ✅ | ✅ | - |
   | `hudi-client` | ✅ | ✅ | - |
   | `hudi-common` | ✅ | ✅ | 🚧 |
   | `hudi-hadoop-mr` | ✅ | ✅ | - |
   | `hudi-hive-sync` | ✅ | ✅ | - |
   | `hudi-integ-test` | ✅ | ✅  | N.A. |
   | `hudi-spark` | ✅ | ✅ | - |
   | `hudi-timeline-service` | ✅ | ✅ | - |
   | `hudi-utilities` | ✅ | ✅ | - |
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-812) Migrate hudi-common tests to JUnit 5

2020-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-812:

Labels: pull-request-available  (was: )

> Migrate hudi-common tests to JUnit 5
> 
>
> Key: HUDI-812
> URL: https://issues.apache.org/jira/browse/HUDI-812
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-05-03 Thread GitBox


codecov-io edited a comment on pull request #1149:
URL: https://github.com/apache/incubator-hudi/pull/1149#issuecomment-623229474


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1149?src=pr&el=h1) Report
   > Merging [#1149](https://codecov.io/gh/apache/incubator-hudi/pull/1149?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/506447fd4fde4cd922f7aa8f4e17a7f0dc97&el=desc) will **decrease** coverage by `0.22%`.
   > The diff coverage is `55.17%`.

   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1149/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1149?src=pr&el=tree)

   ```diff
   @@             Coverage Diff              @@
   ##             master    #1149      +/-   ##
   ============================================
   - Coverage     71.82%   71.59%   -0.23%
     Complexity      294      294
   ============================================
     Files           385      393       +8
     Lines         16549    16659     +110
     Branches       1661     1663       +2
   ============================================
   + Hits          11886    11927      +41
   - Misses         3931     4000      +69
     Partials        732      732
   ```

   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1149?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...sert/BulkInsertMapFunctionForNonSortedRecords.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvQnVsa0luc2VydE1hcEZ1bmN0aW9uRm9yTm9uU29ydGVkUmVjb3Jkcy5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [.../hudi/execution/bulkinsert/NonSortPartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvTm9uU29ydFBhcnRpdGlvbmVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...n/bulkinsert/RDDPartitionLocalSortPartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvUkREUGFydGl0aW9uTG9jYWxTb3J0UGFydGl0aW9uZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...ution/bulkinsert/RDDPartitionRangePartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvUkREUGFydGl0aW9uUmFuZ2VQYXJ0aXRpb25lci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...tion/bulkinsert/BulkInsertInternalPartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvQnVsa0luc2VydEludGVybmFsUGFydGl0aW9uZXIuamF2YQ==) | `53.84% <53.84%> (ø)` | `0.00 <0.00> (?)` | |
   | [...che/hudi/table/action/commit/BulkInsertHelper.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9CdWxrSW5zZXJ0SGVscGVyLmphdmE=) | `74.19% <62.50%> (-10.81%)` | `0.00 <0.00> (ø)` | |
   | [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=) | `84.14% <80.76%> (-0.71%)` | `0.00 <0.00> (ø)` | |
   | [...pache/hudi/execution/CopyOnWriteInsertHandler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0NvcHlPbldyaXRlSW5zZXJ0SGFuZGxlci5qYXZh) | `94.11% <94.11%> (ø)` | `0.00 <0.00> (?)` | |
   | [.../org/apache/hudi/execution/LazyInsertIterable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0xhenlJbnNlcnRJdGVyYWJsZS5qYXZh) | `79.41% <100.00%> (-0.99%)` | `0.00 <0.00> (ø)` | |
   | [...di/execution/bulkinsert/BulkInsertMapFunction.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvQnVsa0luc2VydE1hcEZ1bmN0aW9uLmphdmE=) | `75.00% <100.00%> (ø)` | `0.00 <0.00> (?)` | |
   | ... and [13 more](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree-more) | |

   --

   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1149?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
 

[GitHub] [incubator-hudi] codecov-io commented on pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-05-03 Thread GitBox


codecov-io commented on pull request #1149:
URL: https://github.com/apache/incubator-hudi/pull/1149#issuecomment-623229474


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1149?src=pr&el=h1) Report
   > Merging [#1149](https://codecov.io/gh/apache/incubator-hudi/pull/1149?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/506447fd4fde4cd922f7aa8f4e17a7f0dc97&el=desc) will **decrease** coverage by `0.22%`.
   > The diff coverage is `55.17%`.

   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1149/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1149?src=pr&el=tree)

   ```diff
   @@             Coverage Diff              @@
   ##             master    #1149      +/-   ##
   ============================================
   - Coverage     71.82%   71.59%   -0.23%
     Complexity      294      294
   ============================================
     Files           385      393       +8
     Lines         16549    16659     +110
     Branches       1661     1663       +2
   ============================================
   + Hits          11886    11927      +41
   - Misses         3931     4000      +69
     Partials        732      732
   ```

   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1149?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...sert/BulkInsertMapFunctionForNonSortedRecords.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvQnVsa0luc2VydE1hcEZ1bmN0aW9uRm9yTm9uU29ydGVkUmVjb3Jkcy5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [.../hudi/execution/bulkinsert/NonSortPartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvTm9uU29ydFBhcnRpdGlvbmVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...n/bulkinsert/RDDPartitionLocalSortPartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvUkREUGFydGl0aW9uTG9jYWxTb3J0UGFydGl0aW9uZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...ution/bulkinsert/RDDPartitionRangePartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvUkREUGFydGl0aW9uUmFuZ2VQYXJ0aXRpb25lci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...tion/bulkinsert/BulkInsertInternalPartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvQnVsa0luc2VydEludGVybmFsUGFydGl0aW9uZXIuamF2YQ==) | `53.84% <53.84%> (ø)` | `0.00 <0.00> (?)` | |
   | [...che/hudi/table/action/commit/BulkInsertHelper.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9CdWxrSW5zZXJ0SGVscGVyLmphdmE=) | `74.19% <62.50%> (-10.81%)` | `0.00 <0.00> (ø)` | |
   | [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=) | `84.14% <80.76%> (-0.71%)` | `0.00 <0.00> (ø)` | |
   | [...pache/hudi/execution/CopyOnWriteInsertHandler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0NvcHlPbldyaXRlSW5zZXJ0SGFuZGxlci5qYXZh) | `94.11% <94.11%> (ø)` | `0.00 <0.00> (?)` | |
   | [.../org/apache/hudi/execution/LazyInsertIterable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0xhenlJbnNlcnRJdGVyYWJsZS5qYXZh) | `79.41% <100.00%> (-0.99%)` | `0.00 <0.00> (ø)` | |
   | [...di/execution/bulkinsert/BulkInsertMapFunction.java](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL2J1bGtpbnNlcnQvQnVsa0luc2VydE1hcEZ1bmN0aW9uLmphdmE=) | `75.00% <100.00%> (ø)` | `0.00 <0.00> (?)` | |
   | ... and [13 more](https://codecov.io/gh/apache/incubator-hudi/pull/1149/diff?src=pr&el=tree-more) | |

   --

   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1149?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ 

[GitHub] [incubator-hudi] AakashPradeep commented on a change in pull request #1580: [HUDI-852] adding check for table name for Append Save mode

2020-05-03 Thread GitBox


AakashPradeep commented on a change in pull request #1580:
URL: https://github.com/apache/incubator-hudi/pull/1580#discussion_r419180984



##
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -118,6 +118,12 @@ private[hudi] object HoodieSparkSqlWriter {
 fs.delete(basePath, true)
 exists = false
   }
+  if (exists && mode == SaveMode.Append) {

Review comment:
   @bhasudha please review! 
   
   Thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[incubator-hudi] branch asf-site updated: Travis CI build asf-site

2020-05-03 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 79fb998  Travis CI build asf-site
79fb998 is described below

commit 79fb9989614909d23885760b253d446a38ea66b5
Author: CI 
AuthorDate: Mon May 4 00:20:21 2020 +

Travis CI build asf-site
---
 content/docs/quick-start-guide.html | 269 +---
 1 file changed, 249 insertions(+), 20 deletions(-)

diff --git a/content/docs/quick-start-guide.html b/content/docs/quick-start-guide.html
index 8e40382..e8cbbf7 100644
--- a/content/docs/quick-start-guide.html
+++ b/content/docs/quick-start-guide.html
[diff of the generated HTML page; markup not preserved in the archive. It mirrors the markdown source change in the next message ([HUDI-783] Add pyspark example in quickstart, #1526): the "IN THIS PAGE" contents are split into "Scala example" and "Pyspark example" sections (each with Setup, Insert data, Query data, Update data, Incremental query, Point in time query, Delete data), the "Setup spark-shell" heading becomes "Setup", and each Scala snippet gains a leading "// spark-shell" marker. Remainder truncated.]

[incubator-hudi] branch asf-site updated: [HUDI-783] Add pyspark example in quickstart (#1526)

2020-05-03 Thread lamberken
This is an automated email from the ASF dual-hosted git repository.

lamberken pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 531b42a  [HUDI-783] Add pyspark example in quickstart (#1526)
531b42a is described below

commit 531b42a529b7a2f248c062c1e5b351e79239baca
Author: Edwin Guo 
AuthorDate: Sun May 3 20:18:17 2020 -0400

[HUDI-783] Add pyspark example in quickstart (#1526)

* [HUDI-783] Pyspark Docs - Add Hudi pyspark insert example.

* [HUDI-783] Pyspark Docs - Add Hudi pyspark query and update example.

* [HUDI-783] Pyspark Docs - Add Hudi pyspark incremental example.

* Pyspark Docs - Add Hudi pyspark delete example.

* [HUDI-783] Pyspark Docs - Fix syntax issue

* [HUDI-1526] Address PR comments.

* [HUDI-1526] Pyspark Docs - Fix syntax issue

* [HUDI-783] Pyspark Docs - Update Formatting.

* [HUDI-1526] Reformatted the docs

* [HUDI-1526] Address pr docs comments.

* [HUDI-783] Add pyspark example in quickstart #1526
---
 docs/_docs/1_1_quick_start_guide.md | 236 +++-
 1 file changed, 233 insertions(+), 3 deletions(-)

diff --git a/docs/_docs/1_1_quick_start_guide.md b/docs/_docs/1_1_quick_start_guide.md
index 08269ec..3e088dd 100644
--- a/docs/_docs/1_1_quick_start_guide.md
+++ b/docs/_docs/1_1_quick_start_guide.md
@@ -9,13 +9,15 @@ This guide provides a quick peek at Hudi's capabilities using spark-shell. Using
 code snippets that allows you to insert and update a Hudi table of default table type: 
 [Copy on Write](/docs/concepts.html#copy-on-write-table). 
 After each write operation we will also show how to read the data both snapshot and incrementally.
+# Scala example
 
-## Setup spark-shell
+## Setup
 
 Hudi works with Spark-2.x versions. You can follow instructions [here](https://spark.apache.org/downloads.html) for setting up spark. 
 From the extracted directory run spark-shell with Hudi as:
 
 ```scala
+// spark-shell
 spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
   --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
@@ -34,6 +36,7 @@ spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
 Setup table name, base path and a data generator to generate records for this guide.
 
 ```scala
+// spark-shell
 import org.apache.hudi.QuickstartUtils._
 import scala.collection.JavaConversions._
 import org.apache.spark.sql.SaveMode._
@@ -56,6 +59,7 @@ can generate sample inserts and updates based on the the sample trip schema [her
 Generate some new trips, load them into a DataFrame and write the DataFrame into the Hudi table as below.
 
 ```scala
+// spark-shell
 val inserts = convertToStringList(dataGen.generateInserts(10))
 val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
 df.write.format("hudi").
@@ -77,12 +81,13 @@ and for info on ways to ingest data into Hudi, refer to [Writing Hudi Tables](/d
 Here we are using the default write operation : `upsert`. If you have a workload without updates, you can also issue 
 `insert` or `bulk_insert` operations which could be faster. To know more, refer to [Write operations](/docs/writing_data#write-operations)
 {: .notice--info}
- 
+
 ## Query data 
 
 Load the data files into a DataFrame.
 
 ```scala
+// spark-shell
 val tripsSnapshotDF = spark.
   read.
   format("hudi").
@@ -104,6 +109,7 @@ This is similar to inserting new data. Generate updates to existing trips using
 and write DataFrame into the hudi table.
 
 ```scala
+// spark-shell
 val updates = convertToStringList(dataGen.generateUpdates(10))
 val df = spark.read.json(spark.sparkContext.parallelize(updates, 2))
 df.write.format("hudi").
@@ -128,6 +134,7 @@ This can be achieved using Hudi's incremental querying and providing a begin tim
 We do not need to specify endTime, if we want all changes after the given commit (as is the common case). 
 
 ```scala
+// spark-shell
 // reload data
 spark.
   read.
@@ -158,6 +165,7 @@ Lets look at how to query data as of a specific time. The specific time can be r
 specific commit time and beginTime to "000" (denoting earliest possible commit time). 
 
 ```scala
+// spark-shell
 val beginTime = "000" // Represents all commits > this time.
 val endTime = commits(commits.length - 2) // commit time we are interested in
 
@@ -168,13 +176,14 @@ val tripsPointInTimeDF = spark.read.format("hudi").
   option(END_INSTANTTIME_OPT_KEY, endTime).
   load(basePath)
 tripsPointInTimeDF.createOrReplaceTempView("hudi_trips_point_in_time")
-spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hudi_trips_point_in_time where fare > 20.0").show()
+spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from hudi_trips_point_i
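
The archive truncates the final hunk; for readability, here is a consolidated sketch of the point-in-time query that the hunks above annotate. It is a hedged reconstruction, not the verbatim file: the option keys are assumed from org.apache.hudi.DataSourceReadOptions in the 0.5.x line, and `commits` and `basePath` are populated earlier in the guide:

```scala
// spark-shell
val beginTime = "000" // Represents all commits > this time.
val endTime = commits(commits.length - 2) // commit time we are interested in

// incrementally query data up to `endTime`
val tripsPointInTimeDF = spark.read.format("hudi").
  option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
  option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
  option(END_INSTANTTIME_OPT_KEY, endTime).
  load(basePath)
tripsPointInTimeDF.createOrReplaceTempView("hudi_trips_point_in_time")
spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from hudi_trips_point_in_time where fare > 20.0").show()
```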

[GitHub] [incubator-hudi] AakashPradeep edited a comment on pull request #1580: [HUDI-852] adding check for table name for Append Save mode

2020-05-03 Thread GitBox


AakashPradeep edited a comment on pull request #1580:
URL: https://github.com/apache/incubator-hudi/pull/1580#issuecomment-623205984


   Now it checks for the existing table name with delete and upsert operations when Append is the SaveMode.
   
   exceptions
   ---
   
   scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, "foo_table").
|   mode(Append).
|   save(basePath)
   org.apache.hudi.exception.HoodieException: hoodie table with name 
hudi_trips_cow already exist at file:/tmp/hudi_trips_cow
 at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:89)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
 at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
 ... 68 elided
   
   scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(OPERATION_OPT_KEY,"delete").
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, tableName).
|   mode(Append).
|   save(basePath)
   
   scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(OPERATION_OPT_KEY,"delete").
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, "foo_table").
|   mode(Append).
|   save(basePath)
   org.apache.hudi.exception.HoodieException: hoodie table with name 
hudi_trips_cow already exist at file:/tmp/hudi_trips_cow
 at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:89)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:12

[GitHub] [incubator-hudi] AakashPradeep commented on pull request #1580: [HUDI-852] adding check for table name for Append Save mode

2020-05-03 Thread GitBox


AakashPradeep commented on pull request #1580:
URL: https://github.com/apache/incubator-hudi/pull/1580#issuecomment-623205984


   Now it checks for the existing table name with delete and upsert operations when Append is the SaveMode.
   
   exceptions
   ---
   
   scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, "foo_table").
|   mode(Append).
|   save(basePath)
   org.apache.hudi.exception.HoodieException: hoodie table with name 
hudi_trips_cow already exist at file:/tmp/hudi_trips_cow
 at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:89)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
 at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
 ... 68 elided
   
   scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(OPERATION_OPT_KEY,"delete").
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, tableName).
|   mode(Append).
|   save(basePath)
   
   scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(OPERATION_OPT_KEY,"delete").
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, "foo_table").
|   mode(Append).
|   save(basePath)
   org.apache.hudi.exception.HoodieException: hoodie table with name 
hudi_trips_cow already exist at file:/tmp/hudi_trips_cow
 at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:89)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)

[GitHub] [incubator-hudi] AakashPradeep edited a comment on pull request #1580: [HUDI-852] adding check for table name for Append Save mode

2020-05-03 Thread GitBox


AakashPradeep edited a comment on pull request #1580:
URL: https://github.com/apache/incubator-hudi/pull/1580#issuecomment-622318079


   I did manual verification with spark-shell. If I try to append data with a 
different table name, it throws a HoodieException: 
   
   org.apache.hudi.exception.HoodieException: hoodie table with name 
hudi_trips_cow already exist at file:/tmp/hudi_trips_cow
   
   scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, tableName).
|   mode(Overwrite).
|   save(basePath)
   20/05/01 02:23:52 WARN HoodieSparkSqlWriter$: hoodie table at 
file:/tmp/hudi_trips_cow already exists. Deleting existing data & overwriting 
with new data.
   
   scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, tableName).
|   mode(Append).
|   save(basePath)
   
   scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, **"foo_table"**).
|   mode(Append).
|   save(basePath)
   org.apache.hudi.exception.HoodieException: hoodie table with name 
hudi_trips_cow already exist at file:/tmp/hudi_trips_cow
 at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:124)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
 at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
 ... 68 elided
   
   scala>
   
scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   **option(OPERATION_OPT_KEY,"delete").**
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, **"foo_table"**).
|   mode(Append).
|   save(basePath)
   org.apache.hudi.exception.HoodieException: hoodie table with name 
hudi_trips_cow already exist at file:/tmp/hudi_trips_cow
 at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:89)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.

[GitHub] [incubator-hudi] AakashPradeep commented on a change in pull request #1580: adding check for table name for Append Save mode HUDI-852

2020-05-03 Thread GitBox


AakashPradeep commented on a change in pull request #1580:
URL: https://github.com/apache/incubator-hudi/pull/1580#discussion_r419177304



##
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -118,6 +118,12 @@ private[hudi] object HoodieSparkSqlWriter {
 fs.delete(basePath, true)
 exists = false
   }
+  if (exists && mode == SaveMode.Append) {

Review comment:
   Thanks for the comment. I have updated the code.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-541) Replace variables/comments named "data files" to "base file"

2020-05-03 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma reassigned HUDI-541:
-

Assignee: Pratyaksh Sharma

> Replace variables/comments named "data files" to "base file"
> 
>
> Key: HUDI-541
> URL: https://issues.apache.org/jira/browse/HUDI-541
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup, newbie
>Reporter: Vinoth Chandar
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> Per the cWiki design and arch page, we should converge on the same 
> terminology. We have _HoodieBaseFile_; we should ensure all variables of this 
> type are named _baseFile_ or _bf_, as opposed to _dataFile_ or _df_. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-541) Replace variables/comments named "data files" to "base file"

2020-05-03 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-541:
--
Status: In Progress  (was: Open)

> Replace variables/comments named "data files" to "base file"
> 
>
> Key: HUDI-541
> URL: https://issues.apache.org/jira/browse/HUDI-541
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup, newbie
>Reporter: Vinoth Chandar
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> Per the cWiki design and arch page, we should converge on the same 
> terminology. We have _HoodieBaseFile_; we should ensure all variables of this 
> type are named _baseFile_ or _bf_, as opposed to _dataFile_ or _df_. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-117) Write Unit-Test Case to test recovery lease handling in HoodieLogFormatWriter

2020-05-03 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-117:
--
Status: Patch Available  (was: In Progress)

> Write Unit-Test Case to test recovery lease handling in HoodieLogFormatWriter
> -
>
> Key: HUDI-117
> URL: https://issues.apache.org/jira/browse/HUDI-117
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Common Core, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current test-case 
> com.uber.hoodie.common.table.log.HoodieLogFormatTest.testLeaseRecovery needs 
> to be rewritten. It currently tests concurrent append cases. 
> The test-case is commented out for now. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-117) Write Unit-Test Case to test recovery lease handling in HoodieLogFormatWriter

2020-05-03 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-117:
--
Status: Closed  (was: Patch Available)

> Write Unit-Test Case to test recovery lease handling in HoodieLogFormatWriter
> -
>
> Key: HUDI-117
> URL: https://issues.apache.org/jira/browse/HUDI-117
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Common Core, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current test-case 
> com.uber.hoodie.common.table.log.HoodieLogFormatTest.testLeaseRecovery needs 
> to be rewritten. It currently tests concurrent append cases. 
> The test-case is commented out for now. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-541) Replace variables/comments named "data files" to "base file"

2020-05-03 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-541:
---

Assignee: (was: Bhavani Sudha)

> Replace variables/comments named "data files" to "base file"
> 
>
> Key: HUDI-541
> URL: https://issues.apache.org/jira/browse/HUDI-541
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup, newbie
>Reporter: Vinoth Chandar
>Priority: Major
>
> Per the cWiki design and arch page, we should converge on the same 
> terminology. We have _HoodieBaseFile_; we should ensure all variables of this 
> type are named _baseFile_ or _bf_, as opposed to _dataFile_ or _df_. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-541) Replace variables/comments named "data files" to "base file"

2020-05-03 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-541:

Fix Version/s: (was: 0.6.0)

> Replace variables/comments named "data files" to "base file"
> 
>
> Key: HUDI-541
> URL: https://issues.apache.org/jira/browse/HUDI-541
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup, newbie
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha
>Priority: Major
>
> Per the cWiki design and arch page, we should converge on the same 
> terminology. We have _HoodieBaseFile_; we should ensure all variables of this 
> type are named _baseFile_ or _bf_, as opposed to _dataFile_ or _df_. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] selvarajperiyasamy commented on issue #1583: Small File Issue

2020-05-03 Thread GitBox


selvarajperiyasamy commented on issue #1583:
URL: https://github.com/apache/incubator-hudi/issues/1583#issuecomment-623175739


   Thanks Balaji. That helps!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1584: fix schema provider issue

2020-05-03 Thread GitBox


pratyakshsharma commented on a change in pull request #1584:
URL: https://github.com/apache/incubator-hudi/pull/1584#discussion_r419151032



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##
@@ -298,15 +300,15 @@ private void refreshTimeline() throws IOException {
   // default to RowBasedSchemaProvider
   schemaProvider = this.schemaProvider == null || 
this.schemaProvider.getTargetSchema() == null
   ? transformed.map(r -> (SchemaProvider) new 
RowBasedSchemaProvider(r.schema())).orElse(
-  dataAndCheckpoint.getSchemaProvider())
+  schemaProviderFromFetched)
   : this.schemaProvider;
 } else {
   // Pull the data from the source & prepare the write
   InputBatch> dataAndCheckpoint =
   formatAdapter.fetchNewDataInAvroFormat(resumeCheckpointStr, 
cfg.sourceLimit);
   avroRDDOptional = dataAndCheckpoint.getBatch();
   checkpointStr = dataAndCheckpoint.getCheckpointForNextBatch();
-  schemaProvider = dataAndCheckpoint.getSchemaProvider();
+  schemaProvider = avroRDDOptional.isPresent() ? 
dataAndCheckpoint.getSchemaProvider() : null;

Review comment:
   please refer to my other comment on the changes in SourceFormatAdapter. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1584: fix schema provider issue

2020-05-03 Thread GitBox


pratyakshsharma commented on a change in pull request #1584:
URL: https://github.com/apache/incubator-hudi/pull/1584#discussion_r419150921



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##
@@ -276,6 +276,8 @@ private void refreshTimeline() throws IOException {
   // to generic records for writing
   InputBatch> dataAndCheckpoint =
   formatAdapter.fetchNewDataInRowFormat(resumeCheckpointStr, 
cfg.sourceLimit);
+  SchemaProvider schemaProviderFromFetched = 
dataAndCheckpoint.getBatch().isPresent()
+  ? dataAndCheckpoint.getSchemaProvider() : null;

Review comment:
   when fetching in row format, this change should not be needed since 
RowBasedSchemaProvider is already getting initialised at the end. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1584: fix schema provider issue

2020-05-03 Thread GitBox


pratyakshsharma commented on a change in pull request #1584:
URL: https://github.com/apache/incubator-hudi/pull/1584#discussion_r419150151



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/SourceFormatAdapter.java
##
@@ -64,19 +64,22 @@ public SourceFormatAdapter(Source source) {
   }
   case ROW: {
 InputBatch> r = ((RowSource) 
source).fetchNext(lastCkptStr, sourceLimit);
-return new InputBatch<>(Option.ofNullable(r.getBatch().map(
-rdd -> (
-(r.getSchemaProvider() instanceof FilebasedSchemaProvider)
-// If the source schema is specified through Avro schema,
-// pass in the schema for the Row-to-Avro conversion
-// to avoid nullability mismatch between Avro schema and 
Row schema
-? AvroConversionUtils.createRdd(
-rdd, r.getSchemaProvider().getSourceSchema(),
-HOODIE_RECORD_STRUCT_NAME, 
HOODIE_RECORD_NAMESPACE).toJavaRDD()
-: AvroConversionUtils.createRdd(
-rdd, HOODIE_RECORD_STRUCT_NAME, 
HOODIE_RECORD_NAMESPACE).toJavaRDD()
-))
-.orElse(null)), r.getCheckpointForNextBatch(), 
r.getSchemaProvider());
+if (r.getBatch().isPresent()) {

Review comment:
   If I understand correctly, not specifying a schema provider should be 
feasible in the case of row-based sources when you try to fetch the data in 
row format (i.e., when using transformers).
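
   A hedged sketch of the point above (wrapping the Row schema in 
`RowBasedSchemaProvider` is the pattern visible in the diff; the helper name 
here is illustrative):

```scala
import org.apache.hudi.utilities.schema.{RowBasedSchemaProvider, SchemaProvider}
import org.apache.spark.sql.{Dataset, Row}

// For a row-based source consumed in row format, the schema can be recovered
// from the Dataset[Row] itself, so an explicit schema provider is optional:
// when a batch is present, wrap its schema; when it is empty, there is
// nothing to derive a schema from for this round.
def providerFor(batch: Option[Dataset[Row]]): Option[SchemaProvider] =
  batch.map(rows => new RowBasedSchemaProvider(rows.schema))
```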





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1584: fix schema provider issue

2020-05-03 Thread GitBox


pratyakshsharma commented on a change in pull request #1584:
URL: https://github.com/apache/incubator-hudi/pull/1584#discussion_r419149028



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/SourceFormatAdapter.java
##
@@ -64,19 +64,22 @@ public SourceFormatAdapter(Source source) {
   }
   case ROW: {
 InputBatch> r = ((RowSource) 
source).fetchNext(lastCkptStr, sourceLimit);
-return new InputBatch<>(Option.ofNullable(r.getBatch().map(
-rdd -> (
-(r.getSchemaProvider() instanceof FilebasedSchemaProvider)
-// If the source schema is specified through Avro schema,
-// pass in the schema for the Row-to-Avro conversion
-// to avoid nullability mismatch between Avro schema and 
Row schema
-? AvroConversionUtils.createRdd(
-rdd, r.getSchemaProvider().getSourceSchema(),
-HOODIE_RECORD_STRUCT_NAME, 
HOODIE_RECORD_NAMESPACE).toJavaRDD()
-: AvroConversionUtils.createRdd(
-rdd, HOODIE_RECORD_STRUCT_NAME, 
HOODIE_RECORD_NAMESPACE).toJavaRDD()
-))
-.orElse(null)), r.getCheckpointForNextBatch(), 
r.getSchemaProvider());
+if (r.getBatch().isPresent()) {

Review comment:
   I think since this method tries to fetch data in Avro format, pre-specifying 
a schema provider is mandatory. So even if you do not get any data, you should 
mention RowBasedSchemaProvider as the schema provider at the very beginning. 
If that is done, there is no need for this change, I believe. :) 
   Do you face issues after pre-specifying a schema provider? 
   Please let me know your thoughts on this. 
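
   A hedged sketch of why the Avro path needs a schema up front (it mirrors 
the two branches visible in the diff above; the helper name and parameters 
are illustrative):

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.hudi.AvroConversionUtils
import org.apache.hudi.utilities.schema.SchemaProvider
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.sql.{Dataset, Row}

// Converting Rows to Avro GenericRecords needs an Avro schema. If a
// file-based provider supplied one, use it (this avoids nullability
// mismatches between the Avro schema and the Row schema); otherwise derive
// a schema from the Row structure itself.
def toAvro(rows: Dataset[Row], provider: Option[SchemaProvider],
           structName: String, namespace: String): JavaRDD[GenericRecord] =
  provider match {
    case Some(p) =>
      AvroConversionUtils.createRdd(rows, p.getSourceSchema, structName, namespace).toJavaRDD()
    case None =>
      AvroConversionUtils.createRdd(rows, structName, namespace).toJavaRDD()
  }
```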





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1584: fix schema provider issue

2020-05-03 Thread GitBox


pratyakshsharma commented on a change in pull request #1584:
URL: https://github.com/apache/incubator-hudi/pull/1584#discussion_r419148578



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/SourceFormatAdapter.java
##
@@ -64,19 +64,22 @@ public SourceFormatAdapter(Source source) {
   }
   case ROW: {
 InputBatch> r = ((RowSource) 
source).fetchNext(lastCkptStr, sourceLimit);
-return new InputBatch<>(Option.ofNullable(r.getBatch().map(
-rdd -> (
-(r.getSchemaProvider() instanceof FilebasedSchemaProvider)
-// If the source schema is specified through Avro schema,
-// pass in the schema for the Row-to-Avro conversion
-// to avoid nullability mismatch between Avro schema and 
Row schema
-? AvroConversionUtils.createRdd(
-rdd, r.getSchemaProvider().getSourceSchema(),
-HOODIE_RECORD_STRUCT_NAME, 
HOODIE_RECORD_NAMESPACE).toJavaRDD()
-: AvroConversionUtils.createRdd(
-rdd, HOODIE_RECORD_STRUCT_NAME, 
HOODIE_RECORD_NAMESPACE).toJavaRDD()
-))
-.orElse(null)), r.getCheckpointForNextBatch(), 
r.getSchemaProvider());
+if (r.getBatch().isPresent()) {
+  return new InputBatch<>(r.getBatch().map(
+  rdd -> (
+  (r.getSchemaProvider() instanceof FilebasedSchemaProvider)

Review comment:
   So this is the line where you mentioned the exception is thrown for you. 
Am I correct? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-io commented on pull request #1589: [HUDI-813] Migrate hudi-utilities tests to JUnit 5

2020-05-03 Thread GitBox


codecov-io commented on pull request #1589:
URL: https://github.com/apache/incubator-hudi/pull/1589#issuecomment-623151141


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1589?src=pr&el=h1) 
Report
   > Merging 
[#1589](https://codecov.io/gh/apache/incubator-hudi/pull/1589?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/506447fd4fde4cd922f7aa8f4e17a7f0dc97&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1589/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1589?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #1589   +/-   ##
   =========================================
     Coverage     71.82%   71.82%
     Complexity      294      294
   =========================================
     Files           385      385
     Lines         16549    16549
     Branches       1661     1661
   =========================================
     Hits          11886    11886
     Misses         3931     3931
     Partials        732      732
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1589?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1589?src=pr&el=footer).
 Last update 
[506447f...d71d318](https://codecov.io/gh/apache/incubator-hudi/pull/1589?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1589: [HUDI-813] Migrate hudi-utilities tests to JUnit 5

2020-05-03 Thread GitBox


xushiyan commented on a change in pull request #1589:
URL: https://github.com/apache/incubator-hudi/pull/1589#discussion_r419131896



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHiveIncrementalPuller.java
##
@@ -18,27 +18,26 @@
 
 package org.apache.hudi.utilities;
 
-import org.junit.Assert;
-import org.junit.Before;
-import org.junit.Test;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
 
 public class TestHiveIncrementalPuller {
 
   private HiveIncrementalPuller.Config config;
 
-  @Before
+  @BeforeEach
   public void setup() {
 config = new HiveIncrementalPuller.Config();
   }
 
   @Test
   public void testInitHiveIncrementalPuller() {
 
-try {
+assertDoesNotThrow(() -> {
   new HiveIncrementalPuller(config);
-} catch (Exception e) {
-  Assert.fail("Unexpected exception while initing HiveIncrementalPuller, 
msg: " + e.getMessage());
-}
+}, "Unexpected exception while initing HiveIncrementalPuller.");

Review comment:
   `e.getMessage()` will be included in the message printed by 
`assertDoesNotThrow()`, hence there is no need to concatenate it manually.
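
   For illustration, a minimal sketch of that behavior (using JUnit 5's 
`assertDoesNotThrow(Executable, String)` overload; `riskyInit` is a 
hypothetical stand-in for the constructor call under test):

```scala
import org.junit.jupiter.api.Assertions.assertDoesNotThrow
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.function.Executable

class InitSmokeTest {
  private def riskyInit(): Unit = () // stand-in for `new HiveIncrementalPuller(config)`

  // If the body throws, the assertion fails with the supplied message followed
  // by the thrown exception's type and message, so there is no need to append
  // e.getMessage() by hand.
  @Test
  def testInitDoesNotThrow(): Unit =
    assertDoesNotThrow(new Executable {
      override def execute(): Unit = riskyInit()
    }, "Unexpected exception while initing HiveIncrementalPuller.")
}
```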

##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
##
@@ -413,37 +412,33 @@ public void testKafkaConnectCheckpointProvider() throws 
IOException {
 HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator();
 
Helpers.saveParquetToDFS(Helpers.toGenericRecords(dataGenerator.generateInserts("000",
 100)), new Path(filePath));
 HoodieDeltaStreamer deltaStreamer = new HoodieDeltaStreamer(cfg, jsc, dfs, 
hdfsTestService.getHadoopConf(), props);
-assertEquals(deltaStreamer.getConfig().checkpoint, "kafka_topic1,0:200");
+assertEquals("kafka_topic1,0:200", deltaStreamer.getConfig().checkpoint);
   }
 
   @Test
   public void testPropsWithInvalidKeyGenerator() throws Exception {
-try {
+Exception e = assertThrows(IOException.class, () -> {
   String tableBasePath = dfsBasePath + "/test_table";
   HoodieDeltaStreamer deltaStreamer =
   new HoodieDeltaStreamer(TestHelpers.makeConfig(tableBasePath, 
Operation.BULK_INSERT,
   
Collections.singletonList(TripsWithDistanceTransformer.class.getName()), 
PROPS_FILENAME_TEST_INVALID, false), jsc);
   deltaStreamer.sync();
-  fail("Should error out when setting the key generator class property to 
an invalid value");
-} catch (IOException e) {
-  // expected
-  LOG.error("Expected error during getting the key generator", e);
-  assertTrue(e.getMessage().contains("Could not load key generator 
class"));
-}
+}, "Should error out when setting the key generator class property to an 
invalid value");
+// expected
+LOG.debug("Expected error during getting the key generator", e);
+assertTrue(e.getMessage().contains("Could not load key generator class"));

Review comment:
   Changing the log level from error to debug. It is good practice not to 
print anything unless a test case fails, so we tend to mute the output here 
when it is passing. The same applies to the other similar changes below.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-813) Migrate hudi-utilities tests to JUnit 5

2020-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-813:

Labels: pull-request-available  (was: )

> Migrate hudi-utilities tests to JUnit 5
> ---
>
> Key: HUDI-813
> URL: https://issues.apache.org/jira/browse/HUDI-813
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] xushiyan opened a new pull request #1589: [HUDI-813] Migrate hudi-utilities tests to JUnit 5

2020-05-03 Thread GitBox


xushiyan opened a new pull request #1589:
URL: https://github.com/apache/incubator-hudi/pull/1589


   Migrate the test cases in hudi-utilities to JUnit 5.
   
   Follows #1570 
   
   ### Migration status (after merging)
   
   | Package | JUnit 5 lib | API migration | Restructure packages |
   | --- | --- | --- | --- |
   | `hudi-cli` | ✅ | ✅ | - |
   | `hudi-client` | ✅ | ✅ | - |
   | `hudi-common` | ✅ | 🚧 | 🚧 |
   | `hudi-hadoop-mr` | ✅ | ✅ | - |
   | `hudi-hive-sync` | ✅ | ✅ | - |
   | `hudi-integ-test` | ✅ | ✅  | N.A. |
   | `hudi-spark` | ✅ | ✅ | - |
   | `hudi-timeline-service` | ✅ | ✅ | - |
   | `hudi-utilities` | ✅ | ✅ | - |
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] wanglisheng81 opened a new pull request #1588: [MINOR] fix typo in comparison document

2020-05-03 Thread GitBox


wanglisheng81 opened a new pull request #1588:
URL: https://github.com/apache/incubator-hudi/pull/1588


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *fix typo in comparison document*
   
   ## Brief change log
   
   *fix typo in comparison document*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1584: fix schema provider issue

2020-05-03 Thread GitBox


vinothchandar commented on pull request #1584:
URL: https://github.com/apache/incubator-hudi/pull/1584#issuecomment-623133003


   Sure. will do tonight ! cc @pratyakshsharma as well. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] wanglisheng81 opened a new pull request #1587: [MINOR]Fix typo in use case

2020-05-03 Thread GitBox


wanglisheng81 opened a new pull request #1587:
URL: https://github.com/apache/incubator-hudi/pull/1587


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(Fix typo in use-case document)*
   
   ## Brief change log
   
   *Fix typo in use-case document*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] tooptoop4 opened a new issue #1586: [SUPPORT] DMS with 2 key example

2020-05-03 Thread GitBox


tooptoop4 opened a new issue #1586:
URL: https://github.com/apache/incubator-hudi/issues/1586


   Would you be able to add an example to 
https://cwiki.apache.org/confluence/display/HUDI/2020/01/20/Change+Capture+Using+AWS+Database+Migration+Service+and+Hudi
 using a 2-column key?
   
   Can it be done just by calling the pre-built 'spark-submit --class 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer' without writing 
any class?
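
   For reference, a hedged sketch of how a two-column key is typically 
expressed through the datasource writer (option names assumed from the 
0.5.x-era DataSourceWriteOptions/HoodieWriteConfig; the DeltaStreamer path 
would take the equivalent `hoodie.datasource.write.*` properties in its 
props file, so no custom class should be needed):

```scala
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig.TABLE_NAME
import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch under stated assumptions: a composite record key is configured by
// listing both columns in the record key field and switching the key
// generator to the complex (multi-field) one.
def writeWithCompositeKey(df: DataFrame, basePath: String): Unit =
  df.write.format("hudi").
    option(RECORDKEY_FIELD_OPT_KEY, "col1,col2").
    option(KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator").
    option(PRECOMBINE_FIELD_OPT_KEY, "ts").
    option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
    option(TABLE_NAME, "my_table").
    mode(SaveMode.Append).
    save(basePath)
```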



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-861) Add Github and Twitter Widget on Hudi's official website

2020-05-03 Thread vinoyang (Jira)
vinoyang created HUDI-861:
-

 Summary: Add Github and Twitter Widget on Hudi's official website
 Key: HUDI-861
 URL: https://issues.apache.org/jira/browse/HUDI-861
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: vinoyang


In order to further strengthen the influence of the Hudi community, I suggest 
that we embed GitHub and Twitter widgets on Hudi's official website, as 
Apache Ignite does: [https://ignite.apache.org/]
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] EdwinGuo commented on a change in pull request #1526: [HUDI-783] Add pyspark example in quickstart

2020-05-03 Thread GitBox


EdwinGuo commented on a change in pull request #1526:
URL: https://github.com/apache/incubator-hudi/pull/1526#discussion_r419103929



##
File path: docs/_docs/1_1_quick_start_guide.md
##
@@ -204,6 +213,224 @@ spark.sql("select uuid, partitionPath from 
hudi_trips_snapshot").count()
 ```
 Note: Only `Append` mode is supported for delete operation.
 
+# Pyspark example
+## Setup
+
+Hudi works with Spark-2.x versions. You can follow instructions 
[here](https://spark.apache.org/downloads.html) for setting up spark. 
+From the extracted directory run spark-shell with Hudi as:
+
+```python
+# pyspark
+export PYSPARK_PYTHON=$(which python3)
+spark-2.4.4-bin-hadoop2.7/bin/pyspark \
+  --packages 
org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4
 \
+  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+```
+
+
+  Please note the following: 
+
+  spark-avro module needs to be specified in --packages as it is not 
included with spark-shell by default
+  spark-avro and spark versions must match (we have used 2.4.4 for both 
above)
+  we have used hudi-spark-bundle built for scala 2.11 since the spark-avro 
module used also depends on 2.11. 
+ If spark-avro_2.12 is used, correspondingly hudi-spark-bundle_2.12 
needs to be used. 
+
+
+
+Setup table name, base path and a data generator to generate records for this 
guide.
+
+```python
+# pyspark
+tableName = "hudi_trips_cow"
+basePath = "file:///tmp/hudi_trips_cow"
+dataGen = sc._jvm.org.apache.hudi.QuickstartUtils.DataGenerator()
+```
+
+The 
[DataGenerator](https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L50)
 
can generate sample inserts and updates based on the sample trip schema 
[here](https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L57)
+{: .notice--info}
+
+
+## Insert data
+
+Generate some new trips, load them into a DataFrame and write the DataFrame 
into the Hudi table as below.
+
+```python
+# pyspark
+inserts = 
sc._jvm.org.apache.hudi.QuickstartUtils.convertToStringList(dataGen.generateInserts(10))
+df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
+
+hudi_options = {
+  'hoodie.table.name': tableName,
+  'hoodie.datasource.write.recordkey.field': 'uuid',
+  'hoodie.datasource.write.partitionpath.field': 'partitionpath',
+  'hoodie.datasource.write.table.name': tableName,
+  'hoodie.datasource.write.operation': 'insert',
+  'hoodie.datasource.write.precombine.field': 'ts',
+  'hoodie.upsert.shuffle.parallelism': 2, 
+  'hoodie.insert.shuffle.parallelism': 2
+}
+
+df.write.format("hudi"). \
+  options(**hudi_options). \
+  mode("overwrite"). \
+  save(basePath)
+```
+
+`mode(Overwrite)` overwrites and recreates the table if it already exists.
+You can check the data generated under 
`/tmp/hudi_trips_cow`. We provided a record key 
+(`uuid` in 
[schema](https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58)),
 partition field (`region/county/city`) and combine logic (`ts` in 
+[schema](https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58))
 to ensure trip records are unique within each partition. For more info, refer 
to 
+[Modeling data stored in 
Hudi](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#FAQ-HowdoImodelthedatastoredinHudi)
+and for info on ways to ingest data into Hudi, refer to [Writing Hudi 
Tables](/docs/writing_data.html).
+Here we are using the default write operation: `upsert`. If you have a 
workload without updates, you can also issue 
+`insert` or `bulk_insert` operations which could be faster. To know more, 
refer to [Write operations](/docs/writing_data#write-operations)
+{: .notice--info}
+
+## Query data 
+
+Load the data files into a DataFrame.
+
+```python
+# pyspark
+tripsSnapshotDF = spark. \
+  read. \
+  format("hudi"). \
+  load(basePath + "/*/*/*/*")
+
+tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
+
+spark.sql("select fare, begin_lon, begin_lat, ts from  hudi_trips_snapshot 
where fare > 20.0").show()
+spark.sql("select _hoodie_commit_time, _hoodie_record_key, 
_hoodie_partition_path, rider, driver, fare from  hudi_trips_snapshot").show()
+```
+
+This query provides snapshot querying of the ingested data. Since our 
partition path (`region/country/city`) is 3 levels nested 
+from the base path, we've used `load(basePath + "/*/*/*/*")`. 
+Refer to [Table types and queries](/docs/concepts#table-types--queries) for 
more info on all table types and query types supported.
+{: .notice--info}
+
+## Update data
+
+This is similar to inserting new data. Generate updates to existing trips 
using the data generator, load into a DataFrame 
+and write DataFrame into the hudi table.
+
+```python
+# pyspark
+updates = 
sc._j

[jira] [Closed] (HUDI-850) Avoid unnecessary listings in incremental cleaning mode

2020-05-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-850.
--

> Avoid unnecessary listings in incremental cleaning mode
> ---
>
> Key: HUDI-850
> URL: https://issues.apache.org/jira/browse/HUDI-850
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Cleaner, Performance
>Reporter: Vinoth Chandar
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Came up during https://github.com/apache/incubator-hudi/issues/1552 
> Even with incremental cleaning turned on, we would have a scenario where 
> there are no commits yet to clean, but we end up listing needlessly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-850) Avoid unnecessary listings in incremental cleaning mode

2020-05-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-850.

Resolution: Fixed

> Avoid unnecessary listings in incremental cleaning mode
> ---
>
> Key: HUDI-850
> URL: https://issues.apache.org/jira/browse/HUDI-850
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Cleaner, Performance
>Reporter: Vinoth Chandar
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Came up during https://github.com/apache/incubator-hudi/issues/1552 
> Even with incremental cleaning turned on, we would have a scenario where 
> there are no commits yet to clean, but we end up listing needlessly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-850) Avoid unnecessary listings in incremental cleaning mode

2020-05-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-850:
---
Status: Open  (was: New)

> Avoid unnecessary listings in incremental cleaning mode
> ---
>
> Key: HUDI-850
> URL: https://issues.apache.org/jira/browse/HUDI-850
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Cleaner, Performance
>Reporter: Vinoth Chandar
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Came up during https://github.com/apache/incubator-hudi/issues/1552 
> Even with incremental cleaning turned on, we would have a scenario where 
> there are no commits yet to clean, but we end up listing needlessly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable

2020-05-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-819.

Fix Version/s: 0.6.0
   Resolution: Fixed

> missing write status in MergeOnReadLazyInsertIterable
> -
>
> Key: HUDI-819
> URL: https://issues.apache.org/jira/browse/HUDI-819
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> The variable declared 
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
>  masks the protected statuses variable. 
> So although hoodie writes the data, the write status will not be included in 
> the completed section. This can cause duplicates to be written.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable

2020-05-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-819.
--

> missing write status in MergeOnReadLazyInsertIterable
> -
>
> Key: HUDI-819
> URL: https://issues.apache.org/jira/browse/HUDI-819
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> The variable declared 
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
>  masks the protected statuses variable. 
> So although hoodie writes the data, the write status will not be included in 
> the completed section. This can cause duplicates to be written.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable

2020-05-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-819:
---
Status: Open  (was: New)

> missing write status in MergeOnReadLazyInsertIterable
> -
>
> Key: HUDI-819
> URL: https://issues.apache.org/jira/browse/HUDI-819
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Major
>  Labels: pull-request-available
>
> The variable declared 
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
>  masks the protected statuses variable. 
> So although hoodie writes the data, the write status will not be included in 
> the completed section. This can cause duplicates to be written.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken edited a comment on pull request #1402: [HUDI-407] Adding Simple Index

2020-05-03 Thread GitBox


lamber-ken edited a comment on pull request #1402:
URL: https://github.com/apache/incubator-hudi/pull/1402#issuecomment-623052515


   hi @nsivabalan, go ahead 
[HUDI-622](https://github.com/apache/incubator-hudi/pull/1343)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org