[jira] [Comment Edited] (HUDI-1214) Need ability to set deltastreamer checkpoints when doing Spark datasource writes

2020-08-23 Thread Trevorzhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182960#comment-17182960
 ] 

Trevorzhang edited comment on HUDI-1214 at 8/24/20, 5:52 AM:
-

Hi [~vbalaji], I would like to claim this JIRA if no one else is working on it.


was (Author: trevorzhang):
Hi Balaji Varadarajan, I would like to claim this JIRA if no one else is working on it.

> Need ability to set deltastreamer checkpoints when doing Spark datasource 
> writes
> 
>
> Key: HUDI-1214
> URL: https://issues.apache.org/jira/browse/HUDI-1214
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.1
>
>
> Such support is needed for bootstrapping cases where users perform the initial 
> bootstrap with a Spark datasource write and then subsequently use DeltaStreamer.
> DeltaStreamer manages checkpoints inside hoodie commit files and expects 
> checkpoints in previously committed metadata. Users are expected to pass 
> checkpoint or initial checkpoint provider when performing bootstrap through 
> deltastreamer. Such support is not present when doing bootstrap using Spark 
> Datasource.
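
For context, a hedged Scala sketch (Hudi 0.6-era APIs as understood here; `spark` and `basePath` are assumed to be in scope) of where DeltaStreamer keeps this checkpoint today: it lives under the `deltastreamer.checkpoint.key` entry in the extra metadata of the last completed commit, which a plain Spark datasource write does not populate.

```scala
import org.apache.hudi.common.model.HoodieCommitMetadata
import org.apache.hudi.common.table.HoodieTableMetaClient

// Read the latest completed commit and print the checkpoint DeltaStreamer
// would resume from. A table bootstrapped purely via spark.write() has no
// such entry, which is the gap this issue describes.
val metaClient = new HoodieTableMetaClient(spark.sparkContext.hadoopConfiguration, basePath)
val lastInstant = metaClient.getActiveTimeline.getCommitsTimeline
  .filterCompletedInstants.lastInstant.get
val commitMetadata = HoodieCommitMetadata.fromBytes(
  metaClient.getActiveTimeline.getInstantDetails(lastInstant).get,
  classOf[HoodieCommitMetadata])
println(commitMetadata.getExtraMetadata.get("deltastreamer.checkpoint.key"))
```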





[jira] [Commented] (HUDI-1201) HoodieDeltaStreamer: Allow user overrides to read from earliest kafka offset when commit files do not have checkpoint

2020-08-23 Thread Trevorzhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182959#comment-17182959
 ] 

Trevorzhang commented on HUDI-1201:
---

Hi [~vbalaji], I would like to claim this JIRA if no one else is working on it.


> HoodieDeltaStreamer: Allow user overrides to read from earliest kafka offset 
> when commit files do not have checkpoint
> -
>
> Key: HUDI-1201
> URL: https://issues.apache.org/jira/browse/HUDI-1201
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.1
>
>
> [https://github.com/apache/hudi/issues/1985]
>  
> It would be easier for the user to simply tell DeltaStreamer to read from the 
> earliest offset instead of implementing -initial-checkpoint-provider or 
> passing raw Kafka checkpoints when the table was initially bootstrapped 
> through spark.write().





[jira] [Commented] (HUDI-1214) Need ability to set deltastreamer checkpoints when doing Spark datasource writes

2020-08-23 Thread Trevorzhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182960#comment-17182960
 ] 

Trevorzhang commented on HUDI-1214:
---

Hi Balaji Varadarajan, I would like to claim this JIRA if no one else is working on it.

> Need ability to set deltastreamer checkpoints when doing Spark datasource 
> writes
> 
>
> Key: HUDI-1214
> URL: https://issues.apache.org/jira/browse/HUDI-1214
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.1
>
>
> Such support is needed for bootstrapping cases where users perform the initial 
> bootstrap with a Spark datasource write and then subsequently use DeltaStreamer.
> DeltaStreamer manages checkpoints inside hoodie commit files and expects 
> checkpoints in previously committed metadata. Users are expected to pass 
> checkpoint or initial checkpoint provider when performing bootstrap through 
> deltastreamer. Such support is not present when doing bootstrap using Spark 
> Datasource.





[GitHub] [hudi] bhasudha opened a new pull request #2016: [WIP] Add release page doc for 0.6.0

2020-08-23 Thread GitBox


bhasudha opened a new pull request #2016:
URL: https://github.com/apache/hudi/pull/2016


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.







[GitHub] [hudi] garyli1019 commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2020-08-23 Thread GitBox


garyli1019 commented on issue #2013:
URL: https://github.com/apache/hudi/issues/2013#issuecomment-678881198


   @rubenssoto Hello, the incremental pulling for MOR table is currently under 
review and will be available in the 0.6.1 release, which will be shortly after 
the 0.6.0 release.







[GitHub] [hudi] sreeram26 commented on pull request #2014: [HUDI-1153] Spark DataSource and Streaming Write must fail when operation type is misconfigured

2020-08-23 Thread GitBox


sreeram26 commented on pull request #2014:
URL: https://github.com/apache/hudi/pull/2014#issuecomment-678878193


   @bvaradar 







[GitHub] [hudi] Trevor-zhang commented on pull request #2015: [HUDI-1103]Fix Delete data demo in Quick-Start Guide

2020-08-23 Thread GitBox


Trevor-zhang commented on pull request #2015:
URL: https://github.com/apache/hudi/pull/2015#issuecomment-678877259


   @nsivabalan can you take a look when you get a chance?







[jira] [Updated] (HUDI-1103) Improve the code format of Delete data demo in Quick-Start Guide

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1103:
-
Labels: pull-request-available  (was: )

> Improve the code format of Delete data demo in Quick-Start Guide
> 
>
> Key: HUDI-1103
> URL: https://issues.apache.org/jira/browse/HUDI-1103
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: wangxianghu
>Assignee: Trevorzhang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> Currently, the delete data demo code is not runnable in spark-shell 
> {code:java}
> scala> val df = spark
> df: org.apache.spark.sql.SparkSession = 
> org.apache.spark.sql.SparkSession@74e7d97bscala>   .read
> :1: error: illegal start of definition
>   .read
>   ^scala>   .json(spark.sparkContext.parallelize(deletes, 2))
> :1: error: illegal start of definition
>   .json(spark.sparkContext.parallelize(deletes, 2))
>   ^
> {code}
> The dot should be at the end of the line, or a "\" should be put at the end of the line.
>  
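
A paste-friendly form of the same snippet, as a minimal sketch (it assumes the `deletes` value from the quick-start guide is already defined in the shell):

```scala
// Keeping the dot at the end of each line lets spark-shell parse the whole
// chain as a single expression instead of separate statements.
val df = spark.
  read.
  json(spark.sparkContext.parallelize(deletes, 2))
```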





[GitHub] [hudi] Trevor-zhang opened a new pull request #2015: [HUDI-1103]Fix Delete data demo in Quick-Start Guide

2020-08-23 Thread GitBox


Trevor-zhang opened a new pull request #2015:
URL: https://github.com/apache/hudi/pull/2015


   Fix Delete data demo in Quick-Start Guide
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.







[GitHub] [hudi] sreeram26 commented on a change in pull request #2014: [HUDI-1153] Spark DataSource and Streaming Write must fail when operation type is misconfigured

2020-08-23 Thread GitBox


sreeram26 commented on a change in pull request #2014:
URL: https://github.com/apache/hudi/pull/2014#discussion_r475306551



##
File path: hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
##
@@ -248,15 +249,18 @@ public static HoodieWriteClient createHoodieClient(JavaSparkContext jssc, String
   }
 
   public static JavaRDD<WriteStatus> doWriteOperation(HoodieWriteClient client, JavaRDD<HoodieRecord> hoodieRecords,
-  String instantTime, String operation) throws HoodieException {
-if (operation.equals(DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL())) {
+  String instantTime, WriteOperationType operation) throws HoodieException {
+if (operation == WriteOperationType.BULK_INSERT) {
   Option<BulkInsertPartitioner> userDefinedBulkInsertPartitioner =
   createUserDefinedBulkInsertPartitioner(client.getConfig());
   return client.bulkInsert(hoodieRecords, instantTime, userDefinedBulkInsertPartitioner);
-} else if (operation.equals(DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL())) {
+} else if (operation == WriteOperationType.INSERT) {
   return client.insert(hoodieRecords, instantTime);
 } else {
   // default is upsert
+  if (operation != WriteOperationType.UPSERT) {

Review comment:
   I'm not throwing an explicit error here; based on the enum, the only other value the 
operation could potentially have is BOOTSTRAP, and the issue that exposed this 
problem would have thrown an exception in WriteOperationType.fromValue itself.
   
   I can change this to throw a HoodieException if the reviewer feels that is 
necessary.
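
For illustration only (not the PR's code; class and package names assumed from the 0.6 codebase), this is the behaviour being relied on: `WriteOperationType.fromValue` rejects a misconfigured value before this branch is ever reached.

```scala
import org.apache.hudi.common.model.WriteOperationType

// A valid value maps to the enum constant...
val op = WriteOperationType.fromValue("bulk_insert") // WriteOperationType.BULK_INSERT
// ...while a typo such as "upsertt" makes fromValue throw, so the write fails
// fast instead of silently falling back to upsert.
val bad = WriteOperationType.fromValue("upsertt")
```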









[GitHub] [hudi] sreeram26 commented on a change in pull request #2014: [HUDI-1153] Spark DataSource and Streaming Write must fail when operation type is misconfigured

2020-08-23 Thread GitBox


sreeram26 commented on a change in pull request #2014:
URL: https://github.com/apache/hudi/pull/2014#discussion_r475306551



##
File path: hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
##
@@ -248,15 +249,18 @@ public static HoodieWriteClient createHoodieClient(JavaSparkContext jssc, String
   }
 
   public static JavaRDD<WriteStatus> doWriteOperation(HoodieWriteClient client, JavaRDD<HoodieRecord> hoodieRecords,
-  String instantTime, String operation) throws HoodieException {
-if (operation.equals(DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL())) {
+  String instantTime, WriteOperationType operation) throws HoodieException {
+if (operation == WriteOperationType.BULK_INSERT) {
   Option<BulkInsertPartitioner> userDefinedBulkInsertPartitioner =
   createUserDefinedBulkInsertPartitioner(client.getConfig());
   return client.bulkInsert(hoodieRecords, instantTime, userDefinedBulkInsertPartitioner);
-} else if (operation.equals(DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL())) {
+} else if (operation == WriteOperationType.INSERT) {
   return client.insert(hoodieRecords, instantTime);
 } else {
   // default is upsert
+  if (operation != WriteOperationType.UPSERT) {

Review comment:
   I'm not throwing an explicit error here; based on the enum, the only other value the 
operation could potentially have is BOOTSTRAP, and the issue that exposed this 
problem would have thrown an exception in WriteOperationType.fromValue itself.









[jira] [Updated] (HUDI-1153) Spark DataSource and Streaming Write must fail when operation type is misconfigured

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1153:
-
Labels: pull-request-available  (was: )

> Spark DataSource and Streaming Write must fail when operation type is 
> misconfigured
> ---
>
> Key: HUDI-1153
> URL: https://issues.apache.org/jira/browse/HUDI-1153
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Sreeram Ramji
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> Context: [https://github.com/apache/hudi/issues/1902#issuecomment-669698259]
>  
> If you look at DataSourceUtils.java, 
> [https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L257]
>  
> we use string comparison to determine the operation type, which is a bad idea: 
> a typo could result in "upsert" being used silently. 
>  
> Just like 
> [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L187]
>  being used for DeltaStreamer, we need similar enums defined in 
> DataSourceOptions.scala for OPERATION_OPT_KEY but care must be taken to 
> ensure we do not cause backwards compatibility issue by changing the property 
> value. In other words, we need to retain the lower case values 
> ("bulk_insert", "insert" and "upsert") but make it an enum. 
>  
>  
>  
>  
>  
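
For context, a minimal sketch of how the operation is configured on the datasource write path (option keys as defined in DataSourceOptions.scala; `inputDF`, `tableName` and `basePath` are illustrative placeholders):

```scala
import org.apache.hudi.DataSourceWriteOptions

// OPERATION_OPT_KEY resolves to "hoodie.datasource.write.operation"; before this
// change a typo in its value silently fell back to the default "upsert".
inputDF.write.format("hudi")
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, "bulk_insert")
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "uuid")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
  .option("hoodie.table.name", tableName)
  .mode("append")
  .save(basePath)
```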





[GitHub] [hudi] sreeram26 opened a new pull request #2014: [HUDI-1153] Spark DataSource and Streaming Write must fail when operation type is misconfigured

2020-08-23 Thread GitBox


sreeram26 opened a new pull request #2014:
URL: https://github.com/apache/hudi/pull/2014


   
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Currently, the Spark DataSource and Streaming write operation type is compared as a 
raw string at most usage sites, and illegal operation types are silently swallowed 
by defaulting to upsert. This PR addresses both issues.
   
   ## Brief change log
   
   - [HUDI-1153] Spark DataSource and Streaming Write must fail when operation 
type is misconfigured
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   TestDataSourceUtils
   * testDoWriteOperationWithoutUserDefinedBulkInsertPartitioner
   * testDoWriteOperationWithNonExistUserDefinedBulkInsertPartitioner
   * testDoWriteOperationWithUserDefinedBulkInsertPartitioner
   If all existing tests pass, this should be good to review.
   
- [ ] Existing tests pass 
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [ ] CI is green
   
- [x] Necessary doc changes done or have another open PR - None
  
- [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA- Not a large task







[GitHub] [hudi] yanghua commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-23 Thread GitBox


yanghua commented on a change in pull request #1901:
URL: https://github.com/apache/hudi/pull/1901#discussion_r475291250



##
File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
##
@@ -50,6 +50,9 @@
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertNotEquals;
 
+/**
+ * Base test class for IT Test. helps to run cmd and generate data.

Review comment:
   `Base test class for IT Test helps to run command and generate data.`?

##
File path: 
hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/TestDFSHoodieTestSuiteWriterAdapter.java
##
@@ -52,6 +52,9 @@
 import org.junit.jupiter.api.Test;
 import org.mockito.Mockito;
 
+/**
+ * {@link HoodieTestSuiteWriter}. Helps to test writing a DFS file.

Review comment:
   `Helps`? This usage may not be correct?

##
File path: 
hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/utils/TestUtils.java
##
@@ -45,6 +48,15 @@
 return dataGenerator.generateGenericRecords(numRecords);
   }
 
+  /**
+   * Method help to create avro files and save it to file.

Review comment:
   `Methods`?

##
File path: 
hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/configuration/TestWorkflowBuilder.java
##
@@ -30,41 +30,58 @@
 import org.apache.hudi.integ.testsuite.dag.WorkflowDag;
 import org.junit.jupiter.api.Test;
 
+/**
+ * Unit test for the build process of {@link DagNode} and {@link WorkflowDag}.
+ */
 public class TestWorkflowBuilder {
 
   @Test
   public void testWorkloadOperationSequenceBuilder() {

Review comment:
   please remove all the comments of this method

##
File path: 
hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/job/TestHoodieTestSuiteJob.java
##
@@ -49,6 +49,9 @@
 import org.junit.jupiter.params.provider.Arguments;
 import org.junit.jupiter.params.provider.MethodSource;
 
+/**
+ * Unit tests against {@link HoodieTestSuiteJob}.

Review comment:
   `Unit test`?

##
File path: 
hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/converter/TestUpdateConverter.java
##
@@ -49,11 +53,16 @@ public void teardown() {
 jsc.stop();
   }
 
+  /**
+   * Test {@link UpdateConverter} by generates random updates from existing 
records.
+   */
   @Test
   public void testGenerateUpdateRecordsFromInputRecords() throws Exception {
+// 1. prepare input record

Review comment:
   `record` -> `records`

##
File path: 
hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/utils/TestUtils.java
##
@@ -45,6 +48,15 @@
 return dataGenerator.generateGenericRecords(numRecords);
   }
 
+  /**
+   * Method help to create avro files and save it to file.
+   *
+   * @param jsc   {@link JavaSparkContext}.

Review comment:
   We should not only use `{@link }` in the comment, add more description.

##
File path: 
hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/dag/ComplexDagGenerator.java
##
@@ -46,6 +51,7 @@ public WorkflowDag build() {
 .withNumInsertPartitions(1)
 .withRecordSize(1).build());
 
+// function to build ValidateNode with

Review comment:
   with what?









[jira] [Updated] (HUDI-1103) Improve the code format of Delete data demo in Quick-Start Guide

2020-08-23 Thread Trevorzhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevorzhang updated HUDI-1103:
--
Description: 
Currently, the delete data demo code is not runnable in spark-shell 
{code:java}
scala> val df = spark
df: org.apache.spark.sql.SparkSession = 
org.apache.spark.sql.SparkSession@74e7d97bscala>   .read
:1: error: illegal start of definition
  .read
  ^scala>   .json(spark.sparkContext.parallelize(deletes, 2))
:1: error: illegal start of definition
  .json(spark.sparkContext.parallelize(deletes, 2))
  ^
{code}
The dot should be at the end of the line, or a "\" should be put at the end of the line.

 

  was:
{color:red}colored text{color}Currently, the delete data demo code is not runnable in 
spark-shell 
{code:java}
scala> val df = spark
df: org.apache.spark.sql.SparkSession = 
org.apache.spark.sql.SparkSession@74e7d97bscala>   .read
:1: error: illegal start of definition
  .read
  ^scala>   .json(spark.sparkContext.parallelize(deletes, 2))
:1: error: illegal start of definition
  .json(spark.sparkContext.parallelize(deletes, 2))
  ^
{code}
The dot should be at the end of the line, or a "\" should be put at the end of the line.

 


> Improve the code format of Delete data demo in Quick-Start Guide
> 
>
> Key: HUDI-1103
> URL: https://issues.apache.org/jira/browse/HUDI-1103
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: wangxianghu
>Assignee: Trevorzhang
>Priority: Minor
> Fix For: 0.6.1
>
>
> Currently, the delete data demo code is not runnable in spark-shell 
> {code:java}
> scala> val df = spark
> df: org.apache.spark.sql.SparkSession = 
> org.apache.spark.sql.SparkSession@74e7d97bscala>   .read
> :1: error: illegal start of definition
>   .read
>   ^scala>   .json(spark.sparkContext.parallelize(deletes, 2))
> :1: error: illegal start of definition
>   .json(spark.sparkContext.parallelize(deletes, 2))
>   ^
> {code}
> The dot should be at the end of the line, or a "\" should be put at the end of the line.
>  





[jira] [Updated] (HUDI-1103) Improve the code format of Delete data demo in Quick-Start Guide

2020-08-23 Thread Trevorzhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevorzhang updated HUDI-1103:
--
Description: 
{color:red}colored text{color}Currently, the delete data demo code is not runnable in 
spark-shell 
{code:java}
scala> val df = spark
df: org.apache.spark.sql.SparkSession = 
org.apache.spark.sql.SparkSession@74e7d97bscala>   .read
:1: error: illegal start of definition
  .read
  ^scala>   .json(spark.sparkContext.parallelize(deletes, 2))
:1: error: illegal start of definition
  .json(spark.sparkContext.parallelize(deletes, 2))
  ^
{code}
The dot should be at the end of the line, or a "\" should be put at the end of the line.

 

  was:
Currently, the delete data demo code is not runnable in spark-shell 
{code:java}
scala> val df = spark
df: org.apache.spark.sql.SparkSession = 
org.apache.spark.sql.SparkSession@74e7d97bscala>   .read
:1: error: illegal start of definition
  .read
  ^scala>   .json(spark.sparkContext.parallelize(deletes, 2))
:1: error: illegal start of definition
  .json(spark.sparkContext.parallelize(deletes, 2))
  ^
{code}
The dot should be at the end of the line, or a "\" should be put at the end of the line.

 


> Improve the code format of Delete data demo in Quick-Start Guide
> 
>
> Key: HUDI-1103
> URL: https://issues.apache.org/jira/browse/HUDI-1103
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: wangxianghu
>Assignee: Trevorzhang
>Priority: Minor
> Fix For: 0.6.1
>
>
> {color:red}colored text{color}Currently, the delete data demo code is not runnable in 
> spark-shell 
> {code:java}
> scala> val df = spark
> df: org.apache.spark.sql.SparkSession = 
> org.apache.spark.sql.SparkSession@74e7d97bscala>   .read
> :1: error: illegal start of definition
>   .read
>   ^scala>   .json(spark.sparkContext.parallelize(deletes, 2))
> :1: error: illegal start of definition
>   .json(spark.sparkContext.parallelize(deletes, 2))
>   ^
> {code}
> The dot should be at the end of the line, or a "\" should be put at the end of the line.
>  





[GitHub] [hudi] cdmikechen edited a comment on issue #2005: [SUPPORT] hudi hive-sync in master branch (0.6.1) can not run by spark

2020-08-23 Thread GitBox


cdmikechen edited a comment on issue #2005:
URL: https://github.com/apache/hudi/issues/2005#issuecomment-678860501


   > @cdmikechen : Also, if you look at integration tests ITTestHoodieDemo, we 
cover the tests with hive syncing and this test has been passing for us. Can 
you take a look at the tests to see what the difference is ?
   
   @bvaradar I checked the `hudi-integ-test` package and found the reason:
   The `hudi-integ-test` pom.xml, which contains `ITTestHoodieDemo`, pulls in 
`hive-exec-2.3.1` as a dependency. So when we instantiate a 
`MapredParquetInputFormat`, Hudi picks up the class from `hive-exec-2.3.1`.
   ```java 
   package org.apache.hadoop.hive.ql.io.parquet;
   
   import java.io.IOException;
   import org.apache.hadoop.hive.ql.exec.Utilities;
   import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
   import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
   import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
   import org.apache.hadoop.io.ArrayWritable;
   import org.apache.hadoop.io.NullWritable;
   import org.apache.hadoop.mapred.FileInputFormat;
   import org.apache.hadoop.mapred.InputSplit;
   import org.apache.hadoop.mapred.JobConf;
   import org.apache.hadoop.mapred.RecordReader;
   import org.apache.hadoop.mapred.Reporter;
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;
   
   import org.apache.parquet.hadoop.ParquetInputFormat;
   
   public class MapredParquetInputFormat extends FileInputFormat implements VectorizedInputFormatInterface {
   ```
   But if we just use a standalone Spark environment without the hive-2.3.1 
dependencies (like starting a new project that only depends on the Spark libs), Hudi will 
use `hive-exec-1.2.1-spark2`.
   ```java
   package org.apache.hadoop.hive.ql.io.parquet;
   
   import java.io.IOException;
   import org.apache.commons.logging.Log;
   import org.apache.commons.logging.LogFactory;
   import org.apache.hadoop.hive.ql.exec.Utilities;
   import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
   import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
   import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
   import org.apache.hadoop.io.ArrayWritable;
   import org.apache.hadoop.mapred.FileInputFormat;
   import org.apache.hadoop.mapred.RecordReader;
   
   import parquet.hadoop.ParquetInputFormat;
   
   public class MapredParquetInputFormat extends FileInputFormat {
   ```
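
A quick runtime check, using only plain JDK reflection, can confirm which hive-exec jar the class is actually resolved from in a given environment (a hedged diagnostic sketch, not part of the issue itself):

```scala
import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

// Prints the jar that provided the class on this classpath, e.g.
// .../hive-exec-2.3.1.jar versus .../hive-exec-1.2.1-spark2.jar.
val jarLocation = classOf[MapredParquetInputFormat]
  .getProtectionDomain.getCodeSource.getLocation
println(jarLocation)
```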
   







[GitHub] [hudi] cdmikechen edited a comment on issue #2005: [SUPPORT] hudi hive-sync in master branch (0.6.1) can not run by spark

2020-08-23 Thread GitBox


cdmikechen edited a comment on issue #2005:
URL: https://github.com/apache/hudi/issues/2005#issuecomment-678860501


   > @cdmikechen : Also, if you look at integration tests ITTestHoodieDemo, we 
cover the tests with hive syncing and this test has been passing for us. Can 
you take a look at the tests to see what the difference is ?
   
   @bvaradar I checked the `hudi-integ-test` package and found the reason:
   The `hudi-integ-test` pom.xml, which contains `ITTestHoodieDemo`, pulls in 
`hive-exec-2.3.1` as a dependency. So when we instantiate a 
`MapredParquetInputFormat`, Hudi picks up the class from `hive-exec-2.3.1`.
   ```java 
   package org.apache.hadoop.hive.ql.io.parquet;
   
   import java.io.IOException;
   import org.apache.hadoop.hive.ql.exec.Utilities;
   import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
   import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
   import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
   import org.apache.hadoop.io.ArrayWritable;
   import org.apache.hadoop.io.NullWritable;
   import org.apache.hadoop.mapred.FileInputFormat;
   import org.apache.hadoop.mapred.InputSplit;
   import org.apache.hadoop.mapred.JobConf;
   import org.apache.hadoop.mapred.RecordReader;
   import org.apache.hadoop.mapred.Reporter;
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;
   
   import org.apache.parquet.hadoop.ParquetInputFormat;
   
   public class MapredParquetInputFormat extends FileInputFormat implements VectorizedInputFormatInterface {
   ```
   But if we just use a standalone Spark environment without the hive-2.3.1 
dependencies (like starting a new project that only depends on the Spark libs), Hudi will 
use `hive-exec-1.2.1-spark`.
   ```java
   package org.apache.hadoop.hive.ql.io.parquet;
   
   import java.io.IOException;
   import org.apache.commons.logging.Log;
   import org.apache.commons.logging.LogFactory;
   import org.apache.hadoop.hive.ql.exec.Utilities;
   import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
   import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
   import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
   import org.apache.hadoop.io.ArrayWritable;
   import org.apache.hadoop.mapred.FileInputFormat;
   import org.apache.hadoop.mapred.RecordReader;
   
   import parquet.hadoop.ParquetInputFormat;
   
   public class MapredParquetInputFormat extends FileInputFormat {
   ```
   







[GitHub] [hudi] cdmikechen edited a comment on issue #2005: [SUPPORT] hudi hive-sync in master branch (0.6.1) can not run by spark

2020-08-23 Thread GitBox


cdmikechen edited a comment on issue #2005:
URL: https://github.com/apache/hudi/issues/2005#issuecomment-678860501


   > @cdmikechen : Also, if you look at integration tests ITTestHoodieDemo, we 
cover the tests with hive syncing and this test has been passing for us. Can 
you take a look at the tests to see what the difference is ?
   
   @bvaradar I checked the `hudi-integ-test` package and found the reason:
   The `hudi-integ-test` pom.xml, which contains `ITTestHoodieDemo`, pulls in 
`hive-exec-2.3.1` as a dependency. So when we instantiate a 
`MapredParquetInputFormat`, Hudi picks up the class from `hive-exec-2.3.1`.
   ```java 
   package org.apache.hadoop.hive.ql.io.parquet;
   
   import java.io.IOException;
   import org.apache.hadoop.hive.ql.exec.Utilities;
   import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
   import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
   import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
   import org.apache.hadoop.io.ArrayWritable;
   import org.apache.hadoop.io.NullWritable;
   import org.apache.hadoop.mapred.FileInputFormat;
   import org.apache.hadoop.mapred.InputSplit;
   import org.apache.hadoop.mapred.JobConf;
   import org.apache.hadoop.mapred.RecordReader;
   import org.apache.hadoop.mapred.Reporter;
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;
   
   import org.apache.parquet.hadoop.ParquetInputFormat;
   
   public class MapredParquetInputFormat extends FileInputFormat implements VectorizedInputFormatInterface {
   ```
   But if we just use a standalone Spark environment without the hive-2.3.1 
dependencies (like starting a new project that only depends on the Spark libs), Hudi will 
use `hive-exec-1.2.1-spark`.
   ```java
   package org.apache.hadoop.hive.ql.io.parquet;
   
   import java.io.IOException;
   import org.apache.commons.logging.Log;
   import org.apache.commons.logging.LogFactory;
   import org.apache.hadoop.hive.ql.exec.Utilities;
   import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
   import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
   import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
   import org.apache.hadoop.io.ArrayWritable;
   import org.apache.hadoop.mapred.FileInputFormat;
   import org.apache.hadoop.mapred.RecordReader;
   
   import parquet.hadoop.ParquetInputFormat;
   
   public class MapredParquetInputFormat extends FileInputFormat {
   ```
   







[GitHub] [hudi] cdmikechen commented on issue #2005: [SUPPORT] hudi hive-sync in master branch (0.6.1) can not run by spark

2020-08-23 Thread GitBox


cdmikechen commented on issue #2005:
URL: https://github.com/apache/hudi/issues/2005#issuecomment-678860501


   > @cdmikechen : Also, if you look at integration tests ITTestHoodieDemo, we 
cover the tests with hive syncing and this test has been passing for us. Can 
you take a look at the tests to see what the difference is ?
   
   @bvaradar I checked the `hudi-integ-test` package and found the reason:
   The `hudi-integ-test` pom.xml, which contains `ITTestHoodieDemo`, pulls in 
`hive-exec-2.3.1` as a dependency. So when we instantiate a 
`MapredParquetInputFormat`, Hudi picks up the class from `hive-exec-2.3.1`.
   ```java 
   package org.apache.hadoop.hive.ql.io.parquet;
   
   import java.io.IOException;
   import org.apache.hadoop.hive.ql.exec.Utilities;
   import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
   import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
   import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
   import org.apache.hadoop.io.ArrayWritable;
   import org.apache.hadoop.io.NullWritable;
   import org.apache.hadoop.mapred.FileInputFormat;
   import org.apache.hadoop.mapred.InputSplit;
   import org.apache.hadoop.mapred.JobConf;
   import org.apache.hadoop.mapred.RecordReader;
   import org.apache.hadoop.mapred.Reporter;
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;
   
   import org.apache.parquet.hadoop.ParquetInputFormat;
   
   public class MapredParquetInputFormat extends FileInputFormat implements VectorizedInputFormatInterface {
   ```
   But if we just use a standalone Spark environment without the hive-2.3.1 
dependencies (like starting a new project that only depends on the Spark libs), Hudi will 
use `hive-exec-1.2.1-spark`.
   ```java
   package org.apache.hadoop.hive.ql.io.parquet;
   
   import java.io.IOException;
   import org.apache.commons.logging.Log;
   import org.apache.commons.logging.LogFactory;
   import org.apache.hadoop.hive.ql.exec.Utilities;
   import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
   import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
   import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
   import org.apache.hadoop.io.ArrayWritable;
   import org.apache.hadoop.mapred.FileInputFormat;
   import org.apache.hadoop.mapred.RecordReader;
   
   import parquet.hadoop.ParquetInputFormat;
   
   public class MapredParquetInputFormat extends FileInputFormat {
   ```
   







[GitHub] [hudi] bvaradar commented on a change in pull request #1996: [BLOG] Async Compaction and Efficient Migration of large Parquet tables

2020-08-23 Thread GitBox


bvaradar commented on a change in pull request #1996:
URL: https://github.com/apache/hudi/pull/1996#discussion_r475285098



##
File path: docs/_posts/2020-08-21-async-compaction-deployment-model.md
##
@@ -0,0 +1,99 @@
+---
+title: "Async Compaction Deployment Models"
+excerpt: "Mechanisms for executing compaction jobs in Hudi asynchronously"
+author: vbalaji
+category: blog
+---
+
+We will look at different deployment models for executing compactions 
asynchronously.
+
+# Compaction
+
+For Merge-On-Read table, data is stored using a combination of columnar (e.g 
parquet) + row based (e.g avro) file formats. 
+Updates are logged to delta files & later compacted to produce new versions of 
columnar files synchronously or 
+asynchronously. One of the main motivations behind Merge-On-Read is to reduce 
data latency when ingesting records.
+Hence, it makes sense to run compaction asynchronously without blocking 
ingestion.
+
+
+# Async Compaction
+
+Async Compaction is performed in 2 steps:
+
+1. ***Compaction Scheduling***: This is done by the ingestion job. In this 
step, Hudi scans the partitions and selects **file 
+slices** to be compacted. A compaction plan is finally written to Hudi 
timeline.
+1. ***Compaction Execution***: A separate process reads the compaction plan 
and performs compaction of file slices.
+
+  
+# Deployment Models
+
+There are few ways by which we can execute compactions asynchronously. 
+
+## Spark Structured Streaming
+
+With 0.6.0, we now have support for running async compactions in Spark 
+Structured Streaming jobs. Compactions are scheduled and executed 
asynchronously inside the 
+streaming job.  Async Compactions are enabled by default for structured 
streaming jobs
+on Merge-On-Read table.
+
+Here is an example snippet in java
+
+```properties
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.HoodieDataSourceHelpers;
+import org.apache.hudi.config.HoodieCompactionConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import org.apache.spark.sql.streaming.OutputMode;
+import org.apache.spark.sql.streaming.ProcessingTime;
+
+
+ DataStreamWriter writer = 
streamingInput.writeStream().format("org.apache.hudi")
+.option(DataSourceWriteOptions.OPERATION_OPT_KEY(), operationType)
+.option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY(), tableType)
+.option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
+.option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), 
"partition")
+.option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
+.option(HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS_PROP, 
"10")
+.option(DataSourceWriteOptions.ASYNC_COMPACT_ENABLE_OPT_KEY(), "true")
+.option(HoodieWriteConfig.TABLE_NAME, 
tableName).option("checkpointLocation", checkpointLocation)
+.outputMode(OutputMode.Append());
+ writer.trigger(new ProcessingTime(3)).start(tablePath);
+```
+
+## DeltaStreaminer Continuous Mode

Review comment:
   Fixed. Thanks,









[GitHub] [hudi] bvaradar commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2020-08-23 Thread GitBox


bvaradar commented on issue #2013:
URL: https://github.com/apache/hudi/issues/2013#issuecomment-678842778


   @garyli1019 : I would let you answer this question. 







[GitHub] [hudi] rubenssoto commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2020-08-23 Thread GitBox


rubenssoto commented on issue #1981:
URL: https://github.com/apache/hudi/issues/1981#issuecomment-678824833


   @umehrot2 @vinothchandar 
   Could the path filter improvements be achieved by updating the Hudi lib in 
Presto? EMR Presto is 0.232, and these improvements were made in 0.233.







[GitHub] [hudi] rubenssoto opened a new issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2020-08-23 Thread GitBox


rubenssoto opened a new issue #2013:
URL: https://github.com/apache/hudi/issues/2013


   Hi Guys,
   
   I have a table that could be updated at any point in time, so I would like to try MoR 
tables. This table would be a source for my Redshift DW, so I need a way to 
pull its data incrementally.
   
   I saw that the Spark datasource can only query MoR tables in batch mode, so it would be 
good to have full support for Hudi in the Spark datasource and full support for Hudi as a 
Spark Structured Streaming source.
   
   I found some Jira tickets with this topic.
   
   
https://issues.apache.org/jira/projects/HUDI/issues/HUDI-920?filter=allopenissues
   
   
https://issues.apache.org/jira/projects/HUDI/issues/HUDI-1109?filter=allopenissues







[GitHub] [hudi] sathyaprakashg commented on pull request #2012: HUDI-1129 Deltastreamer Add support for schema evaluation

2020-08-23 Thread GitBox


sathyaprakashg commented on pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#issuecomment-678806335


   @bvaradar @sbernauer 







[GitHub] [hudi] sathyaprakashg opened a new pull request #2012: HUDI-1129 Deltastreamer Add support for schema evaluation

2020-08-23 Thread GitBox


sathyaprakashg opened a new pull request #2012:
URL: https://github.com/apache/hudi/pull/2012


   ## What is the purpose of the pull request
   
   When the schema has evolved but the producer is still producing events with an older 
version of the schema, the Hudi DeltaStreamer fails. This fix makes sure the 
DeltaStreamer works correctly with schema evolution.
   
   Related issues #1845 #1971 #1972 
   
   ## Brief change log
   
 - Update the Avro-to-Spark conversion method 
`AvroConversionHelper.createConverterToRow` to handle the scenario where the provided 
schema has more fields than the data (i.e. the producer is still sending 
events with the old schema)
-  Introduce a new schema provider class called `SchemaBasedSchemaProvider`, 
which sets the schema based on the schema of the data. Currently, 
`HoodieAvroUtils.avroToBytes` uses the schema of the data to convert to bytes, 
but `HoodieAvroUtils.bytesToAvro` uses the provided schema. Since the two may not 
always match, this results in an error. By using the data's schema via the new schema 
provider, we ensure the same schema is used for converting Avro to bytes and 
bytes back to Avro (see the small Avro sketch below).
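
A small plain-Avro sketch (not Hudi APIs) of why the writer and reader schemas must line up when round-tripping through bytes:

```scala
import java.io.ByteArrayOutputStream

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

// Serialize a record with the producer's (older) schema...
val writerSchema = new Schema.Parser().parse(
  """{"type":"record","name":"r","fields":[{"name":"id","type":"string"}]}""")
val record = new GenericData.Record(writerSchema)
record.put("id", "1")

val out = new ByteArrayOutputStream()
val encoder = EncoderFactory.get().binaryEncoder(out, null)
new GenericDatumWriter[GenericRecord](writerSchema).write(record, encoder)
encoder.flush()

// ...then decode the bytes. Binary Avro carries no schema, so the reader must be
// given the exact writer schema; decoding with a different (evolved) schema is
// what makes bytesToAvro fail in the scenario described above.
val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
val back = new GenericDatumReader[GenericRecord](writerSchema).read(null, decoder)
```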
   
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
 - *Added unit test to verify schema evoluation* Thanks @sbernauer for unit 
test
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [x] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.







[jira] [Updated] (HUDI-1103) Improve the code format of Delete data demo in Quick-Start Guide

2020-08-23 Thread wangxianghu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxianghu updated HUDI-1103:
--
Parent: HUDI-1215
Issue Type: Sub-task  (was: Task)

> Improve the code format of Delete data demo in Quick-Start Guide
> 
>
> Key: HUDI-1103
> URL: https://issues.apache.org/jira/browse/HUDI-1103
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: wangxianghu
>Assignee: Trevorzhang
>Priority: Minor
> Fix For: 0.6.1
>
>
> Currently, the delete data demo code is not runnable in spark-shell 
> {code:java}
> scala> val df = spark
> df: org.apache.spark.sql.SparkSession = 
> org.apache.spark.sql.SparkSession@74e7d97bscala>   .read
> :1: error: illegal start of definition
>   .read
>   ^scala>   .json(spark.sparkContext.parallelize(deletes, 2))
> :1: error: illegal start of definition
>   .json(spark.sparkContext.parallelize(deletes, 2))
>   ^
> {code}
> The dot should be at the end of the line, or a "\" should be put at the end of the line.
>  





[jira] [Closed] (HUDI-1150) Fix unable to parse input partition field :1 exception when using TimestampBasedKeyGenerator

2020-08-23 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-1150.
--
Resolution: Fixed

Fixed via master branch: 35b21855da209c812e006c1afff3d940d5ac2a18

> Fix unable to parse input partition field :1 exception when using 
> TimestampBasedKeyGenerator 
> -
>
> Key: HUDI-1150
> URL: https://issues.apache.org/jira/browse/HUDI-1150
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: wangxianghu
>Assignee: wangxianghu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> Scenario to reproduce:
>  # use TimestampBasedKeyGenerator
>  # set hoodie.deltastreamer.keygen.timebased.timestamp.type = DATE_STRING
>  # partitionpath field value is null
> When the partitionpath field value is null, TimestampBasedKeyGenerator will set it 
> to 1L, which cannot be parsed correctly.
>  
> {code:java}
> //
> User class threw exception: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: Job aborted due to stage failure: 
> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 1.0 (TID 4, prod-t3-data-lake-007, executor 6): 
> org.apache.hudi.exception.HoodieDeltaStreamerException: Unable to parse input 
> partition field :1
>  at 
> org.apache.hudi.keygen.TimestampBasedKeyGenerator.getPartitionPath(TimestampBasedKeyGenerator.java:156)
>  at 
> org.apache.hudi.keygen.CustomKeyGenerator.getPartitionPath(CustomKeyGenerator.java:108)
>  at 
> org.apache.hudi.keygen.CustomKeyGenerator.getKey(CustomKeyGenerator.java:78)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$9fce03f0$1(DeltaSync.java:343)
>  at 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>  at scala.collection.Iterator$$anon$10.next(Iterator.scala:394)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>  at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>  at scala.collection.AbstractIterator.to(Iterator.scala:1334)
>  at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>  at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
>  at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>  at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
>  at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
>  at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
>  at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
>  at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>  at org.apache.spark.scheduler.Task.run(Task.scala:121)
>  at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: java.lang.RuntimeException: 
> hoodie.deltastreamer.keygen.timebased.timestamp.scalar.time.unit is not 
> specified but scalar it supplied as time value
>  at 
> org.apache.hudi.keygen.TimestampBasedKeyGenerator.convertLongTimeToMillis(TimestampBasedKeyGenerator.java:163)
>  at 
> org.apache.hudi.keygen.TimestampBasedKeyGenerator.getPartitionPath(TimestampBasedKeyGenerator.java:138)
>  ... 29 more
> {code}
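
As a hedged illustration of the reproduce steps above (property names as found in the key generator configs; the values and date formats are only examples, not taken from the report):

```scala
import java.util.Properties

val props = new Properties()
// Use the custom/timestamp-based key generator with a DATE_STRING timestamp type.
props.setProperty("hoodie.datasource.write.keygenerator.class",
  "org.apache.hudi.keygen.CustomKeyGenerator")
props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING")
props.setProperty("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd")
props.setProperty("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy/MM/dd")
// With this setup, a null partition-path value used to be replaced with 1L
// internally, which DATE_STRING parsing cannot handle, producing the exception
// shown in the stack trace above.
```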





[hudi] branch master updated: [HUDI-1150] Fix unable to parse input partition field :1 exception when using TimestampBasedKeyGenerator(#1920)

2020-08-23 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 35b2185  [HUDI-1150] Fix unable to parse input partition field :1 
exception when using TimestampBasedKeyGenerator(#1920)
35b2185 is described below

commit 35b21855da209c812e006c1afff3d940d5ac2a18
Author: Mathieu 
AuthorDate: Sun Aug 23 19:56:50 2020 +0800

[HUDI-1150] Fix unable to parse input partition field :1 exception when 
using TimestampBasedKeyGenerator(#1920)
---
 .../main/java/org/apache/hudi/DataSourceUtils.java |  6 ++--
 .../apache/hudi/keygen/RowKeyGeneratorHelper.java  |  2 +-
 .../hudi/keygen/TimestampBasedKeyGenerator.java| 38 +---
 ...rser.java => AbstractHoodieDateTimeParser.java} | 40 +-
 .../keygen/parser/HoodieDateTimeParserImpl.java| 17 +++--
 .../keygen/TestTimestampBasedKeyGenerator.java | 39 +++--
 6 files changed, 109 insertions(+), 33 deletions(-)

diff --git a/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java 
b/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
index ea2cc5c..19316d5 100644
--- a/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
+++ b/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
@@ -39,7 +39,7 @@ import org.apache.hudi.hive.HiveSyncConfig;
 import org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor;
 import org.apache.hudi.index.HoodieIndex;
 import org.apache.hudi.keygen.KeyGenerator;
-import org.apache.hudi.keygen.parser.HoodieDateTimeParser;
+import org.apache.hudi.keygen.parser.AbstractHoodieDateTimeParser;
 import org.apache.hudi.table.BulkInsertPartitioner;
 
 import org.apache.avro.LogicalTypes;
@@ -172,9 +172,9 @@ public class DataSourceUtils {
   /**
* Create a date time parser class for TimestampBasedKeyGenerator, passing 
in any configs needed.
*/
-  public static HoodieDateTimeParser createDateTimeParser(TypedProperties 
props, String parserClass) throws IOException {
+  public static AbstractHoodieDateTimeParser 
createDateTimeParser(TypedProperties props, String parserClass) throws 
IOException {
 try {
-  return (HoodieDateTimeParser) ReflectionUtils.loadClass(parserClass, 
props);
+  return (AbstractHoodieDateTimeParser) 
ReflectionUtils.loadClass(parserClass, props);
 } catch (Throwable e) {
   throw new IOException("Could not load date time parser class " + 
parserClass, e);
 }
diff --git 
a/hudi-spark/src/main/java/org/apache/hudi/keygen/RowKeyGeneratorHelper.java 
b/hudi-spark/src/main/java/org/apache/hudi/keygen/RowKeyGeneratorHelper.java
index 02b8492..4c05489 100644
--- a/hudi-spark/src/main/java/org/apache/hudi/keygen/RowKeyGeneratorHelper.java
+++ b/hudi-spark/src/main/java/org/apache/hudi/keygen/RowKeyGeneratorHelper.java
@@ -146,7 +146,7 @@ public class RowKeyGeneratorHelper {
 }
 valueToProcess = (Row) valueToProcess.get(positions.get(index));
   } else { // last index
-if (valueToProcess.getAs(positions.get(index)).toString().isEmpty()) {
+if (null != valueToProcess.getAs(positions.get(index)) && 
valueToProcess.getAs(positions.get(index)).toString().isEmpty()) {
   toReturn = EMPTY_RECORDKEY_PLACEHOLDER;
   break;
 }
diff --git 
a/hudi-spark/src/main/java/org/apache/hudi/keygen/TimestampBasedKeyGenerator.java
 
b/hudi-spark/src/main/java/org/apache/hudi/keygen/TimestampBasedKeyGenerator.java
index 25a52fe..97a7d2e 100644
--- 
a/hudi-spark/src/main/java/org/apache/hudi/keygen/TimestampBasedKeyGenerator.java
+++ 
b/hudi-spark/src/main/java/org/apache/hudi/keygen/TimestampBasedKeyGenerator.java
@@ -26,7 +26,7 @@ import org.apache.hudi.common.util.Option;
 import org.apache.hudi.exception.HoodieDeltaStreamerException;
 import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.exception.HoodieNotSupportedException;
-import org.apache.hudi.keygen.parser.HoodieDateTimeParser;
+import org.apache.hudi.keygen.parser.AbstractHoodieDateTimeParser;
 import org.apache.hudi.keygen.parser.HoodieDateTimeParserImpl;
 
 import org.apache.avro.generic.GenericRecord;
@@ -41,6 +41,7 @@ import java.io.Serializable;
 import java.io.UnsupportedEncodingException;
 import java.net.URLEncoder;
 import java.nio.charset.StandardCharsets;
+import java.util.TimeZone;
 import java.util.concurrent.TimeUnit;
 
 import static java.util.concurrent.TimeUnit.MILLISECONDS;
@@ -63,10 +64,11 @@ public class TimestampBasedKeyGenerator extends 
SimpleKeyGenerator {
   private final String outputDateFormat;
   private transient Option inputFormatter;
   private transient DateTimeFormatter partitionFormatter;
-  private final HoodieDateTimeParser parser;
+  private final AbstractHoodieDateTimeParser parser;
 
   // TimeZone detailed settings reference
   // https://docs.oracle.com/javase/8/doc

[GitHub] [hudi] yanghua merged pull request #1920: [HUDI-1150] Fix unable to parse input partition field :1 exception when using TimestampBasedKeyGenerator

2020-08-23 Thread GitBox


yanghua merged pull request #1920:
URL: https://github.com/apache/hudi/pull/1920


   







[jira] [Commented] (HUDI-1215) Ensure all commands in quick start are copy pastable

2020-08-23 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182665#comment-17182665
 ] 

sivabalan narayanan commented on HUDI-1215:
---

Sure [~wangxianghu], sounds good.

> Ensure all commands in quick start are copy pastable
> 
>
> Key: HUDI-1215
> URL: https://issues.apache.org/jira/browse/HUDI-1215
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: 0.6.1
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> I see that the delete commands are not directly copy-pastable. Fix all such 
> commands in quick start.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1215) Ensure all commands in quick start are copy pastable

2020-08-23 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-1215:
-

Assignee: wangxianghu  (was: sivabalan narayanan)

> Ensure all commands in quick start are copy pastable
> 
>
> Key: HUDI-1215
> URL: https://issues.apache.org/jira/browse/HUDI-1215
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: 0.6.1
>Reporter: sivabalan narayanan
>Assignee: wangxianghu
>Priority: Major
>
> I see that the delete commands are not directly copy-pastable. Fix all such 
> commands in quick start.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] bhasudha opened a new pull request #2011: [DOC] Change reference from `Presto` to `PrestoDB`

2020-08-23 Thread GitBox


bhasudha opened a new pull request #2011:
URL: https://github.com/apache/hudi/pull/2011


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




svn commit: r41075 - in /release/hudi/hudi-0.6.0: ./ hudi-0.6.0.src.tgz hudi-0.6.0.src.tgz.asc hudi-0.6.0.src.tgz.sha512

2020-08-23 Thread bhavanisudha
Author: bhavanisudha
Date: Sun Aug 23 08:02:57 2020
New Revision: 41075

Log:
Apache Hudi 0.6.0 source release

Added:
release/hudi/hudi-0.6.0/
release/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz   (with props)
release/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.asc
release/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.sha512

Added: release/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz
==
Binary file - no diff available.

Propchange: release/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz
--
svn:mime-type = application/octet-stream

Added: release/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.asc
==
--- release/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.asc (added)
+++ release/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.asc Sun Aug 23 08:02:57 2020
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEEf2bNTOmQmDooRnIpMiTyAOH8IXIFAl9CIDMACgkQMiTyAOH8
+IXJpJw//b5kQILuHPgiU/z0JXJkNpH9hs/OwjhUPP30lq9doEkCZ/DU/ZMP34has
+JWdYl3Qjin3OGpFoFWKXocxqovO8ACKP5Fo+ktqP5lAVgjZ/W9WXctCaRG/li3VR
+QhHOYeMho3s+hK2DOitexw4+PdCRFtVQ5vjSY9UpuvdzxZ5cXrj13wQ3b4N/pMnA
+tPTXzVj2UetVZaWQ59A72yWF9MZFeMuI/cRP1DJhVAGw8MNbgSDmZH+5H5avCvj+
+1ycwuTFcutP+6Fe4Acer5MysxaccGRuTbrODMuKjAhIqbo0pxjQ2UCOKDRdHvGB3
+4p1nun3+7gqoTfTqPJ5jbnvCGKGD777S8MysXBxzCyySneeL5Hn/QQxV2Fm7/Wkd
+ZrDtp669UkPA4o4MjqxrYpdbV4WkDI4Nggi2ITg6dKznQxSlwnpP+evPCc+rh0+S
+Av52nG35cudpseBPfCplonEI+dWJLjyf9O0cju2x2J2XIzIXjMhZ3IdG6Z4cL+n9
+40wdpGizbSdqf1RfC1UTTfndENilmLNbIhfFWhfBWJrXCFaINPUeXrheMQI5pMVC
+k8FXN6ol+9XVMJuXElpsO5s3HornM7+OKm71WEwmIDqX0iRkqu0DGz0NonZSHIhS
+4EQMbhzzmNczWtpLQ4HyJtkeRj9TaW7csF6gufKw5PhHsSnbKJ4=
+=JyJe
+-----END PGP SIGNATURE-----

Added: release/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.sha512
==
--- release/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.sha512 (added)
+++ release/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.sha512 Sun Aug 23 08:02:57 2020
@@ -0,0 +1 @@
+80255cf9b62c548eebe6306d39acf04f66113482552e7acb653e225644ae4f1ae8892af1a737262ac737737dc9ca4da7117d5a9f05377c79c86e90ee11e7d89a
  hudi-0.6.0.src.tgz




[jira] [Comment Edited] (HUDI-1215) Ensure all commands in quick start are copy pastable

2020-08-23 Thread wangxianghu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182617#comment-17182617
 ] 

wangxianghu edited comment on HUDI-1215 at 8/23/20, 8:00 AM:
-

Hi [~shivnarayan], this issue contains HUDI-1103; may I make 1103 a sub-task of 
this one?

BTW, may I take this issue? :)


was (Author: wangxianghu):
Hi [~shivnarayan], this issue contains 
HUDI-1103(https://issues.apache.org/jira/browse/HUDI-1103), may I make 1103 a 
sub-task of this one?

BTW, may I take this issue? :)

> Ensure all commands in quick start are copy pastable
> 
>
> Key: HUDI-1215
> URL: https://issues.apache.org/jira/browse/HUDI-1215
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: 0.6.1
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> I see that the delete commands are not directly copy-pastable. Fix all such 
> commands in quick start.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1215) Ensure all commands in quick start are copy pastable

2020-08-23 Thread wangxianghu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182617#comment-17182617
 ] 

wangxianghu commented on HUDI-1215:
---

Hi [~shivnarayan], this issue contains HUDI-1103 
(https://issues.apache.org/jira/browse/HUDI-1103); may I make 1103 a 
sub-task of this one?

BTW, may I take this issue? :)

> Ensure all commands in quick start are copy pastable
> 
>
> Key: HUDI-1215
> URL: https://issues.apache.org/jira/browse/HUDI-1215
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: 0.6.1
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> I see that the delete commands are not directly copy-pastable. Fix all such 
> commands in quick start.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1216) Create chinese version of pyspark quickstart example

2020-08-23 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-1216:
--

Assignee: wangxianghu  (was: vinoyang)

> Create chinese version of pyspark quickstart example 
> -
>
> Key: HUDI-1216
> URL: https://issues.apache.org/jira/browse/HUDI-1216
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: docs-chinese
>Reporter: Balaji Varadarajan
>Assignee: wangxianghu
>Priority: Major
> Fix For: 0.6.1
>
>
> The quickstart page in English (for 0.5.3 onwards) has a pyspark example, 
> but the Chinese version does not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


svn commit: r41074 - in /dev/hudi/hudi-0.6.0: ./ hudi-0.6.0.src.tgz hudi-0.6.0.src.tgz.asc hudi-0.6.0.src.tgz.sha512

2020-08-23 Thread bhavanisudha
Author: bhavanisudha
Date: Sun Aug 23 07:24:29 2020
New Revision: 41074

Log:
Staging source releases for release-0.6.0

Added:
dev/hudi/hudi-0.6.0/
dev/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz   (with props)
dev/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.asc
dev/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.sha512

Added: dev/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz
==
Binary file - no diff available.

Propchange: dev/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz
--
svn:mime-type = application/octet-stream

Added: dev/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.asc
==
--- dev/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.asc (added)
+++ dev/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.asc Sun Aug 23 07:24:29 2020
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEEf2bNTOmQmDooRnIpMiTyAOH8IXIFAl9CCq4ACgkQMiTyAOH8
+IXKRLA/9E50KbJqMwj7/TsJb93RKauBPj2kcc75F+ZE7Hy6Iypt1++rQ5E22a+FZ
+huOjCOsmKBCNMwkpc4NQdGz4iRrEYnQiCjTdNqQRFGA7n8hcXJLKbFSs0AhPR4qJ
+F0kWafpVtyt71s2MacPt44VgO3yfRswUmWzKGOeX1hef91fWI4O6JuJEIoeordE9
+KlI1GIckh5L3WyeFnd4EFX7Jc4joaDi4NNJLE+3Hg730lJgZHvXUwatWxPpb0Ccm
+WrzUWSZkjkj8hnHHljAMJmbXpOh8zi2IUvxQoiuuv6KhC2GEY/fhF9scchoTdqs2
+dIxlNhVD0y5j0ZSJydZGQLEhC6btD2Encvu1FB+wT/w380izqote1/YbGKsNOh/f
+9p6Oenioo3Gqfd6OtsKSaPNNsaNN2PlvSmHdJMlLbLyljYNDjJ24c/QGE1/c+NDa
+KDJF3Lj56OhESNR251FHJDVCA6mRTroboCR2VTdlg5QBMlgDaZyyJ1u1atMT2lAF
+1ZFNJCV8Q/Y5ospzqVaii9eSZuTiHJh3UEfLZuhvsU2pF4it3ew3G2j8dugE4Kxf
+3dGvzIEZNTyI3fBqrTc0Q+/I7ZWiyEUbMiOnp1e6lbjsN0d9QWkLHlw6gaNwdR6Z
+NoR/UhH9Y+0gX4/GqnuG86xxdCCuh2KhO6L2wEtH0K/r+BgKKyc=
+=Iuob
+-----END PGP SIGNATURE-----

Added: dev/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.sha512
==
--- dev/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.sha512 (added)
+++ dev/hudi/hudi-0.6.0/hudi-0.6.0.src.tgz.sha512 Sun Aug 23 07:24:29 2020
@@ -0,0 +1 @@
+f9c37064631d6c0e6d2bb143f639dfd03b1ca46e882643b5570d5f8819c2805a4f9dd91cb6ced1cdc625aeebdf94b69dd7bc0133444652a2f5f54358f5e43053
  hudi-0.6.0.src.tgz