[GitHub] [incubator-hudi] lamber-ken commented on issue #1376: Problem Sync Hudi table with Hive

2020-03-06 Thread GitBox
lamber-ken commented on issue #1376: Problem Sync Hudi table with Hive
URL: https://github.com/apache/incubator-hudi/issues/1376#issuecomment-596059238
 
 
   hi @bvaradar, some APIs in `HoodieCombineHiveInputFormat` are not compatible 
with older Hive versions. 
   Compilation will fail if we don't remove them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-639) QU20: The project puts a very high priority on producing secure software.

2020-03-06 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053932#comment-17053932
 ] 

vinoyang commented on HUDI-639:
---

[~smarthi] After finishing HUDI-640, we will close this ticket.

> QU20: The project puts a very high priority on producing secure software.
> -
>
> Key: HUDI-639
> URL: https://issues.apache.org/jira/browse/HUDI-639
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-643) Check and write comment for all the rule items

2020-03-06 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053931#comment-17053931
 ] 

vinoyang commented on HUDI-643:
---

[~smarthi] Do you think the "comment" field of every rule in the maturity model 
table should have content? If not, I will close this ticket.

> Check and write comment for all the rule items
> --
>
> Key: HUDI-643
> URL: https://issues.apache.org/jira/browse/HUDI-643
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Priority: Blocker
>
> Some rule items do not contain a "comment"; we should check and write them.





[jira] [Resolved] (HUDI-516) Avoid need to import spark-avro package when submitting Hudi job in spark

2020-03-06 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken resolved HUDI-516.
-
Resolution: Fixed

> Avoid need to import spark-avro package when submitting Hudi job in spark
> -
>
> Key: HUDI-516
> URL: https://issues.apache.org/jira/browse/HUDI-516
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Udit Mehrotra
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
>
> We are in the process of migrating Hudi to *spark 2.4.4* and using 
> *spark-avro* instead of the deprecated *databricks-avro* here 
> [https://github.com/apache/incubator-hudi/pull/1005/]
> After this change, users are required to explicitly pull in spark-avro when 
> starting spark-shell, using:
> {code:java}
> --packages org.apache.spark:spark-avro_2.11:2.4.4
> {code}
> This is because we are not currently shading spark-avro in *hudi-spark-bundle*. 
> One reason for not shading it is that we are unsure of the implications of 
> shading a Spark dependency in a jar that is submitted to Spark. 
> [~vinoth] pointed out that a possible concern is that we would always be 
> shading spark-avro 2.4.4, which could affect users on higher versions of 
> Spark.
> This Jira is to come up with a way to solve this usability issue.
>  





[jira] [Commented] (HUDI-516) Avoid need to import spark-avro package when submitting Hudi job in spark

2020-03-06 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053930#comment-17053930
 ] 

lamber-ken commented on HUDI-516:
-

Right, fixed at master 5f85c267040fd51c186794fdae900162ab176b14

> Avoid need to import spark-avro package when submitting Hudi job in spark
> -
>
> Key: HUDI-516
> URL: https://issues.apache.org/jira/browse/HUDI-516
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Udit Mehrotra
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
>
> We are in the process of migrating Hudi to *spark 2.4.4* and using 
> *spark-avro* instead of the deprecated *databricks-avro* here 
> [https://github.com/apache/incubator-hudi/pull/1005/]
> After this change, users are required to explicitly pull in spark-avro when 
> starting spark-shell, using:
> {code:java}
> --packages org.apache.spark:spark-avro_2.11:2.4.4
> {code}
> This is because we are not currently shading spark-avro in *hudi-spark-bundle*. 
> One reason for not shading it is that we are unsure of the implications of 
> shading a Spark dependency in a jar that is submitted to Spark. 
> [~vinoth] pointed out that a possible concern is that we would always be 
> shading spark-avro 2.4.4, which could affect users on higher versions of 
> Spark.
> This Jira is to come up with a way to solve this usability issue.
>  





[jira] [Updated] (HUDI-674) Rename hudi-hadoop-mr-bundle to hudi-hive-bundle

2020-03-06 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-674:

Status: Open  (was: New)

> Rename hudi-hadoop-mr-bundle to hudi-hive-bundle
> 
>
> Key: HUDI-674
> URL: https://issues.apache.org/jira/browse/HUDI-674
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: lamber-ken
>Priority: Major
>






[jira] [Assigned] (HUDI-674) Rename hudi-hadoop-mr-bundle to hudi-hive-bundle

2020-03-06 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken reassigned HUDI-674:
---

Assignee: lamber-ken

> Rename hudi-hadoop-mr-bundle to hudi-hive-bundle
> 
>
> Key: HUDI-674
> URL: https://issues.apache.org/jira/browse/HUDI-674
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>






[jira] [Created] (HUDI-674) Rename hudi-hadoop-mr-bundle to hudi-hive-bundle

2020-03-06 Thread lamber-ken (Jira)
lamber-ken created HUDI-674:
---

 Summary: Rename hudi-hadoop-mr-bundle to hudi-hive-bundle
 Key: HUDI-674
 URL: https://issues.apache.org/jira/browse/HUDI-674
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Code Cleanup
Reporter: lamber-ken








[jira] [Updated] (HUDI-673) Rename hudi-hive-bundle to hudi-hive-sync-bundle

2020-03-06 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-673:

Status: Open  (was: New)

> Rename hudi-hive-bundle to hudi-hive-sync-bundle
> 
>
> Key: HUDI-673
> URL: https://issues.apache.org/jira/browse/HUDI-673
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>






[jira] [Created] (HUDI-673) Rename hudi-hive-bundle to hudi-hive-sync-bundle

2020-03-06 Thread lamber-ken (Jira)
lamber-ken created HUDI-673:
---

 Summary: Rename hudi-hive-bundle to hudi-hive-sync-bundle
 Key: HUDI-673
 URL: https://issues.apache.org/jira/browse/HUDI-673
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Code Cleanup
Reporter: lamber-ken








[jira] [Assigned] (HUDI-673) Rename hudi-hive-bundle to hudi-hive-sync-bundle

2020-03-06 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken reassigned HUDI-673:
---

Assignee: lamber-ken

> Rename hudi-hive-bundle to hudi-hive-sync-bundle
> 
>
> Key: HUDI-673
> URL: https://issues.apache.org/jira/browse/HUDI-673
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>






[GitHub] [incubator-hudi] lamber-ken commented on issue #1374: [HUDI-654] Rename hudi-hive to hudi-hive-sync

2020-03-06 Thread GitBox
lamber-ken commented on issue #1374: [HUDI-654] Rename hudi-hive to 
hudi-hive-sync
URL: https://github.com/apache/incubator-hudi/pull/1374#issuecomment-596056298
 
 
   > we should also change `hudi-hive-bundle` to `hudi-hive-sync-bundle`? and 
need a PR to update the docs?
   > 
   > @lamber-ken @leesf ?
   
   will do, thanks




[jira] [Resolved] (HUDI-598) Update quick start page

2020-03-06 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken resolved HUDI-598.
-
Resolution: Fixed

Fixed at asf-site 35b0aef62e27290657f5561658bf828b1a2c1b87

> Update quick start page 
> 
>
> Key: HUDI-598
> URL: https://issues.apache.org/jira/browse/HUDI-598
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 1. Fix code padding
> 2. Change org.apache.hudi to hudi
> 3. Fix the wrong table





[jira] [Resolved] (HUDI-604) Update docker page

2020-03-06 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken resolved HUDI-604.
-
Resolution: Fixed

Fixed at asf-site 5831e456cb3b37f83a2dbc92eb721a1a7b85bbb8

> Update docker page
> --
>
> Key: HUDI-604
> URL: https://issues.apache.org/jira/browse/HUDI-604
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 1. Split one-line commands into multiple lines
> 2. Unify code indentation





[jira] [Resolved] (HUDI-584) Relocate spark-avro dependency by maven-shade-plugin

2020-03-06 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken resolved HUDI-584.
-
Resolution: Fixed

Fixed at master 5f85c267040fd51c186794fdae900162ab176b14

> Relocate spark-avro dependency by maven-shade-plugin
> 
>
> Key: HUDI-584
> URL: https://issues.apache.org/jira/browse/HUDI-584
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Relocate the spark-avro dependency via maven-shade-plugin, since the spark-avro 
> module is not included with spark-shell by default.
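
A relocation of this sort in the bundle's maven-shade-plugin configuration would look roughly like the sketch below; the `org.apache.hudi` shaded-package prefix is an assumed placeholder, not necessarily the prefix hudi-spark-bundle actually uses:

```xml
<!-- Sketch of a maven-shade-plugin relocation for the spark-avro classes.
     The shadedPattern prefix is illustrative. -->
<relocations>
  <relocation>
    <pattern>org.apache.spark.sql.avro</pattern>
    <shadedPattern>org.apache.hudi.org.apache.spark.sql.avro</shadedPattern>
  </relocation>
</relocations>
```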





[jira] [Updated] (HUDI-516) Avoid need to import spark-avro package when submitting Hudi job in spark

2020-03-06 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-516:

Status: Open  (was: New)

> Avoid need to import spark-avro package when submitting Hudi job in spark
> -
>
> Key: HUDI-516
> URL: https://issues.apache.org/jira/browse/HUDI-516
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Udit Mehrotra
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
>
> We are in the process of migrating Hudi to *spark 2.4.4* and using 
> *spark-avro* instead of the deprecated *databricks-avro* here 
> [https://github.com/apache/incubator-hudi/pull/1005/]
> After this change, users are required to explicitly pull in spark-avro when 
> starting spark-shell, using:
> {code:java}
> --packages org.apache.spark:spark-avro_2.11:2.4.4
> {code}
> This is because we are not currently shading spark-avro in *hudi-spark-bundle*. 
> One reason for not shading it is that we are unsure of the implications of 
> shading a Spark dependency in a jar that is submitted to Spark. 
> [~vinoth] pointed out that a possible concern is that we would always be 
> shading spark-avro 2.4.4, which could affect users on higher versions of 
> Spark.
> This Jira is to come up with a way to solve this usability issue.
>  





[jira] [Assigned] (HUDI-516) Avoid need to import spark-avro package when submitting Hudi job in spark

2020-03-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-516:
---

Assignee: lamber-ken

> Avoid need to import spark-avro package when submitting Hudi job in spark
> -
>
> Key: HUDI-516
> URL: https://issues.apache.org/jira/browse/HUDI-516
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Udit Mehrotra
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
>
> We are in the process of migrating Hudi to *spark 2.4.4* and using 
> *spark-avro* instead of the deprecated *databricks-avro* here 
> [https://github.com/apache/incubator-hudi/pull/1005/]
> After this change, users are required to explicitly pull in spark-avro when 
> starting spark-shell, using:
> {code:java}
> --packages org.apache.spark:spark-avro_2.11:2.4.4
> {code}
> This is because we are not currently shading spark-avro in *hudi-spark-bundle*. 
> One reason for not shading it is that we are unsure of the implications of 
> shading a Spark dependency in a jar that is submitted to Spark. 
> [~vinoth] pointed out that a possible concern is that we would always be 
> shading spark-avro 2.4.4, which could affect users on higher versions of 
> Spark.
> This Jira is to come up with a way to solve this usability issue.
>  





[jira] [Commented] (HUDI-516) Avoid need to import spark-avro package when submitting Hudi job in spark

2020-03-06 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053902#comment-17053902
 ] 

Vinoth Chandar commented on HUDI-516:
-

[~lamber-ken] this should be resolved now with your change, correct? Feel free 
to close this out if so.

> Avoid need to import spark-avro package when submitting Hudi job in spark
> -
>
> Key: HUDI-516
> URL: https://issues.apache.org/jira/browse/HUDI-516
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Udit Mehrotra
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
>
> We are in the process of migrating Hudi to *spark 2.4.4* and using 
> *spark-avro* instead of the deprecated *databricks-avro* here 
> [https://github.com/apache/incubator-hudi/pull/1005/]
> After this change, users are required to explicitly pull in spark-avro when 
> starting spark-shell, using:
> {code:java}
> --packages org.apache.spark:spark-avro_2.11:2.4.4
> {code}
> This is because we are not currently shading spark-avro in *hudi-spark-bundle*. 
> One reason for not shading it is that we are unsure of the implications of 
> shading a Spark dependency in a jar that is submitted to Spark. 
> [~vinoth] pointed out that a possible concern is that we would always be 
> shading spark-avro 2.4.4, which could affect users on higher versions of 
> Spark.
> This Jira is to come up with a way to solve this usability issue.
>  





[jira] [Assigned] (HUDI-408) [Umbrella] Refactor/Code clean up hoodie write client

2020-03-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-408:
---

Assignee: Vinoth Chandar

> [Umbrella] Refactor/Code clean up hoodie write client 
> --
>
> Key: HUDI-408
> URL: https://issues.apache.org/jira/browse/HUDI-408
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Nishith Agarwal
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>






[GitHub] [incubator-hudi] OpenOpened commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-06 Thread GitBox
OpenOpened commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot 
Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#issuecomment-596047808
 
 
   @xushiyan @vinothchandar I have resolved the comments.




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #209

2020-03-06 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.40 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[jira] [Updated] (HUDI-670) Improve unit test coverage for org.apache.hudi.common.util.collection.DiskBasedMap

2020-03-06 Thread Prashant Wason (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason updated HUDI-670:

Status: In Progress  (was: Open)

> Improve unit test coverage for 
> org.apache.hudi.common.util.collection.DiskBasedMap
> --
>
> Key: HUDI-670
> URL: https://issues.apache.org/jira/browse/HUDI-670
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Prashant Wason
>Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>






[jira] [Created] (HUDI-670) Improve unit test coverage for org.apache.hudi.common.util.collection.DiskBasedMap

2020-03-06 Thread Prashant Wason (Jira)
Prashant Wason created HUDI-670:
---

 Summary: Improve unit test coverage for 
org.apache.hudi.common.util.collection.DiskBasedMap
 Key: HUDI-670
 URL: https://issues.apache.org/jira/browse/HUDI-670
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: Prashant Wason








[jira] [Updated] (HUDI-670) Improve unit test coverage for org.apache.hudi.common.util.collection.DiskBasedMap

2020-03-06 Thread Prashant Wason (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason updated HUDI-670:

Status: Open  (was: New)

> Improve unit test coverage for 
> org.apache.hudi.common.util.collection.DiskBasedMap
> --
>
> Key: HUDI-670
> URL: https://issues.apache.org/jira/browse/HUDI-670
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Prashant Wason
>Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>






[jira] [Updated] (HUDI-670) Improve unit test coverage for org.apache.hudi.common.util.collection.DiskBasedMap

2020-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-670:

Labels: pull-request-available  (was: )

> Improve unit test coverage for 
> org.apache.hudi.common.util.collection.DiskBasedMap
> --
>
> Key: HUDI-670
> URL: https://issues.apache.org/jira/browse/HUDI-670
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>






[jira] [Updated] (HUDI-668) Improve unit test coverage org.apache.hudi.metrics.Metrics

2020-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-668:

Labels: pull-request-available  (was: )

> Improve unit test coverage org.apache.hudi.metrics.Metrics
> --
>
> Key: HUDI-668
> URL: https://issues.apache.org/jira/browse/HUDI-668
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>






[jira] [Assigned] (HUDI-672) Spark DataSource - Upsert for S3 Hudi dataset with large partitions takes a lot of time in writing

2020-03-06 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-672:
---

Assignee: Udit Mehrotra

> Spark DataSource - Upsert for S3 Hudi dataset with large partitions takes a 
> lot of time in writing
> --
>
> Key: HUDI-672
> URL: https://issues.apache.org/jira/browse/HUDI-672
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.6.0
>
>
> Github Issue : [https://github.com/apache/incubator-hudi/issues/1371]





[jira] [Created] (HUDI-672) Spark DataSource - Upsert for S3 Hudi dataset with large partitions takes a lot of time in writing

2020-03-06 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-672:
---

 Summary: Spark DataSource - Upsert for S3 Hudi dataset with large 
partitions takes a lot of time in writing
 Key: HUDI-672
 URL: https://issues.apache.org/jira/browse/HUDI-672
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Spark Integration
Reporter: Balaji Varadarajan
 Fix For: 0.6.0


Github Issue : [https://github.com/apache/incubator-hudi/issues/1371]





[jira] [Updated] (HUDI-671) Improve unit test coverage for org.apache.hudi.index.hbase.HbaseIndex

2020-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-671:

Labels: pull-request-available  (was: )

> Improve unit test coverage for org.apache.hudi.index.hbase.HbaseIndex
> -
>
> Key: HUDI-671
> URL: https://issues.apache.org/jira/browse/HUDI-671
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>






[jira] [Updated] (HUDI-667) HoodieTestDataGenerator does not delete keys correctly

2020-03-06 Thread Prashant Wason (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason updated HUDI-667:

Status: In Progress  (was: Open)

> HoodieTestDataGenerator does not delete keys correctly
> --
>
> Key: HUDI-667
> URL: https://issues.apache.org/jira/browse/HUDI-667
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Prashant Wason
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> HoodieTestDataGenerator is used to generate sample data for unit-tests. It 
> allows generating HoodieRecords for insert/update/delete. It maintains the 
> record keys in a HashMap.
> private final Map<Integer, KeyPartition> existingKeys;
> There are two issues in the implementation:
>  # Delete from existingKeys uses KeyPartition rather than Integer keys
>  # Inserting records after deletes is not correctly handled
> The implementation uses the Integer key so that values can be looked up 
> randomly. Assume three values were inserted, then the HashMap will hold:
> 0 -> KeyPartition1
> 1 -> KeyPartition2
> 2 -> KeyPartition3
> Now if we delete KeyPartition2  (generate a random record for deletion), the 
> HashMap will be:
> 0 -> KeyPartition1
> 2 -> KeyPartition3
>  
> Now if we issue an insertBatch(), the insert is 
> existingKeys.put(existingKeys.size(), newKeyPartition); since size() is now 2, 
> this overwrites the KeyPartition3 already in the map rather than actually 
> inserting a new entry in the map.





[jira] [Created] (HUDI-667) HoodieTestDataGenerator does not delete keys correctly

2020-03-06 Thread Prashant Wason (Jira)
Prashant Wason created HUDI-667:
---

 Summary: HoodieTestDataGenerator does not delete keys correctly
 Key: HUDI-667
 URL: https://issues.apache.org/jira/browse/HUDI-667
 Project: Apache Hudi (incubating)
  Issue Type: Bug
Reporter: Prashant Wason


HoodieTestDataGenerator is used to generate sample data for unit-tests. It 
allows generating HoodieRecords for insert/update/delete. It maintains the 
record keys in a HashMap.

private final Map<Integer, KeyPartition> existingKeys;

There are two issues in the implementation:
 # Delete from existingKeys uses KeyPartition rather than Integer keys
 # Inserting records after deletes is not correctly handled

The implementation uses the Integer key so that values can be looked up 
randomly. Assume three values were inserted, then the HashMap will hold:

0 -> KeyPartition1
1 -> KeyPartition2
2 -> KeyPartition3

Now if we delete KeyPartition2  (generate a random record for deletion), the 
HashMap will be:

0 -> KeyPartition1
2 -> KeyPartition3

 

Now if we issue an insertBatch(), the insert is 
existingKeys.put(existingKeys.size(), newKeyPartition); since size() is now 2, 
this overwrites the KeyPartition3 already in the map rather than actually 
inserting a new entry in the map.
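
The overwrite is easy to reproduce with a plain HashMap; the minimal sketch below uses String values as a stand-in for KeyPartition:

```java
import java.util.HashMap;
import java.util.Map;

public class InsertAfterDeleteBug {
    public static void main(String[] args) {
        Map<Integer, String> existingKeys = new HashMap<>();
        existingKeys.put(0, "KeyPartition1");
        existingKeys.put(1, "KeyPartition2");
        existingKeys.put(2, "KeyPartition3");

        // Delete KeyPartition2, as the generator does for a random delete.
        existingKeys.remove(1);

        // Insert the way the generator does: key = current size().
        // size() is 2, so this overwrites the entry for KeyPartition3.
        existingKeys.put(existingKeys.size(), "KeyPartition4");

        // The map still holds only two entries instead of three.
        System.out.println(existingKeys.size());
    }
}
```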





[jira] [Updated] (HUDI-669) HoodieDeltaStreamer offset not handled correctly when using LATEST offset reset strategy

2020-03-06 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-669:

Status: Open  (was: New)

> HoodieDeltaStreamer offset not handled correctly when using LATEST offset 
> reset strategy
> 
>
> Key: HUDI-669
> URL: https://issues.apache.org/jira/browse/HUDI-669
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Context : [https://github.com/apache/incubator-hudi/issues/1375]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-667) HoodieTestDataGenerator does not delete keys correctly

2020-03-06 Thread Prashant Wason (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053732#comment-17053732
 ] 

Prashant Wason commented on HUDI-667:
-

The fix is to maintain a minKey and maxKey within the HoodieTestDataGenerator 
class. 
 # To find random records, we generate a random integer in the range [minKey, 
maxKey] and verify that this index actually exists in the HashMap
 # To insert new records, we always insert at maxKey and increment maxKey
 # We update minKey / maxKey during deletions (if required)

 

 
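The proposed fix can be sketched as follows (a hypothetical simplification, not the actual HoodieTestDataGenerator code; the class, field, and method names are assumptions):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class KeyRangeMap {
    private final Map<Integer, String> existingKeys = new HashMap<>();
    private final Random rand = new Random();
    private int minKey = 0; // smallest index that may still be live
    private int maxKey = 0; // next insert position; always past every live index

    // Inserts always land at maxKey, so they can never collide with a
    // live index even after arbitrary deletions.
    public void insert(String keyPartition) {
        existingKeys.put(maxKey++, keyPartition);
    }

    // Picks random indices in [minKey, maxKey) and retries until one is
    // live in the map. Assumes the map is non-empty.
    public String getRandomExisting() {
        while (true) {
            int idx = minKey + rand.nextInt(maxKey - minKey);
            String value = existingKeys.get(idx);
            if (value != null) {
                return value;
            }
        }
    }

    public void delete(int idx) {
        existingKeys.remove(idx);
        // Shrink the lower bound when leading entries are deleted (if required).
        while (minKey < maxKey && !existingKeys.containsKey(minKey)) {
            minKey++;
        }
    }

    public int size() {
        return existingKeys.size();
    }
}
```

With this layout, repeating the failing sequence (insert three, delete one, insert one) leaves three live entries instead of two.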

> HoodieTestDataGenerator does not delete keys correctly
> --
>
> Key: HUDI-667
> URL: https://issues.apache.org/jira/browse/HUDI-667
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Prashant Wason
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> HoodieTestDataGenerator is used to generate sample data for unit-tests. It 
> allows generating HoodieRecords for insert/update/delete. It maintains the 
> record keys in a HashMap.
> private final Map<Integer, KeyPartition> existingKeys;
> There are two issues in the implementation:
>  # Delete from existingKeys uses KeyPartition rather than Integer keys
>  # Inserting records after deletes is not correctly handled
> The implementation uses the Integer key so that values can be looked up 
> randomly. Assume three values were inserted, then the HashMap will hold:
> 0 -> KeyPartition1
> 1 -> KeyPartition2
> 2 -> KeyPartition3
> Now if we delete KeyPartition2 (generated as a random record for deletion), the 
> HashMap will be:
> 0 -> KeyPartition1
> 2 -> KeyPartition3
>  
> Now if we issue an insertBatch() then the insert is 
> existingKeys.put(existingKeys.size(), newKeyPartition) which will overwrite the 
> KeyPartition3 already at index 2 rather than actually inserting a new entry 
> in the map.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf merged pull request #1190: [HUDI-499] Add configuration docs

2020-03-06 Thread GitBox
leesf merged pull request #1190: [HUDI-499] Add configuration docs
URL: https://github.com/apache/incubator-hudi/pull/1190
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch asf-site updated: [HUDI-499] Add configuration docs

2020-03-06 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new e15ff7f  [HUDI-499] Add configuration docs
e15ff7f is described below

commit e15ff7f23439588186ad54952242f555be3602b2
Author: XU SHIYAN 
AuthorDate: Sun Jan 5 22:53:08 2020 -0800

[HUDI-499] Add configuration docs

For the new configuration: hoodie.bloom.index.update.partition.path
---
 docs/_docs/2_4_configurations.cn.md | 6 +-
 docs/_docs/2_4_configurations.md| 6 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/docs/_docs/2_4_configurations.cn.md 
b/docs/_docs/2_4_configurations.cn.md
index 66b1b0d..bd0ddf5 100644
--- a/docs/_docs/2_4_configurations.cn.md
+++ b/docs/_docs/2_4_configurations.cn.md
@@ -291,7 +291,11 @@ Hudi将有关提交、保存点、清理审核日志等的所有主要元数据
 属性:`hoodie.index.hbase.table` 
 仅在索引类型为HBASE时适用。HBase表名称,用作索引。Hudi将row_key和[partition_path, 
fileID, commitTime]映射存储在表中。
 
-
+# bloomIndexUpdatePartitionPath(updatePartitionPath = false) 
{#bloomIndexUpdatePartitionPath}
+属性:`hoodie.bloom.index.update.partition.path` 
+仅在索引类型为GLOBAL_BLOOM时适用。为true时,当对一个已有记录执行包含分区路径的更新操作时,将会导致把新记录插入到新分区,而把原有记录从旧分区里删除。为false时,只对旧分区的原有记录进行更新。
+
+
 ### 存储选项
 控制有关调整parquet和日志文件大小的方面。
 
diff --git a/docs/_docs/2_4_configurations.md b/docs/_docs/2_4_configurations.md
index d62f179..33c5e2c 100644
--- a/docs/_docs/2_4_configurations.md
+++ b/docs/_docs/2_4_configurations.md
@@ -274,7 +274,11 @@ Property: `hoodie.index.hbase.zknode.path` 
 Property: `hoodie.index.hbase.table` 
 Only applies if index type is HBASE. HBase Table name 
to use as the index. Hudi stores the row_key and [partition_path, fileID, 
commitTime] mapping in the table.
 
-
+# bloomIndexUpdatePartitionPath(updatePartitionPath = false) 
{#bloomIndexUpdatePartitionPath}
+Property: `hoodie.bloom.index.update.partition.path` 
+Only applies if index type is GLOBAL_BLOOM. When 
set to true, an update including the partition path of a record that already 
exists will result in inserting the incoming record into the new partition and 
deleting the original record in the old partition. When set to false, the 
original record will only be updated in the old partition.
+
+
 ### Storage configs
 Controls aspects around sizing parquet and log files.
 



[jira] [Created] (HUDI-671) Improve unit test coverage for org.apache.hudi.index.hbase.HbaseIndex

2020-03-06 Thread Prashant Wason (Jira)
Prashant Wason created HUDI-671:
---

 Summary: Improve unit test coverage for 
org.apache.hudi.index.hbase.HbaseIndex
 Key: HUDI-671
 URL: https://issues.apache.org/jira/browse/HUDI-671
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: Prashant Wason






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-668) Improve unit test coverage org.apache.hudi.metrics.Metrics

2020-03-06 Thread Prashant Wason (Jira)
Prashant Wason created HUDI-668:
---

 Summary: Improve unit test coverage org.apache.hudi.metrics.Metrics
 Key: HUDI-668
 URL: https://issues.apache.org/jira/browse/HUDI-668
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: Prashant Wason






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-669) HoodieDeltaStreamer offset not handled correctly when using LATEST offset reset strategy

2020-03-06 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-669:
---

 Summary: HoodieDeltaStreamer offset not handled correctly when 
using LATEST offset reset strategy
 Key: HUDI-669
 URL: https://issues.apache.org/jira/browse/HUDI-669
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: DeltaStreamer
Reporter: Balaji Varadarajan


Context : [https://github.com/apache/incubator-hudi/issues/1375]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-672) Spark DataSource - Upsert for S3 Hudi dataset with large partitions takes a lot of time in writing

2020-03-06 Thread Udit Mehrotra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053834#comment-17053834
 ] 

Udit Mehrotra commented on HUDI-672:


This issue is duplicated by [https://issues.apache.org/jira/browse/HUDI-656] . 
Resolving this Jira in favor of the other one.

> Spark DataSource - Upsert for S3 Hudi dataset with large partitions takes a 
> lot of time in writing
> --
>
> Key: HUDI-672
> URL: https://issues.apache.org/jira/browse/HUDI-672
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.6.0
>
>
> Github Issue : [https://github.com/apache/incubator-hudi/issues/1371]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] xushiyan commented on issue #1190: [HUDI-499] Add configuration docs

2020-03-06 Thread GitBox
xushiyan commented on issue #1190: [HUDI-499] Add configuration docs
URL: https://github.com/apache/incubator-hudi/pull/1190#issuecomment-596030140
 
 
   @leesf Updated the branch.. this should be ready to merge. thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] prashantwason opened a new pull request #1381: [HUDI-671] Added unit-test for HBaseIndex

2020-03-06 Thread GitBox
prashantwason opened a new pull request #1381: [HUDI-671] Added unit-test for 
HBaseIndex
URL: https://github.com/apache/incubator-hudi/pull/1381
 
 
   ## What is the purpose of the pull request
   
   Added unit-test for HBaseIndex
   
   ## Brief change log
   
   Added unit test for code missing coverage.
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
   mvn test
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1350: [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java

2020-03-06 Thread GitBox
codecov-io edited a comment on issue #1350: [HUDI-629]: Replace Guava's Hashing 
with an equivalent in NumericUtils.java
URL: https://github.com/apache/incubator-hudi/pull/1350#issuecomment-59868
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1350?src=pr=h1) 
Report
   > Merging 
[#1350](https://codecov.io/gh/apache/incubator-hudi/pull/1350?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/ee5b32f5d4aa26e7fc58ccdae46935f063460920?src=pr=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `58.5%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1350/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1350?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #1350      +/-   ##
    ============================================
    - Coverage     66.96%   66.94%     -0.03%
      Complexity      223      223
    ============================================
      Files           334      335         +1
      Lines         16276    16296        +20
      Branches       1661     1661
    ============================================
    + Hits          10900    10909         +9
    - Misses         4638     4644         +6
    - Partials        738      743         +5
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1350?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=)
 | `83.84% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...pache/hudi/utilities/HoodieWithTimelineServer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVdpdGhUaW1lbGluZVNlcnZlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...e/hudi/common/util/queue/BoundedInMemoryQueue.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvQm91bmRlZEluTWVtb3J5UXVldWUuamF2YQ==)
 | `91.13% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...pache/hudi/common/model/TimelineLayoutVersion.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL1RpbWVsaW5lTGF5b3V0VmVyc2lvbi5qYXZh)
 | `65% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...pache/hudi/common/versioning/MetadataMigrator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3ZlcnNpb25pbmcvTWV0YWRhdGFNaWdyYXRvci5qYXZh)
 | `58.33% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=)
 | `84.12% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `62.79% <0%> (ø)` | `20 <0> (ø)` | :arrow_down: |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `76.77% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...mmon/versioning/clean/CleanV2MigrationHandler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5WMk1pZ3JhdGlvbkhhbmRsZXIuamF2YQ==)
 | `94.87% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1oaXZlLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGl2ZS9NdWx0aVBhcnRLZXlzVmFsdWVFeHRyYWN0b3IuamF2YQ==)
 | `55.55% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | ... and [33 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1350?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing 

[GitHub] [incubator-hudi] satishkotha closed pull request #1320: [HUDI-571] Add min/max headers on archived files

2020-03-06 Thread GitBox
satishkotha closed pull request #1320: [HUDI-571] Add min/max headers on 
archived files
URL: https://github.com/apache/incubator-hudi/pull/1320
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] satishkotha closed pull request #1355: [HUDI-633] limit archive file block size by number of bytes

2020-03-06 Thread GitBox
satishkotha closed pull request #1355: [HUDI-633] limit archive file block size 
by number of bytes
URL: https://github.com/apache/incubator-hudi/pull/1355
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] xushiyan edited a comment on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-06 Thread GitBox
xushiyan edited a comment on issue #1360: [HUDI-344][RFC-09] Hudi Dataset 
Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#issuecomment-595927116
 
 
   > @xushiyan could you please make one pass and resolve the comments that are 
already addressed..
   
   @vinothchandar I'm not able to resolve comments, the "resolve" buttons are 
not shown to me. guess it's due to permission :) I think @OpenOpened as the 
author can do that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bhasudha commented on issue #998: Incremental view not implemented yet, for merge-on-read datasets

2020-03-06 Thread GitBox
bhasudha commented on issue #998: Incremental view not implemented yet, for 
merge-on-read datasets
URL: https://github.com/apache/incubator-hudi/issues/998#issuecomment-596005967
 
 
   closing this in favor of https://issues.apache.org/jira/browse/HUDI-58


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bhasudha closed issue #998: Incremental view not implemented yet, for merge-on-read datasets

2020-03-06 Thread GitBox
bhasudha closed issue #998: Incremental view not implemented yet, for 
merge-on-read datasets
URL: https://github.com/apache/incubator-hudi/issues/998
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bhasudha merged pull request #1306: [HUDI-598] Update quick start page

2020-03-06 Thread GitBox
bhasudha merged pull request #1306: [HUDI-598] Update quick start page
URL: https://github.com/apache/incubator-hudi/pull/1306
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bhasudha merged pull request #1316: [HUDI-604] Update docker page

2020-03-06 Thread GitBox
bhasudha merged pull request #1316: [HUDI-604] Update docker page
URL: https://github.com/apache/incubator-hudi/pull/1316
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bhasudha commented on issue #1306: [HUDI-598] Update quick start page

2020-03-06 Thread GitBox
bhasudha commented on issue #1306: [HUDI-598] Update quick start page
URL: https://github.com/apache/incubator-hudi/pull/1306#issuecomment-596005465
 
 
   Merging this since 0.5.1 doc release is cut.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] 02/02: Unify table name

2020-03-06 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

commit 35b0aef62e27290657f5561658bf828b1a2c1b87
Author: lamber-ken 
AuthorDate: Wed Feb 5 13:25:11 2020 +0800

Unify table name
---
 docs/_docs/1_1_quick_start_guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/_docs/1_1_quick_start_guide.md 
b/docs/_docs/1_1_quick_start_guide.md
index 83a3c27..8111acf 100644
--- a/docs/_docs/1_1_quick_start_guide.md
+++ b/docs/_docs/1_1_quick_start_guide.md
@@ -198,9 +198,9 @@ val roAfterDeleteViewDF = spark.
   read.
   format("hudi").
   load(basePath + "/*/*/*/*")
-roAfterDeleteViewDF.registerTempTable("hudi_ro_table")
+roAfterDeleteViewDF.registerTempTable("hudi_trips_snapshot")
 // fetch should return (total - 2) records
-spark.sql("select uuid, partitionPath from hudi_ro_table").count()
+spark.sql("select uuid, partitionPath from hudi_trips_snapshot").count()
 ```
 Note: Only `Append` mode is supported for delete operation.
 



[incubator-hudi] 01/02: [HUDI-598] Update quick start page

2020-03-06 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

commit 19b08d225484c3e7dd3d59e01d7546059086b96d
Author: lamber-ken 
AuthorDate: Wed Feb 5 02:06:50 2020 +0800

[HUDI-598] Update quick start page
---
 docs/_docs/1_1_quick_start_guide.md | 102 ++--
 1 file changed, 51 insertions(+), 51 deletions(-)

diff --git a/docs/_docs/1_1_quick_start_guide.md 
b/docs/_docs/1_1_quick_start_guide.md
index 256e560..83a3c27 100644
--- a/docs/_docs/1_1_quick_start_guide.md
+++ b/docs/_docs/1_1_quick_start_guide.md
@@ -17,8 +17,8 @@ From the extracted directory run spark-shell with Hudi as:
 
 ```scala
 spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
---packages 
org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4
 \
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+  --packages 
org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4
 \
+  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
 ```
 
 
@@ -58,14 +58,14 @@ Generate some new trips, load them into a DataFrame and 
write the DataFrame into
 ```scala
 val inserts = convertToStringList(dataGen.generateInserts(10))
 val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
-df.write.format("org.apache.hudi").
-options(getQuickstartWriteConfigs).
-option(PRECOMBINE_FIELD_OPT_KEY, "ts").
-option(RECORDKEY_FIELD_OPT_KEY, "uuid").
-option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
-option(TABLE_NAME, tableName).
-mode(Overwrite).
-save(basePath);
+df.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  mode(Overwrite).
+  save(basePath)
 ``` 
 
 `mode(Overwrite)` overwrites and recreates the table if it already exists.
@@ -84,10 +84,11 @@ Load the data files into a DataFrame.
 
 ```scala
 val tripsSnapshotDF = spark.
-read.
-format("org.apache.hudi").
-load(basePath + "/*/*/*/*")
+  read.
+  format("hudi").
+  load(basePath + "/*/*/*/*")
 tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
+
 spark.sql("select fare, begin_lon, begin_lat, ts from  hudi_trips_snapshot 
where fare > 20.0").show()
 spark.sql("select _hoodie_commit_time, _hoodie_record_key, 
_hoodie_partition_path, rider, driver, fare from  hudi_trips_snapshot").show()
 ```
@@ -104,15 +105,15 @@ and write DataFrame into the hudi table.
 
 ```scala
 val updates = convertToStringList(dataGen.generateUpdates(10))
-val df = spark.read.json(spark.sparkContext.parallelize(updates, 2));
-df.write.format("org.apache.hudi").
-options(getQuickstartWriteConfigs).
-option(PRECOMBINE_FIELD_OPT_KEY, "ts").
-option(RECORDKEY_FIELD_OPT_KEY, "uuid").
-option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
-option(TABLE_NAME, tableName).
-mode(Append).
-save(basePath);
+val df = spark.read.json(spark.sparkContext.parallelize(updates, 2))
+df.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  mode(Append).
+  save(basePath)
 ```
 
 Notice that the save mode is now `Append`. In general, always use append mode 
unless you are trying to create the table for the first time.
@@ -129,22 +130,21 @@ We do not need to specify endTime, if we want all changes 
after the given commit
 ```scala
 // reload data
 spark.
-read.
-format("org.apache.hudi").
-load(basePath + "/*/*/*/*").
-createOrReplaceTempView("hudi_trips_snapshot")
+  read.
+  format("hudi").
+  load(basePath + "/*/*/*/*").
+  createOrReplaceTempView("hudi_trips_snapshot")
 
 val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime 
from  hudi_trips_snapshot order by commitTime").map(k => 
k.getString(0)).take(50)
 val beginTime = commits(commits.length - 2) // commit time we are interested in
 
 // incrementally query data
-val tripsIncrementalDF = spark.
-read.
-format("org.apache.hudi").
-option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
-option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
-load(basePath);
+val tripsIncrementalDF = spark.read.format("hudi").
+  option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
+  option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
+  load(basePath)
 tripsIncrementalDF.createOrReplaceTempView("hudi_trips_incremental")
+
 spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  
hudi_trips_incremental where fare > 20.0").show()
 ``` 
 
@@ -162,11 +162,11 @@ val beginTime = "000" // Represents all commits 

[incubator-hudi] branch asf-site updated (d02c401 -> 35b0aef)

2020-03-06 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from d02c401  [HUDI-590] [DOCS] Cut doc version 0.5.1 and update README 
with instruction to cut doc version from Mac
 new 19b08d2  [HUDI-598] Update quick start page
 new 35b0aef  Unify table name

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docs/_docs/1_1_quick_start_guide.md | 106 ++--
 1 file changed, 53 insertions(+), 53 deletions(-)



[incubator-hudi] branch asf-site updated: [HUDI-604] Update docker page

2020-03-06 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 5831e45  [HUDI-604] Update docker page
5831e45 is described below

commit 5831e456cb3b37f83a2dbc92eb721a1a7b85bbb8
Author: lamber-ken 
AuthorDate: Mon Feb 10 13:32:16 2020 +0800

[HUDI-604] Update docker page
---
 docs/_docs/0_4_docker_demo.md | 279 ++
 1 file changed, 174 insertions(+), 105 deletions(-)

diff --git a/docs/_docs/0_4_docker_demo.md b/docs/_docs/0_4_docker_demo.md
index 88ead1b..306545e 100644
--- a/docs/_docs/0_4_docker_demo.md
+++ b/docs/_docs/0_4_docker_demo.md
@@ -19,18 +19,17 @@ The steps have been tested on a Mac laptop
   * kafkacat : A command-line utility to publish/consume from kafka topics. 
Use `brew install kafkacat` to install kafkacat
   * /etc/hosts : The demo references many services running in container by the 
hostname. Add the following settings to /etc/hosts
 
-
-```java
-   127.0.0.1 adhoc-1
-   127.0.0.1 adhoc-2
-   127.0.0.1 namenode
-   127.0.0.1 datanode1
-   127.0.0.1 hiveserver
-   127.0.0.1 hivemetastore
-   127.0.0.1 kafkabroker
-   127.0.0.1 sparkmaster
-   127.0.0.1 zookeeper
-```
+```java
+127.0.0.1 adhoc-1
+127.0.0.1 adhoc-2
+127.0.0.1 namenode
+127.0.0.1 datanode1
+127.0.0.1 hiveserver
+127.0.0.1 hivemetastore
+127.0.0.1 kafkabroker
+127.0.0.1 sparkmaster
+127.0.0.1 zookeeper
+```
 
 Also, this has not been tested on some environments like Docker on Windows.
 
@@ -148,7 +147,6 @@ kafkacat -b kafkabroker -L -J | jq .
 }
   ]
 }
-
 ```
 
 ### Step 2: Incrementally ingest data from Kafka topic
@@ -162,12 +160,26 @@ automatically initializes the tables in the file-system 
if they do not exist yet
 docker exec -it adhoc-2 /bin/bash
 
 # Run the following spark-submit command to execute the delta-streamer and 
ingest to stock_ticks_cow table in HDFS
-spark-submit --class 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer 
$HUDI_UTILITIES_BUNDLE --table-type COPY_ON_WRITE --source-class 
org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  
--target-base-path /user/hive/warehouse/stock_ticks_cow --target-table 
stock_ticks_cow --props /var/demo/config/kafka-source.properties 
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
-
+spark-submit \
+  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer 
$HUDI_UTILITIES_BUNDLE \
+  --table-type COPY_ON_WRITE \
+  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
+  --source-ordering-field ts  \
+  --target-base-path /user/hive/warehouse/stock_ticks_cow \
+  --target-table stock_ticks_cow --props 
/var/demo/config/kafka-source.properties \
+  --schemaprovider-class 
org.apache.hudi.utilities.schema.FilebasedSchemaProvider
 
 # Run the following spark-submit command to execute the delta-streamer and 
ingest to stock_ticks_mor table in HDFS
-spark-submit --class 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer 
$HUDI_UTILITIES_BUNDLE --table-type MERGE_ON_READ --source-class 
org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  
--target-base-path /user/hive/warehouse/stock_ticks_mor --target-table 
stock_ticks_mor --props /var/demo/config/kafka-source.properties 
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider 
--disable-compaction
-
+spark-submit \
+  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer 
$HUDI_UTILITIES_BUNDLE \
+  --table-type MERGE_ON_READ \
+  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
+  --source-ordering-field ts \
+  --target-base-path /user/hive/warehouse/stock_ticks_mor \
+  --target-table stock_ticks_mor \
+  --props /var/demo/config/kafka-source.properties \
+  --schemaprovider-class 
org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
+  --disable-compaction
 
 # As part of the setup (Look at setup_demo.sh), the configs needed for 
DeltaStreamer is uploaded to HDFS. The configs
 # contain mostly Kafa connectivity settings, the avro-schema to be used for 
ingesting along with key and partitioning fields.
@@ -194,18 +206,33 @@ inorder to run Hive queries against those tables.
 docker exec -it adhoc-2 /bin/bash
 
 # THis command takes in HIveServer URL and COW Hudi table location in HDFS and 
sync the HDFS state to Hive
-/var/hoodie/ws/hudi-hive/run_sync_tool.sh  --jdbc-url 
jdbc:hive2://hiveserver:1 --user hive --pass hive --partitioned-by dt 
--base-path /user/hive/warehouse/stock_ticks_cow --database default --table 
stock_ticks_cow
+/var/hoodie/ws/hudi-hive/run_sync_tool.sh \
+  --jdbc-url jdbc:hive2://hiveserver:1 \
+  --user hive \
+  --pass hive \
+  --partitioned-by dt \
+  --base-path 

[GitHub] [incubator-hudi] bhasudha merged pull request #1378: [HUDI-590] Cut doc version for 0.5.1 and update site

2020-03-06 Thread GitBox
bhasudha merged pull request #1378: [HUDI-590] Cut doc version for 0.5.1 and 
update site
URL: https://github.com/apache/incubator-hudi/pull/1378
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on issue #1378: [HUDI-590] Cut doc version for 0.5.1 and update site

2020-03-06 Thread GitBox
leesf commented on issue #1378: [HUDI-590] Cut doc version for 0.5.1 and update 
site
URL: https://github.com/apache/incubator-hudi/pull/1378#issuecomment-596001146
 
 
   > @leesf @vc Thanks for the reviews. Updated the 
https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+%28incubating%29+-+Release+Guide
 with the section `Steps to cut doc version and update website` as a last step 
in `Finalize the release`
   > 
   > Also I have squashed the commits into one. I can merge if you are okay 
with the changes.
   
   LGTM, please merge.




[GitHub] [incubator-hudi] melkimohamed closed issue #1376: Problem Sync Hudi table with Hive

2020-03-06 Thread GitBox
melkimohamed closed issue #1376: Problem  Sync Hudi table with Hive
URL: https://github.com/apache/incubator-hudi/issues/1376
 
 
   




[GitHub] [incubator-hudi] melkimohamed commented on issue #1376: Problem Sync Hudi table with Hive

2020-03-06 Thread GitBox
melkimohamed commented on issue #1376: Problem  Sync Hudi table with Hive
URL: https://github.com/apache/incubator-hudi/issues/1376#issuecomment-595995451
 
 
   Thanks @lamber-ken and @bvaradar , I resolved the problem by deleting the HoodieCombineHiveInputFormat class and rebuilding the project with Hive version 2.1.0.




[GitHub] [incubator-hudi] bvaradar closed issue #652: Reading Merge_on_read table| Failing SchemaParseException: Empty name

2020-03-06 Thread GitBox
bvaradar closed issue #652: Reading Merge_on_read table| Failing 
SchemaParseException: Empty name
URL: https://github.com/apache/incubator-hudi/issues/652
 
 
   




[GitHub] [incubator-hudi] bvaradar commented on issue #652: Reading Merge_on_read table| Failing SchemaParseException: Empty name

2020-03-06 Thread GitBox
bvaradar commented on issue #652: Reading Merge_on_read table| Failing 
SchemaParseException: Empty name
URL: https://github.com/apache/incubator-hudi/issues/652#issuecomment-595993807
 
 
   Closing this due to inactivity.




[GitHub] [incubator-hudi] bvaradar closed issue #828: Synchronizing to hive partition is incorrect

2020-03-06 Thread GitBox
bvaradar closed issue #828: Synchronizing to hive partition is incorrect
URL: https://github.com/apache/incubator-hudi/issues/828
 
 
   




[GitHub] [incubator-hudi] bvaradar commented on issue #828: Synchronizing to hive partition is incorrect

2020-03-06 Thread GitBox
bvaradar commented on issue #828: Synchronizing to hive partition is incorrect
URL: https://github.com/apache/incubator-hudi/issues/828#issuecomment-595993579
 
 
   Closing this due to inactivity.




[GitHub] [incubator-hudi] bvaradar closed issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync

2020-03-06 Thread GitBox
bvaradar closed issue #894: Getting java.lang.NoSuchMethodError while doing 
Hive sync
URL: https://github.com/apache/incubator-hudi/issues/894
 
 
   




[GitHub] [incubator-hudi] bvaradar commented on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync

2020-03-06 Thread GitBox
bvaradar commented on issue #894: Getting java.lang.NoSuchMethodError while 
doing Hive sync
URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-595993229
 
 
   Closing this ticket due to inactivity. Hudi 0.5.1-incubating is now available with Spark 2.4.4 and Avro 1.8.2 as compile-time dependencies. @firecast : Please feel free to reopen if you are still running into issues.




[GitHub] [incubator-hudi] prashantwason opened a new pull request #1380: [HUDI-668] Added unit-tests for HUDI metrics.

2020-03-06 Thread GitBox
prashantwason opened a new pull request #1380: [HUDI-668] Added unit-tests for 
HUDI metrics.
URL: https://github.com/apache/incubator-hudi/pull/1380
 
 
   ## What is the purpose of the pull request
   
   Added unit-tests for HUDI metrics.
   
   ## Brief change log
   
   Added unit test for code missing coverage.
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This change added tests and can be verified as follows:
   
   mvn test
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] bvaradar commented on issue #910: hoodie.*.consume.* should be set whitelist in hive-site.xml

2020-03-06 Thread GitBox
bvaradar commented on issue #910: hoodie.*.consume.* should be set whitelist in 
hive-site.xml
URL: https://github.com/apache/incubator-hudi/issues/910#issuecomment-595992583
 
 
   @bhasudha : Can you kindly let me know if this is documented so that we can 
close this ticket.




[GitHub] [incubator-hudi] prashantwason opened a new pull request #1379: [HUDI-670] Added unit-test for DiskBasedMap.

2020-03-06 Thread GitBox
prashantwason opened a new pull request #1379: [HUDI-670] Added unit-test for 
DiskBasedMap.
URL: https://github.com/apache/incubator-hudi/pull/1379
 
 
   ## What is the purpose of the pull request
   
   Added unit-test for DiskBasedMap
   
   ## Brief change log
   
   Added unit test for missing code
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
   mvn test 
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] bvaradar commented on issue #1284: [SUPPORT]

2020-03-06 Thread GitBox
bvaradar commented on issue #1284: [SUPPORT]
URL: https://github.com/apache/incubator-hudi/issues/1284#issuecomment-595992007
 
 
   @haospotai : Can you confirm whether you are still having issues? It is not very clear to me.
   
   @lamber-ken : I see a discussion about a Jira. Is there a tracking Jira already? If so, can you add it as a comment for reference.




[GitHub] [incubator-hudi] bvaradar commented on issue #1371: [SUPPORT] Upsert for S3 Hudi dataset with large partitions takes a lot of time in writing

2020-03-06 Thread GitBox
bvaradar commented on issue #1371: [SUPPORT] Upsert for S3 Hudi dataset with 
large partitions takes a lot of time in writing
URL: https://github.com/apache/incubator-hudi/issues/1371#issuecomment-595991258
 
 
   @umehrot2 : Assigning this github issue to you.  Corresponding Jira : 
https://jira.apache.org/jira/browse/HUDI-672 If there is already a tracking 
Jira, please feel free to close this one.




[GitHub] [incubator-hudi] bvaradar closed issue #1243: [SUPPORT]Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.p

2020-03-06 Thread GitBox
bvaradar closed issue #1243: [SUPPORT]Caused by: 
org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 
org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/avro/Schema;)Lorg/apache/parquet/schema/MessageType;
URL: https://github.com/apache/incubator-hudi/issues/1243
 
 
   




[GitHub] [incubator-hudi] bvaradar commented on issue #1243: [SUPPORT]Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.ap

2020-03-06 Thread GitBox
bvaradar commented on issue #1243: [SUPPORT]Caused by: 
org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 
org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/avro/Schema;)Lorg/apache/parquet/schema/MessageType;
URL: https://github.com/apache/incubator-hudi/issues/1243#issuecomment-595989970
 
 
   @HariprasadAllaka1612 : Can you check with Hudi 0.5.1 version where we have 
upgraded to Spark 2.4.4 and Avro 1.8.2.  Please reopen if you are still seeing 
this issue.




[GitHub] [incubator-hudi] bvaradar commented on issue #1313: [SUPPORT]used in test inveroment

2020-03-06 Thread GitBox
bvaradar commented on issue #1313: [SUPPORT]used in test inveroment
URL: https://github.com/apache/incubator-hudi/issues/1313#issuecomment-595984777
 
 
   Closing this due to inactivity.




[GitHub] [incubator-hudi] bvaradar closed issue #1313: [SUPPORT]used in test inveroment

2020-03-06 Thread GitBox
bvaradar closed issue #1313: [SUPPORT]used in test inveroment
URL: https://github.com/apache/incubator-hudi/issues/1313
 
 
   




[GitHub] [incubator-hudi] bvaradar commented on issue #1325: presto - querying nested object in parquet file created by hudi

2020-03-06 Thread GitBox
bvaradar commented on issue #1325: presto - querying nested object in parquet 
file created by hudi
URL: https://github.com/apache/incubator-hudi/issues/1325#issuecomment-595982426
 
 
   Thanks @bhasudha . Assigning the ticket to you as you are leading the 
investigation.




[GitHub] [incubator-hudi] bvaradar commented on issue #1342: [SUPPORT] do cow tables need to be converted when changing from hoodie to hudi?

2020-03-06 Thread GitBox
bvaradar commented on issue #1342: [SUPPORT] do cow tables need to be converted 
when changing from hoodie to hudi?
URL: https://github.com/apache/incubator-hudi/issues/1342#issuecomment-595978595
 
 
   Closing this ticket as it is answered.




[GitHub] [incubator-hudi] bvaradar closed issue #1342: [SUPPORT] do cow tables need to be converted when changing from hoodie to hudi?

2020-03-06 Thread GitBox
bvaradar closed issue #1342: [SUPPORT] do cow tables need to be converted when 
changing from hoodie to hudi?
URL: https://github.com/apache/incubator-hudi/issues/1342
 
 
   




[GitHub] [incubator-hudi] bvaradar commented on issue #1359: [SUPPORT] handle partition value containing colon ?

2020-03-06 Thread GitBox
bvaradar commented on issue #1359: [SUPPORT] handle partition value containing 
colon ?
URL: https://github.com/apache/incubator-hudi/issues/1359#issuecomment-595978051
 
 
   @tooptoop4 : Pinging to see if this is still an issue. If not, we can close 
this ticket.




[GitHub] [incubator-hudi] bvaradar commented on issue #1375: [SUPPORT] HoodieDeltaStreamer offset not handled correctly

2020-03-06 Thread GitBox
bvaradar commented on issue #1375: [SUPPORT] HoodieDeltaStreamer offset not 
handled correctly
URL: https://github.com/apache/incubator-hudi/issues/1375#issuecomment-595962932
 
 
   Jira : https://jira.apache.org/jira/browse/HUDI-669




[GitHub] [incubator-hudi] bvaradar commented on issue #1375: [SUPPORT] HoodieDeltaStreamer offset not handled correctly

2020-03-06 Thread GitBox
bvaradar commented on issue #1375: [SUPPORT] HoodieDeltaStreamer offset not 
handled correctly
URL: https://github.com/apache/incubator-hudi/issues/1375#issuecomment-595961459
 
 
   @eigakow : Can you try the patch below to see if the issue goes away? If you can confirm this works, I can put up a PR to fix it in the next version.
   
   diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
   index 4ad88556..2989f200 100644
   --- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
   +++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
   @@ -180,7 +180,7 @@ public class KafkaOffsetGen {
              .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet());
   
          // Determine the offset ranges to read from
   -      if (lastCheckpointStr.isPresent()) {
   +      if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) {
            fromOffsets = checkupValidOffsets(consumer, lastCheckpointStr, topicPartitions);
          } else {
            KafkaResetOffsetStrategies autoResetValue = KafkaResetOffsetStrategies
   
   




[GitHub] [incubator-hudi] bvaradar edited a comment on issue #1375: [SUPPORT] HoodieDeltaStreamer offset not handled correctly

2020-03-06 Thread GitBox
bvaradar edited a comment on issue #1375: [SUPPORT] HoodieDeltaStreamer offset 
not handled correctly
URL: https://github.com/apache/incubator-hudi/issues/1375#issuecomment-595961459
 
 
   @eigakow : Can you try the patch below to see if the issue goes away? If you can confirm this works, I can put up a PR to fix it in the next version.
   
   ```
   diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
   index 4ad88556..2989f200 100644
   --- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
   +++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
   @@ -180,7 +180,7 @@ public class KafkaOffsetGen {
              .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet());
   
          // Determine the offset ranges to read from
   -      if (lastCheckpointStr.isPresent()) {
   +      if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) {
            fromOffsets = checkupValidOffsets(consumer, lastCheckpointStr, topicPartitions);
          } else {
            KafkaResetOffsetStrategies autoResetValue = KafkaResetOffsetStrategies
   
   ```
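   To make the intent of that guard concrete, here is a small, hypothetical Python sketch (not Hudi's actual API; function names are invented for illustration) of the same rule: a checkpoint that is present but empty must be treated exactly like a missing checkpoint, falling back to the auto-reset strategy instead of trying to parse offsets from an empty string.

   ```python
   from typing import Optional

   def should_resume_from_checkpoint(last_checkpoint: Optional[str]) -> bool:
       """Mirror of the Java guard: resume only on a usable, non-empty checkpoint."""
       return last_checkpoint is not None and last_checkpoint != ""

   def pick_offsets(last_checkpoint: Optional[str]) -> str:
       if should_resume_from_checkpoint(last_checkpoint):
           # Validate and resume from the stored offsets.
           return "resume:" + last_checkpoint
       # Missing OR empty checkpoint: fall back to the configured
       # auto-reset strategy (earliest/latest).
       return "auto-reset"
   ```

   The point of the patch is precisely the second branch: before the fix, an empty-but-present checkpoint string took the resume path and failed downstream.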




[GitHub] [incubator-hudi] xushiyan commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-06 Thread GitBox
xushiyan commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot 
Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#issuecomment-595927116
 
 
   > @xushiyan could you please make one pass and resolve the comments that are 
already addressed..
   
   @vinothchandar I'm not able to resolve comments; the "resolve" buttons are not shown to me. I guess it's due to permissions :)




[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-06 Thread GitBox
garyli1019 commented on a change in pull request #1377: [HUDI-663] Fix 
HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r389089300
 
 

 ##
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##
 @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) {
       .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet());
 
   // Determine the offset ranges to read from
 -  if (lastCheckpointStr.isPresent()) {
 +  if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) {
 
 Review comment:
   Right. As you mentioned, it is still possible that some wrong user behaviors might lead to an empty checkpoint. From a user perspective, if there is an empty checkpoint in the last commit, I would prefer to let the job fail rather than automatically resetting the checkpoint. Throwing an exception if the checkpoint is empty would make more sense to me, and would let the user decide whether they want to reset or not. WDYT? 
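   The fail-fast alternative suggested above can be sketched as follows; this is a hypothetical Python illustration (names like `resolve_checkpoint` and `EmptyCheckpointError` are invented, not Hudi API), contrasting "silently auto-reset" with "raise and make the operator decide".

   ```python
   from typing import Optional

   class EmptyCheckpointError(RuntimeError):
       """Raised when the last commit carries a present-but-empty checkpoint."""

   def resolve_checkpoint(last_checkpoint: Optional[str]) -> Optional[str]:
       if last_checkpoint is None:
           # Genuinely no checkpoint (e.g. first run): auto-reset is acceptable.
           return None
       if last_checkpoint == "":
           # Present but empty: likely a misconfiguration, so fail loudly
           # instead of silently resetting to earliest/latest offsets.
           raise EmptyCheckpointError(
               "Last commit has an empty checkpoint; pass an explicit reset "
               "option instead of relying on auto-reset.")
       return last_checkpoint
   ```

   The design trade-off is that auto-reset can silently re-read or skip data, while failing forces an explicit, auditable operator decision.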




[GitHub] [incubator-hudi] bvaradar commented on issue #1376: Problem Sync Hudi table with Hive

2020-03-06 Thread GitBox
bvaradar commented on issue #1376: Problem  Sync Hudi table with Hive
URL: https://github.com/apache/incubator-hudi/issues/1376#issuecomment-595913815
 
 
   @lamber-ken : Regarding HoodieCombineHiveInputFormat not working with Hive 2.1.0, is there a Jira for it? If not, can you create one? 
   
   But this issue looks like it is unrelated to HoodieCombineHiveInputFormat 
though. 
   
   @melkimohamed : As @lamber-ken  mentioned, can you try with Spark-2.4.4 
runtime and see if you are encountering any issues. 




[GitHub] [incubator-hudi] bhasudha edited a comment on issue #1378: [HUDI-590] Cut doc version for 0.5.1 and update site

2020-03-06 Thread GitBox
bhasudha edited a comment on issue #1378: [HUDI-590] Cut doc version for 0.5.1 
and update site
URL: https://github.com/apache/incubator-hudi/pull/1378#issuecomment-595913035
 
 
   @leesf @vc Thanks for the reviews. Updated the 
https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+%28incubating%29+-+Release+Guide
  with the section `Steps to cut doc version and update website` as a last step 
in  `Finalize the release` 
   
   Also I have squashed the commits into one. I can merge if you are okay with 
the changes.




[GitHub] [incubator-hudi] bhasudha commented on issue #1378: [HUDI-590] Cut doc version for 0.5.1 and update site

2020-03-06 Thread GitBox
bhasudha commented on issue #1378: [HUDI-590] Cut doc version for 0.5.1 and 
update site
URL: https://github.com/apache/incubator-hudi/pull/1378#issuecomment-595913035
 
 
   @leesf @vc Thanks for the reviews. Updated the 
https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+%28incubating%29+-+Release+Guide
  with the section `Steps to cut doc version and update website` just after 
Finalizing release. 
   
   Also I have squashed the commits into one. I can merge if you are okay with 
the changes.




[jira] [Assigned] (HUDI-638) Fix the asf compliant issues based on the maturity model

2020-03-06 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned HUDI-638:
--

Assignee: Suneel Marthi

> Fix the asf compliant issues based on the maturity model
> 
>
> Key: HUDI-638
> URL: https://issues.apache.org/jira/browse/HUDI-638
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release  Administrative
>Reporter: vinoyang
>Assignee: Suneel Marthi
>Priority: Blocker
> Fix For: 0.5.2
>
>
> The Hudi's maturity model link is here: 
> https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Maturity+Matrix
> We should fix all of the compliant issues ASAP before releasing Hudi 0.5.2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-638) Fix the asf compliant issues based on the maturity model

2020-03-06 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated HUDI-638:
---
Status: Open  (was: New)

> Fix the asf compliant issues based on the maturity model
> 
>
> Key: HUDI-638
> URL: https://issues.apache.org/jira/browse/HUDI-638
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release  Administrative
>Reporter: vinoyang
>Assignee: Suneel Marthi
>Priority: Blocker
> Fix For: 0.5.2
>
>
> The Hudi's maturity model link is here: 
> https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Maturity+Matrix
> We should fix all of the compliant issues ASAP before releasing Hudi 0.5.2.





[jira] [Updated] (HUDI-638) Fix the asf compliant issues based on the maturity model

2020-03-06 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated HUDI-638:
---
Status: In Progress  (was: Open)

> Fix the asf compliant issues based on the maturity model
> 
>
> Key: HUDI-638
> URL: https://issues.apache.org/jira/browse/HUDI-638
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release  Administrative
>Reporter: vinoyang
>Assignee: Suneel Marthi
>Priority: Blocker
> Fix For: 0.5.2
>
>
> The Hudi's maturity model link is here: 
> https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Maturity+Matrix
> We should fix all of the compliant issues ASAP before releasing Hudi 0.5.2.





[jira] [Updated] (HUDI-581) NOTICE need more work as it missing content form included 3rd party ALv2 licensed NOTICE files

2020-03-06 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated HUDI-581:
---
Status: In Progress  (was: Open)

> NOTICE need more work as it missing content form included 3rd party ALv2 
> licensed NOTICE files
> --
>
> Key: HUDI-581
> URL: https://issues.apache.org/jira/browse/HUDI-581
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release  Administrative
>Reporter: leesf
>Assignee: Suneel Marthi
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Issues pointed out in general@incubator ML, more context here: 
> [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]
>  
> Would get it fixed before next release.





[GitHub] [incubator-hudi] garyli1019 commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous commits in DeltaStreamer

2020-03-06 Thread GitBox
garyli1019 commented on issue #1362: HUDI-644 Enable user to get checkpoint 
from previous commits in DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-595908998
 
 
   > Let me catch up on this discussion and circle back.. :)
   > 
   > Just one high level question (apologies if its already answered above).
   > 
   > why can't we use the checkpoint reset flag, if one-time manual restarts 
are needed for deltastreamer? is it because its hard to compute that?
   
   Right. I need a robust way to generate the checkpoint from kafka-connect-hdfs managed files, and kafka-connect itself sometimes has trouble retrieving the checkpoint when the number of Kafka partitions is large. The mechanism is to scan every single file and get the latest checkpoint for each Kafka partition. 
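   The scan described above can be sketched in Python. This is a hypothetical illustration, not the actual implementation: it assumes the default kafka-connect-hdfs output file naming of `<topic>+<partition>+<startOffset>+<endOffset>.<ext>`, and the checkpoint string format (`topic,partition:offset,...`) and the `+1` resume semantics are assumptions for the example.

   ```python
   import re
   from typing import Dict, Iterable

   # Assumed kafka-connect-hdfs naming: <topic>+<partition>+<start>+<end>.<ext>
   FILENAME_RE = re.compile(
       r"^(?P<topic>.+)\+(?P<partition>\d+)\+(?P<start>\d+)\+(?P<end>\d+)\.\w+$")

   def latest_offsets(filenames: Iterable[str]) -> Dict[int, int]:
       """Scan file names and keep the highest committed end offset per partition."""
       latest: Dict[int, int] = {}
       for name in filenames:
           m = FILENAME_RE.match(name)
           if not m:
               continue  # skip temp/metadata files that don't match the pattern
           part, end = int(m.group("partition")), int(m.group("end"))
           latest[part] = max(latest.get(part, -1), end)
       return latest

   def to_checkpoint(topic: str, offsets: Dict[int, int]) -> str:
       # Assumed checkpoint layout: topic,partition:offset,...
       # +1 because the next read should start after the last committed offset.
       return topic + "," + ",".join(
           f"{p}:{o + 1}" for p, o in sorted(offsets.items()))
   ```

   Scanning file names (rather than trusting the connector's own offset store) is what makes this robust when the connector loses or mangles its checkpoint.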




[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-06 Thread GitBox
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer 
offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-595907417
 
 
   @garyli1019 can you drive this review? :) I can merge once I have a +1 from you.




[GitHub] [incubator-hudi] vinothchandar commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-06 Thread GitBox
vinothchandar commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset 
Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#issuecomment-595903576
 
 
   @xushiyan could you please make one pass and resolve the comments that are 
already addressed..  




[GitHub] [incubator-hudi] vinothchandar commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous commits in DeltaStreamer

2020-03-06 Thread GitBox
vinothchandar commented on issue #1362: HUDI-644 Enable user to get checkpoint 
from previous commits in DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-595903147
 
 
   Let me catch up on this discussion and circle back..  :) 
   
   Just one high-level question (apologies if it's already answered above).
   
   Why can't we use the checkpoint reset flag, if one-time manual restarts are 
needed for DeltaStreamer?  Is it because it's hard to compute that? 




[jira] [Updated] (HUDI-656) Write Performance - Driver spends too much time creating Parquet DataSource after writes

2020-03-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-656:

Fix Version/s: 0.6.0

> Write Performance - Driver spends too much time creating Parquet DataSource 
> after writes
> 
>
> Key: HUDI-656
> URL: https://issues.apache.org/jira/browse/HUDI-656
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Performance, Spark Integration
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.6.0
>
>
> h2. Problem Statement
> We have noticed this performance bottleneck at EMR, and it has been reported 
> here as well: [https://github.com/apache/incubator-hudi/issues/1371]
> For writes through the DataSource API, Hudi uses 
> [this|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala#L85]
>  to create the Spark relation. It uses HoodieSparkSqlWriter to write the 
> dataframe, and afterwards tries to 
> [return|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala#L92]
>  a relation by creating it through the Parquet data source 
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala#L72].
> In the process of creating this Parquet data source, Spark creates an 
> *InMemoryFileIndex* 
> [here|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L371],
>  as part of which it lists the files under the base path. While the 
> listing itself is 
> [parallelized|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala#L289],
>  the filter that we pass, *HoodieROTablePathFilter*, is applied 
> [sequentially|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala#L294]
>  on the driver side to all of the thousands of files returned by the 
> listing. This step is not parallelized by Spark, and it takes a long time, 
> probably because of the filter's logic, so the driver ends up just spending 
> its time filtering. We have seen this take 10-12 minutes for just 50 
> partitions in S3, and this time is spent after the writing has finished.
> Solving this will significantly reduce write time across all kinds of 
> writes. The time is essentially wasted, because we do not actually have to 
> return a relation after the write: Spark never uses the returned relation 
> [here|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SaveIntoDataSourceCommand.scala#L45],
>  and the write path returns an empty set of rows.
> h2. Proposed Solution
> The proposal is to return an empty Spark relation after the write, cutting 
> out all the unnecessary time spent creating a Parquet relation that never 
> gets used.
>  
>  
>  
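
For a concrete picture of where the time goes, here is a minimal, 
self-contained sketch contrasting the sequential driver-side filtering with a 
parallel application that produces identical results. The predicate below is a 
hypothetical placeholder, not the real HoodieROTablePathFilter logic, and note 
that the proposed solution above avoids this work entirely rather than 
parallelizing it.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PathFilterCostSketch {

    // Hypothetical stand-in for HoodieROTablePathFilter: the real filter
    // decides per path whether the file belongs to the latest commit; here
    // we use a cheap placeholder predicate purely for illustration.
    static final Predicate<String> PATH_FILTER =
        p -> p.endsWith(".parquet") && !p.contains("stale");

    // How InMemoryFileIndex applies the filter today: one path at a time,
    // sequentially, on the driver.
    static List<String> filterSequential(List<String> paths) {
        return paths.stream().filter(PATH_FILTER).collect(Collectors.toList());
    }

    // The same filter over a parallel stream: identical output and ordering,
    // but the per-path cost is spread across the driver's cores.
    static List<String> filterParallel(List<String> paths) {
        return paths.parallelStream().filter(PATH_FILTER).collect(Collectors.toList());
    }
}
```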





[GitHub] [incubator-hudi] vinothchandar commented on issue #1371: [SUPPORT] Upsert for S3 Hudi dataset with large partitions takes a lot of time in writing

2020-03-06 Thread GitBox
vinothchandar commented on issue #1371: [SUPPORT] Upsert for S3 Hudi dataset 
with large partitions takes a lot of time in writing
URL: https://github.com/apache/incubator-hudi/issues/1371#issuecomment-595898387
 
 
   @abhaygupta3390  we will get this fixed in the next release.. @umehrot2 do 
you have a patch to share already? 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2020-03-06 Thread GitBox
vinothchandar commented on issue #1128: [HUDI-453] Fix throw failed to archive 
commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-595897988
 
 
   cc @bvaradar @n3nash have you folks seen this at uber? 




[jira] [Commented] (HUDI-658) Make ClientUtils spark-free

2020-03-06 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053659#comment-17053659
 ] 

Vinoth Chandar commented on HUDI-658:
-

Maybe.. but I have a concern with how we are approaching this multi-engine 
refactoring.. we need a top-down design, and probably to proof-of-concept a few 
things at the high level first, before we go down the list to smaller items 
like this.. We cannot do this class by class IMO.. let's hash out a high-level 
plan first? 

> Make ClientUtils spark-free
> ---
>
> Key: HUDI-658
> URL: https://issues.apache.org/jira/browse/HUDI-658
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> {{ClientUtils#createMetaClient}} requires a {{JavaSparkContext}} only for 
> getting the hadoop configuration object. We can pass the {{Configuration}} 
> object directly so that we can make {{ClientUtils}} spark-free.
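
A minimal sketch of the shape of this refactor. All types below are simplified 
stand-ins for the real Hadoop/Spark/Hudi classes (hypothetical, so the change 
is visible without the dependencies); the actual method signatures in 
ClientUtils may differ.

```java
import java.io.Serializable;

public class ClientUtilsSketch {

    // Simplified stand-ins for the real classes, for illustration only.
    static class Configuration implements Serializable { }

    static class JavaSparkContext {
        private final Configuration conf = new Configuration();
        Configuration hadoopConfiguration() { return conf; }
    }

    static class HoodieTableMetaClient {
        final Configuration conf;
        final String basePath;
        HoodieTableMetaClient(Configuration conf, String basePath) {
            this.conf = conf;
            this.basePath = basePath;
        }
    }

    // Before: a Spark context is required only to reach the Hadoop config,
    // dragging a Spark dependency into an otherwise engine-neutral utility.
    static HoodieTableMetaClient createMetaClient(JavaSparkContext jsc, String basePath) {
        return createMetaClient(jsc.hadoopConfiguration(), basePath);
    }

    // After: accept the Configuration directly; Spark callers simply pass
    // jsc.hadoopConfiguration(), and the utility itself stays spark-free.
    static HoodieTableMetaClient createMetaClient(Configuration conf, String basePath) {
        return new HoodieTableMetaClient(conf, basePath);
    }
}
```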





[GitHub] [incubator-hudi] vinothchandar merged pull request #1372: [HUDI-652] Decouple HoodieReadClient and AbstractHoodieClient to break the inheritance chain

2020-03-06 Thread GitBox
vinothchandar merged pull request #1372: [HUDI-652] Decouple HoodieReadClient 
and AbstractHoodieClient to break the inheritance chain
URL: https://github.com/apache/incubator-hudi/pull/1372
 
 
   




[incubator-hudi] branch master updated: [HUDI-652] Decouple HoodieReadClient and AbstractHoodieClient to break the inheritance chain (#1372)

2020-03-06 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new ee5b32f  [HUDI-652] Decouple HoodieReadClient and AbstractHoodieClient 
to break the inheritance chain (#1372)
ee5b32f is described below

commit ee5b32f5d4aa26e7fc58ccdae46935f063460920
Author: vinoyang 
AuthorDate: Sat Mar 7 01:59:35 2020 +0800

[HUDI-652] Decouple HoodieReadClient and AbstractHoodieClient to break the 
inheritance chain (#1372)


* Removed timeline server support
* Removed try-with-resource
---
 .../org/apache/hudi/client/HoodieReadClient.java   |  9 ++-
 .../apache/hudi/client/TestHoodieReadClient.java   | 63 -
 .../apache/hudi/table/TestMergeOnReadTable.java| 82 +++---
 .../hudi/table/compact/TestAsyncCompaction.java| 25 +++
 .../main/java/org/apache/hudi/DataSourceUtils.java |  3 +-
 5 files changed, 88 insertions(+), 94 deletions(-)

diff --git 
a/hudi-client/src/main/java/org/apache/hudi/client/HoodieReadClient.java 
b/hudi-client/src/main/java/org/apache/hudi/client/HoodieReadClient.java
index e08ec34..33d661b 100644
--- a/hudi-client/src/main/java/org/apache/hudi/client/HoodieReadClient.java
+++ b/hudi-client/src/main/java/org/apache/hudi/client/HoodieReadClient.java
@@ -25,7 +25,6 @@ import org.apache.hudi.common.model.HoodieKey;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
-import org.apache.hudi.common.table.HoodieTimeline;
 import org.apache.hudi.common.util.CompactionUtils;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.collection.Pair;
@@ -46,6 +45,7 @@ import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SQLContext;
 import org.apache.spark.sql.types.StructType;
 
+import java.io.Serializable;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Set;
@@ -56,7 +56,7 @@ import scala.Tuple2;
 /**
  * Provides an RDD based API for accessing/filtering Hoodie tables, based on 
keys.
  */
-public class HoodieReadClient<T extends HoodieRecordPayload> extends AbstractHoodieClient {
+public class HoodieReadClient<T extends HoodieRecordPayload> implements Serializable {
 
   private static final Logger LOG = 
LogManager.getLogger(HoodieReadClient.class);
 
@@ -65,9 +65,9 @@ public class HoodieReadClient<T extends HoodieRecordPayload> extends AbstractHoo
* basepath pointing to the table. Until, then just always assume a 
BloomIndex
*/
   private final transient HoodieIndex index;
-  private final HoodieTimeline commitTimeline;
   private HoodieTable hoodieTable;
   private transient Option sqlContextOpt;
+  private final transient JavaSparkContext jsc;
 
   /**
* @param basePath path to Hoodie table
@@ -108,12 +108,11 @@ public class HoodieReadClient<T extends HoodieRecordPayload> extends AbstractHoo
*/
   public HoodieReadClient(JavaSparkContext jsc, HoodieWriteConfig clientConfig,
   Option timelineService) {
-super(jsc, clientConfig, timelineService);
+this.jsc = jsc;
 final String basePath = clientConfig.getBasePath();
 // Create a Hoodie table which encapsulated the commits and files visible
 HoodieTableMetaClient metaClient = new 
HoodieTableMetaClient(jsc.hadoopConfiguration(), basePath, true);
 this.hoodieTable = HoodieTable.getHoodieTable(metaClient, clientConfig, 
jsc);
-this.commitTimeline = 
metaClient.getCommitTimeline().filterCompletedInstants();
 this.index = HoodieIndex.createIndex(clientConfig, jsc);
 this.sqlContextOpt = Option.empty();
   }
diff --git 
a/hudi-client/src/test/java/org/apache/hudi/client/TestHoodieReadClient.java 
b/hudi-client/src/test/java/org/apache/hudi/client/TestHoodieReadClient.java
index c57da14..6329e08 100644
--- a/hudi-client/src/test/java/org/apache/hudi/client/TestHoodieReadClient.java
+++ b/hudi-client/src/test/java/org/apache/hudi/client/TestHoodieReadClient.java
@@ -96,8 +96,8 @@ public class TestHoodieReadClient extends TestHoodieClientBase {
*/
   private void testReadFilterExist(HoodieWriteConfig config,
  Function3<JavaRDD<WriteStatus>, HoodieWriteClient, JavaRDD<HoodieRecord>, String> writeFn) throws Exception {
-try (HoodieWriteClient writeClient = getHoodieWriteClient(config);
-HoodieReadClient readClient = getHoodieReadClient(config.getBasePath());) {
+try (HoodieWriteClient writeClient = getHoodieWriteClient(config);) {
+  HoodieReadClient readClient = getHoodieReadClient(config.getBasePath());
   String newCommitTime = writeClient.startCommit();
  List<HoodieRecord> records = dataGen.generateInserts(newCommitTime, 100);
  JavaRDD<HoodieRecord> recordsRDD = jsc.parallelize(records, 1);
@@ -113,37 +113,36 @@ public class TestHoodieReadClient extends TestHoodieClientBase {
   // Verify there are no errors
   assertNoWriteErrors(statuses);
 
-  try (HoodieReadClient anotherReadClient = 

[GitHub] [incubator-hudi] vinothchandar commented on issue #1374: [HUDI-654] Rename hudi-hive to hudi-hive-sync

2020-03-06 Thread GitBox
vinothchandar commented on issue #1374: [HUDI-654] Rename hudi-hive to 
hudi-hive-sync
URL: https://github.com/apache/incubator-hudi/pull/1374#issuecomment-595887038
 
 
   we should also change `hudi-hive-bundle` to `hudi-hive-sync-bundle`? and 
need a PR to update the docs? 
   
   @lamber-ken @leesf ? 



