[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #935: [HUDI-287] Remove LICENSE and NOTICE files in hoodie child modules.

2019-10-02 Thread GitBox
bvaradar commented on a change in pull request #935: [HUDI-287] Remove LICENSE 
and NOTICE files in hoodie child modules. 
URL: https://github.com/apache/incubator-hudi/pull/935#discussion_r330868161
 
 

 ##
 File path: pom.xml
 ##
 @@ -166,9 +104,9 @@
-scm:git:g...@github.com:apache/incubator-hudi.git
-scm:git:g...@github.com:apache/incubator-hudi.git
-g...@github.com:apache/incubator-hudi.git
+scm:git:https://gitbox.apache.org/repos/asf/incubator-hudi.git
 Review comment:
   @tweise : To be compliant with the voting process, I looked into how the 
parent pom (apache-21.pom) is set up. The corresponding codebase for 
apache-21.pom is on GitHub, but it has a similar setup: 
https://github.com/apache/maven-apache-parent/blob/master/pom.xml . 
   
   I was not aware of the implications of this change. I am happy to revert the 
scm configuration, since we want developers to use GitHub. Will update the PR.  


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-64) Estimation of compression ratio & other dynamic storage knobs based on historical stats

2019-10-02 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943337#comment-16943337
 ] 

Vinoth Chandar commented on HUDI-64:


[~xleesf] [~yanghua] any interest in picking this up? I can help provide more 
context. This is the second blocker for moving towards Flink. 

> Estimation of compression ratio & other dynamic storage knobs based on 
> historical stats
> ---
>
> Key: HUDI-64
> URL: https://issues.apache.org/jira/browse/HUDI-64
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Storage Management, Write Client
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> Roughly along the likes of. [https://github.com/uber/hudi/issues/270] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-289) Implement a long running test for Hudi writing and querying end-end

2019-10-02 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943335#comment-16943335
 ] 

Vinoth Chandar commented on HUDI-289:
-

[~xleesf] [~yanghua] any interest in picking this up? 

> Implement a long running test for Hudi writing and querying end-end
> ---
>
> Key: HUDI-289
> URL: https://issues.apache.org/jira/browse/HUDI-289
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Usability
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> We would need an equivalent of an end-to-end test which runs some workload for 
> a few hours at least, triggers various actions like commit, deltacommit, 
> rollback, and compaction, and ensures correctness of the code before every release
> P.S: Learn from all the CSS issues managing compaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-253) DeltaStreamer should report nicer error messages for misconfigs

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-253:

Component/s: Usability

> DeltaStreamer should report nicer error messages for misconfigs
> ---
>
> Key: HUDI-253
> URL: https://issues.apache.org/jira/browse/HUDI-253
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer, Usability
>Reporter: Vinoth Chandar
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> e.g: 
> https://lists.apache.org/thread.html/4fdcdd7ba77a4f0366ec0e95f54298115fcc9567f6b0c9998f1b92b7@
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-260:

Component/s: Usability

> Hudi Spark Bundle does not work when passed in extraClassPath option
> 
>
> Key: HUDI-260
> URL: https://issues.apache.org/jira/browse/HUDI-260
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support, Usability
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
>
> On EMR's side we have the same findings. *a + b + c + d* work in the following 
> cases:
>  * The bundle jar (with databricks-avro shaded) is specified using the *--jars* 
> or *spark.jars* option
>  * The bundle jar (with databricks-avro shaded) is placed in the Spark home 
> jars folder, i.e. the */usr/lib/spark/jars* folder
> However, it does not work if the jar is specified using the 
> *spark.driver.extraClassPath* and *spark.executor.extraClassPath* options, 
> which is what EMR uses to configure external dependencies. Although we can 
> drop the jar in the */usr/lib/spark/jars* folder, I am not sure that is 
> recommended, because that folder is supposed to contain the jars coming from 
> Spark. Extra dependencies from the user's side would be better off specified 
> through the *extraClassPath* options.
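The working vs. failing setups described above can be sketched as a `spark-defaults.conf` fragment. The jar path below is illustrative; `spark.jars` and the two `extraClassPath` keys are standard Spark configuration properties.

```properties
# Failing configuration described above: bundle jar on the driver/executor
# extra classpath (the way EMR wires in external dependencies).
spark.driver.extraClassPath=/usr/lib/hudi/hudi-spark-bundle.jar
spark.executor.extraClassPath=/usr/lib/hudi/hudi-spark-bundle.jar

# Working configuration: bundle jar distributed via spark.jars
# (equivalent to passing --jars on spark-submit).
# spark.jars=/usr/lib/hudi/hudi-spark-bundle.jar
```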



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #56

2019-10-02 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.16 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/bin:
m2.conf
mvn
mvn.cmd
mvnDebug
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark [jar]
[INFO] hudi-utilities [jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle  [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle  [jar]
[INFO] hudi-timeline-server-bundle

[jira] [Reopened] (HUDI-73) Support vanilla Avro Kafka Source in HoodieDeltaStreamer

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reopened HUDI-73:


> Support vanilla Avro Kafka Source in HoodieDeltaStreamer
> 
>
> Key: HUDI-73
> URL: https://issues.apache.org/jira/browse/HUDI-73
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: deltastreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Context : [https://github.com/uber/hudi/issues/597]
> Currently, the Avro Kafka source expects the installation to use the Confluent 
> distribution with a Schema Registry server running. We need to support Kafka 
> installations that do not use Schema Registry by allowing 
> FileBasedSchemaProvider to be integrated with AvroKafkaSource.
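A minimal sketch of the idea above: a schema provider that reads the Avro schema from a local file instead of contacting a Schema Registry. Class and method names here are illustrative, not Hudi's actual API; a real provider would parse the JSON with `org.apache.avro.Schema.Parser`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical file-based schema provider: serves the schema for a Kafka
// source from a .avsc file on disk, so no Schema Registry is needed.
public class FileBasedSchemaProviderSketch {

    private final Path schemaFile;

    public FileBasedSchemaProviderSketch(Path schemaFile) {
        this.schemaFile = schemaFile;
    }

    /** Returns the raw Avro schema JSON read from the file. */
    public String getSourceSchema() throws IOException {
        return new String(Files.readAllBytes(schemaFile));
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny Avro record schema to a temp file and read it back.
        Path tmp = Files.createTempFile("schema", ".avsc");
        String schema = "{\"type\":\"record\",\"name\":\"Stock\",\"fields\":[]}";
        Files.write(tmp, schema.getBytes());

        FileBasedSchemaProviderSketch provider = new FileBasedSchemaProviderSketch(tmp);
        System.out.println(provider.getSourceSchema().contains("\"name\":\"Stock\""));
    }
}
```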



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-73) Support vanilla Avro Kafka Source in HoodieDeltaStreamer

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-73:
---
Status: Closed  (was: Patch Available)

> Support vanilla Avro Kafka Source in HoodieDeltaStreamer
> 
>
> Key: HUDI-73
> URL: https://issues.apache.org/jira/browse/HUDI-73
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: deltastreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Context : [https://github.com/uber/hudi/issues/597]
> Currently, the Avro Kafka source expects the installation to use the Confluent 
> distribution with a Schema Registry server running. We need to support Kafka 
> installations that do not use Schema Registry by allowing 
> FileBasedSchemaProvider to be integrated with AvroKafkaSource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-12) Upgrade Hudi to Spark 2.4

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-12:
---
Component/s: (was: Spark datasource)
 Usability

> Upgrade Hudi to Spark 2.4
> -
>
> Key: HUDI-12
> URL: https://issues.apache.org/jira/browse/HUDI-12
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Usability, Write Client
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/uber/hudi/issues/549



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-258) Hive Query engine not supporting join queries between RT and RO tables

2019-10-02 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal reassigned HUDI-258:


Assignee: Nishith Agarwal

> Hive Query engine not supporting join queries between RT and RO tables
> --
>
> Key: HUDI-258
> URL: https://issues.apache.org/jira/browse/HUDI-258
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Nishith Agarwal
>Priority: Major
>
> Description : 
> [https://github.com/apache/incubator-hudi/issues/789#issuecomment-512740619]
>  
> Root Cause: Hive tracks getSplits calls by dataset basePath and does not 
> take the InputFormatClass into account. Hence getSplits() is called only once. In 
> the case of RO and RT tables, both have the same dataset base path but 
> differ in the InputFormatClass. Due to this, the Hive join query returns 
> incorrect results.
>  
> =
> The result of the demo is very strange
> (Step 6(a))
>  
> {{ select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
> stock_ticks_mor_rt where  symbol = 'GOOG';
>  select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
> stock_ticks_mor where  symbol = 'GOOG';}}
> return as demo
> BUT!
>  
> {{select a.key,a.ts, b.ts from stock_ticks_mor a join stock_ticks_mor_rt b  
> on a.key=b.key where a.ts != b.ts
> ...
> ++---+---+--+
> | a.key  | a.ts  | b.ts  |
> ++---+---+--+
> ++---+---+--+}}
>  
> {{0: jdbc:hive2://hiveserver:1> select a.key,a.ts,b.ts from 
> stock_ticks_mor_rt a join stock_ticks_mor b on a.key = b.key where a.key= 
> 'GOOG_2018-08-31 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Execution log at: 
> /tmp/root/root_20190718091316_ec40e8f2-be17-4450-bb75-8db9f4390041.log
> 2019-07-18 09:13:20 Starting to launch local task to process map join;  
> maximum memory = 477626368
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> 2019-07-18 09:13:21 Dump the side-table for tag: 0 with group count: 1 into 
> file: 
> file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
> 2019-07-18 09:13:21 Uploaded 1 File to: 
> file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
>  (317 bytes)
> 2019-07-18 09:13:21 End of local task; Time Taken: 1.688 sec.
> +-+--+--+--+
> |a.key| a.ts | b.ts |
> +-+--+--+--+
> | GOOG_2018-08-31 10  | 2018-08-31 10:29:00  | 2018-08-31 10:29:00  |
> +-+--+--+--+
> 1 row selected (7.207 seconds)
> 0: jdbc:hive2://hiveserver:1> select a.key,a.ts,b.ts from stock_ticks_mor 
> a join stock_ticks_mor_rt b on a.key = b.key where a.key= 'GOOG_2018-08-31 
> 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Execution log at: 
> /tmp/root/root_20190718091348_72a5fc30-fc04-41c1-b2e3-5f943e4d5c08.log
> 2019-07-18 09:13:51 Starting to launch local task to process map join;  
> maximum memory = 477626368
> 
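The root cause described above (a splits cache keyed by the dataset base path alone) can be illustrated with a minimal sketch. All names here are hypothetical stand-ins, not Hive's actual internals: the point is that RO and RT tables share a base path but differ in InputFormat class, so a path-only cache key makes both tables see the same splits.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the getSplits caching bug: with a path-only cache key,
// the RO and RT tables of one dataset collide and get each other's splits.
public class SplitsCacheSketch {

    // Buggy: cache keyed by base path only.
    static Map<String, String> cacheByPath = new HashMap<>();

    static String getSplitsBuggy(String basePath, String inputFormatClass) {
        return cacheByPath.computeIfAbsent(basePath,
                p -> computeSplits(p, inputFormatClass));
    }

    // Fixed: cache key includes the InputFormat class.
    static Map<String, String> cacheByPathAndFormat = new HashMap<>();

    static String getSplitsFixed(String basePath, String inputFormatClass) {
        return cacheByPathAndFormat.computeIfAbsent(basePath + "|" + inputFormatClass,
                k -> computeSplits(basePath, inputFormatClass));
    }

    // Stand-in for the real split computation.
    static String computeSplits(String basePath, String inputFormatClass) {
        return "splits(" + basePath + ", " + inputFormatClass + ")";
    }

    public static void main(String[] args) {
        String ro = getSplitsBuggy("/data/stock_ticks", "HoodieParquetInputFormat");
        String rt = getSplitsBuggy("/data/stock_ticks", "HoodieParquetRealtimeInputFormat");
        // Both sides of the join see the RO splits -> incorrect join results.
        System.out.println(ro.equals(rt));

        String ro2 = getSplitsFixed("/data/stock_ticks", "HoodieParquetInputFormat");
        String rt2 = getSplitsFixed("/data/stock_ticks", "HoodieParquetRealtimeInputFormat");
        // With the format class in the key, each table gets its own splits.
        System.out.println(ro2.equals(rt2));
    }
}
```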

[jira] [Updated] (HUDI-13) Clarify whether the hoodie-hadoop-mr jars need to be rolled out across Hive cluster #553

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-13:
---
Component/s: Usability

> Clarify whether the hoodie-hadoop-mr jars need to be rolled out across Hive 
> cluster #553
> 
>
> Key: HUDI-13
> URL: https://issues.apache.org/jira/browse/HUDI-13
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Hive Integration, Usability
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/553



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-73) Support vanilla Avro Kafka Source in HoodieDeltaStreamer

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-73:
---
Component/s: (was: Incremental Pull)
 deltastreamer

> Support vanilla Avro Kafka Source in HoodieDeltaStreamer
> 
>
> Key: HUDI-73
> URL: https://issues.apache.org/jira/browse/HUDI-73
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: deltastreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Context : [https://github.com/uber/hudi/issues/597]
> Currently, the Avro Kafka source expects the installation to use the Confluent 
> distribution with a Schema Registry server running. We need to support Kafka 
> installations that do not use Schema Registry by allowing 
> FileBasedSchemaProvider to be integrated with AvroKafkaSource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-261) Failed to create deltacommit.inflight file, duplicate timestamp issue

2019-10-02 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-261:
---

Assignee: Balaji Varadarajan

> Failed to create deltacommit.inflight file, duplicate timestamp issue
> -
>
> Key: HUDI-261
> URL: https://issues.apache.org/jira/browse/HUDI-261
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: Elon Azoulay
>Assignee: Balaji Varadarajan
>Priority: Major
>
> {{Hudi jobs started failing with }}
>  {{Found commits after time :20190916210221, please rollback greater commits 
> first}}
>  
> This occurred after a "Failed to create deltacommit inflight file" exception:
> {code:bash}
> {{Exception in thread "main" org.apache.hudi.exception.HoodieUpsertException: 
> Failed to upsert for commit time 20190916210221 at 
> org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177) at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:353)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:228)
>  at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
>  at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:290)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
>  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) 
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at 
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: 
> org.apache.hudi.exception.HoodieIOException: Failed to create file 
> gs:///.hoodie/20190916210223.deltacommit.inflight at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:391)
>  at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInMetaPath(HoodieActiveTimeline.java:371)
>  at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveToInflight(HoodieActiveTimeline.java:359)
>  at 
> org.apache.hudi.HoodieWriteClient.saveWorkloadProfileMetadataToInflight(HoodieWriteClient.java:417)
>  at 
> org.apache.hudi.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:440)
>  at org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:172) ... 
> 14 more Caused by: java.io.IOException: 
> com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
>  412 Precondition Failed { "code" : 412, "errors" : [
> { "domain" : "global", "location" : "If-Match", "locationType" : "header", 
> "message" : "Precondition Failed", "reason" : "conditionNotMet" }
> ], "message" : "Precondition Failed" } at 
> com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:367)
>  at 
> com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:238)
>  at java.nio.channels.Channels$1.close(Channels.java:178) at 
> java.io.FilterOutputStream.close(FilterOutputStream.java:159) at 
> com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:127)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at 
> org.apache.hudi.common.io.storage.SizeAwareFSDataOutputStream.close(SizeAwareFSDataOutputStream.java:66)
>  at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:388)
>  ... 19 more Caused by: 
> com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
>  412 Precondition Failed}}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-251) JDBC incremental load to HUDI with DeltaStreamer

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-251:
---

Assignee: Taher Koitawala  (was: Vinoth Chandar)

> JDBC incremental load to HUDI with DeltaStreamer
> 
>
> Key: HUDI-251
> URL: https://issues.apache.org/jira/browse/HUDI-251
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: deltastreamer
>Reporter: Taher Koitawala
>Assignee: Taher Koitawala
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Mirroring RDBMS to HUDI is one of the most basic use cases of HUDI. Hence, 
> for such use cases, DeltaStreamer should provide built-in support.
> DeltaStreamer should accept something like jdbc-source.properties, where users 
> can define the RDBMS connection properties along with a timestamp column and 
> an interval that lets users express how frequently HUDI should check 
> the RDBMS data source for new inserts or updates.
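Such a jdbc-source.properties file might look like the sketch below. Every key name here is illustrative, invented for this example; the issue only says the file should carry connection properties, a timestamp column, and a polling interval.

```properties
# Hypothetical jdbc-source.properties along the lines described above;
# all key names are illustrative, not actual Hudi configs.
hoodie.deltastreamer.jdbc.url=jdbc:mysql://dbhost:3306/sales
hoodie.deltastreamer.jdbc.user=reader
hoodie.deltastreamer.jdbc.password.file=/etc/secrets/db-password
hoodie.deltastreamer.jdbc.table=orders
# Timestamp column used to detect new inserts/updates since the last run
hoodie.deltastreamer.jdbc.incremental.column=last_updated_ts
# How often to poll the RDBMS for changes
hoodie.deltastreamer.jdbc.poll.interval.seconds=300
```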



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-232) Implement sealing/unsealing for HoodieRecord class

2019-10-02 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943329#comment-16943329
 ] 

Vinoth Chandar commented on HUDI-232:
-

Please "Start progress" on the task once ready

> Implement sealing/unsealing for HoodieRecord class
> --
>
> Key: HUDI-232
> URL: https://issues.apache.org/jira/browse/HUDI-232
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Write Client
>Affects Versions: 0.5.0
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> The HoodieRecord class is sometimes modified to set the record location. We can 
> run into issues like HUDI-170 if the modification is misplaced. We need a 
> mechanism to seal the class and explicitly unseal it for modification. Attempting 
> to modify it in the sealed state should throw an error
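The seal/unseal mechanism proposed above can be sketched as follows. Field and method names are illustrative, not Hudi's actual HoodieRecord API; the essential behavior is that a mutation attempted while sealed throws.

```java
// Hypothetical sketch of a sealable record: mutations are only allowed
// between explicit unseal() and seal() calls.
public class SealableRecordSketch {

    private String recordLocation;
    private boolean sealed = true;  // records start out sealed

    public void unseal() { sealed = false; }
    public void seal()   { sealed = true;  }

    public void setRecordLocation(String location) {
        if (sealed) {
            // A misplaced modification (the HUDI-170 class of bug) fails fast.
            throw new UnsupportedOperationException(
                "Record is sealed; call unseal() before modifying it");
        }
        this.recordLocation = location;
    }

    public String getRecordLocation() { return recordLocation; }

    public static void main(String[] args) {
        SealableRecordSketch rec = new SealableRecordSketch();
        rec.unseal();
        rec.setRecordLocation("partition=2019/10/02,fileId=abc");
        rec.seal();
        try {
            rec.setRecordLocation("somewhere-else");  // misplaced modification
        } catch (UnsupportedOperationException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```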



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-251) JDBC incremental load to HUDI with DeltaStreamer

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-251:
---

Assignee: Vinoth Chandar  (was: Taher Koitawala)

> JDBC incremental load to HUDI with DeltaStreamer
> 
>
> Key: HUDI-251
> URL: https://issues.apache.org/jira/browse/HUDI-251
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: deltastreamer
>Reporter: Taher Koitawala
>Assignee: Vinoth Chandar
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Mirroring RDBMS to HUDI is one of the most basic use cases of HUDI. Hence, 
> for such use cases, DeltaStreamer should provide built-in support.
> DeltaStreamer should accept something like jdbc-source.properties, where users 
> can define the RDBMS connection properties along with a timestamp column and 
> an interval that lets users express how frequently HUDI should check 
> the RDBMS data source for new inserts or updates.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-130) Paths written in compaction plan needs to be relative to base-path

2019-10-02 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-130:
---

Assignee: Balaji Varadarajan

> Paths written in compaction plan needs to be relative to base-path
> --
>
> Key: HUDI-130
> URL: https://issues.apache.org/jira/browse/HUDI-130
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Minor
>
> The paths stored in the compaction plan are all absolute. They need to be changed 
> to relative paths.
> The challenge would be to handle cases when both versions of compaction plans 
> are present and need to be processed (backwards compatibility)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-64) Estimation of compression ratio & other dynamic storage knobs based on historical stats

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-64:
--

Assignee: Vinoth Chandar

> Estimation of compression ratio & other dynamic storage knobs based on 
> historical stats
> ---
>
> Key: HUDI-64
> URL: https://issues.apache.org/jira/browse/HUDI-64
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Storage Management, Write Client
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> Roughly along the likes of. [https://github.com/uber/hudi/issues/270] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-57) Support for writing ORC base files

2019-10-02 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943323#comment-16943323
 ] 

Vinoth Chandar commented on HUDI-57:


[~ambition119] are you still interested in working on this? 

> Support for writing ORC base files
> --
>
> Key: HUDI-57
> URL: https://issues.apache.org/jira/browse/HUDI-57
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Hive Integration, Write Client
>Reporter: Vinoth Chandar
>Assignee: pingle wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://github.com/uber/hudi/issues/68]
> https://github.com/uber/hudi/issues/155



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-53) Implement Global Index to map a record key to a pair #90

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-53?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-53:
--

Assignee: Vinoth Chandar

> Implement Global Index to map a record key to a  pair 
> #90
> -
>
> Key: HUDI-53
> URL: https://issues.apache.org/jira/browse/HUDI-53
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Write Client
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> [https://github.com/uber/hudi/issues/90] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-46) Investigate possibility of logging data blocks in parquet format #369

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-46:
--

Assignee: Vinoth Chandar

> Investigate possibility of logging data blocks in parquet format #369
> -
>
> Key: HUDI-46
> URL: https://issues.apache.org/jira/browse/HUDI-46
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Storage Management
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/369



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-253) DeltaStreamer should report nicer error messages for misconfigs

2019-10-02 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943322#comment-16943322
 ] 

Vinoth Chandar commented on HUDI-253:
-

[~Pratyaksh] are you still working on this? 

> DeltaStreamer should report nicer error messages for misconfigs
> ---
>
> Key: HUDI-253
> URL: https://issues.apache.org/jira/browse/HUDI-253
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> e.g: 
> https://lists.apache.org/thread.html/4fdcdd7ba77a4f0366ec0e95f54298115fcc9567f6b0c9998f1b92b7@
>  





[jira] [Assigned] (HUDI-12) Upgrade Hudi to Spark 2.4

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-12:
--

Assignee: Udit Mehrotra  (was: omkar vinit joshi)

> Upgrade Hudi to Spark 2.4
> -
>
> Key: HUDI-12
> URL: https://issues.apache.org/jira/browse/HUDI-12
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark datasource, Write Client
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/uber/hudi/issues/549





[jira] [Assigned] (HUDI-12) Upgrade Hudi to Spark 2.4

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-12:
--

Assignee: Vinoth Chandar  (was: Udit Mehrotra)

> Upgrade Hudi to Spark 2.4
> -
>
> Key: HUDI-12
> URL: https://issues.apache.org/jira/browse/HUDI-12
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark datasource, Write Client
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/uber/hudi/issues/549





[jira] [Closed] (HUDI-7) Guidance for hoodie-spark + hive sync to CDH #578

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar closed HUDI-7.
-
Resolution: Fixed

> Guidance for hoodie-spark + hive sync to CDH #578
> -
>
> Key: HUDI-7
> URL: https://issues.apache.org/jira/browse/HUDI-7
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Spark datasource
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/578





[jira] [Assigned] (HUDI-15) Add a delete() API to HoodieWriteClient as well as Spark datasource #531

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-15:
--

Assignee: Vinoth Chandar

> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> 
>
> Key: HUDI-15
> URL: https://issues.apache.org/jira/browse/HUDI-15
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark datasource, Write Client
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/531





[jira] [Assigned] (HUDI-15) Add a delete() API to HoodieWriteClient as well as Spark datasource #531

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-15:
--

Assignee: Bhavani Sudha Saktheeswaran  (was: Vinoth Chandar)

> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> 
>
> Key: HUDI-15
> URL: https://issues.apache.org/jira/browse/HUDI-15
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark datasource, Write Client
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>
> https://github.com/uber/hudi/issues/531





[jira] [Commented] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

2019-10-02 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943318#comment-16943318
 ] 

Vinoth Chandar commented on HUDI-260:
-

[~uditme] assigning back to you for now; let me know if this is still needed. 
FWIW, even Iceberg suggests integrating by dropping the jar into the jars folder.

> Hudi Spark Bundle does not work when passed in extraClassPath option
> 
>
> Key: HUDI-260
> URL: https://issues.apache.org/jira/browse/HUDI-260
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
>
> On EMR's side we have the same findings. *a + b + c + d* work in the following 
> cases:
>  * The bundle jar (with databricks-avro shaded) is specified using the *--jars* 
> or *spark.jars* option
>  * The bundle jar (with databricks-avro shaded) is placed in the Spark Home 
> jars folder, i.e. */usr/lib/spark/jars*
> However, it does not work if the jar is specified using the 
> *spark.driver.extraClassPath* and *spark.executor.extraClassPath* options, 
> which is what EMR uses to configure external dependencies. Although we can 
> drop the jar into the */usr/lib/spark/jars* folder, I am not sure that is 
> recommended, because that folder is supposed to contain the jars that come 
> with Spark. Extra dependencies from the user's side would be better off 
> specified through the *extraClassPath* options.
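The two working configurations and the failing one described above can be sketched as spark-submit usage (paths and the bundle jar name are illustrative placeholders, not values from this thread):

```
# Works: pass the bundle explicitly via --jars (or spark.jars)
spark-submit --jars /path/to/hudi-spark-bundle.jar ...

# Works: place the bundle in Spark's own jars directory
cp /path/to/hudi-spark-bundle.jar /usr/lib/spark/jars/

# Fails per this report: pass the bundle via extraClassPath
spark-submit \
  --conf spark.driver.extraClassPath=/path/to/hudi-spark-bundle.jar \
  --conf spark.executor.extraClassPath=/path/to/hudi-spark-bundle.jar \
  ...
```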





[jira] [Assigned] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

2019-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-260:
---

Assignee: Udit Mehrotra  (was: Vinoth Chandar)

> Hudi Spark Bundle does not work when passed in extraClassPath option
> 
>
> Key: HUDI-260
> URL: https://issues.apache.org/jira/browse/HUDI-260
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
>
> On EMR's side we have the same findings. *a + b + c + d* work in the following 
> cases:
>  * The bundle jar (with databricks-avro shaded) is specified using the *--jars* 
> or *spark.jars* option
>  * The bundle jar (with databricks-avro shaded) is placed in the Spark Home 
> jars folder, i.e. */usr/lib/spark/jars*
> However, it does not work if the jar is specified using the 
> *spark.driver.extraClassPath* and *spark.executor.extraClassPath* options, 
> which is what EMR uses to configure external dependencies. Although we can 
> drop the jar into the */usr/lib/spark/jars* folder, I am not sure that is 
> recommended, because that folder is supposed to contain the jars that come 
> with Spark. Extra dependencies from the user's side would be better off 
> specified through the *extraClassPath* options.





[jira] [Closed] (HUDI-56) Dynamically configure the number of entries in BloomFilter index based on size of the record #70

2019-10-02 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal closed HUDI-56.
---
Resolution: Duplicate

> Dynamically configure the number of entries in BloomFilter index based on 
> size of the record #70
> 
>
> Key: HUDI-56
> URL: https://issues.apache.org/jira/browse/HUDI-56
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Storage Management, Usability
>Reporter: Vinoth Chandar
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: realtime-data-lakes
>
> https://github.com/uber/hudi/issues/70





[jira] [Commented] (HUDI-56) Dynamically configure the number of entries in BloomFilter index based on size of the record #70

2019-10-02 Thread Nishith Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943316#comment-16943316
 ] 

Nishith Agarwal commented on HUDI-56:
-

Closing this in favor of: 
[https://issues.apache.org/jira/projects/HUDI/issues/HUDI-106?filter=allopenissues]

> Dynamically configure the number of entries in BloomFilter index based on 
> size of the record #70
> 
>
> Key: HUDI-56
> URL: https://issues.apache.org/jira/browse/HUDI-56
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Storage Management, Usability
>Reporter: Vinoth Chandar
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: realtime-data-lakes
>
> https://github.com/uber/hudi/issues/70
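The dynamic-sizing idea being consolidated into HUDI-106 can be sketched as a simple heuristic: derive the bloom filter's entry count from the target file size and an observed average record size instead of a hard-coded constant. The function and parameter names below are a hypothetical illustration, not code from the Hudi codebase:

```python
def dynamic_bloom_entries(target_file_size_bytes,
                          avg_record_size_bytes,
                          min_entries=1_000,
                          max_entries=10_000_000):
    """Estimate how many record keys a file of the target size can
    hold, so the bloom filter is sized for the expected key count.
    Clamped to guard against degenerate size estimates."""
    if avg_record_size_bytes <= 0:
        return min_entries  # no stats available yet; use the floor
    estimated = target_file_size_bytes // avg_record_size_bytes
    return max(min_entries, min(max_entries, estimated))

# A 128 MB file of ~1 KB records should index about 131072 keys.
print(dynamic_bloom_entries(128 * 1024 * 1024, 1024))
```

The clamp matters in practice: with no history of commit stats the estimate degenerates, so a floor and cap keep the filter usable either way.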





[jira] [Created] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-10-02 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-288:
---

 Summary: Add support for ingesting multiple kafka streams in a 
single DeltaStreamer deployment
 Key: HUDI-288
 URL: https://issues.apache.org/jira/browse/HUDI-288
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: deltastreamer
Reporter: Vinoth Chandar


https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@
 has all the context





[GitHub] [incubator-hudi] tweise commented on a change in pull request #935: [HUDI-287] Remove LICENSE and NOTICE files in hoodie child modules.

2019-10-02 Thread GitBox
tweise commented on a change in pull request #935: [HUDI-287] Remove LICENSE 
and NOTICE files in hoodie child modules. 
URL: https://github.com/apache/incubator-hudi/pull/935#discussion_r330853682
 
 

 ##
 File path: pom.xml
 ##
 @@ -166,9 +104,9 @@
   
 
   
-scm:git:g...@github.com:apache/incubator-hudi.git
-
scm:git:g...@github.com:apache/incubator-hudi.git
-g...@github.com:apache/incubator-hudi.git
+
scm:git:https://gitbox.apache.org/repos/asf/incubator-hudi.git
 
 Review comment:
   Why this change? Don't you want contributors to work with the github repo?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #935: [HUDI-287] Remove LICENSE and NOTICE files in hoodie child modules.

2019-10-02 Thread GitBox
bvaradar commented on a change in pull request #935: [HUDI-287] Remove LICENSE 
and NOTICE files in hoodie child modules. 
URL: https://github.com/apache/incubator-hudi/pull/935#discussion_r330746584
 
 

 ##
 File path: pom.xml
 ##
 @@ -67,67 +67,8 @@
 https://www.apache.org
   
 
-  
-
-  vinothchandar
-  Vinoth Chandar
-  Confluent Inc
-
-
-  prasannarajaperumal
-  Prasanna Rajaperumal
-  Snowflake
-
-
-  n3nash
-  Nishith Agarwal
-  Uber
-
-
-  bvaradar
-  Balaji Varadharajan
-  Uber
-
-  
-
-  
-
-  Wei Yan
-  Uber
-
-
-  Siddhartha Gunda
-  Uber
-
-
-  Omkar Joshi
-  Uber
-
-
-  Zeeshan Qureshi
-  Shopify
-
-
-  Kathy Ge
-  Shopify
-
-
-  Kaushik Devarajaiah
-  Uber
-
-
-  Anbu Cheeralan
-  DoubleVerify
-
-
-  Jiale Tan
-  Vungle
-
-  
-
   2016
 
 Review comment:
   @vinothchandar : Took a look at the parent apache pom 
(https://github.com/apache/maven-apache-parent/blob/apache-21/pom.xml) and made 
changes accordingly to hoodie parent pom




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #935: [HUDI-287] Remove LICENSE and NOTICE files in hoodie child modules.

2019-10-02 Thread GitBox
bvaradar commented on a change in pull request #935: [HUDI-287] Remove LICENSE 
and NOTICE files in hoodie child modules. 
URL: https://github.com/apache/incubator-hudi/pull/935#discussion_r330738298
 
 

 ##
 File path: pom.xml
 ##
 @@ -67,67 +67,8 @@
 https://www.apache.org
   
 
-  
-
-  vinothchandar
-  Vinoth Chandar
-  Confluent Inc
-
-
-  prasannarajaperumal
-  Prasanna Rajaperumal
-  Snowflake
-
-
-  n3nash
-  Nishith Agarwal
-  Uber
-
-
-  bvaradar
-  Balaji Varadharajan
-  Uber
-
-  
-
-  
-
-  Wei Yan
-  Uber
-
-
-  Siddhartha Gunda
-  Uber
-
-
-  Omkar Joshi
-  Uber
-
-
-  Zeeshan Qureshi
-  Shopify
-
-
-  Kathy Ge
-  Shopify
-
-
-  Kaushik Devarajaiah
-  Uber
-
-
-  Anbu Cheeralan
-  DoubleVerify
-
-
-  Jiale Tan
-  Vungle
-
-  
-
   2016
 
 Review comment:
   Will make a pass for others




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #935: [HUDI-287] Remove LICENSE and NOTICE files in hoodie child modules.

2019-10-02 Thread GitBox
bvaradar commented on a change in pull request #935: [HUDI-287] Remove LICENSE 
and NOTICE files in hoodie child modules. 
URL: https://github.com/apache/incubator-hudi/pull/935#discussion_r330735638
 
 

 ##
 File path: pom.xml
 ##
 @@ -67,67 +67,8 @@
 https://www.apache.org
   
 
-  
-
-  vinothchandar
-  Vinoth Chandar
-  Confluent Inc
-
-
-  prasannarajaperumal
-  Prasanna Rajaperumal
-  Snowflake
-
-
-  n3nash
-  Nishith Agarwal
-  Uber
-
-
-  bvaradar
-  Balaji Varadharajan
-  Uber
-
-  
-
-  
-
-  Wei Yan
-  Uber
-
-
-  Siddhartha Gunda
-  Uber
-
-
-  Omkar Joshi
-  Uber
-
-
-  Zeeshan Qureshi
-  Shopify
-
-
-  Kathy Ge
-  Shopify
-
-
-  Kaushik Devarajaiah
-  Uber
-
-
-  Anbu Cheeralan
-  DoubleVerify
-
-
-  Jiale Tan
-  Vungle
-
-  
-
   2016
 
 Review comment:
   @vinothchandar : Saw this in flink 
https://github.com/apache/flink/blob/master/pom.xml#L36. thats why left it. 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #935: [HUDI-287] Remove LICENSE and NOTICE files in hoodie child modules.

2019-10-02 Thread GitBox
vinothchandar commented on a change in pull request #935: [HUDI-287] Remove 
LICENSE and NOTICE files in hoodie child modules. 
URL: https://github.com/apache/incubator-hudi/pull/935#discussion_r330730814
 
 

 ##
 File path: pom.xml
 ##
 @@ -67,67 +67,8 @@
 https://www.apache.org
   
 
-  
-
-  vinothchandar
-  Vinoth Chandar
-  Confluent Inc
-
-
-  prasannarajaperumal
-  Prasanna Rajaperumal
-  Snowflake
-
-
-  n3nash
-  Nishith Agarwal
-  Uber
-
-
-  bvaradar
-  Balaji Varadharajan
-  Uber
-
-  
-
-  
-
-  Wei Yan
-  Uber
-
-
-  Siddhartha Gunda
-  Uber
-
-
-  Omkar Joshi
-  Uber
-
-
-  Zeeshan Qureshi
-  Shopify
-
-
-  Kathy Ge
-  Shopify
-
-
-  Kaushik Devarajaiah
-  Uber
-
-
-  Anbu Cheeralan
-  DoubleVerify
-
-
-  Jiale Tan
-  Vungle
-
-  
-
   2016
 
 Review comment:
   remove this too? Can you go line-by-line and see what's left? :) That's the 
only way to clean this up.




[jira] [Updated] (HUDI-287) Remove LICENSE and NOTICE files in hoodie child modules

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-287:

Labels: pull-request-available  (was: )

> Remove LICENSE and NOTICE files in hoodie child modules
> ---
>
> Key: HUDI-287
> URL: https://issues.apache.org/jira/browse/HUDI-287
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: asf-migration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>
> These files were earlier added to ensure LICENSE and NOTICE files are present 
> in the generated jars. In the earlier pom setup, the hudi parent pom was not 
> linked to the apache parent pom, and no "generate-resource-bundle" plugin was 
> present to generate LICENSE and NOTICE files in the jars automatically, so we 
> resorted to manually storing the LICENSE/NOTICE files in each submodule.
>  
> With the Apache parent pom, the NOTICE and LICENSE files are automatically 
> set up for each jar. This means we can safely remove the LICENSE and NOTICE 
> files from each submodule.





[GitHub] [incubator-hudi] bvaradar opened a new pull request #935: [HUDI-287] Remove LICENSE and NOTICE files in hoodie child modules.

2019-10-02 Thread GitBox
bvaradar opened a new pull request #935: [HUDI-287] Remove LICENSE and NOTICE 
files in hoodie child modules. 
URL: https://github.com/apache/incubator-hudi/pull/935
 
 
   Changes:  
   1. Remove LICENSE and NOTICE files in hoodie child modules
   2. Also remove developers and contributor section from pom
   
   Jira: https://jira.apache.org/jira/projects/HUDI/issues/HUDI-287
   




[jira] [Assigned] (HUDI-287) Remove LICENSE and NOTICE files in hoodie child modules

2019-10-02 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-287:
---

Assignee: Balaji Varadarajan

> Remove LICENSE and NOTICE files in hoodie child modules
> ---
>
> Key: HUDI-287
> URL: https://issues.apache.org/jira/browse/HUDI-287
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: asf-migration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.0
>
>
> These files were earlier added to ensure LICENSE and NOTICE files are present 
> in the generated jars. In the earlier pom setup, the hudi parent pom was not 
> linked to the apache parent pom, and no "generate-resource-bundle" plugin was 
> present to generate LICENSE and NOTICE files in the jars automatically, so we 
> resorted to manually storing the LICENSE/NOTICE files in each submodule.
>  
> With the Apache parent pom, the NOTICE and LICENSE files are automatically 
> set up for each jar. This means we can safely remove the LICENSE and NOTICE 
> files from each submodule.





[jira] [Created] (HUDI-287) Remove LICENSE and NOTICE files in hoodie child modules

2019-10-02 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-287:
---

 Summary: Remove LICENSE and NOTICE files in hoodie child modules
 Key: HUDI-287
 URL: https://issues.apache.org/jira/browse/HUDI-287
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: asf-migration
Reporter: Balaji Varadarajan
 Fix For: 0.5.0


These files were earlier added to ensure LICENSE and NOTICE files are present 
in the generated jars. In the earlier pom setup, the hudi parent pom was not 
linked to the apache parent pom, and no "generate-resource-bundle" plugin was 
present to generate LICENSE and NOTICE files in the jars automatically, so we 
resorted to manually storing the LICENSE/NOTICE files in each submodule.

With the Apache parent pom, the NOTICE and LICENSE files are automatically set 
up for each jar. This means we can safely remove the LICENSE and NOTICE files 
from each submodule.
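In pom terms, the fix referenced here amounts to declaring the ASF parent pom (apache-21.pom, the version discussed elsewhere in this thread) at the top of the hudi root pom, roughly:

```
<parent>
  <groupId>org.apache</groupId>
  <artifactId>apache</artifactId>
  <version>21</version>
</parent>
```

The resource-bundle machinery inherited from that parent then attaches LICENSE and NOTICE into every module's jar, which is what makes the per-module copies removable.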





[GitHub] [incubator-hudi] tweise commented on a change in pull request #918: [HUDI-121] : Address comments during RC2 voting

2019-10-02 Thread GitBox
tweise commented on a change in pull request #918: [HUDI-121] : Address 
comments during RC2 voting
URL: https://github.com/apache/incubator-hudi/pull/918#discussion_r330615602
 
 

 ##
 File path: packaging/hudi-spark-bundle/src/main/resources/META-INF/LICENSE
 ##
 @@ -0,0 +1,177 @@
+
 
 Review comment:
   Why repeat LICENSE in src/main/resources?




[GitHub] [incubator-hudi] vinothchandar commented on issue #928: [HUDI-265] Failed to delete tmp dirs created in unit tests

2019-10-02 Thread GitBox
vinothchandar commented on issue #928: [HUDI-265] Failed to delete tmp dirs 
created in unit tests
URL: https://github.com/apache/incubator-hudi/pull/928#issuecomment-537481270
 
 
   Will merge once CI passes. Thanks for doing this, @leesf! 




[jira] [Updated] (HUDI-282) Update documentation to reflect additional option of HiveSync via metastore

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-282:

Labels: pull-request-available  (was: )

> Update documentation to reflect additional option of HiveSync via metastore
> ---
>
> Key: HUDI-282
> URL: https://issues.apache.org/jira/browse/HUDI-282
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [incubator-hudi] vinothchandar merged pull request #931: HUDI-282 : Updating hive sync tool docs for hive metastore based operations

2019-10-02 Thread GitBox
vinothchandar merged pull request #931: HUDI-282 : Updating hive sync tool docs 
for hive metastore based operations
URL: https://github.com/apache/incubator-hudi/pull/931
 
 
   




[incubator-hudi] branch asf-site updated: [HUDI-282] Updating hive sync tool docs for hive metastore based operations (#931)

2019-10-02 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new d0ce979  [HUDI-282] Updating hive sync tool docs for hive metastore 
based operations (#931)
d0ce979 is described below

commit d0ce979f63b41a841b7fe35ddeed0ee0459fe126
Author: n3nash 
AuthorDate: Wed Oct 2 05:10:31 2019 -0700

[HUDI-282] Updating hive sync tool docs for hive metastore based operations 
(#931)
---
 docs/writing_data.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/writing_data.md b/docs/writing_data.md
index c727266..37bc0c9 100644
--- a/docs/writing_data.md
+++ b/docs/writing_data.md
@@ -174,6 +174,8 @@ Usage:  [options]
Default: false
   * --jdbc-url
Hive jdbc connect url
+  * --use-jdbc
+   Whether to use jdbc connection or hive metastore (via thrift)
   * --pass
Hive password
   * --table



[GitHub] [incubator-hudi] vinothchandar commented on issue #932: HUDI-160 : Removing cdh support and hive 1.x mentions

2019-10-02 Thread GitBox
vinothchandar commented on issue #932: HUDI-160 : Removing cdh support and hive 
1.x mentions
URL: https://github.com/apache/incubator-hudi/pull/932#issuecomment-537458998
 
 
   @n3nash please include the ticket number in the commit message (in addition 
to the PR)




[incubator-hudi] branch asf-site updated: [HUDI-160] removing cdh support and hive 1.x mentions (#932)

2019-10-02 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new e042aea  [HUDI-160] removing cdh support and hive 1.x mentions (#932)
e042aea is described below

commit e042aea837412b461b02f8062d8a11e8029823bf
Author: n3nash 
AuthorDate: Wed Oct 2 04:57:08 2019 -0700

[HUDI-160] removing cdh support and hive 1.x mentions (#932)
---
 docs/quickstart.md | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/docs/quickstart.md b/docs/quickstart.md
index 97db2fa..5007442 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -22,10 +22,8 @@ Check out [code](https://github.com/apache/incubator-hudi) 
and normally build th
 $ mvn clean install -DskipTests -DskipITs
 ```
 
-To work with older version of Hive (pre Hive-1.2.1), use
-```
-$ mvn clean install -DskipTests -DskipITs -Dhive11
-```
+Hudi works with Hive 2.3.x or higher versions. As long as Hive 2.x protocol 
can talk to Hive 1.x, you can use Hudi to 
+talk to older hive versions.
 
 For IDE, you can pull in the code into IntelliJ as a normal maven project. 
 You might want to add your spark jars folder to project dependencies under 
'Module Setttings', to be able to run from IDE.
@@ -38,9 +36,7 @@ Further, we have verified that Hudi works with the following 
combination of Hado
 
 | Hadoop | Hive  | Spark | Instructions to Build Hudi |
 |  | - |  |  |
-| 2.6.0-cdh5.7.2 | 1.1.0-cdh5.7.2 | spark-2.[1-3].x | Use “mvn clean install 
-DskipTests -Dhadoop.version=2.6.0-cdh5.7.2 -Dhive.version=1.1.0-cdh5.7.2” |
-| Apache hadoop-2.8.4 | Apache hive-2.3.3 | spark-2.[1-3].x | Use "mvn clean 
install -DskipTests" |
-| Apache hadoop-2.7.3 | Apache hive-1.2.1 | spark-2.[1-3].x | Use "mvn clean 
install -DskipTests" |
+| Apache hadoop-2.[7-8].x | Apache hive-2.3.[1-3] | spark-2.[1-3].x | Use "mvn 
clean install -DskipTests" |
 
 If your environment has other versions of hadoop/hive/spark, please try out 
Hudi 
 and let us know if there are any issues. 



[GitHub] [incubator-hudi] leesf commented on issue #928: [HUDI-265] Failed to delete tmp dirs created in unit tests

2019-10-02 Thread GitBox
leesf commented on issue #928: [HUDI-265] Failed to delete tmp dirs created in 
unit tests
URL: https://github.com/apache/incubator-hudi/pull/928#issuecomment-537423776
 
 
   @vinothchandar Updated the PR to address the comments.
   
   _1. Introduce HoodieCommonTestHarness.
   2. Make HoodieClientTestHarness extends HoodieCommonTestHarness and move 
getTableType to HoodieCommonTestHarness. 
   3. Make some UTs in common module extend HoodieCommonTestHarness._
   




[GitHub] [incubator-hudi] HariprasadAllaka1612 commented on issue #933: Support for multiple level partitioning in Hudi

2019-10-02 Thread GitBox
HariprasadAllaka1612 commented on issue #933: Support for multiple level 
partitioning in Hudi
URL: https://github.com/apache/incubator-hudi/issues/933#issuecomment-537360583
 
 
   I found the way to do this. For anyone's reference, this can be achieved by:
   
   1. Use org.apache.hudi.ComplexKeyGenerator as the key generator class instead of 
SimpleKeyGenerator.
   2. Provide the fields that you want to partition on as a comma-separated 
string via PARTITION_FIELD_OPT_KEY.
   
   Reference : 
   
https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/ComplexKeyGenerator.java#L42
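The effect of the two steps above can be illustrated with a small sketch of how a complex key generator builds a multi-level partition path from a comma-separated field list (this mimics the linked ComplexKeyGenerator in spirit; the helper below is an illustration, not Hudi's actual code):

```python
def complex_partition_path(record, partition_fields_csv):
    """Join the value of each configured field with '/' to form a
    nested partition path, e.g. 'DE/2019' for fields 'country,year'."""
    fields = [f.strip() for f in partition_fields_csv.split(",")]
    return "/".join(str(record[f]) for f in fields)

record = {"country": "DE", "year": 2019, "uuid": "abc-123"}
print(complex_partition_path(record, "country,year"))  # DE/2019
```

Each comma-separated field contributes one level of the directory hierarchy, which is what gives the multi-level partitioning asked about in this issue.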




[GitHub] [incubator-hudi] HariprasadAllaka1612 closed issue #933: Support for multiple level partitioning in Hudi

2019-10-02 Thread GitBox
HariprasadAllaka1612 closed issue #933: Support for multiple level partitioning 
in Hudi
URL: https://github.com/apache/incubator-hudi/issues/933
 
 
   

