[GitHub] [incubator-hudi] vinothchandar commented on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync

2019-09-20 Thread GitBox
vinothchandar commented on issue #894: Getting java.lang.NoSuchMethodError 
while doing Hive sync
URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-533596837
 
 
   Got it. To run from the IDE, what I do is add the Spark jars to the 
classpath of my module in IntelliJ. You don't have to change your sbt build to 
bring in Avro etc. The idea is that when you actually submit your application 
with `spark-submit`, these jars are already there. 
   
   Which transitive dependencies get pulled in depends on their scope. Avro, 
Parquet etc. are in `provided` scope, so the expectation is that they are 
supplied by the actual runtime (IDE or spark-submit). This way we can keep Hudi 
thinner and support multiple Spark, Hive and Hadoop versions.
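
   To illustrate what `provided` scope means on the application side, here is a 
minimal `build.sbt` sketch. It assumes an sbt build as mentioned above; the 
Spark modules and the version value are illustrative assumptions, not taken 
from the reporter's project.

   ```scala
   // build.sbt (sketch; module list and version are assumptions)
   val sparkVersion = "2.4.4"

   libraryDependencies ++= Seq(
     // "provided": visible at compile time but left out of the packaged jar,
     // because spark-submit (or the IDE module classpath) supplies these at runtime.
     "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
     "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
   )
   ```

   With this setup the jars must be present on the runtime classpath, which is 
why they have to be added to the IntelliJ module when running from the IDE.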


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync

2019-09-17 Thread GitBox
vinothchandar commented on issue #894: Getting java.lang.NoSuchMethodError 
while doing Hive sync
URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-532347935
 
 
   >>Am I supposed to add hudi-hive jars separately?
   No, it's all there in the bundle.
   
   Some general context: if you are writing a Spark job, it's better to just 
depend on `hudi-spark`, which will pull in hudi-hive. That way you have 
control over which versions to exclude and bring in. With a bundled jar (true 
for any bundled/fat/uber jar), you have no way to tell Hudi not to bring its 
own version of Hive. 
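
   As a rough sketch of that kind of control, an sbt build could depend on 
`hudi-spark` and exclude the Hive artifacts it would otherwise pull in 
transitively. The version string and the choice to exclude the whole 
`org.apache.hive` organization are assumptions for illustration; the excluded 
classes then have to come from the runtime or from your own dependencies.

   ```scala
   // build.sbt (sketch; version and exclusion rule are assumptions)
   val hudiVersion = "0.5.0-incubating"

   libraryDependencies += ("org.apache.hudi" % "hudi-spark" % hudiVersion)
     // Drop Hive artifacts that arrive transitively, so that the Hive version
     // already present in your environment is the only one on the classpath.
     .excludeAll(ExclusionRule(organization = "org.apache.hive"))
   ```

   Nothing comparable is possible with the bundle, because the Hive classes are 
already packed inside that single jar.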
   
   Can you try building a fat jar and running your job once via `spark-submit` 
locally? I imagine you added the Spark jars to your IntelliJ module to be able 
to run the program locally, and I want to see if the jar conflict is coming 
from that. From what you shared, I still can't tell where `Hive 2.3.2-amzn-2` 
comes from.




[GitHub] [incubator-hudi] vinothchandar commented on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync

2019-09-16 Thread GitBox
vinothchandar commented on issue #894: Getting java.lang.NoSuchMethodError 
while doing Hive sync
URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-531988780
 
 
   Hmmm. I tested with 
   
   Hive version - Hive 2.3.3 
   Spark version - 2.4.4
   
   by simply using Spark 2.4.4 to do this step on the docker demo (I had the 
Spark installation unzipped onto `docker` and ran something like this):
   
   ```
   root@adhoc-2:/opt# /var/hoodie/ws/docker/spark-2.4.4-bin-hadoop2.7/bin/spark-submit \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar \
     --storage-type COPY_ON_WRITE \
     --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
     --source-ordering-field ts \
     --target-base-path /user/hive/warehouse/stock_ticks_cow \
     --target-table stock_ticks_cow \
     --props /var/demo/config/dfs-source.properties \
     --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
     --enable-hive-sync \
     --hoodie-conf hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver:1 \
     --hoodie-conf hoodie.datasource.hive_sync.username=hive \
     --hoodie-conf hoodie.datasource.hive_sync.password=hive \
     --hoodie-conf hoodie.datasource.hive_sync.partition_fields=dt \
     --hoodie-conf hoodie.datasource.hive_sync.database=default \
     --hoodie-conf hoodie.datasource.hive_sync.table=stock_ticks_cow
   ```
   That seems to work. 
   
   So I suspect it has something to do with the Amazon-specific version? Can 
you try building a local docker image with that Hive version and see if you can 
reproduce this?  
https://hudi.apache.org/docker_demo.html#building-local-docker-containers 
   




[GitHub] [incubator-hudi] vinothchandar commented on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync

2019-09-16 Thread GitBox
vinothchandar commented on issue #894: Getting java.lang.NoSuchMethodError 
while doing Hive sync
URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-531825236
 
 
   Seems like a jar mismatch issue. 
   
   ```
   at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
   Caused by: MetaException(message:org.apache.hadoop.hive.ql.log.PerfLogger.getPerfLogger(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;Z)Lorg/apache/hadoop/hive/ql/log/PerfLogger;)
       at org.apache.hudi.org.apache.hadoop_hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83)
       at org.apache.hudi.org.apache.hadoop_hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
       at org.apache.hudi.org.apache.hadoop_hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6889)
       at org.apache.hudi.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:159)
       at org.apache.hudi.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:128)
       at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:99)
       ... 50 more
   Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.log.PerfLogger.getPerfLogger(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;Z)Lorg/apache/hadoop/hive/ql/log/PerfLogger;
       at org.apache.hudi.org.apache.hadoop_hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:104)
       at org.apache.hudi.org.apache.hadoop_hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
       ... 55 more
   ```
   
   I will try to repro this and get it working. Thanks for reporting this. 
We stopped bundling the standalone Hive jars, which may have been providing 
this previously (just a guess; I will need to repro and understand).
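
   One way to narrow down a mismatch like the one in the trace above is to 
print which jar each suspect class is actually loaded from. The following is a 
minimal sketch, not part of Hudi: the object name `WhichJar` and its helper are 
hypothetical, and the default class names are simply the ones that appear in 
the trace.

   ```scala
   // Hypothetical helper: reports the jar or directory a class is loaded from.
   object WhichJar {
     def locationOf(className: String): String =
       try {
         // initialize = false: only locate the class, do not run static initializers
         val cls = Class.forName(className, false, getClass.getClassLoader)
         Option(cls.getProtectionDomain.getCodeSource)
           .flatMap(cs => Option(cs.getLocation))
           .map(_.toString)
           .getOrElse("unknown (bootstrap class or no code source)")
       } catch {
         case _: ClassNotFoundException => "not on classpath"
       }

     def main(args: Array[String]): Unit = {
       // Defaults are the classes named in the stack trace above.
       val names =
         if (args.nonEmpty) args.toSeq
         else Seq(
           "org.apache.hadoop.hive.ql.log.PerfLogger",
           "org.apache.hudi.hive.HoodieHiveClient"
         )
       names.foreach(n => println(s"$n -> ${locationOf(n)}"))
     }
   }
   ```

   Running this with the same classpath as the failing job (for example from 
the same IntelliJ module) should show whether `PerfLogger` comes from a Hive 
jar outside the Hudi bundle.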

