Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
Hi Arpit,

I didn't build it; I am using the prebuilt version described here: http://www.abcn.net/2014/04/install-shark-on-cdh5-hadoop2-spark.html, including, for example, adding the jar mentioned there.

br...Gerd...

On 17 April 2014 15:49, Arpit Tak wrote:
> Just out of curiosity: as you are using Cloudera Manager Hadoop and Spark,
> how did you build Shark for it?
>
> Are you able to read any file from HDFS? Did you try that out?
>
> Regards,
> Arpit Tak
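Regarding the question raised in this thread of which jar a class is "hidden" in: a jar is a zip archive, and a class such as org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat is stored in it as the entry org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.class. The following is only a minimal sketch in Python, not something taken from the thread, that lists the jars in a directory containing a given class. The function name find_class_in_jars is a hypothetical helper chosen for illustration.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(lib_dir, class_name):
    """Return the paths of the *.jar files in lib_dir that contain class_name.

    Hypothetical helper for illustration; class_name is a fully qualified
    Java class name such as "org.example.Foo".
    """
    # A class a.b.C is stored inside a jar as the zip entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # Skip files that are not valid archives and dangling symlinks.
            continue
    return matches
```

Called as find_class_in_jars("lib", "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe") from the Shark directory, this would be expected to list parquet-hive-bundle-1.4.1.jar if that jar contains the class as described in the thread. It only inspects jar contents; it says nothing about which classpath a running JVM actually uses.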
Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
Just out of curiosity: as you are using Cloudera Manager Hadoop and Spark, how did you build Shark for it?

Are you able to read any file from HDFS? Did you try that out?

Regards,
Arpit Tak

On Thu, Apr 17, 2014 at 7:07 PM, ge ko wrote:
> the error java.lang.ClassNotFoundException:
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat has been
> resolved by adding parquet-hive-bundle-1.4.1.jar to shark's lib folder.
> [...]
> But if I want to select from that table I receive:
>
> org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times
> (most recent failure: Exception failure: java.lang.ClassNotFoundException:
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe)
> [...]
Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
Hi,

the error java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat has been resolved by adding parquet-hive-bundle-1.4.1.jar to Shark's lib folder. Now the Hive metastore can be read successfully (including the Parquet-based table).

But if I want to select from that table I receive:

org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)

This is really strange, since the class org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is included in parquet-hive-bundle-1.4.1.jar. I am getting more and more confused ;)

Any help?

regards, Gerd

On 17 April 2014 11:55, ge ko wrote:
> Hi,
>
> I want to select from a parquet based table in shark, but receive the error:
>
> shark> select * from wl_parquet;
> 14/04/17 11:33:49 INFO shark.SharkCliDriver: Execution Mode: shark
> 14/04/17 11:33:49 INFO ql.Driver:
> 14/04/17 11:33:49 INFO ql.Driver:
> 14/04/17 11:33:49 INFO ql.Driver:
> 14/04/17 11:33:49 INFO parse.ParseDriver: Parsing command: select * from wl_parquet
> 14/04/17 11:33:49 INFO parse.ParseDriver: Parse Completed
> 14/04/17 11:33:49 INFO parse.SharkSemanticAnalyzer: Get metadata for source tables
> FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
> 14/04/17 11:33:50 ERROR shark.SharkDriver: FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
> java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
>     at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:306)
>     at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:99)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:988)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:891)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1083)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059)
>     at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:137)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
>     at shark.SharkDriver.compile(SharkDriver.scala:215)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
>     at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:338)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>     at shark.SharkCliDriver$.main(SharkCliDriver.scala:235)
>     at shark.SharkCliDriver.main(SharkCliDriver.scala)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:270)
>     at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:302)
>     ... 14 more
>
> I can successfully select from that table with Hive and Impala, but shark
> doesn't work. I am using CDH5 incl. Spark parcel and Shark 0.9.1.
>
> In what jar is this class "hidden", how can I get rid of this exception ?!?!
>
> The lib folder of shark contains:
> [root@hadoop-pg-9 shark-0.9.1]# ll lib
> total 180
> lrwxrwxrwx 1 root root    67 16. Apr 14:17 hive-serdes-1.0-SNAPSHOT.jar -> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar
> -rwxrwxr-x 1 root root 23086  9. Apr 10:57 JavaEWAH-0.4.2.jar
> lrwxrwxrwx 1 root root    53 14. Apr 21:46 parquet-avro.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-avro.jar
> lrwxrwxrwx 1 root root    58 14. Apr 21:46 parquet-cascading.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-cascading.jar
> lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-column.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-column.jar
> lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-common.j
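
The lib listing quoted in this thread consists largely of symlinks into /opt/cloudera/parcels/CDH. The thread does not establish why the Spark tasks cannot load ParquetHiveSerDe, but one thing that may be worth ruling out is a jar entry that does not resolve to a readable file on a given node. The following is a minimal sketch in Python under that assumption, meant to be run on each node against Shark's lib folder; unresolved_jars is a hypothetical helper name, not part of Shark or Spark.

```python
import os


def unresolved_jars(lib_dir):
    """Return the *.jar names in lib_dir that are not readable regular files.

    Hypothetical helper for illustration. Symlinks are followed, so a link
    whose target is missing on this node is reported.
    """
    problems = []
    for name in sorted(os.listdir(lib_dir)):
        if not name.endswith(".jar"):
            continue
        path = os.path.join(lib_dir, name)
        # os.path.isfile follows symlinks; a dangling link yields False.
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            problems.append(name)
    return problems
```

An empty result only shows that every jar entry in that folder resolves on the node where the check ran; whether the worker JVMs actually have those jars on their classpath is a separate question that this sketch does not answer.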