Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

2014-04-17 Thread Gerd Koenig
Hi Arpit,

I didn't build it myself; I am using the prebuilt version described here:
http://www.abcn.net/2014/04/install-shark-on-cdh5-hadoop2-spark.html
including the additional jars, e.g. the jar mentioned earlier.

br...Gerd...



Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

2014-04-17 Thread Arpit Tak
Just out of curiosity: since you are using Hadoop and Spark from Cloudera
Manager, how did you build Shark for it?

Are you able to read any file from HDFS? Did you try that out?


Regards,
Arpit Tak



Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

2014-04-17 Thread ge ko
Hi,

The error java.lang.ClassNotFoundException:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat has been
resolved by adding parquet-hive-bundle-1.4.1.jar to Shark's lib folder.
Now the Hive metastore can be read successfully (including the Parquet-based
table).

But when I select from that table I receive:

org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)

This is really strange, since the class
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is included in
parquet-hive-bundle-1.4.1.jar.
I am getting more and more confused ;)

Any help?

regards, Gerd
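
One way to double-check the claim that a class is present in a given jar is to scan the jars in a folder for the corresponding .class entry. The following is only a minimal sketch: the function names class_entry and jars_containing are made up for illustration, and the "lib" path in the usage note is an assumption about where the script is run. It only inspects jars on the machine where it runs, so it says nothing about what other hosts in the cluster have on their classpath.

```python
import zipfile
from pathlib import Path


def class_entry(class_name):
    """Convert a fully qualified class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(lib_dir, class_name):
    """Return the file names of the jars directly under lib_dir that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except (zipfile.BadZipFile, OSError):
            # Skip dangling symlinks and files that are not valid archives.
            continue
    return hits
```

For example, calling jars_containing("lib", "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe") from the Shark installation directory would list the jars in lib that ship that class; an empty list means none of them do.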


On 17 April 2014 11:55, ge ko  wrote:

> Hi,
>
> I want to select from a parquet based table in shark, but receive the
> error:
>
> shark> select * from wl_parquet;
> 14/04/17 11:33:49 INFO shark.SharkCliDriver: Execution Mode: shark
> 14/04/17 11:33:49 INFO ql.Driver: 
> 14/04/17 11:33:49 INFO ql.Driver: 
> 14/04/17 11:33:49 INFO ql.Driver: 
> 14/04/17 11:33:49 INFO parse.ParseDriver: Parsing command: select * from wl_parquet
> 14/04/17 11:33:49 INFO parse.ParseDriver: Parse Completed
> 14/04/17 11:33:49 INFO parse.SharkSemanticAnalyzer: Get metadata for source tables
> FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
> 14/04/17 11:33:50 ERROR shark.SharkDriver: FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
> java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:306)
> at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:99)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:988)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:891)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1083)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059)
> at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:137)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
> at shark.SharkDriver.compile(SharkDriver.scala:215)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
> at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:338)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
> at shark.SharkCliDriver$.main(SharkCliDriver.scala:235)
> at shark.SharkCliDriver.main(SharkCliDriver.scala)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:302)
> ... 14 more
>
> I can successfully select from that table with Hive and Impala, but Shark
> doesn't work. I am using CDH5 incl. the Spark parcel and Shark 0.9.1.
>
> Which jar contains this class, and how can I get rid of this exception?
>
> The lib folder of Shark contains:
> [root@hadoop-pg-9 shark-0.9.1]# ll lib
> total 180
> lrwxrwxrwx 1 root root    67 16. Apr 14:17 hive-serdes-1.0-SNAPSHOT.jar -> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar
> -rwxrwxr-x 1 root root 23086  9. Apr 10:57 JavaEWAH-0.4.2.jar
> lrwxrwxrwx 1 root root    53 14. Apr 21:46 parquet-avro.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-avro.jar
> lrwxrwxrwx 1 root root    58 14. Apr 21:46 parquet-cascading.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-cascading.jar
> lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-column.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-column.jar
> lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-common.j