Hi,

I am trying to load Hive datasets using HiveContext in the Spark shell, with
Spark 1.0.1 and Hive 0.12.

We are trying to get Spark to work with Hive datasets. I already have an
existing Spark deployment. The following is what I did on top of that:
1. Built Spark using 'mvn -Pyarn,hive -Phadoop-2.4 -Dhadoop.version=2.4.0
-DskipTests clean package'.
2. Copied spark-assembly-1.0.1-hadoop2.4.0.jar into the Spark deployment
directory.
3. Launched spark-shell with the Spark Hive jar included in the list.

When I execute

    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

I get the following stack trace:

java.lang.NoClassDefFoundError: org/apache/thrift/TBase
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        ....
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.thrift.TBase
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 55 more

I thought that building with the -Phive option would include all the necessary
Hive packages in the assembly jar (according to the Spark SQL programming guide,
<https://spark.apache.org/docs/1.0.1/sql-programming-guide.html#hive-tables>).
I searched online and in this mailing list archive but haven't found any
instructions on how to get this working.
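
One check I can run on my side is whether the assembly jar actually contains
the class named in the exception. The following is only a minimal sketch
(Python, standard library only). It relies on a jar being a zip archive. The
helper name jar_has_class is made up for illustration, and the jar path in the
example comment is an assumption based on step 2 above.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar (a zip archive) has an entry for class_name.

    class_name is a fully qualified JVM class name, e.g.
    "org.apache.thrift.TBase", which maps to the archive entry
    "org/apache/thrift/TBase.class".
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example call (the path is an assumption; point it at the real assembly jar):
# print(jar_has_class("spark-assembly-1.0.1-hadoop2.4.0.jar",
#                     "org.apache.thrift.TBase"))
```

If this reports False for org.apache.thrift.TBase, the class is not in that jar
at all. If it reports True, the jar that spark-shell puts on its classpath may
not be the one I copied over.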

I know that there is an additional step of updating the assembly jar across
the whole cluster, not just on the client side, but right now even the client
is not working.

I would appreciate instructions (or a link to them) on how to get this working
end-to-end.


Thanks,
pala
