Thanks Akhil. I realized that earlier, and I thought 'mvn -Phive' should have captured and included all these dependencies.
In any case, I proceeded with that, included the other such dependencies that were missing, and finally hit the Guava version mismatch issue (Spark with Guava 14 vs. Hadoop/Hive with Guava 11). There are 2 parts:

1. Spark includes the Guava library within its jars, and that may conflict with Hadoop/Hive components depending on an older version of the library. It seems this has been solved with the SPARK-2848
<https://issues.apache.org/jira/browse/SPARK-2848> patch, which shades the Guava classes.

2. Spark actually uses interfaces from the newer version of the Guava library, which need to be rewritten to use the older version (i.e. downgrade Spark's dependency on Guava). I wasn't able to find the related patches (I need them since I am on Spark 1.0.1).

After applying the patch for #1 above, I still hit the following error:

14/11/03 15:01:32 WARN storage.BlockManager: Putting block broadcast_0 failed
java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
        at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
        at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
        at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
        .... <stack continues>

I haven't been able to find the other patches that actually downgrade the dependency. Please point me to those patches, or share any other ideas about fixing these dependency issues.

Thanks.

On Sun, Nov 2, 2014 at 8:41 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Adding the libthrift jar
> <http://mvnrepository.com/artifact/org.apache.thrift/libthrift/0.9.0> in
> the class path would resolve this issue.
>
> Thanks
> Best Regards
>
> On Sat, Nov 1, 2014 at 12:34 AM, Pala M Muthaia <
> mchett...@rocketfuelinc.com> wrote:
>
>> Hi,
>>
>> I am trying to load Hive datasets using HiveContext, in spark-shell.
>> Spark ver 1.0.1 and Hive ver 0.12.
>>
>> We are trying to get Spark working with Hive datasets. I already have an
>> existing Spark deployment. Following is what I did on top of that:
>> 1. Build Spark using 'mvn -Pyarn,hive -Phadoop-2.4 -Dhadoop.version=2.4.0
>> -DskipTests clean package'
>> 2. Copy over spark-assembly-1.0.1-hadoop2.4.0.jar into the Spark
>> deployment directory.
>> 3. Launch spark-shell with the Spark Hive jar included in the list.
>>
>> When I execute
>>
>> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>
>> I get the following error stack:
>>
>> java.lang.NoClassDefFoundError: org/apache/thrift/TBase
>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>         ....
>>         at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.ClassNotFoundException: org.apache.thrift.TBase
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>         ... 55 more
>>
>> I thought that building with the -Phive option should include all the
>> necessary Hive packages in the assembly jar (according to here
>> <https://spark.apache.org/docs/1.0.1/sql-programming-guide.html#hive-tables>).
>> I tried searching online and in this mailing list archive, but haven't
>> found any instructions on how to get this working.
>>
>> I know that there is the additional step of updating the assembly jar
>> across the whole cluster, not just the client side, but right now even
>> the client is not working.
>>
>> Would appreciate instructions (or a link to them) on how to get this
>> working end-to-end.
>>
>>
>> Thanks,
>> pala
>>
>
>
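For anyone debugging a similar NoSuchMethodError or NoClassDefFoundError, a quick way to tell which jar actually served a class at runtime is to ask the class loader for its code source. The sketch below is a minimal, generic helper, not anything from Spark itself; the default class name is just a placeholder, and you would substitute com.google.common.hash.HashFunction or org.apache.thrift.TBase when running with the Spark assembly on the classpath:

```java
// Sketch: report where a class was loaded from, to spot a stale or
// duplicate jar shadowing the version your code was compiled against.
public class WhichJar {

    // Returns the jar/directory URL that served the class, a marker for
    // JDK bootstrap classes, or a note that the class is missing entirely.
    static String whichJar(String className) {
        try {
            Class<?> c = Class.forName(className);
            java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader report no code source.
            return src == null ? "bootstrap/JDK" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Placeholder default; pass the suspect class name as an argument,
        // e.g. com.google.common.hash.HashFunction
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + whichJar(name));
    }
}
```

The same check can presumably be done inline from spark-shell with classOf[...].getProtectionDomain.getCodeSource.getLocation; if the reported location is a Hadoop/Hive jar bundling Guava 11 rather than the Spark assembly, that would explain the missing hashInt method.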