Thanks, Akhil.

I realized that earlier, and I thought building with 'mvn -Phive' should have
captured and included all of these dependencies.
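
For reference, this is roughly how I have been pulling the missing jars into
spark-shell -- a sketch, assuming the jar path, and assuming spark-shell in
1.0.1 forwards --jars to spark-submit:

    ./bin/spark-shell --jars /path/to/libthrift-0.9.0.jar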

In any case, I proceeded with that, included the other such dependencies that
were missing, and finally hit the Guava version mismatch issue (Spark builds
against Guava 14, while Hadoop/Hive depend on Guava 11). There are two parts
to this:

1. Spark bundles the Guava library within its jars, and that may conflict
with Hadoop/Hive components that depend on the older version of the library.

It seems this has been solved by the SPARK-2848
<https://issues.apache.org/jira/browse/SPARK-2848> patch, which shades the
Guava classes (a rough sketch of what that means follows after #2 below).


2. Spark actually uses APIs from the newer Guava version, and those call
sites need to be rewritten against the older version (i.e., Spark's
dependency on Guava needs to be downgraded).
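
Regarding #1, my understanding of what the shading does, as a rough
maven-shade-plugin sketch -- the shaded package name here is only
illustrative, and the actual patch may use a different one:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <relocations>
          <!-- Rewrite Spark's own Guava references into a private package,
               so they cannot collide with the Guava 11 classes that
               Hadoop/Hive pull in. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.spark-project.guava</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </plugin>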

I wasn't able to find the related patches (I need them since I am on Spark
1.0.1). After applying the patch for #1 above, I still hit the following
error:

14/11/03 15:01:32 WARN storage.BlockManager: Putting block broadcast_0 failed
java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
        at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
        at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
        at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
.... <stack continues>

I haven't been able to find the other patches that actually downgrade the
dependency.
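
To make #2 concrete: the failing frame above is Spark hashing an Int through
Guava's HashFunction.hashInt, which only exists in Guava 12+. A
Guava-11-compatible rewrite would look roughly like the following -- a sketch
of the idea, not the actual patch:

    import com.google.common.hash.Hashing

    // Guava 12+ only -- the call that throws NoSuchMethodError on Guava 11:
    //   Hashing.murmur3_32().hashInt(k).asInt()

    // Guava 11-compatible equivalent: route the same int through a Hasher,
    // whose putInt(int) has been there since the hashing package was added.
    def hashcode(k: Int): Int =
      Hashing.murmur3_32().newHasher().putInt(k).hash().asInt()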


Please point me to those patches, or share any other ideas for fixing these
dependency issues.


Thanks.



On Sun, Nov 2, 2014 at 8:41 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Adding the libthrift jar
> <http://mvnrepository.com/artifact/org.apache.thrift/libthrift/0.9.0> in
> the class path would resolve this issue.
>
> Thanks
> Best Regards
>
> On Sat, Nov 1, 2014 at 12:34 AM, Pala M Muthaia <
> mchett...@rocketfuelinc.com> wrote:
>
>> Hi,
>>
>> I am trying to load Hive datasets using HiveContext in spark-shell, with
>> Spark 1.0.1 and Hive 0.12.
>>
>> We are trying to get Spark to work with Hive datasets. I already have an
>> existing Spark deployment. Following is what I did on top of that:
>> 1. Build Spark using 'mvn -Pyarn,hive -Phadoop-2.4 -Dhadoop.version=2.4.0
>> -DskipTests clean package'
>> 2. Copy over spark-assembly-1.0.1-hadoop2.4.0.jar into the Spark deployment
>> directory.
>> 3. Launch spark-shell with the Spark Hive jar included in the list.
>>
>> When I execute
>>
>> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>
>> I get the following error stack:
>>
>> java.lang.NoClassDefFoundError: org/apache/thrift/TBase
>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>         ....
>>         at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.ClassNotFoundException: org.apache.thrift.TBase
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>         ... 55 more
>>
>> I thought that building with the -Phive option should include all the
>> necessary Hive packages in the assembly jar (according to here
>> <https://spark.apache.org/docs/1.0.1/sql-programming-guide.html#hive-tables>).
>> I tried searching online and in this mailing list archive, but haven't found
>> any instructions on how to get this working.
>>
>> I know that there is an additional step of updating the assembly jar across
>> the whole cluster, not just on the client side, but right now even the
>> client is not working.
>>
>> I would appreciate instructions (or a link to them) on how to get this
>> working end-to-end.
>>
>>
>> Thanks,
>> pala
>>
>
>
