Re: Cannot instantiate hive context

2014-11-03 Thread Pala M Muthaia
Thanks Akhil.

I realized that earlier, and I thought building with 'mvn -Phive' should have
captured and included all of these dependencies.

In any case, I proceeded with that, included the other dependencies that were
missing, and finally hit the Guava version mismatch issue (Spark with Guava 14
vs. Hadoop/Hive with Guava 11). There are two parts to it:

1. Spark includes the Guava library within its jars, and that may conflict
with Hadoop/Hive components that depend on the older version of the library.

It seems this has been addressed by the SPARK-2848 patch
(https://issues.apache.org/jira/browse/SPARK-2848), which shades the Guava
classes.
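
If I understand the approach correctly, the idea is a maven-shade-plugin
relocation along these lines in the assembly build (the shaded package name
below is only illustrative, not necessarily what the patch actually uses):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite Guava class references inside the assembly so they
               cannot clash with the Guava 11 that Hadoop/Hive bring in -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.spark-project.guava</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>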


2. Spark actually uses interfaces from the newer version of the Guava library,
so that code needs to be rewritten against the older version (i.e. Spark's
dependency on Guava needs to be downgraded).

I wasn't able to find the related patches (I need them since I am on Spark
1.0.1). After applying the patch for #1 above, I still hit the following error:

14/11/03 15:01:32 WARN storage.BlockManager: Putting block broadcast_0 failed
java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
    at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
    ... (stack continues)
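
For what it's worth, the failing call is Guava's HashFunction.hashInt(int),
which as far as I can tell was only added in Guava 12, so it doesn't exist in
the Guava 11 that Hadoop/Hive put on the classpath. A Guava 11-compatible
rewrite of that call would presumably look something like this (rough sketch,
not tested against the Spark source):

import com.google.common.hash.Hashing

object GuavaHashCompat {
  // Guava 12+ style (what the trace above fails on):
  //   Hashing.murmur3_32().hashInt(k).asInt()
  // Guava 11-compatible equivalent: feed the int through a Hasher instead.
  def murmur3Hash(k: Int): Int =
    Hashing.murmur3_32().newHasher().putInt(k).hash().asInt()
}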

I haven't been able to find the other patches that actually downgrade the
dependency.


Please point me to those patches, or share any other ideas for fixing these
dependency issues.


Thanks.



On Sun, Nov 2, 2014 at 8:41 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Adding the libthrift jar
 (http://mvnrepository.com/artifact/org.apache.thrift/libthrift/0.9.0) to the
 classpath would resolve this issue.

 Thanks
 Best Regards

 On Sat, Nov 1, 2014 at 12:34 AM, Pala M Muthaia 
 mchett...@rocketfuelinc.com wrote:

 Hi,

 I am trying to load Hive datasets using HiveContext in the Spark shell.
 Spark version is 1.0.1 and Hive version is 0.12.

 We are trying to get Spark to work with Hive datasets. I already have an
 existing Spark deployment. Here is what I did on top of that:
 1. Build Spark using 'mvn -Pyarn,hive -Phadoop-2.4 -Dhadoop.version=2.4.0
 -DskipTests clean package'.
 2. Copy spark-assembly-1.0.1-hadoop2.4.0.jar into the Spark deployment
 directory.
 3. Launch spark-shell with the Spark Hive jar included in the jar list.
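
 Concretely, the commands for steps 2 and 3 look roughly like this (the paths
 and jar names are only illustrative for my setup):

   # step 2: copy the -Phive assembly into the existing deployment
   cp assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop2.4.0.jar /opt/spark/lib/

   # step 3: spark-shell forwards --jars on to spark-submit
   ./bin/spark-shell --jars /opt/spark/lib/spark-hive_2.10-1.0.1.jar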

 When I execute

   val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

 I get the following error stack:

 java.lang.NoClassDefFoundError: org/apache/thrift/TBase
     at java.lang.ClassLoader.defineClass1(Native Method)
     at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
     ...
     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 Caused by: java.lang.ClassNotFoundException: org.apache.thrift.TBase
     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
     ... 55 more

 I thought that building with the -Phive option should include all the
 necessary Hive packages in the assembly jar (according to
 https://spark.apache.org/docs/1.0.1/sql-programming-guide.html#hive-tables).
 I tried searching online and in this mailing list archive but haven't found
 any instructions on how to get this working.

 I know there is an additional step of updating the assembly jar across the
 whole cluster, not just on the client side, but right now even the client is
 not working.

 I would appreciate instructions (or a link to them) on how to get this
 working end-to-end.


 Thanks,
 pala





Re: Cannot instantiate hive context

2014-11-03 Thread Akhil Das
Not quite sure, but moving the Guava 11 jar to the first position in the
classpath may solve this issue.
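
Something like this, for instance (untested; the jar path is illustrative,
guava-11.0.2 being the version Hadoop 2.x typically ships):

  # prepend Guava 11 so it is picked up ahead of the Guava bundled in the assembly
  export SPARK_CLASSPATH=/path/to/guava-11.0.2.jar:$SPARK_CLASSPATH
  ./bin/spark-shell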

Thanks
Best Regards



Re: Cannot instantiate hive context

2014-11-02 Thread Akhil Das
Adding the libthrift jar
(http://mvnrepository.com/artifact/org.apache.thrift/libthrift/0.9.0) to the
classpath would resolve this issue.
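
For example, something along these lines (the path is illustrative):

  # put libthrift on the driver classpath when starting the shell
  ./bin/spark-shell --driver-class-path /path/to/libthrift-0.9.0.jar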

Thanks
Best Regards
