I've been trying to run HiveQL queries with UDFs in Spark SQL, without
success. The problem occurs only when a query uses a function such as
from_unixtime (implemented by the Hive class UDFFromUnixTime).

I'm using Spark 1.2 with CDH5.3.0. The queries work in local mode but fail
in YARN mode. I'm building an uber-jar with all the needed dependencies,
excluding the ones provided by the cluster (Spark, Hadoop) and including the
Hive ones. When I run the queries on YARN I get the following exception:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 20, <REMOVED>): java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/exec/UDF;
        at java.lang.Class.getDeclaredFields0(Native Method)
        at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
        at java.lang.Class.getDeclaredField(Class.java:1951)
        at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
        at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.UDF
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)


First I tried all the jar and classpath options of the spark-submit command:
no luck.
Then I tried to instantiate the UDF and UDFFromUnixTime classes in the
environment that is set up for the executor. To do that, after letting the
Spark app fail, I went to one of the container directories, for example
/mnt/sda/yarn/nm/usercache/altaia/appcache/application_1422464005963_0047/container_1422464005963_0047_01_000004/
where I can see these files:

    container_tokens
    GeneralTest20140929-1.0.0-SNAPSHOT.jar   <=== this is my uber-jar
    launch_container.sh
    __spark__.jar
    tmp

I opened launch_container.sh and checked the whole classpath setup. To test
whether that classpath is correct when the JVM is launched, I replaced the
"org.apache.spark.executor.CoarseGrainedExecutorBackend" class with a class
of mine that prints the classpath and instantiates UDF and UDFFromUnixTime
by reflection. Everything went well.
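A minimal sketch of such a probe class is shown here for illustration. It
is not the exact class used in this test: the name ClasspathProbe, its
probe method, and the fully qualified name assumed for UDFFromUnixTime are
illustrative assumptions.

```java
// Hypothetical stand-in for the executor main class in launch_container.sh.
// The class and method names are illustrative, not taken from Spark or Hive.
final class ClasspathProbe {

    // Loads the named class and calls its no-arg constructor by reflection.
    // Returns a short status line instead of throwing.
    static String probe(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            return "OK " + className + " loaded by "
                    + instance.getClass().getClassLoader();
        } catch (ReflectiveOperationException | LinkageError e) {
            return "FAILED " + className + ": " + e;
        }
    }

    public static void main(String[] args) {
        // The classpath the JVM was actually started with.
        System.out.println("java.class.path = "
                + System.getProperty("java.class.path"));

        // Default targets; the package of UDFFromUnixTime is an assumption.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.hive.ql.exec.UDF",
            "org.apache.hadoop.hive.ql.udf.UDFFromUnixTime"
        };
        for (String name : names) {
            System.out.println(probe(name));
        }
    }
}
```

Run with the same classpath as the executor, each line starts with OK or
FAILED, so a missing class shows up without a stack trace.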

I also tried putting all the dependency jars in one directory on every host
and adding that directory to spark.executor.extraClassPath and
spark.driver.extraClassPath: no luck either.
At this stage I think the uber-jar and the classpath are OK, and I have no
more clues about what is happening. Maybe it is a classloader issue in Spark
SQL?
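Judging by the stack trace, the ClassNotFoundException is thrown on the
executor while the task is being deserialized (ResultTask.runTask calls
JavaSerializerInstance.deserialize), rather than when data is returned to
the driver.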
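The "Caused by" part of the trace ends in
sun.misc.Launcher$AppClassLoader.loadClass, so one check that might narrow
this down is whether the class is visible from the system class loader as
well as from the thread context class loader. The following is only a
sketch of such a check: LoaderVisibility and its methods are hypothetical
names, and it would have to run inside the executor JVM to say anything
about the failing environment.

```java
// Hypothetical diagnostic; class and method names are illustrative.
final class LoaderVisibility {

    // True if `loader` (the bootstrap loader when null) can resolve the
    // class. The class is not initialized, so no static code runs.
    static boolean visibleFrom(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Compares the system class loader with the thread context class loader.
    static String report(String className) {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        return className
                + " system=" + visibleFrom(system, className)
                + " context=" + visibleFrom(context, className);
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.hive.ql.exec.UDF";
        System.out.println(report(name));
    }
}
```

A result of system=false with context=true would be consistent with the
class being looked up through a loader that does not see the uber-jar, but
it would not prove that this is the cause.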

Has anyone had a similar issue?

Regards.





