Denis Efarov created ZEPPELIN-3591:
--------------------------------------
Summary: Some values of "args" property in interpreter settings
for Spark ruin UDF execution
Key: ZEPPELIN-3591
URL: https://issues.apache.org/jira/browse/ZEPPELIN-3591
Project: Zeppelin
Issue Type: Bug
Components: zeppelin-interpreter
Affects Versions: 0.7.2
Environment: CentOS Linux 7.3.1611
Java 1.8.0_60
Scala 2.11.8
Spark 2.1.1
Hadoop 2.6.0
Zeppelin 0.7.2
Reporter: Denis Efarov
In "args" interpreter configuration property, any value which starts with "-"
(minus) sign prevents correct UDF execution in Spark running on YARN. Text
after "-" doesn't matter, it fails anyway. All the other properties do not
affect this.
Steps to reproduce:
* On the interpreter settings page, open the Spark interpreter
* Set the "args" property to any value starting with "-", for example "-test"
* Make sure Spark runs on YARN (master=yarn-client)
* Save the settings and restart the interpreter
* In any notebook, run the following paragraph:

{code:scala}
%spark
val udfDemo = (i: Int) => i + 10
sqlContext.udf.register("demoUdf", udfDemo)
sqlContext.sql("select demoUdf(1) val").show
{code}
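For reference, when "args" is left empty the same paragraph completes normally.
Assuming the snippet above, the expected output would be:

{code}
+---+
|val|
+---+
| 11|
+---+
{code}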
Stacktrace:
{code}
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{code}
As a workaround, declaring the same UDF in another interpreter, for example
%pyspark, avoids the problem, even when the query is then executed in %spark
(see the sketch below).
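A minimal sketch of that workaround, assuming the shared sqlContext that
Zeppelin's %pyspark interpreter exposes; the explicit return type is needed
because PySpark UDFs default to StringType:

{code:python}
%pyspark
# Register the equivalent UDF from the %pyspark interpreter instead of %spark.
from pyspark.sql.types import IntegerType

sqlContext.udf.register("demoUdf", lambda i: i + 10, IntegerType())
{code}

After registering it this way, running sqlContext.sql("select demoUdf(1) val").show
in a %spark paragraph no longer hits the ClassCastException above.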