[ https://issues.apache.org/jira/browse/SPARK-18015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589920#comment-15589920 ]
Nick Orka commented on SPARK-18015:
-----------------------------------

I've just found that the exception may not reflect the real error in the code. Take a look at the Scala bug tracker for exactly the same Scala version with a similar issue: https://issues.scala-lang.org/browse/SI-9777
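Since SI-9777 is about binary-incompatible Scala runtimes surfacing as opaque serialization errors, here is a quick diagnostic sketch along those lines (not from the report below; it assumes a SparkSession named spark already connected to the same standalone master). It compares the Scala version the driver runs on with the version each executor runs on:

{code:title=Scala version check (illustrative sketch)}
import scala.util.Properties

// Scala runtime version on the driver.
println(s"driver:    ${Properties.versionString}")

// Run a tiny job so the executors report their own Scala runtime version.
val executorVersions = spark.sparkContext
  .parallelize(1 to 4, 4)
  .map(_ => Properties.versionString)
  .collect()
  .distinct
println(s"executors: ${executorVersions.mkString(", ")}")
{code}

If the two differ, the ClassCastException below is plausibly a runtime mismatch rather than anything in the job itself.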
> CLONE - ClassCastException in instance of org.apache.spark.rdd.MapPartitionsRDD
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-18015
>                 URL: https://issues.apache.org/jira/browse/SPARK-18015
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Nick Orka
>
> I've decided to clone the ticket because it had the same problem for another Spark version, and the provided workaround doesn't fix the issue.
> I'm duplicating my case here.
> I have the same issue with Spark 2.0.0 (spark-2.0.0-bin-hadoop2.7.tar.gz).
> Here is my pom:
> {code:title=pom.xml}
> <properties>
>     <maven.compiler.source>1.6</maven.compiler.source>
>     <maven.compiler.target>1.6</maven.compiler.target>
>     <encoding>UTF-8</encoding>
>     <scala.version>2.11.8</scala.version>
>     <spark.version>2.0.0</spark.version>
>     <hadoop.version>2.7.0</hadoop.version>
> </properties>
> <dependencies>
>     <!--Spark-->
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-core_2.11</artifactId>
>         <version>${spark.version}</version>
>         <scope>provided</scope>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-sql_2.11</artifactId>
>         <version>${spark.version}</version>
>         <scope>provided</scope>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-hive_2.11</artifactId>
>         <version>${spark.version}</version>
>         <scope>provided</scope>
>     </dependency>
> </dependencies>
> {code}
> As you can see, all Spark dependencies have provided scope.
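One thing worth checking here: provided scope keeps Spark's own jars out of the application jar, but the application classes themselves still have to reach the executors. A minimal sketch of one commonly suggested mitigation, assuming the ClassCastException is really a driver/executor classpath mismatch (the jar path target/udf-test-1.0.jar is hypothetical):

{code:title=Shipping the application jar (illustrative sketch)}
import org.apache.spark.sql.SparkSession

// spark.jars is the standard setting that distributes the listed jars to
// every executor's classpath; the path below is a hypothetical build output.
val spark = SparkSession
  .builder()
  .master("spark://nborunov-mbp.local:7077")
  .config("spark.jars", "target/udf-test-1.0.jar")
  .getOrCreate()
{code}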
> And this is the code for reproduction:
> {code:title=udfTest.scala}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
> import org.apache.spark.sql.{Row, SparkSession}
>
> /**
>   * Created by nborunov on 10/19/16.
>   */
> object udfTest {
>
>   class Seq extends Serializable {
>     var i = 0
>
>     def getVal: Int = {
>       i = i + 1
>       i
>     }
>   }
>
>   def main(args: Array[String]) {
>     val spark = SparkSession
>       .builder()
>       .master("spark://nborunov-mbp.local:7077")
>       // .master("local")
>       .getOrCreate()
>
>     val rdd = spark.sparkContext.parallelize(Seq(Row("one"), Row("two")))
>     val schema = StructType(Array(StructField("name", StringType)))
>     val df = spark.createDataFrame(rdd, schema)
>     df.show()
>
>     spark.udf.register("func", (name: String) => name.toUpperCase)
>     import org.apache.spark.sql.functions.expr
>     val newDf = df.withColumn("upperName", expr("func(name)"))
>     newDf.show()
>
>     val seq = new Seq
>     spark.udf.register("seq", () => seq.getVal)
>     val seqDf = df.withColumn("id", expr("seq()"))
>     seqDf.show()
>
>     df.createOrReplaceTempView("df")
>     spark.sql("select *, seq() as sql_id from df").show()
>   }
> }
> {code}
> When .master("local") is used, everything works fine. When .master("spark://...:7077") is used, it fails on this line:
> {code}
> newDf.show()
> {code}
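Note that the failure occurs on the stateless "func" UDF, before the stateful "seq" UDF is ever invoked, which supports the theory that the problem is environmental rather than closure capture. Still, for the stateful counter, the more conventional shape is a top-level serializable holder instead of an instance created inside main. A sketch of that variant (illustrative only, not verified to fix this failure; the Udfs object is hypothetical):

{code:title=UDFs without captured instance state (illustrative sketch)}
// A top-level object: the closures passed to spark.udf.register close over
// the module reference instead of a `new Seq` instance created inside main.
object Udfs extends Serializable {
  private var i = 0
  def nextVal(): Int = { i += 1; i } // per-JVM counter, same caveat as the original
  def upper(name: String): String = name.toUpperCase
}

spark.udf.register("func", (name: String) => Udfs.upper(name))
spark.udf.register("seq", () => Udfs.nextVal())
{code}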
> The error is exactly the same:
> {code}
> scala> udfTest.main(Array())
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/Users/nborunov/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/Users/nborunov/.m2/repository/ch/qos/logback/logback-classic/1.1.7/logback-classic-1.1.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/10/19 19:37:52 INFO SparkContext: Running Spark version 2.0.0
> 16/10/19 19:37:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/10/19 19:37:52 INFO SecurityManager: Changing view acls to: nborunov
> 16/10/19 19:37:52 INFO SecurityManager: Changing modify acls to: nborunov
> 16/10/19 19:37:52 INFO SecurityManager: Changing view acls groups to:
> 16/10/19 19:37:52 INFO SecurityManager: Changing modify acls groups to:
> 16/10/19 19:37:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nborunov); groups with view permissions: Set(); users with modify permissions: Set(nborunov); groups with modify permissions: Set()
> 16/10/19 19:37:53 INFO Utils: Successfully started service 'sparkDriver' on port 57828.
> 16/10/19 19:37:53 INFO SparkEnv: Registering MapOutputTracker
> 16/10/19 19:37:53 INFO SparkEnv: Registering BlockManagerMaster
> 16/10/19 19:37:53 INFO DiskBlockManager: Created local directory at /private/var/folders/hl/2fv6555n2w92272zywwvpbzh0000gq/T/blockmgr-f2d05423-b7f7-4525-b41e-10dfe2f88264
> 16/10/19 19:37:53 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
> 16/10/19 19:37:53 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/10/19 19:37:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 16/10/19 19:37:54 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.2.202:4040
> 16/10/19 19:37:54 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://nborunov-mbp.local:7077...
> 16/10/19 19:37:54 INFO TransportClientFactory: Successfully created connection to nborunov-mbp.local/192.168.2.202:7077 after 74 ms (0 ms spent in bootstraps)
> 16/10/19 19:37:55 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20161019153755-0017
> 16/10/19 19:37:55 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20161019153755-0017/0 on worker-20161018232014-192.168.2.202-61437 (192.168.2.202:61437) with 4 cores
> 16/10/19 19:37:55 INFO StandaloneSchedulerBackend: Granted executor ID app-20161019153755-0017/0 on hostPort 192.168.2.202:61437 with 4 cores, 1024.0 MB RAM
> 16/10/19 19:37:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 57832.
> 16/10/19 19:37:55 INFO NettyBlockTransferService: Server created on 192.168.2.202:57832
> 16/10/19 19:37:55 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.2.202, 57832)
> 16/10/19 19:37:55 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.2.202:57832 with 2004.6 MB RAM, BlockManagerId(driver, 192.168.2.202, 57832)
> 16/10/19 19:37:55 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.2.202, 57832)
> 16/10/19 19:37:55 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20161019153755-0017/0 is now RUNNING
> 16/10/19 19:37:55 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> 16/10/19 19:37:55 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
> 16/10/19 19:37:56 INFO HiveSharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
> 16/10/19 19:37:56 INFO HiveSharedState: Warehouse path is '/user/hive/warehouse'.
> 16/10/19 19:37:58 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
> 16/10/19 19:37:58 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
> 16/10/19 19:37:58 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
> 16/10/19 19:37:58 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
> 16/10/19 19:37:58 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
> 16/10/19 19:37:58 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
> 16/10/19 19:37:58 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
> 16/10/19 19:37:58 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
> 16/10/19 19:37:58 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
> 16/10/19 19:37:59 INFO metastore: Trying to connect to metastore with URI thrift://ip-10-100-102-90.iad.sessionm.com:9083
> 16/10/19 19:37:59 INFO metastore: Connected to metastore.
> 16/10/19 19:38:00 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(null) (192.168.2.202:57835) with ID 0
> 16/10/19 19:38:00 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.2.202:57837 with 366.3 MB RAM, BlockManagerId(0, 192.168.2.202, 57837)
> 16/10/19 19:38:01 WARN BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> 16/10/19 19:38:01 INFO SessionState: Created local directory: /var/folders/hl/2fv6555n2w92272zywwvpbzh0000gq/T/e1377cbe-3c79-4a44-b0be-551f2b73b931_resources
> 16/10/19 19:38:01 INFO SessionState: Created HDFS directory: /tmp/hive/nborunov/e1377cbe-3c79-4a44-b0be-551f2b73b931
> 16/10/19 19:38:01 INFO SessionState: Created local directory: /var/folders/hl/2fv6555n2w92272zywwvpbzh0000gq/T/nborunov/e1377cbe-3c79-4a44-b0be-551f2b73b931
> 16/10/19 19:38:01 INFO SessionState: Created HDFS directory: /tmp/hive/nborunov/e1377cbe-3c79-4a44-b0be-551f2b73b931/_tmp_space.db
> 16/10/19 19:38:01 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /user/hive/warehouse
> 16/10/19 19:38:02 INFO SessionState: Created local directory: /var/folders/hl/2fv6555n2w92272zywwvpbzh0000gq/T/4cdb5e78-de4b-4919-b490-4f414c129ed1_resources
> 16/10/19 19:38:02 INFO SessionState: Created HDFS directory: /tmp/hive/nborunov/4cdb5e78-de4b-4919-b490-4f414c129ed1
> 16/10/19 19:38:02 INFO SessionState: Created local directory: /var/folders/hl/2fv6555n2w92272zywwvpbzh0000gq/T/nborunov/4cdb5e78-de4b-4919-b490-4f414c129ed1
> 16/10/19 19:38:02 INFO SessionState: Created HDFS directory: /tmp/hive/nborunov/4cdb5e78-de4b-4919-b490-4f414c129ed1/_tmp_space.db
> 16/10/19 19:38:02 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /user/hive/warehouse
> 16/10/19 19:38:03 INFO SparkContext: Starting job: show at udfTest.scala:36
> 16/10/19 19:38:03 INFO DAGScheduler: Got job 0 (show at udfTest.scala:36) with 1 output partitions
> 16/10/19 19:38:03 INFO DAGScheduler: Final stage: ResultStage 0 (show at udfTest.scala:36)
> 16/10/19 19:38:03 INFO DAGScheduler: Parents of final stage: List()
> 16/10/19 19:38:03 INFO DAGScheduler: Missing parents: List()
> 16/10/19 19:38:03 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at show at udfTest.scala:36), which has no missing parents
> 16/10/19 19:38:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.9 KB, free 2004.6 MB)
> 16/10/19 19:38:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.8 KB, free 2004.6 MB)
> 16/10/19 19:38:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.2.202:57832 (size: 3.8 KB, free: 2004.6 MB)
> 16/10/19 19:38:03 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
> 16/10/19 19:38:03 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at show at udfTest.scala:36)
> 16/10/19 19:38:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
> 16/10/19 19:38:04 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.2.202, partition 0, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:04 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 0 on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:04 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.2.202:57837 (size: 3.8 KB, free: 366.3 MB)
> 16/10/19 19:38:07 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3235 ms on 192.168.2.202 (1/1)
> 16/10/19 19:38:07 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 16/10/19 19:38:07 INFO DAGScheduler: ResultStage 0 (show at udfTest.scala:36) finished in 3.265 s
> 16/10/19 19:38:07 INFO DAGScheduler: Job 0 finished: show at udfTest.scala:36, took 3.629356 s
> 16/10/19 19:38:07 INFO SparkContext: Starting job: show at udfTest.scala:36
> 16/10/19 19:38:07 INFO DAGScheduler: Got job 1 (show at udfTest.scala:36) with 1 output partitions
> 16/10/19 19:38:07 INFO DAGScheduler: Final stage: ResultStage 1 (show at udfTest.scala:36)
> 16/10/19 19:38:07 INFO DAGScheduler: Parents of final stage: List()
> 16/10/19 19:38:07 INFO DAGScheduler: Missing parents: List()
> 16/10/19 19:38:07 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at show at udfTest.scala:36), which has no missing parents
> 16/10/19 19:38:07 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 6.9 KB, free 2004.6 MB)
> 16/10/19 19:38:07 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.8 KB, free 2004.6 MB)
> 16/10/19 19:38:07 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.2.202:57832 (size: 3.8 KB, free: 2004.6 MB)
> 16/10/19 19:38:07 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1012
> 16/10/19 19:38:07 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at show at udfTest.scala:36)
> 16/10/19 19:38:07 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
> 16/10/19 19:38:07 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, 192.168.2.202, partition 1, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:07 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 1 on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:07 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.2.202:57837 (size: 3.8 KB, free: 366.3 MB)
> 16/10/19 19:38:07 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 85 ms on 192.168.2.202 (1/1)
> 16/10/19 19:38:07 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
> 16/10/19 19:38:07 INFO DAGScheduler: ResultStage 1 (show at udfTest.scala:36) finished in 0.087 s
> 16/10/19 19:38:07 INFO DAGScheduler: Job 1 finished: show at udfTest.scala:36, took 0.103358 s
> 16/10/19 19:38:07 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.2.202:57832 in memory (size: 3.8 KB, free: 2004.6 MB)
> 16/10/19 19:38:07 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.2.202:57837 in memory (size: 3.8 KB, free: 366.3 MB)
> 16/10/19 19:38:07 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.2.202:57832 in memory (size: 3.8 KB, free: 2004.6 MB)
> 16/10/19 19:38:07 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.2.202:57837 in memory (size: 3.8 KB, free: 366.3 MB)
> 16/10/19 19:38:08 INFO CodeGenerator: Code generated in 638.80317 ms
> +----+
> |name|
> +----+
> | one|
> | two|
> +----+
> 16/10/19 19:38:08 INFO SparkSqlParser: Parsing command: func(name)
> 16/10/19 19:38:09 INFO CodeGenerator: Code generated in 51.788495 ms
> 16/10/19 19:38:09 INFO SparkContext: Starting job: show at udfTest.scala:44
> 16/10/19 19:38:09 INFO DAGScheduler: Got job 2 (show at udfTest.scala:44) with 1 output partitions
> 16/10/19 19:38:09 INFO DAGScheduler: Final stage: ResultStage 2 (show at udfTest.scala:44)
> 16/10/19 19:38:09 INFO DAGScheduler: Parents of final stage: List()
> 16/10/19 19:38:09 INFO DAGScheduler: Missing parents: List()
> 16/10/19 19:38:09 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[6] at show at udfTest.scala:44), which has no missing parents
> 16/10/19 19:38:09 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 11.4 KB, free 2004.6 MB)
> 16/10/19 19:38:09 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 5.7 KB, free 2004.6 MB)
> 16/10/19 19:38:09 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.2.202:57832 (size: 5.7 KB, free: 2004.6 MB)
> 16/10/19 19:38:09 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1012
> 16/10/19 19:38:09 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[6] at show at udfTest.scala:44)
> 16/10/19 19:38:09 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
> 16/10/19 19:38:09 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, 192.168.2.202, partition 0, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 2 on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:09 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.2.202:57837 (size: 5.7 KB, free: 366.3 MB)
> 16/10/19 19:38:09 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, 192.168.2.202): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
>     at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
>     at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:85)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 16/10/19 19:38:09 INFO TaskSetManager: Starting task 0.1 in stage 2.0 (TID 3, 192.168.2.202, partition 0, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 3 on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:09 INFO TaskSetManager: Lost task 0.1 in stage 2.0 (TID 3) on executor 192.168.2.202: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 1]
> 16/10/19 19:38:09 INFO TaskSetManager: Starting task 0.2 in stage 2.0 (TID 4, 192.168.2.202, partition 0, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 4 on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:09 INFO TaskSetManager: Lost task 0.2 in stage 2.0 (TID 4) on executor 192.168.2.202: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 2]
> 16/10/19 19:38:09 INFO TaskSetManager: Starting task 0.3 in stage 2.0 (TID 5, 192.168.2.202, partition 0, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 5 on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:09 INFO TaskSetManager: Lost task 0.3 in stage 2.0 (TID 5) on executor 192.168.2.202: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 3]
> 16/10/19 19:38:09 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
> 16/10/19 19:38:09 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
> 16/10/19 19:38:09 INFO TaskSchedulerImpl: Cancelling stage 2
> 16/10/19 19:38:09 INFO DAGScheduler: ResultStage 2 (show at udfTest.scala:44) failed in 0.354 s
> 16/10/19 19:38:09 INFO DAGScheduler: Job 2 failed: show at udfTest.scala:44, took 0.373604 s
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, 192.168.2.202): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
>     at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
>     at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:85)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>     at scala.Option.foreach(Option.scala:257)
>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1871)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1884)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1897)
>     at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:347)
>     at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:39)
>     at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2183)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>     at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2532)
>     at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2182)
>     at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2189)
>     at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1925)
>     at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1924)
>     at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2562)
>     at org.apache.spark.sql.Dataset.head(Dataset.scala:1924)
>     at org.apache.spark.sql.Dataset.take(Dataset.scala:2139)
>     at org.apache.spark.sql.Dataset.showString(Dataset.scala:239)
>     at org.apache.spark.sql.Dataset.show(Dataset.scala:526)
>     at org.apache.spark.sql.Dataset.show(Dataset.scala:486)
>     at org.apache.spark.sql.Dataset.show(Dataset.scala:495)
>     at udfTest$.main(udfTest.scala:44)
>     ... 29 elided
> Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
>     at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
>     at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:85)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> scala>
> {code}
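For the id column specifically, there is a built-in alternative that sidesteps stateful UDFs altogether: monotonically_increasing_id from org.apache.spark.sql.functions, available in Spark 2.0. A sketch assuming the same df as in the reproduction above (the generated ids are unique per row but not consecutive):

{code:title=Built-in id column (illustrative sketch)}
import org.apache.spark.sql.functions.monotonically_increasing_id

// Generates a unique Long per row with no UDF and no shared state.
val seqDf = df.withColumn("id", monotonically_increasing_id())
seqDf.show()
{code}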