Hi all,

I have a question regarding Power Iteration Clustering (PIC).
I read in an input file (a tab-separated edge list), map it to the required format of RDD[(Long, Long, Double)], and then apply PIC; a simplified sketch of the code follows below, and the full program is attached.
So far so good…
The implementation works fine as long as the input is small (up to 50 MB), but it crashes when I apply it to a 650 MB file.
My technical setup is a compute cluster with 1 master and 2 workers. The executor memory is set to 50 GB, and 24 cores are available in total.
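
In essence, the mapping and the PIC call look like this (a simplified sketch of the attached PIC.scala, not the exact code; the input path and the values for k and maxIterations are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.PowerIterationClustering

val sc = new SparkContext(new SparkConf().setAppName("PIC"))

// each input line: srcId <tab> dstId <tab> similarity
val similarities = sc.textFile("/path/to/edges.tsv").map { line =>
  val fields = line.split("\t")
  (fields(0).toLong, fields(1).toLong, fields(2).toDouble)
}

val model = new PowerIterationClustering()
  .setK(2)               // placeholder
  .setMaxIterations(10)  // placeholder
  .run(similarities)

model.assignments.collect().foreach(a => println(s"${a.id} -> ${a.cluster}"))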

Is it normal for the program to crash at this file size?
I have attached my program code as well as the error output.

I hope someone can help me!
Best regards, 
Lydia


Attachment: PIC.scala

16/11/23 13:34:19 INFO spark.SparkContext: Running Spark version 2.1.0-SNAPSHOT
16/11/23 13:34:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/23 13:34:20 INFO spark.SecurityManager: Changing view acls to: icklerly
16/11/23 13:34:20 INFO spark.SecurityManager: Changing modify acls to: icklerly
16/11/23 13:34:20 INFO spark.SecurityManager: Changing view acls groups to: 
16/11/23 13:34:20 INFO spark.SecurityManager: Changing modify acls groups to: 
16/11/23 13:34:20 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(icklerly); groups with view permissions: Set(); users  with modify permissions: Set(icklerly); groups with modify permissions: Set()
16/11/23 13:34:20 INFO util.Utils: Successfully started service 'sparkDriver' on port 36371.
16/11/23 13:34:20 INFO spark.SparkEnv: Registering MapOutputTracker
16/11/23 13:34:20 INFO spark.SparkEnv: Registering BlockManagerMaster
16/11/23 13:34:20 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
16/11/23 13:34:20 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
16/11/23 13:34:20 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-80b089a7-be21-4d14-ab6f-7e0ef1f14396
16/11/23 13:34:20 INFO memory.MemoryStore: MemoryStore started with capacity 396.3 MB
16/11/23 13:34:20 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/11/23 13:34:20 INFO util.log: Logging initialized @1120ms
16/11/23 13:34:20 INFO server.Server: jetty-9.2.z-SNAPSHOT
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3543df7d{/jobs,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7c541c15{/jobs/json,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3542162a{/jobs/job,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@698122b2{/jobs/job/json,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4212a0c8{/stages,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1e7aa82b{/stages/json,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3b2c0e88{/stages/stage,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5bd82fed{/stages/stage/json,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@c1bd0be{/stages/pool,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@476b0ae6{/stages/pool/json,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c6804cd{/storage,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@655f7ea{/storage/json,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@549949be{/storage/rdd,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4b3a45f1{/storage/rdd/json,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@17a87e37{/environment,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3eeb318f{/environment/json,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@20a14b55{/executors,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@39ad977d{/executors/json,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6da00fb9{/executors/threadDump,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a202ccb{/executors/threadDump/json,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@20f12539{/static,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@75b25825{/,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@18025ced{/api,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@13cf7d52{/jobs/job/kill,null,AVAILABLE}
16/11/23 13:34:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a3e4aff{/stages/stage/kill,null,AVAILABLE}
16/11/23 13:34:20 INFO server.ServerConnector: Started ServerConnector@2cae1042{HTTP/1.1}{0.0.0.0:4040}
16/11/23 13:34:20 INFO server.Server: Started @1207ms
16/11/23 13:34:20 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/11/23 13:34:20 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://130.73.20.224:4040
16/11/23 13:34:20 INFO spark.SparkContext: Added JAR file:/home/icklerly/spark-master/examples/target/scala-2.11/jars/spark-examples_2.11-2.1.0-SNAPSHOT.jar at spark://130.73.20.224:36371/jars/spark-examples_2.11-2.1.0-SNAPSHOT.jar with timestamp 1479904460674
16/11/23 13:34:20 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://medlab04:7077...
16/11/23 13:34:20 INFO client.TransportClientFactory: Successfully created connection to medlab04/130.73.20.224:7077 after 25 ms (0 ms spent in bootstraps)
16/11/23 13:34:20 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20161123133420-0006
16/11/23 13:34:20 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20161123133420-0006/0 on worker-20161123131030-130.73.21.134-38384 (130.73.21.134:38384) with 12 cores
16/11/23 13:34:20 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20161123133420-0006/0 on hostPort 130.73.21.134:38384 with 12 cores, 50.0 GB RAM
16/11/23 13:34:20 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20161123133420-0006/1 on worker-20161123131042-130.73.20.224-35492 (130.73.20.224:35492) with 12 cores
16/11/23 13:34:20 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20161123133420-0006/1 on hostPort 130.73.20.224:35492 with 12 cores, 50.0 GB RAM
16/11/23 13:34:20 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20161123133420-0006/1 is now RUNNING
16/11/23 13:34:20 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20161123133420-0006/0 is now RUNNING
16/11/23 13:34:20 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36463.
16/11/23 13:34:20 INFO netty.NettyBlockTransferService: Server created on 130.73.20.224:36463
16/11/23 13:34:20 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
16/11/23 13:34:20 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 130.73.20.224, 36463, None)
16/11/23 13:34:20 INFO storage.BlockManagerMasterEndpoint: Registering block manager 130.73.20.224:36463 with 396.3 MB RAM, BlockManagerId(driver, 130.73.20.224, 36463, None)
16/11/23 13:34:20 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 130.73.20.224, 36463, None)
16/11/23 13:34:20 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 130.73.20.224, 36463, None)
16/11/23 13:34:21 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@486be205{/metrics/json,null,AVAILABLE}
16/11/23 13:34:21 INFO cluster.StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/11/23 13:34:21 WARN mllib.PIC$: Start:I
16/11/23 13:34:21 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 128.0 KB, free 396.2 MB)
16/11/23 13:34:21 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 14.4 KB, free 396.2 MB)
16/11/23 13:34:21 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 130.73.20.224:36463 (size: 14.4 KB, free: 396.3 MB)
16/11/23 13:34:21 INFO spark.SparkContext: Created broadcast 0 from textFile at PIC.scala:28
16/11/23 13:34:21 WARN mllib.PIC$: End:I
16/11/23 13:34:22 WARN scheduler.TaskSetManager: Lost task 3.0 in stage 2.0 (TID 13, 130.73.21.134, executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:479)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

16/11/23 13:34:22 ERROR scheduler.TaskSetManager: Task 3 in stage 2.0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 2.0 failed 4 times, most recent failure: Lost task 3.3 in stage 2.0 (TID 23, 130.73.21.134, executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:479)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1436)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1424)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1423)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1651)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1606)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1595)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1914)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1977)
        at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1078)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
        at org.apache.spark.rdd.RDD.fold(RDD.scala:1072)
        at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply$mcD$sp(DoubleRDDFunctions.scala:35)
        at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35)
        at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
        at org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:34)
        at org.apache.spark.mllib.clustering.PowerIterationClustering$.initDegreeVector(PowerIterationClustering.scala:447)
        at org.apache.spark.mllib.clustering.PowerIterationClustering.run(PowerIterationClustering.scala:209)
        at org.apache.spark.examples.mllib.PIC$.main(PIC.scala:42)
        at org.apache.spark.examples.mllib.PIC.main(PIC.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:479)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
