Haifeng Li created TOREE-428:
--------------------------------

             Summary: Can't use case class in the Scala notebook
                 Key: TOREE-428
                 URL: https://issues.apache.org/jira/browse/TOREE-428
             Project: TOREE
          Issue Type: Bug
          Components: Build
            Reporter: Haifeng Li

Docker image: jupyter/all-spark-notebook:latest

How the container is started:

    docker run -it --rm -p 8888:8888 jupyter/all-spark-notebook:latest

or

    docker ps -a
    docker start -i containerID

Steps to reproduce:

1. Visit http://localhost:8888
2. Start a spylon-kernel notebook
3. Enter the following code:

    import spark.implicits._
    val p = spark.sparkContext.textFile("../Data/person.txt")
    val pmap = p.map(_.split(","))
    pmap.collect()

   Output:

    res0: Array[Array[String]] = Array(Array(Barack, Obama, 53), Array(George, Bush, 68), Array(Bill, Clinton, 68))

4. Then enter:

    case class Persons(first_name: String, last_name: String, age: Int)
    val personRDD = pmap.map(p => Persons(p(0), p(1), p(2).toInt))
    personRDD.take(1)

Error message:

    org.apache.spark.SparkDriverExecutionException: Execution error
      at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1186)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1711)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
      at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
      at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
      at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
      ... 39 elided
    Caused by: java.lang.ArrayStoreException: [LPersons;
      at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
      at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:2043)
      at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:2043)
      at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:59)
      at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1182)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1711)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
      at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

The same code works in the spark-shell. From the error message, I suspect that the driver program does not correctly handle the case class Persons when results come back from the RDD partitions.
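
For readers unfamiliar with java.lang.ArrayStoreException, here is a minimal sketch of how the JVM raises it, without Spark. It is not a reproduction of this bug and does not claim to show its root cause; it only illustrates the general mechanism, in which a value is written into an array whose runtime component type does not accept that value's class. The object name ArrayStoreDemo, the class Other, and the helper storeIsRejected are hypothetical names introduced for illustration.

```scala
object ArrayStoreDemo {
  // Stand-in for the case class from the report.
  final case class Persons(first_name: String, last_name: String, age: Int)

  // Hypothetical unrelated class; it plays the role of "the class the
  // array was actually created for".
  final case class Other(name: String)

  /** Tries to store `value` into an array whose runtime component type is
    * Other. Returns true if the JVM rejects the store.
    */
  def storeIsRejected(value: AnyRef): Boolean = {
    val typed: Array[Other] = new Array[Other](1)
    // Erase the static element type, similar to what generic code does.
    // The array object itself still remembers that it holds Other.
    val erased: Array[AnyRef] = typed.asInstanceOf[Array[AnyRef]]
    try {
      erased(0) = value // checked by the JVM at run time
      false
    } catch {
      case _: ArrayStoreException => true
    }
  }

  def main(args: Array[String]): Unit = {
    println(storeIsRejected(Persons("Barack", "Obama", 53))) // prints "true"
    println(storeIsRejected(Other("ok")))                    // prints "false"
  }
}
```

In the trace above, the store happens in ScalaRunTime.array_update, called from SparkContext.runJob on the driver, and the rejected value is an array of Persons ([LPersons;). One possible reading, stated here as an assumption, is that the Persons class seen by the result array differs from the one seen by the task result.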