Hi,

This used to work in Spark 1.6.1. I am now trying it in Spark 2:

scala> val a = df.filter(col("Transaction Date") > "").map(p => Accounts(p(0).toString,p(1).toString,p(2).toString,p(3).toString,p(4).toString,p(5).toString,p(6).toString,p(7).toString.toDouble))
a: org.apache.spark.sql.Dataset[Accounts] = [TransactionDate: string, TransactionType: string ... 6 more fields]

scala> a.printSchema
root
 |-- TransactionDate: string (nullable = true)
 |-- TransactionType: string (nullable = true)
 |-- SortCode: string (nullable = true)
 |-- AccountNumber: string (nullable = true)
 |-- TransactionDescription: string (nullable = true)
 |-- DebitAmount: string (nullable = true)
 |-- CreditAmount: string (nullable = true)
 |-- Balance: double (nullable = true)

Now I register it as a temp table:

scala> a.registerTempTable("tmp")

scala> sql("select count(1) from tmp")
res35: org.apache.spark.sql.DataFrame = [count(1): bigint]

Now I try to collect it, and it falls over:

scala> sql("select count(1) from tmp").collect
16/08/08 23:12:03 ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 36)
java.lang.NullPointerException
        at $line72.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:31)
        at $line72.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:31)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
        at org.apache.spark.scheduler.Task.run(Task.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
16/08/08 23:12:03 ERROR TaskSetManager: Task 0 in stage 13.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 1 times, most recent failure: Lost task 0.0 in stage 13.0 (TID 36, localhost): java.lang.NullPointerException
        at $anonfun$1.apply(<console>:31)
        at $anonfun$1.apply(<console>:31)

Any ideas what is happening?

Thanks

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
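
Note on the trace: both NullPointerException frames point at $anonfun$1.apply(<console>:31), which is the lambda passed to map. One possible cause (an assumption, not something the trace confirms) is that a column value in some row is null, so p(i).toString throws. The sketch below shows a null-safe version of that conversion. The names NullSafeRow, str, dbl and toAccounts are hypothetical, the Accounts fields are taken from the printed schema, and a plain Seq[Any] stands in for a Spark Row so the example runs without Spark on the classpath.

```scala
import scala.util.Try

// Same field names and types as the schema printed by a.printSchema.
case class Accounts(
  TransactionDate: String, TransactionType: String, SortCode: String,
  AccountNumber: String, TransactionDescription: String,
  DebitAmount: String, CreditAmount: String, Balance: Double)

object NullSafeRow {
  // Hypothetical helper: null-safe replacement for p(i).toString.
  // Option(null) is None, so a null value becomes an empty string.
  def str(v: Any): String = Option(v).map(_.toString).getOrElse("")

  // Hypothetical helper: replacement for p(i).toString.toDouble.
  // Falls back to 0.0 when the value is null or not a valid number.
  def dbl(v: Any): Double = Try(str(v).trim.toDouble).getOrElse(0.0)

  // Seq[Any] is a stand-in for a Spark Row, which allows the same p(i) access.
  def toAccounts(p: Seq[Any]): Accounts =
    Accounts(str(p(0)), str(p(1)), str(p(2)), str(p(3)),
             str(p(4)), str(p(5)), str(p(6)), dbl(p(7)))
}
```

In the shell, the map step could then be written as map(p => NullSafeRow.toAccounts(p.toSeq)), since Row.toSeq returns a Seq[Any]. Whether an empty string and 0.0 are acceptable defaults depends on the data; filtering out the null rows first is an equally reasonable choice.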