Thanks. The data is there; I have checked the row count and dumped the data to a file. -Vincent
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Thursday, February 05, 2015 2:28 PM
To: Sun, Vincent Y
Cc: user
Subject: Re: get null potiner exception newAPIHadoopRDD.map()

Is it possible that value.get("(area_code") or value.get("time_zone") returned null?

On Thu, Feb 5, 2015 at 10:58 AM, oxpeople <vincent.y....@bankofamerica.com> wrote:

I modified the code based on CassandraCQLTest to get the area code count by time zone. I get an error when creating the new mapped RDD. Any help is appreciated. Thanks.

    ...
    val arecodeRdd = sc.newAPIHadoopRDD(job.getConfiguration(),
      classOf[CqlPagingInputFormat],
      classOf[java.util.Map[String,ByteBuffer]],
      classOf[java.util.Map[String,ByteBuffer]])

    println("Count: " + arecodeRdd.count) //got right count
    // arecodeRdd.saveAsTextFile("/tmp/arecodeRddrdd.txt");

    val areaCodeSelectedRDD = arecodeRdd.map {
      case (key, value) => {
        (ByteBufferUtil.string(value.get("(area_code")), ByteBufferUtil.string(value.get("time_zone"))) //failed
      }
    }

    println("areaCodeRDD: " + areaCodeSelectedRDD.count)
    ...

Here is the stack trace:

15/02/05 13:38:15 ERROR executor.Executor: Exception in task 109.0 in stage 1.0 (TID 366)
java.lang.NullPointerException
    at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
    at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
    at org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:68)
    at org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:66)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1311)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/02/05 13:38:15 INFO scheduler.TaskSetManager: Starting task 110.0 in stage 1.0 (TID 367, localhost, ANY, 1334 bytes)
15/02/05 13:38:15 INFO executor.Executor: Running task 110.0 in stage 1.0 (TID 367)
15/02/05 13:38:15 INFO rdd.NewHadoopRDD: Input split: ColumnFamilySplit((-8484684946848467066, '-8334833978340269788] @[127.0.0.1])
15/02/05 13:38:15 WARN scheduler.TaskSetManager: Lost task 109.0 in stage 1.0 (TID 366, localhost): java.lang.NullPointerException
    at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
    at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
    at org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:68)
    at org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:66)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1311)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
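
One way to test the question raised in the thread (whether value.get(...) returns null for some rows) is to wrap each lookup in Option before decoding it. The sketch below is only an illustration and makes several assumptions: NullSafeColumns, decode and column are hypothetical names, decode is a plain-JDK stand-in for Cassandra's ByteBufferUtil.string so that the example runs without Cassandra on the classpath, and the columns are assumed to hold UTF-8 text. Note that java.util.Map.get returns null both when a key is absent and when the stored value is null, so a key string that does not match the column name exactly (the quoted code uses "(area_code" with a leading parenthesis, if that is what the real code contains rather than a formatting slip in the email) would produce the same NullPointerException even though the data is present.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.util.{HashMap => JHashMap, Map => JMap}

object NullSafeColumns {
  // Hypothetical stand-in for ByteBufferUtil.string; assumes UTF-8 text columns.
  // duplicate() leaves the position of the caller's buffer untouched.
  def decode(buf: ByteBuffer): String =
    StandardCharsets.UTF_8.decode(buf.duplicate()).toString

  // Map.get returns null for a missing key or a null value;
  // Option(...) converts that null into None instead of failing later.
  def column(row: JMap[String, ByteBuffer], name: String): Option[String] =
    Option(row.get(name)).map(decode)

  def main(args: Array[String]): Unit = {
    // A made-up row for illustration only, not data from the thread.
    val row = new JHashMap[String, ByteBuffer]()
    row.put("area_code", ByteBuffer.wrap("415".getBytes(StandardCharsets.UTF_8)))
    row.put("time_zone", null)

    println(column(row, "area_code"))   // key present, value present
    println(column(row, "time_zone"))   // key present, value null
    println(column(row, "(area_code"))  // key absent because of the stray parenthesis
  }
}
```

In the Spark job, the same idea could be applied inside the map by pairing the two Option results and filtering or counting the rows where either side is None, which would show whether the nulls come from the data or from the key names.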