Thanks. The data is there; I have checked the row count and dumped the RDD to a file.

-Vincent

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Thursday, February 05, 2015 2:28 PM
To: Sun, Vincent Y
Cc: user
Subject: Re: get null potiner exception newAPIHadoopRDD.map()

Is it possible that value.get("(area_code") or value.get("time_zone")
returned null?
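One quick way to check is to wrap the lookups in Option so a missing column shows up as None instead of blowing up inside ByteBufferUtil.string. A minimal sketch, outside Spark, with a hypothetical stringOpt helper (the column names and the decode logic here are assumptions, not the actual job code):

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object NullSafeLookup {
  // Decode a column value to a String only when the ByteBuffer is present.
  // Option(...) turns a null lookup result into None instead of an NPE.
  def stringOpt(row: java.util.Map[String, ByteBuffer], column: String): Option[String] =
    Option(row.get(column)).map(bb => StandardCharsets.UTF_8.decode(bb.duplicate()).toString)

  def main(args: Array[String]): Unit = {
    val row = new java.util.HashMap[String, ByteBuffer]()
    row.put("area_code", ByteBuffer.wrap("415".getBytes(StandardCharsets.UTF_8)))

    println(stringOpt(row, "area_code"))  // Some(415)
    println(stringOpt(row, "(area_code")) // None -- the stray "(" makes the key miss
  }
}
```

Note that the second lookup uses the literal key "(area_code" from the posted code; if the stored column is actually named "area_code", that lookup can only ever return null.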

On Thu, Feb 5, 2015 at 10:58 AM, oxpeople 
<vincent.y....@bankofamerica.com<mailto:vincent.y....@bankofamerica.com>> wrote:
I modified the code, based on CassandraCQLTest, to get the area-code count
by time zone. I get an error when mapping over the new RDD. Any help is
appreciated. Thanks.

...   val arecodeRdd = sc.newAPIHadoopRDD(job.getConfiguration(),
      classOf[CqlPagingInputFormat],
      classOf[java.util.Map[String, ByteBuffer]],
      classOf[java.util.Map[String, ByteBuffer]])

    println("Count: " + arecodeRdd.count) // got the right count
    // arecodeRdd.saveAsTextFile("/tmp/arecodeRddrdd.txt")
    val areaCodeSelectedRDD = arecodeRdd.map {
      case (key, value) =>
        (ByteBufferUtil.string(value.get("(area_code")), // <- fails here
         ByteBufferUtil.string(value.get("time_zone")))
    }
    println("areaCodeRDD: " + areaCodeSelectedRDD.count)

...
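If the extra "(" in value.get("(area_code") is the culprit, a likely-intended version of that map would use the plain column name and drop rows with a missing column rather than passing a null ByteBuffer into ByteBufferUtil.string. A self-contained sketch against a plain collection instead of an RDD (column names "area_code" and "time_zone", and the decode helper, are assumptions):

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object AreaCodeExtract {
  type Row = java.util.Map[String, ByteBuffer]

  // Stand-in for ByteBufferUtil.string: UTF-8-decode a non-null buffer.
  def decode(bb: ByteBuffer): String =
    StandardCharsets.UTF_8.decode(bb.duplicate()).toString

  // flatMap-style extraction: rows missing either column are dropped
  // instead of triggering a NullPointerException.
  def extract(rows: Seq[Row]): Seq[(String, String)] =
    rows.flatMap { row =>
      for {
        area <- Option(row.get("area_code")).map(decode)
        tz   <- Option(row.get("time_zone")).map(decode)
      } yield (area, tz)
    }

  def main(args: Array[String]): Unit = {
    def bb(s: String) = ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8))
    val good = new java.util.HashMap[String, ByteBuffer]()
    good.put("area_code", bb("415")); good.put("time_zone", bb("PST"))
    val bad = new java.util.HashMap[String, ByteBuffer]()
    bad.put("time_zone", bb("EST")) // no area_code column

    println(extract(Seq(good, bad))) // only the complete row survives
  }
}
```

The same for-comprehension over Option works inside an RDD's flatMap, since the closure only touches the row map and the decoder.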

Here is the stack trace:
15/02/05 13:38:15 ERROR executor.Executor: Exception in task 109.0 in stage
1.0 (TID 366)
java.lang.NullPointerException
        at
org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
        at
org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
        at
org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:68)
        at
org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:66)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1311)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
        at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
        at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
15/02/05 13:38:15 INFO scheduler.TaskSetManager: Starting task 110.0 in
stage 1.0 (TID 367, localhost, ANY, 1334 bytes)
15/02/05 13:38:15 INFO executor.Executor: Running task 110.0 in stage 1.0
(TID 367)
15/02/05 13:38:15 INFO rdd.NewHadoopRDD: Input split:
ColumnFamilySplit((-8484684946848467066, '-8334833978340269788]
@[127.0.0.1])
15/02/05 13:38:15 WARN scheduler.TaskSetManager: Lost task 109.0 in stage
1.0 (TID 366, localhost): java.lang.NullPointerException
        (stack trace identical to TID 366 above)

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/get-null-potiner-exception-newAPIHadoopRDD-map-tp21520.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

