RE: Null pointer exception with larger datasets
Thanks Akhil.

-Naveen

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Tuesday, November 18, 2014 1:19 PM
To: Naveen Kumar Pokala
Cc: user@spark.apache.org
Subject: Re: Null pointer exception with larger datasets
Re: Null pointer exception with larger datasets
Make sure your list is not null. If it is null, then it's more or less like doing:

    JavaRDD distData = sc.parallelize(null);
    distData.foreach(println);

Thanks
Best Regards

On Tue, Nov 18, 2014 at 12:07 PM, Naveen Kumar Pokala <npok...@spcapitaliq.com> wrote:

> I have a list of Students of size one lakh and I am trying to save it to a
> file. It is throwing a null pointer exception.
>
>     JavaRDD distData = sc.parallelize(list);
>     distData.saveAsTextFile("hdfs://master/data/spark/instruments.txt");
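A minimal, Spark-free sketch of the check Akhil suggests, so the guard runs before parallelize ever sees a null list. The class name NullListCheck and the method buildStudentList() are hypothetical stand-ins for whatever produces the list in the original job:

```java
import java.util.ArrayList;
import java.util.List;

public class NullListCheck {

    // Hypothetical stand-in for the code that builds the Students list.
    // Returning null here simulates the failure case Akhil describes.
    static List<String> buildStudentList() {
        return null;
    }

    public static void main(String[] args) {
        List<String> list = buildStudentList();

        // Guard before handing the list to sc.parallelize(list):
        // parallelize(null) fails, so fall back to an empty list
        // (or fail fast with a clear message, depending on the job).
        if (list == null) {
            System.out.println("list is null -- not calling parallelize");
            list = new ArrayList<>();
        }
        System.out.println("size = " + list.size());
    }
}
```

With a non-null list this prints only the size; the point is that the null check happens on the driver, where the error message is easy to see, rather than inside a task.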
Null pointer exception with larger datasets
Hi,

I have a list of Students of size one lakh (100,000) and I am trying to save it to a file. It is throwing a null pointer exception.

    JavaRDD distData = sc.parallelize(list);
    distData.saveAsTextFile("hdfs://master/data/spark/instruments.txt");

14/11/18 01:33:21 WARN scheduler.TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, master): java.lang.NullPointerException:
    org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1158)
    org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1158)
    scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:984)
    org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    java.lang.Thread.run(Thread.java:745)

How do I handle this?

-Naveen
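One thing worth noting about the stack trace: it points inside saveAsTextFile's per-element mapping, which converts each element to text. So even if the list itself is not null, a single null *element* in it can fail the task with the same NullPointerException. A Spark-free sketch of that situation and one possible fix, filtering nulls out before calling parallelize (the students data below is hypothetical, standing in for the one-lakh list):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class NullElementFilter {
    public static void main(String[] args) {
        // Hypothetical stand-in for the Students list, with a stray null element.
        List<String> students = Arrays.asList("s1", "s2", null, "s4");

        // saveAsTextFile effectively calls toString() on every element,
        // so one null element can fail the whole task with an NPE.
        // Drop nulls before handing the list to sc.parallelize(students):
        List<String> clean = students.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.toList());

        System.out.println(clean); // [s1, s2, s4]
    }
}
```

Whether silently dropping nulls is acceptable depends on the job; logging how many elements were removed (students.size() - clean.size()) is a cheap way to catch upstream data problems instead of hiding them.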