Did you try to cache a DataFrame with just a single row? Do your rows have any columns with null values? Can you post a code snippet showing how you load or generate the DataFrame? Does dataframe.rdd.cache work?
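For the null-values question, here is a rough plain-Java sketch (no Spark required) that scans a list of records for null elements and null-valued bean properties before they are turned into a DataFrame. It assumes the records are Java beans with public getters. `Event`, `NullScan` and `findNulls` are made-up names for illustration and are not taken from the original code. Whether nulls are actually behind the NullPointerException in the trace is only a guess at this point.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NullScan {

    // Hypothetical bean standing in for the objects parsed from JSON.
    public static class Event {
        private final String id;
        private final Integer count;

        public Event(String id, Integer count) {
            this.id = id;
            this.count = count;
        }

        public String getId() { return id; }
        public Integer getCount() { return count; }
    }

    // Returns one message per null record or null-valued bean property.
    public static List<String> findNulls(List<?> records, Class<?> beanClass)
            throws IntrospectionException, ReflectiveOperationException {
        // Stop at Object so the synthetic "class" property is not reported.
        BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            Object record = records.get(i);
            if (record == null) {
                // Calling a getter reflectively on a null record would throw
                // a NullPointerException, so report it and move on.
                problems.add("record " + i + " is null");
                continue;
            }
            for (PropertyDescriptor property : info.getPropertyDescriptors()) {
                Method getter = property.getReadMethod();
                if (getter != null && getter.invoke(record) == null) {
                    problems.add("record " + i + ": " + property.getName() + " is null");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        List<Event> sample =
                Arrays.asList(new Event("a", 1), null, new Event("c", null));
        for (String problem : findNulls(sample, Event.class)) {
            System.out.println(problem);
        }
    }
}
```

Running something like this over a small sample collected from the JavaRDD (for example the result of take(100)) would show quickly whether null records or null fields are present in the input.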
*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Thu, Oct 29, 2015 at 4:33 AM, Zhang, Jingyu <jingyu.zh...@news.com.au> wrote:

> It is not a problem to use JavaRDD.cache() for 200M data (all Objects read
> from Json Format). But when I try to use DataFrame.cache(), it throws the
> exception below.
>
> My machine can cache 1 G data in Avro format without any problem.
>
> 15/10/29 13:26:23 INFO GeneratePredicate: Code generated in 154.531827 ms
> 15/10/29 13:26:23 INFO GenerateUnsafeProjection: Code generated in 27.832369 ms
> 15/10/29 13:26:23 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.NullPointerException
>     at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.spark.sql.SQLContext$$anonfun$9$$anonfun$apply$1$$anonfun$apply$2.apply(SQLContext.scala:500)
>     at org.apache.spark.sql.SQLContext$$anonfun$9$$anonfun$apply$1$$anonfun$apply$2.apply(SQLContext.scala:500)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>     at org.apache.spark.sql.SQLContext$$anonfun$9$$anonfun$apply$1.apply(SQLContext.scala:500)
>     at org.apache.spark.sql.SQLContext$$anonfun$9$$anonfun$apply$1.apply(SQLContext.scala:498)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>     at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:127)
>     at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
>     at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
>     at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 15/10/29 13:26:23 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.NullPointerException
>     at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>
> Thanks,
>
> Jingyu