Did you try to cache a DataFrame with just a single row?
Do your rows have any columns with null values?
Can you post a code snippet here on how you load/generate the dataframe?
Does dataframe.rdd.cache work?
*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com
On Thu, Oct 29, 2015 at 4:33
Thanks Romi,
I resized the dataset to 7 MB; however, the code still throws a
NullPointerException.
Did you try to cache a DataFrame with just a single row?
Yes, I tried, but I got the same problem.
Do your rows have any columns with null values?
No, I had filtered out null values before caching the
>
> BUT, after changing limit(500) to limit(1000), the code reports a
> NullPointerException.
>
I had a similar situation, and the problem was with a certain record.
Try to find which records are returned when you limit to 1000 but not
returned when you limit to 500.
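One way to narrow this down is sketched below in plain Java. It assumes the rows can be collected to the driver as a `java.util.List` and that both limits return rows in the same order, which Spark does not guarantee without an explicit sort. `RecordIsolation`, `findFailing` and the `step` callback are hypothetical names for illustration, not part of Spark; `step` stands in for whatever per-record processing is suspected of throwing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not a Spark API): isolates the records that are only
// reached with the larger limit and that make a processing step throw.
final class RecordIsolation {

    // Returns the records at positions [smallLimit, largeLimit) for which
    // `step` throws a RuntimeException (for example a NullPointerException).
    // Assumes `records` is in the same order that both limits would see.
    static <T> List<T> findFailing(List<T> records, int smallLimit, int largeLimit,
                                   Consumer<T> step) {
        // Clamp the bounds so a short list does not cause an index error.
        int end = Math.min(largeLimit, records.size());
        int start = Math.min(smallLimit, end);

        List<T> failing = new ArrayList<>();
        for (T record : records.subList(start, end)) {
            try {
                step.accept(record);      // run the suspect step on one record
            } catch (RuntimeException e) {
                failing.add(record);      // remember the record that broke it
            }
        }
        return failing;
    }
}
```

Calling `RecordIsolation.findFailing(rows, 500, 1000, step)` checks only the 500 extra records one at a time, so a single bad record shows up directly instead of failing the whole batch.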
Could it be an NPE thrown from
Using JavaRDD.cache() is not a problem for 200M of data (all objects read
from JSON format). But when I try to use DataFrame.cache(), it throws the
exception below.
My machine can cache 1 GB of data in Avro format without any problem.
15/10/29 13:26:23 INFO GeneratePredicate: Code generated in