Thanks for the info below.
I have one more question. I have my own framework where the SQL query is
already built, so instead of using DataFrame filter criteria I am thinking
I could use:
DataFrame d = sqlContext.sql("<append query here>");
d.printSchema();
List<Row> rows = d.collectAsList();
Here when
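The approach above (running a prebuilt SQL string instead of chaining DataFrame filters) can be sketched outside Spark with sqlite3 as a stand-in; the table name and data here are made up for illustration:

```python
import sqlite3

# Toy stand-in for sqlContext.sql(): the framework hands us a prebuilt
# SQL string, and we execute it directly instead of using filter() calls.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b")])

prebuilt_query = "SELECT id, name FROM users WHERE id > 1"  # built by the framework
rows = conn.execute(prebuilt_query).fetchall()  # analogous to collectAsList()
print(rows)  # [(2, 'b')]
```

The design trade-off is the same as in Spark: a prebuilt SQL string is convenient when the query already exists, but filter criteria composed on the DataFrame are easier to validate programmatically.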
Hi,
When does the DataFrame load the table into memory when it reads from
Hive/Phoenix or from any other database?
I need one piece of information on this: is the table loaded into memory
or cached at point 1 or at point 2 below?
1. DataFrame df = sContext.load("jdbc","(select * from
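As far as I know, DataFrame reads in Spark are lazy: load() or sqlContext.sql() only records the plan, and the table is actually read when the first action (collect, count, etc.) runs; cache() likewise only materializes on the first action. A toy Python sketch of that deferred-execution idea (not Spark itself, all names are illustrative):

```python
class LazyFrame:
    """Toy illustration of lazy evaluation: constructing the frame records
    the query but reads nothing; data is fetched on the first action."""

    def __init__(self, read_fn):
        self._read_fn = read_fn
        self._cached = None

    def cache(self):
        # Still lazy: only marks intent; materializes on the first action.
        return self

    def collect(self):
        if self._cached is None:
            self._cached = self._read_fn()  # the table is read HERE
        return self._cached


reads = []
df = LazyFrame(lambda: reads.append("read") or [1, 2, 3])  # "point 1": nothing read yet
assert reads == []            # no I/O at definition time
rows = df.collect()           # "point 2": the action triggers the actual read
assert reads == ["read"] and rows == [1, 2, 3]
```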
Did you try to cache a DataFrame with just a single row?
Do your rows have any columns with null values?
Can you post a code snippet here on how you load/generate the dataframe?
Does dataframe.rdd.cache() work?
*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com
On Thu, Oct 29, 2015 at 4:33
Thanks Romi,
I resized the dataset to 7MB; however, the code still throws the same
NullPointerException.

Did you try to cache a DataFrame with just a single row?
Yes, I tried, but the same problem occurs.

Do your rows have any columns with null values?
No, I had filtered out null values before caching the
>
> But after changing limit(500) to limit(1000), the code reports a
> NullPointerException.
>
I had a similar situation, and the problem was with a certain record.
Try to find which records are returned when you limit to 1000 but not
returned when you limit to 500.
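If it helps, in Spark the difference between the two limits can probably be isolated with something like df.limit(1000).except(df.limit(500)) (though without an ordering, limit() is not guaranteed to return the same rows each time). A plain-Python sketch of the same idea, with made-up records where one row carries a null:

```python
# Toy stand-in: take the records present in the first 1000 but not the
# first 500, then probe each one for a field that could trigger the NPE.
records = [{"id": i, "name": ("x" if i != 700 else None)} for i in range(1000)]

suspects = records[500:1000]  # returned at limit(1000) but not at limit(500)
bad = [r for r in suspects if any(v is None for v in r.values())]
print(bad)  # [{'id': 700, 'name': None}]
```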
Could it be an NPE thrown from
It is not a problem to use JavaRDD.cache() for 200MB of data (all objects
read from JSON format), but when I try to use DataFrame.cache(), it throws
the exception below.
My machine can cache 1GB of data in Avro format without any problem.
15/10/29 13:26:23 INFO GeneratePredicate: Code generated in