Re: cache DataFrame

2016-02-11 Thread Gaurav Agarwal
Thanks for the info below. I have one more question. I have my own framework where the SQL query is already built, so instead of using DataFrame filter criteria I am thinking I could use DataFrame d = sqlContext.sql(" and append query here"); d.printSchema(); List row = d.collectAsList(); Here when

cache DataFrame

2016-02-11 Thread Gaurav Agarwal
Hi, when does a DataFrame load the table into memory when it reads from Hive/Phoenix or from any other database? I need one piece of information about the two points below: is the table loaded into memory or cached at point 1 or at point 2? 1. DataFrame df = sContext.load("jdbc","(select * from

Re: NullPointerException when cache DataFrame in Java (Spark1.5.1)

2015-10-29 Thread Romi Kuntsman
Did you try to cache a DataFrame with just a single row? Do your rows have any columns with null values? Can you post a code snippet here showing how you load/generate the DataFrame? Does dataframe.rdd.cache work? *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Thu, Oct 29, 2015 at 4:33

Re: NullPointerException when cache DataFrame in Java (Spark1.5.1)

2015-10-29 Thread Zhang, Jingyu
Thanks Romi, I resized the dataset to 7 MB; however, the code still shows a NullPointerException. Did you try to cache a DataFrame with just a single row? Yes, I tried, but I got the same problem. Do your rows have any columns with null values? No, I had filtered out null values before caching the

Re: NullPointerException when cache DataFrame in Java (Spark1.5.1)

2015-10-29 Thread Romi Kuntsman
> BUT, after change limit(500) to limit(1000). The code report NullPointerException.
I had a similar situation, and the problem was with a certain record. Try to find which records are returned when you limit to 1000 but not returned when you limit to 500. Could it be a NPE thrown from

NullPointerException when cache DataFrame in Java (Spark1.5.1)

2015-10-28 Thread Zhang, Jingyu
It is not a problem to use JavaRDD.cache() for 200M of data (all objects read from JSON format). But when I try to use DataFrame.cache(), it shows the exception below. My machine can cache 1 GB of data in Avro format without any problem. 15/10/29 13:26:23 INFO GeneratePredicate: Code generated in