Thanks for the info below. I have one more question. I have my own framework where the SQL query is already built, so I am thinking that instead of using DataFrame filter criteria I could do:

DataFrame d = sqlContext.sql("<append prebuilt query here>");
d.printSchema();
List<Row> rows = d.collectAsList();
My understanding is that only when I call d.collectAsList() will Spark go to the database and execute the query, and that nothing is cached before that point. Please confirm. Thanks.

On Feb 12, 2016 12:01 AM, "Rishabh Wadhawan" <rishabh...@gmail.com> wrote:
> Hi Gaurav,
> Spark will not load the tables into memory at either of those points, as
> DataFrames are just abstractions of something that might happen in the
> future, when you actually invoke an ACTION such as df.collectAsList() or
> df.show(). When you run DataFrame df = sContext.load("jdbc", "(select *
> from employee) as employee"); all Spark does is generate a query execution
> plan. That plan gets executed when you invoke an ACTION statement.
> Take this example:
>
> DataFrame df = sContext.load("jdbc", "(select * from employee) as
> employee"); // Spark builds a query execution tree
> df = df.filter(df.col("wmpid").equalTo("!")); // Spark adds this to the
> query execution tree
> System.out.println(df.queryExecution()); // Print the query execution
> plan, with physical and logical plans
>
> df.show(); /* This is when Spark starts loading data into memory and
> executes the optimized execution plan, according to the query execution
> tree. This is the point when the data gets materialized. */
>
> > On Feb 11, 2016, at 11:20 AM, Gaurav Agarwal <gaurav130...@gmail.com>
> > wrote:
> >
> > Hi,
> >
> > When does a DataFrame load the table into memory when it reads from
> > Hive/Phoenix or from any database? At which of the two points below
> > will the table be loaded into memory or cached: point 1 or point 2?
> >
> > 1. DataFrame df = sContext.load("jdbc", "(select * from employee) as
> > employee");
> >
> > 2. sContext.sql("select * from employee where wmpid=\"!\"");
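The lazy-evaluation behavior described above (transformations only build a plan; an action triggers execution) can be demonstrated without a Spark cluster, since Java streams follow the same pattern: intermediate operations are lazy, and only a terminal operation does work. A minimal stand-alone analogy in plain Java (the class and counters here are my own illustration, not part of the Spark or stream APIs):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyEvalSketch {
    static int evalCountBefore;
    static int evalCountAfter;

    // Builds a lazy pipeline and only materializes it at the terminal
    // operation, mirroring how a DataFrame plan runs only on an action.
    public static List<String> run() {
        AtomicInteger evaluations = new AtomicInteger(0);

        // Like DataFrame transformations: this builds a pipeline but runs nothing yet.
        Stream<String> pipeline = Stream.of("alice", "bob", "carol")
                .peek(name -> evaluations.incrementAndGet()) // counts real evaluations
                .filter(name -> name.startsWith("c"));       // analogous to df.filter(...)

        evalCountBefore = evaluations.get(); // still 0: nothing has executed

        // Like df.collectAsList(): the terminal operation triggers execution.
        List<String> result = pipeline.collect(Collectors.toList());

        evalCountAfter = evaluations.get();  // 3: every element was now evaluated
        return result;
    }

    public static void main(String[] args) {
        List<String> result = run();
        System.out.println("before=" + evalCountBefore
                + " after=" + evalCountAfter + " result=" + result);
        // prints: before=0 after=3 result=[carol]
    }
}
```

In the same way, sqlContext.sql("...") with a prebuilt query only defines the computation; the JDBC query runs against the database when collectAsList() (or another action) is called.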