Thanks for the info below. I have one more question. I have my own framework where the SQL query is already built, so I am thinking that instead of using DataFrame filter criteria I could do:

DataFrame d = sqlContext.sql("<append prebuilt query here>");
d.printSchema();
List<Row> rows = d.collectAsList();
My understanding is that only when I call d.collectAsList() will Spark go to the database and execute the query, and that nothing is cached before that point. Please confirm. Thanks.

On Feb 12, 2016 12:01 AM, "Rishabh Wadhawan" <rishabh...@gmail.com> wrote:
> Hi Gaurav,
> Spark will not load the tables into memory at either of those points, as
> DataFrames are just abstractions of something that might happen in the
> future, when you actually invoke an ACTION such as df.collectAsList() or
> df.show(). When you run DataFrame df = sContext.load("jdbc", "(select *
> from employee) as employee"); all Spark does is generate a query execution
> plan. That plan gets executed when you invoke an ACTION statement.
> Take this example:
>
> DataFrame df = sContext.load("jdbc", "(select * from employee) as
> employee"); // Spark builds a query execution tree
> df = df.filter(df.col("wmpid").equalTo("!")); // Spark adds this to the
> query execution tree
> System.out.println(df.queryExecution()); // Print the query execution
> plan, with physical and logical plans
>
> df.show(); /* This is when Spark starts loading data into memory and
> executes the optimized execution plan, according to the query execution
> tree. This is the point when the data gets materialized. */
>
> > On Feb 11, 2016, at 11:20 AM, Gaurav Agarwal <gaurav130...@gmail.com>
> > wrote:
> >
> > Hi,
> >
> > When does a DataFrame load the table into memory when it reads from
> > Hive/Phoenix or from any database? At which of the two points below
> > will the table be loaded into memory or cached: point 1 or point 2?
> >
> > 1. DataFrame df = sContext.load("jdbc", "(select * from employee) as
> > employee");
> >
> > 2. sContext.sql("select * from employee where wmpid=\"!\"");
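The lazy-evaluation behavior described above (transformations only build a plan; an action triggers execution) can be demonstrated without a Spark cluster, since Java streams follow the same pattern: intermediate operations are lazy, and only a terminal operation does work. A minimal stand-alone analogy in plain Java (the class and counters here are my own illustration, not part of the Spark or stream APIs):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyEvalSketch {
    static int evalCountBefore;
    static int evalCountAfter;

    // Builds a lazy pipeline and only materializes it at the terminal
    // operation, mirroring how a DataFrame plan runs only on an action.
    public static List<String> run() {
        AtomicInteger evaluations = new AtomicInteger(0);

        // Like DataFrame transformations: this builds a pipeline but runs nothing yet.
        Stream<String> pipeline = Stream.of("alice", "bob", "carol")
                .peek(name -> evaluations.incrementAndGet()) // counts real evaluations
                .filter(name -> name.startsWith("c"));       // analogous to df.filter(...)

        evalCountBefore = evaluations.get(); // still 0: nothing has executed

        // Like df.collectAsList(): the terminal operation triggers execution.
        List<String> result = pipeline.collect(Collectors.toList());

        evalCountAfter = evaluations.get();  // 3: every element was now evaluated
        return result;
    }

    public static void main(String[] args) {
        List<String> result = run();
        System.out.println("before=" + evalCountBefore
                + " after=" + evalCountAfter + " result=" + result);
        // prints: before=0 after=3 result=[carol]
    }
}
```

In the same way, sqlContext.sql("...") with a prebuilt query only defines the computation; the JDBC query runs against the database when collectAsList() (or another action) is called.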