Re: Which is better, RDD or DataFrame?

2016-02-15 Thread Ted Yu
Can you describe the types of queries you want to perform?

If you don't already have a data flow that is optimized for RDDs, I would
suggest using the DataFrame API (or even the Dataset API), which gives the
Catalyst optimizer more room to optimize the query.

Cheers

On Mon, Feb 15, 2016 at 6:43 PM, Divya Gehlot wrote:

> Hi,
> I would like to know which gives better performance: RDDs or DataFrames?
> Take the following scenario:
> 1. Read the file as an RDD, register it as a temp table, and run a SQL query.
>
> 2. Read the file through the DataFrame API (or convert the RDD to a DataFrame)
> and use the DataFrame API to process the data.
>
> For a scenario like the one above, which gives better performance?
> Does anybody have benchmark or statistical data on that?
>
>
> Thanks,
> Divya
>


Which is better, RDD or DataFrame?

2016-02-15 Thread Divya Gehlot
Hi,
I would like to know which gives better performance: RDDs or DataFrames?
Take the following scenario:
1. Read the file as an RDD, register it as a temp table, and run a SQL query.

2. Read the file through the DataFrame API (or convert the RDD to a DataFrame)
and use the DataFrame API to process the data.

For a scenario like the one above, which gives better performance?
Does anybody have benchmark or statistical data on that?
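To gather your own numbers for the two approaches, a small timing harness like the one below may help. `time_action` is a hypothetical helper, not part of Spark. Because Spark transformations are lazy, the callable you pass must trigger an action such as `collect()` or `count()`; otherwise you only time plan construction. The warm-up runs are there to reduce noise from JVM warm-up and cache population.

```python
import statistics
import time
from typing import Callable, List


def time_action(action: Callable[[], object], warmup: int = 2, runs: int = 5) -> float:
    """Return the median wall-clock time in seconds of calling `action()`.

    `action` should trigger a Spark action (e.g. `lambda: df.count()`),
    because transformations alone are lazy and do no work.
    """
    # Warm-up runs are discarded: early runs include JIT compilation and
    # cache-population effects that would distort the comparison.
    for _ in range(warmup):
        action()

    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)

    # The median is less sensitive than the mean to outliers such as a GC pause.
    return statistics.median(samples)
```

For example, wrap each approach in a lambda that ends in an action, such as `lambda: spark.sql(query).count()` where `query` is your SQL string, and compare the returned medians on realistically sized input.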


Thanks,
Divya