Can you describe the types of queries you want to run?
If you don't already have a data flow that is optimized for RDDs, I would
suggest using the DataFrame API (or even the Dataset API), which gives the
optimizer more room to work.
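
To make the comparison concrete, here is a rough sketch of the two scenarios
from the question, plus a hand-written RDD version. It assumes Spark 2.x or
later (SparkSession, createOrReplaceTempView); on 1.6 the equivalents would be
SQLContext and registerTempTable. The Sale case class, the "sales" table name,
and the in-memory data are made up for illustration and stand in for your file.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object RddVsDataFrame {
  // Hypothetical record type, used only for this illustration.
  case class Sale(region: String, amount: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-vs-dataframe")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for "read the file as RDD": a small in-memory RDD.
    val rdd = spark.sparkContext.parallelize(Seq(
      Sale("east", 10.0), Sale("west", 5.0), Sale("east", 2.5)))

    // Scenario 1: RDD -> temp table -> SQL query.
    rdd.toDF().createOrReplaceTempView("sales")
    val viaSql = spark.sql(
      "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

    // Scenario 2: the same aggregation through the DataFrame API.
    val viaApi = rdd.toDF().groupBy("region").agg(sum("amount").as("total"))

    // Print both physical plans so they can be compared side by side.
    viaSql.explain()
    viaApi.explain()

    // Hand-written RDD version: Spark runs these closures as given,
    // without passing them through the query optimizer.
    val viaRdd = rdd.map(s => (s.region, s.amount)).reduceByKey(_ + _)
    viaRdd.collect().foreach(println)

    spark.stop()
  }
}
```

Both the SQL query and the DataFrame calls are planned by the same optimizer,
so comparing the two explain() outputs on your own data is a quick way to check
whether scenarios 1 and 2 actually differ for your queries.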
Cheers
On Mon, Feb 15, 2016 at 6:43 PM, Divya Gehlot wrote:
> Hi,
> I would like to know which gives better performance: RDDs or DataFrames?
> For example, take this scenario:
> 1. Read the file as an RDD, register it as a temp table, and run a SQL query.
>
> 2. Read the file through the DataFrame API, or convert the RDD to a
> DataFrame, and use the DataFrame APIs to process the data.
>
> For a scenario like the one above, which gives better performance?
> Does anybody have benchmarks or statistical data on this?
>
>
> Thanks,
> Divya
>