Re: is there any significant performance issue converting between rdd and dataframes in pyspark?

2015-07-02 Thread Davies Liu
On Mon, Jun 29, 2015 at 1:27 PM, Axel Dahl a...@whisperstream.com wrote:
> In pyspark, when I convert from rdds to dataframes it looks like the rdd is being materialized/collected/repartitioned before it's converted to a dataframe.

It's not true. When converting an RDD to a dataframe, it only take

is there any significant performance issue converting between rdd and dataframes in pyspark?

2015-06-29 Thread Axel Dahl
In pyspark, when I convert from rdds to dataframes it looks like the rdd is being materialized/collected/repartitioned before it's converted to a dataframe. Just wondering if there are any guidelines for doing this conversion and whether it's best to do it early to get the performance benefits of