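A DataFrame is stored and persisted in a more efficient binary representation than an RDD, which is why the cached size you see in the UI is so much smaller. The DataFrame still persists the actual content; it is just encoded more compactly. You should go for the DataFrame in most cases. An RDD is generally slower.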
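
To give a rough feel for why a packed binary layout can need far less memory than one object per record, here is a minimal plain-Python sketch. It is only an analogy and does not use Spark: the two-field record, the row count `N`, and the helper `deep_size_rows` are made up for illustration, and the exact ratio will differ from what the Spark UI reports.

```python
import sys
from array import array

N = 100_000  # hypothetical number of records

# Row-oriented: one tuple per record, each field a separate boxed object.
# This loosely mirrors caching records as individual deserialized objects.
rows = [(i, i * 0.5) for i in range(N)]

def deep_size_rows(records):
    """Approximate bytes used by the list, its tuples, and their fields."""
    total = sys.getsizeof(records)
    for rec in records:
        total += sys.getsizeof(rec)
        for field in rec:
            total += sys.getsizeof(field)
    return total

# Column-oriented: one packed buffer per field, 8 bytes per value.
# This loosely mirrors a binary, columnar cache format.
ids = array("q", range(N))                      # signed 64-bit integers
vals = array("d", (i * 0.5 for i in range(N)))  # 64-bit floats

row_bytes = deep_size_rows(rows)
col_bytes = sys.getsizeof(ids) + sys.getsizeof(vals)

print(f"row objects : {row_bytes / 1e6:.1f} MB")
print(f"packed cols : {col_bytes / 1e6:.1f} MB")
print(f"ratio       : {row_bytes / col_bytes:.1f}x")
```

Both layouts hold exactly the same values; only the per-object overhead differs, which is the same kind of gap as the 50G versus 9G you are seeing.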

> On 27 Jun 2016, at 07:54, Brandon White <bwwintheho...@gmail.com> wrote:
>
> What is the difference between persisting a dataframe and a rdd? When I
> persist my RDD, the UI says it takes 50G or more of memory. When I persist my
> dataframe, the UI says it takes 9G or less of memory.
>
> Does the dataframe not persist the actual content? Is it better / faster to
> persist a RDD when doing a lot of filter, mapping, and collecting operations?