Hi,

I've been asking a similar question myself too! Thanks for sending it to
the mailing list!

Going from an RDD to a Dataset triggers a job to calculate the schema
(unless the RDD is an RDD[Row]).

I *think* that going from a Dataset to an RDD is almost a no-op, since
the Dataset side is the one that has to do the extra work of generating
the underlying data structures and optimizations.
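
For anyone who wants to check this rather than take my word for it, here is
a minimal sketch of the round trip, assuming Spark 2.0's SparkSession API
and a local master. The Person case class and the RddDatasetRoundTrip object
are hypothetical names made up for illustration. The sketch does not measure
anything; it only prints the schema, the plan and the RDD lineage so you can
compare them with what the Spark UI reports.

```scala
import org.apache.spark.sql.SparkSession

object RddDatasetRoundTrip {
  // Hypothetical record type, defined outside main so Spark can derive an encoder.
  final case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this experiment.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rdd-dataset-round-trip")
      .getOrCreate()
    import spark.implicits._

    val rdd = spark.sparkContext.parallelize(Seq(Person("a", 1), Person("b", 2)))

    // RDD -> Dataset: watch the Jobs tab in the Spark UI to see whether
    // this step alone runs a job for your input type.
    val ds = rdd.toDS()
    ds.printSchema()
    ds.explain()

    // Dataset -> RDD: the lineage shows which steps the conversion adds.
    val back = ds.rdd
    println(back.toDebugString)

    spark.stop()
  }
}
```

Comparing the output of explain() with toDebugString should show where
each conversion adds work to the plan.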

Can't wait to hear what more advanced people say.

Jacek
On 24 Jun 2016 8:00 a.m., "pan" <pranav.na...@gmail.com> wrote:

Hello,
   I am trying to understand the cost of converting an RDD to a DataFrame
and back. Would converting back and forth very frequently hurt performance?

I do observe that some operations, like join, are implemented very
differently for (pair) RDDs and DataFrames, so I am trying to figure out the
cost of converting one to the other.

Regards,
Pranav



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Cost-of-converting-RDD-s-to-dataframe-and-back-tp27222.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.