Hi, I've been asking myself a similar question! Thanks for sending it to the mailing list!
Going from an RDD to a Dataset triggers a job to calculate a schema (unless the RDD is an RDD[Row]). I *think* that going from a Dataset to an RDD is almost a no-op, since a Dataset requires more work to generate its underlying data structures and optimizations.

Can't wait to hear what more advanced people say.

Jacek

On 24 Jun 2016 8:00 a.m., "pan" <pranav.na...@gmail.com> wrote:

Hello,

I am trying to understand the cost of converting an RDD to a DataFrame and back. Would converting back and forth very frequently hurt performance? I do observe that some operations, like join, are implemented very differently for (pair) RDDs and DataFrames, so I am trying to figure out the cost of converting one to the other.

Regards,
Pranav

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Cost-of-converting-RDD-s-to-dataframe-and-back-tp27222.html
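
For reference, the two conversions discussed in this thread can be sketched roughly as below. This is only an illustration and it does not measure any cost. It assumes the Spark 2.x `SparkSession` API and a local master; the `Person` case class, the `RddDataFrameRoundTrip` object, and the sample data are hypothetical names made up for the example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object RddDataFrameRoundTrip {
  // Converts RDD -> DataFrame -> RDD and returns the collected values.
  def roundTrip(spark: SparkSession): Array[(String, Int)] = {
    import spark.implicits._

    val people: RDD[Person] =
      spark.sparkContext.parallelize(Seq(Person("Ann", 31), Person("Bob", 27)))

    // RDD of case classes -> DataFrame: the schema comes from the Person type.
    val df: DataFrame = people.toDF()
    df.printSchema()

    // RDD[Row] -> DataFrame: the schema is supplied explicitly by the caller.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = false)))
    val rows: RDD[Row] = people.map(p => Row(p.name, p.age))
    val dfFromRows: DataFrame = spark.createDataFrame(rows, schema)

    // DataFrame -> RDD: .rdd exposes the same data as an RDD[Row].
    val back: RDD[Row] = dfFromRows.rdd

    back.map(r => (r.getString(0), r.getInt(1))).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, for illustration only
      .appName("rdd-df-round-trip")
      .getOrCreate()
    try roundTrip(spark).foreach(println)
    finally spark.stop()
  }
}
```

To get a feel for the actual cost on real data, one option is to run each conversion followed by an action such as `count()` and compare the jobs and stage timings shown in the Spark UI.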