If I understand correctly, in the second case the DataFrame is created as a local object: it lives in the driver's memory and is serialized as part of each Task that gets sent to the executors.
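To make the surprise concrete, here is a minimal, self-contained sketch (not real Spark code) of the pattern involved: Spark's `spark.implicits._` brings in an implicit conversion that silently gives any local `Seq` a `toDF()` method, so the resulting frame is backed by driver-local data. The `LocalFrame` and `SeqOps` names below are hypothetical stand-ins for illustration only.

```scala
// Hypothetical simplified mimic of the spark.implicits pattern.
// A stand-in for a DataFrame backed entirely by driver-local rows.
case class LocalFrame[T](rows: Seq[T])

object Implicits {
  // The implicit conversion: any local Seq silently gains toDF(),
  // much like Spark's implicit Seq-to-Dataset conversion.
  implicit class SeqOps[T](private val s: Seq[T]) extends AnyVal {
    def toDF(): LocalFrame[T] = LocalFrame(s)
  }
}

object Demo {
  import Implicits._
  def main(args: Array[String]): Unit = {
    // The rows live in driver memory; in real Spark such local data
    // would be serialized into each task sent to the executors.
    val df = Seq(1, 2, 3).toDF()
    println(df.rows.sum) // 6
  }
}
```

Because the conversion is implicit, nothing at the call site signals that `toDF()` builds a driver-local object rather than a distributed one, which is exactly the kind of misunderstanding described above.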
That said, I think the implicit conversion here is something others could misunderstand as well. Perhaps it should not be part of spark.implicits? Or at least a note or warning could be added to the developer guides.