I think I understand that in the second case the DataFrame is created as a
Local object, so it lives in the memory of the driver and is serialized as
part of the Task that gets sent to each executor.
Though I think the implicit conversion here is something that others could
also misunderstand.
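For anyone following along, the difference shows up directly in the logical plans. A minimal sketch (assuming a local SparkSession; the object and app names are just placeholders):

```
import org.apache.spark.sql.SparkSession

object PlanDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("toDF-plan-demo")
      .getOrCreate()
    import spark.implicits._

    // The Range is converted via the localSeqToDatasetHolder implicit:
    // the data becomes a LocalRelation, materialized in the driver and
    // shipped along with the plan.
    val local = (0 until 10).toDF("value")
    local.explain(true) // logical plan contains a LocalRelation

    // Distributing the sequence first makes the plan reference an RDD
    // instead, so tasks only carry partition metadata.
    val distributed = spark.sparkContext.parallelize(0 until 10).toDF("value")
    distributed.explain(true) // plan shows SerializeFromObject over an ExternalRDD

    spark.stop()
  }
}
```

Comparing the two `explain(true)` outputs makes it easy to see which variant embeds the data in the driver-side plan.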
Hi,

Using Scala, Spark version 2.3.0 (also 2.2.0): I've come across two main
ways to create a DataFrame from a sequence. The more common:

sc.parallelize(0 until 10).toDF("value")   *good*
and the less common (but still prevalent):

(0 until 10).toDF("value")   *bad*

The latter results in much worse performance