Re: Creating DataFrame with the implicit localSeqToDatasetHolder has bad performance

2018-03-12 Thread msinton
I think I understand that in the second case the DataFrame is created as a local object, so it lives in the driver's memory and is serialized as part of the task that gets sent to each executor. Still, I think the implicit conversion here is something that others could also misunderstand -

Creating DataFrame with the implicit localSeqToDatasetHolder has bad performance

2018-03-12 Thread msinton
Hi,

Using Scala, Spark version 2.3.0 (also 2.2.0):

I've come across two main ways to create a DataFrame from a sequence. The more common:

(0 until 10).toDF("value") *good*

and the less common (but still prevalent):

(0 until 10).toDF("value") *bad*

The latter results in much worse performance
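
A minimal sketch of the distinction discussed in this thread, under stated assumptions. It assumes the faster variant is the one built from an RDD via `spark.sparkContext.parallelize`, which the thread as archived does not confirm. The object name `LocalSeqVsRdd`, the app name, and the `local[*]` master are illustrative only. Calling `toDF` directly on a local `Seq` goes through the implicit `localSeqToDatasetHolder`, while calling it on an RDD goes through `rddToDatasetHolder`.

```scala
import org.apache.spark.sql.SparkSession

object LocalSeqVsRdd {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("local-seq-vs-rdd")
      .getOrCreate()
    import spark.implicits._

    // Local Seq -> DataFrame via the implicit localSeqToDatasetHolder.
    // The data is held on the driver as a local relation.
    val fromSeq = (0 until 10).toDF("value")

    // RDD -> DataFrame via the implicit rddToDatasetHolder.
    // The data is distributed across partitions before conversion.
    // Assumption: this is the "good" variant the thread refers to.
    val fromRdd = spark.sparkContext.parallelize(0 until 10).toDF("value")

    // Compare the physical plans to see how each DataFrame is backed.
    fromSeq.explain()
    fromRdd.explain()

    spark.stop()
  }
}
```

Comparing the two `explain()` outputs is one way to check whether a given DataFrame is backed by a driver-local relation or by an RDD scan. Any performance difference will depend on the data size and on the operations applied afterwards.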