... toLocalIterator() to traverse the dataset instead of collecting it
to the driver, but toLocalIterator() will create a lot of jobs and will take
a lot of time.
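
The trade-off described above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the local master setting, the app name, the `id` column, and the 4-partition layout are all assumptions made for the example. It contrasts `collect()`, which materializes every row on the driver as an `Array[Row]`, with `toLocalIterator()`, which pulls partitions to the driver one at a time.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object LocalIteratorSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("toLocalIterator-sketch")
      .getOrCreate()

    // Hypothetical data: 1,000 rows spread over 4 partitions.
    val df = spark.range(0, 1000).toDF("id").repartition(4)

    // collect() brings every row to the driver at once as Array[Row].
    val all: Array[Row] = df.collect()
    println(s"collect() returned ${all.length} rows")

    // toLocalIterator() returns a java.util.Iterator[Row]; partitions are
    // fetched one at a time, so the driver needs memory for roughly the
    // largest partition rather than for the whole dataset.
    val it = df.toLocalIterator()
    var sum = 0L
    while (it.hasNext) {
      sum += it.next().getLong(0)
    }
    println(s"sum of ids = $sum")

    spark.stop()
  }
}
```

Because partitions are fetched separately, this approach can launch several Spark jobs, which matches the slowdown mentioned above. If the dataset comes from an expensive computation, caching it before iterating may avoid recomputing it for each fetch, at the cost of executor memory.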

Best regards,
maqy

From: Michael Artz
Sent: April 22, 2020 16:09
To: maqy
Cc: user@spark.apache.org
Subject: Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?

What would you do with it once you get it into driver in a Dataset[Row]?
Sent from my iPhone
On Apr 22, 2020, at 3:06 AM, maqy <454618...@qq.com> wrote:

When the data is stored in Dataset[Row] format, its memory usage is very
small.
When I use collect() to bring the data to the driver, each row of the dataset
is converted to a Row object and stored in an array, which causes a large
memory overhead.
So, can I collect Dataset[Row] to