Re: Can I collect Dataset[Row] to driver without converting it to Array [Row]?

2020-04-22 Thread Tang Jinxin
ator() will create a lot of jobs and take a lot of time. > Best regards, > maqy > From: Michael Artz > Sent: April 22, 2020, 16:09 > To: maqy > Cc: user@spark.apache.org > Subject: Re: Can I collect Dataset[Row] to driver with

Re: Can I collect Dataset[Row] to driver without converting it to Array [Row]?

2020-04-22 Thread Andrew Melo
terator() to traverse the dataset instead of collecting it to the driver, but toLocalIterator() will create a lot of jobs and take a lot of time. > Best regards, > maqy > From: Michael Artz > Sent: April 22, 2020, 16:09 > To: maqy

Re: Can I collect Dataset[Row] to driver without converting it to Array [Row]?

2020-04-22 Thread maqy
From: Michael Artz Sent: April 22, 2020, 16:09 To: maqy Cc: user@spark.apache.org Subject: Re: Can I collect Dataset[Row] to driver without converting it to Array [Row]? What would you do with it once you get it onto the driver as a Dataset[Row]? Sent from my iPhone On Apr 22, 2020, at 3:06 AM, maqy <454

Re: Can I collect Dataset[Row] to driver without converting it to Array [Row]?

2020-04-22 Thread Michael Artz
What would you do with it once you get it onto the driver as a Dataset[Row]? Sent from my iPhone > On Apr 22, 2020, at 3:06 AM, maqy <454618...@qq.com> wrote: > When the data is stored in the Dataset[Row] format, the memory usage is very small. > When I use collect() to collect data to

Can I collect Dataset[Row] to driver without converting it to Array [Row]?

2020-04-22 Thread maqy
When the data is stored in the Dataset[Row] format, the memory usage is very small. When I use collect() to collect the data to the driver, each row of the dataset is converted to a Row and stored in an array, which adds significant memory overhead. So, can I collect Dataset[Row] to
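
For readers following the thread, the two approaches it contrasts, collect() and toLocalIterator(), can be sketched roughly as below. This is only an illustration, not code from any of the posters: the dataset built with spark.range, the column name "id", the local[2] master and the object name are all assumptions made up for the example. Persisting before iterating is one common way to reduce the cost of the per-partition jobs that toLocalIterator() launches; whether it helps depends on the workload.

```scala
import org.apache.spark.sql.{Dataset, Row, SparkSession}
import org.apache.spark.storage.StorageLevel

// Hypothetical example object; not taken from the thread.
object CollectVsLocalIterator {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .appName("collect-vs-toLocalIterator")
      .master("local[2]")
      .getOrCreate()

    // Stand-in for the poster's dataset: 1000 rows in 4 partitions.
    val ds: Dataset[Row] = spark.range(0L, 1000L).toDF("id").repartition(4)

    // collect() materializes every row on the driver as one Array[Row].
    val all: Array[Row] = ds.collect()
    println(s"collect() returned ${all.length} rows")

    // Cache first so the jobs started by toLocalIterator() read cached
    // partitions instead of recomputing the lineage each time.
    ds.persist(StorageLevel.MEMORY_AND_DISK)
    ds.count() // forces the cache to be populated

    // toLocalIterator() brings partitions to the driver as they are
    // consumed, so the driver needs room for roughly the largest
    // partition rather than the whole dataset.
    val it: java.util.Iterator[Row] = ds.toLocalIterator()
    var sum = 0L
    while (it.hasNext) {
      sum += it.next().getLong(0)
    }
    println(s"sum of ids = $sum")

    ds.unpersist()
    spark.stop()
  }
}
```

Either way the rows arrive on the driver as Row objects. A Dataset[Row] is a distributed, lazily evaluated plan rather than a local container, so the choice shown here is between holding all rows at once and holding about one partition at a time.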