Just to provide more details: I have 200 blocks (parquet files) with an average block size of 70MB. Limiting the result set to 100k rows ("select * from tbl order by c1 limit 100000") works, but when I increase it to, say, 1M I keep running into this error: "Connection reset by peer: socket write error".
I would ultimately want to store the result set as parquet. Are there any other options to handle this? Thanks!

On Wed, Apr 27, 2016 at 11:10 AM, Buntu Dev <buntu...@gmail.com> wrote:

> I got 14GB of parquet data and when trying to apply order by using spark
> sql and save the first 1M rows but keeps failing with "Connection reset
> by peer: socket write error" on the executors.
>
> I've allocated about 10g to both driver and the executors along with
> setting the maxResultSize to 10g but still fails with the same error. I'm
> using Spark 1.5.1.
>
> Are there any other alternative ways to handle this?
>
> Thanks!
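Since the goal is a parquet output anyway, one option worth trying is to write the limited result directly from the executors instead of pulling rows back through the driver (which is what spark.driver.maxResultSize guards, and where a large collect can trip socket errors). A minimal sketch against the Spark 1.5 DataFrame API, assuming the table is registered as "tbl", the sort column is "c1", and an existing SparkContext `sc`; the output path is hypothetical:

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc: the existing SparkContext

// Run the same query, but keep the result as a DataFrame on the cluster.
val top1M = sqlContext.sql("select * from tbl order by c1 limit 1000000")

// Write straight to Parquet from the executors -- no collect() to the driver,
// so maxResultSize and the driver socket are never involved.
top1M.write.parquet("hdfs:///path/to/output")  // hypothetical output path
```

Note that in this Spark version a large LIMIT after ORDER BY still shuffles the limited rows into a single partition, so the write may be slow, but it should avoid funneling 1M rows through the driver connection.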