Just to provide more details: I have 200 blocks (parquet files) with an average block size of 70MB. Limiting the result set to 100k rows ("select * from tbl order by c1 limit 100000") works, but when I increase it to, say, 1M I keep running into this error: "Connection reset by peer: socket write error".
I would ultimately want to store the result set as parquet. Are there any other options to handle this? Thanks!

On Wed, Apr 27, 2016 at 11:10 AM, Buntu Dev <buntu...@gmail.com> wrote:

> I got 14GB of parquet data and when trying to apply order by using spark
> sql and save the first 1M rows but keeps failing with "Connection reset
> by peer: socket write error" on the executors.
>
> I've allocated about 10g to both driver and the executors along with
> setting the maxResultSize to 10g but still fails with the same error. I'm
> using Spark 1.5.1.
>
> Are there any other alternative ways to handle this?
>
> Thanks!
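Since the goal is a parquet output anyway, one option worth trying is to write the limited result directly from the executors instead of pulling rows back through the driver (which is what spark.driver.maxResultSize guards, and where a large collect can trip socket errors). A minimal sketch against the Spark 1.5 DataFrame API, assuming the table is registered as "tbl", the sort column is "c1", and an existing SparkContext `sc`; the output path is hypothetical:

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc: the existing SparkContext

// Run the same query, but keep the result as a DataFrame on the cluster.
val top1M = sqlContext.sql("select * from tbl order by c1 limit 1000000")

// Write straight to Parquet from the executors -- no collect() to the driver,
// so maxResultSize and the driver socket are never involved.
top1M.write.parquet("hdfs:///path/to/output")  // hypothetical output path
```

Note that in this Spark version a large LIMIT after ORDER BY still shuffles the limited rows into a single partition, so the write may be slow, but it should avoid funneling 1M rows through the driver connection.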