Re: Dataframe fails for large resultsize
Thanks Krishna, but I believe the memory consumed on the executors is being exhausted in my case. I've allocated the maximum of 10g that I can to both the driver and the executors. Are there any alternative solutions for fetching the top 1M rows after ordering the dataset?

Thanks!

On Fri, Apr 29, 2016 at 6:01 PM, Krishna wrote:
> I recently encountered similar network-related errors and was able to fix them by applying the ethtool updates described here:
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-5085
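One alternative to a global sort followed by a limit is a top-N selection: each partition keeps only its own top N rows, and only those partial results are merged, so no task ever materializes the whole sorted dataset. This is the idea behind RDD.takeOrdered (available in Spark 1.5), though at 1M rows the merged result must still fit on the driver. The per-partition reduce-then-merge idea, sketched in plain Python with hypothetical in-memory "partitions" standing in for distributed blocks:

```python
import heapq

def top_n(partitions, n):
    """Top-N without a global sort: reduce each partition to its own
    top n elements, then merge only the partial results."""
    partial = [heapq.nlargest(n, part) for part in partitions]
    return heapq.nlargest(n, (x for part in partial for x in part))

# Hypothetical partitions for illustration.
parts = [[5, 1, 9], [7, 3, 8], [2, 6, 4]]
print(top_n(parts, 3))  # -> [9, 8, 7]
```

Each partition contributes at most n candidates, so the merge step sees at most n * num_partitions elements rather than the full dataset.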
Re: Dataframe fails for large resultsize
I recently encountered similar network-related errors and was able to fix them by applying the ethtool updates described here:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-5085

On Friday, April 29, 2016, Buntu Dev wrote:
> Just to provide more details: I have 200 blocks (parquet files) with an average block size of 70M. Limiting the result set to 100k works, but when I increase it to, say, 1M I keep running into "Connection reset by peer: socket write error". I would ultimately want to store the result set as parquet. Are there any other options for handling this?
Re: Dataframe fails for large resultsize
Just to provide more details: I have 200 blocks (parquet files) with an average block size of 70M. Limiting the result set to 100k rows ("select * from tbl order by c1 limit 100000") works, but when I increase it to, say, 1M I keep running into this error:

Connection reset by peer: socket write error

I would ultimately want to store the result set as parquet. Are there any other options for handling this?

Thanks!

On Wed, Apr 27, 2016 at 11:10 AM, Buntu Dev wrote:
> I have 14GB of parquet data, and when I try to apply an order by using Spark SQL and save the first 1M rows, it keeps failing with "Connection reset by peer: socket write error" on the executors. I've allocated about 10g to both the driver and the executors, along with setting maxResultSize to 10g, but it still fails with the same error. I'm using Spark 1.5.1.
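Since the goal is to store the result as parquet rather than bring it to the driver, one option is to keep the whole pipeline distributed: run the ordered/limited query as a DataFrame and write it out directly, so the 1M rows are never collected through the driver. A minimal sketch against the Spark 1.5 API, assuming a SQLContext named sqlContext and reusing the table/column names from the query above (the output path is hypothetical):

```python
# Sketch only: requires a running Spark 1.5 cluster with the table
# "tbl" registered; the output path is an assumption for illustration.
df = sqlContext.sql("select * from tbl order by c1 limit 1000000")

# write.parquet keeps the work on the executors; nothing is collected
# to the driver, so spark.driver.maxResultSize is not involved.
df.write.parquet("/output/top_1m.parquet")
```

This avoids the collect-style path that maxResultSize guards, at the cost of still paying for the distributed sort.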
Dataframe fails for large resultsize
I have 14GB of parquet data. When I try to apply an order by using Spark SQL and save the first 1M rows, it keeps failing with "Connection reset by peer: socket write error" on the executors.

I've allocated about 10g to both the driver and the executors, along with setting maxResultSize to 10g, but it still fails with the same error. I'm using Spark 1.5.1.

Are there any alternative ways to handle this?

Thanks!
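For reference, the memory and result-size settings described above are passed at submit time. A hypothetical spark-submit invocation with the values mentioned in the thread (these are the thread's settings, not recommendations; the script name is a placeholder):

```shell
# Config sketch: driver/executor memory and result-size limit
# as described in the thread, applied via spark-submit.
spark-submit \
  --driver-memory 10g \
  --executor-memory 10g \
  --conf spark.driver.maxResultSize=10g \
  my_job.py
```

spark.driver.maxResultSize only caps results serialized back to the driver (e.g. collect); it does not help when executor memory itself is exhausted during the sort.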