Re: Dataframe fails for large resultsize

2016-04-29 Thread Buntu Dev
Thanks Krishna, but I believe the memory consumed on the executors is being exhausted in my case. I've allocated the max 10g that I can to both the driver and the executors. Are there any alternative solutions for fetching the top 1M rows after ordering the dataset? Thanks!
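
[A minimal sketch of one such alternative, assuming a Spark 1.x SQLContext named sqlContext, a registered table "tbl" with column c1 as in this thread's query, and a hypothetical output path: writing the ordered top rows straight to distributed storage keeps them on the executors instead of funneling them through the driver.]

    // Compute the top 1M rows by c1 and persist them without collect().
    val top = sqlContext.sql("SELECT * FROM tbl ORDER BY c1 LIMIT 1000000")
    // write.parquet runs on the executors, so no single JVM has to hold
    // the full 1M-row result or push it over one socket to the driver.
    top.write.parquet("hdfs:///tmp/top1m") // output path is illustrative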

Re: Dataframe fails for large resultsize

2016-04-29 Thread Krishna
I recently encountered similar network-related errors and was able to fix them by applying the ethtool updates described here: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-5085

Re: Dataframe fails for large resultsize

2016-04-29 Thread Buntu Dev
Just to provide more details, I have 200 blocks (parquet files) with an average block size of 70M. Limiting the result set to 100k ("select * from tbl order by c1 limit 100000") works, but increasing it to, say, 1M I keep running into this error: Connection reset by peer: socket write error
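
[A hedged sketch of one thing to try with that layout, not taken from the thread: repartitioning before the sort spreads the 200 input blocks over more tasks, lowering per-task memory pressure. It assumes the same sqlContext as above; the input path and the partition count 400 are illustrative assumptions.]

    // Load the ~200 parquet blocks and spread them over more partitions
    // before registering the table that the ORDER BY query reads.
    val df = sqlContext.read.parquet("hdfs:///path/to/tbl") // hypothetical path
    df.repartition(400).registerTempTable("tbl")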

Dataframe fails for large resultsize

2016-04-27 Thread Buntu Dev
I have 14GB of parquet data, and when I try to apply an order by using Spark SQL and save the first 1M rows, it keeps failing with "Connection reset by peer: socket write error" on the executors. I've allocated about 10g to both the driver and the executors, along with setting maxResultSize to 10g.
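
[A minimal sketch of that configuration via SparkConf; the equivalent spark-submit flags also work, and the 10g values are the ones from this message. The app name is illustrative. Note that in client mode spark.driver.memory must be set before the driver JVM starts, so it is usually passed on the command line rather than in code.]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val conf = new SparkConf()
      .setAppName("top-1m")                     // name is illustrative
      .set("spark.executor.memory", "10g")
      .set("spark.driver.maxResultSize", "10g") // cap on results pulled to the driver
    // spark.driver.memory (10g here) only takes effect from SparkConf in
    // cluster mode; in client mode pass --driver-memory 10g to spark-submit.
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)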