I recently encountered similar network-related errors and was able to fix them by applying the ethtool updates described here: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-5085
On Friday, April 29, 2016, Buntu Dev <buntu...@gmail.com> wrote:

> Just to provide more details, I have 200 blocks (parquet files) with avg
> block size of 70M. Limiting the result set to 100k ("select * from tbl
> order by c1 limit 100000") works, but when increasing it to say 1M I keep
> running into this error:
>
> Connection reset by peer: socket write error
>
> I would ultimately want to store the result set as parquet. Are there any
> other options to handle this?
>
> Thanks!
>
> On Wed, Apr 27, 2016 at 11:10 AM, Buntu Dev <buntu...@gmail.com> wrote:
>
>> I've got 14GB of parquet data, and when trying to apply order by using
>> Spark SQL and save the first 1M rows it keeps failing with "Connection
>> reset by peer: socket write error" on the executors.
>>
>> I've allocated about 10g to both the driver and the executors, along
>> with setting maxResultSize to 10g, but it still fails with the same
>> error. I'm using Spark 1.5.1.
>>
>> Are there any other alternative ways to handle this?
>>
>> Thanks!