Re: Dataframe fails for large resultsize

2016-04-29 Thread Buntu Dev
Thanks Krishna, but I believe it is the memory on the executors that is being
exhausted in my case. I've already allocated the max 10g that I can to both the
driver and the executors. Are there any alternative solutions for fetching the
top 1M rows after ordering the dataset?
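For reference, what I am ultimately after is roughly the following (just a
sketch; the ordering column c1 and the paths are placeholders), i.e. ordering
the dataset and writing only the first 1M rows back out as parquet rather than
collecting them to the driver:

    // Sketch only (Spark 1.5-era API); column name and paths are placeholders.
    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)   // sc: existing SparkContext, e.g. in spark-shell
    val df = sqlContext.read.parquet("/path/to/input")

    // orderBy + limit + write keeps the result on the executors; nothing is
    // collected to the driver, so spark.driver.maxResultSize should not apply.
    df.orderBy("c1")
      .limit(1000000)
      .write
      .parquet("/path/to/output")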

Thanks!

On Fri, Apr 29, 2016 at 6:01 PM, Krishna  wrote:

> I recently encountered similar network-related errors and was able to fix
> them by applying the ethtool updates described here [
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-5085]
>
>
> On Friday, April 29, 2016, Buntu Dev  wrote:
>
>> Just to provide more details: I have 200 blocks (parquet files) with an avg
>> block size of 70M. Limiting the result set to 100k ("select * from tbl
>> order by c1 limit 100000") works, but when I increase it to, say, 1M I keep
>> running into this error:
>>  Connection reset by peer: socket write error
>>
>> I would ultimately want to store the result set as parquet. Are there any
>> other options to handle this?
>>
>> Thanks!
>>
>> On Wed, Apr 27, 2016 at 11:10 AM, Buntu Dev  wrote:
>>
>>> I have 14GB of parquet data and am trying to apply an order by using Spark
>>> SQL and save the first 1M rows, but it keeps failing with "Connection reset
>>> by peer: socket write error" on the executors.
>>>
>>> I've allocated about 10g to both the driver and the executors, along with
>>> setting maxResultSize to 10g, but it still fails with the same error.
>>> I'm using Spark 1.5.1.
>>>
>>> Are there any alternative ways to handle this?
>>>
>>> Thanks!
>>>
>>
>>


Re: Dataframe fails for large resultsize

2016-04-29 Thread Krishna
I recently encountered similar network-related errors and was able to fix
them by applying the ethtool updates described here [
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-5085]

On Friday, April 29, 2016, Buntu Dev  wrote:

> Just to provide more details: I have 200 blocks (parquet files) with an avg
> block size of 70M. Limiting the result set to 100k ("select * from tbl
> order by c1 limit 100000") works, but when I increase it to, say, 1M I keep
> running into this error:
>  Connection reset by peer: socket write error
>
> I would ultimately want to store the result set as parquet. Are there any
> other options to handle this?
>
> Thanks!
>
> On Wed, Apr 27, 2016 at 11:10 AM, Buntu Dev  wrote:
>
>> I have 14GB of parquet data and am trying to apply an order by using Spark
>> SQL and save the first 1M rows, but it keeps failing with "Connection reset
>> by peer: socket write error" on the executors.
>>
>> I've allocated about 10g to both the driver and the executors, along with
>> setting maxResultSize to 10g, but it still fails with the same error.
>> I'm using Spark 1.5.1.
>>
>> Are there any alternative ways to handle this?
>>
>> Thanks!
>>
>
>


Re: Dataframe fails for large resultsize

2016-04-29 Thread Buntu Dev
Just to provide more details: I have 200 blocks (parquet files) with an avg
block size of 70M. Limiting the result set to 100k ("select * from tbl
order by c1 limit 100000") works, but when I increase it to, say, 1M I keep
running into this error:
 Connection reset by peer: socket write error

I would ultimately want to store the result set as parquet. Are there any
other options to handle this?
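For context, this is essentially what I am running (a rough sketch only; the
input and output paths are placeholders, and I am assuming the parquet files
are registered as a temp table named tbl):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)   // sc: existing SparkContext
    sqlContext.read.parquet("/path/to/parquet").registerTempTable("tbl")

    // The same query as above, with the result written back out as parquet
    // instead of being collected.
    sqlContext.sql("select * from tbl order by c1 limit 1000000")
      .write
      .parquet("/path/to/output")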

Thanks!

On Wed, Apr 27, 2016 at 11:10 AM, Buntu Dev  wrote:

> I have 14GB of parquet data and am trying to apply an order by using Spark
> SQL and save the first 1M rows, but it keeps failing with "Connection reset
> by peer: socket write error" on the executors.
>
> I've allocated about 10g to both the driver and the executors, along with
> setting maxResultSize to 10g, but it still fails with the same error. I'm
> using Spark 1.5.1.
>
> Are there any alternative ways to handle this?
>
> Thanks!
>


Dataframe fails for large resultsize

2016-04-27 Thread Buntu Dev
I have 14GB of parquet data and am trying to apply an order by using Spark
SQL and save the first 1M rows, but it keeps failing with "Connection reset by
peer: socket write error" on the executors.

I've allocated about 10g to both the driver and the executors, along with
setting maxResultSize to 10g, but it still fails with the same error. I'm
using Spark 1.5.1.
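For reference, this is roughly how those values are being set (a sketch only;
whether they go through spark-submit flags or SparkConf, and the app name, are
assumptions):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("parquet-top-1m")          // placeholder name
      .set("spark.executor.memory", "10g")
      .set("spark.driver.memory", "10g")     // only effective if set before the driver JVM starts
      .set("spark.driver.maxResultSize", "10g")

    val sc = new SparkContext(conf)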

Are there any alternative ways to handle this?

Thanks!