Hi,

Can you test on a smaller dataset to identify whether it is a cluster issue or a scaling issue in Spark?

On 28 Apr 2015 11:30, "Ulanov, Alexander" <alexander.ula...@hp.com> wrote:
> Hi,
>
> I am running a GROUP BY on a dataset of 2B rows, RDD[Row[id, time,
> value]], in Spark 1.3, as follows:
>
> “select id, time, first(value) from data group by id, time”
>
> My cluster has 8 nodes with 16GB RAM each and one worker per node. Each
> executor is allocated 5GB of memory. However, all executors are lost
> during the query execution and I get “ExecutorLostFailure”.
>
> Could you suggest what might be the reason? Could it be that “group by”
> is implemented as RDD.groupBy, so it holds the group-by results in
> memory? What is the workaround?
>
> Best regards,
> Alexander
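
For reference, a minimal sketch of one possible workaround, assuming the data can be keyed as a pair RDD (the name `data` and the Long/Double column types are placeholders for the real schema): reduceByKey combines values map-side before the shuffle and never materializes a whole group in memory the way RDD.groupBy can, so it mimics first(value) per (id, time) without per-group buffering:

    import org.apache.spark.SparkContext._ // pair-RDD implicits (pre-1.3 style; still works in 1.3)
    import org.apache.spark.rdd.RDD

    // Hypothetical schema: id and time as Long, value as Double.
    def firstValuePerGroup(data: RDD[(Long, Long, Double)]): RDD[((Long, Long), Double)] =
      data
        .map { case (id, time, value) => ((id, time), value) } // key by (id, time)
        .reduceByKey((v, _) => v) // keep one value per key; combining happens map-side

Since SQL first() is non-deterministic over unordered data anyway, keeping an arbitrary value per key is consistent with the original query.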