Re: Scalability of group by

2015-04-28 Thread Richard Marscher
guha [mailto:guha.a...@gmail.com] *Sent:* Monday, April 27, 2015 6:58 PM *To:* Ulanov, Alexander *Cc:* user@spark.apache.org *Subject:* Re: Scalability of group by Hi Can you test on a smaller dataset to identify if it is cluster issue or scaling issue in spark On 28 Apr 2015 11:30

RE: Scalability of group by

2015-04-28 Thread Ulanov, Alexander
@spark.apache.org Subject: Re: Scalability of group by Hi, I can offer a few ideas to investigate in regards to your issue here. I've run into resource issues doing shuffle operations with a much smaller dataset than 2B. The data is going to be saved to disk by the BlockManager as part

RE: Scalability of group by

2015-04-27 Thread Ulanov, Alexander
@spark.apache.org Subject: Re: Scalability of group by Hi Can you test on a smaller dataset to identify if it is cluster issue or scaling issue in spark On 28 Apr 2015 11:30, Ulanov, Alexander alexander.ula...@hp.com wrote: Hi, I am running a group by on a dataset of 2B

Re: Scalability of group by

2015-04-27 Thread ayan guha
Hi Can you test on a smaller dataset to identify if it is cluster issue or scaling issue in spark On 28 Apr 2015 11:30, Ulanov, Alexander alexander.ula...@hp.com wrote: Hi, I am running a group by on a dataset of 2B of RDD[Row [id, time, value]] in Spark 1.3 as follows: “select id,