*From:* guha [mailto:guha.a...@gmail.com]
*Sent:* Monday, April 27, 2015 6:58 PM
*To:* Ulanov, Alexander
*Cc:* user@spark.apache.org
*Subject:* Re: Scalability of group by
Hi,
Can you test on a smaller dataset to identify whether it is a cluster issue or
a scaling issue in Spark?
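The smaller-dataset test can be made systematic: run the same group by on growing samples and watch how the cost per row changes. Below is a minimal single-machine sketch in Python. It assumes plain (id, time, value) tuples in place of the Spark RDD, and `group_by_id` and `scaling_probe` are hypothetical helpers. The original query is cut off after "select id,", so grouping on id alone is also an assumption. On the cluster itself one would draw the samples with Spark's own `RDD.sample` instead.

```python
import random
import time
from collections import defaultdict


def group_by_id(rows):
    """Toy stand-in for the group by: collect values per id.

    Hypothetical helper. The original query is truncated after "select id,",
    so grouping on the first field (id) alone is an assumption.
    """
    groups = defaultdict(list)
    for row_id, _ts, value in rows:
        groups[row_id].append(value)
    return groups


def scaling_probe(rows, fractions, job=group_by_id, seed=0):
    """Run `job` on random samples of `rows`; return (n, seconds, seconds per row).

    Roughly flat seconds-per-row as n grows suggests the job itself scales;
    a sharp jump at some size may point to a resource limit (memory, disk,
    network) rather than to the computation.
    """
    rng = random.Random(seed)
    results = []
    for fraction in sorted(fractions):
        n = max(1, int(len(rows) * fraction))
        sample = rng.sample(rows, n)  # sample without replacement
        start = time.perf_counter()
        job(sample)
        elapsed = time.perf_counter() - start
        results.append((n, elapsed, elapsed / n))
    return results


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic (id, time, value) rows; the sizes are illustrative only.
    data = [(data_rng.randrange(1000), data_rng.randrange(100), data_rng.random())
            for _ in range(200_000)]
    for n, secs, per_row in scaling_probe(data, [0.01, 0.1, 0.5, 1.0]):
        print(f"{n:>8} rows  {secs:.4f} s  {per_row * 1e9:.1f} ns/row")
```

If the per-row cost stays flat on one machine but the full job fails on the cluster, that points more toward the cluster's resources than toward the query.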
On 28 Apr 2015 11:30
Subject: Re: Scalability of group by
Hi,
I can offer a few ideas to investigate regarding your issue. I've run
into resource issues doing shuffle operations on a much smaller dataset than
2B. The data is going to be saved to disk by the BlockManager as part
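Shuffle data ends up on disk because of how a group by runs. Every row is routed to a partition by a hash of its key, and each partition is then grouped on its own. The toy, in-memory Python model below shows that routing. It is similar in spirit to Spark's HashPartitioner but is not Spark's implementation. `shuffle_write` and `reduce_side_group` are hypothetical names, and the (id, time, value) row layout is taken from the original post. In Spark the buckets become shuffle files on local disk, so a large group by can run into disk-space or I/O limits even when the query itself is fine.

```python
from collections import defaultdict


def shuffle_write(rows, num_partitions, key=lambda row: row[0]):
    """Route each row to bucket hash(key) % num_partitions (map side).

    Hypothetical helper modeling a hash shuffle in memory. In Spark the
    equivalent buckets are shuffle files written to local disk.
    """
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[hash(key(row)) % num_partitions].append(row)
    return buckets


def reduce_side_group(bucket, key=lambda row: row[0]):
    """Group one partition's rows by key after the shuffle (reduce side).

    Collects the value field, assuming rows are (id, time, value) tuples.
    """
    groups = defaultdict(list)
    for row in bucket:
        groups[key(row)].append(row[2])
    return dict(groups)


if __name__ == "__main__":
    # One hot id (0) makes its partition much larger than the others (skew).
    rows = [(0, t, 1.0) for t in range(900)] + [(i, 0, 1.0) for i in range(1, 101)]
    buckets = shuffle_write(rows, num_partitions=4)
    print([len(b) for b in buckets])  # prints [925, 25, 25, 25]
```

Because every row with the same key lands in the same partition, a single heavy key can make one task's shuffle data far larger than the rest.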
On 28 Apr 2015 11:30, Ulanov, Alexander alexander.ula...@hp.com wrote:
Hi,
I am running a group by on a dataset of 2B rows (RDD[Row [id, time, value]])
in Spark 1.3 as follows:
“select id,