Re: single worker vs multiple workers on each machine

2014-09-12 Thread Mayur Rustagi
Another aspect to keep in mind is that a JVM with a heap above 8-10GB starts
to misbehave. It is typically better to split memory up into ~15GB chunks.
If you are choosing machines, 10GB/core is an approximate ratio to maintain.
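
As a rough illustration of that rule of thumb, here is a minimal Python sketch that splits one machine's memory into worker JVMs of at most ~15GB each. The function name plan_workers, the 2GB reserved for the OS, and the default cap are assumptions made for the example, not anything Spark provides.

```python
import math

def plan_workers(machine_mem_gb, machine_cores, max_heap_gb=15, reserved_gb=2):
    """Suggest a worker layout for one machine (hypothetical helper).

    max_heap_gb: assumed per-JVM heap cap (the ~15GB rule of thumb).
    reserved_gb: assumed memory left for the OS and other daemons.
    """
    usable_gb = machine_mem_gb - reserved_gb
    if usable_gb <= 0 or machine_cores < 1:
        raise ValueError("machine too small for even one worker")

    # Enough workers that no single heap exceeds the cap.
    workers = max(1, math.ceil(usable_gb / max_heap_gb))
    # Never more workers than cores; on core-poor machines this
    # can push the per-worker heap back above the cap.
    workers = min(workers, machine_cores)

    return {
        "workers": workers,
        "memory_gb_per_worker": usable_gb // workers,
        "cores_per_worker": machine_cores // workers,
    }

if __name__ == "__main__":
    print(plan_workers(64, 8))
    print(plan_workers(16, 4))
```

In standalone mode the three numbers would roughly correspond to SPARK_WORKER_INSTANCES, SPARK_WORKER_MEMORY and SPARK_WORKER_CORES in conf/spark-env.sh. Treat the output as a starting point to tune against observed GC behaviour, not as a fixed recommendation.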

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi


On Fri, Sep 12, 2014 at 2:59 AM, Sean Owen so...@cloudera.com wrote:

 As I understand, there's generally not an advantage to running many
 executors per machine. Each will already use all the cores, and
 multiple executors just means splitting the available memory instead
 of having one big pool. I think there may be an argument at extremes
 of scale where one JVM with a huge heap might have excessive GC
 pauses, or too many open files, that kind of thing?

 On Thu, Sep 11, 2014 at 8:42 PM, Mike Sam mikesam...@gmail.com wrote:
  Hi There,
 
  I am new to Spark and I was wondering when you have so much memory on each
  machine of the cluster, is it better to run multiple workers with limited
  memory on each machine or is it better to run a single worker with access to
  the majority of the machine memory? If the answer is it depends, would you
  please elaborate?
 
  Thanks,
  Mike





single worker vs multiple workers on each machine

2014-09-11 Thread Mike Sam
Hi There,

I am new to Spark and I was wondering when you have so much memory on each
machine of the cluster, is it better to run multiple workers with limited
memory on each machine or is it better to run a single worker with access
to the majority of the machine memory? If the answer is it depends, would
you please elaborate?

Thanks,
Mike