Re: HW imbalance

2015-01-30 Thread Sandy Ryza
Yup, if you turn off YARN's CPU scheduling then you can run more executors to take advantage of the extra memory on the larger boxes. But then some of the nodes will end up severely oversubscribed from a CPU perspective, so I would definitely recommend against that. On Fri, Jan 30, 2015 at 3:31 AM,
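
For context, the "CPU scheduling" being turned off here usually comes down to which resource calculator YARN uses. A hedged sketch for a CapacityScheduler cluster (property and class names are from stock Hadoop; the Fair Scheduler and the various distro admin UIs expose the same choice differently):

<!-- capacity-scheduler.xml -->
<!-- DefaultResourceCalculator schedules on memory only, i.e. vcore requests are
     effectively ignored ("CPU scheduling off"); DominantResourceCalculator takes
     CPU into account as well. -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
</property>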

Re: HW imbalance

2015-01-30 Thread Michael Segel
Sorry, but I think there’s a disconnect. When you launch a job under YARN on any of the hadoop clusters, the number of mappers/reducers is not set and is dependent on the amount of available resources. So under Ambari, CM, or MapR’s Admin, you should be able to specify the amount of

Re: HW imbalance

2015-01-29 Thread Sandy Ryza
My answer was based off the specs that Antony mentioned: different amounts of memory, but 10 cores on all the boxes. In that case, a single Spark application's homogeneously sized executors won't be able to take advantage of the extra memory on the bigger boxes. Cloudera Manager can certainly
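
For context, a sketch of why the executors are homogeneous: executor memory is a per-application setting on the submit command, so every executor gets the same size wherever it is placed (the figures and the application file below are illustrative, not from the thread):

# Spark 1.x on YARN: --executor-memory applies uniformly to every executor
# in this application; there is no per-node override.
spark-submit \
  --master yarn-client \
  --num-executors 10 \
  --executor-cores 2 \
  --executor-memory 3g \
  my_app.py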

Re: HW imbalance

2015-01-29 Thread Michael Segel
@Sandy, There are two issues. The spark context (executor) and then the cluster under YARN. If you have a box where each yarn job needs 3GB, and your machine has 36GB dedicated as a YARN resource, you can run 12 executors on the single node. If you have a box that has 72GB dedicated to
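
Spelling out the container arithmetic above (a rough sketch that ignores memory overhead and assumes the same 3GB per executor):

36GB of YARN memory / 3GB per executor = 12 executors on the smaller node
72GB of YARN memory / 3GB per executor = 24 executors on the larger node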

Re: HW imbalance

2015-01-28 Thread simon elliston ball
You shouldn’t have any issues with differing nodes on the latest Ambari and Hortonworks. It works fine for mixed hardware and spark on yarn. Simon On Jan 26, 2015, at 4:34 PM, Michael Segel msegel_had...@hotmail.com wrote: If you’re running YARN, then you should be able to mix and match

HW imbalance

2015-01-26 Thread Antony Mayi
Hi, is it possible to mix hosts with (significantly) different specs within a cluster (without wasting the extra resources)? For example, having 10 nodes with 36GB RAM/10CPUs and now trying to add 3 hosts with 128GB/10CPUs - is there a way for spark executors to utilize the extra memory (as my

Re: HW imbalance

2015-01-26 Thread Antony Mayi
Should have said I am running as yarn-client. All I can see is specifying the generic executor memory that is then used in all containers. On Monday, 26 January 2015, 16:48, Charles Feduke charles.fed...@gmail.com wrote: You should look at using Mesos. This should abstract

Re: HW imbalance

2015-01-26 Thread Charles Feduke
You should look at using Mesos. This should abstract away the individual hosts into a pool of resources and make the different physical specifications manageable. I haven't tried configuring Spark Standalone mode to have different specs on different machines but based on spark-env.sh.template: #
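
For reference, the spark-env.sh.template variables being alluded to let each Standalone worker advertise its own capacity, so the configuration can differ per machine (the values below are illustrative only):

# conf/spark-env.sh on a 36GB / 10-core worker
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=32g

# conf/spark-env.sh on a 128GB / 10-core worker
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=120g

Note that spark.executor.memory is still a single per-application value in Standalone mode as well, so this governs what each worker offers rather than giving one application differently sized executors.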

Re: HW imbalance

2015-01-26 Thread Sandy Ryza
Hi Antony, Unfortunately, all executors for any single Spark application must have the same amount of memory. It's possible to configure YARN with different amounts of memory for each host (using yarn.nodemanager.resource.memory-mb), so other apps might be able to take advantage of the extra
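
For reference, a sketch of the per-host setting Sandy names; yarn-site.xml (or the equivalent host config group in Ambari/Cloudera Manager) can carry a different value on each NodeManager. The values below are purely illustrative for the 36GB and 128GB machines:

<!-- yarn-site.xml on a 36GB node (leaving headroom for the OS and daemons) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>32768</value>
</property>

<!-- yarn-site.xml on a 128GB node -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>114688</value>
</property>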