Yup, if you turn off YARN's CPU scheduling then you can run extra executors to
take advantage of the extra memory on the larger boxes. But then some of
the nodes will end up severely oversubscribed from a CPU perspective, so I
would definitely recommend against that.
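For context, whether YARN accounts for CPU at all is governed by the scheduler's resource calculator; a hedged sketch of the relevant capacity-scheduler setting (the property and class names are standard Hadoop configuration, but treat the snippet as illustrative, not as this cluster's actual config):

```xml
<!-- capacity-scheduler.xml: enable CPU-aware scheduling so vcores are
     enforced alongside memory; the default calculator considers memory only -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```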
On Fri, Jan 30, 2015 at 3:31 AM,
Sorry, but I think there’s a disconnect.
When you launch a job under YARN on any of the hadoop clusters, the number of
mappers/reducers is not set and is dependent on the amount of available
resources.
So under Ambari, CM, or MapR’s Admin, you should be able to specify the amount
of
My answer was based on the specs that Antony mentioned: different amounts
of memory, but 10 cores on all the boxes. In that case, a single Spark
application's homogeneously sized executors won't be able to take advantage
of the extra memory on the bigger boxes.
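To illustrate the constraint: executor sizing is set once per application, so the single value has to fit the smallest node. A sketch using the specs from this thread (the spark-submit flags are real; the application class, jar, and figures are assumptions):

```shell
# All executors in one application get the same size, so the request must
# fit on the 36GB nodes -- the extra memory on the 128GB nodes goes unused
# by this application (figures assumed from the thread).
spark-submit \
  --master yarn \
  --executor-memory 30g \
  --executor-cores 10 \
  --class com.example.MyApp myapp.jar   # hypothetical application
```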
Cloudera Manager can certainly
@Sandy,
There are two issues.
The Spark context (executor) and then the cluster under YARN.
If you have a box where each YARN container needs 3GB, and your machine has 36GB
dedicated as a YARN resource, you can run 12 executors on that single node.
If you have a box that has 72GB dedicated to
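The per-node arithmetic above can be sketched directly (executor size and node capacities taken from the example in this message):

```shell
# Executors per node = floor(memory dedicated to YARN / memory per executor),
# so bigger nodes simply host more containers of the same size.
executor_mem_gb=3
echo "36GB node: $(( 36 / executor_mem_gb )) executors"   # 12
echo "72GB node: $(( 72 / executor_mem_gb )) executors"   # 24
```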
You shouldn’t have any issues with differing nodes on the latest Ambari and
Hortonworks. It works fine for mixed hardware and spark on yarn.
Simon
On Jan 26, 2015, at 4:34 PM, Michael Segel msegel_had...@hotmail.com wrote:
If you’re running YARN, then you should be able to mix and match
Hi,
Is it possible to mix hosts with (significantly) different specs within a
cluster (without wasting the extra resources)? For example, having 10 nodes with
36GB RAM/10CPUs, now trying to add 3 hosts with 128GB/10CPUs - is there a way to
utilize the extra memory by spark executors (as my
I should have said I am running in yarn-client mode. All I can see is specifying
the generic executor memory that is then used in all containers.
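The "generic" setting mentioned here is the per-application executor memory; a sketch of the two equivalent ways to set it (the property and flag names are real Spark configuration, the values are assumptions, and there is no per-host variant of these settings):

```shell
# As a spark-submit flag, applied identically to every container:
spark-submit --master yarn --executor-memory 4g \
  --class com.example.MyApp myapp.jar   # hypothetical application

# Or the equivalent line in conf/spark-defaults.conf:
#   spark.executor.memory  4g
```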
On Monday, 26 January 2015, 16:48, Charles Feduke
charles.fed...@gmail.com wrote:
You should look at using Mesos. This should abstract away the individual
hosts into a pool of resources and make the different physical
specifications manageable.
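For reference, pointing a Spark application at a Mesos master instead of YARN is just a different --master URL (the hostname and port below are assumptions; 5050 is the conventional Mesos master port):

```shell
# Mesos offers resources per host, so heterogeneous nodes are represented
# directly in the resource offers rather than by a fixed container size.
spark-submit \
  --master mesos://mesos-master.example.com:5050 \
  --executor-memory 4g \
  --class com.example.MyApp myapp.jar   # hypothetical application
```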
I haven't tried configuring Spark Standalone mode to have different specs
on different machines but based on spark-env.sh.template:
#
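For standalone mode, the template's per-host knobs would go in each machine's own conf/spark-env.sh, so worker sizing can differ per box. A sketch (the variable names are from spark-env.sh.template; the values are assumptions for the 128GB hosts in this thread):

```shell
# conf/spark-env.sh on a 128GB node -- each host reads its own copy,
# so the worker on this machine can offer more memory than the others.
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=120g
```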
Hi Antony,
Unfortunately, all executors for any single Spark application must have the
same amount of memory. It's possible to configure YARN with different
amounts of memory for each host (using
yarn.nodemanager.resource.memory-mb), so other apps might be able to take
advantage of the extra
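The per-host YARN setting mentioned above would go in each node's yarn-site.xml; a sketch (the property name is from the message itself; the value is an assumption for a 128GB host, leaving headroom for the OS):

```xml
<!-- yarn-site.xml on a 128GB node: memory this NodeManager offers to YARN -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>120000</value>
</property>
```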