Re: YARN process with Spark

2016-03-14 Thread Alexander Pivovarov
As of Hadoop 2.5.1 / MapR 4.1.0, the virtual memory checker is disabled while the physical memory checker is enabled by default. Since CentOS/RHEL 6 allocates virtual memory aggressively due to OS behavior, you should disable the virtual memory checker or increase
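The two options mentioned above are NodeManager settings in yarn-site.xml. A sketch of both (the ratio value here is illustrative; 2.1 is the YARN default):

```xml
<!-- Option 1: disable the virtual memory checker entirely -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

<!-- Option 2: keep the checker but raise the allowed vmem/pmem ratio
     (illustrative value; pick one suited to your workload) -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```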

Re: YARN process with Spark

2016-03-14 Thread Steve Loughran
On 11 Mar 2016, at 23:01, Alexander Pivovarov wrote: "Forgot to mention. To avoid unnecessary container termination add the following setting to YARN: yarn.nodemanager.vmem-check-enabled = false" That can kill performance on a shared cluster:

Re: YARN process with Spark

2016-03-11 Thread Alexander Pivovarov
You need to set yarn.scheduler.minimum-allocation-mb=32, otherwise the Spark AM container will run on a dedicated box instead of running together with an executor container on one of the slave boxes. For slaves I use Amazon EC2 r3.2xlarge boxes (61GB / 8 cores), which cost ~$0.10/hour as spot instances
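The scheduler setting above, as a yarn-site.xml fragment (the small minimum allocation lets the lightweight AM container be co-located with an executor rather than claiming a whole box):

```xml
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>32</value>
</property>
```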

Re: YARN process with Spark

2016-03-11 Thread Mich Talebzadeh
Thanks Koert and Alexander. I think the YARN configuration parameters in yarn-site.xml are important. For those I have: yarn.nodemanager.resource.memory-mb (amount of max physical memory, in MB, that can be allocated for YARN containers) = 8192; yarn.nodemanager.vmem-pmem-ratio — ratio
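The two parameters listed above, as a yarn-site.xml fragment using the 8192 MB figure from this message (the ratio value shown is YARN's 2.1 default, included only as a placeholder):

```xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
```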

Re: YARN process with Spark

2016-03-11 Thread Alexander Pivovarov
Forgot to mention. To avoid unnecessary container termination, add the following setting to the YARN configuration: yarn.nodemanager.vmem-check-enabled = false

Re: YARN process with Spark

2016-03-11 Thread Alexander Pivovarov
YARN cores are virtual cores which are used just to calculate available resources; usually memory, not cores, is used to manage YARN resources. Spark executor memory should be ~90% of yarn.scheduler.maximum-allocation-mb (which should be the same as yarn.nodemanager.resource.memory-mb); ~10%
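The ~90%/~10% split described above can be sketched as a small calculation. The helper name and the example container size are illustrative, not from the thread; the 10% remainder is the slice typically left for executor memory overhead:

```python
# Rough executor-memory sizing for a YARN container: give ~90% of the
# container to the executor heap and leave the rest as overhead.
# (Illustrative sketch; names and values are not from the thread.)

def executor_memory_mb(max_allocation_mb, heap_fraction=0.90):
    """Split a YARN container between executor heap and overhead (MB)."""
    heap = int(max_allocation_mb * heap_fraction)
    overhead = max_allocation_mb - heap
    return heap, overhead

heap, overhead = executor_memory_mb(8192)
print(heap, overhead)  # 7372 820
```

So with an 8192 MB container, roughly 7372 MB would go to the executor heap and 820 MB to overhead.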

Re: YARN process with Spark

2016-03-11 Thread Koert Kuipers
You get a Spark executor per YARN container. The Spark executor can have multiple cores, yes; this is configurable. So the number of partitions that can be processed in parallel is num-executors * executor-cores, and for processing a partition the available memory is executor-memory /
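The arithmetic above can be sketched as follows; all figures are illustrative examples, not values from the thread:

```python
# Parallelism and per-partition memory as described above: one Spark
# executor per YARN container, multiple cores per executor.
# (Illustrative values only.)

num_executors = 4
executor_cores = 8
executor_memory_mb = 7372  # heap per executor

# partitions that can be processed in parallel across the cluster
parallel_tasks = num_executors * executor_cores

# memory available while processing one partition (per concurrent task)
memory_per_task_mb = executor_memory_mb // executor_cores

print(parallel_tasks, memory_per_task_mb)  # 32 921
```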

YARN process with Spark

2016-03-11 Thread Mich Talebzadeh
Hi, Can these be clarified please: 1. Can a YARN container use more than one core, and is this configurable? 2. A YARN container is constrained to 8 GB by "yarn.scheduler.maximum-allocation-mb". If a YARN container is a Spark process, will that limit also include the memory Spark