I'm not sure how you are setting these values, though. Where is spark.yarn.executor.memoryOverhead=6144 being set? Environment variables aren't the best way to set configuration, either. Again, have a look at http://spark.apache.org/docs/latest/running-on-yarn.html
... --executor-memory 22g --conf "spark.yarn.executor.memoryOverhead=2g" ...

should do it, off the top of my head. That should reserve 24g from YARN.

On Sat, Jan 17, 2015 at 5:29 AM, Antony Mayi <antonym...@yahoo.com> wrote:
> although this helped to improve it significantly, I still run into this
> problem despite increasing spark.yarn.executor.memoryOverhead vastly:
>
> export SPARK_EXECUTOR_MEMORY=24G
> spark.yarn.executor.memoryOverhead=6144
>
> yet I am getting this:
>
> 2015-01-17 04:47:40,389 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Container [pid=30211,containerID=container_1421451766649_0002_01_115969] is
> running beyond physical memory limits. Current usage: 30.1 GB of 30 GB
> physical memory used; 33.0 GB of 63.0 GB virtual memory used. Killing
> container.
>
> is there anything more I can do?
>
> thanks,
> Antony.
>
> On Monday, 12 January 2015, 8:21, Antony Mayi <antonym...@yahoo.com> wrote:
>
> this seems to have sorted it, awesome, thanks for the great help.
> Antony.
>
> On Sunday, 11 January 2015, 13:02, Sean Owen <so...@cloudera.com> wrote:
>
> I would expect the size of the user/item feature RDDs to grow linearly
> with the rank, of course. They are cached, so that would drive cache
> memory usage on the cluster.
>
> This wouldn't cause executors to fail for running out of memory,
> though. In fact, your error does not show the task failing for lack of
> memory. What it shows is that YARN thinks the task is using a little
> bit more memory than it said it would, and killed it.
>
> This happens sometimes with JVM-based YARN jobs, since a JVM configured
> to use X heap ends up using a bit more than X physical memory if the
> heap reaches max size. So there's a bit of headroom built in and
> controlled by spark.yarn.executor.memoryOverhead
> (http://spark.apache.org/docs/latest/running-on-yarn.html). You can try
> increasing it to a couple of GB.
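The container-size arithmetic in the messages above can be sketched as follows. This is a back-of-envelope helper, not a Spark API; the max(384 MB, 7% of executor memory) default for the overhead matches Spark 1.2-era documentation and may differ in other releases.

```python
# Back-of-envelope YARN container sizing for one Spark executor.
# container_request_mb is a hypothetical helper, not Spark's actual code.

def container_request_mb(executor_memory_mb, overhead_mb=None):
    """Total memory YARN must grant for a single executor container."""
    if overhead_mb is None:
        # Spark 1.2-era default: max(384 MB, 7% of the executor memory)
        overhead_mb = max(384, int(0.07 * executor_memory_mb))
    return executor_memory_mb + overhead_mb

# Sean's suggestion: 22g heap + 2g explicit overhead = 24g requested from YARN
print(container_request_mb(22 * 1024, 2 * 1024))   # 24576 MB = 24 GB

# Antony's failing setup: 24g heap + 6144 MB overhead = exactly the 30 GB
# container limit shown in the ContainersMonitorImpl warning
print(container_request_mb(24 * 1024, 6144))       # 30720 MB = 30 GB
```

The point is that heap plus overhead, not the heap alone, has to fit under the NodeManager's container limit; the JVM's off-heap usage is what pushes a maxed-out heap past its nominal size.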
> On Sun, Jan 11, 2015 at 9:43 AM, Antony Mayi
> <antonym...@yahoo.com.invalid> wrote:
>> the question really is whether it is expected that the memory
>> requirements grow rapidly with the rank... as I would expect memory to
>> be rather an O(1) problem, with a dependency only on the size of the
>> input data.
>>
>> if this is expected, is there any rough formula to determine the
>> required memory based on the ALS input and parameters?
>>
>> thanks,
>> Antony.
>>
>> On Saturday, 10 January 2015, 10:47, Antony Mayi <antonym...@yahoo.com>
>> wrote:
>>
>> the actual case looks like this:
>> * spark 1.1.0 on yarn (cdh 5.2.1)
>> * ~8-10 executors, 36GB phys RAM per host
>> * input RDD is roughly 3GB containing ~150-200M items (and this RDD is
>>   made persistent using .cache())
>> * using pyspark
>>
>> yarn is configured with the limit yarn.nodemanager.resource.memory-mb of
>> 33792 (33GB), spark is set to be:
>> SPARK_EXECUTOR_CORES=6
>> SPARK_EXECUTOR_INSTANCES=9
>> SPARK_EXECUTOR_MEMORY=30G
>>
>> when using a higher rank (above 20) for ALS.trainImplicit, the executor
>> runs out of the yarn limit after some time (~hour) of execution and gets
>> killed:
>>
>> 2015-01-09 17:51:27,130 WARN
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>> Container [pid=27125,containerID=container_1420871936411_0002_01_000023]
>> is running beyond physical memory limits. Current usage: 31.2 GB of 31 GB
>> physical memory used; 34.7 GB of 65.1 GB virtual memory used. Killing
>> container.
>>
>> thanks for any ideas,
>> Antony.
>>
>> On Saturday, 10 January 2015, 10:11, Antony Mayi <antonym...@yahoo.com>
>> wrote:
>>
>> the memory requirements seem to be rapidly growing when using a higher
>> rank... I am unable to get over 20 without running out of memory. is
>> this expected?
>> thanks,
>> Antony.
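Sean's observation that the user/item feature RDDs grow linearly with rank gives at least a rough lower-bound formula for the question asked above. This hypothetical sketch counts only the raw factor doubles and ignores JVM object headers, RDD bookkeeping, and serialization overhead, so real cached usage will be noticeably higher.

```python
# Rough lower bound on cached ALS factor memory: each user and each item
# gets a dense factor vector of `rank` doubles (8 bytes each).
# als_factor_bytes is a hypothetical helper, not a Spark/MLlib API.

def als_factor_bytes(num_users, num_items, rank):
    """Bytes of raw factor data held in the user/item feature RDDs."""
    return (num_users + num_items) * rank * 8

GB = 1024 ** 3

# With on the order of 150M users+items (as in the thread), doubling the
# rank doubles the factor memory -- linear in rank, not O(1):
print(als_factor_bytes(150_000_000, 0, 20) / GB)  # ~tens of GB cluster-wide
print(als_factor_bytes(150_000_000, 0, 40) / GB)  # twice the rank-20 figure
```

This is cluster-wide cache pressure, not a per-executor heap requirement, but it explains why raising the rank past ~20 on this input size noticeably tightens memory.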
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org