Hi,

I finally solved the problem by setting spark.yarn.executor.memoryOverhead,
passing --conf "spark.yarn.executor.memoryOverhead=xxxx" to spark-submit, as
pointed out in
http://stackoverflow.com/questions/28404714/yarn-why-doesnt-task-go-out-of-heap-space-but-container-gets-killed
and https://issues.apache.org/jira/browse/SPARK-2444, and now it works fine.
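
For reference, the relevant part of the spark-submit call now looks roughly
like this (the 1024 MB overhead value and the class/jar names are just
illustrative placeholders, not my actual job):

spark-submit --master yarn-cluster \
  --executor-memory 7g \
  --conf "spark.yarn.executor.memoryOverhead=1024" \
  --class com.example.MyJob my-job.jar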

Greetings,

Juan

2015-02-23 10:40 GMT+01:00 Juan Rodríguez Hortalá <
juan.rodriguez.hort...@gmail.com>:

>  Hi,
>
> I'm having problems using pipe() from a Spark program written in Java that
> calls a Python script, running on a YARN cluster. The problem is that the
> job fails when YARN kills the container because the Python script goes
> beyond the memory limits. I get something like this in the log:
>
> 01_000004. Exit status: 143. Diagnostics: Container
> [pid=6976,containerID=container_1424279690678_0078_01_000004] is running
> beyond physical memory limits. Current usage: 7.5 GB of 7.5 GB physical
> memory used; 8.6 GB of 23.3 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1424279690678_0078_01_000004 :
>     |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>     |- 6976 1457 6976 6976 (bash) 0 0 108613632 338 /bin/bash -c
> /usr/java/jdk1.7.0_71/bin/java -server -XX:OnOutOfMemoryError='kill %p'
> -Xms7048m -Xmx7048m
> -Djava.io.tmpdir=/mnt/data1/hadoop/yarn/local/usercache/root/appcache/application_1424279690678_0078/container_1424279690678_0078_01_000004/tmp
> '-Dspark.driver.port=33589' '-Dspark.ui.port=0'
> -Dspark.yarn.app.container.log.dir=/mnt/data1/hadoop/yarn/log/application_1424279690678_0078/container_1424279690678_0078_01_000004
> org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://
> sparkdri...@slave3.lambdoop.com:33589/user/CoarseGrainedScheduler 5
> slave1.lambdoop.com 1 application_1424279690678_0078 1>
> /mnt/data1/hadoop/yarn/log/application_1424279690678_0078/container_1424279690678_0078_01_000004/stdout
> 2>
> /mnt/data1/hadoop/yarn/log/application_1424279690678_0078/container_1424279690678_0078_01_000004/stderr
>
>     |- 10513 6982 6976 6976 (python2.7) 9308 1224 448360448 13857
> /usr/local/bin/python2.7 /mnt/my_script.py my_args
>     |- 6982 6976 6976 6976 (java) 115176 12032 8632229888 1951974
> /usr/java/jdk1.7.0_71/bin/java -server -XX:OnOutOfMemoryError=kill %p
> -Xms7048m -Xmx7048m
> -Djava.io.tmpdir=/mnt/data1/hadoop/yarn/local/usercache/root/appcache/application_1424279690678_0078/container_1424279690678_0078_01_000004/tmp
> -Dspark.driver.port=33589 -Dspark.ui.port=0
> -Dspark.yarn.app.container.log.dir=/mnt/data1/hadoop/yarn/log/application_1424279690678_0078/container_1424279690678_0078_01_000004
> org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://
> sparkdri...@slave3.lambdoop.com:33589/user/CoarseGrainedScheduler 5
> slave1.lambdoop.com 1 application_1424279690678_0078
>
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
>
> I find this strange because the Python script processes each input line
> separately and makes a simple, independent calculation per line: it
> basically parses the line, calculates the Haversine distance, and returns a
> double value. Input lines are traversed in Python with a "for line in
> sys.stdin" loop. Also, to avoid memory leaks in Python:
> - I call sys.stdout.flush() for each output line generated by Python.
> - I call the following function after writing each output line, to force
> garbage collection regularly in Python (a simplified sketch of the whole
> loop follows the function):
>
> import gc
>
> _iterations_until_gc = 1000  # force a collection every 1000 output lines
> iterations_since_gc = 0
>
> def update_garbage_collector():
>     # Run gc.collect() explicitly every _iterations_until_gc calls.
>     global iterations_since_gc
>     if iterations_since_gc >= _iterations_until_gc:
>         gc.collect()
>         iterations_since_gc = 0
>     else:
>         iterations_since_gc += 1
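>
> For context, the main loop of the script is roughly the sketch below
> (simplified: the comma-separated input format and the kilometre-based
> haversine() implementation here are just illustrative, not my exact code),
> used together with the update_garbage_collector() function above:
>
> import math
> import sys
>
> def haversine(lat1, lon1, lat2, lon2):
>     # Great-circle distance in kilometres (standard Haversine formula).
>     r = 6371.0
>     phi1, phi2 = math.radians(lat1), math.radians(lat2)
>     dphi = math.radians(lat2 - lat1)
>     dlam = math.radians(lon2 - lon1)
>     a = math.sin(dphi / 2) ** 2 + \
>         math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
>     return 2 * r * math.asin(math.sqrt(a))
>
> for line in sys.stdin:
>     # Assumed input format "lat1,lon1,lat2,lon2"; the real format differs.
>     lat1, lon1, lat2, lon2 = map(float, line.strip().split(","))
>     sys.stdout.write("%f\n" % haversine(lat1, lon1, lat2, lon2))
>     sys.stdout.flush()            # flush after every output line
>     update_garbage_collector()    # periodic explicit gc.collect()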
>
> So the memory consumption of the script should be constant, but in
> practice it looks like there is some memory leak. Maybe Spark is
> introducing a memory leak when redirecting the IO in pipe()? Have any of
> you experienced similar situations when using pipe() in Spark? Also, do you
> know how I could control the amount of memory reserved for the subprocess
> that is created by pipe()? I understand that with --executor-memory I set
> the memory for the Spark executor process, but not for the subprocess
> created by pipe().
>
> Thanks in advance for your help.
>
> Greetings,
>
> Juan
>
>
