Greetings!
My executors apparently are being terminated because they are "running beyond 
physical memory limits", according to the "yarn-hadoop-nodemanager" logs on the 
worker nodes (/mnt/var/log/hadoop on AWS EMR).  I'm setting "driver-memory" 
to 8G.  However, looking at "stdout" in userlogs, I can see GC going on, but the 
lines look like "6G -> 5G(7.2G), 0.45 secs", so the GC seems to think that the 
process is using about 6G of space, not 8G.  However, "ps aux" shows an 
RSS hovering just below 8G.
The process does a "mapPartitionsWithIndex", and it uses 
compression, which (I believe) calls into the native zlib library (the overall 
purpose is to convert each partition into a "matlab" file).
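For context, the partition step is shaped roughly like the sketch below.  This is 
not my actual code -- the record type and the matlab-writing step are stand-ins -- 
but it shows where the native zlib allocations would come from:

    import java.io.ByteArrayOutputStream
    import java.util.zip.{Deflater, DeflaterOutputStream}
    import org.apache.spark.rdd.RDD

    // Rough shape of the job: each partition's records are serialized and
    // compressed through java.util.zip (which keeps its working buffers in
    // native zlib memory, outside the JVM heap), then emitted as one blob
    // per partition.  Writing the actual .mat file is omitted here.
    def compressPartitions(rdd: RDD[String]): RDD[(Int, Array[Byte])] =
      rdd.mapPartitionsWithIndex { (index, partition) =>
        val bytes = new ByteArrayOutputStream()
        val zlib = new DeflaterOutputStream(bytes, new Deflater(Deflater.BEST_COMPRESSION))
        partition.foreach(record => zlib.write((record + "\n").getBytes("UTF-8")))
        zlib.close()  // flushes and releases the native zlib state
        Iterator((index, bytes.toByteArray))
      }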
Could it be that the YARN container is counting both the memory used by the JVM 
proper and the memory used by zlib, but the GC only "sees" the "internal" (heap) 
memory?  So the GC keeps the heap usage "reasonable", e.g., 6G in an 8G 
container, but then zlib grabs some native memory, and the YARN container 
terminates the task?
If so, is there anything I can do to tell YARN to watch for a larger 
memory limit than what I tell the JVM to use for its heap?
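Is something like the following the right knob?  This is just a guess on my 
part, assuming the memory-overhead setting is how you reserve container headroom 
beyond the heap (the property name seems to vary by Spark version):

    import org.apache.spark.SparkConf

    // Guess: keep the JVM heap at 8G but ask YARN for extra headroom so the
    // container limit also covers native (zlib) allocations.
    val conf = new SparkConf()
      .set("spark.executor.memory", "8g")
      // "spark.yarn.executor.memoryOverhead" on older Spark,
      // "spark.executor.memoryOverhead" on newer releases; value is in MiB.
      .set("spark.yarn.executor.memoryOverhead", "2048")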
Thanks!
Sincerely, Mike
 
