On Tue, Jul 14, 2015 at 9:53 AM, Elkhan Dadashov <elkhan8...@gmail.com>
wrote:

> While the program is running, these are the stats of how much memory each
> process takes:
>
> SparkSubmit process: 11.266 *gigabyte* Virtual Memory
>
> ApplicationMaster process: 2303480 *byte* Virtual Memory
>

That SparkSubmit number looks very suspicious. In yarn-cluster mode,
SparkSubmit only submits the app to YARN and monitors its status, so it
should not use a lot of memory. You could
set "SPARK_PRINT_LAUNCH_CMD=1" before launching the app to see the exact
java command line being used, and see whether it has any suspicious
configuration. You could also use jmap to dump the heap, open the dump in
jvisualvm, and see if there's any low-hanging fruit in terms of what's using
the memory.
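
A minimal Python sketch of both suggestions. It assumes the launcher writes the java command to stderr on a line starting with `Spark Command:` when `SPARK_PRINT_LAUNCH_CMD` is set, which is worth checking against your Spark version. The helpers `launch_with_cmd_printed`, `jvm_memory_flags` and `jmap_dump_cmd` are hypothetical names for illustration; only `spark-submit`, `SPARK_PRINT_LAUNCH_CMD` and the standard `jmap -dump:[live,]format=b,file=<path> <pid>` syntax come from the tools themselves.

```python
import os
import subprocess

# Assumption: with SPARK_PRINT_LAUNCH_CMD set, the launcher prints the
# java command to stderr on a line that starts with this prefix.
LAUNCH_PREFIX = "Spark Command:"

# JVM options that bound heap, stack, or off-heap memory.
MEMORY_FLAG_PREFIXES = ("-Xmx", "-Xms", "-Xss", "-XX:MaxPermSize",
                        "-XX:MaxMetaspaceSize", "-XX:MaxDirectMemorySize")


def launch_with_cmd_printed(spark_submit_args):
    """Run spark-submit with SPARK_PRINT_LAUNCH_CMD=1 and return its stderr.

    Blocks until spark-submit exits (in yarn-cluster mode it normally waits
    for the application to finish).
    """
    env = dict(os.environ, SPARK_PRINT_LAUNCH_CMD="1")
    proc = subprocess.run(["spark-submit"] + list(spark_submit_args),
                          env=env, stderr=subprocess.PIPE,
                          universal_newlines=True)
    return proc.stderr


def jvm_memory_flags(launch_output):
    """Return the memory-related JVM flags from the printed launch command."""
    for line in launch_output.splitlines():
        if line.startswith(LAUNCH_PREFIX):
            # The command is printed space-joined, so a plain split is enough
            # unless some argument itself contains spaces.
            argv = line[len(LAUNCH_PREFIX):].split()
            return [arg for arg in argv if arg.startswith(MEMORY_FLAG_PREFIXES)]
    return []


def jmap_dump_cmd(pid, path="spark-submit-heap.hprof", live_only=True):
    """Build a jmap command that writes a binary heap dump jvisualvm can open."""
    options = "format=b,file=" + path
    if live_only:
        # 'live' triggers a full GC first, so only reachable objects are dumped.
        options = "live," + options
    return ["jmap", "-dump:" + options, str(pid)]
```

Passing `jmap_dump_cmd(pid)` to `subprocess.run(..., check=True)`, with the SparkSubmit JVM's pid (for example from `jps`), writes a dump file that can then be loaded into jvisualvm.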

Regarding the fork / exec comment, that's very misleading. OSes are very
efficient when forking: they don't copy the entire parent process; instead,
they share its memory pages copy-on-write (COW) and only copy a page when
one of the processes writes to it. So if you do an exec right afterwards,
you're copying very little memory.
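
To illustrate the fork-then-exec point, here is a small POSIX-only Python sketch. The function name `fork_then_exec` and the 64 MiB buffer size are arbitrary choices for illustration, not anything Spark does. The parent touches a large buffer and forks, and the child immediately calls `exec`. Because the child's pages are shared copy-on-write and then thrown away by `exec`, at most a handful of pages get copied, not the buffer.

```python
import os
import sys


def fork_then_exec(argv, parent_bytes=64 * 1024 * 1024):
    """Fork this process, exec argv in the child, and return its exit code.

    The parent first touches `parent_bytes` of memory so there is a sizable
    heap that fork() could in principle copy. fork() shares those pages
    copy-on-write, and exec() replaces the child's address space before it
    writes to them, so the buffer itself is never physically duplicated.
    """
    ballast = b"\x01" * parent_bytes  # resident pages owned by the parent
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image right away.
        try:
            os.execv(argv[0], argv)
        finally:
            os._exit(127)  # only reached if execv raised
    _, status = os.waitpid(pid, 0)
    del ballast
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    print(fork_then_exec([sys.executable, "-c", "print('child ran')"]))
```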

-- 
Marcelo
