On Tue, Jul 14, 2015 at 9:53 AM, Elkhan Dadashov <elkhan8...@gmail.com> wrote:
> While the program is running, these are the stats of how much memory each
> process takes:
>
> SparkSubmit process : 11.266 *gigabyte* Virtual Memory
>
> ApplicationMaster process: 2303480 *byte* Virtual Memory

That SparkSubmit number looks very suspicious. In yarn-cluster mode,
SparkSubmit doesn't do much and should not use a lot of memory. You could set
"SPARK_PRINT_LAUNCH_CMD=1" before launching the app to see the exact java
command line being used, and see whether it has any suspicious configuration.

You could also use jmap to dump the heap and look at it with jvisualvm, and
see if there's any low-hanging fruit w.r.t. what's using the memory.

Regarding the fork / exec comment, that's very misleading. OSes are very
efficient when forking: they don't copy the entire parent process; instead
they do copy-on-write (COW) on memory pages that change. So if you do an exec
right afterwards, you're basically copying very little memory.

-- 
Marcelo
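
P.S. To make the fork / exec point concrete, here is a minimal Python sketch for a POSIX system. It is not taken from Spark; the helper name `run_child` and the 64 MB buffer size are made up for illustration. The parent holds a large buffer, forks, and the child immediately calls exec. The child never writes to the buffer, so with copy-on-write those pages should not be duplicated before exec replaces the child's address space. The sketch only shows the pattern; it does not measure memory.

```python
import os
import sys


def run_child(argv):
    """Fork, exec argv in the child, and return the child's exit code."""
    pid = os.fork()  # child initially shares the parent's pages (copy-on-write)
    if pid == 0:
        try:
            # Replaces the child's address space with the new program.
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; exit without running parent cleanup.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # child was killed by a signal


if __name__ == "__main__":
    # Hypothetical "large parent": 64 MB held only by the parent process.
    big_buffer = bytearray(64 * 1024 * 1024)
    code = run_child([sys.executable, "-c", "print('hello from the child')"])
    print("child exit code:", code, "parent buffer bytes:", len(big_buffer))
```

Making the buffer larger should not make the fork noticeably more expensive in terms of copied memory, since the child never touches it before the exec.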