Boosting Spark's memoryOverhead setting is what the warning in your logs is suggesting, and it has helped me with these issues in the past. Search for "memoryOverhead" here: http://spark.apache.org/docs/latest/running-on-yarn.html
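For example, you can pass it straight on the shell command line (a sketch only -- the 3072 MB value is an assumption, not a recommendation; the default overhead is roughly 10% of executor memory, so start somewhere above that and tune):

    spark-shell --master yarn --deploy-mode client --driver-memory 16G \
      --num-executors 500 --executor-cores 7 --executor-memory 10G \
      --conf spark.yarn.executor.memoryOverhead=3072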
That said, you're already running on a huge cluster as it is. If it's possible to filter your tables down before the join (keeping just the rows/columns you need), that may be a better solution; see the sketch after your quoted message below.

Jon

On Mon, Feb 13, 2017 at 5:27 AM, nancy henry <nancyhenry6...@gmail.com> wrote:

> Hi All,
>
> I am getting the below error while trying to join 3 tables, which are in
> ORC format in Hive, from 5 10 GB tables through HiveContext in Spark:
>
> Container killed by YARN for exceeding memory limits. 11.1 GB of 11 GB
> physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
> 17/02/13 02:21:19 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:
> Container killed by YARN for exceeding memory limits. 11.1 GB of 11 GB
> physical memory used
>
> I am using the below memory parameters to launch the shell. What else could
> I increase from these parameters, or do I need to change any configuration
> settings? Please let me know.
>
> spark-shell --master yarn --deploy-mode client --driver-memory 16G
> --num-executors 500 --executor-cores 7 --executor-memory 10G
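To illustrate the filter-before-join idea, here is a rough sketch -- the table names, column names, and join key are made up, and it assumes a spark-shell session where sqlContext is Hive-enabled; adapt it to your schema:

    // Select only the columns/rows actually needed from each ORC table
    val a = sqlContext.sql("SELECT key, col_a FROM db.table_a WHERE dt = '2017-02-12'")
    val b = sqlContext.sql("SELECT key, col_b FROM db.table_b WHERE dt = '2017-02-12'")
    val c = sqlContext.sql("SELECT key, col_c FROM db.table_c WHERE dt = '2017-02-12'")

    // Join the pruned DataFrames on the (hypothetical) join key and write the result back out
    val joined = a.join(b, "key").join(c, "key")
    joined.write.format("orc").saveAsTable("db.joined_result")

Pushing the projection/filtering down into the SQL means much less data has to be held and shuffled for the join, which is usually a bigger win than simply adding more executor memory.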