Subject: Many retries for Python job

I'm running a Python script with spark-submit on top of YARN on an EMR
cluster with 30 nodes. The script reads in approximately 3.9 TB of data
from S3, and then does some transformations and filtering, followed by some
aggregate counts. During Stage 2 of the job, everything looks to complete
Hi Brett,
Are you noticing executors dying? Are you able to check the YARN
NodeManager logs and see whether YARN is killing them for exceeding memory
limits?
-Sandy
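The check Sandy suggests amounts to scanning the NodeManager logs for YARN's container-kill message. A minimal sketch of that scan in Python — the sample log line mirrors the real "running beyond physical memory limits" message the NodeManager emits, but the timestamp, pid, and container ID here are made up for illustration:

```python
import re

# Illustrative NodeManager log line; YARN logs a message of this shape
# when it kills a container for exceeding its memory allocation
# (IDs and timestamp below are invented, not from this thread):
sample_log = """\
2014-11-21 09:47:13,512 INFO ContainersMonitorImpl: Container \
[pid=12345,containerID=container_1416584000000_0007_01_000042] is running \
beyond physical memory limits. Current usage: 5.1 GB of 5 GB physical \
memory used. Killing container.
"""

def find_memory_kills(log_text):
    """Return container IDs that YARN killed for exceeding memory limits."""
    killed = []
    for line in log_text.splitlines():
        if "beyond physical memory limits" in line:
            m = re.search(r"containerID=(container_\w+)", line)
            if m:
                killed.append(m.group(1))
    return killed

print(find_memory_kills(sample_log))
```

After the application finishes, the aggregated logs can be pulled with `yarn logs -applicationId <app id>` and piped through a scan like this.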
On Fri, Nov 21, 2014 at 9:47 AM, Brett Meyer brett.me...@crowdstrike.com
wrote:
I’m running a Python script with spark-submit
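For reference, a spark-submit invocation of the kind Brett describes might look like the following — the script name and every resource value here are illustrative, not taken from the thread. On Spark 1.x on YARN, if the NodeManager logs do show memory kills, raising `spark.yarn.executor.memoryOverhead` (the off-heap headroom YARN enforces on top of the executor heap) is the usual first adjustment:

```shell
# Illustrative only: script name and numbers are not from the thread.
spark-submit \
  --master yarn-cluster \
  --num-executors 30 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  job.py
```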