Hello Vadim,

Interesting. I've only been running this job at scale for a couple of weeks, so I can't say whether this is related to recent EMR changes.
Much of the EMR-specific code for Spark has to do with writing files to S3. In this case I'm writing files to the cluster's HDFS, though, so my sense is that this is a Spark issue, not an EMR one (but I'm not sure).

Conrad

On Tue, Nov 27, 2018 at 5:21 PM Vadim Semenov <va...@datadoghq.com> wrote:

> Hey Conrad,
>
> has it started happening recently?
>
> We recently started having some sporadic problems with drivers on EMR
> when it gets stuck, up until two weeks ago everything was fine.
> We're trying to figure out with the EMR team where the issue is coming
> from.
>
> On Tue, Nov 27, 2018 at 6:29 AM Conrad Lee <con...@parsely.com> wrote:
> >
> > Dear spark community,
> >
> > I'm running spark 2.3.2 on EMR 5.19.0. I've got a job that's hanging in
> > the final stage--the job usually works, but I see this hanging behavior in
> > about one out of 50 runs.
> >
> > The second-to-last stage sorts the dataframe, and the final stage writes
> > the dataframe to HDFS.
> >
> > Here you can see the executor logs, which indicate that it has finished
> > processing the task.
> >
> > Here you can see the thread dump from the executor that's hanging.
> > Here's the text of the blocked thread.
> >
> > I tried to work around this problem by enabling speculation, but
> > speculative execution never takes place. I don't know why.
> >
> > Can anyone here help me?
> >
> > Thanks,
> > Conrad
>
> --
> Sent from my iPhone