Yeah, probably increasing the memory or increasing the number of output
partitions would help. However, increasing the memory available to each
executor would add expense. I want to keep the number of partitions low so
that each parquet file turns out to be around 128 MB, which is best practice
for long-term storage and use with other systems like Presto.
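For concreteness, the tail of the job has roughly this shape. This is a
simplified sketch: the paths, the sort key, and the ~12 GB output estimate
are placeholders, but the arithmetic is the point, since 12 GB at 128 MB per
file works out to the 96 output partitions mentioned below:

    import org.apache.spark.sql.SparkSession

    object SortAndWrite {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("sort-and-write").getOrCreate()

        // Placeholder input path; the real job reads many small files and
        // coalesces them to a few thousand partitions before sorting.
        val df = spark.read.parquet("hdfs:///data/input")

        // Target ~128 MB per parquet file. The 12 GB output size is an
        // assumption used only to show the arithmetic: 12 GB / 128 MB = 96.
        val estimatedOutputBytes = 12L * 1024 * 1024 * 1024
        val targetFileBytes = 128L * 1024 * 1024
        val numPartitions = math.max(1, (estimatedOutputBytes / targetFileBytes).toInt)

        // coalesce() is a narrow dependency, so the reduce side of the sort
        // and the write run as one final stage with numPartitions tasks.
        df.sort("event_time")            // hypothetical sort column
          .coalesce(numPartitions)
          .write
          .parquet("hdfs:///data/output")
      }
    }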
This feels like a bug due to the flaky nature of the failure. Also, usually
when the memory gets too low the executor is killed or errors out and I get
one of the typical Spark OOM error codes. When I run the same job with the
same resources, sometimes it succeeds and sometimes it fails.

On Mon, Dec 3, 2018 at 5:19 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:

> Depending on the size of your data set and how many resources you have
> (num-executors, executor instances, number of nodes), I'm inclined to
> suspect the issue is related to the reduction of partitions from thousands
> to 96. I could be misguided, but given the details I have, I would
> consider testing whether the behavior changes if the final stage operates
> at a different number of partitions.
>
> On Mon, Dec 3, 2018 at 2:48 AM Conrad Lee <con...@parsely.com> wrote:
>
>> Thanks for the thoughts. While the beginning of the job deals with lots
>> of files in the first stage, they're first coalesced down into just a few
>> thousand partitions. The part of the job that's failing is the reduce
>> side of a dataframe.sort() that writes output to HDFS. This last stage
>> has only 96 tasks and the partitions are well balanced. I'm not using a
>> `partitionBy` option on the dataframe writer.
>>
>> On Fri, Nov 30, 2018 at 8:14 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>
>>> The reason I ask is because I've had some unreliability caused by
>>> over-stressing the HDFS. Do you know the number of partitions when these
>>> actions are being performed? For example, if you have 1,000,000 files
>>> being read you may have 1,000,000 partitions, which may cause HDFS
>>> stress. Alternatively, if you have 1 large file, say 100 GB, you may
>>> have 1 partition, which would not fit in memory and may cause writes to
>>> disk. I imagine it may be flaky because you are doing some action like a
>>> groupBy somewhere, and depending on how the data was read, certain
>>> groups will be in certain partitions. I'm not sure if reads on files are
>>> deterministic; I suspect they are not.
>>>
>>> On Fri, Nov 30, 2018 at 2:08 PM Conrad Lee <con...@parsely.com> wrote:
>>>
>>>> I'm loading the data using the dataframe reader from parquet files
>>>> stored on local HDFS. The stage of the job that fails is not the stage
>>>> that does this. The stage of the job that fails is one that reads a
>>>> sorted dataframe from the last shuffle and performs the final write to
>>>> parquet on local HDFS.
>>>>
>>>> On Fri, Nov 30, 2018 at 4:02 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>>>
>>>>> How are you loading the data?
>>>>>
>>>>> On Fri, Nov 30, 2018 at 2:26 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>
>>>>>> Thanks for the suggestions. Here's an update that responds to some
>>>>>> of the suggestions/ideas in-line:
>>>>>>
>>>>>>> I ran into problems using 5.19 so I referred to 5.17 and it
>>>>>>> resolved my issues.
>>>>>>
>>>>>> I tried EMR 5.17.0 and the problem still sometimes occurs.
>>>>>>
>>>>>>> try running a coalesce. Your data may have grown and is defaulting
>>>>>>> to a number of partitions that causes unnecessary overhead
>>>>>>
>>>>>> Well, I don't think it's that, because this problem occurs flakily.
>>>>>> That is, if the job hangs I can kill it and re-run it and it works
>>>>>> fine (on the same hardware and with the same memory settings). I'm
>>>>>> not getting any OOM errors.
>>>>>>
>>>>>> On a related note: the job is spilling to disk.
>>>>>> I see messages like this:
>>>>>>
>>>>>>> 18/11/29 21:40:06 INFO UnsafeExternalSorter: Thread 156 spilling
>>>>>>> sort data of 912.0 MB to disk (3 times so far)
>>>>>>
>>>>>> This occurs in both successful and unsuccessful runs, though. I've
>>>>>> checked the disks of an executor that's running a hanging job and its
>>>>>> disks have plenty of space, so it doesn't seem to be an
>>>>>> out-of-disk-space issue. This also doesn't seem to be where it hangs;
>>>>>> the logs move on and describe the parquet commit.
>>>>>>
>>>>>> On Thu, Nov 29, 2018 at 4:06 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>>>>>
>>>>>>> If not, try running a coalesce. Your data may have grown and is
>>>>>>> defaulting to a number of partitions that causes unnecessary
>>>>>>> overhead.
>>>>>>>
>>>>>>> On Thu, Nov 29, 2018 at 3:02 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>>>
>>>>>>>> Thanks, I'll try using 5.17.0.
>>>>>>>>
>>>>>>>> For anyone trying to debug this problem in the future: In other
>>>>>>>> jobs that hang in the same manner, the thread dump didn't have any
>>>>>>>> blocked threads, so that might be a red herring.
>>>>>>>>
>>>>>>>> On Wed, Nov 28, 2018 at 4:34 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I ran into problems using 5.19 so I referred to 5.17 and it
>>>>>>>>> resolved my issues.
>>>>>>>>>
>>>>>>>>> On Wed, Nov 28, 2018 at 2:48 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Vadim,
>>>>>>>>>>
>>>>>>>>>> Interesting. I've only been running this job at scale for a
>>>>>>>>>> couple of weeks, so I can't say whether this is related to recent
>>>>>>>>>> EMR changes.
>>>>>>>>>>
>>>>>>>>>> Much of the EMR-specific code for Spark has to do with writing
>>>>>>>>>> files to S3. In this case I'm writing files to the cluster's
>>>>>>>>>> HDFS, though, so my sense is that this is a Spark issue, not an
>>>>>>>>>> EMR one (but I'm not sure).
>>>>>>>>>>
>>>>>>>>>> Conrad
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 27, 2018 at 5:21 PM Vadim Semenov <va...@datadoghq.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Conrad,
>>>>>>>>>>>
>>>>>>>>>>> has it started happening recently?
>>>>>>>>>>>
>>>>>>>>>>> We recently started having some sporadic problems with drivers
>>>>>>>>>>> on EMR when they get stuck; up until two weeks ago everything
>>>>>>>>>>> was fine. We're trying to figure out with the EMR team where the
>>>>>>>>>>> issue is coming from.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 27, 2018 at 6:29 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Dear spark community,
>>>>>>>>>>> >
>>>>>>>>>>> > I'm running Spark 2.3.2 on EMR 5.19.0. I've got a job that's
>>>>>>>>>>> hanging in the final stage. The job usually works, but I see
>>>>>>>>>>> this hanging behavior in about one out of 50 runs.
>>>>>>>>>>> >
>>>>>>>>>>> > The second-to-last stage sorts the dataframe, and the final
>>>>>>>>>>> stage writes the dataframe to HDFS.
>>>>>>>>>>> >
>>>>>>>>>>> > Here you can see the executor logs, which indicate that it
>>>>>>>>>>> has finished processing the task.
>>>>>>>>>>> >
>>>>>>>>>>> > Here you can see the thread dump from the executor that's
>>>>>>>>>>> hanging. Here's the text of the blocked thread.
>>>>>>>>>>> >
>>>>>>>>>>> > I tried to work around this problem by enabling speculation,
>>>>>>>>>>> but speculative execution never takes place. I don't know why.
>>>>>>>>>>> >
>>>>>>>>>>> > Can anyone here help me?
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks,
>>>>>>>>>>> > Conrad
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Sent from my iPhone
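For anyone trying to debug the speculation angle in the future: in Spark
2.3, speculative execution is controlled by a small set of settings, and a
stage only starts speculating once a fraction of its tasks
(spark.speculation.quantile, default 0.75) has completed, so it's worth
confirming the hanging stage ever crosses that threshold. A minimal
spark-shell-style sketch (the app name is a placeholder; every value below
except spark.speculation itself is the documented default, not necessarily
what this job used):

    import org.apache.spark.sql.SparkSession

    // Sketch only: the thread doesn't show the job's actual configuration.
    val spark = SparkSession.builder()
      .appName("sort-and-write")                      // placeholder name
      .config("spark.speculation", "true")            // off by default
      .config("spark.speculation.interval", "100ms")  // how often to check for stragglers
      .config("spark.speculation.multiplier", "1.5")  // straggler = slower than 1.5x the median task
      .config("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before speculating
      .getOrCreate()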