So, based on many more runs of this job, I've come to the conclusion that a
workaround for this error is to either

   - decrease the amount of data written in each partition, or
   - increase the amount of memory available to each executor

I still don't know what the root cause of the issue is.
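
For the record, the two knobs look roughly like this. It's only a sketch: the
paths, the sort column, and the numbers are placeholders, and the memory
settings are examples rather than recommendations.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("sort-and-write").getOrCreate()
    val df = spark.read.parquet("hdfs:///path/to/input")    // placeholder input path

    // (1) Less data written per partition: the sort's output partition count
    //     comes from spark.sql.shuffle.partitions, so raising it produces
    //     more, smaller output files.
    spark.conf.set("spark.sql.shuffle.partitions", "192")   // e.g. double the current 96

    df.sort("sort_column")                                   // placeholder sort key
      .write
      .parquet("hdfs:///path/to/output")                     // placeholder output path

    // (2) More memory per executor is set at submit time, e.g.:
    //     spark-submit --conf spark.executor.memory=8g \
    //                  --conf spark.executor.memoryOverhead=2g ...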

On Tue, Dec 4, 2018 at 9:45 AM Conrad Lee <con...@parsely.com> wrote:

> Yeah, probably increasing the memory or increasing the number of output
> partitions would help.  However, increasing the memory available to each
> executor would add expense, and I want to keep the number of partitions low
> so that each parquet file turns out to be around 128 MB, which is best
> practice for long-term storage and for use with other systems like Presto.
>
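> The arithmetic behind that is roughly this (the total output size below is
> made up; only the ~128 MB-per-file target is the real constraint):
>
>     // Back-of-the-envelope partition count for ~128 MB parquet files.
>     val totalOutputBytes = 12L * 1024 * 1024 * 1024   // e.g. ~12 GB of compressed output
>     val targetFileBytes  = 128L * 1024 * 1024          // ~128 MB per file
>     val numPartitions    = math.max(1, (totalOutputBytes / targetFileBytes).toInt)  // -> 96
>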
> This feels like a bug due to the flaky nature of the failure -- also,
> usually when the memory gets too low the executor is killed or errors out
> and I get one of the typical Spark OOM error codes.  When I run the same
> job with the same resources, sometimes it succeeds and sometimes it fails.
>
> On Mon, Dec 3, 2018 at 5:19 PM Christopher Petrino <
> christopher.petr...@gmail.com> wrote:
>
>> Depending on the size of your data set and how many resources you have
>> (num-executors, executor instances, number of nodes), I'm inclined to
>> suspect that the issue is related to the reduction of partitions from
>> thousands to 96. I could be misguided, but given the details I have, I
>> would test how the job behaves when the final stage operates with a
>> different number of partitions.
>>
>> On Mon, Dec 3, 2018 at 2:48 AM Conrad Lee <con...@parsely.com> wrote:
>>
>>> Thanks for the thoughts.  While the beginning of the job deals with lots
>>> of files in the first stage, they're first coalesced down into just a few
>>> thousand partitions.  The part of the job that's failing is the reduce-side
>>> of a dataframe.sort() that writes output to HDFS.  This last stage has only
>>> 96 tasks and the partitions are well balanced.  I'm not using a
>>> `partitionBy` option on the dataframe writer.
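>>>
>>> For context, the shape of the job is roughly this (the paths, the sort
>>> key, and the coalesce target are placeholders; 96 is the real task count
>>> of the failing stage):
>>>
>>>     // `spark` is the usual SparkSession.
>>>     val raw = spark.read.parquet("hdfs:///path/to/many/input/files")
>>>     val df  = raw.coalesce(3000)   // many input splits coalesced down to a few thousand
>>>
>>>     // The failing part: the reduce side of the sort, which also writes the output.
>>>     spark.conf.set("spark.sql.shuffle.partitions", "96")
>>>     df.sort("sort_column")         // placeholder sort key
>>>       .write                       // note: no partitionBy
>>>       .parquet("hdfs:///path/to/output")
>>>
>>>     // One easy experiment is to change 96 here to, say, 192.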
>>>
>>> On Fri, Nov 30, 2018 at 8:14 PM Christopher Petrino <
>>> christopher.petr...@gmail.com> wrote:
>>>
>>>> The reason I ask is that I've had some unreliability caused by
>>>> over-stressing HDFS. Do you know the number of partitions when these
>>>> actions are being performed? For example, if you have 1,000,000 files
>>>> being read, you may have 1,000,000 partitions, which may stress HDFS.
>>>> Alternatively, if you have 1 large file, say 100 GB, you may have 1
>>>> partition, which would not fit in memory and may cause writes to disk. I
>>>> imagine it may be flaky because you are doing some action like a groupBy
>>>> somewhere, and depending on how the data was read, certain groups will be
>>>> in certain partitions; I'm not sure whether reads on files are
>>>> deterministic, but I suspect they are not.
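>>>>
>>>> Something like this is what I'd check first (the path is a placeholder
>>>> and the thresholds are arbitrary):
>>>>
>>>>     val df = spark.read.parquet("hdfs:///path/to/input")   // placeholder path
>>>>     println(s"partitions after read: ${df.rdd.getNumPartitions}")
>>>>
>>>>     // Very many tiny partitions -> coalesce down; a single huge one -> repartition up.
>>>>     val balanced =
>>>>       if (df.rdd.getNumPartitions > 10000) df.coalesce(2000)
>>>>       else if (df.rdd.getNumPartitions < 8) df.repartition(200)
>>>>       else df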
>>>>
>>>> On Fri, Nov 30, 2018 at 2:08 PM Conrad Lee <con...@parsely.com> wrote:
>>>>
>>>>> I'm loading the data using the dataframe reader from parquet files
>>>>> stored on local HDFS.  The stage of the job that fails is not the stage
>>>>> that does this.  The stage of the job that fails is one that reads a
>>>>> sorted dataframe from the last shuffle and performs the final write to
>>>>> parquet on local HDFS.
>>>>>
>>>>> On Fri, Nov 30, 2018 at 4:02 PM Christopher Petrino <
>>>>> christopher.petr...@gmail.com> wrote:
>>>>>
>>>>>> How are you loading the data?
>>>>>>
>>>>>> On Fri, Nov 30, 2018 at 2:26 AM Conrad Lee <con...@parsely.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for the suggestions.  Here's an update that responds to some
>>>>>>> of the suggestions/ideas in-line:
>>>>>>>
>>>>>>>> I ran into problems using 5.19 so I referred to 5.17 and it resolved
>>>>>>>> my issues.
>>>>>>>
>>>>>>>
>>>>>>> I tried EMR 5.17.0 and the problem still sometimes occurs.
>>>>>>>
>>>>>>>> try running a coalesce. Your data may have grown and is defaulting
>>>>>>>> to a number of partitions that's causing unnecessary overhead
>>>>>>>>
>>>>>>> Well, I don't think it's that, because this problem occurs flakily.
>>>>>>> That is, if the job hangs I can kill it and re-run it and it works
>>>>>>> fine (on the same hardware and with the same memory settings).  I'm
>>>>>>> not getting any OOM errors.
>>>>>>>
>>>>>>> On a related note: the job is spilling to disk. I see messages like
>>>>>>> this:
>>>>>>>
>>>>>>>> 18/11/29 21:40:06 INFO UnsafeExternalSorter: Thread 156 spilling
>>>>>>>> sort data of 912.0 MB to disk (3 times so far)
>>>>>>>
>>>>>>>
>>>>>>> This occurs in both successful and unsuccessful runs though.  I've
>>>>>>> checked the disks of an executor that's running a hanging job and its
>>>>>>> disks have plenty of space, so it doesn't seem to be an out-of-disk-space
>>>>>>> issue.  This also doesn't seem to be where it hangs--the logs move on
>>>>>>> and describe the parquet commit.
>>>>>>>
>>>>>>> On Thu, Nov 29, 2018 at 4:06 PM Christopher Petrino <
>>>>>>> christopher.petr...@gmail.com> wrote:
>>>>>>>
>>>>>>>> If not, try running a coalesce. Your data may have grown and is
>>>>>>>> defaulting to a number of partitions that's causing unnecessary
>>>>>>>> overhead.
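>>>>>>>>
>>>>>>>> i.e. something as simple as this right before the write (`df` here is
>>>>>>>> whatever dataframe is being written; the count and path are examples):
>>>>>>>>
>>>>>>>>     df.coalesce(96)                        // cap the partition count explicitly
>>>>>>>>       .write
>>>>>>>>       .parquet("hdfs:///path/to/output")   // placeholder path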
>>>>>>>>
>>>>>>>> On Thu, Nov 29, 2018 at 3:02 AM Conrad Lee <con...@parsely.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks, I'll try using 5.17.0.
>>>>>>>>>
>>>>>>>>> For anyone trying to debug this problem in the future: In other
>>>>>>>>> jobs that hang in the same manner, the thread dump didn't have any
>>>>>>>>> blocked threads, so that might be a red herring.
>>>>>>>>>
>>>>>>>>> On Wed, Nov 28, 2018 at 4:34 PM Christopher Petrino <
>>>>>>>>> christopher.petr...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I ran into problems using 5.19 so I referred to 5.17 and it
>>>>>>>>>> resolved my issues.
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 28, 2018 at 2:48 AM Conrad Lee <con...@parsely.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello Vadim,
>>>>>>>>>>>
>>>>>>>>>>> Interesting.  I've only been running this job at scale for a
>>>>>>>>>>> couple of weeks, so I can't say whether this is related to recent
>>>>>>>>>>> EMR changes.
>>>>>>>>>>>
>>>>>>>>>>> Much of the EMR-specific code for Spark has to do with writing
>>>>>>>>>>> files to S3.  In this case I'm writing files to the cluster's HDFS,
>>>>>>>>>>> though, so my sense is that this is a Spark issue, not an EMR issue
>>>>>>>>>>> (but I'm not sure).
>>>>>>>>>>>
>>>>>>>>>>> Conrad
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 27, 2018 at 5:21 PM Vadim Semenov <
>>>>>>>>>>> va...@datadoghq.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Conrad,
>>>>>>>>>>>>
>>>>>>>>>>>> has it started happening recently?
>>>>>>>>>>>>
>>>>>>>>>>>> We recently started having some sporadic problems with drivers on
>>>>>>>>>>>> EMR getting stuck; up until two weeks ago everything was fine.
>>>>>>>>>>>> We're trying to figure out with the EMR team where the issue is
>>>>>>>>>>>> coming from.
>>>>>>>>>>>> On Tue, Nov 27, 2018 at 6:29 AM Conrad Lee <con...@parsely.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Dear Spark community,
>>>>>>>>>>>> >
>>>>>>>>>>>> > I'm running Spark 2.3.2 on EMR 5.19.0.  I've got a job that's
>>>>>>>>>>>> > hanging in the final stage--the job usually works, but I see this
>>>>>>>>>>>> > hanging behavior in about one out of 50 runs.
>>>>>>>>>>>> >
>>>>>>>>>>>> > The second-to-last stage sorts the dataframe, and the final stage
>>>>>>>>>>>> > writes the dataframe to HDFS.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Here you can see the executor logs, which indicate that it has
>>>>>>>>>>>> > finished processing the task.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Here you can see the thread dump from the executor that's
>>>>>>>>>>>> > hanging.  Here's the text of the blocked thread.
>>>>>>>>>>>> >
>>>>>>>>>>>> > I tried to work around this problem by enabling speculation, but
>>>>>>>>>>>> > speculative execution never takes place.  I don't know why.
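>>>>>>>>>>>> >
>>>>>>>>>>>> > For reference, enabling it amounts to something like this (the
>>>>>>>>>>>> > quantile and multiplier shown are, I believe, just the defaults):
>>>>>>>>>>>> >
>>>>>>>>>>>> >     import org.apache.spark.SparkConf
>>>>>>>>>>>> >
>>>>>>>>>>>> >     // Speculation is off by default; it re-launches stragglers once
>>>>>>>>>>>> >     // enough tasks in the stage have finished.
>>>>>>>>>>>> >     val conf = new SparkConf()
>>>>>>>>>>>> >       .set("spark.speculation", "true")
>>>>>>>>>>>> >       .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish first
>>>>>>>>>>>> >       .set("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as slow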
>>>>>>>>>>>> >
>>>>>>>>>>>> > Can anyone here help me?
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>> > Conrad
>>>>>>>>>>>>
>>>>>>>>>>>
