Yeah, probably increasing the memory or increasing the number of output
partitions would help. However, increasing the memory available to each
executor would add expense. I want to keep the number of partitions low so
that each parquet file turns out to be around 128 MB, which is best practice
for long-term storage and use with other systems like Presto.
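For concreteness, the tail of the job has roughly this shape. This is a
simplified sketch: the paths, the sort key, and the ~12 GB output estimate
are placeholders, but the arithmetic is the point, since 12 GB at 128 MB per
file works out to the 96 output partitions mentioned below:

    import org.apache.spark.sql.SparkSession

    object SortAndWrite {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("sort-and-write").getOrCreate()

        // Placeholder input path; the real job reads many small files and
        // coalesces them to a few thousand partitions before sorting.
        val df = spark.read.parquet("hdfs:///data/input")

        // Target ~128 MB per parquet file. The 12 GB output size is an
        // assumption used only to show the arithmetic: 12 GB / 128 MB = 96.
        val estimatedOutputBytes = 12L * 1024 * 1024 * 1024
        val targetFileBytes = 128L * 1024 * 1024
        val numPartitions = math.max(1, (estimatedOutputBytes / targetFileBytes).toInt)

        // coalesce() is a narrow dependency, so the reduce side of the sort
        // and the write run as one final stage with numPartitions tasks.
        df.sort("event_time")            // hypothetical sort column
          .coalesce(numPartitions)
          .write
          .parquet("hdfs:///data/output")
      }
    }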
This feels like a bug due to the flaky nature of the failure. Also, usually
when the memory gets too low the executor is killed or errors out and I get
one of the typical Spark OOM error codes. When I run the same job with the
same resources, sometimes it succeeds and sometimes it fails.

On Mon, Dec 3, 2018 at 5:19 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:

> Depending on the size of your data set and how many resources you have
> (num-executors, executor instances, number of nodes), I'm inclined to
> suspect the issue is related to the reduction of partitions from thousands
> to 96. I could be misguided, but given the details I have, I would
> consider testing whether the behavior changes if the final stage operates
> at a different number of partitions.
>
> On Mon, Dec 3, 2018 at 2:48 AM Conrad Lee <con...@parsely.com> wrote:
>
>> Thanks for the thoughts. While the beginning of the job deals with lots
>> of files in the first stage, they're first coalesced down into just a few
>> thousand partitions. The part of the job that's failing is the reduce
>> side of a dataframe.sort() that writes output to HDFS. This last stage
>> has only 96 tasks and the partitions are well balanced. I'm not using a
>> `partitionBy` option on the dataframe writer.
>>
>> On Fri, Nov 30, 2018 at 8:14 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>
>>> The reason I ask is because I've had some unreliability caused by
>>> over-stressing the HDFS. Do you know the number of partitions when these
>>> actions are being performed? For example, if you have 1,000,000 files
>>> being read you may have 1,000,000 partitions, which may cause HDFS
>>> stress. Alternatively, if you have 1 large file, say 100 GB, you may
>>> have 1 partition, which would not fit in memory and may cause writes to
>>> disk. I imagine it may be flaky because you are doing some action like a
>>> groupBy somewhere, and depending on how the data was read, certain
>>> groups will be in certain partitions. I'm not sure if reads on files are
>>> deterministic; I suspect they are not.
>>>
>>> On Fri, Nov 30, 2018 at 2:08 PM Conrad Lee <con...@parsely.com> wrote:
>>>
>>>> I'm loading the data using the dataframe reader from parquet files
>>>> stored on local HDFS. The stage of the job that fails is not the stage
>>>> that does this. The stage of the job that fails is one that reads a
>>>> sorted dataframe from the last shuffle and performs the final write to
>>>> parquet on local HDFS.
>>>>
>>>> On Fri, Nov 30, 2018 at 4:02 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>>>
>>>>> How are you loading the data?
>>>>>
>>>>> On Fri, Nov 30, 2018 at 2:26 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>
>>>>>> Thanks for the suggestions. Here's an update that responds to some
>>>>>> of the suggestions/ideas in-line:
>>>>>>
>>>>>>> I ran into problems using 5.19 so I referred to 5.17 and it
>>>>>>> resolved my issues.
>>>>>>
>>>>>> I tried EMR 5.17.0 and the problem still sometimes occurs.
>>>>>>
>>>>>>> try running a coalesce. Your data may have grown and is defaulting
>>>>>>> to a number of partitions that causes unnecessary overhead
>>>>>>
>>>>>> Well, I don't think it's that, because this problem occurs flakily.
>>>>>> That is, if the job hangs I can kill it and re-run it and it works
>>>>>> fine (on the same hardware and with the same memory settings). I'm
>>>>>> not getting any OOM errors.
>>>>>>
>>>>>> On a related note: the job is spilling to disk.
>>>>>> I see messages like this:
>>>>>>
>>>>>>> 18/11/29 21:40:06 INFO UnsafeExternalSorter: Thread 156 spilling
>>>>>>> sort data of 912.0 MB to disk (3 times so far)
>>>>>>
>>>>>> This occurs in both successful and unsuccessful runs, though. I've
>>>>>> checked the disks of an executor that's running a hanging job and its
>>>>>> disks have plenty of space, so it doesn't seem to be an
>>>>>> out-of-disk-space issue. This also doesn't seem to be where it hangs;
>>>>>> the logs move on and describe the parquet commit.
>>>>>>
>>>>>> On Thu, Nov 29, 2018 at 4:06 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>>>>>
>>>>>>> If not, try running a coalesce. Your data may have grown and is
>>>>>>> defaulting to a number of partitions that causes unnecessary
>>>>>>> overhead.
>>>>>>>
>>>>>>> On Thu, Nov 29, 2018 at 3:02 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>>>
>>>>>>>> Thanks, I'll try using 5.17.0.
>>>>>>>>
>>>>>>>> For anyone trying to debug this problem in the future: In other
>>>>>>>> jobs that hang in the same manner, the thread dump didn't have any
>>>>>>>> blocked threads, so that might be a red herring.
>>>>>>>>
>>>>>>>> On Wed, Nov 28, 2018 at 4:34 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I ran into problems using 5.19 so I referred to 5.17 and it
>>>>>>>>> resolved my issues.
>>>>>>>>>
>>>>>>>>> On Wed, Nov 28, 2018 at 2:48 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Vadim,
>>>>>>>>>>
>>>>>>>>>> Interesting. I've only been running this job at scale for a
>>>>>>>>>> couple of weeks, so I can't say whether this is related to recent
>>>>>>>>>> EMR changes.
>>>>>>>>>>
>>>>>>>>>> Much of the EMR-specific code for Spark has to do with writing
>>>>>>>>>> files to S3. In this case I'm writing files to the cluster's
>>>>>>>>>> HDFS, though, so my sense is that this is a Spark issue, not an
>>>>>>>>>> EMR one (but I'm not sure).
>>>>>>>>>>
>>>>>>>>>> Conrad
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 27, 2018 at 5:21 PM Vadim Semenov <va...@datadoghq.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Conrad,
>>>>>>>>>>>
>>>>>>>>>>> has it started happening recently?
>>>>>>>>>>>
>>>>>>>>>>> We recently started having some sporadic problems with drivers
>>>>>>>>>>> on EMR when they get stuck; up until two weeks ago everything
>>>>>>>>>>> was fine. We're trying to figure out with the EMR team where the
>>>>>>>>>>> issue is coming from.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 27, 2018 at 6:29 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Dear spark community,
>>>>>>>>>>> >
>>>>>>>>>>> > I'm running Spark 2.3.2 on EMR 5.19.0. I've got a job that's
>>>>>>>>>>> hanging in the final stage. The job usually works, but I see
>>>>>>>>>>> this hanging behavior in about one out of 50 runs.
>>>>>>>>>>> >
>>>>>>>>>>> > The second-to-last stage sorts the dataframe, and the final
>>>>>>>>>>> stage writes the dataframe to HDFS.
>>>>>>>>>>> >
>>>>>>>>>>> > Here you can see the executor logs, which indicate that it
>>>>>>>>>>> has finished processing the task.
>>>>>>>>>>> >
>>>>>>>>>>> > Here you can see the thread dump from the executor that's
>>>>>>>>>>> hanging. Here's the text of the blocked thread.
>>>>>>>>>>> >
>>>>>>>>>>> > I tried to work around this problem by enabling speculation,
>>>>>>>>>>> but speculative execution never takes place. I don't know why.
>>>>>>>>>>> >
>>>>>>>>>>> > Can anyone here help me?
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks,
>>>>>>>>>>> > Conrad
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Sent from my iPhone
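For anyone trying to debug the speculation angle in the future: in Spark
2.3, speculative execution is controlled by a small set of settings, and a
stage only starts speculating once a fraction of its tasks
(spark.speculation.quantile, default 0.75) has completed, so it's worth
confirming the hanging stage ever crosses that threshold. A minimal
spark-shell-style sketch (the app name is a placeholder; every value below
except spark.speculation itself is the documented default, not necessarily
what this job used):

    import org.apache.spark.sql.SparkSession

    // Sketch only: the thread doesn't show the job's actual configuration.
    val spark = SparkSession.builder()
      .appName("sort-and-write")                      // placeholder name
      .config("spark.speculation", "true")            // off by default
      .config("spark.speculation.interval", "100ms")  // how often to check for stragglers
      .config("spark.speculation.multiplier", "1.5")  // straggler = slower than 1.5x the median task
      .config("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before speculating
      .getOrCreate()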