On Tue, Dec 4, 2018 at 9:45 AM Conrad Lee wrote:
> Yeah, increasing the memory or the number of output partitions would
> probably help. However, increasing the memory available to each executor
> would add expense, and I want to keep the number of partitions low so
> that each parquet file stays reasonably large.
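The two knobs discussed above can be set at submit time. A hedged sketch only; the flag values are illustrative, not the ones actually used for this job:

```shell
# Illustrative values only -- the job's real settings aren't given in the thread.
# spark.executor.memory: more memory per executor (adds cost, as noted above).
# spark.sql.shuffle.partitions: number of shuffle/output partitions; fewer
# partitions means larger parquet files, at the cost of bigger per-task state.
spark-submit \
  --conf spark.executor.memory=8g \
  --conf spark.sql.shuffle.partitions=2000 \
  my_job.py
```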
On Mon, Dec 3, 2018 at 2:48 AM Conrad Lee wrote:
>
>> Thanks for the thoughts. While the job deals with lots of files in the
>> first stage, they're coalesced down to just a few thousand partitions
>> early on. The part of the job that's failing is the reduce stage.
Vadim wrote:
> I ran into problems using 5.19, so I reverted to 5.17 and that resolved
> my issues.
>
> On Wed, Nov 28, 2018 at 2:48 AM Conrad Lee wrote:
>
>> Hello Vadim,
>>
>> Interesting. I've only been running this job at scale for a couple of
>> weeks, so I can't compare against earlier EMR versions.
> Until two weeks ago everything was fine.
> We're trying to figure out with the EMR team where the issue is coming
> from.
> On Tue, Nov 27, 2018 at 6:29 AM Conrad Lee wrote:
> >
> > Dear spark community,
> >
> > I'm running Spark 2.3.2 on EMR 5.19.0. I've got a job that's hanging
> > in the final stage.
Dear spark community,
I'm running Spark 2.3.2 on EMR 5.19.0. I've got a job that's hanging in
the final stage--the job usually works, but I see this hanging behavior in
about one out of 50 runs.
The second-to-last stage sorts the dataframe, and the final stage writes
the dataframe to HDFS.