I'm sorry for not replying. We are super busy trying to prepare data to
release.

An update:
- We were using G1GC and through slack were advised against that. This
fixed the OOM error we saw and all our 2.15.0 jobs did complete

When we have time (after 3 weeks) I'll try and isolate a test case with the
reshuffle example and parallelism.

Thanks,
Tim


On Thu, Oct 3, 2019 at 1:21 PM Jan Lukavský <[email protected]> wrote:

> Hi Tim,
>
> can you please elaborate more about some parts?
>
> 1) What happens actually in your case? What is the specific settings you
> use?
>
> 3) Can you share stacktrace? Is it always the same, or does it change?
>
> The mentioned GroupCombineFunctions.java:202 comes from a Reshuffle,
> which seems to make a little sense to me regarding the logic you
> described. Do you use Reshuffle transform or does it expand from some
> other transform?
>
> Jan
>
> On 10/3/19 9:24 AM, Tim Robertson wrote:
> > Hi all,
> >
> > We haven't dug enough into this to know where to log issues, but I'll
> > start by sharing here.
> >
> > After upgrading from Beam 2.10.0 to 2.15.0 we see issues on
> > SparkRunner - we suspect all of this related.
> >
> > 1. spark.default.parallelism is not respected
> >
> > 2. File writing (Avro) with dynamic destinations (grouped into folders
> > by a field name) consistently fail with
> > org.apache.beam.sdk.util.UserCodeException:
> > java.nio.file.FileAlreadyExistsException: Unable to rename resource
> >
> hdfs://ha-nn/pipelines/export-20190930-0854/.temp-beam-d4fd89ed-fc7a-4b1e-aceb-68f9d72d50f0/6e086f60-8bda-4d0e-b29d-1b47fdfc88c0
>
> > to
> >
> hdfs://ha-nn/pipelines/export-20190930-0854/7c9d2aec-f762-11e1-a439-00145eb45e9a/verbatimHBaseExport-00000-of-00001.avro
>
> > as destination already exists and couldn't be deleted.
> >
> > 3. GBK operations that run over 500M small records consistently fail
> > with OOM. We tried different configs with 48GB, 60GB, 80GB executor
> > memory
> >
> > Our pipelines run are batch, simple transformations with either an
> > HBaseSnapshot to Avro files or a merge of records in Avro (the GBK
> > issue) pushed to ElasticSearch (it fails upstream of the
> > ElasticsearchIO in the GBK stage).
> >
> > We notice operations that were mapToPair  in 2.10.0 become repartition
> > operations ( (mapToPair at GroupCombineFunctions.java:68 becomes
> > repartition at GroupCombineFunctions.java:202)) which might be related
> > to this and looks surprising.
> >
> > I'll report more as we learn. If anyone has any immediate ideas based
> > on their commits or reviews or if you wish an tests run on other Beam
> > versions please say.
> >
> > Thanks,
> > Tim
> >
> >
> >
>

Reply via email to