Hi all,

We haven't dug enough into this to know where to log issues, but I'll start by sharing here.
After upgrading from Beam 2.10.0 to 2.15.0 we see issues on SparkRunner, and we suspect they are all related.

1. spark.default.parallelism is not respected.

2. File writing (Avro) with dynamic destinations (grouped into folders by a field name) consistently fails with:

   org.apache.beam.sdk.util.UserCodeException: java.nio.file.FileAlreadyExistsException: Unable to rename resource hdfs://ha-nn/pipelines/export-20190930-0854/.temp-beam-d4fd89ed-fc7a-4b1e-aceb-68f9d72d50f0/6e086f60-8bda-4d0e-b29d-1b47fdfc88c0 to hdfs://ha-nn/pipelines/export-20190930-0854/7c9d2aec-f762-11e1-a439-00145eb45e9a/verbatimHBaseExport-00000-of-00001.avro as destination already exists and couldn't be deleted.

3. GBK operations that run over 500M small records consistently fail with OOM. We tried different configs with 48GB, 60GB and 80GB of executor memory.

Our pipelines are batch, simple transformations with either an HBaseSnapshot to Avro files or a merge of records in Avro (the GBK issue) pushed to ElasticSearch (it fails upstream of the ElasticsearchIO in the GBK stage).

We notice that operations that were mapToPair in 2.10.0 become repartition operations (mapToPair at GroupCombineFunctions.java:68 becomes repartition at GroupCombineFunctions.java:202), which might be related and looks surprising.

I'll report more as we learn. If anyone has any immediate ideas based on their commits or reviews, or if you wish any tests run on other Beam versions, please say.

Thanks,
Tim
