[ https://issues.apache.org/jira/browse/BEAM-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan De Smit updated BEAM-8576:
---------------------------------
    Attachment: Beam_2.14_Stages.png
                Beam_2.14_Dag.png
                Beam_2.12_Stages.png
                Beam_2.12_Dag.png

> Regression when reading many files
> ----------------------------------
>
>                 Key: BEAM-8576
>                 URL: https://issues.apache.org/jira/browse/BEAM-8576
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>    Affects Versions: 2.14.0, 2.15.0, 2.16.0
>            Reporter: Stefan De Smit
>            Priority: Major
>         Attachments: Beam_2.12_Dag.png, Beam_2.12_Stages.png,
>                      Beam_2.14_Dag.png, Beam_2.14_Stages.png
>
>
> With Beam 2.12, reading many files produced many tasks.
> After upgrading to Beam 2.14, the same code executes differently:
> all files are read by only 1 task.
> This happens when reading via the DoFns rather than the Source (i.e. with
> 'withHintMatchesManyFiles').
> {code:java}
> final PCollection<GenericRecord> records =
>     pipeline.apply(AvroIO.readGenericRecords(mySchema)
>         .from(options.getInputPath() + "/*/*/*/data/file.avro")
>         .withHintMatchesManyFiles());
> records.apply(Count.globally());
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)