[ https://issues.apache.org/jira/browse/BEAM-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17122594#comment-17122594 ]
Beam JIRA Bot commented on BEAM-8576:
-------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.

> Regression when reading many files
> ----------------------------------
>
>                 Key: BEAM-8576
>                 URL: https://issues.apache.org/jira/browse/BEAM-8576
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>    Affects Versions: 2.14.0, 2.15.0, 2.16.0
>            Reporter: Stefan De Smit
>            Priority: P2
>              Labels: stale-P2
>         Attachments: Beam_2.12_Dag.png, Beam_2.12_Stages.png, Beam_2.14_Dag.png, Beam_2.14_Stages.png
>
>
> When reading many files with Beam 2.12, I used to get many tasks.
> After upgrading to Beam 2.14, the same code leads to a different execution in which all files are read by only 1 task.
> This happens when using the DoFns rather than the Source (via 'withHintMatchesManyFiles').
> {code:java}
> final PCollection<GenericRecord> records =
>     pipeline.apply(AvroIO.readGenericRecords(mySchema)
>         .from(options.getInputPath() + "/*/*/*/data/file.avro")
>         .withHintMatchesManyFiles());
> records.apply(Count.globally());
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)