This seems like a reasonable optimization to me. I think moving it to a pull request is a good idea. Thanks!
- Danny

On Sun, Sep 22, 2024 at 11:58 PM LDesire <two_som...@icloud.com> wrote:

> Hello Beam community.
>
> I'm currently trying out Spark Runner and while going through the code,
> I noticed that when evaluating a ParDo operation,
> it applies too many filter operations (from line 467 in
> TransformTranslator.java).
>
> The original intent of this code seems to be to apply filters because the
> output of the ParDo can have multiple outputs.
> In other words, it makes sense to apply the filter operation when there
> are multiple outputs, but I believe that applying the filter operation when
> there is only one output actually degrades pipeline performance (because
> the equals operation has to be applied to each element to compare them).
>
> So I changed the PTransform to only apply when there are multiple outputs
> and tested it.
> I need to do more testing, but it didn't affect the output and the results
> weren't bad.
> If this is ok, would it be ok to make a PR?
>
> Also, if I'm missing anything, I'd be grateful if you could let me know.
>
> Cheers.
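
For readers following the thread, the idea described above can be sketched roughly as follows. This is only an illustration in plain Java and does not use any Beam or Spark API; the class name `SingleOutputFilterSketch`, the method `splitByTag`, and the use of `(tag, value)` pairs as a stand-in for tagged ParDo outputs are all hypothetical and not taken from TransformTranslator.java. It also assumes that when only one output tag is declared, every element carries that tag.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, self-contained illustration; not the Beam Spark runner code.
final class SingleOutputFilterSketch {

    // Splits (tag, value) pairs into one list per declared output tag.
    static Map<String, List<Integer>> splitByTag(
            List<Map.Entry<String, Integer>> tagged, List<String> outputTags) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();

        if (outputTags.size() == 1) {
            // Single output: assume every element belongs to the only tag,
            // so the per-element equals() comparison is skipped entirely.
            result.put(outputTags.get(0),
                    tagged.stream()
                            .map(Map.Entry::getValue)
                            .collect(Collectors.toList()));
            return result;
        }

        // Multiple outputs: one filter pass per tag, comparing each element's tag.
        for (String tag : outputTags) {
            result.put(tag,
                    tagged.stream()
                            .filter(e -> e.getKey().equals(tag))
                            .map(Map.Entry::getValue)
                            .collect(Collectors.toList()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> multi =
                List.of(Map.entry("main", 1), Map.entry("side", 2), Map.entry("main", 3));
        System.out.println(splitByTag(multi, List.of("main", "side")));

        List<Map.Entry<String, Integer>> single =
                List.of(Map.entry("main", 1), Map.entry("main", 2));
        System.out.println(splitByTag(single, List.of("main")));
    }
}
```

Both branches should produce the same result for a single-output input; the shortcut only avoids the tag comparison on each element, which is the cost the original message points at.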