On Thu, Apr 11, 2024 at 1:00 PM Joey Tran <joey.t...@schrodinger.com> wrote:
> Thanks for the feedback! > > Sounds like it'll be a while before this is supported. Would it make sense > to formally _not_ support this functionality by raising an early and clear > exception if someone does try to use a side input with a combiner? As it > stands now, the exception a user gets is very difficult to understand. My > motivation for looking into this was a new user I was introducing to Beam > ran into this issue and their first impression was that Beam had terrible > error handling / messages. > In general, failing early and with a proper message is a good idea. If something fails 100% of the time later, it is better to fail early. If we can't fail early, try to fail with a better error message. If some functionality fails 90% of the time but works 10% of the time, we'd have to be more careful to not accidentally introduce a breaking change for some niche group of users. For combiners with side inputs, IIRC there was a reference in BEAM-8400 <https://issues.apache.org/jira/browse/BEAM-8400> that they might work in some cases but not other cases - I haven't verified that. > > Best, > Joey > > On Thu, Apr 11, 2024 at 3:52 PM Valentyn Tymofieiev via dev < > dev@beam.apache.org> wrote: > >> I took a look and mentioned the PR to a few folks. Couple of thoughts: >> - We should avoid Beam adding a high-level functionality only for Batch. >> - Supporting Windowing/Triggers would likely be non-trivial and worth >> considering early in the design. >> - If you'd like to continue working on this, I would suggest to start a >> document and gradually cover the following (with help/feedback from others >> in this list): >> - the motivational example, available workarounds (if any) >> - background on aspects of combiner implementation that are relevant. >> - proposed approach and considered alternatives >> - any runner-specific considerations. >> >> Thanks, >> Valentyn >> >> On Fri, Mar 29, 2024 at 5:06 AM Joey Tran <joey.t...@schrodinger.com> >> wrote: >> >>> I posted a PoC PR [1] for fixing deferred side inputs with combiners in >>> the python SDK. Would someone be willing to take a look at it? >>> >>> I have it working but could use some feedback on where to take it next. >>> It looks like bundle processor combiner operations don't currently support >>> side inputs [2] so I added a conditional in `CombinePerKey` that checks >>> whether it was instantiated with a side input and if so, use a ParDo-based >>> version of the combiner so we can piggyback off of the Do operations >>> implementation of side inputs rather than reimplementing it for the >>> combiner operation. >>> >>> [1] https://github.com/apache/beam/pull/30743 >>> [2] >>> https://github.com/apache/beam/blob/e3fee5156b3515f96dc5ba90ea2fbc6f6be2bd34/sdks/python/apache_beam/runners/worker/operations.py#L1146 >>> >>