Eugene Kirpichov created BEAM-4242:
--------------------------------------

             Summary: Wait.on() is O(n)
                 Key: BEAM-4242
                 URL: https://issues.apache.org/jira/browse/BEAM-4242
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Eugene Kirpichov


Wait.on() uses a NeverTrigger and a Sample.any(1) as an implementation detail.
Unfortunately, Sample.any() relies on combiner lifting for performance: without
it, all values end up grouped onto the same worker, which is not acceptable if
the signal PCollection is large.

Not all runners support combiner lifting, and even those that do (e.g. 
Dataflow) don't guarantee it. In one user's very large pipeline, 
combiner lifting was not performed because it is supported only for 
DefaultTrigger, not for NeverTrigger.

This should be fixed by modifying Wait to not rely on combiner lifting for 
performance, e.g. by a "manual" precombine (emit 1 value per bundle).
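
A rough illustration of the "manual" precombine idea, as a plain-JDK model
rather than Beam code. Bundles are modeled as lists, and the class and method
names (PrecombineSketch, onePerBundle, anyOne) are hypothetical. The point is
only that the final single-worker step sees at most one element per bundle
instead of all n elements. In Beam this would presumably be done in a DoFn
using its bundle lifecycle hooks; that mapping is an assumption, not the
actual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Plain-JDK model of a per-bundle precombine. Not Beam code: a "bundle" is
// just a List, and the final grouping step is a single in-memory list.
final class PrecombineSketch {

    // Emits at most one element per bundle (the first one seen, if any).
    // Empty bundles contribute nothing.
    static <T> List<T> onePerBundle(List<List<T>> bundles) {
        List<T> survivors = new ArrayList<>();
        for (List<T> bundle : bundles) {
            if (!bundle.isEmpty()) {
                survivors.add(bundle.get(0));
            }
        }
        return survivors;
    }

    // Models the final single-worker step: pick any one element as the signal.
    // After onePerBundle, its input size is bounded by the number of bundles.
    static <T> Optional<T> anyOne(List<T> grouped) {
        return grouped.isEmpty() ? Optional.empty() : Optional.of(grouped.get(0));
    }

    public static void main(String[] args) {
        // Three hypothetical bundles holding five signal elements in total.
        List<List<Integer>> bundles =
            List.of(List.of(1, 2, 3), List.of(), List.of(4, 5));

        List<Integer> survivors = onePerBundle(bundles);
        System.out.println("elements reaching the final step: " + survivors.size());
        System.out.println("signal present: " + anyOne(survivors).isPresent());
    }
}
```

With this shape, the cost of the final step scales with the number of bundles
rather than the size of the signal PCollection, regardless of whether the
runner lifts the combiner.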

CC: [~mkhadikov]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
