Eugene Kirpichov created BEAM-4242: -------------------------------------- Summary: Wait.on() is O(n) Key: BEAM-4242 URL: https://issues.apache.org/jira/browse/BEAM-4242 Project: Beam Issue Type: Bug Components: sdk-java-core Reporter: Eugene Kirpichov Assignee: Eugene Kirpichov
Wait.on() uses a NeverTrigger and a Sample.any(1) as an implementation detail. Unfortunately, Sample.any() relies on combiner lifting for performance - otherwise all values end up grouped onto the same worker which is not acceptable if the signal PCollection is large. Not all runners support combiner lifting at all; and even those that do (e.g. Dataflow) don't guarantee it. In the case of a very large user's pipeline, combiner lifting was not performed because it's only supported for DefaultTrigger, but not for NeverTrigger. This should be fixed by modifying Wait to not rely on combiner lifting for performance, e.g. by a "manual" precombine (emit 1 value per bundle). CC: [~mkhadikov] -- This message was sent by Atlassian JIRA (v7.6.3#76005)