I have a pipeline that reads from Pub/Sub, does some aggregation, and writes
to various places.  Previously, on older versions of Beam, when running
this pipeline in the DirectRunner, messages would flow through the pipeline
almost instantly, which made it very easy to debug locally.

However, after upgrading to Beam 2.25, I noticed that it can take on the
order of 5-10 minutes for messages to get from the Pub/Sub read step to the
next step in the pipeline (deserialization, etc.).  The subscription being
read from receives on the order of 100,000 elements/sec.

Setting --experiments=use_deprecated_read fixes it and makes the pipeline
behave as it did before.
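For concreteness, this is roughly how I pass the flag when launching locally (the script name here is just a placeholder for the actual pipeline entry point):

```shell
# Hypothetical invocation -- "my_pipeline.py" stands in for the real module.
# --experiments=use_deprecated_read reverts the Pub/Sub source to the
# pre-SDF read implementation; everything else is unchanged.
python my_pipeline.py \
  --runner=DirectRunner \
  --streaming \
  --experiments=use_deprecated_read
```

With that single flag added, messages again move through the pipeline with near-zero latency in the DirectRunner.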

It seems like the SDF-based read implementation in the DirectRunner is
causing some kind of issue here, either by buffering a very large amount of
data before emitting it in a bundle, or something else.  Has anyone else run
into this?
