I have a pipeline that reads from Pub/Sub, does some aggregation, and writes to various places. Previously, on older versions of Beam, when running this pipeline in the DirectRunner, messages would flow through the pipeline almost instantly, which made it very easy to debug locally.
However, after upgrading to Beam 2.25, I noticed that it can take on the order of 5-10 minutes for messages to get from the Pub/Sub read step to the next step in the pipeline (deserialization, etc.). The subscription being read from has on the order of 100,000 elements/sec arriving in it. Setting --experiments=use_deprecated_read fixes it and makes the pipeline behave as it did before. It seems like the SDF-based read implementation in the DirectRunner is causing some kind of issue here, perhaps buffering a very large amount of data before emitting it in a bundle, or something else. Has anyone else run into this?
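For reference, this is roughly how I'm launching the pipeline with the workaround (the script name and other flags here are placeholders for my actual pipeline; the only part that matters is the experiments flag):

```shell
python my_pipeline.py \
  --runner=DirectRunner \
  --streaming \
  --experiments=use_deprecated_read
```

As I understand it, this experiment makes the runner fall back to the older UnboundedSource-based read path instead of the SDF wrapper, which is why the latency goes back to what it was before the upgrade.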
