Hi folks,

Hoping to get some definitive answers about streaming-pipeline bundle retry semantics on Dataflow. I understand that a bundle containing a "poison pill" (bad data that, say, causes a NullPointerException when processed in a DoFn) will be retried indefinitely. What I'm not clear on are the implications of those retries.
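For concreteness, here's a minimal sketch of the kind of DoFn I'm describing (the names and element type are made up for illustration, not my actual pipeline):

```java
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;

// Illustrative only: a record missing an expected field turns the
// whole bundle into a "poison pill".
public class ExtractUserFn extends DoFn<Map<String, String>, String> {
  @ProcessElement
  public void processElement(@Element Map<String, String> record,
                             OutputReceiver<String> out) {
    // If the "user" key is absent, get() returns null, toLowerCase()
    // throws a NullPointerException, the bundle fails, and Dataflow
    // retries that bundle indefinitely.
    out.output(record.get("user").toLowerCase());
  }
}
```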
1. Is it the case that a worker will continuously retry the same poison-pill bundle and be unable to work on any other/new bundles after receiving the first poison pill? I've noticed that a small number of poison pills can cause all processing to stall, even when the bad data represents only a very small percentage of the overall data being processed.

2. Is there any interaction between windowing and this retry/stall scenario? I've noticed that the scenario where all processing stalls entirely is more common for a pipeline where all data is globally windowed. I don't have a solid understanding of how to explain that observation, though, and I'd really appreciate any insights that can aid in understanding.

Thanks,
Evan
