scwhittle commented on PR #38407: URL: https://github.com/apache/beam/pull/38407#issuecomment-4421071172
> When backpressure happens, would that look to users as a processing lull? If so, that is less than ideal, since we probably should be throttling reading inputs, and we might mislead users that processing is slow. Ideally the Runner would be smart enough to not send them if it cannot consume outputs fast enough...

Yes, it does show in the logs as long bundle processing. However, perhaps that is correct, in that the bundle is slow to process due to this. I think that instead of trying to hide that with the runner, we should just improve the lull logger to note that this is the reason specifically instead of the stack trace.

> Have you considered tracking unwritten data at the Data Channel? Something like #38422 perhaps (gemini assisted).

I am a bit hesitant with us rewriting queue internals, but I can take a closer look if you think this approach would be better. A downside of throttling the input is that it isn't sufficient if a single input which is already being processed is outputting a lot of elements. I think that can be the case if we are processing a state-backed iterable, as the input element would be `KV<k, iterable<V>>` where the iterable may be large. Or the input could be filenames and the outputs are the file contents, etc. So I think blocking here is ok, but we can try to improve the logging.
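
To make the fan-out case concrete, here is a minimal standalone sketch. It is plain Java rather than Beam code, and `BackpressureSketch`, `runFanOut`, and `Result` are hypothetical names used only for illustration. One already-accepted input emits many outputs into a bounded queue. Input throttling cannot help at that point, but the blocking `put` keeps buffered data at or below the queue capacity. The time spent inside `put` is also recorded, as a rough stand-in for what a lull logger could report instead of a stack trace.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class BackpressureSketch {

  /** Summary of one simulated bundle (hypothetical type, illustration only). */
  static final class Result {
    final int maxDepth;      // largest queue size observed by the producer
    final int consumed;      // number of outputs drained by the consumer
    final long nanosInPut;   // total time the producer spent inside put()

    Result(int maxDepth, int consumed, long nanosInPut) {
      this.maxDepth = maxDepth;
      this.consumed = consumed;
      this.nanosInPut = nanosInPut;
    }
  }

  /** Simulates ONE input element that fans out to {@code outputs} elements. */
  static Result runFanOut(int outputs, int capacity) {
    BlockingQueue<Integer> outbound = new ArrayBlockingQueue<>(capacity);
    AtomicInteger consumed = new AtomicInteger();

    // Stand-in for the downstream side draining the outbound data.
    Thread consumer = new Thread(() -> {
      try {
        for (int i = 0; i < outputs; i++) {
          outbound.take();
          consumed.incrementAndGet();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();

    int maxDepth = 0;
    long nanosInPut = 0;
    try {
      for (int i = 0; i < outputs; i++) {
        long start = System.nanoTime();
        outbound.put(i); // blocks while the queue is full: this is the backpressure
        nanosInPut += System.nanoTime() - start;
        maxDepth = Math.max(maxDepth, outbound.size());
      }
      consumer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while emitting outputs", e);
    }
    return new Result(maxDepth, consumed.get(), nanosInPut);
  }

  public static void main(String[] args) {
    Result r = runFanOut(100_000, 16);
    System.out.println("max buffered outputs: " + r.maxDepth);
    System.out.println("ms spent in put(): " + r.nanosInPut / 1_000_000);
  }
}
```

In this sketch the buffered output never exceeds the queue capacity regardless of how many elements the single input produces, and the accumulated time in `put` is the kind of signal that could let a lull message say "blocked on output" rather than only pointing at a stack trace.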
