scwhittle commented on PR #38407:
URL: https://github.com/apache/beam/pull/38407#issuecomment-4421071172

   > When backpressure happens, would that look to users as a processing lull? 
If so, that is less than ideal, since we probably should be throttling reading 
inputs, and we might mislead users that processing is slow. Ideally the Runner 
would be smart enough to not send them if it cannot consume outputs fast 
enough...
   
   Yes, it does show up in the logs as long bundle processing. However, that is 
arguably correct, since the bundle really is slow to process because of the 
backpressure. I think that instead of trying to hide that in the runner, we 
should just improve the lull logger to report backpressure as the specific 
reason instead of the stack trace.
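
   As a rough illustration of that idea (not the actual lull logger; 
`BundleTracker`, `blocked_on`, and `lull_message` are hypothetical names made 
up for this sketch), the processing thread could record why it is blocked so 
that the lull message can name the cause:

```python
import threading
import time
from contextlib import contextmanager


class BundleTracker:
    """Hypothetical tracker that remembers why the processing thread is blocked."""

    def __init__(self):
        self._lock = threading.Lock()
        self._blocked_reason = None
        self._blocked_since = None

    @contextmanager
    def blocked_on(self, reason):
        # Wrap any call that may block, e.g. a write to a full output channel.
        with self._lock:
            self._blocked_reason = reason
            self._blocked_since = time.monotonic()
        try:
            yield
        finally:
            with self._lock:
                self._blocked_reason = None
                self._blocked_since = None

    def lull_message(self, elapsed_s):
        # Called by a (hypothetical) lull logger once a bundle runs too long.
        with self._lock:
            reason = self._blocked_reason
        if reason is not None:
            return f"Bundle processing for {elapsed_s:.0f}s; blocked on {reason}"
        return (
            f"Bundle processing for {elapsed_s:.0f}s; "
            "no known blocking reason, dumping stack trace"
        )
```

   With something like this, the stack trace would only be needed when no 
blocking reason has been recorded.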
   
   > Have you considered tracking unwritten data at the Data Channel? Something 
like #38422 perhaps (gemini assisted). I am a bit hesitant with us rewriting 
queue internals, but I can take a closer look if you think this approach would 
be better.
   
   A downside of throttling the input is that it isn't sufficient when a single 
input that is already being processed outputs a lot of elements. I think 
that can happen when we are processing a state-backed iterable, since the input 
element would be KV<k, iterable<V>> and the iterable may be large. Or the 
input could be filenames and the outputs the file contents, etc.
   
   So I think blocking here is OK, but we can try to improve the logging.
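
   To make the fan-out case concrete, here is a minimal sketch. It assumes a 
bounded in-memory queue stands in for the outbound data channel; 
`process_with_blocking_output`, `fan_out`, and `capacity` are illustrative 
names, not code from the PR. A single input produces many outputs, so 
throttling inputs cannot limit buffering, while a blocking write does:

```python
import queue
import threading


def process_with_blocking_output(inputs, fan_out, capacity):
    """Emit fan_out outputs per input into a bounded queue drained by a consumer.

    Returns (number of outputs delivered, peak queue size seen by the producer).
    """
    outbound = queue.Queue(maxsize=capacity)  # stand-in for the data channel
    delivered = []
    done = object()  # sentinel telling the consumer to stop

    def consumer():
        while True:
            item = outbound.get()
            if item is done:
                return
            delivered.append(item)

    drain = threading.Thread(target=consumer)
    drain.start()

    peak = 0
    for element in inputs:
        # One input may expand into many outputs (large iterable, file contents).
        for i in range(fan_out):
            outbound.put((element, i))  # blocks while the queue is full
            peak = max(peak, outbound.qsize())

    outbound.put(done)
    drain.join()
    return len(delivered), peak
```

   For example, `process_with_blocking_output(["key"], 10000, 8)` processes 
only one input, yet the producer never holds more than 8 undelivered outputs 
because `put` blocks until the consumer catches up.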

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
