lostluck commented on pull request #11864: URL: https://github.com/apache/beam/pull/11864#issuecomment-636140338
This PR does bake in the assumption that we'll always get both a data message and a control message for an instruction. It doesn't handle the less well-behaved cases of "only instructions, never any data" or "only data, never instructions", which, as you say, probably require a time component to handle properly.

The only-instruction case is a bit of a waste, but ends up using little to no CPU while it's waiting for a channel send. On the other hand, it will then never signal the runner half or self-terminate. So if the runner is waiting on that, it's its own fault, and not well behaved. This may cause problems for a stage as a whole if the runner doesn't decide to disregard this bundle.

The only-data case is a bigger risk for an individual worker, since it will eventually block the worker with what I call the Boulder problem: the channel buffer may fill and start "pushback" on the data channel, preventing data messages from reaching the other processing threads. This is overall desirable behavior, up until it isn't and the instruction for that data never comes.
