On Wed, May 25, 2022 at 12:54:47PM +0200, Markus Armbruster wrote:
> Konstantin Khlebnikov <khlebni...@yandex-team.ru> writes:
>
> > This event represents device runtime errors to give time and
> > reason why device is broken.
>
> Can you give an or more examples of the "device runtime errors" you have
> in mind?

Initially we wanted to address a situation where a vhost device
discovered an inconsistency during virtqueue processing and silently
stopped the virtqueue. This resulted in a device stall (partial for
multiqueue devices), and we were the last to notice it.

The solution appeared to be to employ errfd and, upon receiving a
notification through it, to emit a QMP event that is actionable in the
management layer or further up the stack.

Then we observed that virtio (non-vhost) devices suffer from the same
issue: they only log the error but don't signal it to the management
layer. The case was very similar, so we thought it would make sense to
share the infrastructure and the QMP event between virtio and vhost.

Then Konstantin went a bit further and generalized the concept into a
generic "device runtime error". I'm personally not completely convinced
this generalization is appropriate here; we'd appreciate opinions from
the community on the matter.

HTH,
Roman.