Roman Kagan <rvka...@yandex-team.ru> writes:

> On Wed, May 25, 2022 at 12:54:47PM +0200, Markus Armbruster wrote:
>> Konstantin Khlebnikov <khlebni...@yandex-team.ru> writes:
>>
>> > This event represents device runtime errors to give time and
>> > reason why device is broken.
>>
>> Can you give an or more examples of the "device runtime errors" you have
>> in mind?
>
> Initially we wanted to address a situation when a vhost device
> discovered an inconsistency during virtqueue processing and silently
> stopped the virtqueue. This resulted in device stall (partial for
> multiqueue devices) and we were the last to notice that.
>
> The solution appeared to be to employ errfd and, upon receiving a
> notification through it, to emit a QMP event which is actionable in the
> management layer or further up the stack.
>
> Then we observed that virtio (non-vhost) devices suffer from the same
> issue: they only log the error but don't signal it to the management
> layer. The case was very similar so we thought it would make sense to
> share the infrastructure and the QMP event between virtio and vhost.
>
> Then Konstantin went a bit further and generalized the concept into
> generic "device runtime error". I'm personally not completely convinced
> this generalization is appropriate here; we'd appreciate the opinions
> from the community on the matter.

"Device emulation sending an event on entering certain error states, so
that a management application can do something about it" feels
reasonable enough to me as a general concept.

The key point is of course "can do something": the event needs to be
actionable. Can you describe possible actions for the cases you
implement?

Once we all have a better idea of the event's purpose, usage, and
limitations, we should revisit its documentation.