Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-08 Thread Jason Wang



On 2021/4/8 1:51 PM, Dongli Zhang wrote:


On 4/6/21 7:20 PM, Jason Wang wrote:

On 2021/4/7 7:27 AM, Dongli Zhang wrote:

This will answer your question that "Can it bypass the masking?".

For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, writing to the eventfd
cannot bypass masking, because masking unregisters the eventfd. A write to the
eventfd has no effect.

However, it is possible to bypass masking for vhost-net, because vhost-net
registers a dedicated masked_notifier eventfd in order to mask the irq. A write
to the original eventfd still takes effect.

We may leave it to the user to decide whether to write to 'masked_notifier' or
to the original 'guest_notifier' for vhost-net.

My fault here. A write to masked_notifier will always be masked :(


Only when there's no bug in QEMU.



If the interface works at the EventNotifier level, it does not care whether the
EventNotifier is masked. It just provides a way to write to the EventNotifier.


Yes.



Dumping the MSI-X table for both virtio and vfio will help confirm whether the
vector is masked.
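
As a rough illustration of what such a dump would have to decode, here is a small sketch that parses raw MSI-X table bytes. The 16-byte little-endian entry layout (message address low/high, message data, vector control) and the per-vector mask bit in bit 0 of vector control come from the PCI MSI-X definition. The function name and the sample bytes are made up for this example and are not part of QEMU or of this patch set.

```python
import struct

MSIX_ENTRY_SIZE = 16          # bytes per MSI-X table entry
MSIX_VECTOR_CTRL_MASKBIT = 1  # bit 0 of the Vector Control dword

def decode_msix_table(raw: bytes):
    """Decode raw MSI-X table bytes into a list of per-vector dicts."""
    if len(raw) % MSIX_ENTRY_SIZE:
        raise ValueError("table length must be a multiple of 16 bytes")
    entries = []
    for off in range(0, len(raw), MSIX_ENTRY_SIZE):
        # Each entry: addr low, addr high, data, vector control (all 32-bit LE).
        addr_lo, addr_hi, data, ctrl = struct.unpack_from("<IIII", raw, off)
        entries.append({
            "vector": off // MSIX_ENTRY_SIZE,
            "addr": (addr_hi << 32) | addr_lo,
            "data": data,
            "masked": bool(ctrl & MSIX_VECTOR_CTRL_MASKBIT),
        })
    return entries

# Hypothetical two-entry table: vector 0 unmasked, vector 1 masked.
sample = (struct.pack("<IIII", 0xFEE00000, 0, 0x4021, 0) +
          struct.pack("<IIII", 0xFEE00000, 0, 0x4022, 1))
for entry in decode_msix_table(sample):
    print(entry)
```

A real dump would also want the pending bit array, since a masked vector with its pending bit set is exactly the "IRQ is stuck behind the mask" case.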


That would be helpful as well. It's probably better to extend the "info pci"
command.

Thanks

I will look into whether to add this to "info pci" (by introducing a new
argument option to "info pci") or to introduce a new command.



It's better to just reuse "info pci".




About the EventNotifier, I will classify each one as a guest notifier or a host
notifier, so that it will be much easier for the user to tell whether the
eventfd is for injecting an IRQ or for kicking the doorbell.



Sounds good.




Thank you very much for all suggestions!

Dongli Zhang



Thanks





Thank you very much!

Dongli Zhang






Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-07 Thread Dongli Zhang



On 4/6/21 7:20 PM, Jason Wang wrote:
> 
> On 2021/4/7 7:27 AM, Dongli Zhang wrote:
>>> This will answer your question that "Can it bypass the masking?".
>>>
>>> For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, writing to the
>>> eventfd cannot bypass masking, because masking unregisters the eventfd. A
>>> write to the eventfd has no effect.
>>>
>>> However, it is possible to bypass masking for vhost-net, because vhost-net
>>> registers a dedicated masked_notifier eventfd in order to mask the irq. A
>>> write to the original eventfd still takes effect.
>>>
>>> We may leave it to the user to decide whether to write to 'masked_notifier'
>>> or to the original 'guest_notifier' for vhost-net.
>> My fault here. A write to masked_notifier will always be masked :(
> 
> 
> Only when there's no bug in QEMU.
> 
> 
>>
>> If the interface works at the EventNotifier level, it does not care whether
>> the EventNotifier is masked. It just provides a way to write to the
>> EventNotifier.
> 
> 
> Yes.
> 
> 
>>
>> Dumping the MSI-X table for both virtio and vfio will help confirm whether the
>> vector is masked.
> 
> 
> That would be helpful as well. It's probably better to extend the "info pci"
> command.
> 
> Thanks

I will look into whether to add this to "info pci" (by introducing a new
argument option to "info pci") or to introduce a new command.

About the EventNotifier, I will classify each one as a guest notifier or a host
notifier, so that it will be much easier for the user to tell whether the
eventfd is for injecting an IRQ or for kicking the doorbell.

Thank you very much for all suggestions!

Dongli Zhang

> 
> 
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
> 



Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Jason Wang



On 2021/4/7 7:27 AM, Dongli Zhang wrote:

This will answer your question that "Can it bypass the masking?".

For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, writing to the eventfd
cannot bypass masking, because masking unregisters the eventfd. A write to the
eventfd has no effect.

However, it is possible to bypass masking for vhost-net, because vhost-net
registers a dedicated masked_notifier eventfd in order to mask the irq. A write
to the original eventfd still takes effect.

We may leave it to the user to decide whether to write to 'masked_notifier' or
to the original 'guest_notifier' for vhost-net.

My fault here. A write to masked_notifier will always be masked :(



Only when there's no bug in QEMU.




If the interface works at the EventNotifier level, it does not care whether the
EventNotifier is masked. It just provides a way to write to the EventNotifier.



Yes.




Dumping the MSI-X table for both virtio and vfio will help confirm whether the
vector is masked.



That would be helpful as well. It's probably better to extend the "info pci"
command.


Thanks




Thank you very much!

Dongli Zhang






Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Jason Wang



On 2021/4/6 4:43 PM, Dongli Zhang wrote:


On 4/5/21 6:55 PM, Jason Wang wrote:

On 2021/4/6 4:00 AM, Dongli Zhang wrote:

On 4/1/21 8:47 PM, Jason Wang wrote:

On 2021/3/30 3:29 PM, Dongli Zhang wrote:

On 3/28/21 8:56 PM, Jason Wang wrote:

On 2021/3/27 5:16 AM, Dongli Zhang wrote:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

On 2021/3/26 1:44 PM, Dongli Zhang wrote:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html




... or due to the loss of an IRQ, e.g., as fixed by Linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down whether an issue is due to the loss of an irq/kick. So far
the new interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

'call' lets the admin inject an irq on purpose for a specific device (e.g.,
vhost-scsi) from QEMU/host to the VM, while 'kick' lets the admin kick the
doorbell on purpose at the QEMU/host side for a specific device.


This interface can be used as a workaround if a call/kick is lost due to an
issue in the virtualization software (e.g., kernel or QEMU).

We may also implement the interface for VFIO PCI, e.g., writing to
VFIOPCIDevice->msi_vectors[i].interrupt would inject an IRQ into the VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, queue=2 has count=15 for the
'kick' eventfd_ctx. Suppose there is data in the vring avail ring while nothing
is available in the used ring. We suspect this is because vhost-scsi was not
notified by the VM. To narrow down and analyze the issue, we use live crash to
dump the current eventfd counter for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  },
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    },
    head = {
      next = 0x8f841dc08e18,
      prev = 0x8f841dc08e18
    }
  },
  count = 15, ---> eventfd is 15 !!!
  flags = 526336
}
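
As a side note on reading this dump, the flags value can be decoded by hand. The sketch below assumes the Linux x86-64 values, where the EFD_* flags mirror the corresponding O_* open flags; the helper name is only for illustration.

```python
# Assumed Linux x86-64 values: EFD_* flags share the O_* bit positions.
EFD_SEMAPHORE = 0o1
EFD_NONBLOCK = 0o4000
EFD_CLOEXEC = 0o2000000

def decode_eventfd_flags(flags: int):
    """Return the names of the EFD_* bits set in an eventfd_ctx flags value."""
    names = []
    for name, bit in (("EFD_SEMAPHORE", EFD_SEMAPHORE),
                      ("EFD_NONBLOCK", EFD_NONBLOCK),
                      ("EFD_CLOEXEC", EFD_CLOEXEC)):
        if flags & bit:
            names.append(name)
            flags &= ~bit
    if flags:
        names.append(hex(flags))  # leftover bits this sketch does not know
    return names

print(decode_eventfd_flags(526336))
```

Under those assumptions 526336 is 0x80800, i.e. EFD_NONBLOCK | EFD_CLOEXEC without EFD_SEMAPHORE, so count is a plain accumulating counter rather than a semaphore.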

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
  "arguments": { "dev": "/machine/peripheral/vscsi0",
                 "event": "kick", "queue": 2 } }

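For scripting, the request above could be built and sent roughly as follows. This is only a sketch: x-debug-device-event is the command proposed by this series, the helper names are made up, and the socket path is whatever was passed to -qmp. It relies on the usual QMP handshake (greeting, then qmp_capabilities) and skips any asynchronous events that arrive before the reply.

```python
import json
import socket

def build_device_event(dev: str, event: str, queue: int) -> dict:
    """Build the x-debug-device-event request proposed in this series."""
    if event not in ("kick", "call"):
        raise ValueError("event must be 'kick' or 'call'")
    return {"execute": "x-debug-device-event",
            "arguments": {"dev": dev, "event": event, "queue": queue}}

def qmp_send(sock_path: str, command: dict) -> dict:
    """Send one command over a QMP UNIX socket and return its reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        chan = sock.makefile("rw", encoding="utf-8")
        json.loads(chan.readline())              # consume the QMP greeting
        reply = {}
        for msg in ({"execute": "qmp_capabilities"}, command):
            chan.write(json.dumps(msg) + "\n")
            chan.flush()
            while True:
                reply = json.loads(chan.readline())
                if "return" in reply or "error" in reply:
                    break                        # anything else is an event
        return reply

cmd = build_device_event("/machine/peripheral/vscsi0", "kick", 2)
print(json.dumps(cmd))
```
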
The counter is increased to 16. If the hang is then resolved, it indicates
that something is wrong in software and the 'kick' was lost.

What do you mean by "software" here? It looks to me like you're testing whether
event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
sure how much value we could gain from a dedicated debug interface like this,
considering there are a lot of existing general-purpose debugging methods like
tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the virtqueue kick
process and is unlikely to be buggy. Considering that the ioeventfd is usually
offloaded to KVM, it's more likely that something is wrong in setting up the
ioeventfd than here. IRQ is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by
virtio_queue_notify().

'Software' here means the data processing and event notification mechanisms
involved with the virtio/vhost PV driver frontend. E.g., the VM is waiting for
an extra IRQ but the vhost side did not trigger one, say because
vring_need_event() erroneously returned false due to corrupted ring buffer
state.
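
To make the vring_need_event() case concrete, here is the check transcribed into Python, with the 16-bit wraparound made explicit. The formula is the one from the virtio ring header; the sample index values are invented for illustration.

```python
def vring_need_event(event_idx: int, new_idx: int, old_idx: int) -> bool:
    """Return True if moving the index from old_idx to new_idx passed
    event_idx, i.e. the other side asked to be notified in that window.
    All arithmetic is modulo 2**16, as with the u16 casts in C."""
    mask = 0xFFFF
    return ((new_idx - event_idx - 1) & mask) < ((new_idx - old_idx) & mask)

# Used index moved from 10 to 12 and the driver asked for an event at 10.
print(vring_need_event(event_idx=10, new_idx=12, old_idx=10))  # True
# Same update, but the event index holds a stale or corrupted value of 5.
print(vring_need_event(event_idx=5, new_idx=12, old_idx=10))   # False
```

The second call shows the failure mode described above: with a bad event index the check says no notification is needed, so no IRQ is sent and the guest keeps waiting.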

So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) QEMU bug (irqchip and routing)
...

This is not only about whether notification is blocked.

It can also be used to help narrow down and understand whether there is any
suspicious issue in the blk-mq/scsi/netdev/napi code. The PV drivers are not
just drivers that follow the virtio spec; they are closely tied to many other
kernel components.

Suppose IO recovers after we inject an IRQ to vhost-scsi on purpose. We will
then be able to analyze what may have happened along the IO completion path,
starting from when/where the IRQ is injected ... perhaps the root cause is not
in virtio but in blk-mq/scsi (this is just an example).


In addition, this idea should help with vfio-pci. Suppose the developer of a
specific device driver suspects IO/networking hangs because of loss of IRQ,

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Dongli Zhang



On 4/6/21 1:43 AM, Dongli Zhang wrote:
> 
> 
> On 4/5/21 6:55 PM, Jason Wang wrote:
>>
>> On 2021/4/6 4:00 AM, Dongli Zhang wrote:
>>>
>>> On 4/1/21 8:47 PM, Jason Wang wrote:
 On 2021/3/30 3:29 PM, Dongli Zhang wrote:
> On 3/28/21 8:56 PM, Jason Wang wrote:
>> On 2021/3/27 5:16 AM, Dongli Zhang wrote:
>>> Hi Jason,
>>>
>>> On 3/26/21 12:24 AM, Jason Wang wrote:
 On 2021/3/26 1:44 PM, Dongli Zhang wrote:
> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due 
> to
> the loss of doorbell kick, e.g.,
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
>
>
>
>
> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>
> This patch introduces a new debug interface 'DeviceEvent' to 
> DeviceClass
> to help narrow down if the issue is due to loss of irq/kick. So far 
> the new
> interface handles only two events: 'call' and 'kick'. Any device 
> (e.g.,
> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, 
> MSI-X
> or legacy IRQ).
>
> The 'call' is to inject irq on purpose by admin for a specific device
> (e.g.,
> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the 
> doorbell
> on purpose by admin at QEMU/host side for a specific device.
>
>
> This device can be used as a workaround if call/kick is lost due to
> virtualization software (e.g., kernel or QEMU) issue.
>
> We may also implement the interface for VFIO PCI, e.g., to write to
> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to 
> VM
> on purpose. This is considered future work once the virtio part is 
> done.
>
>
> Below is from live crash analysis. Initially, the queue=2 has 
> count=15 for
> 'kick' eventfd_ctx. Suppose there is data in vring avail while there 
> is no
> used available. We suspect this is because vhost-scsi was not 
> notified by
> VM. In order to narrow down and analyze the issue, we use live crash 
> to
> dump the current counter of eventfd for queue=2.
>
> crash> eventfd_ctx 8f67f6bbe700
> struct eventfd_ctx {
>   kref = {
>     refcount = {
>   refs = {
>     counter = 4
>   }
>     }
>   },
>   wqh = {
>     lock = {
>   {
>     rlock = {
>   raw_lock = {
>     val = {
>   counter = 0
>     }
>   }
>     }
>   }
>     },
>     head = {
>   next = 0x8f841dc08e18,
>   prev = 0x8f841dc08e18
>     }
>   },
>   count = 15, ---> eventfd is 15 !!!
>   flags = 526336
> }
>
> Now we kick the doorbell for vhost-scsi queue=2 on purpose for 
> diagnostic
> with this interface.
>
> { "execute": "x-debug-device-event",
>   "arguments": { "dev": "/machine/peripheral/vscsi0",
>  "event": "kick", "queue": 2 } }
>
> The counter is increased to 16. Suppose the hang issue is resolved, it
> indicates something bad is in software that the 'kick' is lost.
 What do you mean by "software" here? It looks to me like you're testing
 whether event_notifier_set() is called by virtio_queue_notify() here. If so,
 I'm not sure how much value we could gain from a dedicated debug interface
 like this, considering there are a lot of existing general-purpose debugging
 methods like tracing or gdb. I'd say the path from virtio_queue_notify() to
 event_notifier_set() is only a very small fraction of the virtqueue kick
 process and is unlikely to be buggy. Considering that the ioeventfd is
 usually offloaded to KVM, it's more likely that something is wrong in
 setting up the ioeventfd than here. IRQ is even more complicated.
>>> Thank you very much!
>>>
>>> I am not testing whether event_notifier_set() is called by
>>> virtio_queue_notify().
>>>
>>> The 'software' indicates the data processing and event notification 
>>> mechanism
>>> involved with virtio/vhost PV driver frontend. E.g., 

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Dongli Zhang



On 4/5/21 6:55 PM, Jason Wang wrote:
> 
> On 2021/4/6 4:00 AM, Dongli Zhang wrote:
>>
>> On 4/1/21 8:47 PM, Jason Wang wrote:
>>> On 2021/3/30 3:29 PM, Dongli Zhang wrote:
 On 3/28/21 8:56 PM, Jason Wang wrote:
> On 2021/3/27 5:16 AM, Dongli Zhang wrote:
>> Hi Jason,
>>
>> On 3/26/21 12:24 AM, Jason Wang wrote:
>>> On 2021/3/26 1:44 PM, Dongli Zhang wrote:
 The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due 
 to
 the loss of doorbell kick, e.g.,

 https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html




 ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
 fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

 This patch introduces a new debug interface 'DeviceEvent' to 
 DeviceClass
 to help narrow down if the issue is due to loss of irq/kick. So far 
 the new
 interface handles only two events: 'call' and 'kick'. Any device (e.g.,
 virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, 
 MSI-X
 or legacy IRQ).

 The 'call' is to inject irq on purpose by admin for a specific device
 (e.g.,
 vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the 
 doorbell
 on purpose by admin at QEMU/host side for a specific device.


 This device can be used as a workaround if call/kick is lost due to
 virtualization software (e.g., kernel or QEMU) issue.

 We may also implement the interface for VFIO PCI, e.g., to write to
 VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to 
 VM
 on purpose. This is considered future work once the virtio part is 
 done.


 Below is from live crash analysis. Initially, the queue=2 has count=15 
 for
 'kick' eventfd_ctx. Suppose there is data in vring avail while there 
 is no
 used available. We suspect this is because vhost-scsi was not notified 
 by
 VM. In order to narrow down and analyze the issue, we use live crash to
 dump the current counter of eventfd for queue=2.

 crash> eventfd_ctx 8f67f6bbe700
 struct eventfd_ctx {
   kref = {
     refcount = {
   refs = {
     counter = 4
   }
     }
   },
   wqh = {
     lock = {
   {
     rlock = {
   raw_lock = {
     val = {
   counter = 0
     }
   }
     }
   }
     },
     head = {
   next = 0x8f841dc08e18,
   prev = 0x8f841dc08e18
     }
   },
   count = 15, ---> eventfd is 15 !!!
   flags = 526336
 }

 Now we kick the doorbell for vhost-scsi queue=2 on purpose for 
 diagnostic
 with this interface.

 { "execute": "x-debug-device-event",
   "arguments": { "dev": "/machine/peripheral/vscsi0",
  "event": "kick", "queue": 2 } }

 The counter is increased to 16. Suppose the hang issue is resolved, it
 indicates something bad is in software that the 'kick' is lost.
>>> What do you mean by "software" here? It looks to me like you're testing
>>> whether event_notifier_set() is called by virtio_queue_notify() here. If
>>> so, I'm not sure how much value we could gain from a dedicated debug
>>> interface like this, considering there are a lot of existing
>>> general-purpose debugging methods like tracing or gdb. I'd say the path
>>> from virtio_queue_notify() to event_notifier_set() is only a very small
>>> fraction of the virtqueue kick process and is unlikely to be buggy.
>>> Considering that the ioeventfd is usually offloaded to KVM, it's more
>>> likely that something is wrong in setting up the ioeventfd than here. IRQ
>>> is even more complicated.
>> Thank you very much!
>>
>> I am not testing whether event_notifier_set() is called by
>> virtio_queue_notify().
>>
>> The 'software' indicates the data processing and event notification 
>> mechanism
>> involved with virtio/vhost PV driver frontend. E.g., while VM is waiting
>> for an
>> extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event()
>> erroneously returns false due to corrupted ring buffer status.
> So there 

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-05 Thread Jason Wang



On 2021/4/6 4:00 AM, Dongli Zhang wrote:


On 4/1/21 8:47 PM, Jason Wang wrote:

On 2021/3/30 3:29 PM, Dongli Zhang wrote:

On 3/28/21 8:56 PM, Jason Wang wrote:

On 2021/3/27 5:16 AM, Dongli Zhang wrote:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

On 2021/3/26 1:44 PM, Dongli Zhang wrote:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html



... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
  kref = {
    refcount = {
  refs = {
    counter = 4
  }
    }
  },
  wqh = {
    lock = {
  {
    rlock = {
  raw_lock = {
    val = {
  counter = 0
    }
  }
    }
  }
    },
    head = {
  next = 0x8f841dc08e18,
  prev = 0x8f841dc08e18
    }
  },
  count = 15, ---> eventfd is 15 !!!
  flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
  "arguments": { "dev": "/machine/peripheral/vscsi0",
     "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved, it
indicates something bad is in software that the 'kick' is lost.

What do you mean by "software" here? It looks to me like you're testing whether
event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
sure how much value we could gain from a dedicated debug interface like this,
considering there are a lot of existing general-purpose debugging methods like
tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the virtqueue kick
process and is unlikely to be buggy. Considering that the ioeventfd is usually
offloaded to KVM, it's more likely that something is wrong in setting up the
ioeventfd than here. IRQ is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by
virtio_queue_notify().

'Software' here means the data processing and event notification mechanisms
involved with the virtio/vhost PV driver frontend. E.g., the VM is waiting for
an extra IRQ but the vhost side did not trigger one, say because
vring_need_event() erroneously returned false due to corrupted ring buffer
state.

So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) Qemu bug (irqchip and routing)
...

This is not only about whether notification is blocked.

It can also be used to help narrow down and understand whether there is any
suspicious issue in the blk-mq/scsi/netdev/napi code. The PV drivers are not
just drivers that follow the virtio spec; they are closely tied to many other
kernel components.

Suppose IO recovers after we inject an IRQ to vhost-scsi on purpose. We will
then be able to analyze what may have happened along the IO completion path,
starting from when/where the IRQ is injected ... perhaps the root cause is not
in virtio but in blk-mq/scsi (this is just an example).


In addition, this idea should help with vfio-pci. Suppose the developer of a
specific device driver suspects that IO/networking hangs because of a lost IRQ.
We will be able to verify whether that assumption is correct by injecting an
IRQ on purpose.

Therefore, this 

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-05 Thread Dongli Zhang



On 4/1/21 8:47 PM, Jason Wang wrote:
> 
> On 2021/3/30 3:29 PM, Dongli Zhang wrote:
>>
>> On 3/28/21 8:56 PM, Jason Wang wrote:
>>> On 2021/3/27 5:16 AM, Dongli Zhang wrote:
 Hi Jason,

 On 3/26/21 12:24 AM, Jason Wang wrote:
> On 2021/3/26 1:44 PM, Dongli Zhang wrote:
>> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
>> the loss of doorbell kick, e.g.,
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
>>
>>
>>
>> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
>> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>>
>> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
>> to help narrow down if the issue is due to loss of irq/kick. So far the 
>> new
>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, 
>> MSI-X
>> or legacy IRQ).
>>
>> The 'call' is to inject irq on purpose by admin for a specific device 
>> (e.g.,
>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the 
>> doorbell
>> on purpose by admin at QEMU/host side for a specific device.
>>
>>
>> This device can be used as a workaround if call/kick is lost due to
>> virtualization software (e.g., kernel or QEMU) issue.
>>
>> We may also implement the interface for VFIO PCI, e.g., to write to
>> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
>> on purpose. This is considered future work once the virtio part is done.
>>
>>
>> Below is from live crash analysis. Initially, the queue=2 has count=15 
>> for
>> 'kick' eventfd_ctx. Suppose there is data in vring avail while there is 
>> no
>> used available. We suspect this is because vhost-scsi was not notified by
>> VM. In order to narrow down and analyze the issue, we use live crash to
>> dump the current counter of eventfd for queue=2.
>>
>> crash> eventfd_ctx 8f67f6bbe700
>> struct eventfd_ctx {
>>  kref = {
>>    refcount = {
>>  refs = {
>>    counter = 4
>>  }
>>    }
>>  },
>>  wqh = {
>>    lock = {
>>  {
>>    rlock = {
>>  raw_lock = {
>>    val = {
>>  counter = 0
>>    }
>>  }
>>    }
>>  }
>>    },
>>    head = {
>>  next = 0x8f841dc08e18,
>>  prev = 0x8f841dc08e18
>>    }
>>  },
>>  count = 15, ---> eventfd is 15 !!!
>>  flags = 526336
>> }
>>
>> Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
>> with this interface.
>>
>> { "execute": "x-debug-device-event",
>>  "arguments": { "dev": "/machine/peripheral/vscsi0",
>>     "event": "kick", "queue": 2 } }
>>
>> The counter is increased to 16. Suppose the hang issue is resolved, it
>> indicates something bad is in software that the 'kick' is lost.
> What do you mean by "software" here? It looks to me like you're testing
> whether event_notifier_set() is called by virtio_queue_notify() here. If so,
> I'm not sure how much value we could gain from a dedicated debug interface
> like this, considering there are a lot of existing general-purpose debugging
> methods like tracing or gdb. I'd say the path from virtio_queue_notify() to
> event_notifier_set() is only a very small fraction of the virtqueue kick
> process and is unlikely to be buggy. Considering that the ioeventfd is
> usually offloaded to KVM, it's more likely that something is wrong in
> setting up the ioeventfd than here. IRQ is even more complicated.
 Thank you very much!

 I am not testing whether event_notifier_set() is called by
 virtio_queue_notify().

 The 'software' indicates the data processing and event notification 
 mechanism
 involved with virtio/vhost PV driver frontend. E.g., while VM is waiting 
 for an
 extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event()
 erroneously returns false due to corrupted ring buffer status.
>>>
>>> So there could be several factors that may block the notification:
>>>
>>> 1) eventfd bug (ioeventfd vs irqfd)
>>> 2) wrong virtqueue state (either driver or device)
>>> 3) missing barriers (either driver or device)
>>> 4) Qemu bug (irqchip and routing)
>>> ...
>> This is not only about whether notification is blocked.
>>
>> It can also be used to help narrow down and understand if there is any
>> suspicious 

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-01 Thread Jason Wang



On 2021/3/30 3:29 PM, Dongli Zhang wrote:


On 3/28/21 8:56 PM, Jason Wang wrote:

On 2021/3/27 5:16 AM, Dongli Zhang wrote:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

On 2021/3/26 1:44 PM, Dongli Zhang wrote:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html


... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
     kref = {
   refcount = {
     refs = {
   counter = 4
     }
   }
     },
     wqh = {
   lock = {
     {
   rlock = {
     raw_lock = {
   val = {
     counter = 0
   }
     }
   }
     }
   },
   head = {
     next = 0x8f841dc08e18,
     prev = 0x8f841dc08e18
   }
     },
     count = 15, ---> eventfd is 15 !!!
     flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
     "arguments": { "dev": "/machine/peripheral/vscsi0",
    "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved, it
indicates something bad is in software that the 'kick' is lost.

What do you mean by "software" here? It looks to me like you're testing whether
event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
sure how much value we could gain from a dedicated debug interface like this,
considering there are a lot of existing general-purpose debugging methods like
tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the virtqueue kick
process and is unlikely to be buggy. Considering that the ioeventfd is usually
offloaded to KVM, it's more likely that something is wrong in setting up the
ioeventfd than here. IRQ is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by 
virtio_queue_notify().

'Software' here means the data processing and event notification mechanisms
involved with the virtio/vhost PV driver frontend. E.g., the VM is waiting for
an extra IRQ but the vhost side did not trigger one, say because
vring_need_event() erroneously returned false due to corrupted ring buffer
state.


So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) Qemu bug (irqchip and routing)
...

This is not only about whether notification is blocked.

It can also be used to help narrow down and understand whether there is any
suspicious issue in the blk-mq/scsi/netdev/napi code. The PV drivers are not
just drivers that follow the virtio spec; they are closely tied to many other
kernel components.

Suppose IO recovers after we inject an IRQ to vhost-scsi on purpose. We will
then be able to analyze what may have happened along the IO completion path,
starting from when/where the IRQ is injected ... perhaps the root cause is not
in virtio but in blk-mq/scsi (this is just an example).


In addition, this idea should help with vfio-pci. Suppose the developer of a
specific device driver suspects that IO/networking hangs because of a lost IRQ.
We will be able to verify whether that assumption is correct by injecting an
IRQ on purpose.

Therefore, this is not only about virtio PV driver (frontend/backend), but also
used to help analyze the issue 

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-03-30 Thread Dongli Zhang



On 3/28/21 8:56 PM, Jason Wang wrote:
> 
> On 2021/3/27 5:16 AM, Dongli Zhang wrote:
>> Hi Jason,
>>
>> On 3/26/21 12:24 AM, Jason Wang wrote:
>>> On 2021/3/26 1:44 PM, Dongli Zhang wrote:
 The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
 the loss of doorbell kick, e.g.,

 https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html


 ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
 fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

 This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
 to help narrow down if the issue is due to loss of irq/kick. So far the new
 interface handles only two events: 'call' and 'kick'. Any device (e.g.,
 virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
 or legacy IRQ).

 The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
 vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
 on purpose by admin at QEMU/host side for a specific device.


 This device can be used as a workaround if call/kick is lost due to
 virtualization software (e.g., kernel or QEMU) issue.

 We may also implement the interface for VFIO PCI, e.g., to write to
 VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
 on purpose. This is considered future work once the virtio part is done.


 Below is from live crash analysis. Initially, the queue=2 has count=15 for
 'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
 used available. We suspect this is because vhost-scsi was not notified by
 VM. In order to narrow down and analyze the issue, we use live crash to
 dump the current counter of eventfd for queue=2.

 crash> eventfd_ctx 8f67f6bbe700
 struct eventfd_ctx {
     kref = {
   refcount = {
     refs = {
   counter = 4
     }
   }
     },
     wqh = {
   lock = {
     {
   rlock = {
     raw_lock = {
   val = {
     counter = 0
   }
     }
   }
     }
   },
   head = {
     next = 0x8f841dc08e18,
     prev = 0x8f841dc08e18
   }
     },
     count = 15, ---> eventfd is 15 !!!
     flags = 526336
 }

 Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
 with this interface.

 { "execute": "x-debug-device-event",
     "arguments": { "dev": "/machine/peripheral/vscsi0",
    "event": "kick", "queue": 2 } }

 The counter is increased to 16. Suppose the hang issue is resolved, it
 indicates something bad is in software that the 'kick' is lost.
>>> What do you mean by "software" here? And it looks to me you're testing whether
>>> event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
>>> sure how much value we could gain from a dedicated debug interface like this,
>>> considering there are a lot of existing general purpose debugging methods like
>>> tracing or gdb. I'd say the path from virtio_queue_notify() to
>>> event_notifier_set() is only a very small fraction of the process of a
>>> virtqueue kick, and it is unlikely to be buggy. Considering that the ioeventfd
>>> is usually offloaded to KVM, it's more likely that something is wrong in
>>> setting up the ioeventfd than here. Irq is even more complicated.
>> Thank you very much!
>>
>> I am not testing whether event_notifier_set() is called by 
>> virtio_queue_notify().
>>
>> The 'software' indicates the data processing and event notification mechanism
>> involved with virtio/vhost PV driver frontend. E.g., while VM is waiting for an
>> extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event()
>> erroneously returns false due to corrupted ring buffer status.
> 
> 
> So there could be several factors that may block the notification:
> 
> 1) eventfd bug (ioeventfd vs irqfd)
> 2) wrong virtqueue state (either driver or device)
> 3) missing barriers (either driver or device)
> 4) Qemu bug (irqchip and routing)
> ...

This is not only about whether notification is blocked.

It can also be used to help narrow down and understand if there is any
suspicious issue in blk-mq/scsi/netdev/napi code. The PV drivers are not only
drivers following virtio spec. It is closely related to many of other kernel
components.

Suppose IO was recovered after we inject an IRQ to vhost-scsi on purpose, we
will be able to analyze what may happen along the IO completion path starting
from when/where the IRQ is injected ... perhaps the root cause is not with
virtio but 

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-03-28 Thread Jason Wang



On 2021/3/27 at 5:16 AM, Dongli Zhang wrote:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

On 2021/3/26 at 1:44 PM, Dongli Zhang wrote:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
    kref = {
  refcount = {
    refs = {
  counter = 4
    }
  }
    },
    wqh = {
  lock = {
    {
  rlock = {
    raw_lock = {
  val = {
    counter = 0
  }
    }
  }
    }
  },
  head = {
    next = 0x8f841dc08e18,
    prev = 0x8f841dc08e18
  }
    },
    count = 15, ---> eventfd is 15 !!!
    flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
    "arguments": { "dev": "/machine/peripheral/vscsi0",
   "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved, it
indicates something bad is in software that the 'kick' is lost.

What do you mean by "software" here? And it looks to me you're testing whether
event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
sure how much value we could gain from a dedicated debug interface like this,
considering there are a lot of existing general purpose debugging methods like
tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the process of a virtqueue
kick, and it is unlikely to be buggy. Considering that the ioeventfd is usually
offloaded to KVM, it's more likely that something is wrong in setting up the
ioeventfd than here. Irq is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by 
virtio_queue_notify().

The 'software' indicates the data processing and event notification mechanism
involved with virtio/vhost PV driver frontend. E.g., while VM is waiting for an
extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event()
erroneously returns false due to corrupted ring buffer status.



So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) Qemu bug (irqchip and routing)
...

If we want to debug a virtio issue, only 2) and 3) are what we really
care about.


So for kick you did (assume vhost is on):

virtio_device_event_kick()
    virtio_queue_notify()
        event_notifier_set()

It looks to me you're actually testing whether the ioeventfd is correctly
set up by Qemu.


For call you did:

virtio_device_event_call()
    event_notifier_set()

This tests whether the irqfd is correctly set up by Qemu. So none of this is
virtio specific, yet you introduce a virtio specific command to debug
non-virtio functions.


In the case you mentioned for vring_need_event(), what we really
want is to dump the virtqueue state from the guest. This might require
some work on extending the virtio spec (e.g., to dump device status like
indices, event, wrap counters).




This was initially proposed for vhost only and I was going to export
ioeventfd/irqfd from vhost to admin via sysfs. Finally, I realized I would
better implement this at QEMU.

The QEMU inits the eventfd (ioeventfd and irqfd), and offloads them to
KVM/vhost. The VM 

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-03-26 Thread Dongli Zhang
Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:
> 
> On 2021/3/26 at 1:44 PM, Dongli Zhang wrote:
>> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
>> the loss of doorbell kick, e.g.,
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
>>
>> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
>> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>>
>> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
>> to help narrow down if the issue is due to loss of irq/kick. So far the new
>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
>> or legacy IRQ).
>>
>> The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
>> on purpose by admin at QEMU/host side for a specific device.
>>
>>
>> This device can be used as a workaround if call/kick is lost due to
>> virtualization software (e.g., kernel or QEMU) issue.
>>
>> We may also implement the interface for VFIO PCI, e.g., to write to
>> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
>> on purpose. This is considered future work once the virtio part is done.
>>
>>
>> Below is from live crash analysis. Initially, the queue=2 has count=15 for
>> 'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
>> used available. We suspect this is because vhost-scsi was not notified by
>> VM. In order to narrow down and analyze the issue, we use live crash to
>> dump the current counter of eventfd for queue=2.
>>
>> crash> eventfd_ctx 8f67f6bbe700
>> struct eventfd_ctx {
>>    kref = {
>>  refcount = {
>>    refs = {
>>  counter = 4
>>    }
>>  }
>>    },
>>    wqh = {
>>  lock = {
>>    {
>>  rlock = {
>>    raw_lock = {
>>  val = {
>>    counter = 0
>>  }
>>    }
>>  }
>>    }
>>  },
>>  head = {
>>    next = 0x8f841dc08e18,
>>    prev = 0x8f841dc08e18
>>  }
>>    },
>>    count = 15, ---> eventfd is 15 !!!
>>    flags = 526336
>> }
>>
>> Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
>> with this interface.
>>
>> { "execute": "x-debug-device-event",
>>    "arguments": { "dev": "/machine/peripheral/vscsi0",
>>   "event": "kick", "queue": 2 } }
>>
>> The counter is increased to 16. Suppose the hang issue is resolved, it
>> indicates something bad is in software that the 'kick' is lost.
> 
> 
> What do you mean by "software" here? And it looks to me you're testing whether
> event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
> sure how much value we could gain from a dedicated debug interface like this,
> considering there are a lot of existing general purpose debugging methods like
> tracing or gdb. I'd say the path from virtio_queue_notify() to
> event_notifier_set() is only a very small fraction of the process of a virtqueue
> kick, and it is unlikely to be buggy. Considering that the ioeventfd is usually
> offloaded to KVM, it's more likely that something is wrong in setting up the
> ioeventfd than here. Irq is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by 
virtio_queue_notify().

The 'software' refers to the data processing and event notification mechanism
involved with the virtio/vhost PV driver frontend. E.g., the VM is waiting for
an extra IRQ but the vhost side never triggers it, say because
vring_need_event() erroneously returns false due to corrupted ring buffer state.
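
For reference, the check behind vring_need_event() can be sketched as below. This is only a Python transcription of the helper given in the virtio specification, with the unsigned 16-bit arithmetic emulated by a mask; the index values in the example are made up for illustration and are not taken from any real hang.

```python
def vring_need_event(event_idx, new_idx, old_idx):
    """Return True if event_idx falls inside the window [old_idx, new_idx).

    All three values are free-running 16-bit ring indices, so the
    subtractions are done modulo 2**16, as unsigned 16-bit math does in C.
    """
    mask = 0xFFFF
    return ((new_idx - event_idx - 1) & mask) < ((new_idx - old_idx) & mask)


# The other side asked for a notification at index 5, and we just moved
# from 5 to 6: a notification is due.
print(vring_need_event(event_idx=5, new_idx=6, old_idx=5))

# Hypothetical corrupted event index: the same step now looks like
# "no notification needed", so the IRQ is never sent.
print(vring_need_event(event_idx=0x1234, new_idx=6, old_idx=5))
```

If the event index read from the ring is stale or corrupted, the comparison can stay false for a long time, which is the kind of missed 'call' the interface is meant to help confirm.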

This was initially proposed for vhost only and I was going to export
ioeventfd/irqfd from vhost to admin via sysfs. Finally, I realized I would
better implement this at QEMU.

QEMU initializes the eventfds (ioeventfd and irqfd) and offloads them to
KVM/vhost. The VM side sends requests to the ring buffer and kicks the doorbell
(via ioeventfd), while the backend vhost side sends responses back and calls
the IRQ (via irqfd).

Unfortunately, sometimes there is an issue with virtio/vhost so that a
kick/call is missed/ignored, or even never triggered. The example mentioned in
the patchset cover letter is with virtio-net (I assume vhost=on), where a kick
to the ioeventfd was ignored due to a pci-bridge/hotplug issue.

The hotplug race has a very small window, but the IO hangs permanently. I did
test that kicking the doorbell again helps recover the IO, so I was
able to conclude this was due to the loss of a kick.
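
The reasoning can be illustrated with a toy model. This is a deliberately simplified sketch and not QEMU or vhost code: the class and its method names are hypothetical, and the only assumption it encodes is that the backend drains the avail ring only when it is kicked.

```python
from collections import deque


class ToyQueue:
    """Toy virtqueue: the backend processes requests only when kicked."""

    def __init__(self):
        self.avail = deque()  # requests published by the guest
        self.used = []        # responses returned by the backend

    def guest_submit(self, req, kick_delivered=True):
        # The guest publishes a request, then kicks the doorbell.
        # kick_delivered=False models a kick that was lost on the way.
        self.avail.append(req)
        if kick_delivered:
            self.kick()

    def kick(self):
        # Backend side: drain everything that is pending in the avail ring.
        while self.avail:
            self.used.append(self.avail.popleft())


q = ToyQueue()
q.guest_submit("req-0")
q.guest_submit("req-1", kick_delivered=False)  # this kick is lost
print(list(q.avail), q.used)  # req-1 is stuck in avail, nothing in used

q.kick()  # the manual kick injected for diagnosis
print(list(q.avail), q.used)  # the stuck request completes
```

If IO resumes after the manual kick, the data in the ring was fine and only the notification was missing; if it does not, the problem is elsewhere.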

The loss of irq/doorbell is painful especially in production environment where
we are not able to attach to QEMU via gdb. While the patchset is only for QEMU,
Xen PV driver used to experience loss of IRQ issue 

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-03-26 Thread Jason Wang



On 2021/3/26 at 1:44 PM, Dongli Zhang wrote:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
   kref = {
 refcount = {
   refs = {
 counter = 4
   }
 }
   },
   wqh = {
 lock = {
   {
 rlock = {
   raw_lock = {
 val = {
   counter = 0
 }
   }
 }
   }
 },
 head = {
   next = 0x8f841dc08e18,
   prev = 0x8f841dc08e18
 }
   },
   count = 15, ---> eventfd is 15 !!!
   flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
   "arguments": { "dev": "/machine/peripheral/vscsi0",
  "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved, it
indicates something bad is in software that the 'kick' is lost.



What do you mean by "software" here? And it looks to me you're testing
whether event_notifier_set() is called by virtio_queue_notify() here. If
so, I'm not sure how much value we could gain from a dedicated debug
interface like this, considering there are a lot of existing general
purpose debugging methods like tracing or gdb. I'd say the path from
virtio_queue_notify() to event_notifier_set() is only a very small
fraction of the process of a virtqueue kick, and it is unlikely to be buggy.
Considering that the ioeventfd is usually offloaded to KVM, it's more
likely that something is wrong in setting up the ioeventfd than here.
Irq is even more complicated.


I think we would not gain much from introducing a dedicated mechanism
for such a corner case.


Thanks




crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
   kref = {
 refcount = {
   refs = {
 counter = 4
   }
 }
   },
   wqh = {
 lock = {
   {
 rlock = {
   raw_lock = {
 val = {
   counter = 0
 }
   }
 }
   }
 },
 head = {
   next = 0x8f841dc08e18,
   prev = 0x8f841dc08e18
 }
   },
   count = 16, ---> eventfd incremented to 16 !!!
   flags = 526336
}


Original RFC link:

https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg03441.html

Changed since RFC:
   - add support for more virtio/vhost pci devices
   - add log (toggled by DEBUG_VIRTIO_EVENT) to virtio.c to say that this
     mischievous command had been used
   - fix grammar error (s/lost/loss/)
   - change version to 6.1
   - fix incorrect example in qapi/qdev.json
   - manage event types with enum/array, instead of hard coding


Dongli Zhang (6):
qdev: introduce qapi/hmp command for kick/call event
virtio: introduce helper function for kick/call device event
virtio-blk-pci: implement device event interface for kick/call
virtio-scsi-pci: implement device event interface for kick/call
vhost-scsi-pci: implement device event interface for kick/call
virtio-net-pci: implement device event interface for kick/call

  hmp-commands.hx | 14 
  hw/block/virtio-blk.c   |  9 +
  hw/net/virtio-net.c |  9 +
  hw/scsi/vhost-scsi.c|  6 
  hw/scsi/virtio-scsi.c   |  9 +
  hw/virtio/vhost-scsi-pci.c  | 10 ++
  hw/virtio/virtio-blk-pci.c  | 10 ++
  hw/virtio/virtio-net-pci.c  | 10 ++
  hw/virtio/virtio-scsi-pci.c | 10 ++
  hw/virtio/virtio.c

[PATCH 0/6] Add debug interface to kick/call on purpose

2021-03-25 Thread Dongli Zhang
The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  },
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    },
    head = {
      next = 0x8f841dc08e18,
      prev = 0x8f841dc08e18
    }
  },
  count = 15, ---> eventfd is 15 !!!
  flags = 526336
}
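
As a side note, the 'flags' value in the dump can be decoded with a few lines of Python. This is only a sketch: the bit values below are the x86-64 Linux definitions of the eventfd flags (they alias the corresponding O_* open flags) and are hard-coded here as an assumption about the host architecture.

```python
# eventfd flag bits on x86-64 Linux (assumed; other architectures may differ).
EFD_SEMAPHORE = 0o1
EFD_NONBLOCK = 0o4000      # same value as O_NONBLOCK
EFD_CLOEXEC = 0o2000000    # same value as O_CLOEXEC

KNOWN_FLAGS = (
    ("EFD_SEMAPHORE", EFD_SEMAPHORE),
    ("EFD_NONBLOCK", EFD_NONBLOCK),
    ("EFD_CLOEXEC", EFD_CLOEXEC),
)


def decode_eventfd_flags(flags):
    """Return the names of the known eventfd flag bits set in 'flags'."""
    return [name for name, bit in KNOWN_FLAGS if flags & bit]


print(decode_eventfd_flags(526336))  # ['EFD_NONBLOCK', 'EFD_CLOEXEC']
```

EFD_SEMAPHORE is not set, so this is a plain counting eventfd: every write adds its value to 'count', which is why a single extra kick is expected to move the counter from 15 to 16.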

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
  "arguments": { "dev": "/machine/peripheral/vscsi0",
 "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved, it
indicates something bad is in software that the 'kick' is lost.
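
For scripting, the request above could be built and sent from a small QMP client. The following is a minimal sketch under a few assumptions: QEMU has this patchset applied (the x-debug-device-event command does not exist otherwise), the monitor was exposed as a UNIX socket, and the helper names and the socket path in the usage note are hypothetical.

```python
import json
import socket


def build_device_event(dev, event, queue):
    """Build the QMP request for the proposed x-debug-device-event command."""
    if event not in ("kick", "call"):
        raise ValueError("event must be 'kick' or 'call'")
    return {
        "execute": "x-debug-device-event",
        "arguments": {"dev": dev, "event": event, "queue": queue},
    }


def _read_reply(stream):
    """Skip asynchronous QMP events until a command reply arrives."""
    while True:
        msg = json.loads(stream.readline())
        if "return" in msg or "error" in msg:
            return msg


def send_qmp(sock_path, command):
    """Send one command over a QMP UNIX socket and return its reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        stream = sock.makefile("rw")
        json.loads(stream.readline())  # QMP greeting banner
        reply = None
        # Capabilities negotiation must happen before any other command.
        for cmd in ({"execute": "qmp_capabilities"}, command):
            stream.write(json.dumps(cmd) + "\n")
            stream.flush()
            reply = _read_reply(stream)
        return reply


request = build_device_event("/machine/peripheral/vscsi0", "kick", 2)
print(json.dumps(request))
```

With a monitor started as, for example, "-qmp unix:/tmp/qmp.sock,server,nowait" (path chosen only for illustration), send_qmp("/tmp/qmp.sock", request) would deliver the kick for queue 2.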

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  },
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    },
    head = {
      next = 0x8f841dc08e18,
      prev = 0x8f841dc08e18
    }
  },
  count = 16, ---> eventfd incremented to 16 !!!
  flags = 526336
}


Original RFC link:

https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg03441.html

Changed since RFC:
  - add support for more virtio/vhost pci devices
  - add log (toggled by DEBUG_VIRTIO_EVENT) to virtio.c to say that this
    mischievous command had been used
  - fix grammar error (s/lost/loss/)
  - change version to 6.1
  - fix incorrect example in qapi/qdev.json
  - manage event types with enum/array, instead of hard coding


Dongli Zhang (6):
   qdev: introduce qapi/hmp command for kick/call event
   virtio: introduce helper function for kick/call device event
   virtio-blk-pci: implement device event interface for kick/call
   virtio-scsi-pci: implement device event interface for kick/call
   vhost-scsi-pci: implement device event interface for kick/call
   virtio-net-pci: implement device event interface for kick/call

 hmp-commands.hx | 14 
 hw/block/virtio-blk.c   |  9 +
 hw/net/virtio-net.c |  9 +
 hw/scsi/vhost-scsi.c|  6 
 hw/scsi/virtio-scsi.c   |  9 +
 hw/virtio/vhost-scsi-pci.c  | 10 ++
 hw/virtio/virtio-blk-pci.c  | 10 ++
 hw/virtio/virtio-net-pci.c  | 10 ++
 hw/virtio/virtio-scsi-pci.c | 10 ++
 hw/virtio/virtio.c  | 64 
 include/hw/qdev-core.h  |  9 +
 include/hw/virtio/vhost-scsi.h  |  3 ++
 include/hw/virtio/virtio-blk.h  |  2 ++
 include/hw/virtio/virtio-net.h  |  3 ++
 include/hw/virtio/virtio-scsi.h |  3 ++
 include/hw/virtio/virtio.h  |  3 ++
 include/monitor/hmp.h   |  1 +
 qapi/qdev.json  | 30 +
 softmmu/qdev-monitor.c  | 56 +++
 19 files changed, 261 insertions(+)


I did tests with below cases.

- virtio-blk-pci (ioeventfd on/off, iothread, live migration)
- virtio-scsi-pci (ioeventfd on/off)
- vhost-scsi-pci
- virtio-net-pci (ioeventfd on/off, vhost)

Thank you very much!

Dongli Zhang