Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-09-27 Thread Ilya Maximets
On 9/26/23 00:24, Michael S. Tsirkin wrote:
> On Tue, Sep 26, 2023 at 12:13:11AM +0200, Ilya Maximets wrote:
>> On 9/25/23 23:24, Michael S. Tsirkin wrote:
>>> On Mon, Sep 25, 2023 at 10:58:05PM +0200, Ilya Maximets wrote:
 On 9/25/23 17:38, Stefan Hajnoczi wrote:
> On Mon, 25 Sept 2023 at 11:36, Ilya Maximets  wrote:
>>
>> On 9/25/23 17:12, Stefan Hajnoczi wrote:
>>> On Mon, 25 Sept 2023 at 11:02, Ilya Maximets  wrote:

 On 9/25/23 16:23, Stefan Hajnoczi wrote:
> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets  
> wrote:
>>
>> We do not need the most up to date number of heads, we only want to
>> know if there is at least one.
>>
>> Use shadow variable as long as it is not equal to the last available
>> index checked.  This avoids expensive qatomic dereference of the
>> RCU-protected memory region cache as well as the memory access itself
>> and the subsequent memory barrier.
>>
>> The change improves performance of the af-xdp network backend by 
>> 2-3%.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  hw/virtio/virtio.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index 309038fd46..04bf7cc977 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const 
>> VirtQueueElement *elem,
>>  /* Called within rcu_read_lock().  */
>>  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>>  {
>> -uint16_t num_heads = vring_avail_idx(vq) - idx;
>> +uint16_t num_heads;
>> +
>> +if (vq->shadow_avail_idx != idx) {
>> +num_heads = vq->shadow_avail_idx - idx;
>> +
>> +return num_heads;
>
> This still needs to check num_heads > vq->vring.num and return -EINVAL
> as is done below.

 Hmm, yes, you're right.  If the value was incorrect initially, the shadow
 will be incorrect.  However, I think we should just not return here in this
 case and let vring_avail_idx() grab an actual new value below.  Otherwise
 we may never break out of this error.

 Does that make sense?
>>>
>>> No, because virtio_error() marks the device as broken. The device
>>> requires a reset in order to function again. Fetching
>>> vring_avail_idx() again won't help.
>>
>> OK, I see.  In this case we're talking about a situation where
>> vring_avail_idx() was called in some other place and stored a bad value
>> in the shadow variable, then virtqueue_num_heads() got called.  Right?

 Hmm, I suppose we also need a read barrier after all even if we use
 a shadow index.  Assuming the index is correct, but the shadow variable
 was updated by a call outside of this function, then we may miss a
 barrier and read the descriptor out of order, in theory.  Read barrier
 is going to be a compiler barrier on x86, so the performance gain from
 this patch should still be mostly there.  I'll test that.
>>>
>>> I can't say I understand generally. shadow is under qemu control,
>>> I don't think it can be updated concurrently by multiple CPUs.
>>
>> It can't, I agree.  The scenario I'm thinking about is the following:
>>
>> 1. vring_avail_idx() is called from one of the places other than
>>virtqueue_num_heads().  Shadow is updated with the current value.
>>Some users of vring_avail_idx() do not use barriers after the call.
>>
>> 2. virtqueue_split_get_avail_bytes() is called.
>>
>> 3. virtqueue_split_get_avail_bytes() calls virtqueue_num_heads().
>>
>> 4. virtqueue_num_heads() checks the shadow and returns early.
>>
>> 5. virtqueue_split_get_avail_bytes() calls vring_split_desc_read() and
>>reads the descriptor.
>>
>> If between steps 1 and 5 we do not have a read barrier, we potentially
>> risk reading descriptor data that is not yet fully written, because
>> there is no guarantee that reading the last_avail_idx on step 1 wasn't
>> reordered with the descriptor read.
>>
>> In the current code we always have smp_rmb() in virtqueue_num_heads().
>> But if we return from this function without a barrier, we may have an
>> issue, IIUC.
>>
>> I agree that it's kind of a very unlikely scenario and we will probably
>> have a control dependency between steps 1 and 5 that will prevent the
>> issue, but it might be safer to just have an explicit barrier in
>> virtqueue_num_heads().
>>
>> Does that make sense?  Or am I missing something else here?
> 
> Aha, got it. Good point, yes. Pls document in a code comment.

Done.  Posted v2:
  
https://lore.kernel.org/qemu-devel/20230927135157.2316982-1-i.maxim...@ovn.org/

> 
> 

Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-09-25 Thread Michael S. Tsirkin
On Tue, Sep 26, 2023 at 12:13:11AM +0200, Ilya Maximets wrote:
> On 9/25/23 23:24, Michael S. Tsirkin wrote:
> > On Mon, Sep 25, 2023 at 10:58:05PM +0200, Ilya Maximets wrote:
> >> On 9/25/23 17:38, Stefan Hajnoczi wrote:
> >>> On Mon, 25 Sept 2023 at 11:36, Ilya Maximets  wrote:
> 
>  On 9/25/23 17:12, Stefan Hajnoczi wrote:
> > On Mon, 25 Sept 2023 at 11:02, Ilya Maximets  wrote:
> >>
> >> On 9/25/23 16:23, Stefan Hajnoczi wrote:
> >>> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets  
> >>> wrote:
> 
>  We do not need the most up to date number of heads, we only want to
>  know if there is at least one.
> 
>  Use shadow variable as long as it is not equal to the last available
>  index checked.  This avoids expensive qatomic dereference of the
>  RCU-protected memory region cache as well as the memory access itself
>  and the subsequent memory barrier.
> 
>  The change improves performance of the af-xdp network backend by 
>  2-3%.
> 
>  Signed-off-by: Ilya Maximets 
>  ---
>   hw/virtio/virtio.c | 10 +-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
>  diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>  index 309038fd46..04bf7cc977 100644
>  --- a/hw/virtio/virtio.c
>  +++ b/hw/virtio/virtio.c
>  @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const 
>  VirtQueueElement *elem,
>   /* Called within rcu_read_lock().  */
>   static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>   {
>  -uint16_t num_heads = vring_avail_idx(vq) - idx;
>  +uint16_t num_heads;
>  +
>  +if (vq->shadow_avail_idx != idx) {
>  +num_heads = vq->shadow_avail_idx - idx;
>  +
>  +return num_heads;
> >>>
> >>> This still needs to check num_heads > vq->vring.num and return -EINVAL
> >>> as is done below.
> >>
> >> Hmm, yes, you're right.  If the value was incorrect initially, the shadow
> >> will be incorrect.  However, I think we should just not return here in this
> >> case and let vring_avail_idx() grab an actual new value below.  Otherwise
> >> we may never break out of this error.
> >>
> >> Does that make sense?
> >
> > No, because virtio_error() marks the device as broken. The device
> > requires a reset in order to function again. Fetching
> > vring_avail_idx() again won't help.
> 
 OK, I see.  In this case we're talking about a situation where
>  vring_avail_idx() was called in some other place and stored a bad value
>  in the shadow variable, then virtqueue_num_heads() got called.  Right?
> >>
> >> Hmm, I suppose we also need a read barrier after all even if we use
> >> a shadow index.  Assuming the index is correct, but the shadow variable
> >> was updated by a call outside of this function, then we may miss a
> >> barrier and read the descriptor out of order, in theory.  Read barrier
> >> is going to be a compiler barrier on x86, so the performance gain from
> >> this patch should still be mostly there.  I'll test that.
> > 
> > I can't say I understand generally. shadow is under qemu control,
> > I don't think it can be updated concurrently by multiple CPUs.
> 
> It can't, I agree.  The scenario I'm thinking about is the following:
> 
> 1. vring_avail_idx() is called from one of the places other than
>virtqueue_num_heads().  Shadow is updated with the current value.
>Some users of vring_avail_idx() do not use barriers after the call.
> 
> 2. virtqueue_split_get_avail_bytes() is called.
> 
> 3. virtqueue_split_get_avail_bytes() calls virtqueue_num_heads().
> 
> 4. virtqueue_num_heads() checks the shadow and returns early.
> 
> 5. virtqueue_split_get_avail_bytes() calls vring_split_desc_read() and
>reads the descriptor.
> 
> If between steps 1 and 5 we do not have a read barrier, we potentially
> risk reading descriptor data that is not yet fully written, because
> there is no guarantee that reading the last_avail_idx on step 1 wasn't
> reordered with the descriptor read.
> 
> In the current code we always have smp_rmb() in virtqueue_num_heads().
> But if we return from this function without a barrier, we may have an
> issue, IIUC.
> 
> I agree that it's kind of a very unlikely scenario and we will probably
> have a control dependency between steps 1 and 5 that will prevent the
> issue, but it might be safer to just have an explicit barrier in
> virtqueue_num_heads().
> 
> Does that make sense?  Or am I missing something else here?

Aha, got it. Good point, yes. Pls document in a code comment.


> > 
> > 
> 
>  AFAIU, we can still just fall through here and let vring_avail_idx()
>  read the index again and fail the existing check.  That would happen
>  today without this patch applied.

Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-09-25 Thread Ilya Maximets
On 9/25/23 23:24, Michael S. Tsirkin wrote:
> On Mon, Sep 25, 2023 at 10:58:05PM +0200, Ilya Maximets wrote:
>> On 9/25/23 17:38, Stefan Hajnoczi wrote:
>>> On Mon, 25 Sept 2023 at 11:36, Ilya Maximets  wrote:

 On 9/25/23 17:12, Stefan Hajnoczi wrote:
> On Mon, 25 Sept 2023 at 11:02, Ilya Maximets  wrote:
>>
>> On 9/25/23 16:23, Stefan Hajnoczi wrote:
>>> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets  wrote:

 We do not need the most up to date number of heads, we only want to
 know if there is at least one.

 Use shadow variable as long as it is not equal to the last available
 index checked.  This avoids expensive qatomic dereference of the
 RCU-protected memory region cache as well as the memory access itself
 and the subsequent memory barrier.

 The change improves performance of the af-xdp network backend by 2-3%.

 Signed-off-by: Ilya Maximets 
 ---
  hw/virtio/virtio.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

 diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
 index 309038fd46..04bf7cc977 100644
 --- a/hw/virtio/virtio.c
 +++ b/hw/virtio/virtio.c
 @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const 
 VirtQueueElement *elem,
  /* Called within rcu_read_lock().  */
  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
  {
 -uint16_t num_heads = vring_avail_idx(vq) - idx;
 +uint16_t num_heads;
 +
 +if (vq->shadow_avail_idx != idx) {
 +num_heads = vq->shadow_avail_idx - idx;
 +
 +return num_heads;
>>>
>>> This still needs to check num_heads > vq->vring.num and return -EINVAL
>>> as is done below.
>>
>> Hmm, yes, you're right.  If the value was incorrect initially, the shadow
>> will be incorrect.  However, I think we should just not return here in this
>> case and let vring_avail_idx() grab an actual new value below.  Otherwise
>> we may never break out of this error.
>>
>> Does that make sense?
>
> No, because virtio_error() marks the device as broken. The device
> requires a reset in order to function again. Fetching
> vring_avail_idx() again won't help.

 OK, I see.  In this case we're talking about a situation where
 vring_avail_idx() was called in some other place and stored a bad value
 in the shadow variable, then virtqueue_num_heads() got called.  Right?
>>
>> Hmm, I suppose we also need a read barrier after all even if we use
>> a shadow index.  Assuming the index is correct, but the shadow variable
>> was updated by a call outside of this function, then we may miss a
>> barrier and read the descriptor out of order, in theory.  Read barrier
>> is going to be a compiler barrier on x86, so the performance gain from
>> this patch should still be mostly there.  I'll test that.
> 
> I can't say I understand generally. shadow is under qemu control,
> I don't think it can be updated concurrently by multiple CPUs.

It can't, I agree.  The scenario I'm thinking about is the following:

1. vring_avail_idx() is called from one of the places other than
   virtqueue_num_heads().  Shadow is updated with the current value.
   Some users of vring_avail_idx() do not use barriers after the call.

2. virtqueue_split_get_avail_bytes() is called.

3. virtqueue_split_get_avail_bytes() calls virtqueue_num_heads().

4. virtqueue_num_heads() checks the shadow and returns early.

5. virtqueue_split_get_avail_bytes() calls vring_split_desc_read() and
   reads the descriptor.

If between steps 1 and 5 we do not have a read barrier, we potentially
risk reading descriptor data that is not yet fully written, because
there is no guarantee that reading the last_avail_idx on step 1 wasn't
reordered with the descriptor read.

In the current code we always have smp_rmb() in virtqueue_num_heads().
But if we return from this function without a barrier, we may have an
issue, IIUC.

I agree that it's kind of a very unlikely scenario and we will probably
have a control dependency between steps 1 and 5 that will prevent the
issue, but it might be safer to just have an explicit barrier in
virtqueue_num_heads().

Does that make sense?  Or am I missing something else here?

> 
> 

 AFAIU, we can still just fall through here and let vring_avail_idx()
 read the index again and fail the existing check.  That would happen
 today without this patch applied.
>>>
>>> Yes, that is fine.
>>>

 I'm just trying to avoid duplication of the virtio_error call, i.e.:

 if (vq->shadow_avail_idx != idx) {
     num_heads = vq->shadow_avail_idx - idx;

     /* Check it isn't doing very strange things with descriptor numbers. */
     if (num_heads > vq->vring.num) {
         virtio_error(vq->vdev, "Guest moved used index from %u to %u",
                      idx, vq->shadow_avail_idx);
         return -EINVAL;
     }
     return num_heads;
 }

Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-09-25 Thread Michael S. Tsirkin
On Mon, Sep 25, 2023 at 10:58:05PM +0200, Ilya Maximets wrote:
> On 9/25/23 17:38, Stefan Hajnoczi wrote:
> > On Mon, 25 Sept 2023 at 11:36, Ilya Maximets  wrote:
> >>
> >> On 9/25/23 17:12, Stefan Hajnoczi wrote:
> >>> On Mon, 25 Sept 2023 at 11:02, Ilya Maximets  wrote:
> 
>  On 9/25/23 16:23, Stefan Hajnoczi wrote:
> > On Fri, 25 Aug 2023 at 13:04, Ilya Maximets  wrote:
> >>
> >> We do not need the most up to date number of heads, we only want to
> >> know if there is at least one.
> >>
> >> Use shadow variable as long as it is not equal to the last available
> >> index checked.  This avoids expensive qatomic dereference of the
> >> RCU-protected memory region cache as well as the memory access itself
> >> and the subsequent memory barrier.
> >>
> >> The change improves performance of the af-xdp network backend by 2-3%.
> >>
> >> Signed-off-by: Ilya Maximets 
> >> ---
> >>  hw/virtio/virtio.c | 10 +-
> >>  1 file changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index 309038fd46..04bf7cc977 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const 
> >> VirtQueueElement *elem,
> >>  /* Called within rcu_read_lock().  */
> >>  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
> >>  {
> >> -uint16_t num_heads = vring_avail_idx(vq) - idx;
> >> +uint16_t num_heads;
> >> +
> >> +if (vq->shadow_avail_idx != idx) {
> >> +num_heads = vq->shadow_avail_idx - idx;
> >> +
> >> +return num_heads;
> >
> > This still needs to check num_heads > vq->vring.num and return -EINVAL
> > as is done below.
> 
>  Hmm, yes, you're right.  If the value was incorrect initially, the shadow
>  will be incorrect.  However, I think we should just not return here in this
>  case and let vring_avail_idx() grab an actual new value below.  Otherwise
>  we may never break out of this error.
> 
>  Does that make sense?
> >>>
> >>> No, because virtio_error() marks the device as broken. The device
> >>> requires a reset in order to function again. Fetching
> >>> vring_avail_idx() again won't help.
> >>
> >> OK, I see.  In this case we're talking about a situation where
> >> vring_avail_idx() was called in some other place and stored a bad value
> >> in the shadow variable, then virtqueue_num_heads() got called.  Right?
> 
> Hmm, I suppose we also need a read barrier after all even if we use
> a shadow index.  Assuming the index is correct, but the shadow variable
> was updated by a call outside of this function, then we may miss a
> barrier and read the descriptor out of order, in theory.  Read barrier
> is going to be a compiler barrier on x86, so the performance gain from
> this patch should still be mostly there.  I'll test that.

I can't say I understand generally. shadow is under qemu control,
I don't think it can be updated concurrently by multiple CPUs.


> >>
> >> AFAIU, we can still just fall through here and let vring_avail_idx()
> >> read the index again and fail the existing check.  That would happen
> >> today without this patch applied.
> > 
> > Yes, that is fine.
> > 
> >>
> >> I'm just trying to avoid duplication of the virtio_error call, i.e.:
> >>
> >> if (vq->shadow_avail_idx != idx) {
> >> num_heads = vq->shadow_avail_idx - idx;
> >>
> >> /* Check it isn't doing very strange things with descriptor 
> >> numbers. */
> >> if (num_heads > vq->vring.num) {
> >> virtio_error(vq->vdev, "Guest moved used index from %u to %u",
> >>  idx, vq->shadow_avail_idx);
> >> return -EINVAL;
> >> }
> >> return num_heads;
> >> }
> >>
> >> vs
> >>
> >> if (vq->shadow_avail_idx != idx) {
> >> num_heads = vq->shadow_avail_idx - idx;
> >>
> >> /* Only use the shadow value if it was good initially. */
> >> if (num_heads <= vq->vring.num) {
> >> return num_heads;
> >> }
> >> }
> >>
> >>
> >> What do you think?
> > 
> > Sounds good.
> > 
> >>
> >> Best regards, Ilya Maximets.




Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-09-25 Thread Ilya Maximets
On 9/25/23 17:38, Stefan Hajnoczi wrote:
> On Mon, 25 Sept 2023 at 11:36, Ilya Maximets  wrote:
>>
>> On 9/25/23 17:12, Stefan Hajnoczi wrote:
>>> On Mon, 25 Sept 2023 at 11:02, Ilya Maximets  wrote:

 On 9/25/23 16:23, Stefan Hajnoczi wrote:
> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets  wrote:
>>
>> We do not need the most up to date number of heads, we only want to
>> know if there is at least one.
>>
>> Use shadow variable as long as it is not equal to the last available
>> index checked.  This avoids expensive qatomic dereference of the
>> RCU-protected memory region cache as well as the memory access itself
>> and the subsequent memory barrier.
>>
>> The change improves performance of the af-xdp network backend by 2-3%.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  hw/virtio/virtio.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index 309038fd46..04bf7cc977 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const 
>> VirtQueueElement *elem,
>>  /* Called within rcu_read_lock().  */
>>  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>>  {
>> -uint16_t num_heads = vring_avail_idx(vq) - idx;
>> +uint16_t num_heads;
>> +
>> +if (vq->shadow_avail_idx != idx) {
>> +num_heads = vq->shadow_avail_idx - idx;
>> +
>> +return num_heads;
>
> This still needs to check num_heads > vq->vring.num and return -EINVAL
> as is done below.

 Hmm, yes, you're right.  If the value was incorrect initially, the shadow
 will be incorrect.  However, I think we should just not return here in this
 case and let vring_avail_idx() grab an actual new value below.  Otherwise
 we may never break out of this error.

 Does that make sense?
>>>
>>> No, because virtio_error() marks the device as broken. The device
>>> requires a reset in order to function again. Fetching
>>> vring_avail_idx() again won't help.
>>
>> OK, I see.  In this case we're talking about a situation where
>> vring_avail_idx() was called in some other place and stored a bad value
>> in the shadow variable, then virtqueue_num_heads() got called.  Right?

Hmm, I suppose we also need a read barrier after all even if we use
a shadow index.  Assuming the index is correct, but the shadow variable
was updated by a call outside of this function, then we may miss a
barrier and read the descriptor out of order, in theory.  Read barrier
is going to be a compiler barrier on x86, so the performance gain from
this patch should still be mostly there.  I'll test that.

>>
>> AFAIU, we can still just fall through here and let vring_avail_idx()
>> read the index again and fail the existing check.  That would happen
>> today without this patch applied.
> 
> Yes, that is fine.
> 
>>
>> I'm just trying to avoid duplication of the virtio_error call, i.e.:
>>
>> if (vq->shadow_avail_idx != idx) {
>> num_heads = vq->shadow_avail_idx - idx;
>>
>> /* Check it isn't doing very strange things with descriptor numbers. 
>> */
>> if (num_heads > vq->vring.num) {
>> virtio_error(vq->vdev, "Guest moved used index from %u to %u",
>>  idx, vq->shadow_avail_idx);
>> return -EINVAL;
>> }
>> return num_heads;
>> }
>>
>> vs
>>
>> if (vq->shadow_avail_idx != idx) {
>> num_heads = vq->shadow_avail_idx - idx;
>>
>> /* Only use the shadow value if it was good initially. */
>> if (num_heads <= vq->vring.num) {
>> return num_heads;
>> }
>> }
>>
>>
>> What do you think?
> 
> Sounds good.
> 
>>
>> Best regards, Ilya Maximets.




Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-09-25 Thread Stefan Hajnoczi
On Mon, 25 Sept 2023 at 11:36, Ilya Maximets  wrote:
>
> On 9/25/23 17:12, Stefan Hajnoczi wrote:
> > On Mon, 25 Sept 2023 at 11:02, Ilya Maximets  wrote:
> >>
> >> On 9/25/23 16:23, Stefan Hajnoczi wrote:
> >>> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets  wrote:
> 
>  We do not need the most up to date number of heads, we only want to
>  know if there is at least one.
> 
>  Use shadow variable as long as it is not equal to the last available
>  index checked.  This avoids expensive qatomic dereference of the
>  RCU-protected memory region cache as well as the memory access itself
>  and the subsequent memory barrier.
> 
>  The change improves performance of the af-xdp network backend by 2-3%.
> 
>  Signed-off-by: Ilya Maximets 
>  ---
>   hw/virtio/virtio.c | 10 +-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
>  diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>  index 309038fd46..04bf7cc977 100644
>  --- a/hw/virtio/virtio.c
>  +++ b/hw/virtio/virtio.c
>  @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const 
>  VirtQueueElement *elem,
>   /* Called within rcu_read_lock().  */
>   static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>   {
>  -uint16_t num_heads = vring_avail_idx(vq) - idx;
>  +uint16_t num_heads;
>  +
>  +if (vq->shadow_avail_idx != idx) {
>  +num_heads = vq->shadow_avail_idx - idx;
>  +
>  +return num_heads;
> >>>
> >>> This still needs to check num_heads > vq->vring.num and return -EINVAL
> >>> as is done below.
> >>
> >> Hmm, yes, you're right.  If the value was incorrect initially, the shadow
> >> will be incorrect.  However, I think we should just not return here in this
> >> case and let vring_avail_idx() grab an actual new value below.  Otherwise
> >> we may never break out of this error.
> >>
> >> Does that make sense?
> >
> > No, because virtio_error() marks the device as broken. The device
> > requires a reset in order to function again. Fetching
> > vring_avail_idx() again won't help.
>
> OK, I see.  In this case we're talking about a situation where
> vring_avail_idx() was called in some other place and stored a bad value
> in the shadow variable, then virtqueue_num_heads() got called.  Right?
>
> AFAIU, we can still just fall through here and let vring_avail_idx()
> read the index again and fail the existing check.  That would happen
> today without this patch applied.

Yes, that is fine.

>
> I'm just trying to avoid duplication of the virtio_error call, i.e.:
>
> if (vq->shadow_avail_idx != idx) {
> num_heads = vq->shadow_avail_idx - idx;
>
> /* Check it isn't doing very strange things with descriptor numbers. 
> */
> if (num_heads > vq->vring.num) {
> virtio_error(vq->vdev, "Guest moved used index from %u to %u",
>  idx, vq->shadow_avail_idx);
> return -EINVAL;
> }
> return num_heads;
> }
>
> vs
>
> if (vq->shadow_avail_idx != idx) {
> num_heads = vq->shadow_avail_idx - idx;
>
> /* Only use the shadow value if it was good initially. */
> if (num_heads <= vq->vring.num) {
> return num_heads;
> }
> }
>
>
> What do you think?

Sounds good.

>
> Best regards, Ilya Maximets.



Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-09-25 Thread Ilya Maximets
On 9/25/23 17:12, Stefan Hajnoczi wrote:
> On Mon, 25 Sept 2023 at 11:02, Ilya Maximets  wrote:
>>
>> On 9/25/23 16:23, Stefan Hajnoczi wrote:
>>> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets  wrote:

 We do not need the most up to date number of heads, we only want to
 know if there is at least one.

 Use shadow variable as long as it is not equal to the last available
 index checked.  This avoids expensive qatomic dereference of the
 RCU-protected memory region cache as well as the memory access itself
 and the subsequent memory barrier.

 The change improves performance of the af-xdp network backend by 2-3%.

 Signed-off-by: Ilya Maximets 
 ---
  hw/virtio/virtio.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

 diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
 index 309038fd46..04bf7cc977 100644
 --- a/hw/virtio/virtio.c
 +++ b/hw/virtio/virtio.c
 @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const 
 VirtQueueElement *elem,
  /* Called within rcu_read_lock().  */
  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
  {
 -uint16_t num_heads = vring_avail_idx(vq) - idx;
 +uint16_t num_heads;
 +
 +if (vq->shadow_avail_idx != idx) {
 +num_heads = vq->shadow_avail_idx - idx;
 +
 +return num_heads;
>>>
>>> This still needs to check num_heads > vq->vring.num and return -EINVAL
>>> as is done below.
>>
>> Hmm, yes, you're right.  If the value was incorrect initially, the shadow
>> will be incorrect.  However, I think we should just not return here in this
>> case and let vring_avail_idx() grab an actual new value below.  Otherwise
>> we may never break out of this error.
>>
>> Does that make sense?
> 
> No, because virtio_error() marks the device as broken. The device
> requires a reset in order to function again. Fetching
> vring_avail_idx() again won't help.

OK, I see.  In this case we're talking about a situation where
vring_avail_idx() was called in some other place and stored a bad value
in the shadow variable, then virtqueue_num_heads() got called.  Right?

AFAIU, we can still just fall through here and let vring_avail_idx()
read the index again and fail the existing check.  That would happen
today without this patch applied.

I'm just trying to avoid duplication of the virtio_error call, i.e.:

if (vq->shadow_avail_idx != idx) {
    num_heads = vq->shadow_avail_idx - idx;

    /* Check it isn't doing very strange things with descriptor numbers. */
    if (num_heads > vq->vring.num) {
        virtio_error(vq->vdev, "Guest moved used index from %u to %u",
                     idx, vq->shadow_avail_idx);
        return -EINVAL;
    }
    return num_heads;
}

vs

if (vq->shadow_avail_idx != idx) {
    num_heads = vq->shadow_avail_idx - idx;

    /* Only use the shadow value if it was good initially. */
    if (num_heads <= vq->vring.num) {
        return num_heads;
    }
}


What do you think?

Best regards, Ilya Maximets.



Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-09-25 Thread Stefan Hajnoczi
On Mon, 25 Sept 2023 at 11:02, Ilya Maximets  wrote:
>
> On 9/25/23 16:23, Stefan Hajnoczi wrote:
> > On Fri, 25 Aug 2023 at 13:04, Ilya Maximets  wrote:
> >>
> >> We do not need the most up to date number of heads, we only want to
> >> know if there is at least one.
> >>
> >> Use shadow variable as long as it is not equal to the last available
> >> index checked.  This avoids expensive qatomic dereference of the
> >> RCU-protected memory region cache as well as the memory access itself
> >> and the subsequent memory barrier.
> >>
> >> The change improves performance of the af-xdp network backend by 2-3%.
> >>
> >> Signed-off-by: Ilya Maximets 
> >> ---
> >>  hw/virtio/virtio.c | 10 +-
> >>  1 file changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index 309038fd46..04bf7cc977 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const 
> >> VirtQueueElement *elem,
> >>  /* Called within rcu_read_lock().  */
> >>  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
> >>  {
> >> -uint16_t num_heads = vring_avail_idx(vq) - idx;
> >> +uint16_t num_heads;
> >> +
> >> +if (vq->shadow_avail_idx != idx) {
> >> +num_heads = vq->shadow_avail_idx - idx;
> >> +
> >> +return num_heads;
> >
> > This still needs to check num_heads > vq->vring.num and return -EINVAL
> > as is done below.
>
> Hmm, yes, you're right.  If the value was incorrect initially, the shadow
> will be incorrect.  However, I think we should just not return here in this
> case and let vring_avail_idx() grab an actual new value below.  Otherwise
> we may never break out of this error.
>
> Does that make sense?

No, because virtio_error() marks the device as broken. The device
requires a reset in order to function again. Fetching
vring_avail_idx() again won't help.

>
> >
> >> +}
> >> +
> >> +num_heads = vring_avail_idx(vq) - idx;
> >>
> >>  /* Check it isn't doing very strange things with descriptor numbers. 
> >> */
> >>  if (num_heads > vq->vring.num) {
> >> --
> >> 2.40.1
> >>
> >>
>



Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-09-25 Thread Ilya Maximets
On 9/25/23 16:23, Stefan Hajnoczi wrote:
> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets  wrote:
>>
>> We do not need the most up to date number of heads, we only want to
>> know if there is at least one.
>>
>> Use shadow variable as long as it is not equal to the last available
>> index checked.  This avoids expensive qatomic dereference of the
>> RCU-protected memory region cache as well as the memory access itself
>> and the subsequent memory barrier.
>>
>> The change improves performance of the af-xdp network backend by 2-3%.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  hw/virtio/virtio.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index 309038fd46..04bf7cc977 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const 
>> VirtQueueElement *elem,
>>  /* Called within rcu_read_lock().  */
>>  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>>  {
>> -uint16_t num_heads = vring_avail_idx(vq) - idx;
>> +uint16_t num_heads;
>> +
>> +if (vq->shadow_avail_idx != idx) {
>> +num_heads = vq->shadow_avail_idx - idx;
>> +
>> +return num_heads;
> 
> This still needs to check num_heads > vq->vring.num and return -EINVAL
> as is done below.

Hmm, yes, you're right.  If the value was incorrect initially, the shadow
will be incorrect.  However, I think we should just not return here in this
case and let vring_avail_idx() grab an actual new value below.  Otherwise
we may never break out of this error.

Does that make sense?

> 
>> +}
>> +
>> +num_heads = vring_avail_idx(vq) - idx;
>>
>>  /* Check it isn't doing very strange things with descriptor numbers. */
>>  if (num_heads > vq->vring.num) {
>> --
>> 2.40.1
>>
>>




Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-09-25 Thread Stefan Hajnoczi
On Fri, 25 Aug 2023 at 13:04, Ilya Maximets  wrote:
>
> We do not need the most up to date number of heads, we only want to
> know if there is at least one.
>
> Use shadow variable as long as it is not equal to the last available
> index checked.  This avoids expensive qatomic dereference of the
> RCU-protected memory region cache as well as the memory access itself
> and the subsequent memory barrier.
>
> The change improves performance of the af-xdp network backend by 2-3%.
>
> Signed-off-by: Ilya Maximets 
> ---
>  hw/virtio/virtio.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 309038fd46..04bf7cc977 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>  /* Called within rcu_read_lock().  */
>  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>  {
> -uint16_t num_heads = vring_avail_idx(vq) - idx;
> +uint16_t num_heads;
> +
> +if (vq->shadow_avail_idx != idx) {
> +num_heads = vq->shadow_avail_idx - idx;
> +
> +return num_heads;

This still needs to check num_heads > vq->vring.num and return -EINVAL
as is done below.

> +}
> +
> +num_heads = vring_avail_idx(vq) - idx;
>
>  /* Check it isn't doing very strange things with descriptor numbers. */
>  if (num_heads > vq->vring.num) {
> --
> 2.40.1
>
>



Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-09-25 Thread Ilya Maximets
On 8/25/23 19:04, Ilya Maximets wrote:
> We do not need the most up to date number of heads, we only want to
> know if there is at least one.
> 
> Use shadow variable as long as it is not equal to the last available
> index checked.  This avoids expensive qatomic dereference of the
> RCU-protected memory region cache as well as the memory access itself
> and the subsequent memory barrier.
> 
> The change improves performance of the af-xdp network backend by 2-3%.
> 
> Signed-off-by: Ilya Maximets 
> ---

Kind reminder.

Best regards, Ilya Maximets.

>  hw/virtio/virtio.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 309038fd46..04bf7cc977 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>  /* Called within rcu_read_lock().  */
>  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>  {
> -uint16_t num_heads = vring_avail_idx(vq) - idx;
> +uint16_t num_heads;
> +
> +if (vq->shadow_avail_idx != idx) {
> +num_heads = vq->shadow_avail_idx - idx;
> +
> +return num_heads;
> +}
> +
> +num_heads = vring_avail_idx(vq) - idx;
>  
>  /* Check it isn't doing very strange things with descriptor numbers. */
>  if (num_heads > vq->vring.num) {




[PATCH] virtio: use shadow_avail_idx while checking number of heads

2023-08-25 Thread Ilya Maximets
We do not need the most up to date number of heads, we only want to
know if there is at least one.

Use shadow variable as long as it is not equal to the last available
index checked.  This avoids expensive qatomic dereference of the
RCU-protected memory region cache as well as the memory access itself
and the subsequent memory barrier.

The change improves performance of the af-xdp network backend by 2-3%.

Signed-off-by: Ilya Maximets 
---
 hw/virtio/virtio.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 309038fd46..04bf7cc977 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
 /* Called within rcu_read_lock().  */
 static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
 {
-uint16_t num_heads = vring_avail_idx(vq) - idx;
+uint16_t num_heads;
+
+if (vq->shadow_avail_idx != idx) {
+num_heads = vq->shadow_avail_idx - idx;
+
+return num_heads;
+}
+
+num_heads = vring_avail_idx(vq) - idx;
 
 /* Check it isn't doing very strange things with descriptor numbers. */
 if (num_heads > vq->vring.num) {
-- 
2.40.1