Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-12-15 Thread Cornelia Huck
On Tue, 15 Dec 2020 12:33:34 +0100
Halil Pasic  wrote:

> On Tue, 15 Dec 2020 09:26:56 +0100
> Cornelia Huck  wrote:

> > Do we have a better idea now about which values would make sense here?
> >   
> 
> Hi Conny!
> 
> I have nothing new since then. Capping at 4 queues still looks like a
> reasonable compromise to me.  @Mimu: anything new since then?
> 
> If you like I can spin a new version (we need compat handling now).
> 
> Halil
> 

If your tests show that 4 queues are a reasonable value, this is fine
by me (with an explanatory comment in the code).

Feel free to respin.




Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-12-15 Thread Halil Pasic
On Tue, 15 Dec 2020 09:26:56 +0100
Cornelia Huck  wrote:

> On Tue, 10 Nov 2020 14:18:39 +0100
> Christian Borntraeger  wrote:
> 
> > On 10.11.20 11:40, Halil Pasic wrote:
> > > On Tue, 10 Nov 2020 09:47:51 +0100
> > > Christian Borntraeger  wrote:
> > >   
> > >>
> > >>
> > >> On 09.11.20 19:53, Halil Pasic wrote:  
> > >>> On Mon, 9 Nov 2020 17:06:16 +0100
> > >>> Cornelia Huck  wrote:
> > >>>  
> > > @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
> > >  {
> > >  VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
> > >  DeviceState *vdev = DEVICE(&dev->vdev);
> > > +VirtIOBlkConf *conf = &dev->vdev.conf;
> > > +
> > > +if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> > > +conf->num_queues = MIN(4, current_machine->smp.cpus);
> > > +}
> > 
> >  I would like to have a comment explaining the numbers here, however.
> > 
> >  virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
> >  possible, apply some other capping). 4 seems to be a bit arbitrary
> >  without explanation, although I'm sure you did some measurements :)  
> > >>>
> > >>> Frankly, I don't have any measurements yet. For the secure case,
> > >>> I think Mimu has assessed the impact of multiqueue, hence adding Mimu to
> > >>> the cc list. @Mimu can you help us out.
> > >>>
> > >>> Regarding the normal non-protected VMs I'm in a middle of producing some
> > >>> measurement data. This was admittedly a bit rushed because of where we
> > >>> are in the cycle. Sorry to disappoint you.
> > >>>
> > >>> The number 4 was suggested by Christian, maybe Christian does have some
> > >>> readily available measurement data for the normal VM case. @Christian:
> > >>> can you help me out?  
> > >> My point was to find a balance between performance gain and memory usage.
> > >> As a matter of fact, virtqueue do consume memory. So 4 looked like a
> > >> reasonable default for me for large guests as long as we do not have
> > >> directed interrupts.
> > >>
> > >> Now, thinking about this again: If we want to change the default to
> > >> something else in the future (e.g. to num vcpus) then the compat handling
> > >> will get really complicated.  
> > > 
> > > Regarding compat handling, I believe we would need a new property for
> > > virtio-blk-ccw: something like def_num_queues_max. Then logic would
> > > morph to MIN(def_num_queues_max, current_machine->smp.cpus), and we could
> > > relatively freely do compat stuff on def_num_queues_max.
> > > 
> > > IMHO not pretty but certainly doable.
> > >   
> > >>
> > >> So we can
> > >> - go with num queues = num cpus. But this will consume memory
> > >> for guests with lots of CPUs.  
> > > 
> > > In absence of data that showcases the benefit outweighing the obvious
> > > detriment, I lean towards finding this option the least favorable.
> > >   
> > >> - go with the proposed logic of min(4,vcpus) and accept that future
> > >> compat handling is harder  
> > > 
> > > IMHO not a bad option, but I think I would still feel better about a
> > > more informed decision. In the end, the end user can already specify the
> > > num_queues explicitly, so I don't think this is urgent.
> > >   
> > >> - defer this change  
> > > 
> > > So I tend to lean towards deferring.  
> > 
> > Yes, I was pushing this for 5.2 to avoid compat handling. But maybe it is
> > better to wait and do it later. But we should certainly continue the
> > discussion to have something for the next release.
> 
> 
> 
> Do we have a better idea now about which values would make sense here?
> 

Hi Conny!

I have nothing new since then. Capping at 4 queues still looks like a
reasonable compromise to me.  @Mimu: anything new since then?

If you like I can spin a new version (we need compat handling now).

Halil



Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-12-15 Thread Cornelia Huck
On Tue, 10 Nov 2020 14:18:39 +0100
Christian Borntraeger  wrote:

> On 10.11.20 11:40, Halil Pasic wrote:
> > On Tue, 10 Nov 2020 09:47:51 +0100
> > Christian Borntraeger  wrote:
> >   
> >>
> >>
> >> On 09.11.20 19:53, Halil Pasic wrote:  
> >>> On Mon, 9 Nov 2020 17:06:16 +0100
> >>> Cornelia Huck  wrote:
> >>>  
> > @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
> >  {
> >  VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
> >  DeviceState *vdev = DEVICE(&dev->vdev);
> > +VirtIOBlkConf *conf = &dev->vdev.conf;
> > +
> > +if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> > +conf->num_queues = MIN(4, current_machine->smp.cpus);
> > +}
> 
>  I would like to have a comment explaining the numbers here, however.
> 
>  virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
>  possible, apply some other capping). 4 seems to be a bit arbitrary
>  without explanation, although I'm sure you did some measurements :)  
> >>>
> >>> Frankly, I don't have any measurements yet. For the secure case,
> >>> I think Mimu has assessed the impact of multiqueue, hence adding Mimu to
> >>> the cc list. @Mimu can you help us out.
> >>>
> >>> Regarding the normal non-protected VMs I'm in a middle of producing some
> >>> measurement data. This was admittedly a bit rushed because of where we
> >>> are in the cycle. Sorry to disappoint you.
> >>>
> >>> The number 4 was suggested by Christian, maybe Christian does have some
> >>> readily available measurement data for the normal VM case. @Christian:
> >>> can you help me out?  
> >> My point was to find a balance between performance gain and memory usage.
> >> As a matter of fact, virtqueue do consume memory. So 4 looked like a
> >> reasonable default for me for large guests as long as we do not have
> >> directed interrupts.
> >>
> >> Now, thinking about this again: If we want to change the default to
> >> something else in the future (e.g. to num vcpus) then the compat handling
> >> will get really complicated.  
> > 
> > Regarding compat handling, I believe we would need a new property for
> > virtio-blk-ccw: something like def_num_queues_max. Then logic would
> > morph to MIN(def_num_queues_max, current_machine->smp.cpus), and we could
> > relatively freely do compat stuff on def_num_queues_max.
> > 
> > IMHO not pretty but certainly doable.
> >   
> >>
> >> So we can
> >> - go with num queues = num cpus. But this will consume memory
> >> for guests with lots of CPUs.  
> > 
> > In absence of data that showcases the benefit outweighing the obvious
> > detriment, I lean towards finding this option the least favorable.
> >   
> >> - go with the proposed logic of min(4,vcpus) and accept that future
> >> compat handling is harder  
> > 
> > IMHO not a bad option, but I think I would still feel better about a
> > more informed decision. In the end, the end user can already specify the
> > num_queues explicitly, so I don't think this is urgent.
> >   
> >> - defer this change  
> > 
> > So I tend to lean towards deferring.  
> 
> Yes, I was pushing this for 5.2 to avoid compat handling. But maybe it is
> better to wait and do it later. But we should certainly continue the
> discussion to have something for the next release.



Do we have a better idea now about which values would make sense here?

> 
> > 
> > Another thought is, provided the load is about evenly spread on the
> > different virtqueues, if the game is about vCPU locality, one could
> > think of decreasing the size of each individual virtqueue while
> > increasing their number, with the idea of not paying much more in terms
> > of memory. The queue size however needs to be a power of 2,
> > so there is a limit on the granularity.
> > 
> > Regarding secure VMs, currently we have to cramp at least the swiotlb and
> > the virtqueues into ZONE_DMA. So increasing the number of
> > virtqueues heavily may get us into new trouble with exotic configs.
> > 
> > Regards,
> > Halil
> >   
> 




Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-12 Thread Halil Pasic
On Wed, 11 Nov 2020 13:49:08 +0100
Michael Mueller  wrote:

> Halil,
> 
> still I would like to know what the exact memory consumption per queue
> is that you are talking about. Have you made a calculation? Thanks.

Hi! 

The default size for virtio-blk seems to be 256 ring entries, which
translates to 6668 bytes for the split ring(s). The queue size is user
configurable, and the guest, in theory, can decide to use a smaller queue.
The indirect descriptors are allocated separately, and bounced via
swiotlb in the case of secure guests.
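For reference, the 6668-byte figure can be reproduced from the virtio 1.1
split-ring layout. A rough sketch (it assumes the event-suppression fields
are present and ignores alignment padding between the three ring areas):

```python
def split_ring_bytes(entries: int) -> int:
    """Approximate guest memory for one split virtqueue (virtio 1.1)."""
    desc_table = 16 * entries      # descriptor table: 16 bytes per entry
    avail_ring = 6 + 2 * entries   # flags + idx + used_event, 2-byte entries
    used_ring = 6 + 8 * entries    # flags + idx + avail_event, 8-byte elements
    return desc_table + avail_ring + used_ring

print(split_ring_bytes(256))  # 6668
```

Halving the queue size roughly halves this footprint, which is why a guest
choosing a smaller queue saves memory; the indirect descriptors mentioned
above come on top of this.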

Regards,
Halil




Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-11 Thread Michael Mueller




On 11.11.20 13:38, Cornelia Huck wrote:
> On Wed, 11 Nov 2020 13:26:11 +0100
> Michael Mueller  wrote:
> 
> > On 10.11.20 15:16, Michael Mueller wrote:
> > > 
> > > On 09.11.20 19:53, Halil Pasic wrote:
> > > > On Mon, 9 Nov 2020 17:06:16 +0100
> > > > Cornelia Huck  wrote:
> > > > 
> > > > > > @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
> > > > > >  {
> > > > > >  VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
> > > > > >  DeviceState *vdev = DEVICE(&dev->vdev);
> > > > > > +    VirtIOBlkConf *conf = &dev->vdev.conf;
> > > > > > +
> > > > > > +    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> > > > > > +    conf->num_queues = MIN(4, current_machine->smp.cpus);
> > > > > > +    }
> > > > > 
> > > > > I would like to have a comment explaining the numbers here, however.
> > > > > 
> > > > > virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
> > > > > possible, apply some other capping). 4 seems to be a bit arbitrary
> > > > > without explanation, although I'm sure you did some measurements :)
> > > > 
> > > > Frankly, I don't have any measurements yet. For the secure case,
> > > > I think Mimu has assessed the impact of multiqueue, hence adding Mimu to
> > > > the cc list. @Mimu can you help us out.
> > > > 
> > > > Regarding the normal non-protected VMs I'm in a middle of producing some
> > > > measurement data. This was admittedly a bit rushed because of where we
> > > > are in the cycle. Sorry to disappoint you.
> > > 
> > > I'm talking with the perf team tomorrow. They have done some
> > > measurements with multiqueue for PV guests and I asked for a comparison
> > > to non PV guests as well.
> > 
> > The perf team has performed measurements for us that show that a *PV
> > KVM guest* benefits in terms of throughput for random read, random write
> > and sequential read (no difference for sequential write) by a multi
> > queue setup. CPU cost are reduced as well due to reduced spinlock
> > contention.
> 
> Just to be clear, that was with 4 queues?

Yes, we have seen it with 4 and also with 9 queues.

Halil,

still I would like to know what the exact memory consumption per queue
is that you are talking about. Have you made a calculation? Thanks.

> > For a *standard KVM guest* it currently has no throughput effect. No
> > benefit and no harm. I have asked them to finalize their measurements
> > by comparing the CPU cost as well. I will receive that information on
> > Friday.
> 
> Thank you for checking!

Michael

> > > > The number 4 was suggested by Christian, maybe Christian does have some
> > > > readily available measurement data for the normal VM case. @Christian:
> > > > can you help me out?
> > > > 
> > > > Regards,
> > > > Halil


Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-11 Thread Cornelia Huck
On Wed, 11 Nov 2020 13:26:11 +0100
Michael Mueller  wrote:

> On 10.11.20 15:16, Michael Mueller wrote:
> > 
> > 
> > On 09.11.20 19:53, Halil Pasic wrote:  
> >> On Mon, 9 Nov 2020 17:06:16 +0100
> >> Cornelia Huck  wrote:
> >>  
>  @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
>    {
>    VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
>    DeviceState *vdev = DEVICE(&dev->vdev);
>  +    VirtIOBlkConf *conf = &dev->vdev.conf;
>  +
>  +    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
>  +    conf->num_queues = MIN(4, current_machine->smp.cpus);
>  +    }  
> >>>
> >>> I would like to have a comment explaining the numbers here, however.
> >>>
> >>> virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
> >>> possible, apply some other capping). 4 seems to be a bit arbitrary
> >>> without explanation, although I'm sure you did some measurements :)  
> >>
> >> Frankly, I don't have any measurements yet. For the secure case,
> >> I think Mimu has assessed the impact of multiqueue, hence adding Mimu to
> >> the cc list. @Mimu can you help us out.
> >>
> >> Regarding the normal non-protected VMs I'm in a middle of producing some
> >> measurement data. This was admittedly a bit rushed because of where we
> >> are in the cycle. Sorry to disappoint you.  
> > 
> > I'm talking with the perf team tomorrow. They have done some 
> > measurements with multiqueue for PV guests and I asked for a comparison 
> > to non PV guests as well.  
> 
> The perf team has performed measurements for us that show that a *PV
> KVM guest* benefits in terms of throughput for random read, random write
> and sequential read (no difference for sequential write) by a multi
> queue setup. CPU cost are reduced as well due to reduced spinlock
> contention.

Just to be clear, that was with 4 queues?

> 
> For a *standard KVM guest* it currently has no throughput effect. No
> benefit and no harm. I have asked them to finalize their measurements
> by comparing the CPU cost as well. I will receive that information on 
> Friday.

Thank you for checking!

> 
> Michael
> 
> 
> > 
> > Michael
> >   
> >>
> >> The number 4 was suggested by Christian, maybe Christian does have some
> >> readily available measurement data for the normal VM case. @Christian:
> >> can you help me out?
> >>
> >> Regards,
> >> Halil
> >>  
> >   
> 
> 




Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-11 Thread Michael Mueller




On 10.11.20 15:16, Michael Mueller wrote:
> 
> On 09.11.20 19:53, Halil Pasic wrote:
> > On Mon, 9 Nov 2020 17:06:16 +0100
> > Cornelia Huck  wrote:
> > 
> > > > @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
> > > >  {
> > > >  VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
> > > >  DeviceState *vdev = DEVICE(&dev->vdev);
> > > > +    VirtIOBlkConf *conf = &dev->vdev.conf;
> > > > +
> > > > +    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> > > > +    conf->num_queues = MIN(4, current_machine->smp.cpus);
> > > > +    }
> > > 
> > > I would like to have a comment explaining the numbers here, however.
> > > 
> > > virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
> > > possible, apply some other capping). 4 seems to be a bit arbitrary
> > > without explanation, although I'm sure you did some measurements :)
> > 
> > Frankly, I don't have any measurements yet. For the secure case,
> > I think Mimu has assessed the impact of multiqueue, hence adding Mimu to
> > the cc list. @Mimu can you help us out.
> > 
> > Regarding the normal non-protected VMs I'm in a middle of producing some
> > measurement data. This was admittedly a bit rushed because of where we
> > are in the cycle. Sorry to disappoint you.
> 
> I'm talking with the perf team tomorrow. They have done some
> measurements with multiqueue for PV guests and I asked for a comparison
> to non PV guests as well.

The perf team has performed measurements for us that show that a *PV
KVM guest* benefits in terms of throughput for random read, random write
and sequential read (no difference for sequential write) by a multi
queue setup. CPU costs are reduced as well due to reduced spinlock
contention.

For a *standard KVM guest* it currently has no throughput effect. No
benefit and no harm. I have asked them to finalize their measurements
by comparing the CPU cost as well. I will receive that information on
Friday.

Michael

> > The number 4 was suggested by Christian, maybe Christian does have some
> > readily available measurement data for the normal VM case. @Christian:
> > can you help me out?
> > 
> > Regards,
> > Halil









Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-10 Thread Michael Mueller




On 09.11.20 19:53, Halil Pasic wrote:
> On Mon, 9 Nov 2020 17:06:16 +0100
> Cornelia Huck  wrote:
> 
> > > @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
> > >  {
> > >  VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
> > >  DeviceState *vdev = DEVICE(&dev->vdev);
> > > +VirtIOBlkConf *conf = &dev->vdev.conf;
> > > +
> > > +if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> > > +conf->num_queues = MIN(4, current_machine->smp.cpus);
> > > +}
> > 
> > I would like to have a comment explaining the numbers here, however.
> > 
> > virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
> > possible, apply some other capping). 4 seems to be a bit arbitrary
> > without explanation, although I'm sure you did some measurements :)
> 
> Frankly, I don't have any measurements yet. For the secure case,
> I think Mimu has assessed the impact of multiqueue, hence adding Mimu to
> the cc list. @Mimu can you help us out.
> 
> Regarding the normal non-protected VMs I'm in a middle of producing some
> measurement data. This was admittedly a bit rushed because of where we
> are in the cycle. Sorry to disappoint you.

I'm talking with the perf team tomorrow. They have done some
measurements with multiqueue for PV guests and I asked for a comparison
to non PV guests as well.

Michael

> The number 4 was suggested by Christian, maybe Christian does have some
> readily available measurement data for the normal VM case. @Christian:
> can you help me out?
> 
> Regards,
> Halil



--
Mit freundlichen Grüßen / Kind regards
Michael Müller

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-10 Thread Christian Borntraeger



On 10.11.20 11:40, Halil Pasic wrote:
> On Tue, 10 Nov 2020 09:47:51 +0100
> Christian Borntraeger  wrote:
> 
>>
>>
>> On 09.11.20 19:53, Halil Pasic wrote:
>>> On Mon, 9 Nov 2020 17:06:16 +0100
>>> Cornelia Huck  wrote:
>>>
> @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
>  {
>  VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
>  DeviceState *vdev = DEVICE(&dev->vdev);
> +VirtIOBlkConf *conf = &dev->vdev.conf;
> +
> +if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> +conf->num_queues = MIN(4, current_machine->smp.cpus);
> +}  

 I would like to have a comment explaining the numbers here, however.

 virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
 possible, apply some other capping). 4 seems to be a bit arbitrary
 without explanation, although I'm sure you did some measurements :)
>>>
>>> Frankly, I don't have any measurements yet. For the secure case,
>>> I think Mimu has assessed the impact of multiqueue, hence adding Mimu to
>>> the cc list. @Mimu can you help us out.
>>>
>>> Regarding the normal non-protected VMs I'm in a middle of producing some
>>> measurement data. This was admittedly a bit rushed because of where we
>>> are in the cycle. Sorry to disappoint you.
>>>
>>> The number 4 was suggested by Christian, maybe Christian does have some
>>> readily available measurement data for the normal VM case. @Christian:
>>> can you help me out?
>> My point was to find a balance between performance gain and memory usage.
>> As a matter of fact, virtqueue do consume memory. So 4 looked like a
>> reasonable default for me for large guests as long as we do not have directed
>> interrupts.
>>
>> Now, thinking about this again: If we want to change the default to something
>> else in the future (e.g. to num vcpus) then the compat handling will get
>> really complicated.
> 
> Regarding compat handling, I believe we would need a new property for
> virtio-blk-ccw: something like def_num_queues_max. Then logic would
> morph to MIN(def_num_queues_max, current_machine->smp.cpus), and we could
> relatively freely do compat stuff on def_num_queues_max.
> 
> IMHO not pretty but certainly doable.
> 
>>
>> So we can
>> - go with num queues = num cpus. But this will consume memory
>> for guests with lots of CPUs.
> 
> In absence of data that showcases the benefit outweighing the obvious
> detriment, I lean towards finding this option the least favorable.
> 
>> - go with the proposed logic of min(4,vcpus) and accept that future
>> compat handling is harder
> 
> IMHO not a bad option, but I think I would still feel better about a
> more informed decision. In the end, the end user can already specify the
> num_queues explicitly, so I don't think this is urgent.
> 
>> - defer this change
> 
> So I tend to lean towards deferring.

Yes, I was pushing this for 5.2 to avoid compat handling. But maybe it is better
to wait and do it later. But we should certainly continue the discussion to have
something for the next release.

> 
> Another thought is, provided the load is about evenly spread on the
> different virtqueues, if the game is about vCPU locality, one could
> think of decreasing the size of each individual virtqueue while
> increasing their number, with the idea of not paying much more in terms
> of memory. The queue size however needs to be a power of 2,
> so there is a limit on the granularity.
> 
> Regarding secure VMs, currently we have to cramp at least the swiotlb and
> the virtqueues into ZONE_DMA. So increasing the number of
> virtqueues heavily may get us into new trouble with exotic configs.
> 
> Regards,
> Halil
> 



Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-10 Thread Halil Pasic
On Tue, 10 Nov 2020 09:47:51 +0100
Christian Borntraeger  wrote:

> 
> 
> On 09.11.20 19:53, Halil Pasic wrote:
> > On Mon, 9 Nov 2020 17:06:16 +0100
> > Cornelia Huck  wrote:
> > 
> >>> @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
> >>>  {
> >>>  VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
> >>>  DeviceState *vdev = DEVICE(&dev->vdev);
> >>> +VirtIOBlkConf *conf = &dev->vdev.conf;
> >>> +
> >>> +if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> >>> +conf->num_queues = MIN(4, current_machine->smp.cpus);
> >>> +}  
> >>
> >> I would like to have a comment explaining the numbers here, however.
> >>
> >> virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
> >> possible, apply some other capping). 4 seems to be a bit arbitrary
> >> without explanation, although I'm sure you did some measurements :)
> > 
> > Frankly, I don't have any measurements yet. For the secure case,
> > I think Mimu has assessed the impact of multiqueue, hence adding Mimu to
> > the cc list. @Mimu can you help us out.
> > 
> > Regarding the normal non-protected VMs I'm in a middle of producing some
> > measurement data. This was admittedly a bit rushed because of where we
> > are in the cycle. Sorry to disappoint you.
> > 
> > The number 4 was suggested by Christian, maybe Christian does have some
> > readily available measurement data for the normal VM case. @Christian:
> > can you help me out?
> My point was to find a balance between performance gain and memory usage.
> As a matter of fact, virtqueue do consume memory. So 4 looked like a
> reasonable default for me for large guests as long as we do not have directed
> interrupts.
> 
> Now, thinking about this again: If we want to change the default to something
> else in the future (e.g. to num vcpus) then the compat handling will get
> really complicated.

Regarding compat handling, I believe we would need a new property for
virtio-blk-ccw: something like def_num_queues_max. Then logic would
morph to MIN(def_num_queues_max, current_machine->smp.cpus), and we could
relatively freely do compat stuff on def_num_queues_max.

IMHO not pretty but certainly doable.
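A minimal sketch of how such a property could drive the default
(def_num_queues_max is the hypothetical property suggested above, not an
existing QEMU one, and the Python below is only a model of the logic):

```python
def pick_default_num_queues(smp_cpus: int, def_num_queues_max: int = 4) -> int:
    """Model of the proposed default selection for virtio-blk-ccw.

    Machine-type compat code would pin def_num_queues_max to an older
    value (e.g. 1) for old machine types, leaving the current default
    free to change in later releases.
    """
    return min(def_num_queues_max, smp_cpus)

print(pick_default_num_queues(16))                        # 4 (capped)
print(pick_default_num_queues(2))                         # 2 (small guest)
print(pick_default_num_queues(16, def_num_queues_max=1))  # 1 (compat machine)
```

An explicitly configured num_queues would bypass this selection entirely,
exactly as with the posted patch.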

> 
> So we can
> - go with num queues = num cpus. But this will consume memory
> for guests with lots of CPUs.

In absence of data that showcases the benefit outweighing the obvious
detriment, I lean towards finding this option the least favorable.

> - go with the proposed logic of min(4,vcpus) and accept that future
> compat handling is harder

IMHO not a bad option, but I think I would still feel better about a
more informed decision. In the end, the end user can already specify the
num_queues explicitly, so I don't think this is urgent.

> - defer this change

So I tend to lean towards deferring.

Another thought: provided the load is spread about evenly across the
different virtqueues, and if the game is about vCPU locality, one could
think of decreasing the size of each individual virtqueue while
increasing their number, with the idea of not paying much more in terms
of memory. The queue size, however, needs to be a power of 2,
so there is a limit on the granularity.
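As a back-of-envelope illustration of that trade-off (split-ring arithmetic
per the virtio 1.1 layout, ignoring alignment padding; illustrative numbers,
not measurements):

```python
def split_ring_bytes(entries: int) -> int:
    # descriptor table (16*N) + avail ring (6 + 2*N) + used ring (6 + 8*N)
    return 16 * entries + (6 + 2 * entries) + (6 + 8 * entries)

# Same aggregate number of ring entries, spread over more, smaller
# (power-of-2) queues: total memory stays roughly flat.
for queues, size in [(1, 256), (4, 64), (8, 32)]:
    print(f"{queues} x {size}: {queues * split_ring_bytes(size)} bytes")
# 1 x 256: 6668 bytes
# 4 x 64: 6704 bytes
# 8 x 32: 6752 bytes
```

Only the small fixed per-ring overhead is duplicated, so quadrupling the
queue count at a quarter of the size costs well under 1% extra.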

Regarding secure VMs, currently we have to cram at least the swiotlb and
the virtqueues into ZONE_DMA. So increasing the number of
virtqueues heavily may get us into new trouble with exotic configs.

Regards,
Halil



Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-10 Thread Cornelia Huck
On Tue, 10 Nov 2020 09:47:51 +0100
Christian Borntraeger  wrote:

> On 09.11.20 19:53, Halil Pasic wrote:
> > On Mon, 9 Nov 2020 17:06:16 +0100
> > Cornelia Huck  wrote:
> >   
> >>> @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
> >>>  {
> >>>  VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
> >>>  DeviceState *vdev = DEVICE(&dev->vdev);
> >>> +VirtIOBlkConf *conf = &dev->vdev.conf;
> >>> +
> >>> +if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> >>> +conf->num_queues = MIN(4, current_machine->smp.cpus);
> >>> +}
> >>
> >> I would like to have a comment explaining the numbers here, however.
> >>
> >> virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
> >> possible, apply some other capping). 4 seems to be a bit arbitrary
> >> without explanation, although I'm sure you did some measurements :)  
> > 
> > Frankly, I don't have any measurements yet. For the secure case,
> > I think Mimu has assessed the impact of multiqueue, hence adding Mimu to
> > the cc list. @Mimu can you help us out.
> > 
> > Regarding the normal non-protected VMs I'm in a middle of producing some
> > measurement data. This was admittedly a bit rushed because of where we
> > are in the cycle. Sorry to disappoint you.
> > 
> > The number 4 was suggested by Christian, maybe Christian does have some
> > readily available measurement data for the normal VM case. @Christian:
> > can you help me out?  
> My point was to find a balance between performance gain and memory usage.
> As a matter of fact, virtqueue do consume memory. So 4 looked like a
> reasonable default for me for large guests as long as we do not have directed
> interrupts.

Yes, 4 does not look like a bad number, but I still don't feel really
comfortable with it without at least some data.

What about large guests with slow vs. fast storage?

> 
> Now, thinking about this again: If we want to change the default to something
> else in the future (e.g. to num vcpus) then the compat handling will get
> really complicated.

Yes, I fear that will be messy. Just picking a value later will need
compat handling, but not a really complicated one.

> 
> So we can
> - go with num queues = num cpus. But this will consume memory
> for guests with lots of CPUs.

I'm not sure that would be a good choice, as we don't have the benefits
that pci has.

> - go with the proposed logic of min(4,vcpus) and accept that future
> compat handling is harder

With a bit more data, I'd be way more comfortable. Might still be ok
for the next rc.

> - defer this change

We might end up with that, given the timing :( (not blaming anyone)




Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-10 Thread Christian Borntraeger



On 09.11.20 19:53, Halil Pasic wrote:
> On Mon, 9 Nov 2020 17:06:16 +0100
> Cornelia Huck  wrote:
> 
>>> @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
>>>  {
>>>  VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
>>>  DeviceState *vdev = DEVICE(&dev->vdev);
>>> +VirtIOBlkConf *conf = &dev->vdev.conf;
>>> +
>>> +if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
>>> +conf->num_queues = MIN(4, current_machine->smp.cpus);
>>> +}  
>>
>> I would like to have a comment explaining the numbers here, however.
>>
>> virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
>> possible, apply some other capping). 4 seems to be a bit arbitrary
>> without explanation, although I'm sure you did some measurements :)
> 
> Frankly, I don't have any measurements yet. For the secure case,
> I think Mimu has assessed the impact of multiqueue, hence adding Mimu to
> the cc list. @Mimu can you help us out.
> 
> Regarding the normal non-protected VMs I'm in a middle of producing some
> measurement data. This was admittedly a bit rushed because of where we
> are in the cycle. Sorry to disappoint you.
> 
> The number 4 was suggested by Christian, maybe Christian does have some
> readily available measurement data for the normal VM case. @Christian:
> can you help me out?
My point was to find a balance between performance gain and memory usage.
As a matter of fact, virtqueues do consume memory. So 4 looked like a
reasonable default for me for large guests as long as we do not have directed
interrupts.

Now, thinking about this again: If we want to change the default to something
else in the future (e.g. to num vcpus) then the compat handling will get
really complicated.

So we can
- go with num queues = num cpus. But this will consume memory
for guests with lots of CPUs.
- go with the proposed logic of min(4,vcpus) and accept that future
compat handling is harder
- defer this change



Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-09 Thread Halil Pasic
On Mon, 9 Nov 2020 17:06:16 +0100
Cornelia Huck  wrote:

> > @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
> >  {
> >  VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
> >  DeviceState *vdev = DEVICE(&dev->vdev);
> > +VirtIOBlkConf *conf = &dev->vdev.conf;
> > +
> > +if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> > +conf->num_queues = MIN(4, current_machine->smp.cpus);
> > +}  
> 
> I would like to have a comment explaining the numbers here, however.
> 
> virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
> possible, apply some other capping). 4 seems to be a bit arbitrary
> without explanation, although I'm sure you did some measurements :)

Frankly, I don't have any measurements yet. For the secure case,
I think Mimu has assessed the impact of multiqueue, hence adding Mimu to
the cc list. @Mimu can you help us out.

Regarding the normal non-protected VMs I'm in the middle of producing some
measurement data. This was admittedly a bit rushed because of where we
are in the cycle. Sorry to disappoint you.

The number 4 was suggested by Christian, maybe Christian does have some
readily available measurement data for the normal VM case. @Christian:
can you help me out?

Regards,
Halil



Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-09 Thread Cornelia Huck
On Mon,  9 Nov 2020 16:48:31 +0100
Halil Pasic  wrote:

> Currently the default value of num_queues is effectively 1 for
> virtio-blk-ccw. Recently 9445e1e15e ("virtio-blk-pci: default num_queues
> to -smp N") changed the default for pci to the number of vcpus, citing
> better interrupt locality and better performance as a rationale.
> 
> While virtio-blk-ccw does not yet benefit from better interrupt
> locality, measurements have shown that for secure VMs multiqueue does
> help with performance. Since the bounce buffering happens with the queue
> lock held (in the guest) this is hardly a surprise.
> 
> As for non-secure VMs the extra queues shouldn't hurt too much.
> 
> Suggested-by: Christian Borntraeger 
> Signed-off-by: Halil Pasic 
> ---
> 
> We would prefer to land this for 5.2. If we do then commit 9445e1e15e
> ("virtio-blk-pci: default num_queues to -smp N") took care of all the
> necessary compat handling.
> 
> If that's not possible, I will send a version that does the necessary
> compat handling.

I think we can still get this into 5.2, and that would indeed be less
hassle.

> ---
>  hw/s390x/virtio-ccw-blk.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/s390x/virtio-ccw-blk.c b/hw/s390x/virtio-ccw-blk.c
> index 2294ce1ce4..7296140dde 100644
> --- a/hw/s390x/virtio-ccw-blk.c
> +++ b/hw/s390x/virtio-ccw-blk.c
> @@ -10,6 +10,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "hw/boards.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/virtio/virtio.h"
>  #include "qapi/error.h"
> @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
>  {
>      VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
>      DeviceState *vdev = DEVICE(&dev->vdev);
> +    VirtIOBlkConf *conf = &dev->vdev.conf;
> +
> +    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> +        conf->num_queues = MIN(4, current_machine->smp.cpus);
> +    }

I would like to have a comment explaining the numbers here, however.

virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if
possible, apply some other capping). 4 seems to be a bit arbitrary
without explanation, although I'm sure you did some measurements :)

Do we also want something similar for virtio-scsi (and vhost-scsi)?

>  
>  qdev_realize(vdev, BUS(&ccw_dev->bus), errp);
>  }
> 
> base-commit: 2a190a7256a3e0563b29ffd67e0164097b4a6dac




Re: [PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-09 Thread Christian Borntraeger



On 09.11.20 16:48, Halil Pasic wrote:
> Currently the default value of num_queues is effectively 1 for
> virtio-blk-ccw. Recently 9445e1e15e ("virtio-blk-pci: default num_queues
> to -smp N") changed the default for pci to the number of vcpus, citing
> better interrupt locality and better performance as a rationale.
> 
> While virtio-blk-ccw does not yet benefit from better interrupt
> locality, measurements have shown that for secure VMs multiqueue does
> help with performance. Since the bounce buffering happens with the queue
> lock held (in the guest) this is hardly a surprise.
> 
> As for non-secure VMs the extra queues shouldn't hurt too much.

In fact, we have also seen improvements from multiqueue for non-secure guests,
as a single virtqueue seems to have a hard bandwidth cap that can be lower
than what newer storage devices deliver.

> 
> Suggested-by: Christian Borntraeger 
> Signed-off-by: Halil Pasic 
> ---
> 
> We would prefer to land this for 5.2. If we do then commit 9445e1e15e
> ("virtio-blk-pci: default num_queues to -smp N") took care of all the
> necessary compat handling.
> 
> If that's not possible, I will send a version that does the necessary
> compat handling.

Right. At the moment the original patch has all the necessary compat handling
for 5.1 for all platforms. Adding multiqueue for s390x after 5.2 would mean
that we would need additional platform-specific compat handling, and it
would increase the patch size unnecessarily.
In the end you could consider this an enhancement (not calling it a fix) of the
original patch.

Acked-by: Christian Borntraeger 
> ---
>  hw/s390x/virtio-ccw-blk.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/s390x/virtio-ccw-blk.c b/hw/s390x/virtio-ccw-blk.c
> index 2294ce1ce4..7296140dde 100644
> --- a/hw/s390x/virtio-ccw-blk.c
> +++ b/hw/s390x/virtio-ccw-blk.c
> @@ -10,6 +10,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "hw/boards.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/virtio/virtio.h"
>  #include "qapi/error.h"
> @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
>  {
>      VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
>      DeviceState *vdev = DEVICE(&dev->vdev);
> +    VirtIOBlkConf *conf = &dev->vdev.conf;
> +
> +    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> +        conf->num_queues = MIN(4, current_machine->smp.cpus);
> +    }
>  
>  qdev_realize(vdev, BUS(&ccw_dev->bus), errp);
>  }
> 
> base-commit: 2a190a7256a3e0563b29ffd67e0164097b4a6dac
> 



[PATCH 1/1] virtio-blk-ccw: tweak the default for num_queues

2020-11-09 Thread Halil Pasic
Currently the default value of num_queues is effectively 1 for
virtio-blk-ccw. Recently 9445e1e15e ("virtio-blk-pci: default num_queues
to -smp N") changed the default for pci to the number of vcpus, citing
better interrupt locality and better performance as a rationale.

While virtio-blk-ccw does not yet benefit from better interrupt
locality, measurements have shown that for secure VMs multiqueue does
help with performance. Since the bounce buffering happens with the queue
lock held (in the guest) this is hardly a surprise.

As for non-secure VMs the extra queues shouldn't hurt too much.

Suggested-by: Christian Borntraeger 
Signed-off-by: Halil Pasic 
---

We would prefer to land this for 5.2. If we do then commit 9445e1e15e
("virtio-blk-pci: default num_queues to -smp N") took care of all the
necessary compat handling.

If that's not possible, I will send a version that does the necessary
compat handling.
---
 hw/s390x/virtio-ccw-blk.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/s390x/virtio-ccw-blk.c b/hw/s390x/virtio-ccw-blk.c
index 2294ce1ce4..7296140dde 100644
--- a/hw/s390x/virtio-ccw-blk.c
+++ b/hw/s390x/virtio-ccw-blk.c
@@ -10,6 +10,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio.h"
 #include "qapi/error.h"
@@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
 {
     VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
+    VirtIOBlkConf *conf = &dev->vdev.conf;
+
+    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
+        conf->num_queues = MIN(4, current_machine->smp.cpus);
+    }
 
 qdev_realize(vdev, BUS(&ccw_dev->bus), errp);
 }

base-commit: 2a190a7256a3e0563b29ffd67e0164097b4a6dac
-- 
2.17.1