Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-17 Thread Michael S. Tsirkin
On Wed, Jan 16, 2013 at 10:14:33AM -0600, Anthony Liguori wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote:
> >> Jason Wang  writes:
> >> 
> >> > On 01/15/2013 03:44 AM, Anthony Liguori wrote:
> >> >> Jason Wang  writes:
> >> >>
> >> >>> Hello all:
> >> >>>
> >> >>> This series is an update of the last version of multiqueue virtio-net
> >> >>> support.
> >> >>>
> >> >>> Recently, Linux tap gained multiqueue support. This series implements
> >> >>> basic support for multiqueue tap, nic and vhost, then uses that as an
> >> >>> infrastructure to enable multiqueue support for virtio-net.
> >> >>>
> >> >>> Both vhost and userspace multiqueue were implemented for virtio-net, but
> >> >>> userspace does not get much benefit since a dataplane-like parallelized
> >> >>> mechanism is not implemented.
> >> >>>
> >> >>> Users can start a multiqueue virtio-net card by adding a "queues"
> >> >>> parameter to tap.
> >> >>>
> >> >>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
> >> >>> virtio-net-pci,netdev=hn0
> >> >>>
> >> >>> Management tools such as libvirt can pass multiple pre-created fds 
> >> >>> through
> >> >>>
> >> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
> >> >>> virtio-net-pci,netdev=hn0
> >> >> I'm confused/frightened that this syntax works.  You shouldn't be
> >> >> allowed to have two values for the same property.  Better to have a
> >> >> syntax like fd[0]=X,fd[1]=Y or something along those lines.
> >> >
> >> > Yes, but this is how the StringList type currently works on the command
> >> > line. Some other parameters such as dnssearch, hostfwd and guestfwd
> >> > already work this way. It looks like your suggestion needs some extension
> >> > of the QemuOpts visitor; maybe we can do that on top.
> >> 
> >> It's a silly syntax and breaks compatibility.  This is valid syntax:
> >> 
> >> -net tap,fd=3,fd=4
> >> 
> >> In this case, it means 'fd=4' because the last fd overwrites the first
> >> one.
> >> 
> >> Now you've changed it to mean something else.  Having one thing mean
> >> something in one context, but something else in another context is
> >> terrible interface design.
> >> 
> >> Regards,
> >> 
> >> Anthony Liguori
> >
> > Aha so just renaming the field 'fds' would address this issue?
> 
> No, you still have the problem of different meanings.
> 
> -netdev tap,fd=X,fd=Y
> 
> -netdev tap,fds=X,fds=Y
> 
> Would have wildly different behavior.

I think even caring about -net tap,fd=1,fd=2 is a bit silly.  If that
resulted in fd=2, it did so by accident; I don't think it was ever
intentionally legal.
As Jason points out, we already have list support, and for better or worse
it currently uses repeated options, e.g. for dnssearch, hostfwd and
guestfwd.
Isn't it better to be consistent?
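
For reference, the existing repeated-option syntax being referred to looks
like the following (a sketch only, the domain names are made up); each
repeated dnssearch= is accumulated into a StringList instead of overwriting
the previous value:

  ./qemu -netdev user,id=un0,dnssearch=example.org,dnssearch=example.net \
         -device virtio-net-pci,netdev=un0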

> Just do:
> 
> -netdev tap,fds=X:Y
> 
> And then we're staying consistent wrt the interpretation of multiple
> properties of the same name.
> 
> Regards,
> 
> Anthony Liguori

This introduces ':' as a special character. However, fds can be
fd names passed in with getfd, where ':' is a legal character.
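
A rough sketch of that flow (the fd name below is purely illustrative): a
management tool can hand QEMU an already-open tap fd over the monitor
socket, label it, and then refer to it by that label, colon and all:

  (qemu) getfd tap:queue0
  (qemu) netdev_add tap,id=hn0,fd=tap:queue0

so ':' cannot double as a list separator without ambiguity.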

-- 
MST



Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-16 Thread Michael S. Tsirkin
On Wed, Jan 16, 2013 at 10:14:33AM -0600, Anthony Liguori wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote:
> >> Jason Wang  writes:
> >> 
> >> > On 01/15/2013 03:44 AM, Anthony Liguori wrote:
> >> >> Jason Wang  writes:
> >> >>
> >> >>> Hello all:
> >> >>>
> >> >>> This series is an update of the last version of multiqueue virtio-net
> >> >>> support.
> >> >>>
> >> >>> Recently, Linux tap gained multiqueue support. This series implements
> >> >>> basic support for multiqueue tap, nic and vhost, then uses that as an
> >> >>> infrastructure to enable multiqueue support for virtio-net.
> >> >>>
> >> >>> Both vhost and userspace multiqueue were implemented for virtio-net, but
> >> >>> userspace does not get much benefit since a dataplane-like parallelized
> >> >>> mechanism is not implemented.
> >> >>>
> >> >>> Users can start a multiqueue virtio-net card by adding a "queues"
> >> >>> parameter to tap.
> >> >>>
> >> >>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
> >> >>> virtio-net-pci,netdev=hn0
> >> >>>
> >> >>> Management tools such as libvirt can pass multiple pre-created fds 
> >> >>> through
> >> >>>
> >> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
> >> >>> virtio-net-pci,netdev=hn0
> >> >> I'm confused/frightened that this syntax works.  You shouldn't be
> >> >> allowed to have two values for the same property.  Better to have a
> >> >> syntax like fd[0]=X,fd[1]=Y or something along those lines.
> >> >
> >> > Yes, but this is how the StringList type currently works on the command
> >> > line. Some other parameters such as dnssearch, hostfwd and guestfwd
> >> > already work this way. It looks like your suggestion needs some extension
> >> > of the QemuOpts visitor; maybe we can do that on top.
> >> 
> >> It's a silly syntax and breaks compatibility.  This is valid syntax:
> >> 
> >> -net tap,fd=3,fd=4
> >> 
> >> In this case, it means 'fd=4' because the last fd overwrites the first
> >> one.
> >> 
> >> Now you've changed it to mean something else.  Having one thing mean
> >> something in one context, but something else in another context is
> >> terrible interface design.
> >> 
> >> Regards,
> >> 
> >> Anthony Liguori
> >
> > Aha so just renaming the field 'fds' would address this issue?
> 
> No, you still have the problem of different meanings.
> 
> -netdev tap,fd=X,fd=Y
> 
> -netdev tap,fds=X,fds=Y
> 
> Would have wildly different behavior.

fd=X,fd=Y is more a bug than a feature. It could have failed
just as well.

> Just do:
> 
> -netdev tap,fds=X:Y
> 
> And then we're staying consistent wrt the interpretation of multiple
> properties of the same name.
> 
> Regards,
> 
> Anthony Liguori


The issue is that ':' would only work for a list of numbers.
As Jason points out, StringList is already used - do we really
want to invent yet another syntax for a list that will work
only for this case?

-- 
MST



Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-16 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote:
>> Jason Wang  writes:
>> 
>> > On 01/15/2013 03:44 AM, Anthony Liguori wrote:
>> >> Jason Wang  writes:
>> >>
>> >>> Hello all:
>> >>>
>> >>> This series is an update of the last version of multiqueue virtio-net
>> >>> support.
>> >>>
>> >>> Recently, Linux tap gained multiqueue support. This series implements
>> >>> basic support for multiqueue tap, nic and vhost, then uses that as an
>> >>> infrastructure to enable multiqueue support for virtio-net.
>> >>>
>> >>> Both vhost and userspace multiqueue were implemented for virtio-net, but
>> >>> userspace does not get much benefit since a dataplane-like parallelized
>> >>> mechanism is not implemented.
>> >>>
>> >>> Users can start a multiqueue virtio-net card by adding a "queues"
>> >>> parameter to tap.
>> >>>
>> >>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
>> >>> virtio-net-pci,netdev=hn0
>> >>>
>> >>> Management tools such as libvirt can pass multiple pre-created fds 
>> >>> through
>> >>>
>> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
>> >>> virtio-net-pci,netdev=hn0
>> >> I'm confused/frightened that this syntax works.  You shouldn't be
>> >> allowed to have two values for the same property.  Better to have a
>> >> syntax like fd[0]=X,fd[1]=Y or something along those lines.
>> >
>> > Yes, but this is how the StringList type currently works on the command
>> > line. Some other parameters such as dnssearch, hostfwd and guestfwd
>> > already work this way. It looks like your suggestion needs some extension
>> > of the QemuOpts visitor; maybe we can do that on top.
>> 
>> It's a silly syntax and breaks compatibility.  This is valid syntax:
>> 
>> -net tap,fd=3,fd=4
>> 
>> In this case, it means 'fd=4' because the last fd overwrites the first
>> one.
>> 
>> Now you've changed it to mean something else.  Having one thing mean
>> something in one context, but something else in another context is
>> terrible interface design.
>> 
>> Regards,
>> 
>> Anthony Liguori
>
> Aha so just renaming the field 'fds' would address this issue?

No, you still have the problem of different meanings.

-netdev tap,fd=X,fd=Y

-netdev tap,fds=X,fds=Y

Would have wildly different behavior.

Just do:

-netdev tap,fds=X:Y

And then we're staying consistent wrt the interpretation of multiple
properties of the same name.
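
Spelled out, a full invocation under this proposal might look like the
following (hypothetical; it assumes the management tool has already opened
two tap fds, here 3 and 4, and leaves them open across exec):

  ./qemu -netdev tap,id=hn0,fds=3:4 -device virtio-net-pci,netdev=hn0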

Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-16 Thread Michael S. Tsirkin
On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote:
> Jason Wang  writes:
> 
> > On 01/15/2013 03:44 AM, Anthony Liguori wrote:
> >> Jason Wang  writes:
> >>
> >>> Hello all:
> >>>
> >>> This series is an update of the last version of multiqueue virtio-net
> >>> support.
> >>>
> >>> Recently, Linux tap gained multiqueue support. This series implements
> >>> basic support for multiqueue tap, nic and vhost, then uses that as an
> >>> infrastructure to enable multiqueue support for virtio-net.
> >>>
> >>> Both vhost and userspace multiqueue were implemented for virtio-net, but
> >>> userspace does not get much benefit since a dataplane-like parallelized
> >>> mechanism is not implemented.
> >>>
> >>> Users can start a multiqueue virtio-net card by adding a "queues"
> >>> parameter to tap.
> >>>
> >>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
> >>> virtio-net-pci,netdev=hn0
> >>>
> >>> Management tools such as libvirt can pass multiple pre-created fds through
> >>>
> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
> >>> virtio-net-pci,netdev=hn0
> >> I'm confused/frightened that this syntax works.  You shouldn't be
> >> allowed to have two values for the same property.  Better to have a
> >> syntax like fd[0]=X,fd[1]=Y or something along those lines.
> >
> > Yes, but this is how the StringList type currently works on the command
> > line. Some other parameters such as dnssearch, hostfwd and guestfwd
> > already work this way. It looks like your suggestion needs some extension
> > of the QemuOpts visitor; maybe we can do that on top.
> 
> It's a silly syntax and breaks compatibility.  This is valid syntax:
> 
> -net tap,fd=3,fd=4
> 
> In this case, it means 'fd=4' because the last fd overwrites the first
> one.
> 
> Now you've changed it to mean something else.  Having one thing mean
> something in one context, but something else in another context is
> terrible interface design.
> 
> Regards,
> 
> Anthony Liguori

Aha so just renaming the field 'fds' would address this issue?



Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-16 Thread Anthony Liguori
Jason Wang  writes:

> On 01/15/2013 03:44 AM, Anthony Liguori wrote:
>> Jason Wang  writes:
>>
>>> Hello all:
>>>
>>> This series is an update of the last version of multiqueue virtio-net
>>> support.
>>>
>>> Recently, Linux tap gained multiqueue support. This series implements
>>> basic support for multiqueue tap, nic and vhost, then uses that as an
>>> infrastructure to enable multiqueue support for virtio-net.
>>>
>>> Both vhost and userspace multiqueue were implemented for virtio-net, but
>>> userspace does not get much benefit since a dataplane-like parallelized
>>> mechanism is not implemented.
>>>
>>> Users can start a multiqueue virtio-net card by adding a "queues"
>>> parameter to tap.
>>>
>>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
>>> virtio-net-pci,netdev=hn0
>>>
>>> Management tools such as libvirt can pass multiple pre-created fds through
>>>
>>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
>>> virtio-net-pci,netdev=hn0
>> I'm confused/frightened that this syntax works.  You shouldn't be
>> allowed to have two values for the same property.  Better to have a
>> syntax like fd[0]=X,fd[1]=Y or something along those lines.
>
> Yes, but this is how the StringList type currently works on the command
> line. Some other parameters such as dnssearch, hostfwd and guestfwd
> already work this way. It looks like your suggestion needs some extension
> of the QemuOpts visitor; maybe we can do that on top.

It's a silly syntax and breaks compatibility.  This is valid syntax:

-net tap,fd=3,fd=4

In this case, it means 'fd=4' because the last fd overwrites the first
one.

Now you've changed it to mean something else.  Having one thing mean
something in one context, but something else in another context is
terrible interface design.

Regards,

Anthony Liguori

>
> Thanks
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>> You can fetch and try the code from:
>>> git://github.com/jasowang/qemu.git
>>>
>>> Patch 1 adds a generic method of creating multiqueue taps and implements
>>> the Linux part.
>>> Patches 2 - 4 introduce some helpers which are used to refactor the nic
>>> emulation code to support multiqueue.
>>> Patch 5 introduces multiqueue support in the qemu networking code: each
>>> peer of a NetClientState is abstracted as a queue. Through this, most of
>>> the code can be reused without change.
>>> Patch 6 adds basic multiqueue support for vhost, which lets vhost handle
>>> just a subset of all virtqueues.
>>> Patches 7-8 introduce new virtio helpers which are needed by multiqueue
>>> virtio-net.
>>> Patches 9-12 implement the multiqueue support of virtio-net
>>>
>>> Changes from RFC v2:
>>> - rebase the codes to latest qemu
>>> - align the multiqueue virtio-net implementation to virtio spec
>>> - split the patches into smaller ones
>>> - set_link and hotplug support
>>>
>>> Changes from RFC V1:
>>> - rebase to the latest
>>> - fix memory leak in parse_netdev
>>> - fix guest notifiers assignment/de-assignment
>>> - changes the command lines to:
>>>qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2
>>>
>>> Reference:
>>> v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html
>>> v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481
>>>
>>> Perf Numbers:
>>>
>>> Two Intel Xeon 5620 with direct connected intel 82599EB
>>> Host/Guest kernel: David net tree
>>> vhost enabled
>>>
>>> - lots of improvements in both latency and CPU utilization in the
>>> request-response test
>>> - a regression when the guest sends small packets, because TCP tends to
>>>   batch less when latency is improved
>>>
>>> 1q/2q/4q
>>> TCP_RR
>>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
>>> 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
>>> 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
>>> 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
>>> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
>>> 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
>>> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
>>> 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
>>> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
>>> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
>>> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
>>> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
>>> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
>>> TCP_CRR
>>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
>>> 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
>>> 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
>>> 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
>>> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
>>> 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
>>> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
>>> 64 50   28585.72 582.54 40576.7 

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-15 Thread Jason Wang
On 01/15/2013 03:44 AM, Anthony Liguori wrote:
> Jason Wang  writes:
>
>> Hello all:
>>
>> This series is an update of the last version of multiqueue virtio-net
>> support.
>>
>> Recently, Linux tap gained multiqueue support. This series implements
>> basic support for multiqueue tap, nic and vhost, then uses that as an
>> infrastructure to enable multiqueue support for virtio-net.
>>
>> Both vhost and userspace multiqueue were implemented for virtio-net, but
>> userspace does not get much benefit since a dataplane-like parallelized
>> mechanism is not implemented.
>>
>> Users can start a multiqueue virtio-net card by adding a "queues"
>> parameter to tap.
>>
>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device virtio-net-pci,netdev=hn0
>>
>> Management tools such as libvirt can pass multiple pre-created fds through
>>
>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
>> virtio-net-pci,netdev=hn0
> I'm confused/frightened that this syntax works.  You shouldn't be
> allowed to have two values for the same property.  Better to have a
> syntax like fd[0]=X,fd[1]=Y or something along those lines.

Yes, but this is how the StringList type currently works on the command line.
Some other parameters such as dnssearch, hostfwd and guestfwd already work
this way. It looks like your suggestion needs some extension of the QemuOpts
visitor; maybe we can do that on top.
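
For example, the user-mode netdev already builds up a list from a repeated
option (the port numbers here are only illustrative):

  ./qemu -netdev user,id=un0,hostfwd=tcp::2222-:22,hostfwd=tcp::8080-:80 \
         -device virtio-net-pci,netdev=un0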

Thanks
>
> Regards,
>
> Anthony Liguori
>
>> You can fetch and try the code from:
>> git://github.com/jasowang/qemu.git
>>
>> Patch 1 adds a generic method of creating multiqueue taps and implements
>> the Linux part.
>> Patches 2 - 4 introduce some helpers which are used to refactor the nic
>> emulation code to support multiqueue.
>> Patch 5 introduces multiqueue support in the qemu networking code: each
>> peer of a NetClientState is abstracted as a queue. Through this, most of
>> the code can be reused without change.
>> Patch 6 adds basic multiqueue support for vhost, which lets vhost handle
>> just a subset of all virtqueues.
>> Patches 7-8 introduce new virtio helpers which are needed by multiqueue
>> virtio-net.
>> Patches 9-12 implement the multiqueue support of virtio-net
>>
>> Changes from RFC v2:
>> - rebase the codes to latest qemu
>> - align the multiqueue virtio-net implementation to virtio spec
>> - split the patches into smaller ones
>> - set_link and hotplug support
>>
>> Changes from RFC V1:
>> - rebase to the latest
>> - fix memory leak in parse_netdev
>> - fix guest notifiers assignment/de-assignment
>> - changes the command lines to:
>>qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2
>>
>> Reference:
>> v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html
>> v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481
>>
>> Perf Numbers:
>>
>> Two Intel Xeon 5620 with direct connected intel 82599EB
>> Host/Guest kernel: David net tree
>> vhost enabled
>>
>> - lots of improvements in both latency and CPU utilization in the
>> request-response test
>> - a regression when the guest sends small packets, because TCP tends to
>>   batch less when latency is improved
>>
>> 1q/2q/4q
>> TCP_RR
>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
>> 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
>> 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
>> 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
>> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
>> 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
>> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
>> 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
>> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
>> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
>> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
>> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
>> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
>> TCP_CRR
>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
>> 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
>> 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
>> 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
>> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
>> 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
>> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
>> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
>> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
>> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
>> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
>> 256 50  28354.7  579.85 40578.31 60760261.71 657.87
>> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
>> TCP_STREAM guest receiving
>>  size #sessions throughput  norm throughput  norm throughput  norm
>> 1 1 16.27   1.33   16.11.12   16.13   0.99
>> 1 2 33.04   2.08   32.96   2.1

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-14 Thread Anthony Liguori
Jason Wang  writes:

> Hello all:
>
> This series is an update of the last version of multiqueue virtio-net
> support.
>
> Recently, Linux tap gained multiqueue support. This series implements
> basic support for multiqueue tap, nic and vhost, then uses that as an
> infrastructure to enable multiqueue support for virtio-net.
>
> Both vhost and userspace multiqueue were implemented for virtio-net, but
> userspace does not get much benefit since a dataplane-like parallelized
> mechanism is not implemented.
>
> Users can start a multiqueue virtio-net card by adding a "queues"
> parameter to tap.
>
> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device virtio-net-pci,netdev=hn0
>
> Management tools such as libvirt can pass multiple pre-created fds through
>
> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
> virtio-net-pci,netdev=hn0

I'm confused/frightened that this syntax works.  You shouldn't be
allowed to have two values for the same property.  Better to have a
syntax like fd[0]=X,fd[1]=Y or something along those lines.
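
Spelled out, such an indexed syntax would presumably look like the following
(purely hypothetical, no parser for it exists today):

  ./qemu -netdev tap,id=hn0,queues=2,fd[0]=X,fd[1]=Y \
         -device virtio-net-pci,netdev=hn0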

Regards,

Anthony Liguori

>
> You can fetch and try the code from:
> git://github.com/jasowang/qemu.git
>
> Patch 1 adds a generic method of creating multiqueue taps and implements
> the Linux part.
> Patches 2 - 4 introduce some helpers which are used to refactor the nic
> emulation code to support multiqueue.
> Patch 5 introduces multiqueue support in the qemu networking code: each
> peer of a NetClientState is abstracted as a queue. Through this, most of
> the code can be reused without change.
> Patch 6 adds basic multiqueue support for vhost, which lets vhost handle
> just a subset of all virtqueues.
> Patches 7-8 introduce new virtio helpers which are needed by multiqueue
> virtio-net.
> Patches 9-12 implement the multiqueue support of virtio-net
>
> Changes from RFC v2:
> - rebase the codes to latest qemu
> - align the multiqueue virtio-net implementation to virtio spec
> - split the patches into smaller ones
> - set_link and hotplug support
>
> Changes from RFC V1:
> - rebase to the latest
> - fix memory leak in parse_netdev
> - fix guest notifiers assignment/de-assignment
> - changes the command lines to:
>qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2
>
> Reference:
> v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html
> v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481
>
> Perf Numbers:
>
> Two Intel Xeon 5620 with direct connected intel 82599EB
> Host/Guest kernel: David net tree
> vhost enabled
>
> - lots of improvements in both latency and CPU utilization in the
> request-response test
> - a regression when the guest sends small packets, because TCP tends to
>   batch less when latency is improved
>
> 1q/2q/4q
> TCP_RR
>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
> 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
> 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
> 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
> 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
> TCP_CRR
>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
> 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
> 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
> 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
> 256 50  28354.7  579.85 40578.31 60760261.71 657.87
> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
> TCP_STREAM guest receiving
>  size #sessions throughput  norm throughput  norm throughput  norm
> 1 1 16.27   1.33   16.11.12   16.13   0.99
> 1 2 33.04   2.08   32.96   2.19   32.75   1.98
> 1 4 66.62   6.83   68.35.56   66.14   2.65
> 64 1896.55  56.67  914.02  58.14  898.9   61.56
> 64 21830.46 91.02  1812.02 64.59  1835.57 66.26
> 64 43626.61 142.55 3636.25 100.64 3607.46 75.03
> 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
> 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
> 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
> 512 1   3592.43 165.24 3603.12 167.19 35

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-10 Thread Jason Wang
On 01/10/2013 07:49 PM, Stefan Hajnoczi wrote:
> On Thu, Jan 10, 2013 at 05:34:14PM +0800, Jason Wang wrote:
>> On 01/10/2013 04:44 PM, Stefan Hajnoczi wrote:
>>> On Wed, Jan 09, 2013 at 11:33:25PM +0800, Jason Wang wrote:
 On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
>> On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
>>> Perf Numbers:
>>>
>>> Two Intel Xeon 5620 with direct connected intel 82599EB
>>> Host/Guest kernel: David net tree
>>> vhost enabled
>>>
>>> - lots of improvements in both latency and CPU utilization in the
>>> request-response test
>>> - a regression when the guest sends small packets, because TCP tends to
>>>   batch less when latency is improved
>>>
>>> 1q/2q/4q
>>> TCP_RR
>>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
>>> 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
>>> 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
>>> 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
>>> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
>>> 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
>>> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
>>> 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
>>> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
>>> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
>>> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
>>> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
>>> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
>>> TCP_CRR
>>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
>>> 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
>>> 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
>>> 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
>>> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
>>> 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
>>> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
>>> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
>>> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
>>> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
>>> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
>>> 256 50  28354.7  579.85 40578.31 60760261.71 657.87
>>> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
>>> TCP_STREAM guest receiving
>>>  size #sessions throughput  norm throughput  norm throughput  norm
>>> 1 1 16.27   1.33   16.11.12   16.13   0.99
>>> 1 2 33.04   2.08   32.96   2.19   32.75   1.98
>>> 1 4 66.62   6.83   68.35.56   66.14   2.65
>>> 64 1896.55  56.67  914.02  58.14  898.9   61.56
>>> 64 21830.46 91.02  1812.02 64.59  1835.57 66.26
>>> 64 43626.61 142.55 3636.25 100.64 3607.46 75.03
>>> 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
>>> 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
>>> 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
>>> 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
>>> 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
>>> 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
>>> 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
>>> 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
>>> 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
>>> 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
>>> 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
>>> 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
>>> 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
>>> 16384 2 7436.57 255.81 9381.86 220.85 9392220.36
>>> 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
>>> TCP_MAERTS guest sending
>>>  size #sessions throughput  norm throughput  norm throughput  norm
>>> 1 1 15.94   0.62   15.55   0.61   15.13   0.59
>>> 1 2 36.11   0.83   32.46   0.69   32.28   0.69
>>> 1 4 71.59   1  68.91   0.94   61.52   0.77
>>> 64 1630.71  22.52  622.11  22.35  605.09  21.84
>>> 64 21442.36 30.57  1292.15 25.82  1282.67 25.55
>>> 64 43186.79 42.59  2844.96 36.03  2529.69 30.06
>>> 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
>>> 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
>>> 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
>>> 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
>>> 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
>>> 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
>>> 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
>>> 1024 2  9429258.32 7082.79 120.55 7403.53 134.78
>>> 1024 4  9430.66 290.44 9499.29 232.

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-10 Thread Stefan Hajnoczi
On Thu, Jan 10, 2013 at 05:34:14PM +0800, Jason Wang wrote:
> On 01/10/2013 04:44 PM, Stefan Hajnoczi wrote:
> > On Wed, Jan 09, 2013 at 11:33:25PM +0800, Jason Wang wrote:
> >> On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
>  On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
> > Perf Numbers:
> >
> > Two Intel Xeon 5620 with direct connected intel 82599EB
> > Host/Guest kernel: David net tree
> > vhost enabled
> >
> > - lots of improvements in both latency and CPU utilization in the
> > request-response test
> > - a regression when the guest sends small packets, because TCP tends to
> >   batch less when latency is improved
> >
> > 1q/2q/4q
> > TCP_RR
> >  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> > 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
> > 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
> > 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
> > 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
> > 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
> > 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
> > 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
> > 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
> > 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
> > 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
> > 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
> > 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
> > TCP_CRR
> >  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> > 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
> > 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
> > 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
> > 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
> > 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
> > 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
> > 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
> > 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
> > 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
> > 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
> > 256 50  28354.7  579.85 40578.31 60760261.71 657.87
> > 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
> > TCP_STREAM guest receiving
> >  size #sessions throughput  norm throughput  norm throughput  norm
> > 1 1 16.27   1.33   16.11.12   16.13   0.99
> > 1 2 33.04   2.08   32.96   2.19   32.75   1.98
> > 1 4 66.62   6.83   68.35.56   66.14   2.65
> > 64 1896.55  56.67  914.02  58.14  898.9   61.56
> > 64 21830.46 91.02  1812.02 64.59  1835.57 66.26
> > 64 43626.61 142.55 3636.25 100.64 3607.46 75.03
> > 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
> > 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
> > 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
> > 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
> > 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
> > 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
> > 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
> > 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
> > 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
> > 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
> > 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
> > 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
> > 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
> > 16384 2 7436.57 255.81 9381.86 220.85 9392220.36
> > 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
> > TCP_MAERTS guest sending
> >  size #sessions throughput  norm throughput  norm throughput  norm
> > 1 1 15.94   0.62   15.55   0.61   15.13   0.59
> > 1 2 36.11   0.83   32.46   0.69   32.28   0.69
> > 1 4 71.59   1  68.91   0.94   61.52   0.77
> > 64 1630.71  22.52  622.11  22.35  605.09  21.84
> > 64 21442.36 30.57  1292.15 25.82  1282.67 25.55
> > 64 43186.79 42.59  2844.96 36.03  2529.69 30.06
> > 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
> > 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
> > 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
> > 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
> > 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
> > 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
> > 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
> > 1024 2  9429258.32 7082.79 120.55 7403.53 134.78
> > 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
> > 4096 1  9339.28 296.48 9

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-10 Thread Jason Wang
On 01/10/2013 04:44 PM, Stefan Hajnoczi wrote:
> On Wed, Jan 09, 2013 at 11:33:25PM +0800, Jason Wang wrote:
>> On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
>>> On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
 On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
> Perf Numbers:
>
> Two Intel Xeon 5620 with direct connected intel 82599EB
> Host/Guest kernel: David net tree
> vhost enabled
>
> - lots of improvements in both latency and CPU utilization in the
> request-response test
> - a regression when the guest sends small packets, because TCP tends to
>   batch less when latency is improved
>
> 1q/2q/4q
> TCP_RR
>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
> 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
> 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
> 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
> 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
> TCP_CRR
>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
> 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
> 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
> 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
> 256 50  28354.7  579.85 40578.31 60760261.71 657.87
> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
> TCP_STREAM guest receiving
>  size #sessions throughput  norm throughput  norm throughput  norm
> 1 1 16.27   1.33   16.11.12   16.13   0.99
> 1 2 33.04   2.08   32.96   2.19   32.75   1.98
> 1 4 66.62   6.83   68.35.56   66.14   2.65
> 64 1896.55  56.67  914.02  58.14  898.9   61.56
> 64 21830.46 91.02  1812.02 64.59  1835.57 66.26
> 64 43626.61 142.55 3636.25 100.64 3607.46 75.03
> 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
> 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
> 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
> 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
> 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
> 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
> 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
> 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
> 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
> 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
> 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
> 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
> 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
> 16384 2 7436.57 255.81 9381.86 220.85 9392220.36
> 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
> TCP_MAERTS guest sending
>  size #sessions throughput  norm throughput  norm throughput  norm
> 1 1 15.94   0.62   15.55   0.61   15.13   0.59
> 1 2 36.11   0.83   32.46   0.69   32.28   0.69
> 1 4 71.59   1  68.91   0.94   61.52   0.77
> 64 1630.71  22.52  622.11  22.35  605.09  21.84
> 64 21442.36 30.57  1292.15 25.82  1282.67 25.55
> 64 43186.79 42.59  2844.96 36.03  2529.69 30.06
> 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
> 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
> 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
> 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
> 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
> 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
> 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
> 1024 2  9429258.32 7082.79 120.55 7403.53 134.78
> 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
> 4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
> 4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
> 4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
> 16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
> 16384 2 9367.69 406.93

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-10 Thread Stefan Hajnoczi
On Wed, Jan 09, 2013 at 11:33:25PM +0800, Jason Wang wrote:
> On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
> > On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
> >> On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
> >>> Perf Numbers:
> >>>
> >>> Two Intel Xeon 5620 with direct connected intel 82599EB
> >>> Host/Guest kernel: David net tree
> >>> vhost enabled
> >>>
> >>> - lots of improvements in both latency and CPU utilization in the
> >>> request-response test
> >>> - a regression when the guest sends small packets, because TCP tends to
> >>>   batch less when latency is improved
> >>>
> >>> 1q/2q/4q
> >>> TCP_RR
> >>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> >>> 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
> >>> 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
> >>> 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
> >>> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
> >>> 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
> >>> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
> >>> 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
> >>> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
> >>> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
> >>> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
> >>> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
> >>> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
> >>> TCP_CRR
> >>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> >>> 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
> >>> 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
> >>> 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
> >>> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
> >>> 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
> >>> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
> >>> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
> >>> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
> >>> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
> >>> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
> >>> 256 50  28354.7  579.85 40578.31 60760261.71 657.87
> >>> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
> >>> TCP_STREAM guest receiving
> >>>  size #sessions throughput  norm throughput  norm throughput  norm
> >>> 1 1 16.27   1.33   16.11.12   16.13   0.99
> >>> 1 2 33.04   2.08   32.96   2.19   32.75   1.98
> >>> 1 4 66.62   6.83   68.35.56   66.14   2.65
> >>> 64 1896.55  56.67  914.02  58.14  898.9   61.56
> >>> 64 21830.46 91.02  1812.02 64.59  1835.57 66.26
> >>> 64 43626.61 142.55 3636.25 100.64 3607.46 75.03
> >>> 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
> >>> 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
> >>> 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
> >>> 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
> >>> 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
> >>> 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
> >>> 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
> >>> 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
> >>> 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
> >>> 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
> >>> 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
> >>> 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
> >>> 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
> >>> 16384 2 7436.57 255.81 9381.86 220.85 9392220.36
> >>> 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
> >>> TCP_MAERTS guest sending
> >>>  size #sessions throughput  norm throughput  norm throughput  norm
> >>> 1 1 15.94   0.62   15.55   0.61   15.13   0.59
> >>> 1 2 36.11   0.83   32.46   0.69   32.28   0.69
> >>> 1 4 71.59   1  68.91   0.94   61.52   0.77
> >>> 64 1630.71  22.52  622.11  22.35  605.09  21.84
> >>> 64 21442.36 30.57  1292.15 25.82  1282.67 25.55
> >>> 64 43186.79 42.59  2844.96 36.03  2529.69 30.06
> >>> 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
> >>> 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
> >>> 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
> >>> 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
> >>> 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
> >>> 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
> >>> 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
> >>> 1024 2  9429258.32 7082.79 120.55 7403.53 134.78
> >>> 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
> >>> 4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
> >>> 4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
> >>> 4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
> >>> 16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
> >>> 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
> >>> 16384 4 9391.9

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-09 Thread Michael S. Tsirkin
On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
> On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
> > Perf Numbers:
> > 
> > Two Intel Xeon 5620 with direct connected intel 82599EB
> > Host/Guest kernel: David net tree
> > vhost enabled
> > 
> > - lots of improvements in both latency and CPU utilization in the
> > request-response test
> > - a regression when the guest sends small packets, because TCP tends to
> >   batch less when latency is improved
> > 
> > 1q/2q/4q
> > TCP_RR
> >  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> > 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
> > 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
> > 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
> > 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
> > 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
> > 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
> > 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
> > 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
> > 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
> > 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
> > 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
> > 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
> > TCP_CRR
> >  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> > 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
> > 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
> > 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
> > 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
> > 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
> > 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
> > 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
> > 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
> > 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
> > 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
> > 256 50  28354.7  579.85 40578.31 60760261.71 657.87
> > 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
> > TCP_STREAM guest receiving
> >  size #sessions throughput  norm throughput  norm throughput  norm
> > 1 1 16.27   1.33   16.11.12   16.13   0.99
> > 1 2 33.04   2.08   32.96   2.19   32.75   1.98
> > 1 4 66.62   6.83   68.35.56   66.14   2.65
> > 64 1896.55  56.67  914.02  58.14  898.9   61.56
> > 64 21830.46 91.02  1812.02 64.59  1835.57 66.26
> > 64 43626.61 142.55 3636.25 100.64 3607.46 75.03
> > 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
> > 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
> > 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
> > 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
> > 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
> > 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
> > 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
> > 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
> > 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
> > 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
> > 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
> > 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
> > 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
> > 16384 2 7436.57 255.81 9381.86 220.85 9392220.36
> > 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
> > TCP_MAERTS guest sending
> >  size #sessions throughput  norm throughput  norm throughput  norm
> > 1 1 15.94   0.62   15.55   0.61   15.13   0.59
> > 1 2 36.11   0.83   32.46   0.69   32.28   0.69
> > 1 4 71.59   1  68.91   0.94   61.52   0.77
> > 64 1630.71  22.52  622.11  22.35  605.09  21.84
> > 64 21442.36 30.57  1292.15 25.82  1282.67 25.55
> > 64 43186.79 42.59  2844.96 36.03  2529.69 30.06
> > 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
> > 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
> > 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
> > 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
> > 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
> > 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
> > 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
> > 1024 2  9429258.32 7082.79 120.55 7403.53 134.78
> > 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
> > 4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
> > 4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
> > 4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
> > 16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
> > 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
> > 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47
> 
> Trying to understand the performance results:
> 
> What is the host device configuration?  tap + bridge?
> 
> Did you use host CPU affinity for the vhost threads?
> 
> Can multiqueue tap take advantage of multiqueue host NICs or is
> virtio-net m

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-09 Thread Jason Wang
On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
>> On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
>>> Perf Numbers:
>>>
>>> Two Intel Xeon 5620 with direct connected intel 82599EB
>>> Host/Guest kernel: David net tree
>>> vhost enabled
>>>
>>> - lots of improvements in both latency and CPU utilization in the
>>> request-response test
>>> - a regression when the guest sends small packets, because TCP tends to
>>>   batch less when latency is improved
>>>
>>> 1q/2q/4q
>>> TCP_RR
>>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
>>> 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
>>> 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
>>> 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
>>> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
>>> 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
>>> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
>>> 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
>>> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
>>> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
>>> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
>>> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
>>> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
>>> TCP_CRR
>>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
>>> 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
>>> 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
>>> 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
>>> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
>>> 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
>>> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
>>> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
>>> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
>>> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
>>> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
>>> 256 50  28354.7  579.85 40578.31 60760261.71 657.87
>>> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
>>> TCP_STREAM guest receiving
>>>  size #sessions throughput  norm throughput  norm throughput  norm
>>> 1 1 16.27   1.33   16.11.12   16.13   0.99
>>> 1 2 33.04   2.08   32.96   2.19   32.75   1.98
>>> 1 4 66.62   6.83   68.35.56   66.14   2.65
>>> 64 1896.55  56.67  914.02  58.14  898.9   61.56
>>> 64 21830.46 91.02  1812.02 64.59  1835.57 66.26
>>> 64 43626.61 142.55 3636.25 100.64 3607.46 75.03
>>> 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
>>> 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
>>> 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
>>> 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
>>> 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
>>> 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
>>> 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
>>> 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
>>> 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
>>> 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
>>> 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
>>> 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
>>> 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
>>> 16384 2 7436.57 255.81 9381.86 220.85 9392220.36
>>> 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
>>> TCP_MAERTS guest sending
>>>  size #sessions throughput  norm throughput  norm throughput  norm
>>> 1 1 15.94   0.62   15.55   0.61   15.13   0.59
>>> 1 2 36.11   0.83   32.46   0.69   32.28   0.69
>>> 1 4 71.59   1  68.91   0.94   61.52   0.77
>>> 64 1630.71  22.52  622.11  22.35  605.09  21.84
>>> 64 21442.36 30.57  1292.15 25.82  1282.67 25.55
>>> 64 43186.79 42.59  2844.96 36.03  2529.69 30.06
>>> 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
>>> 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
>>> 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
>>> 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
>>> 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
>>> 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
>>> 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
>>> 1024 2  9429258.32 7082.79 120.55 7403.53 134.78
>>> 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
>>> 4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
>>> 4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
>>> 4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
>>> 16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
>>> 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
>>> 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47
>> Trying to understand the performance results:
>>
>> What is the host device configuration?  tap + bridge?

Yes.
>>
>> Did you use host CPU affinity for the vhost threads?

I use numactl to pin cpu t
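
A hedged sketch of numactl/taskset pinning of this kind; the node and CPU
numbers below are assumptions, not the actual test settings:

  # start the guest bound to NUMA node 0
  numactl --cpunodebind=0 --membind=0 ./qemu -netdev tap,id=hn0,queues=2,vhost=on \
      -device virtio-net-pci,netdev=hn0 ...

  # also pin the per-queue vhost kernel threads (named vhost-<qemu pid>)
  for t in $(pgrep vhost); do taskset -pc 0-3 "$t"; done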

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-09 Thread Stefan Hajnoczi
On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
> Perf Numbers:
> 
> Two Intel Xeon 5620 with direct connected intel 82599EB
> Host/Guest kernel: David net tree
> vhost enabled
> 
> - lots of improvements in both latency and CPU utilization in the
> request-response test
> - a regression when the guest sends small packets, because TCP tends to
>   batch less when latency is improved
> 
> 1q/2q/4q
> TCP_RR
>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
> 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
> 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
> 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
> 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
> TCP_CRR
>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
> 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
> 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
> 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
> 256 50  28354.7  579.85 40578.31 60760261.71 657.87
> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
> TCP_STREAM guest receiving
>  size #sessions throughput  norm throughput  norm throughput  norm
> 1 1 16.27   1.33   16.1    1.12   16.13   0.99
> 1 2 33.04   2.08   32.96   2.19   32.75   1.98
> 1 4 66.62   6.83   68.3    5.56   66.14   2.65
> 64 1    896.55  56.67  914.02  58.14  898.9   61.56
> 64 2    1830.46 91.02  1812.02 64.59  1835.57 66.26
> 64 4    3626.61 142.55 3636.25 100.64 3607.46 75.03
> 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
> 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
> 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
> 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
> 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
> 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
> 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
> 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
> 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
> 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
> 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
> 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
> 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
> 16384 2 7436.57 255.81 9381.86 220.85 9392    220.36
> 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
> TCP_MAERTS guest sending
>  size #sessions throughput  norm throughput  norm throughput  norm
> 1 1 15.94   0.62   15.55   0.61   15.13   0.59
> 1 2 36.11   0.83   32.46   0.69   32.28   0.69
> 1 4 71.59   1  68.91   0.94   61.52   0.77
> 64 1    630.71  22.52  622.11  22.35  605.09  21.84
> 64 2    1442.36 30.57  1292.15 25.82  1282.67 25.55
> 64 4    3186.79 42.59  2844.96 36.03  2529.69 30.06
> 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
> 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
> 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
> 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
> 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
> 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
> 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
> 1024 2  9429    258.32 7082.79 120.55 7403.53 134.78
> 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
> 4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
> 4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
> 4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
> 16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
> 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
> 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47

Trying to understand the performance results:

What is the host device configuration?  tap + bridge?

Did you use host CPU affinity for the vhost threads?

Can multiqueue tap take advantage of multiqueue host NICs or is
virtio-net multiqueue unaware of the physical NIC multiqueue
capabilities?

The results seem pretty mixed - as a user it's not obvious what to
choose as a good all-round setting.  Any observations on how multiqueue
should be configured?

What is the "norm" statistic?

Stefan

[Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2012-12-28 Thread Jason Wang
Hello all:

This series is an update of the last version of the multiqueue virtio-net support.

Recently, Linux tap gained multiqueue support. This series implements basic
support for multiqueue tap, NIC and vhost, and then uses that as the
infrastructure to enable multiqueue support for virtio-net.

Both vhost and userspace multiqueue were implemented for virtio-net, but
userspace cannot get much benefit yet, since a dataplane-like parallelized
mechanism is not implemented there.

A user can start a multiqueue virtio-net card by adding a "queues"
parameter to the tap netdev:

./qemu -netdev tap,id=hn0,queues=2,vhost=on -device virtio-net-pci,netdev=hn0

Management tools such as libvirt can pass multiple pre-created fds through:

./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device virtio-net-pci,netdev=hn0
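For illustration, each pre-created fd corresponds to one queue of a multiqueue
tap: a management tool opens /dev/net/tun once per queue and issues TUNSETIFF
with IFF_MULTI_QUEUE set, reusing the same interface name.  A minimal sketch of
that (the interface name "mqtap0" is only an example, and IFF_MULTI_QUEUE needs
a host kernel with the multiqueue tap support mentioned above):

/* Create two queue fds for one multiqueue tap device; the resulting fds
 * could then be kept open across exec and referenced as fd=X,fd=Y. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

static int open_tap_queue(const char *ifname)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0) {
        return -1;
    }
    memset(&ifr, 0, sizeof(ifr));
    /* IFF_MULTI_QUEUE: each TUNSETIFF on the same name adds one queue */
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_MULTI_QUEUE;
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

int main(void)
{
    int fd0 = open_tap_queue("mqtap0");
    int fd1 = open_tap_queue("mqtap0");  /* second queue of the same tap */

    if (fd0 < 0 || fd1 < 0) {
        perror("tap");
        return 1;
    }
    printf("queue fds: %d %d\n", fd0, fd1);
    return 0;
}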

You can fetch and try the code from:
git://github.com/jasowang/qemu.git

Patch 1 adds a generic method of creating multiqueue taps and implements the
Linux part.
Patches 2 - 4 introduce some helpers which are used to refactor the NIC
emulation code to support multiqueue.
Patch 5 introduces multiqueue support in the QEMU networking code: each peer of
a NetClientState is abstracted as a queue. Through this, most of the code can
be reused without change.
Patch 6 adds basic multiqueue support for vhost, letting vhost handle just a
subset of all virtqueues.
Patches 7 - 8 introduce new virtio helpers which are needed by multiqueue
virtio-net.
Patches 9 - 12 implement the multiqueue support of virtio-net.

Changes from RFC v2:
- rebase the code to the latest qemu
- align the multiqueue virtio-net implementation with the virtio spec
- split the patches into more, smaller patches
- set_link and hotplug support

Changes from RFC V1:
- rebase to the latest
- fix memory leak in parse_netdev
- fix guest notifiers assignment/de-assignment
- change the command line to:
   qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2

Reference:
v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html
v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481

Perf Numbers:

Two Intel Xeon 5620 with directly connected Intel 82599EB NICs
Host/Guest kernel: David net tree
vhost enabled

- lots of improvements in both latency and CPU utilization in the request-response tests
- a regression in guest small-packet sending, because TCP tends to batch
  less when latency is improved

1q/2q/4q
TCP_RR
 size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
1 20    72162.1   2214.24 129880.22 2456.13 196949.81 2298.13
1 50    107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
64 1    9453.42   632.33  9371.37   616.13  9338.19   615.97
64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
64 50   106966    2448.29 146518.67 2514.47 242134.07 2720.91
64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
TCP_CRR
 size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
1 20    23434.5  562.11 31057.43 531.07 49488.28 564.41
1 50    28514.88 582.17 40494.23 605.92 60113.35 654.97
1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
64 1    2780.08  159.4  2201.07  127.96 2006.8   117.63
64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
64 100  28747.37 584.17 49081.87 667.87 60612.94 662
256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
256 50  28354.7  579.85 40578.31 607    60261.71 657.87
256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
TCP_STREAM guest receiving
 size #sessions throughput  norm throughput  norm throughput  norm
1 1 16.27   1.33   16.1    1.12   16.13   0.99
1 2 33.04   2.08   32.96   2.19   32.75   1.98
1 4 66.62   6.83   68.3    5.56   66.14   2.65
64 1    896.55  56.67  914.02  58.14  898.9   61.56
64 2    1830.46 91.02  1812.02 64.59  1835.57 66.26
64 4    3626.61 142.55 3636.25 100.64 3607.46 75.03
256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
16384 1 7795.