Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-03-08 Thread Krishna Kumar2
"Michael S. Tsirkin"  wrote on 03/08/2011 09:11:04 PM:

> Also, could you post your current version of the qemu code pls?
> It's useful for testing and to see the whole picture.

Sorry for the delay on this.

I am attaching the qemu changes. Some parts of the patch are
completely redundant, e.g. MAX_TUN_DEVICES, and I will remove
them later.

It works with the latest qemu and the kernel patch sent earlier.

Please let me know if there are any issues.

thanks,

- KK


(See attached file: qemu.patch)



Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-03-08 Thread Michael S. Tsirkin
On Tue, Mar 08, 2011 at 09:02:38PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin"  wrote on 03/02/2011 03:36:00 PM:
> 
> Sorry for the delayed response, I have been sick the last few days.
> I am responding to both your posts here.
> 
> > > Both virtio-net and vhost need some check to make sure very
> > > high values are not passed by userspace. Is this not required?
> >
> > Whatever we stick in the header is effectively part of the
> > host/guest interface. Are you sure we'll never want
> > more than 16 VQs? This value does not seem that high.
> 
> OK, so even constants cannot change?

Parse error :).

> Given that, should I remove all
> checks and use kcalloc?

I think that it's not a bad idea to avoid crashing if
hardware (in our case, the host) misbehaves.
But as long as the code is prepared to handle any # of vqs,
I don't see the point of limiting that arbitrarily,
and if the code to handle hardware errors is complex,
it'll have bugs itself.
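
As a rough illustration of that direction (not code from the patch; the
feature and field names follow the RFC, the error handling is only a
sketch), probe could read the count from config space and let kcalloc()
fail gracefully for bogus values instead of checking a hardcoded
VIRTIO_MAX_TXQS:

    u16 numtxqs = 1;

    /* Read the queue count the device advertises; default to one pair. */
    if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_NUMTXQS))
            vi->vdev->config->get(vi->vdev,
                                  offsetof(struct virtio_net_config, numtxqs),
                                  &numtxqs, sizeof(numtxqs));

    /* kcalloc() checks for overflow and simply fails for absurd counts,
     * so no arbitrary upper bound is needed here. */
    vi->sq = kcalloc(numtxqs, sizeof(*vi->sq), GFP_KERNEL);
    vi->rq = kcalloc(numtxqs, sizeof(*vi->rq), GFP_KERNEL);
    if (!vi->sq || !vi->rq)
            goto free;      /* hypothetical error label */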

> > > OK, so virtio_net_config has num_queue_pairs, and this gets converted to
> > > numtxqs in virtnet_info?
> >
> > Or put num_queue_pairs in virtnet_info too.
> 
> For virtnet_info, having numtxqs is easier since all code that loops needs
> only 'numtxqs'.

It seemed to me that the code used numtxqs for # of rx qs as well
sometimes.

> > > Also, vhost has some
> > > code that processes tx first before rx (e.g. vhost_net_stop/flush),
> >
> > No idea why I did it this way. I don't think it matters.
> >
> > > so this approach seemed helpful.
> > > I am OK either way, what do you
> > > suggest?
> >
> > We get less code generated but also less flexibility.
> > I am not sure, I'll play around with code, for now
> > let's keep it as is.
> 
> OK.
> 
> > > Yes, it is a waste to have these vectors for tx ints. I initially
> > > thought of adding a flag to virtio_device to pass to vp_find_vqs,
> > > but it won't work, so a new API is needed. I can work with you on
> > > this in the background if you like.
> >
> > OK. For starters, how about we change find_vqs to get a structure?  Then
> > we can easily add flags that tell us that some interrupts are rare.
> 
> Yes. OK to work on this outside this patch series, I guess?

We could work on this in parallel. Changing APIs is not a problem,
I'm a bit concerned that this might affect the host/guest ABI as well.
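
For the record, a purely hypothetical sketch of what a structure-based
find_vqs() could look like (none of these names exist in virtio as of this
thread; the rare-interrupt flag is the kind of hint being discussed):

    /* Per-virtqueue description passed to a reworked find_vqs(). */
    struct virtqueue_info {
            vq_callback_t *callback;
            const char *name;
            bool rare_interrupt;    /* hint: OK to share one MSI-X vector */
    };

    /* Possible replacement for
     * ->find_vqs(vdev, nvqs, vqs, callbacks, names): */
    int (*find_vqs)(struct virtio_device *vdev, unsigned nvqs,
                    struct virtqueue *vqs[],
                    struct virtqueue_info info[]);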

> > > vq's are matched between qemu, virtio-net and vhost. Isn't some check
> > > required that userspace has not passed a bad value?
> >
> >
> > For virtio, I'm not too concerned: qemu can already easily
> > crash the guest :)
> > For vhost yes, but I'm concerned that even with 16 VQs we are
> > drinking a lot of resources already. I would be happier
> > if we had a file descriptor per VQ pair in some way.
> > Then the amount of memory userspace can use up is
> > limited by the # of file descriptors.
> 
> I will start working on this approach this week and see how it goes.

Also, could you post your current version of the qemu code pls?
It's useful for testing and to see the whole picture.

> > > OK, so define free_unused_bufs() as:
> > >
> > > static void free_unused_bufs(struct virtnet_info *vi, struct virtqueue *svq,
> > >                              struct virtqueue *rvq)
> > > {
> > >         /* Use svq and rvq with the remaining code unchanged */
> > > }
> >
> > Not sure I understand. I am just suggesting
> > adding symmetrical functions like init/cleanup,
> > alloc/free etc., instead of adding stuff in random
> > functions that just happen to be called at the right time.
> 
> OK, I will clean up this part in the next revision.
> 
> > > I was not sure what is the best way - a sysctl parameter? Or should the
> > > maximum depend on number of host cpus? But that results in too many
> > > threads, e.g. if I have 16 cpus and 16 txqs.
> >
> > I guess the question is, wouldn't # of threads == # of vqs work best?
> > If we process stuff on a single CPU, let's make it pass through
> > a single VQ.
> > And to do this, we could simply open multiple vhost fds without
> > changing vhost at all.
> >
> > Would this work well?
> >
> > > > > -   enum vhost_net_poll_state tx_poll_state;
> > > > > +   enum vhost_net_poll_state *tx_poll_state;
> > > >
> > > > another array?
> > >
> > > Yes... I am also allocating twice the space that is required
> > > to keep its usage simple.
> >
> > Where's the allocation? Couldn't find it.
> 
> vhost_setup_vqs(net.c) allocates it based on nvqs, though numtxqs is
> enough.
> 
> Thanks,
> 
> - KK


Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-03-08 Thread Krishna Kumar2
"Michael S. Tsirkin"  wrote on 03/02/2011 03:36:00 PM:

Sorry for the delayed response, I have been sick the last few days.
I am responding to both your posts here.

> > Both virtio-net and vhost need some check to make sure very
> > high values are not passed by userspace. Is this not required?
>
> Whatever we stick in the header is effectively part of the
> host/guest interface. Are you sure we'll never want
> more than 16 VQs? This value does not seem that high.

OK, so even constants cannot change?  Given that, should I remove all
checks and use kcalloc?

> > OK, so virtio_net_config has num_queue_pairs, and this gets converted to
> > numtxqs in virtnet_info?
>
> Or put num_queue_pairs in virtnet_info too.

For virtnet_info, having numtxqs is easier since all code that loops needs
only 'numtxqs'.

> > Also, vhost has some
> > code that processes tx first before rx (e.g. vhost_net_stop/flush),
>
> No idea why I did it this way. I don't think it matters.
>
> > so this approach seemed helpful.
> > I am OK either way, what do you
> > suggest?
>
> We get less code generated but also less flexibility.
> I am not sure, I'll play around with code, for now
> let's keep it as is.

OK.

> > Yes, it is a waste to have these vectors for tx ints. I initially
> > thought of adding a flag to virtio_device to pass to vp_find_vqs,
> > but it won't work, so a new API is needed. I can work with you on
> > this in the background if you like.
>
> OK. For starters, how about we change find_vqs to get a structure?  Then
> we can easily add flags that tell us that some interrupts are rare.

Yes. OK to work on this outside this patch series, I guess?

> > vq's are matched between qemu, virtio-net and vhost. Isn't some check
> > required that userspace has not passed a bad value?
>
>
> For virtio, I'm not too concerned: qemu can already easily
> crash the guest :)
> For vhost yes, but I'm concerned that even with 16 VQs we are
> drinking a lot of resources already. I would be happier
> if we had a file descriptor per VQ pair in some way.
> Then the amount of memory userspace can use up is
> limited by the # of file descriptors.

I will start working on this approach this week and see how it goes.
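
For reference, a hedged userspace sketch of the "one vhost fd per VQ pair"
idea (the ioctls shown exist in linux/vhost.h, but the setup is heavily
abridged and the helper name is made up):

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    /* Open and minimally initialize one vhost-net instance for one rx/tx
     * pair; each fd gets its own worker thread and memory accounting. */
    static int open_vhost_pair(int tap_fd)
    {
            struct vhost_vring_file backend = { .index = 0, .fd = tap_fd };
            int vhost_fd = open("/dev/vhost-net", O_RDWR);

            if (vhost_fd < 0)
                    return -1;
            if (ioctl(vhost_fd, VHOST_SET_OWNER, NULL) < 0)
                    goto err;
            /* ... VHOST_SET_MEM_TABLE and VHOST_SET_VRING_* calls for the
             * rx and tx rings of this pair would go here ... */
            if (ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend) < 0)
                    goto err;       /* repeat with .index = 1 for tx */
            return vhost_fd;
    err:
            close(vhost_fd);
            return -1;
    }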

> > OK, so define free_unused_bufs() as:
> >
> > static void free_unused_bufs(struct virtnet_info *vi, struct virtqueue *svq,
> >                              struct virtqueue *rvq)
> > {
> >         /* Use svq and rvq with the remaining code unchanged */
> > }
>
> Not sure I understand. I am just suggesting
> adding symmetrical functions like init/cleanup,
> alloc/free etc., instead of adding stuff in random
> functions that just happen to be called at the right time.

OK, I will clean up this part in the next revision.

> > I was not sure what is the best way - a sysctl parameter? Or should the
> > maximum depend on number of host cpus? But that results in too many
> > threads, e.g. if I have 16 cpus and 16 txqs.
>
> I guess the question is, wouldn't # of threads == # of vqs work best?
> If we process stuff on a single CPU, let's make it pass through
> a single VQ.
> And to do this, we could simply open multiple vhost fds without
> changing vhost at all.
>
> Would this work well?
>
> > > > -   enum vhost_net_poll_state tx_poll_state;
> > > > +   enum vhost_net_poll_state *tx_poll_state;
> > >
> > > another array?
> >
> > Yes... I am also allocating twice the space that is required
> > to keep its usage simple.
>
> Where's the allocation? Couldn't find it.

vhost_setup_vqs(net.c) allocates it based on nvqs, though numtxqs is
enough.

Thanks,

- KK



Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-03-02 Thread Michael S. Tsirkin
On Tue, Mar 01, 2011 at 09:32:56PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin"  wrote on 02/28/2011 03:13:20 PM:
> 
> Thank you once again for your feedback on both these patches.
> I will send the qemu patch tomorrow. I will also send the next
> version incorporating these suggestions once we finalize some
> minor points.
> 
> > Overall looks good.
> > The numtxqs meaning the number of rx queues needs some cleanup.
> > init/cleanup routines need more symmetry.
> > Error handling on setup also seems slightly buggy or at least
> > asymmetrical.
> > Finally, this will use up a large number of MSI vectors,
> > while TX interrupts mostly stay unused.
> >
> > Some comments below.
> >
> > > +/* Maximum number of individual RX/TX queues supported */
> > > +#define VIRTIO_MAX_TXQS   16
> > > +
> >
> > This also does not seem to belong in the header.
> 
> Both virtio-net and vhost need some check to make sure very
> high values are not passed by userspace. Is this not required?

Whatever we stick in the header is effectively part of the
host/guest interface. Are you sure we'll never want
more than 16 VQs? This value does not seem that high.

> > > +#define VIRTIO_NET_F_NUMTXQS  21  /* Device supports multiple TX queue */
> >
> > VIRTIO_NET_F_MULTIQUEUE ?
> 
> Yes, that's a better name.
> 
> > > @@ -34,6 +38,8 @@ struct virtio_net_config {
> > >__u8 mac[6];
> > >/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
> > >__u16 status;
> > > +  /* number of RX/TX queues */
> > > +  __u16 numtxqs;
> >
> > The interface here is a bit ugly:
> > - this is really both # of tx and rx queues but called numtxqs
> > - there's a hardcoded max value
> > - 0 is assumed to be same as 1
> > - assumptions above are undocumented.
> >
> > One way to address this could be num_queue_pairs, and something like
> >  /* The actual number of TX and RX queues is num_queue_pairs + 1 each. */
> >  __u16 num_queue_pairs;
> > (and tweak code to match).
> >
> > Alternatively, have separate registers for the number of tx and rx queues.
> 
> OK, so virtio_net_config has num_queue_pairs, and this gets converted to
> numtxqs in virtnet_info?

Or put num_queue_pairs in virtnet_info too.

> > > +struct virtnet_info {
> > > +  struct send_queue **sq;
> > > +  struct receive_queue **rq;
> > > +
> > > +  /* read-mostly variables */
> > > +  int numtxqs ____cacheline_aligned_in_smp;
> >
> > Why do you think this alignment is a win?
> 
> Actually this code was from the earlier patchset (MQ TX only) where
> the layout was different. Now rq and sq are allocated as follows:
>   vi->sq = kzalloc(numtxqs * sizeof(*vi->sq), GFP_KERNEL);
>   for (i = 0; i < numtxqs; i++) {
>   vi->sq[i] = kzalloc(sizeof(*vi->sq[i]), GFP_KERNEL);
> Since the two pointers become read-only during use, there is no cache
> line dirtying.  I will remove this directive.
> 
> > > +/*
> > > + * Note for 'qnum' below:
> > > + * first 'numtxqs' vqs are RX, next 'numtxqs' vqs are TX.
> > > + */
> >
> > Another option to consider is to have them RX,TX,RX,TX:
> > this way vq->queue_index / 2 gives you the
> > queue pair number, no need to read numtxqs. On the other hand, it makes
> > the #RX==#TX assumption even more entrenched.
> 
> OK. I was following how many drivers were allocating RX and TX's
> together - eg ixgbe_adapter has tx_ring and rx_ring arrays; bnx2
> has rx_buf_ring and tx_buf_ring arrays, etc.

That's fine. I am only talking about the VQ numbers.

> Also, vhost has some
> code that processes tx first before rx (e.g. vhost_net_stop/flush),

No idea why I did it this way. I don't think it matters.

> so this approach seemed helpful.
> I am OK either way, what do you
> suggest?

We get less code generated but also less flexibility.
I am not sure, I'll play around with code, for now
let's keep it as is.


> > > +  err = vi->vdev->config->find_vqs(vi->vdev, totalvqs, vqs, callbacks,
> > > +                                   (const char **)names);
> > > +  if (err)
> > > +  goto free_params;
> > > +
> >
> > This would use up quite a lot of vectors. However,
> > tx interrupt is, in fact, slow path. So, assuming we don't have
> > enough vectors to use per vq, I think it's a good idea to
> > support reducing MSI vector usage by mapping all TX VQs to the same vector
> > and separate vectors for RX.
> > The hypervisor actually allows this, but we don't have an API at the virtio
> > level to pass that info to virtio pci ATM.
> > Any idea what a good API to use would be?
> 
> Yes, it is a waste to have these vectors for tx ints. I initially
> thought of adding a flag to virtio_device to pass to vp_find_vqs,
> but it won't work, so a new API is needed. I can work with you on
> this in the background if you like.

Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-03-01 Thread Krishna Kumar2
"Michael S. Tsirkin"  wrote on 02/28/2011 03:13:20 PM:

Thank you once again for your feedback on both these patches.
I will send the qemu patch tomorrow. I will also send the next
version incorporating these suggestions once we finalize some
minor points.

> Overall looks good.
> The numtxqs meaning the number of rx queues needs some cleanup.
> init/cleanup routines need more symmetry.
> Error handling on setup also seems slightly buggy or at least
> asymmetrical.
> Finally, this will use up a large number of MSI vectors,
> while TX interrupts mostly stay unused.
>
> Some comments below.
>
> > +/* Maximum number of individual RX/TX queues supported */
> > +#define VIRTIO_MAX_TXQS 16
> > +
>
> This also does not seem to belong in the header.

Both virtio-net and vhost need some check to make sure very
high values are not passed by userspace. Is this not required?

> > +#define VIRTIO_NET_F_NUMTXQS  21  /* Device supports multiple TX queue */
>
> VIRTIO_NET_F_MULTIQUEUE ?

Yes, that's a better name.

> > @@ -34,6 +38,8 @@ struct virtio_net_config {
> >  __u8 mac[6];
> >  /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
> >  __u16 status;
> > +/* number of RX/TX queues */
> > +__u16 numtxqs;
>
> The interface here is a bit ugly:
> - this is really both # of tx and rx queues but called numtxqs
> - there's a hardcoded max value
> - 0 is assumed to be same as 1
> - assumptions above are undocumented.
>
> One way to address this could be num_queue_pairs, and something like
>    /* The actual number of TX and RX queues is num_queue_pairs + 1 each. */
>    __u16 num_queue_pairs;
> (and tweak code to match).
>
> Alternatively, have separate registers for the number of tx and rx queues.

OK, so virtio_net_config has num_queue_pairs, and this gets converted to
numtxqs in virtnet_info?

> > +struct virtnet_info {
> > +struct send_queue **sq;
> > +struct receive_queue **rq;
> > +
> > +/* read-mostly variables */
> > +int numtxqs ____cacheline_aligned_in_smp;
>
> Why do you think this alignment is a win?

Actually this code was from the earlier patchset (MQ TX only) where
the layout was different. Now rq and sq are allocated as follows:
vi->sq = kzalloc(numtxqs * sizeof(*vi->sq), GFP_KERNEL);
for (i = 0; i < numtxqs; i++) {
vi->sq[i] = kzalloc(sizeof(*vi->sq[i]), GFP_KERNEL);
Since the two pointers become read-only during use, there is no cache
line dirtying.  I will remove this directive.

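Filling in the elided part of that snippet, a sketch of how the full
allocation could look (the error unwinding is illustrative, not taken from
the patch):

    int i;

    vi->sq = kzalloc(numtxqs * sizeof(*vi->sq), GFP_KERNEL);
    vi->rq = kzalloc(numtxqs * sizeof(*vi->rq), GFP_KERNEL);
    if (!vi->sq || !vi->rq)
            goto err;

    /* One send_queue and one receive_queue struct per queue pair. */
    for (i = 0; i < numtxqs; i++) {
            vi->sq[i] = kzalloc(sizeof(*vi->sq[i]), GFP_KERNEL);
            vi->rq[i] = kzalloc(sizeof(*vi->rq[i]), GFP_KERNEL);
            if (!vi->sq[i] || !vi->rq[i])
                    goto err;       /* free everything allocated so far */
    }
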
> > +/*
> > + * Note for 'qnum' below:
> > + *  first 'numtxqs' vqs are RX, next 'numtxqs' vqs are TX.
> > + */
>
> Another option to consider is to have them RX,TX,RX,TX:
> this way vq->queue_index / 2 gives you the
> queue pair number, no need to read numtxqs. On the other hand, it makes
> the #RX==#TX assumption even more entrenched.

OK. I was following how many drivers were allocating RX and TX's
together - eg ixgbe_adapter has tx_ring and rx_ring arrays; bnx2
has rx_buf_ring and tx_buf_ring arrays, etc. Also, vhost has some
code that processes tx first before rx (e.g. vhost_net_stop/flush),
so this approach seemed helpful. I am OK either way, what do you
suggest?

> > +err = vi->vdev->config->find_vqs(vi->vdev, totalvqs, vqs, callbacks,
> > +                                 (const char **)names);
> > +if (err)
> > +goto free_params;
> > +
>
> This would use up quite a lot of vectors. However,
> tx interrupt is, in fact, slow path. So, assuming we don't have
> enough vectors to use per vq, I think it's a good idea to
> support reducing MSI vector usage by mapping all TX VQs to the same vector
> and separate vectors for RX.
> The hypervisor actually allows this, but we don't have an API at the virtio
> level to pass that info to virtio pci ATM.
> Any idea what a good API to use would be?

Yes, it is a waste to have these vectors for tx ints. I initially
thought of adding a flag to virtio_device to pass to vp_find_vqs,
but it won't work, so a new API is needed. I can work with you on
this in the background if you like.

> > +for (i = 0; i < numtxqs; i++) {
> > +vi->rq[i]->rvq = vqs[i];
> > +vi->sq[i]->svq = vqs[i + numtxqs];
>
> This logic is spread all over. We need some kind of macro to
> get the queue number from the vq number and back.

Will add this.

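A possible shape for such helpers, assuming the current layout (first
'numtxqs' vqs are RX, the next 'numtxqs' are TX); the names are made up for
illustration, not from the patch:

    #define rx_vq_index(qnum)            (qnum)
    #define tx_vq_index(vi, qnum)        ((vi)->numtxqs + (qnum))
    /* For rx/tx vqs only; the control vq (index 2 * numtxqs) is separate. */
    #define vq_index_to_qnum(vi, vq_idx) ((vq_idx) % (vi)->numtxqs)

The assignment loop above would then read, e.g.,
vi->rq[i]->rvq = vqs[rx_vq_index(i)] and vi->sq[i]->svq = vqs[tx_vq_index(vi, i)].
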
> > +if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
> > +vi->cvq = vqs[i + numtxqs];
> > +
> > +if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
> > +vi->dev->features |= NETIF_F_HW_VLAN_FILTER;
>
> This bit does not seem to belong in initialize_vqs.

I will move this out of initialize_vqs.

Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-02-28 Thread Michael S. Tsirkin
On Mon, Feb 28, 2011 at 12:04:37PM +0530, Krishna Kumar wrote:
> Implement mq virtio-net driver. 
> 
> Though struct virtio_net_config changes, it works with the old
> qemu since the last element is not accessed unless qemu sets
> VIRTIO_NET_F_NUMTXQS.  Patch also adds a macro for the maximum
> number of TX vq's (VIRTIO_MAX_TXQS) that the user can specify.
> 
> Signed-off-by: Krishna Kumar 

Overall looks good.
The numtxqs meaning the number of rx queues needs some cleanup.
init/cleanup routines need more symmetry.
Error handling on setup also seems slightly buggy or at least asymmetrical.
Finally, this will use up a large number of MSI vectors,
while TX interrupts mostly stay unused.

Some comments below.

> ---
>  drivers/net/virtio_net.c   |  543 ---
>  include/linux/virtio_net.h |6 
>  2 files changed, 386 insertions(+), 163 deletions(-)
> 
> diff -ruNp org/include/linux/virtio_net.h new/include/linux/virtio_net.h
> --- org/include/linux/virtio_net.h2010-10-11 10:20:22.0 +0530
> +++ new/include/linux/virtio_net.h2011-02-25 16:24:15.0 +0530
> @@ -7,6 +7,9 @@
>  #include 
>  #include 
>  
> +/* Maximum number of individual RX/TX queues supported */
> +#define VIRTIO_MAX_TXQS  16
> +

This also does not seem to belong in the header.

>  /* The feature bitmap for virtio net */
>  #define VIRTIO_NET_F_CSUM 0   /* Host handles pkts w/ partial csum */
>  #define VIRTIO_NET_F_GUEST_CSUM  1   /* Guest handles pkts w/ partial csum */
> @@ -26,6 +29,7 @@
>  #define VIRTIO_NET_F_CTRL_RX 18  /* Control channel RX mode support */
>  #define VIRTIO_NET_F_CTRL_VLAN   19  /* Control channel VLAN filtering */
>  #define VIRTIO_NET_F_CTRL_RX_EXTRA 20  /* Extra RX mode control support */
> +#define VIRTIO_NET_F_NUMTXQS 21  /* Device supports multiple TX queue */

VIRTIO_NET_F_MULTIQUEUE ?

>  
>  #define VIRTIO_NET_S_LINK_UP 1   /* Link is up */
>  
> @@ -34,6 +38,8 @@ struct virtio_net_config {
>   __u8 mac[6];
>   /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
>   __u16 status;
> + /* number of RX/TX queues */
> + __u16 numtxqs;


The interface here is a bit ugly:
- this is really both # of tx and rx queues but called numtxqs
- there's a hardcoded max value
- 0 is assumed to be same as 1
- assumptions above are undocumented.

One way to address this could be num_queue_pairs, and something like
/* The actual number of TX and RX queues is num_queue_pairs + 1 each. */
__u16 num_queue_pairs;
(and tweak code to match).

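As a sketch only, the config layout under that num_queue_pairs suggestion
might look like the following (field comments are illustrative):

    struct virtio_net_config {
            __u8  mac[6];
            /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
            __u16 status;
            /* The actual number of TX and RX queues is num_queue_pairs + 1 each. */
            __u16 num_queue_pairs;
    } __attribute__((packed));
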
Alternatively, have separate registers for the number of tx and rx queues.

>  } __attribute__((packed));
>  
>  /* This is the first element of the scatter-gather list.  If you don't
> diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
> --- org/drivers/net/virtio_net.c  2011-02-21 17:55:42.0 +0530
> +++ new/drivers/net/virtio_net.c  2011-02-25 16:23:41.0 +0530
> @@ -40,31 +40,53 @@ module_param(gso, bool, 0444);
>  
>  #define VIRTNET_SEND_COMMAND_SG_MAX 2
>  
> -struct virtnet_info {
> - struct virtio_device *vdev;
> - struct virtqueue *rvq, *svq, *cvq;
> - struct net_device *dev;
> +/* Internal representation of a send virtqueue */
> +struct send_queue {
> + struct virtqueue *svq;
> +
> + /* TX: fragments + linear part + virtio header */
> + struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
> +};
> +
> +/* Internal representation of a receive virtqueue */
> +struct receive_queue {
> + /* Virtqueue associated with this receive_queue */
> + struct virtqueue *rvq;
> +
> + /* Back pointer to the virtnet_info */
> + struct virtnet_info *vi;
> +
>   struct napi_struct napi;
> - unsigned int status;
>  
>   /* Number of input buffers, and max we've ever had. */
>   unsigned int num, max;
>  
> - /* I like... big packets and I cannot lie! */
> - bool big_packets;
> -
> - /* Host will merge rx buffers for big packets (shake it! shake it!) */
> - bool mergeable_rx_bufs;
> -
>   /* Work struct for refilling if we run low on memory. */
>   struct delayed_work refill;
>  
>   /* Chain pages by the private ptr. */
>   struct page *pages;
>  
> - /* fragments + linear part + virtio header */
> + /* RX: fragments + linear part + virtio header */
>   struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
> - struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
> +};
> +
> +struct virtnet_info {
> + struct send_queue **sq;
> + struct receive_queue **rq;
> +
> + /* read-mostly variables */
> + int numtxqs ____cacheline_aligned_in_smp;   /* # of rxqs/txqs */

Why do you think this alignment is a win?

> + struct virtio_device *vdev;
> + struct virtqueue *cvq;
> + struct net_device *dev;
> + unsigned int status;
> +
> + /* I like... big packets and I cannot lie! */
> + bool big_packets;
> +
> +