Re: netfront for review
Gerd Hoffmann wrote: > Drawback is that the guest kernel wouldn't work with older xen > versions (dom0 netback driver to be exact) any more. Probably > wouldn't be a showstopper though, given that xen 3.0.3 probably is > almost one year out by the time 2.6.22 will be released ... I don't think we've decided how backwards-compatible the pv_ops guest should be. I've been working on the basis "as much as possible", so long as it isn't a large burden. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
Christoph Hellwig wrote: On Thu, May 03, 2007 at 09:33:43AM +0200, Gerd Hoffmann wrote: Guess so. It defaults to flip. I simplified the rx_copy/flip module parameter to a simple rx_mode=0/1, but this is preserved from the original. My guess is that originally there was only flip, and copy was added later. Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and made the default) later. So can we please just rip out the obsolete code completely? There's no point adding this code anymore. Drawback is that the guest kernel wouldn't work with older xen versions (dom0 netback driver to be exact) any more. Probably wouldn't be a showstopper though, given that xen 3.0.3 probably is almost one year out by the time 2.6.22 will be released ... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
On Thu, May 03, 2007 at 09:33:43AM +0200, Gerd Hoffmann wrote: > >Guess so. It defaults to flip. I simplified the rx_copy/flip module > >parameter to a simple rx_mode=0/1, but this is preserved from the > >original. My guess is that originally there was only flip, and copy was > >added later. > > Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and > made the default) later. So can we please just rip out the obsolete code completely? There's no point adding this code anymore. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
Jeremy Fitzhardinge wrote: Gerd Hoffmann wrote: Gerd, in change 11196:b85da7cd9ea5 "front: Fix rx buffer leak when tearing down an interface." you added a call to "add_id_to_freelist(np->rx_skbs, id);". However, rx_skbs doesn't have an extra entry for the list head, and there's never any corresponding get_id_from_freelist(np->rx_skbs). What should it be? The function has an effect in page flipping mode only. It walks the whole list of rx skbufs (id is the loop variable ...), checks whenever they are handed out to the frontend driver to fill in packet data and not returned yet, and if so reclaim them ... Yes, but why use add_id_to_freelist? rx_skbs are not being used on a freelist anywhere else. It just means the rx_skb array gets filled with small integers, but the rest of the code assumes they're either NULL or an skb pointer. Hmm, good point. Have to look at the code again, it has been some time I've written that, and it took me some time to figure how all the grant table stuff works ... Maybe the add_id_to_freelist() call can simply be dropped. The whole interface is released shortly thereafter, probably thats why filling the freelist with yunk never caused any problems ... cheers, Gerd - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
On 3/5/07 15:27, "Jeremy Fitzhardinge" <[EMAIL PROTECTED]> wrote: >> The function has an effect in page flipping mode only. It walks the >> whole list of rx skbufs (id is the loop variable ...), checks whenever >> they are handed out to the frontend driver to fill in packet data and >> not returned yet, and if so reclaim them ... > > Yes, but why use add_id_to_freelist? rx_skbs are not being used on a > freelist anywhere else. It just means the rx_skb array gets filled with > small integers, but the rest of the code assumes they're either NULL or > an skb pointer. The need for it went away when Herbert Xu made the mapping between receive-ring slots and receive-request/response identifiers static. I think there was a race between Gerd writing his patch, Herbert removing the need for add_id_to_freelist, and Gerd's patch being checked in. -- Keir - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
Gerd Hoffmann wrote: >> Gerd, in change 11196:b85da7cd9ea5 "front: Fix rx buffer leak when >> tearing down an interface." you added a call to >> "add_id_to_freelist(np->rx_skbs, id);". However, rx_skbs doesn't have >> an extra entry for the list head, and there's never any corresponding >> get_id_from_freelist(np->rx_skbs). What should it be? > > The function has an effect in page flipping mode only. It walks the > whole list of rx skbufs (id is the loop variable ...), checks whenever > they are handed out to the frontend driver to fill in packet data and > not returned yet, and if so reclaim them ... Yes, but why use add_id_to_freelist? rx_skbs are not being used on a freelist anywhere else. It just means the rx_skb array gets filled with small integers, but the rest of the code assumes they're either NULL or an skb pointer. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
Hi, Gerd, in change 11196:b85da7cd9ea5 "front: Fix rx buffer leak when tearing down an interface." you added a call to "add_id_to_freelist(np->rx_skbs, id);". However, rx_skbs doesn't have an extra entry for the list head, and there's never any corresponding get_id_from_freelist(np->rx_skbs). What should it be? The function has an effect in page flipping mode only. It walks the whole list of rx skbufs (id is the loop variable ...), checks whenever they are handed out to the frontend driver to fill in packet data and not returned yet, and if so reclaim them ... + err = xenbus_scanf(XBT_NIL, np->xbdev->otherend, + "feature-rx-flip", "%u", _rx_flip); + if (err != 1) + feature_rx_flip = 1; This second one deserves a comment. If it doesn't set feature_rx_flip it's *on*? Historical reasons Guess so. It defaults to flip. I simplified the rx_copy/flip module parameter to a simple rx_mode=0/1, but this is preserved from the original. My guess is that originally there was only flip, and copy was added later. Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and made the default) later. cheers, Gerd - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
Hi, Gerd, in change 11196:b85da7cd9ea5 front: Fix rx buffer leak when tearing down an interface. you added a call to add_id_to_freelist(np-rx_skbs, id);. However, rx_skbs doesn't have an extra entry for the list head, and there's never any corresponding get_id_from_freelist(np-rx_skbs). What should it be? The function has an effect in page flipping mode only. It walks the whole list of rx skbufs (id is the loop variable ...), checks whenever they are handed out to the frontend driver to fill in packet data and not returned yet, and if so reclaim them ... + err = xenbus_scanf(XBT_NIL, np-xbdev-otherend, + feature-rx-flip, %u, feature_rx_flip); + if (err != 1) + feature_rx_flip = 1; This second one deserves a comment. If it doesn't set feature_rx_flip it's *on*? Historical reasons Guess so. It defaults to flip. I simplified the rx_copy/flip module parameter to a simple rx_mode=0/1, but this is preserved from the original. My guess is that originally there was only flip, and copy was added later. Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and made the default) later. cheers, Gerd - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
Gerd Hoffmann wrote: Gerd, in change 11196:b85da7cd9ea5 front: Fix rx buffer leak when tearing down an interface. you added a call to add_id_to_freelist(np-rx_skbs, id);. However, rx_skbs doesn't have an extra entry for the list head, and there's never any corresponding get_id_from_freelist(np-rx_skbs). What should it be? The function has an effect in page flipping mode only. It walks the whole list of rx skbufs (id is the loop variable ...), checks whenever they are handed out to the frontend driver to fill in packet data and not returned yet, and if so reclaim them ... Yes, but why use add_id_to_freelist? rx_skbs are not being used on a freelist anywhere else. It just means the rx_skb array gets filled with small integers, but the rest of the code assumes they're either NULL or an skb pointer. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
On 3/5/07 15:27, Jeremy Fitzhardinge [EMAIL PROTECTED] wrote: The function has an effect in page flipping mode only. It walks the whole list of rx skbufs (id is the loop variable ...), checks whenever they are handed out to the frontend driver to fill in packet data and not returned yet, and if so reclaim them ... Yes, but why use add_id_to_freelist? rx_skbs are not being used on a freelist anywhere else. It just means the rx_skb array gets filled with small integers, but the rest of the code assumes they're either NULL or an skb pointer. The need for it went away when Herbert Xu made the mapping between receive-ring slots and receive-request/response identifiers static. I think there was a race between Gerd writing his patch, Herbert removing the need for add_id_to_freelist, and Gerd's patch being checked in. -- Keir - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
Jeremy Fitzhardinge wrote: Gerd Hoffmann wrote: Gerd, in change 11196:b85da7cd9ea5 front: Fix rx buffer leak when tearing down an interface. you added a call to add_id_to_freelist(np-rx_skbs, id);. However, rx_skbs doesn't have an extra entry for the list head, and there's never any corresponding get_id_from_freelist(np-rx_skbs). What should it be? The function has an effect in page flipping mode only. It walks the whole list of rx skbufs (id is the loop variable ...), checks whenever they are handed out to the frontend driver to fill in packet data and not returned yet, and if so reclaim them ... Yes, but why use add_id_to_freelist? rx_skbs are not being used on a freelist anywhere else. It just means the rx_skb array gets filled with small integers, but the rest of the code assumes they're either NULL or an skb pointer. Hmm, good point. Have to look at the code again, it has been some time I've written that, and it took me some time to figure how all the grant table stuff works ... Maybe the add_id_to_freelist() call can simply be dropped. The whole interface is released shortly thereafter, probably thats why filling the freelist with yunk never caused any problems ... cheers, Gerd - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
On Thu, May 03, 2007 at 09:33:43AM +0200, Gerd Hoffmann wrote: Guess so. It defaults to flip. I simplified the rx_copy/flip module parameter to a simple rx_mode=0/1, but this is preserved from the original. My guess is that originally there was only flip, and copy was added later. Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and made the default) later. So can we please just rip out the obsolete code completely? There's no point adding this code anymore. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
Christoph Hellwig wrote: On Thu, May 03, 2007 at 09:33:43AM +0200, Gerd Hoffmann wrote: Guess so. It defaults to flip. I simplified the rx_copy/flip module parameter to a simple rx_mode=0/1, but this is preserved from the original. My guess is that originally there was only flip, and copy was added later. Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and made the default) later. So can we please just rip out the obsolete code completely? There's no point adding this code anymore. Drawback is that the guest kernel wouldn't work with older xen versions (dom0 netback driver to be exact) any more. Probably wouldn't be a showstopper though, given that xen 3.0.3 probably is almost one year out by the time 2.6.22 will be released ... - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
Gerd Hoffmann wrote: Drawback is that the guest kernel wouldn't work with older xen versions (dom0 netback driver to be exact) any more. Probably wouldn't be a showstopper though, given that xen 3.0.3 probably is almost one year out by the time 2.6.22 will be released ... I don't think we've decided how backwards-compatible the pv_ops guest should be. I've been working on the basis as much as possible, so long as it isn't a large burden. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
(Gerd, Herbert: there's some questions directed to you down there.) Rusty Russell wrote: >> /* >> * {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs >> * is an index into a chain of free entries. >> */ >> struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1]; >> > > This screams "union" to me, since you're stuffing ints into pointers. > This was a useful cleanup, and I think it revealed a bug. Gerd, in change 11196:b85da7cd9ea5 "front: Fix rx buffer leak when tearing down an interface." you added a call to "add_id_to_freelist(np->rx_skbs, id);". However, rx_skbs doesn't have an extra entry for the list head, and there's never any corresponding get_id_from_freelist(np->rx_skbs). What should it be? >> grant_ref_t gref_tx_head; >> grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1]; >> grant_ref_t gref_rx_head; >> grant_ref_t grant_rx_ref[NET_RX_RING_SIZE]; >> > > There seems to be a correspondence between tx_skbs[], gref_tx_head and > grant_tx_ref[] - perhaps group them together with a nice comment? > Similarly the rx side. > Yep, rearranged. And I added an explicit separate freelist head for tx_skbs, so it matches the grant side (which doesn't need the +1 as a result). >> +/* >> + * Implement our own carrier flag: the network stack's version causes delays >> + * when the carrier is re-enabled (in particular, dev_activate() may not >> + * immediately be called, which can cause packet loss). >> + */ >> +#define netfront_carrier_on(netif) ((netif)->carrier = 1) >> +#define netfront_carrier_off(netif) ((netif)->carrier = 0) >> +#define netfront_carrier_ok(netif) ((netif)->carrier) >> > > Well, you only call netfront_carrier_on() from one place, so it's pretty > easy to do "netif_carrier_on(); dev_activate();" there. > > I don't think this is critical though. > Are you saying that we wouldn't need to have a private carrier flag if we do the "netif_carrier_on(); dev_activate()" sequence instead? >> +static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, >> + struct netif_tx_request *tx) >> > > This needs a big comment. AFAICT, entries in the ring cannot cross page > boundaries. And gso means that you have to put the first (partial) page > of the packet in the ring first, with the NETTXF_extra_info flag, then > the GSO stuff, then the rest of the packet. This results in this > strange xennet_make_frags which does everything after the first packet > page (which may be only *part* of the skb head). > > This also complicates xennet_get_responses(), where the loop is awkward > because it has to get the first bit, then do the loop. > > >> skb_queue_head_init(); >> skb_queue_head_init(); >> skb_queue_head_init(); >> > > I'd love a comment explaining exactly what these three queues are used > for. It seems that rxq is the queue of received skbs (the result), tmpq > is parts of the current skb for multi-fragment skbs, and errq is skbs to > be discarded, which are kept around during the function because ? (we > may need to unmap the pages?) > Um, Herbert? >> +txs = (struct netif_tx_sring *)get_zeroed_page(GFP_KERNEL); >> +if (!txs) { >> +err = -ENOMEM; >> +xenbus_dev_fatal(dev, err, "allocating tx ring page"); >> +goto fail; >> +} >> +SHARED_RING_INIT(txs); >> +FRONT_RING_INIT(>tx, txs, PAGE_SIZE); >> + >> +err = xenbus_grant_ring(dev, virt_to_mfn(txs)); >> +if (err < 0) { >> +free_page((unsigned long)txs); >> +goto fail; >> +} >> + >> +info->tx_ring_ref = err; >> +rxs = (struct netif_rx_sring *)get_zeroed_page(GFP_KERNEL); >> +if (!rxs) { >> +err = -ENOMEM; >> +xenbus_dev_fatal(dev, err, "allocating rx ring page"); >> +goto fail; >> > > Why doesn't this (and the following) need to free txs? Higher level > magic? In general this function seems to lack cleanup. > Yes, I need to look at this more closely. It does seem that txs and rxs will get leaked. >> +err = xenbus_scanf(XBT_NIL, np->xbdev->otherend, >> + "feature-rx-copy", "%u", _rx_copy); >> +if (err != 1) >> +feature_rx_copy = 0; >> +err = xenbus_scanf(XBT_NIL, np->xbdev->otherend, >> + "feature-rx-flip", "%u", _rx_flip); >> +if (err != 1) >> +feature_rx_flip = 1; >> > > This second one deserves a comment. If it doesn't set feature_rx_flip > it's *on*? Historical reasons Guess so. It defaults to flip. I simplified the rx_copy/flip module parameter to a simple rx_mode=0/1, but this is preserved from the original. My guess is that originally there was only flip, and copy was added later. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at
Re: netfront for review
(Gerd, Herbert: there's some questions directed to you down there.) Rusty Russell wrote: /* * {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs * is an index into a chain of free entries. */ struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1]; This screams union to me, since you're stuffing ints into pointers. This was a useful cleanup, and I think it revealed a bug. Gerd, in change 11196:b85da7cd9ea5 front: Fix rx buffer leak when tearing down an interface. you added a call to add_id_to_freelist(np-rx_skbs, id);. However, rx_skbs doesn't have an extra entry for the list head, and there's never any corresponding get_id_from_freelist(np-rx_skbs). What should it be? grant_ref_t gref_tx_head; grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1]; grant_ref_t gref_rx_head; grant_ref_t grant_rx_ref[NET_RX_RING_SIZE]; There seems to be a correspondence between tx_skbs[], gref_tx_head and grant_tx_ref[] - perhaps group them together with a nice comment? Similarly the rx side. Yep, rearranged. And I added an explicit separate freelist head for tx_skbs, so it matches the grant side (which doesn't need the +1 as a result). +/* + * Implement our own carrier flag: the network stack's version causes delays + * when the carrier is re-enabled (in particular, dev_activate() may not + * immediately be called, which can cause packet loss). + */ +#define netfront_carrier_on(netif) ((netif)-carrier = 1) +#define netfront_carrier_off(netif) ((netif)-carrier = 0) +#define netfront_carrier_ok(netif) ((netif)-carrier) Well, you only call netfront_carrier_on() from one place, so it's pretty easy to do netif_carrier_on(); dev_activate(); there. I don't think this is critical though. Are you saying that we wouldn't need to have a private carrier flag if we do the netif_carrier_on(); dev_activate() sequence instead? +static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, + struct netif_tx_request *tx) This needs a big comment. AFAICT, entries in the ring cannot cross page boundaries. And gso means that you have to put the first (partial) page of the packet in the ring first, with the NETTXF_extra_info flag, then the GSO stuff, then the rest of the packet. This results in this strange xennet_make_frags which does everything after the first packet page (which may be only *part* of the skb head). This also complicates xennet_get_responses(), where the loop is awkward because it has to get the first bit, then do the loop. skb_queue_head_init(rxq); skb_queue_head_init(errq); skb_queue_head_init(tmpq); I'd love a comment explaining exactly what these three queues are used for. It seems that rxq is the queue of received skbs (the result), tmpq is parts of the current skb for multi-fragment skbs, and errq is skbs to be discarded, which are kept around during the function because ? (we may need to unmap the pages?) Um, Herbert? +txs = (struct netif_tx_sring *)get_zeroed_page(GFP_KERNEL); +if (!txs) { +err = -ENOMEM; +xenbus_dev_fatal(dev, err, allocating tx ring page); +goto fail; +} +SHARED_RING_INIT(txs); +FRONT_RING_INIT(info-tx, txs, PAGE_SIZE); + +err = xenbus_grant_ring(dev, virt_to_mfn(txs)); +if (err 0) { +free_page((unsigned long)txs); +goto fail; +} + +info-tx_ring_ref = err; +rxs = (struct netif_rx_sring *)get_zeroed_page(GFP_KERNEL); +if (!rxs) { +err = -ENOMEM; +xenbus_dev_fatal(dev, err, allocating rx ring page); +goto fail; Why doesn't this (and the following) need to free txs? Higher level magic? In general this function seems to lack cleanup. Yes, I need to look at this more closely. It does seem that txs and rxs will get leaked. +err = xenbus_scanf(XBT_NIL, np-xbdev-otherend, + feature-rx-copy, %u, feature_rx_copy); +if (err != 1) +feature_rx_copy = 0; +err = xenbus_scanf(XBT_NIL, np-xbdev-otherend, + feature-rx-flip, %u, feature_rx_flip); +if (err != 1) +feature_rx_flip = 1; This second one deserves a comment. If it doesn't set feature_rx_flip it's *on*? Historical reasons Guess so. It defaults to flip. I simplified the rx_copy/flip module parameter to a simple rx_mode=0/1, but this is preserved from the original. My guess is that originally there was only flip, and copy was added later. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
On Wed, 2007-05-02 at 13:51 +1000, Herbert Xu wrote: > On Wed, May 02, 2007 at 01:37:13PM +1000, Rusty Russell wrote: > > > > > +static int xennet_change_mtu(struct net_device *dev, int mtu) > > > +{ > > > + int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN; > > > > This seems odd to me: just because a device does TSO should we really > > allow huge mtu settings? Herbert? > > Actually this has nothing to do with TSO/GSO. This driver inherently > supports arbitrary MTUs that's only limited by our network stack. If > the physical network device can't handle it you'll just get an ICMP > error back or fragmentation. Oops, I misread "xennet_can_sg" as a test for GSO. Thanks for the clarification, Rusty. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
Rusty Russell wrote: > On Tue, 2007-05-01 at 17:08 -0700, Jeremy Fitzhardinge wrote: > >> Add the Xen virtual network device driver. >> > > (Herbert: there's a question for you: grep for Herbert) > > OK, this is a remarkably non-trivial driver. If the v0.1 of the driver > had been in the kernel, I'm sure it would have been about 1/4 the size > and far easier to review 8( > > Nonetheless, I have scoured the entire thing. It's not actually too > bad. > Lovely, thanks. I'm at the double disadvantage of not really knowing how the network stack works or precisely how Xen does networking, but the homework exercises ("comment this") will definitely help my education. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
On Wed, May 02, 2007 at 01:37:13PM +1000, Rusty Russell wrote: > > > +static int xennet_change_mtu(struct net_device *dev, int mtu) > > +{ > > + int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN; > > + > > + if (mtu > max) > > + return -EINVAL; > > + dev->mtu = mtu; > > + return 0; > > +} > > This seems odd to me: just because a device does TSO should we really > allow huge mtu settings? Herbert? Actually this has nothing to do with TSO/GSO. This driver inherently supports arbitrary MTUs that's only limited by our network stack. If the physical network device can't handle it you'll just get an ICMP error back or fragmentation. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
On Tue, 2007-05-01 at 17:08 -0700, Jeremy Fitzhardinge wrote: > Add the Xen virtual network device driver. (Herbert: there's a question for you: grep for Herbert) OK, this is a remarkably non-trivial driver. If the v0.1 of the driver had been in the kernel, I'm sure it would have been about 1/4 the size and far easier to review 8( Nonetheless, I have scoured the entire thing. It's not actually too bad. > struct netfront_cb { > struct page *page; > unsigned offset; >}; Comment this please. This is used when assembling incoming skbs, and may well correspond to skb_shinfo(skb)->frags[0].page, but not always it seems. > struct netfront_info { ... > struct net_device *netdev; ... > unsigned int evtchn, irq; The netdev has an irq field already. Using it will probably help ifconfig output, too. > u8 mac[ETH_ALEN]; You simply copy this into netdev->dev_addr: put it on the stack instead? > /* >* {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs >* is an index into a chain of free entries. >*/ > struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1]; This screams "union" to me, since you're stuffing ints into pointers. > #define TX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256) This seems not to be used here, yet it's declared in the middle of the struct? > grant_ref_t gref_tx_head; > grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1]; > grant_ref_t gref_rx_head; > grant_ref_t grant_rx_ref[NET_RX_RING_SIZE]; There seems to be a correspondence between tx_skbs[], gref_tx_head and grant_tx_ref[] - perhaps group them together with a nice comment? Similarly the rx side. > +/* > + * Implement our own carrier flag: the network stack's version causes delays > + * when the carrier is re-enabled (in particular, dev_activate() may not > + * immediately be called, which can cause packet loss). > + */ > +#define netfront_carrier_on(netif) ((netif)->carrier = 1) > +#define netfront_carrier_off(netif) ((netif)->carrier = 0) > +#define netfront_carrier_ok(netif) ((netif)->carrier) Well, you only call netfront_carrier_on() from one place, so it's pretty easy to do "netif_carrier_on(); dev_activate();" there. I don't think this is critical though. > + id = txrsp->id; > + skb = np->tx_skbs[id]; > + if (unlikely(gnttab_query_foreign_access( > + np->grant_tx_ref[id]) != 0)) { > + printk(KERN_ALERT "xennet_tx_buf_gc: warning " > +"-- grant still in use by backend " > +"domain.\n"); > + BUG(); > + } I shall resist the urge to point out the can of worms that Xen opened by trying to allow domains to directly access each others' memory. > if ( nr_flips != 0 ) { Style. > +static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, > + struct netif_tx_request *tx) This needs a big comment. AFAICT, entries in the ring cannot cross page boundaries. And gso means that you have to put the first (partial) page of the packet in the ring first, with the NETTXF_extra_info flag, then the GSO stuff, then the rest of the packet. This results in this strange xennet_make_frags which does everything after the first packet page (which may be only *part* of the skb head). This also complicates xennet_get_responses(), where the loop is awkward because it has to get the first bit, then do the loop. > skb_queue_head_init(); > skb_queue_head_init(); > skb_queue_head_init(); I'd love a comment explaining exactly what these three queues are used for. It seems that rxq is the queue of received skbs (the result), tmpq is parts of the current skb for multi-fragment skbs, and errq is skbs to be discarded, which are kept around during the function because ? (we may need to unmap the pages?) > + /* > + * Truesize must approximates the size of true data plus s/approximates/approximate/ or s/must //. > +/* > + * Nothing to do here. Virtual interface is point-to-point and the > + * physical interface is probably promiscuous anyway. > + */ > +static void xennet_set_multicast_list(struct net_device *dev) > +{ > +} Hmm, you can just leave this as NULL then. It will fail if someone tries to set multicast on it, but that's probably correct behaviour. > +static int xennet_change_mtu(struct net_device *dev, int mtu) > +{ > + int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN; > + > + if (mtu > max) > + return -EINVAL; > + dev->mtu = mtu; > + return 0; > +} This seems odd to me: just because a device does TSO should we really allow huge mtu settings? Herbert? > + /* Initialise {tx,rx}_skbs as a free chain containing every entry. */ > + for (i = 0; i <= NET_TX_RING_SIZE; i++) {
Re: netfront for review
On Tue, 2007-05-01 at 17:08 -0700, Jeremy Fitzhardinge wrote: Add the Xen virtual network device driver. (Herbert: there's a question for you: grep for Herbert) OK, this is a remarkably non-trivial driver. If the v0.1 of the driver had been in the kernel, I'm sure it would have been about 1/4 the size and far easier to review 8( Nonetheless, I have scoured the entire thing. It's not actually too bad. struct netfront_cb { struct page *page; unsigned offset; }; Comment this please. This is used when assembling incoming skbs, and may well correspond to skb_shinfo(skb)-frags[0].page, but not always it seems. struct netfront_info { ... struct net_device *netdev; ... unsigned int evtchn, irq; The netdev has an irq field already. Using it will probably help ifconfig output, too. u8 mac[ETH_ALEN]; You simply copy this into netdev-dev_addr: put it on the stack instead? /* * {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs * is an index into a chain of free entries. */ struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1]; This screams union to me, since you're stuffing ints into pointers. #define TX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256) This seems not to be used here, yet it's declared in the middle of the struct? grant_ref_t gref_tx_head; grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1]; grant_ref_t gref_rx_head; grant_ref_t grant_rx_ref[NET_RX_RING_SIZE]; There seems to be a correspondence between tx_skbs[], gref_tx_head and grant_tx_ref[] - perhaps group them together with a nice comment? Similarly the rx side. +/* + * Implement our own carrier flag: the network stack's version causes delays + * when the carrier is re-enabled (in particular, dev_activate() may not + * immediately be called, which can cause packet loss). + */ +#define netfront_carrier_on(netif) ((netif)-carrier = 1) +#define netfront_carrier_off(netif) ((netif)-carrier = 0) +#define netfront_carrier_ok(netif) ((netif)-carrier) Well, you only call netfront_carrier_on() from one place, so it's pretty easy to do netif_carrier_on(); dev_activate(); there. I don't think this is critical though. + id = txrsp-id; + skb = np-tx_skbs[id]; + if (unlikely(gnttab_query_foreign_access( + np-grant_tx_ref[id]) != 0)) { + printk(KERN_ALERT xennet_tx_buf_gc: warning +-- grant still in use by backend +domain.\n); + BUG(); + } I shall resist the urge to point out the can of worms that Xen opened by trying to allow domains to directly access each others' memory. if ( nr_flips != 0 ) { Style. +static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, + struct netif_tx_request *tx) This needs a big comment. AFAICT, entries in the ring cannot cross page boundaries. And gso means that you have to put the first (partial) page of the packet in the ring first, with the NETTXF_extra_info flag, then the GSO stuff, then the rest of the packet. This results in this strange xennet_make_frags which does everything after the first packet page (which may be only *part* of the skb head). This also complicates xennet_get_responses(), where the loop is awkward because it has to get the first bit, then do the loop. skb_queue_head_init(rxq); skb_queue_head_init(errq); skb_queue_head_init(tmpq); I'd love a comment explaining exactly what these three queues are used for. It seems that rxq is the queue of received skbs (the result), tmpq is parts of the current skb for multi-fragment skbs, and errq is skbs to be discarded, which are kept around during the function because ? (we may need to unmap the pages?) + /* + * Truesize must approximates the size of true data plus s/approximates/approximate/ or s/must //. +/* + * Nothing to do here. Virtual interface is point-to-point and the + * physical interface is probably promiscuous anyway. + */ +static void xennet_set_multicast_list(struct net_device *dev) +{ +} Hmm, you can just leave this as NULL then. It will fail if someone tries to set multicast on it, but that's probably correct behaviour. +static int xennet_change_mtu(struct net_device *dev, int mtu) +{ + int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN; + + if (mtu max) + return -EINVAL; + dev-mtu = mtu; + return 0; +} This seems odd to me: just because a device does TSO should we really allow huge mtu settings? Herbert? + /* Initialise {tx,rx}_skbs as a free chain containing every entry. */ + for (i = 0; i = NET_TX_RING_SIZE; i++) { + np-tx_skbs[i] = (void *)((unsigned long) i+1); +
Re: netfront for review
On Wed, May 02, 2007 at 01:37:13PM +1000, Rusty Russell wrote: +static int xennet_change_mtu(struct net_device *dev, int mtu) +{ + int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN; + + if (mtu max) + return -EINVAL; + dev-mtu = mtu; + return 0; +} This seems odd to me: just because a device does TSO should we really allow huge mtu settings? Herbert? Actually this has nothing to do with TSO/GSO. This driver inherently supports arbitrary MTUs that's only limited by our network stack. If the physical network device can't handle it you'll just get an ICMP error back or fragmentation. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
Rusty Russell wrote: On Tue, 2007-05-01 at 17:08 -0700, Jeremy Fitzhardinge wrote: Add the Xen virtual network device driver. (Herbert: there's a question for you: grep for Herbert) OK, this is a remarkably non-trivial driver. If the v0.1 of the driver had been in the kernel, I'm sure it would have been about 1/4 the size and far easier to review 8( Nonetheless, I have scoured the entire thing. It's not actually too bad. Lovely, thanks. I'm at the double disadvantage of not really knowing how the network stack works or precisely how Xen does networking, but the homework exercises (comment this) will definitely help my education. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: netfront for review
On Wed, 2007-05-02 at 13:51 +1000, Herbert Xu wrote: On Wed, May 02, 2007 at 01:37:13PM +1000, Rusty Russell wrote: +static int xennet_change_mtu(struct net_device *dev, int mtu) +{ + int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN; This seems odd to me: just because a device does TSO should we really allow huge mtu settings? Herbert? Actually this has nothing to do with TSO/GSO. This driver inherently supports arbitrary MTUs that's only limited by our network stack. If the physical network device can't handle it you'll just get an ICMP error back or fragmentation. Oops, I misread xennet_can_sg as a test for GSO. Thanks for the clarification, Rusty. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/