Re: netfront for review

2007-05-03 Thread Jeremy Fitzhardinge
Gerd Hoffmann wrote:
> Drawback is that the guest kernel wouldn't work with older xen
> versions (dom0 netback driver to be exact) any more.  Probably
> wouldn't be a showstopper though, given that xen 3.0.3 probably is
> almost one year out by the time 2.6.22 will be released ...

I don't think we've decided how backwards-compatible the pv_ops guest
should be.  I've been working on the basis "as much as possible", so
long as it isn't a large burden.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Gerd Hoffmann

Christoph Hellwig wrote:

On Thu, May 03, 2007 at 09:33:43AM +0200, Gerd Hoffmann wrote:

Guess so.  It defaults to flip.  I simplified the rx_copy/flip module
parameter to a simple rx_mode=0/1, but this is preserved from the
original.  My guess is that originally there was only flip, and copy was
added later.
Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and 
made the default) later.


So can we please just rip out the obsolete code completely?  There's
no point adding this code anymore.


Drawback is that the guest kernel wouldn't work with older xen versions 
(dom0 netback driver to be exact) any more.  Probably wouldn't be a 
showstopper though, given that xen 3.0.3 probably is almost one year out 
by the time 2.6.22 will be released ...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Christoph Hellwig
On Thu, May 03, 2007 at 09:33:43AM +0200, Gerd Hoffmann wrote:
> >Guess so.  It defaults to flip.  I simplified the rx_copy/flip module
> >parameter to a simple rx_mode=0/1, but this is preserved from the
> >original.  My guess is that originally there was only flip, and copy was
> >added later.
> 
> Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and 
> made the default) later.

So can we please just rip out the obsolete code completely?  There's
no point adding this code anymore.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Gerd Hoffmann

Jeremy Fitzhardinge wrote:

Gerd Hoffmann wrote:

Gerd, in change 11196:b85da7cd9ea5 "front: Fix rx buffer leak when
tearing down an interface." you added a call to
"add_id_to_freelist(np->rx_skbs, id);".  However, rx_skbs doesn't have
an extra entry for the list head, and there's never any corresponding
get_id_from_freelist(np->rx_skbs).  What should it be?

The function has an effect in page flipping mode only.  It walks the
whole list of rx skbufs (id is the loop variable ...), checks whenever
they are handed out to the frontend driver to fill in packet data and
not returned yet, and if so reclaim them ...


Yes, but why use add_id_to_freelist?  rx_skbs are not being used on a
freelist anywhere else.  It just means the rx_skb array gets filled with
small integers, but the rest of the code assumes they're either NULL or
an skb pointer.


Hmm, good point.  Have to look at the code again, it has been some time 
I've written that, and it took me some time to figure how all the grant 
table stuff works ...


Maybe the add_id_to_freelist() call can simply be dropped.  The whole 
interface is released shortly thereafter, probably thats why filling the 
freelist with yunk never caused any problems ...


cheers,
  Gerd

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Keir Fraser



On 3/5/07 15:27, "Jeremy Fitzhardinge" <[EMAIL PROTECTED]> wrote:

>> The function has an effect in page flipping mode only.  It walks the
>> whole list of rx skbufs (id is the loop variable ...), checks whenever
>> they are handed out to the frontend driver to fill in packet data and
>> not returned yet, and if so reclaim them ...
> 
> Yes, but why use add_id_to_freelist?  rx_skbs are not being used on a
> freelist anywhere else.  It just means the rx_skb array gets filled with
> small integers, but the rest of the code assumes they're either NULL or
> an skb pointer.

The need for it went away when Herbert Xu made the mapping between
receive-ring slots and receive-request/response identifiers static. I think
there was a race between Gerd writing his patch, Herbert removing the need
for add_id_to_freelist, and Gerd's patch being checked in.

 -- Keir

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Jeremy Fitzhardinge
Gerd Hoffmann wrote:
>> Gerd, in change 11196:b85da7cd9ea5 "front: Fix rx buffer leak when
>> tearing down an interface." you added a call to
>> "add_id_to_freelist(np->rx_skbs, id);".  However, rx_skbs doesn't have
>> an extra entry for the list head, and there's never any corresponding
>> get_id_from_freelist(np->rx_skbs).  What should it be?
>
> The function has an effect in page flipping mode only.  It walks the
> whole list of rx skbufs (id is the loop variable ...), checks whenever
> they are handed out to the frontend driver to fill in packet data and
> not returned yet, and if so reclaim them ...

Yes, but why use add_id_to_freelist?  rx_skbs are not being used on a
freelist anywhere else.  It just means the rx_skb array gets filled with
small integers, but the rest of the code assumes they're either NULL or
an skb pointer.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Gerd Hoffmann

  Hi,


Gerd, in change 11196:b85da7cd9ea5 "front: Fix rx buffer leak when
tearing down an interface." you added a call to
"add_id_to_freelist(np->rx_skbs, id);".  However, rx_skbs doesn't have
an extra entry for the list head, and there's never any corresponding
get_id_from_freelist(np->rx_skbs).  What should it be?


The function has an effect in page flipping mode only.  It walks the 
whole list of rx skbufs (id is the loop variable ...), checks whenever 
they are handed out to the frontend driver to fill in packet data and 
not returned yet, and if so reclaim them ...



+   err = xenbus_scanf(XBT_NIL, np->xbdev->otherend,
+  "feature-rx-flip", "%u", _rx_flip);
+   if (err != 1)
+   feature_rx_flip = 1;


This second one deserves a comment.  If it doesn't set feature_rx_flip
it's *on*?  Historical reasons


Guess so.  It defaults to flip.  I simplified the rx_copy/flip module
parameter to a simple rx_mode=0/1, but this is preserved from the
original.  My guess is that originally there was only flip, and copy was
added later.


Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and 
made the default) later.


cheers,
  Gerd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Gerd Hoffmann

  Hi,


Gerd, in change 11196:b85da7cd9ea5 front: Fix rx buffer leak when
tearing down an interface. you added a call to
add_id_to_freelist(np-rx_skbs, id);.  However, rx_skbs doesn't have
an extra entry for the list head, and there's never any corresponding
get_id_from_freelist(np-rx_skbs).  What should it be?


The function has an effect in page flipping mode only.  It walks the 
whole list of rx skbufs (id is the loop variable ...), checks whenever 
they are handed out to the frontend driver to fill in packet data and 
not returned yet, and if so reclaim them ...



+   err = xenbus_scanf(XBT_NIL, np-xbdev-otherend,
+  feature-rx-flip, %u, feature_rx_flip);
+   if (err != 1)
+   feature_rx_flip = 1;


This second one deserves a comment.  If it doesn't set feature_rx_flip
it's *on*?  Historical reasons


Guess so.  It defaults to flip.  I simplified the rx_copy/flip module
parameter to a simple rx_mode=0/1, but this is preserved from the
original.  My guess is that originally there was only flip, and copy was
added later.


Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and 
made the default) later.


cheers,
  Gerd
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Jeremy Fitzhardinge
Gerd Hoffmann wrote:
 Gerd, in change 11196:b85da7cd9ea5 front: Fix rx buffer leak when
 tearing down an interface. you added a call to
 add_id_to_freelist(np-rx_skbs, id);.  However, rx_skbs doesn't have
 an extra entry for the list head, and there's never any corresponding
 get_id_from_freelist(np-rx_skbs).  What should it be?

 The function has an effect in page flipping mode only.  It walks the
 whole list of rx skbufs (id is the loop variable ...), checks whenever
 they are handed out to the frontend driver to fill in packet data and
 not returned yet, and if so reclaim them ...

Yes, but why use add_id_to_freelist?  rx_skbs are not being used on a
freelist anywhere else.  It just means the rx_skb array gets filled with
small integers, but the rest of the code assumes they're either NULL or
an skb pointer.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Keir Fraser



On 3/5/07 15:27, Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

 The function has an effect in page flipping mode only.  It walks the
 whole list of rx skbufs (id is the loop variable ...), checks whenever
 they are handed out to the frontend driver to fill in packet data and
 not returned yet, and if so reclaim them ...
 
 Yes, but why use add_id_to_freelist?  rx_skbs are not being used on a
 freelist anywhere else.  It just means the rx_skb array gets filled with
 small integers, but the rest of the code assumes they're either NULL or
 an skb pointer.

The need for it went away when Herbert Xu made the mapping between
receive-ring slots and receive-request/response identifiers static. I think
there was a race between Gerd writing his patch, Herbert removing the need
for add_id_to_freelist, and Gerd's patch being checked in.

 -- Keir

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Gerd Hoffmann

Jeremy Fitzhardinge wrote:

Gerd Hoffmann wrote:

Gerd, in change 11196:b85da7cd9ea5 front: Fix rx buffer leak when
tearing down an interface. you added a call to
add_id_to_freelist(np-rx_skbs, id);.  However, rx_skbs doesn't have
an extra entry for the list head, and there's never any corresponding
get_id_from_freelist(np-rx_skbs).  What should it be?

The function has an effect in page flipping mode only.  It walks the
whole list of rx skbufs (id is the loop variable ...), checks whenever
they are handed out to the frontend driver to fill in packet data and
not returned yet, and if so reclaim them ...


Yes, but why use add_id_to_freelist?  rx_skbs are not being used on a
freelist anywhere else.  It just means the rx_skb array gets filled with
small integers, but the rest of the code assumes they're either NULL or
an skb pointer.


Hmm, good point.  Have to look at the code again, it has been some time 
I've written that, and it took me some time to figure how all the grant 
table stuff works ...


Maybe the add_id_to_freelist() call can simply be dropped.  The whole 
interface is released shortly thereafter, probably thats why filling the 
freelist with yunk never caused any problems ...


cheers,
  Gerd

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Christoph Hellwig
On Thu, May 03, 2007 at 09:33:43AM +0200, Gerd Hoffmann wrote:
 Guess so.  It defaults to flip.  I simplified the rx_copy/flip module
 parameter to a simple rx_mode=0/1, but this is preserved from the
 original.  My guess is that originally there was only flip, and copy was
 added later.
 
 Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and 
 made the default) later.

So can we please just rip out the obsolete code completely?  There's
no point adding this code anymore.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Gerd Hoffmann

Christoph Hellwig wrote:

On Thu, May 03, 2007 at 09:33:43AM +0200, Gerd Hoffmann wrote:

Guess so.  It defaults to flip.  I simplified the rx_copy/flip module
parameter to a simple rx_mode=0/1, but this is preserved from the
original.  My guess is that originally there was only flip, and copy was
added later.
Yep, page flipping is the old (pre xen 3.0.3) way, copy was added (and 
made the default) later.


So can we please just rip out the obsolete code completely?  There's
no point adding this code anymore.


Drawback is that the guest kernel wouldn't work with older xen versions 
(dom0 netback driver to be exact) any more.  Probably wouldn't be a 
showstopper though, given that xen 3.0.3 probably is almost one year out 
by the time 2.6.22 will be released ...

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-03 Thread Jeremy Fitzhardinge
Gerd Hoffmann wrote:
 Drawback is that the guest kernel wouldn't work with older xen
 versions (dom0 netback driver to be exact) any more.  Probably
 wouldn't be a showstopper though, given that xen 3.0.3 probably is
 almost one year out by the time 2.6.22 will be released ...

I don't think we've decided how backwards-compatible the pv_ops guest
should be.  I've been working on the basis as much as possible, so
long as it isn't a large burden.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-02 Thread Jeremy Fitzhardinge
(Gerd, Herbert: there's some questions directed to you down there.)

Rusty Russell wrote:
>>  /*
>>   * {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs
>>   * is an index into a chain of free entries.
>>   */
>>  struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1];
>> 
>
> This screams "union" to me, since you're stuffing ints into pointers.
>   

This was a useful cleanup, and I think it revealed a bug.

Gerd, in change 11196:b85da7cd9ea5 "front: Fix rx buffer leak when
tearing down an interface." you added a call to
"add_id_to_freelist(np->rx_skbs, id);".  However, rx_skbs doesn't have
an extra entry for the list head, and there's never any corresponding
get_id_from_freelist(np->rx_skbs).  What should it be?

>>  grant_ref_t gref_tx_head;
>>  grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1];
>>  grant_ref_t gref_rx_head;
>>  grant_ref_t grant_rx_ref[NET_RX_RING_SIZE];
>> 
>
> There seems to be a correspondence between tx_skbs[], gref_tx_head and
> grant_tx_ref[] - perhaps group them together with a nice comment?
> Similarly the rx side.
>   

Yep, rearranged.  And I added an explicit separate freelist head for
tx_skbs, so it matches the grant side (which doesn't need the +1 as a
result).

>> +/*
>> + * Implement our own carrier flag: the network stack's version causes delays
>> + * when the carrier is re-enabled (in particular, dev_activate() may not
>> + * immediately be called, which can cause packet loss).
>> + */
>> +#define netfront_carrier_on(netif)  ((netif)->carrier = 1)
>> +#define netfront_carrier_off(netif) ((netif)->carrier = 0)
>> +#define netfront_carrier_ok(netif)  ((netif)->carrier)
>> 
>
> Well, you only call netfront_carrier_on() from one place, so it's pretty
> easy to do "netif_carrier_on(); dev_activate();" there.
>
> I don't think this is critical though.
>   

Are you saying that we wouldn't need to have a private carrier flag if
we do the "netif_carrier_on(); dev_activate()" sequence instead?

>> +static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>> +  struct netif_tx_request *tx)
>> 
>
> This needs a big comment.  AFAICT, entries in the ring cannot cross page
> boundaries.  And gso means that you have to put the first (partial) page
> of the packet in the ring first, with the NETTXF_extra_info flag, then
> the GSO stuff, then the rest of the packet.  This results in this
> strange xennet_make_frags which does everything after the first packet
> page (which may be only *part* of the skb head).
>
> This also complicates xennet_get_responses(), where the loop is awkward
> because it has to get the first bit, then do the loop.
>
>   
>>  skb_queue_head_init();
>>  skb_queue_head_init();
>>  skb_queue_head_init();
>> 
>
> I'd love a comment explaining exactly what these three queues are used
> for.  It seems that rxq is the queue of received skbs (the result), tmpq
> is parts of the current skb for multi-fragment skbs, and errq is skbs to
> be discarded, which are kept around during the function because ? (we
> may need to unmap the pages?)
>   

Um, Herbert?

>> +txs = (struct netif_tx_sring *)get_zeroed_page(GFP_KERNEL);
>> +if (!txs) {
>> +err = -ENOMEM;
>> +xenbus_dev_fatal(dev, err, "allocating tx ring page");
>> +goto fail;
>> +}
>> +SHARED_RING_INIT(txs);
>> +FRONT_RING_INIT(>tx, txs, PAGE_SIZE);
>> +
>> +err = xenbus_grant_ring(dev, virt_to_mfn(txs));
>> +if (err < 0) {
>> +free_page((unsigned long)txs);
>> +goto fail;
>> +}
>> +
>> +info->tx_ring_ref = err;
>> +rxs = (struct netif_rx_sring *)get_zeroed_page(GFP_KERNEL);
>> +if (!rxs) {
>> +err = -ENOMEM;
>> +xenbus_dev_fatal(dev, err, "allocating rx ring page");
>> +goto fail;
>> 
>
> Why doesn't this (and the following) need to free txs?  Higher level
> magic?  In general this function seems to lack cleanup.
>   

Yes, I need to look at this more closely.  It does seem that txs and rxs
will get leaked.

>> +err = xenbus_scanf(XBT_NIL, np->xbdev->otherend,
>> +   "feature-rx-copy", "%u", _rx_copy);
>> +if (err != 1)
>> +feature_rx_copy = 0;
>> +err = xenbus_scanf(XBT_NIL, np->xbdev->otherend,
>> +   "feature-rx-flip", "%u", _rx_flip);
>> +if (err != 1)
>> +feature_rx_flip = 1;
>> 
>
> This second one deserves a comment.  If it doesn't set feature_rx_flip
> it's *on*?  Historical reasons

Guess so.  It defaults to flip.  I simplified the rx_copy/flip module
parameter to a simple rx_mode=0/1, but this is preserved from the
original.  My guess is that originally there was only flip, and copy was
added later.


J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  

Re: netfront for review

2007-05-02 Thread Jeremy Fitzhardinge
(Gerd, Herbert: there's some questions directed to you down there.)

Rusty Russell wrote:
  /*
   * {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs
   * is an index into a chain of free entries.
   */
  struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1];
 

 This screams union to me, since you're stuffing ints into pointers.
   

This was a useful cleanup, and I think it revealed a bug.

Gerd, in change 11196:b85da7cd9ea5 front: Fix rx buffer leak when
tearing down an interface. you added a call to
add_id_to_freelist(np-rx_skbs, id);.  However, rx_skbs doesn't have
an extra entry for the list head, and there's never any corresponding
get_id_from_freelist(np-rx_skbs).  What should it be?

  grant_ref_t gref_tx_head;
  grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1];
  grant_ref_t gref_rx_head;
  grant_ref_t grant_rx_ref[NET_RX_RING_SIZE];
 

 There seems to be a correspondence between tx_skbs[], gref_tx_head and
 grant_tx_ref[] - perhaps group them together with a nice comment?
 Similarly the rx side.
   

Yep, rearranged.  And I added an explicit separate freelist head for
tx_skbs, so it matches the grant side (which doesn't need the +1 as a
result).

 +/*
 + * Implement our own carrier flag: the network stack's version causes delays
 + * when the carrier is re-enabled (in particular, dev_activate() may not
 + * immediately be called, which can cause packet loss).
 + */
 +#define netfront_carrier_on(netif)  ((netif)-carrier = 1)
 +#define netfront_carrier_off(netif) ((netif)-carrier = 0)
 +#define netfront_carrier_ok(netif)  ((netif)-carrier)
 

 Well, you only call netfront_carrier_on() from one place, so it's pretty
 easy to do netif_carrier_on(); dev_activate(); there.

 I don't think this is critical though.
   

Are you saying that we wouldn't need to have a private carrier flag if
we do the netif_carrier_on(); dev_activate() sequence instead?

 +static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 +  struct netif_tx_request *tx)
 

 This needs a big comment.  AFAICT, entries in the ring cannot cross page
 boundaries.  And gso means that you have to put the first (partial) page
 of the packet in the ring first, with the NETTXF_extra_info flag, then
 the GSO stuff, then the rest of the packet.  This results in this
 strange xennet_make_frags which does everything after the first packet
 page (which may be only *part* of the skb head).

 This also complicates xennet_get_responses(), where the loop is awkward
 because it has to get the first bit, then do the loop.

   
  skb_queue_head_init(rxq);
  skb_queue_head_init(errq);
  skb_queue_head_init(tmpq);
 

 I'd love a comment explaining exactly what these three queues are used
 for.  It seems that rxq is the queue of received skbs (the result), tmpq
 is parts of the current skb for multi-fragment skbs, and errq is skbs to
 be discarded, which are kept around during the function because ? (we
 may need to unmap the pages?)
   

Um, Herbert?

 +txs = (struct netif_tx_sring *)get_zeroed_page(GFP_KERNEL);
 +if (!txs) {
 +err = -ENOMEM;
 +xenbus_dev_fatal(dev, err, allocating tx ring page);
 +goto fail;
 +}
 +SHARED_RING_INIT(txs);
 +FRONT_RING_INIT(info-tx, txs, PAGE_SIZE);
 +
 +err = xenbus_grant_ring(dev, virt_to_mfn(txs));
 +if (err  0) {
 +free_page((unsigned long)txs);
 +goto fail;
 +}
 +
 +info-tx_ring_ref = err;
 +rxs = (struct netif_rx_sring *)get_zeroed_page(GFP_KERNEL);
 +if (!rxs) {
 +err = -ENOMEM;
 +xenbus_dev_fatal(dev, err, allocating rx ring page);
 +goto fail;
 

 Why doesn't this (and the following) need to free txs?  Higher level
 magic?  In general this function seems to lack cleanup.
   

Yes, I need to look at this more closely.  It does seem that txs and rxs
will get leaked.

 +err = xenbus_scanf(XBT_NIL, np-xbdev-otherend,
 +   feature-rx-copy, %u, feature_rx_copy);
 +if (err != 1)
 +feature_rx_copy = 0;
 +err = xenbus_scanf(XBT_NIL, np-xbdev-otherend,
 +   feature-rx-flip, %u, feature_rx_flip);
 +if (err != 1)
 +feature_rx_flip = 1;
 

 This second one deserves a comment.  If it doesn't set feature_rx_flip
 it's *on*?  Historical reasons

Guess so.  It defaults to flip.  I simplified the rx_copy/flip module
parameter to a simple rx_mode=0/1, but this is preserved from the
original.  My guess is that originally there was only flip, and copy was
added later.


J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-01 Thread Rusty Russell
On Wed, 2007-05-02 at 13:51 +1000, Herbert Xu wrote:
> On Wed, May 02, 2007 at 01:37:13PM +1000, Rusty Russell wrote:
> >
> > > +static int xennet_change_mtu(struct net_device *dev, int mtu)
> > > +{
> > > + int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
> >
> > This seems odd to me: just because a device does TSO should we really
> > allow huge mtu settings?  Herbert?
> 
> Actually this has nothing to do with TSO/GSO.  This driver inherently
> supports arbitrary MTUs that's only limited by our network stack.  If
> the physical network device can't handle it you'll just get an ICMP
> error back or fragmentation.

Oops, I misread "xennet_can_sg" as a test for GSO.

Thanks for the clarification,
Rusty.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-01 Thread Jeremy Fitzhardinge
Rusty Russell wrote:
> On Tue, 2007-05-01 at 17:08 -0700, Jeremy Fitzhardinge wrote:
>   
>> Add the Xen virtual network device driver.
>> 
>
> (Herbert: there's a question for you: grep for Herbert)
>
> OK, this is a remarkably non-trivial driver.  If the v0.1 of the driver
> had been in the kernel, I'm sure it would have been about 1/4 the size
> and far easier to review 8(
>
> Nonetheless, I have scoured the entire thing.  It's not actually too
> bad.
>   

Lovely, thanks.  I'm at the double disadvantage of not really knowing
how the network stack works or precisely how Xen does networking, but
the homework exercises ("comment this") will definitely help my education.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-01 Thread Herbert Xu
On Wed, May 02, 2007 at 01:37:13PM +1000, Rusty Russell wrote:
>
> > +static int xennet_change_mtu(struct net_device *dev, int mtu)
> > +{
> > +   int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
> > +
> > +   if (mtu > max)
> > +   return -EINVAL;
> > +   dev->mtu = mtu;
> > +   return 0;
> > +}
> 
> This seems odd to me: just because a device does TSO should we really
> allow huge mtu settings?  Herbert?

Actually this has nothing to do with TSO/GSO.  This driver inherently
supports arbitrary MTUs that's only limited by our network stack.  If
the physical network device can't handle it you'll just get an ICMP
error back or fragmentation.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-01 Thread Rusty Russell
On Tue, 2007-05-01 at 17:08 -0700, Jeremy Fitzhardinge wrote:
> Add the Xen virtual network device driver.

(Herbert: there's a question for you: grep for Herbert)

OK, this is a remarkably non-trivial driver.  If the v0.1 of the driver
had been in the kernel, I'm sure it would have been about 1/4 the size
and far easier to review 8(

Nonetheless, I have scoured the entire thing.  It's not actually too
bad.

> struct netfront_cb {
>   struct page *page;
>   unsigned offset;
>};

Comment this please.  This is used when assembling incoming skbs, and
may well correspond to skb_shinfo(skb)->frags[0].page, but not always it
seems.

> struct netfront_info {
...
>   struct net_device *netdev;
...
>   unsigned int evtchn, irq;

The netdev has an irq field already.  Using it will probably help
ifconfig output, too.

>   u8 mac[ETH_ALEN];

You simply copy this into netdev->dev_addr: put it on the stack instead?

>   /*
>* {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs
>* is an index into a chain of free entries.
>*/
>   struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1];

This screams "union" to me, since you're stuffing ints into pointers.

> #define TX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256)

This seems not to be used here, yet it's declared in the middle of the struct?

>   grant_ref_t gref_tx_head;
>   grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1];
>   grant_ref_t gref_rx_head;
>   grant_ref_t grant_rx_ref[NET_RX_RING_SIZE];

There seems to be a correspondence between tx_skbs[], gref_tx_head and
grant_tx_ref[] - perhaps group them together with a nice comment?
Similarly the rx side.

> +/*
> + * Implement our own carrier flag: the network stack's version causes delays
> + * when the carrier is re-enabled (in particular, dev_activate() may not
> + * immediately be called, which can cause packet loss).
> + */
> +#define netfront_carrier_on(netif)   ((netif)->carrier = 1)
> +#define netfront_carrier_off(netif)  ((netif)->carrier = 0)
> +#define netfront_carrier_ok(netif)   ((netif)->carrier)

Well, you only call netfront_carrier_on() from one place, so it's pretty
easy to do "netif_carrier_on(); dev_activate();" there.

I don't think this is critical though.

> + id  = txrsp->id;
> + skb = np->tx_skbs[id];
> + if (unlikely(gnttab_query_foreign_access(
> + np->grant_tx_ref[id]) != 0)) {
> + printk(KERN_ALERT "xennet_tx_buf_gc: warning "
> +"-- grant still in use by backend "
> +"domain.\n");
> + BUG();
> + }

I shall resist the urge to point out the can of worms that Xen opened by
trying to allow domains to directly access each others' memory.

>   if ( nr_flips != 0 ) {

Style.

> +static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
> +   struct netif_tx_request *tx)

This needs a big comment.  AFAICT, entries in the ring cannot cross page
boundaries.  And gso means that you have to put the first (partial) page
of the packet in the ring first, with the NETTXF_extra_info flag, then
the GSO stuff, then the rest of the packet.  This results in this
strange xennet_make_frags which does everything after the first packet
page (which may be only *part* of the skb head).

This also complicates xennet_get_responses(), where the loop is awkward
because it has to get the first bit, then do the loop.

>   skb_queue_head_init();
>   skb_queue_head_init();
>   skb_queue_head_init();

I'd love a comment explaining exactly what these three queues are used
for.  It seems that rxq is the queue of received skbs (the result), tmpq
is parts of the current skb for multi-fragment skbs, and errq is skbs to
be discarded, which are kept around during the function because ? (we
may need to unmap the pages?)

> + /*
> +  * Truesize must approximates the size of true data plus

s/approximates/approximate/ or s/must //.

> +/*
> + * Nothing to do here. Virtual interface is point-to-point and the
> + * physical interface is probably promiscuous anyway.
> + */
> +static void xennet_set_multicast_list(struct net_device *dev)
> +{
> +}

Hmm, you can just leave this as NULL then.  It will fail if someone
tries to set multicast on it, but that's probably correct behaviour.

> +static int xennet_change_mtu(struct net_device *dev, int mtu)
> +{
> + int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
> +
> + if (mtu > max)
> + return -EINVAL;
> + dev->mtu = mtu;
> + return 0;
> +}

This seems odd to me: just because a device does TSO should we really
allow huge mtu settings?  Herbert?

> + /* Initialise {tx,rx}_skbs as a free chain containing every entry. */
> + for (i = 0; i <= NET_TX_RING_SIZE; i++) {

Re: netfront for review

2007-05-01 Thread Rusty Russell
On Tue, 2007-05-01 at 17:08 -0700, Jeremy Fitzhardinge wrote:
 Add the Xen virtual network device driver.

(Herbert: there's a question for you: grep for Herbert)

OK, this is a remarkably non-trivial driver.  If the v0.1 of the driver
had been in the kernel, I'm sure it would have been about 1/4 the size
and far easier to review 8(

Nonetheless, I have scoured the entire thing.  It's not actually too
bad.

 struct netfront_cb {
   struct page *page;
   unsigned offset;
};

Comment this please.  This is used when assembling incoming skbs, and
may well correspond to skb_shinfo(skb)-frags[0].page, but not always it
seems.

 struct netfront_info {
...
   struct net_device *netdev;
...
   unsigned int evtchn, irq;

The netdev has an irq field already.  Using it will probably help
ifconfig output, too.

   u8 mac[ETH_ALEN];

You simply copy this into netdev-dev_addr: put it on the stack instead?

   /*
* {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs
* is an index into a chain of free entries.
*/
   struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1];

This screams union to me, since you're stuffing ints into pointers.

 #define TX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256)

This seems not to be used here, yet it's declared in the middle of the struct?

   grant_ref_t gref_tx_head;
   grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1];
   grant_ref_t gref_rx_head;
   grant_ref_t grant_rx_ref[NET_RX_RING_SIZE];

There seems to be a correspondence between tx_skbs[], gref_tx_head and
grant_tx_ref[] - perhaps group them together with a nice comment?
Similarly the rx side.

 +/*
 + * Implement our own carrier flag: the network stack's version causes delays
 + * when the carrier is re-enabled (in particular, dev_activate() may not
 + * immediately be called, which can cause packet loss).
 + */
 +#define netfront_carrier_on(netif)   ((netif)-carrier = 1)
 +#define netfront_carrier_off(netif)  ((netif)-carrier = 0)
 +#define netfront_carrier_ok(netif)   ((netif)-carrier)

Well, you only call netfront_carrier_on() from one place, so it's pretty
easy to do netif_carrier_on(); dev_activate(); there.

I don't think this is critical though.

 + id  = txrsp-id;
 + skb = np-tx_skbs[id];
 + if (unlikely(gnttab_query_foreign_access(
 + np-grant_tx_ref[id]) != 0)) {
 + printk(KERN_ALERT xennet_tx_buf_gc: warning 
 +-- grant still in use by backend 
 +domain.\n);
 + BUG();
 + }

I shall resist the urge to point out the can of worms that Xen opened by
trying to allow domains to directly access each others' memory.

   if ( nr_flips != 0 ) {

Style.

 +static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 +   struct netif_tx_request *tx)

This needs a big comment.  AFAICT, entries in the ring cannot cross page
boundaries.  And gso means that you have to put the first (partial) page
of the packet in the ring first, with the NETTXF_extra_info flag, then
the GSO stuff, then the rest of the packet.  This results in this
strange xennet_make_frags which does everything after the first packet
page (which may be only *part* of the skb head).

This also complicates xennet_get_responses(), where the loop is awkward
because it has to get the first bit, then do the loop.

   skb_queue_head_init(rxq);
   skb_queue_head_init(errq);
   skb_queue_head_init(tmpq);

I'd love a comment explaining exactly what these three queues are used
for.  It seems that rxq is the queue of received skbs (the result), tmpq
is parts of the current skb for multi-fragment skbs, and errq is skbs to
be discarded, which are kept around during the function because ? (we
may need to unmap the pages?)

 + /*
 +  * Truesize must approximates the size of true data plus

s/approximates/approximate/ or s/must //.

 +/*
 + * Nothing to do here. Virtual interface is point-to-point and the
 + * physical interface is probably promiscuous anyway.
 + */
 +static void xennet_set_multicast_list(struct net_device *dev)
 +{
 +}

Hmm, you can just leave this as NULL then.  It will fail if someone
tries to set multicast on it, but that's probably correct behaviour.

 +static int xennet_change_mtu(struct net_device *dev, int mtu)
 +{
 + int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
 +
 + if (mtu  max)
 + return -EINVAL;
 + dev-mtu = mtu;
 + return 0;
 +}

This seems odd to me: just because a device does TSO should we really
allow huge mtu settings?  Herbert?

 + /* Initialise {tx,rx}_skbs as a free chain containing every entry. */
 + for (i = 0; i = NET_TX_RING_SIZE; i++) {
 + np-tx_skbs[i] = (void *)((unsigned long) i+1);
 +

Re: netfront for review

2007-05-01 Thread Herbert Xu
On Wed, May 02, 2007 at 01:37:13PM +1000, Rusty Russell wrote:

  +static int xennet_change_mtu(struct net_device *dev, int mtu)
  +{
  +   int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
  +
  +   if (mtu  max)
  +   return -EINVAL;
  +   dev-mtu = mtu;
  +   return 0;
  +}
 
 This seems odd to me: just because a device does TSO should we really
 allow huge mtu settings?  Herbert?

Actually this has nothing to do with TSO/GSO.  This driver inherently
supports arbitrary MTUs that's only limited by our network stack.  If
the physical network device can't handle it you'll just get an ICMP
error back or fragmentation.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-01 Thread Jeremy Fitzhardinge
Rusty Russell wrote:
 On Tue, 2007-05-01 at 17:08 -0700, Jeremy Fitzhardinge wrote:
   
 Add the Xen virtual network device driver.
 

 (Herbert: there's a question for you: grep for Herbert)

 OK, this is a remarkably non-trivial driver.  If the v0.1 of the driver
 had been in the kernel, I'm sure it would have been about 1/4 the size
 and far easier to review 8(

 Nonetheless, I have scoured the entire thing.  It's not actually too
 bad.
   

Lovely, thanks.  I'm at the double disadvantage of not really knowing
how the network stack works or precisely how Xen does networking, but
the homework exercises (comment this) will definitely help my education.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: netfront for review

2007-05-01 Thread Rusty Russell
On Wed, 2007-05-02 at 13:51 +1000, Herbert Xu wrote:
 On Wed, May 02, 2007 at 01:37:13PM +1000, Rusty Russell wrote:
 
   +static int xennet_change_mtu(struct net_device *dev, int mtu)
   +{
   + int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
 
  This seems odd to me: just because a device does TSO should we really
  allow huge mtu settings?  Herbert?
 
 Actually this has nothing to do with TSO/GSO.  This driver inherently
 supports arbitrary MTUs that's only limited by our network stack.  If
 the physical network device can't handle it you'll just get an ICMP
 error back or fragmentation.

Oops, I misread xennet_can_sg as a test for GSO.

Thanks for the clarification,
Rusty.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/