Re: [PATCH] [TCP]: Fix logic breakage due to DSACK separation

2007-06-15 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Sat, 16 Jun 2007 02:04:25 +0300 (EEST)

> On Fri, 15 Jun 2007, David Miller wrote:
> 
> > Patch applied.
> 
> ...I think it should go to stable as well.

I thought Baruch's DSACK seperation that created this problem didn't
go in until 2.6.22?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2/2] 2.6.22-rc4: known regressions v3

2007-06-15 Thread Linus Torvalds


On Wed, 13 Jun 2007, Michal Piotrowski wrote:
> 
> Subject: hibernate(?) fails totally - regression
> References : http://lkml.org/lkml/2007/6/1/401
> Submitter  : David Greaves <[EMAIL PROTECTED]>
> Handled-By : Rafael J. Wysocki <[EMAIL PROTECTED]>
> Caused-By  : Tejun Heo <[EMAIL PROTECTED]>
> commit 9666f4009c22f6520ac3fb8a19c9e32ab973e828
> Status : problem is being debugged

Ahh. This is fixed (fix by Tejun, confirmed by David), and the fix has 
been merged. It's commit bc90ba093a, in case anybody cares.

Linus
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [TCP]: Fix logic breakage due to DSACK separation

2007-06-15 Thread Ilpo Järvinen
On Fri, 15 Jun 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> Date: Fri, 15 Jun 2007 16:07:37 +0300 (EEST)
> 
> > Commit 6f74651ae626ec672028587bc700538076dfbefb is found guilty
> > of breaking DSACK counting, which should be done only for the
> > SACK block reported by the DSACK instead of every SACK block
> > that is received along with DSACK information.
> > 
> > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
> 
> Thanks for performing such rigorous audits and finding
> regressions like this!

They just come across, one thing leads to another, which again leads to 
another and so on... ...I was just trying to address the concerns you 
noted about tcp-2.6 patchset a while ago and came across other issues...

There are still some things I must think carefully in sacktag processing 
since it does not validate start_seq and end_seq at all which can be 
abused currently at least in tcp-2.6. ...I would rather put end to the 
whole russian roulette in tcp-2.6 sacktag rather than fix/think individual 
cases and leave future modifications of it similarily hazardous. It's not 
very clear to me how to handle all possible cases of invalid SACK blocks 
though, perhaps TCP should just ignore such sack blocks without doing 
any processing based on them, i.e., ignore them whenever start_seq-end_seq 
does not fully fit to snd_una-snd_nxt (expect DSACK of course, which 
should be processed if it's between undo_marker-snd_nxt). Do you have any 
thoughts about this?

It seems to me that net-2.6 is safe in this respect (probably just by 
accident) but the analysis wasn't that trivial, my main concern was 
tcp_fragment's pkt_len argument, if it ever becomes > skb->len, a boom 
would result (and perhaps zero has similar issues)! I couldn't find 
breakage analytically (but I could be wrong in it due to mind-boggling 
number of negations :-)) nor by bruteforcing seqno combinations near 0, 
skb->len and 2^31 wrap.


> Patch applied.

...I think it should go to stable as well.


-- 
 i.

Re: [RFC: 2.6 patch] net/rxrpc/ar-output.c: remove dead code

2007-06-15 Thread David Miller
From: Adrian Bunk <[EMAIL PROTECTED]>
Date: Fri, 15 Jun 2007 23:45:20 +0200

> This patch removes dead code spotted by the Coverity checker.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

I'll let David H. decide how to handle these cleanups
and simplifications.  Thanks Adrian.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] net/rxrpc/ar-connection.c: fix NULL dereference

2007-06-15 Thread David Miller
From: Adrian Bunk <[EMAIL PROTECTED]>
Date: Fri, 15 Jun 2007 23:45:13 +0200

> This patch fixes a NULL dereference spotted by the Coverity checker.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Looks good to me, patch applied, thanks Adrian.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [TCP]: Fix logic breakage due to DSACK separation

2007-06-15 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Fri, 15 Jun 2007 16:07:37 +0300 (EEST)

> Commit 6f74651ae626ec672028587bc700538076dfbefb is found guilty
> of breaking DSACK counting, which should be done only for the
> SACK block reported by the DSACK instead of every SACK block
> that is received along with DSACK information.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Thanks for performing such rigorous audits and finding
regressions like this!

Patch applied.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPV6] addrconf: Fix IPv6 on tuntap tunnels

2007-06-15 Thread David Miller
From: Vlad Yasevich <[EMAIL PROTECTED]>
Date: Fri, 15 Jun 2007 16:42:22 -0400

> No, the questions should really be:
> 
> 1.  Is IPV6 supported over this media type.
>   yes: got to 2
>   no:  stop
> 
> 2.  Is the device MTU >= IPV6_MIN_MTU
>   yes: continue
>   no:  stop
> 
> Autoconfiguration is a layer on top of IPv6.  Whether it's enabled
> or not should not dictate whether IPv6 addressed may be configured or not.

Sounds good to me, patches? :-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6] [TCP]: Congestion control API RTT sampling fix

2007-06-15 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 15 Jun 2007 07:32:55 -0700

> On Fri, 15 Jun 2007 12:30:12 +0300 (EEST)
> "Ilpo Järvinen" <[EMAIL PROTECTED]> wrote:
> 
> > Commit 164891aadf1721fca4dce473bb0e0998181537c6 broke RTT
> > sampling of congestion control modules. Inaccurate timestamps
> > could be fed to them without providing any way for them to
> > identify such cases. Previously RTT sampler was called only if
> > FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate
> > timestamps nicely. In addition, the new behavior could give an
> > invalid timestamp (zero) to RTT sampler if only skbs with
> > TCPCB_RETRANS were ACKed. This solves both problems.
> > 
> > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
> >
> Yes, let's get this in 2.6.22

Agreed, patch applied, thanks everyone.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.22-rc4 hibernate disables skge wol

2007-06-15 Thread Rafael J. Wysocki
On Friday, 15 June 2007 23:50, David Greaves wrote:
> I've started a new thread here since the old one got somewhat hijacked.
> 
> Rafael J. Wysocki wrote:
>  > On Friday, 1 June 2007 23:23, David Greaves wrote:
>  >> Not a regression though, it does it in 2.6.21
>  >>
>  >> If I cause the system to save state to disk then whilst off it no longer
>  >> responds to g-wol.
>  >
>  > Can you please try with the hibernation and suspend patch series from
>  >
>  > http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc3/patches/
>  >
>  > applied?
>  >
>  > Greetings,
>  > Rafael
>  >
>  >
> Now, IIRC, some time ago you asked me to try some patches :)
> 
> So I applied them (and Tejun's fixes as per the old thread) to rc4 (which 
> seems 
> to include a couple already).
> 
> Hibernate/resume works but although WOL works on an init 0, it doesn't work 
> on a 
> hibernated system :(

Well, I have one idea, but I'd like the recent paches currently in -mm to
settle down before trying it. ;-)

In the meantime, you may want to open a bugzilla entry related to this issue
(please add my address to the CC list if you do that).

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


2.6.22-rc4 hibernate disables skge wol

2007-06-15 Thread David Greaves

I've started a new thread here since the old one got somewhat hijacked.

Rafael J. Wysocki wrote:
> On Friday, 1 June 2007 23:23, David Greaves wrote:
>> Not a regression though, it does it in 2.6.21
>>
>> If I cause the system to save state to disk then whilst off it no longer
>> responds to g-wol.
>
> Can you please try with the hibernation and suspend patch series from
>
> http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc3/patches/
>
> applied?
>
> Greetings,
> Rafael
>
>
Now, IIRC, some time ago you asked me to try some patches :)

So I applied them (and Tejun's fixes as per the old thread) to rc4 (which seems 
to include a couple already).


Hibernate/resume works but although WOL works on an init 0, it doesn't work on a 
hibernated system :(


David

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC: 2.6 patch] net/rxrpc/ar-output.c: remove dead code

2007-06-15 Thread Adrian Bunk
This patch removes dead code spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 net/rxrpc/ar-output.c |   14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

--- linux-2.6.22-rc4-mm2/net/rxrpc/ar-output.c.old  2007-06-15 
21:38:02.0 +0200
+++ linux-2.6.22-rc4-mm2/net/rxrpc/ar-output.c  2007-06-15 21:38:47.0 
+0200
@@ -530,7 +530,7 @@ static int rxrpc_send_data(struct kiocb 
struct sock *sk = &rx->sk;
long timeo;
bool more;
-   int ret, ioc, segment, copied;
+   int ret, ioc, segment;
 
_enter(",,,{%zu},%zu", msg->msg_iovlen, len);
 
@@ -552,7 +552,6 @@ static int rxrpc_send_data(struct kiocb 
skb = call->tx_pending;
call->tx_pending = NULL;
 
-   copied = 0;
do {
int copy;
 
@@ -570,11 +569,11 @@ static int rxrpc_send_data(struct kiocb 
   call->acks_winsz) <= 0) {
ret = -EAGAIN;
if (msg->msg_flags & MSG_DONTWAIT)
-   goto maybe_error;
+   goto out;
ret = rxrpc_wait_for_tx_window(rx, call,
   &timeo);
if (ret < 0)
-   goto maybe_error;
+   goto out;
}
 
max = call->conn->trans->peer->maxdata;
@@ -596,7 +595,7 @@ static int rxrpc_send_data(struct kiocb 
skb = sock_alloc_send_skb(
sk, size, msg->msg_flags & MSG_DONTWAIT, &ret);
if (!skb)
-   goto maybe_error;
+   goto out;
 
rxrpc_new_skb(skb);
 
@@ -723,11 +722,6 @@ call_aborted:
_leave(" = %d", ret);
return ret;
 
-maybe_error:
-   if (copied)
-   ret = copied;
-   goto out;
-
 efault:
ret = -EFAULT;
goto out;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[2.6 patch] net/rxrpc/ar-connection.c: fix NULL dereference

2007-06-15 Thread Adrian Bunk
This patch fixes a NULL dereference spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 net/rxrpc/ar-connection.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.22-rc4-mm2/net/rxrpc/ar-connection.c.old  2007-06-15 
23:05:26.0 +0200
+++ linux-2.6.22-rc4-mm2/net/rxrpc/ar-connection.c  2007-06-15 
23:06:42.0 +0200
@@ -200,29 +200,29 @@ static struct rxrpc_connection *rxrpc_al
if (conn) {
INIT_WORK(&conn->processor, &rxrpc_process_connection);
INIT_LIST_HEAD(&conn->bundle_link);
conn->calls = RB_ROOT;
skb_queue_head_init(&conn->rx_queue);
rwlock_init(&conn->lock);
spin_lock_init(&conn->state_lock);
atomic_set(&conn->usage, 1);
conn->debug_id = atomic_inc_return(&rxrpc_debug_id);
conn->avail_calls = RXRPC_MAXCALLS;
conn->size_align = 4;
conn->header_size = sizeof(struct rxrpc_header);
}
 
-   _leave(" = %p{%d}", conn, conn->debug_id);
+   _leave(" = %p{%d}", conn, conn ? conn->debug_id : 0);
return conn;
 }
 
 /*
  * assign a connection ID to a connection and add it to the transport's
  * connection lookup tree
  * - called with transport client lock held
  */
 static void rxrpc_assign_connection_id(struct rxrpc_connection *conn)
 {
struct rxrpc_connection *xconn;
struct rb_node *parent, **p;
__be32 epoch;
u32 real_conn_id;

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPV6] addrconf: Fix IPv6 on tuntap tunnels

2007-06-15 Thread Vlad Yasevich
Oliver Hartkopp wrote:
> Hallo David, hello Herbert,
> 
> indeed i have some concerns about reverting the patch as i do not see
> that the MTU is the right thing to distinguish whether a netdevice is
> capable to have IPv4/IPv6. E.g. is decnet able to run IPv6?
> 
> IMHO the autoconf (in any case) should only handle netdevices that are
> capable to be auto configured (e.g. with IPv6).
> 
> So the question looks like:
> "Is this device capable to be auto configured with IPv6?"
> and not
> "Is the devices MTU >= IPV6_MIN_MTU ?"

No, the questions should really be:

1.  Is IPV6 supported over this media type.
yes: got to 2
no:  stop

2.  Is the device MTU >= IPV6_MIN_MTU
yes: continue
no:  stop

Autoconfiguration is a layer on top of IPv6.  Whether it's enabled
or not should not dictate whether IPv6 addressed may be configured or not.

-vlad
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: ps3: gigabit ethernet driver for PS3

2007-06-15 Thread Geoff Levand
Hi Dan.

Dan Williams wrote:
> On Fri, 2007-06-15 at 10:16 -0700, Geoff Levand wrote:
>> This all seems fine, but what is causing us trouble is that
>> during the review of the wireless part Dan Williams made many
>> non-trivial recommendations, and so we decided we need to re-work
> 
> sorry :)

Its not just because of you...  We found some other things lacking
and so we plan to do some other work on it.

-Geoff

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: ps3: gigabit ethernet driver for PS3

2007-06-15 Thread Dan Williams
On Fri, 2007-06-15 at 10:16 -0700, Geoff Levand wrote:
> Jeff Garzik wrote:
> > MOKUNO Masakazu wrote:
> >> +config GELIC_NET
> >> +  tristate "PS3 Gigabit Ethernet driver"
> >> +  depends on PPC_PS3
> >> +  help
> >> +This driver supports the Gigabit Ethernet device on the
> >> +PS3 game console.
> >> +
> >> +To compile this driver as a module, choose M here: the
> >> +module will be called ps3_gelic.
> >> +
> >>  config GIANFAR
> >>tristate "Gianfar Ethernet"
> >>depends on 85xx || 83xx || PPC_86xx
> >> --- a/drivers/net/Makefile
> >> +++ b/drivers/net/Makefile
> >> @@ -60,6 +60,8 @@ obj-$(CONFIG_TIGON3) += tg3.o
> >>  obj-$(CONFIG_BNX2) += bnx2.o
> >>  spidernet-y += spider_net.o spider_net_ethtool.o
> >>  obj-$(CONFIG_SPIDER_NET) += spidernet.o sungem_phy.o
> >> +obj-$(CONFIG_GELIC_NET) += ps3_gelic.o
> >> +ps3_gelic-objs += gelic_net.o
> >>  obj-$(CONFIG_TC35815) += tc35815.o
> >>  obj-$(CONFIG_SKGE) += skge.o
> >>  obj-$(CONFIG_SKY2) += sky2.o
> > 
> > How about ps3_gige for the driver name.  Ditto DaveM's comments about 
> > cleanups here.
> 
> Hi Jeff.  
> 
> Based on your comments and those from Dave, I feel I need to explain a
> little about our system and what we plan for the network support.
> 
> First about the name.  The name 'gelic' comes from the underlying
> device's name.  The name 'gelic' is already a well known in public
> to be the PS3's network device (try google), so I think ps3_gelic
> is a good name to use here.
> 
> Regarding the device, Mokuno-san already mentioned it, but this
> is a virtual device which supports both Ether and wireless, so
> it is more that just an Ethernet device (I think we can change
> the Kconfig text to be more accurate).
> 
> Now, about the driver.  We have arranged the driver so that
> the wireless support can be enabled by a config option.  So
> when CONFIG_GELIC_WIRELESS=n you get ps3_gelic.ko with just
> Ether support, and when CONFIG_GELIC_WIRELESS=y, you get
> ps3_gelic.ko with both Ether and wireless support.  So in
> the makefile we would like to have this:
> 
>   obj-$(CONFIG_GELIC_NET) += ps3_gelic.o
>   gelic-$(CONFIG_GELIC_WIRELESS) += gelic_wireless.o
>   ps3_gelic-objs += gelic_net.o $(gelic-y)
> 
> This all seems fine, but what is causing us trouble is that
> during the review of the wireless part Dan Williams made many
> non-trivial recommendations, and so we decided we need to re-work

sorry :)

> the wireless and it won't be ready for 2.6.23. So, to avoid having 
> to either change module names, or change file names when we submit
> the wireless part for 2.6.24, we would like to have this for 2.6.23:
> 
>   obj-$(CONFIG_GELIC_NET) += ps3_gelic.o
>   ps3_gelic-objs += gelic_net.o
> 
> Does it make sense?  If you have a better idea, please let us know.
> 
> -Geoff
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 4/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Zhu Han

On 6/15/07, Kieran Mansley <[EMAIL PROTECTED]> wrote:


The lock protects the use_count variable.  The use_count variable
prevents the plugin module unloading while it is being used.  I couldn't
just use the lock to prevent the module unloading as the hook function
(i) might block (and holding a spin_lock would be rather antisocial)
(ii) might call back into netfront and try to take the lock again, which
would deadlock.



If the hook routine blocks on the other code path instead of tx/rx
path,  why not use a simple atomic reference count. When the reference
count reachs zero, free it. Considering you can synchronzie on tx/rx
path, the free will not happen under the critical code path. So the
uninitialize work could be done inside the free routine even if it
blocks.


I think that RCU would only work in this situation if the hook functions
didn't block,.


I agree.



Kieran









--
best regards,
hanzhu
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/2] qdisc_restart - readability changes, one bug fix.

2007-06-15 Thread Waskiewicz Jr, Peter P
> - "if (ret == NETDEV_TX_LOCKED && lockless)" is buggy, and 
> the lockless
>   check should be removed, since driver will return 
> NETDEV_TX_LOCKED only
>   if lockless is true and driver has to do the locking. In 
> the original
>   code as well as the latest code, this code can result in a bug where
>   if LLTX is not set for a driver (lockless == 0) but the 
> driver is written
>   wrongly to do a trylock (despite LLTX being set), the driver returns
>   LOCKED. But since lockless is zero, the packet is requeue'd 
> instead of
>   calling collision code, issuing a warning and freeing up 
> the skb. This
>   skb will be retried with this driver next time, and the 
> same result will
>   ensue. Removing this check will catch these driver bugs 
> instead of hiding
>   the problem. I am keeping this change to readability section since :
>   a. it is confusing to check two things as it is; and
>   b. it is difficult to keep this check in the changed 
> 'switch' code.

I agree that the case shouldn't happen, and will only surface if the
driver is indeed buggy.  I've thought about this conditional being
removed for awhile, since it will protect against a poorly written
driver wrt locking, but then again a driver behaving like that shouldn't
be making it into the kernel.  That being said, out-of-tree drivers are
heavily used (we have an out-of-tree e1000), and something like this
could be missed.  But since that is the corner case of a crappy driver,
I'm ok with this being removed.

> 
> - Changed some names, like try_get_tx_pkt to dequeue_skb (as 
> that is the work
>   being done and easier to understand) and do_dev_requeue to 
> requeue_skb,

These, at a glance, look very similar to the qdisc ->enqueue() and
->dequeue() function pointers.  I know I had to look a few times to
realize they were separate calls through the qdisc and this new function
name.  Perhaps dequeue_skb() can become dev_dequeue_skb() and
requeue_skb() can be dev_requeue_skb()?  Something that helps
differentiate these from the function pointers of the qdisc_ops might
help distinguish what layer the operation is running on.

> +static inline int handle_dev_cpu_collision(struct sk_buff *skb,
> +struct net_device *dev,
> +struct Qdisc *q)
>  {
> - int ret = handle_dev_cpu_collision(dev);
> + int ret;
>  
> - if (ret == SCHED_TX_DROP) {
> + if (unlikely(dev->xmit_lock_owner == smp_processor_id())) {
> + /*
> +  * Same CPU holding the lock. It may be a transient
> +  * configuration error, when hard_start_xmit() 
> recurses. We
> +  * detect it by checking xmit owner and drop 
> the packet when
> +  * deadloop is detected. Return OK to try the next skb.
> +  */
>   kfree_skb(skb);
> - return qdisc_qlen(q);
> + if (net_ratelimit())
> + printk(KERN_DEBUG "Dead loop on netdevice %s, "
> +"fix it urgently!\n", dev->name);
> + ret = qdisc_qlen(q);
> + } else {
> + /*
> +  * Another cpu is holding lock, requeue & delay 
> xmits for
> +  * some time.
> +  */
> + __get_cpu_var(netdev_rx_stat).cpu_collision++;
> + ret = requeue_skb(skb, dev, q);
>   }
>  
> - return do_dev_requeue(skb, dev, q);
> + return ret;
>  }

I like the single return here.

>  static inline int qdisc_restart(struct net_device *dev)  {
>   struct Qdisc *q = dev->qdisc;
> - unsigned lockless = (dev->features & NETIF_F_LLTX);
>   struct sk_buff *skb;
> + unsigned lockless;
>   int ret;
>  
> - skb = try_get_tx_pkt(dev, q);
> - if (skb == NULL)
> + /* Dequeue packet */
> + if (unlikely((skb = dequeue_skb(dev, q)) == NULL))
>   return 0;

I know there's been discussion on the driver side of the world to stop
using unlikely() and likely() clauses.  I personally think it's ok in
situations like this where you have one corner-case conditional without
an else, but it's something to keep in mind when using them in
new/rewritten code like this.

>   if (!netif_queue_stopped(dev))
>   ret = dev_hard_start_xmit(skb, dev);

I thought you were removing this check of netif_queue_stopped(dev)?
Nevermind, I see the followup patch takes care of this.

> + switch (ret) {
> + case NETDEV_TX_OK:
> + /* Driver sent out skb successfully */
> + ret = qdisc_qlen(q);
> + break;
> +
> + case NETDEV_TX_LOCKED:
> + /* Driver try lock failed */
> + ret = handle_dev_cpu_collision(skb, dev, q);
> + break;
> +
> + default:
> + /* Driver returned NETDEV_TX_BUSY - requeue skb */
> + ret = requeue_skb(skb, dev, q);
> + break;
> + }

Ooo.  I like this; very clean and simpl

Re: [PATCH]: ps3: gigabit ethernet driver for PS3

2007-06-15 Thread Geoff Levand
Jeff Garzik wrote:
> MOKUNO Masakazu wrote:
>> +config GELIC_NET
>> +tristate "PS3 Gigabit Ethernet driver"
>> +depends on PPC_PS3
>> +help
>> +  This driver supports the Gigabit Ethernet device on the
>> +  PS3 game console.
>> +
>> +  To compile this driver as a module, choose M here: the
>> +  module will be called ps3_gelic.
>> +
>>  config GIANFAR
>>  tristate "Gianfar Ethernet"
>>  depends on 85xx || 83xx || PPC_86xx
>> --- a/drivers/net/Makefile
>> +++ b/drivers/net/Makefile
>> @@ -60,6 +60,8 @@ obj-$(CONFIG_TIGON3) += tg3.o
>>  obj-$(CONFIG_BNX2) += bnx2.o
>>  spidernet-y += spider_net.o spider_net_ethtool.o
>>  obj-$(CONFIG_SPIDER_NET) += spidernet.o sungem_phy.o
>> +obj-$(CONFIG_GELIC_NET) += ps3_gelic.o
>> +ps3_gelic-objs += gelic_net.o
>>  obj-$(CONFIG_TC35815) += tc35815.o
>>  obj-$(CONFIG_SKGE) += skge.o
>>  obj-$(CONFIG_SKY2) += sky2.o
> 
> How about ps3_gige for the driver name.  Ditto DaveM's comments about 
> cleanups here.

Hi Jeff.  

Based on your comments and those from Dave, I feel I need to explain a
little about our system and what we plan for the network support.

First about the name.  The name 'gelic' comes from the underlying
device's name.  The name 'gelic' is already a well known in public
to be the PS3's network device (try google), so I think ps3_gelic
is a good name to use here.

Regarding the device, Mokuno-san already mentioned it, but this
is a virtual device which supports both Ether and wireless, so
it is more that just an Ethernet device (I think we can change
the Kconfig text to be more accurate).

Now, about the driver.  We have arranged the driver so that
the wireless support can be enabled by a config option.  So
when CONFIG_GELIC_WIRELESS=n you get ps3_gelic.ko with just
Ether support, and when CONFIG_GELIC_WIRELESS=y, you get
ps3_gelic.ko with both Ether and wireless support.  So in
the makefile we would like to have this:

  obj-$(CONFIG_GELIC_NET) += ps3_gelic.o
  gelic-$(CONFIG_GELIC_WIRELESS) += gelic_wireless.o
  ps3_gelic-objs += gelic_net.o $(gelic-y)

This all seems fine, but what is causing us trouble is that
during the review of the wireless part Dan Williams made many
non-trivial recommendations, and so we decided we need to re-work
the wireless and it won't be ready for 2.6.23. So, to avoid having 
to either change module names, or change file names when we submit
the wireless part for 2.6.24, we would like to have this for 2.6.23:

  obj-$(CONFIG_GELIC_NET) += ps3_gelic.o
  ps3_gelic-objs += gelic_net.o

Does it make sense?  If you have a better idea, please let us know.

-Geoff

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 4/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Keir Fraser
On 15/6/07 17:22, "Kieran Mansley" <[EMAIL PROTECTED]> wrote:

> The lock protects the use_count variable.

Yes, that's one thing I noticed -- can you use atomic_t for reference counts
and hence reduce the number of times you need to lock/unlock? At least the
open-coded lock-decrement-test-maybe-free-unlock sequences could be
abstracted into a put_foo() function.

 -- Keir


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 4/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Kieran Mansley
On Fri, 2007-06-15 at 11:59 -0400, Zhu Han wrote:
> Hi, Kieran,
> 
> I'm just wonder why you try to acquire the lock and increase the
> hooks_usecount each time when you use the hook routine. Is there any
> generic ways to synchronze the code path using hook routines and
> netfront_accelerator_unloaded, considering you can synchronize the
> tx/rx data path easily.

The lock protects the use_count variable.  The use_count variable
prevents the plugin module unloading while it is being used.  I couldn't
just use the lock to prevent the module unloading as the hook function
(i) might block (and holding a spin_lock would be rather antisocial)
(ii) might call back into netfront and try to take the lock again, which
would deadlock.

The data path hooks do not block, and are already protected by locks, so
these are also taken when trying to unload the plugin module.  For this
reason it's not necessary to use the hooks_usecount on the data path.

I think that RCU would only work in this situation if the hook functions
didn't block, and wouldn't affect the data path locking overhead as it
wouldn't be necessary there.  

Kieran





-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 4/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Stephen Hemminger
On Fri, 15 Jun 2007 11:59:43 -0400
"Zhu Han" <[EMAIL PROTECTED]> wrote:

> Hi, Kieran,
> 
> I'm just wonder why you try to acquire the lock and increase the
> hooks_usecount each time when you use the hook routine. Is there any
> generic ways to synchronze the code path using hook routines and
> netfront_accelerator_unloaded, considering you can synchronize the
> 

Learn to use RCU for this. It would reduce the lock overhead.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 4/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Zhu Han

Hi, Kieran,

I'm just wonder why you try to acquire the lock and increase the
hooks_usecount each time when you use the hook routine. Is there any
generic ways to synchronze the code path using hook routines and
netfront_accelerator_unloaded, considering you can synchronize the
tx/rx data path easily.

On 6/15/07, Kieran Mansley <[EMAIL PROTECTED]> wrote:

Frontend net driver acceleration

diff -r cd3ade350f3f drivers/xen/netfront/netfront.c
--- a/drivers/xen/netfront/netfront.c   Thu Jun 14 15:04:32 2007 +0100
+++ b/drivers/xen/netfront/netfront.c   Fri Jun 15 09:34:41 2007 +0100
@@ -3,6 +3,7 @@
  *
  * Copyright (c) 2002-2005, K A Fraser
  * Copyright (c) 2005, XenSource Ltd
+ * Copyright (C) 2007 Solarflare Communications, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version 2
@@ -47,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -73,6 +75,8 @@ struct netfront_cb {
 };

 #define NETFRONT_SKB_CB(skb)   ((struct netfront_cb *)((skb)->cb))
+
+#include "netfront.h"

 /*
  * Mutually-exclusive module options to select receive data path:
@@ -144,57 +148,6 @@ static inline int netif_needs_gso(struct

 #define GRANT_INVALID_REF  0

-#define NET_TX_RING_SIZE __RING_SIZE((struct netif_tx_sring *)0, PAGE_SIZE)
-#define NET_RX_RING_SIZE __RING_SIZE((struct netif_rx_sring *)0, PAGE_SIZE)
-
-struct netfront_info {
-   struct list_head list;
-   struct net_device *netdev;
-
-   struct net_device_stats stats;
-
-   struct netif_tx_front_ring tx;
-   struct netif_rx_front_ring rx;
-
-   spinlock_t   tx_lock;
-   spinlock_t   rx_lock;
-
-   unsigned int irq;
-   unsigned int copying_receiver;
-   unsigned int carrier;
-
-   /* Receive-ring batched refills. */
-#define RX_MIN_TARGET 8
-#define RX_DFL_MIN_TARGET 64
-#define RX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256)
-   unsigned rx_min_target, rx_max_target, rx_target;
-   struct sk_buff_head rx_batch;
-
-   struct timer_list rx_refill_timer;
-
-   /*
-* {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs
-* is an index into a chain of free entries.
-*/
-   struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1];
-   struct sk_buff *rx_skbs[NET_RX_RING_SIZE];
-
-#define TX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256)
-   grant_ref_t gref_tx_head;
-   grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1];
-   grant_ref_t gref_rx_head;
-   grant_ref_t grant_rx_ref[NET_RX_RING_SIZE];
-
-   struct xenbus_device *xbdev;
-   int tx_ring_ref;
-   int rx_ring_ref;
-   u8 mac[ETH_ALEN];
-
-   unsigned long rx_pfn_array[NET_RX_RING_SIZE];
-   struct multicall_entry rx_mcl[NET_RX_RING_SIZE+1];
-   struct mmu_update rx_mmu[NET_RX_RING_SIZE];
-};
-
 struct netfront_rx_info {
struct netif_rx_response rx;
struct netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX - 1];
@@ -278,6 +231,369 @@ static void xennet_sysfs_delif(struct ne
 #define xennet_sysfs_delif(dev) do { } while(0)
 #endif

+/*
+ * List of all netfront accelerator plugin modules available.  Each
+ * list entry is of type struct netfront_accelerator.
+ */
+static struct list_head accelerators_list;
+/*
+ * Lock to protect access to accelerators_list, and also used to
+ * protect the hooks_usecount field in struct netfront_accelerator
+ * against concurrent access
+ */
+static spinlock_t accelerators_lock;
+
+/*
+ * Safely remove the accelerator function hooks from a netfront state.
+ * Must only be called when there are no current users of the hooks.
+ */
+static void accelerator_remove_hooks(struct netfront_accelerator *accelerator)
+{
+struct netfront_accel_vif_state *vif_state;
+
+list_for_each_entry( vif_state,
+ &accelerator->vif_states,
+ link ) {
+/* Make sure there are no data path operations going on */
+netif_poll_disable(vif_state->np->netdev);
+netif_tx_lock_bh(vif_state->np->netdev);
+
+/*
+ * Remove the hooks, but leave the vif_state on the
+ * accelerator's list as that signifies this vif is
+ * interested in using that accelerator if it becomes
+ * available again
+ */
+vif_state->hooks = NULL;
+
+netif_tx_unlock_bh(vif_state->np->netdev);
+netif_poll_enable(vif_state->np->netdev);
+}
+
+accelerator->hooks = NULL;
+
+/* Signal that all users of hooks are done */
+up(&accelerator->exit_semaphore);
+}
+
+
+/*
+ * Compare a frontend description string against an accelerator to see
+ * if they match.  Would ultimately be nice to replace the string with
+ * a unique numeric identifier for each accelerator.
+ */
+static int match_acceler

Re: 2.4.35-pre1: new e1000 driver breaks old hardware

2007-06-15 Thread Kok, Auke

Wolfgang Nothdurft wrote:

Hi,

with the new e1000 driver version 7.3.20 the onboard gigabit nic 82547EI
(8086:1019) doesn't work correctly.
After transferring about 100 megabytes over a gigabit link the transfer
stopped and I have to reinit the link either by doing a ifconfig down/up
or unplugging the network cable.
The RxIntDelay is set to 0 like described in the docu. Playing with this
Parameter only increases the amount of traffic to be send and the error
occurs later.


have you tried (the default) 8000 ? Do you get the same problem when you run 
2.6.21 with 7.5.5.1 or 7.3.20 ?



This happens also with the 7.4.35 and 7.5.5 driver from the intel side.
The driver 5.7.6 from kernel 2.4.33.3 and the 6.1.16 from intel works
very well on this hardware.
Also other gigabit nics we use didn't have this problem.
Is this issue already known?


this is the first time I have heard this issue.


Is there any solution yet?


I'm not that good :)

Can you file a bugreport on e1000.sf.net and attach the usual (ethtool -e, 
dmesg, ifconfig -a, lspci -vv) debugging output and problem description for us? 
I'll try to see if I can have our labs setup a repro case, which might be hard 
given the adapter type, but we will do our best.


Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-06-15 Thread Jeff Garzik

Milton Miller wrote:


On Jun 6, 2007, at 4:28 AM, Milton Miller wrote:



On Jun 5, 2007, at 9:26 PM, Kok, Auke wrote:


Kok, Auke wrote:


Hmm git-revert seems to do the job right. I checked it with git-show 
| patch -p1 -R and the results look OK. The two patches on top of 
the one we want to revert are unrelated enough to apply (manually it 
shows some fuzz, but otherwise it's OK).
Jeff, please `git-revert d52df4a35af569071fda3f4eb08e47cc7023f094` 
to revert the following patch for now:

---
commit  d52df4a35af569071fda3f4eb08e47cc7023f094
Author: Scott Feldman <[EMAIL PROTECTED]>
Date:   Wed Nov 9 02:18:52 2005 -0500
 [netdrvr e100] experiment with doing RX in a similar manner to 
eepro100
 I was going to say that eepro100's speedo_rx_link() does the 
same DMA
 abuse as e100, but then I noticed one little detail: eepro100 
sets  both
 EL (end of list) and S (suspend) bits in the RFD as it chains 
it  to the
 RFD list.  e100 was only setting the EL bit.  Hmmm, that's  
interesting.
 That means that if HW reads a RFD with the S-bit set,  it'll 
process
 that RFD and then suspend the receive unit.  The  receive unit 
will
 resume when SW clears the S-bit.  There is no need  for SW to 
restart
 the receive unit.  Which means a lot of the receive  unit state 
tracking

 code in the driver goes away.
 So here's a patch against 2.6.14.  (Sorry for inlining it; the 
mailer

 I'm using now will mess with the word wrap).  I can't test this on
 XScale (unless someone has an e100 module for Gumstix :) .  It 
should
 be doing exactly what eepro100 does with RFDs.  I don't believe 
this
 change will introduce a performance hit because the S-bit and 
EL-bit  go
 hand-in-hand meaning if we're going to suspend because of the 
S- bit,
 we're on the last resource anyway, so we'll have to wait for SW 
 to

 replenish.
 (cherry picked from 29e79da9495261119e3b2e4e7c72507348e75976 
commit)

---


A little bit more is needed to explain why we're reverting it for 
now. Jeff, please insert this into the revert commit.


Auke

--
This patch attempted to fix e100 for non-cache coherent memory 
architectures by using the cb style code that eepro100 had and using 
the EL and s bits from the RFD list. Unfortunately the hardware 
doesn't work exactly like this and therefore this patch actually 
breaks e100 on those systems.


on all systems.  (Both the &| typo and the removed restart logic).

Reverting the change brings it back to the previously known good 
state for 2.6.22. The pending rewrite in progress to this code can 
then be safely merged later.


Signed-off-by: Auke Kok <[EMAIL PROTECTED]>



Jeff I don't see this in netdev upstream-fixes.  Are you waiting on Auke 
to respond to this?


It should now be upstream, as of a day or two ago.

Sorry for the delaying, was travelling, and wound up accidentally losing 
Auke's commit description.


Jeff



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6] [TCP]: Congestion control API RTT sampling fix

2007-06-15 Thread Stephen Hemminger
On Fri, 15 Jun 2007 12:30:12 +0300 (EEST)
"Ilpo Järvinen" <[EMAIL PROTECTED]> wrote:

> Commit 164891aadf1721fca4dce473bb0e0998181537c6 broke RTT
> sampling of congestion control modules. Inaccurate timestamps
> could be fed to them without providing any way for them to
> identify such cases. Previously RTT sampler was called only if
> FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate
> timestamps nicely. In addition, the new behavior could give an
> invalid timestamp (zero) to RTT sampler if only skbs with
> TCPCB_RETRANS were ACKed. This solves both problems.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
>
Yes, let's get this in 2.6.22
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] network splice receive v2

2007-06-15 Thread Jens Axboe
On Fri, Jun 15 2007, Evgeniy Polyakov wrote:
> On Fri, Jun 15, 2007 at 11:34:30AM +0200, Jens Axboe ([EMAIL PROTECTED]) 
> wrote:
> > > Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages
> > > was 2 there was only 1 free slot in pipe_buffer.
> > 
> > Ah duh, indeed, it is a dumb bug indeed. This should work.
> 
> Yep, this one works too.

Great, thanks for testing!

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [TCP]: Fix logic breakage due to DSACK separation

2007-06-15 Thread Ilpo Järvinen
Commit 6f74651ae626ec672028587bc700538076dfbefb is found guilty
of breaking DSACK counting, which should be done only for the
SACK block reported by the DSACK instead of every SACK block
that is received along with DSACK information.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 net/ipv4/tcp_input.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d6d0f9b..c6a363b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -953,7 +953,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
int prior_fackets;
u32 lost_retrans = 0;
int flag = 0;
-   int dup_sack = 0;
+   int found_dup_sack = 0;
int cached_fack_count;
int i;
int first_sack_index;
@@ -964,20 +964,20 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
 
/* Check for D-SACK. */
if (before(ntohl(sp[0].start_seq), TCP_SKB_CB(ack_skb)->ack_seq)) {
-   dup_sack = 1;
+   found_dup_sack = 1;
tp->rx_opt.sack_ok |= 4;
NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV);
} else if (num_sacks > 1 &&
!after(ntohl(sp[0].end_seq), ntohl(sp[1].end_seq)) &&
!before(ntohl(sp[0].start_seq), 
ntohl(sp[1].start_seq))) {
-   dup_sack = 1;
+   found_dup_sack = 1;
tp->rx_opt.sack_ok |= 4;
NET_INC_STATS_BH(LINUX_MIB_TCPDSACKOFORECV);
}
 
/* D-SACK for already forgotten data...
 * Do dumb counting. */
-   if (dup_sack &&
+   if (found_dup_sack &&
!after(ntohl(sp[0].end_seq), prior_snd_una) &&
after(ntohl(sp[0].end_seq), tp->undo_marker))
tp->undo_retrans--;
@@ -1058,6 +1058,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
__u32 start_seq = ntohl(sp->start_seq);
__u32 end_seq = ntohl(sp->end_seq);
int fack_count;
+   int dup_sack = (found_dup_sack && (i == first_sack_index));
 
skb = cached_skb;
fack_count = cached_fack_count;
-- 
1.5.0.6


Re: [Xen-devel] [PATCH 0/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Keir Fraser



On 15/6/07 13:21, "Michael Buesch" <[EMAIL PROTECTED]> wrote:

>> True in general, but not the cases I've seen in this patchset, where 'foo'
>> is a predicate.
> 
> Ok, if foo is a variable containing an error code, it's
> better to return that error code. I assumed that foo is a
> variable containing some value (counter or something).

I mean specifically things like:
 if (be->accelerator == NULL)
return 1;
 else
return 0;

 K.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 0/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Michael Buesch
On Friday 15 June 2007 14:20:34 Keir Fraser wrote:
> 
> On 15/6/07 13:11, "Michael Buesch" <[EMAIL PROTECTED]> wrote:
> 
> >> No use of the following please:
> >> If (foo) return 1; else return 0;
> >> Is clearer as:
> >> Return foo;
> > 
> > But it's not the same.
> > return !!foo;
> > would be the same. And yes, it does matter:
> 
> True in general, but not the cases I've seen in this patchset, where 'foo'
> is a predicate.

Ok, if foo is a variable containing an error code, it's
better to return that error code. I assumed that foo is a
variable containing some value (counter or something).

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 0/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Keir Fraser



On 15/6/07 13:11, "Michael Buesch" <[EMAIL PROTECTED]> wrote:

>> No use of the following please:
>> If (foo) return 1; else return 0;
>> Is clearer as:
>> Return foo;
> 
> But it's not the same.
> return !!foo;
> would be the same. And yes, it does matter:

True in general, but not the cases I've seen in this patchset, where 'foo'
is a predicate.

 -- Keir


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 0/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Michael Buesch
On Friday 15 June 2007 13:26:03 Keir Fraser wrote:
> On 15/6/07 11:46, "Kieran Mansley" <[EMAIL PROTECTED]> wrote:
> 
> > This is a repost of some earlier patches to the xen-devel mailing list,
> > with a number of changes thanks to some useful suggestions from others.
> 
> The coding style needs fixing in various ways.
> 
> Hard tabs need to be used, no spaces inside brackets, but should include
> spaces between if/while/for and bracket, and bracket and brace:
> if (foo) {
> Not
> if( foo ){
> if(foo ) {
> Or various other possibilities.
> 
> No use of the following please:
> If (foo) return 1; else return 0;
> Is clearer as:
> Return foo;

But it's not the same.
return !!foo;
would be the same. And yes, it does matter:

int x(void)
{
unsigned long long v = 0xFF00ULL;
/*foo*/
return v;
}

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 0/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Keir Fraser
On 15/6/07 11:46, "Kieran Mansley" <[EMAIL PROTECTED]> wrote:

> This is a repost of some earlier patches to the xen-devel mailing list,
> with a number of changes thanks to some useful suggestions from others.

The coding style needs fixing in various ways.

Hard tabs need to be used, no spaces inside brackets, but should include
spaces between if/while/for and bracket, and bracket and brace:
if (foo) {
Not
if( foo ){
if(foo ) {
Or various other possibilities.

No use of the following please:
If (foo) return 1; else return 0;
Is clearer as:
Return foo;

There's a (good) idiom for error handling in Linux:
If (!error) {
page of code;
} else {
oh no, error, print and return;
}
Should be changed to:
If (error) {
handle error, bail if necessary
}
Page of code, no longer indented by error check;

Apart from the various coding style bits, accelerator_probe_vifs_on_load()
walks the vif_states list with no lock held. Is this safe against the list
changing?

 -- Keir


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH tcp-2.6 3/3] [TCP]: Move sack_ok access to obviously named funcs & cleanup

2007-06-15 Thread Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <[EMAIL PROTECTED]>

Previously code had IsReno/IsFack defined as macros that were
local to tcp_input.c though sack_ok field has users elsewhere
too for the same purpose. This changes them to static inlines as
preferred according the current coding style and unifies the
access to sack_ok across multiple files. Magic bitops of sack_ok
for FACK and DSACK are also abstracted to functions with
appropriate names.

Note:
- One sack_ok = 1 remains but that's self explanary, i.e., it
  enables sack
- Couple of !IsReno cases are changed to tcp_is_sack
- There were no users for IsDSack => I dropped it

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 include/net/tcp.h|   28 
 net/ipv4/tcp.c   |2 +-
 net/ipv4/tcp_input.c |  103 +-
 net/ipv4/tcp_minisocks.c |2 +-
 net/ipv4/tcp_output.c|6 +-
 net/ipv4/tcp_timer.c |2 +-
 6 files changed, 90 insertions(+), 53 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d12154a..dd9749e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -715,6 +715,34 @@ static inline void tcp_ca_event(struct sock *sk, const 
enum tcp_ca_event event)
icsk->icsk_ca_ops->cwnd_event(sk, event);
 }
 
+/* These functions determine how the current flow behaves in respect of SACK
+ * handling. SACK is negotiated with the peer, and therefore it can vary
+ * between different flows.
+ *
+ * tcp_is_sack - SACK enabled
+ * tcp_is_reno - No SACK
+ * tcp_is_fack - FACK enabled, implies SACK enabled
+ */
+static inline int tcp_is_sack(const struct tcp_sock *tp)
+{
+   return tp->rx_opt.sack_ok;
+}
+
+static inline int tcp_is_reno(const struct tcp_sock *tp)
+{
+   return !tcp_is_sack(tp);
+}
+
+static inline int tcp_is_fack(const struct tcp_sock *tp)
+{
+   return tp->rx_opt.sack_ok & 2;
+}
+
+static inline void tcp_enable_fack(struct tcp_sock *tp)
+{
+   tp->rx_opt.sack_ok |= 2;
+}
+
 /* This determines how many packets are "in the network" to the best
  * of our knowledge.  In many cases it is conservative, but where
  * detailed information is available from the receiver (via SACK
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bd4c295..e3de919 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2000,7 +2000,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 
if (tp->rx_opt.tstamp_ok)
info->tcpi_options |= TCPI_OPT_TIMESTAMPS;
-   if (tp->rx_opt.sack_ok)
+   if (tcp_is_sack(tp))
info->tcpi_options |= TCPI_OPT_SACK;
if (tp->rx_opt.wscale_ok) {
info->tcpi_options |= TCPI_OPT_WSCALE;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f3942c3..3efd4f5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -108,11 +108,6 @@ int sysctl_tcp_abc __read_mostly;
 #define FLAG_CA_ALERT  (FLAG_DATA_SACKED|FLAG_ECE)
 #define FLAG_FORWARD_PROGRESS  (FLAG_ACKED|FLAG_DATA_SACKED)
 
-#define IsReno(tp) ((tp)->rx_opt.sack_ok == 0)
-#define IsFack(tp) ((tp)->rx_opt.sack_ok & 2)
-#define IsDSack(tp) ((tp)->rx_opt.sack_ok & 4)
-#define Is3517Sack(tp) (!IsReno(tp) && !IsFack(tp))
-
 #define IsSackFrto() (sysctl_tcp_frto == 0x2)
 
 #define TCP_REMNANT (TCP_FLAG_FIN|TCP_FLAG_URG|TCP_FLAG_SYN|TCP_FLAG_PSH)
@@ -840,6 +835,21 @@ void tcp_enter_cwr(struct sock *sk, const int set_ssthresh)
}
 }
 
+/*
+ * Packet counting of FACK is based on in-order assumptions, therefore TCP
+ * disables it when reordering is detected
+ */
+static void tcp_disable_fack(struct tcp_sock *tp)
+{
+   tp->rx_opt.sack_ok &= ~2;
+}
+
+/* Take a notice that peer is sending DSACKs */
+static void tcp_dsack_seen(struct tcp_sock *tp)
+{
+   tp->rx_opt.sack_ok |= 4;
+}
+
 /* Initialize metrics on socket. */
 
 static void tcp_init_metrics(struct sock *sk)
@@ -861,7 +871,7 @@ static void tcp_init_metrics(struct sock *sk)
}
if (dst_metric(dst, RTAX_REORDERING) &&
tp->reordering != dst_metric(dst, RTAX_REORDERING)) {
-   tp->rx_opt.sack_ok &= ~2;
+   tcp_disable_fack(tp);
tp->reordering = dst_metric(dst, RTAX_REORDERING);
}
 
@@ -923,9 +933,9 @@ static void tcp_update_reordering(struct sock *sk, const 
int metric,
/* This exciting event is worth to be remembered. 8) */
if (ts)
NET_INC_STATS_BH(LINUX_MIB_TCPTSREORDER);
-   else if (IsReno(tp))
+   else if (tcp_is_reno(tp))
NET_INC_STATS_BH(LINUX_MIB_TCPRENOREORDER);
-   else if (IsFack(tp))
+   else if (tcp_is_fack(tp))
NET_INC_STATS_BH(LINUX_MIB_TCPFACKREORDER);
else
NET_INC_STATS_BH(LINUX_MIB_TCPSACKREORDER);
@@ -937,8 +947,7 @@ static void tcp_update_reordering(struct sock *sk, const 
int metric,
   tp->s

[PATCH tcp-2.6 0/3]: cleanups

2007-06-15 Thread Ilpo Järvinen
Couple of relatively simple cleanups to tcp-2.6.

-- 
 i.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH tcp-2.6 1/3] [TCP]: Add tcp_dec_pcount_approx int variant

2007-06-15 Thread Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <[EMAIL PROTECTED]>

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 include/net/tcp.h |   11 ---
 net/ipv4/tcp_output.c |8 ++--
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6f88d13..d12154a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -593,16 +593,21 @@ static inline int tcp_skb_mss(const struct sk_buff *skb)
return skb_shinfo(skb)->gso_size;
 }
 
-static inline void tcp_dec_pcount_approx(__u32 *count,
-const struct sk_buff *skb)
+static inline void tcp_dec_pcount_approx_int(__u32 *count, const int decr)
 {
if (*count) {
-   *count -= tcp_skb_pcount(skb);
+   *count -= decr;
if ((int)*count < 0)
*count = 0;
}
 }
 
+static inline void tcp_dec_pcount_approx(__u32 *count,
+const struct sk_buff *skb)
+{
+   tcp_dec_pcount_approx_int(count, tcp_skb_pcount(skb));
+}
+
 static inline void tcp_packets_out_inc(struct sock *sk,
   const struct sk_buff *skb)
 {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 23ee283..c8be1c8 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -738,15 +738,11 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, 
u32 len, unsigned int mss
if (diff > 0) {
/* Adjust Reno SACK estimate. */
if (!tp->rx_opt.sack_ok) {
-   tp->sacked_out -= diff;
-   if ((int)tp->sacked_out < 0)
-   tp->sacked_out = 0;
+   tcp_dec_pcount_approx_int(&tp->sacked_out, 
diff);
tcp_sync_left_out(tp);
}
 
-   tp->fackets_out -= diff;
-   if ((int)tp->fackets_out < 0)
-   tp->fackets_out = 0;
+   tcp_dec_pcount_approx_int(&tp->fackets_out, diff);
}
}
 
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH tcp-2.6 2/3] [TCP]: Remove num_acked>0 checks from cong.ctrl mods pkts_acked

2007-06-15 Thread Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <[EMAIL PROTECTED]>

There is no need for such check in pkts_acked because the
callback is not invoked unless at least one segment got fully
ACKed (i.e., the snd_una moved past skb's end_seq) by the
cumulative ACK's snd_una advancement.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 net/ipv4/tcp_bic.c  |2 +-
 net/ipv4/tcp_cubic.c|2 +-
 net/ipv4/tcp_westwood.c |3 +--
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c
index 281c9f9..9ebd8ec 100644
--- a/net/ipv4/tcp_bic.c
+++ b/net/ipv4/tcp_bic.c
@@ -210,7 +210,7 @@ static void bictcp_acked(struct sock *sk, u32 cnt, ktime_t 
last)
 {
const struct inet_connection_sock *icsk = inet_csk(sk);
 
-   if (cnt > 0 && icsk->icsk_ca_state == TCP_CA_Open) {
+   if (icsk->icsk_ca_state == TCP_CA_Open) {
struct bictcp *ca = inet_csk_ca(sk);
cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT;
ca->delayed_ack += cnt;
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 1422448..8be8ac4 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -338,7 +338,7 @@ static void bictcp_acked(struct sock *sk, u32 cnt, ktime_t 
last)
 {
const struct inet_connection_sock *icsk = inet_csk(sk);
 
-   if (cnt > 0 && icsk->icsk_ca_state == TCP_CA_Open) {
+   if (icsk->icsk_ca_state == TCP_CA_Open) {
struct bictcp *ca = inet_csk_ca(sk);
cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT;
ca->delayed_ack += cnt;
diff --git a/net/ipv4/tcp_westwood.c b/net/ipv4/tcp_westwood.c
index e61e09d..0d56a5a 100644
--- a/net/ipv4/tcp_westwood.c
+++ b/net/ipv4/tcp_westwood.c
@@ -103,8 +103,7 @@ static void westwood_filter(struct westwood *w, u32 delta)
 static void tcp_westwood_pkts_acked(struct sock *sk, u32 cnt, ktime_t last)
 {
struct westwood *w = inet_csk_ca(sk);
-   if (cnt > 0)
-   w->rtt = tcp_sk(sk)->srtt >> 3;
+   w->rtt = tcp_sk(sk)->srtt >> 3;
 }
 
 /*
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-15 Thread jamal
Hello Yi,

On Fri, 2007-15-06 at 09:27 +0800, Zhu Yi wrote:

> 1. driver becomes complicated (as it is too elaborate in the queue
> wakeup strategies design)

I am not sure i see the complexity in the wireless driver's wakeup
strategy. I just gave some suggestions to use management frames - they
dont have to be literally that way.

> 2. duplicated code among drivers (otherwise you put all the queue
> management logics in a new layer?)

There will be some shared code on drivers of same media on the
netif_stop/wake strategy perhaps, but not related to queue management. 

> 3. it's not 100% accurate. there has to be some overhead, more or less
> depends on the queue wakeup strategy the driver selected.

Why is it not accurate for wireless? I can see the corner case Patrick
mentioned in wired ethernet but then wired ethernet doesnt have other
events such as management frames (actually DCE does) to help. 

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Kieran Mansley
Backend net driver acceleration

Signed-off-by: Kieran Mansley <[EMAIL PROTECTED]>

diff -r fd61ea65cba3 drivers/xen/netback/common.h
--- a/drivers/xen/netback/common.h  Thu Jun 14 14:51:20 2007 +0100
+++ b/drivers/xen/netback/common.h  Thu Jun 14 14:56:41 2007 +0100
@@ -122,6 +122,41 @@ enum {
 
 extern int netbk_copy_skb_mode;
 
+
+#include 
+
+/* Function pointers into netback accelerator plugin modules */
+struct netback_accel_hooks {
+int  (*probe)(struct xenbus_device *dev);
+int (*remove)(struct xenbus_device *dev);
+};
+
+/* Structure to track the state of a netback accelerator plugin */
+struct netback_accelerator {
+struct list_head link;
+int id;
+char *frontend;
+struct netback_accel_hooks *hooks;
+};
+
+/* Connect an accelerator plugin module to netback */
+extern void netback_connect_accelerator(int id, const char *frontend, 
+struct netback_accel_hooks *hooks);
+/* Disconnect a previously connected accelerator pluging module */
+extern void netback_disconnect_accelerator(int id, const char *frontend);
+
+struct backend_info {
+   struct xenbus_device *dev;
+   netif_t *netif;
+   enum xenbus_state frontend_state;
+
+/* State relating to the netback accelerator */
+void *netback_accel_priv;
+/* The accelerator that this backend is currently using */
+struct netback_accelerator *accelerator;
+};
+
+
 #define NET_TX_RING_SIZE __RING_SIZE((netif_tx_sring_t *)0, PAGE_SIZE)
 #define NET_RX_RING_SIZE __RING_SIZE((netif_rx_sring_t *)0, PAGE_SIZE)
 
diff -r fd61ea65cba3 drivers/xen/netback/xenbus.c
--- a/drivers/xen/netback/xenbus.c  Thu Jun 14 14:51:20 2007 +0100
+++ b/drivers/xen/netback/xenbus.c  Thu Jun 14 14:56:41 2007 +0100
@@ -1,6 +1,7 @@
 /*  Xenbus code for netif backend
 Copyright (C) 2005 Rusty Russell <[EMAIL PROTECTED]>
 Copyright (C) 2005 XenSource Ltd
+Copyright (c) 2007 Solarflare Communications, Inc.
 
 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
@@ -28,11 +29,126 @@
 printk("netback/xenbus (%s:%d) " fmt ".\n", __FUNCTION__, __LINE__, ##args)
 #endif
 
-struct backend_info {
-   struct xenbus_device *dev;
-   netif_t *netif;
-   enum xenbus_state frontend_state;
-};
+/* 
+ * A list of available netback accelerator plugin modules (each list
+ * entry is of type struct netback_accelerator) 
+ */ 
+static struct list_head accelerators_list;
+/* Lock used to protect access to accelerators_list */
+static spinlock_t accelerators_lock;
+
+/* 
+ * Compare a backend to an accelerator, and decide if they are
+ * compatible (i.e. if the accelerator should be used by the
+ * backend) 
+ */
+static int match_accelerator(struct backend_info *be, 
+ struct netback_accelerator *accelerator)
+{
+/*
+ * This could do with being more sophisticated.  For example,
+ * determine which hardware is being used by each backend from
+ * the bridge and network topology of the domain
+ */
+if ( be->accelerator == NULL )
+return 1;
+else
+return 0;
+}
+
+/*
+ * Notify all suitable backends that a new accelerator is available
+ * and connected.  This will also notify the accelerator plugin module
+ * that it is being used for a device through the probe hook.
+ */
+static int netback_accelerator_tell_backend(struct device *dev, void *arg)
+{
+struct netback_accelerator *accelerator = 
+(struct netback_accelerator *)arg;
+struct xenbus_device *xendev = to_xenbus_device(dev);
+
+if( !strcmp("vif", xendev->devicetype) ) {
+struct backend_info *be = xendev->dev.driver_data;
+
+if ( match_accelerator(be, accelerator) ) {
+ be->accelerator = accelerator;
+ be->accelerator->hooks->probe(xendev);
+}
+}
+return 0;
+}
+
+
+/*
+ * Entry point for an netback accelerator plugin module.  Called to
+ * advertise its presence, and connect to any suitable backends.
+ */
+void netback_connect_accelerator(int id, const char *frontend, 
+ struct netback_accel_hooks *hooks)
+{
+struct netback_accelerator *new_accelerator = 
+kmalloc(sizeof(struct netback_accelerator), GFP_KERNEL);
+unsigned frontend_len, flags;
+
+if ( new_accelerator ) {
+new_accelerator->id = id;
+
+frontend_len = strlen(frontend)+1;
+new_accelerator->frontend = kmalloc(frontend_len, GFP_KERNEL);
+if ( !new_accelerator->frontend ) {
+DPRINTK("%s: failed to allocate memory for frontend 
string\n",
+__FUNCTION__);
+

[PATCH 2/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Kieran Mansley
Add accel option to vif xend config

Signed-off-by: Kieran Mansley <[EMAIL PROTECTED]>

diff -r 405eb3e22887 tools/python/xen/xend/server/netif.py
--- a/tools/python/xen/xend/server/netif.py Thu Jun 14 14:50:04 2007 +0100
+++ b/tools/python/xen/xend/server/netif.py Thu Jun 14 14:52:55 2007 +0100
@@ -107,6 +107,7 @@ class NetifController(DevController):
 uuid= config.get('uuid')
 ipaddr  = config.get('ip')
 model   = config.get('model')
+accel   = config.get('accel')
 
 if not typ:
 typ = xoptions.netback_type
@@ -131,6 +132,8 @@ class NetifController(DevController):
 back['uuid'] = uuid
 if model:
 back['model'] = model
+if accel:
+back['accel'] = accel
 
 config_path = "device/%s/%d/" % (self.deviceClass, devid)
 for x in back:
@@ -157,10 +160,10 @@ class NetifController(DevController):
 config_path = "device/%s/%d/" % (self.deviceClass, devid)
 devinfo = ()
 for x in ( 'script', 'ip', 'bridge', 'mac',
-   'type', 'vifname', 'rate', 'uuid', 'model' ):
+   'type', 'vifname', 'rate', 'uuid', 'model', 'accel'):
 y = self.vm._readVm(config_path + x)
 devinfo += (y,)
-(script, ip, bridge, mac, typ, vifname, rate, uuid, model) = devinfo
+(script, ip, bridge, mac, typ, vifname, rate, uuid, model, accel) = 
devinfo
 
 if script:
 result['script'] = script
@@ -180,5 +183,7 @@ class NetifController(DevController):
 result['uuid'] = uuid
 if model:
 result['model'] = model
+if accel:
+result['accel'] = accel
 
 return result
diff -r 405eb3e22887 tools/python/xen/xm/create.py
--- a/tools/python/xen/xm/create.py Thu Jun 14 14:50:04 2007 +0100
+++ b/tools/python/xen/xm/create.py Thu Jun 14 14:52:55 2007 +0100
@@ -710,7 +710,7 @@ def configure_vifs(config_devs, vals):
 
 def f(k):
 if k not in ['backend', 'bridge', 'ip', 'mac', 'script', 'type',
- 'vifname', 'rate', 'model']:
+ 'vifname', 'rate', 'model', 'accel']:
 err('Invalid vif option: ' + k)
 
 config_vif.append([k, d[k]])

Add accel option to vif xend config

diff -r 405eb3e22887 tools/python/xen/xend/server/netif.py
--- a/tools/python/xen/xend/server/netif.py Thu Jun 14 14:50:04 2007 +0100
+++ b/tools/python/xen/xend/server/netif.py Thu Jun 14 14:52:55 2007 +0100
@@ -107,6 +107,7 @@ class NetifController(DevController):
 uuid= config.get('uuid')
 ipaddr  = config.get('ip')
 model   = config.get('model')
+accel   = config.get('accel')
 
 if not typ:
 typ = xoptions.netback_type
@@ -131,6 +132,8 @@ class NetifController(DevController):
 back['uuid'] = uuid
 if model:
 back['model'] = model
+if accel:
+back['accel'] = accel
 
 config_path = "device/%s/%d/" % (self.deviceClass, devid)
 for x in back:
@@ -157,10 +160,10 @@ class NetifController(DevController):
 config_path = "device/%s/%d/" % (self.deviceClass, devid)
 devinfo = ()
 for x in ( 'script', 'ip', 'bridge', 'mac',
-   'type', 'vifname', 'rate', 'uuid', 'model' ):
+   'type', 'vifname', 'rate', 'uuid', 'model', 'accel'):
 y = self.vm._readVm(config_path + x)
 devinfo += (y,)
-(script, ip, bridge, mac, typ, vifname, rate, uuid, model) = devinfo
+(script, ip, bridge, mac, typ, vifname, rate, uuid, model, accel) = 
devinfo
 
 if script:
 result['script'] = script
@@ -180,5 +183,7 @@ class NetifController(DevController):
 result['uuid'] = uuid
 if model:
 result['model'] = model
+if accel:
+result['accel'] = accel
 
 return result
diff -r 405eb3e22887 tools/python/xen/xm/create.py
--- a/tools/python/xen/xm/create.py Thu Jun 14 14:50:04 2007 +0100
+++ b/tools/python/xen/xm/create.py Thu Jun 14 14:52:55 2007 +0100
@@ -710,7 +710,7 @@ def configure_vifs(config_devs, vals):
 
 def f(k):
 if k not in ['backend', 'bridge', 'ip', 'mac', 'script', 'type',
- 'vifname', 'rate', 'model']:
+ 'vifname', 'rate', 'model', 'accel']:
 err('Invalid vif option: ' + k)
 
 config_vif.append([k, d[k]])


[PATCH 0/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Kieran Mansley
This is a repost of some earlier patches to the xen-devel mailing list,
with a number of changes thanks to some useful suggestions from others.

I've also CC'd netdev@vger.kernel.org at Herbert Xu's request as some of
the files being patched may be merged into upstream linux soon, and so
folks there may have
opinions too.

Changes from last time:
 - Improved the link between frontend and accelerated plugin so that
netif_wake_queue is only called when both have available TX slots.
Thanks to Herbert Xu for pointing this out.
 - Removed macro that safely called the accelerated hooks; this code is
now inline in the relevant places. 
 - Some small cleanup modifications, and resync patches to latest xen-
unstable.hg, including source tree layout changes for linux-2.6.18-
xen.hg

The discussion following the previous posting of these patches was
mainly concerned with the approach to locking in the frontend.  Some
felt it was too complex and suggested using RCU instead.  However, I
have been unable to convince myself that RCU would offer suitable
protection against the accelerated plugin module unloading in the middle
of a hook call, particularly where these hook calls might block.  The
existing locking has been well tested and is therefore known to work.
In addition, the complexity is largely to ensure there is low locking
overhead on the data path, and so a change to simplify it would not
likely increase performance.  As a result the locking has not changed
substantially since the last patch.  However if others feel strongly
about this and can convince me that RCU would be adequate, I wouldn't
object to making this change.

What follows is the text from the original post providing some
background on the patches:

This set of patches provides the hooks and support necessary for
accelerated network plugin modules to attach to Xen's netback and
netfront.  These modules provide a fast path for network traffic where
there is hardware support available for the netfront driver to send and
receive packets directly to a NIC (such as those available from
Solarflare).

As there are currently no available plugins, I've attached a couple of
dummy ones to illustrate how the hooks could be used.  These are
incomplete (and clearly wouldn't even compile) in that they only include
code to show the interface between the accelerated module and
netfront/netback.  A lot of the comments hint at what code should go
where.  They don't show any interface between the accelerated frontend
and accelerated backend, or hardware access, for example, as those would
both be specific to the implementation.  I hope they help illustrate
this, but if you have any questions I'm happy to provide more
information.

A brief overview of the operation of the plugins:  When the accelerated
modules are loaded, a VI is created by the accelerated backend to allow
the accelerated frontend to safely access portions of the NIC.  For RX,
when packets are received by the accelerated backend, it will examine
them and if appropriate insert filters into the NIC to deliver future
packets on that address directly to the accelerated frontend's VI.  For
TX, netfront gives each accelerated frontend the option of sending each
packet, which it can accept (if it wants to send it directly to the
hardware) or decline (if it thinks this is more appropriate to send via
the normal network path).

We have found that using this approach to accelerating network traffic,
domU to domU connections (across the network) can achieve close to the
performance of dom0 to dom0 connections on a 10Gbps ethernet.  This is
roughly double the bandwidth seen with unmodified Xen.

/***/
/*! \file dumm_accel_backend.c Dummy accelerated plugin module

Copyright 2006 Solarflare Communications Inc,
   9501 Jeronimo Road, Suite 250,
   Irvine, CA 92618, USA

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License version 2 as published by the Free
Software Foundation, incorporated herein by reference.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

*/
/***/

static struct netback_accel_hooks accel_hooks = {
&netback_accel_probe,
&netback_accel_remove
};

static const char *frontend_name = "dummynetaccel";

static int netback_accel_init(void)
{
/* Initialise the rest of the module... */

/* Tell the netback that we're here */
netback_connect_accelerator(

[PATCH 1/4] [Net] Support accelerated network plugin modules

2007-06-15 Thread Kieran Mansley
Add xenbus_for_each_[back,front]end functions to iterate each bus

Signed-off-by: Kieran Mansley <[EMAIL PROTECTED]> 

diff -r d5e0eb7dd069 drivers/xen/xenbus/xenbus_probe.c
--- a/drivers/xen/xenbus/xenbus_probe.c Sun Jun 10 19:50:32 2007 +0100
+++ b/drivers/xen/xenbus/xenbus_probe.c Thu Jun 14 14:49:40 2007 +0100
@@ -4,6 +4,7 @@
  * Copyright (C) 2005 Rusty Russell, IBM Corporation
  * Copyright (C) 2005 Mike Wray, Hewlett-Packard
  * Copyright (C) 2005, 2006 XenSource Ltd
+ * Copyright (C) 2007 Solarflare Communications, Inc.
  * 
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version 2
@@ -1085,3 +1086,10 @@ static int __init boot_wait_for_devices(
 
 late_initcall(boot_wait_for_devices);
 #endif
+
+
+int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *))
+{
+return bus_for_each_dev(&xenbus_frontend.bus, NULL, arg, fn);
+}
+EXPORT_SYMBOL_GPL(xenbus_for_each_frontend);
diff -r d5e0eb7dd069 drivers/xen/xenbus/xenbus_probe_backend.c
--- a/drivers/xen/xenbus/xenbus_probe_backend.c Sun Jun 10 19:50:32 2007 +0100
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c Thu Jun 14 14:49:40 2007 +0100
@@ -4,6 +4,7 @@
  * Copyright (C) 2005 Rusty Russell, IBM Corporation
  * Copyright (C) 2005 Mike Wray, Hewlett-Packard
  * Copyright (C) 2005, 2006 XenSource Ltd
+ * Copyright (C) 2007 Solarflare Communications, Inc.
  * 
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version 2
@@ -285,3 +286,10 @@ void xenbus_backend_device_register(void
   xenbus_backend.error);
}
 }
+
+int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *))
+{
+return bus_for_each_dev(&xenbus_backend.bus, NULL, arg, fn);
+}
+EXPORT_SYMBOL_GPL(xenbus_for_each_backend);
+
diff -r d5e0eb7dd069 include/xen/xenbus.h
--- a/include/xen/xenbus.h  Sun Jun 10 19:50:32 2007 +0100
+++ b/include/xen/xenbus.h  Thu Jun 14 14:49:40 2007 +0100
@@ -299,4 +299,7 @@ int xenbus_dev_is_online(struct xenbus_d
 int xenbus_dev_is_online(struct xenbus_device *dev);
 int xenbus_frontend_closed(struct xenbus_device *dev);
 
+extern int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void 
*));
+extern int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void 
*));
+
 #endif /* _XEN_XENBUS_H */

Add xenbus_for_each_[back,front]end functions to iterate each bus

diff -r d5e0eb7dd069 drivers/xen/xenbus/xenbus_probe.c
--- a/drivers/xen/xenbus/xenbus_probe.c Sun Jun 10 19:50:32 2007 +0100
+++ b/drivers/xen/xenbus/xenbus_probe.c Thu Jun 14 14:49:40 2007 +0100
@@ -4,6 +4,7 @@
  * Copyright (C) 2005 Rusty Russell, IBM Corporation
  * Copyright (C) 2005 Mike Wray, Hewlett-Packard
  * Copyright (C) 2005, 2006 XenSource Ltd
+ * Copyright (C) 2007 Solarflare Communications, Inc.
  * 
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version 2
@@ -1085,3 +1086,10 @@ static int __init boot_wait_for_devices(
 
 late_initcall(boot_wait_for_devices);
 #endif
+
+
+int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *))
+{
+return bus_for_each_dev(&xenbus_frontend.bus, NULL, arg, fn);
+}
+EXPORT_SYMBOL_GPL(xenbus_for_each_frontend);
diff -r d5e0eb7dd069 drivers/xen/xenbus/xenbus_probe_backend.c
--- a/drivers/xen/xenbus/xenbus_probe_backend.c Sun Jun 10 19:50:32 2007 +0100
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c Thu Jun 14 14:49:40 2007 +0100
@@ -4,6 +4,7 @@
  * Copyright (C) 2005 Rusty Russell, IBM Corporation
  * Copyright (C) 2005 Mike Wray, Hewlett-Packard
  * Copyright (C) 2005, 2006 XenSource Ltd
+ * Copyright (C) 2007 Solarflare Communications, Inc.
  * 
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version 2
@@ -285,3 +286,10 @@ void xenbus_backend_device_register(void
   xenbus_backend.error);
}
 }
+
+int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *))
+{
+return bus_for_each_dev(&xenbus_backend.bus, NULL, arg, fn);
+}
+EXPORT_SYMBOL_GPL(xenbus_for_each_backend);
+
diff -r d5e0eb7dd069 include/xen/xenbus.h
--- a/include/xen/xenbus.h  Sun Jun 10 19:50:32 2007 +0100
+++ b/include/xen/xenbus.h  Thu Jun 14 14:49:40 2007 +0100
@@ -299,4 +299,7 @@ int xenbus_dev_is_online(struct xenbus_d
 int xenbus_dev_is_online(struct xenbus_device *dev);
 int xenbus_frontend_closed(struct xenbus_device *dev);
 
+extern int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void 
*));
+extern int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void 
*));
+
 #endif /* _XEN_XENBUS_H */


Re: Fwd: [PATCH] [-mm] ACPI: export ACPI events via netlink

2007-06-15 Thread jamal
On Fri, 2007-15-06 at 09:01 +0800, Zhang Rui wrote:

> > rtnl_open_byproto(&rth, nl_mgrp(mydiscoveredacpiid), NETLINK_GENERIC)
> > 
> Yes. It doesn't work if I use my group id here.
> In fact, I'm using rtnl_open_byproto(&rth, 1, NETLINK_GENERIC) now.
> That's why I said that this demo receives all the broadcasted genetlink
> messages.

Ok, tell me how to generate these ACPI events and i will patch my laptop
and test it. What kernel compile options do i need? 2.6.24-rc4 will do?

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] network splice receive v2

2007-06-15 Thread Evgeniy Polyakov
On Fri, Jun 15, 2007 at 11:34:30AM +0200, Jens Axboe ([EMAIL PROTECTED]) wrote:
> > Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages
> > was 2 there was only 1 free slot in pipe_buffer.
> 
> Ah duh, indeed, it is a dumb bug indeed. This should work.

Yep, this one works too.

> diff --git a/fs/splice.c b/fs/splice.c
> index 89871c6..5327cbf 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -172,6 +172,7 @@ static const struct pipe_buf_operations 
> user_page_pipe_buf_ops = {
>  ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
>  struct splice_pipe_desc *spd)
>  {
> + unsigned int spd_pages = spd->nr_pages;
>   int ret, do_wakeup, page_nr;
>  
>   ret = 0;
> @@ -252,7 +253,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
>   }
>   }
>  
> - while (page_nr < spd->nr_pages)
> + while (page_nr < spd_pages)
>   spd->spd_release(spd, page_nr++);
>  
>   return ret;
> 
> -- 
> Jens Axboe

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] network splice receive v2

2007-06-15 Thread Jens Axboe
On Fri, Jun 15 2007, Evgeniy Polyakov wrote:
> On Fri, Jun 15, 2007 at 10:43:18AM +0200, Jens Axboe ([EMAIL PROTECTED]) 
> wrote:
> > > So, things turned down to be not in the __splice_from_pipe(), but
> > > splice_to_pipe(). Attached patch fixes a leak for me.
> > > It was tested with different data files and all were received correctly.
> > > 
> > > Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>
> > > 
> > > diff --git a/fs/splice.c b/fs/splice.c
> > > index bc481f1..365bfd9 100644
> > > --- a/fs/splice.c
> > > +++ b/fs/splice.c
> > > @@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> > >   break;
> > >   if (pipe->nrbufs < PIPE_BUFFERS)
> > >   continue;
> > > -
> > > - break;
> > >   }
> > >  
> > >   if (spd->flags & SPLICE_F_NONBLOCK) {
> > > 
> > 
> > Hmm, curious. If we hit that location, then two conditions are true:
> > 
> > - Pipe is full
> > - We transferred some data
> 
> Yep.
> 
> > if you remove the break, then you'll end up blocking in pipe_wait()
> > (unless you have SPLICE_F_NONBLOCK also set). And we don't want to block
> > waiting for more room, if we already transferred some data. In that case
> > we just want to return a short splice. Looking at pipe_write(), it'll
> > block as well though. Just doesn't seem optimal to me, but...
> > 
> > So the question is why would doing the break there cause a leak? I just
> > don't yet see how it can happen, I'd love to fix that condition instead.
> > For the case you describe, we should have page_nr == 1 and spd->nr_pages
> > == 2. Is the:
> > 
> > while (page_nr < spd->nr_pages)
> > spd->spd_release(spd, page_nr++);
> > 
> > not dropping the right reference?
> 
> Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages
> was 2 there was only 1 free slot in pipe_buffer.

Ah duh, indeed, it is a dumb bug indeed. This should work.

diff --git a/fs/splice.c b/fs/splice.c
index 89871c6..5327cbf 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -172,6 +172,7 @@ static const struct pipe_buf_operations 
user_page_pipe_buf_ops = {
 ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
   struct splice_pipe_desc *spd)
 {
+   unsigned int spd_pages = spd->nr_pages;
int ret, do_wakeup, page_nr;
 
ret = 0;
@@ -252,7 +253,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
}
}
 
-   while (page_nr < spd->nr_pages)
+   while (page_nr < spd_pages)
spd->spd_release(spd, page_nr++);
 
return ret;

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-2.6] [TCP]: Congestion control API RTT sampling fix

2007-06-15 Thread Ilpo Järvinen
Commit 164891aadf1721fca4dce473bb0e0998181537c6 broke RTT
sampling of congestion control modules. Inaccurate timestamps
could be fed to them without providing any way for them to
identify such cases. Previously RTT sampler was called only if
FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate
timestamps nicely. In addition, the new behavior could give an
invalid timestamp (zero) to RTT sampler if only skbs with
TCPCB_RETRANS were ACKed. This solves both problems.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 include/linux/ktime.h   |   12 
 include/linux/skbuff.h  |4 
 net/ipv4/tcp_illinois.c |3 +++
 net/ipv4/tcp_input.c|6 +-
 net/ipv4/tcp_lp.c   |3 ++-
 net/ipv4/tcp_vegas.c|3 +++
 net/ipv4/tcp_veno.c |3 +++
 7 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index c762954..2b139f6 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -261,6 +261,18 @@ static inline s64 ktime_to_ns(const ktime_t kt)
 
 #endif
 
+/**
+ * ktime_equal - Compares two ktime_t variables to see if they are equal
+ * @cmp1:  comparable1
+ * @cmp2:  comparable2
+ *
+ * Compare two ktime_t variables, returns 1 if equal
+ */
+static inline int ktime_equal(const ktime_t cmp1, const ktime_t cmp2)
+{
+   return cmp1.tv64 == cmp2.tv64;
+}
+
 static inline s64 ktime_to_us(const ktime_t kt)
 {
struct timeval tv = ktime_to_timeval(kt);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e7367c7..6f0b2f7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1579,6 +1579,10 @@ static inline ktime_t net_timedelta(ktime_t t)
return ktime_sub(ktime_get_real(), t);
 }
 
+static inline ktime_t net_invalid_timestamp(void)
+{
+   return ktime_set(0, 0);
+}
 
 extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len);
 extern __sum16 __skb_checksum_complete(struct sk_buff *skb);
diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c
index 4adc47c..b2b2256 100644
--- a/net/ipv4/tcp_illinois.c
+++ b/net/ipv4/tcp_illinois.c
@@ -90,6 +90,9 @@ static void tcp_illinois_acked(struct sock *sk, u32 
pkts_acked, ktime_t last)
 
ca->acked = pkts_acked;
 
+   if (ktime_equal(last, net_invalid_timestamp()))
+   return;
+
rtt = ktime_to_us(net_timedelta(last));
 
/* ignore bogus values, this prevents wraparound in alpha math */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d6d0f9b..aaf6f66 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2409,7 +2409,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 
*seq_rtt_p)
int acked = 0;
int prior_packets = tp->packets_out;
__s32 seq_rtt = -1;
-   ktime_t last_ackt = ktime_set(0,0);
+   ktime_t last_ackt = net_invalid_timestamp();
 
while ((skb = tcp_write_queue_head(sk)) &&
   skb != tcp_send_head(sk)) {
@@ -2487,6 +2487,10 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 
*seq_rtt_p)
tcp_ack_update_rtt(sk, acked, seq_rtt);
tcp_ack_packets_out(sk);
 
+   /* Is the ACK triggering packet unambiguous? */
+   if (acked & FLAG_RETRANS_DATA_ACKED)
+   last_ackt = net_invalid_timestamp();
+
if (ca_ops->pkts_acked)
ca_ops->pkts_acked(sk, pkts_acked, last_ackt);
}
diff --git a/net/ipv4/tcp_lp.c b/net/ipv4/tcp_lp.c
index 43294ad..e49836c 100644
--- a/net/ipv4/tcp_lp.c
+++ b/net/ipv4/tcp_lp.c
@@ -266,7 +266,8 @@ static void tcp_lp_pkts_acked(struct sock *sk, u32 
num_acked, ktime_t last)
struct tcp_sock *tp = tcp_sk(sk);
struct lp *lp = inet_csk_ca(sk);
 
-   tcp_lp_rtt_sample(sk,  ktime_to_us(net_timedelta(last)));
+   if (!ktime_equal(last, net_invalid_timestamp()))
+   tcp_lp_rtt_sample(sk,  ktime_to_us(net_timedelta(last)));
 
/* calc inference */
if (tcp_time_stamp > tp->rx_opt.rcv_tsecr)
diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c
index 73e19cf..e218a51 100644
--- a/net/ipv4/tcp_vegas.c
+++ b/net/ipv4/tcp_vegas.c
@@ -117,6 +117,9 @@ void tcp_vegas_pkts_acked(struct sock *sk, u32 cnt, ktime_t 
last)
struct vegas *vegas = inet_csk_ca(sk);
u32 vrtt;
 
+   if (ktime_equal(last, net_invalid_timestamp()))
+   return;
+
/* Never allow zero rtt or baseRTT */
vrtt = ktime_to_us(net_timedelta(last)) + 1;
 
diff --git a/net/ipv4/tcp_veno.c b/net/ipv4/tcp_veno.c
index 9edb340..ec854cc 100644
--- a/net/ipv4/tcp_veno.c
+++ b/net/ipv4/tcp_veno.c
@@ -74,6 +74,9 @@ static void tcp_veno_pkts_acked(struct sock *sk, u32 cnt, 
ktime_t last)
struct veno *veno = inet_csk_ca(sk);
u32 vrtt;
 
+   if (ktime_equal(last, net_invalid_timestamp()))
+   return;
+
/* Never allow zero rtt or baseRTT */
vrtt = ktime_

Re: [IPV6] addrconf: Fix IPv6 on tuntap tunnels

2007-06-15 Thread Oliver Hartkopp
Hallo David, hello Herbert,

indeed i have some concerns about reverting the patch as i do not see
that the MTU is the right thing to distinguish whether a netdevice is
capable to have IPv4/IPv6. E.g. is decnet able to run IPv6?

IMHO the autoconf (in any case) should only handle netdevices that are
capable to be auto configured (e.g. with IPv6).

So the question looks like:
"Is this device capable to be auto configured with IPv6?"
and not
"Is the devices MTU >= IPV6_MIN_MTU ?"

My original patch showed the way to ask the (IMO) correct question. And
maybe this has to be improved somehow (like Florian Zumbiel suggested).
And also maybe the  Autoconfiguration has to be a part of code that is
independent from net/ipv6. But to ask about the MTU size does not look
the right way to me.

Best regards,
Oliver

David Miller wrote:
> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Thu, 14 Jun 2007 18:16:07 +1000
>
>   
>> [IPV6] addrconf: Fix IPv6 on tuntap tunnels
>>
>> The recent patch that added ipv6_hwtype is broken on tuntap tunnels.
>> Indeed, it's broken on any device that does not pass the ipv6_hwtype
>> test.
>>
>> The reason is that the original test only applies to autoconfiguration,
>> not IPv6 support.  IPv6 support is allowed on any device.  In fact,
>> even with the ipv6_hwtype patch applied you can still add IPv6 addresses
>> to any interface that doesn't pass thw ipv6_hwtype test provided that
>> they have a sufficiently large MTU.  This is a serious problem because
>> come deregistration time these devices won't be cleaned up properly.
>>
>> I've gone back and looked at the rationale for the patch.  It appears
>> that the real problem is that we were creating IPv6 devices even if the
>> MTU was too small.  So here's a patch which fixes that and reverts the
>> ipv6_hwtype stuff.
>>
>> Thanks to Kanru Chen for reporting this issue.
>>
>> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
>> 
>
> Thanks for fixing this up Herbert.
>
> Patch applied.
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>   


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] network splice receive v2

2007-06-15 Thread Evgeniy Polyakov
On Fri, Jun 15, 2007 at 10:43:18AM +0200, Jens Axboe ([EMAIL PROTECTED]) wrote:
> > So, things turned down to be not in the __splice_from_pipe(), but
> > splice_to_pipe(). Attached patch fixes a leak for me.
> > It was tested with different data files and all were received correctly.
> > 
> > Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>
> > 
> > diff --git a/fs/splice.c b/fs/splice.c
> > index bc481f1..365bfd9 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> > break;
> > if (pipe->nrbufs < PIPE_BUFFERS)
> > continue;
> > -
> > -   break;
> > }
> >  
> > if (spd->flags & SPLICE_F_NONBLOCK) {
> > 
> 
> Hmm, curious. If we hit that location, then two conditions are true:
> 
> - Pipe is full
> - We transferred some data

Yep.

> if you remove the break, then you'll end up blocking in pipe_wait()
> (unless you have SPLICE_F_NONBLOCK also set). And we don't want to block
> waiting for more room, if we already transferred some data. In that case
> we just want to return a short splice. Looking at pipe_write(), it'll
> block as well though. Just doesn't seem optimal to me, but...
> 
> So the question is why would doing the break there cause a leak? I just
> don't yet see how it can happen, I'd love to fix that condition instead.
> For the case you describe, we should have page_nr == 1 and spd->nr_pages
> == 2. Is the:
> 
> while (page_nr < spd->nr_pages)
> spd->spd_release(spd, page_nr++);
> 
> not dropping the right reference?

Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages
was 2 there was only 1 free slot in pipe_buffer.

spd_fill_page: allocated: 89, freed: 73, data: 81003d606d28
spd_fill_page: allocated: 90, freed: 73, data: 81003d606d28
splice_to_pipe: priv: 81003d606d28, spd_nrpages: 1, pipe_nrbufs: 16, 
page_nr: 1.

spd_fill_page: allocated: 91, freed: 73, data: fff81003d6549c8 // next data
...

__splice_from_pipe: process: sd_len: 0, buf_len: 0, buf_priv: 81003d606d28.
__splice_from_pipe: release 81003d606d28.
sock_pipe_buf_release: allocated: 91, freed: 89, ptr: 81003d606d28

splice_to_pipe: priv: 81003d6549c8, spd_nrpages: 0, pipe_nrbufs: 1, 
page_nr: 1. // next data

> -- 
> Jens Axboe

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] network splice receive v2

2007-06-15 Thread Jens Axboe
On Fri, Jun 15 2007, Evgeniy Polyakov wrote:
> On Thu, Jun 14, 2007 at 01:59:02PM +0200, Jens Axboe ([EMAIL PROTECTED]) 
> wrote:
> > On Thu, Jun 14 2007, Evgeniy Polyakov wrote:
> > > On Wed, Jun 13, 2007 at 12:01:04PM +0400, Evgeniy Polyakov ([EMAIL 
> > > PROTECTED]) wrote:
> > > > I will rebase my tree, likely something was not merged correctly.
> > > 
> > > Ok, I've just rebased a tree from recent git and pulled from brick -
> > > things seems to be settled. I've ran several tests with different
> > > filesizes and all files were received correctly without kernel crashes.
> > > There is skb leak indeed, and it looks like it is in the
> > > __splice_from_pipe() for the last page.
> > 
> > Uh, the leak, right - I had forgotten about that, was working on getting
> > vmsplice into shape the last two days. Interesting that you mention the
> > last page, I'll dig in now! Any more info on this (how did you notice
> > the leak originates from there)?
> 
> I first observed leak via slabinfo data (not a leak, but number of skbs
> did not dropped after quite huge files were transferred), then added
> number of allocated and freed objects in skbuff.c, they did not match
> for big files, so I started to check splice source and found that
> sometimes ->release callback is not called, but code breaks out of the
> loop. I've put some printks in __splice_from_pipe() and found following
> case, when skb is leaked:
> when the same cloned skb was shared multiple times (no more than 2 though),
> only one copy was freed.
> 
> Further analysis description requires some splice background (obvious
> for you, but that clears it for me):
> pipe_buffer only contains 16 pages.
> There is a code, which copies pages (pointers) from spd to pipe_buffer
> (splice_to_pipe()).
> skb allocations happens in chunks of different size (i.e. with different
> number of skbs/pages per call), so it is possible that number of
> allocated skbs will be less than pipe_buffer size (16), and then the
> rest of the pages will be put into different (or the same) pipe_buffer later.
> Sometimes two skbs from above example happens to be on the boundary of
> the pipe buffer, so only one of them is being copied into pipe_buffer,
> which is then transferred over the pipe.
> So, we have a case, when spd has (it had more, but this two are special) 
> 2 pages (actually the same page, but two references to it), but pipe_buffer 
> has a place only for one. In that case second page from spd will be missed.
> 
> So, things turned down to be not in the __splice_from_pipe(), but
> splice_to_pipe(). Attached patch fixes a leak for me.
> It was tested with different data files and all were received correctly.
> 
> Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index bc481f1..365bfd9 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
>   break;
>   if (pipe->nrbufs < PIPE_BUFFERS)
>   continue;
> -
> - break;
>   }
>  
>   if (spd->flags & SPLICE_F_NONBLOCK) {
> 

Hmm, curious. If we hit that location, then two conditions are true:

- Pipe is full
- We transferred some data

if you remove the break, then you'll end up blocking in pipe_wait()
(unless you have SPLICE_F_NONBLOCK also set). And we don't want to block
waiting for more room, if we already transferred some data. In that case
we just want to return a short splice. Looking at pipe_write(), it'll
block as well though. Just doesn't seem optimal to me, but...

So the question is why would doing the break there cause a leak? I just
don't yet see how it can happen, I'd love to fix that condition instead.
For the case you describe, we should have page_nr == 1 and spd->nr_pages
== 2. Is the:

while (page_nr < spd->nr_pages)
spd->spd_release(spd, page_nr++);

not dropping the right reference?

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] network splice receive v2

2007-06-15 Thread Evgeniy Polyakov
On Thu, Jun 14, 2007 at 01:59:02PM +0200, Jens Axboe ([EMAIL PROTECTED]) wrote:
> On Thu, Jun 14 2007, Evgeniy Polyakov wrote:
> > On Wed, Jun 13, 2007 at 12:01:04PM +0400, Evgeniy Polyakov ([EMAIL 
> > PROTECTED]) wrote:
> > > I will rebase my tree, likely something was not merged correctly.
> > 
> > Ok, I've just rebased a tree from recent git and pulled from brick -
> > things seems to be settled. I've ran several tests with different
> > filesizes and all files were received correctly without kernel crashes.
> > There is skb leak indeed, and it looks like it is in the
> > __splice_from_pipe() for the last page.
> 
> Uh, the leak, right - I had forgotten about that, was working on getting
> vmsplice into shape the last two days. Interesting that you mention the
> last page, I'll dig in now! Any more info on this (how did you notice
> the leak originates from there)?

I first observed leak via slabinfo data (not a leak, but number of skbs
did not dropped after quite huge files were transferred), then added
number of allocated and freed objects in skbuff.c, they did not match
for big files, so I started to check splice source and found that
sometimes ->release callback is not called, but code breaks out of the
loop. I've put some printks in __splice_from_pipe() and found following
case, when skb is leaked:
when the same cloned skb was shared multiple times (no more than 2 though),
only one copy was freed.

Further analysis description requires some splice background (obvious
for you, but that clears it for me):
pipe_buffer only contains 16 pages.
There is a code, which copies pages (pointers) from spd to pipe_buffer
(splice_to_pipe()).
skb allocations happens in chunks of different size (i.e. with different
number of skbs/pages per call), so it is possible that number of
allocated skbs will be less than pipe_buffer size (16), and then the
rest of the pages will be put into different (or the same) pipe_buffer later.
Sometimes two skbs from above example happens to be on the boundary of
the pipe buffer, so only one of them is being copied into pipe_buffer,
which is then transferred over the pipe.
So, we have a case, when spd has (it had more, but this two are special) 
2 pages (actually the same page, but two references to it), but pipe_buffer 
has a place only for one. In that case second page from spd will be missed.

So, things turned down to be not in the __splice_from_pipe(), but
splice_to_pipe(). Attached patch fixes a leak for me.
It was tested with different data files and all were received correctly.

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/fs/splice.c b/fs/splice.c
index bc481f1..365bfd9 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
break;
if (pipe->nrbufs < PIPE_BUFFERS)
continue;
-
-   break;
}
 
if (spd->flags & SPLICE_F_NONBLOCK) {

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


2.4.35-pre1: new e1000 driver breaks old hardware

2007-06-15 Thread Wolfgang Nothdurft
Hi,

with the new e1000 driver version 7.3.20 the onboard gigabit nic 82547EI
(8086:1019) doesn't work correctly.
After transferring about 100 megabytes over a gigabit link the transfer
stopped and I have to reinit the link either by doing a ifconfig down/up
or unplugging the network cable.
The RxIntDelay is set to 0 like described in the docu. Playing with this
Parameter only increases the amount of traffic to be send and the error
occurs later.
This happens also with the 7.4.35 and 7.5.5 driver from the intel side.
The driver 5.7.6 from kernel 2.4.33.3 and the 6.1.16 from intel works
very well on this hardware.
Also other gigabit nics we use didn't have this problem.
Is this issue already known?
Is there any solution yet?

Kind regards
Wolfgang
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html