Re: [PATCH] [TCP]: Fix logic breakage due to DSACK separation
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> Date: Sat, 16 Jun 2007 02:04:25 +0300 (EEST) > On Fri, 15 Jun 2007, David Miller wrote: > > > Patch applied. > > ...I think it should go to stable as well. I thought Baruch's DSACK seperation that created this problem didn't go in until 2.6.22? - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/2] 2.6.22-rc4: known regressions v3
On Wed, 13 Jun 2007, Michal Piotrowski wrote: > > Subject: hibernate(?) fails totally - regression > References : http://lkml.org/lkml/2007/6/1/401 > Submitter : David Greaves <[EMAIL PROTECTED]> > Handled-By : Rafael J. Wysocki <[EMAIL PROTECTED]> > Caused-By : Tejun Heo <[EMAIL PROTECTED]> > commit 9666f4009c22f6520ac3fb8a19c9e32ab973e828 > Status : problem is being debugged Ahh. This is fixed (fix by Tejun, confirmed by David), and the fix has been merged. It's commit bc90ba093a, in case anybody cares. Linus - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [TCP]: Fix logic breakage due to DSACK separation
On Fri, 15 Jun 2007, David Miller wrote: > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > Date: Fri, 15 Jun 2007 16:07:37 +0300 (EEST) > > > Commit 6f74651ae626ec672028587bc700538076dfbefb is found guilty > > of breaking DSACK counting, which should be done only for the > > SACK block reported by the DSACK instead of every SACK block > > that is received along with DSACK information. > > > > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> > > Thanks for performing such rigorous audits and finding > regressions like this! They just come across, one thing leads to another, which again leads to another and so on... ...I was just trying to address the concerns you noted about tcp-2.6 patchset a while ago and came across other issues... There are still some things I must think carefully in sacktag processing since it does not validate start_seq and end_seq at all which can be abused currently at least in tcp-2.6. ...I would rather put end to the whole russian roulette in tcp-2.6 sacktag rather than fix/think individual cases and leave future modifications of it similarily hazardous. It's not very clear to me how to handle all possible cases of invalid SACK blocks though, perhaps TCP should just ignore such sack blocks without doing any processing based on them, i.e., ignore them whenever start_seq-end_seq does not fully fit to snd_una-snd_nxt (expect DSACK of course, which should be processed if it's between undo_marker-snd_nxt). Do you have any thoughts about this? It seems to me that net-2.6 is safe in this respect (probably just by accident) but the analysis wasn't that trivial, my main concern was tcp_fragment's pkt_len argument, if it ever becomes > skb->len, a boom would result (and perhaps zero has similar issues)! I couldn't find breakage analytically (but I could be wrong in it due to mind-boggling number of negations :-)) nor by bruteforcing seqno combinations near 0, skb->len and 2^31 wrap. > Patch applied. ...I think it should go to stable as well. -- i.
Re: [RFC: 2.6 patch] net/rxrpc/ar-output.c: remove dead code
From: Adrian Bunk <[EMAIL PROTECTED]> Date: Fri, 15 Jun 2007 23:45:20 +0200 > This patch removes dead code spotted by the Coverity checker. > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> I'll let David H. decide how to handle these cleanups and simplifications. Thanks Adrian. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6 patch] net/rxrpc/ar-connection.c: fix NULL dereference
From: Adrian Bunk <[EMAIL PROTECTED]> Date: Fri, 15 Jun 2007 23:45:13 +0200 > This patch fixes a NULL dereference spotted by the Coverity checker. > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> Looks good to me, patch applied, thanks Adrian. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [TCP]: Fix logic breakage due to DSACK separation
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> Date: Fri, 15 Jun 2007 16:07:37 +0300 (EEST) > Commit 6f74651ae626ec672028587bc700538076dfbefb is found guilty > of breaking DSACK counting, which should be done only for the > SACK block reported by the DSACK instead of every SACK block > that is received along with DSACK information. > > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> Thanks for performing such rigorous audits and finding regressions like this! Patch applied. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [IPV6] addrconf: Fix IPv6 on tuntap tunnels
From: Vlad Yasevich <[EMAIL PROTECTED]> Date: Fri, 15 Jun 2007 16:42:22 -0400 > No, the questions should really be: > > 1. Is IPV6 supported over this media type. > yes: got to 2 > no: stop > > 2. Is the device MTU >= IPV6_MIN_MTU > yes: continue > no: stop > > Autoconfiguration is a layer on top of IPv6. Whether it's enabled > or not should not dictate whether IPv6 addressed may be configured or not. Sounds good to me, patches? :-) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6] [TCP]: Congestion control API RTT sampling fix
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Fri, 15 Jun 2007 07:32:55 -0700 > On Fri, 15 Jun 2007 12:30:12 +0300 (EEST) > "Ilpo Järvinen" <[EMAIL PROTECTED]> wrote: > > > Commit 164891aadf1721fca4dce473bb0e0998181537c6 broke RTT > > sampling of congestion control modules. Inaccurate timestamps > > could be fed to them without providing any way for them to > > identify such cases. Previously RTT sampler was called only if > > FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate > > timestamps nicely. In addition, the new behavior could give an > > invalid timestamp (zero) to RTT sampler if only skbs with > > TCPCB_RETRANS were ACKed. This solves both problems. > > > > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> > > > Yes, let's get this in 2.6.22 Agreed, patch applied, thanks everyone. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.22-rc4 hibernate disables skge wol
On Friday, 15 June 2007 23:50, David Greaves wrote: > I've started a new thread here since the old one got somewhat hijacked. > > Rafael J. Wysocki wrote: > > On Friday, 1 June 2007 23:23, David Greaves wrote: > >> Not a regression though, it does it in 2.6.21 > >> > >> If I cause the system to save state to disk then whilst off it no longer > >> responds to g-wol. > > > > Can you please try with the hibernation and suspend patch series from > > > > http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc3/patches/ > > > > applied? > > > > Greetings, > > Rafael > > > > > Now, IIRC, some time ago you asked me to try some patches :) > > So I applied them (and Tejun's fixes as per the old thread) to rc4 (which > seems > to include a couple already). > > Hibernate/resume works but although WOL works on an init 0, it doesn't work > on a > hibernated system :( Well, I have one idea, but I'd like the recent paches currently in -mm to settle down before trying it. ;-) In the meantime, you may want to open a bugzilla entry related to this issue (please add my address to the CC list if you do that). Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
2.6.22-rc4 hibernate disables skge wol
I've started a new thread here since the old one got somewhat hijacked. Rafael J. Wysocki wrote: > On Friday, 1 June 2007 23:23, David Greaves wrote: >> Not a regression though, it does it in 2.6.21 >> >> If I cause the system to save state to disk then whilst off it no longer >> responds to g-wol. > > Can you please try with the hibernation and suspend patch series from > > http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc3/patches/ > > applied? > > Greetings, > Rafael > > Now, IIRC, some time ago you asked me to try some patches :) So I applied them (and Tejun's fixes as per the old thread) to rc4 (which seems to include a couple already). Hibernate/resume works but although WOL works on an init 0, it doesn't work on a hibernated system :( David - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC: 2.6 patch] net/rxrpc/ar-output.c: remove dead code
This patch removes dead code spotted by the Coverity checker. Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> --- net/rxrpc/ar-output.c | 14 -- 1 file changed, 4 insertions(+), 10 deletions(-) --- linux-2.6.22-rc4-mm2/net/rxrpc/ar-output.c.old 2007-06-15 21:38:02.0 +0200 +++ linux-2.6.22-rc4-mm2/net/rxrpc/ar-output.c 2007-06-15 21:38:47.0 +0200 @@ -530,7 +530,7 @@ static int rxrpc_send_data(struct kiocb struct sock *sk = &rx->sk; long timeo; bool more; - int ret, ioc, segment, copied; + int ret, ioc, segment; _enter(",,,{%zu},%zu", msg->msg_iovlen, len); @@ -552,7 +552,6 @@ static int rxrpc_send_data(struct kiocb skb = call->tx_pending; call->tx_pending = NULL; - copied = 0; do { int copy; @@ -570,11 +569,11 @@ static int rxrpc_send_data(struct kiocb call->acks_winsz) <= 0) { ret = -EAGAIN; if (msg->msg_flags & MSG_DONTWAIT) - goto maybe_error; + goto out; ret = rxrpc_wait_for_tx_window(rx, call, &timeo); if (ret < 0) - goto maybe_error; + goto out; } max = call->conn->trans->peer->maxdata; @@ -596,7 +595,7 @@ static int rxrpc_send_data(struct kiocb skb = sock_alloc_send_skb( sk, size, msg->msg_flags & MSG_DONTWAIT, &ret); if (!skb) - goto maybe_error; + goto out; rxrpc_new_skb(skb); @@ -723,11 +722,6 @@ call_aborted: _leave(" = %d", ret); return ret; -maybe_error: - if (copied) - ret = copied; - goto out; - efault: ret = -EFAULT; goto out; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[2.6 patch] net/rxrpc/ar-connection.c: fix NULL dereference
This patch fixes a NULL dereference spotted by the Coverity checker. Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> --- net/rxrpc/ar-connection.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- linux-2.6.22-rc4-mm2/net/rxrpc/ar-connection.c.old 2007-06-15 23:05:26.0 +0200 +++ linux-2.6.22-rc4-mm2/net/rxrpc/ar-connection.c 2007-06-15 23:06:42.0 +0200 @@ -200,29 +200,29 @@ static struct rxrpc_connection *rxrpc_al if (conn) { INIT_WORK(&conn->processor, &rxrpc_process_connection); INIT_LIST_HEAD(&conn->bundle_link); conn->calls = RB_ROOT; skb_queue_head_init(&conn->rx_queue); rwlock_init(&conn->lock); spin_lock_init(&conn->state_lock); atomic_set(&conn->usage, 1); conn->debug_id = atomic_inc_return(&rxrpc_debug_id); conn->avail_calls = RXRPC_MAXCALLS; conn->size_align = 4; conn->header_size = sizeof(struct rxrpc_header); } - _leave(" = %p{%d}", conn, conn->debug_id); + _leave(" = %p{%d}", conn, conn ? conn->debug_id : 0); return conn; } /* * assign a connection ID to a connection and add it to the transport's * connection lookup tree * - called with transport client lock held */ static void rxrpc_assign_connection_id(struct rxrpc_connection *conn) { struct rxrpc_connection *xconn; struct rb_node *parent, **p; __be32 epoch; u32 real_conn_id; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [IPV6] addrconf: Fix IPv6 on tuntap tunnels
Oliver Hartkopp wrote: > Hallo David, hello Herbert, > > indeed i have some concerns about reverting the patch as i do not see > that the MTU is the right thing to distinguish whether a netdevice is > capable to have IPv4/IPv6. E.g. is decnet able to run IPv6? > > IMHO the autoconf (in any case) should only handle netdevices that are > capable to be auto configured (e.g. with IPv6). > > So the question looks like: > "Is this device capable to be auto configured with IPv6?" > and not > "Is the devices MTU >= IPV6_MIN_MTU ?" No, the questions should really be: 1. Is IPV6 supported over this media type. yes: got to 2 no: stop 2. Is the device MTU >= IPV6_MIN_MTU yes: continue no: stop Autoconfiguration is a layer on top of IPv6. Whether it's enabled or not should not dictate whether IPv6 addressed may be configured or not. -vlad - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]: ps3: gigabit ethernet driver for PS3
Hi Dan. Dan Williams wrote: > On Fri, 2007-06-15 at 10:16 -0700, Geoff Levand wrote: >> This all seems fine, but what is causing us trouble is that >> during the review of the wireless part Dan Williams made many >> non-trivial recommendations, and so we decided we need to re-work > > sorry :) Its not just because of you... We found some other things lacking and so we plan to do some other work on it. -Geoff - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]: ps3: gigabit ethernet driver for PS3
On Fri, 2007-06-15 at 10:16 -0700, Geoff Levand wrote: > Jeff Garzik wrote: > > MOKUNO Masakazu wrote: > >> +config GELIC_NET > >> + tristate "PS3 Gigabit Ethernet driver" > >> + depends on PPC_PS3 > >> + help > >> +This driver supports the Gigabit Ethernet device on the > >> +PS3 game console. > >> + > >> +To compile this driver as a module, choose M here: the > >> +module will be called ps3_gelic. > >> + > >> config GIANFAR > >>tristate "Gianfar Ethernet" > >>depends on 85xx || 83xx || PPC_86xx > >> --- a/drivers/net/Makefile > >> +++ b/drivers/net/Makefile > >> @@ -60,6 +60,8 @@ obj-$(CONFIG_TIGON3) += tg3.o > >> obj-$(CONFIG_BNX2) += bnx2.o > >> spidernet-y += spider_net.o spider_net_ethtool.o > >> obj-$(CONFIG_SPIDER_NET) += spidernet.o sungem_phy.o > >> +obj-$(CONFIG_GELIC_NET) += ps3_gelic.o > >> +ps3_gelic-objs += gelic_net.o > >> obj-$(CONFIG_TC35815) += tc35815.o > >> obj-$(CONFIG_SKGE) += skge.o > >> obj-$(CONFIG_SKY2) += sky2.o > > > > How about ps3_gige for the driver name. Ditto DaveM's comments about > > cleanups here. > > Hi Jeff. > > Based on your comments and those from Dave, I feel I need to explain a > little about our system and what we plan for the network support. > > First about the name. The name 'gelic' comes from the underlying > device's name. The name 'gelic' is already a well known in public > to be the PS3's network device (try google), so I think ps3_gelic > is a good name to use here. > > Regarding the device, Mokuno-san already mentioned it, but this > is a virtual device which supports both Ether and wireless, so > it is more that just an Ethernet device (I think we can change > the Kconfig text to be more accurate). > > Now, about the driver. We have arranged the driver so that > the wireless support can be enabled by a config option. So > when CONFIG_GELIC_WIRELESS=n you get ps3_gelic.ko with just > Ether support, and when CONFIG_GELIC_WIRELESS=y, you get > ps3_gelic.ko with both Ether and wireless support. So in > the makefile we would like to have this: > > obj-$(CONFIG_GELIC_NET) += ps3_gelic.o > gelic-$(CONFIG_GELIC_WIRELESS) += gelic_wireless.o > ps3_gelic-objs += gelic_net.o $(gelic-y) > > This all seems fine, but what is causing us trouble is that > during the review of the wireless part Dan Williams made many > non-trivial recommendations, and so we decided we need to re-work sorry :) > the wireless and it won't be ready for 2.6.23. So, to avoid having > to either change module names, or change file names when we submit > the wireless part for 2.6.24, we would like to have this for 2.6.23: > > obj-$(CONFIG_GELIC_NET) += ps3_gelic.o > ps3_gelic-objs += gelic_net.o > > Does it make sense? If you have a better idea, please let us know. > > -Geoff > > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH 4/4] [Net] Support accelerated network plugin modules
On 6/15/07, Kieran Mansley <[EMAIL PROTECTED]> wrote: The lock protects the use_count variable. The use_count variable prevents the plugin module unloading while it is being used. I couldn't just use the lock to prevent the module unloading as the hook function (i) might block (and holding a spin_lock would be rather antisocial) (ii) might call back into netfront and try to take the lock again, which would deadlock. If the hook routine blocks on the other code path instead of tx/rx path, why not use a simple atomic reference count. When the reference count reachs zero, free it. Considering you can synchronzie on tx/rx path, the free will not happen under the critical code path. So the uninitialize work could be done inside the free routine even if it blocks. I think that RCU would only work in this situation if the hook functions didn't block,. I agree. Kieran -- best regards, hanzhu - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/2] qdisc_restart - readability changes, one bug fix.
> - "if (ret == NETDEV_TX_LOCKED && lockless)" is buggy, and > the lockless > check should be removed, since driver will return > NETDEV_TX_LOCKED only > if lockless is true and driver has to do the locking. In > the original > code as well as the latest code, this code can result in a bug where > if LLTX is not set for a driver (lockless == 0) but the > driver is written > wrongly to do a trylock (despite LLTX being set), the driver returns > LOCKED. But since lockless is zero, the packet is requeue'd > instead of > calling collision code, issuing a warning and freeing up > the skb. This > skb will be retried with this driver next time, and the > same result will > ensue. Removing this check will catch these driver bugs > instead of hiding > the problem. I am keeping this change to readability section since : > a. it is confusing to check two things as it is; and > b. it is difficult to keep this check in the changed > 'switch' code. I agree that the case shouldn't happen, and will only surface if the driver is indeed buggy. I've thought about this conditional being removed for awhile, since it will protect against a poorly written driver wrt locking, but then again a driver behaving like that shouldn't be making it into the kernel. That being said, out-of-tree drivers are heavily used (we have an out-of-tree e1000), and something like this could be missed. But since that is the corner case of a crappy driver, I'm ok with this being removed. > > - Changed some names, like try_get_tx_pkt to dequeue_skb (as > that is the work > being done and easier to understand) and do_dev_requeue to > requeue_skb, These, at a glance, look very similar to the qdisc ->enqueue() and ->dequeue() function pointers. I know I had to look a few times to realize they were separate calls through the qdisc and this new function name. Perhaps dequeue_skb() can become dev_dequeue_skb() and requeue_skb() can be dev_requeue_skb()? Something that helps differentiate these from the function pointers of the qdisc_ops might help distinguish what layer the operation is running on. > +static inline int handle_dev_cpu_collision(struct sk_buff *skb, > +struct net_device *dev, > +struct Qdisc *q) > { > - int ret = handle_dev_cpu_collision(dev); > + int ret; > > - if (ret == SCHED_TX_DROP) { > + if (unlikely(dev->xmit_lock_owner == smp_processor_id())) { > + /* > + * Same CPU holding the lock. It may be a transient > + * configuration error, when hard_start_xmit() > recurses. We > + * detect it by checking xmit owner and drop > the packet when > + * deadloop is detected. Return OK to try the next skb. > + */ > kfree_skb(skb); > - return qdisc_qlen(q); > + if (net_ratelimit()) > + printk(KERN_DEBUG "Dead loop on netdevice %s, " > +"fix it urgently!\n", dev->name); > + ret = qdisc_qlen(q); > + } else { > + /* > + * Another cpu is holding lock, requeue & delay > xmits for > + * some time. > + */ > + __get_cpu_var(netdev_rx_stat).cpu_collision++; > + ret = requeue_skb(skb, dev, q); > } > > - return do_dev_requeue(skb, dev, q); > + return ret; > } I like the single return here. > static inline int qdisc_restart(struct net_device *dev) { > struct Qdisc *q = dev->qdisc; > - unsigned lockless = (dev->features & NETIF_F_LLTX); > struct sk_buff *skb; > + unsigned lockless; > int ret; > > - skb = try_get_tx_pkt(dev, q); > - if (skb == NULL) > + /* Dequeue packet */ > + if (unlikely((skb = dequeue_skb(dev, q)) == NULL)) > return 0; I know there's been discussion on the driver side of the world to stop using unlikely() and likely() clauses. I personally think it's ok in situations like this where you have one corner-case conditional without an else, but it's something to keep in mind when using them in new/rewritten code like this. > if (!netif_queue_stopped(dev)) > ret = dev_hard_start_xmit(skb, dev); I thought you were removing this check of netif_queue_stopped(dev)? Nevermind, I see the followup patch takes care of this. > + switch (ret) { > + case NETDEV_TX_OK: > + /* Driver sent out skb successfully */ > + ret = qdisc_qlen(q); > + break; > + > + case NETDEV_TX_LOCKED: > + /* Driver try lock failed */ > + ret = handle_dev_cpu_collision(skb, dev, q); > + break; > + > + default: > + /* Driver returned NETDEV_TX_BUSY - requeue skb */ > + ret = requeue_skb(skb, dev, q); > + break; > + } Ooo. I like this; very clean and simpl
Re: [PATCH]: ps3: gigabit ethernet driver for PS3
Jeff Garzik wrote: > MOKUNO Masakazu wrote: >> +config GELIC_NET >> +tristate "PS3 Gigabit Ethernet driver" >> +depends on PPC_PS3 >> +help >> + This driver supports the Gigabit Ethernet device on the >> + PS3 game console. >> + >> + To compile this driver as a module, choose M here: the >> + module will be called ps3_gelic. >> + >> config GIANFAR >> tristate "Gianfar Ethernet" >> depends on 85xx || 83xx || PPC_86xx >> --- a/drivers/net/Makefile >> +++ b/drivers/net/Makefile >> @@ -60,6 +60,8 @@ obj-$(CONFIG_TIGON3) += tg3.o >> obj-$(CONFIG_BNX2) += bnx2.o >> spidernet-y += spider_net.o spider_net_ethtool.o >> obj-$(CONFIG_SPIDER_NET) += spidernet.o sungem_phy.o >> +obj-$(CONFIG_GELIC_NET) += ps3_gelic.o >> +ps3_gelic-objs += gelic_net.o >> obj-$(CONFIG_TC35815) += tc35815.o >> obj-$(CONFIG_SKGE) += skge.o >> obj-$(CONFIG_SKY2) += sky2.o > > How about ps3_gige for the driver name. Ditto DaveM's comments about > cleanups here. Hi Jeff. Based on your comments and those from Dave, I feel I need to explain a little about our system and what we plan for the network support. First about the name. The name 'gelic' comes from the underlying device's name. The name 'gelic' is already a well known in public to be the PS3's network device (try google), so I think ps3_gelic is a good name to use here. Regarding the device, Mokuno-san already mentioned it, but this is a virtual device which supports both Ether and wireless, so it is more that just an Ethernet device (I think we can change the Kconfig text to be more accurate). Now, about the driver. We have arranged the driver so that the wireless support can be enabled by a config option. So when CONFIG_GELIC_WIRELESS=n you get ps3_gelic.ko with just Ether support, and when CONFIG_GELIC_WIRELESS=y, you get ps3_gelic.ko with both Ether and wireless support. So in the makefile we would like to have this: obj-$(CONFIG_GELIC_NET) += ps3_gelic.o gelic-$(CONFIG_GELIC_WIRELESS) += gelic_wireless.o ps3_gelic-objs += gelic_net.o $(gelic-y) This all seems fine, but what is causing us trouble is that during the review of the wireless part Dan Williams made many non-trivial recommendations, and so we decided we need to re-work the wireless and it won't be ready for 2.6.23. So, to avoid having to either change module names, or change file names when we submit the wireless part for 2.6.24, we would like to have this for 2.6.23: obj-$(CONFIG_GELIC_NET) += ps3_gelic.o ps3_gelic-objs += gelic_net.o Does it make sense? If you have a better idea, please let us know. -Geoff - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH 4/4] [Net] Support accelerated network plugin modules
On 15/6/07 17:22, "Kieran Mansley" <[EMAIL PROTECTED]> wrote: > The lock protects the use_count variable. Yes, that's one thing I noticed -- can you use atomic_t for reference counts and hence reduce the number of times you need to lock/unlock? At least the open-coded lock-decrement-test-maybe-free-unlock sequences could be abstracted into a put_foo() function. -- Keir - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH 4/4] [Net] Support accelerated network plugin modules
On Fri, 2007-06-15 at 11:59 -0400, Zhu Han wrote: > Hi, Kieran, > > I'm just wonder why you try to acquire the lock and increase the > hooks_usecount each time when you use the hook routine. Is there any > generic ways to synchronze the code path using hook routines and > netfront_accelerator_unloaded, considering you can synchronize the > tx/rx data path easily. The lock protects the use_count variable. The use_count variable prevents the plugin module unloading while it is being used. I couldn't just use the lock to prevent the module unloading as the hook function (i) might block (and holding a spin_lock would be rather antisocial) (ii) might call back into netfront and try to take the lock again, which would deadlock. The data path hooks do not block, and are already protected by locks, so these are also taken when trying to unload the plugin module. For this reason it's not necessary to use the hooks_usecount on the data path. I think that RCU would only work in this situation if the hook functions didn't block, and wouldn't affect the data path locking overhead as it wouldn't be necessary there. Kieran - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH 4/4] [Net] Support accelerated network plugin modules
On Fri, 15 Jun 2007 11:59:43 -0400 "Zhu Han" <[EMAIL PROTECTED]> wrote: > Hi, Kieran, > > I'm just wonder why you try to acquire the lock and increase the > hooks_usecount each time when you use the hook routine. Is there any > generic ways to synchronze the code path using hook routines and > netfront_accelerator_unloaded, considering you can synchronize the > Learn to use RCU for this. It would reduce the lock overhead. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH 4/4] [Net] Support accelerated network plugin modules
Hi, Kieran, I'm just wonder why you try to acquire the lock and increase the hooks_usecount each time when you use the hook routine. Is there any generic ways to synchronze the code path using hook routines and netfront_accelerator_unloaded, considering you can synchronize the tx/rx data path easily. On 6/15/07, Kieran Mansley <[EMAIL PROTECTED]> wrote: Frontend net driver acceleration diff -r cd3ade350f3f drivers/xen/netfront/netfront.c --- a/drivers/xen/netfront/netfront.c Thu Jun 14 15:04:32 2007 +0100 +++ b/drivers/xen/netfront/netfront.c Fri Jun 15 09:34:41 2007 +0100 @@ -3,6 +3,7 @@ * * Copyright (c) 2002-2005, K A Fraser * Copyright (c) 2005, XenSource Ltd + * Copyright (C) 2007 Solarflare Communications, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License version 2 @@ -47,6 +48,7 @@ #include #include #include +#include #include #include #include @@ -73,6 +75,8 @@ struct netfront_cb { }; #define NETFRONT_SKB_CB(skb) ((struct netfront_cb *)((skb)->cb)) + +#include "netfront.h" /* * Mutually-exclusive module options to select receive data path: @@ -144,57 +148,6 @@ static inline int netif_needs_gso(struct #define GRANT_INVALID_REF 0 -#define NET_TX_RING_SIZE __RING_SIZE((struct netif_tx_sring *)0, PAGE_SIZE) -#define NET_RX_RING_SIZE __RING_SIZE((struct netif_rx_sring *)0, PAGE_SIZE) - -struct netfront_info { - struct list_head list; - struct net_device *netdev; - - struct net_device_stats stats; - - struct netif_tx_front_ring tx; - struct netif_rx_front_ring rx; - - spinlock_t tx_lock; - spinlock_t rx_lock; - - unsigned int irq; - unsigned int copying_receiver; - unsigned int carrier; - - /* Receive-ring batched refills. */ -#define RX_MIN_TARGET 8 -#define RX_DFL_MIN_TARGET 64 -#define RX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256) - unsigned rx_min_target, rx_max_target, rx_target; - struct sk_buff_head rx_batch; - - struct timer_list rx_refill_timer; - - /* -* {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs -* is an index into a chain of free entries. -*/ - struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1]; - struct sk_buff *rx_skbs[NET_RX_RING_SIZE]; - -#define TX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256) - grant_ref_t gref_tx_head; - grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1]; - grant_ref_t gref_rx_head; - grant_ref_t grant_rx_ref[NET_RX_RING_SIZE]; - - struct xenbus_device *xbdev; - int tx_ring_ref; - int rx_ring_ref; - u8 mac[ETH_ALEN]; - - unsigned long rx_pfn_array[NET_RX_RING_SIZE]; - struct multicall_entry rx_mcl[NET_RX_RING_SIZE+1]; - struct mmu_update rx_mmu[NET_RX_RING_SIZE]; -}; - struct netfront_rx_info { struct netif_rx_response rx; struct netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX - 1]; @@ -278,6 +231,369 @@ static void xennet_sysfs_delif(struct ne #define xennet_sysfs_delif(dev) do { } while(0) #endif +/* + * List of all netfront accelerator plugin modules available. Each + * list entry is of type struct netfront_accelerator. + */ +static struct list_head accelerators_list; +/* + * Lock to protect access to accelerators_list, and also used to + * protect the hooks_usecount field in struct netfront_accelerator + * against concurrent access + */ +static spinlock_t accelerators_lock; + +/* + * Safely remove the accelerator function hooks from a netfront state. + * Must only be called when there are no current users of the hooks. + */ +static void accelerator_remove_hooks(struct netfront_accelerator *accelerator) +{ +struct netfront_accel_vif_state *vif_state; + +list_for_each_entry( vif_state, + &accelerator->vif_states, + link ) { +/* Make sure there are no data path operations going on */ +netif_poll_disable(vif_state->np->netdev); +netif_tx_lock_bh(vif_state->np->netdev); + +/* + * Remove the hooks, but leave the vif_state on the + * accelerator's list as that signifies this vif is + * interested in using that accelerator if it becomes + * available again + */ +vif_state->hooks = NULL; + +netif_tx_unlock_bh(vif_state->np->netdev); +netif_poll_enable(vif_state->np->netdev); +} + +accelerator->hooks = NULL; + +/* Signal that all users of hooks are done */ +up(&accelerator->exit_semaphore); +} + + +/* + * Compare a frontend description string against an accelerator to see + * if they match. Would ultimately be nice to replace the string with + * a unique numeric identifier for each accelerator. + */ +static int match_acceler
Re: 2.4.35-pre1: new e1000 driver breaks old hardware
Wolfgang Nothdurft wrote: Hi, with the new e1000 driver version 7.3.20 the onboard gigabit nic 82547EI (8086:1019) doesn't work correctly. After transferring about 100 megabytes over a gigabit link the transfer stopped and I have to reinit the link either by doing a ifconfig down/up or unplugging the network cable. The RxIntDelay is set to 0 like described in the docu. Playing with this Parameter only increases the amount of traffic to be send and the error occurs later. have you tried (the default) 8000 ? Do you get the same problem when you run 2.6.21 with 7.5.5.1 or 7.3.20 ? This happens also with the 7.4.35 and 7.5.5 driver from the intel side. The driver 5.7.6 from kernel 2.4.33.3 and the 6.1.16 from intel works very well on this hardware. Also other gigabit nics we use didn't have this problem. Is this issue already known? this is the first time I have heard this issue. Is there any solution yet? I'm not that good :) Can you file a bugreport on e1000.sf.net and attach the usual (ethtool -e, dmesg, ifconfig -a, lspci -vv) debugging output and problem description for us? I'll try to see if I can have our labs setup a repro case, which might be hard given the adapter type, but we will do our best. Auke - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)
Milton Miller wrote: On Jun 6, 2007, at 4:28 AM, Milton Miller wrote: On Jun 5, 2007, at 9:26 PM, Kok, Auke wrote: Kok, Auke wrote: Hmm git-revert seems to do the job right. I checked it with git-show | patch -p1 -R and the results look OK. The two patches on top of the one we want to revert are unrelated enough to apply (manually it shows some fuzz, but otherwise it's OK). Jeff, please `git-revert d52df4a35af569071fda3f4eb08e47cc7023f094` to revert the following patch for now: --- commit d52df4a35af569071fda3f4eb08e47cc7023f094 Author: Scott Feldman <[EMAIL PROTECTED]> Date: Wed Nov 9 02:18:52 2005 -0500 [netdrvr e100] experiment with doing RX in a similar manner to eepro100 I was going to say that eepro100's speedo_rx_link() does the same DMA abuse as e100, but then I noticed one little detail: eepro100 sets both EL (end of list) and S (suspend) bits in the RFD as it chains it to the RFD list. e100 was only setting the EL bit. Hmmm, that's interesting. That means that if HW reads a RFD with the S-bit set, it'll process that RFD and then suspend the receive unit. The receive unit will resume when SW clears the S-bit. There is no need for SW to restart the receive unit. Which means a lot of the receive unit state tracking code in the driver goes away. So here's a patch against 2.6.14. (Sorry for inlining it; the mailer I'm using now will mess with the word wrap). I can't test this on XScale (unless someone has an e100 module for Gumstix :) . It should be doing exactly what eepro100 does with RFDs. I don't believe this change will introduce a performance hit because the S-bit and EL-bit go hand-in-hand meaning if we're going to suspend because of the S- bit, we're on the last resource anyway, so we'll have to wait for SW to replenish. (cherry picked from 29e79da9495261119e3b2e4e7c72507348e75976 commit) --- A little bit more is needed to explain why we're reverting it for now. Jeff, please insert this into the revert commit. Auke -- This patch attempted to fix e100 for non-cache coherent memory architectures by using the cb style code that eepro100 had and using the EL and s bits from the RFD list. Unfortunately the hardware doesn't work exactly like this and therefore this patch actually breaks e100 on those systems. on all systems. (Both the &| typo and the removed restart logic). Reverting the change brings it back to the previously known good state for 2.6.22. The pending rewrite in progress to this code can then be safely merged later. Signed-off-by: Auke Kok <[EMAIL PROTECTED]> Jeff I don't see this in netdev upstream-fixes. Are you waiting on Auke to respond to this? It should now be upstream, as of a day or two ago. Sorry for the delaying, was travelling, and wound up accidentally losing Auke's commit description. Jeff - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6] [TCP]: Congestion control API RTT sampling fix
On Fri, 15 Jun 2007 12:30:12 +0300 (EEST) "Ilpo Järvinen" <[EMAIL PROTECTED]> wrote: > Commit 164891aadf1721fca4dce473bb0e0998181537c6 broke RTT > sampling of congestion control modules. Inaccurate timestamps > could be fed to them without providing any way for them to > identify such cases. Previously RTT sampler was called only if > FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate > timestamps nicely. In addition, the new behavior could give an > invalid timestamp (zero) to RTT sampler if only skbs with > TCPCB_RETRANS were ACKed. This solves both problems. > > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> > Yes, let's get this in 2.6.22 - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] network splice receive v2
On Fri, Jun 15 2007, Evgeniy Polyakov wrote: > On Fri, Jun 15, 2007 at 11:34:30AM +0200, Jens Axboe ([EMAIL PROTECTED]) > wrote: > > > Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages > > > was 2 there was only 1 free slot in pipe_buffer. > > > > Ah duh, indeed, it is a dumb bug indeed. This should work. > > Yep, this one works too. Great, thanks for testing! -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [TCP]: Fix logic breakage due to DSACK separation
Commit 6f74651ae626ec672028587bc700538076dfbefb is found guilty of breaking DSACK counting, which should be done only for the SACK block reported by the DSACK instead of every SACK block that is received along with DSACK information. Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- net/ipv4/tcp_input.c |9 + 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index d6d0f9b..c6a363b 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -953,7 +953,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ int prior_fackets; u32 lost_retrans = 0; int flag = 0; - int dup_sack = 0; + int found_dup_sack = 0; int cached_fack_count; int i; int first_sack_index; @@ -964,20 +964,20 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ /* Check for D-SACK. */ if (before(ntohl(sp[0].start_seq), TCP_SKB_CB(ack_skb)->ack_seq)) { - dup_sack = 1; + found_dup_sack = 1; tp->rx_opt.sack_ok |= 4; NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV); } else if (num_sacks > 1 && !after(ntohl(sp[0].end_seq), ntohl(sp[1].end_seq)) && !before(ntohl(sp[0].start_seq), ntohl(sp[1].start_seq))) { - dup_sack = 1; + found_dup_sack = 1; tp->rx_opt.sack_ok |= 4; NET_INC_STATS_BH(LINUX_MIB_TCPDSACKOFORECV); } /* D-SACK for already forgotten data... * Do dumb counting. */ - if (dup_sack && + if (found_dup_sack && !after(ntohl(sp[0].end_seq), prior_snd_una) && after(ntohl(sp[0].end_seq), tp->undo_marker)) tp->undo_retrans--; @@ -1058,6 +1058,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ __u32 start_seq = ntohl(sp->start_seq); __u32 end_seq = ntohl(sp->end_seq); int fack_count; + int dup_sack = (found_dup_sack && (i == first_sack_index)); skb = cached_skb; fack_count = cached_fack_count; -- 1.5.0.6
Re: [Xen-devel] [PATCH 0/4] [Net] Support accelerated network plugin modules
On 15/6/07 13:21, "Michael Buesch" <[EMAIL PROTECTED]> wrote: >> True in general, but not the cases I've seen in this patchset, where 'foo' >> is a predicate. > > Ok, if foo is a variable containing an error code, it's > better to return that error code. I assumed that foo is a > variable containing some value (counter or something). I mean specifically things like: if (be->accelerator == NULL) return 1; else return 0; K. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH 0/4] [Net] Support accelerated network plugin modules
On Friday 15 June 2007 14:20:34 Keir Fraser wrote: > > On 15/6/07 13:11, "Michael Buesch" <[EMAIL PROTECTED]> wrote: > > >> No use of the following please: > >> If (foo) return 1; else return 0; > >> Is clearer as: > >> Return foo; > > > > But it's not the same. > > return !!foo; > > would be the same. And yes, it does matter: > > True in general, but not the cases I've seen in this patchset, where 'foo' > is a predicate. Ok, if foo is a variable containing an error code, it's better to return that error code. I assumed that foo is a variable containing some value (counter or something). -- Greetings Michael. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH 0/4] [Net] Support accelerated network plugin modules
On 15/6/07 13:11, "Michael Buesch" <[EMAIL PROTECTED]> wrote: >> No use of the following please: >> If (foo) return 1; else return 0; >> Is clearer as: >> Return foo; > > But it's not the same. > return !!foo; > would be the same. And yes, it does matter: True in general, but not the cases I've seen in this patchset, where 'foo' is a predicate. -- Keir - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH 0/4] [Net] Support accelerated network plugin modules
On Friday 15 June 2007 13:26:03 Keir Fraser wrote: > On 15/6/07 11:46, "Kieran Mansley" <[EMAIL PROTECTED]> wrote: > > > This is a repost of some earlier patches to the xen-devel mailing list, > > with a number of changes thanks to some useful suggestions from others. > > The coding style needs fixing in various ways. > > Hard tabs need to be used, no spaces inside brackets, but should include > spaces between if/while/for and bracket, and bracket and brace: > if (foo) { > Not > if( foo ){ > if(foo ) { > Or various other possibilities. > > No use of the following please: > If (foo) return 1; else return 0; > Is clearer as: > Return foo; But it's not the same. return !!foo; would be the same. And yes, it does matter: int x(void) { unsigned long long v = 0xFF00ULL; /*foo*/ return v; } -- Greetings Michael. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH 0/4] [Net] Support accelerated network plugin modules
On 15/6/07 11:46, "Kieran Mansley" <[EMAIL PROTECTED]> wrote: > This is a repost of some earlier patches to the xen-devel mailing list, > with a number of changes thanks to some useful suggestions from others. The coding style needs fixing in various ways. Hard tabs need to be used, no spaces inside brackets, but should include spaces between if/while/for and bracket, and bracket and brace: if (foo) { Not if( foo ){ if(foo ) { Or various other possibilities. No use of the following please: If (foo) return 1; else return 0; Is clearer as: Return foo; There's a (good) idiom for error handling in Linux: If (!error) { page of code; } else { oh no, error, print and return; } Should be changed to: If (error) { handle error, bail if necessary } Page of code, no longer indented by error check; Apart from the various coding style bits, accelerator_probe_vifs_on_load() walks the vif_states list with no lock held. Is this safe against the list changing? -- Keir - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH tcp-2.6 3/3] [TCP]: Move sack_ok access to obviously named funcs & cleanup
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <[EMAIL PROTECTED]> Previously code had IsReno/IsFack defined as macros that were local to tcp_input.c though sack_ok field has users elsewhere too for the same purpose. This changes them to static inlines as preferred according the current coding style and unifies the access to sack_ok across multiple files. Magic bitops of sack_ok for FACK and DSACK are also abstracted to functions with appropriate names. Note: - One sack_ok = 1 remains but that's self explanary, i.e., it enables sack - Couple of !IsReno cases are changed to tcp_is_sack - There were no users for IsDSack => I dropped it Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- include/net/tcp.h| 28 net/ipv4/tcp.c |2 +- net/ipv4/tcp_input.c | 103 +- net/ipv4/tcp_minisocks.c |2 +- net/ipv4/tcp_output.c|6 +- net/ipv4/tcp_timer.c |2 +- 6 files changed, 90 insertions(+), 53 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index d12154a..dd9749e 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -715,6 +715,34 @@ static inline void tcp_ca_event(struct sock *sk, const enum tcp_ca_event event) icsk->icsk_ca_ops->cwnd_event(sk, event); } +/* These functions determine how the current flow behaves in respect of SACK + * handling. SACK is negotiated with the peer, and therefore it can vary + * between different flows. + * + * tcp_is_sack - SACK enabled + * tcp_is_reno - No SACK + * tcp_is_fack - FACK enabled, implies SACK enabled + */ +static inline int tcp_is_sack(const struct tcp_sock *tp) +{ + return tp->rx_opt.sack_ok; +} + +static inline int tcp_is_reno(const struct tcp_sock *tp) +{ + return !tcp_is_sack(tp); +} + +static inline int tcp_is_fack(const struct tcp_sock *tp) +{ + return tp->rx_opt.sack_ok & 2; +} + +static inline void tcp_enable_fack(struct tcp_sock *tp) +{ + tp->rx_opt.sack_ok |= 2; +} + /* This determines how many packets are "in the network" to the best * of our knowledge. In many cases it is conservative, but where * detailed information is available from the receiver (via SACK diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index bd4c295..e3de919 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2000,7 +2000,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info) if (tp->rx_opt.tstamp_ok) info->tcpi_options |= TCPI_OPT_TIMESTAMPS; - if (tp->rx_opt.sack_ok) + if (tcp_is_sack(tp)) info->tcpi_options |= TCPI_OPT_SACK; if (tp->rx_opt.wscale_ok) { info->tcpi_options |= TCPI_OPT_WSCALE; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index f3942c3..3efd4f5 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -108,11 +108,6 @@ int sysctl_tcp_abc __read_mostly; #define FLAG_CA_ALERT (FLAG_DATA_SACKED|FLAG_ECE) #define FLAG_FORWARD_PROGRESS (FLAG_ACKED|FLAG_DATA_SACKED) -#define IsReno(tp) ((tp)->rx_opt.sack_ok == 0) -#define IsFack(tp) ((tp)->rx_opt.sack_ok & 2) -#define IsDSack(tp) ((tp)->rx_opt.sack_ok & 4) -#define Is3517Sack(tp) (!IsReno(tp) && !IsFack(tp)) - #define IsSackFrto() (sysctl_tcp_frto == 0x2) #define TCP_REMNANT (TCP_FLAG_FIN|TCP_FLAG_URG|TCP_FLAG_SYN|TCP_FLAG_PSH) @@ -840,6 +835,21 @@ void tcp_enter_cwr(struct sock *sk, const int set_ssthresh) } } +/* + * Packet counting of FACK is based on in-order assumptions, therefore TCP + * disables it when reordering is detected + */ +static void tcp_disable_fack(struct tcp_sock *tp) +{ + tp->rx_opt.sack_ok &= ~2; +} + +/* Take a notice that peer is sending DSACKs */ +static void tcp_dsack_seen(struct tcp_sock *tp) +{ + tp->rx_opt.sack_ok |= 4; +} + /* Initialize metrics on socket. */ static void tcp_init_metrics(struct sock *sk) @@ -861,7 +871,7 @@ static void tcp_init_metrics(struct sock *sk) } if (dst_metric(dst, RTAX_REORDERING) && tp->reordering != dst_metric(dst, RTAX_REORDERING)) { - tp->rx_opt.sack_ok &= ~2; + tcp_disable_fack(tp); tp->reordering = dst_metric(dst, RTAX_REORDERING); } @@ -923,9 +933,9 @@ static void tcp_update_reordering(struct sock *sk, const int metric, /* This exciting event is worth to be remembered. 8) */ if (ts) NET_INC_STATS_BH(LINUX_MIB_TCPTSREORDER); - else if (IsReno(tp)) + else if (tcp_is_reno(tp)) NET_INC_STATS_BH(LINUX_MIB_TCPRENOREORDER); - else if (IsFack(tp)) + else if (tcp_is_fack(tp)) NET_INC_STATS_BH(LINUX_MIB_TCPFACKREORDER); else NET_INC_STATS_BH(LINUX_MIB_TCPSACKREORDER); @@ -937,8 +947,7 @@ static void tcp_update_reordering(struct sock *sk, const int metric, tp->s
[PATCH tcp-2.6 0/3]: cleanups
Couple of relatively simple cleanups to tcp-2.6. -- i. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH tcp-2.6 1/3] [TCP]: Add tcp_dec_pcount_approx int variant
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <[EMAIL PROTECTED]> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- include/net/tcp.h | 11 --- net/ipv4/tcp_output.c |8 ++-- 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 6f88d13..d12154a 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -593,16 +593,21 @@ static inline int tcp_skb_mss(const struct sk_buff *skb) return skb_shinfo(skb)->gso_size; } -static inline void tcp_dec_pcount_approx(__u32 *count, -const struct sk_buff *skb) +static inline void tcp_dec_pcount_approx_int(__u32 *count, const int decr) { if (*count) { - *count -= tcp_skb_pcount(skb); + *count -= decr; if ((int)*count < 0) *count = 0; } } +static inline void tcp_dec_pcount_approx(__u32 *count, +const struct sk_buff *skb) +{ + tcp_dec_pcount_approx_int(count, tcp_skb_pcount(skb)); +} + static inline void tcp_packets_out_inc(struct sock *sk, const struct sk_buff *skb) { diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 23ee283..c8be1c8 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -738,15 +738,11 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss if (diff > 0) { /* Adjust Reno SACK estimate. */ if (!tp->rx_opt.sack_ok) { - tp->sacked_out -= diff; - if ((int)tp->sacked_out < 0) - tp->sacked_out = 0; + tcp_dec_pcount_approx_int(&tp->sacked_out, diff); tcp_sync_left_out(tp); } - tp->fackets_out -= diff; - if ((int)tp->fackets_out < 0) - tp->fackets_out = 0; + tcp_dec_pcount_approx_int(&tp->fackets_out, diff); } } -- 1.5.0.6 - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH tcp-2.6 2/3] [TCP]: Remove num_acked>0 checks from cong.ctrl mods pkts_acked
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <[EMAIL PROTECTED]> There is no need for such check in pkts_acked because the callback is not invoked unless at least one segment got fully ACKed (i.e., the snd_una moved past skb's end_seq) by the cumulative ACK's snd_una advancement. Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- net/ipv4/tcp_bic.c |2 +- net/ipv4/tcp_cubic.c|2 +- net/ipv4/tcp_westwood.c |3 +-- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c index 281c9f9..9ebd8ec 100644 --- a/net/ipv4/tcp_bic.c +++ b/net/ipv4/tcp_bic.c @@ -210,7 +210,7 @@ static void bictcp_acked(struct sock *sk, u32 cnt, ktime_t last) { const struct inet_connection_sock *icsk = inet_csk(sk); - if (cnt > 0 && icsk->icsk_ca_state == TCP_CA_Open) { + if (icsk->icsk_ca_state == TCP_CA_Open) { struct bictcp *ca = inet_csk_ca(sk); cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT; ca->delayed_ack += cnt; diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index 1422448..8be8ac4 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -338,7 +338,7 @@ static void bictcp_acked(struct sock *sk, u32 cnt, ktime_t last) { const struct inet_connection_sock *icsk = inet_csk(sk); - if (cnt > 0 && icsk->icsk_ca_state == TCP_CA_Open) { + if (icsk->icsk_ca_state == TCP_CA_Open) { struct bictcp *ca = inet_csk_ca(sk); cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT; ca->delayed_ack += cnt; diff --git a/net/ipv4/tcp_westwood.c b/net/ipv4/tcp_westwood.c index e61e09d..0d56a5a 100644 --- a/net/ipv4/tcp_westwood.c +++ b/net/ipv4/tcp_westwood.c @@ -103,8 +103,7 @@ static void westwood_filter(struct westwood *w, u32 delta) static void tcp_westwood_pkts_acked(struct sock *sk, u32 cnt, ktime_t last) { struct westwood *w = inet_csk_ca(sk); - if (cnt > 0) - w->rtt = tcp_sk(sk)->srtt >> 3; + w->rtt = tcp_sk(sk)->srtt >> 3; } /* -- 1.5.0.6 - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
Hello Yi, On Fri, 2007-15-06 at 09:27 +0800, Zhu Yi wrote: > 1. driver becomes complicated (as it is too elaborate in the queue > wakeup strategies design) I am not sure i see the complexity in the wireless driver's wakeup strategy. I just gave some suggestions to use management frames - they dont have to be literally that way. > 2. duplicated code among drivers (otherwise you put all the queue > management logics in a new layer?) There will be some shared code on drivers of same media on the netif_stop/wake strategy perhaps, but not related to queue management. > 3. it's not 100% accurate. there has to be some overhead, more or less > depends on the queue wakeup strategy the driver selected. Why is it not accurate for wireless? I can see the corner case Patrick mentioned in wired ethernet but then wired ethernet doesnt have other events such as management frames (actually DCE does) to help. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] [Net] Support accelerated network plugin modules
Backend net driver acceleration Signed-off-by: Kieran Mansley <[EMAIL PROTECTED]> diff -r fd61ea65cba3 drivers/xen/netback/common.h --- a/drivers/xen/netback/common.h Thu Jun 14 14:51:20 2007 +0100 +++ b/drivers/xen/netback/common.h Thu Jun 14 14:56:41 2007 +0100 @@ -122,6 +122,41 @@ enum { extern int netbk_copy_skb_mode; + +#include + +/* Function pointers into netback accelerator plugin modules */ +struct netback_accel_hooks { +int (*probe)(struct xenbus_device *dev); +int (*remove)(struct xenbus_device *dev); +}; + +/* Structure to track the state of a netback accelerator plugin */ +struct netback_accelerator { +struct list_head link; +int id; +char *frontend; +struct netback_accel_hooks *hooks; +}; + +/* Connect an accelerator plugin module to netback */ +extern void netback_connect_accelerator(int id, const char *frontend, +struct netback_accel_hooks *hooks); +/* Disconnect a previously connected accelerator pluging module */ +extern void netback_disconnect_accelerator(int id, const char *frontend); + +struct backend_info { + struct xenbus_device *dev; + netif_t *netif; + enum xenbus_state frontend_state; + +/* State relating to the netback accelerator */ +void *netback_accel_priv; +/* The accelerator that this backend is currently using */ +struct netback_accelerator *accelerator; +}; + + #define NET_TX_RING_SIZE __RING_SIZE((netif_tx_sring_t *)0, PAGE_SIZE) #define NET_RX_RING_SIZE __RING_SIZE((netif_rx_sring_t *)0, PAGE_SIZE) diff -r fd61ea65cba3 drivers/xen/netback/xenbus.c --- a/drivers/xen/netback/xenbus.c Thu Jun 14 14:51:20 2007 +0100 +++ b/drivers/xen/netback/xenbus.c Thu Jun 14 14:56:41 2007 +0100 @@ -1,6 +1,7 @@ /* Xenbus code for netif backend Copyright (C) 2005 Rusty Russell <[EMAIL PROTECTED]> Copyright (C) 2005 XenSource Ltd +Copyright (c) 2007 Solarflare Communications, Inc. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -28,11 +29,126 @@ printk("netback/xenbus (%s:%d) " fmt ".\n", __FUNCTION__, __LINE__, ##args) #endif -struct backend_info { - struct xenbus_device *dev; - netif_t *netif; - enum xenbus_state frontend_state; -}; +/* + * A list of available netback accelerator plugin modules (each list + * entry is of type struct netback_accelerator) + */ +static struct list_head accelerators_list; +/* Lock used to protect access to accelerators_list */ +static spinlock_t accelerators_lock; + +/* + * Compare a backend to an accelerator, and decide if they are + * compatible (i.e. if the accelerator should be used by the + * backend) + */ +static int match_accelerator(struct backend_info *be, + struct netback_accelerator *accelerator) +{ +/* + * This could do with being more sophisticated. For example, + * determine which hardware is being used by each backend from + * the bridge and network topology of the domain + */ +if ( be->accelerator == NULL ) +return 1; +else +return 0; +} + +/* + * Notify all suitable backends that a new accelerator is available + * and connected. This will also notify the accelerator plugin module + * that it is being used for a device through the probe hook. + */ +static int netback_accelerator_tell_backend(struct device *dev, void *arg) +{ +struct netback_accelerator *accelerator = +(struct netback_accelerator *)arg; +struct xenbus_device *xendev = to_xenbus_device(dev); + +if( !strcmp("vif", xendev->devicetype) ) { +struct backend_info *be = xendev->dev.driver_data; + +if ( match_accelerator(be, accelerator) ) { + be->accelerator = accelerator; + be->accelerator->hooks->probe(xendev); +} +} +return 0; +} + + +/* + * Entry point for an netback accelerator plugin module. Called to + * advertise its presence, and connect to any suitable backends. + */ +void netback_connect_accelerator(int id, const char *frontend, + struct netback_accel_hooks *hooks) +{ +struct netback_accelerator *new_accelerator = +kmalloc(sizeof(struct netback_accelerator), GFP_KERNEL); +unsigned frontend_len, flags; + +if ( new_accelerator ) { +new_accelerator->id = id; + +frontend_len = strlen(frontend)+1; +new_accelerator->frontend = kmalloc(frontend_len, GFP_KERNEL); +if ( !new_accelerator->frontend ) { +DPRINTK("%s: failed to allocate memory for frontend string\n", +__FUNCTION__); +
[PATCH 2/4] [Net] Support accelerated network plugin modules
Add accel option to vif xend config Signed-off-by: Kieran Mansley <[EMAIL PROTECTED]> diff -r 405eb3e22887 tools/python/xen/xend/server/netif.py --- a/tools/python/xen/xend/server/netif.py Thu Jun 14 14:50:04 2007 +0100 +++ b/tools/python/xen/xend/server/netif.py Thu Jun 14 14:52:55 2007 +0100 @@ -107,6 +107,7 @@ class NetifController(DevController): uuid= config.get('uuid') ipaddr = config.get('ip') model = config.get('model') +accel = config.get('accel') if not typ: typ = xoptions.netback_type @@ -131,6 +132,8 @@ class NetifController(DevController): back['uuid'] = uuid if model: back['model'] = model +if accel: +back['accel'] = accel config_path = "device/%s/%d/" % (self.deviceClass, devid) for x in back: @@ -157,10 +160,10 @@ class NetifController(DevController): config_path = "device/%s/%d/" % (self.deviceClass, devid) devinfo = () for x in ( 'script', 'ip', 'bridge', 'mac', - 'type', 'vifname', 'rate', 'uuid', 'model' ): + 'type', 'vifname', 'rate', 'uuid', 'model', 'accel'): y = self.vm._readVm(config_path + x) devinfo += (y,) -(script, ip, bridge, mac, typ, vifname, rate, uuid, model) = devinfo +(script, ip, bridge, mac, typ, vifname, rate, uuid, model, accel) = devinfo if script: result['script'] = script @@ -180,5 +183,7 @@ class NetifController(DevController): result['uuid'] = uuid if model: result['model'] = model +if accel: +result['accel'] = accel return result diff -r 405eb3e22887 tools/python/xen/xm/create.py --- a/tools/python/xen/xm/create.py Thu Jun 14 14:50:04 2007 +0100 +++ b/tools/python/xen/xm/create.py Thu Jun 14 14:52:55 2007 +0100 @@ -710,7 +710,7 @@ def configure_vifs(config_devs, vals): def f(k): if k not in ['backend', 'bridge', 'ip', 'mac', 'script', 'type', - 'vifname', 'rate', 'model']: + 'vifname', 'rate', 'model', 'accel']: err('Invalid vif option: ' + k) config_vif.append([k, d[k]]) Add accel option to vif xend config diff -r 405eb3e22887 tools/python/xen/xend/server/netif.py --- a/tools/python/xen/xend/server/netif.py Thu Jun 14 14:50:04 2007 +0100 +++ b/tools/python/xen/xend/server/netif.py Thu Jun 14 14:52:55 2007 +0100 @@ -107,6 +107,7 @@ class NetifController(DevController): uuid= config.get('uuid') ipaddr = config.get('ip') model = config.get('model') +accel = config.get('accel') if not typ: typ = xoptions.netback_type @@ -131,6 +132,8 @@ class NetifController(DevController): back['uuid'] = uuid if model: back['model'] = model +if accel: +back['accel'] = accel config_path = "device/%s/%d/" % (self.deviceClass, devid) for x in back: @@ -157,10 +160,10 @@ class NetifController(DevController): config_path = "device/%s/%d/" % (self.deviceClass, devid) devinfo = () for x in ( 'script', 'ip', 'bridge', 'mac', - 'type', 'vifname', 'rate', 'uuid', 'model' ): + 'type', 'vifname', 'rate', 'uuid', 'model', 'accel'): y = self.vm._readVm(config_path + x) devinfo += (y,) -(script, ip, bridge, mac, typ, vifname, rate, uuid, model) = devinfo +(script, ip, bridge, mac, typ, vifname, rate, uuid, model, accel) = devinfo if script: result['script'] = script @@ -180,5 +183,7 @@ class NetifController(DevController): result['uuid'] = uuid if model: result['model'] = model +if accel: +result['accel'] = accel return result diff -r 405eb3e22887 tools/python/xen/xm/create.py --- a/tools/python/xen/xm/create.py Thu Jun 14 14:50:04 2007 +0100 +++ b/tools/python/xen/xm/create.py Thu Jun 14 14:52:55 2007 +0100 @@ -710,7 +710,7 @@ def configure_vifs(config_devs, vals): def f(k): if k not in ['backend', 'bridge', 'ip', 'mac', 'script', 'type', - 'vifname', 'rate', 'model']: + 'vifname', 'rate', 'model', 'accel']: err('Invalid vif option: ' + k) config_vif.append([k, d[k]])
[PATCH 0/4] [Net] Support accelerated network plugin modules
This is a repost of some earlier patches to the xen-devel mailing list, with a number of changes thanks to some useful suggestions from others. I've also CC'd netdev@vger.kernel.org at Herbert Xu's request as some of the files being patched may be merged into upstream linux soon, and so folks there may have opinions too. Changes from last time: - Improved the link between frontend and accelerated plugin so that netif_wake_queue is only called when both have available TX slots. Thanks to Herbert Xu for pointing this out. - Removed macro that safely called the accelerated hooks; this code is now inline in the relevant places. - Some small cleanup modifications, and resync patches to latest xen- unstable.hg, including source tree layout changes for linux-2.6.18- xen.hg The discussion following the previous posting of these patches was mainly concerned with the approach to locking in the frontend. Some felt it was too complex and suggested using RCU instead. However, I have been unable to convince myself that RCU would offer suitable protection against the accelerated plugin module unloading in the middle of a hook call, particularly where these hook calls might block. The existing locking has been well tested and is therefore known to work. In addition, the complexity is largely to ensure there is low locking overhead on the data path, and so a change to simplify it would not likely increase performance. As a result the locking has not changed substantially since the last patch. However if others feel strongly about this and can convince me that RCU would be adequate, I wouldn't object to making this change. What follows is the text from the original post providing some background on the patches: This set of patches provides the hooks and support necessary for accelerated network plugin modules to attach to Xen's netback and netfront. These modules provide a fast path for network traffic where there is hardware support available for the netfront driver to send and receive packets directly to a NIC (such as those available from Solarflare). As there are currently no available plugins, I've attached a couple of dummy ones to illustrate how the hooks could be used. These are incomplete (and clearly wouldn't even compile) in that they only include code to show the interface between the accelerated module and netfront/netback. A lot of the comments hint at what code should go where. They don't show any interface between the accelerated frontend and accelerated backend, or hardware access, for example, as those would both be specific to the implementation. I hope they help illustrate this, but if you have any questions I'm happy to provide more information. A brief overview of the operation of the plugins: When the accelerated modules are loaded, a VI is created by the accelerated backend to allow the accelerated frontend to safely access portions of the NIC. For RX, when packets are received by the accelerated backend, it will examine them and if appropriate insert filters into the NIC to deliver future packets on that address directly to the accelerated frontend's VI. For TX, netfront gives each accelerated frontend the option of sending each packet, which it can accept (if it wants to send it directly to the hardware) or decline (if it thinks this is more appropriate to send via the normal network path). We have found that using this approach to accelerating network traffic, domU to domU connections (across the network) can achieve close to the performance of dom0 to dom0 connections on a 10Gbps ethernet. This is roughly double the bandwidth seen with unmodified Xen. /***/ /*! \file dumm_accel_backend.c Dummy accelerated plugin module Copyright 2006 Solarflare Communications Inc, 9501 Jeronimo Road, Suite 250, Irvine, CA 92618, USA This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation, incorporated herein by reference. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /***/ static struct netback_accel_hooks accel_hooks = { &netback_accel_probe, &netback_accel_remove }; static const char *frontend_name = "dummynetaccel"; static int netback_accel_init(void) { /* Initialise the rest of the module... */ /* Tell the netback that we're here */ netback_connect_accelerator(
[PATCH 1/4] [Net] Support accelerated network plugin modules
Add xenbus_for_each_[back,front]end functions to iterate each bus Signed-off-by: Kieran Mansley <[EMAIL PROTECTED]> diff -r d5e0eb7dd069 drivers/xen/xenbus/xenbus_probe.c --- a/drivers/xen/xenbus/xenbus_probe.c Sun Jun 10 19:50:32 2007 +0100 +++ b/drivers/xen/xenbus/xenbus_probe.c Thu Jun 14 14:49:40 2007 +0100 @@ -4,6 +4,7 @@ * Copyright (C) 2005 Rusty Russell, IBM Corporation * Copyright (C) 2005 Mike Wray, Hewlett-Packard * Copyright (C) 2005, 2006 XenSource Ltd + * Copyright (C) 2007 Solarflare Communications, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License version 2 @@ -1085,3 +1086,10 @@ static int __init boot_wait_for_devices( late_initcall(boot_wait_for_devices); #endif + + +int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *)) +{ +return bus_for_each_dev(&xenbus_frontend.bus, NULL, arg, fn); +} +EXPORT_SYMBOL_GPL(xenbus_for_each_frontend); diff -r d5e0eb7dd069 drivers/xen/xenbus/xenbus_probe_backend.c --- a/drivers/xen/xenbus/xenbus_probe_backend.c Sun Jun 10 19:50:32 2007 +0100 +++ b/drivers/xen/xenbus/xenbus_probe_backend.c Thu Jun 14 14:49:40 2007 +0100 @@ -4,6 +4,7 @@ * Copyright (C) 2005 Rusty Russell, IBM Corporation * Copyright (C) 2005 Mike Wray, Hewlett-Packard * Copyright (C) 2005, 2006 XenSource Ltd + * Copyright (C) 2007 Solarflare Communications, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License version 2 @@ -285,3 +286,10 @@ void xenbus_backend_device_register(void xenbus_backend.error); } } + +int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *)) +{ +return bus_for_each_dev(&xenbus_backend.bus, NULL, arg, fn); +} +EXPORT_SYMBOL_GPL(xenbus_for_each_backend); + diff -r d5e0eb7dd069 include/xen/xenbus.h --- a/include/xen/xenbus.h Sun Jun 10 19:50:32 2007 +0100 +++ b/include/xen/xenbus.h Thu Jun 14 14:49:40 2007 +0100 @@ -299,4 +299,7 @@ int xenbus_dev_is_online(struct xenbus_d int xenbus_dev_is_online(struct xenbus_device *dev); int xenbus_frontend_closed(struct xenbus_device *dev); +extern int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *)); +extern int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *)); + #endif /* _XEN_XENBUS_H */ Add xenbus_for_each_[back,front]end functions to iterate each bus diff -r d5e0eb7dd069 drivers/xen/xenbus/xenbus_probe.c --- a/drivers/xen/xenbus/xenbus_probe.c Sun Jun 10 19:50:32 2007 +0100 +++ b/drivers/xen/xenbus/xenbus_probe.c Thu Jun 14 14:49:40 2007 +0100 @@ -4,6 +4,7 @@ * Copyright (C) 2005 Rusty Russell, IBM Corporation * Copyright (C) 2005 Mike Wray, Hewlett-Packard * Copyright (C) 2005, 2006 XenSource Ltd + * Copyright (C) 2007 Solarflare Communications, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License version 2 @@ -1085,3 +1086,10 @@ static int __init boot_wait_for_devices( late_initcall(boot_wait_for_devices); #endif + + +int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *)) +{ +return bus_for_each_dev(&xenbus_frontend.bus, NULL, arg, fn); +} +EXPORT_SYMBOL_GPL(xenbus_for_each_frontend); diff -r d5e0eb7dd069 drivers/xen/xenbus/xenbus_probe_backend.c --- a/drivers/xen/xenbus/xenbus_probe_backend.c Sun Jun 10 19:50:32 2007 +0100 +++ b/drivers/xen/xenbus/xenbus_probe_backend.c Thu Jun 14 14:49:40 2007 +0100 @@ -4,6 +4,7 @@ * Copyright (C) 2005 Rusty Russell, IBM Corporation * Copyright (C) 2005 Mike Wray, Hewlett-Packard * Copyright (C) 2005, 2006 XenSource Ltd + * Copyright (C) 2007 Solarflare Communications, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License version 2 @@ -285,3 +286,10 @@ void xenbus_backend_device_register(void xenbus_backend.error); } } + +int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *)) +{ +return bus_for_each_dev(&xenbus_backend.bus, NULL, arg, fn); +} +EXPORT_SYMBOL_GPL(xenbus_for_each_backend); + diff -r d5e0eb7dd069 include/xen/xenbus.h --- a/include/xen/xenbus.h Sun Jun 10 19:50:32 2007 +0100 +++ b/include/xen/xenbus.h Thu Jun 14 14:49:40 2007 +0100 @@ -299,4 +299,7 @@ int xenbus_dev_is_online(struct xenbus_d int xenbus_dev_is_online(struct xenbus_device *dev); int xenbus_frontend_closed(struct xenbus_device *dev); +extern int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *)); +extern int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *)); + #endif /* _XEN_XENBUS_H */
Re: Fwd: [PATCH] [-mm] ACPI: export ACPI events via netlink
On Fri, 2007-15-06 at 09:01 +0800, Zhang Rui wrote: > > rtnl_open_byproto(&rth, nl_mgrp(mydiscoveredacpiid), NETLINK_GENERIC) > > > Yes. It doesn't work if I use my group id here. > In fact, I'm using rtnl_open_byproto(&rth, 1, NETLINK_GENERIC) now. > That's why I said that this demo receives all the broadcasted genetlink > messages. Ok, tell me how to generate these ACPI events and i will patch my laptop and test it. What kernel compile options do i need? 2.6.24-rc4 will do? cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] network splice receive v2
On Fri, Jun 15, 2007 at 11:34:30AM +0200, Jens Axboe ([EMAIL PROTECTED]) wrote: > > Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages > > was 2 there was only 1 free slot in pipe_buffer. > > Ah duh, indeed, it is a dumb bug indeed. This should work. Yep, this one works too. > diff --git a/fs/splice.c b/fs/splice.c > index 89871c6..5327cbf 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -172,6 +172,7 @@ static const struct pipe_buf_operations > user_page_pipe_buf_ops = { > ssize_t splice_to_pipe(struct pipe_inode_info *pipe, > struct splice_pipe_desc *spd) > { > + unsigned int spd_pages = spd->nr_pages; > int ret, do_wakeup, page_nr; > > ret = 0; > @@ -252,7 +253,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, > } > } > > - while (page_nr < spd->nr_pages) > + while (page_nr < spd_pages) > spd->spd_release(spd, page_nr++); > > return ret; > > -- > Jens Axboe -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] network splice receive v2
On Fri, Jun 15 2007, Evgeniy Polyakov wrote: > On Fri, Jun 15, 2007 at 10:43:18AM +0200, Jens Axboe ([EMAIL PROTECTED]) > wrote: > > > So, things turned down to be not in the __splice_from_pipe(), but > > > splice_to_pipe(). Attached patch fixes a leak for me. > > > It was tested with different data files and all were received correctly. > > > > > > Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> > > > > > > diff --git a/fs/splice.c b/fs/splice.c > > > index bc481f1..365bfd9 100644 > > > --- a/fs/splice.c > > > +++ b/fs/splice.c > > > @@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, > > > break; > > > if (pipe->nrbufs < PIPE_BUFFERS) > > > continue; > > > - > > > - break; > > > } > > > > > > if (spd->flags & SPLICE_F_NONBLOCK) { > > > > > > > Hmm, curious. If we hit that location, then two conditions are true: > > > > - Pipe is full > > - We transferred some data > > Yep. > > > if you remove the break, then you'll end up blocking in pipe_wait() > > (unless you have SPLICE_F_NONBLOCK also set). And we don't want to block > > waiting for more room, if we already transferred some data. In that case > > we just want to return a short splice. Looking at pipe_write(), it'll > > block as well though. Just doesn't seem optimal to me, but... > > > > So the question is why would doing the break there cause a leak? I just > > don't yet see how it can happen, I'd love to fix that condition instead. > > For the case you describe, we should have page_nr == 1 and spd->nr_pages > > == 2. Is the: > > > > while (page_nr < spd->nr_pages) > > spd->spd_release(spd, page_nr++); > > > > not dropping the right reference? > > Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages > was 2 there was only 1 free slot in pipe_buffer. Ah duh, indeed, it is a dumb bug indeed. This should work. diff --git a/fs/splice.c b/fs/splice.c index 89871c6..5327cbf 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -172,6 +172,7 @@ static const struct pipe_buf_operations user_page_pipe_buf_ops = { ssize_t splice_to_pipe(struct pipe_inode_info *pipe, struct splice_pipe_desc *spd) { + unsigned int spd_pages = spd->nr_pages; int ret, do_wakeup, page_nr; ret = 0; @@ -252,7 +253,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, } } - while (page_nr < spd->nr_pages) + while (page_nr < spd_pages) spd->spd_release(spd, page_nr++); return ret; -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6] [TCP]: Congestion control API RTT sampling fix
Commit 164891aadf1721fca4dce473bb0e0998181537c6 broke RTT sampling of congestion control modules. Inaccurate timestamps could be fed to them without providing any way for them to identify such cases. Previously RTT sampler was called only if FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate timestamps nicely. In addition, the new behavior could give an invalid timestamp (zero) to RTT sampler if only skbs with TCPCB_RETRANS were ACKed. This solves both problems. Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- include/linux/ktime.h | 12 include/linux/skbuff.h |4 net/ipv4/tcp_illinois.c |3 +++ net/ipv4/tcp_input.c|6 +- net/ipv4/tcp_lp.c |3 ++- net/ipv4/tcp_vegas.c|3 +++ net/ipv4/tcp_veno.c |3 +++ 7 files changed, 32 insertions(+), 2 deletions(-) diff --git a/include/linux/ktime.h b/include/linux/ktime.h index c762954..2b139f6 100644 --- a/include/linux/ktime.h +++ b/include/linux/ktime.h @@ -261,6 +261,18 @@ static inline s64 ktime_to_ns(const ktime_t kt) #endif +/** + * ktime_equal - Compares two ktime_t variables to see if they are equal + * @cmp1: comparable1 + * @cmp2: comparable2 + * + * Compare two ktime_t variables, returns 1 if equal + */ +static inline int ktime_equal(const ktime_t cmp1, const ktime_t cmp2) +{ + return cmp1.tv64 == cmp2.tv64; +} + static inline s64 ktime_to_us(const ktime_t kt) { struct timeval tv = ktime_to_timeval(kt); diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index e7367c7..6f0b2f7 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1579,6 +1579,10 @@ static inline ktime_t net_timedelta(ktime_t t) return ktime_sub(ktime_get_real(), t); } +static inline ktime_t net_invalid_timestamp(void) +{ + return ktime_set(0, 0); +} extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len); extern __sum16 __skb_checksum_complete(struct sk_buff *skb); diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c index 4adc47c..b2b2256 100644 --- a/net/ipv4/tcp_illinois.c +++ b/net/ipv4/tcp_illinois.c @@ -90,6 +90,9 @@ static void tcp_illinois_acked(struct sock *sk, u32 pkts_acked, ktime_t last) ca->acked = pkts_acked; + if (ktime_equal(last, net_invalid_timestamp())) + return; + rtt = ktime_to_us(net_timedelta(last)); /* ignore bogus values, this prevents wraparound in alpha math */ diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index d6d0f9b..aaf6f66 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2409,7 +2409,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p) int acked = 0; int prior_packets = tp->packets_out; __s32 seq_rtt = -1; - ktime_t last_ackt = ktime_set(0,0); + ktime_t last_ackt = net_invalid_timestamp(); while ((skb = tcp_write_queue_head(sk)) && skb != tcp_send_head(sk)) { @@ -2487,6 +2487,10 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p) tcp_ack_update_rtt(sk, acked, seq_rtt); tcp_ack_packets_out(sk); + /* Is the ACK triggering packet unambiguous? */ + if (acked & FLAG_RETRANS_DATA_ACKED) + last_ackt = net_invalid_timestamp(); + if (ca_ops->pkts_acked) ca_ops->pkts_acked(sk, pkts_acked, last_ackt); } diff --git a/net/ipv4/tcp_lp.c b/net/ipv4/tcp_lp.c index 43294ad..e49836c 100644 --- a/net/ipv4/tcp_lp.c +++ b/net/ipv4/tcp_lp.c @@ -266,7 +266,8 @@ static void tcp_lp_pkts_acked(struct sock *sk, u32 num_acked, ktime_t last) struct tcp_sock *tp = tcp_sk(sk); struct lp *lp = inet_csk_ca(sk); - tcp_lp_rtt_sample(sk, ktime_to_us(net_timedelta(last))); + if (!ktime_equal(last, net_invalid_timestamp())) + tcp_lp_rtt_sample(sk, ktime_to_us(net_timedelta(last))); /* calc inference */ if (tcp_time_stamp > tp->rx_opt.rcv_tsecr) diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c index 73e19cf..e218a51 100644 --- a/net/ipv4/tcp_vegas.c +++ b/net/ipv4/tcp_vegas.c @@ -117,6 +117,9 @@ void tcp_vegas_pkts_acked(struct sock *sk, u32 cnt, ktime_t last) struct vegas *vegas = inet_csk_ca(sk); u32 vrtt; + if (ktime_equal(last, net_invalid_timestamp())) + return; + /* Never allow zero rtt or baseRTT */ vrtt = ktime_to_us(net_timedelta(last)) + 1; diff --git a/net/ipv4/tcp_veno.c b/net/ipv4/tcp_veno.c index 9edb340..ec854cc 100644 --- a/net/ipv4/tcp_veno.c +++ b/net/ipv4/tcp_veno.c @@ -74,6 +74,9 @@ static void tcp_veno_pkts_acked(struct sock *sk, u32 cnt, ktime_t last) struct veno *veno = inet_csk_ca(sk); u32 vrtt; + if (ktime_equal(last, net_invalid_timestamp())) + return; + /* Never allow zero rtt or baseRTT */ vrtt = ktime_
Re: [IPV6] addrconf: Fix IPv6 on tuntap tunnels
Hallo David, hello Herbert, indeed i have some concerns about reverting the patch as i do not see that the MTU is the right thing to distinguish whether a netdevice is capable to have IPv4/IPv6. E.g. is decnet able to run IPv6? IMHO the autoconf (in any case) should only handle netdevices that are capable to be auto configured (e.g. with IPv6). So the question looks like: "Is this device capable to be auto configured with IPv6?" and not "Is the devices MTU >= IPV6_MIN_MTU ?" My original patch showed the way to ask the (IMO) correct question. And maybe this has to be improved somehow (like Florian Zumbiel suggested). And also maybe the Autoconfiguration has to be a part of code that is independent from net/ipv6. But to ask about the MTU size does not look the right way to me. Best regards, Oliver David Miller wrote: > From: Herbert Xu <[EMAIL PROTECTED]> > Date: Thu, 14 Jun 2007 18:16:07 +1000 > > >> [IPV6] addrconf: Fix IPv6 on tuntap tunnels >> >> The recent patch that added ipv6_hwtype is broken on tuntap tunnels. >> Indeed, it's broken on any device that does not pass the ipv6_hwtype >> test. >> >> The reason is that the original test only applies to autoconfiguration, >> not IPv6 support. IPv6 support is allowed on any device. In fact, >> even with the ipv6_hwtype patch applied you can still add IPv6 addresses >> to any interface that doesn't pass thw ipv6_hwtype test provided that >> they have a sufficiently large MTU. This is a serious problem because >> come deregistration time these devices won't be cleaned up properly. >> >> I've gone back and looked at the rationale for the patch. It appears >> that the real problem is that we were creating IPv6 devices even if the >> MTU was too small. So here's a patch which fixes that and reverts the >> ipv6_hwtype stuff. >> >> Thanks to Kanru Chen for reporting this issue. >> >> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> >> > > Thanks for fixing this up Herbert. > > Patch applied. > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > > - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] network splice receive v2
On Fri, Jun 15, 2007 at 10:43:18AM +0200, Jens Axboe ([EMAIL PROTECTED]) wrote: > > So, things turned down to be not in the __splice_from_pipe(), but > > splice_to_pipe(). Attached patch fixes a leak for me. > > It was tested with different data files and all were received correctly. > > > > Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> > > > > diff --git a/fs/splice.c b/fs/splice.c > > index bc481f1..365bfd9 100644 > > --- a/fs/splice.c > > +++ b/fs/splice.c > > @@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, > > break; > > if (pipe->nrbufs < PIPE_BUFFERS) > > continue; > > - > > - break; > > } > > > > if (spd->flags & SPLICE_F_NONBLOCK) { > > > > Hmm, curious. If we hit that location, then two conditions are true: > > - Pipe is full > - We transferred some data Yep. > if you remove the break, then you'll end up blocking in pipe_wait() > (unless you have SPLICE_F_NONBLOCK also set). And we don't want to block > waiting for more room, if we already transferred some data. In that case > we just want to return a short splice. Looking at pipe_write(), it'll > block as well though. Just doesn't seem optimal to me, but... > > So the question is why would doing the break there cause a leak? I just > don't yet see how it can happen, I'd love to fix that condition instead. > For the case you describe, we should have page_nr == 1 and spd->nr_pages > == 2. Is the: > > while (page_nr < spd->nr_pages) > spd->spd_release(spd, page_nr++); > > not dropping the right reference? Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages was 2 there was only 1 free slot in pipe_buffer. spd_fill_page: allocated: 89, freed: 73, data: 81003d606d28 spd_fill_page: allocated: 90, freed: 73, data: 81003d606d28 splice_to_pipe: priv: 81003d606d28, spd_nrpages: 1, pipe_nrbufs: 16, page_nr: 1. spd_fill_page: allocated: 91, freed: 73, data: fff81003d6549c8 // next data ... __splice_from_pipe: process: sd_len: 0, buf_len: 0, buf_priv: 81003d606d28. __splice_from_pipe: release 81003d606d28. sock_pipe_buf_release: allocated: 91, freed: 89, ptr: 81003d606d28 splice_to_pipe: priv: 81003d6549c8, spd_nrpages: 0, pipe_nrbufs: 1, page_nr: 1. // next data > -- > Jens Axboe -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] network splice receive v2
On Fri, Jun 15 2007, Evgeniy Polyakov wrote: > On Thu, Jun 14, 2007 at 01:59:02PM +0200, Jens Axboe ([EMAIL PROTECTED]) > wrote: > > On Thu, Jun 14 2007, Evgeniy Polyakov wrote: > > > On Wed, Jun 13, 2007 at 12:01:04PM +0400, Evgeniy Polyakov ([EMAIL > > > PROTECTED]) wrote: > > > > I will rebase my tree, likely something was not merged correctly. > > > > > > Ok, I've just rebased a tree from recent git and pulled from brick - > > > things seems to be settled. I've ran several tests with different > > > filesizes and all files were received correctly without kernel crashes. > > > There is skb leak indeed, and it looks like it is in the > > > __splice_from_pipe() for the last page. > > > > Uh, the leak, right - I had forgotten about that, was working on getting > > vmsplice into shape the last two days. Interesting that you mention the > > last page, I'll dig in now! Any more info on this (how did you notice > > the leak originates from there)? > > I first observed leak via slabinfo data (not a leak, but number of skbs > did not dropped after quite huge files were transferred), then added > number of allocated and freed objects in skbuff.c, they did not match > for big files, so I started to check splice source and found that > sometimes ->release callback is not called, but code breaks out of the > loop. I've put some printks in __splice_from_pipe() and found following > case, when skb is leaked: > when the same cloned skb was shared multiple times (no more than 2 though), > only one copy was freed. > > Further analysis description requires some splice background (obvious > for you, but that clears it for me): > pipe_buffer only contains 16 pages. > There is a code, which copies pages (pointers) from spd to pipe_buffer > (splice_to_pipe()). > skb allocations happens in chunks of different size (i.e. with different > number of skbs/pages per call), so it is possible that number of > allocated skbs will be less than pipe_buffer size (16), and then the > rest of the pages will be put into different (or the same) pipe_buffer later. > Sometimes two skbs from above example happens to be on the boundary of > the pipe buffer, so only one of them is being copied into pipe_buffer, > which is then transferred over the pipe. > So, we have a case, when spd has (it had more, but this two are special) > 2 pages (actually the same page, but two references to it), but pipe_buffer > has a place only for one. In that case second page from spd will be missed. > > So, things turned down to be not in the __splice_from_pipe(), but > splice_to_pipe(). Attached patch fixes a leak for me. > It was tested with different data files and all were received correctly. > > Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> > > diff --git a/fs/splice.c b/fs/splice.c > index bc481f1..365bfd9 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, > break; > if (pipe->nrbufs < PIPE_BUFFERS) > continue; > - > - break; > } > > if (spd->flags & SPLICE_F_NONBLOCK) { > Hmm, curious. If we hit that location, then two conditions are true: - Pipe is full - We transferred some data if you remove the break, then you'll end up blocking in pipe_wait() (unless you have SPLICE_F_NONBLOCK also set). And we don't want to block waiting for more room, if we already transferred some data. In that case we just want to return a short splice. Looking at pipe_write(), it'll block as well though. Just doesn't seem optimal to me, but... So the question is why would doing the break there cause a leak? I just don't yet see how it can happen, I'd love to fix that condition instead. For the case you describe, we should have page_nr == 1 and spd->nr_pages == 2. Is the: while (page_nr < spd->nr_pages) spd->spd_release(spd, page_nr++); not dropping the right reference? -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] network splice receive v2
On Thu, Jun 14, 2007 at 01:59:02PM +0200, Jens Axboe ([EMAIL PROTECTED]) wrote: > On Thu, Jun 14 2007, Evgeniy Polyakov wrote: > > On Wed, Jun 13, 2007 at 12:01:04PM +0400, Evgeniy Polyakov ([EMAIL > > PROTECTED]) wrote: > > > I will rebase my tree, likely something was not merged correctly. > > > > Ok, I've just rebased a tree from recent git and pulled from brick - > > things seems to be settled. I've ran several tests with different > > filesizes and all files were received correctly without kernel crashes. > > There is skb leak indeed, and it looks like it is in the > > __splice_from_pipe() for the last page. > > Uh, the leak, right - I had forgotten about that, was working on getting > vmsplice into shape the last two days. Interesting that you mention the > last page, I'll dig in now! Any more info on this (how did you notice > the leak originates from there)? I first observed leak via slabinfo data (not a leak, but number of skbs did not dropped after quite huge files were transferred), then added number of allocated and freed objects in skbuff.c, they did not match for big files, so I started to check splice source and found that sometimes ->release callback is not called, but code breaks out of the loop. I've put some printks in __splice_from_pipe() and found following case, when skb is leaked: when the same cloned skb was shared multiple times (no more than 2 though), only one copy was freed. Further analysis description requires some splice background (obvious for you, but that clears it for me): pipe_buffer only contains 16 pages. There is a code, which copies pages (pointers) from spd to pipe_buffer (splice_to_pipe()). skb allocations happens in chunks of different size (i.e. with different number of skbs/pages per call), so it is possible that number of allocated skbs will be less than pipe_buffer size (16), and then the rest of the pages will be put into different (or the same) pipe_buffer later. Sometimes two skbs from above example happens to be on the boundary of the pipe buffer, so only one of them is being copied into pipe_buffer, which is then transferred over the pipe. So, we have a case, when spd has (it had more, but this two are special) 2 pages (actually the same page, but two references to it), but pipe_buffer has a place only for one. In that case second page from spd will be missed. So, things turned down to be not in the __splice_from_pipe(), but splice_to_pipe(). Attached patch fixes a leak for me. It was tested with different data files and all were received correctly. Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> diff --git a/fs/splice.c b/fs/splice.c index bc481f1..365bfd9 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, break; if (pipe->nrbufs < PIPE_BUFFERS) continue; - - break; } if (spd->flags & SPLICE_F_NONBLOCK) { -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
2.4.35-pre1: new e1000 driver breaks old hardware
Hi, with the new e1000 driver version 7.3.20 the onboard gigabit nic 82547EI (8086:1019) doesn't work correctly. After transferring about 100 megabytes over a gigabit link the transfer stopped and I have to reinit the link either by doing a ifconfig down/up or unplugging the network cable. The RxIntDelay is set to 0 like described in the docu. Playing with this Parameter only increases the amount of traffic to be send and the error occurs later. This happens also with the 7.4.35 and 7.5.5 driver from the intel side. The driver 5.7.6 from kernel 2.4.33.3 and the 6.1.16 from intel works very well on this hardware. Also other gigabit nics we use didn't have this problem. Is this issue already known? Is there any solution yet? Kind regards Wolfgang - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html