Re: [2.6 patch] net/llc/llc_conn.c: fix possible NULL dereference
From: Eugene Teo <[EMAIL PROTECTED]> Date: Sat, 19 May 2007 13:49:11 +0800 > Spotted by the Coverity checker. Why am I not surprised :-( There is no bug here, if Coverity warns every single time skb_peek() is used and not tested against NULL, that's a very serious shortcoming of Coverity or what it has been taught about SKB queues. It needs to learn that skb_queue_len() != 0 implies skb_peek() will not return NULL sans locking errors. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6 patch] net/llc/llc_conn.c: fix possible NULL dereference
Randy Dunlap wrote: > On Sat, 19 May 2007 13:13:07 +0800 Eugene Teo wrote: > >> skb_peek() might return an empty list. skb should be checked before calling >> llc_pdu_sn_hdr() with it. >> >> Spotted by the Coverity checker. >> >> Signed-off-by: Eugene Teo <[EMAIL PROTECTED]> [...] > > Oh, and your patch has spaces instead of tabs. It's a hassle to > get thunderbird to send a patch that preserves tabs. See if this: >http://mbligh.org/linuxdocs/Email/Clients/Thunderbird > helps you any. Here's a resend: skb_peek() might return an empty list. skb should be checked before calling llc_pdu_sn_hdr() with it. Spotted by the Coverity checker. Signed-off-by: Eugene Teo <[EMAIL PROTECTED]> diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c index 3b8cfbe..6d3a07e 100644 --- a/net/llc/llc_conn.c +++ b/net/llc/llc_conn.c @@ -324,6 +324,8 @@ int llc_conn_remove_acked_pdus(struct sock *sk, u8 nr, u16 *how_many_unacked) if (!q_len) goto out; skb = skb_peek(&llc->pdu_unack_q); + if (!skb) + goto out; pdu = llc_pdu_sn_hdr(skb); /* finding position of last acked pdu in queue */
Re: [2.6 patch] net/llc/llc_conn.c: fix possible NULL dereference
Eugene Teo <[EMAIL PROTECTED]> wrote: > > diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c > index 3b8cfbe..28a3994 100644 > --- a/net/llc/llc_conn.c > +++ b/net/llc/llc_conn.c > @@ -323,7 +323,8 @@ int llc_conn_remove_acked_pdus(struct sock *sk, u8 nr, u16 > *how_many_unacked) > >if (!q_len) >goto out; > - skb = skb_peek(&llc->pdu_unack_q); > + if (! (skb = skb_peek(&llc->pdu_unack_q))) > + goto out; Actually we just checked that the queue length is non-zero so there must be a packet there unless someone's just removed it. If it were possible for someone else to remove it in parallel, then we've got bigger problems to worry about :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] libertas: skb dereferenced after netif_rx
Stephen Hemminger wrote: netif_rx() used to return a value in older kernels. netif_rx() returns a value in current kernels. Jeff - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] libertas: skb dereferenced after netif_rx
On Fri, 18 May 2007 14:09:03 -0400 "John W. Linville" <[EMAIL PROTECTED]> wrote: > On Wed, May 16, 2007 at 05:01:27PM -0400, Florin Malita wrote: > > In libertas_process_rxed_packet() and process_rxed_802_11_packet() the > > skb is dereferenced after being passed to netif_rx (called from > > libertas_upload_rx_packet). Spotted by Coverity (1658, 1659). > > Relocating the libertas_upload_rx_packet call is fine, but... > > > Also, libertas_upload_rx_packet() unconditionally returns 0 so the error > > check is dead code - might as well take it out. > > Is this merely an implementation detail? Or an absolute fact? > If the former is true, then we should preserve the error > checking. If the latter, then we should change the signature of > libertas_upload_rx_packet to return void. netif_rx() used to return a value in older kernels. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Purpose of bit shifts in tcp_get_info
Daniel Schaffrath <[EMAIL PROTECTED]> wrote: > > I was wondering what the purpose of the bit shifts in tcp_get_info > right after the jiffies conversion might be. What's the time unit > after that shift? > > info->tcpi_rtt = jiffies_to_usecs(tp->srtt)>>3; > info->tcpi_rttvar = jiffies_to_usecs(tp->mdev)>>2; > [...] > info->tcpi_rcv_rtt = jiffies_to_usecs(tp->rcv_rtt_est.rtt)>>3; > > Maybe somebody has a hint for me? Checkout the comment in tcp_rtt_estimator() in net/ipv4/tcp_input.c. It's commented in the actual struct too for srtt at least. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e1000: assertion hit in e1000_clean(), kernel 2.6.21.1
4:03pm Kok, Auke said: Chuck Ebbert wrote: We have several reports now of hitting this assertion in netif_rx_complete(), inlined in e1000_clean(): BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state)); [] __queue_work+0x51/0x5e [] et_rx_action+0x94/0x185 [] __do_softirq+0x5d/0xba [] do_softirq+0x59/0xb1 [] local_bh_enable_ip+0x35/0x40 [] dev_open+0x44/0x62 [] dev_change_flags+0x46/0xe3 [] devinet_ioctl+0x250/0x56a The second function is "net_rx_action", corrupted by the serial connection. The source file has four extra lines at the top because of a trivial wireless patch, so 898 in that code is really 894 in the stock kernel. please shared that code then. http://cvs.fedora.redhat.com/viewcvs/devel/kernel/linux-2.6-wireless.patch - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e1000: assertion hit in e1000_clean(), kernel 2.6.21.1
Chuck Ebbert wrote: We have several reports now of hitting this assertion in netif_rx_complete(), inlined in e1000_clean(): BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state)); [] __queue_work+0x51/0x5e [] et_rx_action+0x94/0x185 [] __do_softirq+0x5d/0xba [] do_softirq+0x59/0xb1 [] local_bh_enable_ip+0x35/0x40 [] dev_open+0x44/0x62 [] dev_change_flags+0x46/0xe3 [] devinet_ioctl+0x250/0x56a The second function is "net_rx_action", corrupted by the serial connection. The source file has four extra lines at the top because of a trivial wireless patch, so 898 in that code is really 894 in the stock kernel. please shared that code then. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=240339 this is lacking a lot of debugging info. Please post *all* the dmesg output, lspci -vvv, ethtool -e ethX, etc. in the bugzilla. Auke - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
e1000: assertion hit in e1000_clean(), kernel 2.6.21.1
We have several reports now of hitting this assertion in netif_rx_complete(), inlined in e1000_clean(): BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state)); [] __queue_work+0x51/0x5e [] et_rx_action+0x94/0x185 [] __do_softirq+0x5d/0xba [] do_softirq+0x59/0xb1 [] local_bh_enable_ip+0x35/0x40 [] dev_open+0x44/0x62 [] dev_change_flags+0x46/0xe3 [] devinet_ioctl+0x250/0x56a The second function is "net_rx_action", corrupted by the serial connection. The source file has four extra lines at the top because of a trivial wireless patch, so 898 in that code is really 894 in the stock kernel. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=240339 - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] net driver updates
On Friday 18 May 2007, Andrew Morton wrote: >On Fri, 18 May 2007 23:46:21 +0200 > >Mariusz Koz__owski <[EMAIL PROTECTED]> wrote: >> Hello, >> >> > diff --git a/drivers/net/smc91x.h b/drivers/net/smc91x.h >> > index 7053026..111f23d 100644 >> > --- a/drivers/net/smc91x.h >> > +++ b/drivers/net/smc91x.h >> > @@ -279,6 +279,40 @@ SMC_outw(u16 val, void __iomem *ioaddr, int reg) >> >> ... >> >> > +#define SMC_outb(v, a, r) __ __ >> > __outw(((inw((a)+((r)&~1))*(0xff<<8*(r%2 | ((v)<<(8*(r&2, (a) + >> > ((r)&~1)) >> >> This one has unbalanced parenthesis. > >I dunno how you can tell - I can't count that high. > Try removing the one in front of the comma, it makes the count -1. And I wrote, probably a good 20 years ago, a gizmo I called cntx that found that stuff for you. I occasionally use it to run over a section of the kernel tree, but it rarely finds anything to fuss about. And I really do need to make it handle shell expansions so you don't have write a quick script to have it scan everything in a directory. However I suspect that most of you already have such a tool in your bag of tricks. >Can this be programmed in C, rather than in cpp? >- >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >the body of a message to [EMAIL PROTECTED] >More majordomo info at http://vger.kernel.org/majordomo-info.html >Please read the FAQ at http://www.tux.org/lkml/ -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) I'm definitely not in Omaha! - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] net driver updates
> > > diff --git a/drivers/net/smc91x.h b/drivers/net/smc91x.h > > > index 7053026..111f23d 100644 > > > --- a/drivers/net/smc91x.h > > > +++ b/drivers/net/smc91x.h > > > @@ -279,6 +279,40 @@ SMC_outw(u16 val, void __iomem *ioaddr, int reg) > > > > ... > > > > > +#define SMC_outb(v, a, r) __ __ > > > __outw(((inw((a)+((r)&~1))*(0xff<<8*(r%2 | ((v)<<(8*(r&2, (a) + > > > ((r)&~1)) > > > > This one has unbalanced parenthesis. > > > > I dunno how you can tell - I can't count that high. Me neither. Simple script does that for me ;) Not sure where exactly the parenthesis is missing so I leave it up to someone (the author?) who knows better. Regards, Mariusz - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] net driver updates
On Fri, 18 May 2007 23:46:21 +0200 Mariusz Koz__owski <[EMAIL PROTECTED]> wrote: > Hello, > > > diff --git a/drivers/net/smc91x.h b/drivers/net/smc91x.h > > index 7053026..111f23d 100644 > > --- a/drivers/net/smc91x.h > > +++ b/drivers/net/smc91x.h > > @@ -279,6 +279,40 @@ SMC_outw(u16 val, void __iomem *ioaddr, int reg) > > ... > > > +#define SMC_outb(v, a, r) __ __ > > __outw(((inw((a)+((r)&~1))*(0xff<<8*(r%2 | ((v)<<(8*(r&2, (a) + > > ((r)&~1)) > > This one has unbalanced parenthesis. > I dunno how you can tell - I can't count that high. Can this be programmed in C, rather than in cpp? - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] net driver updates
Hello, > diff --git a/drivers/net/smc91x.h b/drivers/net/smc91x.h > index 7053026..111f23d 100644 > --- a/drivers/net/smc91x.h > +++ b/drivers/net/smc91x.h > @@ -279,6 +279,40 @@ SMC_outw(u16 val, void __iomem *ioaddr, int reg) ... > +#define SMC_outb(v, a, r) outw(((inw((a)+((r)&~1))*(0xff<<8*(r%2 | > ((v)<<(8*(r&2, (a) + ((r)&~1)) This one has unbalanced parenthesis. Regards, Mariusz - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][af_key]pfkey_add: Optimize SA adds and algorithm probes
On Fri, May 18, 2007 at 02:34:12PM +1000, Herbert Xu wrote: > > Actually, I think we should just probe for the specific algorithm > requested rather than everything. See patch below. Doh, forgot to actually remove the probe call :) [IPSEC] pfkey: Load specific algorithm in pfkey_add rather than all This is a natural extension of the changeset [XFRM]: Probe selected algorithm only. which only removed the probe call for xfrm_user. This patch does exactly the same thing for af_key. In other words, we load the algorithm requested by the user rather than everything when adding xfrm states in af_key. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff --git a/net/key/af_key.c b/net/key/af_key.c index a994441..d302dda 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -1448,8 +1448,6 @@ static int pfkey_add(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, int err; struct km_event c; - xfrm_probe_algs(); - x = pfkey_msg2xfrm_state(hdr, ext_hdrs); if (IS_ERR(x)) return PTR_ERR(x); diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c index 6249a94..8a72def 100644 --- a/net/xfrm/xfrm_algo.c +++ b/net/xfrm/xfrm_algo.c @@ -347,67 +347,44 @@ static inline int calg_entries(void) return ARRAY_SIZE(calg_list); } -/* Todo: generic iterators */ -struct xfrm_algo_desc *xfrm_aalg_get_byid(int alg_id) -{ - int i; - - for (i = 0; i < aalg_entries(); i++) { - if (aalg_list[i].desc.sadb_alg_id == alg_id) { - if (aalg_list[i].available) - return &aalg_list[i]; - else - break; - } - } - return NULL; -} -EXPORT_SYMBOL_GPL(xfrm_aalg_get_byid); - -struct xfrm_algo_desc *xfrm_ealg_get_byid(int alg_id) -{ - int i; +struct xfrm_algo_list { + struct xfrm_algo_desc *algs; + int entries; + u32 type; + u32 mask; +}; - for (i = 0; i < ealg_entries(); i++) { - if (ealg_list[i].desc.sadb_alg_id == alg_id) { - if (ealg_list[i].available) - return &ealg_list[i]; - else - break; - } - } - return NULL; -} -EXPORT_SYMBOL_GPL(xfrm_ealg_get_byid); +static const struct xfrm_algo_list xfrm_aalg_list = { + .algs = aalg_list, + .entries = ARRAY_SIZE(aalg_list), + .type = CRYPTO_ALG_TYPE_HASH, + .mask = CRYPTO_ALG_TYPE_HASH_MASK | CRYPTO_ALG_ASYNC, +}; -struct xfrm_algo_desc *xfrm_calg_get_byid(int alg_id) -{ - int i; +static const struct xfrm_algo_list xfrm_ealg_list = { + .algs = ealg_list, + .entries = ARRAY_SIZE(ealg_list), + .type = CRYPTO_ALG_TYPE_BLKCIPHER, + .mask = CRYPTO_ALG_TYPE_MASK | CRYPTO_ALG_ASYNC, +}; - for (i = 0; i < calg_entries(); i++) { - if (calg_list[i].desc.sadb_alg_id == alg_id) { - if (calg_list[i].available) - return &calg_list[i]; - else - break; - } - } - return NULL; -} -EXPORT_SYMBOL_GPL(xfrm_calg_get_byid); +static const struct xfrm_algo_list xfrm_calg_list = { + .algs = calg_list, + .entries = ARRAY_SIZE(calg_list), + .type = CRYPTO_ALG_TYPE_COMPRESS, + .mask = CRYPTO_ALG_TYPE_MASK | CRYPTO_ALG_ASYNC, +}; -static struct xfrm_algo_desc *xfrm_get_byname(struct xfrm_algo_desc *list, - int entries, u32 type, u32 mask, - char *name, int probe) +static struct xfrm_algo_desc *xfrm_find_algo( + const struct xfrm_algo_list *algo_list, + int match(const struct xfrm_algo_desc *entry, const void *data), + const void *data, int probe) { + struct xfrm_algo_desc *list = algo_list->algs; int i, status; - if (!name) - return NULL; - - for (i = 0; i < entries; i++) { - if (strcmp(name, list[i].name) && - (!list[i].compat || strcmp(name, list[i].compat))) + for (i = 0; i < algo_list->entries; i++) { + if (!match(list + i, data)) continue; if (list[i].available) @@ -416,8 +393,8 @@ static struct xfrm_algo_desc *xfrm_get_byname(struct xfrm_algo_desc *list, if (!probe) break; - status = crypto_has_alg(list[i].name, type, - mask | CRYPTO_ALG_ASYNC); + status = crypto_has_alg(list[i].name, algo_list->type, +
Re: [patch 2/7] CAN: Add PF_CAN core module
Hello Paul, as you may see in the attached SVN-log i changed some code according your suggestions. I additionally made some clarifications of function names. If you would like to see the af_can.c completely please visit: http://svn.berlios.de/svnroot/repos/socketcan/trunk/kernel/2.6/net/can/af_can.c But this code in the projekt SVN is working for several kernel versions (which is removed by some scripts when creating a LKML netdev patch) - just for your information. Best regards, Oliver Original-Nachricht Betreff:r311 - trunk/kernel/2.6/net/can Datum: Fri, 18 May 2007 21:47:15 +0200 Von:[EMAIL PROTECTED] An: [EMAIL PROTECTED] Author: hartkopp Date: 2007-05-18 21:47:14 +0200 (Fri, 18 May 2007) New Revision: 311 Modified: trunk/kernel/2.6/net/can/af_can.c Log: Updated RCU removal of dev_rcv_lists structures in the case of CAN-interfaces going down (in can_notifier). Thanks to Paul E. McKenny (IBM) who gave this hint on netdev kernel mailinglist. Modified: trunk/kernel/2.6/net/can/af_can.c === --- trunk/kernel/2.6/net/can/af_can.c 2007-05-16 09:03:21 UTC (rev 310) +++ trunk/kernel/2.6/net/can/af_can.c 2007-05-18 19:47:14 UTC (rev 311) @@ -454,31 +454,48 @@ } EXPORT_SYMBOL(can_rx_register); -static void can_rcv_lists_delete(struct rcu_head *rp) +static void can_rx_delete_list(struct hlist_head *rl) { + struct receiver *r; + struct hlist_node *n; + + hlist_for_each_entry_rcu(r, n, rl, list) { + hlist_del_rcu(&r->list); + kmem_cache_free(rcv_cache, r); + } +} + +/* + * can_rx_delete_device - rcu callback for dev_rcv_lists structure removal + */ +static void can_rx_delete_device(struct rcu_head *rp) +{ struct dev_rcv_lists *d = container_of(rp, struct dev_rcv_lists, rcu); + int i; + /* remove all receivers hooked at this netdevice */ + can_rx_delete_list(&d->rx_err); + can_rx_delete_list(&d->rx_all); + can_rx_delete_list(&d->rx_fil); + can_rx_delete_list(&d->rx_inv); + can_rx_delete_list(&d->rx_eff); + + for (i = 0; i < 2048; i++) + can_rx_delete_list(&d->rx_sff[i]); + kfree(d); } -static void can_rx_delete(struct rcu_head *rp) +/* + * can_rx_delete_receiver - rcu callback for single receiver entry removal + */ +static void can_rx_delete_receiver(struct rcu_head *rp) { struct receiver *r = container_of(rp, struct receiver, rcu); kmem_cache_free(rcv_cache, r); } -static void can_rx_delete_all(struct hlist_head *rl) -{ - struct receiver *r; - struct hlist_node *n; - - hlist_for_each_entry_rcu(r, n, rl, list) { - hlist_del_rcu(&r->list); - call_rcu(&r->rcu, can_rx_delete); - } -} - /** * can_rx_unregister - unsubscribe CAN frames from a specific interface * @dev: pointer to netdevice (NULL => unsubcribe from 'all' CAN devices list) @@ -556,7 +573,7 @@ /* schedule the receiver item for deletion */ if (r) - call_rcu(&r->rcu, can_rx_delete); + call_rcu(&r->rcu, can_rx_delete_receiver); return ret; } @@ -945,7 +962,6 @@ struct net_device *dev = (struct net_device *)data; struct notifier *n; struct dev_rcv_lists *d; - int i; DBG("called for %s, msg = %lu\n", dev->name, msg); @@ -986,25 +1002,16 @@ spin_lock_bh(&rcv_lists_lock); d = find_dev_rcv_lists(dev); - if (d) { + if (d) hlist_del_rcu(&d->list); - - /* remove all receivers hooked at this netdevice */ - can_rx_delete_all(&d->rx_err); - can_rx_delete_all(&d->rx_all); - can_rx_delete_all(&d->rx_fil); - can_rx_delete_all(&d->rx_inv); - can_rx_delete_all(&d->rx_eff); - for (i = 0; i < 2048; i++) - can_rx_delete_all(&d->rx_sff[i]); - } else + else printk(KERN_ERR "can: notifier: receive list not " "found for dev %s\n", dev->name); spin_unlock_bh(&rcv_lists_lock); if (d) - call_rcu(&d->rcu, can_rcv_lists_delete); + call_rcu(&d->rcu, can_rx_delete_device); break; } ___ Socketcan-commit mailing list [EMAIL PROTECTED] https://lists.berlios.de/mailman/listinfo/socketcan-commit - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [TCP] Sysctl: document tcp_max_ssthresh (Limited Slow-Start)
Rick Jones wrote: as an asside, "tcp_max_ssthresh" sounds like the maximum value ssthresh can take-on. is that correct, or is this more of a "once ssthresh is above this, behave in this new way?" If that is the case, while the I don't like it either, but you'll have to talk to Sally Floyd about that one.. ;) In general, I would like the documentation to emphasize more how to set the parameter than describe the algorithm. The max_ssthresh parameter should ideally be set to the bottleneck queue size, or more realistically a conservative value that's likely to be smaller than the bottleneck queue size. When max_ssthresh is smaller than the bottleneck queue, (limited) slow start will not overflow it until cwnd has fully ramped up to the appropriate size. -John - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] libertas: skb dereferenced after netif_rx
John W. Linville wrote: Also, libertas_upload_rx_packet() unconditionally returns 0 so the error check is dead code - might as well take it out. Is this merely an implementation detail? Or an absolute fact? I believe it's an absolute fact that got lost among implementation details ;) All libertas_upload_rx_packet does is set a few fields in the skb, then pass it up to the stack via netif_rx: 139 int libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb) 140 { 141 lbs_pr_debug(1, "skb->data=%p\n", skb->data); 142 143 if(IS_MESH_FRAME(skb)) 144 skb->dev = priv->mesh_dev; 145 else 146 skb->dev = priv->wlan_dev.netdev; 147 skb->protocol = eth_type_trans(skb, priv->wlan_dev.netdev); 148 skb->ip_summed = CHECKSUM_UNNECESSARY; 149 150 netif_rx(skb); 151 152 return 0; 153 } Since netif_rx always succeeds, so should libertas_upload_rx_packet - there's no reason for passing back a success code (especially one that's hardcoded to 0). If the latter, then we should change the signature of libertas_upload_rx_packet to return void. Makes sense, updated patch below. Another potential patch is to remove the "ret = 0" line before the "done" label, since ret is initialized at the head of the function. Come to think of it, you can probably remove the "= 0" part of ret's declaration as well (in both functions). Right, even more: looks like both process_rxed_802_11_packet & libertas_process_rxed_packet can only return 0 so we could drop the return code altogether and change their signature to void too (nobody seems to care about their return code anyway). I will send a separate cleanup patch but this might be leaning more on the implementation detail side (planning to extend the functions and make the return code meaningful in the future?) so somebody familiar with the driver should make the call. Thanks, Florin Signed-off-by: Florin Malita <[EMAIL PROTECTED]> --- decl.h |2 +- rx.c | 22 +- 2 files changed, 6 insertions(+), 18 deletions(-) diff --git a/drivers/net/wireless/libertas/decl.h b/drivers/net/wireless/libertas/decl.h index 606bdd0..dfe2764 100644 --- a/drivers/net/wireless/libertas/decl.h +++ b/drivers/net/wireless/libertas/decl.h @@ -46,7 +46,7 @@ u32 libertas_index_to_data_rate(u8 index); u8 libertas_data_rate_to_index(u32 rate); void libertas_get_fwversion(wlan_adapter * adapter, char *fwversion, int maxlen); -int libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb); +void libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb); /** The proc fs interface */ int libertas_process_rx_command(wlan_private * priv); diff --git a/drivers/net/wireless/libertas/rx.c b/drivers/net/wireless/libertas/rx.c index d17924f..b19b5aa 100644 --- a/drivers/net/wireless/libertas/rx.c +++ b/drivers/net/wireless/libertas/rx.c @@ -136,7 +136,7 @@ static void wlan_compute_rssi(wlan_private * priv, struct rxpd *p_rx_pd) LEAVE(); } -int libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb) +void libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb) { lbs_pr_debug(1, "skb->data=%p\n", skb->data); @@ -148,8 +148,6 @@ int libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb) skb->ip_summed = CHECKSUM_UNNECESSARY; netif_rx(skb); - - return 0; } /** @@ -269,15 +267,11 @@ int libertas_process_rxed_packet(wlan_private * priv, struct sk_buff *skb) wlan_compute_rssi(priv, p_rx_pd); lbs_pr_debug(1, "RX Data: size of actual packet = %d\n", skb->len); - if (libertas_upload_rx_packet(priv, skb)) { - lbs_pr_debug(1, "RX error: libertas_upload_rx_packet" - " returns failure\n"); - ret = -1; - goto done; - } priv->stats.rx_bytes += skb->len; priv->stats.rx_packets++; + libertas_upload_rx_packet(priv, skb); + ret = 0; done: LEAVE(); @@ -438,17 +432,11 @@ static int process_rxed_802_11_packet(wlan_private * priv, struct sk_buff *skb) wlan_compute_rssi(priv, prxpd); lbs_pr_debug(1, "RX Data: size of actual packet = %d\n", skb->len); - - if (libertas_upload_rx_packet(priv, skb)) { - lbs_pr_debug(1, "RX error: libertas_upload_rx_packet " - "returns failure\n"); - ret = -1; - goto done; - } - priv->stats.rx_bytes += skb->len; priv->stats.rx_packets++; + libertas_upload_rx_packet(priv, skb); + ret = 0; done: LEAVE(); - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [TCP] Sysctl: document tcp_max_ssthresh (Limited Slow-Start)
Baruch Even wrote: Ilpo Järvinen wrote: Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- Documentation/networking/ip-sysctl.txt | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index ce16e6a..44ba8d4 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -239,6 +239,19 @@ tcp_max_orphans - INTEGER more aggressively. Let me to remind again: each orphan eats up to ~64K of unswappable memory. +tcp_max_ssthresh - INTEGER + Limited Slow-Start for TCP with Large Congestion Windows defined in + RFC3742. Limited slow-start is a mechanism to limit grow of the s/grow/growth/ + congestion window on the region where congestion window is larger than + tcp_max_ssthresh. A TCP connection with a large congestion window could + have its congestion window increased by thousand (or even more) + segments per RTT by the traditional slow-start procedure which might be + counter-productive to TCP performance when packet losses start to + occur. With limited slow-start TCP increments congestion window at + most tcp_max_ssthresh/2 segments per RTT when the congestion window is I'm not a native English speaker but "at most" sounds a bit awkward to me, maybe change it to "by no more than". But I'm sure someone can find a better phrasing. It could be "by at most" or "by no more than" but indeed it should not just be "at most." Also, it should be "With limited slow-start TCP increments the congestion window..." Combining with other nits yeilds: Limited Slow-Start for TCP with Large Congestion Windows is defined in RFC3742. Limited slow-start is a mechanism to limit growth of the congestion window when the congestion window is larger than tcp_max_ssthresh. A TCP connection with a large congestion window could have its congestion window increased by thousands of segments (or more) per RTT via the traditional slow-start procedures. This might degrade TCP performance when packet losses start to occur. With limited slow-start, TCP increments the congestion window by at most tcp_max_ssthresh/2 segments per RTT when the congestion window is is above tcp_max_ssthresh. rick jones as an asside, "tcp_max_ssthresh" sounds like the maximum value ssthresh can take-on. is that correct, or is this more of a "once ssthresh is above this, behave in this new way?" If that is the case, while the horse has probably left the barn, perhaps another name would be better? Baruch - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] libertas: skb dereferenced after netif_rx
On Wed, May 16, 2007 at 05:01:27PM -0400, Florin Malita wrote: > In libertas_process_rxed_packet() and process_rxed_802_11_packet() the > skb is dereferenced after being passed to netif_rx (called from > libertas_upload_rx_packet). Spotted by Coverity (1658, 1659). Relocating the libertas_upload_rx_packet call is fine, but... > Also, libertas_upload_rx_packet() unconditionally returns 0 so the error > check is dead code - might as well take it out. Is this merely an implementation detail? Or an absolute fact? If the former is true, then we should preserve the error checking. If the latter, then we should change the signature of libertas_upload_rx_packet to return void. > Signed-off-by: Florin Malita <[EMAIL PROTECTED]> > lbs_pr_debug(1, "RX Data: size of actual packet = %d\n", skb->len); > - if (libertas_upload_rx_packet(priv, skb)) { > - lbs_pr_debug(1, "RX error: libertas_upload_rx_packet" > -" returns failure\n"); > - ret = -1; > - goto done; > - } > priv->stats.rx_bytes += skb->len; > priv->stats.rx_packets++; > > + libertas_upload_rx_packet(priv, skb); > + > ret = 0; > done: > LEAVE(); Another potential patch is to remove the "ret = 0" line before the "done" label, since ret is initialized at the head of the function. Come to think of it, you can probably remove the "= 0" part of ret's declaration as well (in both functions). Hth! John P.S. Also, please make sure to send wireless patches to [EMAIL PROTECTED] and CC me. -- John W. Linville [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [TCP] Sysctl: document tcp_max_ssthresh (Limited Slow-Start)
Ilpo Järvinen wrote: > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> > --- > Documentation/networking/ip-sysctl.txt | 13 + > 1 files changed, 13 insertions(+), 0 deletions(-) > > diff --git a/Documentation/networking/ip-sysctl.txt > b/Documentation/networking/ip-sysctl.txt > index ce16e6a..44ba8d4 100644 > --- a/Documentation/networking/ip-sysctl.txt > +++ b/Documentation/networking/ip-sysctl.txt > @@ -239,6 +239,19 @@ tcp_max_orphans - INTEGER > more aggressively. Let me to remind again: each orphan eats > up to ~64K of unswappable memory. > > +tcp_max_ssthresh - INTEGER > + Limited Slow-Start for TCP with Large Congestion Windows defined in > + RFC3742. Limited slow-start is a mechanism to limit grow of the s/grow/growth/ > + congestion window on the region where congestion window is larger than > + tcp_max_ssthresh. A TCP connection with a large congestion window could > + have its congestion window increased by thousand (or even more) > + segments per RTT by the traditional slow-start procedure which might be > + counter-productive to TCP performance when packet losses start to > + occur. With limited slow-start TCP increments congestion window at > + most tcp_max_ssthresh/2 segments per RTT when the congestion window is I'm not a native English speaker but "at most" sounds a bit awkward to me, maybe change it to "by no more than". But I'm sure someone can find a better phrasing. Baruch - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)
David Acker wrote: Kok, Auke wrote: First impression just came in: It seems RX performance is dropped to 10mbit. TX is unaffected and runs at 94mbit/tcp, but RX the new code seems to misbehave and fluctuate, dropping below 10mbit after a few netperf runs and staying there... ideas? I found the problem. Another casualty of working with two different kernels at once...arg. The blank rfd needs to have its el-bit clear now. Here is the new and improved patch. - nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s); + nic->blank_rfd.command = 0; OK, that fixed it and it's getting 94mbit/tcp rx now without issues. Milton, can you look at this patch? I'd like some more feedback. Meanwhile I will try to get this tested on a variety of e100 NICs, which will take a few days. Thanks, Auke - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][af_key]pfkey_add: Optimize SA adds and algorithm probes
Herbert Xu wrote: Mark Huth <[EMAIL PROTECTED]> wrote: This patch provides a performance optimization in the pfkey_add path. Prior versions have a serious performance problem when adding a large number of SAs to a node. For example, if a backup node needs to be loaded with the SAs previously held by a failed active node, thousands of SAs may need to be added as rapidly as possible. Tests show that without this patch, such additions may take several minutes. The cause is that the available algorithm modules are probed each time instead of only when needed. This patch changes the unconditional call to xfrm_probe_algs() to only be done when it may be needed. Thanks for the patch! static int pfkey_add(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { struct xfrm_state *x; - int err; + int err, probe_done = 0; struct km_event c; - xfrm_probe_algs(); - x = pfkey_msg2xfrm_state(hdr, ext_hdrs); if (IS_ERR(x)) return PTR_ERR(x); I don't think it works when then algorithm isn't loaded though :) If the algorithm isn't present pfkey_msg2xfrm_state will return -ENOSYS so we need to do the probe here. Okay. I tested with the algorithms not loaded, and they were all loaded after the test, but I'm sure you understand this much better than I do. Actually, I think we should just probe for the specific algorithm requested rather than everything. See patch below. [IPSEC] pfkey: Load specific algorithm in pfkey_add rather than all This is a natural extension of the changeset [XFRM]: Probe selected algorithm only. which only removed the probe call for xfrm_user. This patch does exactly the same thing for af_key. In other words, we load the algorithm requested by the user rather than everything when adding xfrm states in af_key. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Cheers, Thanks for the patch. I'll try it later today and confirm that it fixes our problem. Mark Huth - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)
Kok, Auke wrote: First impression just came in: It seems RX performance is dropped to 10mbit. TX is unaffected and runs at 94mbit/tcp, but RX the new code seems to misbehave and fluctuate, dropping below 10mbit after a few netperf runs and staying there... ideas? I found the problem. Another casualty of working with two different kernels at once...arg. The blank rfd needs to have its el-bit clear now. Here is the new and improved patch. On the ARM, their is a race condition between software allocating a new receive buffer and hardware writing into a buffer. The two race on touching the last Receive Frame Descriptor (RFD). It has its el-bit set and its next link equal to 0. When hardware encounters this buffer it attempts to write data to it and then update Status Word bits and Actual Count in the RFD. At the same time software may try to clear the el-bit and set the link address to a new buffer. Since the entire RFD is once cache-line, the two write operations can collide. This can lead to the receive unit stalling or freed receive buffers getting written to. The fix is to set the el-bit on and the size to 0 on the next to last buffer in the chain. When the hardware encounters this buffer it stops and does not write to it at all. The hardware issues an RNR interrupt with the receive unit in the No Resources state. When software allocates buffers, it can update the tail of the list because it knows the hardware will stop at the buffer before it. Once it has a new next to last buffer prepared, it can clear the el-bit and set the size on the previous one. The race on this buffer is safe since the link already points to a valid next buffer. If the hardware sees the el-bit cleared without the size set, it will move on to the next buffer and complete that one in error. If it sees the size set but the el-bit still set, it will complete that buffer and then RNR interrupt and wait. Signed-off-by: David Acker <[EMAIL PROTECTED]> --- --- linux-2.6.22-rc1/drivers/net/e100.c.orig2007-05-18 13:08:01.0 -0400 +++ linux-2.6.22-rc1/drivers/net/e100.c 2007-05-18 13:08:24.0 -0400 @@ -285,6 +285,12 @@ enum scb_status { rus_mask = 0x3C, }; +enum ru_state { + RU_SUSPENDED = 0, + RU_RUNNING = 1, + RU_UNINITIALIZED = -1, +}; + enum scb_stat_ack { stat_ack_not_ours= 0x00, stat_ack_sw_gen = 0x04, @@ -526,6 +532,7 @@ struct nic { struct rx *rx_to_use; struct rx *rx_to_clean; struct rfd blank_rfd; + enum ru_state ru_running; spinlock_t cb_lock cacheline_aligned; spinlock_t cmd_lock; @@ -947,7 +954,7 @@ static void e100_get_defaults(struct nic ((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i)); /* Template for a freshly allocated RFD */ - nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s); + nic->blank_rfd.command = 0; nic->blank_rfd.rbd = 0x; nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN); @@ -1742,11 +1749,19 @@ static int e100_alloc_cbs(struct nic *ni return 0; } -static inline void e100_start_receiver(struct nic *nic) +static inline void e100_start_receiver(struct nic *nic, struct rx *rx) { - /* Start if RFA is non-NULL */ - if(nic->rx_to_clean->skb) - e100_exec_cmd(nic, ruc_start, nic->rx_to_clean->dma_addr); + if(!nic->rxs) return; + if(RU_SUSPENDED != nic->ru_running) return; + + /* handle init time starts */ + if(!rx) rx = nic->rxs; + + /* (Re)start RU if suspended or idle and RFA is non-NULL */ + if(rx->skb) { + e100_exec_cmd(nic, ruc_start, rx->dma_addr); + nic->ru_running = RU_RUNNING; + } } #define RFD_BUF_LEN (sizeof(struct rfd) + VLAN_ETH_FRAME_LEN) @@ -1769,13 +1784,12 @@ static int e100_rx_alloc_skb(struct nic } /* Link the RFD to end of RFA by linking previous RFD to -* this one, and clearing EL bit of previous. */ +* this one. We are safe to touch the previous RFD because +* it is protected the before last buffer's el bit being set */ if(rx->prev->skb) { struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data; put_unaligned(cpu_to_le32(rx->dma_addr), (u32 *)&prev_rfd->link); - wmb(); - prev_rfd->command &= ~cpu_to_le16(cb_el & cb_s); pci_dma_sync_single_for_device(nic->pdev, rx->prev->dma_addr, sizeof(struct rfd), PCI_DMA_TODEVICE); } @@ -1801,8 +1815,12 @@ static int e100_rx_indicate(struct nic * DPRINTK(RX_STATUS, DEBUG, "status=0x%04X\n", rfd_status); /* If data isn't ready, nothing to indicate */ - if(unlikely(!(rfd_status & cb_complete))) + if(unlikely(!(rfd_status & cb_complete))) { + /* this allows for a fast restart withou
Re: [Cbe-oss-dev] [PATCH 2/10] spidernet: beautify error messages
On Thu, May 17, 2007 at 09:28:53AM +1000, Michael Ellerman wrote: > On Wed, 2007-05-16 at 17:00 -0500, Linas Vepstas wrote: > > - pr_err("Not enough memory to allocate rx buffer\n"); > > + pr_err("%s: Not enough memory to allocate rx buffer\n", > > + card->netdev->name); > > Isn't that what dev_err() is for? Ahh, did not know that. These patches just got pushed upstream, I'll send a separate round to clean this up. --linas - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/10] spidernet: node-aware skbuff allocation
On Thu, May 17, 2007 at 08:46:00PM -0400, Jeff Garzik wrote: > Linas Vepstas wrote: > > applied Thanks --linas - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [TCP] Sysctl: document tcp_max_ssthresh (Limited Slow-Start)
Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- Documentation/networking/ip-sysctl.txt | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index ce16e6a..44ba8d4 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -239,6 +239,19 @@ tcp_max_orphans - INTEGER more aggressively. Let me to remind again: each orphan eats up to ~64K of unswappable memory. +tcp_max_ssthresh - INTEGER + Limited Slow-Start for TCP with Large Congestion Windows defined in + RFC3742. Limited slow-start is a mechanism to limit grow of the + congestion window on the region where congestion window is larger than + tcp_max_ssthresh. A TCP connection with a large congestion window could + have its congestion window increased by thousand (or even more) + segments per RTT by the traditional slow-start procedure which might be + counter-productive to TCP performance when packet losses start to + occur. With limited slow-start TCP increments congestion window at + most tcp_max_ssthresh/2 segments per RTT when the congestion window is + above tcp_max_ssthresh. + Default: 0 (off) + tcp_max_syn_backlog - INTEGER Maximal number of remembered connection requests, which are still did not receive an acknowledgment from connecting client. -- 1.5.0.6
Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)
David Acker wrote: Kok, Auke wrote: David Acker wrote: David Acker wrote: Done. Below is a patch against 2.6.22-rc1. It combines removing the s-bit patch and applying the patch I previously sent. Oops. I missed one state in that patch. Since the el-bit buffer will normally not complete due to a zero size, we need to check if the buffer with no data has the el-bit set. Without this, you have to wait for the interrupt. Sorry about that...this was in the code I tested on my embedded system but got lost in the regular kernel patch. OK. Thanks. If you don't mind I'm going to have some testing on this patch done for a bit now (mostly x86 hardware of course) to see if there's no pitfalls in it. It'll be a few days because of the weekend before I get back on it. Cool. I will see if I can get some more tests running over the weekend on our PXA255 platform. First impression just came in: It seems RX performance is dropped to 10mbit. TX is unaffected and runs at 94mbit/tcp, but RX the new code seems to misbehave and fluctuate, dropping below 10mbit after a few netperf runs and staying there... ideas? Auke - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)
Kok, Auke wrote: David Acker wrote: David Acker wrote: Done. Below is a patch against 2.6.22-rc1. It combines removing the s-bit patch and applying the patch I previously sent. Oops. I missed one state in that patch. Since the el-bit buffer will normally not complete due to a zero size, we need to check if the buffer with no data has the el-bit set. Without this, you have to wait for the interrupt. Sorry about that...this was in the code I tested on my embedded system but got lost in the regular kernel patch. OK. Thanks. If you don't mind I'm going to have some testing on this patch done for a bit now (mostly x86 hardware of course) to see if there's no pitfalls in it. It'll be a few days because of the weekend before I get back on it. Cool. I will see if I can get some more tests running over the weekend on our PXA255 platform. -Ack - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] ucc_geth: eliminate max-speed, change interface-type to phy-connection-type
On Fri, 18 May 2007 09:07:42 -0500 Kumar Gala <[EMAIL PROTECTED]> wrote: > > On May 17, 2007, at 8:53 PM, Jeff Garzik wrote: > > > Kim Phillips wrote: > >> On Tue, 15 May 2007 17:45:19 -0400 > >> Jeff Garzik <[EMAIL PROTECTED]> wrote: > >>> Kim Phillips wrote: > It was agreed that phy-connection-type was a better name for > the interface-type property, so this patch renames it. > > Also, the max-speed property name was determined too generic, > and is therefore eliminated in favour of phy-connection-type > derivation logic. > > includes corrections to copyright text. > > Signed-off-by: Kim Phillips <[EMAIL PROTECTED]> > --- > drivers/net/ucc_geth.c | 40 +++ > + > drivers/net/ucc_geth_mii.c |9 + > drivers/net/ucc_geth_mii.h | 10 +- > 3 files changed, 26 insertions(+), 33 deletions(-) > >>> applied to #upstream > >>> > >> just to be clear; only the mpc8323e-mds board works on your > >> current #upstream-{fixes,linus}. > >> The mpc8323e-rdb and mpc8360e-mds require > >> phylib: add the ICPlus IP175C PHY driver > >> and phylib: enable RGMII-ID on the Marvell m88e PHY > >> respectively in order to work. > >> I'm ok with the SGMII patches: > >> phylib: m88e: enable SGMII mode > >> gianfar: add support for SGMII > >> staying on #upstream, but 2 out of 3 boards are currently broken > >> for #upstream-{fixes,linus}. > > > > New hardware support is always "broken" until merged. Nonetheless > > it is still new and outside the merge window. > > I think new is a little relative here. The MPC8232-RDB & MPC8360E- > MDS boards worked with 2.6.21 and the UCC. They had support for the > PHYs that Kim's listed built directly into the ucc driver and not as > part of the phy lib. Since we've moved to using the phy lib in > 2.6.22 we've broken these boards since their PHY drivers aren't > currently in linus's tree. > > Kim, correct me if I'm wrong. > thanks for clarifying that - it is indeed the case. Kim - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)
David Acker wrote: David Acker wrote: Kok, Auke wrote: Jeff Garzik wrote: Can you resend against the latest kernel (2.6.22-rc1)? And what does Intel think? I'm expecting at least a reply from Milton as the patch was sent to him. I haven't yet tested it but will certainly do so. At first glance it looks OK, and I'll try to put it under my colleague's noses who know e100 best. A resend against 2.6.22-rc1 would be nice. Done. Below is a patch against 2.6.22-rc1. It combines removing the s-bit patch and applying the patch I previously sent. Oops. I missed one state in that patch. Since the el-bit buffer will normally not complete due to a zero size, we need to check if the buffer with no data has the el-bit set. Without this, you have to wait for the interrupt. Sorry about that...this was in the code I tested on my embedded system but got lost in the regular kernel patch. OK. Thanks. If you don't mind I'm going to have some testing on this patch done for a bit now (mostly x86 hardware of course) to see if there's no pitfalls in it. It'll be a few days because of the weekend before I get back on it. Auke On the ARM, their is a race condition between software allocating a new receive buffer and hardware writing into a buffer. The two race on touching the last Receive Frame Descriptor (RFD). It has its el-bit set and its next link equal to 0. When hardware encounters this buffer it attempts to write data to it and then update Status Word bits and Actual Count in the RFD. At the same time software may try to clear the el-bit and set the link address to a new buffer. Since the entire RFD is once cache-line, the two write operations can collide. This can lead to the receive unit stalling or freed receive buffers getting written to. The fix is to set the el-bit on and the size to 0 on the next to last buffer in the chain. When the hardware encounters this buffer it stops and does not write to it at all. The hardware issues an RNR interrupt with the receive unit in the No Resources state. When software allocates buffers, it can update the tail of the list because it knows the hardware will stop at the buffer before it. Once it has a new next to last buffer prepared, it can clear the el-bit and set the size on the previous one. The race on this buffer is safe since the link already points to a valid next buffer. If the hardware sees the el-bit cleared without the size set, it will move on to the next buffer and complete that one in error. If it sees the size set but the el-bit still set, it will complete that buffer and then RNR interrupt and wait. Signed-off-by: David Acker <[EMAIL PROTECTED]> --- --- linux-2.6.22-rc1/drivers/net/e100.c.orig2007-05-18 10:16:03.0 -0400 +++ linux-2.6.22-rc1/drivers/net/e100.c 2007-05-18 10:15:53.0 -0400 @@ -285,6 +285,12 @@ enum scb_status { rus_mask = 0x3C, }; +enum ru_state { + RU_SUSPENDED = 0, + RU_RUNNING = 1, + RU_UNINITIALIZED = -1, +}; + enum scb_stat_ack { stat_ack_not_ours= 0x00, stat_ack_sw_gen = 0x04, @@ -526,6 +532,7 @@ struct nic { struct rx *rx_to_use; struct rx *rx_to_clean; struct rfd blank_rfd; + enum ru_state ru_running; spinlock_t cb_lock cacheline_aligned; spinlock_t cmd_lock; @@ -947,7 +954,7 @@ static void e100_get_defaults(struct nic ((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i)); /* Template for a freshly allocated RFD */ - nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s); + nic->blank_rfd.command = cpu_to_le16(cb_el); nic->blank_rfd.rbd = 0x; nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN); @@ -1742,11 +1749,19 @@ static int e100_alloc_cbs(struct nic *ni return 0; } -static inline void e100_start_receiver(struct nic *nic) +static inline void e100_start_receiver(struct nic *nic, struct rx *rx) { - /* Start if RFA is non-NULL */ - if(nic->rx_to_clean->skb) - e100_exec_cmd(nic, ruc_start, nic->rx_to_clean->dma_addr); + if(!nic->rxs) return; + if(RU_SUSPENDED != nic->ru_running) return; + + /* handle init time starts */ + if(!rx) rx = nic->rxs; + + /* (Re)start RU if suspended or idle and RFA is non-NULL */ + if(rx->skb) { + e100_exec_cmd(nic, ruc_start, rx->dma_addr); + nic->ru_running = RU_RUNNING; + } } #define RFD_BUF_LEN (sizeof(struct rfd) + VLAN_ETH_FRAME_LEN) @@ -1769,13 +1784,12 @@ static int e100_rx_alloc_skb(struct nic } /* Link the RFD to end of RFA by linking previous RFD to -* this one, and clearing EL bit of previous. */ +* this one. We are safe to touch the previous RFD because +* it is protected the before last buffer's el bit being set */ if(rx->prev->skb) { struct rfd *prev_rfd = (str
Re: [patch 2/7] CAN: Add PF_CAN core module
Hm, this is indeed one step further, than i thought :-) Thanks for this nifty solution! I will doublecheck your suggestion with Urs and then we'll change it in our next patch update (after some more feedback on this mailing list). Additional feedback is welcome. Tnx & best regards, Oliver Paul E. McKenney wrote: On Fri, May 18, 2007 at 11:19:01AM +0200, Oliver Hartkopp wrote: Hi Urs, Hello Paul, i assume Paul refers to the can_rx_delete_all() function that adds each receive list entry for rcu removal using the can_rx_delete RCU callback, right? So the idea would be to create a second RCU callback - e.g. can_rx_delete_list() - that removes the complete list inside the RCU callback?!? The list removal would therefore be processed inside this new can_rx_delete_list() in RCU context and not inside can_rx_delete_all(). @Paul: Was this your intention? My intention was that the list-removing be placed into can_rcv_lists_delete(), perhaps as follows: static void can_rx_delete_all(struct hlist_head *rl) { struct receiver *r; struct hlist_node *n; hlist_for_each_entry(r, n, rl, list) { hlist_del(&r->list); kmem_cache_free(rcv_cache, r); } } static void can_rcv_lists_delete(struct rcu_head *rp) { struct dev_rcv_lists *d = container_of(rp, struct dev_rcv_lists, rcu); /* remove all receivers hooked at this netdevice */ can_rx_delete_all(&d->rx_err); can_rx_delete_all(&d->rx_all); can_rx_delete_all(&d->rx_fil); can_rx_delete_all(&d->rx_inv); can_rx_delete_all(&d->rx_eff); for (i = 0; i < 2048; i++) can_rx_delete_all(&d->rx_sff[i]); kfree(d); } Then the code in can_notifier() can reduce to the following: if (d) { hlist_del_rcu(&d->list); /* used to be a string of can_rx_delete_all(). */ } else printk(KERN_ERR "can: notifier: receive list not " "found for dev %s\n", dev->name); spin_lock_bh(&rcv_lists_lock); if (d) { call_rcu(&d->rcu, can_rcv_lists_delete); } This moves the traversal work into the callback function. This is not a problem for CONFIG_PREEMPT_RT and non-CONFIG_PREEMPT, but not sure about CONFIG_PREEMPT. But it sure has the potential to cut down on a bunch of call_rcu() work... Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/7] CAN: Add PF_CAN core module
On Fri, May 18, 2007 at 11:19:01AM +0200, Oliver Hartkopp wrote: > Hi Urs, Hello Paul, > > i assume Paul refers to the can_rx_delete_all() function that adds each > receive list entry for rcu removal using the can_rx_delete RCU callback, > right? > > So the idea would be to create a second RCU callback - e.g. > can_rx_delete_list() - that removes the complete list inside the RCU > callback?!? > The list removal would therefore be processed inside this new > can_rx_delete_list() in RCU context and not inside can_rx_delete_all(). > > @Paul: Was this your intention? My intention was that the list-removing be placed into can_rcv_lists_delete(), perhaps as follows: static void can_rx_delete_all(struct hlist_head *rl) { struct receiver *r; struct hlist_node *n; hlist_for_each_entry(r, n, rl, list) { hlist_del(&r->list); kmem_cache_free(rcv_cache, r); } } static void can_rcv_lists_delete(struct rcu_head *rp) { struct dev_rcv_lists *d = container_of(rp, struct dev_rcv_lists, rcu); /* remove all receivers hooked at this netdevice */ can_rx_delete_all(&d->rx_err); can_rx_delete_all(&d->rx_all); can_rx_delete_all(&d->rx_fil); can_rx_delete_all(&d->rx_inv); can_rx_delete_all(&d->rx_eff); for (i = 0; i < 2048; i++) can_rx_delete_all(&d->rx_sff[i]); kfree(d); } Then the code in can_notifier() can reduce to the following: if (d) { hlist_del_rcu(&d->list); /* used to be a string of can_rx_delete_all(). */ } else printk(KERN_ERR "can: notifier: receive list not " "found for dev %s\n", dev->name); spin_lock_bh(&rcv_lists_lock); if (d) { call_rcu(&d->rcu, can_rcv_lists_delete); } This moves the traversal work into the callback function. This is not a problem for CONFIG_PREEMPT_RT and non-CONFIG_PREEMPT, but not sure about CONFIG_PREEMPT. But it sure has the potential to cut down on a bunch of call_rcu() work... Thanx, Paul > Best regards, > Oliver > > Paul E. McKenney wrote: > >On Wed, May 16, 2007 at 04:51:02PM +0200, Urs Thuermann wrote: > > > >>This patch adds the CAN core functionality but no protocols or drivers. > >>No protocol implementations are included here. They come as separate > >>patches. Protocol numbers are already in include/linux/can.h. > >> > > > >Interesting! One question called out below -- why do call_rcu() on each > >piece of the struct dev_rcv_lists, instead of doing call_rcu() on the > >whole thing and having the RCU callback free up the pieces? Given that > >all the pieces are call_rcu()ed separately, there had better not be > >persistent pointers to the pieces, right? > > > >Doing it in one chunk would make the code a bit simpler and also reduce > >the RCU overhead a bit. > > > >Or am I missing something subtle here? > > > > Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)
David Acker wrote: Kok, Auke wrote: Jeff Garzik wrote: Can you resend against the latest kernel (2.6.22-rc1)? And what does Intel think? I'm expecting at least a reply from Milton as the patch was sent to him. I haven't yet tested it but will certainly do so. At first glance it looks OK, and I'll try to put it under my colleague's noses who know e100 best. A resend against 2.6.22-rc1 would be nice. Done. Below is a patch against 2.6.22-rc1. It combines removing the s-bit patch and applying the patch I previously sent. Oops. I missed one state in that patch. Since the el-bit buffer will normally not complete due to a zero size, we need to check if the buffer with no data has the el-bit set. Without this, you have to wait for the interrupt. Sorry about that...this was in the code I tested on my embedded system but got lost in the regular kernel patch. On the ARM, their is a race condition between software allocating a new receive buffer and hardware writing into a buffer. The two race on touching the last Receive Frame Descriptor (RFD). It has its el-bit set and its next link equal to 0. When hardware encounters this buffer it attempts to write data to it and then update Status Word bits and Actual Count in the RFD. At the same time software may try to clear the el-bit and set the link address to a new buffer. Since the entire RFD is once cache-line, the two write operations can collide. This can lead to the receive unit stalling or freed receive buffers getting written to. The fix is to set the el-bit on and the size to 0 on the next to last buffer in the chain. When the hardware encounters this buffer it stops and does not write to it at all. The hardware issues an RNR interrupt with the receive unit in the No Resources state. When software allocates buffers, it can update the tail of the list because it knows the hardware will stop at the buffer before it. Once it has a new next to last buffer prepared, it can clear the el-bit and set the size on the previous one. The race on this buffer is safe since the link already points to a valid next buffer. If the hardware sees the el-bit cleared without the size set, it will move on to the next buffer and complete that one in error. If it sees the size set but the el-bit still set, it will complete that buffer and then RNR interrupt and wait. Signed-off-by: David Acker <[EMAIL PROTECTED]> --- --- linux-2.6.22-rc1/drivers/net/e100.c.orig2007-05-18 10:16:03.0 -0400 +++ linux-2.6.22-rc1/drivers/net/e100.c 2007-05-18 10:15:53.0 -0400 @@ -285,6 +285,12 @@ enum scb_status { rus_mask = 0x3C, }; +enum ru_state { + RU_SUSPENDED = 0, + RU_RUNNING = 1, + RU_UNINITIALIZED = -1, +}; + enum scb_stat_ack { stat_ack_not_ours= 0x00, stat_ack_sw_gen = 0x04, @@ -526,6 +532,7 @@ struct nic { struct rx *rx_to_use; struct rx *rx_to_clean; struct rfd blank_rfd; + enum ru_state ru_running; spinlock_t cb_lock cacheline_aligned; spinlock_t cmd_lock; @@ -947,7 +954,7 @@ static void e100_get_defaults(struct nic ((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i)); /* Template for a freshly allocated RFD */ - nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s); + nic->blank_rfd.command = cpu_to_le16(cb_el); nic->blank_rfd.rbd = 0x; nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN); @@ -1742,11 +1749,19 @@ static int e100_alloc_cbs(struct nic *ni return 0; } -static inline void e100_start_receiver(struct nic *nic) +static inline void e100_start_receiver(struct nic *nic, struct rx *rx) { - /* Start if RFA is non-NULL */ - if(nic->rx_to_clean->skb) - e100_exec_cmd(nic, ruc_start, nic->rx_to_clean->dma_addr); + if(!nic->rxs) return; + if(RU_SUSPENDED != nic->ru_running) return; + + /* handle init time starts */ + if(!rx) rx = nic->rxs; + + /* (Re)start RU if suspended or idle and RFA is non-NULL */ + if(rx->skb) { + e100_exec_cmd(nic, ruc_start, rx->dma_addr); + nic->ru_running = RU_RUNNING; + } } #define RFD_BUF_LEN (sizeof(struct rfd) + VLAN_ETH_FRAME_LEN) @@ -1769,13 +1784,12 @@ static int e100_rx_alloc_skb(struct nic } /* Link the RFD to end of RFA by linking previous RFD to -* this one, and clearing EL bit of previous. */ +* this one. We are safe to touch the previous RFD because +* it is protected the before last buffer's el bit being set */ if(rx->prev->skb) { struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data; put_unaligned(cpu_to_le32(rx->dma_addr), (u32 *)&prev_rfd->link); - wmb(); - prev_rfd->command &= ~cpu_to_le16(cb_el & cb_s); pci_dma_sync_single_for_d
Re: [PATCH 3/5] ucc_geth: eliminate max-speed, change interface-type to phy-connection-type
On May 17, 2007, at 8:53 PM, Jeff Garzik wrote: Kim Phillips wrote: On Tue, 15 May 2007 17:45:19 -0400 Jeff Garzik <[EMAIL PROTECTED]> wrote: Kim Phillips wrote: It was agreed that phy-connection-type was a better name for the interface-type property, so this patch renames it. Also, the max-speed property name was determined too generic, and is therefore eliminated in favour of phy-connection-type derivation logic. includes corrections to copyright text. Signed-off-by: Kim Phillips <[EMAIL PROTECTED]> --- drivers/net/ucc_geth.c | 40 +++ + drivers/net/ucc_geth_mii.c |9 + drivers/net/ucc_geth_mii.h | 10 +- 3 files changed, 26 insertions(+), 33 deletions(-) applied to #upstream just to be clear; only the mpc8323e-mds board works on your current #upstream-{fixes,linus}. The mpc8323e-rdb and mpc8360e-mds require phylib: add the ICPlus IP175C PHY driver and phylib: enable RGMII-ID on the Marvell m88e PHY respectively in order to work. I'm ok with the SGMII patches: phylib: m88e: enable SGMII mode gianfar: add support for SGMII staying on #upstream, but 2 out of 3 boards are currently broken for #upstream-{fixes,linus}. New hardware support is always "broken" until merged. Nonetheless it is still new and outside the merge window. I think new is a little relative here. The MPC8232-RDB & MPC8360E- MDS boards worked with 2.6.21 and the UCC. They had support for the PHYs that Kim's listed built directly into the ucc driver and not as part of the phy lib. Since we've moved to using the phy lib in 2.6.22 we've broken these boards since their PHY drivers aren't currently in linus's tree. Kim, correct me if I'm wrong. - k - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)
Kok, Auke wrote: Jeff Garzik wrote: Can you resend against the latest kernel (2.6.22-rc1)? And what does Intel think? I'm expecting at least a reply from Milton as the patch was sent to him. I haven't yet tested it but will certainly do so. At first glance it looks OK, and I'll try to put it under my colleague's noses who know e100 best. A resend against 2.6.22-rc1 would be nice. Done. Below is a patch against 2.6.22-rc1. It combines removing the s-bit patch and applying the patch I previously sent. On the ARM, their is a race condition between software allocating a new receive buffer and hardware writing into a buffer. The two race on touching the last Receive Frame Descriptor (RFD). It has its el-bit set and its next link equal to 0. When hardware encounters this buffer it attempts to write data to it and then update Status Word bits and Actual Count in the RFD. At the same time software may try to clear the el-bit and set the link address to a new buffer. Since the entire RFD is once cache-line, the two write operations can collide. This can lead to the receive unit stalling or freed receive buffers getting written to. The fix is to set the el-bit on and the size to 0 on the next to last buffer in the chain. When the hardware encounters this buffer it stops and does not write to it at all. The hardware issues an RNR interrupt with the receive unit in the No Resources state. When software allocates buffers, it can update the tail of the list because it knows the hardware will stop at the buffer before it. Once it has a new next to last buffer prepared, it can clear the el-bit and set the size on the previous one. The race on this buffer is safe since the link already points to a valid next buffer. If the hardware sees the el-bit cleared without the size set, it will move on to the next buffer and complete that one in error. If it sees the size set but the el-bit still set, it will complete that buffer and then RNR interrupt and wait. Signed-off-by: David Acker <[EMAIL PROTECTED]> --- --- linux-2.6.22-rc1/drivers/net/e100.c.orig2007-05-18 10:01:52.0 -0400 +++ linux-2.6.22-rc1/drivers/net/e100.c 2007-05-18 09:59:33.0 -0400 @@ -285,6 +285,12 @@ enum scb_status { rus_mask = 0x3C, }; +enum ru_state { + RU_SUSPENDED = 0, + RU_RUNNING = 1, + RU_UNINITIALIZED = -1, +}; + enum scb_stat_ack { stat_ack_not_ours= 0x00, stat_ack_sw_gen = 0x04, @@ -526,6 +532,7 @@ struct nic { struct rx *rx_to_use; struct rx *rx_to_clean; struct rfd blank_rfd; + enum ru_state ru_running; spinlock_t cb_lock cacheline_aligned; spinlock_t cmd_lock; @@ -947,7 +954,7 @@ static void e100_get_defaults(struct nic ((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i)); /* Template for a freshly allocated RFD */ - nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s); + nic->blank_rfd.command = cpu_to_le16(cb_el); nic->blank_rfd.rbd = 0x; nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN); @@ -1742,11 +1749,19 @@ static int e100_alloc_cbs(struct nic *ni return 0; } -static inline void e100_start_receiver(struct nic *nic) +static inline void e100_start_receiver(struct nic *nic, struct rx *rx) { - /* Start if RFA is non-NULL */ - if(nic->rx_to_clean->skb) - e100_exec_cmd(nic, ruc_start, nic->rx_to_clean->dma_addr); + if(!nic->rxs) return; + if(RU_SUSPENDED != nic->ru_running) return; + + /* handle init time starts */ + if(!rx) rx = nic->rxs; + + /* (Re)start RU if suspended or idle and RFA is non-NULL */ + if(rx->skb) { + e100_exec_cmd(nic, ruc_start, rx->dma_addr); + nic->ru_running = RU_RUNNING; + } } #define RFD_BUF_LEN (sizeof(struct rfd) + VLAN_ETH_FRAME_LEN) @@ -1769,13 +1784,12 @@ static int e100_rx_alloc_skb(struct nic } /* Link the RFD to end of RFA by linking previous RFD to -* this one, and clearing EL bit of previous. */ +* this one. We are safe to touch the previous RFD because +* it is protected the before last buffer's el bit being set */ if(rx->prev->skb) { struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data; put_unaligned(cpu_to_le32(rx->dma_addr), (u32 *)&prev_rfd->link); - wmb(); - prev_rfd->command &= ~cpu_to_le16(cb_el & cb_s); pci_dma_sync_single_for_device(nic->pdev, rx->prev->dma_addr, sizeof(struct rfd), PCI_DMA_TODEVICE); } @@ -1813,6 +1827,10 @@ static int e100_rx_indicate(struct nic * pci_unmap_single(nic->pdev, rx->dma_addr, RFD_BUF_LEN, PCI_DMA_FROMDEVICE); + /* this allows for a fast restart without re-enabling interrupts */ + if(le16
[PATCH 0/4] [Net] Support Xen accelerated network plugin modules
This is a repost of some earlier patches to the xen-devel mailing list, with a number of changes thanks to some useful suggestions from others. Apologies for the short delay in getting this next version ready. I've also CC'd netdev@vger.kernel.org as some of the files being patched may be merged into upstream linux soon, and so folks there may have opinions too. This set of patches provides the hooks and support necessary for accelerated network plugin modules to attach to Xen's netback and netfront. These modules provide a fast path for network traffic where there is hardware support available for the netfront driver to send and receive packets directly to a NIC (such as those available from Solarflare). As there are currently no available plugins, I've attached a couple of dummy ones to illustrate how the hooks could be used. These are incomplete (and clearly wouldn't even compile) in that they only include code to show the interface between the accelerated module and netfront/netback. A lot of the comments hint at what code should go where. They don't show any interface between the accelerated frontend and accelerated backend, or hardware access, for example, as those would both be specific to the implementation. I hope they help illustrate this, but if you have any questions I'm happy to provide more information. A brief overview of the operation of the plugins: When the accelerated modules are loaded, a VI is created by the accelerated backend to allow the accelerated frontend to safely access portions of the NIC. For RX, when packets are received by the accelerated backend, it will examine them and if appropriate insert filters into the NIC to deliver future packets on that address directly to the accelerated frontend's VI. For TX, netfront gives each accelerated frontend the option of sending each packet, which it can accept (if it wants to send it directly to the hardware) or decline (if it thinks this is more appropriate to send via the normal network path). We have tried to ensure that the hooks are hardware-agnostic, i.e. would be relevant to hardware other than our own, without providing all possible ways of doing each task (but if others need to extend it, that would be welcomed). We have found that using this approach to accelerating network traffic, domU to domU connections (across the network) can achieve close to the performance of dom0 to dom0 connections on a 10Gbps ethernet. This is roughly double the bandwidth seen with unmodified Xen. Kieran /***/ /*! \file dumm_accel_backend.c Dummy accelerated plugin module Copyright 2006 Solarflare Communications Inc, 9501 Jeronimo Road, Suite 250, Irvine, CA 92618, USA This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation, incorporated herein by reference. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /***/ static struct netback_accel_hooks accel_hooks = { &netback_accel_probe, &netback_accel_remove }; static const char *frontend_name = "dummynetaccel"; static int netback_accel_init(void) { /* Initialise the rest of the module... */ /* Tell the netback that we're here */ netback_connect_accelerator(0, frontend_name, &accel_hooks); } module_init(netback_accel_init); static void __exit netback_accel_exit(void) { netback_disconnect_accelerator(0, frontend_name); /* ...and take down the rest of the module */ } module_exit(netback_accel_exit); int netback_accel_probe(struct xenbus_device *dev) { struct backend_info *binfo; /* Setup per-device internal state */ /* Store internal state for future access */ binfo = (struct backend_info *) dev->dev.driver_data; binfo->netback_accel_priv = my_internal_state; /* Setup watch on accel-state, so we know when frontend acceleration plugin is loaded */ setup_domu_accel_watch(dev, my_internal_state); } int netback_accel_remove(struct xenbus_device *dev) { /* Cleanup as the device is going away */ } void netback_accel_frontend_changed(struct xenbus_device *dev, XenbusState frontend_state) { switch(frontend_state) { case XenbusStateConnected: /* Frontend has loaded and is ready to go */ /* I
[PATCH 2/4] [Net] Support Xen accelerated network plugin modules
Add accel option to vif xend config to allow users to specify which interfaces should be accelerated using which plugin modules. diff -r 194f5b88d257 tools/python/xen/xend/server/netif.py --- a/tools/python/xen/xend/server/netif.py Tue Apr 17 09:04:51 2007 +0100 +++ b/tools/python/xen/xend/server/netif.py Tue Apr 17 09:06:58 2007 +0100 @@ -107,6 +107,7 @@ class NetifController(DevController): uuid= config.get('uuid') ipaddr = config.get('ip') model = config.get('model') +accel = config.get('accel') if not typ: typ = xoptions.netback_type @@ -131,6 +132,8 @@ class NetifController(DevController): back['uuid'] = uuid if model: back['model'] = model +if accel: +back['accel'] = accel config_path = "device/%s/%d/" % (self.deviceClass, devid) for x in back: @@ -157,10 +160,10 @@ class NetifController(DevController): config_path = "device/%s/%d/" % (self.deviceClass, devid) devinfo = () for x in ( 'script', 'ip', 'bridge', 'mac', - 'type', 'vifname', 'rate', 'uuid', 'model' ): + 'type', 'vifname', 'rate', 'uuid', 'model', 'accel'): y = self.vm._readVm(config_path + x) devinfo += (y,) -(script, ip, bridge, mac, typ, vifname, rate, uuid, model) = devinfo +(script, ip, bridge, mac, typ, vifname, rate, uuid, model, accel) = devinfo if script: result['script'] = script @@ -180,5 +183,7 @@ class NetifController(DevController): result['uuid'] = uuid if model: result['model'] = model +if accel: +result['accel'] = accel return result diff -r 194f5b88d257 tools/python/xen/xm/create.py --- a/tools/python/xen/xm/create.py Tue Apr 17 09:04:51 2007 +0100 +++ b/tools/python/xen/xm/create.py Tue Apr 17 09:04:54 2007 +0100 @@ -710,7 +710,7 @@ def configure_vifs(config_devs, vals): def f(k): if k not in ['backend', 'bridge', 'ip', 'mac', 'script', 'type', - 'vifname', 'rate', 'model']: + 'vifname', 'rate', 'model', 'accel']: err('Invalid vif option: ' + k) config_vif.append([k, d[k]]) Add accel option to vif xend config diff -r 194f5b88d257 tools/python/xen/xend/server/netif.py --- a/tools/python/xen/xend/server/netif.py Tue Apr 17 09:04:51 2007 +0100 +++ b/tools/python/xen/xend/server/netif.py Tue Apr 17 09:06:58 2007 +0100 @@ -107,6 +107,7 @@ class NetifController(DevController): uuid= config.get('uuid') ipaddr = config.get('ip') model = config.get('model') +accel = config.get('accel') if not typ: typ = xoptions.netback_type @@ -131,6 +132,8 @@ class NetifController(DevController): back['uuid'] = uuid if model: back['model'] = model +if accel: +back['accel'] = accel config_path = "device/%s/%d/" % (self.deviceClass, devid) for x in back: @@ -157,10 +160,10 @@ class NetifController(DevController): config_path = "device/%s/%d/" % (self.deviceClass, devid) devinfo = () for x in ( 'script', 'ip', 'bridge', 'mac', - 'type', 'vifname', 'rate', 'uuid', 'model' ): + 'type', 'vifname', 'rate', 'uuid', 'model', 'accel'): y = self.vm._readVm(config_path + x) devinfo += (y,) -(script, ip, bridge, mac, typ, vifname, rate, uuid, model) = devinfo +(script, ip, bridge, mac, typ, vifname, rate, uuid, model, accel) = devinfo if script: result['script'] = script @@ -180,5 +183,7 @@ class NetifController(DevController): result['uuid'] = uuid if model: result['model'] = model +if accel: +result['accel'] = accel return result diff -r 194f5b88d257 tools/python/xen/xm/create.py --- a/tools/python/xen/xm/create.py Tue Apr 17 09:04:51 2007 +0100 +++ b/tools/python/xen/xm/create.py Tue Apr 17 09:04:54 2007 +0100 @@ -710,7 +710,7 @@ def configure_vifs(config_devs, vals): def f(k): if k not in ['backend', 'bridge', 'ip', 'mac', 'script', 'type', - 'vifname', 'rate', 'model']: + 'vifname', 'rate', 'model', 'accel']: err('Invalid vif option: ' + k) config_vif.append([k, d[k]])
[PATCH 4/4] [Net] Support Xen accelerated network plugin modules
Add support to Xen netback to support accelerated plugin module diff -r da9639486bf2 linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri May 18 10:36:47 2007 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri May 18 11:02:49 2007 +0100 @@ -114,6 +114,41 @@ typedef struct netif_st { #define netback_carrier_off(netif) ((netif)->carrier = 0) #define netback_carrier_ok(netif) ((netif)->carrier) + +#include + +/* Function pointers into netback accelerator plugin modules */ +struct netback_accel_hooks { +int (*probe)(struct xenbus_device *dev); +int (*remove)(struct xenbus_device *dev); +}; + +/* Structure to track the state of a netback accelerator plugin */ +struct netback_accelerator { +struct list_head link; +int id; +char *frontend; +struct netback_accel_hooks *hooks; +}; + +/* Connect an accelerator plugin module to netback */ +extern void netback_connect_accelerator(int id, const char *frontend, +struct netback_accel_hooks *hooks); +/* Disconnect a previously connected accelerator pluging module */ +extern void netback_disconnect_accelerator(int id, const char *frontend); + +struct backend_info { + struct xenbus_device *dev; + netif_t *netif; + enum xenbus_state frontend_state; + +/* State relating to the netback accelerator */ +void *netback_accel_priv; +/* The accelerator that this backend is currently using */ +struct netback_accelerator *accelerator; +}; + + #define NET_TX_RING_SIZE __RING_SIZE((netif_tx_sring_t *)0, PAGE_SIZE) #define NET_RX_RING_SIZE __RING_SIZE((netif_rx_sring_t *)0, PAGE_SIZE) diff -r da9639486bf2 linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri May 18 10:36:47 2007 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri May 18 11:02:54 2007 +0100 @@ -1,6 +1,7 @@ /* Xenbus code for netif backend Copyright (C) 2005 Rusty Russell <[EMAIL PROTECTED]> Copyright (C) 2005 XenSource Ltd +Copyright (c) 2007 Solarflare Communications, Inc. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -28,11 +29,126 @@ printk("netback/xenbus (%s:%d) " fmt ".\n", __FUNCTION__, __LINE__, ##args) #endif -struct backend_info { - struct xenbus_device *dev; - netif_t *netif; - enum xenbus_state frontend_state; -}; +/* + * A list of available netback accelerator plugin modules (each list + * entry is of type struct netback_accelerator) + */ +static struct list_head accelerators_list; +/* Lock used to protect access to accelerators_list */ +static spinlock_t accelerators_lock; + +/* + * Compare a backend to an accelerator, and decide if they are + * compatible (i.e. if the accelerator should be used by the + * backend) + */ +static int match_accelerator(struct backend_info *be, + struct netback_accelerator *accelerator) +{ +/* + * This could do with being more sophisticated. For example, + * determine which hardware is being used by each backend from + * the bridge and network topology of the domain + */ +if ( be->accelerator == NULL ) +return 1; +else +return 0; +} + +/* + * Notify all suitable backends that a new accelerator is available + * and connected. This will also notify the accelerator plugin module + * that it is being used for a device through the probe hook. + */ +static int netback_accelerator_tell_backend(struct device *dev, void *arg) +{ +struct netback_accelerator *accelerator = +(struct netback_accelerator *)arg; +struct xenbus_device *xendev = to_xenbus_device(dev); + +if( !strcmp("vif", xendev->devicetype) ) { +struct backend_info *be = xendev->dev.driver_data; + +if ( match_accelerator(be, accelerator) ) { + be->accelerator = accelerator; + be->accelerator->hooks->probe(xendev); +} +} +return 0; +} + + +/* + * Entry point for an netback accelerator plugin module. Called to + * advertise its presence, and connect to any suitable backends. + */ +void netback_connect_accelerator(int id, const char *frontend, + struct netback_accel_hooks *hooks) +{ +struct netback_accelerator *new_accelerator = +kmalloc(sizeof(struct netback_accelerator), GFP_KERNEL); +unsigned frontend_len, flags; + +if ( new_accelerator ) { +new_accelerator->id = id; + +frontend_len = strlen(frontend)+1; +new_accelerator->frontend = kmalloc(frontend_len, GFP_KERNEL); +i
[PATCH 1/4] [Net] Support Xen accelerated network plugin modules
Provide two helper functions (xenbus_for_each_frontend, xenbus_for_each_backend) to allow drivers to iterate the xenbus frontend and backend buses. diff -r 4f67d849e788 linux-2.6-xen- sparse/drivers/xen/xenbus/xenbus_probe.c --- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c Tue Apr 03 09:03:51 2007 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c Tue Apr 03 13:32:33 2007 +0100 @@ -4,6 +4,7 @@ * Copyright (C) 2005 Rusty Russell, IBM Corporation * Copyright (C) 2005 Mike Wray, Hewlett-Packard * Copyright (C) 2005, 2006 XenSource Ltd + * Copyright (C) 2007 Solarflare Communications, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License version 2 @@ -1079,3 +1080,10 @@ static int __init boot_wait_for_devices( late_initcall(boot_wait_for_devices); #endif + + +int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *)) +{ +return bus_for_each_dev(&xenbus_frontend.bus, NULL, arg, fn); +} +EXPORT_SYMBOL_GPL(xenbus_for_each_frontend); diff -r 4f67d849e788 linux-2.6-xen- sparse/drivers/xen/xenbus/xenbus_probe_backend.c --- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe_backend.c Tue Apr 03 09:03:51 2007 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe_backend.c Tue Apr 03 13:32:33 2007 +0100 @@ -4,6 +4,7 @@ * Copyright (C) 2005 Rusty Russell, IBM Corporation * Copyright (C) 2005 Mike Wray, Hewlett-Packard * Copyright (C) 2005, 2006 XenSource Ltd + * Copyright (C) 2007 Solarflare Communications, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License version 2 @@ -284,3 +285,10 @@ void xenbus_backend_device_register(void xenbus_backend.error); } } + +int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *)) +{ +return bus_for_each_dev(&xenbus_backend.bus, NULL, arg, fn); +} +EXPORT_SYMBOL_GPL(xenbus_for_each_backend); + diff -r 4f67d849e788 linux-2.6-xen-sparse/include/xen/xenbus.h --- a/linux-2.6-xen-sparse/include/xen/xenbus.h Tue Apr 03 09:03:51 2007 +0100 +++ b/linux-2.6-xen-sparse/include/xen/xenbus.h Tue Apr 03 13:32:33 2007 +0100 @@ -299,4 +299,7 @@ int xenbus_dev_is_online(struct xenbus_d int xenbus_dev_is_online(struct xenbus_device *dev); int xenbus_frontend_closed(struct xenbus_device *dev); +extern int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *)); +extern int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *)); + #endif /* _XEN_XENBUS_H */ Add xenbus_for_each_[back,front]end functions to iterate each bus diff -r 4f67d849e788 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c --- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.cTue Apr 03 09:03:51 2007 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.cTue Apr 03 13:32:33 2007 +0100 @@ -4,6 +4,7 @@ * Copyright (C) 2005 Rusty Russell, IBM Corporation * Copyright (C) 2005 Mike Wray, Hewlett-Packard * Copyright (C) 2005, 2006 XenSource Ltd + * Copyright (C) 2007 Solarflare Communications, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License version 2 @@ -1079,3 +1080,10 @@ static int __init boot_wait_for_devices( late_initcall(boot_wait_for_devices); #endif + + +int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *)) +{ +return bus_for_each_dev(&xenbus_frontend.bus, NULL, arg, fn); +} +EXPORT_SYMBOL_GPL(xenbus_for_each_frontend); diff -r 4f67d849e788 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe_backend.c --- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe_backend.cTue Apr 03 09:03:51 2007 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe_backend.cTue Apr 03 13:32:33 2007 +0100 @@ -4,6 +4,7 @@ * Copyright (C) 2005 Rusty Russell, IBM Corporation * Copyright (C) 2005 Mike Wray, Hewlett-Packard * Copyright (C) 2005, 2006 XenSource Ltd + * Copyright (C) 2007 Solarflare Communications, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License version 2 @@ -284,3 +285,10 @@ void xenbus_backend_device_register(void xenbus_backend.error); } } + +int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *)) +{ +return bus_for_each_dev(&xenbus_backend.bus, NULL, arg, fn); +} +EXPORT_SYMBOL_GPL(xenbus_for_each_backend); + diff -r 4f67d849e788 linux-2.6-xen-sparse/include/xen/xenbus.h --- a/linux-2.6-xen-sparse/include/xen/xenbus.h Tue Apr 03 09:03:51 2007 +0100 +++ b/linux-2.6-xen-sparse/include/xen/xenbus.h Tue Apr 03 13:32:33 2007 +0100 @@ -299,4 +299,7 @@ int xenbus_dev_is_online(struct xenbus_d int xenbus_dev_is_online(struct xenbus_device *dev); int xenbus_frontend_closed(struct xenbus_device
Re: select(0, ..) is valid ?
On Wednesday 16 May 2007 17:37, Anton Blanchard wrote: > Hi Hugh, > > > It's interesting that compat_core_sys_select() shows this kmalloc(0) > > failure but core_sys_select() does not. That's because core_sys_select() > > avoids kmalloc by using a buffer on the stack for small allocations (and > > 0 sure is small). Shouldn't compat_core_sys_select() do just the same? > > Or is SLUB going to be so efficient that doing so is a waste of time? > > Nice catch, the original optimisation from Andi is: > > http://git.kernel.org/git-new/?p=linux/kernel/git/torvalds/linux-2.6.git;a= >commit;h=70674f95c0a2ea694d5c39f4e514f538a09be36f > > And I think it makes sense for the compat code to do it too. Yes agreed. I just forgot the copy'n'pasted code when doing the original change. -Andi - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [WIP] [PATCH] WAS Re: [RFC] New driver API to speed up small packets xmits
On Wed, 2007-16-05 at 23:25 -0400, jamal wrote: > > This patch now includes two changed drivers (tun and e1000). I have > tested tun with this patch. I tested e1000 earlier and i couldnt find > any issues - although as the tittle says its a WIP. > > As before you need net-2.6. You also need the qdisc restart cleanup > patch. > I have found a bug on e1000 which manifests under a lot of pounding with traffic. I am not gonna be posting the patch anymore - but will continue to fix and add things. If you want the patches email me privately. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] NetXen: Updates for register access routines
On Friday 18 May 2007 06:17, Jeff Garzik wrote: > Mithlesh Thukral wrote: > > NetXen: Changes related to registers and their access routines > > This patch updates the various access routines to access different > > control and status settings present in different register locations. > > This will fix problems related to working of different ports in > > multi Port card. > > > > Signed-off by: Milan Bag <[EMAIL PROTECTED]> > > Signed-off by: Adhiraj Joshi <[EMAIL PROTECTED]> > > Acked-by: Mithlesh Thukral <[EMAIL PROTECTED]> > > --- > > drivers/net/netxen/netxen_nic.h | 96 ++--- > > drivers/net/netxen/netxen_nic_hw.h | 65 +++- > > drivers/net/netxen/netxen_nic_init.c | 17 ++-- > > 3 files changed, 65 insertions(+), 113 deletions(-) > > not applied. This patch removes a bunch of endianness changes. Ok. I will take care of these changes in the next patchset that i submit. Thanks for applying the first patch. -- Mithlesh Thukral - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/7] CAN: Add PF_CAN core module
Hi Urs, Hello Paul, i assume Paul refers to the can_rx_delete_all() function that adds each receive list entry for rcu removal using the can_rx_delete RCU callback, right? So the idea would be to create a second RCU callback - e.g. can_rx_delete_list() - that removes the complete list inside the RCU callback?!? The list removal would therefore be processed inside this new can_rx_delete_list() in RCU context and not inside can_rx_delete_all(). @Paul: Was this your intention? Best regards, Oliver Paul E. McKenney wrote: On Wed, May 16, 2007 at 04:51:02PM +0200, Urs Thuermann wrote: This patch adds the CAN core functionality but no protocols or drivers. No protocol implementations are included here. They come as separate patches. Protocol numbers are already in include/linux/can.h. Interesting! One question called out below -- why do call_rcu() on each piece of the struct dev_rcv_lists, instead of doing call_rcu() on the whole thing and having the RCU callback free up the pieces? Given that all the pieces are call_rcu()ed separately, there had better not be persistent pointers to the pieces, right? Doing it in one chunk would make the code a bit simpler and also reduce the RCU overhead a bit. Or am I missing something subtle here? Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6 2/2] [TCP] FRTO: Prevent state inconsistency in corner cases
State could become inconsistent in two cases: 1) Userspace disabled FRTO by tuning sysctl when one of the TCP flows was in the middle of FRTO algorithm (and then RTO is again triggered) 2) SACK reneging occurs during FRTO algorithm A simple solution is just to abort the previous FRTO when such obscure condition occurs... Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- net/ipv4/tcp_input.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 7ecdc89..38cb25b 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1501,6 +1501,8 @@ void tcp_enter_loss(struct sock *sk, int how) tcp_set_ca_state(sk, TCP_CA_Loss); tp->high_seq = tp->snd_nxt; TCP_ECN_queue_cwr(tp); + /* Abort FRTO algorithm if one is in progress */ + tp->frto_counter = 0; clear_all_retrans_hints(tp); } -- 1.5.0.6
[PATCH RESEND net-2.6] [TCP] FRTO: Add missing ECN CWR sending to one of the responses
The conservative spurious RTO response did not queue CWR even though the sending rate was lowered. Whenever reduction happens regardless of reason, CWR should be sent (forgetting to send it is not very fatal though). A better approach would be to queue CWR when one of the sending rate reducing responses (rate-halving one or this conservative response) is used already at RTO. Doing that would allow CWR to be sent along with the two new data segments that are sent during FRTO. However, it's a bit "racy" because userland could tune the response sysctl to a more aggressive one in between. Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- net/ipv4/tcp_input.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 7641b27..7ecdc89 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2608,6 +2608,7 @@ static void tcp_conservative_spur_to_response(struct tcp_sock *tp) { tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh); tp->snd_cwnd_cnt = 0; + TCP_ECN_queue_cwr(tp); tcp_moderate_cwnd(tp); } -- 1.5.0.6
Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed
From: Julian Anastasov <[EMAIL PROTECTED]> Date: Fri, 18 May 2007 11:40:54 +0300 (EEST) > On Thu, 17 May 2007, Patrick McHardy wrote: > > > In any case some better solution than the current one needs to be > > found, allowing users to send spoofed packets is far worse than > > using a non-desired source address for ICMP packets. > > yes, I would prefer the sysctl_ip_nonlocal_bind change to be > removed until such solution is found. Ok, I'll revert it. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed
Hello, On Thu, 17 May 2007, Patrick McHardy wrote: > > But what is preferred is to use VIP in ICMP. > > > > ip route add local VIP dev lo table user_defined > > > > returns RTCF_LOCAL but inet_addr_type() does not return RTN_LOCAL, > > we fix one thing but break another :) > > > Actually thats exactly the case that my patch handles. Why does it > matter which source address the ICMP packet uses, as long as its > routed properly? It should work for most of the cases but it can cause problems in closely connected hosts where using the right subnet matters. If inet_addr_type is not considered slow for routers and this local route justifies it then i have no more objections. May be Janusz should test it first without sysctl_ip_nonlocal_bind change. > In any case some better solution than the current one needs to be > found, allowing users to send spoofed packets is far worse than > using a non-desired source address for ICMP packets. yes, I would prefer the sysctl_ip_nonlocal_bind change to be removed until such solution is found. Regards -- Julian Anastasov <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/2] s2io: add PCI error recovery support
Hi, Fix looks good. I have couple of comments, 1) Return from s2io_updt_stats function if the PCI bus is offline (pci_channel_offline). if (pci_channel_offline(pdev)) return; 2) No Need to call netif_wake_queue() in s2io_io_resume as netif_device_attach() will take care of calling it. 3) We need to return from s2io_msix_ring_handle(), s2io_msix_fifo_handle(), s2io_msi_handle() if the PCI channel is offline. Some thing similary to the below condition that is added in the s2io_isr() -4117,6 +4129,10 @@ static irqreturn_t s2io_isr(int irq, voi struct mac_info *mac_control; struct config_param *config; + /* Pretend we handled any irq's from a disconnected card */ + if (pci_channel_offline(sp->pdev)) + return IRQ_NONE; Thanks, ~Siva -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Linas Vepstas Sent: Tuesday, May 15, 2007 5:08 AM To: Jeff Garzik Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: [PATCH 1/2] s2io: add PCI error recovery support This patch adds PCI error recovery support to the s2io 10-Gigabit ethernet device driver. Third revision, blocks interrupts and the watchdog. Tested, seems to work well. Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]> Acked-by: Ramkrishna Vepa <[EMAIL PROTECTED]> Cc: Raghavendra Koushik <[EMAIL PROTECTED]> Cc: Wen Xiong <[EMAIL PROTECTED]> Jeff, please apply for 2.6.22 --linas drivers/net/s2io.c | 116 ++--- drivers/net/s2io.h |5 ++ 2 files changed, 116 insertions(+), 5 deletions(-) Index: linux-2.6.22-rc1/drivers/net/s2io.c === --- linux-2.6.22-rc1.orig/drivers/net/s2io.c2007-05-14 17:05:03.0 -0500 +++ linux-2.6.22-rc1/drivers/net/s2io.c 2007-05-14 17:23:45.0 -0500 @@ -480,11 +480,18 @@ static struct pci_device_id s2io_tbl[] _ MODULE_DEVICE_TABLE(pci, s2io_tbl); +static struct pci_error_handlers s2io_err_handler = { + .error_detected = s2io_io_error_detected, + .slot_reset = s2io_io_slot_reset, + .resume = s2io_io_resume, +}; + static struct pci_driver s2io_driver = { .name = "S2IO", .id_table = s2io_tbl, .probe = s2io_init_nic, .remove = __devexit_p(s2io_rem_nic), + .err_handler = &s2io_err_handler, }; /* A simplifier macro used both by init and free shared_mem Fns(). */ @@ -2700,6 +2707,9 @@ static void s2io_netpoll(struct net_devi u64 val64 = 0xULL; int i; + if (pci_channel_offline(nic->pdev)) + return; + disable_irq(dev->irq); atomic_inc(&nic->isr_cnt); @@ -3225,6 +3235,8 @@ static void alarm_intr_handler(struct s2 int i; if (atomic_read(&nic->card_state) == CARD_DOWN) return; + if (pci_channel_offline(nic->pdev)) + return; nic->mac_control.stats_info->sw_stat.ring_full_cnt = 0; /* Handling the XPAK counters update */ if(nic->mac_control.stats_info->xpak_stat.xpak_timer_count < 72000) { @@ -4324,6 +4336,10 @@ static irqreturn_t s2io_isr(int irq, voi struct mac_info *mac_control; struct config_param *config; + /* Pretend we handled any irq's from a disconnected card */ + if (pci_channel_offline(sp->pdev)) + return IRQ_NONE; + atomic_inc(&sp->isr_cnt); mac_control = &sp->mac_control; config = &sp->config; @@ -6579,7 +6595,7 @@ static void s2io_rem_isr(struct s2io_nic } while(cnt < 5); } -static void s2io_card_down(struct s2io_nic * sp) +static void do_s2io_card_down(struct s2io_nic * sp, int do_io) { int cnt = 0; struct XENA_dev_config __iomem *bar0 = sp->bar0; @@ -6594,7 +6610,8 @@ static void s2io_card_down(struct s2io_n atomic_set(&sp->card_state, CARD_DOWN); /* disable Tx and Rx traffic on the NIC */ - stop_nic(sp); + if (do_io) + stop_nic(sp); s2io_rem_isr(sp); @@ -6602,7 +6619,7 @@ static void s2io_card_down(struct s2io_n tasklet_kill(&sp->task); /* Check if the device is Quiescent and then Reset the NIC */ - do { + while(do_io) { /* As per the HW requirement we need to replenish the * receive buffer to avoid the ring bump. Since there is * no intention of processing the Rx frame at this pointwe are @@ -6627,8 +6644,9 @@ static void s2io_card_down(struct s2io_n (unsigned long long) val64); break; } - } while (1); - s2io_reset(sp); + } + if (do_io) + s2io_reset(sp); spin_lock_irqsave(&sp->tx_lock, flags); /* Free all Tx buffers */ @@ -6643,6 +6661,11 @@ static void s2io_card_down(struct s2io_n
Re: [RFC] New driver API to speed up small packets xmits
David Miller wrote: > From: "Michael Chan" <[EMAIL PROTECTED]> > Date: Tue, 15 May 2007 15:05:28 -0700 > > > For each gso_seg, there will be a header and the payload may span 2 > > pages for 1500-byte packets. We always assume 1500-byte packets > > because the buggy chips do not support jumbo frames. > > Correct. > > I think there may be a case where you could see up to 4 segments. > If the user corks the TCP socket, does a sendmsg() (which puts > the data in the per-socket page) then does a sendfile() you'll > see something like: > > skb->data IP, TCP, ethernet headers, etc. > page0 sendmsg() data > page1 sendfile > page2 sendfile > > Ie. this can happen if the sendfile() part starts near the > end of a page, so it would get split even for a 1500 MTU > frame. > > Even more complex variants are possible if the user does > tiny sendfile() requests to different pages within the file. > > So in fact it can span up to N pages. > > But there is an upper limit defined by the original GSO > frame, and that is controlled by MAX_SKB_FRAGS, so at most > you would see MAX_SKB_FRAGS plus some small constant. > > I see. Is the following a better estimate? (gso_segs * 2) + skb_shinfo(skb)->nr_frags For each gso_seg, you get at least a header segment and a segment from a page for 1500-byte packets. You can get more segments if packets cross page boundaries or if there are tiny sendfile pages or some sendmsg pages mixed in. But these additional segments will never be more than nr_frags. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html