Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
David S. Miller wrote:
> From: Eric Dumazet <[EMAIL PROTECTED]>
> Date: Sat, 07 Jan 2006 08:34:35 +0100
>
> > I agree, I do use a hashed spinlock array on my local tree for TCP,
> > mainly to reduce the hash table size by a factor of 2.
>
> So what do you think about going to a single spinlock for the
> routing cache?

I have no problem with this, since the biggest server I have is 4-way, but are you sure big machines won't suffer from this single spinlock?

Also, I don't understand what you want to do after this single spinlock patch. How is it supposed to help the 'ip route flush cache' problem? In my case, I have about 600,000 dst entries:

# grep ip_dst /proc/slabinfo
ip_dst_cache 616250 622440 320 12 1 : tunables 54 27 8 : slabdata 51870 51870 0

Eric

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Sat, 07 Jan 2006 08:34:35 +0100

> I agree, I do use a hashed spinlock array on my local tree for TCP,
> mainly to reduce the hash table size by a factor of 2.

So what do you think about going to a single spinlock for the routing cache?
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
Andi Kleen wrote:
> I always disliked the per chain spinlocks even for other hash tables like
> TCP/UDP multiplex - it would be much nicer to use a much smaller separately
> hashed lock table and save cache. In this case the special case of using
> a one entry only lock hash table makes sense.

I agree. I use a hashed spinlock array on my local tree for TCP, mainly to reduce the hash table size by a factor of 2.

Eric
Re: [2.6 patch] fix ipvs compilation
From: Joe Kappus <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 22:30:56 -0500

> Why not then, we'll do this one as well since it needs it.
>
> Signed-off-by: Joe Kappus <[EMAIL PROTECTED]>

Your email client corrupted the patch. I fixed it up manually this time, but next time I won't be so nice, so please get this working.

Thanks.
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Sat, 7 Jan 2006 02:09:01 +0100

> I always disliked the per chain spinlocks even for other hash tables like
> TCP/UDP multiplex - it would be much nicer to use a much smaller separately
> hashed lock table and save cache. In this case the special case of using
> a one entry only lock hash table makes sense.

I used to think they were a great technique. But in each case where I thought they could be applied, better schemes have come along. In the case of the page cache we went to a per-address-space tree, and here in the routing cache we went to RCU.

There are RCU patches around for the TCP hashes and I'd like to put those in at some point as well. In fact, they'd be even more far-reaching since Arnaldo abstracted away the socket hashing stuff into an inet_hashtables subsystem.
Re: ax25/mkiss: unbalanced spinlock_bh in ax_encaps()
From: Francois Romieu <[EMAIL PROTECTED]>
Date: Sat, 7 Jan 2006 03:22:43 +0100

> The unlocking disappeared during commit
> 5793f4be23f0171b4999ca68a39a9157b44139f3.
>
> Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>

Applied, thanks a lot.
Re: [2.6 patch] fix ipvs compilation
On 1/6/06, David S. Miller <[EMAIL PROTECTED]> wrote:
> From: Joe <[EMAIL PROTECTED]>
> Date: Thu, 5 Jan 2006 23:43:52 -0500
>
> > That's not all either, ./net/ipv4/netfilter/ipt_helper.c has the same
> > error and the same fix.
> >
> > Here's the patch for this one. Sorry for the dupe.. I sent the last
> > as html by accident.
>
> Applied, please provide a "Signed-off-by:" line with your patch
> next time.
>
> Thanks.

Why not then, we'll do this one as well since it needs it.

Signed-off-by: Joe Kappus <[EMAIL PROTECTED]>

--- ./net/ipv4/netfilter/ip_conntrack_proto_sctp.c.old 2006-01-06 22:27:08.885583023 -0500
+++ ./net/ipv4/netfilter/ip_conntrack_proto_sctp.c 2006-01-06 22:27:44.606582972 -0500
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
ax25/mkiss: unbalanced spinlock_bh in ax_encaps()
The unlocking disappeared during commit
5793f4be23f0171b4999ca68a39a9157b44139f3.

Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>

diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c
index 3e9accf..41b3d83 100644
--- a/drivers/net/hamradio/mkiss.c
+++ b/drivers/net/hamradio/mkiss.c
@@ -524,6 +524,7 @@ static void ax_encaps(struct net_device
 	ax->dev->trans_start = jiffies;
 	ax->xleft = count - actual;
 	ax->xhead = ax->xbuff + actual;
+	spin_unlock_bh(&ax->buflock);
 }

 /* Encapsulate an AX.25 packet and kick it into a TTY queue. */
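The class of bug fixed here, a function that takes a lock and then returns without releasing it, can be illustrated outside the kernel. What follows is only a userspace sketch: a C11 atomic_flag stands in for the kernel spinlock, and struct tx_state, encaps_sketch() and the spin_*_sketch() helpers are hypothetical names, not code from the mkiss driver.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Minimal userspace spinlock; stands in for the kernel's spinlock_t. */
struct spinlock { atomic_flag held; };

static void spin_init_sketch(struct spinlock *l)
{
	atomic_flag_clear(&l->held);
}

static void spin_lock_sketch(struct spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* busy-wait until the holder clears the flag */
}

static void spin_unlock_sketch(struct spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Returns 1 if the lock was free and is now held by the caller, else 0. */
static int spin_trylock_sketch(struct spinlock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Hypothetical stand-in for the driver's transmit state. */
struct tx_state {
	struct spinlock buflock;	/* plays the role of ax->buflock */
	unsigned char xbuff[64];
	size_t xleft;			/* bytes still waiting to be sent */
};

/*
 * Copy a frame into the transmit buffer. Every exit path taken after the
 * lock is acquired releases it again.
 * Returns 0 on success, -1 if the frame does not fit.
 */
static int encaps_sketch(struct tx_state *tx, const unsigned char *frame,
			 size_t len)
{
	int ret = 0;

	spin_lock_sketch(&tx->buflock);
	if (len > sizeof(tx->xbuff)) {
		ret = -1;
		goto out_unlock;	/* the error path must unlock too */
	}
	memcpy(tx->xbuff, frame, len);
	tx->xleft = len;
out_unlock:
	spin_unlock_sketch(&tx->buflock);
	return ret;
}
```

Funnelling every return through a single unlock label is one common way to keep lock and unlock calls paired when a function grows extra exit paths.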
Re: [NETFILTER 00/10]: Netfilter IPsec support
In article <[EMAIL PROTECTED]> (at Sat, 7 Jan 2006 02:09:30 +0100 (MET)), Patrick McHardy <[EMAIL PROTECTED]> says:

> following are the remaining patches for netfilter IPsec support.
> They are missing the common-case optimization for inner transport mode
> SAs on the input path, but since it's just an optimization, I think
> it can also be done later. One note: unfortunately I had to increase

I definitely want to do it before 2.6.16.
Anyway, we'll test this series of patches. Thank you.

--yoshfuji
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
On Saturday 07 January 2006 01:17, David S. Miller wrote:
>
> I mean something like this patch:

Looks like a good idea to me.

I always disliked the per chain spinlocks even for other hash tables like TCP/UDP multiplex - it would be much nicer to use a much smaller separately hashed lock table and save cache. In this case the special case of using a one entry only lock hash table makes sense.

-Andi
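The "separately hashed lock table" idea can be sketched as follows: instead of one lock per hash chain, a small power-of-two array of locks is kept and each bucket is mapped to a lock by masking its index. This is a userspace illustration under assumptions, not the kernel's code: LOCK_TABLE_SZ is an arbitrary value, and struct spinlock, lock_index() and bucket_lock() are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

#define LOCK_TABLE_SZ 256u	/* assumed size; must be a power of two */

/* Minimal userspace spinlock; stands in for the kernel's spinlock_t. */
struct spinlock { atomic_flag held; };

static struct spinlock lock_table[LOCK_TABLE_SZ];

/* Put every lock in the table into the released state. */
static void lock_table_init(void)
{
	size_t i;

	for (i = 0; i < LOCK_TABLE_SZ; i++)
		atomic_flag_clear(&lock_table[i].held);
}

/* Map a hash bucket to the index of the lock that protects it. */
static size_t lock_index(size_t bucket, size_t table_sz)
{
	return bucket & (table_sz - 1);
}

/* Many buckets share one lock; the table size no longer tracks the hash size. */
static struct spinlock *bucket_lock(size_t bucket)
{
	return &lock_table[lock_index(bucket, LOCK_TABLE_SZ)];
}
```

With a table size of 1 the mask is 0, so every bucket maps to the same lock. That is the "one entry only" special case mentioned above, which amounts to a single global spinlock.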
Re: [PATCH] Endian-annotate struct iphdr
On Fri, Jan 06, 2006 at 01:25:03PM -0800, David S. Miller wrote:
> From: Alexey Dobriyan <[EMAIL PROTECTED]>
> Date: Fri, 6 Jan 2006 23:18:37 +0300
>
> > And fix trivial warnings that emerged.
> >
> > Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
>
> Applied.

OK, will merge... I've actually got way past that point (morning snapshot is on ftp.linux.org.uk/pub/people/viro/net-endian-mbox) and I hope to finish the bulk of net/* tonight. It still needs reordering and merging some of the chunks, though.
Re: [PATCH] Localizing a variable in net/core/filter.c
From: David S. Miller
Sent: 1/6/2006 5:29:20 PM

> From: "Kris Katterjohn" <[EMAIL PROTECTED]>
> Date: Fri, 6 Jan 2006 17:25:32 -0800
>
> > So the whole thing is wrong? If so, I guess I understand why it was
> > done the way it was before.
>
> It's using the local variable in the parent function as a temporary
> scratch area if the SKB isn't linear and we need to copy the packet
> data out from the scatter-gather list of the SKB.
>
> Read the implementation of skb_header_pointer() and be confused
> no further.

Okay.. the last horse finally crossed the finish line.

Thanks.
Re: [PATCH] Localizing a variable in net/core/filter.c
From: "Kris Katterjohn" <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 17:25:32 -0800

> So the whole thing is wrong? If so, I guess I understand why it was
> done the way it was before.

It's using the local variable in the parent function as a temporary scratch area if the SKB isn't linear and we need to copy the packet data out from the scatter-gather list of the SKB.

Read the implementation of skb_header_pointer() and be confused no further.
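The behaviour described here can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel implementation: struct pkt and header_pointer_sketch() are hypothetical, and the paged fragments that the real skb_header_pointer() walks are reduced to a single "tail" area.

```c
#include <stddef.h>

/* Hypothetical packet: a linear head plus one out-of-line tail area. */
struct pkt {
	const unsigned char *head;	/* directly addressable bytes */
	size_t head_len;
	const unsigned char *tail;	/* stands in for non-linear data */
	size_t tail_len;
};

/*
 * Return a pointer to len bytes starting at offset.
 * - If the range lies entirely in the linear head, point straight into it.
 * - Otherwise gather the bytes into the caller's buffer and return that.
 * - Return NULL if the packet is too short.
 * The caller owns buffer, so it must stay alive while the result is in use.
 */
static const void *header_pointer_sketch(const struct pkt *p, size_t offset,
					 size_t len, void *buffer)
{
	size_t total = p->head_len + p->tail_len;
	unsigned char *out = buffer;
	size_t i;

	if (offset > total || len > total - offset)
		return NULL;
	if (offset + len <= p->head_len)
		return p->head + offset;

	for (i = 0; i < len; i++) {
		size_t pos = offset + i;

		out[i] = pos < p->head_len ? p->head[pos]
					   : p->tail[pos - p->head_len];
	}
	return buffer;
}
```

In the copy case the returned pointer is the buffer itself. That is why the scratch variable has to live in the caller of load_pointer(): a buffer declared inside load_pointer() would already be out of scope by the time the result is dereferenced.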
Re: [PATCH] Localizing a variable in net/core/filter.c
From: Patrick McHardy
Sent: 1/6/2006 5:20:44 PM

> > -static inline void *load_pointer(struct sk_buff *skb, int k,
> > -				 unsigned int size, void *buffer)
> > +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size)
> >  {
> > -	if (k >= 0)
> > -		return skb_header_pointer(skb, k, size, buffer);
> > -	else {
> > +	if (k >= 0) {
> > +		u32 buffer;
> > +		return skb_header_pointer(skb, k, size, &buffer);
>
> This is also wrong, now you are returning an address from load_pointer's
> stackframe.

So the whole thing is wrong? If so, I guess I understand why it was done the way it was before.

Shouldn't gcc warn about this kind of thing?
Re: [PATCH] Localizing a variable in net/core/filter.c
Kris Katterjohn wrote:
> From: Patrick McHardy
> Sent: 1/6/2006 5:12:33 PM
>
> > > -static inline void *load_pointer(struct sk_buff *skb, int k,
> > > -				 unsigned int size, void *buffer)
> > > +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size)
> > >  {
> > > -	if (k >= 0)
> > > +	if (k >= 0) {
> > > +		u32 *buffer = NULL;
> > >  		return skb_header_pointer(skb, k, size, buffer);
> >
> > This is wrong, skb_header_pointer needs a pointer to a buffer
> > to which it can copy the packet contents if they are located
> > in the non-linear area.
>
> Ah, gotcha.
>
> --- x/net/core/filter.c 2006-01-06 19:14:34.0 -0600
> +++ y/net/core/filter.c 2006-01-06 19:14:26.0 -0600
> @@ -51,12 +51,12 @@ static void *__load_pointer(struct sk_bu
>  	return NULL;
>  }
>
> -static inline void *load_pointer(struct sk_buff *skb, int k,
> -				 unsigned int size, void *buffer)
> +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size)
>  {
> -	if (k >= 0)
> -		return skb_header_pointer(skb, k, size, buffer);
> -	else {
> +	if (k >= 0) {
> +		u32 buffer;
> +		return skb_header_pointer(skb, k, size, &buffer);

This is also wrong, now you are returning an address from load_pointer's stackframe.
Re: [PATCH] Localizing a variable in net/core/filter.c
From: Patrick McHardy Sent: 1/6/2006 5:12:33 PM > > -static inline void *load_pointer(struct sk_buff *skb, int k, > > - unsigned int size, void *buffer) > > +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int > > size) > > { > > - if (k >= 0) > > + if (k >= 0) { > > + u32 *buffer = NULL; > > return skb_header_pointer(skb, k, size, buffer); > > This is wrong, skb_header_pointer needs a pointer to a buffer > to which it can copy the packet contents if they are located > in the non-linear area. Ah, gotcha. --- x/net/core/filter.c 2006-01-06 19:14:34.0 -0600 +++ y/net/core/filter.c 2006-01-06 19:14:26.0 -0600 @@ -51,12 +51,12 @@ static void *__load_pointer(struct sk_bu return NULL; } -static inline void *load_pointer(struct sk_buff *skb, int k, - unsigned int size, void *buffer) +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size) { - if (k >= 0) - return skb_header_pointer(skb, k, size, buffer); - else { + if (k >= 0) { + u32 buffer; + return skb_header_pointer(skb, k, size, &buffer); + } else { if (k >= SKF_AD_OFF) return NULL; return __load_pointer(skb, k); @@ -82,7 +82,6 @@ unsigned int sk_run_filter(struct sk_buf u32 A = 0; /* Accumulator */ u32 X = 0; /* Index Register */ u32 mem[BPF_MEMWORDS]; /* Scratch Memory Store */ - u32 tmp; int k; int pc; @@ -176,7 +175,7 @@ unsigned int sk_run_filter(struct sk_buf case BPF_LD|BPF_W|BPF_ABS: k = fentry->k; load_w: - ptr = load_pointer(skb, k, 4, &tmp); + ptr = load_pointer(skb, k, 4); if (ptr != NULL) { A = ntohl(*(u32 *)ptr); continue; @@ -185,7 +184,7 @@ unsigned int sk_run_filter(struct sk_buf case BPF_LD|BPF_H|BPF_ABS: k = fentry->k; load_h: - ptr = load_pointer(skb, k, 2, &tmp); + ptr = load_pointer(skb, k, 2); if (ptr != NULL) { A = ntohs(*(u16 *)ptr); continue; @@ -194,7 +193,7 @@ unsigned int sk_run_filter(struct sk_buf case BPF_LD|BPF_B|BPF_ABS: k = fentry->k; load_b: - ptr = load_pointer(skb, k, 1, &tmp); + ptr = load_pointer(skb, k, 1); if (ptr != NULL) { A = 
*(u8 *)ptr; continue; @@ -216,7 +215,7 @@ load_b: k = X + fentry->k; goto load_b; case BPF_LDX|BPF_B|BPF_MSH: - ptr = load_pointer(skb, fentry->k, 1, &tmp); + ptr = load_pointer(skb, fentry->k, 1); if (ptr != NULL) { X = (*(u8 *)ptr & 0xf) << 2; continue; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Localizing a variable in net/core/filter.c
Kris Katterjohn wrote:
> This localizes a variable to the function it's used in.
>
> Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]>
>
> I assume tmp was used for a reason instead of using a variable local to the
> if() in load_pointer(), but I can't figure out why. So I wrote this patch
> changing it in case it was just a mistake or something left over from
> something else.
>
> So in other words, can you explain to me why it was done the way it was done?
> If not, I think my patch takes care of it. Also, I tested it my way and
> everything seems to be working quite well.
>
> Thanks!
>
> --- x/net/core/filter.c 2006-01-06 16:51:51.0 -0600
> +++ y/net/core/filter.c 2006-01-06 18:17:43.0 -0600
> @@ -51,12 +51,12 @@ static void *__load_pointer(struct sk_bu
>  	return NULL;
>  }
>
> -static inline void *load_pointer(struct sk_buff *skb, int k,
> -				 unsigned int size, void *buffer)
> +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size)
>  {
> -	if (k >= 0)
> +	if (k >= 0) {
> +		u32 *buffer = NULL;
>  		return skb_header_pointer(skb, k, size, buffer);

This is wrong, skb_header_pointer needs a pointer to a buffer to which it can copy the packet contents if they are located in the non-linear area.
[PATCH 3/6] sk98lin: error handling on dual port board
Sk98lin driver error recovery on two port boards is bad. If it fails the second allocation, it will not release resources properly. Also it registers the second port in the pci driver data.

If second port fails, might as well go with one port.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- sk98lin.orig/drivers/net/sk98lin/skge.c
+++ sk98lin/drivers/net/sk98lin/skge.c
@@ -4899,15 +4899,17 @@ static int __devinit skge_probe_one(stru

 	boards_found++;

+	pci_set_drvdata(pdev, dev);
+
 	/* More then one port found */
 	if ((pAC->GIni.GIMacsFound == 2 ) && (pAC->RlmtNets == 2)) {
-		if ((dev = alloc_etherdev(sizeof(DEV_NET))) == 0) {
-			printk(KERN_ERR "Unable to allocate etherdev "
+		dev = alloc_etherdev(sizeof(DEV_NET));
+		if (!dev) {
+			printk(KERN_ERR "sk98lin: unable to allocate etherdev "
 				"structure!\n");
-			goto out;
+			goto single_port;
 		}

-		pAC->dev[1] = dev;
 		pNet = netdev_priv(dev);
 		pNet->PortNr = 1;
 		pNet->NetNr = 1;
@@ -4939,20 +4941,25 @@ static int __devinit skge_probe_one(stru
 		if (using_dac)
 			dev->features |= NETIF_F_HIGHDMA;

-		if (register_netdev(dev)) {
-			printk(KERN_ERR "sk98lin: Could not register device for seconf port.\n");
+		error = register_netdev(dev);
+		if (error) {
+			printk(KERN_ERR "sk98lin: Could not register device"
+			       " for second port. (%d)\n", error);
 			free_netdev(dev);
-			pAC->dev[1] = pAC->dev[0];
-		} else {
-			memcpy(&dev->dev_addr,
-				&pAC->Addr.Net[1].CurrentMacAddress, 6);
-			memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
-
-			printk("%s: %s\n", dev->name, DeviceStr);
-			printk(" PrefPort:B RlmtMode:Dual Check Link State\n");
+			goto single_port;
 		}
+
+		pAC->dev[1] = dev;
+		memcpy(&dev->dev_addr,
+		       &pAC->Addr.Net[1].CurrentMacAddress, 6);
+		memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
+
+		printk("%s: %s\n", dev->name, DeviceStr);
+		printk(" PrefPort:B RlmtMode:Dual Check Link State\n");
 	}

+single_port:
+
 	/* Save the hardware revision */
 	pAC->HWRevision = (((pAC->GIni.GIPciHwRev >> 4) & 0x0F)*10) +
 		(pAC->GIni.GIPciHwRev & 0x0F);
@@ -4964,7 +4971,6 @@ static int __devinit skge_probe_one(stru
 	memset(&pAC->PnmiBackup, 0, sizeof(SK_PNMI_STRUCT_DATA));
 	memcpy(&pAC->PnmiBackup, &pAC->PnmiStruct, sizeof(SK_PNMI_STRUCT_DATA));

-	pci_set_drvdata(pdev, dev);
 	return 0;

  out_free_resources:

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
[PATCH 4/6] sk98lin: use kzalloc
Trivial use of kzalloc.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- sk98lin.orig/drivers/net/sk98lin/skge.c
+++ sk98lin/drivers/net/sk98lin/skge.c
@@ -4807,14 +4807,13 @@ static int __devinit skge_probe_one(stru
 	}

 	pNet = netdev_priv(dev);
-	pNet->pAC = kmalloc(sizeof(SK_AC), GFP_KERNEL);
+	pNet->pAC = kzalloc(sizeof(SK_AC), GFP_KERNEL);
 	if (!pNet->pAC) {
 		printk(KERN_ERR "Unable to allocate adapter "
 		       "structure!\n");
 		goto out_free_netdev;
 	}

-	memset(pNet->pAC, 0, sizeof(SK_AC));
 	pAC = pNet->pAC;
 	pAC->PciDev = pdev;

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
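For readers less familiar with the kernel allocators: the patch replaces an allocate-then-memset pair with a single zeroing allocation. A rough userspace analogy, assuming calloc() as the counterpart of kzalloc() and using a hypothetical struct adapter_sketch in place of SK_AC, looks like this:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the adapter context structure. */
struct adapter_sketch {
	int index;
	void *iobase;
};

/* Two-step form: allocate, then clear (the pattern the patch removes). */
static struct adapter_sketch *alloc_two_step(void)
{
	struct adapter_sketch *a = malloc(sizeof(*a));

	if (!a)
		return NULL;
	memset(a, 0, sizeof(*a));
	return a;
}

/* One-step form: calloc() hands back zeroed memory, as kzalloc() does. */
static struct adapter_sketch *alloc_zeroed(void)
{
	return calloc(1, sizeof(struct adapter_sketch));
}
```

Both functions return the same zero-filled object; the one-step form simply leaves no window in which the memory is allocated but not yet cleared.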
[PATCH 6/6] sk98lin: error handling of pci setup
Don't enable the pci device twice (already done in the probe routine).
Propagate the error codes from pci_request_regions back to initial probing.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- sk98lin.orig/drivers/net/sk98lin/skge.c
+++ sk98lin/drivers/net/sk98lin/skge.c
@@ -292,17 +292,12 @@ static __devinit int SkGeInitPCI(SK_AC *
 	struct pci_dev *pdev = pAC->PciDev;
 	int retval;

-	if (pci_enable_device(pdev) != 0) {
-		return 1;
-	}
-
 	dev->mem_start = pci_resource_start (pdev, 0);
 	pci_set_master(pdev);

-	if (pci_request_regions(pdev, "sk98lin") != 0) {
-		retval = 2;
-		goto out_disable;
-	}
+	retval = pci_request_regions(pdev, "sk98lin");
+	if (retval)
+		goto out;

 #ifdef SK_BIG_ENDIAN
 	/*
@@ -321,9 +316,8 @@ static __devinit int SkGeInitPCI(SK_AC *
 	 * Remap the regs into kernel space.
 	 */
 	pAC->IoBase = ioremap_nocache(dev->mem_start, 0x4000);
-
-	if (!pAC->IoBase){
-		retval = 3;
+	if (!pAC->IoBase) {
+		retval = -EIO;
 		goto out_release;
 	}

@@ -331,8 +325,7 @@ static __devinit int SkGeInitPCI(SK_AC *

  out_release:
 	pci_release_regions(pdev);
- out_disable:
-	pci_disable_device(pdev);
+ out:
 	return retval;
 }

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
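The shape of the error handling after this patch (pass a callee's negative errno straight through, choose one where the callee only returns NULL, and unwind only what has already succeeded) can be sketched in userspace. Everything below is hypothetical illustration: struct dev_sketch, request_regions_sketch() and the fail_step parameter exist only to make the paths testable.

```c
#include <errno.h>
#include <stdlib.h>

struct dev_sketch {
	void *regions;	/* stands in for the claimed PCI regions */
	void *iobase;	/* stands in for the ioremap()ed register window */
};

/* Hypothetical helper: fail_step == 1 forces this step to fail. */
static int request_regions_sketch(struct dev_sketch *d, int fail_step)
{
	if (fail_step == 1)
		return -EBUSY;
	d->regions = malloc(16);
	return d->regions ? 0 : -ENOMEM;
}

/* Returns 0 on success or a negative errno; fail_step == 2 fails the remap. */
static int init_pci_sketch(struct dev_sketch *d, int fail_step)
{
	int retval;

	retval = request_regions_sketch(d, fail_step);
	if (retval)
		goto out;		/* pass the callee's errno through */

	d->iobase = fail_step == 2 ? NULL : malloc(16);
	if (!d->iobase) {
		retval = -EIO;		/* callee gave no errno: pick one */
		goto out_release;
	}
	return 0;

out_release:
	free(d->regions);		/* undo only what succeeded */
	d->regions = NULL;
out:
	return retval;
}
```

Returning real errno values instead of ad hoc codes such as 1, 2 and 3 lets the probe routine hand the result back to the PCI core unchanged.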
[PATCH 1/6] sk98lin: routine called from probe marked __init
Sk98lin driver has a routine marked __init that is called from the probe code. If using pci hotplug, this could be called after initialization, so it needs to be marked __devinit. As it stands, hot-adding a sk98lin board would crash the kernel. I don't have hot plug hardware to actually try this feat.

Also, there are two routines, only called from SkGeBoardInit, that can be marked __devinit.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- sk98lin.orig/drivers/net/sk98lin/skge.c
+++ sk98lin/drivers/net/sk98lin/skge.c
@@ -282,10 +282,11 @@ SK_U32 Val) /* pointer to store the rea
  * Description:
  *	This function initialize the PCI resources and IO
  *
- * Returns: N/A
- *
+ * Returns:
+ *	0 - indicate everything worked ok.
+ *	!= 0 - error indication
  */
-int SkGeInitPCI(SK_AC *pAC)
+static __devinit int SkGeInitPCI(SK_AC *pAC)
 {
 	struct SK_NET_DEVICE *dev = pAC->dev[0];
 	struct pci_dev *pdev = pAC->PciDev;
@@ -492,7 +493,7 @@ module_param_array(AutoSizing, charp, NU
  *	0, if everything is ok
  *	!=0, on error
  */
-static int __init SkGeBoardInit(struct SK_NET_DEVICE *dev, SK_AC *pAC)
+static int __devinit SkGeBoardInit(struct SK_NET_DEVICE *dev, SK_AC *pAC)
 {
 	short i;
 	unsigned long Flags;
@@ -633,8 +634,7 @@ SK_BOOL DualNet;
  *	SK_TRUE, if all memory could be allocated
  *	SK_FALSE, if not
  */
-static SK_BOOL BoardAllocMem(
-SK_AC *pAC)
+static __devinit SK_BOOL BoardAllocMem(SK_AC *pAC)
 {
 caddr_t		pDescrMem;	/* pointer to descriptor memory area */
 size_t		AllocLength;	/* length of complete descriptor area */
@@ -727,8 +727,7 @@ size_t AllocLength;/* length of comple
  *
  * Returns:	N/A
  */
-static void BoardInitMem(
-SK_AC *pAC) /* pointer to adapter context */
+static __devinit void BoardInitMem(SK_AC *pAC)
 {
 int		i;		/* loop counter */
 int		RxDescrSize;	/* the size of a rx descriptor rounded up to alignment*/

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
[PATCH 0/6] sk98lin:
After fixing skge/sky2 for 64 bit DMA, examination of sk98lin showed similar bugs. Once again, I don't want to get into a massive cleanup fest of the sk98lin driver, but there are some real issues here that users might see.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
[PATCH 5/6] sk98lin: error handling on probe
The sk98lin driver doesn't do proper error number handling during initialization. Note: -EAGAIN is a bogus return value for hardware errors. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- sk98lin.orig/drivers/net/sk98lin/skge.c +++ sk98lin/drivers/net/sk98lin/skge.c @@ -530,7 +530,7 @@ SK_BOOL DualNet; if (SkGeInit(pAC, pAC->IoBase, SK_INIT_DATA) != 0) { printk("HWInit (0) failed.\n"); spin_unlock_irqrestore(&pAC->SlowPathLock, Flags); - return(-EAGAIN); + return -EIO; } SkI2cInit( pAC, pAC->IoBase, SK_INIT_DATA); SkEventInit(pAC, pAC->IoBase, SK_INIT_DATA); @@ -552,7 +552,7 @@ SK_BOOL DualNet; if (SkGeInit(pAC, pAC->IoBase, SK_INIT_IO) != 0) { printk("sk98lin: HWInit (1) failed.\n"); spin_unlock_irqrestore(&pAC->SlowPathLock, Flags); - return(-EAGAIN); + return -EIO; } SkI2cInit( pAC, pAC->IoBase, SK_INIT_IO); SkEventInit(pAC, pAC->IoBase, SK_INIT_IO); @@ -584,20 +584,20 @@ SK_BOOL DualNet; } else { printk(KERN_WARNING "sk98lin: Illegal number of ports: %d\n", pAC->GIni.GIMacsFound); - return -EAGAIN; + return -EIO; } if (Ret) { printk(KERN_WARNING "sk98lin: Requested IRQ %d is busy.\n", dev->irq); - return -EAGAIN; + return Ret; } pAC->AllocFlag |= SK_ALLOC_IRQ; /* Alloc memory for this board (Mem for RxD/TxD) : */ if(!BoardAllocMem(pAC)) { printk("No memory for descriptor rings.\n"); - return(-EAGAIN); + return -ENOMEM; } BoardInitMem(pAC); @@ -613,7 +613,7 @@ SK_BOOL DualNet; DualNet)) { BoardFreeMem(pAC); printk("sk98lin: SkGeInitAssignRamToQueues failed.\n"); - return(-EAGAIN); + return -EIO; } return (0); @@ -4800,8 +4800,10 @@ static int __devinit skge_probe_one(stru } } - if ((dev = alloc_etherdev(sizeof(DEV_NET))) == NULL) { - printk(KERN_ERR "Unable to allocate etherdev " + error = -ENOMEM; + dev = alloc_etherdev(sizeof(DEV_NET)); + if (!dev) { + printk(KERN_ERR "sk98lin: unable to allocate etherdev " "structure!\n"); goto out_disable_device; } @@ -4809,7 +4811,7 @@ static int __devinit skge_probe_one(stru pNet = netdev_priv(dev); pNet->pAC = 
kzalloc(sizeof(SK_AC), GFP_KERNEL); if (!pNet->pAC) { - printk(KERN_ERR "Unable to allocate adapter " + printk(KERN_ERR "sk98lin: unable to allocate adapter " "structure!\n"); goto out_free_netdev; } @@ -4822,6 +4824,7 @@ static int __devinit skge_probe_one(stru pAC->CheckQueue = SK_FALSE; dev->irq = pdev->irq; + error = SkGeInitPCI(pAC); if (error) { printk(KERN_ERR "sk98lin: PCI setup failed: %i\n", error); @@ -4861,17 +4864,20 @@ static int __devinit skge_probe_one(stru pAC->Index = boards_found++; - if (SkGeBoardInit(dev, pAC)) + error = SkGeBoardInit(dev, pAC); + if (error) goto out_free_netdev; /* Read Adapter name from VPD */ if (ProductStr(pAC, DeviceStr, sizeof(DeviceStr)) != 0) { + error = -EIO; printk(KERN_ERR "sk98lin: Could not read VPD data.\n"); goto out_free_resources; } /* Register net device */ - if (register_netdev(dev)) { + error = register_netdev(dev); + if (error) { printk(KERN_ERR "sk98lin: Could not register device.\n"); goto out_free_resources; } -- Stephen Hemminger <[EMAIL PROTECTED]> OSDL http://developer.osdl.org/~shemminger - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/6] sk98lin: not doing high dma properly
Sk98lin 64 bit memory handling is wrong. It doesn't set the highdma flag, i.e. the kernel always uses bounce buffers. It doesn't fall back to the 32 bit mask if it can't get the 64 bit mask.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- sk98lin.orig/drivers/net/sk98lin/skge.c
+++ sk98lin/drivers/net/sk98lin/skge.c
@@ -4775,16 +4775,30 @@ static int __devinit skge_probe_one(stru
 	struct net_device *dev = NULL;
 	static int boards_found = 0;
 	int error = -ENODEV;
+	int using_dac = 0;
 	char DeviceStr[80];

 	if (pci_enable_device(pdev))
 		goto out;

 	/* Configure DMA attributes. */
-	if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) &&
-	    pci_set_dma_mask(pdev, DMA_32BIT_MASK))
-		goto out_disable_device;
-
+	if (sizeof(dma_addr_t) > sizeof(u32) &&
+	    !(error = pci_set_dma_mask(pdev, DMA_64BIT_MASK))) {
+		using_dac = 1;
+		error = pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
+		if (error < 0) {
+			printk(KERN_ERR "sk98lin %s unable to obtain 64 bit DMA "
+			       "for consistent allocations\n", pci_name(pdev));
+			goto out_disable_device;
+		}
+	} else {
+		error = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
+		if (error) {
+			printk(KERN_ERR "sk98lin %s no usable DMA configuration\n",
+			       pci_name(pdev));
+			goto out_disable_device;
+		}
+	}

 	if ((dev = alloc_etherdev(sizeof(DEV_NET))) == NULL) {
 		printk(KERN_ERR "Unable to allocate etherdev "
@@ -4843,6 +4857,9 @@ static int __devinit skge_probe_one(stru
 #endif
 	}

+	if (using_dac)
+		dev->features |= NETIF_F_HIGHDMA;
+
 	pAC->Index = boards_found++;

 	if (SkGeBoardInit(dev, pAC))
@@ -4919,6 +4936,9 @@ static int __devinit skge_probe_one(stru
 #endif
 	}

+	if (using_dac)
+		dev->features |= NETIF_F_HIGHDMA;
+
 	if (register_netdev(dev)) {
 		printk(KERN_ERR "sk98lin: Could not register device for seconf port.\n");
 		free_netdev(dev);

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
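The decision logic in this patch (prefer the 64 bit mask, fall back to 32 bit, remember which one was accepted) can be sketched without any PCI hardware. This is a mock under stated assumptions: struct dma_caps, set_mask_sketch() and configure_dma_sketch() are hypothetical, set_mask_sketch() merely imitates the return convention of pci_set_dma_mask(), and the separate consistent-mask step of the real patch is left out.

```c
#include <errno.h>

/* Hypothetical device model: which address widths the platform accepts. */
struct dma_caps {
	int ok64;
	int ok32;
};

/* Stand-in for pci_set_dma_mask(): 0 on success, -EIO if unsupported. */
static int set_mask_sketch(const struct dma_caps *c, int bits)
{
	if (bits == 64)
		return c->ok64 ? 0 : -EIO;
	return c->ok32 ? 0 : -EIO;
}

/*
 * Prefer 64 bit addressing, fall back to 32 bit, fail only if neither works.
 * *using_dac records whether the 64 bit mask was accepted, so the caller can
 * later advertise high-memory DMA support on the net device.
 */
static int configure_dma_sketch(const struct dma_caps *c, int *using_dac)
{
	int error;

	*using_dac = 0;
	error = set_mask_sketch(c, 64);
	if (!error) {
		*using_dac = 1;
		return 0;
	}
	return set_mask_sketch(c, 32);
}
```

Keeping the outcome in a flag such as using_dac is what lets the driver set NETIF_F_HIGHDMA later, once the net_device structure exists.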
[PATCH] Localizing a variable in net/core/filter.c
This localizes a variable to the function it's used in. Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]> I assume tmp was used for a reason instead of using a variable local to the if() in load_pointer(), but I can't figure out why. So I wrote this patch changing it in case it was just a mistake or something left over from something else. So in other words, can you explain to me why it was done the way it was done? If not, I think my patch takes care of it. Also, I tested it my way and everything seems to be working quite well. Thanks! --- x/net/core/filter.c 2006-01-06 16:51:51.0 -0600 +++ y/net/core/filter.c 2006-01-06 18:17:43.0 -0600 @@ -51,12 +51,12 @@ static void *__load_pointer(struct sk_bu return NULL; } -static inline void *load_pointer(struct sk_buff *skb, int k, - unsigned int size, void *buffer) +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size) { - if (k >= 0) + if (k >= 0) { + u32 *buffer = NULL; return skb_header_pointer(skb, k, size, buffer); - else { + } else { if (k >= SKF_AD_OFF) return NULL; return __load_pointer(skb, k); @@ -82,7 +82,6 @@ unsigned int sk_run_filter(struct sk_buf u32 A = 0; /* Accumulator */ u32 X = 0; /* Index Register */ u32 mem[BPF_MEMWORDS]; /* Scratch Memory Store */ - u32 tmp; int k; int pc; @@ -176,7 +175,7 @@ unsigned int sk_run_filter(struct sk_buf case BPF_LD|BPF_W|BPF_ABS: k = fentry->k; load_w: - ptr = load_pointer(skb, k, 4, &tmp); + ptr = load_pointer(skb, k, 4); if (ptr != NULL) { A = ntohl(*(u32 *)ptr); continue; @@ -185,7 +184,7 @@ unsigned int sk_run_filter(struct sk_buf case BPF_LD|BPF_H|BPF_ABS: k = fentry->k; load_h: - ptr = load_pointer(skb, k, 2, &tmp); + ptr = load_pointer(skb, k, 2); if (ptr != NULL) { A = ntohs(*(u16 *)ptr); continue; @@ -194,7 +193,7 @@ unsigned int sk_run_filter(struct sk_buf case BPF_LD|BPF_B|BPF_ABS: k = fentry->k; load_b: - ptr = load_pointer(skb, k, 1, &tmp); + ptr = load_pointer(skb, k, 1); if (ptr != NULL) { A = *(u8 *)ptr; continue; @@ -216,7 
+215,7 @@ load_b: k = X + fentry->k; goto load_b; case BPF_LDX|BPF_B|BPF_MSH: - ptr = load_pointer(skb, fentry->k, 1, &tmp); + ptr = load_pointer(skb, fentry->k, 1); if (ptr != NULL) { X = (*(u8 *)ptr & 0xf) << 2; continue; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
From: Andi Kleen <[EMAIL PROTECTED]> Date: Fri, 6 Jan 2006 21:57:41 +0100 > Perhaps a better way would be to just exclude dst entries in RCU state > from the normal accounting and assume that if the system > really runs short of memory because of this the results would > trigger quiescent states more quickly, freeing the memory again. That's one idea... Eric, how important do you honestly think the per-hashchain spinlocks are? That's the big barrier from making rt_secret_rebuild() a simple rehash instead of flushing the whole table as it does now. The lock is only grabbed for updates, and the access to these locks is random and as such probably non-local when taken anyways. Back before we used RCU for reads, this array-of-spinlock thing made a lot more sense. I mean something like this patch: diff --git a/net/ipv4/route.c b/net/ipv4/route.c index f701a13..f9436c7 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -204,36 +204,8 @@ __u8 ip_tos2prio[16] = { struct rt_hash_bucket { struct rtable *chain; }; -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) -/* - * Instead of using one spinlock for each rt_hash_bucket, we use a table of spinlocks - * The size of this table is a power of two and depends on the number of CPUS. 
- */ -#if NR_CPUS >= 32 -#define RT_HASH_LOCK_SZ 4096 -#elif NR_CPUS >= 16 -#define RT_HASH_LOCK_SZ 2048 -#elif NR_CPUS >= 8 -#define RT_HASH_LOCK_SZ 1024 -#elif NR_CPUS >= 4 -#define RT_HASH_LOCK_SZ 512 -#else -#define RT_HASH_LOCK_SZ 256 -#endif -static spinlock_t *rt_hash_locks; -# define rt_hash_lock_addr(slot) &rt_hash_locks[(slot) & (RT_HASH_LOCK_SZ - 1)] -# define rt_hash_lock_init() { \ - int i; \ - rt_hash_locks = kmalloc(sizeof(spinlock_t) * RT_HASH_LOCK_SZ, GFP_KERNEL); \ - if (!rt_hash_locks) panic("IP: failed to allocate rt_hash_locks\n"); \ - for (i = 0; i < RT_HASH_LOCK_SZ; i++) \ - spin_lock_init(&rt_hash_locks[i]); \ - } -#else -# define rt_hash_lock_addr(slot) NULL -# define rt_hash_lock_init() -#endif +static DEFINE_SPINLOCK(rt_hash_lock); static struct rt_hash_bucket *rt_hash_table; static unsigned rt_hash_mask; @@ -627,7 +599,7 @@ static void rt_check_expire(unsigned lon if (*rthp == 0) continue; - spin_lock(rt_hash_lock_addr(i)); + spin_lock(&rt_hash_lock); while ((rth = *rthp) != NULL) { if (rth->u.dst.expires) { /* Entry is expired even if it is in use */ @@ -660,7 +632,7 @@ static void rt_check_expire(unsigned lon rt_free(rth); #endif /* CONFIG_IP_ROUTE_MULTIPATH_CACHED */ } - spin_unlock(rt_hash_lock_addr(i)); + spin_unlock(&rt_hash_lock); /* Fallback loop breaker. 
*/ if (time_after(jiffies, now)) @@ -683,11 +655,11 @@ static void rt_run_flush(unsigned long d get_random_bytes(&rt_hash_rnd, 4); for (i = rt_hash_mask; i >= 0; i--) { - spin_lock_bh(rt_hash_lock_addr(i)); + spin_lock_bh(&rt_hash_lock); rth = rt_hash_table[i].chain; if (rth) rt_hash_table[i].chain = NULL; - spin_unlock_bh(rt_hash_lock_addr(i)); + spin_unlock_bh(&rt_hash_lock); for (; rth; rth = next) { next = rth->u.rt_next; @@ -820,7 +792,7 @@ static int rt_garbage_collect(void) k = (k + 1) & rt_hash_mask; rthp = &rt_hash_table[k].chain; - spin_lock_bh(rt_hash_lock_addr(k)); + spin_lock_bh(&rt_hash_lock); while ((rth = *rthp) != NULL) { if (!rt_may_expire(rth, tmo, expire)) { tmo >>= 1; @@ -852,7 +824,7 @@ static int rt_garbage_collect(void) goal--; #endif /* CONFIG_IP_ROUTE_MULTIPATH_CACHED */ } - spin_unlock_bh(rt_hash_lock_addr(k)); + spin_unlock_bh(&rt_hash_lock); if (goal <= 0) break; } @@ -922,7 +894,7 @@ restart: rthp = &rt_hash_table[hash].chain; - spin_lock_bh(rt_hash_lock_addr(hash)); + spin_lock_bh(&rt_hash_lock); while ((rth = *rthp) != NULL) { #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED if (!(rth->u.dst.flags & DST_BALANCED) && @@ -948,7 +920,7 @@ restart: rth->u.dst.__use++; dst_hold(&rth->u.dst); rth->u.dst.lastuse = now; - spin_unlock_bh(rt_hash_lock_addr(hash)); +
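For readers following the locking discussion, here is a small userspace sketch of the scheme the patch removes: a power-of-two table of locks, with each hash bucket mapped to a lock by masking its index. It uses C11 `atomic_flag` in place of kernel spinlocks. The `toy_*` names and the table size are assumptions made for illustration and are not taken from route.c.

```c
#include <stdatomic.h>
#include <stddef.h>

#define LOCK_TABLE_SZ 256   /* illustrative; must be a power of two */

typedef struct { atomic_flag held; } toy_spinlock;

static toy_spinlock lock_table[LOCK_TABLE_SZ];

/* Put every lock into the released state before first use. */
static void toy_lock_init(void)
{
    size_t i;

    for (i = 0; i < LOCK_TABLE_SZ; i++)
        atomic_flag_clear(&lock_table[i].held);
}

/* Map a hash bucket index to one of the shared locks. Buckets whose
 * indices differ by a multiple of LOCK_TABLE_SZ share a lock. */
static toy_spinlock *toy_lock_addr(size_t slot)
{
    return &lock_table[slot & (LOCK_TABLE_SZ - 1)];
}

static void toy_spin_lock(toy_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait until the current holder releases */
}

static void toy_spin_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Setting `LOCK_TABLE_SZ` to 1 makes every bucket map to the same lock, which corresponds to the single `rt_hash_lock` the patch proposes: writers to different chains then serialize on one lock, while the RCU read side discussed in the thread takes none of these locks.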
Re: [PATCH] Remove old comments and code in net/ethernet/eth.c
From: David S. Miller Sent: 1/6/2006 4:08:33 PM
> From: "Kris Katterjohn" <[EMAIL PROTECTED]>
> Date: Fri, 6 Jan 2006 16:05:36 -0800
>
> > This removes an old comment and old commented-out code that's been there
> > since at least as far back as 2.4.0.
> >
> > Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]>
>
> It's instructive to keep it there so that nobody in the
> future tries to add the "optimization" without understanding
> why it's wrong.

Okay then. That makes sense.
[PATCH] Remove old comments and code in net/ethernet/eth.c
This removes an old comment and old commented-out code that's been there since at least as far back as 2.4.0. Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]> Thanks! --- x/net/ethernet/eth.c 2006-01-06 12:49:27.0 -0600 +++ y/net/ethernet/eth.c 2006-01-06 18:01:43.0 -0600 @@ -168,20 +168,8 @@ __be16 eth_type_trans(struct sk_buff *sk skb->pkt_type = PACKET_BROADCAST; else skb->pkt_type = PACKET_MULTICAST; - } - - /* - * This ALLMULTI check should be redundant by 1.4 - * so don't forget to remove it. - * - * Seems, you forgot to remove it. All silly devices - * seems to set IFF_PROMISC. - */ - - else if(1 /*dev->flags&IFF_PROMISC*/) { - if (unlikely(compare_ether_addr(eth->h_dest, dev->dev_addr))) - skb->pkt_type = PACKET_OTHERHOST; - } + } else if (unlikely(compare_ether_addr(eth->h_dest, dev->dev_addr))) + skb->pkt_type = PACKET_OTHERHOST; if (ntohs(eth->h_proto) >= 1536) return eth->h_proto;
Re: [PATCH] Remove old comments and code in net/ethernet/eth.c
From: "Kris Katterjohn" <[EMAIL PROTECTED]> Date: Fri, 6 Jan 2006 16:05:36 -0800 > This removes an old comment and old commented-out code that's been there since > at least as far back as 2.4.0. > > Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]> It's instructive to keep it there so that nobody in the future tries to add the "optimization" without understanding why it's wrong.
[EMAIL PROTECTED]: [PATCH] PCI Error Recovery: ixgb network device driver]
Here's the corresponding patch for the ixgb. --linas > Hi, > > The following patch to the e100 device driver is in the current > 2.6.15-mm1 tree, and is being pushed to the mainline 2.6.15 tree. > > I wrote this patch, and I believe I've cc'ed you on previous > versions, but certainly not recently. Please review, comment, > ACK or NAK as appropriate. > > Background: Newer PCI controllers can detect and respond to > serious PCI bus errors, typically by isolating the PCI slot > (cutting off i/o to the failing card). An arch-specific > framework can report these errors back to the device driver, > and coordinate the recovery of the card. Detailed documentation > for this is in the kernel tree, at Documentation/pci-error-recovery.txt > > This patch adds the detection and recovery callbacks to the > e100 driver. A version of this patch has been shipping as > a part of SUSE SLES9 for about a year, and so has been > tested in the field. > > Similar patches to follow for the e1000 and the ixgb. > > --linas > - Forwarded message from Greg KH <[EMAIL PROTECTED]> - Subject: [PATCH] PCI Error Recovery: ixgb network device driver To: [EMAIL PROTECTED] From: Greg KH <[EMAIL PROTECTED]> [PATCH] PCI Error Recovery: ixgb network device driver Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel ten-gigabit ethernet ixgb device driver. The patch has been tested, and appears to work well. 
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]> --- commit 3c0006afdd8ade574257c88df81c93b0bb71b544 tree 4cc697ccc74b8d67a9f08e68f71584f9d538e90e parent d78cde68ab78766c3a175466aa8adcbdc5520963 author linas <[EMAIL PROTECTED]> Fri, 18 Nov 2005 16:24:20 -0600 committer Greg Kroah-Hartman <[EMAIL PROTECTED]> Thu, 05 Jan 2006 21:54:55 -0800 drivers/net/ixgb/ixgb_main.c | 86 ++ 1 files changed, 86 insertions(+), 0 deletions(-) diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c index f9f77e4..166832c 100644 --- a/drivers/net/ixgb/ixgb_main.c +++ b/drivers/net/ixgb/ixgb_main.c @@ -132,6 +132,16 @@ static void ixgb_restore_vlan(struct ixg static void ixgb_netpoll(struct net_device *dev); #endif +static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state); +static pci_ers_result_t ixgb_io_slot_reset (struct pci_dev *pdev); +static void ixgb_io_resume (struct pci_dev *pdev); + +static struct pci_error_handlers ixgb_err_handler = { + .error_detected = ixgb_io_error_detected, + .slot_reset = ixgb_io_slot_reset, + .resume = ixgb_io_resume, +}; + /* Exported from other modules */ extern void ixgb_check_options(struct ixgb_adapter *adapter); @@ -141,6 +151,8 @@ static struct pci_driver ixgb_driver = { .id_table = ixgb_pci_tbl, .probe= ixgb_probe, .remove = __devexit_p(ixgb_remove), + .err_handler = &ixgb_err_handler, + }; MODULE_AUTHOR("Intel Corporation, <[EMAIL PROTECTED]>"); @@ -1654,8 +1666,16 @@ ixgb_intr(int irq, void *data, struct pt unsigned int i; #endif +#ifdef XXX_CONFIG_IXGB_EEH_RECOVERY + if(unlikely(icr==EEH_IO_ERROR_VALUE(4))) { + if (eeh_slot_is_isolated (adapter->pdev)) + // disable_irq_nosync (adapter->pdev->irq); + return IRQ_NONE; /* Not our interrupt */ + } +#else if(unlikely(!icr)) return IRQ_NONE; /* Not our interrupt */ +#endif /* CONFIG_IXGB_EEH_RECOVERY */ if(unlikely(icr & (IXGB_INT_RXSEQ | IXGB_INT_LSC))) { 
mod_timer(&adapter->watchdog_timer, jiffies); @@ -2125,4 +2145,70 @@ static void ixgb_netpoll(struct net_devi } #endif +/* -- PCI Error Recovery infrastructure */ +/** ixgb_io_error_detected() is called when PCI error is detected */ +static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(netif_running(netdev)) + ixgb_down(adapter, TRUE); + + /* Request a slot reset. */ + return PCI_ERS_RESULT_NEED_RESET; +} + +/** ixgb_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. + * Implementation resembles the first-half of the + * ixgb_resume routine. + */ +static pci_ers_result_t ixgb_io_slot_reset (struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(pci_enable_device(pdev)) { + printk(KERN_ERR "ixgb: Cannot re-enable PCI device after reset.\n"); + return PCI_ERS_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + /* Perform card reset only on one instance of the card */ + i
[EMAIL PROTECTED]: [PATCH] PCI Error Recovery: e1000 network device driver]
Here's the corresponding patch for the e1000. --linas > Hi, > > The following patch to the e100 device driver is in the current > 2.6.15-mm1 tree, and is being pushed to the mainline 2.6.15 tree. > > I wrote this patch, and I believe I've cc'ed you on previous > versions, but certainly not recently. Please review, comment, > ACK or NAK as appropriate. > > Background: Newer PCI controllers can detect and respond to > serious PCI bus errors, typically by isolating the PCI slot > (cutting off i/o to the failing card). An arch-specific > framework can report these errors back to the device driver, > and coordinate the recovery of the card. Detailed documentation > for this is in the kernel tree, at Documentation/pci-error-recovery.txt > > This patch adds the detection and recovery callbacks to the > e100 driver. A version of this patch has been shipping as > a part of SUSE SLES9 for about a year, and so has been > tested in the field. > > Similar patches to follow for the e1000 and the ixgb. > > --linas - Forwarded message from Greg KH <[EMAIL PROTECTED]> - Subject: [PATCH] PCI Error Recovery: e1000 network device driver To: [EMAIL PROTECTED] From: Greg KH <[EMAIL PROTECTED]> [PATCH] PCI Error Recovery: e1000 network device driver Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel gigabit ethernet e1000 device driver. The patch has been tested, and appears to work well. 
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]> --- commit 113cc803a20d72ee5e3c92302ac5a06e0c651d01 tree aae6aa3b20f14a36eba84867c2406cbe385affad parent 5a02e3abf1e74c159deca91d6af01297379eede7 author linas <[EMAIL PROTECTED]> Fri, 18 Nov 2005 16:23:54 -0600 committer Greg Kroah-Hartman <[EMAIL PROTECTED]> Thu, 05 Jan 2006 21:54:55 -0800 drivers/net/e1000/e1000_main.c | 101 1 files changed, 100 insertions(+), 1 deletions(-) diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c index 438a931..76352fe 100644 --- a/drivers/net/e1000/e1000_main.c +++ b/drivers/net/e1000/e1000_main.c @@ -206,6 +206,16 @@ static void e1000_netpoll (struct net_de void e1000_rx_schedule(void *data); #endif +static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state); +static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev); +static void e1000_io_resume(struct pci_dev *pdev); + +static struct pci_error_handlers e1000_err_handler = { + .error_detected = e1000_io_error_detected, + .slot_reset = e1000_io_slot_reset, + .resume = e1000_io_resume, +}; + /* Exported from other modules */ extern void e1000_check_options(struct e1000_adapter *adapter); @@ -218,8 +228,9 @@ static struct pci_driver e1000_driver = /* Power Managment Hooks */ #ifdef CONFIG_PM .suspend = e1000_suspend, - .resume = e1000_resume + .resume = e1000_resume, #endif + .err_handler = &e1000_err_handler, }; MODULE_AUTHOR("Intel Corporation, <[EMAIL PROTECTED]>"); @@ -2941,6 +2952,10 @@ e1000_update_stats(struct e1000_adapter #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF + /* Prevent stats update while adapter is being reset */ + if (adapter->link_speed == 0) + return; + spin_lock_irqsave(&adapter->stats_lock, flags); /* these counters are modified from e1000_adjust_tbi_stats, @@ -4331,4 +4346,88 @@ e1000_netpoll(struct net_device *netdev) } #endif +/* --- PCI Error Recovery infrastructure */ +/** 
e1000_io_error_detected() is called when PCI error is detected */ +static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + + if (netif_running(netdev)) + e1000_down(adapter); + + /* Request a slot slot reset. */ + return PCI_ERS_RESULT_NEED_RESET; +} + +/** e1000_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. + * Implementation resembles the first-half of the + * e1000_resume routine. + */ +static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + + if (pci_enable_device(pdev)) { + printk(KERN_ERR "e1000: Cannot re-enable PCI device after reset.\n"); + return PCI_ERS_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + pci_enable_wake(pdev, 3, 0); + pci_enable_wake(pdev, 4, 0); /* 4 == D3 cold */ + + /* Perform card reset only on one instance of the card */ + if(0 != PCI_FUNC (pdev->devfn)) + return PCI_ERS_RESULT_RECOVERED; + + e1000_reset(adapter); + E1000
Re: dccp_ipv6 fails to link on some archs.
From: Dave Jones <[EMAIL PROTECTED]> Date: Fri, 6 Jan 2006 17:23:07 -0500 > Missing exports/inlines ? Missing include, I'll fix it up. Thanks for the report.
[EMAIL PROTECTED]: [PATCH] PCI Error Recovery: e100 network device driver]
Hi, The following patch to the e100 device driver is in the current 2.6.15-mm1 tree, and is being pushed to the mainline 2.6.15 tree. I wrote this patch, and I believe I've cc'ed you on previous versions, but certainly not recently. Please review, comment, ACK or NAK as appropriate. Background: Newer PCI controllers can detect and respond to serious PCI bus errors, typically by isolating the PCI slot (cutting off i/o to the failing card). An arch-specific framework can report these errors back to the device driver, and coordinate the recovery of the card. Detailed documentation for this is in the kernel tree, at Documentation/pci-error-recovery.txt This patch adds the detection and recovery callbacks to the e100 driver. A version of this patch has been shipping as a part of SUSE SLES9 for about a year, and so has been tested in the field. Similar patches to follow for the e1000 and the ixgb. --linas - Forwarded message from Greg KH <[EMAIL PROTECTED]> - Subject: [PATCH] PCI Error Recovery: e100 network device driver Reply-To: Greg K-H <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] From: Greg KH <[EMAIL PROTECTED]> [PATCH] PCI Error Recovery: e100 network device driver Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel ethernet e100 device driver. The patch has been tested, and appears to work well. 
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]> --- commit 414eee4fa72175d3c0be116d6cb8b0634e4ae916 tree e1cc342377037142e0fd46f89b4cabaa3bb12adb parent 113cc803a20d72ee5e3c92302ac5a06e0c651d01 author linas <[EMAIL PROTECTED]> Fri, 18 Nov 2005 16:23:26 -0600 committer Greg Kroah-Hartman <[EMAIL PROTECTED]> Thu, 05 Jan 2006 21:54:55 -0800 drivers/net/e100.c | 70 1 files changed, 70 insertions(+), 0 deletions(-) diff --git a/drivers/net/e100.c b/drivers/net/e100.c index 22cd045..095d953 100644 --- a/drivers/net/e100.c +++ b/drivers/net/e100.c @@ -2704,6 +2704,75 @@ static void e100_shutdown(struct pci_dev } +/* -- PCI Error Recovery infrastructure -- */ +/** e100_io_error_detected() is called when PCI error is detected */ +static pci_ers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + + /* Same as calling e100_down(netdev_priv(netdev)), but generic */ + netdev->stop(netdev); + + /* Is a detach needed ?? */ + // netif_device_detach(netdev); + + /* Request a slot reset. */ + return PCI_ERS_RESULT_NEED_RESET; +} + +/** e100_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. 
*/ +static pci_ers_result_t e100_io_slot_reset(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct nic *nic = netdev_priv(netdev); + + if(pci_enable_device(pdev)) { + printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n"); + return PCI_ERS_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + /* Only one device per card can do a reset */ + if (0 != PCI_FUNC (pdev->devfn)) + return PCI_ERS_RESULT_RECOVERED; + + e100_hw_reset(nic); + e100_phy_init(nic); + + if(e100_hw_init(nic)) { + DPRINTK(HW, ERR, "e100_hw_init failed\n"); + return PCI_ERS_RESULT_DISCONNECT; + } + + return PCI_ERS_RESULT_RECOVERED; +} + +/** e100_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + */ +static void e100_io_resume(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct nic *nic = netdev_priv(netdev); + + /* ack any pending wake events, disable PME */ + pci_enable_wake(pdev, 0, 0); + + netif_device_attach(netdev); + if(netif_running(netdev)) { + e100_open (netdev); + mod_timer(&nic->watchdog, jiffies); + } +} + +static struct pci_error_handlers e100_err_handler = { + .error_detected = e100_io_error_detected, + .slot_reset = e100_io_slot_reset, + .resume = e100_io_resume, +}; + + static struct pci_driver e100_driver = { .name = DRV_NAME, .id_table = e100_id_table, @@ -2714,6 +2783,7 @@ static struct pci_driver e100_driver = { .resume = e100_resume, #endif .shutdown = e100_shutdown, + .err_handler = &e100_err_handler, }; static int __init e100_init_module(void) - End forwarded message - - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
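The three driver patches in this series share one callback sequence: the driver quiesces the device and asks for a reset, the platform resets the slot, the driver re-initializes the hardware, and normal operation resumes. The sketch below mimics that sequence in plain userspace C so the control flow is easier to follow. Everything named `toy_*` is hypothetical, including `toy_recover()`, which stands in for the platform side. The real framework described in Documentation/pci-error-recovery.txt has more result codes and steps than are modeled here.

```c
#include <stddef.h>

/* Reduced result set: only the three outcomes the patches return. */
enum toy_ers_result { TOY_NEED_RESET, TOY_RECOVERED, TOY_DISCONNECT };

struct toy_dev {
    int up;      /* 1 while the interface is passing traffic */
    int hw_ok;   /* 1 if the hardware re-initializes after a reset */
};

struct toy_err_handlers {
    enum toy_ers_result (*error_detected)(struct toy_dev *dev);
    enum toy_ers_result (*slot_reset)(struct toy_dev *dev);
    void (*resume)(struct toy_dev *dev);
};

/* Step 1: stop using the device and request a slot reset. */
static enum toy_ers_result toy_error_detected(struct toy_dev *dev)
{
    dev->up = 0;
    return TOY_NEED_RESET;
}

/* Step 2: called after the slot was reset; bring the hardware back. */
static enum toy_ers_result toy_slot_reset(struct toy_dev *dev)
{
    return dev->hw_ok ? TOY_RECOVERED : TOY_DISCONNECT;
}

/* Step 3: traffic may flow again. */
static void toy_resume(struct toy_dev *dev)
{
    dev->up = 1;
}

/* Hypothetical platform-side driver of the sequence.
 * Returns 1 if the device was resumed, 0 if recovery was abandoned. */
static int toy_recover(const struct toy_err_handlers *h, struct toy_dev *dev)
{
    if (h->error_detected(dev) != TOY_NEED_RESET)
        return 0;
    /* a real platform would reset the PCI slot at this point */
    if (h->slot_reset(dev) != TOY_RECOVERED)
        return 0;
    h->resume(dev);
    return 1;
}
```

A device whose re-initialization fails is left down, which matches the patches returning PCI_ERS_RESULT_DISCONNECT from their slot_reset callbacks.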
Re: State of the Union: Wireless
[ Sorry, this went to linux-kernel, meant to send it to netdev. Apologies to those who see it twice. ]

> So, now we asked: How would a sane UI look like. We had a few points:
> * The interface needs to support some kind of "master" interface to
> configure the hardware, 80211 parameters and
> to actually configure and setup the
> * Virtual interfaces.
> Data is transferred only though the virtual interfaces, which could
> be an AP interface, a STA interface in INFRA or Ad-Hoc mode, etc... .
> Configuration is done though the master interface.

Two things to inject, from my own little corner of userspace:

1. Monitor mode formatting.
I ported over the BSD radiotap packet header system, it's in the Intel and I believe some versions of the Devicescape stacks. Using these would be a very good thing for userspace. If for some reason it isn't used, then we (userspace tool people) need something equivalent. I like radiotap primarily because:
* Dynamic per-packet stats. Drivers provide what their firmware is capable of providing per frame. The more info provided the better.
* Expandable headers. New per-frame stats can be added into the RT headers without changing linktype, breaking existing apps, etc.
* Format indicators. Is the 4 byte FCS tacked onto the end of the frame in rfmon? If we don't know this in userspace, we can't do 802.11 validation, wep decoding, and other important stuff. Userspace shouldn't have to know which driver is being used, this ought to be in the frame headers.
Radiotap provides all of those and is already supported by tcpdump, ethereal, kismet, etc.

2. RFMon is weird/breaks interfaces
The other gotcha with rfmon is it often breaks a card's ability to associate (though less often with new cards). Even if it doesn't, whatever tool put it into rfmon is likely to want to take control of the channel hopping, which will interfere with the associations of other virtual interfaces.
Currently single-interface cards (ethX, whatever) thrown into rfmon just plain break, in a pretty obvious way. The linktype changes, traffic stops, and users more or less understand this is going to be the behavior. Once virtual interfaces come into play, it may cause some confusion if you can make virtual interfaces that do sta, adhoc, ap all at once without conflicting, and suddenly bringing up an rfmon interface causes them all to break. I don't know if the solution to this is a warning, marking non-rfmon virtual interfaces down, or just saying "they'll figure it out", but I figured it's worth considering at an early stage. -m -- Mike Kershaw/Dragorn <[EMAIL PROTECTED]> GPG Fingerprint: 3546 89DF 3C9D ED80 3381 A661 D7B2 8822 738B BDB1 "Yes, yes, LORD OF HUMANS! I will rule you ALL with an iron fist! YOU! OBEY THE FIST!" -- Invader Zim
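To make the "format indicators" point concrete, here is a hedged sketch of how a userspace tool might ask a radiotap header whether the captured frame carries a trailing FCS. It assumes the radiotap layout as commonly documented: a little-endian header of version, pad, length and a `present` bitmap, with TSFT as bit 0 (8 bytes, 8-byte aligned), Flags as bit 1 (1 byte), bit 31 announcing a further bitmap word, and 0x10 in Flags meaning "frame includes FCS". The constant and function names are invented for this example, and a given driver or kernel port may differ from the assumed layout.

```c
#include <stddef.h>
#include <stdint.h>

#define RT_PRESENT_TSFT   (1u << 0)   /* 64-bit timestamp precedes Flags */
#define RT_PRESENT_FLAGS  (1u << 1)   /* 8-bit Flags field is present */
#define RT_PRESENT_EXT    (1u << 31)  /* another present word follows */
#define RT_F_FCS          0x10        /* frame includes FCS */

static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the Flags field says an FCS is included, 0 if it says
 * no, and -1 if the header is malformed or has no Flags field. */
static int radiotap_frame_has_fcs(const uint8_t *buf, size_t len)
{
    uint32_t first, word;
    size_t it_len, off;

    if (len < 8 || buf[0] != 0)          /* only version 0 is modeled */
        return -1;
    it_len = rd_le16(buf + 2);
    if (it_len < 8 || it_len > len)
        return -1;

    first = rd_le32(buf + 4);
    off = 8;
    word = first;
    while (word & RT_PRESENT_EXT) {      /* skip extended bitmap words */
        if (off + 4 > it_len)
            return -1;
        word = rd_le32(buf + off);
        off += 4;
    }

    if (!(first & RT_PRESENT_FLAGS))
        return -1;
    if (first & RT_PRESENT_TSFT) {
        off = (off + 7) & ~(size_t)7;    /* TSFT is 8-byte aligned */
        off += 8;
    }
    if (off + 1 > it_len)
        return -1;
    return (buf[off] & RT_F_FCS) ? 1 : 0;
}
```

The point of the design is visible in the return values: the tool learns about the FCS from the frame header itself and never needs to know which driver produced the capture.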
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
> It can be in promiscious mode (wardriving). Just to nitpick: Promisc implies delivering all data frames from the medium. rfmon is actually a different link type and delivers management frames (for which there isn't a clear equivalent in 802.3). Promisc does not imply disabling normal operation. Rfmon generally does, either due to firmware restrictions or because the app using rfmon wants to control the channel. I'd expect promisc on a wireless device to report 802.3 formatted data frames for all data on the network the card is associated to. Many cards can't do this, so cleanly reporting that inability may be a good idea. Rfmon reports link layer frames, both data and non-data, with 802.11 headers, independent of network association. Not to hassle needlessly, I just think being clear early in the planning can help eliminate problems later. Promisc and rfmon are fairly different things. -m -- Mike Kershaw/Dragorn <[EMAIL PROTECTED]> GPG Fingerprint: 3546 89DF 3C9D ED80 3381 A661 D7B2 8822 738B BDB1 "We're sorry, Susy won't be attending classes for the rest of this academic year. She caught the measles, and we had her shot." pgp6zOxSn9Mf1.pgp Description: PGP signature
Fw: [Bugme-new] [Bug 5843] New: kissattach locks up system
Begin forwarded message:
Date: Fri, 6 Jan 2006 03:12:39 -0800
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bugme-new] [Bug 5843] New: kissattach locks up system

http://bugzilla.kernel.org/show_bug.cgi?id=5843

Summary: kissattach locks up system
Kernel Version: 2.6.15
Status: NEW
Severity: high
Owner: [EMAIL PROTECTED]
Submitter: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]

Most recent kernel where this bug did not occur: 2.6.14-5
Distribution: kernel.org kernel source on openSuSE 10.0
Hardware Environment: i386, 1GHz PIII
Software Environment: Ham Radio
Problem Description: Issuing a kissattach command locks up system

Steps to reproduce:
1. Build and install 2.6.15 kernel with ax25 and mkiss hamradio compiled in or modules.
2. Build and install libax25, ax25-tools and ax25-apps from sourceforge or distro ftp site.
3. Configure /etc/ax25/axports for simple tnc
># /etc/ax25/axports
>#
># The format of this file is:
>#
># name callsign speed paclen window description
#
>2m W1NR-9 19200 255 2 145.650 MHz (1200 bps)
4. Configure /etc/ax25/ax25d.conf
># /etc/ax25/ax25d.conf
>#
># ax25d Configuration File.
>#
># AX.25 Ports begin with a '['.
>#
>[W1NR VIA 2m]
>parameters 2 1 6 900 * 15 0
>NOCALL * * * * * * L
>default * * * * * * - root /spider/src/client client %s ax25
5. Bind to serial port with kissattach
>worf:~ # kissattach /dev/ttyS0 2m 44.56.10.3
>AX.25 port 2m bound to device ax0

System is locked up hard at this point. This is new to 2.6.15 and is not reproducible in 2.6.14-5.
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
Michael Buesch <[EMAIL PROTECTED]> wrote: > How would the virtual interfaces look like? That is quite easy to answer. > They are net_devices, as they transfer data. > They should probably _not_ be on top of the ethernet, as 80211 does not > have very much in common with ethernet. Basically they share the same > MAC address format. Does someone have another thing, which he thinks > is shared? It has a connection status. It has a connection speed, which is less static than on a LAN. (Maybe it can be asynchronous in the next version.) It can't yet be full duplex, but who knows ... It can be in promiscuous mode (wardriving). > The virtual interface is then configured through /dev/wlan0 using write() > (no ugly ioctl anymore, you see...). Config data like TX rate, > current essid, basically everything + xyz which is done by WE today, > is written to /dev/wlan0. In ASCII parsed by an in-kernel library? Did you consider sysfs? What would a connection manager look for if it's supposed to act on * plugging in the WLAN card * finding/losing a (better) network -- I thank GMX for sabotaging the use of my addresses by means of lies spread via SPF.
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
On Friday 06 January 2006 20:26, Lee Revell wrote: > On Fri, 2006-01-06 at 13:58 +0100, Andi Kleen wrote: > > Another CPU might be stuck in a long > > running interrupt > > Shouldn't a long running interrupt be considered a bug? In normal operation yes, but there can always be exceptional circumstances where it's unavoidable (e.g. during error handling), and in the name of defensive programming the rest of the system ought to tolerate it. -Andi
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
On Friday 06 January 2006 21:26, Paul E. McKenney wrote: > If not, it may be worthwhile to limit the number of times that > rt_run_flush() runs per RCU grace period. Problem is that without rt_run_flush new routes and route attribute changes don't get used by the stack. If RCU takes long and routes keep changing this might be a big issue. As an admin I would certainly be annoyed if the network stack ignored my new route for some unbounded time. Perhaps a better way would be to just exclude dst entries in RCU state from the normal accounting and assume that if the system really runs short of memory because of this the results would trigger quiescent states more quickly, freeing the memory again. -Andi
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
David Lang wrote:
> On Fri, 6 Jan 2006, Patrick McHardy wrote:
> > I think the main advantages of netlink over a character device is its flexible format, which is easily extendable, and multicast capability, which can be used to broadcast events and configuration changes. Its also good to have all the net stuff accessible in a uniform way.
>
> character devices are far easier to script. this really sounds like the type of configuration stuff that sysfs was designed for. can we avoid yet another configuration tool that's required?

I think it's not just configuration but also event handling for associating, link layer authentication, ..., which is not something handled by scripts but by some daemon. It might also want to set up routes or IP addresses, which is done using netlink anyway.
dccp_ipv6 fails to link on some archs.
Our daily build-system spat this out about 2.6.15-git2

WARNING: /usr/src/build/676459-ia64/install/lib/modules/2.6.15-1.1830_FC5/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic
WARNING: /usr/src/build/676462-ppc64/install/lib/modules/2.6.15-1.1830_FC5/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic
WARNING: /usr/src/build/676467-ppc/install/lib/modules/2.6.15-1.1830_FC5/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic
WARNING: /usr/src/build/676467-ppc/install/lib/modules/2.6.15-1.1830_FC5smp/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic
WARNING: /usr/src/build/676465-s390/install/lib/modules/2.6.15-1.1830_FC5/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic
WARNING: /usr/src/build/676460-s390x/install/lib/modules/2.6.15-1.1830_FC5/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic

Missing exports/inlines ? Dave
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
From: David Lang <[EMAIL PROTECTED]> Date: Fri, 6 Jan 2006 14:16:17 -0800 (PST) > character devices are far easier to script. this really sounds like the > type of configuration stuff that sysfs was designed for. can we avoid yet > another configuration tool that's required? netlink is being recommended exactly because it can result in only needing one tool for everything
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
On Fri, 6 Jan 2006, Patrick McHardy wrote: Marcel Holtmann wrote: I just personally liked the idea of having a device node in /dev for every existing hardware wlan card. Like we have device nodes for other real hardware, too. It felt like a bit of a "unix way" to do this to me. I don't say this is the way to go. If a netlink socket is used (which is possible, for sure), we stay with the old way of having no device node in /dev for networking devices. That is ok. But that is really only an implementation detail (and for sure a matter of taste). At the OLS last year, I think the consensus was to use netlink for all configuration tasks. However this was mainly driven by Harald Welte and he might be able to talk about the pros and cons of netlink versus a character device. I think the main advantages of netlink over a character device is its flexible format, which is easily extendable, and multicast capability, which can be used to broadcast events and configuration changes. It's also good to have all the net stuff accessible in a uniform way. character devices are far easier to script. this really sounds like the type of configuration stuff that sysfs was designed for. can we avoid yet another configuration tool that's required? David Lang -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare
Re: [PATCH] Endian-annotate struct iphdr
From: Alexey Dobriyan <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 23:18:37 +0300

> And fix trivial warnings that emerged.
>
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>

Applied.
Re: [PATCH] Endian-annotate in_aton()
From: Alexey Dobriyan <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 23:19:25 +0300

> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>

Also applied. Thanks.
Re: [PATCH 1/1] Corrections to LSM-IPSec Nethooks
From: Trent Jaeger <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 11:09:43 -0500

> Signed-off-by: Trent Jaeger <[EMAIL PROTECTED]>

Applied, thanks Trent.

I think it's a small bit of lesser known trivia that I spent one semester at Penn State, on the Erie campus :-)
Re: [2.6 patch] fix ipvs compilation
From: Joe <[EMAIL PROTECTED]>
Date: Thu, 5 Jan 2006 23:43:52 -0500

> That's not all either, ./net/ipv4/netfilter/ipt_helper.c has the same
> error and the same fix.
>
> Here's the patch for this one. Sorry for the dupe.. i sent the last
> as html by accident.

Applied, please provide a "Signed-off-by:" line with your patch next time.

Thanks.
Re: [RFC] bridge + netfilter + vlan + hw checksum = bug?
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 4 Jan 2006 16:00:41 -0800

> It looks like the bridge netfilter code does not correctly update
> the hardware checksum after popping off the VLAN header.
>
> This is by inspection, I have *not* tested this.
> To test you would need to set up a filtering bridge with vlans
> and a device that does hardware receive checksum (skge, or sungem)
>
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Even though untested, it very much looks correct to me and therefore I'll apply this now for 2.6.16.

We have a lot of time to find any problems with this change :)
Re: [PATCH] Use newer is_multicast_ether_addr() in some files
From: "Kris Katterjohn" <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 13:01:54 -0800

> From: Patrick McHardy
> Sent: 1/6/2006 12:52:34 PM
>
> > Randy.Dunlap wrote:
> > > On Fri, 6 Jan 2006, Patrick McHardy wrote:
> > >
> > >>>--- x/net/atm/br2684.c 2006-01-02 21:21:10.0 -0600
> > >>>+++ y/net/atm/br2684.c 2006-01-06 12:34:47.0 -0600
> > >>>@@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
> > >>> unsigned char *rawp;
> > >>> eth = eth_hdr(skb);
> > >>>
> > >>>-if (*eth->h_dest & 1) {
> > >>>+if (is_multicast_ether_addr(eth->h_dest)) {
> > >>
> > >>This is not equivalent, is_multicast_ether_addr() ignores
> > >>addresses starting with 0xff.
> > >
> > > It used to. Not today afaict.
> >
> > You're right, Stephen changed it two days ago.
>
> That's why I said the newer is_multicast_ether_addr(). Sorry for the
> confusion.

Applied, thanks Kris.
Re: [PATCH] Change sk_run_filter()'s return type in net/core/filter.c
From: "Kris Katterjohn" <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 05:53:32 -0800

> From: Patrick McHardy
> Sent: 1/6/2006 1:36:24 AM
> > Please use unsigned int not just unsigned.
>
> Ta-da!

Applied, thanks Kris.
Re: BCM 5705 firmware not starting...
On Fri, 2006-01-06 at 15:13 -0500, Ben Collins wrote:
> http://bugzilla.ubuntu.com/show_bug.cgi?id=16435
>
> The above is a bug report for a user that is getting a firmware restart
> timeout (waiting for mbox1 magic to invert).
>
> Any ideas on if this is a software or hardware issue? Anything I can ask
> the user to do to help debug it?
>
> This is 2.6.15, btw.
>

It is most likely bad firmware or corrupted firmware on the card. The mismatch of the chip revision reported by tg3 and lspci output further confirms this. I will see what can be done to get the firmware upgraded.
Re: [PATCH] Use newer is_multicast_ether_addr() in some files
From: Patrick McHardy
Sent: 1/6/2006 12:52:34 PM

> Randy.Dunlap wrote:
> > On Fri, 6 Jan 2006, Patrick McHardy wrote:
> >
> >>>--- x/net/atm/br2684.c 2006-01-02 21:21:10.0 -0600
> >>>+++ y/net/atm/br2684.c 2006-01-06 12:34:47.0 -0600
> >>>@@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
> >>> unsigned char *rawp;
> >>> eth = eth_hdr(skb);
> >>>
> >>>- if (*eth->h_dest & 1) {
> >>>+ if (is_multicast_ether_addr(eth->h_dest)) {
> >>
> >>This is not equivalent, is_multicast_ether_addr() ignores
> >>addresses starting with 0xff.
> >
> > It used to. Not today afaict.
>
> You're right, Stephen changed it two days ago.

That's why I said the newer is_multicast_ether_addr(). Sorry for the confusion.
Re: [PATCH] Use newer is_multicast_ether_addr() in some files
Randy.Dunlap wrote:
> On Fri, 6 Jan 2006, Patrick McHardy wrote:
>
>>> --- x/net/atm/br2684.c 2006-01-02 21:21:10.0 -0600
>>> +++ y/net/atm/br2684.c 2006-01-06 12:34:47.0 -0600
>>> @@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
>>>  unsigned char *rawp;
>>>  eth = eth_hdr(skb);
>>>
>>> - if (*eth->h_dest & 1) {
>>> + if (is_multicast_ether_addr(eth->h_dest)) {
>>
>> This is not equivalent, is_multicast_ether_addr() ignores
>> addresses starting with 0xff.
>
> It used to. Not today afaict.

You're right, Stephen changed it two days ago.
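A minimal userspace sketch of the distinction argued about in this thread. The two helpers are hypothetical models, not the kernel's definitions: the first mirrors the behaviour Patrick describes (addresses starting with 0xff are not counted), the second mirrors the newer behaviour the patch relies on, where only the group bit of the first octet matters, exactly like the open-coded `*eth->h_dest & 1`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the older check: group bit set, but an
 * address whose first octet is 0xff is not reported as multicast. */
static int old_style_is_multicast(const uint8_t *addr)
{
	return (addr[0] != 0xff) && (addr[0] & 0x01);
}

/* Hypothetical model of the newer check: only the group bit of the
 * first octet is tested, so broadcast is included as well. */
static int new_style_is_multicast(const uint8_t *addr)
{
	return addr[0] & 0x01;
}

/* Usage example: broadcast is the one case where the models differ. */
static void multicast_check_example(void)
{
	const uint8_t bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
	const uint8_t mcast[6] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 };

	assert(!old_style_is_multicast(bcast));
	assert(new_style_is_multicast(bcast));
	assert(old_style_is_multicast(mcast));
	assert(new_style_is_multicast(mcast));
}
```

With the older model the broadcast branch inside the converted `if` blocks would never be reached, which is why the conversion only works with the newer semantics.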
Re: [PATCH] Use newer is_multicast_ether_addr() in some files
On Fri, 6 Jan 2006, Patrick McHardy wrote:

> Kris Katterjohn wrote:
> > This uses is_multicast_ether_addr() because it has recently been changed to do
> > the same thing these separate tests are doing.
>
> > --- x/net/atm/br2684.c 2006-01-02 21:21:10.0 -0600
> > +++ y/net/atm/br2684.c 2006-01-06 12:34:47.0 -0600
> > @@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
> > unsigned char *rawp;
> > eth = eth_hdr(skb);
> >
> > - if (*eth->h_dest & 1) {
> > + if (is_multicast_ether_addr(eth->h_dest)) {
>
> This is not equivalent, is_multicast_ether_addr() ignores
> addresses starting with 0xff.

It used to. Not today afaict.

--
~Randy
Re: [PATCH] Use newer is_multicast_ether_addr() in some files
Kris Katterjohn wrote:
> This uses is_multicast_ether_addr() because it has recently been changed to do
> the same thing these separate tests are doing.
>
> --- x/net/atm/br2684.c 2006-01-02 21:21:10.0 -0600
> +++ y/net/atm/br2684.c 2006-01-06 12:34:47.0 -0600
> @@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
>  unsigned char *rawp;
>  eth = eth_hdr(skb);
>
> - if (*eth->h_dest & 1) {
> + if (is_multicast_ether_addr(eth->h_dest)) {

This is not equivalent, is_multicast_ether_addr() ignores addresses starting with 0xff.
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
From: "Paul E. McKenney" <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 12:26:26 -0800

> If not, it may be worthwhile to limit the number of times that
> rt_run_flush() runs per RCU grace period.

This is mixing two sets of requirements.

rt_run_flush() runs periodically in order to regenerate the hash function secret key. Now, for that specific case it might actually be possible to rehash instead of flush, but the locking is a little bit tricky :-)

And also, I think we're regenerating the secret key just a little bit too often, I think we'd get enough security with a less frequent regeneration.

I'll look into this and your other ideas later today hopefully.
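To illustrate why regenerating the secret key forces either a flush or a rehash, here is a small userspace sketch. Everything in it is an assumption made for illustration: the table size, the names, and the mixing function are stand-ins and not the hash the routing cache actually uses. The point is only that the bucket index depends on the secret, so an entry filed under the old key can generally not be found under the new one.

```c
#include <assert.h>
#include <stdint.h>

#define RT_MODEL_BUCKETS 1024u	/* hypothetical size, power of two */

/* Stand-in mixing function: the bucket depends on both addresses
 * and on the secret key. */
static uint32_t rt_model_bucket(uint32_t daddr, uint32_t saddr,
				uint32_t secret)
{
	uint32_t h = daddr ^ (saddr * 0x9e3779b1u) ^ secret;

	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	return h & (RT_MODEL_BUCKETS - 1);
}

/* After a key change an entry stays findable only if its bucket is
 * unchanged; otherwise it has to be flushed or moved (rehashed). */
static int rt_model_must_move(uint32_t daddr, uint32_t saddr,
			      uint32_t old_secret, uint32_t new_secret)
{
	return rt_model_bucket(daddr, saddr, old_secret) !=
	       rt_model_bucket(daddr, saddr, new_secret);
}

/* Usage example: with an unchanged key nothing needs to move. */
static void rt_model_rekey_example(void)
{
	assert(rt_model_bucket(0x0a000001u, 0x0a000002u, 42u) <
	       RT_MODEL_BUCKETS);
	assert(!rt_model_must_move(0x0a000001u, 0x0a000002u, 42u, 42u));
}
```

A rehash would walk every chain and relink each entry for which `rt_model_must_move()` is true, which is where the locking mentioned above gets tricky; a flush simply drops everything.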
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
On Fri, Jan 06, 2006 at 06:19:15PM +0100, Eric Dumazet wrote:
> Paul E. McKenney wrote:
> >On Fri, Jan 06, 2006 at 01:37:12PM +, Alan Cox wrote:
> >>On Gwe, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> >>>I assume that if a CPU queued 10.000 items in its RCU queue, then the
> >>>oldest entry cannot still be in use by another CPU. This might sound like
> >>>a violation of RCU rules, (I'm not an RCU expert) but seems quite
> >>>reasonable.
> >>Fixing the real problem in the routing code would be the real fix.
> >>
> >>The underlying problem of RCU and memory usage could be solved more
> >>safely by making sure that the sleeping memory allocator path always
> >>waits until at least one RCU cleanup has occurred after it fails an
> >>allocation before it starts trying harder. That ought to also naturally
> >>throttle memory consumers more in the situation which is the right
> >>behaviour.
> >
> >A quick look at rt_garbage_collect() leads me to believe that although
> >the IP route cache does try to limit its use of memory, it does not
> >fully account for memory that it has released to RCU, but that RCU has
> >not yet freed due to a grace period not having elapsed.
> >
> >The following appears to be possible:
> >
> >1.	rt_garbage_collect() sees that there are too many entries,
> >	and sets "goal" to the number to free up, based on a
> >	computed "equilibrium" value.
> >
> >2.	The number of entries is (correctly) decremented only when
> >	the corresponding RCU callback is invoked, which actually
> >	frees the entry.
> >
> >3.	Between the time that rt_garbage_collect() is invoked the
> >	first time and when the RCU grace period ends, rt_garbage_collect()
> >	is invoked again. It still sees too many entries (since
> >	RCU has not yet freed the ones released by the earlier
> >	invocation in step (1) above), so frees a bunch more.
> >
> >4.	Packets routed now miss the route cache, because the corresponding
> >	entries are waiting for a grace period, slowing the system down.
> >	Therefore, even more entries are freed to make room for new
> >	entries corresponding to the new packets.
> >
> >If my (likely quite naive) reading of the IP route cache code is correct,
> >it would be possible to end up in a steady state with most of the entries
> >always being in RCU rather than in the route cache.
> >
> >Eric, could this be what is happening to your system?
> >
> >If it is, one straightforward fix would be to keep a count of the number
> >of route-cache entries waiting on RCU, and for rt_garbage_collect()
> >to subtract this number of entries from its goal. Does this make sense?
> >
>
> Hi Paul
>
> Thanks for reviewing route code :)
>
> As I said, the problem comes from 'route flush cache', which is periodically
> done by rt_run_flush(), triggered by rt_flush_timer.
>
> The 10% of LOWMEM ram that was used by route-cache entries is pushed into
> rcu queues (with call_rcu_bh()) and the network continues to receive
> packets from *many* sources that want their route-cache entry.

Hello, Eric,

The rt_run_flush() function could indeed be suffering from the same problem. Dipankar's recent patch should help RCU grace periods proceed more quickly, does that help?

If not, it may be worthwhile to limit the number of times that rt_run_flush() runs per RCU grace period.

Thanx, Paul
BCM 5705 firmware not starting...
http://bugzilla.ubuntu.com/show_bug.cgi?id=16435

The above is a bug report for a user that is getting a firmware restart timeout (waiting for mbox1 magic to invert).

Any ideas on if this is a software or hardware issue? Anything I can ask the user to do to help debug it?

This is 2.6.15, btw.

--
Ben Collins <[EMAIL PROTECTED]>
Developer
Ubuntu Linux
[PATCH] Endian-annotate in_aton()
Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 include/linux/inet.h |    2 +-
 net/core/utils.c     |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/inet.h
+++ b/include/linux/inet.h
@@ -45,6 +45,6 @@
 #ifdef __KERNEL__
 #include
-extern __u32 in_aton(const char *str);
+extern __be32 in_aton(const char *str);
 #endif
 #endif /* _LINUX_INET_H */
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -162,7 +162,7 @@ EXPORT_SYMBOL(net_srandom);
  * is otherwise not dependent on the TCP/IP stack.
  */
-__u32 in_aton(const char *str)
+__be32 in_aton(const char *str)
 {
 	unsigned long l;
 	unsigned int val;
[PATCH] Endian-annotate struct iphdr
And fix trivial warnings that emerged.

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 include/linux/ip.h         |   10 +-
 net/ipv4/ip_fragment.c     |    2 +-
 net/ipv4/ip_output.c       |    4 ++--
 net/ipv4/ipvs/ip_vs_xmit.c |    2 +-
 4 files changed, 9 insertions(+), 9 deletions(-)

--- a/include/linux/ip.h
+++ b/include/linux/ip.h
@@ -90,14 +90,14 @@ struct iphdr {
 #error "Please fix "
 #endif
 	__u8	tos;
-	__u16	tot_len;
-	__u16	id;
-	__u16	frag_off;
+	__be16	tot_len;
+	__be16	id;
+	__be16	frag_off;
 	__u8	ttl;
 	__u8	protocol;
 	__u16	check;
-	__u32	saddr;
-	__u32	daddr;
+	__be32	saddr;
+	__be32	daddr;
 	/*The options start here. */
 };
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -383,7 +383,7 @@ out_nomem:
  */
 static inline struct ipq *ip_find(struct iphdr *iph, u32 user)
 {
-	__u16 id = iph->id;
+	__be16 id = iph->id;
 	__u32 saddr = iph->saddr;
 	__u32 daddr = iph->daddr;
 	__u8 protocol = iph->protocol;
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -418,7 +418,7 @@ int ip_fragment(struct sk_buff *skb, int
 	struct sk_buff *skb2;
 	unsigned int mtu, hlen, left, len, ll_rs;
 	int offset;
-	int not_last_frag;
+	__be16 not_last_frag;
 	struct rtable *rt = (struct rtable*)skb->dst;
 	int err = 0;
@@ -1180,7 +1180,7 @@ int ip_push_pending_frames(struct sock *
 	struct ip_options *opt = NULL;
 	struct rtable *rt = inet->cork.rt;
 	struct iphdr *iph;
-	int df = 0;
+	__be16 df = 0;
 	__u8 ttl;
 	int err = 0;
diff --git a/net/ipv4/ipvs/ip_vs_xmit.c b/net/ipv4/ipvs/ip_vs_xmit.c
index 3b87482..52c12e9 100644
--- a/net/ipv4/ipvs/ip_vs_xmit.c
+++ b/net/ipv4/ipvs/ip_vs_xmit.c
@@ -322,7 +322,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, s
 	struct net_device *tdev;	/* Device to other host */
 	struct iphdr *old_iph = skb->nh.iph;
 	u8 tos = old_iph->tos;
-	u16 df = old_iph->frag_off;
+	__be16 df = old_iph->frag_off;
 	struct iphdr *iph;	/* Our new IP header */
 	int max_headroom;	/* The extra header space needed */
 	int mtu;
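As a rough userspace illustration of what the annotation documents: fields such as tot_len and frag_off hold network-byte-order values, so a flag test has to be done against a constant converted to the same order, and a length has to be converted before use. The typedefs, struct and helper names below are hypothetical stand-ins (plain integers, with none of the static checking the kernel annotations enable); only htons(), ntohs() and htonl() are real library calls.

```c
#include <arpa/inet.h>	/* htons(), ntohs(), htonl() */
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the annotated types (assumption: plain
 * typedefs, used here only to document the byte order). */
typedef uint16_t be16;
typedef uint32_t be32;

struct mini_iphdr {
	be16 tot_len;	/* network byte order */
	be16 frag_off;	/* network byte order */
	be32 daddr;	/* network byte order */
};

#define MINI_IP_DF 0x4000	/* "don't fragment" flag, host order */

/* Convert explicitly at the boundary instead of mixing byte orders. */
static unsigned int mini_total_len(const struct mini_iphdr *iph)
{
	return ntohs(iph->tot_len);
}

/* Keep the flag in network order, as the patch does for "df". */
static be16 mini_df_bit(const struct mini_iphdr *iph)
{
	return iph->frag_off & htons(MINI_IP_DF);
}

/* Usage example: build a header in network order and read it back. */
static void mini_iphdr_example(void)
{
	struct mini_iphdr h;

	h.tot_len = htons(1500);
	h.frag_off = htons(MINI_IP_DF);
	h.daddr = htonl(0x0a000001);

	assert(mini_total_len(&h) == 1500);
	assert(mini_df_bit(&h) == htons(MINI_IP_DF));
}
```

Typing the local "df" variables as big-endian, as the patch does, records that they carry the masked frag_off value unconverted.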
[patch] tulip: enable multiport NIC BIOS fixups for x86_64
From: Christoph Dworzak <[EMAIL PROTECTED]>

A BIOS bug affecting some multiport tulip NICs requires an irq fixup in tulip_core.c. This has only been enabled for i686, but it is needed for x86_64 as well.

Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
---

 drivers/net/tulip/tulip_core.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
index 125ed00..c67c912 100644
--- a/drivers/net/tulip/tulip_core.c
+++ b/drivers/net/tulip/tulip_core.c
@@ -1564,7 +1564,7 @@ static int __devinit tulip_init_one (str
 		dev->dev_addr, 6);
 	}
 #endif
-#if defined(__i386__) /* Patch up x86 BIOS bug. */
+#if defined(__i386__) || defined(__x86_64__) /* Patch up x86 BIOS bug. */
 	if (last_irq)
 		irq = last_irq;
 #endif
--
John W. Linville
[EMAIL PROTECTED]
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
On Fri, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> I have some servers that once in a while crash when the ip route
> cache is flushed. After
> raising /proc/sys/net/ipv4/route/secret_interval (so that *no*
> flush is done), I got better uptime for these servers.

Argh, where is that documented? I have been banging my head against this for weeks - how do I keep the kernel from flushing 4096 routes at once in softirq context causing huge (~8-20ms) latency problems?

I tried all the route related sysctls I could find and nothing worked...

Lee
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
On Fri, 2006-01-06 at 13:58 +0100, Andi Kleen wrote:
> Another CPU might be stuck in a long
> running interrupt

Shouldn't a long running interrupt be considered a bug?

Lee
[2.6 patch] remove drivers/net/tulip/xircom_tulip_cb.c
This patch removes the obsolete drivers/net/tulip/xircom_tulip_cb.c driver. Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> --- This patch was already sent on: - 12 Dec 2005 - 18 Nov 2005 drivers/net/tulip/Kconfig | 16 drivers/net/tulip/Makefile |1 drivers/net/tulip/xircom_tulip_cb.c | 1748 3 files changed, 1 insertion(+), 1764 deletions(-) --- linux-2.6.15-rc1-mm1-full/drivers/net/tulip/Kconfig.old 2005-11-18 03:45:53.0 +0100 +++ linux-2.6.15-rc1-mm1-full/drivers/net/tulip/Kconfig 2005-11-18 03:46:20.0 +0100 @@ -148,7 +148,7 @@ be called uli526x. config PCMCIA_XIRCOM - tristate "Xircom CardBus support (new driver)" + tristate "Xircom CardBus support" depends on NET_TULIP && CARDBUS ---help--- This driver is for the Digital "Tulip" Ethernet CardBus adapters. @@ -160,19 +160,5 @@ . The module will be called xircom_cb. If unsure, say N. -config PCMCIA_XIRTULIP - tristate "Xircom Tulip-like CardBus support (old driver)" - depends on NET_TULIP && CARDBUS && BROKEN_ON_SMP - select CRC32 - ---help--- - This driver is for the Digital "Tulip" Ethernet CardBus adapters. - It should work with most DEC 21*4*-based chips/ethercards, as well - as with work-alike chips from Lite-On (PNIC) and Macronix (MXIC) and - ASIX. - - To compile this driver as a module, choose M here and read - . The module will - be called xircom_tulip_cb. If unsure, say N. - endmenu --- linux-2.6.15-rc1-mm1-full/drivers/net/tulip/Makefile.old2005-11-18 03:46:32.0 +0100 +++ linux-2.6.15-rc1-mm1-full/drivers/net/tulip/Makefile2005-11-18 03:46:41.0 +0100 @@ -2,7 +2,6 @@ # Makefile for the Linux "Tulip" family network device drivers. 
# -obj-$(CONFIG_PCMCIA_XIRTULIP) += xircom_tulip_cb.o obj-$(CONFIG_PCMCIA_XIRCOM)+= xircom_cb.o obj-$(CONFIG_DM9102) += dmfe.o obj-$(CONFIG_WINBOND_840) += winbond-840.o --- linux-2.6.15-rc1-mm1-full/drivers/net/tulip/xircom_tulip_cb.c 2005-10-28 02:02:08.0 +0200 +++ /dev/null 2005-11-08 19:07:57.0 +0100 @@ -1,1748 +0,0 @@ -/* xircom_tulip_cb.c: A Xircom CBE-100 ethernet driver for Linux. */ -/* - Written/copyright 1994-1999 by Donald Becker. - - This software may be used and distributed according to the terms - of the GNU General Public License, incorporated herein by reference. - - The author may be reached as [EMAIL PROTECTED], or C/O - Scyld Computing Corporation - 410 Severn Ave., Suite 210 - Annapolis MD 21403 - - --- - - Linux kernel-specific changes: - - LK1.0 (Ion Badulescu) - - Major cleanup - - Use 2.4 PCI API - - Support ethtool - - Rewrite perfect filter/hash code - - Use interrupts for media changes - - LK1.1 (Ion Badulescu) - - Disallow negotiation of unsupported full-duplex modes -*/ - -#define DRV_NAME "xircom_tulip_cb" -#define DRV_VERSION"0.91+LK1.1" -#define DRV_RELDATE"October 11, 2001" - -#define CARDBUS 1 - -/* A few user-configurable values. */ - -#define xircom_debug debug -#ifdef XIRCOM_DEBUG -static int xircom_debug = XIRCOM_DEBUG; -#else -static int xircom_debug = 1; -#endif - -/* Maximum events (Rx packets, etc.) to handle at each interrupt. */ -static int max_interrupt_work = 25; - -#define MAX_UNITS 4 -/* Used to pass the full-duplex flag, etc. */ -static int full_duplex[MAX_UNITS]; -static int options[MAX_UNITS]; -static int mtu[MAX_UNITS]; /* Jumbo MTU for interfaces. */ - -/* Keep the ring sizes a power of two for efficiency. - Making the Tx ring too large decreases the effectiveness of channel - bonding and packet priority. - There are no ill effects from too-large receive rings. */ -#define TX_RING_SIZE 16 -#define RX_RING_SIZE 32 - -/* Set the copy breakpoint for the copy-only-tiny-buffer Rx structure. 
*/ -#ifdef __alpha__ -static int rx_copybreak = 1518; -#else -static int rx_copybreak = 100; -#endif - -/* - Set the bus performance register. - Typical: Set 16 longword cache alignment, no burst limit. - Cache alignment bits 15:14 Burst length 13:8 - No alignment 0x unlimited 0800 8 longwords - 40008 longwords0100 1 longword 1000 16 longwords - 800016 longwords0200 2 longwords2000 32 longwords - C00032 longwords 0400 4 longwords - Warning: many older 486 systems are broken and require setting 0x00A04800 - 8 longword cache alignment, 8 longword burst. - ToDo: Non-Intel setting could be better. -*/ - -#if defined(__alpha__) || defined(__ia64__) || defined(__x86_
[PATCH] Use newer is_multicast_ether_addr() in some files
This uses is_multicast_ether_addr() because it has recently been changed to do the same thing these separate tests are doing.

Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]>

Thanks!

--- x/net/atm/br2684.c 2006-01-02 21:21:10.0 -0600
+++ y/net/atm/br2684.c 2006-01-06 12:34:47.0 -0600
@@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
 	unsigned char *rawp;
 	eth = eth_hdr(skb);

-	if (*eth->h_dest & 1) {
+	if (is_multicast_ether_addr(eth->h_dest)) {
 		if (memcmp(eth->h_dest, dev->broadcast, ETH_ALEN) == 0)
 			skb->pkt_type = PACKET_BROADCAST;
 		else
--- x/net/bridge/br_input.c 2006-01-02 21:21:10.0 -0600
+++ y/net/bridge/br_input.c 2006-01-06 12:31:59.0 -0600
@@ -63,7 +63,7 @@ int br_handle_frame_finish(struct sk_buf
 		}
 	}
-	if (dest[0] & 1) {
+	if (is_multicast_ether_addr(dest)) {
 		br_flood_forward(br, skb, !passedup);
 		if (!passedup)
 			br_pass_frame_up(br, skb);
--- x/net/ethernet/eth.c 2006-01-05 21:28:02.0 -0600
+++ y/net/ethernet/eth.c 2006-01-06 12:21:04.0 -0600
@@ -163,7 +163,7 @@ __be16 eth_type_trans(struct sk_buff *sk
 	skb_pull(skb,ETH_HLEN);
 	eth = eth_hdr(skb);
-	if (*eth->h_dest&1) {
+	if (is_multicast_ether_addr(eth->h_dest)) {
 		if (!compare_ether_addr(eth->h_dest, dev->broadcast))
 			skb->pkt_type = PACKET_BROADCAST;
 		else
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
On Fri, 06 Jan 2006 13:46:15 +0100
Patrick McHardy <[EMAIL PROTECTED]> wrote:

> Marcel Holtmann wrote:
>
> >>I just personally liked the idea of having a device node in /dev for
> >>every existing hardware wlan card. Like we have device nodes for
> >>other real hardware, too. It felt like a bit of a "unix way" to do
> >>this to me. I don't say this is the way to go.
> >>If a netlink socket is used (which is possible, for sure), we stay with
> >>the old way of having no device node in /dev for networking devices.
> >>That is ok. But that is really only an implementation detail (and for sure
> >>a matter of taste).
> >
> >
> > At the OLS last year, I think the consensus was to use netlink for all
> > configuration tasks. However this was mainly driven by Harald Welte and
> > he might be able to talk about the pros and cons of netlink versus a
> > character device.
>
> I think the main advantages of netlink over a character device are its
> flexible format, which is easily extendable, and multicast capability,
> which can be used to broadcast events and configuration changes. It's
> also good to have all the net stuff accessible in a uniform way.

Also netlink doesn't have the naming issues that /dev node would.

--
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
Paul E. McKenney wrote:
> On Fri, Jan 06, 2006 at 01:37:12PM +, Alan Cox wrote:
>> On Gwe, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
>>> I assume that if a CPU queued 10.000 items in its RCU queue, then the
>>> oldest entry cannot still be in use by another CPU. This might sound like
>>> a violation of RCU rules, (I'm not an RCU expert) but seems quite
>>> reasonable.
>> Fixing the real problem in the routing code would be the real fix.
>>
>> The underlying problem of RCU and memory usage could be solved more
>> safely by making sure that the sleeping memory allocator path always
>> waits until at least one RCU cleanup has occurred after it fails an
>> allocation before it starts trying harder. That ought to also naturally
>> throttle memory consumers more in the situation which is the right
>> behaviour.
>
> A quick look at rt_garbage_collect() leads me to believe that although
> the IP route cache does try to limit its use of memory, it does not
> fully account for memory that it has released to RCU, but that RCU has
> not yet freed due to a grace period not having elapsed.
>
> The following appears to be possible:
>
> 1.	rt_garbage_collect() sees that there are too many entries,
>	and sets "goal" to the number to free up, based on a
>	computed "equilibrium" value.
>
> 2.	The number of entries is (correctly) decremented only when
>	the corresponding RCU callback is invoked, which actually
>	frees the entry.
>
> 3.	Between the time that rt_garbage_collect() is invoked the
>	first time and when the RCU grace period ends, rt_garbage_collect()
>	is invoked again. It still sees too many entries (since
>	RCU has not yet freed the ones released by the earlier
>	invocation in step (1) above), so frees a bunch more.
>
> 4.	Packets routed now miss the route cache, because the corresponding
>	entries are waiting for a grace period, slowing the system down.
>	Therefore, even more entries are freed to make room for new
>	entries corresponding to the new packets.
>
> If my (likely quite naive) reading of the IP route cache code is correct,
> it would be possible to end up in a steady state with most of the entries
> always being in RCU rather than in the route cache.
>
> Eric, could this be what is happening to your system?
>
> If it is, one straightforward fix would be to keep a count of the number
> of route-cache entries waiting on RCU, and for rt_garbage_collect()
> to subtract this number of entries from its goal. Does this make sense?

Hi Paul

Thanks for reviewing route code :)

As I said, the problem comes from 'route flush cache', which is periodically done by rt_run_flush(), triggered by rt_flush_timer.

The 10% of LOWMEM ram that was used by route-cache entries is pushed into rcu queues (with call_rcu_bh()) and the network continues to receive packets from *many* sources that want their route-cache entry.

Eric
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
Michael Buesch wrote:
> What would the virtual interfaces look like?
> That is quite easy to answer. They are net_devices, as they transfer
> data. They should probably _not_ be on top of the ethernet, as 80211 does
> not have very much in common with ethernet. Basically they share the same
> MAC address format. Does someone have another thing, which he thinks is
> shared?

If you can make the virtual devices look like ethernet, I believe a lot of other things will just work w/out hacking, including user-space apps that think they know exactly what an ethernet frame/device looks like.

The only things I can think of that won't work like ethernet are the ability to change the local MAC address or go into promisc mode. And, it's always possible that future wifi hardware will support that as well. Either way, the current API handles this fine: the requests to change will just fail with a convenient error.

Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc http://www.candelatech.com
[PATCH] update bonding.txt to not show ip address on slaves
ifenslave, as of ABI version 2, does not set the IP address on the slave interfaces. The documentation example, however, still shows that the enslaved interfaces should have the same IP as the master. The patch simply removes the lines from the example which should no longer appear.

Signed-off-by: Eric Paris <[EMAIL PROTECTED]>

 bonding.txt |    2 --
 1 files changed, 2 deletions(-)

--- linux-2.6.14.2/Documentation/networking/bonding.txt.old 2006-01-06 11:47:31.0 -0500
+++ linux-2.6.14.2/Documentation/networking/bonding.txt 2006-01-06 11:49:18.0 -0500
@@ -944,7 +944,6 @@ bond0 Link encap:Ethernet HWaddr 00
 collisions:0 txqueuelen:0
 eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
- inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0
 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
 RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
 TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
@@ -952,7 +951,6 @@ eth0 Link encap:Ethernet HWaddr 00
 Interrupt:10 Base address:0x1080
 eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
- inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0
 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
 RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
 TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
On Fri, 2006-01-06 at 17:12 +0100, Feyd wrote:
> Michael Buesch wrote:
> > The _real_ main point I wanted to make was to _not_ use a net_device for
> > the master device. What else should be used for master device, let it
> > be a device node or a netlink socket, is rather unimportant at
> > this stage.
>
> If the only purpose of the master device was configuration, then it
> would be better to use something other than a net_device, but you may
> want to send/receive raw 802.11 packets from userspace, most logically
> over a master interface.

We thought about that for a while, but it may not be feasible. Certain hardware that manages more stuff than others in firmware/hardware may not allow sending raw frames without going into some special mode, which is better handled by adding some kind of raw virtual device.

johannes
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
On Fri, Jan 06, 2006 at 01:37:12PM +0000, Alan Cox wrote:
> On Fri, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> > I assume that if a CPU queued 10.000 items in its RCU queue, then the
> > oldest entry cannot still be in use by another CPU. This might sound like
> > a violation of RCU rules, (I'm not an RCU expert) but seems quite
> > reasonable.
>
> Fixing the real problem in the routing code would be the real fix.
>
> The underlying problem of RCU and memory usage could be solved more
> safely by making sure that the sleeping memory allocator path always
> waits until at least one RCU cleanup has occurred after it fails an
> allocation before it starts trying harder. That ought to also naturally
> throttle memory consumers more in the situation which is the right
> behaviour.

A quick look at rt_garbage_collect() leads me to believe that although the IP route cache does try to limit its use of memory, it does not fully account for memory that it has released to RCU, but that RCU has not yet freed because a grace period has not yet elapsed.

The following appears to be possible:

1. rt_garbage_collect() sees that there are too many entries, and sets "goal" to the number to free up, based on a computed "equilibrium" value.

2. The number of entries is (correctly) decremented only when the corresponding RCU callback is invoked, which actually frees the entry.

3. Between the time that rt_garbage_collect() is invoked the first time and when the RCU grace period ends, rt_garbage_collect() is invoked again. It still sees too many entries (since RCU has not yet freed the ones released by the earlier invocation in step (1) above), so frees a bunch more.

4. Packets routed now miss the route cache, because the corresponding entries are waiting for a grace period, slowing the system down. Therefore, even more entries are freed to make room for new entries corresponding to the new packets.

If my (likely quite naive) reading of the IP route cache code is correct, it would be possible to end up in a steady state with most of the entries always being in RCU rather than in the route cache.

Eric, could this be what is happening to your system?

If it is, one straightforward fix would be to keep a count of the number of route-cache entries waiting on RCU, and for rt_garbage_collect() to subtract this number of entries from its goal. Does this make sense?

Thanx, Paul
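The accounting fix suggested in this message can be sketched in plain C. This is only a user-space model written for illustration, not the kernel's rt_garbage_collect(): the struct, its field names (entries, pending_rcu) and the equilibrium parameter are assumptions, and the real function computes its goal in a more involved way.

```c
#include <assert.h>

/* Hypothetical model of the route cache accounting; not kernel code. */
struct rt_cache_model {
	int entries;     /* decremented only when the RCU callback frees an entry */
	int pending_rcu; /* entries already handed to RCU but not yet freed */
};

/* How many more entries should be released to reach "equilibrium". */
static int gc_goal(const struct rt_cache_model *c, int equilibrium)
{
	int goal = c->entries - equilibrium; /* naive goal, counts pending entries again */

	goal -= c->pending_rcu;              /* the suggested correction */
	return goal > 0 ? goal : 0;
}

/* Unlink n entries and hand them to RCU; "entries" is not changed yet. */
static void gc_release(struct rt_cache_model *c, int n)
{
	assert(n >= 0 && n <= c->entries - c->pending_rcu);
	c->pending_rcu += n;
}

/* A grace period has elapsed: the callbacks run and really free the entries. */
static void rcu_grace_period_done(struct rt_cache_model *c)
{
	c->entries -= c->pending_rcu;
	c->pending_rcu = 0;
}
```

With 1000 entries and an equilibrium of 600, the first pass releases 400 entries. A second pass before the grace period ends then computes a goal of 0 instead of releasing another 400, which is the runaway behaviour described in steps 3 and 4.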
Re: Newbie question
On 1/6/06, Alan Menegotto <[EMAIL PROTECTED]> wrote:
> Hi.
>
> I couldn't understand the logic in the function 'static int __init
> ipv4_proc_init(void)' located at net/ipv4/af_inet.c. Look at the code:
>
> static int __init ipv4_proc_init(void)
> {
> 	int rc = 0;
>
> 	if (raw_proc_init())
> 		goto out_raw;
> 	if (tcp4_proc_init())
> 		goto out_tcp;
> 	if (udp4_proc_init())
> 		goto out_udp;
> 	if (fib_proc_init())
> 		goto out_fib;
> 	if (ip_misc_proc_init())
> 		goto out_misc;
> out:
> 	return rc;
> out_misc:
> 	fib_proc_exit();
> out_fib:
> 	udp4_proc_exit();
> out_udp:
> 	tcp4_proc_exit();
> out_tcp:
> 	raw_proc_exit();
> out_raw:
> 	rc = -ENOMEM;
> 	goto out;
> }
>
> Calling tcp4_proc_init should go to label out_tcp, which call
> raw_proc_exit(). Is this correct? If yes, why?

No, calling tcp4_proc_init() will only lead to calling raw_proc_exit() if tcp4_proc_init() returns non-zero, i.e. if it fails.

- Arnaldo
Re: [PATCH 1/1] Corrections to LSM-IPSec Nethooks
On Fri, 2006-01-06 at 11:09 -0500, Trent Jaeger wrote:
> Forgot signoff -- see below.
>
> On Jan 6, 2006, at 10:48 AM, Trent Jaeger wrote:
>
> > Hi,
> >
> > This patch contains two corrections to the LSM-IPsec Nethooks patches
> > previously applied.
> >
> > (1) free a security context on a failed insert via xfrm_user
> > interface in xfrm_add_policy. Memory leak.
> >
> > (2) change the authorization of the allocation of a security context
> > in a xfrm_policy or xfrm_state from both relabelfrom and relabelto
> > to setcontext.
> >
> > This is intended to be a correction to the 2.6.16 tree.
>
> Signed-off-by: Trent Jaeger <[EMAIL PROTECTED]>

Acked-by: Stephen Smalley <[EMAIL PROTECTED]>

--
Stephen Smalley
National Security Agency
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
Michael Buesch wrote:
> The _real_ main point I wanted to make was to _not_ use a net_device for
> the master device. What else should be used for master device, let it
> be a device node or a netlink socket, is rather unimportant at
> this stage.

If the only purpose of the master device was configuration, then it would be better to use something other than a net_device, but you may want to send/receive raw 802.11 packets from userspace, most logically over a master interface.

Feyd
Re: [PATCH 1/1] Corrections to LSM-IPSec Nethooks
Forgot signoff -- see below.

On Jan 6, 2006, at 10:48 AM, Trent Jaeger wrote:

> Hi,
>
> This patch contains two corrections to the LSM-IPsec Nethooks patches
> previously applied.
>
> (1) free a security context on a failed insert via xfrm_user
> interface in xfrm_add_policy. Memory leak.
>
> (2) change the authorization of the allocation of a security context
> in a xfrm_policy or xfrm_state from both relabelfrom and relabelto
> to setcontext.
>
> This is intended to be a correction to the 2.6.16 tree.

Signed-off-by: Trent Jaeger <[EMAIL PROTECTED]>

Regards,
Trent.

---

 net/xfrm/xfrm_user.c                         |1 +
 security/selinux/include/av_perm_to_string.h |3 +--
 security/selinux/include/av_permissions.h    |3 +--
 security/selinux/xfrm.c                      |8 +---
 4 files changed, 4 insertions(+), 11 deletions(-)

diff -puN include/linux/security.h~lsm-relabel-nethooks include/linux/security.h
diff -puN net/key/af_key.c~lsm-relabel-nethooks net/key/af_key.c
diff -puN net/xfrm/xfrm_user.c~lsm-relabel-nethooks net/xfrm/xfrm_user.c
--- linux-2.6.15-rc5/net/xfrm/xfrm_user.c~lsm-relabel-nethooks	2006-01-04 22:35:41.000000000 -0500
+++ linux-2.6.15-rc5-root/net/xfrm/xfrm_user.c	2006-01-05 10:36:04.000000000 -0500
@@ -802,6 +802,7 @@ static int xfrm_add_policy(struct sk_buf
 	excl = nlh->nlmsg_type == XFRM_MSG_NEWPOLICY;
 	err = xfrm_policy_insert(p->dir, xp, excl);
 	if (err) {
+		security_xfrm_policy_free(xp);
 		kfree(xp);
 		return err;
 	}
diff -puN security/dummy.c~lsm-relabel-nethooks security/dummy.c
diff -puN security/selinux/hooks.c~lsm-relabel-nethooks security/selinux/hooks.c
diff -puN security/selinux/include/av_perm_to_string.h~lsm-relabel-nethooks security/selinux/include/av_perm_to_string.h
--- linux-2.6.15-rc5/security/selinux/include/av_perm_to_string.h~lsm-relabel-nethooks	2006-01-04 22:35:41.000000000 -0500
+++ linux-2.6.15-rc5-root/security/selinux/include/av_perm_to_string.h	2006-01-04 22:38:14.000000000 -0500
@@ -238,5 +238,4 @@
    S_(SECCLASS_NSCD, NSCD__SHMEMHOST, "shmemhost")
    S_(SECCLASS_ASSOCIATION, ASSOCIATION__SENDTO, "sendto")
    S_(SECCLASS_ASSOCIATION, ASSOCIATION__RECVFROM, "recvfrom")
-   S_(SECCLASS_ASSOCIATION, ASSOCIATION__RELABELFROM, "relabelfrom")
-   S_(SECCLASS_ASSOCIATION, ASSOCIATION__RELABELTO, "relabelto")
+   S_(SECCLASS_ASSOCIATION, ASSOCIATION__SETCONTEXT, "setcontext")
diff -puN security/selinux/include/av_permissions.h~lsm-relabel-nethooks security/selinux/include/av_permissions.h
--- linux-2.6.15-rc5/security/selinux/include/av_permissions.h~lsm-relabel-nethooks	2006-01-04 22:35:41.000000000 -0500
+++ linux-2.6.15-rc5-root/security/selinux/include/av_permissions.h	2006-01-04 22:38:13.000000000 -0500
@@ -908,8 +908,7 @@

 #define ASSOCIATION__SENDTO                       0x00000001UL
 #define ASSOCIATION__RECVFROM                     0x00000002UL
-#define ASSOCIATION__RELABELFROM                  0x00000004UL
-#define ASSOCIATION__RELABELTO                    0x00000008UL
+#define ASSOCIATION__SETCONTEXT                   0x00000004UL

 #define NETLINK_KOBJECT_UEVENT_SOCKET__IOCTL      0x00000001UL
 #define NETLINK_KOBJECT_UEVENT_SOCKET__READ       0x00000002UL
diff -puN security/selinux/include/av_inherit.h~lsm-relabel-nethooks security/selinux/include/av_inherit.h
diff -puN security/selinux/include/class_to_string.h~lsm-relabel-nethooks security/selinux/include/class_to_string.h
diff -puN security/selinux/include/common_perm_to_string.h~lsm-relabel-nethooks security/selinux/include/common_perm_to_string.h
diff -puN security/selinux/include/flask.h~lsm-relabel-nethooks security/selinux/include/flask.h
diff -puN security/selinux/include/initial_sid_to_string.h~lsm-relabel-nethooks security/selinux/include/initial_sid_to_string.h
diff -puN security/selinux/include/xfrm.h~lsm-relabel-nethooks security/selinux/include/xfrm.h
diff -puN security/selinux/xfrm.c~lsm-relabel-nethooks security/selinux/xfrm.c
--- linux-2.6.15-rc5/security/selinux/xfrm.c~lsm-relabel-nethooks	2006-01-04 22:35:41.000000000 -0500
+++ linux-2.6.15-rc5-root/security/selinux/xfrm.c	2006-01-04 22:35:41.000000000 -0500
@@ -137,15 +137,9 @@ static int selinux_xfrm_sec_ctx_alloc(st
 	 * Must be permitted to relabel from default socket type (process type)
 	 * to specified context
 	 */
-	rc = avc_has_perm(tsec->sid, tsec->sid,
-			  SECCLASS_ASSOCIATION,
-			  ASSOCIATION__RELABELFROM, NULL);
-	if (rc)
-		goto out;
-
 	rc = avc_has_perm(tsec->sid, ctx->ctx_sid,
 			  SECCLASS_ASSOCIATION,
-			  ASSOCIATION__RELABELTO, NULL);
+			  ASSOCIATION__SETCONTEXT, NULL);
 	if (rc)
 		goto out;

_

Regards,
Trent.
--
Trent Jaeger, A
[PATCH 1/1] Corrections to LSM-IPSec Nethooks
Hi,

This patch contains two corrections to the LSM-IPsec Nethooks patches previously applied.

(1) free a security context on a failed insert via xfrm_user interface in xfrm_add_policy. Memory leak.

(2) change the authorization of the allocation of a security context in a xfrm_policy or xfrm_state from both relabelfrom and relabelto to setcontext.

This is intended to be a correction to the 2.6.16 tree.

Regards,
Trent.

---

 net/xfrm/xfrm_user.c                         |1 +
 security/selinux/include/av_perm_to_string.h |3 +--
 security/selinux/include/av_permissions.h    |3 +--
 security/selinux/xfrm.c                      |8 +---
 4 files changed, 4 insertions(+), 11 deletions(-)

diff -puN include/linux/security.h~lsm-relabel-nethooks include/linux/security.h
diff -puN net/key/af_key.c~lsm-relabel-nethooks net/key/af_key.c
diff -puN net/xfrm/xfrm_user.c~lsm-relabel-nethooks net/xfrm/xfrm_user.c
--- linux-2.6.15-rc5/net/xfrm/xfrm_user.c~lsm-relabel-nethooks	2006-01-04 22:35:41.000000000 -0500
+++ linux-2.6.15-rc5-root/net/xfrm/xfrm_user.c	2006-01-05 10:36:04.000000000 -0500
@@ -802,6 +802,7 @@ static int xfrm_add_policy(struct sk_buf
 	excl = nlh->nlmsg_type == XFRM_MSG_NEWPOLICY;
 	err = xfrm_policy_insert(p->dir, xp, excl);
 	if (err) {
+		security_xfrm_policy_free(xp);
 		kfree(xp);
 		return err;
 	}
diff -puN security/dummy.c~lsm-relabel-nethooks security/dummy.c
diff -puN security/selinux/hooks.c~lsm-relabel-nethooks security/selinux/hooks.c
diff -puN security/selinux/include/av_perm_to_string.h~lsm-relabel-nethooks security/selinux/include/av_perm_to_string.h
--- linux-2.6.15-rc5/security/selinux/include/av_perm_to_string.h~lsm-relabel-nethooks	2006-01-04 22:35:41.000000000 -0500
+++ linux-2.6.15-rc5-root/security/selinux/include/av_perm_to_string.h	2006-01-04 22:38:14.000000000 -0500
@@ -238,5 +238,4 @@
    S_(SECCLASS_NSCD, NSCD__SHMEMHOST, "shmemhost")
    S_(SECCLASS_ASSOCIATION, ASSOCIATION__SENDTO, "sendto")
    S_(SECCLASS_ASSOCIATION, ASSOCIATION__RECVFROM, "recvfrom")
-   S_(SECCLASS_ASSOCIATION, ASSOCIATION__RELABELFROM, "relabelfrom")
-   S_(SECCLASS_ASSOCIATION, ASSOCIATION__RELABELTO, "relabelto")
+   S_(SECCLASS_ASSOCIATION, ASSOCIATION__SETCONTEXT, "setcontext")
diff -puN security/selinux/include/av_permissions.h~lsm-relabel-nethooks security/selinux/include/av_permissions.h
--- linux-2.6.15-rc5/security/selinux/include/av_permissions.h~lsm-relabel-nethooks	2006-01-04 22:35:41.000000000 -0500
+++ linux-2.6.15-rc5-root/security/selinux/include/av_permissions.h	2006-01-04 22:38:13.000000000 -0500
@@ -908,8 +908,7 @@

 #define ASSOCIATION__SENDTO                       0x00000001UL
 #define ASSOCIATION__RECVFROM                     0x00000002UL
-#define ASSOCIATION__RELABELFROM                  0x00000004UL
-#define ASSOCIATION__RELABELTO                    0x00000008UL
+#define ASSOCIATION__SETCONTEXT                   0x00000004UL

 #define NETLINK_KOBJECT_UEVENT_SOCKET__IOCTL      0x00000001UL
 #define NETLINK_KOBJECT_UEVENT_SOCKET__READ       0x00000002UL
diff -puN security/selinux/include/av_inherit.h~lsm-relabel-nethooks security/selinux/include/av_inherit.h
diff -puN security/selinux/include/class_to_string.h~lsm-relabel-nethooks security/selinux/include/class_to_string.h
diff -puN security/selinux/include/common_perm_to_string.h~lsm-relabel-nethooks security/selinux/include/common_perm_to_string.h
diff -puN security/selinux/include/flask.h~lsm-relabel-nethooks security/selinux/include/flask.h
diff -puN security/selinux/include/initial_sid_to_string.h~lsm-relabel-nethooks security/selinux/include/initial_sid_to_string.h
diff -puN security/selinux/include/xfrm.h~lsm-relabel-nethooks security/selinux/include/xfrm.h
diff -puN security/selinux/xfrm.c~lsm-relabel-nethooks security/selinux/xfrm.c
--- linux-2.6.15-rc5/security/selinux/xfrm.c~lsm-relabel-nethooks	2006-01-04 22:35:41.000000000 -0500
+++ linux-2.6.15-rc5-root/security/selinux/xfrm.c	2006-01-04 22:35:41.000000000 -0500
@@ -137,15 +137,9 @@ static int selinux_xfrm_sec_ctx_alloc(st
 	 * Must be permitted to relabel from default socket type (process type)
 	 * to specified context
 	 */
-	rc = avc_has_perm(tsec->sid, tsec->sid,
-			  SECCLASS_ASSOCIATION,
-			  ASSOCIATION__RELABELFROM, NULL);
-	if (rc)
-		goto out;
-
 	rc = avc_has_perm(tsec->sid, ctx->ctx_sid,
 			  SECCLASS_ASSOCIATION,
-			  ASSOCIATION__RELABELTO, NULL);
+			  ASSOCIATION__SETCONTEXT, NULL);
 	if (rc)
 		goto out;

_
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
On Fri, 2006-01-06 at 15:00 +0100, Eric Dumazet wrote:
> In the case of call_rcu_bh(), you can be sure that the caller cannot afford
> 'sleeping memory allocations'. Better drop a frame than block the stack, no ?

Atomic allocations can't sleep and will fail, which is fine. If memory allocation pressure exists for sleeping allocations because of a large RCU backlog, we want to be sure that the RCU backlog from the networking stack or other sources does not cause us to OOM kill or take incorrect action.

So if, for example, we want to grow a process stack and the memory is there, just stuck in the RCU lists pending recovery, we want to let the RCU recovery happen before making drastic decisions.
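The distinction drawn here between atomic and sleeping allocations can be sketched as a small user-space model. Everything in it is an assumption made for illustration (struct mem_model, wait_for_rcu_cleanup(), model_alloc()); it is not the kernel's page allocator, and the "wait" is reduced to simply crediting the pending memory back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical memory model; units are arbitrary. */
struct mem_model {
	int free_units;  /* memory available right now */
	int rcu_pending; /* memory queued for freeing, waiting for a grace period */
};

/* Stand-in for waiting until the queued RCU callbacks have run. */
static void wait_for_rcu_cleanup(struct mem_model *m)
{
	m->free_units += m->rcu_pending;
	m->rcu_pending = 0;
}

static bool try_alloc(struct mem_model *m, int units)
{
	if (m->free_units < units)
		return false;
	m->free_units -= units;
	return true;
}

/* can_sleep == false models an atomic allocation. */
static bool model_alloc(struct mem_model *m, int units, bool can_sleep)
{
	if (try_alloc(m, units))
		return true;
	if (!can_sleep)
		return false;          /* atomic: fail at once, the caller drops the frame */
	wait_for_rcu_cleanup(m);       /* sleeping: let RCU hand memory back first */
	return try_alloc(m, units);    /* only a failure here would justify drastic action */
}
```

In this model an atomic request for 4 units fails while 8 units sit in the RCU backlog, whereas the same request from a context that may sleep succeeds after the backlog is drained.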
Re: Newbie question
On Fri, 2006-01-06 at 12:38 -0200, Alan Menegotto wrote:
> Look at the code:
>
> static int __init ipv4_proc_init(void)
> {
> 	int rc = 0;
>
> 	if (raw_proc_init())
> 		goto out_raw;
> 	if (tcp4_proc_init())
> 		goto out_tcp;
> 	if (udp4_proc_init())
> 		goto out_udp;
> 	if (fib_proc_init())
> 		goto out_fib;
> 	if (ip_misc_proc_init())
> 		goto out_misc;
> out:
> 	return rc;
> out_misc:
> 	fib_proc_exit();
> out_fib:
> 	udp4_proc_exit();
> out_udp:
> 	tcp4_proc_exit();
> out_tcp:
> 	raw_proc_exit();
> out_raw:
> 	rc = -ENOMEM;
> 	goto out;
> }
>
> Calling tcp4_proc_init should go to label out_tcp, which call
> raw_proc_exit(). Is this correct? If yes, why?

It's symmetric. If raw_proc_init() fails, no cleanup needs to be done so you go to out_raw. If tcp4_proc_init() fails, then raw_proc_init() did *not* fail and needs to be cleaned up after by calling raw_proc_exit(). etc.

johannes
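The symmetry described in this reply can be checked with a small stand-alone version of the same pattern. The step_init()/step_exit() helpers and the fail_at switch are hypothetical stand-ins for the *_proc_init()/*_proc_exit() pairs; only the label layout mirrors the kernel function.

```c
#include <assert.h>
#include <errno.h>

#define NSTEPS 3

static int fail_at = -1;        /* index of the step that should fail, -1 for none */
static int initialised[NSTEPS]; /* 1 while step i is set up */

static int step_init(int i)
{
	if (i == fail_at)
		return -1;      /* non-zero means failure, as in the kernel code */
	initialised[i] = 1;
	return 0;
}

static void step_exit(int i)
{
	assert(initialised[i]); /* only undo what was really set up */
	initialised[i] = 0;
}

static int demo_init(void)
{
	int rc = 0;

	if (step_init(0))
		goto out_0;
	if (step_init(1))
		goto out_1;
	if (step_init(2))
		goto out_2;
out:
	return rc;
out_2:
	step_exit(1);           /* step 2 failed, so steps 1 and 0 are undone */
out_1:
	step_exit(0);           /* step 1 failed, so only step 0 is undone */
out_0:
	rc = -ENOMEM;           /* step 0 failed, nothing to undo */
	goto out;
}
```

Each label undoes the step before the one that failed and then falls through to the labels below it, so the cleanup runs in reverse order of the initialisation.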
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
Alan Cox wrote:
> On Fri, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> > I assume that if a CPU queued 10.000 items in its RCU queue, then the
> > oldest entry cannot still be in use by another CPU. This might sound like
> > a violation of RCU rules, (I'm not an RCU expert) but seems quite
> > reasonable.
>
> Fixing the real problem in the routing code would be the real fix.

So far nobody has succeeded in 'fixing the routing code'; few people can even read the code from the first line to the last one...

I think this code is not buggy; it only makes general RCU assumptions about delayed freeing of dst entries. In some cases, the general assumptions are just wrong. We can fix it at the RCU level, and future users of call_rcu_bh() won't have to think *hard* about 'general assumptions'.

Of course, we can ignore the RCU problem and mark somewhere on a sticker:

***DON'T USE OR RISK CRASHES***
***USE IT ONLY FOR FUN***

> The underlying problem of RCU and memory usage could be solved more
> safely by making sure that the sleeping memory allocator path always
> waits until at least one RCU cleanup has occurred after it fails an
> allocation before it starts trying harder. That ought to also naturally
> throttle memory consumers more in the situation which is the right
> behaviour.

In the case of call_rcu_bh(), you can be sure that the caller cannot afford 'sleeping memory allocations'. Better drop a frame than block the stack, no ?

Eric
Re: [PATCH] Change sk_run_filter()'s return type in net/core/filter.c
From: Patrick McHardy
Sent: 1/6/2006 1:36:24 AM

> Please use unsigned int not just unsigned.

Ta-da!

--- x/net/core/filter.c	2006-01-05 12:27:17.000000000 -0600
+++ y/net/core/filter.c	2006-01-05 17:02:32.000000000 -0600
@@ -75,7 +75,7 @@ static inline void *load_pointer(struct
  * len is the number of filter blocks in the array.
  */

-int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)
+unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)
 {
 	struct sock_filter *fentry;	/* We walk down these */
 	void *ptr;
@@ -241,9 +241,9 @@ load_b:
 			A = X;
 			continue;
 		case BPF_RET|BPF_K:
-			return ((unsigned int)fentry->k);
+			return fentry->k;
 		case BPF_RET|BPF_A:
-			return ((unsigned int)A);
+			return A;
 		case BPF_ST:
 			mem[fentry->k] = A;
 			continue;
--- x/include/linux/filter.h	2006-01-02 21:21:10.000000000 -0600
+++ y/include/linux/filter.h	2006-01-05 17:02:58.000000000 -0600
@@ -143,7 +143,7 @@ static inline unsigned int sk_filter_len
 struct sk_buff;
 struct sock;

-extern int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen);
+extern unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen);
 extern int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
 extern int sk_chk_filter(struct sock_filter *filter, int flen);
 #endif /* __KERNEL__ */
--- x/include/net/sock.h	2006-01-05 23:06:00.000000000 -0600
+++ y/include/net/sock.h	2006-01-05 23:06:06.000000000 -0600
@@ -856,8 +856,8 @@ static inline int sk_filter(struct sock
 		filter = sk->sk_filter;
 		if (filter) {
-			int pkt_len = sk_run_filter(skb, filter->insns,
-						    filter->len);
+			unsigned int pkt_len = sk_run_filter(skb, filter->insns,
+							     filter->len);
 			if (!pkt_len)
 				err = -EPERM;
 			else
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
On Fri, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> I assume that if a CPU queued 10.000 items in its RCU queue, then the oldest
> entry cannot still be in use by another CPU. This might sound like a violation
> of RCU rules, (I'm not an RCU expert) but seems quite reasonable.

Fixing the real problem in the routing code would be the real fix.

The underlying problem of RCU and memory usage could be solved more safely by making sure that the sleeping memory allocator path always waits until at least one RCU cleanup has occurred after it fails an allocation before it starts trying harder. That ought to also naturally throttle memory consumers more in the situation which is the right behaviour.

Alan
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
Andi Kleen wrote:
> On Friday 06 January 2006 11:17, Eric Dumazet wrote:
> > I assume that if a CPU queued 10.000 items in its RCU queue, then the
> > oldest entry cannot still be in use by another CPU. This might sound like
> > a violation of RCU rules, (I'm not an RCU expert) but seems quite
> > reasonable.
>
> I don't think it's a good assumption. Another CPU might be stuck in a long
> running interrupt, and still have a reference in the code running below
> the interrupt handler.
>
> And in general letting correctness depend on magic numbers like this is
> very nasty.

I agree, Andi. I posted a 2nd version of the patch that makes no such assumptions.

Eric
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
On Friday 06 January 2006 11:17, Eric Dumazet wrote:
>
> I assume that if a CPU queued 10.000 items in its RCU queue, then the
> oldest entry cannot still be in use by another CPU. This might sound like a
> violation of RCU rules, (I'm not an RCU expert) but seems quite reasonable.

I don't think it's a good assumption. Another CPU might be stuck in a long running interrupt, and still have a reference in the code running below the interrupt handler.

And in general letting correctness depend on magic numbers like this is very nasty.

-Andi
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency (Version 2), HOTPLUG_CPU fix
The first patch was buggy, sorry :(

This 2nd version makes no more RCU assumptions, because only the 'donelist' queue is searched for an item to delete. Items on the donelist are ready to be freed.

This V2 also corrects a problem with CPU hotplug: we forgot to update the ->count variable when transferring a queue to another one.

-

In order to avoid some OOMs triggered by a flood of call_rcu() calls, we increased maxbatch in linux 2.6.14 from 10 to 10000, and conditionally call set_need_resched() in call_rcu().

This solution doesn't solve all the problems and has drawbacks:

1) Using a big maxbatch has a bad impact on latency.
2) A flood of call_rcu_bh() can still OOM.

I have some servers that crash once in a while when the IP route cache is flushed. After raising /proc/sys/net/ipv4/route/secret_interval (so that *no* flush is done), I got better uptime for these servers. But in some cases I think the network stack can flood call_rcu_bh(), and a fatal OOM occurs.

I suggest in this patch:

1) To lower maxbatch to a more reasonable value (as far as latency is concerned).
2) To be able to guard an RCU cpu queue against a maximal count (10.000 for example). If this limit is reached, free the oldest entry (if available from the donelist queue).
3) Bug correction in __rcu_offline_cpu(), where we forgot to adjust the ->count field when transferring a queue to another one.

In my stress tests, I could not reproduce OOM anymore after applying this patch.
Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

--- linux-2.6.15/kernel/rcupdate.c	2006-01-03 04:21:10.000000000 +0100
+++ linux-2.6.15-edum/kernel/rcupdate.c	2006-01-06 13:32:02.000000000 +0100
@@ -71,14 +71,14 @@
 /* Fake initialization required by compiler */
 static DEFINE_PER_CPU(struct tasklet_struct, rcu_tasklet) = {NULL};
-static int maxbatch = 10000;
+static int maxbatch = 100;

 #ifndef __HAVE_ARCH_CMPXCHG
 /*
  * We use an array of spinlocks for the rcurefs -- similar to ones in sparc
  * 32 bit atomic_t implementations, and a hash function similar to that
  * for our refcounting needs.
- * Can't help multiprocessors which donot have cmpxchg :(
+ * Can't help multiprocessors which dont have cmpxchg :(
  */

 spinlock_t __rcuref_hash[RCUREF_HASH_SIZE] = {
@@ -110,9 +110,19 @@
 	*rdp->nxttail = head;
 	rdp->nxttail = &head->next;

-	if (unlikely(++rdp->count > 10000))
-		set_need_resched();
-
+/*
+ * OOM avoidance : If we queued too many items in this queue,
+ * free the oldest entry (from the donelist only to respect
+ * RCU constraints)
+ */
+	if (unlikely(++rdp->count > 10000 && (head = rdp->donelist))) {
+		rdp->count--;
+		rdp->donelist = head->next;
+		if (!rdp->donelist)
+			rdp->donetail = &rdp->donelist;
+		local_irq_restore(flags);
+		return head->func(head);
+	}
 	local_irq_restore(flags);
 }

@@ -148,12 +158,19 @@
 	rdp = &__get_cpu_var(rcu_bh_data);
 	*rdp->nxttail = head;
 	rdp->nxttail = &head->next;
-	rdp->count++;
 /*
- *  Should we directly call rcu_do_batch() here ?
- *  if (unlikely(rdp->count > 10000))
- *      rcu_do_batch(rdp);
+ * OOM avoidance : If we queued too many items in this queue,
+ * free the oldest entry (from the donelist only to respect
+ * RCU constraints)
  */
+	if (unlikely(++rdp->count > 10000 && (head = rdp->donelist))) {
+		rdp->count--;
+		rdp->donelist = head->next;
+		if (!rdp->donelist)
+			rdp->donetail = &rdp->donelist;
+		local_irq_restore(flags);
+		return head->func(head);
+	}
 	local_irq_restore(flags);
 }

@@ -208,19 +225,20 @@
  */
 static void rcu_do_batch(struct rcu_data *rdp)
 {
-	struct rcu_head *next, *list;
-	int count = 0;
+	struct rcu_head *next = NULL, *list;
+	int count = maxbatch;

 	list = rdp->donelist;
 	while (list) {
-		next = rdp->donelist = list->next;
+		next = list->next;
 		list->func(list);
 		list = next;
 		rdp->count--;
-		if (++count >= maxbatch)
+		if (--count <= 0)
 			break;
 	}
-	if (!rdp->donelist)
+	rdp->donelist = next;
+	if (!next)
 		rdp->donetail = &rdp->donelist;
 	else
 		tasklet_schedule(&per_cpu(rcu_tasklet, rdp->cpu));
@@ -344,11 +362,9 @@
 static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list,
 				struct rcu_head **tail)
 {
-	local_irq_disable();
 	*this_rdp->nxttail = list;
 	if (list)
 		this_rdp->nxttail = tail;
-	local_irq_enable();
 }

 static void __rcu_offline_cpu(struct rcu_data *this_rdp,
@@
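The OOM-avoidance step of this patch can be tried outside the kernel with a small model of one per-CPU queue. The sketch below is an illustration under stated assumptions, not the patched rcupdate.c: struct cb_queue, cb_enqueue(), cb_grace_period_end() and the count_free() callback are hypothetical names, interrupts and per-CPU data are left out, and a grace period is modelled as a plain function call.

```c
#include <assert.h>
#include <stddef.h>

struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Hypothetical model of one callback queue. */
struct cb_queue {
	struct cb_head *nxtlist, **nxttail;   /* still waiting for a grace period */
	struct cb_head *donelist, **donetail; /* grace period over, safe to invoke */
	long count;                           /* callbacks on both lists */
	long limit;                           /* cap on count (10000 in the patch) */
};

static int freed;                             /* number of callbacks invoked so far */

static void count_free(struct cb_head *head)
{
	(void)head;
	freed++;
}

static void cb_queue_init(struct cb_queue *q, long limit)
{
	q->nxtlist = NULL;
	q->nxttail = &q->nxtlist;
	q->donelist = NULL;
	q->donetail = &q->donelist;
	q->count = 0;
	q->limit = limit;
}

/* Model of a grace period ending: everything waiting becomes "done". */
static void cb_grace_period_end(struct cb_queue *q)
{
	if (!q->nxtlist)
		return;
	*q->donetail = q->nxtlist;
	q->donetail = q->nxttail;
	q->nxtlist = NULL;
	q->nxttail = &q->nxtlist;
}

/* Model of call_rcu(): queue the callback; if over the limit, run the oldest ready one. */
static void cb_enqueue(struct cb_queue *q, struct cb_head *head,
		       void (*func)(struct cb_head *))
{
	struct cb_head *old;

	head->func = func;
	head->next = NULL;
	*q->nxttail = head;
	q->nxttail = &head->next;

	if (++q->count > q->limit && (old = q->donelist) != NULL) {
		q->count--;
		q->donelist = old->next;
		if (!q->donelist)
			q->donetail = &q->donelist;
		old->func(old);               /* only entries past their grace period */
	}
}
```

An entry is taken only from the donelist, so nothing is invoked before its grace period has ended. When the limit is exceeded but the donelist is empty, the queue simply keeps growing, which matches the "if available from the donelist queue" wording in the description.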
Re: State of the Union: Wireless
On Fri, 2006-01-06 at 13:48 +0100, Stefan Rompf wrote:
> With hardware like prism2 usb that gets "don't touch me now mode" for a while
> after a join command is issued, current API requires a driver to delay
> starting an association in order to wait if other config requests are issued
> - an ugly hack.

So that settles the 'need to change multiple settings at once' issue: yes, it is indeed required.

johannes
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
Marcel Holtmann wrote:
> > I just personally liked the idea of having a device node in /dev for
> > every existing hardware wlan card. Like we have device nodes for
> > other real hardware, too. It felt like a bit of a "unix way" to do
> > this to me. I don't say this is the way to go.
> > If a netlink socket is used (which is possible, for sure), we stay with
> > the old way of having no device node in /dev for networking devices.
> > That is ok. But that is really only an implementation detail (and for sure
> > a matter of taste).
>
> At the OLS last year, I think the consensus was to use netlink for all
> configuration tasks. However this was mainly driven by Harald Welte and
> he might be able to talk about the pros and cons of netlink versus a
> character device.

I think the main advantages of netlink over a character device are its flexible format, which is easily extendable, and its multicast capability, which can be used to broadcast events and configuration changes. It's also good to have all the net stuff accessible in a uniform way.
Re: State of the Union: Wireless
On Friday 06 January 2006 12:46, Dominik Brodowski wrote:
> From someone who has no idea at all (yet) about 802.11: why character
> device, and not sysfs or configfs files? Like

sysfs shares the main problem with wireless extensions: it configures one value per file / per ioctl. Setting up a wireless card to associate or form an IBSS network involves multiple parameters, many of which require the card to disassociate. With hardware like prism2 usb that goes into a "don't touch me now" mode for a while after a join command is issued, the current API requires a driver to delay starting an association in order to wait and see whether other config requests are issued - an ugly hack.

I vote for netlink. It's a defined and tested interface and has all the features needed to set multiple values in one transaction.

Stefan
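The "multiple values in one transaction" argument can be illustrated with a type-length-value buffer, similar in spirit to the way netlink carries attributes. The sketch below is self-contained user-space C and does not use the real netlink headers: the attribute numbers (WATTR_ESSID and friends), struct wattr, struct wmsg and the helper functions are all hypothetical and do not correspond to an existing kernel or wireless API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute IDs for a wireless configuration request. */
enum { WATTR_ESSID = 1, WATTR_CHANNEL = 2, WATTR_MODE = 3 };

struct wattr {            /* header in front of every attribute */
	uint16_t len;     /* header plus payload, without padding */
	uint16_t type;
};

/* Pad each attribute to a multiple of 4 bytes. */
#define WATTR_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)

struct wmsg {
	unsigned char buf[256];
	size_t used;
};

/* Append one attribute; returns 0 on success, -1 if the message is full. */
static int wmsg_put(struct wmsg *m, uint16_t type, const void *data, uint16_t dlen)
{
	struct wattr hdr;
	size_t total = WATTR_ALIGN(sizeof(hdr) + dlen);

	if (m->used + total > sizeof(m->buf))
		return -1;
	hdr.len = (uint16_t)(sizeof(hdr) + dlen);
	hdr.type = type;
	memset(m->buf + m->used, 0, total);
	memcpy(m->buf + m->used, &hdr, sizeof(hdr));
	memcpy(m->buf + m->used + sizeof(hdr), data, dlen);
	m->used += total;
	return 0;
}

/* Walk the message as a receiver would before applying all settings together. */
static int wmsg_count(const struct wmsg *m)
{
	size_t off = 0;
	int n = 0;

	while (off + sizeof(struct wattr) <= m->used) {
		struct wattr hdr;

		memcpy(&hdr, m->buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || off + hdr.len > m->used)
			return -1;      /* malformed attribute */
		off += WATTR_ALIGN(hdr.len);
		n++;
	}
	return n;
}
```

A sender would put the ESSID, channel and mode into one struct wmsg and hand the whole buffer over in a single call, so the receiving side sees all three settings before it touches the hardware, instead of one ioctl or one sysfs write per value.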
Re: State of the Union: Wireless
> From someone who has no idea at all (yet) about 802.11: why character
> device, and not sysfs or configfs files? Like

As Michael already said -- there's no real reason for that. We were just brainstorming. The /dev idea seemed like a good plan at first, but it isn't fixed. What you suggested below does look useful too.

Coming back to the point Michael already raised: the overarching idea is to get rid of the net_dev for the 'master' device, even if the underlying hardware supports only a single virtual device (which might then be created by the driver automatically).

I'll move the wiki pages a bit to accommodate different models, please check in a few minutes.

johannes
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
Hi Michael,

> > > How would the virtual interfaces look like? That is quite easy to answer.
> > > They are net_devices, as they transfer data.
> > > They should probably _not_ be on top of the ethernet, as 80211 does not
> > > have very much in common with ethernet. Basically they share the same
> > > MAC address format. Does someone have another thing, which he thinks
> > > is shared?
> > > How would the master interface look like? A somewhat unusual idea came
> > > up. Using a device node in /dev. So every wireless card in the system
> > > would have a node in /dev associated (/dev/wlan0 for example).
> > > A node for the master device would be ok, because no data is transferred
> > > through it. It is only a configuration interface.
> > > So you would tell the, yet-to-be-written userspace tool wconfig (or
> > > something like that) "I need a STA in INFRA mode and want to drive it
> > > on the wlan0 card". So wconfig goes and write()s some data to /dev/wlan0
> > > telling the 80211 code to setup a virtual net_device for the driver
> > > associated to /dev/wlan0.
> > > The virtual interface is then configured through /dev/wlan0 using write()
> > > (no ugly ioctl anymore, you see...). Config data like TX rate,
> > > current essid, basically everything + xyz which is done by WE today,
> > > is written to /dev/wlan0.
> > > This config data is entirely cached in the 80211 code for the /dev/wlan0
> > > instance. This is important, to have the data persistent throughout
> > > suspend/resume cycles, if up/down cycles.
> > > After configuring, a virtual net_device (let's call it wlan0) exists,
> > > which can be brought up by ifconfig and data can be transferred through
> > > it as usual.
> >
> > what is wrong with using netlink and/or sysfs for it? I don't see the
> > advantage of defining another /dev something interface.
>
> Nothing is wrong with that.
> "brainstorming" was the most dominant word in the whole text.
> ;)

so I might have got the wrong impression, because it seemed you put a lot of thinking into the /dev/wlanX stuff without even considering netlink or something else.

> I just personally liked the idea of having a device node in /dev for
> every existing hardware wlan card. Like we have device nodes for
> other real hardware, too. It felt like a bit of a "unix way" to do
> this to me. I don't say this is the way to go.
> If a netlink socket is used (which is possible, for sure), we stay with
> the old way of having no device node in /dev for networking devices.
> That is ok. But that is really only an implementation detail (and for sure
> a matter of taste).

At the OLS last year, I think the consensus was to use netlink for all configuration tasks. However this was mainly driven by Harald Welte and he might be able to talk about the pros and cons of netlink versus a character device.

> The _real_ main point I wanted to make was to _not_ use a net_device for
> the master device. What else should be used for master device, let it
> be a device node or a netlink socket, is rather unimportant at
> this stage.

I am all for it, because I don't like dummy Ethernet devices that are only used for configuration. I am still not happy that IrDA uses irda0 to get some kind of packet management etc. instead of implementing a real suitable hardware abstraction.

Regards

Marcel
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
On Friday 06 January 2006 12:38, you wrote:
> Hi Michael,
>
> > How would the virtual interfaces look like? That is quite easy to answer.
> > They are net_devices, as they transfer data.
> > They should probably _not_ be on top of the ethernet, as 80211 does not
> > have very much in common with ethernet. Basically they share the same
> > MAC address format. Does someone have another thing, which he thinks
> > is shared?
> > How would the master interface look like? A somewhat unusual idea came
> > up. Using a device node in /dev. So every wireless card in the system
> > would have a node in /dev associated (/dev/wlan0 for example).
> > A node for the master device would be ok, because no data is transferred
> > through it. It is only a configuration interface.
> > So you would tell the, yet-to-be-written userspace tool wconfig (or something
> > like that) "I need a STA in INFRA mode and want to drive it on the
> > wlan0 card". So wconfig goes and write()s some data to /dev/wlan0
> > telling the 80211 code to setup a virtual net_device for the driver
> > associated to /dev/wlan0.
> > The virtual interface is then configured through /dev/wlan0 using write()
> > (no ugly ioctl anymore, you see...). Config data like TX rate,
> > current essid, basically everything + xyz which is done by WE today,
> > is written to /dev/wlan0.
> > This config data is entirely cached in the 80211 code for the /dev/wlan0
> > instance. This is important, to have the data persistent throughout
> > suspend/resume cycles, if up/down cycles.
> > After configuring, a virtual net_device (let's call it wlan0) exists,
> > which can be brought up by ifconfig and data can be transferred through
> > it as usual.
>
> what is wrong with using netlink and/or sysfs for it? I don't see the
> advantage of defining another /dev something interface.

Nothing is wrong with that.
"brainstorming" was the most dominant word in the whole text. ;)

I just personally liked the idea of having a device node in /dev for
every existing hardware wlan card. Like we have device nodes for
other real hardware, too. It felt like a bit of a "unix way" to do
this to me. I don't say this is the way to go.
If a netlink socket is used (which is possible, for sure), we stay with
the old way of having no device node in /dev for networking devices.
That is ok. But that is really only an implementation detail (and for sure
a matter of taste).

The _real_ main point I wanted to make was to _not_ use a net_device for
the master device. What else should be used for master device, let it
be a device node or a netlink socket, is rather unimportant at
this stage.

--
Greetings Michael.
Re: State of the Union: Wireless
On Fri, Jan 06, 2006 at 12:31:24PM +0100, Johannes Berg wrote:
> On Fri, 2006-01-06 at 12:00 +0100, Michael Buesch wrote:
>
> > * "master" interface as real device node
> > * Virtual interfaces (net_devices)
>
> I didn't want to spam the netdev wiki with this (yet) so I collected
> some more structured things outside. Anyone feel free to edit:
> http://softmac.sipsolutions.net/802.11

From someone who has no idea at all (yet) about 802.11: why character
device, and not sysfs or configfs files? Like

TASK: get list of MAC addresses available to hardware device (usually only
one for current hw)

	cat /sys/devices/path/to/device/wireless/address

TASK: get list of virtual devices including (some of) their properties

	ls -l /sys/devices/path/to/device/wireless/
	...
	wlan0 -> /sys/class/net/wlan0
	wlan1 -> /sys/class/net/wlan1

TASK: create virtual device (with arbitrary type, netdev name and mac address)
                                                  ^^ isn't nameif / udev for that?

	echo "$type" > /sys/devices/path/to/device/wireless/new_if

... we get uevents for this new interface; in this we can set the mac
address doing:

	echo "$mac" > /sys/class/net/wlan0/wireless/address

TASK: configure virtual device (key is the device name since that needs to
be unique anyway)

	echo "$some_config_option_for_virtual_device" > /sys/class/net/wlan0/wireless/some_option
	echo "$some_config_option_for_physical_device" > /sys/devices/path/to/dev/wireless/some_other_option

Of course the configuration userspace tool would use libsysfs for that,
not "echo" scripts... but they'd work too.

Dominik
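The `echo "$value" > attribute` steps proposed above amount to opening a file, writing one short string and closing it. A minimal sketch in C of that operation, using plain stdio rather than libsysfs; `write_attr` is a hypothetical helper name, and the sysfs paths in the mail are a proposal, not an existing kernel interface:

```c
#include <stdio.h>

/*
 * Write a value string to an attribute file, the way
 * `echo "$value" > path` would. Returns 0 on success, -1 on error.
 * write_attr is a hypothetical helper, not part of any library.
 */
static int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	/* echo appends a newline, so do the same here */
	ok = fprintf(f, "%s\n", value) >= 0;

	/* fclose() flushes the buffer; a failed flush means the write failed */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

With the proposed layout, a tool would call something like `write_attr("/sys/class/net/wlan0/wireless/address", mac)`; that path only exists if the proposal is implemented as written.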
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
Hi Michael,

> How would the virtual interfaces look like? That is quite easy to answer.
> They are net_devices, as they transfer data.
> They should probably _not_ be on top of the ethernet, as 80211 does not
> have very much in common with ethernet. Basically they share the same
> MAC address format. Does someone have another thing, which he thinks
> is shared?
> How would the master interface look like? A somewhat unusual idea came
> up. Using a device node in /dev. So every wireless card in the system
> would have a node in /dev associated (/dev/wlan0 for example).
> A node for the master device would be ok, because no data is transferred
> through it. It is only a configuration interface.
> So you would tell the, yet-to-be-written userspace tool wconfig (or something
> like that) "I need a STA in INFRA mode and want to drive it on the
> wlan0 card". So wconfig goes and write()s some data to /dev/wlan0
> telling the 80211 code to setup a virtual net_device for the driver
> associated to /dev/wlan0.
> The virtual interface is then configured through /dev/wlan0 using write()
> (no ugly ioctl anymore, you see...). Config data like TX rate,
> current essid, basically everything + xyz which is done by WE today,
> is written to /dev/wlan0.
> This config data is entirely cached in the 80211 code for the /dev/wlan0
> instance. This is important, to have the data persistent throughout
> suspend/resume cycles, if up/down cycles.
> After configuring, a virtual net_device (let's call it wlan0) exists,
> which can be brought up by ifconfig and data can be transferred through
> it as usual.

what is wrong with using netlink and/or sysfs for it? I don't see the
advantage of defining another /dev something interface.

Regards

Marcel
Re: State of the Union: Wireless
On Fri, 2006-01-06 at 12:00 +0100, Michael Buesch wrote:

> * "master" interface as real device node
> * Virtual interfaces (net_devices)

I didn't want to spam the netdev wiki with this (yet) so I collected
some more structured things outside. Anyone feel free to edit:
http://softmac.sipsolutions.net/802.11

I'll move that content to the netdev wiki if anyone else thinks it would
be a good way forward to start with requirements, API issues and similar.

Until we get there, we'll fix up softmac to make it usable for most
people in basic station mode without any kind of virtual devices, which
will need some slight changes to the current ieee80211.

johannes
Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]
> > * We really have no wireless maintainer. I'm just the defacto guy,
> > with no interest in the job. The ideal maintainer knows 802.11 well,
> > uses git, and isn't an asshole with no taste. I'm just the guy who
> > wants to make sure the net driver portion doesn't turn out to be a
> > stinker (read: review and pass up the chain).

That problem is easiest to solve. ;)

> > * Wireless management, in particular the wireless kernel<->user
> > interface, needs some thinking. Wireless Extensions (WE) isn't
> > cutting it, but I haven't seen any netlink work yet (or some
> > other interface). Whatever the userspace interface is, it will be
> > basically carved in stone for years (unlike kernel APIs), so this
> > needs a lot more thought than people have been giving it.

We did some brainstorming about this yesterday evening on the
bcm irc channel. I think we all agreed on dropping WE.
So, now we asked: How would a sane UI look like.
We had a few points:

* The interface needs to support some kind of "master" interface
  to configure the hardware, 80211 parameters and to actually
  configure and setup the
* Virtual interfaces. Data is transferred only through the virtual
  interfaces, which could be an AP interface, a STA interface in
  INFRA or Ad-Hoc mode, etc... . Configuration is done through the
  master interface.

How would the virtual interfaces look like? That is quite easy to answer.
They are net_devices, as they transfer data.
They should probably _not_ be on top of the ethernet, as 80211 does not
have very much in common with ethernet. Basically they share the same
MAC address format. Does someone have another thing, which he thinks
is shared?

How would the master interface look like? A somewhat unusual idea came
up. Using a device node in /dev. So every wireless card in the system
would have a node in /dev associated (/dev/wlan0 for example).
A node for the master device would be ok, because no data is transferred
through it. It is only a configuration interface.
So you would tell the, yet-to-be-written userspace tool wconfig (or something
like that) "I need a STA in INFRA mode and want to drive it on the
wlan0 card". So wconfig goes and write()s some data to /dev/wlan0
telling the 80211 code to setup a virtual net_device for the driver
associated to /dev/wlan0.
The virtual interface is then configured through /dev/wlan0 using write()
(no ugly ioctl anymore, you see...). Config data like TX rate,
current essid, basically everything + xyz which is done by WE today,
is written to /dev/wlan0.
This config data is entirely cached in the 80211 code for the /dev/wlan0
instance. This is important, to have the data persistent throughout
suspend/resume cycles, if up/down cycles.
After configuring, a virtual net_device (let's call it wlan0) exists,
which can be brought up by ifconfig and data can be transferred through
it as usual.

This whole concept is derived from how dscape does the stuff. With
a major exception, that a device node instead of a net_device is
used for the master device. With the effect of getting rid of the
ugly WE ioctl stuff.

> > * Long term, wireless should go from being a library of common code to a
> > "real" wireless stack, as shown in the template developed by David Miller:
> >
> > http://kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.6/davem-p80211.tar.bz2
> > Zhu Yi @ Intel and Vladmir @ somewhere both independently did some
> > work in this area.

This looks very interesting and in fact is part of our thoughts
I explained above.

> > * I prefer GPL-only code. Dual licensing has proven in practice to
> > be a logistical nightmare that concentrates power in the hands of
> > a few. Dual licensing, BSD licensing works for some, but GPL-only
> > code is quite simply the least amount of flamewars, headaches
> > and worry. IOW, the P.I.T.A. level of GPL-only code is lowest.

I personally prefer EXPORT_SYMBOL_GPL(). But that's only my opinion
and that does not really matter. ;)

> > Dual licensed code gives kernel hackers yet more legal crapola to
> > worry about, which is never a good thing.

I don't see a point in dual licensing it. The only benefit would be
to allow BSD people to take the code. Honestly, I really don't see
this happening, anyway. ;) They have net80211.

> > Patches welcome from all motivated, clueful parties. Jiri Benc has a
> > long series of patches that looks nice. Johannes Berg has done some
> > work on the ieee80211 softmac stuff and hw WEP. But maybe DeviceScape
> > is what people like now.

Well, "like" is a strong word. I personally would say "It is better
than all currently existing solutions, if some final polishing is
done to dscape."

--
Greetings Michael.
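The caching idea in the mail above (configuration written to the master device is kept by the 80211 code so it survives suspend/resume) can be pictured with a small userspace sketch. Everything here is an assumption made for illustration: the mail does not define a wire format, so the `key=value` strings, the `wlan_cfg_cache` struct and the `wlan_cfg_write` function are hypothetical, and the ESSID is simplified to a C string.

```c
#include <stdlib.h>
#include <string.h>

#define ESSID_MAX 32	/* an 802.11 SSID is at most 32 bytes long */

/* Hypothetical per-card cache of the last configuration written. */
struct wlan_cfg_cache {
	char essid[ESSID_MAX + 1];
	unsigned int tx_rate_kbps;
};

/*
 * Apply one "key=value" command to the cache, as a write() handler for
 * the master device might. Returns 0 on success, -1 if the command is
 * malformed or unknown. The command format is an assumption.
 */
static int wlan_cfg_write(struct wlan_cfg_cache *c, const char *cmd)
{
	const char *eq = strchr(cmd, '=');
	const char *val;
	size_t klen;

	if (!eq)
		return -1;
	klen = (size_t)(eq - cmd);
	val = eq + 1;

	if (klen == 5 && strncmp(cmd, "essid", 5) == 0) {
		if (strlen(val) > ESSID_MAX)
			return -1;
		strcpy(c->essid, val);	/* fits: length checked above */
		return 0;
	}
	if (klen == 7 && strncmp(cmd, "tx_rate", 7) == 0) {
		char *end;
		unsigned long rate = strtoul(val, &end, 10);

		/* reject empty values and trailing garbage */
		if (end == val || *end != '\0')
			return -1;
		c->tx_rate_kbps = (unsigned int)rate;
		return 0;
	}
	return -1;
}
```

Because the cache, not the hardware, holds the authoritative copy, a resume path would only need to push the cached fields back to the driver; the sketch leaves that step out.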
[PATCH, RFC] RCU : OOM avoidance and lower latency
In order to avoid some OOM triggered by a flood of call_rcu() calls, we
increased in linux 2.6.14 maxbatch from 10 to 10000, and conditionally
call set_need_resched() in call_rcu().

This solution doesn't solve all the problems and has drawbacks.

1) Using a big maxbatch has a bad impact on latency.
2) A flood of call_rcu_bh() can still OOM.

I have some servers that once in a while crash when the ip route cache is
flushed. After raising /proc/sys/net/ipv4/route/secret_interval (so that
*no* flush is done), I got better uptime for these servers. But in some
cases I think the network stack can flood call_rcu_bh(), and a fatal OOM
occurs.

I suggest in this patch:

1) To lower maxbatch to a more reasonable value (as far as the latency is
   concerned)
2) To be able to guard a RCU cpu queue against a maximal count (10.000 for
   example). If this limit is reached, free the oldest entry of this queue.

I assume that if a CPU queued 10.000 items in its RCU queue, then the
oldest entry cannot still be in use by another CPU. This might sound like
a violation of RCU rules (I'm not an RCU expert), but it seems quite
reasonable.

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

--- linux-2.6.15/kernel/rcupdate.c	2006-01-03 04:21:10.0 +0100
+++ linux-2.6.15-edum/kernel/rcupdate.c	2006-01-06 11:10:45.0 +0100
@@ -71,14 +71,14 @@
 
 /* Fake initialization required by compiler */
 static DEFINE_PER_CPU(struct tasklet_struct, rcu_tasklet) = {NULL};
-static int maxbatch = 10000;
+static int maxbatch = 100;
 
 #ifndef __HAVE_ARCH_CMPXCHG
 /*
  * We use an array of spinlocks for the rcurefs -- similar to ones in sparc
  * 32 bit atomic_t implementations, and a hash function similar to that
  * for our refcounting needs.
- * Can't help multiprocessors which donot have cmpxchg :(
+ * Can't help multiprocessors which dont have cmpxchg :(
  */
 
 spinlock_t __rcuref_hash[RCUREF_HASH_SIZE] = {
@@ -110,9 +110,17 @@
 	*rdp->nxttail = head;
 	rdp->nxttail = &head->next;
 
-	if (unlikely(++rdp->count > 10000))
-		set_need_resched();
-
+/*
+ * OOM avoidance : If we queued too many items in this queue,
+ *  free the oldest entry
+ */
+	if (unlikely(++rdp->count > 10000)) {
+		rdp->count--;
+		head = rdp->donelist;
+		rdp->donelist = head->next;
+		local_irq_restore(flags);
+		return head->func(head);
+	}
 	local_irq_restore(flags);
 }
 
@@ -148,12 +156,17 @@
 	rdp = &__get_cpu_var(rcu_bh_data);
 	*rdp->nxttail = head;
 	rdp->nxttail = &head->next;
-	rdp->count++;
 /*
- *  Should we directly call rcu_do_batch() here ?
- *  if (unlikely(rdp->count > 10000))
- *      rcu_do_batch(rdp);
+ * OOM avoidance : If we queued too many items in this queue,
+ *  free the oldest entry
  */
+	if (unlikely(++rdp->count > 10000)) {
+		rdp->count--;
+		head = rdp->donelist;
+		rdp->donelist = head->next;
+		local_irq_restore(flags);
+		return head->func(head);
+	}
 	local_irq_restore(flags);
 }
 
@@ -209,7 +222,7 @@
 static void rcu_do_batch(struct rcu_data *rdp)
 {
 	struct rcu_head *next, *list;
-	int count = 0;
+	int count = maxbatch;
 
 	list = rdp->donelist;
 	while (list) {
@@ -217,7 +230,7 @@
 		list->func(list);
 		list = next;
 		rdp->count--;
-		if (++count >= maxbatch)
+		if (--count <= 0)
 			break;
 	}
 	if (!rdp->donelist)
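The batch limit that the last two hunks touch can be pictured with a userspace sketch: a callback queue is drained at most `maxbatch` entries per call, so a smaller `maxbatch` bounds the work done in one go (lower latency) at the price of leaving more entries queued. This is only an illustration under loud assumptions: there is no locking, no per-CPU data and no grace-period tracking, and the names `cb_head`, `cb_queue`, `cb_enqueue`, `cb_do_batch` and `count_cb` are hypothetical, not kernel identifiers. It does not model the "free the oldest entry" part of the patch.

```c
#include <stddef.h>

/* Userspace illustration only: no locking, no grace periods. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

struct cb_queue {
	struct cb_head *donelist;	/* callbacks ready to run */
	struct cb_head **donetail;	/* where the next entry gets linked */
	long count;			/* entries currently queued */
};

static void cb_queue_init(struct cb_queue *q)
{
	q->donelist = NULL;
	q->donetail = &q->donelist;
	q->count = 0;
}

static void cb_enqueue(struct cb_queue *q, struct cb_head *head)
{
	head->next = NULL;
	*q->donetail = head;
	q->donetail = &head->next;
	q->count++;
}

/* Run at most maxbatch callbacks; return how many were run. */
static int cb_do_batch(struct cb_queue *q, int maxbatch)
{
	int budget = maxbatch;	/* counts down, like the patched loop */
	int ran = 0;

	while (q->donelist && budget > 0) {
		struct cb_head *head = q->donelist;

		/* Unlink before calling: the callback may free head. */
		q->donelist = head->next;
		if (!q->donelist)
			q->donetail = &q->donelist;
		q->count--;
		head->func(head);
		ran++;
		budget--;
	}
	return ran;
}

/* Example callback: just counts invocations. */
static int calls_run;

static void count_cb(struct cb_head *head)
{
	(void)head;
	calls_run++;
}
```

Queueing five entries and calling `cb_do_batch(&q, 2)` runs two callbacks and leaves three queued; whatever is left over is exactly what accumulates when callbacks are queued faster than batches run, which is the situation the mail describes.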
Re: [PATCH] Change sk_run_filter()'s return type in net/core/filter.c
Kris Katterjohn wrote:

Whoops! Here you go:

Whoops again. Screwed that last patch up. I gotta stop doing this stuff
when I'm tired and I need to check myself :) Sorry. Again.

--- x/net/core/filter.c	2006-01-05 12:27:17.0 -0600
+++ y/net/core/filter.c	2006-01-05 17:02:32.0 -0600
@@ -75,7 +75,7 @@ static inline void *load_pointer(struct
  * len is the number of filter blocks in the array.
  */
 
-int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)
+unsigned sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)

Please use unsigned int not just unsigned.