Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet

David S. Miller wrote:
> From: Eric Dumazet <[EMAIL PROTECTED]>
> Date: Sat, 07 Jan 2006 08:34:35 +0100
>
> > I agree, I do use a hashed spinlock array on my local tree for TCP,
> > mainly to reduce the hash table size by a 2 factor.
>
> So what do you think about going to a single spinlock for the
> routing cache?


I have no problem with this, since the biggest server I have is 4-way, but are
you sure big machines won't suffer from this single spinlock?


Also, I don't understand what you want to do after this single spinlock patch.
How is it supposed to help the 'ip route flush cache' problem?

In my case, I have about 600,000 dst entries:

# grep ip_dst /proc/slabinfo
ip_dst_cache  616250 622440    320   12    1 : tunables   54   27    8 : slabdata  51870  51870      0



Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread David S. Miller
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Sat, 07 Jan 2006 08:34:35 +0100

> I agree, I do use a hashed spinlock array on my local tree for TCP,
> mainly to reduce the hash table size by a 2 factor.

So what do you think about going to a single spinlock for the
routing cache?


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet

Andi Kleen wrote:
> I always disliked the per chain spinlocks even for other hash tables like
> TCP/UDP multiplex - it would be much nicer to use a much smaller separately
> hashed lock table and save cache. In this case the special case of using
> a one entry only lock hash table makes sense.



I agree, I do use a hashed spinlock array on my local tree for TCP, mainly to 
reduce the hash table size by a 2 factor.


Eric


Re: [2.6 patch] fix ipvs compilation

2006-01-06 Thread David S. Miller
From: Joe Kappus <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 22:30:56 -0500

> Why not then, we'll do this one as well since it needs it.
> 
> Signed-off-by: Joe Kappus <[EMAIL PROTECTED]>

Your email client corrupted the patch, I fixed it up manually
this time, but next time I won't be so nice so please get this
working.

Thanks.


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread David S. Miller
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Sat, 7 Jan 2006 02:09:01 +0100

> I always disliked the per chain spinlocks even for other hash tables like
> TCP/UDP multiplex - it would be much nicer to use a much smaller separately 
> hashed lock table and save cache. In this case the special case of using
> a one entry only lock hash table makes sense.

I used to think they were a great technique.  But in each case I
thought they could be applied, better schemes have come along.
In the case of the page cache we went to a per-address-space tree,
and here in the routing cache we went to RCU.

There are RCU patches around for the TCP hashes and I'd like to
put those in at some point as well.  In fact, they'd be even
more far reaching since Arnaldo abstracted away the socket
hashing stuff into an inet_hashtables subsystem.


Re: ax25/mkiss: unbalanced spinlock_bh in ax_encaps()

2006-01-06 Thread David S. Miller
From: Francois Romieu <[EMAIL PROTECTED]>
Date: Sat, 7 Jan 2006 03:22:43 +0100

> The unlocking disappeared during commit
> 5793f4be23f0171b4999ca68a39a9157b44139f3.
> 
> Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>

Applied, thanks a lot.


Re: [2.6 patch] fix ipvs compilation

2006-01-06 Thread Joe Kappus
On 1/6/06, David S. Miller <[EMAIL PROTECTED]> wrote:
> From: Joe <[EMAIL PROTECTED]>
> Date: Thu, 5 Jan 2006 23:43:52 -0500
>
> > Thats not all either,  ./net/ipv4/netfilter/ipt_helper.c has the same
> > error and the same fix.
> >
> > Here's the patch for this one.  Sorry for the dupe.. i sent the last
> > as html by accident.
>
> Applied, please provide a "Signed-off-by:" line with your patch
> next time.
>
> Thanks.
>

Why not then, we'll do this one as well since it needs it.

Signed-off-by: Joe Kappus <[EMAIL PROTECTED]>

--- ./net/ipv4/netfilter/ip_conntrack_proto_sctp.c.old  2006-01-06 22:27:08.885583023 -0500
+++ ./net/ipv4/netfilter/ip_conntrack_proto_sctp.c  2006-01-06 22:27:44.606582972 -0500
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 


ax25/mkiss: unbalanced spinlock_bh in ax_encaps()

2006-01-06 Thread Francois Romieu
The unlocking disappeared during commit
5793f4be23f0171b4999ca68a39a9157b44139f3.

Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>

diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c
index 3e9accf..41b3d83 100644
--- a/drivers/net/hamradio/mkiss.c
+++ b/drivers/net/hamradio/mkiss.c
@@ -524,6 +524,7 @@ static void ax_encaps(struct net_device 
ax->dev->trans_start = jiffies;
ax->xleft = count - actual;
ax->xhead = ax->xbuff + actual;
+   spin_unlock_bh(&ax->buflock);
 }
 
 /* Encapsulate an AX.25 packet and kick it into a TTY queue. */


Re: [NETFILTER 00/10]: Netfilter IPsec support

2006-01-06 Thread YOSHIFUJI Hideaki / 吉藤英明
In article <[EMAIL PROTECTED]> (at Sat,  7 Jan 2006 02:09:30 +0100 (MET)), 
Patrick McHardy <[EMAIL PROTECTED]> says:

> following are the remaining patches for netfilter IPsec support.
> They are missing the common-case optimization for inner transport mode
> SAs on the input path, but since its just an optimization, I think
> it can also be done later. One note: unfortunately I had to increase

I definitely want to do it before 2.6.16.
Anyway, we'll test this series of patches.  Thank you.

--yoshfuji


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Andi Kleen
On Saturday 07 January 2006 01:17, David S. Miller wrote:

> 
> I mean something like this patch:

Looks like a good idea to me.

I always disliked the per chain spinlocks even for other hash tables like
TCP/UDP multiplex - it would be much nicer to use a much smaller separately 
hashed lock table and save cache. In this case the special case of using
a one entry only lock hash table makes sense.

-Andi
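
A small, separately hashed lock table can be sketched in userspace C as
follows. This is only an illustration under stated assumptions, not the kernel
code discussed in this thread: HASH_BUCKETS, LOCK_TABLE_SIZE, lock_index(),
bucket_add() and bucket_get() are hypothetical names, and a C11 atomic flag
stands in for a kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define HASH_BUCKETS    1024u  /* hypothetical hash table size (power of two) */
#define LOCK_TABLE_SIZE 16u    /* hypothetical lock table size (power of two) */

/* Buckets carry no lock of their own, so they stay small. */
struct bucket {
    int value;                 /* stand-in for a hash chain head */
};

static struct bucket table[HASH_BUCKETS];
/* Static atomics are zero-initialized, i.e. all locks start unlocked. */
static atomic_bool locks[LOCK_TABLE_SIZE];

/* Map a bucket index onto the much smaller lock table. */
static size_t lock_index(size_t bucket_idx)
{
    return bucket_idx & (LOCK_TABLE_SIZE - 1);
}

static void bucket_lock(size_t bucket_idx)
{
    atomic_bool *l = &locks[lock_index(bucket_idx)];

    /* Spin until we are the one who flips the flag from false to true. */
    while (atomic_exchange_explicit(l, true, memory_order_acquire))
        ;
}

static void bucket_unlock(size_t bucket_idx)
{
    atomic_store_explicit(&locks[lock_index(bucket_idx)], false,
                          memory_order_release);
}

/* Update one bucket while holding the lock that covers it. */
static void bucket_add(size_t bucket_idx, int delta)
{
    assert(bucket_idx < HASH_BUCKETS);
    bucket_lock(bucket_idx);
    table[bucket_idx].value += delta;
    bucket_unlock(bucket_idx);
}

static int bucket_get(size_t bucket_idx)
{
    int v;

    assert(bucket_idx < HASH_BUCKETS);
    bucket_lock(bucket_idx);
    v = table[bucket_idx].value;
    bucket_unlock(bucket_idx);
    return v;
}
```

Setting LOCK_TABLE_SIZE to 1 in this sketch gives the one-entry special case:
every bucket then maps to the same lock.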


Re: [PATCH] Endian-annotate struct iphdr

2006-01-06 Thread Al Viro
On Fri, Jan 06, 2006 at 01:25:03PM -0800, David S. Miller wrote:
> From: Alexey Dobriyan <[EMAIL PROTECTED]>
> Date: Fri, 6 Jan 2006 23:18:37 +0300
> 
> > And fix trivial warnings that emerged.
> > 
> > Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
> 
> Applied.

OK, will merge...  I've actually got way past that point (morning
snapshot is on ftp.linux.org.uk/pub/people/viro/net-endian-mbox)
and I hope to finish the bulk of net/* tonight.  It still needs
reordering and merging some of the chunks, though.


Re: [PATCH] Localizing a variable in net/core/filter.c

2006-01-06 Thread Kris Katterjohn
From: David S. Miller
Sent: 1/6/2006 5:29:20 PM
> From: "Kris Katterjohn" <[EMAIL PROTECTED]>
> Date: Fri, 6 Jan 2006 17:25:32 -0800
> 
> > So the whole thing is wrong? If so, I guess I understand why it was
> > done the way it was before.
> 
> It's using the local variable in the parent function as a temporary
> scratch area if the SKB isn't linear and we need to copy the packet
> data out from the scatter-gather list of the SKB.
> 
> Read the implementation of skb_header_pointer() and be confused
> no further.

Okay.. the last horse finally crossed the finish line. Thanks.



Re: [PATCH] Localizing a variable in net/core/filter.c

2006-01-06 Thread David S. Miller
From: "Kris Katterjohn" <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 17:25:32 -0800

> So the whole thing is wrong? If so, I guess I understand why it was
> done the way it was before.

It's using the local variable in the parent function as a temporary
scratch area if the SKB isn't linear and we need to copy the packet
data out from the scatter-gather list of the SKB.

Read the implementation of skb_header_pointer() and be confused
no further.
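
The scratch-buffer pattern described above can be sketched in userspace C.
This is an illustration only, not the kernel implementation: struct fake_pkt,
pkt_copy_bits() and pkt_header_pointer() are hypothetical stand-ins for an
sk_buff with one fragment, skb_copy_bits() and skb_header_pointer().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff: a linear head plus one fragment. */
struct fake_pkt {
    const unsigned char *head;
    size_t head_len;
    const unsigned char *frag;
    size_t frag_len;
};

/* Copy len bytes starting at offset, crossing from head into frag if needed.
 * Returns 0 on success, -1 if the range lies outside the packet. */
static int pkt_copy_bits(const struct fake_pkt *p, size_t offset,
                         void *to, size_t len)
{
    unsigned char *dst = to;
    size_t total = p->head_len + p->frag_len;
    size_t i;

    if (offset > total || len > total - offset)
        return -1;
    for (i = 0; i < len; i++) {
        size_t pos = offset + i;

        dst[i] = pos < p->head_len ? p->head[pos]
                                   : p->frag[pos - p->head_len];
    }
    return 0;
}

/* Return a pointer to len contiguous bytes at offset.
 * If the bytes are already contiguous in the linear head, point straight at
 * them. Otherwise copy them into the caller's buffer and return that. */
static const void *pkt_header_pointer(const struct fake_pkt *p, size_t offset,
                                      size_t len, void *buffer)
{
    if (offset <= p->head_len && len <= p->head_len - offset)
        return p->head + offset;
    if (pkt_copy_bits(p, offset, buffer, len) < 0)
        return NULL;
    return buffer;
}
```

Because the returned pointer may be the buffer itself, the buffer has to
outlive the call. In this sketch that is why the caller owns it, as
sk_run_filter() does with tmp.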


Re: [PATCH] Localizing a variable in net/core/filter.c

2006-01-06 Thread Kris Katterjohn
From: Patrick McHardy
Sent: 1/6/2006 5:20:44 PM
> > -static inline void *load_pointer(struct sk_buff *skb, int k,
> > - unsigned int size, void *buffer)
> > +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size)
> >  {
> > -   if (k >= 0)
> > -   return skb_header_pointer(skb, k, size, buffer);
> > -   else {
> > +   if (k >= 0) {
> > +   u32 buffer;
> > +   return skb_header_pointer(skb, k, size, &buffer);
> 
> This is also wrong, now you returning an address from load_pointer's
> stackframe.

So the whole thing is wrong? If so, I guess I understand why it was done the
way it was before.

Shouldn't gcc warn about this kind of thing?



Re: [PATCH] Localizing a variable in net/core/filter.c

2006-01-06 Thread Patrick McHardy

Kris Katterjohn wrote:
> From: Patrick McHardy
> Sent: 1/6/2006 5:12:33 PM
> > > -static inline void *load_pointer(struct sk_buff *skb, int k,
> > > - unsigned int size, void *buffer)
> > > +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size)
> > >  {
> > > -   if (k >= 0)
> > > +   if (k >= 0) {
> > > +   u32 *buffer = NULL;
> > > return skb_header_pointer(skb, k, size, buffer);
> >
> > This is wrong, skb_header_pointer needs a pointer to a buffer
> > to which it can copy the packet contents if they are located
> > in the non-linear area.
>
> Ah, gotcha.
>
> --- x/net/core/filter.c 2006-01-06 19:14:34.0 -0600
> +++ y/net/core/filter.c 2006-01-06 19:14:26.0 -0600
> @@ -51,12 +51,12 @@ static void *__load_pointer(struct sk_bu
> return NULL;
>  }
>
> -static inline void *load_pointer(struct sk_buff *skb, int k,
> - unsigned int size, void *buffer)
> +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size)
>  {
> -   if (k >= 0)
> -   return skb_header_pointer(skb, k, size, buffer);
> -   else {
> +   if (k >= 0) {
> +   u32 buffer;
> +   return skb_header_pointer(skb, k, size, &buffer);

This is also wrong, now you returning an address from load_pointer's
stackframe.


Re: [PATCH] Localizing a variable in net/core/filter.c

2006-01-06 Thread Kris Katterjohn
From: Patrick McHardy
Sent: 1/6/2006 5:12:33 PM
> > -static inline void *load_pointer(struct sk_buff *skb, int k,
> > - unsigned int size, void *buffer)
> > +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size)
> >  {
> > -   if (k >= 0)
> > +   if (k >= 0) {
> > +   u32 *buffer = NULL;
> > return skb_header_pointer(skb, k, size, buffer);
> 
> This is wrong, skb_header_pointer needs a pointer to a buffer
> to which it can copy the packet contents if they are located
> in the non-linear area.

Ah, gotcha.

--- x/net/core/filter.c 2006-01-06 19:14:34.0 -0600
+++ y/net/core/filter.c 2006-01-06 19:14:26.0 -0600
@@ -51,12 +51,12 @@ static void *__load_pointer(struct sk_bu
return NULL;
 }
 
-static inline void *load_pointer(struct sk_buff *skb, int k,
- unsigned int size, void *buffer)
+static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size)
 {
-   if (k >= 0)
-   return skb_header_pointer(skb, k, size, buffer);
-   else {
+   if (k >= 0) {
+   u32 buffer;
+   return skb_header_pointer(skb, k, size, &buffer);
+   } else {
if (k >= SKF_AD_OFF)
return NULL;
return __load_pointer(skb, k);
@@ -82,7 +82,6 @@ unsigned int sk_run_filter(struct sk_buf
u32 A = 0;  /* Accumulator */
u32 X = 0;  /* Index Register */
u32 mem[BPF_MEMWORDS];  /* Scratch Memory Store */
-   u32 tmp;
int k;
int pc;
 
@@ -176,7 +175,7 @@ unsigned int sk_run_filter(struct sk_buf
case BPF_LD|BPF_W|BPF_ABS:
k = fentry->k;
  load_w:
-   ptr = load_pointer(skb, k, 4, &tmp);
+   ptr = load_pointer(skb, k, 4);
if (ptr != NULL) {
A = ntohl(*(u32 *)ptr);
continue;
@@ -185,7 +184,7 @@ unsigned int sk_run_filter(struct sk_buf
case BPF_LD|BPF_H|BPF_ABS:
k = fentry->k;
  load_h:
-   ptr = load_pointer(skb, k, 2, &tmp);
+   ptr = load_pointer(skb, k, 2);
if (ptr != NULL) {
A = ntohs(*(u16 *)ptr);
continue;
@@ -194,7 +193,7 @@ unsigned int sk_run_filter(struct sk_buf
case BPF_LD|BPF_B|BPF_ABS:
k = fentry->k;
 load_b:
-   ptr = load_pointer(skb, k, 1, &tmp);
+   ptr = load_pointer(skb, k, 1);
if (ptr != NULL) {
A = *(u8 *)ptr;
continue;
@@ -216,7 +215,7 @@ load_b:
k = X + fentry->k;
goto load_b;
case BPF_LDX|BPF_B|BPF_MSH:
-   ptr = load_pointer(skb, fentry->k, 1, &tmp);
+   ptr = load_pointer(skb, fentry->k, 1);
if (ptr != NULL) {
X = (*(u8 *)ptr & 0xf) << 2;
continue;




Re: [PATCH] Localizing a variable in net/core/filter.c

2006-01-06 Thread Patrick McHardy

Kris Katterjohn wrote:
> This localizes a variable to the function it's used in.
>
> Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]>
>
> I assume tmp was used for a reason instead of using a variable local to the if()
> in load_pointer(), but I can't figure out why. So I wrote this patch changing it
> in case it was just a mistake or something left over from something else.
>
> So in other words, can you explain to me why it was done the way it was done? If
> not, I think my patch takes care of it.
>
> Also, I tested it my way and everything seems to be working quite well.
>
> Thanks!
>
> --- x/net/core/filter.c 2006-01-06 16:51:51.0 -0600
> +++ y/net/core/filter.c 2006-01-06 18:17:43.0 -0600
> @@ -51,12 +51,12 @@ static void *__load_pointer(struct sk_bu
> return NULL;
>  }
>
> -static inline void *load_pointer(struct sk_buff *skb, int k,
> - unsigned int size, void *buffer)
> +static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size)
>  {
> -   if (k >= 0)
> +   if (k >= 0) {
> +   u32 *buffer = NULL;
> return skb_header_pointer(skb, k, size, buffer);

This is wrong, skb_header_pointer needs a pointer to a buffer
to which it can copy the packet contents if they are located
in the non-linear area.


[PATCH 3/6] sk98lin: error handling on dual port board

2006-01-06 Thread Stephen Hemminger
Sk98lin driver error recovery on two-port boards is broken.
If the second allocation fails, it does not release resources
properly. It also stores the second port's device in the PCI driver data.

If the second port fails, we might as well continue with one port.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- sk98lin.orig/drivers/net/sk98lin/skge.c
+++ sk98lin/drivers/net/sk98lin/skge.c
@@ -4899,15 +4899,17 @@ static int __devinit skge_probe_one(stru
 
boards_found++;
 
+   pci_set_drvdata(pdev, dev);
+
/* More then one port found */
if ((pAC->GIni.GIMacsFound == 2 ) && (pAC->RlmtNets == 2)) {
-   if ((dev = alloc_etherdev(sizeof(DEV_NET))) == 0) {
-   printk(KERN_ERR "Unable to allocate etherdev "
+   dev = alloc_etherdev(sizeof(DEV_NET));
+   if (!dev) {
+   printk(KERN_ERR "sk98lin: unable to allocate etherdev "
"structure!\n");
-   goto out;
+   goto single_port;
}
 
-   pAC->dev[1]   = dev;
pNet  = netdev_priv(dev);
pNet->PortNr  = 1;
pNet->NetNr   = 1;
@@ -4939,20 +4941,25 @@ static int __devinit skge_probe_one(stru
if (using_dac)
dev->features |= NETIF_F_HIGHDMA;
 
-   if (register_netdev(dev)) {
-   printk(KERN_ERR "sk98lin: Could not register device for seconf port.\n");
+   error = register_netdev(dev);
+   if (error) {
+   printk(KERN_ERR "sk98lin: Could not register device"
+  " for second port. (%d)\n", error);
free_netdev(dev);
-   pAC->dev[1] = pAC->dev[0];
-   } else {
-   memcpy(&dev->dev_addr,
-   &pAC->Addr.Net[1].CurrentMacAddress, 6);
-   memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
-   
-   printk("%s: %s\n", dev->name, DeviceStr);
-   printk("  PrefPort:B  RlmtMode:Dual Check Link State\n");
+   goto single_port;
}
+
+   pAC->dev[1]   = dev;
+   memcpy(&dev->dev_addr,
+  &pAC->Addr.Net[1].CurrentMacAddress, 6);
+   memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
+
+   printk("%s: %s\n", dev->name, DeviceStr);
+   printk("  PrefPort:B  RlmtMode:Dual Check Link State\n");
}
 
+single_port:
+
/* Save the hardware revision */
pAC->HWRevision = (((pAC->GIni.GIPciHwRev >> 4) & 0x0F)*10) +
(pAC->GIni.GIPciHwRev & 0x0F);
@@ -4964,7 +4971,6 @@ static int __devinit skge_probe_one(stru
memset(&pAC->PnmiBackup, 0, sizeof(SK_PNMI_STRUCT_DATA));
memcpy(&pAC->PnmiBackup, &pAC->PnmiStruct, sizeof(SK_PNMI_STRUCT_DATA));
 
-   pci_set_drvdata(pdev, dev);
return 0;
 
  out_free_resources:

--
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger



[PATCH 4/6] sk98lin: use kzalloc

2006-01-06 Thread Stephen Hemminger
Trivial use of kzalloc.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- sk98lin.orig/drivers/net/sk98lin/skge.c
+++ sk98lin/drivers/net/sk98lin/skge.c
@@ -4807,14 +4807,13 @@ static int __devinit skge_probe_one(stru
}
 
pNet = netdev_priv(dev);
-   pNet->pAC = kmalloc(sizeof(SK_AC), GFP_KERNEL);
+   pNet->pAC = kzalloc(sizeof(SK_AC), GFP_KERNEL);
if (!pNet->pAC) {
printk(KERN_ERR "Unable to allocate adapter "
   "structure!\n");
goto out_free_netdev;
}
 
-   memset(pNet->pAC, 0, sizeof(SK_AC));
pAC = pNet->pAC;
pAC->PciDev = pdev;
 

--
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
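
For readers less familiar with kzalloc(): the change above replaces an
allocate-then-memset pair with a single zeroing allocation. A rough userspace
analogy, assuming calloc() as the stand-in for kzalloc() and a hypothetical
struct adapter in place of SK_AC, looks like this:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the adapter context structure. */
struct adapter {
    int index;
    void *io_base;
    char name[16];
};

/* Old style: allocate, check, then clear in a separate step. */
static struct adapter *adapter_alloc_two_step(void)
{
    struct adapter *a = malloc(sizeof(*a));

    if (!a)
        return NULL;
    memset(a, 0, sizeof(*a));
    return a;
}

/* New style: one call returns zeroed memory (or NULL on failure). */
static struct adapter *adapter_alloc_zeroed(void)
{
    return calloc(1, sizeof(struct adapter));
}
```

Both helpers hand back the same all-zero object; the second simply cannot
forget the memset.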



[PATCH 6/6] sk98lin: error handling of pci setup

2006-01-06 Thread Stephen Hemminger
Don't enable the PCI device twice (it is already done in the probe
routine).  Propagate the error codes from pci_request_regions
back to initial probing.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- sk98lin.orig/drivers/net/sk98lin/skge.c
+++ sk98lin/drivers/net/sk98lin/skge.c
@@ -292,17 +292,12 @@ static __devinit int SkGeInitPCI(SK_AC *
struct pci_dev *pdev = pAC->PciDev;
int retval;
 
-   if (pci_enable_device(pdev) != 0) {
-   return 1;
-   }
-
dev->mem_start = pci_resource_start (pdev, 0);
pci_set_master(pdev);
 
-   if (pci_request_regions(pdev, "sk98lin") != 0) {
-   retval = 2;
-   goto out_disable;
-   }
+   retval = pci_request_regions(pdev, "sk98lin");
+   if (retval)
+   goto out;
 
 #ifdef SK_BIG_ENDIAN
/*
@@ -321,9 +316,8 @@ static __devinit int SkGeInitPCI(SK_AC *
 * Remap the regs into kernel space.
 */
pAC->IoBase = ioremap_nocache(dev->mem_start, 0x4000);
-
-   if (!pAC->IoBase){
-   retval = 3;
+   if (!pAC->IoBase) {
+   retval = -EIO;
goto out_release;
}
 
@@ -331,8 +325,7 @@ static __devinit int SkGeInitPCI(SK_AC *
 
  out_release:
pci_release_regions(pdev);
- out_disable:
-   pci_disable_device(pdev);
+ out:
return retval;
 }
 

--
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger



[PATCH 1/6] sk98lin: routine called from probe marked __init

2006-01-06 Thread Stephen Hemminger
Sk98lin driver has a routine marked __init that is called from
the probe code. If using pci hotplug, this could be called after
the initialization so it needs to be marked __devinit. 
So if you hot added a sk98lin board, the kernel would crash.
I don't have hot plug hardware to actually try this feat.

Also, there are two routines, only called from SkGeBoardInit that can
be marked __devinit.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- sk98lin.orig/drivers/net/sk98lin/skge.c
+++ sk98lin/drivers/net/sk98lin/skge.c
@@ -282,10 +282,11 @@ SK_U32 Val)   /* pointer to store the rea
  * Description:
  * This function initialize the PCI resources and IO
  *
- * Returns: N/A
- * 
+ * Returns:
+ * 0 - indicate everything worked ok.
+ * != 0 - error indication
  */
-int SkGeInitPCI(SK_AC *pAC)
+static __devinit int SkGeInitPCI(SK_AC *pAC)
 {
struct SK_NET_DEVICE *dev = pAC->dev[0];
struct pci_dev *pdev = pAC->PciDev;
@@ -492,7 +493,7 @@ module_param_array(AutoSizing, charp, NU
  * 0, if everything is ok
  * !=0, on error
  */
-static int __init SkGeBoardInit(struct SK_NET_DEVICE *dev, SK_AC *pAC)
+static int __devinit SkGeBoardInit(struct SK_NET_DEVICE *dev, SK_AC *pAC)
 {
 short  i;
 unsigned long Flags;
@@ -633,8 +634,7 @@ SK_BOOL DualNet;
  * SK_TRUE, if all memory could be allocated
  * SK_FALSE, if not
  */
-static SK_BOOL BoardAllocMem(
-SK_AC  *pAC)
+static __devinit SK_BOOL BoardAllocMem(SK_AC   *pAC)
 {
 caddr_tpDescrMem;  /* pointer to descriptor memory area */
 size_t AllocLength;/* length of complete descriptor area */
@@ -727,8 +727,7 @@ size_t  AllocLength;/* length of comple
  *
  * Returns:N/A
  */
-static void BoardInitMem(
-SK_AC  *pAC)   /* pointer to adapter context */
+static __devinit void BoardInitMem(SK_AC *pAC)
 {
 inti;  /* loop counter */
 intRxDescrSize;/* the size of a rx descriptor rounded up to alignment*/

--
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger



[PATCH 0/6] sk98lin:

2006-01-06 Thread Stephen Hemminger
After fixing skge/sky2 for 64 bit DMA, examination of sk98lin
showed similar bugs. Once again, I don't want to get into a massive
cleanup fest of the sk98lin driver, but there are some real issues
here that users might see.

--
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger



[PATCH 5/6] sk98lin: error handling on probe

2006-01-06 Thread Stephen Hemminger
The sk98lin driver doesn't do proper error number handling
during initialization. Note: -EAGAIN is a bogus return value for
hardware errors.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- sk98lin.orig/drivers/net/sk98lin/skge.c
+++ sk98lin/drivers/net/sk98lin/skge.c
@@ -530,7 +530,7 @@ SK_BOOL DualNet;
if (SkGeInit(pAC, pAC->IoBase, SK_INIT_DATA) != 0) {
printk("HWInit (0) failed.\n");
spin_unlock_irqrestore(&pAC->SlowPathLock, Flags);
-   return(-EAGAIN);
+   return -EIO;
}
SkI2cInit(  pAC, pAC->IoBase, SK_INIT_DATA);
SkEventInit(pAC, pAC->IoBase, SK_INIT_DATA);
@@ -552,7 +552,7 @@ SK_BOOL DualNet;
if (SkGeInit(pAC, pAC->IoBase, SK_INIT_IO) != 0) {
printk("sk98lin: HWInit (1) failed.\n");
spin_unlock_irqrestore(&pAC->SlowPathLock, Flags);
-   return(-EAGAIN);
+   return -EIO;
}
SkI2cInit(  pAC, pAC->IoBase, SK_INIT_IO);
SkEventInit(pAC, pAC->IoBase, SK_INIT_IO);
@@ -584,20 +584,20 @@ SK_BOOL   DualNet;
} else {
printk(KERN_WARNING "sk98lin: Illegal number of ports: %d\n",
   pAC->GIni.GIMacsFound);
-   return -EAGAIN;
+   return -EIO;
}
 
if (Ret) {
printk(KERN_WARNING "sk98lin: Requested IRQ %d is busy.\n",
   dev->irq);
-   return -EAGAIN;
+   return Ret;
}
pAC->AllocFlag |= SK_ALLOC_IRQ;
 
/* Alloc memory for this board (Mem for RxD/TxD) : */
if(!BoardAllocMem(pAC)) {
printk("No memory for descriptor rings.\n");
-   return(-EAGAIN);
+   return -ENOMEM;
}
 
BoardInitMem(pAC);
@@ -613,7 +613,7 @@ SK_BOOL DualNet;
DualNet)) {
BoardFreeMem(pAC);
printk("sk98lin: SkGeInitAssignRamToQueues failed.\n");
-   return(-EAGAIN);
+   return -EIO;
}
 
return (0);
@@ -4800,8 +4800,10 @@ static int __devinit skge_probe_one(stru
}
}
 
-   if ((dev = alloc_etherdev(sizeof(DEV_NET))) == NULL) {
-   printk(KERN_ERR "Unable to allocate etherdev "
+   error = -ENOMEM;
+   dev = alloc_etherdev(sizeof(DEV_NET));
+   if (!dev) {
+   printk(KERN_ERR "sk98lin: unable to allocate etherdev "
   "structure!\n");
goto out_disable_device;
}
@@ -4809,7 +4811,7 @@ static int __devinit skge_probe_one(stru
pNet = netdev_priv(dev);
pNet->pAC = kzalloc(sizeof(SK_AC), GFP_KERNEL);
if (!pNet->pAC) {
-   printk(KERN_ERR "Unable to allocate adapter "
+   printk(KERN_ERR "sk98lin: unable to allocate adapter "
   "structure!\n");
goto out_free_netdev;
}
@@ -4822,6 +4824,7 @@ static int __devinit skge_probe_one(stru
pAC->CheckQueue = SK_FALSE;
 
dev->irq = pdev->irq;
+
error = SkGeInitPCI(pAC);
if (error) {
printk(KERN_ERR "sk98lin: PCI setup failed: %i\n", error);
@@ -4861,17 +4864,20 @@ static int __devinit skge_probe_one(stru
 
pAC->Index = boards_found++;
 
-   if (SkGeBoardInit(dev, pAC))
+   error = SkGeBoardInit(dev, pAC);
+   if (error)
goto out_free_netdev;
 
/* Read Adapter name from VPD */
if (ProductStr(pAC, DeviceStr, sizeof(DeviceStr)) != 0) {
+   error = -EIO;
printk(KERN_ERR "sk98lin: Could not read VPD data.\n");
goto out_free_resources;
}
 
/* Register net device */
-   if (register_netdev(dev)) {
+   error = register_netdev(dev);
+   if (error) {
printk(KERN_ERR "sk98lin: Could not register device.\n");
goto out_free_resources;
}

--
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
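
The error-number convention this patch moves toward can be sketched in
userspace C: each failure site returns a specific negative errno value, and
the caller passes it on unchanged instead of collapsing everything into one
code. The names struct board, board_init() and probe_one() are hypothetical
and only loosely modelled on the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-board state. */
struct board {
    int hw_ok;          /* nonzero if the simulated hardware init works */
    size_t ring_bytes;  /* size of the descriptor ring to allocate */
    void *ring;
};

/* Return 0 on success or a negative errno describing what went wrong. */
static int board_init(struct board *b)
{
    if (!b->hw_ok)
        return -EIO;        /* hardware failure, not "try again" */

    b->ring = malloc(b->ring_bytes);
    if (!b->ring)
        return -ENOMEM;     /* allocation failure */

    return 0;
}

/* The probe path keeps the callee's error code instead of replacing it. */
static int probe_one(struct board *b)
{
    int error = board_init(b);

    if (error)
        return error;
    return 0;
}
```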



[PATCH 2/6] sk98lin: not doing high dma properly

2006-01-06 Thread Stephen Hemminger
Sk98lin 64-bit memory handling is wrong. It doesn't set the
highdma flag, i.e. the kernel always uses bounce buffers.
It doesn't fall back to a 32-bit mask if it can't get a 64-bit mask.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- sk98lin.orig/drivers/net/sk98lin/skge.c
+++ sk98lin/drivers/net/sk98lin/skge.c
@@ -4775,16 +4775,30 @@ static int __devinit skge_probe_one(stru
struct net_device   *dev = NULL;
static int boards_found = 0;
int error = -ENODEV;
+   int using_dac = 0;
char DeviceStr[80];
 
if (pci_enable_device(pdev))
goto out;
  
/* Configure DMA attributes. */
-   if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) &&
-   pci_set_dma_mask(pdev, DMA_32BIT_MASK))
-   goto out_disable_device;
-
+   if (sizeof(dma_addr_t) > sizeof(u32) &&
+   !(error = pci_set_dma_mask(pdev, DMA_64BIT_MASK))) {
+   using_dac = 1;
+   error = pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
+   if (error < 0) {
+   printk(KERN_ERR "sk98lin %s unable to obtain 64 bit DMA "
+  "for consistent allocations\n", pci_name(pdev));
+   goto out_disable_device;
+   }
+   } else {
+   error = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
+   if (error) {
+   printk(KERN_ERR "sk98lin %s no usable DMA configuration\n",
+  pci_name(pdev));
+   goto out_disable_device;
+   }
+   }
 
if ((dev = alloc_etherdev(sizeof(DEV_NET))) == NULL) {
printk(KERN_ERR "Unable to allocate etherdev "
@@ -4843,6 +4857,9 @@ static int __devinit skge_probe_one(stru
 #endif
}
 
+   if (using_dac)
+   dev->features |= NETIF_F_HIGHDMA;
+
pAC->Index = boards_found++;
 
if (SkGeBoardInit(dev, pAC))
@@ -4919,6 +4936,9 @@ static int __devinit skge_probe_one(stru
 #endif
}
 
+   if (using_dac)
+   dev->features |= NETIF_F_HIGHDMA;
+
if (register_netdev(dev)) {
printk(KERN_ERR "sk98lin: Could not register device for seconf port.\n");
free_netdev(dev);

--
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
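
The try-64-bit-then-fall-back-to-32-bit logic can be sketched as a small
userspace simulation. Everything here is hypothetical (struct fake_dev,
fake_set_dma_mask(), configure_dma()); it only models the control flow, and
leaves out the sizeof(dma_addr_t) check and the consistent-mask call that the
patch also performs.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_64BIT UINT64_MAX
#define MASK_32BIT UINT64_C(0xffffffff)

/* Hypothetical device: accepts any mask up to max_supported_mask. */
struct fake_dev {
    uint64_t max_supported_mask;
    uint64_t dma_mask;
};

/* Simulated mask setter: 0 on success, -1 if the mask is not supported. */
static int fake_set_dma_mask(struct fake_dev *d, uint64_t mask)
{
    if (mask > d->max_supported_mask)
        return -1;
    d->dma_mask = mask;
    return 0;
}

/* Prefer 64-bit DMA and remember that choice in *using_dac so the caller
 * can later advertise high-DMA capability; otherwise fall back to 32-bit.
 * Returns 0 on success, nonzero if no usable configuration exists. */
static int configure_dma(struct fake_dev *d, int *using_dac)
{
    *using_dac = 0;
    if (fake_set_dma_mask(d, MASK_64BIT) == 0) {
        *using_dac = 1;
        return 0;
    }
    return fake_set_dma_mask(d, MASK_32BIT);
}
```

In the patch, the equivalent of using_dac is what later decides whether
NETIF_F_HIGHDMA is set on the net device.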



[PATCH] Localizing a variable in net/core/filter.c

2006-01-06 Thread Kris Katterjohn
This localizes a variable to the function it's used in.

Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]>

I assume tmp was used for a reason instead of using a variable local to the if()
in load_pointer(), but I can't figure out why. So I wrote this patch changing it
in case it was just a mistake or something left over from something else.

So in other words, can you explain to me why it was done the way it was done? If
not, I think my patch takes care of it.

Also, I tested it my way and everything seems to be working quite well.

Thanks!

--- x/net/core/filter.c 2006-01-06 16:51:51.0 -0600
+++ y/net/core/filter.c 2006-01-06 18:17:43.0 -0600
@@ -51,12 +51,12 @@ static void *__load_pointer(struct sk_bu
return NULL;
 }
 
-static inline void *load_pointer(struct sk_buff *skb, int k,
- unsigned int size, void *buffer)
+static inline void *load_pointer(struct sk_buff *skb, int k, unsigned int size)
 {
-   if (k >= 0)
+   if (k >= 0) {
+   u32 *buffer = NULL;
return skb_header_pointer(skb, k, size, buffer);
-   else {
+   } else {
if (k >= SKF_AD_OFF)
return NULL;
return __load_pointer(skb, k);
@@ -82,7 +82,6 @@ unsigned int sk_run_filter(struct sk_buf
u32 A = 0;  /* Accumulator */
u32 X = 0;  /* Index Register */
u32 mem[BPF_MEMWORDS];  /* Scratch Memory Store */
-   u32 tmp;
int k;
int pc;
 
@@ -176,7 +175,7 @@ unsigned int sk_run_filter(struct sk_buf
case BPF_LD|BPF_W|BPF_ABS:
k = fentry->k;
  load_w:
-   ptr = load_pointer(skb, k, 4, &tmp);
+   ptr = load_pointer(skb, k, 4);
if (ptr != NULL) {
A = ntohl(*(u32 *)ptr);
continue;
@@ -185,7 +184,7 @@ unsigned int sk_run_filter(struct sk_buf
case BPF_LD|BPF_H|BPF_ABS:
k = fentry->k;
  load_h:
-   ptr = load_pointer(skb, k, 2, &tmp);
+   ptr = load_pointer(skb, k, 2);
if (ptr != NULL) {
A = ntohs(*(u16 *)ptr);
continue;
@@ -194,7 +193,7 @@ unsigned int sk_run_filter(struct sk_buf
case BPF_LD|BPF_B|BPF_ABS:
k = fentry->k;
 load_b:
-   ptr = load_pointer(skb, k, 1, &tmp);
+   ptr = load_pointer(skb, k, 1);
if (ptr != NULL) {
A = *(u8 *)ptr;
continue;
@@ -216,7 +215,7 @@ load_b:
k = X + fentry->k;
goto load_b;
case BPF_LDX|BPF_B|BPF_MSH:
-   ptr = load_pointer(skb, fentry->k, 1, &tmp);
+   ptr = load_pointer(skb, fentry->k, 1);
if (ptr != NULL) {
X = (*(u8 *)ptr & 0xf) << 2;
continue;
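
On the question of why the callers pass &tmp: as far as I know,
skb_header_pointer() only touches the caller's buffer when the requested
bytes are not contiguous in the linear part of the skb. In that case it
copies them into the buffer and returns the buffer itself. The userspace
sketch below models that behaviour. struct fake_skb, fake_copy_bits() and
fake_header_pointer() are hypothetical stand-ins, not kernel APIs. Under
this reading, a NULL buffer (as in the patch above) would only be safe
for packets whose headers are fully linear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a linear head plus one fragment. */
struct fake_skb {
	const uint8_t *head;	/* linear data */
	size_t head_len;
	const uint8_t *frag;	/* paged data that follows the head */
	size_t frag_len;
};

/* Copy len bytes starting at offset, walking across head and fragment. */
static int fake_copy_bits(const struct fake_skb *skb, size_t offset,
			  void *to, size_t len)
{
	size_t total = skb->head_len + skb->frag_len;
	uint8_t *out = to;
	size_t i;

	if (offset > total || len > total - offset)
		return -1;
	for (i = 0; i < len; i++) {
		size_t pos = offset + i;

		out[i] = pos < skb->head_len ? skb->head[pos]
					     : skb->frag[pos - skb->head_len];
	}
	return 0;
}

static const void *fake_header_pointer(const struct fake_skb *skb,
				       size_t offset, size_t len, void *buffer)
{
	/* Fast path: the bytes are contiguous in the head, no copy needed. */
	if (offset <= skb->head_len && skb->head_len - offset >= len)
		return skb->head + offset;

	/*
	 * Slow path: the data straddles the fragment, so caller-provided
	 * storage is required. This model refuses a NULL buffer; to my
	 * knowledge the kernel helper has no such check and would simply
	 * write through the pointer.
	 */
	if (buffer == NULL || fake_copy_bits(skb, offset, buffer, len) < 0)
		return NULL;
	return buffer;
}
```

In sk_run_filter() the single u32 tmp is large enough for every load size
used (1, 2 or 4 bytes), which would explain why one variable is shared by
all the call sites.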




Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread David S. Miller
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 21:57:41 +0100

> Perhaps a better way would be to just exclude dst entries in RCU state
> from the normal accounting and assume that if the system
> really runs short of memory because of this the results would
> trigger quiescent states more quickly, freeing the memory again.

That's one idea...

Eric, how important do you honestly think the per-hashchain spinlocks
are?  That's the big barrier to making rt_secret_rebuild() a simple
rehash instead of flushing the whole table as it does now.

The lock is only grabbed for updates, and the access to these locks is
random and as such probably non-local when taken anyways.  Back before
we used RCU for reads, this array-of-spinlock thing made a lot more
sense.

I mean something like this patch:

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f701a13..f9436c7 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -204,36 +204,8 @@ __u8 ip_tos2prio[16] = {
 struct rt_hash_bucket {
struct rtable   *chain;
 };
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-/*
- * Instead of using one spinlock for each rt_hash_bucket, we use a table of spinlocks
- * The size of this table is a power of two and depends on the number of CPUS.
- */
-#if NR_CPUS >= 32
-#define RT_HASH_LOCK_SZ4096
-#elif NR_CPUS >= 16
-#define RT_HASH_LOCK_SZ2048
-#elif NR_CPUS >= 8
-#define RT_HASH_LOCK_SZ1024
-#elif NR_CPUS >= 4
-#define RT_HASH_LOCK_SZ512
-#else
-#define RT_HASH_LOCK_SZ256
-#endif
 
-static spinlock_t  *rt_hash_locks;
-# define rt_hash_lock_addr(slot) &rt_hash_locks[(slot) & (RT_HASH_LOCK_SZ - 1)]
-# define rt_hash_lock_init()   { \
-   int i; \
-   rt_hash_locks = kmalloc(sizeof(spinlock_t) * RT_HASH_LOCK_SZ, GFP_KERNEL); \
-   if (!rt_hash_locks) panic("IP: failed to allocate rt_hash_locks\n"); \
-   for (i = 0; i < RT_HASH_LOCK_SZ; i++) \
-   spin_lock_init(&rt_hash_locks[i]); \
-   }
-#else
-# define rt_hash_lock_addr(slot) NULL
-# define rt_hash_lock_init()
-#endif
+static DEFINE_SPINLOCK(rt_hash_lock);
 
 static struct rt_hash_bucket   *rt_hash_table;
 static unsignedrt_hash_mask;
@@ -627,7 +599,7 @@ static void rt_check_expire(unsigned lon
 
if (*rthp == 0)
continue;
-   spin_lock(rt_hash_lock_addr(i));
+   spin_lock(&rt_hash_lock);
while ((rth = *rthp) != NULL) {
if (rth->u.dst.expires) {
/* Entry is expired even if it is in use */
@@ -660,7 +632,7 @@ static void rt_check_expire(unsigned lon
rt_free(rth);
 #endif /* CONFIG_IP_ROUTE_MULTIPATH_CACHED */
}
-   spin_unlock(rt_hash_lock_addr(i));
+   spin_unlock(&rt_hash_lock);
 
/* Fallback loop breaker. */
if (time_after(jiffies, now))
@@ -683,11 +655,11 @@ static void rt_run_flush(unsigned long d
get_random_bytes(&rt_hash_rnd, 4);
 
for (i = rt_hash_mask; i >= 0; i--) {
-   spin_lock_bh(rt_hash_lock_addr(i));
+   spin_lock_bh(&rt_hash_lock);
rth = rt_hash_table[i].chain;
if (rth)
rt_hash_table[i].chain = NULL;
-   spin_unlock_bh(rt_hash_lock_addr(i));
+   spin_unlock_bh(&rt_hash_lock);
 
for (; rth; rth = next) {
next = rth->u.rt_next;
@@ -820,7 +792,7 @@ static int rt_garbage_collect(void)
 
k = (k + 1) & rt_hash_mask;
rthp = &rt_hash_table[k].chain;
-   spin_lock_bh(rt_hash_lock_addr(k));
+   spin_lock_bh(&rt_hash_lock);
while ((rth = *rthp) != NULL) {
if (!rt_may_expire(rth, tmo, expire)) {
tmo >>= 1;
@@ -852,7 +824,7 @@ static int rt_garbage_collect(void)
goal--;
 #endif /* CONFIG_IP_ROUTE_MULTIPATH_CACHED */
}
-   spin_unlock_bh(rt_hash_lock_addr(k));
+   spin_unlock_bh(&rt_hash_lock);
if (goal <= 0)
break;
}
@@ -922,7 +894,7 @@ restart:
 
rthp = &rt_hash_table[hash].chain;
 
-   spin_lock_bh(rt_hash_lock_addr(hash));
+   spin_lock_bh(&rt_hash_lock);
while ((rth = *rthp) != NULL) {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
if (!(rth->u.dst.flags & DST_BALANCED) &&
@@ -948,7 +920,7 @@ restart:
rth->u.dst.__use++;
dst_hold(&rth->u.dst);
rth->u.dst.lastuse = now;
-   spin_unlock_bh(rt_hash_lock_addr(hash));
+   
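
The difference between the two locking schemes in the patch above can be
sketched in userspace as follows. This is only an illustration of how a
bucket index maps to a lock; struct fake_lock, LOCK_TABLE_SZ and both
helper names are hypothetical, and no real locking is performed.

```c
#include <assert.h>
#include <stddef.h>

/* Power of two, in the spirit of RT_HASH_LOCK_SZ for a small NR_CPUS. */
#define LOCK_TABLE_SZ 256u

/* Hypothetical lock type; the kernel code uses spinlock_t. */
struct fake_lock {
	int held;
};

static struct fake_lock lock_table[LOCK_TABLE_SZ];
static struct fake_lock single_lock;

/*
 * Hashed scheme (what the patch removes): bucket 'slot' shares its lock
 * with every other bucket whose index is congruent modulo the table size.
 */
static struct fake_lock *hashed_lock_addr(unsigned int slot)
{
	return &lock_table[slot & (LOCK_TABLE_SZ - 1)];
}

/* Single-lock scheme (what the patch adds): every bucket uses one lock. */
static struct fake_lock *single_lock_addr(unsigned int slot)
{
	(void)slot;
	return &single_lock;
}
```

With the hashed table, moving an entry from one chain to another can
involve two different locks. With a single lock, one acquisition covers
both chains, which is my reading of why it would make a plain rehash in
rt_secret_rebuild() easier.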

Re: [PATCH] Remove old comments and code in net/ethernet/eth.c

2006-01-06 Thread Kris Katterjohn
From: David S. Miller
Sent: 1/6/2006 4:08:33 PM
> From: "Kris Katterjohn" <[EMAIL PROTECTED]>
> Date: Fri, 6 Jan 2006 16:05:36 -0800
> 
> > This removes an old comment and old commented-out code that's been there since
> > at least as far back as 2.4.0.
> > 
> > Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]>
> 
> It's instructive to keep it there so that nobody in the
> future tries to add the "optimization" without understanding
> why it's wrong.

Okay then. That makes sense.



[PATCH] Remove old comments and code in net/ethernet/eth.c

2006-01-06 Thread Kris Katterjohn
This removes an old comment and old commented-out code that's been there since
at least as far back as 2.4.0.

Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]>

Thanks!

--- x/net/ethernet/eth.c2006-01-06 12:49:27.0 -0600
+++ y/net/ethernet/eth.c2006-01-06 18:01:43.0 -0600
@@ -168,20 +168,8 @@ __be16 eth_type_trans(struct sk_buff *sk
skb->pkt_type = PACKET_BROADCAST;
else
skb->pkt_type = PACKET_MULTICAST;
-   }
-   
-   /*
-*  This ALLMULTI check should be redundant by 1.4
-*  so don't forget to remove it.
-*
-*  Seems, you forgot to remove it. All silly devices
-*  seems to set IFF_PROMISC.
-*/
-
-   else if(1 /*dev->flags&IFF_PROMISC*/) {
-   if (unlikely(compare_ether_addr(eth->h_dest, dev->dev_addr)))
-   skb->pkt_type = PACKET_OTHERHOST;
-   }
+   } else if (unlikely(compare_ether_addr(eth->h_dest, dev->dev_addr)))
+   skb->pkt_type = PACKET_OTHERHOST;

if (ntohs(eth->h_proto) >= 1536)
return eth->h_proto;




Re: [PATCH] Remove old comments and code in net/ethernet/eth.c

2006-01-06 Thread David S. Miller
From: "Kris Katterjohn" <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 16:05:36 -0800

> This removes an old comment and old commented-out code that's been there since
> at least as far back as 2.4.0.
> 
> Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]>

It's instructive to keep it there so that nobody in the
future tries to add the "optimization" without understanding
why it's wrong.


[EMAIL PROTECTED]: [PATCH] PCI Error Recovery: ixgb network device driver]

2006-01-06 Thread linas

Here's the corresponding patch for the ixgb.

--linas

> Hi,
> 
> The following patch to the e100 device driver is in the current
> 2.6.15-mm1 tree, and is being pushed to the mainline 2.6.15 tree.
> 
> I wrote this patch, and I believe I've cc'ed you on previous
> versions, but certainly not recently. Please review, comment,
> ACK or NAK as appropriate.
> 
> Background: Newer PCI controllers can detect and respond to
> serious PCI bus errors, typically by isolating the PCI slot
> (cutting off i/o to the failing card). An arch-specific
> framework can report these errors back to the device driver,
> and coordinate the recovery of the card. Detailed documentation
> for this is in the kernel tree, at Documentation/pci-error-recovery.txt
> 
> This patch adds the detection and recovery callbacks to the
> e100 driver. A version of this patch has been shipping as
> a part of SUSE SLES9 for about a year, and so has been
> tested in the field.
> 
> Similar patches to follow for the e1000 and the ixgb.
> 
> --linas
> 

- Forwarded message from Greg KH <[EMAIL PROTECTED]> -

Subject: [PATCH] PCI Error Recovery: ixgb network device driver
To: [EMAIL PROTECTED]
From: Greg KH <[EMAIL PROTECTED]>

[PATCH] PCI Error Recovery: ixgb network device driver

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the intel ten-gigabit
ethernet ixgb device driver. The patch has been tested, and appears
to work well.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
commit 3c0006afdd8ade574257c88df81c93b0bb71b544
tree 4cc697ccc74b8d67a9f08e68f71584f9d538e90e
parent d78cde68ab78766c3a175466aa8adcbdc5520963
author linas <[EMAIL PROTECTED]> Fri, 18 Nov 2005 16:24:20 -0600
committer Greg Kroah-Hartman <[EMAIL PROTECTED]> Thu, 05 Jan 2006 21:54:55 -0800

 drivers/net/ixgb/ixgb_main.c |   86 ++
 1 files changed, 86 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
index f9f77e4..166832c 100644
--- a/drivers/net/ixgb/ixgb_main.c
+++ b/drivers/net/ixgb/ixgb_main.c
@@ -132,6 +132,16 @@ static void ixgb_restore_vlan(struct ixg
 static void ixgb_netpoll(struct net_device *dev);
 #endif
 
+static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state);
+static pci_ers_result_t ixgb_io_slot_reset (struct pci_dev *pdev);
+static void ixgb_io_resume (struct pci_dev *pdev);
+
+static struct pci_error_handlers ixgb_err_handler = {
+   .error_detected = ixgb_io_error_detected,
+   .slot_reset = ixgb_io_slot_reset,
+   .resume = ixgb_io_resume,
+};
+
 /* Exported from other modules */
 
 extern void ixgb_check_options(struct ixgb_adapter *adapter);
@@ -141,6 +151,8 @@ static struct pci_driver ixgb_driver = {
.id_table = ixgb_pci_tbl,
.probe= ixgb_probe,
.remove   = __devexit_p(ixgb_remove),
+   .err_handler = &ixgb_err_handler,
+
 };
 
 MODULE_AUTHOR("Intel Corporation, <[EMAIL PROTECTED]>");
@@ -1654,8 +1666,16 @@ ixgb_intr(int irq, void *data, struct pt
unsigned int i;
 #endif
 
+#ifdef XXX_CONFIG_IXGB_EEH_RECOVERY
+   if(unlikely(icr==EEH_IO_ERROR_VALUE(4))) {
+   if (eeh_slot_is_isolated (adapter->pdev))
+   // disable_irq_nosync (adapter->pdev->irq);
+   return IRQ_NONE;  /* Not our interrupt */
+   }
+#else
if(unlikely(!icr))
return IRQ_NONE;  /* Not our interrupt */
+#endif /* CONFIG_IXGB_EEH_RECOVERY */
 
if(unlikely(icr & (IXGB_INT_RXSEQ | IXGB_INT_LSC))) {
mod_timer(&adapter->watchdog_timer, jiffies);
@@ -2125,4 +2145,70 @@ static void ixgb_netpoll(struct net_devi
 }
 #endif
 
+/* -- PCI Error Recovery infrastructure  */
+/** ixgb_io_error_detected() is called when PCI error is detected */
+static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state)
+{
+   struct net_device *netdev = pci_get_drvdata(pdev);
+   struct ixgb_adapter *adapter = netdev->priv;
+
+   if(netif_running(netdev))
+   ixgb_down(adapter, TRUE);
+
+   /* Request a slot reset. */
+   return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/** ixgb_io_slot_reset is called after the pci bus has been reset.
+ *  Restart the card from scratch.
+ *  Implementation resembles the first-half of the
+ *  ixgb_resume routine.
+ */
+static pci_ers_result_t ixgb_io_slot_reset (struct pci_dev *pdev)
+{
+   struct net_device *netdev = pci_get_drvdata(pdev);
+   struct ixgb_adapter *adapter = netdev->priv;
+
+   if(pci_enable_device(pdev)) {
+   printk(KERN_ERR "ixgb: Cannot re-enable PCI device after reset.\n");
+   return PCI_ERS_RESULT_DISCONNECT;
+   }
+   pci_set_master(pdev);
+
+   /* Perform card reset only on one instance of the card */
+   i

[EMAIL PROTECTED]: [PATCH] PCI Error Recovery: e1000 network device driver]

2006-01-06 Thread linas

Here's the corresponding patch for the e1000.
--linas

> Hi,
> 
> The following patch to the e100 device driver is in the current
> 2.6.15-mm1 tree, and is being pushed to the mainline 2.6.15 tree.
> 
> I wrote this patch, and I believe I've cc'ed you on previous
> versions, but certainly not recently. Please review, comment,
> ACK or NAK as appropriate.
> 
> Background: Newer PCI controllers can detect and respond to
> serious PCI bus errors, typically by isolating the PCI slot
> (cutting off i/o to the failing card). An arch-specific
> framework can report these errors back to the device driver,
> and coordinate the recovery of the card. Detailed documentation
> for this is in the kernel tree, at Documentation/pci-error-recovery.txt
> 
> This patch adds the detection and recovery callbacks to the
> e100 driver. A version of this patch has been shipping as
> a part of SUSE SLES9 for about a year, and so has been
> tested in the field.
> 
> Similar patches to follow for the e1000 and the ixgb.
> 
> --linas

- Forwarded message from Greg KH <[EMAIL PROTECTED]> -

Subject: [PATCH] PCI Error Recovery: e1000 network device driver
To: [EMAIL PROTECTED]
From: Greg KH <[EMAIL PROTECTED]>

[PATCH] PCI Error Recovery: e1000 network device driver

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the intel gigabit
ethernet e1000 device driver. The patch has been tested, and appears
to work well.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
commit 113cc803a20d72ee5e3c92302ac5a06e0c651d01
tree aae6aa3b20f14a36eba84867c2406cbe385affad
parent 5a02e3abf1e74c159deca91d6af01297379eede7
author linas <[EMAIL PROTECTED]> Fri, 18 Nov 2005 16:23:54 -0600
committer Greg Kroah-Hartman <[EMAIL PROTECTED]> Thu, 05 Jan 2006 21:54:55 -0800

 drivers/net/e1000/e1000_main.c |  101 
 1 files changed, 100 insertions(+), 1 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 438a931..76352fe 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -206,6 +206,16 @@ static void e1000_netpoll (struct net_de
 void e1000_rx_schedule(void *data);
 #endif
 
+static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state);
+static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev);
+static void e1000_io_resume(struct pci_dev *pdev);
+
+static struct pci_error_handlers e1000_err_handler = {
+   .error_detected = e1000_io_error_detected,
+   .slot_reset = e1000_io_slot_reset,
+   .resume = e1000_io_resume,
+};
+
 /* Exported from other modules */
 
 extern void e1000_check_options(struct e1000_adapter *adapter);
@@ -218,8 +228,9 @@ static struct pci_driver e1000_driver = 
/* Power Managment Hooks */
 #ifdef CONFIG_PM
.suspend  = e1000_suspend,
-   .resume   = e1000_resume
+   .resume   = e1000_resume,
 #endif
+   .err_handler = &e1000_err_handler,
 };
 
 MODULE_AUTHOR("Intel Corporation, <[EMAIL PROTECTED]>");
@@ -2941,6 +2952,10 @@ e1000_update_stats(struct e1000_adapter 
 
 #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF
 
+   /* Prevent stats update while adapter is being reset */
+   if (adapter->link_speed == 0)
+   return;
+
spin_lock_irqsave(&adapter->stats_lock, flags);
 
/* these counters are modified from e1000_adjust_tbi_stats,
@@ -4331,4 +4346,88 @@ e1000_netpoll(struct net_device *netdev)
 }
 #endif
 
+/* --- PCI Error Recovery infrastructure  */
+/** e1000_io_error_detected() is called when PCI error is detected */
+static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+   struct net_device *netdev = pci_get_drvdata(pdev);
+   struct e1000_adapter *adapter = netdev->priv;
+
+   if (netif_running(netdev))
+   e1000_down(adapter);
+
+   /* Request a slot slot reset. */
+   return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/** e1000_io_slot_reset is called after the pci bus has been reset.
+ *  Restart the card from scratch.
+ *  Implementation resembles the first-half of the
+ *  e1000_resume routine.
+ */
+static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev)
+{
+   struct net_device *netdev = pci_get_drvdata(pdev);
+   struct e1000_adapter *adapter = netdev->priv;
+
+   if (pci_enable_device(pdev)) {
+   printk(KERN_ERR "e1000: Cannot re-enable PCI device after reset.\n");
+   return PCI_ERS_RESULT_DISCONNECT;
+   }
+   pci_set_master(pdev);
+
+   pci_enable_wake(pdev, 3, 0);
+   pci_enable_wake(pdev, 4, 0); /* 4 == D3 cold */
+
+   /* Perform card reset only on one instance of the card */
+   if(0 != PCI_FUNC (pdev->devfn))
+   return PCI_ERS_RESULT_RECOVERED;
+
+   e1000_reset(adapter);
+   E1000

Re: dccp_ipv6 fails to link on some archs.

2006-01-06 Thread David S. Miller
From: Dave Jones <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 17:23:07 -0500

> Missing exports/inlines ?

Missing include, I'll fix it up.

Thanks for the report.


[EMAIL PROTECTED]: [PATCH] PCI Error Recovery: e100 network device driver]

2006-01-06 Thread linas
Hi,

The following patch to the e100 device driver is in the current
2.6.15-mm1 tree, and is being pushed to the mainline 2.6.15 tree.

I wrote this patch, and I believe I've cc'ed you on previous 
versions, but certainly not recently. Please review, comment,
ACK or NAK as appropriate.

Background: Newer PCI controllers can detect and respond to 
serious PCI bus errors, typically by isolating the PCI slot 
(cutting off i/o to the failing card). An arch-specific 
framework can report these errors back to the device driver,
and coordinate the recovery of the card. Detailed documentation
for this is in the kernel tree, at Documentation/pci-error-recovery.txt
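
A rough userspace model of the callback sequence described above. It
assumes the three-callback shape used in the patches (error_detected,
slot_reset, resume). struct fake_dev, the ERS_* values and recover() are
hypothetical simplifications; the real framework is arch-specific and
handles more states than shown here.

```c
#include <assert.h>

/* Hypothetical mirror of the pci_ers_result_t values used in the patches. */
enum ers_result { ERS_NEED_RESET, ERS_RECOVERED, ERS_DISCONNECT };

struct fake_dev {
	unsigned int devfn;	/* encoded slot/function number */
	int running;		/* interface was up before the error */
	int hw_ok;		/* whether hardware re-init will succeed */
	int up;			/* currently passing traffic */
};

struct fake_err_handlers {
	enum ers_result (*error_detected)(struct fake_dev *dev);
	enum ers_result (*slot_reset)(struct fake_dev *dev);
	void (*resume)(struct fake_dev *dev);
};

/* Same arithmetic as the kernel's PCI_FUNC(): low three bits of devfn. */
#define FAKE_PCI_FUNC(devfn) ((devfn) & 0x07)

/* Step 1: stop using the device and ask for a slot reset. */
static enum ers_result demo_error_detected(struct fake_dev *dev)
{
	dev->up = 0;
	return ERS_NEED_RESET;
}

/* Step 2: after the reset, only function 0 re-initialises the card. */
static enum ers_result demo_slot_reset(struct fake_dev *dev)
{
	if (FAKE_PCI_FUNC(dev->devfn) != 0)
		return ERS_RECOVERED;
	return dev->hw_ok ? ERS_RECOVERED : ERS_DISCONNECT;
}

/* Step 3: bring the interface back if it was running before. */
static void demo_resume(struct fake_dev *dev)
{
	if (dev->running)
		dev->up = 1;
}

static const struct fake_err_handlers demo_handlers = {
	.error_detected = demo_error_detected,
	.slot_reset = demo_slot_reset,
	.resume = demo_resume,
};

/* Simplified stand-in for the framework that drives the callbacks. */
static enum ers_result recover(const struct fake_err_handlers *h,
			       struct fake_dev *dev)
{
	enum ers_result r = h->error_detected(dev);

	if (r == ERS_NEED_RESET)
		r = h->slot_reset(dev);	/* a real framework resets the slot first */
	if (r == ERS_RECOVERED)
		h->resume(dev);
	return r;
}
```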

This patch adds the detection and recovery callbacks to the
e100 driver. A version of this patch has been shipping as 
a part of SUSE SLES9 for about a year, and so has been 
tested in the field.

Similar patches to follow for the e1000 and the ixgb. 

--linas

- Forwarded message from Greg KH <[EMAIL PROTECTED]> -

Subject: [PATCH] PCI Error Recovery: e100 network device driver
Reply-To: Greg K-H <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
From: Greg KH <[EMAIL PROTECTED]>

[PATCH] PCI Error Recovery: e100 network device driver

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the intel ethernet e100
device driver. The patch has been tested, and appears to work well.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
commit 414eee4fa72175d3c0be116d6cb8b0634e4ae916
tree e1cc342377037142e0fd46f89b4cabaa3bb12adb
parent 113cc803a20d72ee5e3c92302ac5a06e0c651d01
author linas <[EMAIL PROTECTED]> Fri, 18 Nov 2005 16:23:26 -0600
committer Greg Kroah-Hartman <[EMAIL PROTECTED]> Thu, 05 Jan 2006 21:54:55 -0800

 drivers/net/e100.c |   70 
 1 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 22cd045..095d953 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2704,6 +2704,75 @@ static void e100_shutdown(struct pci_dev
 }
 
 
+/* -- PCI Error Recovery infrastructure  -- */
+/** e100_io_error_detected() is called when PCI error is detected */
+static pci_ers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+   struct net_device *netdev = pci_get_drvdata(pdev);
+
+   /* Same as calling e100_down(netdev_priv(netdev)), but generic */
+   netdev->stop(netdev);
+
+   /* Is a detach needed ?? */
+   // netif_device_detach(netdev);
+
+   /* Request a slot reset. */
+   return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/** e100_io_slot_reset is called after the pci bus has been reset.
+ *  Restart the card from scratch. */
+static pci_ers_result_t e100_io_slot_reset(struct pci_dev *pdev)
+{
+   struct net_device *netdev = pci_get_drvdata(pdev);
+   struct nic *nic = netdev_priv(netdev);
+
+   if(pci_enable_device(pdev)) {
+   printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n");
+   return PCI_ERS_RESULT_DISCONNECT;
+   }
+   pci_set_master(pdev);
+
+   /* Only one device per card can do a reset */
+   if (0 != PCI_FUNC (pdev->devfn))
+   return PCI_ERS_RESULT_RECOVERED;
+
+   e100_hw_reset(nic);
+   e100_phy_init(nic);
+
+   if(e100_hw_init(nic)) {
+   DPRINTK(HW, ERR, "e100_hw_init failed\n");
+   return PCI_ERS_RESULT_DISCONNECT;
+   }
+
+   return PCI_ERS_RESULT_RECOVERED;
+}
+
+/** e100_io_resume is called when the error recovery driver
+ *  tells us that its OK to resume normal operation.
+ */
+static void e100_io_resume(struct pci_dev *pdev)
+{
+   struct net_device *netdev = pci_get_drvdata(pdev);
+   struct nic *nic = netdev_priv(netdev);
+
+   /* ack any pending wake events, disable PME */
+   pci_enable_wake(pdev, 0, 0);
+
+   netif_device_attach(netdev);
+   if(netif_running(netdev)) {
+   e100_open (netdev);
+   mod_timer(&nic->watchdog, jiffies);
+   }
+}
+
+static struct pci_error_handlers e100_err_handler = {
+   .error_detected = e100_io_error_detected,
+   .slot_reset = e100_io_slot_reset,
+   .resume = e100_io_resume,
+};
+
+
 static struct pci_driver e100_driver = {
.name = DRV_NAME,
.id_table = e100_id_table,
@@ -2714,6 +2783,7 @@ static struct pci_driver e100_driver = {
.resume =   e100_resume,
 #endif
.shutdown = e100_shutdown,
+   .err_handler = &e100_err_handler,
 };
 
 static int __init e100_init_module(void)



- End forwarded message -


Re: State of the Union: Wireless

2006-01-06 Thread Mike Kershaw
[ Sorry, this went to linux-kernel, meant to send it to netdev.
Apologies to those who see it twice. ]

> So, now we asked: What would a sane UI look like? We had a few points:
> * The interface needs to support some kind of "master" interface to
> configure the hardware, 80211 parameters and
> to actually configure and set up the
> * Virtual interfaces.
> Data is transferred only through the virtual interfaces, which could
> be an AP interface, a STA interface in INFRA or Ad-Hoc mode, etc... .
> Configuration is done through the master interface.

Two things to inject, from my own little corner of userspace:  

1.  Monitor mode formatting.  

I ported over the BSD radiotap packet header system, it's in the Intel
and I believe some versions of the Devicescape stacks.  Using these
would be a very good thing for userspace.  If for some reason it isn't
used, then we (userspace tool people) need something equivalent.  I like
radiotap primarily because:
 * Dynamic per-packet stats.  Drivers provide what their firmware is
   capable of providing per frame.  The more info provided the better.
 * Expandable headers.  New per-frame stats can be added into the RT
   headers without changing linktype, breaking existing apps, etc.
 * Format indicators.  Is the 4 byte FCS tacked onto the end of the
   frame in rfmon?  If we don't know this in userspace, we can't do
   802.11 validation, wep decoding, and other important stuff.
   Userspace shouldn't have to know which driver is being used, this
   ought to be in the frame headers.

Radiotap provides all of those and is already supported by tcpdump,
ethereal, kismet, etc.
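
To make the "format indicators" point concrete, here is a minimal sketch
of how a userspace tool might read the FCS indicator from a radiotap
header. It assumes the usual layout as I understand it: version byte, pad
byte, little-endian 16-bit length, then one or more little-endian 32-bit
"present" words, with TSFT as bit 0 and the flags byte as bit 1. The
function name and the RT_* macro names are made up for this example, and
vendor namespaces are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RT_PRESENT_TSFT  (1u << 0)	/* 64-bit timestamp, 8-byte aligned */
#define RT_PRESENT_FLAGS (1u << 1)	/* 8-bit flags field */
#define RT_PRESENT_EXT   (1u << 31)	/* another present word follows */
#define RT_FLAG_FCS      0x10		/* frame carries the FCS at its end */

static uint16_t rd_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Returns 1 if the header says the FCS is present, 0 if it says it is
 * not (or carries no flags field at all), and -1 if the header is
 * malformed or of an unknown version.
 */
static int radiotap_frame_has_fcs(const uint8_t *buf, size_t len)
{
	size_t it_len, off = 4;
	uint32_t first, word;

	if (len < 8 || buf[0] != 0)	/* only version 0 is understood */
		return -1;
	it_len = rd_le16(buf + 2);
	if (it_len < 8 || it_len > len)
		return -1;

	/* Skip over any extended present words to find the first field. */
	first = word = rd_le32(buf + off);
	off += 4;
	while (word & RT_PRESENT_EXT) {
		if (off + 4 > it_len)
			return -1;
		word = rd_le32(buf + off);
		off += 4;
	}

	if (!(first & RT_PRESENT_FLAGS))
		return 0;
	if (first & RT_PRESENT_TSFT) {
		off = (off + 7) & ~(size_t)7;	/* align relative to header start */
		off += 8;
	}
	if (off + 1 > it_len)
		return -1;
	return (buf[off] & RT_FLAG_FCS) ? 1 : 0;
}
```

Because fields a parser does not know about are announced in the present
word rather than changing the link type, new per-frame stats can be added
without breaking a reader like this one.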

2. RFMon is weird/breaks interfaces

The other gotcha with rfmon is it often breaks a card's ability to
associate (though less often with new cards).  Even if it doesn't,
whatever tool put it into rfmon is likely to want to take control of the
channel hopping, which will interfere with the associations of other
virtual interfaces.

Currently single-interface cards (ethX, whatever) thrown into rfmon just
plain break, in a pretty obvious way.  The linktype changes, traffic
stops, and users more or less understand this is going to be the
behavior.  Once virtual interfaces come into play, it may cause some
confusion if you can make virtual interfaces that do sta, adhoc, ap all
at once without conflicting, and suddenly bringing up an rfmon
interface causes them all to break.

I don't know if the solution to this is a warning, marking non-rfmon
virtual interfaces down, or just saying "they'll figure it out", but I
figured it's worth considering at an early stage.

-m

-- 
Mike Kershaw/Dragorn <[EMAIL PROTECTED]>
GPG Fingerprint: 3546 89DF 3C9D ED80 3381  A661 D7B2 8822 738B BDB1

"Yes, yes, LORD OF HUMANS!  I will rule you ALL with an iron fist!  YOU!
OBEY THE FIST!"
  -- Invader Zim  




Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Mike Kershaw
> It can be in promiscuous mode (wardriving).

Just to nitpick:

Promisc implies delivering all data frames from the medium.  rfmon is
actually a different link type and delivers management frames (for which
there isn't a clear equivalent in 802.3).

Promisc does not imply disabling normal operation.  Rfmon generally
does, either due to firmware restrictions or because the app using rfmon
wants to control the channel.

I'd expect promisc on a wireless device to report 802.3 formatted data
frames for all data on the network the card is associated to.  Many
cards can't do this, so cleanly reporting that inability may be a good
idea.  Rfmon reports link layer frames, both data and non-data, with
802.11 headers, independent of network association.

Not to hassle needlessly, I just think being clear early in the planning
can help eliminate problems later.  Promisc and rfmon are fairly
different things.

-m

-- 
Mike Kershaw/Dragorn <[EMAIL PROTECTED]>
GPG Fingerprint: 3546 89DF 3C9D ED80 3381  A661 D7B2 8822 738B BDB1

"We're sorry, Susy won't be attending classes for the rest of this academic 
year.  She caught the measles, and we had her shot."




Fw: [Bugme-new] [Bug 5843] New: kissattach locks up system

2006-01-06 Thread Andrew Morton


Begin forwarded message:

Date: Fri, 6 Jan 2006 03:12:39 -0800
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bugme-new] [Bug 5843] New: kissattach locks up system


http://bugzilla.kernel.org/show_bug.cgi?id=5843

   Summary: kissattach locks up system
Kernel Version: 2.6.15
Status: NEW
  Severity: high
 Owner: [EMAIL PROTECTED]
 Submitter: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]


Most recent kernel where this bug did not occur: 2.6.14-5
Distribution: kernel.org kernel source on openSuSE 10.0
Hardware Environment: i386, 1GHz PIII
Software Environment: Ham Radio
Problem Description: Issuing a kissattach command locks up system

Steps to reproduce:

1. Build and install 2.6.15 kernel with ax25 and mkiss hamradio compiled in or
modules.

2. Build and install libax25, ax25-tools and ax25-apps from sourceforge or
distro ftp site.

3. Configure /etc/ax25/axports for simple tnc
># /etc/ax25/axports
>#
># The format of this file is:
>#
># name callsign speed paclen window description #
>2m  W1NR-9  19200   255 2   145.650 MHz (1200 bps)

4. Configure /etc/ax25/ax25d.conf
># /etc/ax25/ax25d.conf
>#
># ax25d Configuration File.
>#
># AX.25 Ports begin with a '['.
>#
>[W1NR VIA 2m]
>parameters  2 1 6 900 * 15 0
>NOCALL   * * * * * * L
>default  * * * * * * - root /spider/src/client client %s ax25

5. Bind to serial port with kissattach
>worf:~ # kissattach /dev/ttyS0 2m 44.56.10.3
>AX.25 port 2m bound to device ax0

System is locked up hard at this point.  This is new to 2.6.15 and is not
reproducible in 2.6.14-5.

--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Bodo Eggert
Michael Buesch <[EMAIL PROTECTED]> wrote:

> How would the virtual interfaces look like? That is quite easy to answer.
> They are net_devices, as they transfer data.
> They should probaly _not_ be on top of the ethernet, as 80211 does not
> have very much in common with ethernet. Basically they share the same
> MAC address format. Does someone have another thing, which he thinks
> is shared?



It has a connection status.
It has a connection speed, which is less static than on a LAN.
(Maybe it can be asynchronous in the next version.)
It can't yet be full duplex, but who knows ...
It can be in promiscuous mode (wardriving).

> The virtual interface is then configured through /dev/wlan0 using write()
> (no ugly ioctl anymore, you see...). Config data like TX rate,
> current essid, basically everything + xyz which is done by WE today,
> is written to /dev/wlan0.

In ASCII parsed by an in-kernel library? Did you consider sysfs?

What would a connection manager look for if it's supposed to act on
* plugging in the WLAN card
* finding/losing a (better) network

-- 
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Andi Kleen
On Friday 06 January 2006 20:26, Lee Revell wrote:
> On Fri, 2006-01-06 at 13:58 +0100, Andi Kleen wrote:
> > Another CPU might be stuck in a long 
> > running interrupt
> 
> Shouldn't a long running interrupt be considered a bug?

In normal operation yes, but there can always be exceptional
circumstances where it's unavoidable (e.g. during error handling) 
and in the name of defensive programming the rest of the system ought 
to tolerate it.

-Andi


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Andi Kleen
On Friday 06 January 2006 21:26, Paul E. McKenney wrote:

> If not, it may be worthwhile to limit the number of times that
> rt_run_flush() runs per RCU grace period.

Problem is that without rt_run_flush new routes and route attribute
changes don't get used by the stack. If RCU takes long and routes
keep changing this might be a big issue.

As an admin I would certainly be annoyed if the network stack
ignored my new route for some unbounded time.

Perhaps a better way would be to just exclude dst entries in RCU state
from the normal accounting and assume that if the system
really runs short of memory because of this the results would
trigger quiescent states more quickly, freeing the memory again.

-Andi


Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Patrick McHardy

David Lang wrote:
> On Fri, 6 Jan 2006, Patrick McHardy wrote:
>
>> I think the main advantages of netlink over a character device are its
>> flexible format, which is easily extendable, and multicast capability,
>> which can be used to broadcast events and configuration changes. It's
>> also good to have all the net stuff accessible in a uniform way.
>
> character devices are far easier to script. this really sounds like the
> type of configuration stuff that sysfs was designed for. can we avoid
> yet another configuration tool that's required?


I think its not just configuration but also event handling
for associating, link layer authentication, ..., which is
not something handled by scripts but by some daemon. It might
also want to set up routes or ip addresses which is done using
netlink anyway.


dccp_ipv6 fails to link on some archs.

2006-01-06 Thread Dave Jones
Our daily build-system spat this out about 2.6.15-git2

WARNING: /usr/src/build/676459-ia64/install/lib/modules/2.6.15-1.1830_FC5/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic
WARNING: /usr/src/build/676462-ppc64/install/lib/modules/2.6.15-1.1830_FC5/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic
WARNING: /usr/src/build/676467-ppc/install/lib/modules/2.6.15-1.1830_FC5/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic
WARNING: /usr/src/build/676467-ppc/install/lib/modules/2.6.15-1.1830_FC5smp/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic
WARNING: /usr/src/build/676465-s390/install/lib/modules/2.6.15-1.1830_FC5/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic
WARNING: /usr/src/build/676460-s390x/install/lib/modules/2.6.15-1.1830_FC5/kernel/net/dccp/dccp_ipv6.ko needs unknown symbol csum_ipv6_magic

Missing exports/inlines ?

Dave



Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread David S. Miller
From: David Lang <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 14:16:17 -0800 (PST)

> character devices are far easier to script. this really sounds like the 
> type of configuration stuff that sysfs was designed for. can we avoid yet 
> another configuration tool that's required?

netlink is being recommended exactly because it can result
in only needing one tool for everything.


Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread David Lang

On Fri, 6 Jan 2006, Patrick McHardy wrote:

> Marcel Holtmann wrote:
>
>>> I just personally liked the idea of having a device node in /dev for
>>> every existing hardware wlan card. Like we have device nodes for
>>> other real hardware, too. It felt like a bit of a "unix way" to do
>>> this to me. I don't say this is the way to go.
>>> If a netlink socket is used (which is possible, for sure), we stay with
>>> the old way of having no device node in /dev for networking devices.
>>> That is ok. But that is really only an implementation detail (and for sure
>>> a matter of taste).
>>
>> At the OLS last year, I think the consensus was to use netlink for all
>> configuration tasks. However this was mainly driven by Harald Welte and
>> he might be able to talk about the pros and cons of netlink versus a
>> character device.
>
> I think the main advantages of netlink over a character device are its
> flexible format, which is easily extendable, and multicast capability,
> which can be used to broadcast events and configuration changes. It's
> also good to have all the net stuff accessible in a uniform way.

character devices are far easier to script. this really sounds like the
type of configuration stuff that sysfs was designed for. can we avoid yet
another configuration tool that's required?


David Lang

--
There are two ways of constructing a software design. One way is to make it so 
simple that there are obviously no deficiencies. And the other way is to make 
it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare



Re: [PATCH] Endian-annotate struct iphdr

2006-01-06 Thread David S. Miller
From: Alexey Dobriyan <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 23:18:37 +0300

> And fix trivial warnings that emerged.
> 
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>

Applied.



Re: [PATCH] Endian-annotate in_aton()

2006-01-06 Thread David S. Miller
From: Alexey Dobriyan <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 23:19:25 +0300

> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>

Also applied.

Thanks.


Re: [PATCH 1/1] Corrections to LSM-IPSec Nethooks

2006-01-06 Thread David S. Miller
From: Trent Jaeger <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 11:09:43 -0500

> Signed-off-by: Trent Jaeger <[EMAIL PROTECTED]>

Applied, thanks Trent.

I think it's a small bit of lesser known trivia that I spent one
semester at Penn State, on the Erie campus :-)


Re: [2.6 patch] fix ipvs compilation

2006-01-06 Thread David S. Miller
From: Joe <[EMAIL PROTECTED]>
Date: Thu, 5 Jan 2006 23:43:52 -0500

> That's not all either; ./net/ipv4/netfilter/ipt_helper.c has the same
> error and the same fix.
> 
> Here's the patch for this one.  Sorry for the dupe; I sent the last one
> as HTML by accident.

Applied, please provide a "Signed-off-by:" line with your patch
next time.

Thanks.


Re: [RFC] bridge + netfilter + vlan + hw checksum = bug?

2006-01-06 Thread David S. Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 4 Jan 2006 16:00:41 -0800

> It looks like the bridge netfilter code does not correctly update
> the hardware checksum after popping off the VLAN header.
> 
> This is by inspection, I have *not* tested this.
> To test you would need to set up a filtering bridge with VLANs
> and a device that does hardware receive checksum (skge or sungem).
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Even though untested, it very much looks correct to me
and therefore I'll apply this now for 2.6.16

We have a lot of time to find any problems with this
change :)


Re: [PATCH] Use newer is_multicast_ether_addr() in some files

2006-01-06 Thread David S. Miller
From: "Kris Katterjohn" <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 13:01:54 -0800

> From: Patrick McHardy
> Sent: 1/6/2006 12:52:34 PM
> 
> > Randy.Dunlap wrote:
> > > On Fri, 6 Jan 2006, Patrick McHardy wrote:
> > > 
> > >>>--- x/net/atm/br2684.c   2006-01-02 21:21:10.0 -0600
> > >>>+++ y/net/atm/br2684.c   2006-01-06 12:34:47.0 -0600
> > >>>@@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
> > >>> unsigned char *rawp;
> > >>> eth = eth_hdr(skb);
> > >>>
> > >>>-if (*eth->h_dest & 1) {
> > >>>+if (is_multicast_ether_addr(eth->h_dest)) {
> > >>
> > >>This is not equivalent, is_multicast_ether_addr() ignores
> > >>addresses starting with 0xff.
> > > 
> > > It used to.  Not today afaict.
> > 
> > You're right, Stephen changed it two days ago.
> 
> That's why I said the newer is_multicast_ether_addr(). Sorry for the 
> confusion.

Applied, thanks Kris.


Re: [PATCH] Change sk_run_filter()'s return type in net/core/filter.c

2006-01-06 Thread David S. Miller
From: "Kris Katterjohn" <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 05:53:32 -0800

> From: Patrick McHardy
> Sent: 1/6/2006 1:36:24 AM
> > Please use unsigned int not just unsigned.
> 
> Ta-da!

Applied, thanks Kris.


Re: BCM 5705 firmware not starting...

2006-01-06 Thread Michael Chan
On Fri, 2006-01-06 at 15:13 -0500, Ben Collins wrote:
> http://bugzilla.ubuntu.com/show_bug.cgi?id=16435
> 
> The above is a bug report for a user that is getting a firmware restart
> timeout (waiting for mbox1 magic to invert).
> 
> Any ideas on if this is a software or hardware issue? Anything I can ask
> the user to do to help debug it?
> 
> This is 2.6.15, btw.
> 

It is most likely bad firmware or corrupted firmware on the card. The
mismatch of the chip revision reported by tg3 and lspci output further
confirms this. I will see what can be done to get the firmware upgraded.



Re: [PATCH] Use newer is_multicast_ether_addr() in some files

2006-01-06 Thread Kris Katterjohn
From: Patrick McHardy
Sent: 1/6/2006 12:52:34 PM

> Randy.Dunlap wrote:
> > On Fri, 6 Jan 2006, Patrick McHardy wrote:
> > 
> >>>--- x/net/atm/br2684.c 2006-01-02 21:21:10.0 -0600
> >>>+++ y/net/atm/br2684.c 2006-01-06 12:34:47.0 -0600
> >>>@@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
> >>>   unsigned char *rawp;
> >>>   eth = eth_hdr(skb);
> >>>
> >>>-  if (*eth->h_dest & 1) {
> >>>+  if (is_multicast_ether_addr(eth->h_dest)) {
> >>
> >>This is not equivalent, is_multicast_ether_addr() ignores
> >>addresses starting with 0xff.
> > 
> > It used to.  Not today afaict.
> 
> You're right, Stephen changed it two days ago.

That's why I said the newer is_multicast_ether_addr(). Sorry for the confusion.



Re: [PATCH] Use newer is_multicast_ether_addr() in some files

2006-01-06 Thread Patrick McHardy

Randy.Dunlap wrote:
> On Fri, 6 Jan 2006, Patrick McHardy wrote:
>
>>> --- x/net/atm/br2684.c  2006-01-02 21:21:10.0 -0600
>>> +++ y/net/atm/br2684.c  2006-01-06 12:34:47.0 -0600
>>> @@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
>>>     unsigned char *rawp;
>>>     eth = eth_hdr(skb);
>>>
>>> -   if (*eth->h_dest & 1) {
>>> +   if (is_multicast_ether_addr(eth->h_dest)) {
>>
>> This is not equivalent, is_multicast_ether_addr() ignores
>> addresses starting with 0xff.
>
> It used to.  Not today afaict.

You're right, Stephen changed it two days ago.


Re: [PATCH] Use newer is_multicast_ether_addr() in some files

2006-01-06 Thread Randy.Dunlap
On Fri, 6 Jan 2006, Patrick McHardy wrote:

> Kris Katterjohn wrote:
> > This uses is_multicast_ether_addr() because it has recently been changed
> > to do the same thing these separate tests are doing.
>
> > --- x/net/atm/br2684.c  2006-01-02 21:21:10.0 -0600
> > +++ y/net/atm/br2684.c  2006-01-06 12:34:47.0 -0600
> > @@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
> > unsigned char *rawp;
> > eth = eth_hdr(skb);
> >
> > -   if (*eth->h_dest & 1) {
> > +   if (is_multicast_ether_addr(eth->h_dest)) {
>
> This is not equivalent, is_multicast_ether_addr() ignores
> addresses starting with 0xff.

It used to.  Not today afaict.
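
For readers following along, the difference under discussion can be sketched
in user-space C. The two helpers below are hypothetical stand-ins
(old_is_multicast and new_is_multicast are not kernel names). They assume,
as this thread describes, that the older test skipped addresses whose first
octet is 0xff, while the newer one only looks at the low-order (group) bit
of the first octet, which is the same check as "*eth->h_dest & 1".

```c
/* Hypothetical stand-in for the older behaviour described in the thread:
 * the group bit must be set, but a first octet of 0xff is excluded. */
int old_is_multicast(const unsigned char *addr)
{
	return (addr[0] != 0xff) && (addr[0] & 0x01);
}

/* Hypothetical stand-in for the newer behaviour: only the group bit
 * (low-order bit of the first octet) is tested, so broadcast counts too. */
int new_is_multicast(const unsigned char *addr)
{
	return addr[0] & 0x01;
}
```

Under these assumptions the two helpers disagree only for addresses that
begin with 0xff, such as the broadcast address ff:ff:ff:ff:ff:ff, which is
why the replacement is equivalent only with the newer definition.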

-- 
~Randy


Re: [PATCH] Use newer is_multicast_ether_addr() in some files

2006-01-06 Thread Patrick McHardy

Kris Katterjohn wrote:
> This uses is_multicast_ether_addr() because it has recently been changed to do
> the same thing these separate tests are doing.
>
> --- x/net/atm/br2684.c  2006-01-02 21:21:10.0 -0600
> +++ y/net/atm/br2684.c  2006-01-06 12:34:47.0 -0600
> @@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
>     unsigned char *rawp;
>     eth = eth_hdr(skb);
>
> -   if (*eth->h_dest & 1) {
> +   if (is_multicast_ether_addr(eth->h_dest)) {

This is not equivalent, is_multicast_ether_addr() ignores
addresses starting with 0xff.


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread David S. Miller
From: "Paul E. McKenney" <[EMAIL PROTECTED]>
Date: Fri, 6 Jan 2006 12:26:26 -0800

> If not, it may be worthwhile to limit the number of times that
> rt_run_flush() runs per RCU grace period.

This is mixing two sets of requirements.

rt_run_flush() runs periodically in order to regenerate the hash
function secret key.  Now, for that specific case it might actually be
possible to rehash instead of flush, but the locking is a little bit
tricky :-)  And also, I think we're regenerating the secret key
just a little bit too often, I think we'd get enough security
with a less frequent regeneration.

I'll look into this and your other ideas later today hopefully.


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Paul E. McKenney
On Fri, Jan 06, 2006 at 06:19:15PM +0100, Eric Dumazet wrote:
> Paul E. McKenney wrote:
> >On Fri, Jan 06, 2006 at 01:37:12PM +, Alan Cox wrote:
> >>On Fri, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> >>>I assume that if a CPU queued 10.000 items in its RCU queue, then the 
> >>>oldest entry cannot still be in use by another CPU. This might sound like 
> >>>a violation of RCU rules (I'm not an RCU expert), but seems quite 
> >>>reasonable.
> >>Fixing the real problem in the routing code would be the real fix. 
> >>
> >>The underlying problem of RCU and memory usage could be solved more
> >>safely by making sure that the sleeping memory allocator path always
> >>waits until at least one RCU cleanup has occurred after it fails an
> >>allocation before it starts trying harder. That ought to also naturally
> >>throttle memory consumers more in the situation which is the right
> >>behaviour.
> >
> >A quick look at rt_garbage_collect() leads me to believe that although
> >the IP route cache does try to limit its use of memory, it does not
> >fully account for memory that it has released to RCU, but that RCU has
> >not yet freed due to a grace period not having elapsed.
> >
> >The following appears to be possible:
> >
> >1.   rt_garbage_collect() sees that there are too many entries,
> > and sets "goal" to the number to free up, based on a
> > computed "equilibrium" value.
> >
> >2.   The number of entries is (correctly) decremented only when
> > the corresponding RCU callback is invoked, which actually
> > frees the entry.
> >
> >3.   Between the time that rt_garbage_collect() is invoked the
> > first time and when the RCU grace period ends, rt_garbage_collect()
> > is invoked again.  It still sees too many entries (since
> > RCU has not yet freed the ones released by the earlier
> > invocation in step (1) above), so frees a bunch more.
> >
> >4.   Packets routed now miss the route cache, because the corresponding
> > entries are waiting for a grace period, slowing the system down.
> > Therefore, even more entries are freed to make room for new
> > entries corresponding to the new packets.
> >
> >If my (likely quite naive) reading of the IP route cache code is correct,
> >it would be possible to end up in a steady state with most of the entries
> >always being in RCU rather than in the route cache.
> >
> >Eric, could this be what is happening to your system?
> >
> >If it is, one straightforward fix would be to keep a count of the number
> >of route-cache entries waiting on RCU, and for rt_garbage_collect()
> >to subtract this number of entries from its goal.  Does this make sense?
> >
> 
> Hi Paul
> 
> Thanks for reviewing route code :)
> 
> As I said, the problem comes from 'route flush cache', which is periodically 
> done by rt_run_flush(), triggered by rt_flush_timer.
> 
> The 10% of LOWMEM RAM that was used by route-cache entries is pushed into 
> RCU queues (with call_rcu_bh()) and the network continues to receive
> packets from *many* sources that want their route-cache entry.

Hello, Eric,

The rt_run_flush() function could indeed be suffering from the same
problem.  Dipankar's recent patch should help RCU grace periods proceed
more quickly; does that help?

If not, it may be worthwhile to limit the number of times that
rt_run_flush() runs per RCU grace period.
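
The accounting fix suggested earlier in this thread (count the entries that
are waiting on RCU and subtract them from the rt_garbage_collect() goal) can
be modelled in a few lines of C. This is only a toy model under stated
assumptions: struct cache_model and every function name below are
hypothetical, none of them come from net/ipv4/route.c, and the real goal
computation is more involved than "entries minus equilibrium".

```c
/* Hypothetical model of the route-cache accounting discussed above. */
struct cache_model {
	long entries;      /* decremented only when the RCU callback frees */
	long pending_rcu;  /* released to RCU, grace period not yet over */
	long equilibrium;  /* target number of live entries */
};

/* Goal without the fix: everything above equilibrium, even if some of
 * those entries have already been handed to RCU. */
long naive_goal(const struct cache_model *c)
{
	long goal = c->entries - c->equilibrium;

	return goal > 0 ? goal : 0;
}

/* Suggested variant: do not ask again for entries already on their way out. */
long adjusted_goal(const struct cache_model *c)
{
	long goal = naive_goal(c) - c->pending_rcu;

	return goal > 0 ? goal : 0;
}

/* Release n entries to RCU; the visible entry count does not change yet. */
void release_to_rcu(struct cache_model *c, long n)
{
	c->pending_rcu += n;
}

/* Grace period ends: the callbacks run and the entries are really freed. */
void grace_period_end(struct cache_model *c)
{
	c->entries -= c->pending_rcu;
	c->pending_rcu = 0;
}
```

In this model a second collection pass that runs before the grace period
ends still sees the old entry count, so the naive goal asks for the same
entries again, while the adjusted goal drops to zero until RCU has caught up.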

Thanx, Paul


BCM 5705 firmware not starting...

2006-01-06 Thread Ben Collins
http://bugzilla.ubuntu.com/show_bug.cgi?id=16435

The above is a bug report for a user that is getting a firmware restart
timeout (waiting for mbox1 magic to invert).

Any ideas on if this is a software or hardware issue? Anything I can ask
the user to do to help debug it?

This is 2.6.15, btw.

-- 
   Ben Collins <[EMAIL PROTECTED]>
   Developer
   Ubuntu Linux



[PATCH] Endian-annotate in_aton()

2006-01-06 Thread Alexey Dobriyan
Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 include/linux/inet.h |2 +-
 net/core/utils.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/inet.h
+++ b/include/linux/inet.h
@@ -45,6 +45,6 @@
 #ifdef __KERNEL__
 #include 
 
-extern __u32 in_aton(const char *str);
+extern __be32 in_aton(const char *str);
 #endif
 #endif /* _LINUX_INET_H */
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -162,7 +162,7 @@ EXPORT_SYMBOL(net_srandom);
  * is otherwise not dependent on the TCP/IP stack.
  */
 
-__u32 in_aton(const char *str)
+__be32 in_aton(const char *str)
 {
unsigned long l;
unsigned int val;
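
As an illustration of what the __be32 annotation documents, here is a
user-space sketch of a dotted-quad parser that returns its result in network
(big-endian) byte order. It is an assumption-laden analogue, not the kernel
function: sketch_in_aton is a hypothetical name, it uses htonl() from
<arpa/inet.h>, and it does no input validation.

```c
#include <arpa/inet.h>	/* htonl() */
#include <stdint.h>

/* Hypothetical user-space analogue of in_aton(): parse "a.b.c.d" and
 * return the address in network byte order. Malformed input is not
 * rejected; missing octets are simply treated as zero. */
uint32_t sketch_in_aton(const char *str)
{
	uint32_t host = 0;
	int i;

	for (i = 0; i < 4; i++) {
		unsigned int octet = 0;

		/* Accumulate the decimal digits of one octet. */
		while (*str >= '0' && *str <= '9')
			octet = octet * 10 + (unsigned int)(*str++ - '0');
		host = (host << 8) | (octet & 0xff);
		if (*str == '.')
			str++;
	}
	return htonl(host);	/* host order -> network order */
}
```

The point of the annotation is that the returned value is only meaningful as
big-endian data: on a little-endian machine, comparing it with a host-order
constant without a conversion would be a bug that a checker such as sparse
can flag.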



[PATCH] Endian-annotate struct iphdr

2006-01-06 Thread Alexey Dobriyan
And fix trivial warnings that emerged.

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 include/linux/ip.h |   10 +-
 net/ipv4/ip_fragment.c |2 +-
 net/ipv4/ip_output.c   |4 ++--
 net/ipv4/ipvs/ip_vs_xmit.c |2 +-
 4 files changed, 9 insertions(+), 9 deletions(-)

--- a/include/linux/ip.h
+++ b/include/linux/ip.h
@@ -90,14 +90,14 @@ struct iphdr {
 #error "Please fix "
 #endif
    __u8    tos;
-   __u16   tot_len;
-   __u16   id;
-   __u16   frag_off;
+   __be16  tot_len;
+   __be16  id;
+   __be16  frag_off;
    __u8    ttl;
    __u8    protocol;
    __u16   check;
-   __u32   saddr;
-   __u32   daddr;
+   __be32  saddr;
+   __be32  daddr;
    /*The options start here. */
 };
 
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -383,7 +383,7 @@ out_nomem:
  */
 static inline struct ipq *ip_find(struct iphdr *iph, u32 user)
 {
-   __u16 id = iph->id;
+   __be16 id = iph->id;
__u32 saddr = iph->saddr;
__u32 daddr = iph->daddr;
__u8 protocol = iph->protocol;
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -418,7 +418,7 @@ int ip_fragment(struct sk_buff *skb, int
struct sk_buff *skb2;
unsigned int mtu, hlen, left, len, ll_rs;
int offset;
-   int not_last_frag;
+   __be16 not_last_frag;
struct rtable *rt = (struct rtable*)skb->dst;
int err = 0;
 
@@ -1180,7 +1180,7 @@ int ip_push_pending_frames(struct sock *
struct ip_options *opt = NULL;
struct rtable *rt = inet->cork.rt;
struct iphdr *iph;
-   int df = 0;
+   __be16 df = 0;
__u8 ttl;
int err = 0;
 
diff --git a/net/ipv4/ipvs/ip_vs_xmit.c b/net/ipv4/ipvs/ip_vs_xmit.c
index 3b87482..52c12e9 100644
--- a/net/ipv4/ipvs/ip_vs_xmit.c
+++ b/net/ipv4/ipvs/ip_vs_xmit.c
@@ -322,7 +322,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, s
    struct net_device *tdev;    /* Device to other host */
    struct iphdr  *old_iph = skb->nh.iph;
    u8     tos = old_iph->tos;
-   u16    df = old_iph->frag_off;
+   __be16 df = old_iph->frag_off;
    struct iphdr  *iph;         /* Our new IP header */
    int    max_headroom;        /* The extra header space needed */
    int    mtu;



[patch] tulip: enable multiport NIC BIOS fixups for x86_64

2006-01-06 Thread John W. Linville
From: Christoph Dworzak <[EMAIL PROTECTED]>

A BIOS bug affecting some multiport tulip NICs requires an irq fixup
in tulip_core.c.  This has only been enabled for i686, but it is
needed for x86_64 as well.

Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
---

 drivers/net/tulip/tulip_core.c |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
index 125ed00..c67c912 100644
--- a/drivers/net/tulip/tulip_core.c
+++ b/drivers/net/tulip/tulip_core.c
@@ -1564,7 +1564,7 @@ static int __devinit tulip_init_one (str
dev->dev_addr, 6);
}
 #endif
-#if defined(__i386__)  /* Patch up x86 BIOS bug. */
+#if defined(__i386__) || defined(__x86_64__)   /* Patch up x86 BIOS bug. */
if (last_irq)
irq = last_irq;
 #endif
-- 
John W. Linville
[EMAIL PROTECTED]


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Lee Revell
On Fri, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> I have some servers that once in a while crash when the ip route
> cache is flushed. After raising
> /proc/sys/net/ipv4/route/secret_interval (so that *no* flush is
> done), I got better uptime for these servers.

Argh, where is that documented?  I have been banging my head against
this for weeks - how do I keep the kernel from flushing 4096 routes at
once in softirq context causing huge (~8-20ms) latency problems?

I tried all the route related sysctls I could find and nothing worked...

Lee



Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Lee Revell
On Fri, 2006-01-06 at 13:58 +0100, Andi Kleen wrote:
> Another CPU might be stuck in a long 
> running interrupt

Shouldn't a long running interrupt be considered a bug?

Lee



[2.6 patch] remove drivers/net/tulip/xircom_tulip_cb.c

2006-01-06 Thread Adrian Bunk
This patch removes the obsolete drivers/net/tulip/xircom_tulip_cb.c 
driver.


Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

This patch was already sent on:
- 12 Dec 2005
- 18 Nov 2005

 drivers/net/tulip/Kconfig   |   16 
 drivers/net/tulip/Makefile  |1 
 drivers/net/tulip/xircom_tulip_cb.c | 1748 
 3 files changed, 1 insertion(+), 1764 deletions(-)

--- linux-2.6.15-rc1-mm1-full/drivers/net/tulip/Kconfig.old 2005-11-18 03:45:53.0 +0100
+++ linux-2.6.15-rc1-mm1-full/drivers/net/tulip/Kconfig 2005-11-18 03:46:20.0 +0100
@@ -148,7 +148,7 @@
  be called uli526x.
  
 config PCMCIA_XIRCOM
-   tristate "Xircom CardBus support (new driver)"
+   tristate "Xircom CardBus support"
depends on NET_TULIP && CARDBUS
---help---
  This driver is for the Digital "Tulip" Ethernet CardBus adapters.
@@ -160,19 +160,5 @@
  .  The module will
  be called xircom_cb.  If unsure, say N.
 
-config PCMCIA_XIRTULIP
-   tristate "Xircom Tulip-like CardBus support (old driver)"
-   depends on NET_TULIP && CARDBUS && BROKEN_ON_SMP
-   select CRC32
-   ---help---
- This driver is for the Digital "Tulip" Ethernet CardBus adapters.
- It should work with most DEC 21*4*-based chips/ethercards, as well
- as with work-alike chips from Lite-On (PNIC) and Macronix (MXIC) and
- ASIX.
-
- To compile this driver as a module, choose M here and read
- .  The module will
- be called xircom_tulip_cb.  If unsure, say N.
-
 endmenu
 
--- linux-2.6.15-rc1-mm1-full/drivers/net/tulip/Makefile.old 2005-11-18 03:46:32.0 +0100
+++ linux-2.6.15-rc1-mm1-full/drivers/net/tulip/Makefile 2005-11-18 03:46:41.0 +0100
@@ -2,7 +2,6 @@
 # Makefile for the Linux "Tulip" family network device drivers.
 #
 
-obj-$(CONFIG_PCMCIA_XIRTULIP)  += xircom_tulip_cb.o
 obj-$(CONFIG_PCMCIA_XIRCOM)+= xircom_cb.o
 obj-$(CONFIG_DM9102)   += dmfe.o
 obj-$(CONFIG_WINBOND_840)  += winbond-840.o
--- linux-2.6.15-rc1-mm1-full/drivers/net/tulip/xircom_tulip_cb.c   2005-10-28 02:02:08.0 +0200
+++ /dev/null   2005-11-08 19:07:57.0 +0100
@@ -1,1748 +0,0 @@
-/* xircom_tulip_cb.c: A Xircom CBE-100 ethernet driver for Linux. */
-/*
-   Written/copyright 1994-1999 by Donald Becker.
-
-   This software may be used and distributed according to the terms
-   of the GNU General Public License, incorporated herein by reference.
-
-   The author may be reached as [EMAIL PROTECTED], or C/O
-   Scyld Computing Corporation
-   410 Severn Ave., Suite 210
-   Annapolis MD 21403
-
-   ---
-
-   Linux kernel-specific changes:
-
-   LK1.0 (Ion Badulescu)
-   - Major cleanup
-   - Use 2.4 PCI API
-   - Support ethtool
-   - Rewrite perfect filter/hash code
-   - Use interrupts for media changes
-
-   LK1.1 (Ion Badulescu)
-   - Disallow negotiation of unsupported full-duplex modes
-*/
-
-#define DRV_NAME   "xircom_tulip_cb"
-#define DRV_VERSION"0.91+LK1.1"
-#define DRV_RELDATE"October 11, 2001"
-
-#define CARDBUS 1
-
-/* A few user-configurable values. */
-
-#define xircom_debug debug
-#ifdef XIRCOM_DEBUG
-static int xircom_debug = XIRCOM_DEBUG;
-#else
-static int xircom_debug = 1;
-#endif
-
-/* Maximum events (Rx packets, etc.) to handle at each interrupt. */
-static int max_interrupt_work = 25;
-
-#define MAX_UNITS 4
-/* Used to pass the full-duplex flag, etc. */
-static int full_duplex[MAX_UNITS];
-static int options[MAX_UNITS];
-static int mtu[MAX_UNITS]; /* Jumbo MTU for interfaces. */
-
-/* Keep the ring sizes a power of two for efficiency.
-   Making the Tx ring too large decreases the effectiveness of channel
-   bonding and packet priority.
-   There are no ill effects from too-large receive rings. */
-#define TX_RING_SIZE   16
-#define RX_RING_SIZE   32
-
-/* Set the copy breakpoint for the copy-only-tiny-buffer Rx structure. */
-#ifdef __alpha__
-static int rx_copybreak = 1518;
-#else
-static int rx_copybreak = 100;
-#endif
-
-/*
-  Set the bus performance register.
-   Typical: Set 16 longword cache alignment, no burst limit.
-   Cache alignment bits 15:14   Burst length 13:8
-   No alignment  0x unlimited  0800 8 longwords
-   4000 8  longwords    0100 1 longword   1000 16 longwords
-   8000 16 longwords    0200 2 longwords  2000 32 longwords
-   C000 32 longwords    0400 4 longwords
-   Warning: many older 486 systems are broken and require setting 0x00A04800
-  8 longword cache alignment, 8 longword burst.
-   ToDo: Non-Intel setting could be better.
-*/
-
-#if defined(__alpha__) || defined(__ia64__) || defined(__x86_

[PATCH] Use newer is_multicast_ether_addr() in some files

2006-01-06 Thread Kris Katterjohn
This uses is_multicast_ether_addr() because it has recently been changed to do
the same thing these separate tests are doing.

Signed-off-by: Kris Katterjohn <[EMAIL PROTECTED]>

Thanks!

--- x/net/atm/br2684.c  2006-01-02 21:21:10.0 -0600
+++ y/net/atm/br2684.c  2006-01-06 12:34:47.0 -0600
@@ -295,7 +295,7 @@ static inline __be16 br_type_trans(struc
unsigned char *rawp;
eth = eth_hdr(skb);
 
-   if (*eth->h_dest & 1) {
+   if (is_multicast_ether_addr(eth->h_dest)) {
if (memcmp(eth->h_dest, dev->broadcast, ETH_ALEN) == 0)
skb->pkt_type = PACKET_BROADCAST;
else


--- x/net/bridge/br_input.c 2006-01-02 21:21:10.0 -0600
+++ y/net/bridge/br_input.c 2006-01-06 12:31:59.0 -0600
@@ -63,7 +63,7 @@ int br_handle_frame_finish(struct sk_buf
}
}
 
-   if (dest[0] & 1) {
+   if (is_multicast_ether_addr(dest)) {
br_flood_forward(br, skb, !passedup);
if (!passedup)
br_pass_frame_up(br, skb);


--- x/net/ethernet/eth.c2006-01-05 21:28:02.0 -0600
+++ y/net/ethernet/eth.c2006-01-06 12:21:04.0 -0600
@@ -163,7 +163,7 @@ __be16 eth_type_trans(struct sk_buff *sk
skb_pull(skb,ETH_HLEN);
eth = eth_hdr(skb);

-   if (*eth->h_dest&1) {
+   if (is_multicast_ether_addr(eth->h_dest)) {
if (!compare_ether_addr(eth->h_dest, dev->broadcast))
skb->pkt_type = PACKET_BROADCAST;
else




Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Stephen Hemminger
On Fri, 06 Jan 2006 13:46:15 +0100
Patrick McHardy <[EMAIL PROTECTED]> wrote:

> Marcel Holtmann wrote:
> 
> >>I just personally liked the idea of having a device node in /dev for
> >>every existing hardware wlan card. Like we have device nodes for
> >>other real hardware, too. It felt like a bit of a "unix way" to do
> >>this to me. I don't say this is the way to go.
> >>If a netlink socket is used (which is possible, for sure), we stay with
> >>the old way of having no device node in /dev for networking devices.
> >>That is ok. But that is really only an implementation detail (and for sure
> >>a matter of taste).
> > 
> > 
> > At the OLS last year, I think the consensus was to use netlink for all
> > configuration tasks. However this was mainly driven by Harald Welte and
> > he might be able to talk about the pros and cons of netlink versus a
> > character device.
> 
> I think the main advantages of netlink over a character device are its
> flexible format, which is easily extendable, and multicast capability,
> which can be used to broadcast events and configuration changes. It's
> also good to have all the net stuff accessible in a uniform way.

Also netlink doesn't have the naming issues that /dev node would.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet

Paul E. McKenney wrote:
> On Fri, Jan 06, 2006 at 01:37:12PM +, Alan Cox wrote:
>> On Fri, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
>>> I assume that if a CPU queued 10.000 items in its RCU queue, then the oldest
>>> entry cannot still be in use by another CPU. This might sound like a violation
>>> of RCU rules (I'm not an RCU expert), but seems quite reasonable.
>> Fixing the real problem in the routing code would be the real fix.
>>
>> The underlying problem of RCU and memory usage could be solved more
>> safely by making sure that the sleeping memory allocator path always
>> waits until at least one RCU cleanup has occurred after it fails an
>> allocation before it starts trying harder. That ought to also naturally
>> throttle memory consumers more in the situation which is the right
>> behaviour.
>
> A quick look at rt_garbage_collect() leads me to believe that although
> the IP route cache does try to limit its use of memory, it does not
> fully account for memory that it has released to RCU, but that RCU has
> not yet freed due to a grace period not having elapsed.
>
> The following appears to be possible:
>
> 1.  rt_garbage_collect() sees that there are too many entries,
>     and sets "goal" to the number to free up, based on a
>     computed "equilibrium" value.
>
> 2.  The number of entries is (correctly) decremented only when
>     the corresponding RCU callback is invoked, which actually
>     frees the entry.
>
> 3.  Between the time that rt_garbage_collect() is invoked the
>     first time and when the RCU grace period ends, rt_garbage_collect()
>     is invoked again.  It still sees too many entries (since
>     RCU has not yet freed the ones released by the earlier
>     invocation in step (1) above), so frees a bunch more.
>
> 4.  Packets routed now miss the route cache, because the corresponding
>     entries are waiting for a grace period, slowing the system down.
>     Therefore, even more entries are freed to make room for new
>     entries corresponding to the new packets.
>
> If my (likely quite naive) reading of the IP route cache code is correct,
> it would be possible to end up in a steady state with most of the entries
> always being in RCU rather than in the route cache.
>
> Eric, could this be what is happening to your system?
>
> If it is, one straightforward fix would be to keep a count of the number
> of route-cache entries waiting on RCU, and for rt_garbage_collect()
> to subtract this number of entries from its goal.  Does this make sense?



Hi Paul

Thanks for reviewing route code :)

As I said, the problem comes from 'route flush cache', which is done
periodically by rt_run_flush(), triggered by rt_flush_timer.

The 10% of LOWMEM RAM that was used by route-cache entries is pushed into RCU
queues (with call_rcu_bh()), and the network continues to receive packets from
*many* sources that want their route-cache entry.


Eric



Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Ben Greear

Michael Buesch wrote:
> How would the virtual interfaces look like? That is quite easy to answer.
> They are net_devices, as they transfer data.
> They should probaly _not_ be on top of the ethernet, as 80211 does not
> have very much in common with ethernet. Basically they share the same
> MAC address format. Does someone have another thing, which he thinks
> is shared?

If you can make the virtual devices look like ethernet, I believe a lot of
other things will just work w/out hacking, including user-space apps that
think they know exactly what an ethernet frame/device looks like.

The only things I can think of that won't work like ethernet are the ability
to change the local MAC address or go into promisc mode.  And, it's always
possible that future wifi hardware will support that as well.  Either way, the
current API handles this fine:  the requests to change will just fail with a
convenient error.

Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com



[PATCH] update bonding.txt to not show ip address on slaves

2006-01-06 Thread Eric Paris
ifenslave, as of ABI version 2, does not set the IP address on the slave
interfaces.  The documentation example, however, still shows that the
enslaved interfaces should have the same IP as the master.  The patch
simply removes the lines from the example which should no longer appear.

Signed-off-by: Eric Paris <[EMAIL PROTECTED]>

 bonding.txt |2 --
 1 files changed, 2 deletions(-)

--- linux-2.6.14.2/Documentation/networking/bonding.txt.old 2006-01-06 11:47:31.0 -0500
+++ linux-2.6.14.2/Documentation/networking/bonding.txt 2006-01-06 11:49:18.0 -0500
@@ -944,7 +944,6 @@ bond0 Link encap:Ethernet  HWaddr 00
   collisions:0 txqueuelen:0
 
 eth0  Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
-  inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
   UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
   RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
   TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
@@ -952,7 +951,6 @@ eth0  Link encap:Ethernet  HWaddr 00
   Interrupt:10 Base address:0x1080
 
 eth1  Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
-  inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
   UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
   RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
   TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0




Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Johannes Berg
On Fri, 2006-01-06 at 17:12 +0100, Feyd wrote:
> Michael Buesch wrote:
> > The _real_ main point I wanted to make was to _not_ use a net_device for
> > the master device. What else should be used for master device, let it
> > be a device node or a netlink socket, is rather unimportant at
> > this stage.
> 
> If the only purpose of the master device was configuration, then it
> would be beter to use something other then a net_device, but you may
> want to send/receive raw 802.11 packets from userspace, most logicaly
> over a master interface.

We thought about that for a while, but it may not be feasible. Certain
hardware that manages more stuff than others in firmware/hardware may
not allow sending raw frames without going into some special mode, which
is better handled by adding some kind of raw virtual device.

johannes




Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Paul E. McKenney
On Fri, Jan 06, 2006 at 01:37:12PM +0000, Alan Cox wrote:
> On Gwe, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> > I assume that if a CPU queued 10.000 items in its RCU queue, then the oldest
> > entry cannot still be in use by another CPU. This might sounds as a violation
> > of RCU rules, (I'm not an RCU expert) but seems quite reasonable.
> 
> Fixing the real problem in the routing code would be the real fix. 
> 
> The underlying problem of RCU and memory usage could be solved more
> safely by making sure that the sleeping memory allocator path always
> waits until at least one RCU cleanup has occurred after it fails an
> allocation before it starts trying harder. That ought to also naturally
> throttle memory consumers more in the situation which is the right
> behaviour.

A quick look at rt_garbage_collect() leads me to believe that although
the IP route cache does try to limit its use of memory, it does not
fully account for memory that it has released to RCU, but that RCU has
not yet freed due to a grace period not having elapsed.

The following appears to be possible:

1.  rt_garbage_collect() sees that there are too many entries,
and sets "goal" to the number to free up, based on a
computed "equilibrium" value.

2.  The number of entries is (correctly) decremented only when
the corresponding RCU callback is invoked, which actually
frees the entry.

3.  Between the time that rt_garbage_collect() is invoked the
first time and when the RCU grace period ends, rt_garbage_collect()
is invoked again.  It still sees too many entries (since
RCU has not yet freed the ones released by the earlier
invocation in step (1) above), so frees a bunch more.

4.  Packets routed now miss the route cache, because the corresponding
entries are waiting for a grace period, slowing the system down.
Therefore, even more entries are freed to make room for new
entries corresponding to the new packets.

If my (likely quite naive) reading of the IP route cache code is correct,
it would be possible to end up in a steady state with most of the entries
always being in RCU rather than in the route cache.

Eric, could this be what is happening to your system?

If it is, one straightforward fix would be to keep a count of the number
of route-cache entries waiting on RCU, and for rt_garbage_collect()
to subtract this number of entries from its goal.  Does this make sense?
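
The accounting proposed above can be sketched as a small user-space model.
This is only an illustration under stated assumptions: struct rt_cache_model
and its fields are hypothetical and are not taken from net/ipv4/route.c.

```c
#include <assert.h>

/* Hypothetical model of the route cache counters. */
struct rt_cache_model {
	int entries;		/* counted entries, including those queued to RCU */
	int rcu_pending;	/* released to RCU, grace period not yet elapsed */
	int equilibrium;	/* target number of live entries */
};

/* Number of additional entries a collection pass should release.
 * Entries already waiting on RCU are subtracted from the goal. */
static int gc_goal(const struct rt_cache_model *c)
{
	int goal = c->entries - c->equilibrium - c->rcu_pending;

	return goal > 0 ? goal : 0;
}

/* Garbage collector side: hand n entries to RCU. They remain counted in
 * ->entries until the callback runs. */
static void gc_release(struct rt_cache_model *c, int n)
{
	c->rcu_pending += n;
}

/* RCU callback side: n entries are really freed after the grace period. */
static void rcu_callbacks_ran(struct rt_cache_model *c, int n)
{
	assert(n <= c->rcu_pending);
	c->rcu_pending -= n;
	c->entries -= n;
}
```

With 1000 entries and an equilibrium of 600, the first pass releases 400; a
second pass before the grace period ends then computes a goal of 0 instead of
releasing 400 more.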

Thanx, Paul


Re: Newbie question

2006-01-06 Thread Arnaldo Carvalho de Melo
On 1/6/06, Alan Menegotto <[EMAIL PROTECTED]> wrote:
> Hi.
>
> I couldn't understand the logic in the function 'static int __init
> ipv4_proc_init(void)' located at net/ipv4/af_inet.c. Look at the code:
>
> static int __init ipv4_proc_init(void)
> {
>  int rc = 0;
>
>  if (raw_proc_init())
>  goto out_raw;
>  if (tcp4_proc_init())
>  goto out_tcp;
>  if (udp4_proc_init())
>  goto out_udp;
>  if (fib_proc_init())
>  goto out_fib;
>  if (ip_misc_proc_init())
>  goto out_misc;
> out:
>  return rc;
> out_misc:
>  fib_proc_exit();
> out_fib:
>  udp4_proc_exit();
> out_udp:
>  tcp4_proc_exit();
> out_tcp:
>  raw_proc_exit();
> out_raw:
>  rc = -ENOMEM;
>  goto out;
> }
>
> Calling tcp4_proc_init should go to label out_tcp, which call
> raw_proc_exit(). Is this correct? If yes, why?

No, calling tcp4_proc_init() will only lead to calling raw_proc_exit()
if tcp4_proc_init() returns !0, i.e. if it fails.

- Arnaldo


Re: [PATCH 1/1] Corrections to LSM-IPSec Nethooks

2006-01-06 Thread Stephen Smalley
On Fri, 2006-01-06 at 11:09 -0500, Trent Jaeger wrote:
> Forgot signoff -- see below.
> 
> On Jan 6, 2006, at 10:48 AM, Trent Jaeger wrote:
> 
> > Hi,
> >
> > This patch contains two corrections to the LSM-IPsec Nethooks patches
> > previously applied.
> >
> > (1) free a security context on a failed insert via xfrm_user
> > interface in xfrm_add_policy.  Memory leak.
> >
> > (2) change the authorization of the allocation of a security context
> > in a xfrm_policy or xfrm_state from both relabelfrom and relabelto
> > to setcontext.
> >
> > This is intended to be a correction to the 2.6.16 tree.
> 
> Signed-off-by: Trent Jaeger <[EMAIL PROTECTED]>

Acked-by:  Stephen Smalley <[EMAIL PROTECTED]>

-- 
Stephen Smalley
National Security Agency



Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Feyd

Michael Buesch wrote:
> The _real_ main point I wanted to make was to _not_ use a net_device for
> the master device. What else should be used for master device, let it
> be a device node or a netlink socket, is rather unimportant at
> this stage.

If the only purpose of the master device was configuration, then it
would be better to use something other than a net_device, but you may
want to send/receive raw 802.11 packets from userspace, most logically
over a master interface.

Feyd



Re: [PATCH 1/1] Corrections to LSM-IPSec Nethooks

2006-01-06 Thread Trent Jaeger

Forgot signoff -- see below.

On Jan 6, 2006, at 10:48 AM, Trent Jaeger wrote:
> Hi,
>
> This patch contains two corrections to the LSM-IPsec Nethooks patches
> previously applied.
>
> (1) free a security context on a failed insert via xfrm_user
> interface in xfrm_add_policy.  Memory leak.
>
> (2) change the authorization of the allocation of a security context
> in a xfrm_policy or xfrm_state from both relabelfrom and relabelto
> to setcontext.
>
> This is intended to be a correction to the 2.6.16 tree.

Signed-off-by: Trent Jaeger <[EMAIL PROTECTED]>




[PATCH 1/1] Corrections to LSM-IPSec Nethooks

2006-01-06 Thread Trent Jaeger
Hi,

This patch contains two corrections to the LSM-IPsec Nethooks patches
previously applied.  

(1) free a security context on a failed insert via xfrm_user 
interface in xfrm_add_policy.  Memory leak.

(2) change the authorization of the allocation of a security context
in a xfrm_policy or xfrm_state from both relabelfrom and relabelto 
to setcontext.

This is intended to be a correction to the 2.6.16 tree.

Regards,
Trent.
-
---

 net/xfrm/xfrm_user.c |1 +
 security/selinux/include/av_perm_to_string.h |3 +--
 security/selinux/include/av_permissions.h|3 +--
 security/selinux/xfrm.c  |8 +---
 4 files changed, 4 insertions(+), 11 deletions(-)

diff -puN include/linux/security.h~lsm-relabel-nethooks include/linux/security.h
diff -puN net/key/af_key.c~lsm-relabel-nethooks net/key/af_key.c
diff -puN net/xfrm/xfrm_user.c~lsm-relabel-nethooks net/xfrm/xfrm_user.c
--- linux-2.6.15-rc5/net/xfrm/xfrm_user.c~lsm-relabel-nethooks  2006-01-04 22:35:41.0 -0500
+++ linux-2.6.15-rc5-root/net/xfrm/xfrm_user.c  2006-01-05 10:36:04.0 -0500
@@ -802,6 +802,7 @@ static int xfrm_add_policy(struct sk_buf
excl = nlh->nlmsg_type == XFRM_MSG_NEWPOLICY;
err = xfrm_policy_insert(p->dir, xp, excl);
if (err) {
+   security_xfrm_policy_free(xp);
kfree(xp);
return err;
}
diff -puN security/dummy.c~lsm-relabel-nethooks security/dummy.c
diff -puN security/selinux/hooks.c~lsm-relabel-nethooks security/selinux/hooks.c
diff -puN security/selinux/include/av_perm_to_string.h~lsm-relabel-nethooks security/selinux/include/av_perm_to_string.h
--- linux-2.6.15-rc5/security/selinux/include/av_perm_to_string.h~lsm-relabel-nethooks  2006-01-04 22:35:41.0 -0500
+++ linux-2.6.15-rc5-root/security/selinux/include/av_perm_to_string.h  2006-01-04 22:38:14.0 -0500
@@ -238,5 +238,4 @@
S_(SECCLASS_NSCD, NSCD__SHMEMHOST, "shmemhost")
S_(SECCLASS_ASSOCIATION, ASSOCIATION__SENDTO, "sendto")
S_(SECCLASS_ASSOCIATION, ASSOCIATION__RECVFROM, "recvfrom")
-   S_(SECCLASS_ASSOCIATION, ASSOCIATION__RELABELFROM, "relabelfrom")
-   S_(SECCLASS_ASSOCIATION, ASSOCIATION__RELABELTO, "relabelto")
+   S_(SECCLASS_ASSOCIATION, ASSOCIATION__SETCONTEXT, "setcontext")
diff -puN security/selinux/include/av_permissions.h~lsm-relabel-nethooks security/selinux/include/av_permissions.h
--- linux-2.6.15-rc5/security/selinux/include/av_permissions.h~lsm-relabel-nethooks 2006-01-04 22:35:41.0 -0500
+++ linux-2.6.15-rc5-root/security/selinux/include/av_permissions.h 2006-01-04 22:38:13.0 -0500
@@ -908,8 +908,7 @@
 
 #define ASSOCIATION__SENDTO   0x0001UL
 #define ASSOCIATION__RECVFROM 0x0002UL
-#define ASSOCIATION__RELABELFROM  0x0004UL
-#define ASSOCIATION__RELABELTO0x0008UL
+#define ASSOCIATION__SETCONTEXT   0x0004UL
 
 #define NETLINK_KOBJECT_UEVENT_SOCKET__IOCTL  0x0001UL
 #define NETLINK_KOBJECT_UEVENT_SOCKET__READ   0x0002UL
diff -puN security/selinux/include/av_inherit.h~lsm-relabel-nethooks security/selinux/include/av_inherit.h
diff -puN security/selinux/include/class_to_string.h~lsm-relabel-nethooks security/selinux/include/class_to_string.h
diff -puN security/selinux/include/common_perm_to_string.h~lsm-relabel-nethooks security/selinux/include/common_perm_to_string.h
diff -puN security/selinux/include/flask.h~lsm-relabel-nethooks security/selinux/include/flask.h
diff -puN security/selinux/include/initial_sid_to_string.h~lsm-relabel-nethooks security/selinux/include/initial_sid_to_string.h
diff -puN security/selinux/include/xfrm.h~lsm-relabel-nethooks security/selinux/include/xfrm.h
diff -puN security/selinux/xfrm.c~lsm-relabel-nethooks security/selinux/xfrm.c
--- linux-2.6.15-rc5/security/selinux/xfrm.c~lsm-relabel-nethooks   2006-01-04 22:35:41.0 -0500
+++ linux-2.6.15-rc5-root/security/selinux/xfrm.c   2006-01-04 22:35:41.0 -0500
@@ -137,15 +137,9 @@ static int selinux_xfrm_sec_ctx_alloc(st
 * Must be permitted to relabel from default socket type (process type)
 * to specified context
 */
-   rc = avc_has_perm(tsec->sid, tsec->sid,
- SECCLASS_ASSOCIATION,
- ASSOCIATION__RELABELFROM, NULL);
-   if (rc)
-   goto out;
-
rc = avc_has_perm(tsec->sid, ctx->ctx_sid,
  SECCLASS_ASSOCIATION,
- ASSOCIATION__RELABELTO, NULL);
+ ASSOCIATION__SETCONTEXT, NULL);
if (rc)
goto out;
 
_




Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Alan Cox
On Gwe, 2006-01-06 at 15:00 +0100, Eric Dumazet wrote:
> In the case of call_rcu_bh(), you can be sure that the caller cannot afford 
> 'sleeping memory allocations'. Better drop a frame than block the stack, no ?

atomic allocations can't sleep and will fail which is fine. If memory
allocation pressure exists for sleeping allocations because of a large
rcu backlog we want to be sure that the rcu backlog from the networking
stack or other sources does not cause us to OOM kill or take incorrect
action.

So if for example we want to grow a process stack and the memory is
there just stuck in the RCU lists pending recovery we want to let the
RCU recovery happen before making drastic decisions.
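
The ordering described above can be modelled in user space as a rough sketch.
Everything here is an assumption made for illustration: struct mem_model,
wait_for_rcu_cleanup() and the result codes are hypothetical and do not
correspond to real page-allocator functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory state: pages that are free now, and pages that are
 * already released but still sit on RCU lists pending a grace period. */
struct mem_model {
	size_t free_pages;
	size_t rcu_backlog_pages;
};

enum alloc_result { ALLOC_OK, ALLOC_OK_AFTER_RCU, ALLOC_NEEDS_RECLAIM };

static bool try_alloc(struct mem_model *m, size_t pages)
{
	if (m->free_pages < pages)
		return false;
	m->free_pages -= pages;
	return true;
}

/* Stand-in for "wait until one RCU cleanup pass has run". */
static void wait_for_rcu_cleanup(struct mem_model *m)
{
	m->free_pages += m->rcu_backlog_pages;
	m->rcu_backlog_pages = 0;
}

/* Sleeping allocation path: after a failure, let RCU recover its backlog
 * before escalating to harder reclaim or an OOM decision. */
static enum alloc_result sleeping_alloc(struct mem_model *m, size_t pages)
{
	assert(m != NULL);
	if (try_alloc(m, pages))
		return ALLOC_OK;
	wait_for_rcu_cleanup(m);
	if (try_alloc(m, pages))
		return ALLOC_OK_AFTER_RCU;
	return ALLOC_NEEDS_RECLAIM;
}
```

Atomic callers would never reach the wait step; they simply fail, as stated
above.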




Re: Newbie question

2006-01-06 Thread Johannes Berg
On Fri, 2006-01-06 at 12:38 -0200, Alan Menegotto wrote:
> Look at the code:
> 
> static int __init ipv4_proc_init(void)
> {
>  int rc = 0;
> 
>  if (raw_proc_init())
>  goto out_raw;
>  if (tcp4_proc_init())
>  goto out_tcp;
>  if (udp4_proc_init())
>  goto out_udp;
>  if (fib_proc_init())
>  goto out_fib;
>  if (ip_misc_proc_init())
>  goto out_misc;
> out:
>  return rc;
> out_misc:
>  fib_proc_exit();
> out_fib:
>  udp4_proc_exit();
> out_udp:
>  tcp4_proc_exit();
> out_tcp:
>  raw_proc_exit();
> out_raw:
>  rc = -ENOMEM;
>  goto out;
> }
> 
> Calling tcp4_proc_init should go to label out_tcp, which call 
> raw_proc_exit(). Is this correct? If yes, why?

It's symmetric. If raw_proc_init() fails, no cleanup needs to be done so
you go to out_raw. If tcp4_proc_init fails, then raw_proc_init() did
*not* fail and needs to be cleaned up after by calling raw_proc_exit().
etc.
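
The symmetry can be checked with a small self-contained model of the same
goto-unwind pattern. The step_init()/step_exit() helpers and the fail_at
switch are hypothetical stand-ins for the *_proc_init()/*_proc_exit() pairs.

```c
#include <assert.h>
#include <errno.h>

static int fail_at;	/* which step should fail (0 = none) */
static int live;	/* bitmask of steps currently initialised */

static int step_init(int n)
{
	if (fail_at == n)
		return -1;
	live |= 1 << n;
	return 0;
}

static void step_exit(int n)
{
	assert(live & (1 << n));	/* only undo what was set up */
	live &= ~(1 << n);
}

static int init_all(void)
{
	int rc = 0;

	if (step_init(1))
		goto out_1;
	if (step_init(2))
		goto out_2;
	if (step_init(3))
		goto out_3;
out:
	return rc;
out_3:			/* step 3 failed: undo step 2, then fall through */
	step_exit(2);
out_2:			/* step 2 failed: undo step 1 */
	step_exit(1);
out_1:			/* step 1 failed: nothing to undo */
	rc = -ENOMEM;
	goto out;
}
```

Each label undoes the step before the one that failed and then falls through
to the earlier labels, so a failure at step N cleans up steps N-1 down to 1.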

johannes




Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet

Alan Cox wrote:
> On Gwe, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> > I assume that if a CPU queued 10.000 items in its RCU queue, then the oldest
> > entry cannot still be in use by another CPU. This might sounds as a violation
> > of RCU rules, (I'm not an RCU expert) but seems quite reasonable.
>
> Fixing the real problem in the routing code would be the real fix.

So far nobody has succeeded in 'fixing the routing code'; few people can even
read the code from the first line to the last one...

I think this code is not buggy, it only makes general RCU assumptions about
delayed freeing of dst entries. In some cases, the general assumptions are
just wrong. We can fix it at RCU level, and future users of call_rcu_bh() won't
have to think *hard* about 'general assumptions'.

Of course, we can ignore the RCU problem and mark somewhere on a sticker:
***DONT USE OR RISK CRASHES***
***USE IT ONLY FOR FUN***

> The underlying problem of RCU and memory usage could be solved more
> safely by making sure that the sleeping memory allocator path always
> waits until at least one RCU cleanup has occurred after it fails an
> allocation before it starts trying harder. That ought to also naturally
> throttle memory consumers more in the situation which is the right
> behaviour.

In the case of call_rcu_bh(), you can be sure that the caller cannot afford
'sleeping memory allocations'. Better drop a frame than block the stack, no ?


Eric


Re: [PATCH] Change sk_run_filter()'s return type in net/core/filter.c

2006-01-06 Thread Kris Katterjohn
From: Patrick McHardy
Sent: 1/6/2006 1:36:24 AM
> Please use unsigned int not just unsigned.

Ta-da!

--- x/net/core/filter.c 2006-01-05 12:27:17.0 -0600
+++ y/net/core/filter.c 2006-01-05 17:02:32.0 -0600
@@ -75,7 +75,7 @@ static inline void *load_pointer(struct 
  * len is the number of filter blocks in the array.
  */
  
-int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)
+unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)
 {
struct sock_filter *fentry; /* We walk down these */
void *ptr;
@@ -241,9 +241,9 @@ load_b:
A = X;
continue;
case BPF_RET|BPF_K:
-   return ((unsigned int)fentry->k);
+   return fentry->k;
case BPF_RET|BPF_A:
-   return ((unsigned int)A);
+   return A;
case BPF_ST:
mem[fentry->k] = A;
continue;

--- x/include/linux/filter.h 2006-01-02 21:21:10.0 -0600
+++ y/include/linux/filter.h 2006-01-05 17:02:58.0 -0600
@@ -143,7 +143,7 @@ static inline unsigned int sk_filter_len
 struct sk_buff;
 struct sock;
 
-extern int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen);
+extern unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen);
 extern int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
 extern int sk_chk_filter(struct sock_filter *filter, int flen);
 #endif /* __KERNEL__ */

--- x/include/net/sock.h 2006-01-05 23:06:00.0 -0600
+++ y/include/net/sock.h 2006-01-05 23:06:06.0 -0600
@@ -856,8 +856,8 @@ static inline int sk_filter(struct sock 

filter = sk->sk_filter;
if (filter) {
-   int pkt_len = sk_run_filter(skb, filter->insns,
-   filter->len);
+   unsigned int pkt_len = sk_run_filter(skb, filter->insns,
+filter->len);
if (!pkt_len)
err = -EPERM;
else




Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Alan Cox
On Gwe, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> I assume that if a CPU queued 10.000 items in its RCU queue, then the oldest 
> entry cannot still be in use by another CPU. This might sounds as a violation 
> of RCU rules, (I'm not an RCU expert) but seems quite reasonable.

Fixing the real problem in the routing code would be the real fix. 

The underlying problem of RCU and memory usage could be solved more
safely by making sure that the sleeping memory allocator path always
waits until at least one RCU cleanup has occurred after it fails an
allocation before it starts trying harder. That ought to also naturally
throttle memory consumers more in the situation which is the right
behaviour.

Alan


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet

Andi Kleen wrote:
> On Friday 06 January 2006 11:17, Eric Dumazet wrote:
> > I assume that if a CPU queued 10.000 items in its RCU queue, then the
> > oldest entry cannot still be in use by another CPU. This might sounds as a
> > violation of RCU rules, (I'm not an RCU expert) but seems quite reasonable.
>
> I don't think it's a good assumption. Another CPU might be stuck in a long
> running interrupt, and still have a reference in the code running below
> the interrupt handler.
>
> And in general letting correctness depend on magic numbers like this is
> very nasty.

I agree Andi, I posted a 2nd version of the patch with no more assumptions.

Eric




Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Andi Kleen
On Friday 06 January 2006 11:17, Eric Dumazet wrote:

>
> I assume that if a CPU queued 10.000 items in its RCU queue, then the
> oldest entry cannot still be in use by another CPU. This might sounds as a
> violation of RCU rules, (I'm not an RCU expert) but seems quite reasonable.

I don't think it's a good assumption. Another CPU might be stuck in a long 
running interrupt, and still have a reference in the code running below
the interrupt handler.

And in general letting correctness depend on magic numbers like this is 
very nasty.

-Andi


Re: [PATCH, RFC] RCU : OOM avoidance and lower latency (Version 2), HOTPLUG_CPU fix

2006-01-06 Thread Eric Dumazet

First patch was buggy, sorry :(

This 2nd version makes no more RCU assumptions, because only the 'donelist' 
queue is fetched for an item to be deleted. Items from the donelist are ready 
to be freed.


This V2 also corrects a problem in case of a CPU hotplug: we forgot to update
the ->count variable when transferring a queue to another one.


-
In order to avoid some OOM triggered by a flood of call_rcu() calls, we
increased in linux 2.6.14 maxbatch from 10 to 10000, and conditionally call
set_need_resched() in call_rcu().


This solution doesn't solve all the problems and has drawbacks.

1) Using a big maxbatch has a bad impact on latency.
2) A flood of call_rcu_bh() still can OOM

I have some servers that once in a while crash when the ip route cache is
flushed. After raising /proc/sys/net/ipv4/route/secret_interval (so that *no*
flush is done), I got better uptime for these servers. But in some cases I
think the network stack can flood call_rcu_bh(), and a fatal OOM occurs.


I suggest in this patch :

1) To lower maxbatch to a more reasonable value (as far as the latency is 
concerned)


2) To be able to guard a RCU cpu queue against a maximal count (10.000 for 
example). If this limit is reached, free the oldest entry (if available from 
the donelist queue).


3) Bug correction in __rcu_offline_cpu() where we forgot to adjust the ->count
field when transferring a queue to another one.


In my stress tests, I could not reproduce OOM anymore after applying this patch.

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
--- linux-2.6.15/kernel/rcupdate.c  2006-01-03 04:21:10.0 +0100
+++ linux-2.6.15-edum/kernel/rcupdate.c 2006-01-06 13:32:02.0 +0100
@@ -71,14 +71,14 @@
 
 /* Fake initialization required by compiler */
 static DEFINE_PER_CPU(struct tasklet_struct, rcu_tasklet) = {NULL};
-static int maxbatch = 10000;
+static int maxbatch = 100;
 
 #ifndef __HAVE_ARCH_CMPXCHG
 /*
  * We use an array of spinlocks for the rcurefs -- similar to ones in sparc
  * 32 bit atomic_t implementations, and a hash function similar to that
  * for our refcounting needs.
- * Can't help multiprocessors which donot have cmpxchg :(
+ * Can't help multiprocessors which dont have cmpxchg :(
  */
 
 spinlock_t __rcuref_hash[RCUREF_HASH_SIZE] = {
@@ -110,9 +110,19 @@
*rdp->nxttail = head;
rdp->nxttail = &head->next;
 
-   if (unlikely(++rdp->count > 10000))
-   set_need_resched();
-
+/*
+ * OOM avoidance : If we queued too many items in this queue,
+ *  free the oldest entry (from the donelist only to respect
+ *  RCU constraints)
+ */
+   if (unlikely(++rdp->count > 10000 && (head = rdp->donelist))) {
+   rdp->count--;
+   rdp->donelist = head->next;
+   if (!rdp->donelist)
+   rdp->donetail = &rdp->donelist;
+   local_irq_restore(flags);
+   return head->func(head);
+   }
local_irq_restore(flags);
 }
 
@@ -148,12 +158,19 @@
rdp = &__get_cpu_var(rcu_bh_data);
*rdp->nxttail = head;
rdp->nxttail = &head->next;
-   rdp->count++;
 /*
- *  Should we directly call rcu_do_batch() here ?
- *  if (unlikely(rdp->count > 10000))
- *  rcu_do_batch(rdp);
+ * OOM avoidance : If we queued too many items in this queue,
+ *  free the oldest entry (from the donelist only to respect
+ *  RCU constraints)
  */
+   if (unlikely(++rdp->count > 10000 && (head = rdp->donelist))) {
+   rdp->count--;
+   rdp->donelist = head->next;
+   if (!rdp->donelist)
+   rdp->donetail = &rdp->donelist;
+   local_irq_restore(flags);
+   return head->func(head);
+   }
local_irq_restore(flags);
 }
 
@@ -208,19 +225,20 @@
  */
 static void rcu_do_batch(struct rcu_data *rdp)
 {
-   struct rcu_head *next, *list;
-   int count = 0;
+   struct rcu_head *next = NULL, *list;
+   int count = maxbatch;
 
list = rdp->donelist;
while (list) {
-   next = rdp->donelist = list->next;
+   next = list->next;
list->func(list);
list = next;
rdp->count--;
-   if (++count >= maxbatch)
+   if (--count <= 0)
break;
}
-   if (!rdp->donelist)
+   rdp->donelist = next;
+   if (!next)
rdp->donetail = &rdp->donelist;
else
tasklet_schedule(&per_cpu(rcu_tasklet, rdp->cpu));
@@ -344,11 +362,9 @@
 static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list,
struct rcu_head **tail)
 {
-   local_irq_disable();
*this_rdp->nxttail = list;
if (list)
this_rdp->nxttail = tail;
-   local_irq_enable();
 }
 
 static void __rcu_offline_cpu(struct rcu_data *this_rdp,
@@ 

Re: State of the Union: Wireless

2006-01-06 Thread Johannes Berg
On Fri, 2006-01-06 at 13:48 +0100, Stefan Rompf wrote:

> With hardware like prism2 usb that gets "don't touch me now mode" for a while 
> after a join command is issued, current API requires a driver to delay 
> starting an association in order to wait if other config requests are issued 
> - an ugly hack.

So that settles the 'need to change multiple settings at once' issue,
saying that yes, it is indeed required.

johannes




Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Patrick McHardy

Marcel Holtmann wrote:


>> I just personally liked the idea of having a device node in /dev for
>> every existing hardware wlan card. Like we have device nodes for
>> other real hardware, too. It felt like a bit of a "unix way" to do
>> this to me. I don't say this is the way to go.
>> If a netlink socket is used (which is possible, for sure), we stay with
>> the old way of having no device node in /dev for networking devices.
>> That is ok. But that is really only an implementation detail (and for sure
>> a matter of taste).
>
> At the OLS last year, I think the consensus was to use netlink for all
> configuration task. However this was mainly driven by Harald Welte and
> he might be able to talk about the pros and cons of netlink versus a
> character device.


I think the main advantages of netlink over a character device are its
flexible format, which is easily extensible, and its multicast capability,
which can be used to broadcast events and configuration changes. It's
also good to have all the net stuff accessible in a uniform way.


Re: State of the Union: Wireless

2006-01-06 Thread Stefan Rompf
Am Freitag 06 Januar 2006 12:46 schrieb Dominik Brodowski:

> From someone who has no idea at all (yet) about 802.11: why character
> device, and not sysfs or configfs files? Like

sysfs shares the main problem with wireless extensions: It configures one 
value per file / per ioctl. Setting up a wireless card to associate or form 
an IBSS network involves multiple parameters, many of which require the card 
to disassociate.

With hardware like prism2 usb, which goes into a "don't touch me now" mode for 
a while after a join command is issued, the current API requires a driver to 
delay starting an association in order to wait and see whether other config 
requests are issued - an ugly hack.

I vote for netlink. It's a defined and tested interface and has all features 
needed to set multiple values in one transaction.
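
A rough sketch of the "multiple values in one transaction" point, assuming a type-length-value encoding loosely modelled on netlink attributes. It does not use the real netlink API; the wcfg_* names, the attribute IDs and the buffer size are all hypothetical. The driver would receive mode, essid and channel together and could act once, instead of guessing whether more single-value requests are still coming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute IDs, not taken from any real interface. */
enum { WCFG_MODE = 1, WCFG_ESSID = 2, WCFG_CHANNEL = 3 };

/* One configuration message holding several attributes. */
struct wcfg_msg {
	unsigned char buf[256];
	size_t len;
};

/* Attributes are padded to 4 bytes so each header stays aligned. */
#define WCFG_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/* Append one attribute: 2-byte type, 2-byte length (host order), payload. */
static int wcfg_put(struct wcfg_msg *m, uint16_t type,
		    const void *data, uint16_t dlen)
{
	size_t need = WCFG_ALIGN(4 + (size_t)dlen);

	assert(m && (data || dlen == 0));
	if (m->len + need > sizeof(m->buf))
		return -1;
	memset(m->buf + m->len, 0, need);
	memcpy(m->buf + m->len, &type, 2);
	memcpy(m->buf + m->len + 2, &dlen, 2);
	if (dlen)
		memcpy(m->buf + m->len + 4, data, dlen);
	m->len += need;
	return 0;
}

/*
 * Find the first attribute of the given type. Attributes of other types,
 * including ones this code has never heard of, are simply skipped, which
 * is what makes such a format easy to extend.
 */
static const void *wcfg_find(const struct wcfg_msg *m, uint16_t type,
			     uint16_t *dlen)
{
	size_t off = 0;

	while (off + 4 <= m->len) {
		uint16_t t, l;

		memcpy(&t, m->buf + off, 2);
		memcpy(&l, m->buf + off + 2, 2);
		if (off + 4 + l > m->len)
			return NULL; /* truncated attribute */
		if (t == type) {
			*dlen = l;
			return m->buf + off + 4;
		}
		off += WCFG_ALIGN(4 + (size_t)l);
	}
	return NULL;
}
```

A caller would wcfg_put() every setting into one struct wcfg_msg and hand the whole buffer over in a single send, so the receiver sees a complete configuration at once.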

Stefan


Re: State of the Union: Wireless

2006-01-06 Thread Johannes Berg

> From someone who has no idea at all (yet) about 802.11: why character
> device, and not sysfs or configfs files? Like

As Michael already said -- there's no real reason for that. We were just
brainstorming. The /dev idea seemed like a good plan at first, but it
isn't set in stone. What you suggested below does look useful too.

Coming back to the point Michael already raised: the overarching idea is
to get rid of the net_dev for the 'master' device, even if the
underlying hardware supports only a single virtual device (which might
then be created by the driver automatically)

I'll move the wiki pages around a bit to accommodate different models,
please check in a few minutes.

johannes




Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Marcel Holtmann
Hi Michael,

> > > How would the virtual interfaces look like? That is quite easy to answer.
> > > They are net_devices, as they transfer data.
> > > They should probaly _not_ be on top of the ethernet, as 80211 does not
> > > have very much in common with ethernet. Basically they share the same
> > > MAC address format. Does someone have another thing, which he thinks
> > > is shared?
> > > How would the master interface look like? A somewhat unusual idea came
> > > up. Using a device node in /dev. So every wireless card in the system
> > > would have a node in /dev associated (/dev/wlan0 for example).
> > > A node for the master device would be ok, because no data is transferred
> > > through it. It is only a configuration interface.
> > > So you would tell the, yet-to-be-written userspace tool wconfig (or 
> > > something
> > > like that) "I need a STA in INFRA mode and want to drive it on the
> > > wlan0 card". So wconfig goes and write()s some data to /dev/wlan0
> > > telling the 80211 code to setup a virtual net_device for the driver
> > > associated to /dev/wlan0.
> > > The virtual interface is then configured though /dev/wlan0 using write()
> > > (no ugly ioctl anymore, you see...). Config data like TX rate,
> > > current essid, basically everything + xyz which is done by WE today,
> > > is written to /dev/wlan0.
> > > This config data is entirely cached in the 80211 code for the /dev/wlan0
> > > instance. This is important, to have the data persistent throughout
> > > suspend/resume cycles, if up/down cycles.
> > > After configuring, a virtual net_device (let's call it wlan0) exists,
> > > which can be brought up by ifconfig and data can be transferred though
> > > it as usual.
> > 
> > what is wrong with using netlink and/or sysfs for it? I don't see the
> > advantage of defining another /dev something interface.
> 
> Nothing is wrong with that.
> "brainstorming" was the most dominant word in the whole text. ;)

so I might have gotten the wrong impression, because it seemed you put a
lot of thinking into the /dev/wlanX stuff without even considering netlink
or something else.

> I just personally liked the idea of having a device node in /dev for
> every existing hardware wlan card. Like we have device nodes for
> other real hardware, too. It felt like a bit of a "unix way" to do
> this to me. I don't say this is the way to go.
> If a netlink socket is used (which is possible, for sure), we stay with
> the old way of having no device node in /dev for networking devices.
> That is ok. But that is really only an implementation detail (and for sure
> a matter of taste).

At the OLS last year, I think the consensus was to use netlink for all
configuration tasks. However this was mainly driven by Harald Welte and
he might be able to talk about the pros and cons of netlink versus a
character device.

> The _real_ main point I wanted to make was to _not_ use a net_device for
> the master device. What else should be used for master device, let it
> be a device node or a netlink socket, is rather unimportant at
> this stage.

I am all for it, because I don't like dummy Ethernet devices that are
only used for configuration. I am still not happy that IrDA uses irda0
to get some kind of packet management etc. instead of implementing a
real, suitable hardware abstraction.

Regards

Marcel




Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Michael Buesch
On Friday 06 January 2006 12:38, you wrote:
> Hi Michael,
> 
> > How would the virtual interfaces look like? That is quite easy to answer.
> > They are net_devices, as they transfer data.
> > They should probaly _not_ be on top of the ethernet, as 80211 does not
> > have very much in common with ethernet. Basically they share the same
> > MAC address format. Does someone have another thing, which he thinks
> > is shared?
> > How would the master interface look like? A somewhat unusual idea came
> > up. Using a device node in /dev. So every wireless card in the system
> > would have a node in /dev associated (/dev/wlan0 for example).
> > A node for the master device would be ok, because no data is transferred
> > through it. It is only a configuration interface.
> > So you would tell the, yet-to-be-written userspace tool wconfig (or 
> > something
> > like that) "I need a STA in INFRA mode and want to drive it on the
> > wlan0 card". So wconfig goes and write()s some data to /dev/wlan0
> > telling the 80211 code to setup a virtual net_device for the driver
> > associated to /dev/wlan0.
> > The virtual interface is then configured though /dev/wlan0 using write()
> > (no ugly ioctl anymore, you see...). Config data like TX rate,
> > current essid, basically everything + xyz which is done by WE today,
> > is written to /dev/wlan0.
> > This config data is entirely cached in the 80211 code for the /dev/wlan0
> > instance. This is important, to have the data persistent throughout
> > suspend/resume cycles, if up/down cycles.
> > After configuring, a virtual net_device (let's call it wlan0) exists,
> > which can be brought up by ifconfig and data can be transferred though
> > it as usual.
> 
> what is wrong with using netlink and/or sysfs for it? I don't see the
> advantage of defining another /dev something interface.

Nothing is wrong with that.
"brainstorming" was the most dominant word in the whole text. ;)
I just personally liked the idea of having a device node in /dev for
every existing hardware wlan card. Like we have device nodes for
other real hardware, too. It felt like a bit of a "unix way" to do
this to me. I don't say this is the way to go.
If a netlink socket is used (which is possible, for sure), we stay with
the old way of having no device node in /dev for networking devices.
That is ok. But that is really only an implementation detail (and for sure
a matter of taste).
The _real_ main point I wanted to make was to _not_ use a net_device for
the master device. What else should be used for master device, let it
be a device node or a netlink socket, is rather unimportant at
this stage.

-- 
Greetings Michael.




Re: State of the Union: Wireless

2006-01-06 Thread Dominik Brodowski
On Fri, Jan 06, 2006 at 12:31:24PM +0100, Johannes Berg wrote:
> On Fri, 2006-01-06 at 12:00 +0100, Michael Buesch wrote:
> 
> > * "master" interface as real device node
> > * Virtual interfaces (net_devices)
> 
> I didn't want to spam the netdev wiki with this (yet) so I collected
> some more structured things outside. Anyone feel free to edit:
> http://softmac.sipsolutions.net/802.11

From someone who has no idea at all (yet) about 802.11: why character
device, and not sysfs or configfs files? Like

TASK: get list of MAC addresses available to hardware device (usually only one 
for current hw)

cat /sys/devices/path/to/device/wireless/address

TASK: get list of virtual devices including (some of) their properties

ls -l /sys/devices/path/to/device/wireless/
...
wlan0 -> /sys/class/net/wlan0
wlan1 -> /sys/class/net/wlan1

TASK: create virtual device (with arbitrary type, netdev name and mac address)
                                                  ^^^^^^^^^^^
                                                  isn't nameif / udev for that?

echo "$type" > /sys/devices/path/to/device/wireless/new_if
... we get uevents for this new interface; in response we can set the
mac address by doing:
echo "$mac" > /sys/class/net/wlan0/wireless/address

TASK: configure virtual device (key is the device name since that needs to be 
unique anyway) 

echo "$some_config_option_for_virtual_device" > /sys/class/net/wlan0/wireless/some_option
echo "$some_config_option_for_physical_device" > /sys/devices/path/to/dev/wireless/some_other_option
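
For a tool written in C, each of the echo lines above boils down to one open()/write()/close() on one attribute file. A minimal sketch using plain POSIX calls; attr_write() is a hypothetical helper name, and the wireless attribute paths in this mail are only proposed, they do not exist today.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write one value to one attribute file, the way "echo value > file" does.
 * Returns 0 on success or a negative errno value on failure.
 */
static int attr_write(const char *path, const char *value)
{
	size_t len;
	ssize_t n;
	int fd;

	assert(path && value);
	len = strlen(value);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, value, len);
	if (n < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	close(fd);

	/* A short write would leave the attribute half-set; report it. */
	return (size_t)n == len ? 0 : -EIO;
}
```

Something like attr_write("/sys/class/net/wlan0/wireless/some_option", value) would then replace the first echo line, with one call per setting.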


Of course the configuration userspace tool would use libsysfs for that, not
"echo" scripts... but they'd work too.

Dominik


Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Marcel Holtmann
Hi Michael,

> How would the virtual interfaces look like? That is quite easy to answer.
> They are net_devices, as they transfer data.
> They should probaly _not_ be on top of the ethernet, as 80211 does not
> have very much in common with ethernet. Basically they share the same
> MAC address format. Does someone have another thing, which he thinks
> is shared?
> How would the master interface look like? A somewhat unusual idea came
> up. Using a device node in /dev. So every wireless card in the system
> would have a node in /dev associated (/dev/wlan0 for example).
> A node for the master device would be ok, because no data is transferred
> through it. It is only a configuration interface.
> So you would tell the, yet-to-be-written userspace tool wconfig (or something
> like that) "I need a STA in INFRA mode and want to drive it on the
> wlan0 card". So wconfig goes and write()s some data to /dev/wlan0
> telling the 80211 code to setup a virtual net_device for the driver
> associated to /dev/wlan0.
> The virtual interface is then configured though /dev/wlan0 using write()
> (no ugly ioctl anymore, you see...). Config data like TX rate,
> current essid, basically everything + xyz which is done by WE today,
> is written to /dev/wlan0.
> This config data is entirely cached in the 80211 code for the /dev/wlan0
> instance. This is important, to have the data persistent throughout
> suspend/resume cycles, if up/down cycles.
> After configuring, a virtual net_device (let's call it wlan0) exists,
> which can be brought up by ifconfig and data can be transferred though
> it as usual.

what is wrong with using netlink and/or sysfs for it? I don't see the
advantage of defining another /dev something interface.

Regards

Marcel




Re: State of the Union: Wireless

2006-01-06 Thread Johannes Berg
On Fri, 2006-01-06 at 12:00 +0100, Michael Buesch wrote:

> * "master" interface as real device node
> * Virtual interfaces (net_devices)

I didn't want to spam the netdev wiki with this (yet) so I collected
some more structured things outside. Anyone feel free to edit:
http://softmac.sipsolutions.net/802.11

I'll move that content to the netdev wiki if anyone else thinks it would
be a good way forward to start with requirements, API issues and
similar.

Until we get there, we'll fix up softmac to make it usable for most
people in basic station mode without any kind of virtual devices, which
will need some slight changes to the current ieee80211.

johannes




Re: [Bcm43xx-dev] [Fwd: State of the Union: Wireless]

2006-01-06 Thread Michael Buesch
> > * We really have no wireless maintainer.  I'm just the defacto guy,
> >   with no interest in the job.  The ideal maintainer knows 802.11 well,
> >   uses git, and isn't an asshole with no taste.  I'm just the guy who
> >   wants to make sure the net driver portion doesn't turn out to be a
> >   stinker (read: review and pass up the chain).

That problem is easiest to solve. ;)

> > * Wireless management, in particular the wireless kernel<->user
> >   interface, needs some thinking.  Wireless Extensions (WE) isn't
> >   cutting it, but I haven't seen any netlink work yet (or some
> >   other interface).  Whatever the userspace interface is, it will be
> >   basically carved in stone for years (unlike kernel APIs), so this
> >   needs a lot more thought than people have been giving it.

We did some brainstorming about this yesterday evening on the bcm
irc channel. I think we all agreed on dropping WE.
So, now we asked: What would a sane UI look like? We had a few points:
* The interface needs to support some kind of "master" interface to
  configure the hardware and the 80211 parameters, and to actually
  configure and set up the
* Virtual interfaces.
Data is transferred only through the virtual interfaces, which could
be an AP interface, a STA interface in INFRA or Ad-Hoc mode, etc.
Configuration is done through the master interface.

What would the virtual interfaces look like? That is quite easy to answer.
They are net_devices, as they transfer data.
They should probably _not_ be on top of the ethernet, as 80211 does not
have very much in common with ethernet. Basically they share the same
MAC address format. Does someone have another thing, which he thinks
is shared?
What would the master interface look like? A somewhat unusual idea came
up: using a device node in /dev. So every wireless card in the system
would have an associated node in /dev (/dev/wlan0 for example).
A node for the master device would be ok, because no data is transferred
through it. It is only a configuration interface.
So you would tell the yet-to-be-written userspace tool wconfig (or something
like that) "I need a STA in INFRA mode and want to drive it on the
wlan0 card". So wconfig goes and write()s some data to /dev/wlan0
telling the 80211 code to set up a virtual net_device for the driver
associated with /dev/wlan0.
The virtual interface is then configured through /dev/wlan0 using write()
(no ugly ioctl anymore, you see...). Config data like TX rate,
current essid, basically everything + xyz which is done by WE today,
is written to /dev/wlan0.
This config data is entirely cached in the 80211 code for the /dev/wlan0
instance. This is important, to have the data persistent throughout
suspend/resume cycles and interface up/down cycles.
After configuring, a virtual net_device (let's call it wlan0) exists,
which can be brought up by ifconfig and data can be transferred through
it as usual.

This whole concept is derived from how dscape does the stuff.
With a major exception, that a device node instead of a net_device
is used for the master device. With the effect of getting rid of the
ugly WE ioctl stuff.

> > * Long term, wireless should go from being a library of common code to a
> >   "real" wireless stack, as shown in the template developed by David Miller:
> >   
> > http://kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.6/davem-p80211.tar.bz2
> >   Zhu Yi @ Intel and Vladmir @ somewhere both independently did some
> >   work in this area.

This looks very interesting and in fact is part of our thoughts I
explained above.

> > * I prefer GPL-only code.  Dual licensing has proven in practice to
> >   be a logistical nightmare that concentrates power in the hands of
> >   a few.  Dual licensing, BSD licensing works for some, but GPL-only
> >   code is quite simply the least amount of flamewars, headaches
> >   and worry.  IOW, the P.I.T.A. level of GPL-only code is lowest.

I personally prefer EXPORT_SYMBOL_GPL().
But that's only my opinion and that does not really matter. ;)

> > Dual licensed code gives kernel hackers yet more legal crapola to
> > worry about, which is never a good thing.

I don't see a point in dual licensing it.
The only benefit would be to allow BSD people to take the code.
Honestly, I really don't see this happening, anyway. ;)
They have net80211.

> > Patches welcome from all motivated, clueful parties.  Jiri Benc has a
> > long series of patches that looks nice.  Johannes Berg has done some
> > work on the ieee80211 softmac stuff and hw WEP.  But maybe DeviceScape
> > is what people like now.

Well, "like" is a strong word. I personally would say "It is better than
all currently existing solutions, if some final polishing is done to dscape."

-- 
Greetings Michael.




[PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet


In order to avoid some OOM triggered by a flood of call_rcu() calls, we 
increased maxbatch from 10 to 10000 in linux 2.6.14, and conditionally call 
set_need_resched() in call_rcu().


This solution doesn't solve all the problems and has drawbacks.

1) Using a big maxbatch has a bad impact on latency.
2) A flood of call_rcu_bh() still can OOM

I have some servers that once in a while crash when the ip route cache is 
flushed. After raising /proc/sys/net/ipv4/route/secret_interval (so that *no* 
flush is done), I got better uptime for these servers. But in some cases I 
think the network stack can flood call_rcu_bh(), and a fatal OOM occurs.


I suggest in this patch :

1) To lower maxbatch to a more reasonable value (as far as the latency is 
concerned)


2) To guard each per-CPU RCU queue with a maximum count (10,000 for 
example). If this limit is reached, free the oldest entry of this queue.


I assume that if a CPU queued 10,000 items in its RCU queue, then the oldest 
entry cannot still be in use by another CPU. This might sound like a violation 
of RCU rules (I'm not an RCU expert), but it seems quite reasonable.



Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
--- linux-2.6.15/kernel/rcupdate.c  2006-01-03 04:21:10.0 +0100
+++ linux-2.6.15-edum/kernel/rcupdate.c 2006-01-06 11:10:45.0 +0100
@@ -71,14 +71,14 @@
 
 /* Fake initialization required by compiler */
 static DEFINE_PER_CPU(struct tasklet_struct, rcu_tasklet) = {NULL};
-static int maxbatch = 10000;
+static int maxbatch = 100;
 
 #ifndef __HAVE_ARCH_CMPXCHG
 /*
  * We use an array of spinlocks for the rcurefs -- similar to ones in sparc
  * 32 bit atomic_t implementations, and a hash function similar to that
  * for our refcounting needs.
- * Can't help multiprocessors which donot have cmpxchg :(
+ * Can't help multiprocessors which dont have cmpxchg :(
  */
 
 spinlock_t __rcuref_hash[RCUREF_HASH_SIZE] = {
@@ -110,9 +110,17 @@
*rdp->nxttail = head;
rdp->nxttail = &head->next;
 
-   if (unlikely(++rdp->count > 10000))
-   set_need_resched();
-
+/*
+ * OOM avoidance : If we queued too many items in this queue,
+ *  free the oldest entry
+ */
+   if (unlikely(++rdp->count > 10000)) {
+   rdp->count--;
+   head = rdp->donelist;
+   rdp->donelist = head->next;
+   local_irq_restore(flags);
+   return head->func(head);
+   }
local_irq_restore(flags);
 }
 
@@ -148,12 +156,17 @@
rdp = &__get_cpu_var(rcu_bh_data);
*rdp->nxttail = head;
rdp->nxttail = &head->next;
-   rdp->count++;
 /*
- *  Should we directly call rcu_do_batch() here ?
- *  if (unlikely(rdp->count > 1))
- *  rcu_do_batch(rdp);
+ * OOM avoidance : If we queued too many items in this queue,
+ *  free the oldest entry
  */
+   if (unlikely(++rdp->count > 10000)) {
+   rdp->count--;
+   head = rdp->donelist;
+   rdp->donelist = head->next;
+   local_irq_restore(flags);
+   return head->func(head);
+   }
local_irq_restore(flags);
 }
 
@@ -209,7 +222,7 @@
 static void rcu_do_batch(struct rcu_data *rdp)
 {
struct rcu_head *next, *list;
-   int count = 0;
+   int count = maxbatch;
 
list = rdp->donelist;
while (list) {
@@ -217,7 +230,7 @@
list->func(list);
list = next;
rdp->count--;
-   if (++count >= maxbatch)
+   if (--count <= 0)
break;
}
if (!rdp->donelist)


Re: [PATCH] Change sk_run_filter()'s return type in net/core/filter.c

2006-01-06 Thread Patrick McHardy

Kris Katterjohn wrote:

>> Whoops! Here you go:
>
> Whoops again. Screwed that last patch up. I gotta stop doing this stuff when I'm
> tired and I need to check myself :)
>
> Sorry. Again.
>
> --- x/net/core/filter.c 2006-01-05 12:27:17.0 -0600
> +++ y/net/core/filter.c 2006-01-05 17:02:32.0 -0600
> @@ -75,7 +75,7 @@ static inline void *load_pointer(struct 
>   * len is the number of filter blocks in the array.
>   */
>   
> -int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)
> +unsigned sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)


Please use unsigned int not just unsigned.