Re: [PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.

2007-03-14 Thread Ravikiran G Thirumalai
On Mon, Mar 12, 2007 at 01:56:13PM -0700, David Miller wrote:
 From: Pekka J Enberg [EMAIL PROTECTED]
 Date: Mon, 12 Mar 2007 14:15:16 +0200 (EET)
 
  On 3/9/07, David Miller [EMAIL PROTECTED] wrote:
   The whole cache-multipath subsystem has to have its guts revamped for
   proper error handling.
  
  (Untested patch follows.)
 
 I'm not accepting untested patches, if people want to fix this
 code it cannot be done in a shoddy manner any more, it will
 need to be fixed and tested by people who really care about
 this code.

Hi Dave,
We do care about the multipath cached option.  We use it often with node-aware
multipath.  We haven't experienced oopsen or crashes for our use cases
for the past year.  We will test Amit and Pekka's patch with fault injection.
We will also go through the core multipath-cached code and take a shot at
cleaning it up.

 
 We need real care for this code or I will remove it in 2.6.23

Can you please point us to any open bug reports on multipath cached?  We did
a call for testing on netdev some time back but did not get any bug reports:
http://marc.info/?l=linux-netdev&m=114827370231872&w=2

The kernel bugzilla shows zaroo boogs for multipath cached as well.

Thanks,
Kiran
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] add dev_to_node()

2006-11-06 Thread Ravikiran G Thirumalai
On Sun, Nov 05, 2006 at 12:53:23AM +0100, Christoph Hellwig wrote:
 On Sat, Nov 04, 2006 at 06:06:48PM -0500, Dave Jones wrote:
  On Sat, Nov 04, 2006 at 11:56:29PM +0100, Christoph Hellwig wrote:
  
  This will break the compile for !NUMA if someone ends up doing a bisect
  and lands here as a bisect point.
  
  You introduce this nice wrapper..
 
 The dev_to_node wrapper is not enough as we can't assign to (-1) for
 the non-NUMA case.  So I added a second macro, set_dev_node for that.
 
 The patch below compiles and works on numa and non-NUMA platforms.
 
 

Hi Christoph,
dev_to_node does not work as expected on x86_64 (and i386).  This is because,
with the current x86_64 code, the node value returned by pcibus_to_node is
initialized only after the struct device has been created.

We need the node value initialized before the call to pci_scan_bus_parented,
as the generic devices are allocated and initialized off pci_scan_child_bus,
which gets called from pci_scan_bus_parented.  The following patch does that
using the pci_sysdata introduced by the PCI domain patches in -mm.

Signed-off-by: Alok N Kataria [EMAIL PROTECTED]
Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED]
Signed-off-by: Shai Fultheim [EMAIL PROTECTED]

Index: linux-2.6.19-rc4mm2/arch/i386/pci/acpi.c
===
--- linux-2.6.19-rc4mm2.orig/arch/i386/pci/acpi.c	2006-11-06 11:03:50.0 -0800
+++ linux-2.6.19-rc4mm2/arch/i386/pci/acpi.c	2006-11-06 22:04:14.0 -0800
@@ -9,6 +9,7 @@ struct pci_bus * __devinit pci_acpi_scan
 {
struct pci_bus *bus;
struct pci_sysdata *sd;
+   int pxm;
 
/* Allocate per-root-bus (not per bus) arch-specific data.
 * TODO: leak; this memory is never freed.
@@ -30,15 +31,21 @@ struct pci_bus * __devinit pci_acpi_scan
}
 #endif /* CONFIG_PCI_DOMAINS */
 
+	sd->node = -1;
+
+	pxm = acpi_get_pxm(device->handle);
+#ifdef CONFIG_ACPI_NUMA
+	if (pxm >= 0)
+		sd->node = pxm_to_node(pxm);
+#endif
+
	bus = pci_scan_bus_parented(NULL, busnum, &pci_root_ops, sd);
if (!bus)
kfree(sd);
 
 #ifdef CONFIG_ACPI_NUMA
if (bus != NULL) {
-		int pxm = acpi_get_pxm(device->handle);
		if (pxm >= 0) {
-			sd->node = pxm_to_node(pxm);
			printk("bus %d -> pxm %d -> node %d\n",
				busnum, pxm, sd->node);
}
-


[CFT] CONFIG_IP_ROUTE_MULTIPATH_CACHED [was Re: [PATCH] Disable multipath cache routing ]

2006-05-21 Thread Ravikiran G Thirumalai
On Sat, May 20, 2006 at 01:51:45PM -0700, Chris Wedgwood wrote:
 ...
 Anyhow, the code as-is hasn't been maintained for a long time except
 for a few minor blips (I'm using hg's annotate to find those and have
 included those people on the cc list as presumably they are using
 these features and might have useful input).

We are interested in keeping CONFIG_IP_ROUTE_MULTIPATH_CACHED,
as this enables us to do node aware multipathing
http://lkml.org/lkml/2006/3/22/13
Hence, we are also interested in stabilizing MULTIPATH_CACHED.  Please send
bug reports with the test case scenario to myself and Pravin at 
[EMAIL PROTECTED] and [EMAIL PROTECTED]

Thanks,
Kiran


[patch 1/2] net: Node aware multipath device round robin

2006-03-21 Thread Ravikiran G Thirumalai
The following patch adds node-aware device round robin IP multipathing.
It is based on, and derived from, multipath_drr.c, the multipath device round
robin algorithm.  This implementation maintains a per-node state table and
round robins between interfaces on the same node.  The implementation needs to
be aware of a NIC's proximity to a node, hence we have added a nodeid field to
struct net_device.  NIC device drivers can initialize this with the node id
the NIC belongs to.  This patch uses the IP_MP_ALG_DRR slot like the regular
multipath_drr, so either the SMP multipath_drr or the node-aware
multipath_node_drr should be used for device round robin, depending on whether
the system has proximity information for the NICs.

Performance results:
1. Single NIC test -- 1 client targets 1 nic on the server with 300 concurrent 
requests.
2. 4 NIC test -- 1 client targets 4 nics, all on different nodes on the server 
with 300 concurrent requests.

We see about a 135% improvement in AB requests per second with this patch and
the device_locality_check patch on the single NIC test, on the Rackable c5100
machine (server).  We see about a 64% improvement when all 4 NICs are targeted.

Credits:  This work was originally done by Justin Forbes 

Comments?

Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED]
Signed-off-by: Shobhit Dayal [EMAIL PROTECTED]
Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED]
Signed-off-by: Shai Fultheim [EMAIL PROTECTED]

Index: linux-2.6.16/drivers/net/e1000/e1000_main.c
===
--- linux-2.6.16.orig/drivers/net/e1000/e1000_main.c	2006-03-19 21:53:29.0 -0800
+++ linux-2.6.16/drivers/net/e1000/e1000_main.c	2006-03-20 14:52:23.0 -0800
@@ -692,6 +692,7 @@ e1000_probe(struct pci_dev *pdev,
 
SET_MODULE_OWNER(netdev);
	SET_NETDEV_DEV(netdev, &pdev->dev);
+	SET_NETDEV_NODE(netdev, pcibus_to_node(pdev->bus));
 
pci_set_drvdata(pdev, netdev);
adapter = netdev_priv(netdev);
Index: linux-2.6.16/drivers/net/tg3.c
===
--- linux-2.6.16.orig/drivers/net/tg3.c 2006-03-19 21:53:29.0 -0800
+++ linux-2.6.16/drivers/net/tg3.c  2006-03-20 14:52:23.0 -0800
@@ -10705,6 +10705,7 @@ static int __devinit tg3_init_one(struct
 
SET_MODULE_OWNER(dev);
	SET_NETDEV_DEV(dev, &pdev->dev);
+	SET_NETDEV_NODE(dev, pcibus_to_node(pdev->bus));
 
dev-features |= NETIF_F_LLTX;
 #if TG3_VLAN_TAG_USED
Index: linux-2.6.16/include/linux/netdevice.h
===
--- linux-2.6.16.orig/include/linux/netdevice.h	2006-03-19 21:53:29.0 -0800
+++ linux-2.6.16/include/linux/netdevice.h	2006-03-20 14:52:23.0 -0800
@@ -315,7 +315,9 @@ struct net_device
/* Interface index. Unique device identifier*/
int ifindex;
int iflink;
-
+#ifdef CONFIG_NUMA
+   int node;   /* NUMA node this IF is close to */
+#endif
 
struct net_device_stats* (*get_stats)(struct net_device *dev);
struct iw_statistics*   (*get_wireless_stats)(struct net_device *dev);
@@ -520,6 +522,14 @@ static inline void *netdev_priv(struct n
  */
 #define SET_NETDEV_DEV(net, pdev)  ((net)->class_dev.dev = (pdev))
 
+#ifdef CONFIG_NUMA
+#define SET_NETDEV_NODE(dev, nodeid)	((dev)->node = (nodeid))
+#define netdev_node(dev)	((dev)->node)
+#else
+#define SET_NETDEV_NODE(dev, nodeid)	do {} while (0)
+#define netdev_node(dev)	(-1)
+#endif
+
 struct packet_type {
__be16  type;   /* This is really htons(ether_type). */
struct net_device   *dev;   /* NULL is wildcarded here   */
Index: linux-2.6.16/net/core/dev.c
===
--- linux-2.6.16.orig/net/core/dev.c2006-03-19 21:53:29.0 -0800
+++ linux-2.6.16/net/core/dev.c 2006-03-20 14:52:23.0 -0800
@@ -3003,7 +3003,8 @@ struct net_device *alloc_netdev(int size
 
if (sizeof_priv)
	dev->priv = netdev_priv(dev);
-
+
+	SET_NETDEV_NODE(dev, -1);
setup(dev);
strcpy(dev-name, name);
return dev;
Index: linux-2.6.16/net/ipv4/Kconfig
===
--- linux-2.6.16.orig/net/ipv4/Kconfig  2006-03-19 21:53:29.0 -0800
+++ linux-2.6.16/net/ipv4/Kconfig   2006-03-20 14:52:23.0 -0800
@@ -164,6 +164,15 @@ config IP_ROUTE_MULTIPATH_DRR
  available interfaces. This policy makes sense if the connections 
  should be primarily distributed on interfaces and not on routes. 
 
+config IP_ROUTE_MULTIPATH_NODE
+	tristate "MULTIPATH: interface RR algorithm with node affinity"
+	depends on IP_ROUTE_MULTIPATH_CACHED && NUMA && !IP_ROUTE_MULTIPATH_DRR
+   help
+ This 

[patch 2/2] net: Node aware multipath device round robin -- device locality check

2006-03-21 Thread Ravikiran G Thirumalai
This patch checks device locality on every IP packet xmit.
In a multipath configuration, the TCP connection-to-route association is done
at session startup time.  The process owning the TCP session can be migrated to
a different node after this association, which would mean a remote NIC is
chosen for xmit although a local NIC could be available.  The following patch
checks whether a local NIC is available for the destination, and recalculates
the route if so.  Migration otherwise leads to remote NIC transfers in some
TCP workloads such as AB.

Downside: adds a bitmap to struct rtable.  But only if 
CONFIG_IP_ROUTE_MULTIPATH_NODE is enabled.

Comments, suggestions welcome. 

Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED]
Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED]
Signed-off-by: Shai Fultheim [EMAIL PROTECTED]

Index: linux-2.6.16/include/net/route.h
===
--- linux-2.6.16.orig/include/net/route.h	2006-03-19 21:53:29.0 -0800
+++ linux-2.6.16/include/net/route.h	2006-03-20 14:52:24.0 -0800
@@ -75,6 +75,13 @@ struct rtable
/* Miscellaneous cached information */
__u32   rt_spec_dst; /* RFC1122 specific destination */
	struct inet_peer	*peer;	/* long-living peer info */
+#ifdef CONFIG_IP_ROUTE_MULTIPATH_NODE
+   /*  bitmap bit is set if current node has a local multi-path device for
+*  this route.
+*/
+   DECLARE_BITMAP  (mp_if_bitmap, MAX_NUMNODES);
+#endif
+
 };
 
 struct ip_rt_acct
@@ -201,4 +208,21 @@ static inline struct inet_peer *rt_get_p
 
 extern ctl_table ipv4_route_table[];
 
+#ifdef CONFIG_IP_ROUTE_MULTIPATH_NODE
+
+#include <linux/ip_mp_alg.h>
+
+static inline int dst_dev_node_check(struct rtable *rt)
+{
+   int cnode = numa_node_id();
+	if (unlikely(netdev_node(rt->u.dst.dev) != cnode)) {
+		if (test_bit(cnode, rt->mp_if_bitmap))
+   return 1;
+   }
+   return 0;
+}
+#else
+#define dst_dev_node_check(rt) 0
+#endif
+
 #endif /* _ROUTE_H */
Index: linux-2.6.16/net/ipv4/ip_output.c
===
--- linux-2.6.16.orig/net/ipv4/ip_output.c	2006-03-19 21:53:29.0 -0800
+++ linux-2.6.16/net/ipv4/ip_output.c	2006-03-20 14:52:24.0 -0800
@@ -309,7 +309,7 @@ int ip_queue_xmit(struct sk_buff *skb, i
 
/* Make sure we can route this packet. */
rt = (struct rtable *)__sk_dst_check(sk, 0);
-   if (rt == NULL) {
+	if ((rt == NULL) || dst_dev_node_check(rt)) {
u32 daddr;
 
/* Use correct destination address if we have options. */
Index: linux-2.6.16/net/ipv4/route.c
===
--- linux-2.6.16.orig/net/ipv4/route.c  2006-03-19 21:53:29.0 -0800
+++ linux-2.6.16/net/ipv4/route.c   2006-03-20 14:52:24.0 -0800
@@ -2313,6 +2313,22 @@ static inline int ip_mkroute_output(stru
	if (res->fi && res->fi->fib_nhs > 1) {
		unsigned char hopcount = res->fi->fib_nhs;
 
+#ifdef CONFIG_IP_ROUTE_MULTIPATH_NODE
+   DECLARE_BITMAP (mp_if_bitmap, MAX_NUMNODES);
+   bitmap_zero(mp_if_bitmap, MAX_NUMNODES);
+   /* Calculating device bitmap for this multipath route */
+	if (res->fi->fib_mp_alg == IP_MP_ALG_DRR) {
+		for (hop = 0; hop < hopcount; hop++) {
+			struct net_device *dev2nexthop;
+
+			res->nh_sel = hop;
+			dev2nexthop = FIB_RES_DEV(*res);
+			dev_hold(dev2nexthop);
+			set_bit(netdev_node(dev2nexthop), mp_if_bitmap);
+			dev_put(dev2nexthop);
+   }
+   }
+#endif
	for (hop = 0; hop < hopcount; hop++) {
struct net_device *dev2nexthop;
 
@@ -2343,6 +2359,10 @@ static inline int ip_mkroute_output(stru
			 FIB_RES_NETMASK(*res),
			 res->prefixlen,
			 FIB_RES_NH(*res));
+
+#ifdef CONFIG_IP_ROUTE_MULTIPATH_NODE
+			bitmap_copy(rth->mp_if_bitmap, mp_if_bitmap, MAX_NUMNODES);
+#endif
cleanup:
/* release work reference to output device */
dev_put(dev2nexthop);
-


Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh

2006-03-09 Thread Ravikiran G Thirumalai
On Wed, Mar 08, 2006 at 04:32:58PM -0800, Andrew Morton wrote:
 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote:
 
  On Wed, Mar 08, 2006 at 03:43:21PM -0800, Andrew Morton wrote:
   Benjamin LaHaise [EMAIL PROTECTED] wrote:
   
I think it may make more sense to simply convert local_t into a long,
given that most of the users will be things like stats counters.

   
   Yes, I agree that making local_t signed would be better.  It's consistent
   with atomic_t, atomic64_t and atomic_long_t and it's a bit more flexible.
   
   Perhaps.  A lot of applications would just be upcounters for statistics,
   where unsigned is desired.  But I think the consistency argument wins out.
  
  It already is... for most of the arches except x86_64.
 
 x86 uses unsigned long.

Here's a patch making x86_64 local_t 64 bits like the other 64-bit arches.
This keeps local_t unsigned long.  (We can change it to a signed value
along with the other arches later in one go, I guess.)

Thanks,
Kiran


Change x86_64 local_t to 64 bits like all other arches.

Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED]

Index: linux-2.6.16-rc5mm3/include/asm-x86_64/local.h
===
--- linux-2.6.16-rc5mm3.orig/include/asm-x86_64/local.h	2006-03-08 16:51:31.0 -0800
+++ linux-2.6.16-rc5mm3/include/asm-x86_64/local.h	2006-03-08 21:56:01.0 -0800
@@ -5,18 +5,18 @@
 
 typedef struct
 {
-   volatile unsigned int counter;
+   volatile long counter;
 } local_t;
 
 #define LOCAL_INIT(i)  { (i) }
 
-#define local_read(v)	((v)->counter)
+#define local_read(v)	((unsigned long)(v)->counter)
 #define local_set(v,i)	(((v)->counter) = (i))
 
 static __inline__ void local_inc(local_t *v)
 {
	__asm__ __volatile__(
-		"incl %0"
+		"incq %0"
		:"=m" (v->counter)
		:"m" (v->counter));
 }
@@ -24,7 +24,7 @@ static __inline__ void local_inc(local_t
 static __inline__ void local_dec(local_t *v)
 {
	__asm__ __volatile__(
-		"decl %0"
+		"decq %0"
		:"=m" (v->counter)
		:"m" (v->counter));
 }
@@ -32,7 +32,7 @@ static __inline__ void local_dec(local_t
 static __inline__ void local_add(unsigned int i, local_t *v)
 {
	__asm__ __volatile__(
-		"addl %1,%0"
+		"addq %1,%0"
		:"=m" (v->counter)
		:"ir" (i), "m" (v->counter));
 }
@@ -40,7 +40,7 @@ static __inline__ void local_add(unsigne
 static __inline__ void local_sub(unsigned int i, local_t *v)
 {
	__asm__ __volatile__(
-		"subl %1,%0"
+		"subq %1,%0"
		:"=m" (v->counter)
		:"ir" (i), "m" (v->counter));
 }
@@ -71,4 +71,4 @@ static __inline__ void local_sub(unsigne
 #define __cpu_local_add(i, v)  cpu_local_add((i), (v))
 #define __cpu_local_sub(i, v)  cpu_local_sub((i), (v))
 
-#endif /* _ARCH_I386_LOCAL_H */
+#endif /* _ARCH_X8664_LOCAL_H */


Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh

2006-03-09 Thread Ravikiran G Thirumalai
On Thu, Mar 09, 2006 at 07:14:26PM +1100, Nick Piggin wrote:
 Ravikiran G Thirumalai wrote:
 
 Here's a patch making x86_64 local_t to 64 bits like other 64 bit arches.
 This keeps local_t unsigned long.  (We can change it to signed value 
 along with other arches later in one go I guess) 
 
 
 Why not just keep naming and structure of interfaces consistent with
 atomic_t?
 
 That would be signed and 32-bit. You then also have a local64_t.

No, local_t is supposed to be 64 bits on 64-bit arches and 32 bits on 32-bit
arches.  x86_64 was the only exception, so this patch fixes that.



Re: [patch 3/4] net: percpufy frequently used vars -- proto.sockets_allocated

2006-03-08 Thread Ravikiran G Thirumalai
On Tue, Mar 07, 2006 at 06:16:02PM -0800, Andrew Morton wrote:
 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote:

   +static inline int read_sockets_allocated(struct proto *prot)
   +{
   +  int total = 0;
   +  int cpu;
   +  for_each_cpu(cpu)
   +  total += *per_cpu_ptr(prot->sockets_allocated, cpu);
   +  return total;
   +}
 
 This is likely too big to be inlined, plus sock.h doesn't include enough
 headers to reliably compile this code.
 
   +static inline void mod_sockets_allocated(int *sockets_allocated, int count)
   +{
   +  (*per_cpu_ptr(sockets_allocated, get_cpu())) += count;
   +  put_cpu();
   +}
   +
 
 Ditto.

OK, here is a revised patch.


Change the atomic_t sockets_allocated member of struct proto to a 
per-cpu counter.

Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED]
Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED]
Signed-off-by: Shai Fultheim [EMAIL PROTECTED]

Index: linux-2.6.16-rc5mm3/include/net/sock.h
===
--- linux-2.6.16-rc5mm3.orig/include/net/sock.h	2006-03-08 11:03:19.0 -0800
+++ linux-2.6.16-rc5mm3/include/net/sock.h	2006-03-08 12:06:52.0 -0800
@@ -543,7 +543,7 @@ struct proto {
/* Memory pressure */
void(*enter_memory_pressure)(void);
	struct percpu_counter	*memory_allocated;	/* Current allocated memory. */
-	atomic_t		*sockets_allocated;	/* Current number of sockets. */
+	int			*sockets_allocated;	/* Current number of sockets (percpu counter). */
 
/*
 * Pressure flag: try to collapse.
@@ -579,6 +579,12 @@ struct proto {
} stats[NR_CPUS];
 };
 
+extern int read_sockets_allocated(struct proto *prot);
+extern void mod_sockets_allocated(int *sockets_allocated, int count);
+
+#define inc_sockets_allocated(c) mod_sockets_allocated(c, 1)
+#define dec_sockets_allocated(c) mod_sockets_allocated(c, -1)
+
 extern int proto_register(struct proto *prot, int alloc_slab);
 extern void proto_unregister(struct proto *prot);
 
Index: linux-2.6.16-rc5mm3/include/net/tcp.h
===
--- linux-2.6.16-rc5mm3.orig/include/net/tcp.h	2006-03-08 11:03:19.0 -0800
+++ linux-2.6.16-rc5mm3/include/net/tcp.h	2006-03-08 11:39:40.0 -0800
@@ -226,7 +226,7 @@ extern int sysctl_tcp_mtu_probing;
 extern int sysctl_tcp_base_mss;
 
 extern struct percpu_counter tcp_memory_allocated;
-extern atomic_t tcp_sockets_allocated;
+extern int *tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
 /*
Index: linux-2.6.16-rc5mm3/net/core/sock.c
===
--- linux-2.6.16-rc5mm3.orig/net/core/sock.c	2006-03-08 11:03:19.0 -0800
+++ linux-2.6.16-rc5mm3/net/core/sock.c	2006-03-08 12:09:19.0 -0800
@@ -771,7 +771,7 @@ struct sock *sk_clone(const struct sock 
	newsk->sk_sleep	 = NULL;

	if (newsk->sk_prot->sockets_allocated)
-		atomic_inc(newsk->sk_prot->sockets_allocated);
+		inc_sockets_allocated(newsk->sk_prot->sockets_allocated);
	}
 out:
return newsk;
@@ -1451,6 +1451,25 @@ void sk_common_release(struct sock *sk)
 
 EXPORT_SYMBOL(sk_common_release);
 
+int read_sockets_allocated(struct proto *prot)
+{
+   int total = 0;
+   int cpu;
+   for_each_cpu(cpu)
+		total += *per_cpu_ptr(prot->sockets_allocated, cpu);
+   return total;
+}
+
+EXPORT_SYMBOL(read_sockets_allocated);
+
+void mod_sockets_allocated(int *sockets_allocated, int count)
+{
+   (*per_cpu_ptr(sockets_allocated, get_cpu())) += count;
+   put_cpu();
+}
+
+EXPORT_SYMBOL(mod_sockets_allocated);
+
 static DEFINE_RWLOCK(proto_list_lock);
 static LIST_HEAD(proto_list);
 
@@ -1620,7 +1639,7 @@ static void proto_seq_printf(struct seq_
		   "%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c "
		   "%2c %2c %2c %2c %2c %2c\n",
		   proto->name,
		   proto->obj_size,
-		   proto->sockets_allocated != NULL ? atomic_read(proto->sockets_allocated) : -1,
+		   proto->sockets_allocated != NULL ? read_sockets_allocated(proto) : -1,
		   proto->memory_allocated != NULL ?
			percpu_counter_read_positive(proto->memory_allocated) : -1,
		   proto->memory_pressure != NULL ? *proto->memory_pressure ? "yes" : "no" : "NI",
Index: linux-2.6.16-rc5mm3/net/core/stream.c
===
--- linux-2.6.16-rc5mm3.orig/net/core/stream.c	2006-03-08 11:08:28.0 -0800
+++ linux-2.6.16-rc5mm3/net/core/stream.c	2006-03-08 11:39:40.0 -0800
@@ -242,7 +242,7 @@ int sk_stream_mem_schedule(struct sock *
return 1;
 
	if (!*sk->sk_prot->memory_pressure ||
-	    sk->sk_prot->sysctl_mem[2

Re: [patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated

2006-03-08 Thread Ravikiran G Thirumalai
On Tue, Mar 07, 2006 at 07:22:34PM -0800, Andrew Morton wrote:
 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote:
  
  The problem is percpu_counter_sum has to read all the cpus cachelines.  If
  we have to use percpu_counter_sum everywhere, then might as well use plain
  per-cpu counters instead of  batching ones no?
 
 I didn't say use it everywhere ;)

:) No you did not, sorry.  When the counter shows large variance it will affect
all places where it is read, and it wasn't clear to me where to use
percpu_counter_sum vs percpu_counter_read in the context of the patch.
But yes, using percpu_counter_sum at the critical points makes things better.
Thanks for pointing it out.  Here is an attempt along those lines.

 
   Maybe, on very large cpu counts, 
  we should just change the FBC_BATCH so that variance does not go quadratic.
  Something like 32.  So that variance is 32 * NR_CPUS in that case, instead
  of (NR_CPUS * NR_CPUS * 2) currently.  Comments?
 
 Sure, we need to make that happen.  But it got all mixed up with the
 spinlock removal and it does need quite some thought and testing and
 documentation to help developers to choose the right settings and
 appropriate selection of defaults, etc.

I was thinking along the lines of something like the following as a simple fix
to the quadratic variance.  But yes, it needs some experimentation beforehand.

#if NR_CPUS >= 16
#define FBC_BATCH	(32)
#else
#define FBC_BATCH	(NR_CPUS*4)
#endif

Thanks,
Kiran


Change struct proto->memory_allocated to a batching per-CPU counter
(percpu_counter) from an atomic_t.  A batching counter is better than a
plain per-CPU counter as this field is read often.

Changes since last post: Use the more accurate percpu_counter_sum() at 
critical places.

Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED]
Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED]
Signed-off-by: Shai Fultheim [EMAIL PROTECTED]

Index: linux-2.6.16-rc5mm3/include/net/sock.h
===
--- linux-2.6.16-rc5mm3.orig/include/net/sock.h	2006-03-08 10:36:07.0 -0800
+++ linux-2.6.16-rc5mm3/include/net/sock.h	2006-03-08 11:03:19.0 -0800
@@ -48,6 +48,7 @@
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>	/* struct sk_buff */
 #include <linux/security.h>
+#include <linux/percpu_counter.h>
 
 #include <linux/filter.h>
 
@@ -541,8 +542,9 @@ struct proto {
 
/* Memory pressure */
void(*enter_memory_pressure)(void);
-	atomic_t		*memory_allocated;	/* Current allocated memory. */
+	struct percpu_counter	*memory_allocated;	/* Current allocated memory. */
	atomic_t		*sockets_allocated;	/* Current number of sockets. */
+
/*
 * Pressure flag: try to collapse.
 * Technical note: it is used by multiple contexts non atomically.
Index: linux-2.6.16-rc5mm3/include/net/tcp.h
===
--- linux-2.6.16-rc5mm3.orig/include/net/tcp.h	2006-03-08 10:36:07.0 -0800
+++ linux-2.6.16-rc5mm3/include/net/tcp.h	2006-03-08 11:03:19.0 -0800
@@ -225,7 +225,7 @@ extern int sysctl_tcp_abc;
 extern int sysctl_tcp_mtu_probing;
 extern int sysctl_tcp_base_mss;
 
-extern atomic_t tcp_memory_allocated;
+extern struct percpu_counter tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
Index: linux-2.6.16-rc5mm3/net/core/sock.c
===
--- linux-2.6.16-rc5mm3.orig/net/core/sock.c	2006-03-08 10:36:07.0 -0800
+++ linux-2.6.16-rc5mm3/net/core/sock.c 2006-03-08 11:03:19.0 -0800
@@ -1616,12 +1616,13 @@ static char proto_method_implemented(con
 
 static void proto_seq_printf(struct seq_file *seq, struct proto *proto)
 {
-	seq_printf(seq, "%-9s %4u %6d  %6d   %-3s %6u   %-3s  %-10s "
+	seq_printf(seq, "%-9s %4u %6d  %6lu   %-3s %6u   %-3s  %-10s "
		   "%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c "
		   "%2c %2c %2c %2c %2c %2c\n",
		   proto->name,
		   proto->obj_size,
		   proto->sockets_allocated != NULL ? atomic_read(proto->sockets_allocated) : -1,
-		   proto->memory_allocated != NULL ? atomic_read(proto->memory_allocated) : -1,
+		   proto->memory_allocated != NULL ?
+			percpu_counter_read_positive(proto->memory_allocated) : -1,
		   proto->memory_pressure != NULL ? *proto->memory_pressure ? "yes" : "no" : "NI",
		   proto->max_header,
		   proto->slab == NULL ? "no" : "yes",
Index: linux-2.6.16-rc5mm3/net/core/stream.c
===
--- linux-2.6.16-rc5mm3.orig/net/core/stream.c	2006-03-08 10:36:07.0 -0800
+++ linux-2.6.16-rc5mm3/net/core/stream.c   2006-03-08 11:08

Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh

2006-03-08 Thread Ravikiran G Thirumalai
On Wed, Mar 08, 2006 at 04:17:33PM -0500, Benjamin LaHaise wrote:
 On Wed, Mar 08, 2006 at 01:07:26PM -0800, Ravikiran G Thirumalai wrote:
 
 Last time I checked, all the major architectures had efficient local_t 
 implementations.  Most of the RISC CPUs are able to do a load / store 
 conditional implementation that is the same cost (since memory barriers 
 tend to be explicit on powerpc).  So why not use it?

Then, for the batched percpu_counters, we could gain by using local_t only for
the UP case.  But we would have to have a new local_long_t implementation
for that.  Do you think just one use case of local_long_t warrants a new
set of APIs?

Kiran
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh

2006-03-07 Thread Ravikiran G Thirumalai
Add percpu_counter_mod_bh for using these counters safely from
both softirq and process context.

Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED]
Signed-off-by: Ravikiran G Thirumalai [EMAIL PROTECTED]
Signed-off-by: Shai Fultheim [EMAIL PROTECTED]

Index: linux-2.6.16-rc5mm3/include/linux/percpu_counter.h
===
--- linux-2.6.16-rc5mm3.orig/include/linux/percpu_counter.h	2006-03-07 15:08:00.0 -0800
+++ linux-2.6.16-rc5mm3/include/linux/percpu_counter.h	2006-03-07 15:09:21.0 -0800
@@ -11,6 +11,7 @@
 #include <linux/smp.h>
 #include <linux/threads.h>
 #include <linux/percpu.h>
+#include <linux/interrupt.h>
 
 #ifdef CONFIG_SMP
 
@@ -110,4 +111,11 @@ static inline void percpu_counter_dec(st
percpu_counter_mod(fbc, -1);
 }
 
+static inline void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount)
+{
+   local_bh_disable();
+   percpu_counter_mod(fbc, amount);
+   local_bh_enable();
+}
+
 #endif /* _LINUX_PERCPU_COUNTER_H */


[patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated

2006-03-07 Thread Ravikiran G Thirumalai
Change struct proto->memory_allocated to a batching per-CPU counter
(percpu_counter) from an atomic_t.  A batching counter is better than a
plain per-CPU counter as this field is read often.

Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED]
Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED]
Signed-off-by: Shai Fultheim [EMAIL PROTECTED]

Index: linux-2.6.16-rc5mm3/include/net/sock.h
===
--- linux-2.6.16-rc5mm3.orig/include/net/sock.h	2006-03-07 15:07:43.0 -0800
+++ linux-2.6.16-rc5mm3/include/net/sock.h	2006-03-07 15:11:48.0 -0800
@@ -48,6 +48,7 @@
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>	/* struct sk_buff */
 #include <linux/security.h>
+#include <linux/percpu_counter.h>
 
 #include <linux/filter.h>
 
@@ -541,8 +542,9 @@ struct proto {
 
/* Memory pressure */
void(*enter_memory_pressure)(void);
-	atomic_t		*memory_allocated;	/* Current allocated memory. */
+	struct percpu_counter	*memory_allocated;	/* Current allocated memory. */
	atomic_t		*sockets_allocated;	/* Current number of sockets. */
+
/*
 * Pressure flag: try to collapse.
 * Technical note: it is used by multiple contexts non atomically.
Index: linux-2.6.16-rc5mm3/include/net/tcp.h
===
--- linux-2.6.16-rc5mm3.orig/include/net/tcp.h	2006-03-07 15:08:00.0 -0800
+++ linux-2.6.16-rc5mm3/include/net/tcp.h	2006-03-07 15:11:48.0 -0800
@@ -225,7 +225,7 @@ extern int sysctl_tcp_abc;
 extern int sysctl_tcp_mtu_probing;
 extern int sysctl_tcp_base_mss;
 
-extern atomic_t tcp_memory_allocated;
+extern struct percpu_counter tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
Index: linux-2.6.16-rc5mm3/net/core/sock.c
===
--- linux-2.6.16-rc5mm3.orig/net/core/sock.c	2006-03-07 15:07:43.0 -0800
+++ linux-2.6.16-rc5mm3/net/core/sock.c 2006-03-07 15:11:48.0 -0800
@@ -1616,12 +1616,13 @@ static char proto_method_implemented(con
 
 static void proto_seq_printf(struct seq_file *seq, struct proto *proto)
 {
-	seq_printf(seq, "%-9s %4u %6d  %6d   %-3s %6u   %-3s  %-10s "
+	seq_printf(seq, "%-9s %4u %6d  %6lu   %-3s %6u   %-3s  %-10s "
		   "%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c "
		   "%2c %2c %2c %2c %2c %2c\n",
		   proto->name,
		   proto->obj_size,
		   proto->sockets_allocated != NULL ? atomic_read(proto->sockets_allocated) : -1,
-		   proto->memory_allocated != NULL ? atomic_read(proto->memory_allocated) : -1,
+		   proto->memory_allocated != NULL ?
+			percpu_counter_read_positive(proto->memory_allocated) : -1,
		   proto->memory_pressure != NULL ? *proto->memory_pressure ? "yes" : "no" : "NI",
		   proto->max_header,
		   proto->slab == NULL ? "no" : "yes",
Index: linux-2.6.16-rc5mm3/net/core/stream.c
===
--- linux-2.6.16-rc5mm3.orig/net/core/stream.c	2006-03-07 15:07:43.0 -0800
+++ linux-2.6.16-rc5mm3/net/core/stream.c	2006-03-07 15:11:48.0 -0800
@@ -196,11 +196,11 @@ EXPORT_SYMBOL(sk_stream_error);
 void __sk_stream_mem_reclaim(struct sock *sk)
 {
	if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {
-		atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
-			   sk->sk_prot->memory_allocated);
+		percpu_counter_mod_bh(sk->sk_prot->memory_allocated,
+				-(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM));
		sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
		if (*sk->sk_prot->memory_pressure &&
-		    (atomic_read(sk->sk_prot->memory_allocated) <
+		    (percpu_counter_read(sk->sk_prot->memory_allocated) <
		     sk->sk_prot->sysctl_mem[0]))
			*sk->sk_prot->memory_pressure = 0;
	}
}
@@ -213,23 +213,26 @@ int sk_stream_mem_schedule(struct sock *
	int amt = sk_stream_pages(size);

	sk->sk_forward_alloc += amt * SK_STREAM_MEM_QUANTUM;
-	atomic_add(amt, sk->sk_prot->memory_allocated);
+	percpu_counter_mod_bh(sk->sk_prot->memory_allocated, amt);

	/* Under limit. */
-	if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
+	if (percpu_counter_read(sk->sk_prot->memory_allocated) <
+	    sk->sk_prot->sysctl_mem[0]) {
		if (*sk->sk_prot->memory_pressure)
			*sk->sk_prot->memory_pressure = 0;
		return 1;
	}

	/* Over hard limit. */
-	if (atomic_read(sk->sk_prot->memory_allocated) >

[patch 3/4] net: percpufy frequently used vars -- proto.sockets_allocated

2006-03-07 Thread Ravikiran G Thirumalai
Change the atomic_t sockets_allocated member of struct proto to a 
per-cpu counter.

Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED]
Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED]
Signed-off-by: Shai Fultheim [EMAIL PROTECTED]

Index: linux-2.6.16-rc5mm3/include/net/sock.h
===
--- linux-2.6.16-rc5mm3.orig/include/net/sock.h	2006-03-07 15:09:22.0 -0800
+++ linux-2.6.16-rc5mm3/include/net/sock.h	2006-03-07 15:09:52.0 -0800
@@ -543,7 +543,7 @@ struct proto {
	/* Memory pressure */
	void			(*enter_memory_pressure)(void);
	struct percpu_counter	*memory_allocated;	/* Current allocated memory. */
-	atomic_t		*sockets_allocated;	/* Current number of sockets. */
+	int			*sockets_allocated;	/* Current number of sockets (percpu counter). */
 
/*
 * Pressure flag: try to collapse.
@@ -579,6 +579,24 @@ struct proto {
} stats[NR_CPUS];
 };
 
+static inline int read_sockets_allocated(struct proto *prot)
+{
+   int total = 0;
+   int cpu;
+   for_each_cpu(cpu)
+		total += *per_cpu_ptr(prot->sockets_allocated, cpu);
+   return total;
+}
+
+static inline void mod_sockets_allocated(int *sockets_allocated, int count)
+{
+   (*per_cpu_ptr(sockets_allocated, get_cpu())) += count;
+   put_cpu();
+}
+
+#define inc_sockets_allocated(c) mod_sockets_allocated(c, 1)
+#define dec_sockets_allocated(c) mod_sockets_allocated(c, -1)
+
 extern int proto_register(struct proto *prot, int alloc_slab);
 extern void proto_unregister(struct proto *prot);
 
Index: linux-2.6.16-rc5mm3/include/net/tcp.h
===
--- linux-2.6.16-rc5mm3.orig/include/net/tcp.h	2006-03-07 15:09:22.0 -0800
+++ linux-2.6.16-rc5mm3/include/net/tcp.h	2006-03-07 15:09:52.0 -0800
@@ -226,7 +226,7 @@ extern int sysctl_tcp_mtu_probing;
 extern int sysctl_tcp_base_mss;
 
 extern struct percpu_counter tcp_memory_allocated;
-extern atomic_t tcp_sockets_allocated;
+extern int *tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
 /*
Index: linux-2.6.16-rc5mm3/net/core/sock.c
===
--- linux-2.6.16-rc5mm3.orig/net/core/sock.c	2006-03-07 15:09:22.0 -0800
+++ linux-2.6.16-rc5mm3/net/core/sock.c 2006-03-07 15:09:52.0 -0800
@@ -771,7 +771,7 @@ struct sock *sk_clone(const struct sock 
	newsk->sk_sleep	 = NULL;
 
	if (newsk->sk_prot->sockets_allocated)
-		atomic_inc(newsk->sk_prot->sockets_allocated);
+		inc_sockets_allocated(newsk->sk_prot->sockets_allocated);
}
 out:
return newsk;
@@ -1620,7 +1620,7 @@ static void proto_seq_printf(struct seq_
			"%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c "
			"%2c %2c %2c %2c %2c %2c\n",
		   proto->name,
		   proto->obj_size,
-		   proto->sockets_allocated != NULL ? atomic_read(proto->sockets_allocated) : -1,
+		   proto->sockets_allocated != NULL ? read_sockets_allocated(proto) : -1,
		   proto->memory_allocated != NULL ?
			percpu_counter_read_positive(proto->memory_allocated) : -1,
		   proto->memory_pressure != NULL ? *proto->memory_pressure ? "yes" : "no" : "NI",
Index: linux-2.6.16-rc5mm3/net/core/stream.c
===
--- linux-2.6.16-rc5mm3.orig/net/core/stream.c	2006-03-07 15:09:22.0 -0800
+++ linux-2.6.16-rc5mm3/net/core/stream.c	2006-03-07 15:09:52.0 -0800
@@ -242,7 +242,7 @@ int sk_stream_mem_schedule(struct sock *
		return 1;
 
	if (!*sk->sk_prot->memory_pressure ||
-	    sk->sk_prot->sysctl_mem[2] > atomic_read(sk->sk_prot->sockets_allocated) *
+	    sk->sk_prot->sysctl_mem[2] > read_sockets_allocated(sk->sk_prot) *
	    sk_stream_pages(sk->sk_wmem_queued +
			    atomic_read(&sk->sk_rmem_alloc) +
			    sk->sk_forward_alloc))
Index: linux-2.6.16-rc5mm3/net/ipv4/proc.c
===
--- linux-2.6.16-rc5mm3.orig/net/ipv4/proc.c	2006-03-07 15:09:22.0 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/proc.c 2006-03-07 15:09:52.0 -0800
@@ -63,7 +63,7 @@ static int sockstat_seq_show(struct seq_
	socket_seq_show(seq);
	seq_printf(seq, "TCP: inuse %d orphan %d tw %d alloc %d mem %lu\n",
		   fold_prot_inuse(&tcp_prot), atomic_read(&tcp_orphan_count),
-		   tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
+		   tcp_death_row.tw_count, read_sockets_allocated(&tcp_prot),
   

Re: [patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated

2006-03-07 Thread Ravikiran G Thirumalai
On Tue, Mar 07, 2006 at 06:14:22PM -0800, Andrew Morton wrote:
 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote:
 
  -	if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
   +	if (percpu_counter_read(sk->sk_prot->memory_allocated) <
   +	    sk->sk_prot->sysctl_mem[0]) {
 
 Bear in mind that percpu_counter_read[_positive] can be inaccurate on large
 CPU counts.
 
 It might be worth running percpu_counter_sum() to get the exact count if we
 think we're about to cause something to fail.

The problem is percpu_counter_sum has to read all the cpus' cachelines.  If
we have to use percpu_counter_sum everywhere, then we might as well use plain
per-cpu counters instead of batching ones, no?
 
sysctl_mem[0] is about 196K, and on a 16 cpu box the variance is 512 bytes,
which is OK with just percpu_counter_read, I hope.  Maybe, on very large cpu
counts, we should just change FBC_BATCH so that the variance does not go
quadratic.  Something like 32, so that the variance is 32 * NR_CPUS, instead
of (NR_CPUS * NR_CPUS * 2) currently.  Comments?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-02-02 Thread Ravikiran G Thirumalai
On Fri, Jan 27, 2006 at 03:01:06PM -0800, Andrew Morton wrote:
 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote:
 
 
   
   If the benchmarks say that we need to.  If we cannot observe any problems
   in testing of existing code and if we can't demonstrate any benefit from
   the patched code then one option is to go off and do something else ;)
  
  We first tried plain per-CPU counters for memory_allocated, found that reads
 on memory_allocated were causing cacheline transfers, and then
  switched over to batching.  So batching reads is useful.  To avoid
  inaccuracy, we can maybe change percpu_counter_init to:
  
  void percpu_counter_init(struct percpu_counter *fbc, int maxdev)
  
  the percpu batching limit would then be maxdev/num_possible_cpus.  One would
  use batching counters only when both reads and writes are frequent.  With
  the above scheme, we would go fetch cachelines from other cpus for read
  often only on large cpu counts, which is not any worse than the global
  counter alternative, but it would still be beneficial on smaller machines,
  without sacrificing a pre-set deviation.  
  
  Comments?
 
 Sounds sane.


Here's an implementation which delegates tuning of batching to the user.  We
don't really need local_t at all as percpu_counter_mod is not safe against
interrupts and softirqs  as it is.  If we have a counter which could be
modified in process context and irq/bh context, we just have to use a
wrapper like percpu_counter_mod_bh which will just disable and enable bottom
halves.  Reads on the counters are safe as they are atomic_reads, and the
cpu local variables are always accessed by that cpu only.

(PS: the maxerr for ext2/ext3 is just guesstimate)

Comments?

Index: linux-2.6.16-rc1mm4/include/linux/percpu_counter.h
===
--- linux-2.6.16-rc1mm4.orig/include/linux/percpu_counter.h	2006-02-02 11:18:54.0 -0800
+++ linux-2.6.16-rc1mm4/include/linux/percpu_counter.h	2006-02-02 18:29:46.0 -0800
@@ -16,24 +16,32 @@
 
 struct percpu_counter {
atomic_long_t count;
+   int percpu_batch;
long *counters;
 };
 
-#if NR_CPUS >= 16
-#define FBC_BATCH  (NR_CPUS*2)
-#else
-#define FBC_BATCH  (NR_CPUS*4)
-#endif
 
-static inline void percpu_counter_init(struct percpu_counter *fbc)
+/*
+ * Choose maxerr carefully. maxerr/num_possible_cpus indicates per-cpu batching.
+ * Set maximum tolerance for better performance on large systems.
+ */
+static inline void percpu_counter_init(struct percpu_counter *fbc,
+					unsigned int maxerr)
 {
	atomic_long_set(&fbc->count, 0);
-	fbc->counters = alloc_percpu(long);
+	fbc->percpu_batch = maxerr / num_possible_cpus();
+	if (fbc->percpu_batch) {
+		fbc->counters = alloc_percpu(long);
+		if (!fbc->counters)
+			fbc->percpu_batch = 0;
+	}
 }
 
 static inline void percpu_counter_destroy(struct percpu_counter *fbc)
 {
-	free_percpu(fbc->counters);
+	if (fbc->percpu_batch)
+		free_percpu(fbc->counters);
 }
 
 void percpu_counter_mod(struct percpu_counter *fbc, long amount);
@@ -63,7 +71,8 @@ struct percpu_counter {
long count;
 };
 
-static inline void percpu_counter_init(struct percpu_counter *fbc)
+static inline void percpu_counter_init(struct percpu_counter *fbc,
+					unsigned int maxerr)
 {
	fbc->count = 0;
 }
Index: linux-2.6.16-rc1mm4/mm/swap.c
===
--- linux-2.6.16-rc1mm4.orig/mm/swap.c  2006-01-29 20:20:20.0 -0800
+++ linux-2.6.16-rc1mm4/mm/swap.c   2006-02-02 18:36:21.0 -0800
@@ -470,13 +470,20 @@ static int cpu_swap_callback(struct noti
 #ifdef CONFIG_SMP
 void percpu_counter_mod(struct percpu_counter *fbc, long amount)
 {
-	long count;
	long *pcount;
-	int cpu = get_cpu();
+	long count;
+	int cpu;
 
+	/* Slow mode */
+	if (unlikely(!fbc->percpu_batch)) {
+		atomic_long_add(amount, &fbc->count);
+		return;
+	}
+
+	cpu = get_cpu();
	pcount = per_cpu_ptr(fbc->counters, cpu);
	count = *pcount + amount;
-	if (count >= FBC_BATCH || count <= -FBC_BATCH) {
+	if (count >= fbc->percpu_batch || count <= -fbc->percpu_batch) {
		atomic_long_add(count, &fbc->count);
		count = 0;
	}
Index: linux-2.6.16-rc1mm4/fs/ext2/super.c
===
--- linux-2.6.16-rc1mm4.orig/fs/ext2/super.c	2006-02-02 18:30:28.0 -0800
+++ linux-2.6.16-rc1mm4/fs/ext2/super.c 2006-02-02 18:36:39.0 -0800
@@ -610,6 +610,7 @@ static int ext2_fill_super(struct super_
int db_count;
int i, j;
__le32 features;
+   int maxerr;
 
sbi = kmalloc(sizeof(*sbi), GFP_KERNEL

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Ravikiran G Thirumalai
On Fri, Jan 27, 2006 at 09:53:53AM +0100, Eric Dumazet wrote:
 Ravikiran G Thirumalai wrote:
 Change the atomic_t sockets_allocated member of struct proto to a 
 per-cpu counter.
 
 Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED]
 Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED]
 Signed-off-by: Shai Fultheim [EMAIL PROTECTED]
 
 Hi Ravikiran
 
 If I correctly read this patch, I think there is a scalability problem.
 
 On a big SMP machine, read_sockets_allocated() is going to be a real killer.
 
 Say we have 128 Opterons CPUS in a box.

read_sockets_allocated is invoked when /proc/net/protocols is read,
which can be assumed to be infrequent.
At sk_stream_mem_schedule(), read_sockets_allocated() is invoked only under
certain conditions, under memory pressure -- on a large CPU count machine,
you'd have large memory, and I don't think read_sockets_allocated would get
called often.  It did not, at least on our 8cpu/16G box.  So this should be
OK I think.

There're no 128 CPU Opteron boxes yet afaik ;).

Thanks,
Kiran


Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Ravikiran G Thirumalai
On Fri, Jan 27, 2006 at 12:16:02PM -0800, Andrew Morton wrote:
 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote:
 
  which can be assumed as not frequent.  
  At sk_stream_mem_schedule(), read_sockets_allocated() is invoked only 
  certain conditions, under memory pressure -- on a large CPU count machine, 
  you'd have large memory, and I don't think read_sockets_allocated would get 
  called often.  It did not atleast on our 8cpu/16G box.  So this should be 
  OK 
  I think.
 
 That being said, the percpu_counters aren't a terribly successful concept
 and probably do need a revisit due to the high inaccuracy at high CPU
 counts.  It might be better to do some generic version of vm_acct_memory()
 instead.

AFAICS vm_acct_memory is no better.  The deviation on large cpu counts is the 
same as percpu_counters -- (NR_CPUS * NR_CPUS * 2) ...

 
 If the benchmarks say that we need to.  If we cannot observe any problems
 in testing of existing code and if we can't demonstrate any benefit from
 the patched code then one option is to go off and do something else ;)

We first tried plain per-CPU counters for memory_allocated, found that reads
on memory_allocated were causing cacheline transfers, and then
switched over to batching.  So batching reads is useful.  To avoid
inaccuracy, we can maybe change percpu_counter_init to:

void percpu_counter_init(struct percpu_counter *fbc, int maxdev)

the percpu batching limit would then be maxdev/num_possible_cpus.  One would
use batching counters only when both reads and writes are frequent.  With
the above scheme, we would go fetch cachelines from other cpus for read
often only on large cpu counts, which is not any worse than the global
counter alternative, but it would still be beneficial on smaller machines,
without sacrificing a pre-set deviation.  

Comments?

Thanks,
Kiran


Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Ravikiran G Thirumalai
On Fri, Jan 27, 2006 at 11:30:23PM +0100, Eric Dumazet wrote:
 
 There are several issues here :
 
 alloc_percpu() current implementation is a a waste of ram. (because it uses 
 slab allocations that have a minimum size of 32 bytes)

Oh there was a solution for that :).  

http://lwn.net/Articles/119532/

I can quickly revive it if there is interest.


Thanks,
Kiran


Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Ravikiran G Thirumalai
On Fri, Jan 27, 2006 at 03:08:47PM -0800, Andrew Morton wrote:
 Andrew Morton [EMAIL PROTECTED] wrote:
 
  Oh, and because vm_acct_memory() is counting a singleton object, it can use
  DEFINE_PER_CPU rather than alloc_percpu(), so it saves on a bit of kmalloc
  overhead.
 
 Actually, I don't think that's true.  we're allocating a sizeof(long) with
 kmalloc_node() so there shouldn't be memory wastage.

Oh yeah there is.  Each dynamic per-cpu object would have been at least
(NR_CPUS * sizeof(void *) + num_cpus_possible * cacheline_size).
Now kmalloc_node will fall back on size-32 for allocation of a long, so
replace the cacheline_size above with 32 -- which then means dynamic per-cpu
data are not on a cacheline boundary anymore (most modern cpus have
64-byte/128-byte cache lines), which means per-cpu data could end up
false shared.
Kiran


Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Ravikiran G Thirumalai
On Sat, Jan 28, 2006 at 12:21:07AM +0100, Eric Dumazet wrote:
 Ravikiran G Thirumalai wrote:
 On Fri, Jan 27, 2006 at 11:30:23PM +0100, Eric Dumazet wrote:
 
 Why not use a boot time allocated percpu area (as done today in 
 setup_per_cpu_areas()), but instead of reserving extra space for module's 
 percpu data, being able to serve alloc_percpu() from this reserved area (ie 
 no kmalloced data anymore), and keeping your

At that time ia64 placed a limit on the max size of the per-CPU area
(PERCPU_ENOUGH_ROOM).  I think that limit is still there, but hopefully
64K per-CPU should be enough for static + dynamic + modules?

Let me do a allyesconfig on my box and verify.

Thanks,
Kiran


Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Ravikiran G Thirumalai
On Sat, Jan 28, 2006 at 01:35:03AM +0100, Eric Dumazet wrote:
 Eric Dumazet wrote:
 Andrew Morton wrote:
 Eric Dumazet [EMAIL PROTECTED] wrote:
 
 #ifdef CONFIG_SMP
 void percpu_counter_mod(struct percpu_counter *fbc, long amount)
 {
  long old, new;
  atomic_long_t *pcount;
 
 	pcount = per_cpu_ptr(fbc->counters, get_cpu());
 start:
 	old = atomic_long_read(pcount);
 	new = old + amount;
 	if (new >= FBC_BATCH || new <= -FBC_BATCH) {
 		if (unlikely(atomic_long_cmpxchg(pcount, old, 0) != old))
 			goto start;
 		atomic_long_add(new, &fbc->count);
 	} else
 		atomic_long_add(amount, pcount);
 
  put_cpu();
 }
 EXPORT_SYMBOL(percpu_counter_mod);
 
 long percpu_counter_read_accurate(struct percpu_counter *fbc)
 {
  long res = 0;
  int cpu;
  atomic_long_t *pcount;
 
  for_each_cpu(cpu) {
 		pcount = per_cpu_ptr(fbc->counters, cpu);
 		/* dont dirty cache line if not necessary */
 		if (atomic_long_read(pcount))
 			res += atomic_long_xchg(pcount, 0);
 <--- (A)
 	}
 
 
 	atomic_long_add(res, &fbc->count);
 <--- (B)
 	res = atomic_long_read(&fbc->count);
 
  return res;
 }

The read is still theoretically FBC_BATCH * NR_CPUS inaccurate, no?
What happens when other cpus update their local counters at (A) and (B)?

(I am hoping we don't need percpu_counter_read_accurate anywhere yet and
this is just a demo ;).  I certainly don't want to iterate over all cpus on
the read path, or do an add/cmpxchg on the write path, for the proto
counters that started this discussion)

Thanks,
Kiran


[patch 0/4] net: Percpufy frequently used variables on struct proto

2006-01-26 Thread Ravikiran G Thirumalai
The following patches change struct proto.memory_allocated and
proto.sockets_allocated to use per-cpu counters.  This patchset also switches
the proto.inuse percpu variable to use alloc_percpu, instead of NR_CPUS *
cacheline size padding.

We saw a 5% improvement in apache bench requests per second with this
patchset, on a multi-NIC 8-way 3.3 GHz IBM x460 Xeon server.

Patches follow.

Thanks,
Kiran


[patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-26 Thread Ravikiran G Thirumalai
Change the atomic_t sockets_allocated member of struct proto to a 
per-cpu counter.

Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED]
Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED]
Signed-off-by: Shai Fultheim [EMAIL PROTECTED]

Index: linux-2.6.16-rc1/include/net/sock.h
===
--- linux-2.6.16-rc1.orig/include/net/sock.h	2006-01-24 14:37:45.0 -0800
+++ linux-2.6.16-rc1/include/net/sock.h 2006-01-24 17:35:34.0 -0800
@@ -543,7 +543,7 @@ struct proto {
	/* Memory pressure */
	void			(*enter_memory_pressure)(void);
	struct percpu_counter	*memory_allocated;	/* Current allocated memory. */
-	atomic_t		*sockets_allocated;	/* Current number of sockets. */
+	int			*sockets_allocated;	/* Current number of sockets (percpu counter). */
 
/*
 * Pressure flag: try to collapse.
@@ -579,6 +579,24 @@ struct proto {
} stats[NR_CPUS];
 };
 
+static inline int read_sockets_allocated(struct proto *prot)
+{
+   int total = 0;
+   int cpu;
+   for_each_cpu(cpu)
+		total += *per_cpu_ptr(prot->sockets_allocated, cpu);
+   return total;
+}
+
+static inline void mod_sockets_allocated(int *sockets_allocated, int count)
+{
+   (*per_cpu_ptr(sockets_allocated, get_cpu())) += count;
+   put_cpu();
+}
+
+#define inc_sockets_allocated(c) mod_sockets_allocated(c, 1)
+#define dec_sockets_allocated(c) mod_sockets_allocated(c, -1)
+
 extern int proto_register(struct proto *prot, int alloc_slab);
 extern void proto_unregister(struct proto *prot);
 
Index: linux-2.6.16-rc1/include/net/tcp.h
===
--- linux-2.6.16-rc1.orig/include/net/tcp.h	2006-01-24 14:37:45.0 -0800
+++ linux-2.6.16-rc1/include/net/tcp.h  2006-01-24 17:05:34.0 -0800
@@ -221,7 +221,7 @@ extern int sysctl_tcp_tso_win_divisor;
 extern int sysctl_tcp_abc;
 
 extern struct percpu_counter tcp_memory_allocated;
-extern atomic_t tcp_sockets_allocated;
+extern int *tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
 /*
Index: linux-2.6.16-rc1/net/core/sock.c
===
--- linux-2.6.16-rc1.orig/net/core/sock.c	2006-01-24 14:37:45.0 -0800
+++ linux-2.6.16-rc1/net/core/sock.c2006-01-24 16:47:55.0 -0800
@@ -771,7 +771,7 @@ struct sock *sk_clone(const struct sock 
	newsk->sk_sleep	 = NULL;
 
	if (newsk->sk_prot->sockets_allocated)
-		atomic_inc(newsk->sk_prot->sockets_allocated);
+		inc_sockets_allocated(newsk->sk_prot->sockets_allocated);
}
 out:
return newsk;
@@ -1620,7 +1620,7 @@ static void proto_seq_printf(struct seq_
			"%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c "
			"%2c %2c %2c %2c %2c %2c\n",
		   proto->name,
		   proto->obj_size,
-		   proto->sockets_allocated != NULL ? atomic_read(proto->sockets_allocated) : -1,
+		   proto->sockets_allocated != NULL ? read_sockets_allocated(proto) : -1,
		   proto->memory_allocated != NULL ?
			percpu_counter_read(proto->memory_allocated) : -1,
		   proto->memory_pressure != NULL ? *proto->memory_pressure ? "yes" : "no" : "NI",
Index: linux-2.6.16-rc1/net/core/stream.c
===
--- linux-2.6.16-rc1.orig/net/core/stream.c	2006-01-24 14:37:45.0 -0800
+++ linux-2.6.16-rc1/net/core/stream.c  2006-01-24 16:20:22.0 -0800
@@ -243,7 +243,7 @@ int sk_stream_mem_schedule(struct sock *
		return 1;
 
	if (!*sk->sk_prot->memory_pressure ||
-	    sk->sk_prot->sysctl_mem[2] > atomic_read(sk->sk_prot->sockets_allocated) *
+	    sk->sk_prot->sysctl_mem[2] > read_sockets_allocated(sk->sk_prot) *
	    sk_stream_pages(sk->sk_wmem_queued +
			    atomic_read(&sk->sk_rmem_alloc) +
			    sk->sk_forward_alloc))
Index: linux-2.6.16-rc1/net/ipv4/proc.c
===
--- linux-2.6.16-rc1.orig/net/ipv4/proc.c	2006-01-24 14:37:45.0 -0800
+++ linux-2.6.16-rc1/net/ipv4/proc.c2006-01-24 16:20:22.0 -0800
@@ -63,7 +63,7 @@ static int sockstat_seq_show(struct seq_
	socket_seq_show(seq);
	seq_printf(seq, "TCP: inuse %d orphan %d tw %d alloc %d mem %lu\n",
		   fold_prot_inuse(&tcp_prot), atomic_read(&tcp_orphan_count),
-		   tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
+		   tcp_death_row.tw_count, read_sockets_allocated(&tcp_prot),
		   percpu_counter_read(&tcp_memory_allocated));

[patch 2/4] net: Percpufy frequently used variables -- struct proto.memory_allocated

2006-01-26 Thread Ravikiran G Thirumalai
Change struct proto->memory_allocated to a batching per-CPU counter
(percpu_counter) from an atomic_t.  A batching counter is better than a
plain per-CPU counter as this field is read often.

Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED]
Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED]
Signed-off-by: Shai Fultheim [EMAIL PROTECTED]

Index: linux-2.6.16-rc1mm3/include/net/sock.h
===
--- linux-2.6.16-rc1mm3.orig/include/net/sock.h	2006-01-25 16:25:16.0 -0800
+++ linux-2.6.16-rc1mm3/include/net/sock.h	2006-01-25 16:56:59.0 -0800
@@ -48,6 +48,7 @@
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>	/* struct sk_buff */
 #include <linux/security.h>
+#include <linux/percpu_counter.h>
 
 #include <linux/filter.h>
 
@@ -541,8 +542,9 @@ struct proto {
 
	/* Memory pressure */
	void			(*enter_memory_pressure)(void);
-	atomic_t		*memory_allocated;	/* Current allocated memory. */
+	struct percpu_counter	*memory_allocated;	/* Current allocated memory. */
	atomic_t		*sockets_allocated;	/* Current number of sockets. */
+
	/*
	 * Pressure flag: try to collapse.
	 * Technical note: it is used by multiple contexts non atomically.
Index: linux-2.6.16-rc1mm3/include/net/tcp.h
===
--- linux-2.6.16-rc1mm3.orig/include/net/tcp.h	2006-01-25 16:25:16.0 -0800
+++ linux-2.6.16-rc1mm3/include/net/tcp.h	2006-01-25 16:56:59.0 -0800
@@ -220,7 +220,7 @@ extern int sysctl_tcp_moderate_rcvbuf;
 extern int sysctl_tcp_tso_win_divisor;
 extern int sysctl_tcp_abc;
 
-extern atomic_t tcp_memory_allocated;
+extern struct percpu_counter tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
Index: linux-2.6.16-rc1mm3/net/core/sock.c
===
--- linux-2.6.16-rc1mm3.orig/net/core/sock.c	2006-01-25 16:25:16.0 -0800
+++ linux-2.6.16-rc1mm3/net/core/sock.c 2006-01-25 16:56:59.0 -0800
@@ -1616,12 +1616,13 @@ static char proto_method_implemented(con
 
 static void proto_seq_printf(struct seq_file *seq, struct proto *proto)
 {
-	seq_printf(seq, "%-9s %4u %6d  %6d   %-3s %6u   %-3s  %-10s "
+	seq_printf(seq, "%-9s %4u %6d  %6lu   %-3s %6u   %-3s  %-10s "
			"%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c "
			"%2c %2c %2c %2c %2c %2c\n",
		   proto->name,
		   proto->obj_size,
		   proto->sockets_allocated != NULL ? atomic_read(proto->sockets_allocated) : -1,
-		   proto->memory_allocated != NULL ? atomic_read(proto->memory_allocated) : -1,
+		   proto->memory_allocated != NULL ?
+			percpu_counter_read(proto->memory_allocated) : -1,
		   proto->memory_pressure != NULL ? *proto->memory_pressure ? "yes" : "no" : "NI",
		   proto->max_header,
		   proto->slab == NULL ? "no" : "yes",
Index: linux-2.6.16-rc1mm3/net/core/stream.c
===
--- linux-2.6.16-rc1mm3.orig/net/core/stream.c	2006-01-25 16:25:16.0 -0800
+++ linux-2.6.16-rc1mm3/net/core/stream.c	2006-01-25 16:56:59.0 -0800
@@ -17,6 +17,7 @@
 #include <linux/signal.h>
 #include <linux/tcp.h>
 #include <linux/wait.h>
+#include <linux/percpu.h>
 #include <net/sock.h>
 
 /**
@@ -196,11 +197,11 @@ EXPORT_SYMBOL(sk_stream_error);
 void __sk_stream_mem_reclaim(struct sock *sk)
 {
	if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {
-		atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
-			   sk->sk_prot->memory_allocated);
+		percpu_counter_mod_bh(sk->sk_prot->memory_allocated,
+				-(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM));
		sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
		if (*sk->sk_prot->memory_pressure &&
-		    (atomic_read(sk->sk_prot->memory_allocated) <
+		    (percpu_counter_read(sk->sk_prot->memory_allocated) <
		     sk->sk_prot->sysctl_mem[0]))
			*sk->sk_prot->memory_pressure = 0;
	}
@@ -213,23 +214,26 @@ int sk_stream_mem_schedule(struct sock *
	int amt = sk_stream_pages(size);
 
	sk->sk_forward_alloc += amt * SK_STREAM_MEM_QUANTUM;
-	atomic_add(amt, sk->sk_prot->memory_allocated);
+	percpu_counter_mod_bh(sk->sk_prot->memory_allocated, amt);
 
	/* Under limit. */
-	if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
+	if (percpu_counter_read(sk->sk_prot->memory_allocated) <
+	    sk->sk_prot->sysctl_mem[0]) {
		if (*sk->sk_prot->memory_pressure)
			*sk->sk_prot->memory_pressure = 0;
   

[patch 1/4] net: Percpufy frequently used variables -- add percpu_counter_mod_bh

2006-01-26 Thread Ravikiran G Thirumalai
Add percpu_counter_mod_bh for using these counters safely from
both softirq and process context.

Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED]
Signed-off-by: Ravikiran G Thirumalai [EMAIL PROTECTED]
Signed-off-by: Shai Fultheim [EMAIL PROTECTED]

Index: linux-2.6.16-rc1/include/linux/percpu_counter.h
===
--- linux-2.6.16-rc1.orig/include/linux/percpu_counter.h	2006-01-25 11:53:56.0 -0800
+++ linux-2.6.16-rc1/include/linux/percpu_counter.h	2006-01-25 12:09:41.0 -0800
@@ -11,6 +11,7 @@
 #include <linux/smp.h>
 #include <linux/threads.h>
 #include <linux/percpu.h>
+#include <linux/interrupt.h>
 
 #ifdef CONFIG_SMP
 
@@ -102,4 +103,11 @@ static inline void percpu_counter_dec(st
percpu_counter_mod(fbc, -1);
 }
 
+static inline void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount)
+{
+	local_bh_disable();
+	percpu_counter_mod(fbc, amount);
+	local_bh_enable();
+}
+
 #endif /* _LINUX_PERCPU_COUNTER_H */