Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-17 Thread Roland Dreier
  Problem is, I think, you'll need to preallocate IDR_FREE_MAX items.  And
  then free them all again when none of them were consumed (usual).

Actually I think it's OK if we just pass in no more than one extra
layer for each try to add something with idr_get_new().  In the worst
case, this leads to a lot of extra calls to idr_get_new(), but in the
usual case it's fine.

I'm including a lightly tested big patch with all my idr changes for
comments -- I'll split it up into a form more suitable for merging.
(Some of the changes are unrelated and obviously good, eg using
kmem_cache_zalloc() instead of a slab cache with a constructor that
does memset(0)).

I'm not sure I'm thrilled with this approach, but it does seem to be a
net win.  With an allyesconfig with debugging options turned off (so
spinlocks shrink back down to 8 bytes), I get the following:

   textdata bss dec hex filename
243477595971210 2322176 326411451f21079 vmlinux.old
243473705970474 2320704 326385481f20654 vmlinux.new

Most of the savings comes from ocfs2, which has a static array of
255 structures that each contain an idr -- so removing the lock from
struct idr saves 255 * 8 = 2040 bytes.  However, even without
factoring that in, this does seem to be a net win:

add/remove: 2/4 grow/shrink: 23/51 up/down: 719/-3215 (-2496)
function old new   delta
idr_get_new_above 38 554+516
dm_create9571000 +43
ipath_init_one  32943329 +35
get_layer  -  32 +32
idr_alloc_layer-  16 +16
sd_probe 871 881 +10
rtc_device_register  493 503 +10
proc_register277 286  +9
mmc_add_host_sysfs   126 135  +9
idr_add_uobj  80  85  +5
cma_alloc_port   224 229  +5
sys_timer_create 876 880  +4
set_anon_super   173 177  +4
sctp_process_init   13121316  +4
ib_ucm_ctx_alloc 197 201  +4
ib_create_cm_id  287 290  +3
proc_mkdir_mode   95  97  +2
vfs_kern_mount   279 280  +1
send_mad 325 326  +1
proc_symlink 141 142  +1
proc_file_write   40  41  +1
kill_block_super  56  57  +1
get_sb_single175 176  +1
free_proc_entry  108 109  +1
deactivate_super 126 127  +1
proc_readdir 353 352  -1
proc_getattr  40  39  -1
get_sb_nodev 150 149  -1
o2net_send_message_vec  20322030  -2
hwmon_device_register198 196  -2
create_proc_entry170 168  -2
__put_super_and_need_restart  54  52  -2
v9fs_read_work  14241421  -3
v9fs_mux_init   13331330  -3
v9fs_mux_flush_cb303 300  -3
v9fs_mux_destroy 369 366  -3
v9fs_mux_cancel  382 379  -3
o2net_init   441 438  -3
inotify_add_watch285 280  -5
v9fs_session_init   14901484  -6
unnamed_dev_idr   32  24  -8
unit_table32  24  -8
tcp_ps32  24  -8
sdp_ps32  24  -8
sd_index_idr  32  24  -8
sctp_assocs_id32  24  -8
rtc_idr   32  24  -8
query_idr 32  24  -8
proc_inum_idr 32  24  -8
posix_timers_id   32  24  -8
mmc_host_idr  32  24  -8
lpfc_hba_index32  24  -8
ib_uverbs_srq_idr 32  24  -8
ib_uverbs_qp_idr  32  24  -8
ib_uverbs_pd_idr  

Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-14 Thread Arjan van de Ven
On Thu, 2006-07-13 at 17:18 -0700, Roland Dreier wrote:
 Arjan it does get harder if this is needed for your IB device to
 Arjan do more work, so that your swap device on your IB can take
 Arjan more IO's to free up ram..
 
 That's the classic problem, but it's more a matter of the consumer
 using GFP_NOIO in the right places.

GFP_NOIO isn't going to save you in the cases where the memory really is
running low and you need the memory to do more IO...



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-13 Thread Roland Dreier
  Sigh.  It was always a mistake (of the kernel programming 101 type) to put
  any locking at all in the idr code.  At some stage we need to weed it all
  out and move it to callers.
  
  Your fix is yet more fallout from that mistake.

Agreed.  Consider me on the hook to fix this up in a better way once
my life is a little saner.  Maybe I'll try to cook something up on the
plane ride to Ottawa.

Anyway you can punch me in the stomach if I don't have something in
time for 2.6.19.

 - R.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-13 Thread Andrew Morton
On Thu, 13 Jul 2006 08:42:47 -0700
Roland Dreier [EMAIL PROTECTED] wrote:

   Sigh.  It was always a mistake (of the kernel programming 101 type) to put
   any locking at all in the idr code.  At some stage we need to weed it all
   out and move it to callers.
   
   Your fix is yet more fallout from that mistake.
 
 Agreed.  Consider me on the hook to fix this up in a better way once
 my life is a little saner.  Maybe I'll try to cook something up on the
 plane ride to Ottawa.
 

I suspect it'll get really ugly.  It's a container library which needs to
allocate memory when items are added, like the radix-tree.  Either it needs
to assume GFP_ATOMIC, which is bad and can easily fail or it does weird
things like radix_tree_preload().

The basic problem is:

idr_pre_get(GFP_KERNEL);
spin_lock(my_lock);
idr_get_new(..);

which is racy, because some other CPU could have got in there and consumed
some of the pool which was precharged by idr_pre_get().

It's wildly improbable that it'll actually fail.  It requires all of:

a) that the race occur

b) that the racing thread consume an appreciable amount of the pool

c) that this thread also consume an appreciable amount (such that the
   total of both exceeds the pool size).

d) that a (needs to be added) GFP_ATOMIC attempt to replenish the pool
   inside idr_get_new() fails.



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-13 Thread Roland Dreier
  I suspect it'll get really ugly.  It's a container library which needs to
  allocate memory when items are added, like the radix-tree.  Either it needs
  to assume GFP_ATOMIC, which is bad and can easily fail or it does weird
  things like radix_tree_preload().

Actually I don't think it has to be too bad.  We could tweak the
interface a little bit so that consumers do something like:

struct idr_layer *layer = NULL; /* opaque */

retry:
spin_lock(my_idr_lock);
ret = idr_get_new(my_idr, ptr, id, layer);
spin_unlock(my_idr_lock);

if (ret == -EAGAIN) {
layer = idr_alloc_layer(my_idr, GFP_KERNEL);
if (!IS_ERR(layer))
goto retry;
}

in other words make the consumer responsible for passing in new memory
that can be used for a new entry (or freed if other entries have
become free in the meantime).

 - R.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-13 Thread Arjan van de Ven
On Thu, 2006-07-13 at 14:03 -0700, Roland Dreier wrote:
   I suspect it'll get really ugly.  It's a container library which needs to
   allocate memory when items are added, like the radix-tree.  Either it needs
   to assume GFP_ATOMIC, which is bad and can easily fail or it does weird
   things like radix_tree_preload().
 
 Actually I don't think it has to be too bad.  We could tweak the
 interface a little bit so that consumers do something like:
 
   

it does get harder if this is needed for your IB device to do 
more work, so that your swap device on your IB can take more IO's
to free up ram..



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-13 Thread Andrew Morton
On Thu, 13 Jul 2006 14:03:21 -0700
Roland Dreier [EMAIL PROTECTED] wrote:

   I suspect it'll get really ugly.  It's a container library which needs to
   allocate memory when items are added, like the radix-tree.  Either it needs
   to assume GFP_ATOMIC, which is bad and can easily fail or it does weird
   things like radix_tree_preload().
 
 Actually I don't think it has to be too bad.  We could tweak the
 interface a little bit so that consumers do something like:
 
   struct idr_layer *layer = NULL; /* opaque */
 
 retry:
 spin_lock(my_idr_lock);
   ret = idr_get_new(my_idr, ptr, id, layer);
 spin_unlock(my_idr_lock);
 
 if (ret == -EAGAIN) {
   layer = idr_alloc_layer(my_idr, GFP_KERNEL);
   if (!IS_ERR(layer))
   goto retry;
   }
 
 in other words make the consumer responsible for passing in new memory
 that can be used for a new entry (or freed if other entries have
 become free in the meantime).
 

Good point, a try-again loop would work.  Do we really need the caller to
maintain a cache?  I suspect something like

drat:
if (idr_pre_get(GFP_KERNEL) == ENOMEM)
give_up();
spin_lock();
ret = idr_get_new();
spin_unlock();
if (ret == ENOMEM)
goto drat;

would do it.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-13 Thread Roland Dreier
Arjan it does get harder if this is needed for your IB device to
Arjan do more work, so that your swap device on your IB can take
Arjan more IO's to free up ram..

That's the classic problem, but it's more a matter of the consumer
using GFP_NOIO in the right places.

 - R.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-13 Thread Roland Dreier
  Good point, a try-again loop would work.  Do we really need the caller to
  maintain a cache?  I suspect something like
  
  drat:
   if (idr_pre_get(GFP_KERNEL) == ENOMEM)
   give_up();
   spin_lock();
   ret = idr_get_new();
   spin_unlock();
   if (ret == ENOMEM)
   goto drat;
  
  would do it.

The problem (for my tiny brain at least) is that I don't know where
idr_pre_get() can put the memory it allocates if there's no lock in
the idr structure -- how do you maintain internal consistency if no
locks are held when filling the cache?

Having the caller hold a chunk of memory in a stack variable was the
trick I came up with to get around that.

 - R.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-13 Thread Andrew Morton
On Thu, 13 Jul 2006 18:08:17 -0700
Roland Dreier [EMAIL PROTECTED] wrote:

   Good point, a try-again loop would work.  Do we really need the caller to
   maintain a cache?  I suspect something like
   
   drat:
  if (idr_pre_get(GFP_KERNEL) == ENOMEM)
  give_up();
  spin_lock();
  ret = idr_get_new();
  spin_unlock();
  if (ret == ENOMEM)
  goto drat;
   
   would do it.
 
 The problem (for my tiny brain at least) is that I don't know where
 idr_pre_get() can put the memory it allocates if there's no lock in
 the idr structure -- how do you maintain internal consistency if no
 locks are held when filling the cache?

argh.  Aren't you supposed to be on vacation or something?

 Having the caller hold a chunk of memory in a stack variable was the
 trick I came up with to get around that.

Yes, that certainly works.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-13 Thread Andrew Morton
On Thu, 13 Jul 2006 18:18:35 -0700
Andrew Morton [EMAIL PROTECTED] wrote:

  Having the caller hold a chunk of memory in a stack variable was the
  trick I came up with to get around that.
 
 Yes, that certainly works.


Problem is, I think, you'll need to preallocate IDR_FREE_MAX items.  And
then free them all again when none of them were consumed (usual).

Yes, storing the preallocated nodes in the idr itself requires locking. 
But that locking is 100% private to the IDR implementation.  It locks only
the preload list and not the user's stuff.

radix_tree_preload() effectively does this.  Except the preload list is
kernel-wide.  It's split across CPUs and uses
local_irq_disable/preempt_disable locking tricks as a performance
optimisation.  But conceptually it's the same.

Simply copying that would give something which is known to work...  It
seems like a large amount of fuss, but when you think about it the problem
isn't simple.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-12 Thread Roland Dreier
Currently, the code in lib/idr.c uses a bare spin_lock(idp-lock) to
do internal locking.  This is a nasty trap for code that might call
idr functions from different contexts; for example, it seems perfectly
reasonable to call idr_get_new() from process context and idr_remove()
from interrupt context -- but with the current locking this would lead
to a potential deadlock.

The simplest fix for this is to just convert the idr locking to use
spin_lock_irqsave().

In particular, this fixes a very complicated locking issue detected by
lockdep, involving the ib_ipoib driver's priv-lock and dev-_xmit_lock,
which get involved with the ib_sa module's query_idr.lock.

Cc: Arjan van de Ven [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED]
Cc: Zach Brown [EMAIL PROTECTED],
Signed-off-by: Roland Dreier [EMAIL PROTECTED]

---

diff --git a/lib/idr.c b/lib/idr.c
index 4d09681..16d2143 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -38,14 +38,15 @@ static kmem_cache_t *idr_layer_cache;
 static struct idr_layer *alloc_layer(struct idr *idp)
 {
struct idr_layer *p;
+   unsigned long flags;
 
-   spin_lock(idp-lock);
+   spin_lock_irqsave(idp-lock, flags);
if ((p = idp-id_free)) {
idp-id_free = p-ary[0];
idp-id_free_cnt--;
p-ary[0] = NULL;
}
-   spin_unlock(idp-lock);
+   spin_unlock_irqrestore(idp-lock, flags);
return(p);
 }
 
@@ -59,12 +60,14 @@ static void __free_layer(struct idr *idp
 
 static void free_layer(struct idr *idp, struct idr_layer *p)
 {
+   unsigned long flags;
+
/*
 * Depends on the return element being zeroed.
 */
-   spin_lock(idp-lock);
+   spin_lock_irqsave(idp-lock, flags);
__free_layer(idp, p);
-   spin_unlock(idp-lock);
+   spin_unlock_irqrestore(idp-lock, flags);
 }
 
 /**
@@ -168,6 +171,7 @@ static int idr_get_new_above_int(struct 
 {
struct idr_layer *p, *new;
int layers, v, id;
+   unsigned long flags;
 
id = starting_id;
 build_up:
@@ -191,14 +195,14 @@ build_up:
 * The allocation failed.  If we built part of
 * the structure tear it down.
 */
-   spin_lock(idp-lock);
+   spin_lock_irqsave(idp-lock, flags);
for (new = p; p  p != idp-top; new = p) {
p = p-ary[0];
new-ary[0] = NULL;
new-bitmap = new-count = 0;
__free_layer(idp, new);
}
-   spin_unlock(idp-lock);
+   spin_unlock_irqrestore(idp-lock, flags);
return -1;
}
new-ary[0] = p;

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-12 Thread Ingo Molnar

* Roland Dreier [EMAIL PROTECTED] wrote:

 Currently, the code in lib/idr.c uses a bare spin_lock(idp-lock) to 
 do internal locking.  This is a nasty trap for code that might call 
 idr functions from different contexts; for example, it seems perfectly 
 reasonable to call idr_get_new() from process context and idr_remove() 
 from interrupt context -- but with the current locking this would lead 
 to a potential deadlock.
 
 The simplest fix for this is to just convert the idr locking to use 
 spin_lock_irqsave().
 
 In particular, this fixes a very complicated locking issue detected by 
 lockdep, involving the ib_ipoib driver's priv-lock and 
 dev-_xmit_lock, which get involved with the ib_sa module's 
 query_idr.lock.
 
 Cc: Arjan van de Ven [EMAIL PROTECTED]
 Cc: Ingo Molnar [EMAIL PROTECTED]

Acked-by: Ingo Molnar [EMAIL PROTECTED]

Ingo

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] Convert idr's internal locking to _irqsave variant

2006-07-12 Thread Andrew Morton
On Wed, 12 Jul 2006 13:45:12 -0700
Roland Dreier [EMAIL PROTECTED] wrote:

 Currently, the code in lib/idr.c uses a bare spin_lock(idp-lock) to
 do internal locking.  This is a nasty trap for code that might call
 idr functions from different contexts; for example, it seems perfectly
 reasonable to call idr_get_new() from process context and idr_remove()
 from interrupt context -- but with the current locking this would lead
 to a potential deadlock.
 
 The simplest fix for this is to just convert the idr locking to use
 spin_lock_irqsave().
 
 In particular, this fixes a very complicated locking issue detected by
 lockdep, involving the ib_ipoib driver's priv-lock and dev-_xmit_lock,
 which get involved with the ib_sa module's query_idr.lock.

Sigh.  It was always a mistake (of the kernel programming 101 type) to put
any locking at all in the idr code.  At some stage we need to weed it all
out and move it to callers.

Your fix is yet more fallout from that mistake.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general