Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2008-01-07 Thread Nick Piggin
On Thursday 03 January 2008 19:55, Ingo Molnar wrote:
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> > > Have you done anything more with allowing > 256 CPUs in this
> > > spinlock patch?  We've been testing with 1k cpus and to verify with
> > > -mm kernel, we need to "unpatch" these spinlock changes.
> >
> > Hi Mike,
> >
> > Actually I had it in my mind that 64 bit used single-byte locking like
> > i386, so I didn't think I'd caused a regression there.
> >
> > I'll take a look at fixing that up now.
>
> thanks - this is a serious showstopper for the ticket spinlock patch.
>
> ( which has otherwise been performing very well in x86.git so far - it
>   has passed a few thousand bootup tests on 64-bit and 32-bit as well,
>   so we are close to it being in a mergeable state. Would be a pity to
>   lose it due to the 256 cpus limit. )

OK, this is what my test harness code looks like for > 256 CPUs
(basically the same as the in-kernel code, but some names etc. are slightly
different).

It passes my basic tests, and performance doesn't seem to have suffered.
I was going to suggest making the <= 256 vs > 256 cases config options, but
maybe we don't need to unless some CPUs are slow at shifts / rotates? I
don't know...

After I get comments, I will come up with an incremental patch against
the kernel... It will be interesting to know whether ticket locks help
big SGI systems.

static inline void xlock(lock_t *lock)
{
	lock_t inc = 0x00010000;	/* next-ticket lives in the high 16 bits */
	lock_t tmp;

	__asm__ __volatile__ (
		"lock ; xaddl %0, %1\n"		/* take a ticket; inc = old lock value */
		"movzwl %w0, %2\n\t"		/* tmp = owner (low half of old value) */
		"shrl $16, %0\n\t"		/* inc = our ticket number */
		"1:\t"
		"cmpl %0, %2\n\t"		/* is it our turn yet? */
		"je 2f\n\t"
		"rep ; nop\n\t"			/* pause */
		"movzwl %1, %2\n\t"		/* reload the owner half */
		/* don't need lfence here, because loads are in-order */
		"jmp 1b\n"
		"2:"
		:"+Q" (inc), "+m" (*lock), "=r" (tmp)
		:
		:"memory", "cc");
}

static inline int xtrylock(lock_t *lock)
{
	lock_t tmp;
	lock_t new;

	asm volatile(
		"movl %2,%0\n\t"		/* tmp = lock value (in %eax for cmpxchg) */
		"movl %0,%1\n\t"
		"roll $16, %0\n\t"		/* swap halves: unchanged iff owner == next */
		"cmpl %0,%1\n\t"
		"jne 1f\n\t"			/* lock is held, fail */
		"addl $0x00010000, %1\n\t"	/* new = old value with next-ticket bumped */
		"lock ; cmpxchgl %1,%2\n\t"	/* install it if the lock hasn't changed */
		"1:"
		"sete %b1\n\t"			/* 1 on success, 0 on failure */
		"movzbl %b1,%0\n\t"
		:"=&a" (tmp), "=r" (new), "+m" (*lock)
		:
		: "memory", "cc");

	return tmp;
}

static inline void xunlock(lock_t *lock)
{
	__asm__ __volatile__(
		"incw %0"			/* advance the owner (low) half */
		:"+m" (*lock)
		:
		:"memory", "cc");
}
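
A minimal user-space harness for the three functions above might look like
the sketch below. The lock_t typedef, thread count and iteration count are
assumptions (they aren't shown in the mail); all the asm needs is a 32-bit
unsigned type with the next-ticket counter in the high half and the owner
in the low half:

#include <pthread.h>
#include <stdio.h>

typedef unsigned int lock_t;	/* assumed: 32 bits, zero means unlocked */

/* xlock()/xtrylock()/xunlock() from above go here, after the typedef */

static lock_t test_lock;
static long counter;

static void *worker(void *arg)
{
	long i;

	for (i = 0; i < 1000000; i++) {
		xlock(&test_lock);
		counter++;		/* protected by test_lock */
		xunlock(&test_lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[4];
	int i;

	for (i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	printf("counter = %ld (expect 4000000)\n", counter);
	return 0;
}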


Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2008-01-03 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> > Have you done anything more with allowing > 256 CPUs in this 
> > spinlock patch?  We've been testing with 1k cpus and to verify with 
> > -mm kernel, we need to "unpatch" these spinlock changes.
> 
> Hi Mike,
> 
> Actually I had it in my mind that 64 bit used single-byte locking like 
> i386, so I didn't think I'd caused a regression there.
> 
> I'll take a look at fixing that up now.

thanks - this is a serious showstopper for the ticket spinlock patch. 

( which has otherwise been performing very well in x86.git so far - it 
  has passed a few thousand bootup tests on 64-bit and 32-bit as well, 
  so we are close to it being in a mergeable state. Would be a pity to
  lose it due to the 256 cpus limit. )

Ingo


Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2008-01-02 Thread Nick Piggin
On Thursday 03 January 2008 10:35, Mike Travis wrote:
> Hi Nick,
>
> Have you done anything more with allowing > 256 CPUs in this spinlock
> patch?  We've been testing with 1k cpus and to verify with -mm kernel,
> we need to "unpatch" these spinlock changes.
>
> Thanks,
> Mike

Hi Mike,

Actually I had it in my mind that 64 bit used single-byte locking like
i386, so I didn't think I'd caused a regression there.

I'll take a look at fixing that up now.

Thanks,
Nick


Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2008-01-02 Thread Mike Travis
Hi Nick,

Have you done anything more with allowing > 256 CPUs in this spinlock
patch?  We've been testing with 1k cpus and to verify with -mm kernel,
we need to "unpatch" these spinlock changes.

Thanks,
Mike

Nick Piggin wrote:
> On Thursday 20 December 2007 18:04, Christoph Lameter wrote:
>>> The only reason the x86 ticket locks have the 256 CPU limit is that
>>> if they go any bigger, we can't use the partial registers so would
>>> have to have a few more instructions.
>> x86_64 is going up to 4k or 16k cpus soon for our new hardware.
>>
>>> A 32 bit spinlock would allow 64K cpus (ticket lock has 2 counters,
>>> each would be 16 bits). And it would actually shrink the spinlock in
>>> the case of preempt kernels too (because it would no longer have the
>>> lockbreak field).
>>>
>>> And yes, I'll go out on a limb and say that 64k CPUs ought to be
>>> enough for anyone ;)
>> I think those things need a timeframe applied to them. That's likely
>> going to be true for the next 3 years (optimistic assessment ;-)).
> 
> Yeah, that was tongue in cheek ;)
> 
> 
>> Could you go to 32bit spinlock by default?
> 
> On x86, the size of the ticket locks is 32 bit, simply because I didn't
> want to risk possible alignment bugs (a subsequent patch cuts it down to
> 16 bits, but this is a much smaller win than 64->32 in general because
> of natural alignment of types).
> 
> Note that the ticket locks still support twice the number of CPUs as the old
> spinlocks, so I'm not causing a regression here... but yes, increasing
> the size further will require an extra instruction or two.
> 
>> How about NUMA awareness for the spinlocks? Larger backoff periods for
>> off node lock contentions please.
> 
> ticket locks can naturally tell you how many waiters there are, and how
> many waiters are in front of you, so it is really nice for doing backoff
> (eg. you can adapt the backoff *very* nicely depending on how many are in
> front of you, and how quickly you are moving toward the front).
> 
> Also, since I got rid of the ->break_lock field, you could use that space
> perhaps to add a cpu # of the lock holder for even more backoff context
> (if you find that helps).
> 
> Anyway, I didn't do any of that because it obviously needs someone with
> real hardware in order to tune it properly.



Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2007-12-20 Thread Nick Piggin
On Thursday 20 December 2007 18:04, Christoph Lameter wrote:
> > The only reason the x86 ticket locks have the 256 CPU limit is that
> > if they go any bigger, we can't use the partial registers so would
> > have to have a few more instructions.
>
> x86_64 is going up to 4k or 16k cpus soon for our new hardware.
>
> > A 32 bit spinlock would allow 64K cpus (ticket lock has 2 counters,
> > each would be 16 bits). And it would actually shrink the spinlock in
> > the case of preempt kernels too (because it would no longer have the
> > lockbreak field).
> >
> > And yes, I'll go out on a limb and say that 64k CPUs ought to be
> > enough for anyone ;)
>
> I think those things need a timeframe applied to them. That's likely
> going to be true for the next 3 years (optimistic assessment ;-)).

Yeah, that was tongue in cheek ;)


> Could you go to 32bit spinlock by default?

On x86, the size of the ticket locks is 32 bit, simply because I didn't
want to risk possible alignment bugs (a subsequent patch cuts it down to
16 bits, but this is a much smaller win than 64->32 in general because
of natural alignment of types).

Note that the ticket locks still support twice the number of CPUs as the old
spinlocks, so I'm not causing a regression here... but yes, increasing
the size further will require an extra instruction or two.

> How about NUMA awareness for the spinlocks? Larger backoff periods for
> off node lock contentions please.

ticket locks can naturally tell you how many waiters there are, and how
many waiters are in front of you, so it is really nice for doing backoff
(eg. you can adapt the backoff *very* nicely depending on how many are in
front of you, and how quickly you are moving toward the front).

Also, since I got rid of the ->break_lock field, you could use that space
perhaps to add a cpu # of the lock holder for even more backoff context
(if you find that helps).

Anyway, I didn't do any of that because it obviously needs someone with
real hardware in order to tune it properly.
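
To make that concrete, a slow path could scale its pause with the ticket
distance -- a sketch only, not code from this thread, assuming the 32-bit
layout above with the owner in the low 16 bits (x86, GCC builtins):

static inline void xlock_backoff(lock_t *lock)
{
	/* take a ticket; __sync_fetch_and_add returns the old value */
	lock_t old = __sync_fetch_and_add(lock, 0x00010000);
	unsigned short ticket = old >> 16;

	for (;;) {
		/* the owner half is the low word of the lock (little-endian) */
		unsigned short owner = *(volatile unsigned short *)lock;
		unsigned short ahead = ticket - owner;
		unsigned int i;

		if (!ahead)
			break;			/* our turn */
		/* back off in proportion to our place in the queue */
		for (i = 0; i < (unsigned int)ahead * 64; i++)
			asm volatile("rep ; nop");
	}
}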


Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2007-12-19 Thread Christoph Lameter


> The only reason the x86 ticket locks have the 256 CPU limit is that
> if they go any bigger, we can't use the partial registers so would
> have to have a few more instructions.

x86_64 is going up to 4k or 16k cpus soon for our new hardware.

> A 32 bit spinlock would allow 64K cpus (ticket lock has 2 counters,
> each would be 16 bits). And it would actually shrink the spinlock in
> the case of preempt kernels too (because it would no longer have the
> lockbreak field).
> 
> And yes, I'll go out on a limb and say that 64k CPUs ought to be
> enough for anyone ;)

I think those things need a timeframe applied to them. That's likely 
going to be true for the next 3 years (optimistic assessment ;-)).

Could you go to 32bit spinlock by default?

How about NUMA awareness for the spinlocks? Larger backoff periods for 
off node lock contentions please.



Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2007-12-19 Thread Nick Piggin
On Thursday 20 December 2007 06:28, Peter Zijlstra wrote:
> On Wed, 2007-12-19 at 11:53 -0500, Lee Schermerhorn wrote:
> > On Wed, 2007-12-19 at 11:31 -0500, Rik van Riel wrote:
> > > On Wed, 19 Dec 2007 10:52:09 -0500
> > >
> > > Lee Schermerhorn <[EMAIL PROTECTED]> wrote:
> > > > I keep these patches up to date for testing.  I don't have conclusive
> > > > evidence whether they alleviate or exacerbate the problem nor by how
> > > > much.
> > >
> > > When the queued locking from Ingo's x86 tree hits mainline,
> > > I suspect that spinlocks may end up behaving a lot nicer.
> >
> > That would be worth testing with our problematic workloads...
> >
> > > Should I drop the rwlock patches from my tree for now and
> > > focus on just the page reclaim stuff?
> >
> > That's fine with me.  They're out there if anyone is interested.  I'll
> > keep them up to date in my tree [and hope they don't conflict with split
> > lru and noreclaim patches too much] for occasional testing.
>
> Of course, someone would need to implement ticket locks for ia64 -
> preferably without the 256 cpu limit.

Yep. Wouldn't be hard at all -- ia64 has a "fetchadd" with acquire
semantics.

The only reason the x86 ticket locks have the 256 CPU limit is that
if they go any bigger, we can't use the partial registers so would
have to have a few more instructions.
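
(For contrast, the byte-wide <= 256 CPU fast path packs the two 8-bit
tickets into one 16-bit word, so the %ah/%al partial registers can be
used. A sketch from memory, close to but not necessarily identical to
the x86.git code:)

static inline void xlock_8bit(unsigned short *lock)
{
	short inc = 0x0100;	/* bump the next-ticket (high) byte */

	asm volatile(
		"lock ; xaddw %w0, %1\n"
		"1:\t"
		"cmpb %h0, %b0\n\t"	/* owner byte == our ticket byte? */
		"je 2f\n\t"
		"rep ; nop\n\t"
		"movb %1, %b0\n\t"	/* reload the owner byte */
		"jmp 1b\n"
		"2:"
		: "+Q" (inc), "+m" (*lock)
		:
		: "memory", "cc");
}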


> Nick, growing spinlock_t to 64 bits would yield space for 64k cpus
> right? I'm guessing that would be enough for a while, even for SGI.

A 32 bit spinlock would allow 64K cpus (ticket lock has 2 counters,
each would be 16 bits). And it would actually shrink the spinlock in
the case of preempt kernels too (because it would no longer have the
lockbreak field).

And yes, I'll go out on a limb and say that 64k CPUs ought to be
enough for anyone ;)
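
For concreteness, an illustrative layout for such a 32-bit ticket lock
(a sketch, not from this mail):

typedef union {
	unsigned int slock;		/* the word the lock asm operates on */
	struct {
		unsigned short owner;	/* ticket being served (low half) */
		unsigned short next;	/* next ticket to hand out (high half) */
	} tickets;
} ticket_lock_t;			/* unlocked when owner == next */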


Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2007-12-19 Thread Peter Zijlstra

On Wed, 2007-12-19 at 11:53 -0500, Lee Schermerhorn wrote:
> On Wed, 2007-12-19 at 11:31 -0500, Rik van Riel wrote:
> > On Wed, 19 Dec 2007 10:52:09 -0500
> > Lee Schermerhorn <[EMAIL PROTECTED]> wrote:
> > 
> > > I keep these patches up to date for testing.  I don't have conclusive
> > > evidence whether they alleviate or exacerbate the problem nor by how
> > > much.  
> > 
> > When the queued locking from Ingo's x86 tree hits mainline,
> > I suspect that spinlocks may end up behaving a lot nicer.
> 
> That would be worth testing with our problematic workloads...
> 
> > 
> > Should I drop the rwlock patches from my tree for now and
> > focus on just the page reclaim stuff?
> 
> That's fine with me.  They're out there if anyone is interested.  I'll
> keep them up to date in my tree [and hope they don't conflict with split
> lru and noreclaim patches too much] for occasional testing.

Of course, someone would need to implement ticket locks for ia64 -
preferably without the 256 cpu limit.

Nick, growing spinlock_t to 64 bits would yield space for 64k cpus,
right? I'm guessing that would be enough for a while, even for SGI.




Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2007-12-19 Thread Lee Schermerhorn
On Wed, 2007-12-19 at 11:31 -0500, Rik van Riel wrote:
> On Wed, 19 Dec 2007 10:52:09 -0500
> Lee Schermerhorn <[EMAIL PROTECTED]> wrote:
> 
> > I keep these patches up to date for testing.  I don't have conclusive
> > evidence whether they alleviate or exacerbate the problem nor by how
> > much.  
> 
> When the queued locking from Ingo's x86 tree hits mainline,
> I suspect that spinlocks may end up behaving a lot nicer.

That would be worth testing with our problematic workloads...

> 
> Should I drop the rwlock patches from my tree for now and
> focus on just the page reclaim stuff?

That's fine with me.  They're out there if anyone is interested.  I'll
keep them up to date in my tree [and hope they don't conflict with split
lru and noreclaim patches too much] for occasional testing.

Lee



Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2007-12-19 Thread Rik van Riel
On Wed, 19 Dec 2007 10:52:09 -0500
Lee Schermerhorn <[EMAIL PROTECTED]> wrote:

> I keep these patches up to date for testing.  I don't have conclusive
> evidence whether they alleviate or exacerbate the problem nor by how
> much.  

When the queued locking from Ingo's x86 tree hits mainline,
I suspect that spinlocks may end up behaving a lot nicer.

Should I drop the rwlock patches from my tree for now and
focus on just the page reclaim stuff?

-- 
All Rights Reversed


Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2007-12-19 Thread Lee Schermerhorn
On Wed, 2007-12-19 at 11:48 +1100, Nick Piggin wrote:
> On Wednesday 19 December 2007 08:15, Rik van Riel wrote:
> > I have seen soft cpu lockups in page_referenced_file() due to
> > contention on i_mmap_lock() for different pages.  Making the
> > i_mmap_lock a reader/writer lock should increase parallelism
> > in vmscan for file-backed pages mapped into many address spaces.
> >
> > Read lock the i_mmap_lock for all usage except:
> >
> > 1) mmap/munmap:  linking vma into i_mmap prio_tree or removing
> > 2) unmap_mapping_range:   protecting vm_truncate_count
> >
> > rmap:  try_to_unmap_file() required new cond_resched_rwlock().
> > To reduce code duplication, I recast cond_resched_lock() as a
> > [static inline] wrapper around reworked cond_resched_lock() =>
> > __cond_resched_lock(void *lock, int type).
> > New cond_resched_rwlock() implemented as another wrapper.
> 
> Reader/writer locks really suck in terms of fairness and starvation,
> especially when the read-side is common and frequent. (also, single
> threaded performance of the read-side is worse).
> 
> I know Lee saw some big latencies on the anon_vma list lock when
> running (IIRC) a large benchmark... but are there more realistic
> situations where this is a problem?

Yes, we see the stall on the anon_vma lock most frequently running the
AIM benchmark with several tens of thousands of processes--all forked
from the same parent.  If we push the system into reclaim, all cpus end
up spinning on the lock in one of the anon_vma's shared by all the
tasks.  Quite easy to reproduce.  I have also seen this running stress
tests to force reclaim under Dave Anderson's "usex" exerciser--e.g.,
testing the split LRU and noreclaim patches--even with the reader-writer
lock patch. 

I've seen the lockups on the i_mmap_lock running Oracle workloads on our
large servers.  This is running an OLTP workload with only a thousand or
so "clients" all running the same application image.   Again, when the
system attempts to reclaim we end up spinning on the i_mmap_lock of one
of the files [possibly the shared global shmem segment] shared by all
the applications.  I also see it with the usex stress load--also, with
and without this patch.  I think this is a more probable
scenario--thousands of processes sharing a single file, such as
libc.so--than thousands of processes all descended from a single
ancestor w/o exec'ing.

I keep these patches up to date for testing.  I don't have conclusive
evidence whether they alleviate or exacerbate the problem nor by how
much.  

Lee



Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2007-12-18 Thread KOSAKI Motohiro
Hi

> > rmap:  try_to_unmap_file() required new cond_resched_rwlock().
> > To reduce code duplication, I recast cond_resched_lock() as a
> > [static inline] wrapper around reworked cond_resched_lock() =>
> > __cond_resched_lock(void *lock, int type).
> > New cond_resched_rwlock() implemented as another wrapper.
> 
> Reader/writer locks really suck in terms of fairness and starvation,
> especially when the read-side is common and frequent. (also, single
> threaded performance of the read-side is worse).

Agreed.

rwlocks can get bad performance in some cases (especially on machines
with many CPUs).

If many CPUs keep taking the read lock on and off on a big system,
then at least one CPU is always holding the read lock, and the CPU
waiting for the write lock never gets it.

Therefore, rwlocks often become a performance weakness under stress.
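
A sketch of a stress test for exactly that pattern (illustrative only;
no testcase was posted in the thread) would run overlapping reader
threads while one writer times how long it waits:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
static volatile int stop;

static void *reader(void *arg)
{
	volatile int spin;

	while (!stop) {
		pthread_rwlock_rdlock(&rw);
		for (spin = 0; spin < 1000; spin++)
			;	/* hold briefly so reader hold times overlap */
		pthread_rwlock_unlock(&rw);
	}
	return NULL;
}

int main(void)
{
	pthread_t r[16];
	struct timespec t0, t1;
	int i;

	for (i = 0; i < 16; i++)
		pthread_create(&r[i], NULL, reader, NULL);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	pthread_rwlock_wrlock(&rw);	/* how long until the readers drain? */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	pthread_rwlock_unlock(&rw);
	stop = 1;

	for (i = 0; i < 16; i++)
		pthread_join(r[i], NULL);
	printf("writer waited %.3f ms\n",
	       (t1.tv_sec - t0.tv_sec) * 1e3 +
	       (t1.tv_nsec - t0.tv_nsec) / 1e6);
	return 0;
}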


I want to know the testcase for this patch and run it.
Do you have one?


/kosaki




Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2007-12-18 Thread Nick Piggin
On Wednesday 19 December 2007 08:15, Rik van Riel wrote:
> I have seen soft cpu lockups in page_referenced_file() due to
> contention on i_mmap_lock() for different pages.  Making the
> i_mmap_lock a reader/writer lock should increase parallelism
> in vmscan for file-backed pages mapped into many address spaces.
>
> Read lock the i_mmap_lock for all usage except:
>
> 1) mmap/munmap:  linking vma into i_mmap prio_tree or removing
> 2) unmap_mapping_range:   protecting vm_truncate_count
>
> rmap:  try_to_unmap_file() required new cond_resched_rwlock().
> To reduce code duplication, I recast cond_resched_lock() as a
> [static inline] wrapper around reworked cond_resched_lock() =>
> __cond_resched_lock(void *lock, int type).
> New cond_resched_rwlock() implemented as another wrapper.

Reader/writer locks really suck in terms of fairness and starvation,
especially when the read-side is common and frequent. (also, single
threaded performance of the read-side is worse).

I know Lee saw some big latencies on the anon_vma list lock when
running (IIRC) a large benchmark... but are there more realistic
situations where this is a problem?


[patch 02/20] make the inode i_mmap_lock a reader/writer lock

2007-12-18 Thread Rik van Riel
I have seen soft cpu lockups in page_referenced_file() due to 
contention on i_mmap_lock() for different pages.  Making the
i_mmap_lock a reader/writer lock should increase parallelism
in vmscan for file-backed pages mapped into many address spaces.

Read lock the i_mmap_lock for all usage except:

1) mmap/munmap:  linking vma into i_mmap prio_tree or removing
2) unmap_mapping_range:   protecting vm_truncate_count

rmap:  try_to_unmap_file() required new cond_resched_rwlock().
To reduce code duplication, I recast cond_resched_lock() as a
[static inline] wrapper around reworked cond_resched_lock() =>
__cond_resched_lock(void *lock, int type). 
New cond_resched_rwlock() implemented as another wrapper.  
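
The shape of that wrapper, sketched from the description above (the
kernel/sched.c hunk itself is not included in the excerpt below; the
TYPE_* names are illustrative):

/* lock "type" tags for __cond_resched_lock(); names are illustrative */
enum { TYPE_SPIN, TYPE_RW };

static int __cond_resched_lock(void *lock, int type)
{
	int resched = need_resched();

	if (resched) {
		if (type == TYPE_SPIN) {
			spin_unlock((spinlock_t *)lock);
			cond_resched();
			spin_lock((spinlock_t *)lock);
		} else {
			read_unlock((rwlock_t *)lock);
			cond_resched();
			read_lock((rwlock_t *)lock);
		}
	}
	return resched;
}

static inline int cond_resched_lock(spinlock_t *lock)
{
	return __cond_resched_lock(lock, TYPE_SPIN);
}

static inline int cond_resched_rwlock(rwlock_t *lock)
{
	return __cond_resched_lock(lock, TYPE_RW);
}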


Signed-off-by:  Lee Schermerhorn <[EMAIL PROTECTED]>
Signed-off-by:  Rik van Riel <[EMAIL PROTECTED]>

Index: linux-2.6.24-rc5-mm1/include/linux/fs.h
===
--- linux-2.6.24-rc5-mm1.orig/include/linux/fs.h
+++ linux-2.6.24-rc5-mm1/include/linux/fs.h
@@ -501,7 +501,7 @@ struct address_space {
 	unsigned int		i_mmap_writable;/* count VM_SHARED mappings */
 	struct prio_tree_root	i_mmap;		/* tree of private and shared mappings */
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
-	spinlock_t		i_mmap_lock;	/* protect tree, count, list */
+	rwlock_t		i_mmap_lock;	/* protect tree, count, list */
 	unsigned int		truncate_count;	/* Cover race condition with truncate */
 	unsigned long		nrpages;	/* number of total pages */
 	pgoff_t			writeback_index;/* writeback starts here */
Index: linux-2.6.24-rc5-mm1/include/linux/mm.h
===
--- linux-2.6.24-rc5-mm1.orig/include/linux/mm.h
+++ linux-2.6.24-rc5-mm1/include/linux/mm.h
@@ -707,7 +707,7 @@ struct zap_details {
 	struct address_space *check_mapping;	/* Check page->mapping if set */
 	pgoff_t	first_index;			/* Lowest page->index to unmap */
 	pgoff_t	last_index;			/* Highest page->index to unmap */
-	spinlock_t *i_mmap_lock;		/* For unmap_mapping_range: */
+	rwlock_t *i_mmap_lock;			/* For unmap_mapping_range: */
 	unsigned long truncate_count;		/* Compare vm_truncate_count */
 };
 
Index: linux-2.6.24-rc5-mm1/fs/inode.c
===
--- linux-2.6.24-rc5-mm1.orig/fs/inode.c
+++ linux-2.6.24-rc5-mm1/fs/inode.c
@@ -210,7 +210,7 @@ void inode_init_once(struct inode *inode
 	INIT_LIST_HEAD(&inode->i_devices);
 	INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC);
 	rwlock_init(&inode->i_data.tree_lock);
-	spin_lock_init(&inode->i_data.i_mmap_lock);
+	rwlock_init(&inode->i_data.i_mmap_lock);
 	INIT_LIST_HEAD(&inode->i_data.private_list);
 	spin_lock_init(&inode->i_data.private_lock);
 	INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
Index: linux-2.6.24-rc5-mm1/fs/hugetlbfs/inode.c
===
--- linux-2.6.24-rc5-mm1.orig/fs/hugetlbfs/inode.c
+++ linux-2.6.24-rc5-mm1/fs/hugetlbfs/inode.c
@@ -420,6 +420,9 @@ static void hugetlbfs_drop_inode(struct 
 	hugetlbfs_forget_inode(inode);
 }
 
+/*
+ * LOCKING:  __unmap_hugepage_range() requires write lock on i_mmap_lock
+ */
 static inline void
 hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff)
 {
@@ -454,10 +457,10 @@ static int hugetlb_vmtruncate(struct ino
 	pgoff = offset >> PAGE_SHIFT;

 	i_size_write(inode, offset);
-	spin_lock(&mapping->i_mmap_lock);
+	write_lock(&mapping->i_mmap_lock);
 	if (!prio_tree_empty(&mapping->i_mmap))
 		hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
-	spin_unlock(&mapping->i_mmap_lock);
+	write_unlock(&mapping->i_mmap_lock);
 	truncate_hugepages(inode, offset);
 	return 0;
 }
Index: linux-2.6.24-rc5-mm1/kernel/fork.c
===
--- linux-2.6.24-rc5-mm1.orig/kernel/fork.c
+++ linux-2.6.24-rc5-mm1/kernel/fork.c
@@ -272,12 +272,12 @@ static int dup_mmap(struct mm_struct *mm
 			atomic_dec(&inode->i_writecount);

 		/* insert tmp into the share list, just after mpnt */
-		spin_lock(&file->f_mapping->i_mmap_lock);
+		write_lock(&file->f_mapping->i_mmap_lock);
 		tmp->vm_truncate_count = mpnt->vm_truncate_count;
 		flush_dcache_mmap_lock(file->f_mapping);
 		vma_prio_tree_add(tmp, mpnt);
 		flush_dcache_mmap_unlock(file->f_mapping);
-		spin_unlock(&file->f_mapping->i_mmap_lock);
+		write_unlock(&file->f_mapping->i_mmap_lock);
 		}
 
/*
Index: linux-2.6.24-rc5-mm1/mm/filemap_xip.c
