Re: [PATCH 0/7] breaking the global file_list_lock

2007-01-29 Thread Christoph Hellwig
On Mon, Jan 29, 2007 at 08:32:53AM -0500, Stephen Smalley wrote:
> >  - fs/selinuxfs.c:sel_remove_bools()
> > 
> > Utter madness.  I have no idea how this ever got merged.
> > Maybe the selinux folks can explain what crack they were
> > on when writing this.  The problem would go away with
> > a generic revoke infrastructure.
> 
> It was modeled after proc_kill_inodes(), as noted in the comments.  So
> if you have a suitable replacement for proc_kill_inodes(), you can apply
> the same approach to sel_remove_bools().

Looking at it again, sel_remove_bools actually only operates on files
backed by selinuxfs, so yes, we could use the same approach.
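
For reference, the "private list inside the filesystem" approach being agreed
on here could look roughly like the sketch below: the filesystem tracks its
own open files on a private list under a private lock, so a
sel_remove_bools()/proc_kill_inodes()-style pass never has to walk
sb->s_files. The identifiers (sel_open_files, sel_file_lock, sel_track_file)
are made up for illustration; this is not the code that was eventually merged.

  /*
   * Rough sketch only: a filesystem keeps its own list of open files,
   * so a "invalidate all open files" pass never touches sb->s_files.
   * All identifiers here are illustrative, not the merged implementation.
   */
  #include <linux/fs.h>
  #include <linux/list.h>
  #include <linux/spinlock.h>

  static LIST_HEAD(sel_open_files);          /* hypothetical private list */
  static DEFINE_SPINLOCK(sel_file_lock);     /* protects sel_open_files */

  struct sel_file {
      struct list_head list;
      struct file *file;
  };

  /* called from the filesystem's open() path */
  static void sel_track_file(struct sel_file *sf, struct file *file)
  {
      sf->file = file;
      spin_lock(&sel_file_lock);
      list_add(&sf->list, &sel_open_files);
      spin_unlock(&sel_file_lock);
  }

  /* a sel_remove_bools()-style pass walks only this private list */
  static void sel_invalidate_open_files(void)
  {
      struct sel_file *sf;

      spin_lock(&sel_file_lock);
      list_for_each_entry(sf, &sel_open_files, list) {
          /* mark/refresh sf->file as the bools update requires */
      }
      spin_unlock(&sel_file_lock);
  }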


Re: [PATCH 0/7] breaking the global file_list_lock

2007-01-29 Thread Stephen Smalley
On Sun, 2007-01-28 at 15:11 +, Christoph Hellwig wrote:
> On Sun, Jan 28, 2007 at 02:43:25PM +, Christoph Hellwig wrote:
> > On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> > > This patch-set breaks up the global file_list_lock which was found to be a
> > > severe contention point under basically any filesystem intensive workload.
> > 
> > Benchmarks, please.  Where exactly do you see contention for this?
> > 
> > 
> > filesystem intensive workload apparently means namespace operation heavy
> > workload, right?  The biggest bottleneck I've seen with those is dcache 
> > lock.
> > 
> > Even if this is becoming a real problem there must be simpler ways to fix
> > this than introducing various new locking primitives and all kinds of
> > complexity.
> 
> One good way to fix scalability without all this braindamage is
> to get rid of sb->s_files.  Current uses are:
> 
>  - fs/dquot.c:add_dquot_ref()
> 
>   This performs its actual operation on inodes.  We should
>   be able to check inode->i_writecount to see which inodes
>   need quota initialization.
> 
>  - fs/file_table.c:fs_may_remount_ro()
> 
>   This one is gone in Dave Hansen's per-mountpoint r/o patchkit
> 
>  - fs/proc/generic.c:proc_kill_inodes()
> 
>   This can be done with a list inside procfs.
> 
>  - fs/super.c:mark_files_ro()
> 
>   This one is only used for do_emergency_remount(), which is
>   an utter hack.  It might be much better to just deny any
>   kind of write access through a superblock flag here.
> 
>  - fs/selinuxfs.c:sel_remove_bools()
> 
>   Utter madness.  I have no idea how this ever got merged.
>   Maybe the selinux folks can explain what crack they were
>   on when writing this.  The problem would go away with
>   a generic revoke infrastructure.

It was modeled after proc_kill_inodes(), as noted in the comments.  So
if you have a suitable replacement for proc_kill_inodes(), you can apply
the same approach to sel_remove_bools().

> 
>  Once sb->s_files is gone we can also kill off fu_list entirely and
> replace it by a list head in the tty code and make the lock
> for it per-tty.

-- 
Stephen Smalley
National Security Agency



Re: [PATCH 0/7] breaking the global file_list_lock

2007-01-28 Thread Ingo Molnar

* Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> Thanks for having some numbers to start with.

you are welcome.

> > Any rational explanation?
> 
> Can we please stop this stupid pissing contest.  I'm totally fine to 
> admit yours is bigger than mine in public, so let's get back to the 
> facts.

sure. Could we begin with you not making derogatory, belittling comments 
like:

  <hch> so -rt folks also restart the lock brigade crap :P

and:

 * hch hands Peter Zijlstra the mister braindead locking madness award

but to show a little bit of .. respect and goodwill towards the work of 
others? Those comments might sound funny to you, but they are offensive 
to me.

I'll readily admit that the patches are complex and even ugly, but the 
problem is complex and ugly to begin with. /Finally/ someone comes along 
and tries to sort out the mess that YOU (amongst others, like me) have 
not fixed for years, and you welcome his (hard!) efforts with ridicule?

You are also NAK-ing the same patches on lkml for clearly bogus reasons: 
for example, your 'this is lockdep internals' comment was off the mark by 
far. So this isn't just colorful commentary, it is also actual prejudice 
that is irrational and harmful.

Linux will take 100 years to break into the mainstream if this attitude 
prevails. And I think this is a far bigger problem than pretty much 
/any/ technological problem we have at the moment, so I will continue to 
be upset about it, no matter how uncomfortable it is to finally face 
this and no matter how ugly such flames might look externally.

Or if you really cannot talk about hard problems without insulting 
others, then /just don't do it/. Nobody is forcing you to do this if you 
cannot find any fun in it, really.

Ingo


Re: [PATCH 0/7] breaking the global file_list_lock

2007-01-28 Thread Christoph Hellwig
On Sun, Jan 28, 2007 at 07:41:16PM +0100, Ingo Molnar wrote:
> starts one process per CPU and open()s/close()s a file all over again, 
> simulating an open/close-intense workload. This pattern is quite typical 
> of several key Linux applications.
> 
> Using Peter's s_files patchset the following scalability improvement can 
> be measured (lower numbers are better):
> 
>  ------------------------------------------------------------------------
>                v2.6.20-rc6      |   v2.6.20-rc6 + Peter's s_files queue
>  ------------------------------------------------------------------------
>  dual-core:    2.11 usecs/op    |   1.51 usecs/op (  +39.7% win )
>  8-socket:     6.30 usecs/op    |   2.70 usecs/op ( +233.3% win )

Thanks for having some numbers to start with.

> Now could you please tell me why I had to waste 3.5 hours on measuring 
> and profiling this /again/, while a tiny little bit of goodwill from 
> your side could have avoided this? I told you that we lock-profiled this 
> under -rt, and that it's an accurate measurement of such things - as the 
> numbers above prove it too. Would it have been so hard to say something 
> like: "Cool Peter! That lock had been in our way of good open()/close() 
> scalability for such a long time and it's an obviously good idea to 
> eliminate it. Now here's a couple of suggestions of how to do it even 
> simpler: [...]." Why did you have to in essence piss on his patchset? 
> Any rational explanation?

Can we please stop this stupid pissing contest.  I'm totally fine to admit
yours is bigger than mine in public, so let's get back to the facts.


The patchkit we're discussing here introduces a lot of complexity:

 - a new type of implicitly locked linked lists
 - a new synchronization primitive
 - a new locking scheme that utilizes the previous two items, as well
   as RCU.

I think we definitely want some numbers (which you finally provided)
to justify this.

Then, going on to the implementation, I don't like trying to "fix" a problem
with this big-hammer approach.  I've outlined some alternate ways that
actually simplify both the underlying data structures and the locking, which
should help with this problem instead of making the code more complex
and really hard to understand.


Re: [PATCH 0/7] breaking the global file_list_lock

2007-01-28 Thread Ingo Molnar

* Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> > This patch-set breaks up the global file_list_lock which was found to be a
> > severe contention point under basically any filesystem intensive workload.
> 
> Benchmarks, please.  Where exactly do you see contention for this?

the following very simple workload:

   http://redhat.com/~mingo/file-ops-test/file-ops-test.c

starts one process per CPU and open()s/close()s a file all over again, 
simulating an open/close-intense workload. This pattern is quite typical 
of several key Linux applications.
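
The file-ops-test.c program itself is not reproduced in this archive; as a
rough idea of what such a test does (one forked worker per CPU, open()/close()
in a tight loop, the cost reported in usecs/op), a minimal user-space sketch
is below. This is an illustration only, not Ingo's actual program.

  /* Minimal open/close microbenchmark sketch -- not Ingo's file-ops-test.c. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/time.h>
  #include <sys/wait.h>
  #include <unistd.h>

  #define LOOPS 1000000

  int main(int argc, char **argv)
  {
      int nworkers = argc > 1 ? atoi(argv[1]) : 2;   /* one per CPU */

      for (int i = 0; i < nworkers; i++) {
          if (fork() == 0) {
              struct timeval t0, t1;
              char name[64];

              snprintf(name, sizeof(name), "/tmp/fop-test-%d", i);
              gettimeofday(&t0, NULL);
              for (int j = 0; j < LOOPS; j++) {
                  int fd = open(name, O_RDWR | O_CREAT, 0644);
                  close(fd);
              }
              gettimeofday(&t1, NULL);
              double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                          (t1.tv_usec - t0.tv_usec);
              printf("worker %d: %.2f usecs/op\n", i, us / LOOPS);
              exit(0);
          }
      }
      while (wait(NULL) > 0)
          ;
      return 0;
  }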

Using Peter's s_files patchset the following scalability improvement can 
be measured (lower numbers are better):

 ------------------------------------------------------------------------
               v2.6.20-rc6      |   v2.6.20-rc6 + Peter's s_files queue
 ------------------------------------------------------------------------
 dual-core:    2.11 usecs/op    |   1.51 usecs/op (  +39.7% win )
 8-socket:     6.30 usecs/op    |   2.70 usecs/op ( +233.3% win )

[ I'd expect a 64-way box to improve its open/close scalability 
  dramatically, by a factor of 10x or 20x. On a 1024-CPU (or 
  4096-CPU) system the effects are very likely even more dramatic. ]

Why does this patch make such a difference? Not because the critical 
section is in any way contended - it isn't, we only do a simple list 
operation there. But this lock is touched in every sys_open() and 
sys_close() system call, so it is a very frequently accessed cacheline. 
The cost is there due to the global cacheline ping-pong of files_lock. 
Furthermore, in this very important VFS codepath this is the 'last' 
global cacheline that got eliminated, hence all the scalability benefits 
(of previous improvements) get reaped all at once.
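
For readers without the 2.6.20 tree at hand, the pattern being described is
roughly the following (a paraphrase of the fs/file_table.c helpers of that
era, not a verbatim quote): every open() puts the new struct file on a list
and every close() takes it off again, both under the single global
files_lock, so that one cacheline bounces between all CPUs doing
open()/close().

  /* Paraphrase of the 2.6.20-era pattern, not a verbatim kernel quote. */
  #include <linux/fs.h>
  #include <linux/list.h>
  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(files_lock);          /* the one global lock */

  void file_move(struct file *file, struct list_head *list)
  {
      if (!list)
          return;
      spin_lock(&files_lock);                  /* hit on every open() */
      list_move(&file->f_u.fu_list, list);
      spin_unlock(&files_lock);
  }

  void file_kill(struct file *file)
  {
      if (!list_empty(&file->f_u.fu_list)) {
          spin_lock(&files_lock);              /* and on every close() */
          list_del_init(&file->f_u.fu_list);
          spin_unlock(&files_lock);
      }
  }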

Now could you please tell me why I had to waste 3.5 hours on measuring 
and profiling this /again/, while a tiny little bit of goodwill from 
your side could have avoided this? I told you that we lock-profiled this 
under -rt, and that it's an accurate measurement of such things - as the 
numbers above prove it too. Would it have been so hard to say something 
like: "Cool Peter! That lock had been in our way of good open()/close() 
scalability for such a long time and it's an obviously good idea to 
eliminate it. Now here's a couple of suggestions of how to do it even 
simpler: [...]." Why did you have to in essence piss on his patchset? 
Any rational explanation?

Ingo


Re: [PATCH 0/7] breaking the global file_list_lock

2007-01-28 Thread Martin J. Bligh

Ingo Molnar wrote:

> * Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> 
> > On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> > > This patch-set breaks up the global file_list_lock which was found 
> > > to be a severe contention point under basically any filesystem 
> > > intensive workload.
> > 
> > Benchmarks, please.  Where exactly do you see contention for this?
> 
> it's the most contended spinlock we have during a parallel kernel 
> compile on an 8-way system. But it's pretty common-sense as well, 
> without doing any measurements, it's basically the only global lock left 
> in just about every VFS workload that doesn't involve massive amounts of 
> dentries created/removed (which is still dominated by the dcache_lock).
> 
> > filesystem intensive workload apparently means namespace operation 
> > heavy workload, right?  The biggest bottleneck I've seen with those is 
> > dcache lock.
> 
> the dcache lock is not a problem during kernel compiles. (its 
> rcu-ification works nicely in that workload)


Mmm, not wholly convinced that's true. Whilst I don't have lockmeter
stats to hand, the heavy time in __d_lookup seems to indicate to me that
we may still have a problem. I guess we could move the spinlocks out
of line again to test this fairly easily (or get lockmeter upstream).

114076 total  0.0545
 57766 default_idle 916.9206
 11869 prep_new_page 49.4542
  3830 find_trylock_page 67.1930
  2637 zap_pte_range  3.9125
  2486 strnlen_user  54.0435
  2018 do_page_fault  1.1941
  1940 do_wp_page 1.6973
  1869 __d_lookup 7.7231
  1331 page_remove_rmap   5.2196
  1287 do_no_page 1.6108
  1272 buffered_rmqueue   4.6423
  1160 __copy_to_user_ll 14.6835
  1027 _atomic_dec_and_lock  11.1630
   655 release_pages  1.9670
   644 do_path_lookup 1.6304
   630 schedule   0.4046
   617 kunmap_atomic  7.7125
   573 __handle_mm_fault  0.7365
   548 free_hot_page 78.2857
   500 __copy_user_intel  3.3784
   483 copy_pte_range 0.5941
   482 page_address   2.9571
   478 file_move  9.1923
   441 do_anonymous_page  0.7424
   429 filemap_nopage 0.4450
   401 anon_vma_unlink4.8902


Re: [PATCH 0/7] breaking the global file_list_lock

2007-01-28 Thread Christoph Hellwig
On Sun, Jan 28, 2007 at 04:29:04PM +0100, Peter Zijlstra wrote:
> I shall pursue this direction. Thanks for the information.

Btw, if you need this short-term there might be an even simpler
solution:

  - all files go on sb->s_list and stay there until the file is freed
  - ttys grow their private list instead of being mixed up with the
per-sb list

This already allows nice splitting of the locks into per-sb and per-tty
(although all tty_files accesses might be protected by tty_mutex anyway).
Especially for the superblock an RCU conversion might be quite easy
if necessary, but all traversals are in what I'd consider more or less
slow paths anyway.
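
A rough sketch of that split, with made-up field and lock names (the real
struct super_block / struct tty_struct members would differ): each superblock
protects its own file list with its own lock, and tty-backed files live on a
private per-tty list instead of the per-sb one.

  /* Illustrative sketch of the per-sb / per-tty split; names are made up. */
  #include <linux/list.h>
  #include <linux/spinlock.h>

  struct sb_files {                /* stand-in for struct super_block additions */
      spinlock_t lock;             /* per-sb, replaces the global files_lock */
      struct list_head files;
  };

  struct tty_files {               /* stand-in for struct tty_struct additions */
      spinlock_t lock;             /* per-tty (or simply rely on tty_mutex) */
      struct list_head files;
  };

  /* ordinary open(): the file joins its superblock's list */
  static void sb_file_add(struct sb_files *sb, struct list_head *fu_list)
  {
      spin_lock(&sb->lock);
      list_add(fu_list, &sb->files);
      spin_unlock(&sb->lock);
  }

  /* tty open(): the file goes on the tty's private list instead */
  static void tty_file_add(struct tty_files *tty, struct list_head *fu_list)
  {
      spin_lock(&tty->lock);
      list_add(fu_list, &tty->files);
      spin_unlock(&tty->lock);
  }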

> 
---end quoted text---


Re: [PATCH 0/7] breaking the global file_list_lock

2007-01-28 Thread Peter Zijlstra
On Sun, 2007-01-28 at 15:11 +, Christoph Hellwig wrote:

> > Even if this is becoming a real problem there must be simpler ways to fix
> > this than introducing various new locking primitives and all kinds of
> > complexity.
> 
> One good way to fix scalability without all this braindamage is
> to get rid of sb->s_files.  Current uses are:
> 
>  - fs/dquot.c:add_dquot_ref()
> 
>   This performs its actual operation on inodes.  We should
>   be able to check inode->i_writecount to see which inodes
>   need quota initialization.
> 
>  - fs/file_table.c:fs_may_remount_ro()
> 
>   This one is gone in Dave Hansen's per-mountpoint r/o patchkit
> 
>  - fs/proc/generic.c:proc_kill_inodes()
> 
>   This can be done with a list inside procfs.
> 
>  - fs/super.c:mark_files_ro()
> 
>   This one is only used for do_emergency_remount(), which is
>   an utter hack.  It might be much better to just deny any
>   kind of write access through a superblock flag here.
> 
>  - fs/selinuxfs.c:sel_remove_bools()
> 
>   Utter madness.  I have no idea how this ever got merged.
>   Maybe the selinux folks can explain what crack they were
>   on when writing this.  The problem would go away with
>   a generic revoke infrastructure.
> 
>  Once sb->s_files is gone we can also kill off fu_list entirely and
> replace it by a list head in the tty code and make the lock
> for it per-tty.

I shall pursue this direction. Thanks for the information.



Re: [PATCH 0/7] breaking the global file_list_lock

2007-01-28 Thread Ingo Molnar

* Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> > This patch-set breaks up the global file_list_lock which was found 
> > to be a severe contention point under basically any filesystem 
> > intensive workload.
> 
> Benchmarks, please.  Where exactly do you see contention for this?

it's the most contended spinlock we have during a parallel kernel 
compile on an 8-way system. But it's pretty common-sense as well, 
without doing any measurements, it's basically the only global lock left 
in just about every VFS workload that doesn't involve massive amounts of 
dentries created/removed (which is still dominated by the dcache_lock).

> filesystem intensive workload apparently means namespace operation 
> heavy workload, right?  The biggest bottleneck I've seen with those is 
> dcache lock.

the dcache lock is not a problem during kernel compiles. (its 
rcu-ification works nicely in that workload)

Ingo


Re: [PATCH 0/7] breaking the global file_list_lock

2007-01-28 Thread Christoph Hellwig
On Sun, Jan 28, 2007 at 02:43:25PM +, Christoph Hellwig wrote:
> On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> > This patch-set breaks up the global file_list_lock which was found to be a
> > severe contention point under basically any filesystem intensive workload.
> 
> Benchmarks, please.  Where exactly do you see contention for this?
> 
> 
> filesystem intensive workload apparently means namespace operation heavy
> workload, right?  The biggest bottleneck I've seen with those is dcache lock.
> 
> Even if this is becoming a real problem there must be simpler ways to fix
> this than introducing various new locking primitives and all kinds of
> complexity.

One good way to fix scalability without all this braindamage is
to get rid of sb->s_files.  Current uses are:

 - fs/dquot.c:add_dquot_ref()

This performs its actual operation on inodes.  We should
be able to check inode->i_writecount to see which inodes
need quota initialization (a sketch of this check follows below).

 - fs/file_table.c:fs_may_remount_ro()

This one is gone in Dave Hansen's per-mountpoint r/o patchkit

 - fs/proc/generic.c:proc_kill_inodes()

This can be done with a list inside procfs.

 - fs/super.c:mark_files_ro()

This one is only used for do_emergency_remount(), which is
an utter hack.  It might be much better to just deny any
kind of write access through a superblock flag here.

 - fs/selinuxfs.c:sel_remove_bools()

Utter madness.  I have no idea how this ever got merged.
Maybe the selinux folks can explain what crack they were
on when writing this.  The problem would go away with
a generic revoke infrastructure.

Once sb->s_files is gone we can also kill off fu_list entirely and
replace it by a list head in the tty code and make the lock
for it per-tty.
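
As flagged above, here is a rough sketch of the i_writecount idea for
add_dquot_ref(): walk the superblock's inode list instead of sb->s_files and
only touch inodes that are currently open for writing. It assumes the
2.6.20-era sb->s_inodes list and global inode_lock; it is an illustration,
not the actual quota code.

  /* Sketch only: not the real fs/dquot.c code. */
  #include <linux/fs.h>
  #include <linux/list.h>
  #include <linux/spinlock.h>

  static void add_dquot_ref_sketch(struct super_block *sb, int type)
  {
      struct inode *inode;

      spin_lock(&inode_lock);
      list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
          if (atomic_read(&inode->i_writecount) <= 0)
              continue;        /* not open for write: no quota init needed */
          /*
           * Initialize quotas of the given type for this inode here
           * (grab a reference, drop inode_lock, do the dquot
           * initialization, retake the lock), as a real implementation
           * would have to.
           */
      }
      spin_unlock(&inode_lock);
  }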


Re: [PATCH 0/7] breaking the global file_list_lock

2007-01-28 Thread Christoph Hellwig
On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> This patch-set breaks up the global file_list_lock which was found to be a
> severe contention point under basically any filesystem intensive workload.

Benchmarks, please.  Where exactly do you see contention for this?


filesystem intensive workload apparently means namespace operation heavy
workload, right?  The biggest bottleneck I've seen with those is dcache lock.

Even if this is becoming a real problem there must be simpler ways to fix
this than introducing various new locking primitives and all kinds of
complexity.



[PATCH 0/7] breaking the global file_list_lock

2007-01-28 Thread Peter Zijlstra
This patch-set breaks up the global file_list_lock which was found to be a
severe contention point under basically any filesystem intensive workload.

It has been part of the -rt kernel for some time now and is deemed solid and
useful enough to post in its own right. This contention should also occur on
the large SMP machines.

Feedback would be appreciated, especially with regard to the extra workqueue
that was added to flush the per-CPU lists. Is there an alternative approach?
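
For readers without the patches in front of them, the general shape of the
pattern being asked about (additions go to a per-CPU list on the fast path,
and a work item later folds those lists back into the shared one) is sketched
below. This is a generic illustration of the idea, not the code from the -rt
queue, and all names are made up.

  /* Generic per-CPU list + workqueue flush pattern; illustration only. */
  #include <linux/list.h>
  #include <linux/percpu.h>
  #include <linux/spinlock.h>
  #include <linux/workqueue.h>

  static DEFINE_PER_CPU(struct list_head, pcpu_files);
  static LIST_HEAD(global_files);
  static DEFINE_SPINLOCK(global_files_lock);

  static void __init pcpu_files_init(void)
  {
      int cpu;

      for_each_possible_cpu(cpu)
          INIT_LIST_HEAD(&per_cpu(pcpu_files, cpu));
  }

  /* fast path: add to this CPU's private list, no global lock taken */
  static void pcpu_file_add(struct list_head *entry)
  {
      list_add(entry, &get_cpu_var(pcpu_files));
      put_cpu_var(pcpu_files);
  }

  /*
   * slow path: each CPU folds its own list into the global one, e.g. via
   * schedule_on_each_cpu(pcpu_files_flush) from whoever needs the full view.
   */
  static void pcpu_files_flush(struct work_struct *unused)
  {
      struct list_head *local = &get_cpu_var(pcpu_files);

      spin_lock(&global_files_lock);
      list_splice_init(local, &global_files);
      spin_unlock(&global_files_lock);
      put_cpu_var(pcpu_files);
  }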


