Re: [PATCH 0/7] breaking the global file_list_lock
On Mon, Jan 29, 2007 at 08:32:53AM -0500, Stephen Smalley wrote:
> > - fs/selinuxfs.c:sel_remove_bools()
> >
> >   Utter madness.  I have no idea how this ever got merged.
> >   Maybe the selinux folks can explain what crack they were
> >   on when writing this.  The problem would go away with
> >   a generic revoke infrastructure.
>
> It was modeled after proc_kill_inodes(), as noted in the comments.  So
> if you have a suitable replacement for proc_kill_inodes(), you can
> apply the same approach to sel_remove_bools().

Looking at it again, sel_remove_bools() actually only operates on files
backed by selinuxfs, so yes, we could use the same approach.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH 0/7] breaking the global file_list_lock
On Sun, 2007-01-28 at 15:11 +0000, Christoph Hellwig wrote:
> On Sun, Jan 28, 2007 at 02:43:25PM +0000, Christoph Hellwig wrote:
> > On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> > > This patch-set breaks up the global file_list_lock which was found to be a
> > > severe contention point under basically any filesystem intensive workload.
> >
> > Benchmarks, please.  Where exactly do you see contention for this?
> >
> > filesystem intensive workload apparently means namespace operation heavy
> > workload, right?  The biggest bottleneck I've seen with those is the
> > dcache lock.
> >
> > Even if this is becoming a real problem there must be simpler ways to fix
> > this than introducing various new locking primitives and all kinds of
> > complexity.
>
> One good way to fix scalability without all this braindamage is
> to get rid of sb->s_files.  Current uses are:
>
>  - fs/dquot.c:add_dquot_ref()
>
>    This performs its actual operation on inodes.  We should
>    be able to check inode->i_writecount to see which inodes
>    need quota initialization.
>
>  - fs/file_table.c:fs_may_remount_ro()
>
>    This one is gone in Dave Hansen's per-mountpoint r/o patchkit.
>
>  - fs/proc/generic.c:proc_kill_inodes()
>
>    This can be done with a list inside procfs.
>
>  - fs/super.c:mark_files_ro()
>
>    This one is only used for do_emergency_remount(), which is
>    an utter hack.  It might be much better to just deny any
>    kind of write access through a superblock flag here.
>
>  - fs/selinuxfs.c:sel_remove_bools()
>
>    Utter madness.  I have no idea how this ever got merged.
>    Maybe the selinux folks can explain what crack they were
>    on when writing this.  The problem would go away with
>    a generic revoke infrastructure.

It was modeled after proc_kill_inodes(), as noted in the comments.  So
if you have a suitable replacement for proc_kill_inodes(), you can apply
the same approach to sel_remove_bools().

> Once sb->s_files is gone we can also kill off fu_list entirely and
> replace it by a list head entirely in the tty code and make the lock
> for it per-tty.

--
Stephen Smalley
National Security Agency
Re: [PATCH 0/7] breaking the global file_list_lock
* Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> Thanks for having some numbers to start with.

you are welcome.

> > Any rational explanation?
>
> Can we please stop this stupid pissing contest.  I'm totally fine to
> admit yours is bigger than mine in public, so let's get back to the
> facts.

sure.  Could we begin with you not making derogatory, belittling
comments like:

  "so -rt folks also restart the lock brigade crap :P"

and:

  "* hch hands Peter Zijlstra the mister braindead locking madness award"

but instead show a little bit of respect and goodwill towards the work
of others?  Those comments might sound funny to you, but they are
offensive to me.  I'll readily admit that the patches are complex and
even ugly, but the problem is complex and ugly to begin with.
/Finally/ someone comes along and tries to sort out the mess that YOU
(amongst others, like me) have not fixed for years, and you welcome his
(hard!) efforts with ridicule?

You are also NAK-ing the same patches on lkml for clearly bogus reasons:
for example your 'this is lockdep internals' comment was off the mark by
far.  So this isn't just colorful comments, it is also actual prejudice
that is irrational and harmful.

Linux will take 100 years to break into the mainstream if this attitude
prevails.  And i think this is a far bigger problem than pretty much
/any/ technological problem we have at the moment, so i will continue to
be upset about it, no matter how uncomfortable it is to finally face
this and no matter how ugly such flames might look externally.

Or if you really cannot talk about hard problems without insulting
others, then /just don't do it/.  Nobody is forcing you to do this if
you cannot find any fun in it, really.

	Ingo
Re: [PATCH 0/7] breaking the global file_list_lock
On Sun, Jan 28, 2007 at 07:41:16PM +0100, Ingo Molnar wrote:
> starts one process per CPU and open()s/close()s a file all over again,
> simulating an open/close-intense workload.  This pattern is quite
> typical of several key Linux applications.
>
> Using Peter's s_files patchset the following scalability improvement
> can be measured (lower numbers are better):
>
>   --------------------------------------------------------------------
>                v2.6.20-rc6     |  v2.6.20-rc6 + Peter's s_files queue
>   --------------------------------------------------------------------
>   dual-core:   2.11 usecs/op   |  1.51 usecs/op   (  +39.7% win )
>   8-socket:    6.30 usecs/op   |  2.70 usecs/op   ( +233.3% win )

Thanks for having some numbers to start with.

> Now could you please tell me why i had to waste 3.5 hours on measuring
> and profiling this /again/, while a tiny little bit of goodwill from
> your side could have avoided this?  I told you that we lock-profiled
> this under -rt, and that it's an accurate measurement of such things -
> as the numbers above prove it too.  Would it have been so hard to say
> something like: "Cool Peter!  That lock had been in our way of good
> open()/close() scalability for such a long time and it's an obviously
> good idea to eliminate it.  Now here's a couple of suggestions of how
> to do it even simpler: [...]."  Why did you have to in essence piss on
> his patchset?
>
> Any rational explanation?

Can we please stop this stupid pissing contest.  I'm totally fine to
admit yours is bigger than mine in public, so let's get back to the
facts.

The patchkit we're discussing here introduces a lot of complexity:

 - a new type of implicitly locked linked lists
 - a new synchronization primitive
 - a new locking scheme that utilizes the previous two items, as well
   as RCU

I think we definitely want some numbers (which you finally provided) to
justify this.

Then going on to the implementation, I don't like trying to "fix" a
problem with this big-hammer approach.  I've outlined some alternate
ways that actually simplify both the underlying data structures and the
locking, and that should help with this problem instead of making the
code more complex and really hard to understand.
Re: [PATCH 0/7] breaking the global file_list_lock
* Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> > This patch-set breaks up the global file_list_lock which was found to be a
> > severe contention point under basically any filesystem intensive workload.
>
> Benchmarks, please.  Where exactly do you see contention for this?

the following very simple workload:

  http://redhat.com/~mingo/file-ops-test/file-ops-test.c

starts one process per CPU and open()s/close()s a file all over again,
simulating an open/close-intense workload.  This pattern is quite
typical of several key Linux applications.

Using Peter's s_files patchset the following scalability improvement can
be measured (lower numbers are better):

  ----------------------------------------------------------------------
               v2.6.20-rc6     |  v2.6.20-rc6 + Peter's s_files queue
  ----------------------------------------------------------------------
  dual-core:   2.11 usecs/op   |  1.51 usecs/op   (  +39.7% win )
  8-socket:    6.30 usecs/op   |  2.70 usecs/op   ( +233.3% win )

[ i'd expect something like a 64-way box to improve its open/close
  scalability dramatically, factor 10x or 20x better.  On a 1024-CPU
  (or 4096-CPU) system the effects are very likely even more dramatic. ]

Why does this patch make such a difference?  Not because the critical
section is in any way contended - it isn't, we only do a simple list
operation there.  But this lock is touched in every sys_open() and
sys_close() system call, so it is a high-frequency accessed cacheline.
The cost is there due to the global cacheline ping-pong of files_lock.
Furthermore, in this very important VFS codepath this is the 'last'
global cacheline that got eliminated, hence all the scalability benefits
(of previous improvements) get reaped all at once.

Now could you please tell me why i had to waste 3.5 hours on measuring
and profiling this /again/, while a tiny little bit of goodwill from
your side could have avoided this?  I told you that we lock-profiled
this under -rt, and that it's an accurate measurement of such things -
as the numbers above prove it too.  Would it have been so hard to say
something like: "Cool Peter!  That lock had been in our way of good
open()/close() scalability for such a long time and it's an obviously
good idea to eliminate it.  Now here's a couple of suggestions of how to
do it even simpler: [...]."  Why did you have to in essence piss on his
patchset?

Any rational explanation?

	Ingo
Re: [PATCH 0/7] breaking the global file_list_lock
Ingo Molnar wrote:
> * Christoph Hellwig <[EMAIL PROTECTED]> wrote:
>
> > On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> > > This patch-set breaks up the global file_list_lock which was found to be a
> > > severe contention point under basically any filesystem intensive workload.
> >
> > Benchmarks, please.  Where exactly do you see contention for this?
>
> it's the most contended spinlock we have during a parallel kernel
> compile on an 8-way system.  But it's pretty common-sense as well,
> without doing any measurements: it's basically the only global lock
> left in just about every VFS workload that doesn't involve massive
> amounts of dentries created/removed (which is still dominated by the
> dcache_lock).
>
> > filesystem intensive workload apparently means namespace operation
> > heavy workload, right?  The biggest bottleneck I've seen with those
> > is the dcache lock.
>
> the dcache lock is not a problem during kernel compiles. (its
> rcu-ification works nicely in that workload)

Mmm, not wholly convinced that's true.  Whilst I don't have lockmeter
stats to hand, the heavy time in __d_lookup seems to indicate to me that
we may still have a problem.  I guess we could move the spinlocks out of
line again to test this fairly easily (or get lockmeter upstream).

 114076 total                      0.0545
  57766 default_idle             916.9206
  11869 prep_new_page             49.4542
   3830 find_trylock_page         67.1930
   2637 zap_pte_range              3.9125
   2486 strnlen_user              54.0435
   2018 do_page_fault              1.1941
   1940 do_wp_page                 1.6973
   1869 __d_lookup                 7.7231
   1331 page_remove_rmap           5.2196
   1287 do_no_page                 1.6108
   1272 buffered_rmqueue           4.6423
   1160 __copy_to_user_ll         14.6835
   1027 _atomic_dec_and_lock      11.1630
    655 release_pages              1.9670
    644 do_path_lookup             1.6304
    630 schedule                   0.4046
    617 kunmap_atomic              7.7125
    573 __handle_mm_fault          0.7365
    548 free_hot_page             78.2857
    500 __copy_user_intel          3.3784
    483 copy_pte_range             0.5941
    482 page_address               2.9571
    478 file_move                  9.1923
    441 do_anonymous_page          0.7424
    429 filemap_nopage             0.4450
    401 anon_vma_unlink            4.8902
Re: [PATCH 0/7] breaking the global file_list_lock
On Sun, Jan 28, 2007 at 04:29:04PM +0100, Peter Zijlstra wrote:
> I shall pursue this direction.  Thanks for the information.

Btw, if you need this short-term there might be an even simpler
solution:

 - all files go on sb->s_list and stay there until the file is freed
 - ttys grow their private list instead of being mixed up with the
   per-sb list

This already allows nice splitting of the locks into per-sb and per-tty
(although all tty_files accesses might be protected by tty_mutex
anyway).  Especially for the superblock an RCU conversion might be quite
easy if necessary, but all traversals are in what I'd consider more or
less slow paths anyway.
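Christoph's short-term suggestion - replace the one global
file_list_lock with a per-superblock list and lock, and give ttys a
private list - can be sketched in userspace C along these lines.  The
structures and function names (sb_add_file() etc.) are invented
stand-ins for illustration; the kernel's real list_head/spinlock code
differs.

```c
#include <pthread.h>

/* Toy doubly-linked list, mimicking the kernel's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *h) { h->prev = h->next = h; }

static void list_add(struct list_node *n, struct list_node *h)
{
	n->next = h->next; n->prev = h;
	h->next->prev = n; h->next = n;
}

static void list_del(struct list_node *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
	n->prev = n->next = n;
}

struct super_block_sim {
	pthread_mutex_t s_files_lock;	/* per-sb: was the global lock */
	struct list_node s_files;	/* files open against this sb */
};

struct file_sim {
	struct list_node fu_list;
	struct super_block_sim *sb;
};

void sb_init(struct super_block_sim *sb)
{
	pthread_mutex_init(&sb->s_files_lock, NULL);
	list_init(&sb->s_files);
}

/* open() path: contends only with opens on the *same* filesystem. */
void sb_add_file(struct super_block_sim *sb, struct file_sim *f)
{
	f->sb = sb;
	pthread_mutex_lock(&sb->s_files_lock);
	list_add(&f->fu_list, &sb->s_files);
	pthread_mutex_unlock(&sb->s_files_lock);
}

void sb_del_file(struct file_sim *f)
{
	pthread_mutex_lock(&f->sb->s_files_lock);
	list_del(&f->fu_list);
	pthread_mutex_unlock(&f->sb->s_files_lock);
}

int sb_count_files(struct super_block_sim *sb)
{
	struct list_node *p;
	int n = 0;

	pthread_mutex_lock(&sb->s_files_lock);
	for (p = sb->s_files.next; p != &sb->s_files; p = p->next)
		n++;
	pthread_mutex_unlock(&sb->s_files_lock);
	return n;
}
```

A per-tty list would look identical with the lock and list head hanging
off the tty structure instead; the point is that two processes opening
files on different filesystems no longer bounce one shared cacheline.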
Re: [PATCH 0/7] breaking the global file_list_lock
On Sun, 2007-01-28 at 15:11 +0000, Christoph Hellwig wrote:
> > Even if this is becoming a real problem there must be simpler ways to fix
> > this than introducing various new locking primitives and all kinds of
> > complexity.
>
> One good way to fix scalability without all this braindamage is
> to get rid of sb->s_files.  Current uses are:
>
>  - fs/dquot.c:add_dquot_ref()
>
>    This performs its actual operation on inodes.  We should
>    be able to check inode->i_writecount to see which inodes
>    need quota initialization.
>
>  - fs/file_table.c:fs_may_remount_ro()
>
>    This one is gone in Dave Hansen's per-mountpoint r/o patchkit.
>
>  - fs/proc/generic.c:proc_kill_inodes()
>
>    This can be done with a list inside procfs.
>
>  - fs/super.c:mark_files_ro()
>
>    This one is only used for do_emergency_remount(), which is
>    an utter hack.  It might be much better to just deny any
>    kind of write access through a superblock flag here.
>
>  - fs/selinuxfs.c:sel_remove_bools()
>
>    Utter madness.  I have no idea how this ever got merged.
>    Maybe the selinux folks can explain what crack they were
>    on when writing this.  The problem would go away with
>    a generic revoke infrastructure.
>
> Once sb->s_files is gone we can also kill off fu_list entirely and
> replace it by a list head entirely in the tty code and make the lock
> for it per-tty.

I shall pursue this direction.  Thanks for the information.
Re: [PATCH 0/7] breaking the global file_list_lock
* Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> > This patch-set breaks up the global file_list_lock which was found
> > to be a severe contention point under basically any filesystem
> > intensive workload.
>
> Benchmarks, please.  Where exactly do you see contention for this?

it's the most contended spinlock we have during a parallel kernel
compile on an 8-way system.  But it's pretty common-sense as well,
without doing any measurements: it's basically the only global lock left
in just about every VFS workload that doesn't involve massive amounts of
dentries created/removed (which is still dominated by the dcache_lock).

> filesystem intensive workload apparently means namespace operation
> heavy workload, right?  The biggest bottleneck I've seen with those is
> the dcache lock.

the dcache lock is not a problem during kernel compiles. (its
rcu-ification works nicely in that workload)

	Ingo
Re: [PATCH 0/7] breaking the global file_list_lock
On Sun, Jan 28, 2007 at 02:43:25PM +0000, Christoph Hellwig wrote:
> On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> > This patch-set breaks up the global file_list_lock which was found to be a
> > severe contention point under basically any filesystem intensive workload.
>
> Benchmarks, please.  Where exactly do you see contention for this?
>
> filesystem intensive workload apparently means namespace operation
> heavy workload, right?  The biggest bottleneck I've seen with those is
> the dcache lock.
>
> Even if this is becoming a real problem there must be simpler ways to
> fix this than introducing various new locking primitives and all kinds
> of complexity.

One good way to fix scalability without all this braindamage is to get
rid of sb->s_files.  Current uses are:

 - fs/dquot.c:add_dquot_ref()

   This performs its actual operation on inodes.  We should be able to
   check inode->i_writecount to see which inodes need quota
   initialization.

 - fs/file_table.c:fs_may_remount_ro()

   This one is gone in Dave Hansen's per-mountpoint r/o patchkit.

 - fs/proc/generic.c:proc_kill_inodes()

   This can be done with a list inside procfs.

 - fs/super.c:mark_files_ro()

   This one is only used for do_emergency_remount(), which is an utter
   hack.  It might be much better to just deny any kind of write access
   through a superblock flag here.

 - fs/selinuxfs.c:sel_remove_bools()

   Utter madness.  I have no idea how this ever got merged.  Maybe the
   selinux folks can explain what crack they were on when writing this.
   The problem would go away with a generic revoke infrastructure.

Once sb->s_files is gone we can also kill off fu_list entirely and
replace it by a list head entirely in the tty code and make the lock
for it per-tty.
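The add_dquot_ref() point above - walk the inodes and look at
inode->i_writecount instead of walking every open file on sb->s_files -
can be sketched in userspace C like this.  The structures and the
function name add_dquot_ref_sim() are simplified stand-ins, not the
real fs/dquot.c code.

```c
#include <stddef.h>

/* Minimal stand-in for the fields of struct inode that matter here. */
struct inode_sim {
	int i_writecount;	/* >0 while the inode is open for write */
	int quota_initialized;
	struct inode_sim *next;	/* per-sb inode list linkage */
};

/* Walk the per-sb inode list and initialize quota only for inodes that
 * are currently open for writing.  This finds the same inodes that a
 * walk over every open struct file on sb->s_files would find, without
 * ever touching the file list (or its lock). */
static void add_dquot_ref_sim(struct inode_sim *head)
{
	struct inode_sim *inode;

	for (inode = head; inode; inode = inode->next)
		if (inode->i_writecount > 0 && !inode->quota_initialized)
			inode->quota_initialized = 1;
}
```

The equivalence rests on the observation that only write-open files need
quota initialization, and those are exactly the inodes with a positive
i_writecount.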
Re: [PATCH 0/7] breaking the global file_list_lock
On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> This patch-set breaks up the global file_list_lock which was found to be a
> severe contention point under basically any filesystem intensive workload.

Benchmarks, please.  Where exactly do you see contention for this?

filesystem intensive workload apparently means namespace operation heavy
workload, right?  The biggest bottleneck I've seen with those is the
dcache lock.

Even if this is becoming a real problem there must be simpler ways to
fix this than introducing various new locking primitives and all kinds
of complexity.
[PATCH 0/7] breaking the global file_list_lock
This patch-set breaks up the global file_list_lock, which was found to
be a severe contention point under basically any filesystem-intensive
workload.  It has been part of the -rt kernel for some time now and is
deemed solid and useful enough to post in its own right.  This
contention should also occur on large SMP machines.

Feedback would be appreciated, especially with regard to the extra
workqueue that was added to flush per-CPU lists.  Is there an
alternative approach?
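The per-CPU-lists-plus-flush idea mentioned above can be modeled very
roughly in userspace C as follows.  Each CPU appends to its own private
list (the cheap fast path that avoids the global lock), and a flush step
- done by a workqueue in the real patches - splices the per-CPU lists
onto the global one under the global lock.  All names here are invented,
and this single-threaded model skips the preemption-disabling the real
per-CPU code needs.

```c
#include <pthread.h>

#define NR_CPUS_SIM 4

struct node { struct node *next; };

static struct node *percpu_list[NR_CPUS_SIM];	/* private per-CPU heads */
static struct node *global_list;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fast path: add to this CPU's private list; no shared cacheline or
 * global lock is touched. */
static void percpu_add(int cpu, struct node *n)
{
	n->next = percpu_list[cpu];
	percpu_list[cpu] = n;
}

/* Slow path (the "workqueue" step): drain every per-CPU list into the
 * global one under the global lock; returns how many nodes moved. */
static int flush_percpu_lists(void)
{
	int cpu, moved = 0;

	pthread_mutex_lock(&global_lock);
	for (cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
		while (percpu_list[cpu]) {
			struct node *n = percpu_list[cpu];

			percpu_list[cpu] = n->next;
			n->next = global_list;
			global_list = n;
			moved++;
		}
	}
	pthread_mutex_unlock(&global_lock);
	return moved;
}
```

The trade-off the feedback request is about: the fast path scales, but
anything that needs a coherent global view (remount, emergency
read-only, etc.) must first pay for a flush.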
[PATCH 0/7] breaking the global file_list_lock
This patch-set breaks up the global file_list_lock which was found to be a severe contention point under basically any filesystem intensive workload. It has been part of the -rt kernel for some time now and is deemed solid and useful enough to post in its own right. This contention should also occur on the large SMP machines. Feedback would be appreciated, especially with regard to the extra workqueue that was added to flush per cpu lists. Is there an alternative aproach? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/7] breaking the global file_list_lock
On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote: This patch-set breaks up the global file_list_lock which was found to be a severe contention point under basically any filesystem intensive workload. Benchmarks, please. Where exactly do you see contention for this? filesystem intensive workload apparently means namespace operation heavy workload, right? The biggest bottleneck I've seen with those is dcache lock. Even if this is becoming a real problem there must be simpler ways to fix this than introducing various new locking primitives and all kinds of complexity. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/7] breaking the global file_list_lock
On Sun, Jan 28, 2007 at 02:43:25PM +, Christoph Hellwig wrote: On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote: This patch-set breaks up the global file_list_lock which was found to be a severe contention point under basically any filesystem intensive workload. Benchmarks, please. Where exactly do you see contention for this? filesystem intensive workload apparently means namespace operation heavy workload, right? The biggest bottleneck I've seen with those is dcache lock. Even if this is becoming a real problem there must be simpler ways to fix this than introducing various new locking primitives and all kinds of complexity. One good way to fix scalability without all this braindamage is to get rid of sb-s_files. Current uses are: - fs/dquot.c:add_dquot_ref() This performs it's actual operation on inodes. We should be able to check inode-i_writecount to see which inodes need quota initialization. - fs/file_table.c:fs_may_remount_ro() This one is gone in Dave Hansens per-mountpoint r/o patchkit - fs/proc/generic.c:proc_kill_inodes() This can be done with a list inside procfs. - fs/super.c:mark_files_ro() This one is only used for do_emergency_remount(), which is and utter hack. It might be much better to just deny any kind of write access through a superblock flag here. - fs/selinuxfs.c:sel_remove_bools() Utter madness. I have no idea how this ever got merged. Maybe the selinux folks can explain what crack they were on when writing this. The problem would go away with a generic rewoke infrastructure. Once sb-s_files is gone we can also kill of fu_list entirely and replace it by a list head entirely in the tty code and make the lock for it per-tty. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/7] breaking the global file_list_lock
* Christoph Hellwig [EMAIL PROTECTED] wrote: On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote: This patch-set breaks up the global file_list_lock which was found to be a severe contention point under basically any filesystem intensive workload. Benchmarks, please. Where exactly do you see contention for this? it's the most contended spinlock we have during a parallel kernel compile on an 8-way system. But it's pretty common-sense as well, without doing any measurements, it's basically the only global lock left in just about every VFS workload that doesnt involve massive amount of dentries created/removed (which is still dominated by the dcache_lock). filesystem intensive workload apparently means namespace operation heavy workload, right? The biggest bottleneck I've seen with those is dcache lock. the dcache lock is not a problem during kernel compiles. (its rcu-ification works nicely in that workload) Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/7] breaking the global file_list_lock
On Sun, 2007-01-28 at 15:11 +, Christoph Hellwig wrote: Even if this is becoming a real problem there must be simpler ways to fix this than introducing various new locking primitives and all kinds of complexity. One good way to fix scalability without all this braindamage is to get rid of sb-s_files. Current uses are: - fs/dquot.c:add_dquot_ref() This performs it's actual operation on inodes. We should be able to check inode-i_writecount to see which inodes need quota initialization. - fs/file_table.c:fs_may_remount_ro() This one is gone in Dave Hansens per-mountpoint r/o patchkit - fs/proc/generic.c:proc_kill_inodes() This can be done with a list inside procfs. - fs/super.c:mark_files_ro() This one is only used for do_emergency_remount(), which is and utter hack. It might be much better to just deny any kind of write access through a superblock flag here. - fs/selinuxfs.c:sel_remove_bools() Utter madness. I have no idea how this ever got merged. Maybe the selinux folks can explain what crack they were on when writing this. The problem would go away with a generic rewoke infrastructure. Once sb-s_files is gone we can also kill of fu_list entirely and replace it by a list head entirely in the tty code and make the lock for it per-tty. I shall pursue this direction. Thanks for the information. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/7] breaking the global file_list_lock
On Sun, Jan 28, 2007 at 04:29:04PM +0100, Peter Zijlstra wrote: I shall pursue this direction. Thanks for the information. Btw, if you need this short-term there might be an even simpler solution: - all files go on sb-s_list and stay there until the file is freed - ttys grow their private list instead of beeing mixed up with the per-sb list This already allows nice splitting of the locks into per-sb and per-tty (although all tty_files accesses might be protected by tty_mutex anyway). Especially for the superblock a rcu conversion might be quite easy if nessecary, but all traversals are in what I'd consider more or less slowpathes anyway. ---end quoted text--- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/7] breaking the global file_list_lock
Ingo Molnar wrote: * Christoph Hellwig [EMAIL PROTECTED] wrote: On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote: This patch-set breaks up the global file_list_lock which was found to be a severe contention point under basically any filesystem intensive workload. Benchmarks, please. Where exactly do you see contention for this? it's the most contended spinlock we have during a parallel kernel compile on an 8-way system. But it's pretty common-sense as well, without doing any measurements, it's basically the only global lock left in just about every VFS workload that doesnt involve massive amount of dentries created/removed (which is still dominated by the dcache_lock). filesystem intensive workload apparently means namespace operation heavy workload, right? The biggest bottleneck I've seen with those is dcache lock. the dcache lock is not a problem during kernel compiles. (its rcu-ification works nicely in that workload) Mmm. not wholly convinced that's true. Whilst i don't have lockmeter stats to hand, the heavy time in __d_lookup seems to indicate we may still have a problem to me. I guess we could move the spinlocks out of line again to test this fairly easily (or get lockmeter upstream). 
114076 total 0.0545 57766 default_idle 916.9206 11869 prep_new_page 49.4542 3830 find_trylock_page 67.1930 2637 zap_pte_range 3.9125 2486 strnlen_user 54.0435 2018 do_page_fault 1.1941 1940 do_wp_page 1.6973 1869 __d_lookup 7.7231 1331 page_remove_rmap 5.2196 1287 do_no_page 1.6108 1272 buffered_rmqueue 4.6423 1160 __copy_to_user_ll 14.6835 1027 _atomic_dec_and_lock 11.1630 655 release_pages 1.9670 644 do_path_lookup 1.6304 630 schedule 0.4046 617 kunmap_atomic 7.7125 573 __handle_mm_fault 0.7365 548 free_hot_page 78.2857 500 __copy_user_intel 3.3784 483 copy_pte_range 0.5941 482 page_address 2.9571 478 file_move 9.1923 441 do_anonymous_page 0.7424 429 filemap_nopage 0.4450 401 anon_vma_unlink4.8902 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/7] breaking the global file_list_lock
* Christoph Hellwig [EMAIL PROTECTED] wrote:
> On Sun, Jan 28, 2007 at 12:51:18PM +0100, Peter Zijlstra wrote:
> > This patch-set breaks up the global file_list_lock which was found
> > to be a severe contention point under basically any filesystem
> > intensive workload.
>
> Benchmarks, please.  Where exactly do you see contention for this?

the following very simple workload:

   http://redhat.com/~mingo/file-ops-test/file-ops-test.c

starts one process per CPU and open()s/close()s a file all over again,
simulating an open/close-intense workload. This pattern is quite typical
of several key Linux applications.

Using Peter's s_files patchset the following scalability improvement can
be measured (lower numbers are better):

                v2.6.20-rc6     |  v2.6.20-rc6 + Peter's s_files queue
   -------------------------------------------------------------------
   dual-core:   2.11 usecs/op   |  1.51 usecs/op   (  +39.7% win )
   8-socket:    6.30 usecs/op   |  2.70 usecs/op   ( +133.3% win )

[ i'd expect a 64-way box to improve its open/close scalability
  dramatically, by a factor of 10x or 20x. On a 1024-CPU (or 4096-CPU)
  system the effects are very likely even more dramatic. ]

Why does this patch make such a difference? Not because the critical
section is in any way contended - it isnt, we only do a simple list
operation there. But this lock is touched in every sys_open() and
sys_close() system-call, so it is a high-frequency accessed cacheline.
The cost is there due to the global cacheline ping-pong of files_lock.
Furthermore, in this very important VFS codepath this is the 'last'
global cacheline that got eliminated, hence all the scalability benefits
(of previous improvements) get reaped all at once.

Now could you please tell me why i had to waste 3.5 hours on measuring
and profiling this /again/, while a tiny little bit of goodwill from
your side could have avoided this? I told you that we lock-profiled this
under -rt, and that it's an accurate measurement of such things - as the
numbers above prove it too. Would it have been so hard to say something
like: Cool Peter!
That lock had been in our way of good open()/close() scalability for
such a long time and it's an obviously good idea to eliminate it. Now
here's a couple of suggestions of how to do it even simpler: [...].

Why did you have to, in essence, piss on his patchset? Any rational
explanation?

	Ingo
Re: [PATCH 0/7] breaking the global file_list_lock
On Sun, Jan 28, 2007 at 07:41:16PM +0100, Ingo Molnar wrote:
> starts one process per CPU and open()s/close()s a file all over again,
> simulating an open/close-intense workload. This pattern is quite
> typical of several key Linux applications. Using Peter's s_files
> patchset the following scalability improvement can be measured (lower
> numbers are better):
>
>                 v2.6.20-rc6     |  v2.6.20-rc6 + Peter's s_files queue
>    -------------------------------------------------------------------
>    dual-core:   2.11 usecs/op   |  1.51 usecs/op   (  +39.7% win )
>    8-socket:    6.30 usecs/op   |  2.70 usecs/op   ( +133.3% win )

Thanks for having some numbers to start with.

> Now could you please tell me why i had to waste 3.5 hours on measuring
> and profiling this /again/, while a tiny little bit of goodwill from
> your side could have avoided this? I told you that we lock-profiled
> this under -rt, and that it's an accurate measurement of such things -
> as the numbers above prove it too. Would it have been so hard to say
> something like: Cool Peter! That lock had been in our way of good
> open()/close() scalability for such a long time and it's an obviously
> good idea to eliminate it. Now here's a couple of suggestions of how
> to do it even simpler: [...]. Why did you have to in essence piss on
> his patchset? Any rational explanation?

Can we please stop this stupid pissing contest. I'm totally fine to
admit yours is bigger than mine in public, so let's get back to the
facts.

The patchkit we're discussing here introduces a lot of complexity:

 - a new type of implicitly locked linked lists
 - a new synchronization primitive
 - a new locking scheme that utilizes the previous two items, as well
   as RCU

I think we definitely want some numbers (which you finally provided) to
justify this.

Then, going on to the implementation: I don't like trying to fix a
problem with this big-hammer approach. I've outlined some alternate ways
that actually simplify both the underlying data structures and the
locking, which should help with this problem instead of making the code
more complex and really hard to understand.
Re: [PATCH 0/7] breaking the global file_list_lock
* Christoph Hellwig [EMAIL PROTECTED] wrote:
> Thanks for having some numbers to start with.

you are welcome.

> > Any rational explanation?
>
> Can we please stop this stupid pissing contest. I'm totally fine to
> admit yours is bigger than mine in public, so let's get back to the
> facts.

sure. Could we begin with you not making derogative, belittling comments
like:

  <hch> so -rt folks also restart the lock brigade crap :P

and:

  * hch hands Peter Zijlstra the mister braindead locking madness award

but instead showing a little bit of ... respect and goodwill towards the
work of others? Those comments might sound funny to you, but they are
offensive to me. I'll readily admit that the patches are complex and
even ugly, but the problem is complex and ugly to begin with. /Finally/
someone comes along and tries to sort out the mess that YOU (amongst
others, like me) have not fixed for years, and you welcome his (hard!)
efforts with ridicule?

You are also NAK-ing the same patches on lkml for clearly bogus reasons:
for example your 'this is lockdep internals' comment was off the mark by
far. So this isnt just colorful comments, it is also actual prejudice
that is irrational and harmful.

Linux will take 100 years to break into the mainstream if this attitude
prevails. And i think this is a far bigger problem than pretty much
/any/ technological problem we have at the moment - so i will continue
to be upset about it, no matter how uncomfortable it is to finally face
this and no matter how ugly such flames might look externally.

Or, if you really cannot talk about hard problems without insulting
others, then /just dont do it/. Nobody is forcing you to do this if you
cannot find any fun in it, really.

	Ingo