Re: Major SMP problems with lstat/namei
On Thursday 25 September 2008 07:00:04 pm Jeff Wheelhouse wrote:
> On Sep 24, 2008, at 12:12 PM, John Baldwin wrote:
> > Shared lookups only work on the NFS client in 6.x. I'm about to turn them on for UFS in HEAD (8.x) and will backport the needed fixes to 7.x after 7.1 (too risky to merge to 7.x this close to a release).
>
> OK, given all the patches you referenced, I did make a decent effort at backporting to 7.0.

It sounds like you missed some of the dirhash changes somehow, as dirhash no longer has any lockmgr stuff in it (and only ever did in HEAD). I've generated a patch though using svn. You can grab it from http://www.FreeBSD.org/~jhb/patches/ufs_lookup7.patch

Note that you will have to set vfs.lookup_shared=1 to enable shared locks (either loader tunable or sysctl). Also, I found a few other changes I had missed earlier that needed to be included.

--
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Major SMP problems with lstat/namei
Jeff Wheelhouse [EMAIL PROTECTED] writes:
> http://software.wheelhouse.org/rptest.tar.bz2

Thanks. I get similar results on head; vfs.lookup_shared actually seems to *reduce* performance by about 10% - 20%. I ran the test on both UFS and ZFS; there is no significant difference.

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: Major SMP problems with lstat/namei
On Friday 26 September 2008 05:20:14 am Dag-Erling Smørgrav wrote:
> Jeff Wheelhouse [EMAIL PROTECTED] writes:
> > http://software.wheelhouse.org/rptest.tar.bz2
>
> Thanks. I get similar results on head; vfs.lookup_shared actually seems to *reduce* performance by about 10% - 20%. I ran the test on both UFS and ZFS; there is no significant difference.

You might try http://www.FreeBSD.org/~jhb/patches/namei_rwlock.patch

However, it might also be useful in general to enable lock profiling and see which locks (if any) are contested.

--
John Baldwin
Re: Major SMP problems with lstat/namei
Jeff Wheelhouse [EMAIL PROTECTED] writes:
> I've written a quick benchmark with a pair of tests to simplify/measure the problem. [...]

Care to share?

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: Major SMP problems with lstat/namei
On Sep 25, 2008, at 10:51 AM, Dag-Erling Smørgrav wrote:
> Jeff Wheelhouse [EMAIL PROTECTED] writes:
> > I've written a quick benchmark with a pair of tests to simplify/measure the problem. [...]
>
> Care to share?

No problem: http://software.wheelhouse.org/rptest.tar.bz2

Thanks,
Jeff
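For readers who cannot fetch the tarball, the two tests from the original post (a tree with five subdirectories a through e at each level, and random leaf paths handed to realpath() or lstat()) can be approximated with the sketch below. It is not the contents of rptest.tar.bz2: the names make_tree, random_leaf and run_test, the use of SIGALRM to stop the loop, and the use of directories in place of the leaf files are all assumptions made for illustration.

```c
/* Sketch only: NOT the contents of rptest.tar.bz2. All names are hypothetical. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define BRANCHES 5   /* subdirectories a..e at each level */
#define DEPTH    5   /* levels below the root: 5^5 = 3125 leaves */

static volatile sig_atomic_t expired;

static void on_alarm(int sig) { (void)sig; expired = 1; }

/* Recursively create a..e under dir, depth levels deep. Returns 0 on success. */
static int make_tree(const char *dir, int depth)
{
    char sub[PATH_MAX];

    if (depth == 0)
        return 0;
    for (int i = 0; i < BRANCHES; i++) {
        int n = snprintf(sub, sizeof(sub), "%s/%c", dir, 'a' + i);
        if (n < 0 || (size_t)n >= sizeof(sub))
            return -1;
        if (mkdir(sub, 0755) != 0 && errno != EEXIST)
            return -1;
        if (make_tree(sub, depth - 1) != 0)
            return -1;
    }
    return 0;
}

/* Write a random leaf path such as <root>/a/b/c/d/e into buf. */
static int random_leaf(char *buf, size_t len, const char *root,
                       int depth, unsigned *seed)
{
    size_t used = strlen(root);

    if (used + 2 * (size_t)depth + 1 > len)
        return -1;
    memcpy(buf, root, used);
    for (int i = 0; i < depth; i++) {
        buf[used++] = '/';
        buf[used++] = (char)('a' + rand_r(seed) % BRANCHES);
    }
    buf[used] = '\0';
    return 0;
}

/* Call realpath() or lstat() on random leaves until the alarm fires.
 * Returns the number of completed calls, or -1 on error. */
static long run_test(const char *root, int depth, unsigned seconds,
                     int use_realpath)
{
    char path[PATH_MAX], resolved[PATH_MAX];
    struct stat sb;
    struct sigaction sa;
    unsigned seed = (unsigned)getpid();   /* differs per process */
    long count = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    expired = 0;
    alarm(seconds);   /* a signal avoids a clock call inside the loop */

    while (!expired) {
        if (random_leaf(path, sizeof(path), root, depth, &seed) != 0)
            return -1;
        if (use_realpath) {
            if (realpath(path, resolved) == NULL)
                return -1;
        } else if (lstat(path, &sb) != 0) {
            return -1;
        }
        count++;
    }
    return count;
}
```

Calling make_tree(root, DEPTH) once and then run_test(root, DEPTH, 60, 1) from 1, 2, 4, ... forked processes would approximate the concurrency sweep described in the original post; passing 0 as the last argument selects the plain lstat() variant.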
Re: Major SMP problems with lstat/namei
On Sep 24, 2008, at 12:12 PM, John Baldwin wrote:
> Shared lookups only work on the NFS client in 6.x. I'm about to turn them on for UFS in HEAD (8.x) and will backport the needed fixes to 7.x after 7.1 (too risky to merge to 7.x this close to a release).

OK, given all the patches you referenced, I did make a decent effort at backporting to 7.0. Here are the results:

> Revision  Changes    Path
> 1.87      +48 -29    src/sys/ufs/ufs/ufs_lookup.c

Applied, changing a couple of VOP_ISLOCKED() and vn_lock() calls to add td as the last parameter.

> Revision  Changes    Path
> 1.53      +0 -1      src/sys/ufs/ufs/inode.h
> 1.88      +10 -13    src/sys/ufs/ufs/ufs_lookup.c

Applied successfully.

> SVN rev 181018 on 2008-07-30 21:07:56Z by jhb

NOT applied, because it was a whitespace tweak on ufs_lookup 1.89 which was not on your list.

> SVN rev 183079 on 2008-09-16 16:18:36Z by jhb

Applied cleanly.

> Modified files:
>   sys/ufs/ufs    inode.h ufs_lookup.c
> Log:
>   SVN rev 183093 on 2008-09-16 19:06:44Z by jhb

Applied cleanly.

> 1.6       +2 -1      src/sys/ufs/ufs/dirhash.h
> 1.24      +289 -227  src/sys/ufs/ufs/ufs_dirhash.c

This patch applies but generates an awful lot of errors (enclosed at end). I think it may be dependent on the 8.0 lockmgr. Since most of the remaining patches are against the same files, I bailed out here.

> SVN rev 183080 on 2008-09-16 16:23:56Z by jhb

Skipped.

> SVN rev 183280 on 2008-09-22 20:53:22Z by jhb

Skipped.

> There are additional fixes needed to fix races with umount -f, so if you backport all this stuff, don't use umount -f or you risk panics. :)

Noted.

> - mp->mnt_kern_flag |= MNTK_MPSAFE;
> + mp->mnt_kern_flag |= MNTK_MPSAFE | MNTK_LOOKUP_SHARED;

Applied.

If I can make the backport work (a big if, given the dirhash changes) on 7.0, I am happy to maintain and test the diffs locally until after the 7.1 release and send them over to you at that time, if it will save you some effort.

Thanks,
Jeff

Dirhash compile errors:

/usr/src/sys/ufs/ufs/ufs_dirhash.c:132:37: error: macro lockmgr requires 4 arguments, but only 3 given
/usr/src/sys/ufs/ufs/ufs_dirhash.c: In function 'ufsdirhash_release':
/usr/src/sys/ufs/ufs/ufs_dirhash.c:132: error: 'lockmgr' undeclared (first use in this function)
/usr/src/sys/ufs/ufs/ufs_dirhash.c:132: error: (Each undeclared identifier is reported only once
/usr/src/sys/ufs/ufs/ufs_dirhash.c:132: error: for each function it appears in.)
/usr/src/sys/ufs/ufs/ufs_dirhash.c:161:45: error: macro lockmgr requires 4 arguments, but only 3 given
/usr/src/sys/ufs/ufs/ufs_dirhash.c: In function 'ufsdirhash_create':
/usr/src/sys/ufs/ufs/ufs_dirhash.c:161: error: 'lockmgr' undeclared (first use in this function)
/usr/src/sys/ufs/ufs/ufs_dirhash.c:178:17: error: macro lockmgr requires 4 arguments, but only 3 given
/usr/src/sys/ufs/ufs/ufs_dirhash.c:193:60: error: macro lockmgr requires 4 arguments, but only 3 given
/usr/src/sys/ufs/ufs/ufs_dirhash.c:198:42: error: macro lockmgr requires 4 arguments, but only 3 given
/usr/src/sys/ufs/ufs/ufs_dirhash.c:222:39: error: macro lockmgr requires 4 arguments, but only 3 given
/usr/src/sys/ufs/ufs/ufs_dirhash.c: In function 'ufsdirhash_acquire':
/usr/src/sys/ufs/ufs/ufs_dirhash.c:222: error: 'lockmgr' undeclared (first use in this function)
/usr/src/sys/ufs/ufs/ufs_dirhash.c:248:17: error: macro lockmgr requires 4 arguments, but only 3 given
/usr/src/sys/ufs/ufs/ufs_dirhash.c: In function 'ufsdirhash_free':
/usr/src/sys/ufs/ufs/ufs_dirhash.c:247: error: 'lockmgr' undeclared (first use in this function)
/usr/src/sys/ufs/ufs/ufs_dirhash.c:385:39: error: macro lockmgr requires 4 arguments, but only 3 given
/usr/src/sys/ufs/ufs/ufs_dirhash.c: In function 'ufsdirhash_build':
/usr/src/sys/ufs/ufs/ufs_dirhash.c:385: error: 'lockmgr' undeclared (first use in this function)
cc1: warnings being treated as errors
/usr/src/sys/ufs/ufs/ufs_dirhash.c: In function 'ufsdirhash_free_locked':
/usr/src/sys/ufs/ufs/ufs_dirhash.c:403: warning: implicit declaration of function 'lockmgr_assert'
/usr/src/sys/ufs/ufs/ufs_dirhash.c:403: warning: nested extern declaration of 'lockmgr_assert'
/usr/src/sys/ufs/ufs/ufs_dirhash.c:403: error: 'KA_LOCKED' undeclared (first use in this function)
/usr/src/sys/ufs/ufs/ufs_dirhash.c:417:37: error: macro lockmgr requires 4 arguments, but only 3 given
/usr/src/sys/ufs/ufs/ufs_dirhash.c:417: error: 'lockmgr' undeclared (first use in this function)
/usr/src/sys/ufs/ufs/ufs_dirhash.c:418:35: error: macro lockmgr requires 4 arguments, but only 3 given
/usr/src/sys/ufs/ufs/ufs_dirhash.c:438:37: error: macro lockmgr requires 4 arguments, but only 3 given
/usr/src/sys/ufs/ufs/ufs_dirhash.c: In function 'ufsdirhash_lookup':
/usr/src/sys/ufs/ufs/ufs_dirhash.c:473: error: 'KA_LOCKED' undeclared (first use in this function)
/usr/src/sys/ufs/ufs/ufs_dirhash.c: In function 'ufsdirhash_findfree':
Re: Major SMP problems with lstat/namei
Jeff Wheelhouse wrote:
> This is on 6.3-RELEASE-p4 with vfs.lookup_shared=1.
>
> I believe this is the same issue that was previously discussed as "2 x quad-core system is slower that 2 x dual core on FreeBSD" archived here:
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038441.html
>
> This is becoming a huge problem for us. Is there anything at all that can be done, or any news? In the case linked above, improvement was made by changing a PHP setting that isn't applicable in our case.

There is nothing that can be done within the 6.x branch. 7.x contains many improvements but I think only 8.x will directly change the lockmgr and the namei cache.

The best thing you can try right now is to use 7-STABLE (or soon to be released 7.1; you might need tuning with 7.0-RELEASE) or try 8-CURRENT (it's quite stable).
Re: Major SMP problems with lstat/namei
Ivan Voras wrote:
> There is nothing that can be done within the 6.x branch. 7.x contains many improvements but I think only 8.x will directly change the lockmgr and the namei cache.
>
> The best thing you can try right now is to use 7-STABLE (or soon to be released 7.1; you might need tuning with 7.0-RELEASE) or try 8-CURRENT (it's quite stable).

I remembered two more things:

* The problematic load can also be generated with benchmarks/blogbench
* I don't have the numbers here but I think I remember that ZFS had a noticeably higher score than UFS in this workload. Of course, ZFS has other problems.
Re: Major SMP problems with lstat/namei
On Wed, Sep 24, 2008 at 09:26:55AM +0200, Daniel Gerzo wrote:
> Hello Jeff,
>
> On Wed, 24 Sep 2008 00:52:59 -0400, Jeff Wheelhouse [EMAIL PROTECTED] wrote:
> > We have encountered some serious SMP performance/scalability problems that we've tracked back to lstat/namei calls. I've written a quick
>
> this all seems like the reason for the very poor performance of PHP when used with open_basedir and safe_mode enabled. It would be nice to see if there's something that could be done to make it better.

Both of which are features which will, thankfully, be removed in PHP 6.

Whoever uses these features in PHP deserves the pain -- they're worthless and provide no security whatsoever. Consider using suPHP or an MPM like mpm-itk.

Also, PHP and performance shouldn't be put in the same sentence.

/rant

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: Major SMP problems with lstat/namei
On Sep 24, 2008, at 6:12 AM, Ivan Voras wrote:
> There is nothing that can be done within the 6.x branch. 7.x contains many improvements but I think only 8.x will directly change the lockmgr and the namei cache.
>
> The best thing you can try right now is to use 7-STABLE (or soon to be released 7.1; you might need tuning with 7.0-RELEASE) or try 8-CURRENT (it's quite stable).

Really? Nothing?

We get lockmgr-related panics on FreeBSD 7.0, as detailed elsewhere on this list. Stability issues aside, what else would we need to tune on 7.0, besides enabling the ULE scheduler, and how much benefit would we really get?

These servers are in production, so 8-CURRENT is not an option. I've already had my knuckles rapped by a customer for trying 7.1-PRERELEASE on one of their machines.

Thanks,
Jeff
Re: Major SMP problems with lstat/namei
On Wednesday 24 September 2008 12:52:59 am Jeff Wheelhouse wrote:
> We have encountered some serious SMP performance/scalability problems that we've tracked back to lstat/namei calls. I've written a quick benchmark with a pair of tests to simplify/measure the problem. Both tests use a tree of directories: the top level directory contains five subdirectories a, b, c, d, and e. Each subdirectory contains five subdirectories a, b, c, d, and e, and so on: 1 directory at level one, 5 at level two, 25 at level three, 125 at level four, 625 at level five, and 3125 at level six.
>
> In the realpath test, a random path is constructed at the bottom of the tree (e.g. /tmp/lstat/a/b/c/d/e) and realpath() is called on that, provoking lstat() calls on the whole tree. This is to simulate a mix of high-contention and low-contention lstat() calls.
>
> In the lstat test, lstat is called directly on a path at the bottom of the tree. Since there are 3125 files, this simulates relatively low-contention lstat() calls.
>
> In both cases, the test repeats as many times as possible for 60 seconds. Each test is run simultaneously by multiple processes, with progressively doubling concurrency from 1 to 512.
>
> What I found was that everything is fine at concurrency 2, probably indicating that the benchmark pegged on some other resource limit. At concurrency 4, realpath drops to 31.8% of concurrency 1. At concurrency 8, performance is down to 18.3%. In the interim, CPU load goes to 80-90% system CPU. I've confirmed via ktrace and the rusage that the CPU usage is all system time, and that lstat() is the *only* system call in the test (realpath() is called with an absolute path).
>
> I then reran the 32-process test on 1-7 cores, and found that performance peaks at 2 cores and drops sharply from there. Eight cores run *fifteen* times slower than two cores.
>
> The full test results are at the bottom of this message.
>
> This is on 6.3-RELEASE-p4 with vfs.lookup_shared=1.

Shared lookups only work on the NFS client in 6.x. I'm about to turn them on for UFS in HEAD (8.x) and will backport the needed fixes to 7.x after 7.1 (too risky to merge to 7.x this close to a release). So lookup_shared=1 isn't going to really help on 6.x unless you are doing it all over NFS. You also want to backport my fix to cache_enter() before using lookup_shared at all:

jhb 2008-08-23 15:13:39 UTC

FreeBSD src repository

Modified files:
  sys/kern    vfs_cache.c
Log:
  SVN rev 182061 on 2008-08-23 15:13:39Z by jhb

  Fix a race condition with concurrent LOOKUP namecache operations for a vnode not in the namecache when shared lookups are enabled (vfs.lookup_shared=1, it is currently off by default) and the filesystem supports shared lookups (e.g. NFS client).

  Specifically, if multiple concurrent LOOKUPs miss in the name cache in parallel, each of the lookups may end up adding an entry to the namecache, resulting in duplicate entries in the namecache for the same pathname. A subsequent removal of the mapping of that pathname to that vnode (via remove or rename) would only evict one of the entries from the name cache. As a result, subsequent lookups for that pathname would still return the old vnode.

  This race was observed with shared lookups over NFS where a file was updated by writing a new file out to a temporary file name and then renaming that temporary file to the real file to effect atomic updates of a file. Other processes on the same client that were periodically reading the file would occasionally receive an ESTALE error from open(2) because the VOP_GETATTR() in nfs_open() would receive that error when given the stale vnode.

  The fix here is to check for duplicates in cache_enter() and just return if an entry for this same directory and leaf file name for this vnode is already in the cache. The check for duplicates is done by walking the per-vnode list of name cache entries. It is expected that this list should be very small in the common case (usually 0 or 1 entries during a cache_enter() since most files only have 1 leaf name).

  Reviewed by: ups, scottl
  MFC after: 2 months

Revision  Changes  Path
1.124     +33 -9   src/sys/kern/vfs_cache.c

If you want to try the UFS stuff on 7, you would need to probably backport at least the following, maybe more:

jeff 2008-04-11 09:44:25 UTC

FreeBSD src repository

Modified files:
  sys/ufs/ufs    ufs_lookup.c
Log:
  - cache dp->i_offset in the local 'i_offset' variable for use in loop indexes so directory lookup becomes shared lock safe. In the modifying cases an exclusive lock is held here so the commit routine may rely on the state of i_offset.
  - Similarly handle i_diroff by fetching at the start and setting only once the operation is complete. Without the exclusive
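The duplicate check described in the SVN rev 182061 log can be modelled in userland roughly as follows. This is a sketch and not the code from sys/kern/vfs_cache.c: struct vnode_model, struct ncentry, cache_enter_model and cache_count_model are hypothetical names chosen for illustration, and the locking the kernel relies on is left out entirely.

```c
/* Userland model of "walk the per-vnode list and return on a duplicate".
 * NOT the kernel implementation; every type and name here is hypothetical. */
#include <stdlib.h>
#include <string.h>

struct vnode_model;

struct ncentry {
    struct ncentry *next;            /* next entry naming the same vnode */
    const struct vnode_model *dvp;   /* directory that holds the name */
    char *name;                      /* leaf file name */
};

struct vnode_model {
    struct ncentry *names;           /* per-vnode list of name cache entries */
};

/* Add (dvp, name) -> vp unless the same entry already exists.
 * Returns 1 if an entry was added, 0 if it was a duplicate, -1 on
 * allocation failure. */
static int cache_enter_model(struct vnode_model *vp,
                             const struct vnode_model *dvp,
                             const char *name)
{
    struct ncentry *e;
    size_t len = strlen(name) + 1;

    /* The list is expected to be short: most files have one leaf name. */
    for (e = vp->names; e != NULL; e = e->next)
        if (e->dvp == dvp && strcmp(e->name, name) == 0)
            return 0;   /* a concurrent lookup already entered it */

    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->name = malloc(len);
    if (e->name == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->name, name, len);
    e->dvp = dvp;
    e->next = vp->names;
    vp->names = e;
    return 1;
}

/* Count the entries that currently name vp. */
static int cache_count_model(const struct vnode_model *vp)
{
    int n = 0;

    for (const struct ncentry *e = vp->names; e != NULL; e = e->next)
        n++;
    return n;
}
```

In this model a second cache_enter_model() call with the same directory and leaf name returns 0 and leaves a single entry, so a later remove or rename has only one mapping to evict; a different leaf name for the same vnode (a hard link) is still added.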
Re: Major SMP problems with lstat/namei
On Sep 24, 2008, at 12:12 PM, John Baldwin wrote:
> Shared lookups only work on the NFS client in 6.x. I'm about to turn them on for UFS in HEAD (8.x) and will backport the needed fixes to 7.x after 7.1 (too risky to merge to 7.x this close to a release).

Testers available, when you get to that. :-)

> So lookup_shared=1 isn't going to really help on 6.x unless you are doing it all over NFS. You also want to backport my fix to cache_enter() before using lookup_shared at all:

Since it sounds like 6.x is a dead end, we'll focus on 7.x, provided we can get it to be stable for us. Having never used svn, I do need to figure out how to pull the specific patches you referenced, but I'm sure that's not an unclimbable mountain. :-)

I appreciate your insight on this, it's very helpful.

Thanks,
Jeff
Re: Major SMP problems with lstat/namei
On Wednesday 24 September 2008 01:47:32 pm Jeff Wheelhouse wrote:
> On Sep 24, 2008, at 12:12 PM, John Baldwin wrote:
> > Shared lookups only work on the NFS client in 6.x. I'm about to turn them on for UFS in HEAD (8.x) and will backport the needed fixes to 7.x after 7.1 (too risky to merge to 7.x this close to a release).
>
> Testers available, when you get to that. :-)
>
> > So lookup_shared=1 isn't going to really help on 6.x unless you are doing it all over NFS. You also want to backport my fix to cache_enter() before using lookup_shared at all:
>
> Since it sounds like 6.x is a dead end, we'll focus on 7.x, provided we can get it to be stable for us.

Yes.

> Having never used svn, I do need to figure out how to pull the specific patches you referenced, but I'm sure that's not an unclimbable mountain. :-)

You can still use cvs to pull the revisions. All those e-mail msg's have the CVS revisions in them, too.

--
John Baldwin
Re: Major SMP problems with lstat/namei
On Sep 24, 2008, at 2:10 PM, John Baldwin wrote:
> You can still use cvs to pull the revisions. All those e-mail msg's have the CVS revisions in them, too.

If I'm ever to do anything that will benefit someone besides myself, it's worth my making the effort to learn SVN. We have coasted on the back of FreeBSD without giving back for long enough.

Thanks,
Jeff
Re: Major SMP problems with lstat/namei
Jeff Wheelhouse wrote:
> On Sep 24, 2008, at 6:12 AM, Ivan Voras wrote:
> > There is nothing that can be done within the 6.x branch. 7.x contains many improvements but I think only 8.x will directly change the lockmgr and the namei cache.
> >
> > The best thing you can try right now is to use 7-STABLE (or soon to be released 7.1; you might need tuning with 7.0-RELEASE) or try 8-CURRENT (it's quite stable).
>
> Really? Nothing?
>
> We get lockmgr-related panics on FreeBSD 7.0, as detailed elsewhere on this list. Stability issues aside, what else would we need to tune on 7.0, besides enabling the ULE scheduler, and how much benefit would we really get?
>
> These servers are in production, so 8-CURRENT is not an option. I've already had my knuckles rapped by a customer for trying 7.1-PRERELEASE on one of their machines.

You are supposed to edit the uname info back to 7.0 before installing experimental 7.1 systems! Didn't you get the memo?

> Thanks,
> Jeff