Re: Namecache lock contention?
On Friday, January 28, 2011 8:46:07 am Ivan Voras wrote:
> I have this situation on a PHP server:
>
> 36623 www  1  76 0  237M 30600K *Name  6  0:14 47.27% php-cgi
> 36638 www  1  76 0  237M 30600K *Name  3  0:14 46.97% php-cgi
> 36628 www  1 105 0  237M 30600K *Name  2  0:14 46.88% php-cgi
> 36627 www  1 105 0  237M 30600K *Name  0  0:14 46.78% php-cgi
> 36639 www  1 105 0  237M 30600K *Name  5  0:14 46.58% php-cgi
> 36643 www  1 105 0  237M 30600K *Name  7  0:14 46.39% php-cgi
> 36629 www  1  76 0  237M 30600K *Name  1  0:14 46.39% php-cgi
> 36642 www  1 105 0  237M 30600K *Name  2  0:14 46.39% php-cgi
> 36626 www  1 105 0  237M 30600K *Name  5  0:14 46.19% php-cgi
> 36654 www  1 105 0  237M 30600K *Name  7  0:13 46.19% php-cgi
> 36645 www  1 105 0  237M 30600K *Name  1  0:14 45.75% php-cgi
> 36625 www  1 105 0  237M 30600K *Name  0  0:14 45.56% php-cgi
> 36624 www  1 105 0  237M 30600K *Name  6  0:14 45.56% php-cgi
> 36630 www  1  76 0  237M 30600K *Name  7  0:14 45.17% php-cgi
> 36631 www  1 105 0  237M 30600K RUN    4  0:14 45.17% php-cgi
> 36636 www  1 105 0  237M 30600K *Name  3  0:14 44.87% php-cgi
>
> It looks like periodically most or all of the php-cgi processes are
> blocked in *Name for long enough that top notices, then continue,
> probably in a thundering-herd way. From grepping inside /sys, the
> most likely suspect seems to be something in the namecache, but I
> can't find a symbol named "Name" or a string beginning with "Name"
> that would be connected to a lock.

In vfs_cache.c:

	static struct rwlock cache_lock;
	RW_SYSINIT(vfscache, &cache_lock, "Name Cache");

What are the PHP scripts doing? Do they all try to create and delete
files at the same time (or do renames)?

--
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Namecache lock contention?
In the last episode (Jan 28), Ivan Voras said:
> I have this situation on a PHP server:
>
> 36623 www  1  76 0  237M 30600K *Name  6  0:14 47.27% php-cgi
> 36638 www  1  76 0  237M 30600K *Name  3  0:14 46.97% php-cgi
> [...]
>
> It looks like periodically most or all of the php-cgi processes are
> blocked in *Name for long enough that top notices, then continue,
> probably in a thundering-herd way. From grepping inside /sys, the
> most likely suspect seems to be something in the namecache, but I
> can't find a symbol named "Name" or a string beginning with "Name"
> that would be connected to a lock.

My guess would be:

kern/vfs_cache.c:151	static struct rwlock cache_lock;
kern/vfs_cache.c:152	RW_SYSINIT(vfscache, &cache_lock, "Name Cache");

The CACHE_*LOCK() macros in vfs_cache.c use cache_lock, so you've got
lots of possible contention points.  procstat -ka and/or dtrace might
help you determine exactly where.

--
	Dan Nelson
	dnel...@allantgroup.com
Re: Namecache lock contention?
On 28 January 2011 16:15, John Baldwin j...@freebsd.org wrote:
> On Friday, January 28, 2011 8:46:07 am Ivan Voras wrote:
>> I have this situation on a PHP server:
>>
>> 36623 www  1  76 0  237M 30600K *Name  6  0:14 47.27% php-cgi
>> 36638 www  1  76 0  237M 30600K *Name  3  0:14 46.97% php-cgi
>> [...]
>>
>> It looks like periodically most or all of the php-cgi processes are
>> blocked in *Name for long enough that top notices, then continue,
>> probably in a thundering-herd way.
>
> In vfs_cache.c:
>
>	static struct rwlock cache_lock;
>	RW_SYSINIT(vfscache, &cache_lock, "Name Cache");

You're right, I misread it as SYSCTL at a glance.

> What are the php scripts doing? Do they all try to create and delete
> files at the same time (or do renames)?

Right again - they do simultaneously create session files and on rare
occasions (1%) delete them. These are sharded into a two-level
directory structure by single letter (/storage/a/b/file, i.e. 32^2
directories); dirhash is large enough.

During all this, the web server did around 60 PHP pages per second, so
it doesn't look to me like there should be such noticeable contention
(i.e. at most 60 files/s are created and on average 60/100 = 0.6
files/s deleted). The file system uses softupdates and there's only
light IO. Typical vmstat output is:

 procs      memory      page                    disks     faults        cpu
 r b w    avm    fre   flt  re  pi  po   fr  sr da0 da1   in     sy    cs us sy id
17 0 0  8730M  1240M     3   0   0   0  206   0   1   0 1948 266928 15079 65 34  1
19 0 0  8730M  1240M     0   0   0   0  290   0   1  24 1835 260618 15132 63 35  2
 7 0 0  8730M  1239M     0   0   0   0  200   0   0   0 1822 260783 14851 63 35  2
16 0 0  8730M  1239M     0   0   0   0  199   0 788   0 2744 259902 20465 61 37  2
16 0 0  8730M  1239M     0   0   0   0  210   0   0   0 1755 265081 17564 61 37  2

(8 cores; around 35% sys load across them - I'm trying to find out
why.)
Re: Namecache lock contention?
On 28 January 2011 16:25, Dan Nelson dnel...@allantgroup.com wrote:
> My guess would be:
>
> kern/vfs_cache.c:151	static struct rwlock cache_lock;
> kern/vfs_cache.c:152	RW_SYSINIT(vfscache, &cache_lock, "Name Cache");
>
> The CACHE_*LOCK() macros in vfs_cache.c use cache_lock, so you've got
> lots of possible contention points.  procstat -ka and/or dtrace might
> help you determine exactly where.

I'm new with dtrace so I tried this:

lockstat:::rw-block
{
	@traces[stack()] = count();
}

with these results: http://ivoras.net/stuff/rw-block.txt

It's informative because most of the traces are namecache-related. As
suspected, most of the blocking occurs in stat(). As this is a rwlock,
I'd interpret it as waiting for a write lock to be released so the
readers can acquire it, but I need to confirm this as there are a lot
of things that can in theory be stat()ed here.

I'm going to continue investigating with dtrace, but I'd appreciate
pointers on how to make the output more useful (like including
filenames from stat()).
Re: Namecache lock contention?
On (28/01/2011 18:20), Ivan Voras wrote:
> I'm new with dtrace so I tried this:
>
> lockstat:::rw-block
> {
> 	@traces[stack()] = count();
> }
>
> with these results: http://ivoras.net/stuff/rw-block.txt
>
> It's informative because most of the traces are namecache-related. As
> suspected, most of the blocking occurs in stat(). As this is a
> rwlock, I'd interpret it as waiting for a write lock to be released
> so the readers can acquire it, but I need to confirm this as there
> are a lot of things that can in theory be stat()ed here.
>
> I'm going to continue investigating with dtrace, but I'd appreciate
> pointers on how to make the output more useful (like including
> filenames from stat()).

You could try replacing the rwlock with a plain mutex to check whether
there are priority propagation issues among readers/writers. SX locks
should also work but would likely be a considerable performance
regression.

Finding out how much activity there is outside of storage/a/b/file
(sessions storage) could also be helpful.
Re: Namecache lock contention?
On 28 January 2011 22:18, Gleb Kurtsou gleb.kurt...@gmail.com wrote:
> You could try replacing the rwlock with a plain mutex to check
> whether there are priority propagation issues among readers/writers.

How would that manifest? (i.e. how would it be detectable?)

> SX locks should also work but would likely be a considerable
> performance regression.

With mutexes I'd lose all shared (read) acquisitions, so I doubt sx
locks would do much more harm :)

> Finding out how much activity there is outside of storage/a/b/file
> (sessions storage) could also be helpful.

Here's more information:

* The session storage (i.e. mostly file creates / writes in this
  particular workload) is on a separate file system from the core of
  the application (which only does reads).

* The dtrace output I've sent is from around thirty seconds of
  operation, so around 2,000 PHP runs. (PHP in this case is FastCGI,
  so the processes are persistent instead of constantly respawning.)
  In these 2,000 runs there have been around 20,000 rw-block events in
  cache_lookup - which is strange.

* Here's another dtrace output, without rwlock mutex inlining, showing
  a different picture than what I'd earlier thought: most rw-block
  events are in wlock! http://ivoras.net/stuff/rw-block-noinline.txt
  -- there are also some blocks without a rwlock function in the
  trace; I don't understand how rwlock inlining is implemented - maybe
  the readers are always inlined?

Next step - find out how to make dtrace print the files for which this
happens.
Re: Namecache lock contention?
On (28/01/2011 22:59), Ivan Voras wrote:
> On 28 January 2011 22:18, Gleb Kurtsou gleb.kurt...@gmail.com wrote:
>> You could try replacing the rwlock with a plain mutex to check
>> whether there are priority propagation issues among readers/writers.
>
> How would that manifest? (i.e. how would it be detectable?)

Benchmark. If there are priority propagation issues, the mutex version
should be faster than the original locking. Let me know if you need
help with the patch.

>> SX locks should also work but would likely be a considerable
>> performance regression.
>
> With mutexes I'd lose all shared (read) acquisitions, so I doubt sx
> locks would do much more harm :)

Yeah, but sx locks are more heavyweight. The question is what the
actual readers/writers ratio in your test is; e.g. all namecache
entries for a directory may be purged on rename.

> * The session storage (i.e. mostly file creates / writes in this
>   particular workload) is on a separate file system from the core of
>   the application (which only does reads).

Changing the namecache lock to be per-filesystem would hardly improve
the situation in general.

> * The dtrace output I've sent is from around thirty seconds of
>   operation, so around 2,000 PHP runs. [...] In these 2,000 runs
>   there have been around 20,000 rw-block events in cache_lookup -
>   which is strange.

Are there rename or rmdir calls? These purge the namecache. If the
cache is empty, VOP_LOOKUP acquires the write lock to populate the
cache.

> * Here's another dtrace output, without rwlock mutex inlining [...]
>   I don't understand how rwlock inlining is implemented - maybe the
>   readers are always inlined?

Add "options RWLOCK_NOINLINE"; recompiling with -O0 might also be a
good idea.

> Next step - find out how to make dtrace print the files for which
> this happens.
Re: Namecache lock contention?
On 28 January 2011 23:37, Gleb Kurtsou gleb.kurt...@gmail.com wrote:
>> * The dtrace output I've sent is from around thirty seconds of
>>   operation, so around 2,000 PHP runs. [...] In these 2,000 runs
>>   there have been around 20,000 rw-block events in cache_lookup -
>>   which is strange.
>
> Are there rename or rmdir calls? These purge the namecache. If the
> cache is empty, VOP_LOOKUP acquires the write lock to populate the
> cache.

No, only creates and deletes on files, no directory operations at all.

>> * Here's another dtrace output, without rwlock mutex inlining [...]
>>   I don't understand how rwlock inlining is implemented - maybe the
>>   readers are always inlined?
>
> Add "options RWLOCK_NOINLINE"; recompiling with -O0 might also be a
> good idea.

That's what I meant by "without rwlock mutex inlining". The default
-O2 is enough - aggressive inlining only begins at -O3.