Re: Namecache lock contention?

2011-01-28 Thread John Baldwin
On Friday, January 28, 2011 8:46:07 am Ivan Voras wrote:
 I have this situation on a PHP server:
 
 36623 www 1  760   237M 30600K *Name   6   0:14 47.27% php-cgi
 36638 www 1  760   237M 30600K *Name   3   0:14 46.97% php-cgi
 36628 www 1 1050   237M 30600K *Name   2   0:14 46.88% php-cgi
 36627 www 1 1050   237M 30600K *Name   0   0:14 46.78% php-cgi
 36639 www 1 1050   237M 30600K *Name   5   0:14 46.58% php-cgi
 36643 www 1 1050   237M 30600K *Name   7   0:14 46.39% php-cgi
 36629 www 1  760   237M 30600K *Name   1   0:14 46.39% php-cgi
 36642 www 1 1050   237M 30600K *Name   2   0:14 46.39% php-cgi
 36626 www 1 1050   237M 30600K *Name   5   0:14 46.19% php-cgi
 36654 www 1 1050   237M 30600K *Name   7   0:13 46.19% php-cgi
 36645 www 1 1050   237M 30600K *Name   1   0:14 45.75% php-cgi
 36625 www 1 1050   237M 30600K *Name   0   0:14 45.56% php-cgi
 36624 www 1 1050   237M 30600K *Name   6   0:14 45.56% php-cgi
 36630 www 1  760   237M 30600K *Name   7   0:14 45.17% php-cgi
 36631 www 1 1050   237M 30600K RUN 4   0:14 45.17% php-cgi
 36636 www 1 1050   237M 30600K *Name   3   0:14 44.87% php-cgi
 
 It looks like periodically most or all of the php-cgi processes are 
 blocked in *Name for long enough that top notices, then continue, 
 probably in a thundering herd way. From grepping inside /sys the most 
 likely suspect seems to be something in the namecache, but I can't find 
 exactly a symbol named Name or string beginning with Name that would 
 be connected to a lock.

In vfs_cache.c:

static struct rwlock cache_lock;
RW_SYSINIT(vfscache, cache_lock, Name Cache);

What are the php scripts doing?  Do they all try to create and delete files at 
the same time (or do renames)?

-- 
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Namecache lock contention?

2011-01-28 Thread Dan Nelson
In the last episode (Jan 28), Ivan Voras said:
 I have this situation on a PHP server:
 
 36623 www 1  760   237M 30600K *Name   6   0:14 47.27% php-cgi
 36638 www 1  760   237M 30600K *Name   3   0:14 46.97% php-cgi
 36628 www 1 1050   237M 30600K *Name   2   0:14 46.88% php-cgi
 36627 www 1 1050   237M 30600K *Name   0   0:14 46.78% php-cgi
 36639 www 1 1050   237M 30600K *Name   5   0:14 46.58% php-cgi
 36643 www 1 1050   237M 30600K *Name   7   0:14 46.39% php-cgi
 36629 www 1  760   237M 30600K *Name   1   0:14 46.39% php-cgi
 36642 www 1 1050   237M 30600K *Name   2   0:14 46.39% php-cgi
 36626 www 1 1050   237M 30600K *Name   5   0:14 46.19% php-cgi
 36654 www 1 1050   237M 30600K *Name   7   0:13 46.19% php-cgi
 36645 www 1 1050   237M 30600K *Name   1   0:14 45.75% php-cgi
 36625 www 1 1050   237M 30600K *Name   0   0:14 45.56% php-cgi
 36624 www 1 1050   237M 30600K *Name   6   0:14 45.56% php-cgi
 36630 www 1  760   237M 30600K *Name   7   0:14 45.17% php-cgi
 36631 www 1 1050   237M 30600K RUN 4   0:14 45.17% php-cgi
 36636 www 1 1050   237M 30600K *Name   3   0:14 44.87% php-cgi
 
 It looks like periodically most or all of the php-cgi processes are
 blocked in *Name for long enough that top notices, then continue,
 probably in a thundering herd way.  From grepping inside /sys the most
 likely suspect seems to be something in the namecache, but I can't find
 exactly a symbol named Name or string beginning with Name that would
 be connected to a lock.

My guess would be:

kern/vfs_cache.c:151 static struct rwlock cache_lock;
kern/vfs_cache.c:152 RW_SYSINIT(vfscache, cache_lock, Name Cache);

The CACHE_*LOCK() macros.c in vfs_cache use cache_lock, so you've got lots
of possible contention points.  procstat -ka and/or dtrace might help you
determine exactly where.

-- 
Dan Nelson
dnel...@allantgroup.com
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Namecache lock contention?

2011-01-28 Thread Ivan Voras
On 28 January 2011 16:15, John Baldwin j...@freebsd.org wrote:
 On Friday, January 28, 2011 8:46:07 am Ivan Voras wrote:
 I have this situation on a PHP server:

 36623 www         1  76    0   237M 30600K *Name   6   0:14 47.27% php-cgi
 36638 www         1  76    0   237M 30600K *Name   3   0:14 46.97% php-cgi
 36628 www         1 105    0   237M 30600K *Name   2   0:14 46.88% php-cgi
 36627 www         1 105    0   237M 30600K *Name   0   0:14 46.78% php-cgi
 36639 www         1 105    0   237M 30600K *Name   5   0:14 46.58% php-cgi
 36643 www         1 105    0   237M 30600K *Name   7   0:14 46.39% php-cgi
 36629 www         1  76    0   237M 30600K *Name   1   0:14 46.39% php-cgi
 36642 www         1 105    0   237M 30600K *Name   2   0:14 46.39% php-cgi
 36626 www         1 105    0   237M 30600K *Name   5   0:14 46.19% php-cgi
 36654 www         1 105    0   237M 30600K *Name   7   0:13 46.19% php-cgi
 36645 www         1 105    0   237M 30600K *Name   1   0:14 45.75% php-cgi
 36625 www         1 105    0   237M 30600K *Name   0   0:14 45.56% php-cgi
 36624 www         1 105    0   237M 30600K *Name   6   0:14 45.56% php-cgi
 36630 www         1  76    0   237M 30600K *Name   7   0:14 45.17% php-cgi
 36631 www         1 105    0   237M 30600K RUN     4   0:14 45.17% php-cgi
 36636 www         1 105    0   237M 30600K *Name   3   0:14 44.87% php-cgi

 It looks like periodically most or all of the php-cgi processes are
 blocked in *Name for long enough that top notices, then continue,
 probably in a thundering herd way. From grepping inside /sys the most
 likely suspect seems to be something in the namecache, but I can't find
 exactly a symbol named Name or string beginning with Name that would
 be connected to a lock.

 In vfs_cache.c:

 static struct rwlock cache_lock;
 RW_SYSINIT(vfscache, cache_lock, Name Cache);

You're right, I misread it as SYSCTL at a glance.

 What are the php scripts doing?  Do they all try to create and delete files at
 the same time (or do renames)?

Right again - they do simultaneously create session files and in rare
occasions (1%) delete them. These are sharded into a two-level
directory structure by single letter (/storage/a/b/file, i.e. 32^2
directories); dirhash is large enough.

During all this, the web server did around 60 PHP pages per second so
it doesn't look to me like there should be such noticable contention
(i.e. at most, there are 60 files/s created and on average 60/100
deletes). The file system is on softupdates, there's only light IO.

Typical vmstat is:

 procs  memory  pagedisks faults cpu
 r b w avmfre   flt  re  pi  pofr  sr da0 da1   in   sy
cs us sy id

17 0 0   8730M  1240M 3   0   0   0   206   0   1   0 1948 266928
15079 65 34  1
19 0 0   8730M  1240M 0   0   0   0   290   0   1  24 1835 260618
15132 63 35  2
 7 0 0   8730M  1239M 0   0   0   0   200   0   0   0 1822 260783
14851 63 35  2
16 0 0   8730M  1239M 0   0   0   0   199   0 788   0 2744 259902
20465 61 37  2
16 0 0   8730M  1239M 0   0   0   0   210   0   0   0 1755 265081
17564 61 37  2

(8 cores; around 35% sys load across them - I'm trying to find out why).
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Namecache lock contention?

2011-01-28 Thread Ivan Voras
On 28 January 2011 16:25, Dan Nelson dnel...@allantgroup.com wrote:

 My guess would be:

 kern/vfs_cache.c:151 static struct rwlock cache_lock;
 kern/vfs_cache.c:152 RW_SYSINIT(vfscache, cache_lock, Name Cache);

 The CACHE_*LOCK() macros.c in vfs_cache use cache_lock, so you've got lots
 of possible contention points.  procstat -ka and/or dtrace might help you
 determine exactly where.

I'm new with dtrace so I tried this:

lockstat:::rw-block
{
@traces[stack()] = count();
}

with these results:

http://ivoras.net/stuff/rw-block.txt

It's informative because most of the traces are namecache-related. As
suspected, the most blocking occurs in stat().

As this is a rwlock I'd interpret it as waiting for a write lock to be
lifted so the readers can acquire it, but I need to confirm this as
there's a lot of things that can in theory be stat()ed here.

I'm going to continue investigating with dtrace but I'd appreciate
pointers on how to make the output more useful (like including
filenames from stat()).
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Namecache lock contention?

2011-01-28 Thread Gleb Kurtsou
On (28/01/2011 18:20), Ivan Voras wrote:
 On 28 January 2011 16:25, Dan Nelson dnel...@allantgroup.com wrote:
 
  My guess would be:
 
  kern/vfs_cache.c:151 static struct rwlock cache_lock;
  kern/vfs_cache.c:152 RW_SYSINIT(vfscache, cache_lock, Name Cache);
 
  The CACHE_*LOCK() macros.c in vfs_cache use cache_lock, so you've got lots
  of possible contention points.  procstat -ka and/or dtrace might help you
  determine exactly where.
 
 I'm new with dtrace so I tried this:
 
 lockstat:::rw-block
 {
 @traces[stack()] = count();
 }
 
 with these results:
 
 http://ivoras.net/stuff/rw-block.txt
 
 It's informative because most of the traces are namecache-related. As
 suspected, the most blocking occurs in stat().
 
 As this is a rwlock I'd interpret it as waiting for a write lock to be
 lifted so the readers can acquire it, but I need to confirm this as
 there's a lot of things that can in theory be stat()ed here.
 
 I'm going to continue investigating with dtrace but I'd appreciate
 pointers on how to make the output more useful (like including
 filenames from stat()).

You could try replacing rwlock with plain mutex to check if there are
priority propagation issues among readers/writers. SX locks should also
work but would likely to be a considerable performance regression.

Finding out home much activity is there outside of storage/a/b/file
(sessions storage) could also be helpful.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Namecache lock contention?

2011-01-28 Thread Ivan Voras
On 28 January 2011 22:18, Gleb Kurtsou gleb.kurt...@gmail.com wrote:

 You could try replacing rwlock with plain mutex to check if there are
 priority propagation issues among readers/writers.

How would that manifest? (i.e. how would it be detectable)

 SX locks should also
 work but would likely to be a considerable performance regression.

With mutexes I'd lose all shared (read) acquisitions so I doubt sx
locks would do much more harm :)

 Finding out home much activity is there outside of storage/a/b/file
 (sessions storage) could also be helpful.

Here's more information:

* The session storage (i.e. mostly file creates / writes in this
particular workload) is on a separate file system than the core of the
application (which only does reads)

* The dtrace output I've send is from around thirty seconds of
operation, so around 2000 PHP runs. (PHP in this case is FastCGI, so
the processes are persistent instead of constantly respawning). In
these 2000 runs there have been around 20,000 rw-block events in
cache_lookup - which is strange.

* Here's another dtrace output without rwlock mutex inlining, showing
a different picture than what I've earlier thought: most rw-blocks
events are in wlock! http://ivoras.net/stuff/rw-block-noinline.txt  --
there are also some blocks without a rwlock function in the trace; I
don't understand how rwlock inlining is implemented, maybe the readers
are always inlined?

Next step - find out how to make dtrace print files for which this happens.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Namecache lock contention?

2011-01-28 Thread Gleb Kurtsou
On (28/01/2011 22:59), Ivan Voras wrote:
 On 28 January 2011 22:18, Gleb Kurtsou gleb.kurt...@gmail.com wrote:
 
  You could try replacing rwlock with plain mutex to check if there are
  priority propagation issues among readers/writers.
 
 How would that manifest? (i.e. how would it be detectable)
Benchmark. If there are prio propagation issues mutex version should be
faster than original locking. Let me know if you need help with patch.

  SX locks should also
  work but would likely to be a considerable performance regression.
 
 With mutexes I'd lose all shared (read) acquisitions so I doubt sx
 locks would do much more harm :)
Yeah, but sxlocks are more heavy weight. The question is what actual
readers/writers ratio in your test is, e.g. all namecache entries for a
dir may be purged on rename.

  Finding out home much activity is there outside of storage/a/b/file
  (sessions storage) could also be helpful.
 
 Here's more information:
 
 * The session storage (i.e. mostly file creates / writes in this
 particular workload) is on a separate file system than the core of the
 application (which only does reads)
Changing namecache lock to become per-file system would hardly improve
situation in general.

 * The dtrace output I've send is from around thirty seconds of
 operation, so around 2000 PHP runs. (PHP in this case is FastCGI, so
 the processes are persistent instead of constantly respawning). In
 these 2000 runs there have been around 20,000 rw-block events in
 cache_lookup - which is strange.
Are there rename, rmdir calls? - these purge namecache.
If cache is empty, VOP_LOOKUP acquires write lock to populate the cache.

 * Here's another dtrace output without rwlock mutex inlining, showing
 a different picture than what I've earlier thought: most rw-blocks
 events are in wlock! http://ivoras.net/stuff/rw-block-noinline.txt  --
 there are also some blocks without a rwlock function in the trace; I
 don't understand how rwlock inlining is implemented, maybe the readers
 are always inlined?
Add options RWLOCK_NOINLINE, recompiling with -O0 might also be good
idea.

 
 Next step - find out how to make dtrace print files for which this happens.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Namecache lock contention?

2011-01-28 Thread Ivan Voras
On 28 January 2011 23:37, Gleb Kurtsou gleb.kurt...@gmail.com wrote:

 * The dtrace output I've send is from around thirty seconds of
 operation, so around 2000 PHP runs. (PHP in this case is FastCGI, so
 the processes are persistent instead of constantly respawning). In
 these 2000 runs there have been around 20,000 rw-block events in
 cache_lookup - which is strange.

 Are there rename, rmdir calls? - these purge namecache.
 If cache is empty, VOP_LOOKUP acquires write lock to populate the cache.

No, only creates and deletes on files, no directory operations at all.

 * Here's another dtrace output without rwlock mutex inlining, showing
 a different picture than what I've earlier thought: most rw-blocks
 events are in wlock! http://ivoras.net/stuff/rw-block-noinline.txt  --
 there are also some blocks without a rwlock function in the trace; I
 don't understand how rwlock inlining is implemented, maybe the readers
 are always inlined?
 Add options RWLOCK_NOINLINE, recompiling with -O0 might also be good
 idea.

That's what I meant by without rwlock mutex inlining. The default
-O2 is enough - aggressive inlining only begins at -O3.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org