On Feb 28, 2010, at 12:11 PM, Daniel Braniss wrote:

>> I'm pulling in Robert Watson, who has some familiarity with the UDP
>> stack/code in FreeBSD.  I'm not sure he'll be a sufficient source of
>> knowledge for this specific issue since it appears (?) to be specific to
>> NFS; Rick Macklem would be a better choice, but as reported, he's MIA.
>> 
>> Robert, are you aware of any changes or implementation issues which
>> might cause excessive (read: leaking) mbuf use under UDP-based NFS?  Do
>> you know of a way folks could determine the source of the leak, either
>> via DDB or while the system is live?
> 
> I have been running some tests in a controlled environment.
> 
> server and client are both 64-bit Xeon/X5550 @ 2.67GHz with 16 GB of memory
> FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 SMT threads
> 
> the client is running the latest 8.0-stable.
> the load is created by running 'make -j32 buildworld' and sleeping 150 sec.
> in between runs; this is the straight line you will see in the graphs.
> Both the src and obj directories are NFS mounted from the server, regular UFS.
> 
> when the server is running 7.2-stable, no leakage is seen.
> see ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbufs/{tcp,udp}-7.2.ps
> when the server is running 8.0-stable,
> see ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbufs/{tcp,udp}-8.0.ps
> you can see that UDP is leaking!
> 
> cheers,
>       danny
> ps: I think the subject should be changed again, removing zfs ...

This type of problem (occurs with one client but not another) is almost always 
the result of the access pattern of a particular client triggering a specific 
(and perhaps single) bug in error-handling. For example, we might not be 
properly freeing the received request when generating an EPERM in an edge case. 
The hard bit is identifying which it is. If it's reproducible with UDP, then 
usually the process is:

- Build a minimal test case to trigger the problem -- ideally with as little 
complexity as possible.
- Run netstat -m on the server at the beginning and at the end of the test to 
count the number of leaked mbufs
- Run wireshark throughout the test
- Walk the wireshark trace looking for some error that occurs about the same 
number of times as, or slightly fewer times than, the number of leaked mbufs
- Iterate, narrowing the test case until it's either obvious exactly what's 
going on, or you've identified a relatively constrained code path and can just 
spot the bug by reading the code
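The netstat bookkeeping in the steps above can be sketched roughly as follows. This is a hypothetical helper, not part of any actual test setup: the snapshot contents below are made-up sample numbers, and on a real server you would capture live `netstat -m` output before and after the run (and something like `tcpdump -s 0 -w nfs.pcap port 2049` for the trace to walk in wireshark):

```shell
#!/bin/sh
# Hypothetical helper: extract the "current" mbuf count from a saved
# `netstat -m` snapshot. The relevant line looks like:
#   1537/2843/4380 mbufs in use (current/cache/total)
mbufs_in_use() {
    awk -F'[/ ]' '/mbufs in use/ { print $1; exit }' "$1"
}

# Sample snapshots standing in for `netstat -m > before.txt` on the
# server before the test, and `netstat -m > after.txt` after it:
printf '1537/2843/4380 mbufs in use (current/cache/total)\n' > before.txt
printf '9412/2843/12255 mbufs in use (current/cache/total)\n' > after.txt

# The difference is the number of mbufs that never came back.
echo "leaked: $(( $(mbufs_in_use after.txt) - $(mbufs_in_use before.txt) ))"
# → leaked: 7875
```

The leaked count is then the target to match against error frequencies in the packet trace.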

It's almost certainly one or a small number of very specific RPCs that are 
triggering it -- maybe OpenBSD does an extra lookup, or stat, or something, on 
a name that may not exist anymore, or does it sooner than the other clients. 
Hard to say, other than to wave hands at the possibilities.

And it may well be we're looking at two bugs: Danny may see one bug, perhaps 
triggered by a race condition, but it may be different from the OpenBSD 
client-triggered bug (to be clear: it's definitely a FreeBSD bug, although we 
might only see it when an OpenBSD client is used because perhaps OpenBSD also 
has a bug or feature).

Robert

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
