On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote:
> The most significant problem working with rpc.lockd is creating easy to
> reproduce test cases. Not least because they can potentially involve
> multiple clients. If you can help to produce simple test cases to
> reproduce the bugs you're seeing, that would be invaluable.
> ........
>
> Reducing complex failure modes to easily reproduced test cases is tricky
> also, though. It requires careful analysis, often with ktrace and
> tcpdump/ethereal to work out what's going on, and not a little luck to
> perform the reduction of a large trace down to a simple test scenario. The
> first step is to try and figure out what, if any, specific workload results
> in a problem. For example, can you trigger it using work on just one
> client against a server, without client<->client interactions? This makes
> tracking and reproduction a lot easier, as multi-client test cases are
> really tricky! Once you've established whether it can be reproduced with a
> single client, you have to track down the behavior that triggers it --
> normally, this is done by attempting to narrow down the specific program or
> sequence of events that causes the bug to trigger, removing things one at a
> time to see what causes the problem to disappear. This is made more
> difficult as lock managers are sensitive to timing, so removing a high load
> item from the list, even if it isn't the source of the problem, might cause
> it to trigger less frequently.
I have made a patch for rpc.lockd that should make obtaining debug information somewhat easier. The patch is available at http://people.freebsd.org/~kib/rpc.lockd-debug.patch

There are no functional changes. The patch only adds a dump of the currently held locks (as perceived by lockd) on receipt of SIGUSR1. You need to specify debug level 2 or 3 to obtain the dump. Also, both lockd processes now put identifying information in their proctitle (srv and kern). SIGUSR1 should be sent to the srv process.