On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote:
> The most significant problem working with rpc.lockd is creating easy to 
> reproduce test cases.  Not least because they can potentially involve 
> multiple clients.  If you can help to produce simple test cases to 
> reproduce the bugs you're seeing, that would be invaluable.
> Reducing complex failure modes to easily reproduced test cases is tricky 
> also, though.  It requires careful analysis, often with ktrace and 
> tcpdump/ethereal to work out what's going on, and not a little luck to 
> perform the reduction of a large trace down to a simple test scenario.  The 
> first step is to try and figure out what, if any, specific workload results 
> in a problem.  For example, can you trigger it using work on just one 
> client against a server, without client<->client interactions?  This makes 
> tracking and reproduction a lot easier, as multi-client test cases are 
> really tricky!  Once you've established whether it can be reproduced with a 
> single client, you have to track down the behavior that triggers it -- 
> normally, this is done by attempting to narrow down the specific program or 
> sequence of events that causes the bug to trigger, removing things one at a 
> time to see what causes the problem to disappear.  This is made more 
> difficult as lock managers are sensitive to timing, so removing a high load 
> item from the list, even if it isn't the source of the problem, might cause 
> it to trigger less frequently.
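
For the single-client case described above, a starting point could be a tiny program that does nothing but take and release a lock in a loop, so it can be run alone under ktrace and tcpdump. The following is only a sketch: the function names and the file path are hypothetical, and it assumes the file lives on an NFS mount where fcntl() locks are passed on to the lock daemon. It exercises just one lock pattern (an exclusive whole-file lock), not any specific bug.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Take and release an exclusive lock on the whole file once.
 * Returns 0 on success, or the errno value of the call that failed.
 * (Hypothetical helper, not part of rpc.lockd.)
 */
static int lock_unlock_once(const char *path)
{
    struct flock fl;
    int fd, err = 0;

    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return errno;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;      /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "to end of file" */

    if (fcntl(fd, F_SETLK, &fl) == -1) {
        err = errno;          /* non-blocking request was refused */
    } else {
        fl.l_type = F_UNLCK;
        if (fcntl(fd, F_SETLK, &fl) == -1)
            err = errno;
    }
    close(fd);
    return err;
}

/* Repeat the cycle and return how many iterations failed. */
static int run_lock_cycles(const char *path, int iterations)
{
    int i, failures = 0;

    for (i = 0; i < iterations; i++) {
        int err = lock_unlock_once(path);
        if (err != 0) {
            fprintf(stderr, "iteration %d: %s\n", i, strerror(err));
            failures++;
        }
    }
    return failures;
}
```

Calling run_lock_cycles() with a path on the NFS mount and a large iteration count gives one fixed workload to start removing things from. Since lock managers are timing sensitive, a run that shows no failures does not prove much by itself.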

I made a patch for rpc.lockd that should make it somewhat easier to
obtain debug information. The patch is available at

No functional changes. The patch only adds a dump of the currently held
locks (as perceived by lockd) when SIGUSR1 is received. You need to
specify debug level 2 or 3 to obtain the dump.
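
To illustrate the general shape of such a signal-triggered dump, here is a minimal sketch. It is not the code from the patch: struct held_lock, the function names, and the output format are all made up for illustration, and the real lockd data structures differ. It also assumes the handler only sets a flag and the dump itself is written later from the main loop, which is one common way to keep the handler async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lock record; the real rpc.lockd structures differ. */
struct held_lock {
    const char *client;    /* host name of the lock holder */
    long long   start;     /* first byte of the locked range */
    long long   len;       /* length of the range, 0 = to end of file */
    int         exclusive; /* 1 = write lock, 0 = read lock */
};

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t dump_requested = 0;

/* Handler does the minimum: record that a dump was asked for. */
static void on_sigusr1(int signo)
{
    (void)signo;
    dump_requested = 1;
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on failure. */
static int install_dump_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Call from the main loop.  If a dump was requested and the debug
 * level is 2 or higher, write the lock list to 'out'.
 * Returns the number of locks written (0 if nothing was dumped).
 */
static int maybe_dump_locks(FILE *out, int debug_level,
                            const struct held_lock *locks, int nlocks)
{
    int i;

    assert(out != NULL);
    if (!dump_requested)
        return 0;
    dump_requested = 0;        /* the request is consumed either way */
    if (debug_level < 2)
        return 0;

    fprintf(out, "held locks: %d\n", nlocks);
    for (i = 0; i < nlocks; i++)
        fprintf(out, "  client=%s start=%lld len=%lld %s\n",
                locks[i].client, locks[i].start, locks[i].len,
                locks[i].exclusive ? "exclusive" : "shared");
    return nlocks;
}
```

In this sketch a signal that arrives while the debug level is below 2 is simply dropped, so nothing is written at lower debug levels.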

Also, both lockd processes now put identification information in the
proctitle (srv and kern). SIGUSR1 should be sent to the srv process.

