Michael Tratz wrote:
>
> On Aug 15, 2013, at 2:39 PM, Rick Macklem <rmack...@uoguelph.ca> wrote:
>
> > Michael Tratz wrote:
> >>
> >> On Jul 27, 2013, at 11:25 PM, Konstantin Belousov
> >> <kostik...@gmail.com> wrote:
> >>
> >>> On Sat, Jul 27, 2013 at 03:13:05PM -0700, Michael Tratz wrote:
> >>>> Let's assume the pid which started the deadlock is 14001 (it will
> >>>> be a different pid when we get the results, because the machine
> >>>> has been restarted)
> >>>>
> >>>> I type:
> >>>>
> >>>> show proc 14001
> >>>>
> >>>> I get the thread numbers from that output and type:
> >>>>
> >>>> show thread xxxxx
> >>>>
> >>>> for each one.
> >>>>
> >>>> And a trace for each thread with the command?
> >>>>
> >>>> tr xxxx
> >>>>
> >>>> Anything else I should try to get or do? Or is that not the data
> >>>> at all you are looking for?
> >>>>
> >>> Yes, everything else which is listed in the 'debugging deadlocks'
> >>> page must be provided, otherwise the deadlock cannot be tracked.
> >>>
> >>> The investigator should be able to see the whole deadlock chain
> >>> (loop) to make any useful advance.
> >>
> >> Ok, I have made some excellent progress in debugging the NFS
> >> deadlock.
> >>
> >> Rick! You are a genius. :-) You found the right commit. r250907
> >> (dated May 22) is definitely the problem.
> >>
> >> Here is how I did the testing: One machine received a kernel before
> >> r250907, the second machine received a kernel after r250907. Sure
> >> enough within a few hours the machine with r250907 went into the
> >> usual deadlock state. The machine without that commit kept on
> >> working fine. Then I went back to the latest revision (r253726), but
> >> leaving r250907 out. The machines have been running happy and rock
> >> solid without any deadlocks. I have expanded the testing to 3
> >> machines now and no reports of any issues.
> >>
> >> I guess now Konstantin has to figure out why that commit is causing
> >> the deadlock. Lovely! :-) I will get that information as soon as
> >> possible. I'm a little behind with normal workload, but I expect to
> >> have the data by Tuesday evening or Wednesday.
> >>
> > Have you been able to pass the debugging info on to Kostik?
> >
> > It would be really nice to get this fixed for FreeBSD 9.2.
> >
> > Thanks for your help with this, rick
>
> Sorry Rick, I wasn't able to get you guys that info quickly enough. I
> thought I would have enough time, before my own wedding and
> honeymoon came along, but everything went a little crazy and
> stressful. I didn't think it would be this nuts. :-)
>
> I'm caught up with everything and from what I can see in the
> discussions, we now know what the problem is.
>
> I can report that the machines which I have had without r250907 have
> been running without any problems for 27+ days.
>
> If you need me to test any new patches, please let me know. If I
> should test with the partial merge of r253927 I'll be happy to do
> so.
>
It's up to you, but you might want to wait until the other tester
(J. David?) reports back on success/failure.

Thanks for your help with this, rick

> Thanks,
>
> Michael

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"