Thanks. I'll be sure to follow your suggestions next time this happens. I have a naive question/suggestion, though. From browsing past discussions of ZFS problems, I see that it has been suggested a number of times that problems which appear to originate in ZFS in fact come from lower layers, in particular from driver bugs or from disks that are in the process of failing. Such problems can apparently take a lot of time to troubleshoot. I accept that ZFS is right to leave timeout handling to the lower layers, but it seems to me that the ZFS layer would be a great place to warn the user about such issues and to provide some information for troubleshooting them.
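The kind of warning described above could look roughly like the following sketch. It is a userland simulation meant only to make the idea concrete, not ZFS or kernel code. The request table and the functions io_start, io_done and io_check_stuck are hypothetical, and the 10-second threshold is an arbitrary example value. Each outstanding request is recorded with its disk and issue time, and a periodic scan reports any request that has been pending longer than the threshold:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch, not ZFS code: a fixed-size table of outstanding
 * I/O requests, each tagged with its target disk and issue time. */
#define MAX_PENDING 64

struct pending_io {
    int in_use;        /* slot currently holds an outstanding request */
    char disk[16];     /* device name, e.g. "da0" */
    double issued_at;  /* issue time in seconds */
};

static struct pending_io pending[MAX_PENDING];

/* Record a newly issued request; returns its slot id, or -1 if the table is full. */
int io_start(const char *disk, double now)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i].in_use = 1;
            snprintf(pending[i].disk, sizeof pending[i].disk, "%s", disk);
            pending[i].issued_at = now;
            return i;
        }
    }
    return -1;
}

/* Mark a request as finished, whether it completed or errored out. */
void io_done(int id)
{
    assert(id >= 0 && id < MAX_PENDING);
    pending[id].in_use = 0;
}

/* Print a warning for every request outstanding longer than `threshold`
 * seconds, naming the disk; returns the number of such requests. */
int io_check_stuck(double now, double threshold)
{
    int stuck = 0;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && now - pending[i].issued_at > threshold) {
            fprintf(stderr, "warning: I/O to %s outstanding for %.0f s\n",
                    pending[i].disk, now - pending[i].issued_at);
            stuck++;
        }
    }
    return stuck;
}
```

A real implementation would presumably call io_check_stuck from a periodic timer using a monotonic clock and would need locking around the table. Both are left out here, and the current time is passed in explicitly so the logic can be exercised deterministically.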
For example, if some I/O requests get lost because of a buggy driver, the driver itself might not be the best place to identify those lost requests. But perhaps ZFS could have a compile-time option that prints a warning when it has been waiting for a particular request to come back for more than, say, 10 seconds, and that identifies the problematic disk. I'm sure there would be cases where these warnings would be unwarranted, and I imagine that the code needed to provide them would have a performance cost, so one certainly would not want it active by default. But someone in my position could recompile the kernel with such a ZFS debugging option turned on to figure out the problem.

I understand that the ZFS code comes from upstream, and that you probably want to keep FreeBSD-specific changes minimal. If that's a big problem, even a patch provided "as is" that does not make it into the FreeBSD code base might be extremely useful. I wish I could help write something like that, but I know very little about the kernel or ZFS. I would certainly be willing to help with testing.

Just my 2 cents. Thanks for the help,

Olivier

On Thu, Dec 13, 2012 at 2:36 AM, Andriy Gapon <a...@freebsd.org> wrote:
>
> I decided to share here the comment that I made in private, so that more
> people could potentially benefit from it.
>
> on 03/12/2012 20:41 olivier olivier said the following:
> > Hi all
> > After upgrading from 9.0-RELEASE to 9.1-PRERELEASE #0 r243679 I'm having
> > severe problems with NFS sharing of a ZFS volume. nfsd appears to hang at
> > random times (between once every couple hours to once every two days)
> > while accessing a ZFS volume, and the only way I have found of resolving
> > the problem is to reboot. The server console is sometimes still responsive
> > during the nfsd hang, and I can read and write files to the same ZFS
> > volume while nfsd is hung.
> > I am pasting below the output of procstat -kk on nfsd,
> > and details of my pool (nfsstat on the server gets hung when the problem
> > has started occurring, and does not produce any output). The pool is v28
> > and was created from a bunch of volumes attached over Fibre Channel using
> > the mpt driver. My system has a Supermicro board and 4 AMD Opteron 6274
> > CPUs.
> >
> > I did not experience any nfsd hangs with 9.0-RELEASE (same machine,
> > essentially same configuration, same usage pattern).
> >
> > I would greatly appreciate any help to resolve this problem!
>
>
> I've looked at the provided data and I do not see anything that implicates
> ZFS.
> My rules of thumb for ZFS hangs:
> - if there are threads in zio_wait
> - if you can confirm that they are indeed stuck there[*]
> - if there are no threads in zio_interrupt
>
> [*] you have to be sure that a thread just sits in zio_wait and doesn't make
> any forward progress, as opposed to a thread doing a lot of I/O and thus
> having a high probability of being seen in zio_wait.
>
> Then it is most likely that the problem is at the storage level.
> Most likely it is a bug in the storage controller driver which allowed an
> I/O request to get lost (instead of being errored out or timed out).
>
> `camcontrol tags <disk> -v` can be used to query the queue depth of each
> disk and determine the bad one.
>
> --
> Andriy Gapon
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable