I appear to be having the same problem on an HP MicroServer N54L. Messages:

/sbin/dhcpagent[95]: [ID 778557 daemon.warning] configure_v4_lease: no IP broadcast specified for bge0, making best guess
rootnex: [ID 349649 kern.info] iscsi0 at root
genunix: [ID 936769 kern.info] iscsi0 is /iscsi
pseudo: [ID 129642 kern.info] pseudo-device: dtrace0
genunix: [ID 936769 kern.info] dtrace0 is /pseudo/dtrace@0
klmmod: [ID 814159 kern.notice] NOTICE: Failed to connect to local statd (rpcerr=5)
/usr/lib/nfs/lockd[473]: [ID 491006 daemon.error] Cannot establish NLM service over <file desc. 9, protocol udp> : I/O error. Exiting
svc.startd[10]: [ID 652011 daemon.warning] svc:/network/nfs/nlockmgr:default: Method "/lib/svc/method/nlockmgr" failed with exit status 1.
klmmod: [ID 814159 kern.notice] NOTICE: Failed to connect to local statd (rpcerr=5)
/usr/lib/nfs/lockd[534]: [ID 491006 daemon.error] Cannot establish NLM service over <file desc. 9, protocol udp> : I/O error. Exiting
svc.startd[10]: [ID 652011 daemon.warning] svc:/network/nfs/nlockmgr:default: Method "/lib/svc/method/nlockmgr" failed with exit status 1.
pseudo: [ID 129642 kern.info] pseudo-device: pool0
genunix: [ID 936769 kern.info] pool0 is /pseudo/pool@0
klmmod: [ID 814159 kern.notice] NOTICE: Failed to connect to local statd (rpcerr=5)
/usr/lib/nfs/lockd[537]: [ID 491006 daemon.error] Cannot establish NLM service over <file desc. 9, protocol udp> : I/O error. Exiting
svc.startd[10]: [ID 652011 daemon.warning] svc:/network/nfs/nlockmgr:default: Method "/lib/svc/method/nlockmgr" failed with exit status 1.
svc.startd[10]: [ID 748625 daemon.error] network/nfs/nlockmgr:default failed: transitioned to maintenance (see 'svcs -xv' for details)
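For what it's worth, here is what I plan to poke at next on this box to narrow it down. Treat it as a rough sketch rather than verified output; the log path in particular is just the usual SMF naming convention:

  # Why is nlockmgr unhappy, and what does it depend on?
  svcs -xv svc:/network/nfs/nlockmgr:default
  svcs -d svc:/network/nfs/nlockmgr:default

  # klmmod says it cannot reach the local statd (rpcerr=5), so check
  # that statd is registered with rpcbind and answering over UDP
  rpcinfo -p | grep status
  rpcinfo -T udp 127.0.0.1 status

  # Full log of the last failed start attempt (assuming the standard SMF log location)
  tail -50 /var/svc/log/network-nfs-nlockmgr:default.log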
Rebooting sometimes makes the nlockmgr problem go away, but sometimes it does not. If there is anything I can do diagnostics-wise, let me know; I have no clue. I have also sketched a manual-import test after the quoted thread below.

2014-10-10 21:09 GMT+02:00 Schweiss, Chip <c...@innovates.com>:

> Apparently something common in my OmniOS setup is triggering this. I have no idea what yet, and I'm feeling green at digging through this issue.
>
> On one of my VMs for doing script development, I exported the data pool planning to test importing it with a different cache location, and the problem immediately happened. Now I cannot get nlockmgr to start at all on this VM. I tried disabling all NFS services and re-enabling. Still failing with:
>
> /usr/lib/nfs/lockd[862]: [ID 491006 daemon.error] Cannot establish NLM service over <file desc. 9, protocol udp> : I/O error. Exiting
>
> root@ZFSsendTest1:/root# svcs -a|grep nfs
> disabled       13:47:05 svc:/network/nfs/log:default
> disabled       13:47:11 svc:/network/nfs/rquota:default
> disabled       13:55:05 svc:/network/nfs/server:default
> disabled       13:55:32 svc:/network/nfs/nlockmgr:default
> disabled       13:55:32 svc:/network/nfs/mapid:default
> disabled       13:55:32 svc:/network/nfs/status:default
> disabled       13:55:32 svc:/network/nfs/client:default
> disabled       13:55:57 svc:/network/nfs/cbd:default
> root@ZFSsendTest1:/root# svcadm enable svc:/network/nfs/status:default svc:/network/nfs/cbd:default svc:/network/nfs/mapid:default svc:/network/nfs/server:default svc:/network/nfs/nlockmgr:default
> root@ZFSsendTest1:/root# svcs -a|grep nfs
> disabled       13:47:05 svc:/network/nfs/log:default
> disabled       13:47:11 svc:/network/nfs/rquota:default
> disabled       13:55:32 svc:/network/nfs/client:default
> online         13:56:56 svc:/network/nfs/status:default
> online         13:56:56 svc:/network/nfs/cbd:default
> online         13:56:56 svc:/network/nfs/mapid:default
> offline        13:56:56 svc:/network/nfs/server:default
> offline*       13:56:56 svc:/network/nfs/nlockmgr:default
> root@ZFSsendTest1:/root# svcs -a|grep nfs
> disabled       13:47:05 svc:/network/nfs/log:default
> disabled       13:47:11 svc:/network/nfs/rquota:default
> disabled       13:55:32 svc:/network/nfs/client:default
> online         13:56:56 svc:/network/nfs/status:default
> online         13:56:56 svc:/network/nfs/cbd:default
> online         13:56:56 svc:/network/nfs/mapid:default
> offline        13:56:56 svc:/network/nfs/server:default
> maintenance    13:58:11 svc:/network/nfs/nlockmgr:default
>
> This VM has never had RSF-1 on it, so that definitely isn't the trigger. This VM has never exhibited this problem before today. It has been rebooted many times.
>
> I wonder if the problem is triggered by exporting a pool with NFS exports that have active client connections. That is always the case on my production systems. This VM has one NFS client that was connected when I exported the pool.
>
> Now nlockmgr dies and goes to maintenance mode regardless of whether I import the data pool or not.
>
> Any advice on where to dig for better diagnosis of this would be helpful. If any developers would like to get access to this VM, I'd be happy to arrange that too.
>
> -Chip
>
> On Fri, Oct 10, 2014 at 9:26 AM, Richard Elling <richard.ell...@richardelling.com> wrote:
>
>> On Oct 10, 2014, at 6:15 AM, "Schweiss, Chip" <c...@innovates.com> wrote:
>>
>> On Thu, Oct 9, 2014 at 9:54 PM, Dan McDonald <dan...@omniti.com> wrote:
>>
>>> On Oct 9, 2014, at 10:23 PM, Schweiss, Chip <c...@innovates.com> wrote:
>>>
>>> > Just tried my 2nd system. r151010 nlockmgr starts after clearing maintenance mode. r151012 it will not start at all.
>>> > nfs/status was enabled and online.
>>> >
>>> > The commonality I see on the two systems I have tried is that they are both part of an HA cluster. So they don't import the pool at boot; RSF-1 imports it with the cache mapped to a different location.
>>>
>>> That could be something HA Inc. needs to further test. We don't directly support RSF-1, after all.
>>
>> There really isn't anything different from an auto-imported pool. I'm suspecting that using an alternate cache location may be triggering something else to go wrong in nlockmgr.
>>
>> no, these are totally separate subsystems. RSF-1 imports the pool. NFS sharing is started by the zpool command, in userland, via sharing after the dataset is mounted. You can do the same procedure manually... no magic pixie dust needed.
>>
>> Here's the command RSF-1 uses to import the pool:
>> zpool import -c /opt/HAC/RSF-1/etc/volume-cache/nrgpool.cache -o cachefile=/opt/HAC/RSF-1/etc/volume-cache/nrgpool.cache-live -o failmode=panic nrgpool
>>
>> After the pool import it puts the IP addresses back and is done. That happens in less than 1 second.
>>
>> In the meantime the NFS services auto-start and nlockmgr starts spinning.
>>
>> Perhaps share doesn't properly start all of the services? Does it work ok if you manually "svcadm enable" all of the NFS services?
>>
>> -- richard
>>
>>> > nlockmgr is becoming a real show stopper.
>>>
>>> svcadm disable nlockmgr nfs/status
>>> svcadm enable nfs/status
>>> svcadm enable nlockmgr
>>>
>>> You may wish to discuss this on illumos as well; I'm not sure who else is seeing this, save me one time, and you seemingly a lot of times.
>>
>> I did that this time, no joy. Today I'm working on a virtual setup with HA to see if I can get this reproduced on r151012.
>>
>> I thought this nlockmgr problem was related to lots of NFS exports until I ran into this on my SSD pool. It used to be able to fail over in about 3-5 seconds. Now nlockmgr sits in a spinning state for a few minutes and fails every time. A clear of the maintenance mode brings it back nearly instantly. This is on r151010. On r151012 it fails every time.
>>
>> Hopefully I can reproduce it and I'll start a new thread copying illumos too.
>>
>> -Chip
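As mentioned above, here is the manual-import test I have in mind, following Richard's point that the RSF-1 import is just an ordinary import with an alternate cachefile. The pool name and cachefile path below are made up for illustration, and I have not run this yet:

  # export a pool that is being shared over NFS, as a failover would
  zpool export testpool

  # re-import it with a non-default cachefile, similar to what RSF-1 does
  zpool import -o cachefile=/var/tmp/testpool.cache-live testpool

  # then watch whether nlockmgr comes back or drops into maintenance
  svcs -xv svc:/network/nfs/nlockmgr:default

If that alone pushes nlockmgr into maintenance on a plain box with no RSF-1, it would point at the export/import sequence itself rather than anything cluster-specific.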