Maybe there is some bug in illumos' new NLM implementation. We need to know
more to confirm.

Note: NexentaStor 4 uses the same new NLM implementation as recent illumos
does.
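One way to get more data would be to watch the server's side of the lock
callback while the problem is reproduced, e.g. on the NFS server (port 111 is
rpcbind, which the server has to reach on the client before it can grant the
lock; the capture file name below is arbitrary):

    snoop -o rpcbind.out port 111
    snoop -i rpcbind.out

If the capture shows calls leaving the server with no replies coming back from
the client, that points at filtering on the client side rather than at the NLM
code itself.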
On Wed, Jan 28, 2015 at 10:58:05AM -0800, Joe Little via illumos-discuss wrote:
> Two of the affected hosts had iptables blocking the return traffic, it
> appears. The third host does not. Investigating further. The behavior,
> though, changed going from the old OpenSolaris-based NLM and NetApps to the
> newer Illumos. Those blocks have been around for quite some time.
>
>
> On Wed, Jan 28, 2015 at 10:52 AM, Marcel Telka <[email protected]> wrote:
>
> > Your NFS server is trying to connect to the rpcbind on the NFS client
> > machine and this fails (or times out, to be precise). There might be
> > various reasons for that. One might be that the NFS/NLM client didn't
> > pass the proper client name in the NLM lock request. You could confirm
> > that by running the following dtrace oneliner:
> >
> >     dtrace -n 'nlm_host_findcreate:entry {printf("NLM client: %s\n", stringof(arg1))}'
> >
> > and try to reproduce again.
> >
> > The other reason might be that you have blocked outgoing communication
> > from the NFS server (to the NFS client), or whatever.
> >
> >
> > HTH.
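(A quick way to test that reverse path from the NFS server is rpcinfo.
Assuming 172.27.72.26, the address issuing the LOCK4 calls in the snoop output
further down, is the Linux client, something like:

    rpcinfo -p 172.27.72.26
    rpcinfo -T tcp 172.27.72.26 nlockmgr

should answer promptly; a hang or timeout here would mean the server cannot
reach the client's rpcbind, and the NLM lock request will eventually be
denied.)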
> > On Wed, Jan 28, 2015 at 10:29:53AM -0800, Joe Little wrote:
> > > Just bumped up the threads from 80 to 1024 (didn't help). Here are your
> > > details:
> > >
> > > root@miele:/root# echo "::svc_pool nlm" | mdb -k
> > > mdb: failed to add kvm_pte_chain walker: walk name already in use
> > > mdb: failed to add kvm_rmap_desc walker: walk name already in use
> > > mdb: failed to add kvm_mmu_page_header walker: walk name already in use
> > > mdb: failed to add kvm_pte_chain walker: walk name already in use
> > > mdb: failed to add kvm_rmap_desc walker: walk name already in use
> > > mdb: failed to add kvm_mmu_page_header walker: walk name already in use
> > > SVCPOOL = ffffff07fc281aa8 -> POOL ID = NLM(2)
> > > Non detached threads = 1
> > > Detached threads = 0
> > > Max threads = 1024
> > > `redline' = 1
> > > Reserved threads = 0
> > > Thread lock = mutex not held
> > > Asleep threads = 0
> > > Request lock = mutex not held
> > > Pending requests = 0
> > > Walking threads = 0
> > > Max requests from xprt = 8
> > > Stack size for svc_run = 0
> > > Creator lock = mutex not held
> > > No of Master xprt's = 4
> > > rwlock for the mxprtlist= owner 0
> > > master xprt list ptr = ffffff079cbc3800
> > >
> > > root@miele:/root# echo "::stacks -m klmmod" | mdb -k
> > > mdb: failed to add kvm_pte_chain walker: walk name already in use
> > > mdb: failed to add kvm_rmap_desc walker: walk name already in use
> > > mdb: failed to add kvm_mmu_page_header walker: walk name already in use
> > > mdb: failed to add kvm_pte_chain walker: walk name already in use
> > > mdb: failed to add kvm_rmap_desc walker: walk name already in use
> > > mdb: failed to add kvm_mmu_page_header walker: walk name already in use
> > > THREAD STATE SOBJ COUNT
> > > ffffff09adf52880 SLEEP CV 1
> > >   swtch+0x141
> > >   cv_timedwait_hires+0xec
> > >   cv_reltimedwait+0x51
> > >   waitforack+0x5c
> > >   connmgr_connect+0x131
> > >   connmgr_wrapconnect+0x138
> > >   connmgr_get+0x9dc
> > >   connmgr_wrapget+0x63
> > >   clnt_cots_kcallit+0x18f
> > >   rpcbind_getaddr+0x245
> > >   update_host_rpcbinding+0x4f
> > >   nlm_host_get_rpc+0x6d
> > >   nlm_do_lock+0x10d
> > >   nlm4_lock_4_svc+0x2a
> > >   nlm_dispatch+0xe6
> > >   nlm_prog_4+0x34
> > >   svc_getreq+0x1c1
> > >   svc_run+0x146
> > >   svc_do_run+0x8e
> > >   nfssys+0xf1
> > >   _sys_sysenter_post_swapgs+0x149
> > >
> > > ffffff002e1fdc40 SLEEP CV 1
> > >   swtch+0x141
> > >   cv_timedwait_hires+0xec
> > >   cv_timedwait+0x5c
> > >   nlm_gc+0x54
> > >   thread_start+8
> > >
> > > The file server is using VLANs, and from the snoop result it seems to be
> > > responding with NFS4 locks for a v3 request?
> > >
> > > root@miele:/root# snoop -r -i snoop.out
> > >   1   0.00000 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0
> > >   2   9.99901 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0 (retransmit)
> > >   3  19.99987 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0 (retransmit)
> > >   4  15.00451 VLAN#3812: 172.27.72.15 -> 172.27.72.26 NLM R LOCK4 OH=3712 denied (no locks)
> > >   5   1.70920 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0
> > >   6   9.99927 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0 (retransmit)
> > >   7  20.00018 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0 (retransmit)
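(Note that LOCK4 here is NLM protocol version 4, which is the locking protocol
NFSv3 clients use, so the server is speaking the right protocol. The
suspicious part is the long delay before the "denied (no locks)" reply, which
is consistent with the rpcbind connect attempt sleeping in connmgr_connect in
the stack above. A dtrace sketch to watch that failure as it happens, assuming
nlm_host_get_rpc returns an error code that is nonzero on failure, could be:

    dtrace -n 'nlm_host_get_rpc:return /arg1 != 0/ {printf("nlm_host_get_rpc failed: %d\n", arg1);}'

run alongside the nlm_host_findcreate one-liner quoted above.)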
> > > On Wed, Jan 28, 2015 at 10:22 AM, Marcel Telka <[email protected]> wrote:
> > >
> > > > Please show me the output from the following commands on the affected
> > > > machine:
> > > >
> > > >     echo "::svc_pool nlm" | mdb -k
> > > >     echo "::stacks -m klmmod" | mdb -k
> > > >
> > > > Then run the following command:
> > > >
> > > >     snoop -o snoop.out rpc nlockmgr
> > > >
> > > > Then reproduce the problem (to see the ENOLCK error) and then Ctrl+C
> > > > the snoop. Make sure there is something in the snoop.out file and send
> > > > the file to me.
> > > >
> > > > Thanks.
> > > >
> > > > On Wed, Jan 28, 2015 at 10:06:39AM -0800, Joe Little via illumos-discuss wrote:
> > > > > Forwarded this as requested by OmniTI
> > > > >
> > > > > ---------- Forwarded message ----------
> > > > > From: Joe Little <[email protected]>
> > > > > Date: Wed, Jan 28, 2015 at 8:49 AM
> > > > > Subject: NFS v3 locking broken in latest OmniOS r151012 and updates
> > > > > To: [email protected]
> > > > >
> > > > > I recently switched one file server from Nexenta 3 and then Nexenta 4
> > > > > Community (still uses closed NLM I believe) to OmniOS r151012.
> > > > >
> > > > > Immediately, users started to complain from various Linux clients
> > > > > that locking was failing. Most of those clients explicitly set their
> > > > > NFS version to 3. I finally isolated that the locking does not fail
> > > > > on NFS v4 and have worked on transitioning where possible. But
> > > > > presently, no NFS v3 client can successfully lock against the OmniOS
> > > > > NFS v3 locking service. I've confirmed that the locking service is
> > > > > running and is present using rpcinfo, matching one for one the
> > > > > services from previous OpenSolaris and Illumos variants. One example
> > > > > from a user:
> > > > >
> > > > > $ strace /bin/tcsh
> > > > >
> > > > > [...]
> > > > >
> > > > > open("/home/REDACTED/.history", O_RDWR|O_CREAT, 0600) = 0
> > > > > dup(0) = 1
> > > > > dup(1) = 2
> > > > > dup(2) = 3
> > > > > dup(3) = 4
> > > > > dup(4) = 5
> > > > > dup(5) = 6
> > > > > close(5) = 0
> > > > > close(4) = 0
> > > > > close(3) = 0
> > > > > close(2) = 0
> > > > > close(1) = 0
> > > > > close(0) = 0
> > > > > fcntl(6, F_SETFD, FD_CLOEXEC) = 0
> > > > > fcntl(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0})
> > > > >
> > > > > HERE fcntl hangs for 1-2 min and finally returns with "-1 ENOLCK (No
> > > > > locks available)"

--
+-------------------------------------------+
| Marcel Telka e-mail: [email protected] |
| homepage: http://telka.sk/ |
| jabber: [email protected] |
+-------------------------------------------+
