I'll chime back in here. Actually, all three affected hosts turn out to be NFS
clients running iptables with closed (restrictive) rulesets. Previously, with
Nexenta 3 and also with my NetApps, the clients' rpcbind ports did not need to
be reachable for v3 locking to work. The new behavior is limited to my
illumos-based NFS server (OmniOS). I don't think I had Nexenta 4 running on
this machine long enough to isolate its behavior; I had to switch over to
OmniOS to stabilize one specific aspect for production, and perhaps this
behavior exists in Nexenta 4 as well.
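
For anyone else who hits this: the common thread seems to be that the OmniOS
server needs to reach rpcbind (and the lockd/statd ports it advertises) on
each client. A minimal sketch of the client-side iptables rules, using the
server address from the snoop below (172.27.72.15) as an example; treat this
as a starting point, since I haven't narrowed it down to a verified minimal
rule set yet:

    # Let the NFS server query the client's rpcbind so the server-side NLM
    # can look up the client's lockd/statd registrations.
    iptables -I INPUT -p tcp -s 172.27.72.15 --dport 111 -j ACCEPT
    iptables -I INPUT -p udp -s 172.27.72.15 --dport 111 -j ACCEPT
    # lockd/statd sit on dynamic ports unless the client pins them, so either
    # pin them (distro-specific) and open those ports as well, or allow the
    # server host more broadly.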

My other Nexenta 4 nodes don't tend to have much direct user interaction
(and thus file locking).
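
If it helps anyone else triage, a quick check from the OmniOS box shows
whether a given client is affected; the hostname here is just a placeholder:

    # If this hangs or times out, the server-side NLM will hit the same wall
    # when it tries to look up the client's lockd, and the client eventually
    # gets ENOLCK.
    rpcinfo -p client.example.com

That exercises roughly the same path as the rpcbind_getaddr call visible in
the mdb stack below.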


On Wed, Jan 28, 2015 at 11:02 AM, Marcel Telka <[email protected]> wrote:

> Maybe there is some bug in illumos' new NLM implementation.  We need to know
> more to confirm.
>
> Note: NexentaStor 4 uses the same new NLM implementation as recent illumos
> does.
>
> On Wed, Jan 28, 2015 at 10:58:05AM -0800, Joe Little via illumos-discuss wrote:
> > Two of the affected hosts had iptables blocking the return traffic, it
> > appears. The third host does not. Investigating further. The behavior,
> > though, changed between the old OpenSolaris-based NLM (and the NetApps) and
> > the newer illumos one. Those blocks have been around for quite some time.
> >
> >
> > On Wed, Jan 28, 2015 at 10:52 AM, Marcel Telka <[email protected]> wrote:
> >
> > > Your NFS server is trying to connect to the rpcbind on the NFS client
> > > machine and this fails (or times out, to be precise).  There might be
> > > various reasons for that. One might be that the NFS/NLM client didn't
> > > pass the proper client name in the NLM lock request.  You could confirm
> > > that by running the following dtrace one-liner:
> > >
> > > dtrace -n 'nlm_host_findcreate:entry {printf("NLM client: %s\n", stringof(arg1))}'
> > >
> > > and try to reproduce again.
> > >
> > > The other reason might be that you have blocked outgoing communication
> > > from the NFS server (to the NFS client), or whatever.
> > >
> > >
> > > HTH.
> > >
> > >
> > > On Wed, Jan 28, 2015 at 10:29:53AM -0800, Joe Little wrote:
> > > > Just bumped up the threads from 80 to 1024 (didn't help). Here are the
> > > > details:
> > > >
> > > > root@miele:/root# echo "::svc_pool nlm" | mdb -k
> > > > mdb: failed to add kvm_pte_chain walker: walk name already in use
> > > > mdb: failed to add kvm_rmap_desc walker: walk name already in use
> > > > mdb: failed to add kvm_mmu_page_header walker: walk name already in use
> > > > mdb: failed to add kvm_pte_chain walker: walk name already in use
> > > > mdb: failed to add kvm_rmap_desc walker: walk name already in use
> > > > mdb: failed to add kvm_mmu_page_header walker: walk name already in use
> > > >
> > > > SVCPOOL = ffffff07fc281aa8 -> POOL ID = NLM(2)
> > > > Non detached threads    = 1
> > > > Detached threads        = 0
> > > > Max threads             = 1024
> > > > `redline'               = 1
> > > > Reserved threads        = 0
> > > > Thread lock     = mutex not held
> > > > Asleep threads          = 0
> > > > Request lock    = mutex not held
> > > > Pending requests        = 0
> > > > Walking threads         = 0
> > > > Max requests from xprt  = 8
> > > > Stack size for svc_run  = 0
> > > > Creator lock    = mutex not held
> > > > No of Master xprt's     = 4
> > > > rwlock for the mxprtlist= owner 0
> > > > master xprt list ptr    = ffffff079cbc3800
> > > >
> > > > root@miele:/root# echo "::stacks -m klmmod" | mdb -k
> > > > mdb: failed to add kvm_pte_chain walker: walk name already in use
> > > > mdb: failed to add kvm_rmap_desc walker: walk name already in use
> > > > mdb: failed to add kvm_mmu_page_header walker: walk name already in use
> > > > mdb: failed to add kvm_pte_chain walker: walk name already in use
> > > > mdb: failed to add kvm_rmap_desc walker: walk name already in use
> > > > mdb: failed to add kvm_mmu_page_header walker: walk name already in use
> > > >
> > > > THREAD           STATE    SOBJ                COUNT
> > > > ffffff09adf52880 SLEEP    CV                      1
> > > >                  swtch+0x141
> > > >                  cv_timedwait_hires+0xec
> > > >                  cv_reltimedwait+0x51
> > > >                  waitforack+0x5c
> > > >                  connmgr_connect+0x131
> > > >                  connmgr_wrapconnect+0x138
> > > >                  connmgr_get+0x9dc
> > > >                  connmgr_wrapget+0x63
> > > >                  clnt_cots_kcallit+0x18f
> > > >                  rpcbind_getaddr+0x245
> > > >                  update_host_rpcbinding+0x4f
> > > >                  nlm_host_get_rpc+0x6d
> > > >                  nlm_do_lock+0x10d
> > > >                  nlm4_lock_4_svc+0x2a
> > > >                  nlm_dispatch+0xe6
> > > >                  nlm_prog_4+0x34
> > > >                  svc_getreq+0x1c1
> > > >                  svc_run+0x146
> > > >                  svc_do_run+0x8e
> > > >                  nfssys+0xf1
> > > >                  _sys_sysenter_post_swapgs+0x149
> > > >
> > > > ffffff002e1fdc40 SLEEP    CV                      1
> > > >                  swtch+0x141
> > > >                  cv_timedwait_hires+0xec
> > > >                  cv_timedwait+0x5c
> > > >                  nlm_gc+0x54
> > > >                  thread_start+8
> > > >
> > > > The file server is using VLANs, and from the snoop result it seems to be
> > > > responding with NFS4 locks for a v3 request?
> > > >
> > > >
> > > > root@miele:/root# snoop -r -i snoop.out
> > > >
> > > >   1   0.00000 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0
> > > >   2   9.99901 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0 (retransmit)
> > > >   3  19.99987 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0 (retransmit)
> > > >   4  15.00451 VLAN#3812: 172.27.72.15 -> 172.27.72.26 NLM R LOCK4 OH=3712 denied (no locks)
> > > >   5   1.70920 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0
> > > >   6   9.99927 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0 (retransmit)
> > > >   7  20.00018 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0 (retransmit)
> > > >
> > > > On Wed, Jan 28, 2015 at 10:22 AM, Marcel Telka <[email protected]> wrote:
> > > >
> > > > > Please show me the output from the following commands on the affected
> > > > > machine:
> > > > >
> > > > > echo "::svc_pool nlm" | mdb -k
> > > > > echo "::stacks -m klmmod" | mdb -k
> > > > >
> > > > > Then run the following command:
> > > > >
> > > > > snoop -o snoop.out rpc nlockmgr
> > > > >
> > > > > Then reproduce the problem (to see the ENOLCK error) and then Ctrl+C
> > > > > the snoop.  Make sure there is something in the snoop.out file and
> > > > > send the file to me.
> > > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > On Wed, Jan 28, 2015 at 10:06:39AM -0800, Joe Little via illumos-discuss wrote:
> > > > > > Forwarded this as requested by OmniTI
> > > > > >
> > > > > > ---------- Forwarded message ----------
> > > > > > From: Joe Little <[email protected]>
> > > > > > Date: Wed, Jan 28, 2015 at 8:49 AM
> > > > > > Subject: NFS v3 locking broken in latest OmniOS r151012 and updates
> > > > > > To: [email protected]
> > > > > >
> > > > > >
> > > > > > I recently switched one file server from Nexenta 3, and then Nexenta 4
> > > > > > Community (which still uses the closed NLM, I believe), to OmniOS
> > > > > > r151012.
> > > > > >
> > > > > > Immediately, users on various Linux clients started to complain that
> > > > > > locking was failing. Most of those clients explicitly set their NFS
> > > > > > version to 3. I finally isolated that locking does not fail with NFS
> > > > > > v4, and I have worked on transitioning where possible. But presently,
> > > > > > no NFS v3 client can successfully lock against the OmniOS NFS v3
> > > > > > locking service. I've confirmed that the locking service is running
> > > > > > and is present using rpcinfo, matching one for one the services from
> > > > > > previous OpenSolaris and illumos variants. One example from a user:
> > > > > >
> > > > > > $ strace /bin/tcsh
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > open("/home/REDACTED/.history", O_RDWR|O_CREAT, 0600) = 0
> > > > > > dup(0)                                  = 1
> > > > > > dup(1)                                  = 2
> > > > > > dup(2)                                  = 3
> > > > > > dup(3)                                  = 4
> > > > > > dup(4)                                  = 5
> > > > > > dup(5)                                  = 6
> > > > > > close(5)                                = 0
> > > > > > close(4)                                = 0
> > > > > > close(3)                                = 0
> > > > > > close(2)                                = 0
> > > > > > close(1)                                = 0
> > > > > > close(0)                                = 0
> > > > > > fcntl(6, F_SETFD, FD_CLOEXEC)           = 0
> > > > > > fcntl(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0})
> > > > > >
> > > > > > HERE fcntl hangs for 1-2 min and finally returns with "-1 ENOLCK (No
> > > > > > locks available)"
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > +-------------------------------------------+
> > > > > | Marcel Telka   e-mail:   [email protected]  |
> > > > > |                homepage: http://telka.sk/ |
> > > > > |                jabber:   [email protected] |
> > > > > +-------------------------------------------+
> > > > >
> > >
> > >
> > >
> > > --
> > > +-------------------------------------------+
> > > | Marcel Telka   e-mail:   [email protected]  |
> > > |                homepage: http://telka.sk/ |
> > > |                jabber:   [email protected] |
> > > +-------------------------------------------+
> > >
> >
> >
> >
>
> --
> +-------------------------------------------+
> | Marcel Telka   e-mail:   [email protected]  |
> |                homepage: http://telka.sk/ |
> |                jabber:   [email protected] |
> +-------------------------------------------+
>


