Just bumped the threads up from 80 to 1024 (it didn't help). Here are the details you asked for:
root@miele:/root# echo "::svc_pool nlm" | mdb -k
mdb: failed to add kvm_pte_chain walker: walk name already in use
mdb: failed to add kvm_rmap_desc walker: walk name already in use
mdb: failed to add kvm_mmu_page_header walker: walk name already in use
mdb: failed to add kvm_pte_chain walker: walk name already in use
mdb: failed to add kvm_rmap_desc walker: walk name already in use
mdb: failed to add kvm_mmu_page_header walker: walk name already in use
SVCPOOL = ffffff07fc281aa8 -> POOL ID = NLM(2)
Non detached threads = 1
Detached threads = 0
Max threads = 1024
`redline' = 1
Reserved threads = 0
Thread lock = mutex not held
Asleep threads = 0
Request lock = mutex not held
Pending requests = 0
Walking threads = 0
Max requests from xprt = 8
Stack size for svc_run = 0
Creator lock = mutex not held
No of Master xprt's = 4
rwlock for the mxprtlist= owner 0
master xprt list ptr = ffffff079cbc3800
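
(For reference, the bump was done along these lines with sharectl; treat this as a sketch, since the property name may differ across releases:)

root@miele:/root# sharectl set -p lockd_servers=1024 nfs
root@miele:/root# sharectl get -p lockd_servers nfs
lockd_servers=1024

The "Max threads = 1024" line above confirms the new limit took effect, and with one non-detached thread, zero asleep threads, and zero pending requests, the pool doesn't look thread-starved at all.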
root@miele:/root# echo "::stacks -m klmmod" | mdb -k
mdb: failed to add kvm_pte_chain walker: walk name already in use
mdb: failed to add kvm_rmap_desc walker: walk name already in use
mdb: failed to add kvm_mmu_page_header walker: walk name already in use
mdb: failed to add kvm_pte_chain walker: walk name already in use
mdb: failed to add kvm_rmap_desc walker: walk name already in use
mdb: failed to add kvm_mmu_page_header walker: walk name already in use
THREAD           STATE    SOBJ    COUNT
ffffff09adf52880 SLEEP    CV          1
                 swtch+0x141
                 cv_timedwait_hires+0xec
                 cv_reltimedwait+0x51
                 waitforack+0x5c
                 connmgr_connect+0x131
                 connmgr_wrapconnect+0x138
                 connmgr_get+0x9dc
                 connmgr_wrapget+0x63
                 clnt_cots_kcallit+0x18f
                 rpcbind_getaddr+0x245
                 update_host_rpcbinding+0x4f
                 nlm_host_get_rpc+0x6d
                 nlm_do_lock+0x10d
                 nlm4_lock_4_svc+0x2a
                 nlm_dispatch+0xe6
                 nlm_prog_4+0x34
                 svc_getreq+0x1c1
                 svc_run+0x146
                 svc_do_run+0x8e
                 nfssys+0xf1
                 _sys_sysenter_post_swapgs+0x149

ffffff002e1fdc40 SLEEP    CV          1
                 swtch+0x141
                 cv_timedwait_hires+0xec
                 cv_timedwait+0x5c
                 nlm_gc+0x54
                 thread_start+8
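
If I'm reading the first stack right, the NLM service thread is parked in connmgr_connect/waitforack, i.e. waiting on an RPC connection back to the client, which would line up with the long hang before ENOLCK. To dump that thread with frame arguments, something like this should work (::findstack is a stock mdb dcmd; the address is the THREAD value from the ::stacks output above):

root@miele:/root# echo "ffffff09adf52880::findstack -v" | mdb -k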
The file server is using VLANs, and from the snoop result it seems to be responding with NFS4 locks to a v3 request?
root@miele:/root# snoop -r -i snoop.out
  1   0.00000 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0
  2   9.99901 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0 (retransmit)
  3  19.99987 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0 (retransmit)
  4  15.00451 VLAN#3812: 172.27.72.15 -> 172.27.72.26 NLM R LOCK4 OH=3712 denied (no locks)
  5   1.70920 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0
  6   9.99927 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0 (retransmit)
  7  20.00018 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0 (retransmit)
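
To see the full RPC decode of the denied reply (packet 4), a verbose dump of just that packet should do it (-v is snoop's full protocol decode, -p picks a packet range by number):

root@miele:/root# snoop -r -i snoop.out -v -p 4,4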
On Wed, Jan 28, 2015 at 10:22 AM, Marcel Telka <[email protected]> wrote:
> Please show me the output from the following commands on the affected
> machine:
>
> echo "::svc_pool nlm" | mdb -k
> echo "::stacks -m klmmod" | mdb -k
>
> Then run the following command:
>
> snoop -o snoop.out rpc nlockmgr
>
> Then reproduce the problem (to see the ENOLCK error) and then Ctrl+C the
> snoop.
> Make sure there is something in the snoop.out file and send the file to me.
>
>
> Thanks.
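
[That's essentially what I did; roughly this sequence on the server, sketched from memory:]

root@miele:/root# snoop -o snoop.out rpc nlockmgr &
root@miele:/root# # ... reproduce the ENOLCK hang from a Linux client ...
root@miele:/root# kill %1    # stands in for Ctrl+C on a foreground snoop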
>
>
> On Wed, Jan 28, 2015 at 10:06:39AM -0800, Joe Little via illumos-discuss wrote:
> > Forwarded this as requested by OmniTI
> >
> > ---------- Forwarded message ----------
> > From: Joe Little <[email protected]>
> > Date: Wed, Jan 28, 2015 at 8:49 AM
> > Subject: NFS v3 locking broken in latest OmniOS r151012 and updates
> > To: [email protected]
> >
> >
> > I recently switched one file server from Nexenta 3 and then Nexenta 4
> > Community (still uses closed NLM I believe) to OmniOS r151012.
> >
> > Immediately, users on various Linux clients started to complain that
> > locking was failing. Most of those clients explicitly set their NFS
> > version to 3. I finally isolated that the locking does not fail on NFS
> > v4 and have worked on transitioning where possible. But presently, no
> > NFS v3 client can successfully lock against the OmniOS NFS v3 locking
> > service. I've confirmed that the locking service is running and is
> > present using rpcinfo, matching one for one the services from previous
> > OpenSolaris and illumos variants. One example from a user:
> > $ strace /bin/tcsh
> >
> > [...]
> >
> > open("/home/REDACTED/.history", O_RDWR|O_CREAT, 0600) = 0
> > dup(0) = 1
> > dup(1) = 2
> > dup(2) = 3
> > dup(3) = 4
> > dup(4) = 5
> > dup(5) = 6
> > close(5) = 0
> > close(4) = 0
> > close(3) = 0
> > close(2) = 0
> > close(1) = 0
> > close(0) = 0
> > fcntl(6, F_SETFD, FD_CLOEXEC) = 0
> > fcntl(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0})
> >
> > HERE fcntl hangs for 1-2 min and finally returns with "-1 ENOLCK (No
> > locks available)"
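
[Aside: to reproduce the hang without tcsh, any blocking POSIX lock from a Linux client should hit the same path; a hypothetical test, with a made-up mount path:]

$ python -c '
import fcntl
f = open("/mnt/omnios/locktest", "w")  # any file on the NFSv3 mount; path is an example
fcntl.lockf(f, fcntl.LOCK_EX)          # lockf() issues fcntl(F_SETLKW) under the hood
print("got lock")'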
>
> --
> +-------------------------------------------+
> | Marcel Telka e-mail: [email protected] |
> | homepage: http://telka.sk/ |
> | jabber: [email protected] |
> +-------------------------------------------+
>
[Attachment: snoop.out (binary data)]
