Just bumped the threads up from 80 to 1024 (it didn't help). Here are the details you asked for:
root@miele:/root# echo "::svc_pool nlm" | mdb -k
mdb: failed to add kvm_pte_chain walker: walk name already in use
mdb: failed to add kvm_rmap_desc walker: walk name already in use
mdb: failed to add kvm_mmu_page_header walker: walk name already in use
mdb: failed to add kvm_pte_chain walker: walk name already in use
mdb: failed to add kvm_rmap_desc walker: walk name already in use
mdb: failed to add kvm_mmu_page_header walker: walk name already in use
SVCPOOL = ffffff07fc281aa8 -> POOL ID = NLM(2)
Non detached threads = 1
Detached threads = 0
Max threads = 1024
`redline' = 1
Reserved threads = 0
Thread lock = mutex not held
Asleep threads = 0
Request lock = mutex not held
Pending requests = 0
Walking threads = 0
Max requests from xprt = 8
Stack size for svc_run = 0
Creator lock = mutex not held
No of Master xprt's = 4
rwlock for the mxprtlist= owner 0
master xprt list ptr = ffffff079cbc3800
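
(For reference, the bump was done along these lines with sharectl; treat this as a sketch, since the property name may differ across releases:)

root@miele:/root# sharectl set -p lockd_servers=1024 nfs
root@miele:/root# sharectl get -p lockd_servers nfs
lockd_servers=1024

The "Max threads = 1024" line above confirms the new limit took effect, and with one non-detached thread, zero asleep threads, and zero pending requests, the pool doesn't look thread-starved at all.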
root@miele:/root# echo "::stacks -m klmmod" | mdb -k
mdb: failed to add kvm_pte_chain walker: walk name already in use
mdb: failed to add kvm_rmap_desc walker: walk name already in use
mdb: failed to add kvm_mmu_page_header walker: walk name already in use
mdb: failed to add kvm_pte_chain walker: walk name already in use
mdb: failed to add kvm_rmap_desc walker: walk name already in use
mdb: failed to add kvm_mmu_page_header walker: walk name already in use
THREAD           STATE    SOBJ    COUNT
ffffff09adf52880 SLEEP    CV          1
                 swtch+0x141
                 cv_timedwait_hires+0xec
                 cv_reltimedwait+0x51
                 waitforack+0x5c
                 connmgr_connect+0x131
                 connmgr_wrapconnect+0x138
                 connmgr_get+0x9dc
                 connmgr_wrapget+0x63
                 clnt_cots_kcallit+0x18f
                 rpcbind_getaddr+0x245
                 update_host_rpcbinding+0x4f
                 nlm_host_get_rpc+0x6d
                 nlm_do_lock+0x10d
                 nlm4_lock_4_svc+0x2a
                 nlm_dispatch+0xe6
                 nlm_prog_4+0x34
                 svc_getreq+0x1c1
                 svc_run+0x146
                 svc_do_run+0x8e
                 nfssys+0xf1
                 _sys_sysenter_post_swapgs+0x149

ffffff002e1fdc40 SLEEP    CV          1
                 swtch+0x141
                 cv_timedwait_hires+0xec
                 cv_timedwait+0x5c
                 nlm_gc+0x54
                 thread_start+8
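
If I'm reading the first stack right, the NLM service thread is parked in connmgr_connect/waitforack, i.e. waiting on an RPC connection back to the client, which would line up with the long hang before ENOLCK. To dump that thread with frame arguments, something like this should work (::findstack is a stock mdb dcmd; the address is the THREAD value from the ::stacks output above):

root@miele:/root# echo "ffffff09adf52880::findstack -v" | mdb -k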
The file server is using VLANs, and from the snoop result it seems to be responding with NFS4 locks to a v3 request?
root@miele:/root# snoop -r -i snoop.out
  1   0.00000 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0
  2   9.99901 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0 (retransmit)
  3  19.99987 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3712 FH=97AA PID=2 Region=0:0 (retransmit)
  4  15.00451 VLAN#3812: 172.27.72.15 -> 172.27.72.26 NLM R LOCK4 OH=3712 denied (no locks)
  5   1.70920 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0
  6   9.99927 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0 (retransmit)
  7  20.00018 VLAN#3812: 172.27.72.26 -> 172.27.72.15 NLM C LOCK4 OH=3812 FH=97AA PID=3 Region=0:0 (retransmit)
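
To see the full RPC decode of the denied reply (packet 4), a verbose dump of just that packet should do it (-v is snoop's full protocol decode, -p picks a packet range by number):

root@miele:/root# snoop -r -i snoop.out -v -p 4,4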
On Wed, Jan 28, 2015 at 10:22 AM, Marcel Telka <[email protected]> wrote:
> Please show me the output from the following commands on the affected
> machine:
>
> echo "::svc_pool nlm" | mdb -k
> echo "::stacks -m klmmod" | mdb -k
>
> Then run the following command:
>
> snoop -o snoop.out rpc nlockmgr
>
> Then reproduce the problem (to see the ENOLCK error) and then Ctrl+C the
> snoop.
> Make sure there is something in the snoop.out file and send the file to me.
>
>
> Thanks.
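
[That's essentially what I did; roughly this sequence on the server, sketched from memory:]

root@miele:/root# snoop -o snoop.out rpc nlockmgr &
root@miele:/root# # ... reproduce the ENOLCK hang from a Linux client ...
root@miele:/root# kill %1    # stands in for Ctrl+C on a foreground snoop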
>
>
> On Wed, Jan 28, 2015 at 10:06:39AM -0800, Joe Little via illumos-discuss wrote:
> > Forwarded this as requested by OmniTI
> >
> > ---------- Forwarded message ----------
> > From: Joe Little <[email protected]>
> > Date: Wed, Jan 28, 2015 at 8:49 AM
> > Subject: NFS v3 locking broken in latest OmniOS r151012 and updates
> > To: [email protected]
> >
> >
> > I recently switched one file server from Nexenta 3 and then Nexenta 4
> > Community (still uses closed NLM I believe) to OmniOS r151012.
> >
> > Immediately, users on various Linux clients started to complain that
> > locking was failing. Most of those clients explicitly set their NFS
> > version to 3. I finally isolated that the locking does not fail on NFS
> > v4 and have worked on transitioning where possible. But presently, no
> > NFS v3 client can successfully lock against the OmniOS NFS v3 locking
> > service. I've confirmed that the locking service is running and is
> > present using rpcinfo, matching one for one the services from previous
> > OpenSolaris and illumos variants. One example from a user:
> > $ strace /bin/tcsh
> >
> > [...]
> >
> > open("/home/REDACTED/.history", O_RDWR|O_CREAT, 0600) = 0
> > dup(0) = 1
> > dup(1) = 2
> > dup(2) = 3
> > dup(3) = 4
> > dup(4) = 5
> > dup(5) = 6
> > close(5) = 0
> > close(4) = 0
> > close(3) = 0
> > close(2) = 0
> > close(1) = 0
> > close(0) = 0
> > fcntl(6, F_SETFD, FD_CLOEXEC) = 0
> > fcntl(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0})
> >
> > HERE fcntl hangs for 1-2 min and finally returns with "-1 ENOLCK (No
> > locks available)"
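
[Aside: to reproduce the hang without tcsh, any blocking POSIX lock from a Linux client should hit the same path; a hypothetical test, with a made-up mount path:]

$ python -c '
import fcntl
f = open("/mnt/omnios/locktest", "w")  # any file on the NFSv3 mount; path is an example
fcntl.lockf(f, fcntl.LOCK_EX)          # lockf() issues fcntl(F_SETLKW) under the hood
print("got lock")'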
>
> --
> +-------------------------------------------+
> | Marcel Telka e-mail: [email protected] |
> | homepage: http://telka.sk/ |
> | jabber: [email protected] |
> +-------------------------------------------+
>
[Attachment: snoop.out (binary data)]
