Chip, yes, these are the behaviors we see, and they do represent a delta vs
previous editions (OpenSolaris, NetApp, etc.). I didn't know about #1, which
fixed the primary problem, but the requirement for extra threads compared
to the norm is apparently addressing #2. We have had failed reboots due to
#3, where I had to clear out the sm state directory, and all my VMware
VMs went read-only because they lost their locks.


On Wed, Jan 28, 2015 at 11:58 AM, Schweiss, Chip <[email protected]> wrote:

> I've been through a lot of problems with this too and ultimately gave up on
> the NFSv3 lock manager in Illumos and switched everything to NFSv4.
>
> There are several problems.  I diagnosed them using tcpdump and Wireshark.
>
>    1. The lock manager uses different ports than the Linux firewall
>    expects for NFS, so the firewall ends up blocking its connections. The
>    only solution I found was to open all ports to the NFS server IP on the
>    Linux NFS client.
>    2. The lock manager holds on to locks longer than it should. It
>    appears to have a 30-second cycle for releasing locks. This causes all
>    locks to get used up really fast.
>    3. And of course there is the well-known problem of the lock manager not
>    starting because a client is no longer accessible.
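A minimal sketch of a narrower alternative to opening every port for #1, assuming the blocked traffic is the server reaching back to the Linux client's rpcbind (RPC program 100000), lock manager (nlockmgr, program 100021) and status monitor (status, program 100024): list the ports those services registered on the client by parsing `rpcinfo -p` output, then open only those to the NFS server IP. The function name `locking_ports` and the sample output are illustrative, not captured from a real system.

```python
from typing import Dict, Set, Tuple

# Well-known ONC RPC program numbers involved in NFSv3 locking.
PROGRAMS = {100000: "portmapper", 100021: "nlockmgr", 100024: "status"}

# Made-up `rpcinfo -p` output for illustration only.
SAMPLE = """\
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100024    1   udp  46127  status
    100024    1   tcp  51835  status
    100021    1   udp  37716  nlockmgr
    100021    4   tcp  39817  nlockmgr
"""


def locking_ports(rpcinfo_output: str) -> Dict[str, Set[Tuple[str, int]]]:
    """Map service name -> {(proto, port)} for the locking-related programs."""
    found: Dict[str, Set[Tuple[str, int]]] = {}
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows are: program vers proto port [service]; skip the header.
        if len(fields) < 4 or not fields[0].isdigit() or not fields[3].isdigit():
            continue
        name = PROGRAMS.get(int(fields[0]))
        if name is not None:
            found.setdefault(name, set()).add((fields[2], int(fields[3])))
    return found


if __name__ == "__main__":
    # In practice, feed it the text of `rpcinfo -p` run on the Linux client.
    for service, ports in sorted(locking_ports(SAMPLE).items()):
        print(service, sorted(ports))
```

Unless lockd and statd are configured to use fixed ports on the client, these numbers can change when the services restart, so firewall rules built from one snapshot may go stale.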
>
> The easiest trigger I've found is to run a bunch of csh processes that
> cause a logon. The .history file in the user's home directory gets locked
> many times and consumes all the locks. We saw this from our processing
> cluster immediately when the new lock manager was introduced.
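A rough reproduction sketch for this trigger, assuming each shell takes and releases a POSIX record lock on the shared file (the thread does not say exactly which locking calls csh makes). Against a file on an NFSv3 mount, each `lockf()` call goes through the NLM, and a refused request surfaces as an `OSError` (typically ENOLCK). The helper names `lock_once` and `hammer` and the worker/iteration counts are hypothetical.

```python
import fcntl
import os
import tempfile


def lock_once(path: str) -> None:
    """Open `path`, take an exclusive POSIX record lock, then release it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def hammer(path: str, workers: int = 16, locks_per_worker: int = 100) -> int:
    """Fork `workers` processes that each lock `path` repeatedly.

    Returns how many workers failed (e.g. a lock request refused with ENOLCK).
    """
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: exit 0 only if every lock/unlock succeeded.
            code = 1
            try:
                for _ in range(locks_per_worker):
                    lock_once(path)
                code = 0
            finally:
                os._exit(code)  # never return into the parent's code path
        pids.append(pid)

    failures = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            failures += 1
    return failures


if __name__ == "__main__":
    # Local demo; point the path at a file on the NFSv3 mount to exercise the NLM.
    with tempfile.TemporaryDirectory() as d:
        print("failed workers:", hammer(os.path.join(d, ".history")))
```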
>
> -Chip
>
> On Wed, Jan 28, 2015 at 1:19 PM, Youzhong Yang via illumos-discuss <
> [email protected]> wrote:
>
>> A few months ago, we were hit by an issue in the rpc module. When it
>> happens, lots of reserved ports are in the BOUND state, so nlockmgr will
>> refuse lock requests.
>>
>> If 'netstat -an | grep BOUND | wc -l' returns a large number, then it's
>> the same issue as we had.
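The same check can be scripted so it is easy to run periodically. A minimal sketch, assuming the illumos `netstat -an` layout where the local address is the first column and its port follows the last dot (e.g. `10.0.0.5.1021`); it reports which reserved ports (below 1024) are sitting in BOUND. The function name and the sample text are made up for illustration.

```python
from typing import List

# Made-up excerpt in the illumos `netstat -an` TCP layout.
SAMPLE = """\
TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
      *.111                *.*                0      0 128000      0 LISTEN
10.0.0.5.1021              *.*                0      0 128000      0 BOUND
10.0.0.5.1019              *.*                0      0 128000      0 BOUND
10.0.0.5.40123             *.*                0      0 128000      0 BOUND
"""


def bound_reserved_ports(netstat_output: str) -> List[int]:
    """Return the sorted local ports below 1024 whose sockets are in BOUND."""
    ports = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields or "BOUND" not in fields:
            continue
        # illumos prints host.port, so the port is after the last dot.
        _, _, port = fields[0].rpartition(".")
        if port.isdigit() and int(port) < 1024:
            ports.append(int(port))
    return sorted(ports)


if __name__ == "__main__":
    # In practice, feed it the text printed by `netstat -an`.
    print(bound_reserved_ports(SAMPLE))
```

Counting only reserved ports separates the symptom described here from ordinary BOUND sockets on high ports.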
>>
>>
>>
>> On Wed, Jan 28, 2015 at 2:09 PM, Marcel Telka via illumos-discuss <
>> [email protected]> wrote:
>>
>>> On Wed, Jan 28, 2015 at 02:07:20PM -0500, Dan McDonald wrote:
>>> >
>>> > > On Jan 28, 2015, at 2:02 PM, Marcel Telka via illumos-discuss <
>>> > > [email protected]> wrote:
>>> > >
>>> > > Note: NexentaStor 4 uses the same new NLM implementation as recent
>>> > > illumos does.
>>> >
>>> > Is there anything there that needs to be upstreamed?
>>>
>>> No. Nexenta's NLM is in sync with illumos (off the top of my head, but I
>>> believe I'm right).
>>>
>>> --
>>> +-------------------------------------------+
>>> | Marcel Telka   e-mail:   [email protected]  |
>>> |                homepage: http://telka.sk/ |
>>> |                jabber:   [email protected] |
>>> +-------------------------------------------+
>>>
>>>
>>> -------------------------------------------
>>> illumos-discuss
>>> Archives: https://www.listbox.com/member/archive/182180/=now
>>> RSS Feed:
>>> https://www.listbox.com/member/archive/rss/182180/26912960-6f721d7b
>>>
>>> Modify Your Subscription: https://www.listbox.com/member/?&;
>>> Powered by Listbox: http://www.listbox.com
>>>
>>
>
>
