Chip, yes, these are the behaviors we see, and they do represent a departure from previous implementations (OpenSolaris, NetApp, etc.). I didn't know about #1, which fixed the primary problem, but the need for more threads than normal is apparently compensating for #2. We have had failed reboots due to #3, where I had to clear out the sm state directory, and all my VMware VMs went read-only as they lost their locks.
On Wed, Jan 28, 2015 at 11:58 AM, Schweiss, Chip <[email protected]> wrote:

> I've been through a lot of problems with this too and ultimately gave up on
> the NFSv3 lock manager in Illumos and switched everything to NFSv4.
>
> There are several problems. I diagnosed them using tcpdump and Wireshark.
>
> 1. The locking manager uses different ports than the Linux firewall
>    expects for NFS and ends up blocking connections. The only solution I
>    found was to open all ports to the NFS server IP on the Linux NFS client.
> 2. The lock manager holds on to locks longer than it should. It
>    appears to have a 30-second cycle on releasing locks. This causes all
>    locks to get used up really fast.
> 3. And of course the well-known problem of the lock manager not
>    starting because a client is no longer accessible.
>
> The easiest trigger I've found is to run a bunch of csh processes that cause
> a logon. The .history file in the user's home directory will get locked
> lots of times and consume all the locks. We saw this from our processing
> cluster immediately when the new lock manager was introduced.
>
> -Chip
>
> On Wed, Jan 28, 2015 at 1:19 PM, Youzhong Yang via illumos-discuss <[email protected]> wrote:
>
>> A few months ago, we were hit by an issue in the rpc module. When it happens,
>> lots of reserved ports are in BOUND state, so nlockmgr will refuse lock
>> requests.
>>
>> If 'netstat -an | grep BOUND | wc -l' returns a large number, then it's
>> the same issue as we had.
>>
>> On Wed, Jan 28, 2015 at 2:09 PM, Marcel Telka via illumos-discuss <[email protected]> wrote:
>>
>>> On Wed, Jan 28, 2015 at 02:07:20PM -0500, Dan McDonald wrote:
>>> >
>>> > > On Jan 28, 2015, at 2:02 PM, Marcel Telka via illumos-discuss <[email protected]> wrote:
>>> > >
>>> > > Note: NexentaStor 4 uses the same new NLM implementation as recent
>>> > > illumos does.
>>> >
>>> > Is there anything there that needs to be upstreamed?
>>>
>>> No.
>>> Nexenta's NLM is in sync with illumos (off the top of my head, but I
>>> believe I'm right).
>>>
>>> --
>>> +-------------------------------------------+
>>> | Marcel Telka e-mail: [email protected] |
>>> | homepage: http://telka.sk/ |
>>> | jabber: [email protected] |
>>> +-------------------------------------------+
>>>
>>> -------------------------------------------
>>> illumos-discuss
>>> Archives: https://www.listbox.com/member/archive/182180/=now
>>> RSS Feed: https://www.listbox.com/member/archive/rss/182180/26912960-6f721d7b
>>> Modify Your Subscription: https://www.listbox.com/member/?&
>>> Powered by Listbox: http://www.listbox.com
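For anyone who wants to script Youzhong's check, here is a minimal Python sketch, assuming `netstat` is on the PATH of the NFS server. Like the original `netstat -an | grep BOUND | wc -l` pipeline, it counts every line containing the string BOUND. The thread doesn't say what counts as "a large number", so the script only reports the count rather than applying a threshold; `count_bound` is a hypothetical helper name, not part of any illumos tool.

```python
import shutil
import subprocess


def count_bound(netstat_output: str) -> int:
    """Count lines mentioning BOUND, mirroring `grep BOUND | wc -l`."""
    return sum(1 for line in netstat_output.splitlines() if "BOUND" in line)


def main() -> None:
    # Bail out politely if netstat is not available on this host.
    if shutil.which("netstat") is None:
        print("netstat not found on PATH")
        return
    # Same invocation as in the thread: all sockets, numeric addresses.
    out = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    ).stdout
    print(f"lines in BOUND state: {count_bound(out)}")


if __name__ == "__main__":
    main()
```

Keeping the parsing in a pure function makes it easy to run the same count against netstat output captured earlier, e.g. when comparing a healthy server with one where nlockmgr is refusing lock requests.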
