Hey Pranith,

>Maybe give clients a second (or more) chance to "refresh" their locks - in the
>sense that when a lock is about to be revoked, the client is notified and can
>then call for a refresh to confirm the validity of its lock holdings. This
>would require some maintenance work on the client to keep track of locked
>regions.

So we've thought about this as well; however, the approach I'd rather take is 
that we (long term) eliminate any need for multi-hour locking.  This would put 
the responsibility on the SHD/rebalance/bitrot daemons to take out another 
lock request once in a while, to signal to the POSIX locks translator that 
they are still there and alive.
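
Roughly, the shape would be something like the sketch below.  It's purely 
illustrative: it uses plain POSIX byte-range locks (fcntl) as a stand-in for 
GlusterFS entry/inode locks, the heal_loop/chunks names are made up, and the 
300-second interval is just an arbitrary value well below any revocation 
timeout - this is not the actual SHD/rebalance code.

    #include <fcntl.h>
    #include <unistd.h>

    /* Re-issue the byte-range lock we already hold.  The lock manager sees a
     * fresh request from the holder, which serves as a "still alive" signal. */
    static int refresh_lock(int fd)
    {
            struct flock fl = {
                    .l_type   = F_WRLCK,
                    .l_whence = SEEK_SET,
                    .l_start  = 0,
                    .l_len    = 0,      /* 0 == lock to end of file */
            };
            return fcntl(fd, F_SETLK, &fl);
    }

    /* Hypothetical long-running heal/rebalance loop: refresh between work units. */
    static void heal_loop(int fd, int chunks)
    {
            for (int i = 0; i < chunks; i++) {
                    /* ... do one unit of heal/rebalance work here ... */
                    if (refresh_lock(fd) != 0)
                            break;      /* lock lost/revoked: bail out, retry later */
                    sleep(300);         /* stay well below the revocation timeout */
            }
    }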

The world we want to be in is one where any lock held longer than N minutes is 
most _definitely_ a bug or a broken client, and should be revoked.  With this 
patch it's simply a heuristic used to make a judgement call.  In our 
experience, however, once you have 1000's of lock requests piled up it's only 
a matter of time before your entire cluster collapses; so the "correctness" of 
the locking behavior, or however much you might upset SHD/bitrot/rebalance, is 
a completely secondary concern next to the availability and stability of the 
cluster itself.

For folks who want to use this feature conservatively: don't revoke based on 
time, but rather on (lock request) queue depth; if you are in a situation like 
the one I've described above, it's almost certainly a bug or a situation not 
fully understood by developers.
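
Concretely - assuming these knobs are set through the standard "gluster volume 
set" CLI, and treating the numbers below purely as placeholders to tune per 
workload - a conservative setup would look something like:

    # revoke primarily on blocked-request queue depth
    gluster volume set <VOLNAME> features.locks-revocation-max-blocked 1000
    # keep the time-based trigger generous (the option listing says 0 disables it)
    gluster volume set <VOLNAME> features.locks-revocation-secs 1800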

Richard


________________________________
From: Venky Shankar [yknev.shan...@gmail.com]
Sent: Sunday, January 24, 2016 9:36 PM
To: Pranith Kumar Karampuri
Cc: Richard Wareing; Gluster Devel
Subject: Re: [Gluster-devel] Feature: Automagic lock-revocation for 
features/locks xlator (v3.7.x)


On Jan 25, 2016 08:12, "Pranith Kumar Karampuri" <pkara...@redhat.com> wrote:
>
>
>
> On 01/25/2016 02:17 AM, Richard Wareing wrote:
>>
>> Hello all,
>>
>> Just gave a talk at SCaLE 14x today and I mentioned our new locks revocation 
>> feature which has had a significant impact on our GFS cluster reliability.  
>> As such I wanted to share the patch with the community, so here's the 
>> bugzilla report:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1301401
>>
>> =====
>> Summary:
>> Mis-behaving brick clients (gNFSd, FUSE, gfAPI) can cause cluster 
>> instability and eventual complete unavailability due to failures in 
>> releasing entry/inode locks in a timely manner.
>>
>> Classic symptoms of this are increased brick (and/or gNFSd) memory usage 
>> due to the high number of (lock request) frames piling up in the processes.  
>> The failure mode results in bricks eventually slowing down to a crawl due 
>> to swapping, or OOMing due to complete memory exhaustion; during this 
>> period the entire cluster can begin to fail.  End-users will experience 
>> this as hangs on the filesystem, first in a specific region of the 
>> file-system and ultimately across the entire filesystem as the offending 
>> brick begins to turn into a zombie (i.e. not quite dead, but not quite 
>> alive either).
>>
>> Currently, these situations must be handled by an administrator detecting & 
>> intervening via the "clear-locks" CLI command.  Unfortunately this doesn't 
>> scale for large numbers of clusters, and it depends on the correct 
>> (external) detection of the locks piling up (for which there is little 
>> signal other than state dumps).
>>
>> This patch introduces two features to remedy this situation:
>>
>> 1. Monkey-unlocking - This is a feature targeted at developers (only!) to 
>> help track down crashes due to stale locks, and to prove the utility of the 
>> lock revocation feature.  It does this by silently dropping 1% of unlock 
>> requests, simulating bugs or mis-behaving clients.
>>
>> The feature is activated via:
>> features.locks-monkey-unlocking <on/off>
>>
>> You'll see the message
>> "[<timestamp>] W [inodelk.c:653:pl_inode_setlk] 0-groot-locks: MONKEY 
>> LOCKING (forcing stuck lock)!" ... in the logs indicating a request has been 
>> dropped.
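
[ Assuming the option is toggled with the standard "gluster volume set" CLI, 
  enabling it would look like:

      gluster volume set <VOLNAME> features.locks-monkey-unlocking on

  - something to do only on a test cluster, given it deliberately drops 
  unlock requests. ]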
>>
>> 2. Lock revocation - Once enabled, this feature will revoke a *contended* 
>> lock (i.e. if nobody else asks for the lock, we will not revoke it) based 
>> on either the amount of time the lock has been held, how many other lock 
>> requests are waiting on the lock to be freed, or some combination of both.  
>> Clients which lose their locks will be notified by receiving EAGAIN (sent 
>> back to their callback function).
>>
>> The feature is activated via these options:
>> features.locks-revocation-secs <integer; 0 to disable>
>> features.locks-revocation-clear-all [on/off]
>> features.locks-revocation-max-blocked <integer>
>>
>> Recommended settings are: 1800 seconds for a time-based timeout (give 
>> clients the benefit of the doubt).  Choosing a max-blocked value requires 
>> some experimentation depending on your workload, but generally values in 
>> the hundreds to low thousands work (it's normal for many tens of locks to 
>> be taken out when files are being written @ high throughput).
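
[ Putting those recommendations together - assuming the options are applied 
  with the standard "gluster volume set" CLI, and treating the max-blocked 
  figure as a starting point to tune per workload:

      gluster volume set <VOLNAME> features.locks-revocation-secs 1800
      gluster volume set <VOLNAME> features.locks-revocation-max-blocked 500
]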
>
>
> I really like this feature. One question though: self-heal and rebalance 
> domain locks are active until the self-heal/rebalance is complete, which can 
> take more than 30 minutes if the files are in TBs. I will try to see what we 
> can do to handle these without increasing revocation-secs too much. Maybe we 
> can come up with per-domain revocation timeouts. Comments are welcome.

[
    I've not gone through the design or the patch,
    hence this might be a shot in the air.
]

Maybe give clients a second (or more) chance to "refresh" their locks - in the 
sense that when a lock is about to be revoked, the client is notified and can 
then call for a refresh to confirm the validity of its lock holdings. This 
would require some maintenance work on the client to keep track of locked 
regions.

>
> Pranith
>>
>>
>> =====
>>
>> The patch supplied will apply cleanly to the v3.7.6 release tag, and 
>> probably to any 3.7.x release & master (the posix locks xlator is rarely 
>> touched).
>>
>> Richard
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
