Re: [Gluster-devel] Eager-lock and nfs graph generation

Pranith Kumar K Tue, 19 Feb 2013 03:56:24 -0800

On 02/19/2013 11:26 AM, Anand Avati wrote:

Thinking over this, looks like there is a problem!
Write-behind guarantees: That a second write request arriving after the acknowledgement of a first overlapping request (whether written-behind or otherwise) will be guaranteed to be fulfilled in the backend in the same order (i.e., the second overlapping request will be "serialized" behind the first one in the fulfillment process)
Eager-lock requirement: That write-behind will send no two write requests on an overlapping region at the same time.
The requirement-set and guarantee-set have a big overlap, but the requirement-set is not a subset.
This is because of O_SYNC writes. write-behind performs write-serialization at fulfillment only for written behind requests (which get covered under the conflict detection code during liability fulfillment). However, if two threads (or apps) issue overlapping O_SYNC writes to the same region at approx same time, then write-behind will let both of them go by without any kind of serialization, into eager lock, violating the assumptions!
I'm wondering if it is a safer idea to implement overlap checks within eager-lock code itself rather than depend on write-behind :|
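To make the idea of an overlap check concrete, here is a minimal sketch in Python. It is not the afr or write-behind code; the names WriteReq and OverlapGate are hypothetical and only model the rule that no two writes on an overlapping region are sent at the same time, regardless of whether they were written behind or O_SYNC.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison: two identical ranges stay distinct requests
class WriteReq:
    name: str
    offset: int
    length: int


def overlaps(a: WriteReq, b: WriteReq) -> bool:
    """True if the byte ranges [offset, offset + length) intersect."""
    return a.offset < b.offset + b.length and b.offset < a.offset + a.length


class OverlapGate:
    """Hypothetical gate: holds back a write while an overlapping one is in flight."""

    def __init__(self):
        self.in_flight = []
        self.waiting = []

    def submit(self, req: WriteReq) -> bool:
        """Return True if req may be sent now, False if it was queued."""
        blocked = any(overlaps(req, o) for o in self.in_flight + self.waiting)
        if blocked:
            self.waiting.append(req)
            return False
        self.in_flight.append(req)
        return True

    def complete(self, req: WriteReq) -> list:
        """Mark req done and return the queued writes that can now be sent, in order."""
        self.in_flight.remove(req)
        released, still_waiting = [], []
        for w in self.waiting:
            # Checking still_waiting too keeps queued overlapping writes in arrival order.
            if any(overlaps(w, o) for o in self.in_flight + still_waiting):
                still_waiting.append(w)
            else:
                self.in_flight.append(w)
                released.append(w)
        self.waiting = still_waiting
        return released


gate = OverlapGate()
w1 = WriteReq("w1", 0, 4096)
w2 = WriteReq("w2", 1024, 4096)   # overlaps w1
w3 = WriteReq("w3", 8192, 4096)   # disjoint from both
sent_w1 = gate.submit(w1)
sent_w2 = gate.submit(w2)
sent_w3 = gate.submit(w3)
released = gate.complete(w1)
```

In this model w2 is held until w1 completes, while the disjoint w3 goes through immediately.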
Avati
On Mon, Feb 11, 2013 at 10:07 PM, Anand Avati <anand.av...@gmail.com> wrote:
    On Mon, Feb 11, 2013 at 9:32 PM, Pranith Kumar K
    <pkara...@redhat.com> wrote:

        hi,
        Please note that this is a theoretical case and I did not run
        into such a situation, but I feel it is important to address it.
        A configuration with "eager-lock on" and "write-behind off"
        should not be allowed, as it leads to lock synchronization
        problems which lead to data inconsistency among replicas in NFS.
        Let's say bricks b1 and b2 are in replication.
        The Gluster NFS server uses one anonymous fd to perform all
        write fops. If eager-lock is enabled in afr, the fd's address
        is used as the lock-owner, which will be the same for all
        write fops, so there will never be any inodelk contention. If
        write-behind is disabled, there can be writes that overlap.
        (Does NFS make sure that the ranges don't overlap?)
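The point about lock-owners can be illustrated with a simplified conflict rule, sketched below in Python. This is only a model of the behaviour described above, not the locks xlator code; the Lock type, the conflicts function and the owner values are hypothetical. It assumes two locks conflict only when their ranges overlap and their owners differ.

```python
from typing import NamedTuple


class Lock(NamedTuple):
    owner: int    # stand-in for the lk-owner (here: the anonymous fd's address)
    start: int
    length: int


def conflicts(held: Lock, wanted: Lock) -> bool:
    """Simplified rule: overlapping ranges conflict only across different owners."""
    overlap = (held.start < wanted.start + wanted.length
               and wanted.start < held.start + held.length)
    return overlap and held.owner != wanted.owner


anon_fd = 0x7F00DEAD                    # hypothetical address of the single anonymous fd
w1 = Lock(anon_fd, 0, 4096)
w2 = Lock(anon_fd, 0, 4096)             # same owner, same range
other = Lock(0x7F00BEEF, 0, 4096)       # a different owner, same range

same_owner_conflict = conflicts(w1, w2)
other_owner_conflict = conflicts(w1, other)
print(same_owner_conflict)    # False
print(other_owner_conflict)   # True
```

Because every NFS write carries the same owner in this model, the lock layer never serializes w1 against w2.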

        Now imagine the following scenario:
        Let's say w1 and w2 are two write fops on the same offset and
        length, w1 with all '0's and w2 with all '1's. If these two
        write fops are executed in two different threads, the order of
        arrival on b1 can be w1, w2 whereas on b2 it is w2, w1,
        leading to data inconsistency between the two replicas. Lock
        contention will not happen, as both the lk-owner and the
        transport are the same for these two fops.
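The scenario can be reproduced with a toy model in Python. The bricks are just in-memory byte buffers and apply_writes is a hypothetical helper, so this only shows why the arrival order matters, not how afr actually dispatches writes.

```python
def apply_writes(size, writes):
    """Apply (offset, payload) writes in order to a zero-filled buffer of `size` bytes."""
    data = bytearray(size)
    for offset, payload in writes:
        data[offset:offset + len(payload)] = payload
    return bytes(data)


w1 = (0, b"0" * 8)   # all '0's
w2 = (0, b"1" * 8)   # all '1's, same offset and length as w1

b1 = apply_writes(8, [w1, w2])   # brick b1 sees w1, then w2
b2 = apply_writes(8, [w2, w1])   # brick b2 sees w2, then w1

print(b1)        # b'11111111'
print(b2)        # b'00000000'
print(b1 == b2)  # False: the replicas have diverged
```

Each brick ends up with whichever write arrived last, so the two replicas hold different data with no lock contention ever being reported.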


    Write-behind has two functions - a) performing operations in the
    background and b) serializing overlapping operations.

    While the problem does exist, the specifics are different from
    what you describe. Since all writes coming in from NFS will always
    use the same anonymous FD, two near-in-time/overlapping writes
    will never contend with inodelk() but instead the second write
    will inherit the lock and changelog from the first. In either
    case, it is a problem.

        We can add a check in glusterd for volume set to disallow such
        a configuration, BUT by default write-behind is off in the NFS
        graph and by default eager-lock is on. So we should either turn
        on write-behind for NFS or turn off eager-lock by default.

        Could you please suggest how to proceed with this if you agree
        that I did not miss any important detail that makes this
        theory invalid.


    Loading the write-behind xlator in the NFS graph looks like a
    simpler solution. Eager-locking is crucial for replicated NFS
    write performance.

    Avati

Shall we disable eager-lock for files opened with O_SYNC, for now?
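Roughly what I have in mind, as a Python sketch rather than actual afr code (eager_lock_allowed is a hypothetical name, and it assumes the open flags are available at the point where the decision is made):

```python
import os


def eager_lock_allowed(open_flags: int) -> bool:
    """Hypothetical policy: skip eager-lock when the file was opened with O_SYNC."""
    return (open_flags & os.O_SYNC) == 0


normal = eager_lock_allowed(os.O_WRONLY)
synced = eager_lock_allowed(os.O_WRONLY | os.O_SYNC)
print(normal)   # True
print(synced)   # False
```

Note that os.O_SYNC is only defined on Unix-like platforms.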

Pranith

