On 02/19/2013 11:26 AM, Anand Avati wrote:
Thinking over this, looks like there is a problem!
Write-behind guarantees: That a second write request arriving after
the acknowledgement of a first overlapping request (whether
written-behind or otherwise) will be guaranteed to be fulfilled in the
backend in the same order (i.e., the second overlapping request will be
"serialized" behind the first one in the fulfillment process).
Eager-lock requirement: That write-behind will send no two write
requests on an overlapping region at the same time.
The requirement-set and guarantee-set have a big overlap, but the
requirement-set is not a subset of the guarantee-set.
This is because of O_SYNC writes. write-behind performs
write-serialization at fulfillment only for written behind requests
(which get covered under the conflict detection code during liability
fulfillment). However, if two threads (or apps) issue overlapping
O_SYNC writes to the same region at approximately the same time, then
write-behind will let both of them go by without any kind of
serialization, into eager lock, violating the assumptions!
I'm wondering if it is a safer idea to implement overlap checks within
eager-lock code itself rather than depend on write-behind :|
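
A minimal sketch of what such an overlap check could look like, independent of write-behind. This is not the AFR or eager-lock implementation; `WriteReq` and `InFlightTracker` are hypothetical names used only for illustration, and byte ranges are assumed to be half-open, [offset, offset + length).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteReq:
    """Hypothetical write request covering bytes [offset, offset + length)."""
    name: str
    offset: int
    length: int

    def overlaps(self, other: "WriteReq") -> bool:
        # Two half-open ranges overlap when each starts before the other ends.
        return (self.offset < other.offset + other.length
                and other.offset < self.offset + self.length)


@dataclass
class InFlightTracker:
    """Hypothetical tracker: at most one write per overlapping region is in flight."""
    in_flight: list = field(default_factory=list)
    waiting: list = field(default_factory=list)

    def submit(self, req: WriteReq) -> bool:
        """Return True if req may be sent now, False if it is parked."""
        # Parked requests are checked too, so a later write cannot overtake
        # an earlier write that is still waiting on the same region.
        if any(req.overlaps(r) for r in self.in_flight + self.waiting):
            self.waiting.append(req)
            return False
        self.in_flight.append(req)
        return True

    def complete(self, req: WriteReq) -> list:
        """Mark req as done; return parked requests that may now be sent, in order."""
        self.in_flight.remove(req)
        released, still_waiting = [], []
        for w in self.waiting:
            if any(w.overlaps(b) for b in self.in_flight + still_waiting):
                still_waiting.append(w)
            else:
                self.in_flight.append(w)
                released.append(w)
        self.waiting = still_waiting
        return released


tracker = InFlightTracker()
w1 = WriteReq("w1", 0, 4096)
w2 = WriteReq("w2", 0, 4096)      # same region as w1
w3 = WriteReq("w3", 8192, 4096)   # disjoint region
print(tracker.submit(w1), tracker.submit(w2), tracker.submit(w3))
print([r.name for r in tracker.complete(w1)])
```

With a check like this, two overlapping O_SYNC writes would be serialized regardless of whether write-behind had handled them.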
Avati
On Mon, Feb 11, 2013 at 10:07 PM, Anand Avati <anand.av...@gmail.com> wrote:
On Mon, Feb 11, 2013 at 9:32 PM, Pranith Kumar K <pkara...@redhat.com> wrote:
hi,
Please note that this is a theoretical case and I did not run
into such a situation, but I feel it is important to address this.
A configuration with "eager-lock on" and "write-behind off"
should not be allowed, as it leads to lock synchronization
problems which lead to data inconsistency among replicas in NFS.
Let's say bricks b1 and b2 are in replication.
The Gluster NFS server uses 1 anonymous fd to perform all
write fops. If eager-lock is enabled in afr, the fd's address is
used as the lock-owner, which will be the same for all write fops, so
there will never be any inodelk contention. If write-behind is
disabled, there can be writes that overlap. (Does NFS make
sure that the ranges don't overlap?)
Now imagine the following scenario:
Let's say w1 and w2 are 2 write fops on the same offset and length, w1
with all '0's and w2 with all '1's. If these 2 write fops are
executed in 2 different threads, the order of arrival of the write
fops on b1 can be w1, w2, whereas on b2 it is w2, w1, leading
to data inconsistency between the two replicas. The lock
contention will not happen, as both the lk-owner and the transport are
the same for these 2 fops.
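
The scenario above can be illustrated with a small simulation. This is only a sketch: each brick is modelled as an in-memory buffer, and `apply_in_order` is a hypothetical helper, not Gluster code.

```python
def apply_in_order(initial: bytes, writes) -> bytes:
    """Apply (offset, data) writes to a copy of `initial`, in the given order."""
    buf = bytearray(initial)
    for offset, data in writes:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


initial = b"-" * 8
w1 = (0, b"0" * 8)   # w1: all '0's
w2 = (0, b"1" * 8)   # w2: all '1's, same offset and length as w1

b1 = apply_in_order(initial, [w1, w2])   # b1 receives w1, then w2
b2 = apply_in_order(initial, [w2, w1])   # b2 receives w2, then w1

print(b1, b2, b1 == b2)
```

The last write wins on each brick, so b1 ends up holding w2's data and b2 ends up holding w1's data.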
Write-behind has two functions: a) performing operations in the
background and b) serializing overlapping operations.
While the problem does exist, the specifics are different from
what you describe. Since all writes coming in from NFS will always
use the same anonymous FD, two near-in-time/overlapping writes
will never contend with inodelk() but instead the second write
will inherit the lock and changelog from the first. In either
case, it is a problem.
We can add a check in glusterd for volume set to disallow such a
configuration, BUT by default write-behind is off in the NFS graph
and by default eager-lock is on. So we should either turn on
write-behind for NFS or turn off eager-lock by default.
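
A rough sketch of the kind of check being proposed. It is not glusterd code: the option keys `eager-lock` and `nfs-write-behind` and the function `check_nfs_options` are hypothetical stand-ins, and the defaults are the ones stated above (eager-lock on, write-behind off in the NFS graph).

```python
# Assumed defaults, taken from the description above.
DEFAULTS = {"eager-lock": True, "nfs-write-behind": False}


def check_nfs_options(opts: dict) -> list:
    """Return a list of error strings for a hypothetical option dict."""
    effective = {**DEFAULTS, **opts}
    errors = []
    if effective["eager-lock"] and not effective["nfs-write-behind"]:
        errors.append(
            "eager-lock is on while write-behind is off in the NFS graph: "
            "overlapping writes are not serialized"
        )
    return errors


print(check_nfs_options({}))                          # defaults alone are rejected
print(check_nfs_options({"nfs-write-behind": True}))  # accepted
print(check_nfs_options({"eager-lock": False}))       # accepted
```

Note that the defaults themselves fail the check, which is why a validation rule alone is not enough and one of the defaults would have to change.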
Could you please suggest how to proceed with this if you agree
that I did not miss any important detail that makes this
theory invalid.
Loading the write-behind xlator in the NFS graph looks like the
simpler solution. Eager-locking is crucial for replicated NFS
write performance.
Avati
Shall we disable eager-lock for files opened with O_SYNC, for now?
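
As a sketch of that idea, assuming the open flags of the fd are available where the eager-lock decision is made (`use_eager_lock` is a hypothetical helper, not an existing AFR function):

```python
import os


def use_eager_lock(open_flags: int, eager_lock_enabled: bool = True) -> bool:
    """Decide whether eager-lock should be used for a file opened with open_flags."""
    # Compare against the full mask: on Linux, O_SYNC includes the O_DSYNC bit,
    # so a plain bitwise test would also match O_DSYNC-only opens.
    if (open_flags & os.O_SYNC) == os.O_SYNC:
        return False
    return eager_lock_enabled


print(use_eager_lock(os.O_WRONLY | os.O_SYNC))  # O_SYNC file: no eager-lock
print(use_eager_lock(os.O_WRONLY))              # regular file: follows the option
```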
Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel