Hi Raghavendra,

On 10/02/17 04:51, Raghavendra Gowdappa wrote:
+gluster-devel

----- Original Message -----
From: "Milind Changire" <mchan...@redhat.com>
To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
Cc: "rhs-zteam" <rhs-zt...@redhat.com>
Sent: Thursday, February 9, 2017 11:00:18 PM
Subject: patch for "limited performance for disperse volumes"

My first comment was:
It looks like the patch for "limited performance for disperse volume" [1] is
going to be helpful for all other types of volumes as well; but how do we
guarantee ordering for writes over the same fd for the same offset and
length in the file?

then thinking over a bit and in case you missed my comment over IRC:
I was thinking about network multi-pathing and RPC requests (two writes)
being routed through different interfaces to gluster nodes which might
lead to a non-increasing transaction ID sequence and hence might lead
to incorrect final value if the older write is committed to the same
offset+length

then it dawned on me that for blocking operations the write() call
won't return until the data is safe on the disk across the network or
the intermediate translators have cached it appropriately to be
written behind.

so would the patch work for two non-blocking writes originating for the
same fd from the same thread for the same offset+length and being
routed over multi-pathing and write #2 getting routed quicker than
write #1 ?

To be honest, I've not considered the case of asynchronous writes from the
application until now. What ordering guarantee do the OS/filesystems provide
for two async writes? For example, if there are two writes w1 and w2, when is w2
issued?
* After the cbk of w1 is called, or
* in parallel, just after async_write (w1) returns (cbk of w1 is not invoked yet)?

What do POSIX or other standards (or expectation from OS) say about ordering in 
case 2 above?
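
To make the two cases concrete, here is a minimal Python sketch against a local temporary file (not a Gluster volume). The helper name final_content and the use of a thread pool to stand in for an async write API are assumptions made only for illustration. In case 1 the final content is always w2; in case 2 the sketch only shows that the outcome depends on which write the filesystem applies last.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

SIZE = 4096  # length of each write; both writes target offset 0


def final_content(concurrent: bool) -> bytes:
    """Issue w1 and w2 to the same offset+length and return the resulting bytes."""
    w1, w2 = b"1" * SIZE, b"2" * SIZE
    fd, path = tempfile.mkstemp()
    try:
        if concurrent:
            # Case 2: w2 is submitted while w1 may still be in flight.
            with ThreadPoolExecutor(max_workers=2) as pool:
                futures = [pool.submit(os.pwrite, fd, w, 0) for w in (w1, w2)]
                for f in futures:
                    f.result()  # re-raise any write error
        else:
            # Case 1: w2 is issued only after w1 has completed.
            os.pwrite(fd, w1, 0)
            os.pwrite(fd, w2, 0)
        return os.pread(fd, SIZE, 0)
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling final_content(False) always yields w2's data. Calling final_content(True) may yield either write's data, which is exactly the ordering question being asked.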

I'm not an expert on POSIX. But I've found this [1]:

    2.9.7 Thread Interactions with Regular File Operations

    All of the following functions shall be atomic with respect to
    each other in the effects specified in POSIX.1-2008 when they
    operate on regular files or symbolic links: [...] write [...]

    If two threads each call one of these functions, each call shall
    either see all of the specified effects of the other call, or none
    of them. The requirement on the close() function shall also apply
    whenever a file descriptor is successfully closed, however caused
    (for example, as a consequence of calling close(), calling dup2(),
    or of process termination).

Not sure if this also applies to write requests issued asynchronously from the same thread, but this would be the worst case (if the OS already orders them, we won't have any problem).

As I see it, this is already satisfied by EC because it doesn't allow two writes to execute at the same time. They can be reordered if the second one arrives before the first one, but they are executed atomically as POSIX requires. Not sure if AFR also satisfies this condition, but I think so.

From the point of view of EC it's irrelevant if the write comes from the same thread or from different processes on different clients. They are handled in the same way.
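
As a rough illustration of "reordered but atomic", here is a toy Python model. It is not EC's implementation: the class SerializedRegion, its lock, and the helper race are hypothetical names used only to show the property. Writes may be applied in any arrival order, but since only one executes at a time, the final content is always exactly one of the payloads and never a mix.

```python
import threading
from typing import Sequence


class SerializedRegion:
    """Toy stand-in for a file whose writes are executed one at a time."""

    def __init__(self, size: int) -> None:
        self._data = bytearray(size)
        # Hypothetical lock: models "no two writes execute at the same time".
        self._lock = threading.Lock()

    def write(self, offset: int, payload: bytes) -> None:
        with self._lock:
            # Copy byte by byte; without the lock, two writers could interleave here.
            for i, b in enumerate(payload):
                self._data[offset + i] = b

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


def race(payloads: Sequence[bytes]) -> bytes:
    """Write all payloads to offset 0 from separate threads; return the final content."""
    region = SerializedRegion(len(payloads[0]))
    threads = [threading.Thread(target=region.write, args=(0, p)) for p in payloads]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return region.read(0, len(payloads[0]))
```

Which payload wins depends on thread scheduling, just as it would depend on arrival order at the bricks, but the result is never a blend of two writes.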

However a thing to be aware of (from the man page of write):

    [...] among the effects that should be atomic across threads (and
    processes) are updates of the file offset. However, on Linux before
    version 3.14, this was not the case: if two processes that share an
    open file description (see open(2)) perform a write() (or
    writev(2)) at the same time, then the I/O operations were not atomic
    with respect updating the file offset, with the result that the
    blocks of data output by the two processes might (incorrectly)
    overlap. This problem was fixed in Linux 3.14.
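
A small Python sketch of how one could check this on a given system. It uses threads sharing one file descriptor rather than separate processes, which is a simplification I'm assuming is close enough; the function name and parameters are made up for the example. If offset updates are atomic, no write overlaps another and the file size should equal the total number of bytes written; a smaller size would indicate overlapping blocks.

```python
import os
import tempfile
import threading


def shared_offset_writes(writers: int = 4, records: int = 200, record_size: int = 512):
    """Have several threads write() through one fd; return (bytes_written, file_size)."""
    fd, path = tempfile.mkstemp()
    written = [0] * writers  # one slot per thread, so no shared counter is needed

    def worker(idx: int) -> None:
        record = bytes([ord("a") + idx]) * record_size
        for _ in range(records):
            # No explicit offset: each call reads and advances the shared file offset.
            written[idx] += os.write(fd, record)

    try:
        threads = [threading.Thread(target=worker, args=(i,)) for i in range(writers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sum(written), os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.unlink(path)
```

Comparing the two returned values tells whether any data was overwritten: total, size = shared_offset_writes(), and then size < total means some writes landed on top of each other.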

Xavi

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07



[1] https://review.gluster.org/15036


just thinking aloud

--
Milind

_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

