On Wed, Aug 29, 2012 at 10:57 AM, Anand Avati <[email protected]> wrote:
> (CC'ing gluster-devel)
>
> On 08/28/2012 09:38 PM, Raghavendra Gowdappa wrote:
>
>> Avati,
>>
>> Following are the questions/thoughts related to anonymous fd framework
>> and their usage in quick read. Please answer or give your feedback.
>>
>> Questions related to anonymous fd framework:
>> ============================================
>> * Anonymous fds can work because open in itself doesn't do any primary
>> task application is interested in - like read, write etc (application does
>> an open with an intent of doing something else). This brings in the
>> question, why do we need open at all, can't we eliminate it altogether? If
>> we were to eliminate open, aren't we moving from a neater to a messy
>> design - each fop has to check whether the work associated with open (like
>> storing contexts etc) is done in every invocation?
>>
>
> Some corrections to the above statement. There are two parts to the
> open() call
>
> 1) The effects of the call itself. Like
>    a) Perform permission checks and establish a 'session' (with the fd) on
>       the allowed permission [even if permission of the inode changes in the
>       future while the fd is still open]
>    b) Perform additional operation like file truncation when flag O_TRUNC
>       is specified
>
> 2) Side effects of the call, like
>    a) Specify the cache effects on future syscalls with O_[RD]SYNC,
>       O_DIRECT flags
>    b) Offer immunity against future calls like rename() and unlink()
>
> These are the kind of things even Gluster (or any other FS) has to
> guarantee with its open() syscall.
>
> Anonymous fds exist because
>    a) Protocols like NFS3 do not support the above semantics and they are
>       implemented completely in the client side. But we require an fd_t
>       parameter in the read/write fops which also do not require some of the
>       above semantics (like read/write perm checks) and other semantics are
>       guaranteed by anonymous fds already (like immunity against rename()).
>       Note that immunity against unlink() does not currently exist in
>       anonymous fds.
>
>    b) Internal optimizations in perf xlators do not require all the above
>       semantics sometimes.
>
> Whether we use anonymous FDs or not, we need to keep up all the above
> semantics. There are some issues with the semantics even in today's
> version of quick-read - we assume permission check has already happened
> (which is usually true as FUSE performs permission checks) - but that
> may not be the case always. That apart, the benefit of anonymous fds in
> quick-read can be in handling of fd based fops in the window of time
> between a short-cut open() and its completion from the backend. They
> need not wait for the open() completion if they arrive early. Instead
> they can proceed with an anonymous fd -- which can significantly reduce
> code complexity.
> Again, this can be limited to O_RDONLY +
> ~O_DIRECT|O_TRUNC flagged open()s

Why this restriction? Can you elaborate on that?

> and thereby only be vulnerable to
> unlink()s happening in that window.

Irrespective of anonymous fds, quick read would be vulnerable to unlinks in
the window bounded by open returning in the application and open actually
happening in the backend. I am not seeing how anonymous fds alter this
situation. Can you please explain?

>> * how are ops like fsync handled with anonymous fds? How are we going to
>> identify the fd(s) on server on which writes are actually performed? The
>> problem is more acute if we happen to load write-behind on server side.
>>
>
> With the changes in http://review.gluster.com/712, an fsync() fop will
> be a barrier against all previous writes on the inode (no matter which
> fd). There is no problem if you load write-behind on the server side.
> fsync() is essentially an inode operation and must not discriminate
> writes based on the fd of origin.

Is this true even for fsync operation on backend filesystems? Does fsync
flush changes across all fds opened on a file?
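
The barrier behaviour described above (fsync() acting on the inode, no
matter which fd issued the writes) can be pictured with a small toy model.
This is only an illustrative sketch in Python, not GlusterFS code: the class
ToyWriteBehind, its methods and the "inode-1" key are all hypothetical names
made up for this example, and it says nothing about what the change at
review.gluster.com/712 actually implements or how a backend filesystem
behaves.

```python
from collections import defaultdict


class ToyWriteBehind:
    """Hypothetical write-behind cache that queues writes per inode, not per fd."""

    def __init__(self):
        self.pending = defaultdict(list)       # inode -> [(fd, offset, data)]
        self.backend = defaultdict(bytearray)  # inode -> bytes that reached "disk"

    def writev(self, inode, fd, offset, data):
        # The write is only cached here; nothing reaches the backend yet.
        self.pending[inode].append((fd, offset, data))

    def fsync(self, inode, fd):
        # Barrier on the inode: flush every cached write for it, in order.
        # The fd argument is deliberately ignored when selecting writes.
        for _fd, offset, data in self.pending.pop(inode, []):
            buf = self.backend[inode]
            end = offset + len(data)
            if len(buf) < end:
                buf.extend(b"\0" * (end - len(buf)))
            buf[offset:end] = data
        return bytes(self.backend[inode])


wb = ToyWriteBehind()
wb.writev("inode-1", fd=3, offset=0, data=b"hello ")
wb.writev("inode-1", fd=-1, offset=6, data=b"world")  # stand-in for an anonymous fd
flushed = wb.fsync("inode-1", fd=3)
```

In this model the write issued on the second descriptor is flushed by an
fsync() called on the first one, because the queue is keyed by inode. A
per-fd queue would instead leave that write pending, which is the situation
the question about identifying fds on the server is worried about.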
>> * Though we are trying to decouple path from addressing an inode in
>> glusterfs using nameless lookups, that decoupling is not complete. There
>> are translators which use naming patterns to assign priorities to files
>> (like io-cache, quick-read for the purposes of deciding whether to flush a
>> cache or not). To be honest, the problem is seen only in fd-migration where
>> we are using nameless lookups - for fresh lookups - in new graph, after a
>> graph switch. Currently I am using nameless lookups with loc.path set,
>> which solves the problem. Ideally nameless lookups are not the ones to be
>> used during migration, since they are not meant to be used for fresh
>> lookups (at least till we get rid of dependencies on path based
>> addressability internally in glusterfs). However, they have huge
>> performance benefits.
>>
>
> Not sure what the above point is w.r.t anonymous fds,

Nothing related to anonymous fds themselves, but to their usage during fd
migration after graph switch. After a graph switch, the first lookup in the
new graph is a fresh one, and translators like io-cache, quick-read and
quota that make use of path information for their internal workings will be
in trouble if we don't have the correct path in loc.path.

> but yes - nameless
> lookup takes away the sense of hierarchy (and "filename") and operations
> which depend on filename or hierarchy might not always work. But then
> this has been true even before we brought in nameless lookups as FUSE
> issues open() on an inode and therefore we are not guaranteed to perform
> open() on the right path when you have hardlinks.
>
>> Using anonymous fd framework in quick-read:
>> ===========================================
>> * as far as quick read goes, its task becomes very simple. Just convert
>> the fd to anonymous during open and return. It can eliminate all the
>> dependencies of fops having to wait till open is actually done. In fact the
>> fops it has to implement are: lookup, open and readv.
>>
>
> Look at my previous comments, it must perform a few more checks.
> quick-read cannot just "convert" an fd to anonymous fd. Anonymous fd has
> fd->pid == -1 (which a quick-unwound open() fd will not). There are also
> other semantics which need to be met (at least with best effort) while
> the actual fd is still unopened.
>
>> * Anonymous fd awareness should be brought in afr. It shouldn't try to
>> open the files in fops like writev if fd happens to be anonymous.
>>
>
> I think that already is the case. Also, why do you specifically mention
> afr?

I was thinking in terms of using anonymous fds in quick-read, without
having to open the file explicitly at all by delegating that responsibility
(of open) to servers. Hence, I thought afr need not worry about opening the
files. However, this may not work as you've explained earlier and I need to
think it over.

> Thanks,
> Avati
>
> _______________________________________________
> Gluster-devel mailing list
> [email protected]
> https://lists.nongnu.org/mailman/listinfo/gluster-devel

--
Raghavendra G
