On Wed, Jan 27, 2021 at 2:49 PM Greg Kurz <gr...@kaod.org> wrote: > > On Wed, 27 Jan 2021 11:34:52 +0100 > Miklos Szeredi <mszer...@redhat.com> wrote:
> > Another solution specifically for O_CREAT without O_EXCL would be to > > turn it into an exclusive create. > > Would this added O_EXCL then appear on the client side, e.g. to > guest userspace doing fcntl(F_GETFL) ? No. Guest kernel keeps track of open flags. > > If that fails with EEXIST then try > > the normal open path (open with O_PATH, fstat, open proc symlink). If > > open(O_PATH | O_NOFOLLOW) + fstatat(AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW) > would indeed allow to filter out anything that isn't a directory and > to safely open the proc symlink. > > > that fails with ENOENT, then retry the whole thing a certain number of > > Indeed someone could have unlinked the file in the meantime, in which > case the open(O_PATH | O_NOFOLLOW) would fail, but if it succeeds then > we cannot hit ENOENT anymore AFAICT. Right. > > times. If it still fails then somebody is definitely messing with us > > and we can fail the request with EIO. > > > > Not sure what the retry+timeout is trying to mitigate here... why not > returning EIO right away ? The semantics of O_CREATE are that it can fail neither because the file exists nor because it doesn't. This doesn't matter if the exported tree is not modified outside of a single guest, because of locking provided by the guest kernel. However if we want to support shared access to a tree then O_CREAT semantics should work even in the face of races due to external modification of the tree. I.e. following race: virtiofsd: open(foo, O_CREAT | O_EXCL) -> EEXIST other task: unlink(foo) virtiofsd: open(foo, O_PATH | O_NOFOLLOW) -> ENOENT To properly support the above the O_CREAT | O_EXCL open would need to be retried. Thanks, Miklos