Re: [PATCH 07/12] xfs: implement failure-atomic writes

2017-03-01 Thread Christoph Hellwig
On Tue, Feb 28, 2017 at 03:09:40PM -0800, Darrick J. Wong wrote:
> By the way, the copy on write code remembers the extents it has
> allocated for CoW staging in the refcount btree so that it can free them
> after a crash, which means that O_ATOMIC requires reflink to be enabled.

Yeah.

> There doesn't seem to be any explicit checking that reflink is even
> enabled, which will probably just lead to weird crashes on a pre-reflink
> xfs.

True.  I had this earlier when I hat basic O_ATOMIC validity checking,
but that was dropped from the series I posted.

> 
> FWIW I didn't see any checking anywhere (vfs or xfs) that the filesystem
> can actually support O_ATOMIC.  If the FS doesn't support atomic writes,
> shouldn't the kernel send EINVAL or something back to userspace?

Older kernels can't check it, so having new ones check it creates even
more of a mess.

I'm still not feeling very well about O_ATOMIC - either we need an
open2 that checks for unknown flags, or I need to change this to
a per-op flag - RWF_ATOMIC for write (pwritev2 actually), and MAP_ATOMIC
for mmap.  But given that pwritev2 isn't really supported in common
userland yet that might be rather painful.

> At the start of xfs_reflink.c is a long block comment describing how the
> copy on write mechanism works.  Since O_ATOMIC is a variant on CoW (it's
> basically CoW with remapping deferred until fsync), please update the
> comment so that the comments capture the details of how atomic writes
> work.
> 
> (IOWs: Dave asked me to leave the big comment, so I'm going to try to
> keep it fairly up to date.)

I'll add some information to it.

> I suppose it goes without saying that userspace will have to coordinate
> its O_ATOMIC writes to the file.

It does - but if you have multiple writers to a file they really need
to be coordinated anyway.  If you have threads whose updates race
you'd need something like

open(O_TMPFILE)
clone file (or range) into tempfile

update tempfile

clone region you want atomically inserted back into the original file.

We can actually do that with existing primitives, but it's a bit more
heavyweight.  We could opimize this a bit by checking if an extent
already points to the same physical blocks before replacing it in
clone_file_range.

> > +   if (file->f_flags & O_ATOMIC)
> > +   printk_ratelimited("O_ATOMIC!\n");
> 
> Per above,
> 
> if (file->f_flags & O_ATOMIC) {
>   if (!xfs_sb_version_hasreflink(...))
>   return -EPROTONOSUPPORT;

Yeah.

>   printk_ratelimited("EXPERIMENTAL atomic writes feature in use!\n");

And that should just go away - it was a local debug aid :)


Re: [PATCH 07/12] xfs: implement failure-atomic writes

2017-02-28 Thread Darrick J. Wong
On Tue, Feb 28, 2017 at 06:57:32AM -0800, Christoph Hellwig wrote:
> If O_ATOMIC is specified in the open flags this will cause XFS to
> allocate new extents in the COW for even if overwriting existing data,

"COW fork"^^^

The previous patch's commit message also has that quirk.

> and not remap them into the data fork until ->fsync is called,
> at which point the whole range will be atomically remapped into the
> data fork.  This allows applications to Ñ•afely overwrite data instead
> of having to do double writes.

By the way, the copy on write code remembers the extents it has
allocated for CoW staging in the refcount btree so that it can free them
after a crash, which means that O_ATOMIC requires reflink to be enabled.
There doesn't seem to be any explicit checking that reflink is even
enabled, which will probably just lead to weird crashes on a pre-reflink
xfs.

FWIW I didn't see any checking anywhere (vfs or xfs) that the filesystem
can actually support O_ATOMIC.  If the FS doesn't support atomic writes,
shouldn't the kernel send EINVAL or something back to userspace?

> Signed-off-by: Christoph Hellwig 
> ---
>  fs/xfs/xfs_aops.c| 18 +-
>  fs/xfs/xfs_aops.h|  4 ++-
>  fs/xfs/xfs_file.c| 17 +
>  fs/xfs/xfs_iomap.c   | 18 --
>  fs/xfs/xfs_reflink.c | 69 
> ++--
>  fs/xfs/xfs_reflink.h |  5 ++--
>  6 files changed, 91 insertions(+), 40 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index c78b585b3d84..1c5efbb05b47 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -292,6 +292,7 @@ xfs_end_io(
>   if (unlikely(error)) {
>   switch (ioend->io_type) {
>   case XFS_IO_COW:
> + case XFS_IO_ATOMIC:

So we cancel the CoW staging blocks if the write was atomic and failed.

Later in the !error case we remap the blocks if it was a cow write, but
leave the mapping in memory if the write was atomic.  That is consistent
with the commit message, good.

At the start of xfs_reflink.c is a long block comment describing how the
copy on write mechanism works.  Since O_ATOMIC is a variant on CoW (it's
basically CoW with remapping deferred until fsync), please update the
comment so that the comments capture the details of how atomic writes
work.

(IOWs: Dave asked me to leave the big comment, so I'm going to try to
keep it fairly up to date.)

>   xfs_reflink_cancel_cow_range(ip, offset, size, 0);
>   break;
>   }
> @@ -327,7 +328,9 @@ xfs_end_bio(
>   struct xfs_ioend*ioend = bio->bi_private;
>   struct xfs_mount*mp = XFS_I(ioend->io_inode)->i_mount;
>  
> - if (ioend->io_type == XFS_IO_UNWRITTEN || ioend->io_type == XFS_IO_COW)
> + if (ioend->io_type == XFS_IO_UNWRITTEN ||
> + ioend->io_type == XFS_IO_COW ||
> + ioend->io_type == XFS_IO_ATOMIC)
>   queue_work(mp->m_unwritten_workqueue, >io_work);
>   else if (ioend->io_append_trans)
>   queue_work(mp->m_data_workqueue, >io_work);
> @@ -354,6 +357,7 @@ xfs_map_blocks(
>   return -EIO;
>  
>   ASSERT(type != XFS_IO_COW);
> + ASSERT(type != XFS_IO_ATOMIC);
>   if (type == XFS_IO_UNWRITTEN)
>   bmapi_flags |= XFS_BMAPI_IGSTATE;
>  
> @@ -768,7 +772,8 @@ xfs_map_cow(
>   struct xfs_writepage_ctx *wpc,
>   struct inode*inode,
>   loff_t  offset,
> - unsigned int*new_type)
> + unsigned int*new_type,
> + boolatomic)
>  {
>   struct xfs_inode*ip = XFS_I(inode);
>   struct xfs_bmbt_irecimap;
> @@ -778,10 +783,10 @@ xfs_map_cow(
>   /*
>* If we already have a valid COW mapping keep using it.
>*/
> - if (wpc->io_type == XFS_IO_COW) {
> + if (wpc->io_type == XFS_IO_COW || wpc->io_type == XFS_IO_ATOMIC) {
>   wpc->imap_valid = xfs_imap_valid(inode, >imap, offset);
>   if (wpc->imap_valid) {
> - *new_type = XFS_IO_COW;
> + *new_type = wpc->io_type;
>   return 0;
>   }
>   }
> @@ -807,7 +812,7 @@ xfs_map_cow(
>   return error;
>   }
>  
> - wpc->io_type = *new_type = XFS_IO_COW;
> + wpc->io_type = *new_type = atomic ? XFS_IO_ATOMIC : XFS_IO_COW;
>   wpc->imap_valid = true;
>   wpc->imap = imap;
>   return 0;
> @@ -886,7 +891,8 @@ xfs_writepage_map(
>   }
>  
>   if (XFS_I(inode)->i_cowfp) {
> - error = xfs_map_cow(wpc, inode, offset, _type);
> + error = xfs_map_cow(wpc, inode, offset, _type,
> + buffer_atomic(bh));
>   if (error)
>   goto out;
>   }
> diff --git