On Tue, Mar 5, 2019 at 2:50 AM Dave Chinner <da...@fromorbit.com> wrote:
>
> On Mon, Mar 04, 2019 at 05:04:23PM +0200, Amir Goldstein wrote:
> > On Mon, Mar 4, 2019 at 4:44 PM <fdman...@kernel.org> wrote:
> > >
> > > From: Filipe Manana <fdman...@suse.com>
> > >
> > > Test that if we truncate a file to reduce its size, rename it and then
> > > fsync it, after a power failure the file has a correct size and name.
> > >
> >
> > I am not sure that ext4/xfs semantics guarantee anything about
> > persisting the file name after an fsync of the file?...
>
> They do.  It's that pesky "strictly ordered metadata" thing I keep
> having to explain to people...
>
> i.e. if you fsync an inode, then you are persisting all the changes
> needed to reference that file and its data. And so if there was a
> rename in the history of that file, then that is persisted, too.
> Which means that both the original and the new directory
> modifications are persisted, too.
>
> *POSIX* doesn't require this - it says that if you O_DSYNC data,
> then it also includes all the metadata needed to reference that
> data. So even if the data is there, POSIX doesn't define whether the
> rename is there or not, just that you can get to the fsync'd data
> via either the old or new name. IOWs, POSIX allows the behaviour to
> be implementation specific.
>
> In this case, file systems with strictly ordered metadata will end
> up making the rename visible because the rename occurred before the
> truncate that the fsync() is persisting...
>

That is not what is happening in Filipe's test. Test has:
- ftruncate A
- fsync A
- rename A B
- fsync B
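The four steps above map roughly onto the POSIX calls below. This is a sketch under assumptions, not the actual fstests test (which is a shell script). The helper names truncate_rename_fsync() and file_size() are hypothetical. Nothing here simulates the power failure; in fstests that is typically done with the dm-flakey helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper replaying the test's syscall sequence:
 *   ftruncate A, fsync A, rename A B, fsync B.
 * Returns 0 on success, -1 on any failure. It does NOT simulate a crash.
 */
static int truncate_rename_fsync(const char *a, const char *b, off_t new_size)
{
    assert(new_size >= 0); /* a negative size is a caller bug */

    int fd = open(a, O_RDWR);
    if (fd < 0)
        return -1;
    /* Steps 1 and 2: shrink A, then fsync A. */
    if (ftruncate(fd, new_size) != 0 || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Step 3: rename A to B (same inode, new name). */
    if (rename(a, b) != 0)
        return -1;

    /* Step 4: fsync B, opened through its new name. */
    fd = open(b, O_RDWR);
    if (fd < 0)
        return -1;
    int ret = fsync(fd);
    if (close(fd) != 0)
        ret = -1;
    return ret;
}

/* Hypothetical helper: size of a file via stat(), or -1 on error. */
static off_t file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return st.st_size;
}
```

Step 4 reopens the file by its new name to mirror "fsync B". Keeping the original descriptor open across the rename would fsync the same inode.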

So the reason this works is that the 2nd fsync needs to
persist the ctime of B, not that it needs to persist the
truncate.

XFS does it, but it doesn't seem like something every
filesystem is guaranteed to do:
        /*
         * We always want to hit the ctime on the source inode.
         *
         * This isn't strictly required by the standards since the source
         * inode isn't really being changed, but old unix file systems did
         * it and some incremental backup programs won't work without it.
         */
        xfs_trans_ichgtime(tp, src_ip, XFS_ICHGTIME_CHG);

So for the purpose of the test itself, which needs to guarantee that
btrfs persists the size, an fsync of the parent directory would be
more robust for any filesystem.
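Below is a sketch of the more robust variant suggested here. It assumes both names live in the same directory. The rename is made durable by an explicit fsync of that directory, rather than relying on fsync(B) to carry it. The names fsync_dir(), truncate_fsync_rename_dirsync() and file_size() are hypothetical. As before, no power failure is simulated.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* fsync a directory so that entry changes in it (e.g. a rename) are durable. */
static int fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    int ret = fsync(fd);
    if (close(fd) != 0)
        ret = -1;
    return ret;
}

/*
 * Hypothetical variant: ftruncate A, fsync A, rename A B, fsync(dir).
 * `dir` is assumed to be the directory containing both `a` and `b`.
 */
static int truncate_fsync_rename_dirsync(const char *dir, const char *a,
                                         const char *b, off_t new_size)
{
    assert(new_size >= 0); /* a negative size is a caller bug */

    int fd = open(a, O_RDWR);
    if (fd < 0)
        return -1;
    /* The new size is made durable by fsyncing the file itself. */
    if (ftruncate(fd, new_size) != 0 || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    if (rename(a, b) != 0)
        return -1;
    /* The new name is made durable by fsyncing the parent directory. */
    return fsync_dir(dir);
}

/* Hypothetical helper: size of a file via stat(), or -1 on error. */
static off_t file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return st.st_size;
}
```

If A and B were in different directories, one would normally fsync both the old and the new parent directory.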

Thanks,
Amir.
