Dear all,

this is an RFC. I made very few test, so use at your risk.

In the past there were several reports [1][2] about corruption when O_DIRECT is 
used.
Every time the analysis was the same: it is too difficult to sync the checksum
with the data when O_DIRECT is used.

This aim of this patch is to avoid the corruption disabling the O_DIRECT write 
when
the file is NOT marked as NODATACSUM. In other words, O_DIRECT is honored only 
for
the file not protected by CSUM: O_DIRECT ^ DATASUM

The user-space is not informed that O_DIRECT is not honored.

On the best of my knowledge, ZFS does the same thing.

It is not a regression: today an O_DIRECT file update may compromise the 
checksum
calculation. So may be the file content is correct, but the checksum no.This 
prevent
to read the file.

In [2] there is a test program which trigger the problem. With this patch the 
program
doesn't trigger the problem.

Open question:
- does the kernel return an error when a file is opened with O_DIRECT and the 
file has
  the checksum ?
- does make sense to have a mount option to select different behaviours:
        1) let the kernel to behave as today (O_DIRECT is allowed even at risk 
of
           having some form of corruption)
        2) let the kernel to ignore O_DIRECT if the file is protected by 
checksum
        3) let the kernel to return error when O_DIRECT is used with file with
           checksum
   ?

Comments are welcome.

BR
G.Baroncelli

[1] 
https://lore.kernel.org/linux-btrfs/1ad3962943592e9a60f88aecdb493f368c70bbe1.ca...@infradead.org/#r
[2] 
https://lore.kernel.org/linux-btrfs/cf8a733f-2c9d-7ffe-e865-4c13d99df...@libero.it/


-----

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0e41459b8de6..af73157e8200 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2018,7 +2018,12 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
        if (sync)
                atomic_inc(&BTRFS_I(inode)->sync_writers);
- if (iocb->ki_flags & IOCB_DIRECT)
+       /*
+        * O_DIRECT doesn't play well with CSUM, so allow the O_DIRECT
+        * only if the file is marked BTRFS_INODE_NODATASUM
+        */
+       if (iocb->ki_flags & IOCB_DIRECT &&
+           (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
                num_written = btrfs_direct_write(iocb, from);
        else
                num_written = btrfs_buffered_write(iocb, from);



-----------

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

Reply via email to