Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-22 Thread David Howells
J. Bruce Fields  wrote:

> > J. Bruce Fields  wrote:
> > > So, I'm still confused: there must be some case where we know fscache
> > > actually works reliably and doesn't corrupt your data, right?
> > 
> > Using ext2/3, for example.  I don't know under what circumstances xfs, ext4
> > and btrfs might insert/remove blocks of zeros, but I'm told it can happen.
> 
> Do ext2/3 work well for fscache in other ways?

Ext3 shouldn't be a problem.  That's what I used when developing it.  I'm not
sure if ext2 supports xattrs, though.

David



Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-22 Thread J. Bruce Fields
On Thu, Jan 21, 2021 at 08:08:24PM, David Howells wrote:
> J. Bruce Fields  wrote:
> > So, I'm still confused: there must be some case where we know fscache
> > actually works reliably and doesn't corrupt your data, right?
> 
> Using ext2/3, for example.  I don't know under what circumstances xfs, ext4
> and btrfs might insert/remove blocks of zeros, but I'm told it can happen.

Do ext2/3 work well for fscache in other ways?

--b.


Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-22 Thread Christoph Hellwig
On Thu, Jan 21, 2021 at 06:55:13PM, David Howells wrote:
> > Is it that those "bridging" blocks only show up in certain corner cases
> > that users can arrange to avoid?  Or that it's OK as long as you use
> > certain specific file systems whose behavior goes beyond what's
> > technically required by the bmap or seek interfaces?
> 
> That's a question for the xfs, ext4 and btrfs maintainers, and may vary
> between kernel versions and fsck or filesystem packing utility versions.

For XFS if you do not use reflinks, extent size hints or the RT
subvolume there are no new allocations before i_size that will magically
show up.  But relying on such undocumented assumptions is very
dangerous.


Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-21 Thread David Howells
J. Bruce Fields  wrote:

> > J. Bruce Fields  wrote:
> > 
> > > > Fixing this requires a much bigger overhaul of cachefiles than this
> > > > patchset performs.
> > > 
> > > That sounds like "sometimes you may get file corruption and there's
> > > nothing you can do about it".  But I know people actually use fscache,
> > > so it must be reliable at least for some use cases.
> > 
> > Yes.  That's true for the upstream code because that uses bmap.
> 
> Sorry, when you say "that's true", what part are you referring to?

Sometimes, theoretically, you may get file corruption due to this.

> > I'm switching to use SEEK_HOLE/SEEK_DATA to get rid of the bmap usage, but
> > it doesn't change the issue.
> > 
> > > Is it that those "bridging" blocks only show up in certain corner cases
> > > that users can arrange to avoid?  Or that it's OK as long as you use
> > > certain specific file systems whose behavior goes beyond what's
> > > technically required by the bmap or seek interfaces?
> > 
> > That's a question for the xfs, ext4 and btrfs maintainers, and may vary
> > between kernel versions and fsck or filesystem packing utility versions.
> 
> So, I'm still confused: there must be some case where we know fscache
> actually works reliably and doesn't corrupt your data, right?

Using ext2/3, for example.  I don't know under what circumstances xfs, ext4
and btrfs might insert/remove blocks of zeros, but I'm told it can happen.

David



Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-21 Thread David Howells
J. Bruce Fields  wrote:

> > Fixing this requires a much bigger overhaul of cachefiles than this patchset
> > performs.
> 
> That sounds like "sometimes you may get file corruption and there's
> nothing you can do about it".  But I know people actually use fscache,
> so it must be reliable at least for some use cases.

Yes.  That's true for the upstream code because that uses bmap.  I'm switching
to use SEEK_HOLE/SEEK_DATA to get rid of the bmap usage, but it doesn't change
the issue.

> Is it that those "bridging" blocks only show up in certain corner cases
> that users can arrange to avoid?  Or that it's OK as long as you use
> certain specific file systems whose behavior goes beyond what's
> technically required by the bmap or seek interfaces?

That's a question for the xfs, ext4 and btrfs maintainers, and may vary
between kernel versions and fsck or filesystem packing utility versions.

David



Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-21 Thread J. Bruce Fields
On Thu, Jan 21, 2021 at 06:55:13PM, David Howells wrote:
> J. Bruce Fields  wrote:
> 
> > > Fixing this requires a much bigger overhaul of cachefiles than this
> > > patchset performs.
> > 
> > That sounds like "sometimes you may get file corruption and there's
> > nothing you can do about it".  But I know people actually use fscache,
> > so it must be reliable at least for some use cases.
> 
> Yes.  That's true for the upstream code because that uses bmap.

Sorry, when you say "that's true", what part are you referring to?

> I'm switching to use SEEK_HOLE/SEEK_DATA to get rid of the bmap usage, but it
> doesn't change the issue.
> 
> > Is it that those "bridging" blocks only show up in certain corner cases
> > that users can arrange to avoid?  Or that it's OK as long as you use
> > certain specific file systems whose behavior goes beyond what's
> > technically required by the bmap or seek interfaces?
> 
> That's a question for the xfs, ext4 and btrfs maintainers, and may vary
> between kernel versions and fsck or filesystem packing utility versions.

So, I'm still confused: there must be some case where we know fscache
actually works reliably and doesn't corrupt your data, right?

--b.


Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-21 Thread J. Bruce Fields
On Thu, Jan 21, 2021 at 05:02:57PM, David Howells wrote:
> J. Bruce Fields  wrote:
> 
> > On Wed, Jan 20, 2021 at 10:21:24PM, David Howells wrote:
> > >  Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
> > >  to be read from the cache.  Whilst this is an improvement over the
> > >  bmap interface, it still has a problem with regard to a modern
> > >  extent-based filesystem inserting or removing bridging blocks of
> > >  zeros.
> > 
> > What are the consequences from the point of view of a user?
> 
> The cache can get both false positive and false negative results on checks for
> the presence of data because an extent-based filesystem can, at will, insert
> or remove blocks of contiguous zeros to make the extents easier to encode
> (ie. bridge them or split them).
> 
> A false-positive means that you get a block of zeros in the middle of your
> file that very probably shouldn't be there (ie. file corruption); a
> false-negative means that we go and reload the missing chunk from the server.
> 
> The problem exists in cachefiles whether we use bmap or we use
> SEEK_HOLE/SEEK_DATA.  The only way round it is to keep track of what data is
> present independently of the backing filesystem's metadata.
> 
> To this end, it shouldn't (mis)behave differently than the code already there
> - except that it better handles the case in which the backing filesystem
> blocksize != PAGE_SIZE (which may not be relevant on an extent-based
> filesystem anyway if it packs parts of different files together in a single
> block) because the current implementation only bmaps the first block in a page
> and doesn't probe for the rest.
> 
> Fixing this requires a much bigger overhaul of cachefiles than this patchset
> performs.

That sounds like "sometimes you may get file corruption and there's
nothing you can do about it".  But I know people actually use fscache,
so it must be reliable at least for some use cases.

Is it that those "bridging" blocks only show up in certain corner cases
that users can arrange to avoid?  Or that it's OK as long as you use
certain specific file systems whose behavior goes beyond what's
technically required by the bmap or seek interfaces?

--b.

> 
> Also, it works towards getting rid of this use of bmap, but that's not user
> visible.
> 
> David


Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-21 Thread David Howells
J. Bruce Fields  wrote:

> On Wed, Jan 20, 2021 at 10:21:24PM, David Howells wrote:
> >  Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
> >  to be read from the cache.  Whilst this is an improvement over the
> >  bmap interface, it still has a problem with regard to a modern
> >  extent-based filesystem inserting or removing bridging blocks of
> >  zeros.
> 
> What are the consequences from the point of view of a user?

The cache can get both false positive and false negative results on checks for
the presence of data because an extent-based filesystem can, at will, insert
or remove blocks of contiguous zeros to make the extents easier to encode
(ie. bridge them or split them).

A false-positive means that you get a block of zeros in the middle of your
file that very probably shouldn't be there (ie. file corruption); a
false-negative means that we go and reload the missing chunk from the server.
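To make the false-positive case concrete, here is a minimal userspace sketch in C of a presence check built on the standard Linux lseek() SEEK_DATA/SEEK_HOLE interface. The function name fs_reports_data() is hypothetical and this is not the cachefiles code; it only shows that such a probe reports whatever the filesystem has allocated, so zero-filled blocks bridged in by the filesystem look exactly like data the cache wrote.

```c
#define _GNU_SOURCE          /* exposes SEEK_DATA/SEEK_HOLE in glibc's <unistd.h> */
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return true if the filesystem reports the whole of
 * [start, start + len) as data, i.e. there is no hole inside the range.
 * A "true" is only the filesystem's view of its own allocation: blocks of
 * zeros inserted to bridge two extents are reported as data as well, which
 * is the false positive described above.
 */
static bool fs_reports_data(int fd, off_t start, off_t len)
{
	off_t data = lseek(fd, start, SEEK_DATA);
	if (data < 0)
		return false;        /* e.g. ENXIO: no data at or after start */
	if (data > start)
		return false;        /* the range begins in a hole */

	off_t hole = lseek(fd, start, SEEK_HOLE);
	if (hole < 0)
		return false;
	return hole >= start + len;  /* next hole (or EOF) lies past the range */
}
```

The false negative is the mirror image: if the filesystem removes a block that the cache did write (because it happened to contain only zeros), the probe reports a hole there and that chunk is simply fetched from the server again.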

The problem exists in cachefiles whether we use bmap or we use
SEEK_HOLE/SEEK_DATA.  The only way round it is to keep track of what data is
present independently of the backing filesystem's metadata.
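As a rough illustration of tracking presence independently of the backing filesystem's metadata, here is a userspace sketch in C. The 4 KiB granule, the cache_map structure and its helpers are all hypothetical and not taken from cachefiles; a real design would also have to persist the map and keep it consistent with the data across crashes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical granule size and capacity for this sketch only. */
#define GRANULE_SHIFT 12u                  /* 4 KiB granules */
#define GRANULE_SIZE  (1u << GRANULE_SHIFT)
#define MAX_GRANULES  1024u                /* covers a 4 MiB cache file */

/* One bit per granule, set only once the cache write for it has completed. */
struct cache_map {
	uint8_t bits[MAX_GRANULES / 8];
};

static void cache_map_init(struct cache_map *m)
{
	memset(m->bits, 0, sizeof(m->bits));
}

/* Record that [start, start + len) was written to the cache.  Only granules
 * that are wholly covered get marked, so a partial write never makes a later
 * read trust bytes that were not written. */
static void cache_map_mark(struct cache_map *m, uint64_t start, uint64_t len)
{
	uint64_t first = (start + GRANULE_SIZE - 1) >> GRANULE_SHIFT;
	uint64_t end = (start + len) >> GRANULE_SHIFT;   /* exclusive */

	for (uint64_t g = first; g < end && g < MAX_GRANULES; g++)
		m->bits[g / 8] |= (uint8_t)(1u << (g % 8));
}

/* True only if every granule overlapping [start, start + len) is marked.
 * The backing filesystem's extent layout is never consulted. */
static bool cache_map_present(const struct cache_map *m, uint64_t start,
			      uint64_t len)
{
	if (len == 0)
		return true;

	uint64_t first = start >> GRANULE_SHIFT;
	uint64_t last = (start + len - 1) >> GRANULE_SHIFT;  /* inclusive */

	for (uint64_t g = first; g <= last; g++) {
		if (g >= MAX_GRANULES || !(m->bits[g / 8] & (1u << (g % 8))))
			return false;
	}
	return true;
}
```

Because lookups only consult the map, zeros that the filesystem bridges in cannot become false positives; the price is extra metadata that has to be stored and kept in sync with the cached data.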

To this end, it shouldn't (mis)behave differently than the code already there
- except that it better handles the case in which the backing filesystem
blocksize != PAGE_SIZE (which may not be relevant on an extent-based
filesystem anyway if it packs parts of different files together in a single
block) because the current implementation only bmaps the first block in a page
and doesn't probe for the rest.
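The blocksize != PAGE_SIZE point can be shown with a small userspace sketch in C, assuming the block size divides the page size. The predicate type and all three functions are hypothetical: the first check mimics probing only the first block of a page, as described above, and the second probes every block that backs the page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: does backing-file block 'blk' hold cached data? */
typedef bool (*block_present_fn)(uint64_t blk, void *ctx);

/* Example predicate backed by a byte-per-block array passed through ctx. */
static bool present_from_array(uint64_t blk, void *ctx)
{
	const unsigned char *map = ctx;
	return map[blk] != 0;
}

/* Probe only the first block of the page (assumes block_size <= page_size
 * and page_size % block_size == 0). */
static bool page_present_first_block(block_present_fn present, void *ctx,
				     uint64_t page_index, size_t page_size,
				     size_t block_size)
{
	uint64_t blocks_per_page = page_size / block_size;
	return present(page_index * blocks_per_page, ctx);
}

/* Probe every block that backs the page; any missing block means the page
 * has to be fetched from the server instead. */
static bool page_present_all_blocks(block_present_fn present, void *ctx,
				    uint64_t page_index, size_t page_size,
				    size_t block_size)
{
	uint64_t blocks_per_page = page_size / block_size;
	uint64_t first = page_index * blocks_per_page;

	for (uint64_t b = first; b < first + blocks_per_page; b++)
		if (!present(b, ctx))
			return false;
	return true;
}
```

With 1 KiB blocks and 4 KiB pages, a page whose first block is cached but whose other three are not passes the first check and fails the second.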

Fixing this requires a much bigger overhaul of cachefiles than this patchset
performs.

Also, it works towards getting rid of this use of bmap, but that's not user
visible.

David



Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-21 Thread J. Bruce Fields
On Wed, Jan 20, 2021 at 10:21:24PM, David Howells wrote:
>  Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
>  to be read from the cache.  Whilst this is an improvement over the
>  bmap interface, it still has a problem with regard to a modern
>  extent-based filesystem inserting or removing bridging blocks of
>  zeros.

What are the consequences from the point of view of a user?

--b.

> 
> This is a step towards overhauling the fscache API.  The change is opt-in
> on the part of the network filesystem.  A netfs should not try to mix the
> old and the new API because of conflicting ways of handling pages and the
> PG_fscache page flag and because it would be mixing DIO with buffered I/O.
> Further, the helper library can't be used with the old API.
> 
> This does not change any of the fscache cookie handling APIs or the way
> invalidation is done.
> 
> In the near term, I intend to deprecate and remove the old I/O API
> (fscache_allocate_page{,s}(), fscache_read_or_alloc_page{,s}(),
> fscache_write_page() and fscache_uncache_page()) and eventually replace
> most of fscache/cachefiles with something simpler and easier to follow.
> 
> The patchset contains four parts:
> 
>  (1) Some helper patches, including provision of an ITER_XARRAY iov
>  iterator and a function to do readahead expansion.
> 
>  (2) Patches to add the netfs helper library.
> 
>  (3) A patch to add the fscache/cachefiles kiocb API
> 
>  (4) Patches to add support in AFS for this.
> 
> With this, AFS without a cache passes all expected xfstests; with a cache,
> there's an extra failure, but that's also there before these patches.
> Fixing that probably requires a greater overhaul.
> 
> These patches can be found also on:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-netfs-lib
> 
> David
> ---
> David Howells (24):
>   iov_iter: Add ITER_XARRAY
>   vm: Add wait/unlock functions for PG_fscache
>   mm: Implement readahead_control pageset expansion
>   vfs: Export rw_verify_area() for use by cachefiles
>   netfs: Make a netfs helper module
>   netfs: Provide readahead and readpage netfs helpers
>   netfs: Add tracepoints
>   netfs: Gather stats
>   netfs: Add write_begin helper
>   netfs: Define an interface to talk to a cache
>   fscache, cachefiles: Add alternate API to use kiocb for read/write to cache
>   afs: Disable use of the fscache I/O routines
>   afs: Pass page into dirty region helpers to provide THP size
>   afs: Print the operation debug_id when logging an unexpected data version
>   afs: Move key to afs_read struct
>   afs: Don't truncate iter during data fetch
>   afs: Log remote unmarshalling errors
>   afs: Set up the iov_iter before calling afs_extract_data()
>   afs: Use ITER_XARRAY for writing
>   afs: Wait on PG_fscache before modifying/releasing a page
>   afs: Extract writeback extension into its own function
>   afs: Prepare for use of THPs
>   afs: Use the fs operation ops to handle FetchData completion
>   afs: Use new fscache read helper API
> 
> Takashi Iwai (1):
>   cachefiles: Drop superfluous readpages aops NULL check
> 
> 
>  fs/Kconfig                    |    1 +
>  fs/Makefile                   |    1 +
>  fs/afs/Kconfig                |    1 +
>  fs/afs/dir.c                  |  225 ---
>  fs/afs/file.c                 |  472 --
>  fs/afs/fs_operation.c         |    4 +-
>  fs/afs/fsclient.c             |  108 ++--
>  fs/afs/inode.c                |    7 +-
>  fs/afs/internal.h             |   57 +-
>  fs/afs/rxrpc.c                |  150 ++---
>  fs/afs/write.c                |  610 ++
>  fs/afs/yfsclient.c            |   82 +--
>  fs/cachefiles/Makefile        |    1 +
>  fs/cachefiles/interface.c     |    5 +-
>  fs/cachefiles/internal.h      |    9 +
>  fs/cachefiles/rdwr.c          |    2 -
>  fs/cachefiles/rdwr2.c         |  406
>  fs/fscache/Makefile           |    3 +-
>  fs/fscache/internal.h         |    3 +
>  fs/fscache/page.c             |    2 +-
>  fs/fscache/page2.c            |  116
>  fs/fscache/stats.c            |    1 +
>  fs/internal.h                 |    5 -
>  fs/netfs/Kconfig              |   23 +
>  fs/netfs/Makefile             |    5 +
>  fs/netfs/internal.h           |   97 +++
>  fs/netfs/read_helper.c        | 1142 +
>  fs/netfs/stats.c              |   57 ++
>  fs/read_write.c               |    1 +
>  include/linux/fs.h            |    1 +
>  include/linux/fscache-cache.h |    4 +
>  include/linux/fscache.h       |   28 +-
>  include/linux/netfs.h         |  167 +
>  include/linux/pagemap.h       |   16 +
>  include/net/af_rxrpc.h        |    2 +-
>  include/trace/events/afs.h    |   74 +--
>  include/trace/events/netfs.h  |  201 ++
>  mm/filemap.c                  |   18 +
>  

[RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-20 Thread David Howells


Here's a set of patches to do two things:

 (1) Add a helper library to handle the new VM readahead interface.  This
 is intended to be used unconditionally by the filesystem (whether or
 not caching is enabled) and provides a common framework for doing
 caching, transparent huge pages and, in the future, possibly fscrypt
 and read bandwidth maximisation.  It also allows the netfs and the
 cache to align, expand and slice up a read request from the VM in
 various ways; the netfs need only provide a function to read a stretch
 of data to the pagecache and the helper takes care of the rest.

 (2) Add an alternative fscache/cachefiles I/O API that uses the kiocb
 facility to do async DIO to transfer data to/from the netfs's pages,
 rather than using readpage with wait queue snooping on one side and
 vfs_write() on the other.  It also uses less memory, since it doesn't
 do buffered I/O on the backing file.

 Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
 to be read from the cache.  Whilst this is an improvement over the
 bmap interface, it still has a problem with regard to a modern
 extent-based filesystem inserting or removing bridging blocks of
 zeros.  Fixing that requires a much greater overhaul.
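Part (1) above says the helper lets the netfs and the cache align, expand and slice up a read request. A toy userspace model of that idea in C follows; the power-of-two granule, expand_request(), slice_request() and the per-granule presence array are all hypothetical and are not the netfs library's interface. It only shows one readahead request being widened to cache-granule boundaries and split into runs served from the cache or from the server.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum source { FROM_CACHE, FROM_SERVER };

struct subreq {
	uint64_t start;
	uint64_t len;
	enum source src;
};

/* Round [*start, *start + *len) outwards to 'granule' boundaries.
 * 'granule' must be a power of two. */
static void expand_request(uint64_t *start, uint64_t *len, uint64_t granule)
{
	uint64_t end = *start + *len;

	*start &= ~(granule - 1);
	end = (end + granule - 1) & ~(granule - 1);
	*len = end - *start;
}

/* Split an aligned request into runs of granules that share a source.
 * 'present' has one entry per granule, starting at granule 0.  Returns the
 * number of subrequests written to 'out'; stops early if 'max' is reached. */
static size_t slice_request(uint64_t start, uint64_t len, uint64_t granule,
			    const bool *present, struct subreq *out, size_t max)
{
	size_t n = 0;

	for (uint64_t pos = start; pos < start + len; pos += granule) {
		enum source src = present[pos / granule] ? FROM_CACHE : FROM_SERVER;

		if (n > 0 && out[n - 1].src == src) {
			out[n - 1].len += granule;    /* extend the current run */
		} else {
			if (n == max)
				break;
			out[n].start = pos;
			out[n].len = granule;
			out[n].src = src;
			n++;
		}
	}
	return n;
}
```

Widening to granule boundaries means the cache only ever has to store and look up whole granules, while slicing lets a single VM request be satisfied partly from the cache and partly from the server.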

This is a step towards overhauling the fscache API.  The change is opt-in
on the part of the network filesystem.  A netfs should not try to mix the
old and the new API because of conflicting ways of handling pages and the
PG_fscache page flag and because it would be mixing DIO with buffered I/O.
Further, the helper library can't be used with the old API.

This does not change any of the fscache cookie handling APIs or the way
invalidation is done.

In the near term, I intend to deprecate and remove the old I/O API
(fscache_allocate_page{,s}(), fscache_read_or_alloc_page{,s}(),
fscache_write_page() and fscache_uncache_page()) and eventually replace
most of fscache/cachefiles with something simpler and easier to follow.

The patchset contains four parts:

 (1) Some helper patches, including provision of an ITER_XARRAY iov
 iterator and a function to do readahead expansion.

 (2) Patches to add the netfs helper library.

 (3) A patch to add the fscache/cachefiles kiocb API

 (4) Patches to add support in AFS for this.
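For readers unfamiliar with the ITER_XARRAY idea in part (1): roughly, it is an iterator that describes a byte range by looking pages up by index in an xarray (as the pagecache is stored), rather than from a caller-built list of segments. The userspace model below is a hypothetical illustration only; struct page_index is a flat array standing in for an xarray, and neither it nor copy_from_pages() reflects the kernel's iov_iter or xarray APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

/* Stand-in for the pagecache: page pointers indexed by page number.
 * A real xarray is sparse and RCU-safe; a flat array keeps this short. */
struct page_index {
	unsigned char **pages;
	size_t nr_pages;
};

/* Copy 'len' bytes starting at file position 'pos' into 'dst', walking the
 * pages that the range touches in index order.  Returns the number of bytes
 * copied, stopping early at the first missing page. */
static size_t copy_from_pages(const struct page_index *pi, uint64_t pos,
			      void *dst, size_t len)
{
	unsigned char *out = dst;
	size_t copied = 0;

	while (copied < len) {
		uint64_t idx = pos / MODEL_PAGE_SIZE;
		size_t off = (size_t)(pos % MODEL_PAGE_SIZE);
		size_t chunk = MODEL_PAGE_SIZE - off;

		if (idx >= pi->nr_pages || pi->pages[idx] == NULL)
			break;
		if (chunk > len - copied)
			chunk = len - copied;
		memcpy(out + copied, pi->pages[idx] + off, chunk);
		copied += chunk;
		pos += chunk;
	}
	return copied;
}
```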

With this, AFS without a cache passes all expected xfstests; with a cache,
there's an extra failure, but that's also there before these patches.
Fixing that probably requires a greater overhaul.

These patches can be found also on:


https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-netfs-lib

David
---
David Howells (24):
  iov_iter: Add ITER_XARRAY
  vm: Add wait/unlock functions for PG_fscache
  mm: Implement readahead_control pageset expansion
  vfs: Export rw_verify_area() for use by cachefiles
  netfs: Make a netfs helper module
  netfs: Provide readahead and readpage netfs helpers
  netfs: Add tracepoints
  netfs: Gather stats
  netfs: Add write_begin helper
  netfs: Define an interface to talk to a cache
  fscache, cachefiles: Add alternate API to use kiocb for read/write to cache
  afs: Disable use of the fscache I/O routines
  afs: Pass page into dirty region helpers to provide THP size
  afs: Print the operation debug_id when logging an unexpected data version
  afs: Move key to afs_read struct
  afs: Don't truncate iter during data fetch
  afs: Log remote unmarshalling errors
  afs: Set up the iov_iter before calling afs_extract_data()
  afs: Use ITER_XARRAY for writing
  afs: Wait on PG_fscache before modifying/releasing a page
  afs: Extract writeback extension into its own function
  afs: Prepare for use of THPs
  afs: Use the fs operation ops to handle FetchData completion
  afs: Use new fscache read helper API

Takashi Iwai (1):
  cachefiles: Drop superfluous readpages aops NULL check


 fs/Kconfig                    |    1 +
 fs/Makefile                   |    1 +
 fs/afs/Kconfig                |    1 +
 fs/afs/dir.c                  |  225 ---
 fs/afs/file.c                 |  472 --
 fs/afs/fs_operation.c         |    4 +-
 fs/afs/fsclient.c             |  108 ++--
 fs/afs/inode.c                |    7 +-
 fs/afs/internal.h             |   57 +-
 fs/afs/rxrpc.c                |  150 ++---
 fs/afs/write.c                |  610 ++
 fs/afs/yfsclient.c            |   82 +--
 fs/cachefiles/Makefile        |    1 +
 fs/cachefiles/interface.c     |    5 +-
 fs/cachefiles/internal.h      |    9 +
 fs/cachefiles/rdwr.c          |    2 -
 fs/cachefiles/rdwr2.c         |  406
 fs/fscache/Makefile           |    3 +-
 fs/fscache/internal.h         |    3 +
 fs/fscache/page.c             |    2 +-
 fs/fscache/page2.c            |  116
 fs/fscache/stats.c            |    1 +
 fs/internal.h                 |    5