On 2024/8/23 07:55, Qu Wenruo wrote:


On 2024/8/22 21:37, Matthew Wilcox wrote:
On Thu, Aug 22, 2024 at 08:28:09PM +0930, Qu Wenruo wrote:
On 2024/8/22 12:35, Matthew Wilcox wrote:
-    while (cur < page_start + PAGE_SIZE) {
+    while (cur < folio_start + PAGE_SIZE) {

Presumably we want to support large folios in btrfs at some point?

Yes, and we're already working towards that direction.

I certainly want to remove CONFIG_READ_ONLY_THP_FOR_FS soon and that'll
be a bit of a regression for btrfs if it doesn't have large folio
support.  So shouldn't we also s/PAGE_SIZE/folio_size(folio)/ ?
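
Presumably the substitution would look something like this (a sketch of the suggestion, not the actual patch):

-    while (cur < folio_start + PAGE_SIZE) {
+    while (cur < folio_start + folio_size(folio)) {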

AFAIK, so far we're only going to use larger folios to support sector
sizes larger than PAGE_SIZE.

Why do you not want the performance gains from using larger folios?

So every folio is still of a fixed size (the sector size, >= PAGE_SIZE).

I'm not familiar with transparent huge pages; I thought transparent huge
pages were transparent to the fs.

Or do we need some special handling?
My uneducated guess is that we will get a larger folio passed directly
to the readpage callback?

Why do you choose to remain uneducated?  It's not like I've been keeping
all of this to myself for the past five years.  I've given dozens of
presentations on it, including plenary sessions at LSFMM.  As a
filesystem developer, you must be actively trying not to know about it
at this point.

It's straightforward enough to read in all the contents of a larger
folio; that's no different from the subpage handling.

But what happens if some writes land in that larger folio?
Does the MM layer detect that and split the folio?  Or does the fs have
to go down the subpage route (with an extra structure recording all the
subpage flag bitmaps)?

Entirely up to the filesystem.  It would help if btrfs used the same
terminology as the rest of the filesystems instead of inventing its own
"subpage" thing.  As far as I can tell, "subpage" means "fs block size",
but maybe it has a different meaning that I haven't ascertained.

Then tell me the correct terminology for describing an fs block size
smaller than the page size in the first place.

"fs block size" alone is not good enough; we want a term specifically
for the case where the fs block size is smaller than the page size.


Tracking dirtiness on a per-folio basis does not seem to be good enough.
Various people have workloads that regress in performance if you do
that.  So having some data structure attached to folio->private which
tracks dirtiness on a per-fs-block basis works pretty well.  iomap also
tracks the uptodate bit on a per-fs-block basis, but I'm less convinced
that's necessary.
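
As a rough illustration, per-fs-block dirty tracking hung off
folio->private could look something like the sketch below.  It is
loosely modelled on iomap's approach; the struct and helper names
(block_state, bs_attach, bs_set_dirty) are made up for illustration,
not actual kernel API:

/*
 * Minimal sketch: a per-folio bitmap with one dirty bit per fs block,
 * attached to folio->private.  Illustrative only.
 */
struct block_state {
	spinlock_t lock;
	unsigned long dirty[];		/* one bit per fs block */
};

static struct block_state *bs_attach(struct folio *folio, u32 blocksize)
{
	unsigned int nr = folio_size(folio) >> ilog2(blocksize);
	struct block_state *bs;

	bs = kzalloc(struct_size(bs, dirty, BITS_TO_LONGS(nr)), GFP_NOFS);
	if (bs) {
		spin_lock_init(&bs->lock);
		folio_attach_private(folio, bs);
	}
	return bs;
}

static void bs_set_dirty(struct folio *folio, struct block_state *bs,
			 u32 blocksize, loff_t pos, size_t len)
{
	unsigned int first = offset_in_folio(folio, pos) >> ilog2(blocksize);
	unsigned int nr = DIV_ROUND_UP(len, blocksize);

	spin_lock(&bs->lock);
	bitmap_set(bs->dirty, first, nr);
	spin_unlock(&bs->lock);
	folio_mark_dirty(folio);
}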

I have no idea why btrfs thinks it needs to track writeback, ordered,
checked and locked in a bitmap.  Those make no sense to me.  But they
make no sense to me even for supporting a 4KiB filesystem on a machine
with a 64KiB PAGE_SIZE, not just in the context of "larger folios".
Writeback is something the VM tells you to do; why do you need to tag
individual blocks for writeback?

Because there are cases where btrfs needs to write back only part of the
folio, independently of the rest.

And especially for mixed compression and non-compression writes inside
a page, e.g.:

       0     16K     32K     48K      64K
       |//|          |///////|
          4K

In the above case, if we need to write back the above page with a 4K
sector size, the first 4K is not suitable for compression (the result
would still take a full 4K block), while the range [32K, 48K) will be
compressed.

In that case, the [0, 4K) range will be submitted directly for IO.
Meanwhile [32K, 48K) will be submitted for compression in another
workqueue.  (Otherwise the time-consuming compression would delay the
writeback of the remaining pages.)

This means the dirty/writeback flags of the different ranges will change
at different times.
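
In other words, the writeback path ends up doing something like the
sketch below for each contiguous dirty range of a folio, where
range_is_compressible(), submit_direct_range() and
queue_compressed_range() are hypothetical stand-ins, not actual btrfs
functions:

/*
 * Hedged sketch of the split described above: route each block of one
 * contiguous dirty range either directly to IO or to a compression
 * workqueue.  Illustrative only.
 */
static void submit_dirty_range(struct inode *inode, struct folio *folio,
			       u64 start, u64 len, u32 sectorsize)
{
	u64 cur = start;

	while (cur < start + len) {
		u64 cur_len = min_t(u64, sectorsize, start + len - cur);

		if (range_is_compressible(inode, cur, cur_len))
			/* Async, so slow compression doesn't stall the
			 * writeback of the other ranges. */
			queue_compressed_range(inode, folio, cur, cur_len);
		else
			submit_direct_range(inode, folio, cur, cur_len);
		cur += cur_len;
	}
}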

In case you mean using an atomic to track the writeback/lock progress:
it's possible to go that route, but for now it's not space efficient.

In the 16-blocks-per-page case (4K sector size, 64K page size), each
atomic takes 4 bytes while a bitmap takes only 2 bytes.

And in the 4K sector size, 16K page size case the gap is even bigger:
btrfs compacts all the bitmaps into one larger bitmap to save even more
space, while each atomic would still take 4 bytes.
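
A quick userspace back-of-envelope check of those numbers (illustrative
only):

#include <stdio.h>

/* Blocks per page and bitmap bytes for the two cases above. */
int main(void)
{
	unsigned int sectorsize = 4096;
	unsigned int page_sizes[] = { 65536, 16384 };

	for (int i = 0; i < 2; i++) {
		unsigned int blocks = page_sizes[i] / sectorsize;
		unsigned int bitmap_bytes = (blocks + 7) / 8;

		printf("page %5u: %2u blocks -> bitmap %u byte(s), atomic 4 bytes\n",
		       page_sizes[i], blocks, bitmap_bytes);
	}
	return 0;
}

That prints 2 bytes for the 64K page case and 1 byte for the 16K page
case, versus a fixed 4 bytes per atomic_t.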

Thanks,
Qu


I think compression is no longer a btrfs-exclusive feature, so this
should be obvious?

Thanks,
Qu


