On 2021/4/16 下午10:42, Josef Bacik wrote:
On 4/15/21 1:04 AM, Qu Wenruo wrote:
This patch will refactor btrfs_invalidatepage() for the incoming subpage
support.

The invovled modifcations are:
- Use while() loop instead of "goto again;"
- Use single variable to determine whether to delete extent states
   Each branch will also have comments why we can or cannot delete the
   extent states
- Do qgroup free and extent states deletion per-loop
   Current code can only work for PAGE_SIZE == sectorsize case.

This refactor also makes it clear what we do for different sectors:
- Sectors without ordered extent
   We're completely safe to remove all extent states for the sector(s)

- Sectors with ordered extent, but no Private2 bit
   This means the endio has already been executed, we can't remove all
   extent states for the sector(s).

- Sectors with ordere extent, still has Private2 bit
   This means we need to decrease the ordered extent accounting.
   And then it comes to two different variants:
   * We have finished and removed the ordered extent
     Then it's the same as "sectors without ordered extent"
   * We didn't finished the ordered extent
     We can remove some extent states, but not all.

Signed-off-by: Qu Wenruo <w...@suse.com>
---
  fs/btrfs/inode.c | 173 +++++++++++++++++++++++++----------------------
  1 file changed, 94 insertions(+), 79 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4c894de2e813..93bb7c0482ba 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8320,15 +8320,12 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
  {
      struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
      struct extent_io_tree *tree = &inode->io_tree;
-    struct btrfs_ordered_extent *ordered;
      struct extent_state *cached_state = NULL;
      u64 page_start = page_offset(page);
      u64 page_end = page_start + PAGE_SIZE - 1;
-    u64 start;
-    u64 end;
+    u64 cur;
+    u32 sectorsize = inode->root->fs_info->sectorsize;
      int inode_evicting = inode->vfs_inode.i_state & I_FREEING;
-    bool found_ordered = false;
-    bool completed_ordered = false;
      /*
       * We have page locked so no new ordered extent can be created on
@@ -8352,96 +8349,114 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
      if (!inode_evicting)
          lock_extent_bits(tree, page_start, page_end, &cached_state);
-    start = page_start;
-again:
-    ordered = btrfs_lookup_ordered_range(inode, start, page_end - start + 1);
-    if (ordered) {
-        found_ordered = true;
-        end = min(page_end,
-              ordered->file_offset + ordered->num_bytes - 1);
+    cur = page_start;
+    while (cur < page_end) {
+        struct btrfs_ordered_extent *ordered;
+        bool delete_states = false;
+        u64 range_end;
+
+        /*
+         * Here we can't pass "file_offset = cur" and
+         * "len = page_end + 1 - cur", as btrfs_lookup_ordered_range()
+         * may not return the first ordered extent after @file_offset.
+         *
+         * Here we want to iterate through the range in byte order.
+         * This is slower but definitely correct.
+         *
+         * TODO: Make btrfs_lookup_ordered_range() to return the
+         * first ordered extent in the range to reduce the number
+         * of loops.
+         */
+        ordered = btrfs_lookup_ordered_range(inode, cur, sectorsize);

How does it not find the first ordered extent after file_offset? Looking at the code it just loops through and returns the first thing it finds that overlaps our range.  Is there a bug in btrfs_lookup_ordered_range()?

btrfs_lookup_ordered_range() does two search:
node = tree_search(tree, file_offset);
if (!node) {
        node = tree_search(tree, file_offset + len);
}

That means for the following seach pattern, it will not return the first OE:
start                                   end
|       |///////|       |///////|       |



We should add some self tests to make sure these helpers are doing the right thing if there is in fact a bug.

It's not a bug, as most call sites for btrfs_lookup_ordered_range() will wait for the ordered extent to finish, then re-search until all ordered extent is exhausted.

In that case, they don't care the order of returned OE.

It's really the first time we need a specific ordered.

Since you're already complaining, I guess I'd either add a new function to make the existing one to follow the order.


+        if (!ordered) {
+            range_end = cur + sectorsize - 1;
+            /*
+             * No ordered extent covering this sector, we are safe
+             * to delete all extent states in the range.
+             */
+            delete_states = true;
+            goto next;
+        }
+
+        range_end = min(ordered->file_offset + ordered->num_bytes - 1,
+                page_end);
+        if (!PagePrivate2(page)) {
+            /*
+             * If Private2 is cleared, it means endio has already
+             * been executed for the range.
+             * We can't delete the extent states as
+             * btrfs_finish_ordered_io() may still use some of them.
+             */
+            delete_states = false;

delete_states is already false.

Yes, but I want to put some comment here.

I can just remove the initialization value.


+            goto next;
+        }
+        ClearPagePrivate2(page);
+
          /*
           * IO on this page will never be started, so we need to account
           * for any ordered extents now. Don't clear EXTENT_DELALLOC_NEW
           * here, must leave that up for the ordered extent completion.
+         *
+         * This will also unlock the range for incoming
+         * btrfs_finish_ordered_io().
           */
          if (!inode_evicting)
-            clear_extent_bit(tree, start, end,
+            clear_extent_bit(tree, cur, range_end,
                       EXTENT_DELALLOC |
                       EXTENT_LOCKED | EXTENT_DO_ACCOUNTING |
                       EXTENT_DEFRAG, 1, 0, &cached_state);
+
+        spin_lock_irq(&inode->ordered_tree.lock);
+        set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
+        ASSERT(cur - ordered->file_offset < U32_MAX);
+        ordered->truncated_len = min_t(u32, ordered->truncated_len,
+                           cur - ordered->file_offset);

I've realized my previous comment about this needing to be u64 was wrong, I'm starting to wake up now.  However I still don't see the value in saving the space, as we can just leave everything u64 and the math all works out cleanly.

No problem, I can drop that patch.


+        spin_unlock_irq(&inode->ordered_tree.lock);
+
+        ASSERT(range_end + 1 - cur < U32_MAX);

And we don't have to pollute the code with all of these checks.

That's indeed to the point.

Thanks,
Qu


+        if (btrfs_dec_test_ordered_pending(inode, &ordered,
+                    cur, range_end + 1 - cur, 1)) {
+            btrfs_finish_ordered_io(ordered);
+            /*
+             * The ordered extent has finished, now we're again
+             * safe to delete all extent states of the range.
+             */
+            delete_states = true;
+        } else {
+            /*
+             * btrfs_finish_ordered_io() will get executed by endio of
+             * other pages, thus we can't delete extent states any more
+             */
+            delete_states = false;

This is already false.  Thanks,

Josef

Reply via email to