Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead

2018-12-08 Thread David Sterba
On Sat, Dec 08, 2018 at 08:50:32AM +0800, Qu Wenruo wrote:
> > I've adapted a stress test that unpacks a large tarball, snapshots
> > every 20 seconds, deletes a random snapshot every 50 seconds, and deletes
> > files from the original subvolume, now enhanced with qgroups just for the
> > new snapshots inheriting the toplevel subvolume. Lockup.
> > 
> > It gets stuck in a snapshot call with the following stacktrace:
> > 
> > [<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
> > [<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
> 
> This looks like the original subtree tracing has something wrong.

Yes, I ran the test on current master and it locked up too, so it's not
due to your patchset.

> Thanks for the report, I'll investigate it.

Thanks.


Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead

2018-12-07 Thread David Sterba
On Fri, Dec 07, 2018 at 06:51:21AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018/12/7 上午3:35, David Sterba wrote:
> > On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
> >> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> >>> This patchset can be fetched from github:
> >>> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
> >>>
> >>> Which is based on v4.20-rc1.
> >>
> >> Thanks, I'll add it to for-next soon.
> > 
> > The branch was there for some time but not for at least a week (my
> > mistake I did not notice in time). I've rebased it on top of recent
> > misc-next, but without the delayed refs patchset from Josef.
> > 
> > At the moment I'm considering it for merge to 4.21, there's still some
> > time to pull it out in case it shows up to be too problematic. I'm
> > mostly worried about the unknown interactions with the enospc updates or
> 
> For that part, I don't think it would cause any obvious problem for the
> enospc updates.
> 
> The only user-noticeable effect is the delayed deletion of reloc trees.
> 
> Apart from that, it's mostly transparent to extent allocation.
> 
> > generally because of lack of qgroup and reloc code reviews.
> 
> That's the biggest problem.
> 
> However, most of the current qgroup + balance optimization is done inside
> the qgroup code (to skip certain qgroup records), so if we're going to hit
> a problem, this patchset has the highest chance of hitting it.
> 
> Later patches will mostly keep tweaking qgroup without affecting any
> other parts.
> 
> So I'm fine if you decide to pull it out for now.

I've adapted a stress test that unpacks a large tarball, snapshots
every 20 seconds, deletes a random snapshot every 50 seconds, and deletes
files from the original subvolume, now enhanced with qgroups just for the
new snapshots inheriting the toplevel subvolume. Lockup.

It gets stuck in a snapshot call with the following stacktrace:

[<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
[<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
[<0>] do_walk_down+0x681/0xb20 [btrfs]
[<0>] walk_down_tree+0xf5/0x1c0 [btrfs]
[<0>] btrfs_drop_snapshot+0x43b/0xb60 [btrfs]
[<0>] btrfs_clean_one_deleted_snapshot+0xc1/0x120 [btrfs]
[<0>] cleaner_kthread+0xf8/0x170 [btrfs]
[<0>] kthread+0x121/0x140
[<0>] ret_from_fork+0x27/0x50

and that's around the 10th snapshot and roughly the 3rd deletion. This is the
qgroup show output:

qgroupid       rfer       excl  parent
--------  ---------  ---------  ------
0/5       865.27MiB    1.66MiB  ---
0/257         0.00B      0.00B  ---
0/259         0.00B      0.00B  ---
0/260     806.58MiB  637.25MiB  ---
0/262         0.00B      0.00B  ---
0/263         0.00B      0.00B  ---
0/264         0.00B      0.00B  ---
0/265         0.00B      0.00B  ---
0/266         0.00B      0.00B  ---
0/267         0.00B      0.00B  ---
0/268         0.00B      0.00B  ---
0/269         0.00B      0.00B  ---
0/270     989.04MiB    1.22MiB  ---
0/271         0.00B      0.00B  ---
0/272     922.25MiB  416.00KiB  ---
0/273     931.02MiB    1.50MiB  ---
0/274     910.94MiB    1.52MiB  ---
1/1         1.64GiB    1.64GiB
0/5,0/257,0/259,0/260,0/262,0/263,0/264,0/265,0/266,0/267,0/268,0/269,0/270,0/271,0/272,0/273,0/274

No IO or CPU activity at this point; the stacktrace and the qgroup show
output remain the same.

So, considering this, I'm not going to add the patchset to 4.21 but will
keep it in for-next for testing, any fixups or updates will be applied.


Re: [PATCH] libbtrfsutil: fix unprivileged tests if kernel lacks support

2018-12-07 Thread David Sterba
On Thu, Dec 06, 2018 at 04:29:32PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> I apparently didn't test this on a pre-4.18 kernel.
> test_subvolume_info_unprivileged() checks for an ENOTTY, but this
> doesn't seem to work correctly with subTest().
> test_subvolume_iterator_unprivileged() doesn't have a check at all. Add
> an explicit check to both before doing the actual test.
> 
> Signed-off-by: Omar Sandoval 

Applied, thanks.


Re: [PATCH v2 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead

2018-12-06 Thread David Sterba
On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> > This patchset can be fetched from github:
> > https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
> > 
> > Which is based on v4.20-rc1.
> 
> Thanks, I'll add it to for-next soon.

The branch was there for some time but not for at least a week (my
mistake I did not notice in time). I've rebased it on top of recent
misc-next, but without the delayed refs patchset from Josef.

At the moment I'm considering it for merge to 4.21, there's still some
time to pull it out in case it shows up to be too problematic. I'm
mostly worried about the unknown interactions with the enospc updates or
generally because of lack of qgroup and reloc code reviews.

I'm going to do some testing of the rebased branch before I add it to
for-next. The branch is ext/qu/qgroup-delay-scan in my devel repos,
please check if everything is still ok there. Thanks.


Re: [PATCH][v2] btrfs: run delayed items before dropping the snapshot

2018-12-06 Thread David Sterba
On Wed, Dec 05, 2018 at 12:12:21PM -0500, Josef Bacik wrote:
> From: Josef Bacik 
> 
> With my delayed refs patches in place we started seeing a large amount
> of aborts in __btrfs_free_extent
> 
> BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964 owner 1 offset 0
> Call Trace:
>  ? btrfs_merge_delayed_refs+0xaf/0x340
>  __btrfs_run_delayed_refs+0x6ea/0xfc0
>  ? btrfs_set_path_blocking+0x31/0x60
>  btrfs_run_delayed_refs+0xeb/0x180
>  btrfs_commit_transaction+0x179/0x7f0
>  ? btrfs_check_space_for_delayed_refs+0x30/0x50
>  ? should_end_transaction.isra.19+0xe/0x40
>  btrfs_drop_snapshot+0x41c/0x7c0
>  btrfs_clean_one_deleted_snapshot+0xb5/0xd0
>  cleaner_kthread+0xf6/0x120
>  kthread+0xf8/0x130
>  ? btree_invalidatepage+0x90/0x90
>  ? kthread_bind+0x10/0x10
>  ret_from_fork+0x35/0x40
> 
> This was because btrfs_drop_snapshot depends on the root not being modified
> while it's dropping the snapshot.  It will unlock the root node (and really
> every node) as it walks down the tree, only to re-lock it when it needs to
> do something.  This is a problem because if we modify the tree we could cow
> a block in our path, which frees our reference to that block.  Then once we
> get back to that shared block we'll free our reference to it again, and get
> ENOENT when trying to lookup our extent reference to that block in
> __btrfs_free_extent.
> 
> This is ultimately happening because we have delayed items left to be
> processed for our deleted snapshot _after_ all of the inodes are closed for
> the snapshot.  We only run the delayed inode item if we're deleting the
> inode, and even then we do not run the delayed insertions or delayed
> removals.  These can be run at any point after our final inode does its
> last iput, which is what triggers the snapshot deletion.  We can end up
> with the snapshot deletion happening and then have the delayed items run
> on that file system, resulting in the above problem.
> 
> This problem has existed forever, however my patches made it much easier
> to hit as I wake up the cleaner much more often to deal with delayed
> iputs, which made us more likely to start the snapshot dropping work
> before the transaction commits, which is when the delayed items would
> generally be run.  Before, generally speaking, we would run the delayed
> items, commit the transaction, and wake up the cleaner thread to start
> deleting snapshots, which means we were less likely to hit this problem.
> You could still hit it if you had multiple snapshots to be deleted and
> ended up with lots of delayed items, but it was definitely harder.
> 
> Fix for now by simply running all the delayed items before starting to
> drop the snapshot.  We could make this smarter in the future by making
> the delayed items per-root, and then simply drop any delayed items for
> roots that we are going to delete.  But for now just a quick and easy
> solution is the safest.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Josef Bacik 
> ---
> v1->v2:
> - check for errors from btrfs_run_delayed_items.
> - Dave I can reroll the series, but the second version of patch 1 is the same,
>   let me know what you want.

As this is a small update it's fine to send just that patch. You may
also use --in-reply-to so it threads to the original series. Resending
series makes most sense (to me) when there's a discussion and many
changes, so a fresh series makes it clear what's the current status.

Patch replaced in for-next topic branch, thanks.
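
For reference, a minimal sketch of the fix described above — flushing the
delayed items before walking down the tree in btrfs_drop_snapshot() — could
look roughly like this (the placement and the error label are assumptions,
not the exact committed hunk):

	trans = btrfs_start_transaction(tree_root, 0);
	if (IS_ERR(trans)) {
		err = PTR_ERR(trans);
		goto out_free;
	}

	/*
	 * Flush pending delayed insertions/removals for this root so the
	 * tree we are about to walk down is not modified (and COWed)
	 * underneath us while the snapshot is being dropped.
	 */
	err = btrfs_run_delayed_items(trans);
	if (err)
		goto out_end_trans;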


Re: [PATCH 1/2] btrfs: catch cow on deleting snapshots

2018-12-06 Thread David Sterba
On Fri, Nov 30, 2018 at 12:19:18PM -0500, Josef Bacik wrote:
> On Fri, Nov 30, 2018 at 05:14:54PM +, Filipe Manana wrote:
> > On Fri, Nov 30, 2018 at 4:53 PM Josef Bacik  wrote:
> > >
> > > From: Josef Bacik 
> > >
> > > When debugging some weird extent reference bug I suspected that we were
> > > changing a snapshot while we were deleting it, which could explain my
> > > bug.  This was indeed what was happening, and this patch helped me
> > > verify my theory.  It is never correct to modify the snapshot once it's
> > > being deleted, so mark the root when we are deleting it and make sure we
> > > complain about it when it happens.
> > >
> > > Signed-off-by: Josef Bacik 
> > > ---
> > >  fs/btrfs/ctree.c   | 3 +++
> > >  fs/btrfs/ctree.h   | 1 +
> > >  fs/btrfs/extent-tree.c | 9 +
> > >  3 files changed, 13 insertions(+)
> > >
> > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > > index 5912a97b07a6..5f82f86085e8 100644
> > > --- a/fs/btrfs/ctree.c
> > > +++ b/fs/btrfs/ctree.c
> > > @@ -1440,6 +1440,9 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
> > > u64 search_start;
> > > int ret;
> > >
> > > +   if (test_bit(BTRFS_ROOT_DELETING, &root->state))
> > > +   WARN(1, KERN_CRIT "cow'ing blocks on a fs root thats being dropped\n");
> > 
> > Please use btrfs_warn(), it makes sure we use a consistent message
> > style, identifies the fs, etc.
> > Also, "thats" should be "that is" or "that's".
> > 
> 
> Ah yeah, I was following the other convention in there but we should
> probably convert all of those to btrfs_warn.  I'll fix the grammar thing
> as well, it's just a leftover from the much less code-of-conduct-friendly
> message I originally had there.  Thanks,

Committed with the following fixup:

-   WARN(1, KERN_CRIT "cow'ing blocks on a fs root thats being dropped\n");
+   btrfs_err(fs_info,
+   "COW'ing blocks on a fs root that's being dropped");



Re: [PATCH 08/10] btrfs: rework btrfs_check_space_for_delayed_refs

2018-12-06 Thread David Sterba
On Thu, Dec 06, 2018 at 06:52:21PM +0200, Nikolay Borisov wrote:
> 
> 
> On 3.12.18 г. 17:20 ч., Josef Bacik wrote:
> > Now with the delayed_refs_rsv we can now know exactly how much pending
> > delayed refs space we need.  This means we can drastically simplify
> 
> IMO it will be helpful if there is a sentence here referring back to
> btrfs_update_delayed_refs_rsv to put your first sentence into context.
> But I guess this is something David can also do.

I'll update the changelog, but I'm not sure what exactly you want to see
there, please post the replacement text. Thanks.

> > btrfs_check_space_for_delayed_refs by simply checking how much space we
> > have reserved for the global rsv (which acts as a spill over buffer) and
> > the delayed refs rsv.  If our total size is beyond that amount then we
> > know it's time to commit the transaction and stop any more delayed refs
> > from being generated.
> > 
> > Signed-off-by: Josef Bacik 
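
A simplified sketch of the check described above (the rsv field names are
illustrative and may not match the patch exactly):

bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info)
{
	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
	u64 reserved;
	bool ret = false;

	spin_lock(&global_rsv->lock);
	reserved = global_rsv->reserved;
	spin_unlock(&global_rsv->lock);

	spin_lock(&delayed_refs_rsv->lock);
	reserved += delayed_refs_rsv->reserved;
	/* time to commit if the pending size exceeds what we have reserved */
	if (delayed_refs_rsv->size >= reserved)
		ret = true;
	spin_unlock(&delayed_refs_rsv->lock);

	return ret;
}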


Re: [PATCH 00/10][V2] Delayed refs rsv

2018-12-06 Thread David Sterba
On Mon, Dec 03, 2018 at 10:20:28AM -0500, Josef Bacik wrote:
> v1->v2:
> - addressed the comments from the various reviewers.
> - split "introduce delayed_refs_rsv" into 5 patches.  The patches are the same
>   together as they were, just split out more logically.  They can't really be
>   bisected across in that you will likely have fun enospc failures, but they
>   compile properly.  This was done to make it easier for review.

Thanks, that was helpful. The bisectability is IMO not hurt, as it
compiles and runs despite the potential ENOSPC problems. I guess the
point is to be able to reach any commit & compile & run and not be hit
by unrelated bugs. If somebody is bisecting an ENOSPC problem and lands
in the middle of the series, there's a hint in the changelogs that
something is going on.

> -- Original message --
> 
> This patchset changes how we do space reservations for delayed refs.  We
> were hitting probably 20-40 enospc aborts per day in production while
> running delayed refs at transaction commit time.  This means we ran out of
> space in the global reserve and couldn't easily get more space in
> use_block_rsv().
> 
> The global reserve has grown to cover everything we don't reserve space
> explicitly for, and we've grown a lot of weird ad-hoc heuristics to know
> if we're running short on space and when it's time to force a commit.  A
> failure rate of 20-40 file systems when we run hundreds of thousands of
> them isn't super high, but cleaning up this code will make things less
> ugly and more predictable.
> 
> Thus the delayed refs rsv.  We always know how many delayed refs we have
> outstanding, and although running them generates more we can use the
> global reserve for that spill over, which fits better into its desired use
> than a full blown reservation.  This first approach is to simply take how
> many times we're reserving space for and multiply that by 2 in order to
> save enough space for the delayed refs that could be generated.  This is a
> naive approach and will probably evolve, but for now it works.
> 
> With this patchset we've gone down to 2-8 failures per week.  It's not
> perfect, there are some corner cases that still need to be addressed, but
> it is significantly better than what we had.  Thanks,

I've merged the series to misc-next; the amount of review and testing
is sufficient. Though there are some more comments or suggestions for
improvements, they can be done as followups.

The other patchsets from you are mostly fixes that can be applied at
-rc time; for that reason I'm glad the main infrastructure change
can be merged before the 4.21 code freeze.
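
As a side note, the "multiply by 2" sizing heuristic from the cover letter
amounts to something like this (call site and variable names are
illustrative, not taken from the patches):

	/*
	 * When N new delayed ref updates are tracked in a transaction,
	 * reserve metadata space for 2 * N items, since running delayed
	 * refs can generate more of them; the global reserve acts as the
	 * spill-over buffer beyond that.
	 */
	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info,
					trans->delayed_ref_updates * 2);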


Re: [PATCH 0/3] btrfs: use offset_in_page and PAGE_ALIGNED

2018-12-06 Thread David Sterba
On Wed, Dec 05, 2018 at 03:23:02PM +0100, Johannes Thumshirn wrote:
> Use the offset_in_page() and PAGE_ALIGNED() macros instead of open-coding them
> throughout btrfs.
> 
> This series also includes a patch for 'make coccicheck' which is marked as an
> RFC, and I've CCed Julia in the hope of getting input from her.
> 
> Johannes Thumshirn (3):
>   btrfs: use offset_in_page instead of open-coding it
>   btrfs: use PAGE_ALIGNED instead of open-coding it

Added to misc-next, thanks.
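
For illustration, the kind of conversion the series does (a generic sketch,
the "my_" helpers are not from the patches):

/* illustrative helpers only */
static inline unsigned long my_offset_in_page(u64 offset)
{
	/* open-coded form being replaced: return offset & (PAGE_SIZE - 1); */
	return offset_in_page(offset);
}

static inline bool my_buf_page_aligned(const void *addr)
{
	/* open-coded: return ((unsigned long)addr & (PAGE_SIZE - 1)) == 0; */
	return PAGE_ALIGNED(addr);
}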


Re: [PATCHv3] btrfs: Fix error handling in btrfs_cleanup_ordered_extents

2018-12-05 Thread David Sterba
On Wed, Nov 21, 2018 at 05:10:52PM +0200, Nikolay Borisov wrote:
> Running btrfs/124 in a loop hung up on me sporadically with the
> following call trace:
>   btrfs   D0  5760   5324 0x
>   Call Trace:
>? __schedule+0x243/0x800
>schedule+0x33/0x90
>btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs]
>? wait_woken+0xa0/0xa0
>btrfs_wait_ordered_range+0xbb/0x100 [btrfs]
>btrfs_relocate_block_group+0x1ff/0x230 [btrfs]
>btrfs_relocate_chunk+0x49/0x100 [btrfs]
>btrfs_balance+0xbeb/0x1740 [btrfs]
>btrfs_ioctl_balance+0x2ee/0x380 [btrfs]
>btrfs_ioctl+0x1691/0x3110 [btrfs]
>? lockdep_hardirqs_on+0xed/0x180
>? __handle_mm_fault+0x8e7/0xfb0
>? _raw_spin_unlock+0x24/0x30
>? __handle_mm_fault+0x8e7/0xfb0
>? do_vfs_ioctl+0xa5/0x6e0
>? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
>do_vfs_ioctl+0xa5/0x6e0
>? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
>ksys_ioctl+0x3a/0x70
>__x64_sys_ioctl+0x16/0x20
>do_syscall_64+0x60/0x1b0
>entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> This happens because during page writeback it's valid for
> writepage_delalloc to instantiate a delalloc range which doesn't
> belong to the page currently being written back.
> 
> The reason this case is valid is that find_lock_delalloc_range returns
> any available range after the passed delalloc_start, ignoring whether
> the page under writeback is within that range.
> In turn, ordered extents (OE) are always created for the range returned
> by find_lock_delalloc_range. If, however, a failure occurs while the OE
> are being created, then the clean up code in btrfs_cleanup_ordered_extents
> will be called.
> 
> Unfortunately the code in btrfs_cleanup_ordered_extents doesn't consider
> the case of such a 'foreign' range being processed and instead always
> assumes that the range the OE are created for belongs to the page. This
> leads to the first page of such a foreign range not being cleaned up,
> since it's deliberately skipped by the current cleanup code.
> 
> Fix this by correctly checking whether the current page belongs to the
> range being instantiated and, if so, adjusting the range parameters
> passed for cleaning up. If it doesn't, then just clean the whole OE
> range directly.
> 
> Signed-off-by: Nikolay Borisov 
> Reviewed-by: Josef Bacik 

Added to misc-next, thanks.
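
Conceptually, the adjusted cleanup described above looks like this (a
sketch with illustrative variable names, not the exact hunk):

	/* in btrfs_cleanup_ordered_extents(), before walking the range */
	u64 page_start = page_offset(locked_page);

	if (page_start >= offset && page_start < offset + bytes) {
		/*
		 * The locked page belongs to the range being instantiated:
		 * skip it here (the caller cleans it up) and shrink the
		 * range accordingly.
		 */
		offset += PAGE_SIZE;
		bytes -= PAGE_SIZE;
	}
	/* otherwise it is a 'foreign' range: clean the whole OE range */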


Btrfs progs release 4.19.1

2018-12-05 Thread David Sterba
Hi,

btrfs-progs version 4.19.1 has been released.  There are build fixes, a minor
update to libbtrfsutil, and documentation updates.

Changes since 4.19.1-rc1: fix typos in CHANGES

Changes:

  * build fixes
* big-endian builds fail due to bswap helper clashes
* 'swap' macro is too generic, renamed to prevent build failures
  * libbtrfs
* minor version update to 1.1.0
* fix default search to top=0 as documented
* rename 'async' to avoid future python binding problems
* add support for unprivileged subvolume listing ioctls
* added tests, API docs
  * other
* lot of typos fixed
* warning cleanups
* doc formatting updates
* CI tests against zstd 1.3.7

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

David Sterba (7):
  btrfs-progs: kerncompat: rename swap to __swap
  btrfs-progs: README: add link to INSTALL
  btrfs-progs: docs: fix rendering of exponents in manual pages
  btrfs-progs: link to libbtrfsutil/README from the main README
  btrfs-progs: tests: pull zstd version 1.3.7 to the travis CI
  btrfs-progs: update CHANGES for v4.19.1
  Btrfs progs v4.19.1

Josh Soref (11):
  btrfs-progs: docs: fix typos in Documentation
  btrfs-progs: docs: fix typos in READMEs, INSTALL and CHANGES
  btrfs-progs: fix typos in Makefile
  btrfs-progs: tests: fix typos in test comments
  btrfs-progs: tests: fsck/025, fix typo in helper name
  btrfs-progs: fix typos in comments
  btrfs-progs: fix typos in user-visible strings
  btrfs-progs: check: fix typo in device_extent_record::chunk_objectid
  btrfs-progs: utils: fix typo in a variable
  btrfs-progs: mkfs: fix typo in "multipler" variables
  btrfs-progs: fix typo in btrfs-list function export

Omar Sandoval (10):
  libbtrfsutil: use top=0 as default for SubvolumeIterator()
  libbtrfsutil: change async parameters to async_ in Python bindings
  libbtrfsutil: document qgroup_inherit parameter in Python bindings
  libbtrfsutil: use SubvolumeIterator as context manager in tests
  libbtrfsutil: add test helpers for dropping privileges
  libbtrfsutil: allow tests to create multiple Btrfs instances
  libbtrfsutil: relax the privileges of subvolume_info()
  libbtrfsutil: relax the privileges of subvolume iterator
  libbtrfsutil: bump version to 1.1.0
  libbtrfsutil: document API in README

Rosen Penev (3):
  btrfs-progs: kernel-lib: bitops: Fix big endian compilation
  btrfs-progs: task-utils: Fix comparison between pointer and integer
  btrfs-progs: treewide: Fix missing declarations



Re: [PATCH v2 07/13] btrfs-progs: Fix Wmaybe-uninitialized warning

2018-12-05 Thread David Sterba
On Wed, Dec 05, 2018 at 02:40:12PM +0800, Qu Wenruo wrote:
> GCC 8.2.1 will report the following warning with "make W=1":
> 
>   ctree.c: In function 'btrfs_next_sibling_tree_block':
>   ctree.c:2990:21: warning: 'slot' may be used uninitialized in this function [-Wmaybe-uninitialized]
> path->slots[level] = slot;
> ~~~^~
> 
> The culprit is the following code:
> 
>   int slot;   << Not initialized
>   int level = path->lowest_level + 1;
>   BUG_ON(path->lowest_level + 1 >= BTRFS_MAX_LEVEL);
>   while(level < BTRFS_MAX_LEVEL) {
>   slot = path->slots[level] + 1;
>   ^^ but we initialize @slot here.
>   ...
>   }
>   path->slots[level] = slot;
> 
> It's possible that the compiler doesn't get enough of a hint from BUG_ON()
> for the lowest_level + 1 >= BTRFS_MAX_LEVEL case.
> 
> Fix it by using a do {} while() loop rather than a while() {} loop, to
> ensure we run the loop at least once.

I was hoping that we could actually add the hint to BUG_ON itself so the code
does not continue if the condition is true.
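
The do {} while() restructuring described in the changelog looks roughly
like this (loop body elided; a sketch, not the exact committed hunk):

	int slot;
	int level = path->lowest_level + 1;

	BUG_ON(path->lowest_level + 1 >= BTRFS_MAX_LEVEL);
	do {
		slot = path->slots[level] + 1;
		/*
		 * ... original loop body: either break out with a valid
		 * @slot at this @level, or move one level up and retry ...
		 */
	} while (level < BTRFS_MAX_LEVEL);
	path->slots[level] = slot;

Since the loop body now runs at least once, @slot is always assigned before
the final store, which is exactly the guarantee the compiler was missing.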


Re: [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex

2018-12-04 Thread David Sterba
On Tue, Dec 04, 2018 at 05:22:19PM +0200, Nikolay Borisov wrote:
> > @@ -3874,16 +3882,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> > if (ret) {
> > mutex_unlock(&fs_info->scrub_lock);
> > mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> > -   return ret;
> > +   goto out_free_ctx;
> 
> Don't we suffer the same issue when calling scrub_workers_get since in
> it we do btrfs_alloc_workqueue which also calls kzalloc with GFP_KERNEL?

Yes, that's right. I instrumented only the allocations in scrub.c to see
if the nofs and lock_not_held assertions work at all, so this one did not
get caught directly.

As scrub_workers_get still needs the scrub_lock, fixing it by moving the
allocation out does not work and would need more restructuring.


[PATCH 0/2] Scrub allocations vs reclaim fix

2018-12-04 Thread David Sterba
I've instrumented the scrub allocations to check for the GFP_NOFS context
and whether the device_list_mutex is not held. This caught one
allocation, and I've used the instrumented code to verify that Filipe's
scrub/reclaim fix is correct.

There are some debugging helpers needed that live in the generic code, so
I'm posting only the results. The patches are in the branch dev/locks-not-held
in my devel repos.
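
The kind of instrumentation used here might look roughly like the following
(a hypothetical sketch, not the actual debugging patches):

	/*
	 * Hypothetical check: complain if a GFP_KERNEL allocation is about
	 * to be done while we hold device_list_mutex.
	 */
	WARN_ON_ONCE(lockdep_is_held(&fs_info->fs_devices->device_list_mutex));

	sctx = kzalloc(sizeof(*sctx), GFP_KERNEL);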

David Sterba (2):
  btrfs: scrub: pass fs_info to scrub_setup_ctx
  btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex

 fs/btrfs/scrub.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

-- 
2.19.1



[PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex

2018-12-04 Thread David Sterba
The scrub context is allocated with GFP_KERNEL from btrfs_scrub_dev while
holding fs_info::device_list_mutex. This is not safe with respect to
reclaim, which could try to flush filesystem data in order to get the
memory. And the device_list_mutex is held during superblock commit, so
this could cause a lockup.

Move the allocation and initialization before any changes that require
the mutex.

Signed-off-by: David Sterba 
---
 fs/btrfs/scrub.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ffcab263e057..051d14c9f013 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3834,13 +3834,18 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
return -EINVAL;
}
 
+   /* Allocate outside of device_list_mutex */
+   sctx = scrub_setup_ctx(fs_info, is_dev_replace);
+   if (IS_ERR(sctx))
+   return PTR_ERR(sctx);
 
mutex_lock(&fs_info->fs_devices->device_list_mutex);
dev = btrfs_find_device(fs_info, devid, NULL, NULL);
if (!dev || (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) &&
 !is_dev_replace)) {
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-   return -ENODEV;
+   ret = -ENODEV;
+   goto out_free_ctx;
}
 
if (!is_dev_replace && !readonly &&
@@ -3848,7 +3853,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
btrfs_err_in_rcu(fs_info, "scrub: device %s is not writable",
rcu_str_deref(dev->name));
-   return -EROFS;
+   ret = -EROFS;
+   goto out_free_ctx;
}
 
mutex_lock(&fs_info->scrub_lock);
@@ -3856,7 +3862,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &dev->dev_state)) {
mutex_unlock(&fs_info->scrub_lock);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-   return -EIO;
+   ret = -EIO;
+   goto out_free_ctx;
}
 
btrfs_dev_replace_read_lock(&fs_info->dev_replace);
@@ -3866,7 +3873,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
mutex_unlock(&fs_info->scrub_lock);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-   return -EINPROGRESS;
+   ret = -EINPROGRESS;
+   goto out_free_ctx;
}
btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
 
@@ -3874,16 +3882,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
if (ret) {
mutex_unlock(&fs_info->scrub_lock);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-   return ret;
+   goto out_free_ctx;
}
 
-   sctx = scrub_setup_ctx(fs_info, is_dev_replace);
-   if (IS_ERR(sctx)) {
-   mutex_unlock(&fs_info->scrub_lock);
-   mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-   scrub_workers_put(fs_info);
-   return PTR_ERR(sctx);
-   }
sctx->readonly = readonly;
dev->scrub_ctx = sctx;
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
@@ -3936,6 +3937,11 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 
scrub_put_ctx(sctx);
 
+   return ret;
+
+out_free_ctx:
+   scrub_free_ctx(sctx);
+
return ret;
 }
 
-- 
2.19.1



[PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx

2018-12-04 Thread David Sterba
We can pass fs_info directly as this is the only member of btrfs_device
that's being used inside scrub_setup_ctx.

Signed-off-by: David Sterba 
---
 fs/btrfs/scrub.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index bbd1b36f4918..ffcab263e057 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -578,12 +578,11 @@ static void scrub_put_ctx(struct scrub_ctx *sctx)
scrub_free_ctx(sctx);
 }
 
-static noinline_for_stack
-struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
+static noinline_for_stack struct scrub_ctx *scrub_setup_ctx(
+   struct btrfs_fs_info *fs_info, int is_dev_replace)
 {
struct scrub_ctx *sctx;
int i;
-   struct btrfs_fs_info *fs_info = dev->fs_info;
 
sctx = kzalloc(sizeof(*sctx), GFP_KERNEL);
if (!sctx)
@@ -592,7 +591,7 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
sctx->is_dev_replace = is_dev_replace;
sctx->pages_per_rd_bio = SCRUB_PAGES_PER_RD_BIO;
sctx->curr = -1;
-   sctx->fs_info = dev->fs_info;
+   sctx->fs_info = fs_info;
for (i = 0; i < SCRUB_BIOS_PER_SCTX; ++i) {
struct scrub_bio *sbio;
 
@@ -3878,7 +3877,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
return ret;
}
 
-   sctx = scrub_setup_ctx(dev, is_dev_replace);
+   sctx = scrub_setup_ctx(fs_info, is_dev_replace);
if (IS_ERR(sctx)) {
mutex_unlock(&fs_info->scrub_lock);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-- 
2.19.1



Re: [PATCH v4] Btrfs: fix deadlock with memory reclaim during scrub

2018-12-04 Thread David Sterba
On Wed, Nov 28, 2018 at 02:40:07PM +, Filipe Manana wrote:
> > > Well, the worker tasks can also not use gfp_kernel, since the scrub
> > > task waits for them to complete before pausing.
> > > I missed this, and 2 reviewers as well, so perhaps it wasn't that
> > > trivial and I shouldn't feel that I miserably failed to identify all
> > > cases for something rather trivial. V5 sent.
> >
> > You can say that you left it there intentionally, such cookies are a
> > good drill for reviewers to stay sharp.
> 
> Unfortunately for me, it was not on purpose.
> 
> However there's the small drill of, for the workers only, potentially
> moving the memalloc_nofs_save() and
> restore to scrub_handle_errored_block(), since the only two possible
> gfp_kernel allocations for workers
> are during the case where a repair is needed:
> 
> scrub_bio_end_io_worker()
>   scrub_block_complete()
> scrub_handle_errored_block()
>   lock_full_stripe()
> insert_full_stripe_lock()
>   -> kmalloc with GFP_KERNEL
> 
> 
> scrub_bio_end_io_worker()
>scrub_block_complete()
>  scrub_handle_errored_block()
>scrub_write_page_to_dev_replace()
>  scrub_add_page_to_wr_bio()
>-> kzalloc with GFP_KERNEL
> 
> Just to avoid some duplication.

Sounds like a nice cleanup and more in line with the idea of scoped
GFP_NOFS; the markers should be at a higher level. Starting at the
bottom of the call stack is fine, as it's obvious where we want the nofs
protection, and then pushing it up the call chain. That way it's easier to
review. Please send a followup patch, thanks.
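
A rough sketch of that followup, scoping NOFS over the whole repair path
(placement follows the suggestion above; not a committed patch):

static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
{
	unsigned int nofs_flag;
	int ret = 0;

	/*
	 * Cover both GFP_KERNEL allocations reachable from here
	 * (insert_full_stripe_lock() and scrub_add_page_to_wr_bio())
	 * with one scoped NOFS section instead of annotating each call.
	 */
	nofs_flag = memalloc_nofs_save();

	/* ... existing body of scrub_handle_errored_block() ... */

	memalloc_nofs_restore(nofs_flag);
	return ret;
}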


Re: [PATCH v2 1/3] btrfs: remove always true if branch in find_delalloc_range

2018-12-04 Thread David Sterba
On Thu, Nov 29, 2018 at 11:33:38AM +0800, Lu Fengqi wrote:
> The @found is always false when it comes to the if branch. Besides, the
> bool type is more suitable for @found. Change the return value of the
> function and its caller to bool as well.
> 
> Signed-off-by: Lu Fengqi 

Added to misc-next, thanks.


Re: [PATCH] btrfs: skip file_extent generation check for free_space_inode in run_delalloc_nocow

2018-12-04 Thread David Sterba
On Thu, Nov 29, 2018 at 05:31:32PM +0800, Lu Fengqi wrote:
> The btrfs/001 with inode_cache mount option will encounter the following
> warning:
> 
> WARNING: CPU: 1 PID: 23700 at fs/btrfs/inode.c:956 
> cow_file_range.isra.19+0x32b/0x430 [btrfs]
> CPU: 1 PID: 23700 Comm: btrfs Kdump: loaded Tainted: GW  O  
> 4.20.0-rc4-custom+ #30
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:cow_file_range.isra.19+0x32b/0x430 [btrfs]
> Call Trace:
>  ? free_extent_buffer+0x46/0x90 [btrfs]
>  run_delalloc_nocow+0x455/0x900 [btrfs]
>  btrfs_run_delalloc_range+0x1a7/0x360 [btrfs]
>  writepage_delalloc+0xf9/0x150 [btrfs]
>  __extent_writepage+0x125/0x3e0 [btrfs]
>  extent_write_cache_pages+0x1b6/0x3e0 [btrfs]
>  ? __wake_up_common_lock+0x63/0xc0
>  extent_writepages+0x50/0x80 [btrfs]
>  do_writepages+0x41/0xd0
>  ? __filemap_fdatawrite_range+0x9e/0xf0
>  __filemap_fdatawrite_range+0xbe/0xf0
>  btrfs_fdatawrite_range+0x1b/0x50 [btrfs]
>  __btrfs_write_out_cache+0x42c/0x480 [btrfs]
>  btrfs_write_out_ino_cache+0x84/0xd0 [btrfs]
>  btrfs_save_ino_cache+0x551/0x660 [btrfs]
>  commit_fs_roots+0xc5/0x190 [btrfs]
>  btrfs_commit_transaction+0x2bf/0x8d0 [btrfs]
>  btrfs_mksubvol+0x48d/0x4d0 [btrfs]
>  btrfs_ioctl_snap_create_transid+0x170/0x180 [btrfs]
>  btrfs_ioctl_snap_create_v2+0x124/0x180 [btrfs]
>  btrfs_ioctl+0x123f/0x3030 [btrfs]
> 
> The file extent generation of the free space inode is equal to the last
> snapshot of the file root, so the inode will be passed to cow_file_range.
> But the inode was created and its extents were preallocated in
> btrfs_save_ino_cache, so there are no cow copies on disk.
> 
> The preallocated extents are not present on disk, and
> btrfs_cross_ref_exist will ignore the -ENOENT returned by
> check_committed_ref, so we can directly write the inode to the disk.
> 
> Fixes: 78d4295b1eee ("btrfs: lift some btrfs_cross_ref_exist checks in nocow path")
> Signed-off-by: Lu Fengqi 

With updated changelog and reviewed-by added to misc-next, thanks.


Re: [PATCH v2] Btrfs: fix fsync of files with multiple hard links in new directories

2018-12-04 Thread David Sterba
On Wed, Nov 28, 2018 at 02:54:28PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> The log tree has a long standing problem that when a file is fsync'ed we
> only check for new ancestors, created in the current transaction, by
> following only the hard link for which the fsync was issued. We follow the
> ancestors using the VFS' dget_parent() API. This means that if we create a
> new link for a file in a directory that is new (or in an any other new
> ancestor directory) and then fsync the file using an old hard link, we end
> up not logging the new ancestor, and on log replay that new hard link and
> ancestor do not exist. In some cases, involving renames, the file will not
> exist at all.
> 
> Example:
> 
>   mkfs.btrfs -f /dev/sdb
>   mount /dev/sdb /mnt
> 
>   mkdir /mnt/A
>   touch /mnt/foo
>   ln /mnt/foo /mnt/A/bar
>   xfs_io -c fsync /mnt/foo
> 
>   
> 
> In this example after log replay only the hard link named 'foo' exists
> and directory A does not exist, which is unexpected. In other major linux
> filesystems, such as ext4, xfs and f2fs for example, both hard links exist
> and so does directory A after mounting again the filesystem.
> 
> Checking if any new ancestors are new and need to be logged was added in
> 2009 by commit 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes"),
> however only for the ancestors of the hard link (dentry) for which the
> fsync was issued, instead of checking for all ancestors for all of the
> inode's hard links.
> 
> So fix this by tracking the id of the last transaction where a hard link
> was created for an inode and then on fsync fallback to a full transaction
> commit when an inode has more than one hard link and at least one new hard
> link was created in the current transaction. This is the simplest solution
> since this is not a common use case (adding frequently hard links for
> which there's an ancestor created in the current transaction and then
> fsync the file). In case it ever becomes a common use case, a solution
> that consists of iterating the fs/subvol btree for each hard link and
> check if any ancestor is new, could be implemented.
> 
> This solves many unexpected scenarios reported by Jayashree Mohan and
> Vijay Chidambaram, and for which there is a new test case for fstests
> under review.
> 
> Reported-by: Vijay Chidambaram 
> Reported-by: Jayashree Mohan 
> Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes")
> Signed-off-by: Filipe Manana 

Added to misc-next, thanks.
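
A sketch of the logic described in the changelog (treat the exact field
name and placement as assumptions):

	/* when a new hard link is added for the inode, under the transaction */
	inode->last_link_trans = trans->transid;

	/* at fsync time, before logging the inode and its ancestors */
	if (inode->vfs_inode.i_nlink > 1 &&
	    inode->last_link_trans > fs_info->last_trans_committed) {
		/*
		 * A hard link was created for this inode in the current
		 * transaction and it has several links: fall back to a full
		 * transaction commit so that any new ancestor directories
		 * are persisted as well.
		 */
		ret = 1;	/* caller treats this as "commit the transaction" */
	}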


Re: [PATCH] btrfs: tree-checker: Don't check max block group size as current max chunk size limit is unreliable

2018-12-04 Thread David Sterba
On Tue, Dec 04, 2018 at 06:15:13PM +0800, Qu Wenruo wrote:
> Gentle ping.
> 
> Please put this patch into current release as the new block group size
> limit check introduced in v4.19 is causing at least 2 reports in mail list.

BTW, if there's an urgent fix or patch that should be considered for
current devel cycle, please note that in the subject like

  [PATCH for 4.20-rc] btrfs: ...

to make it more visible.


Re: [PATCH] btrfs: tree-checker: Don't check max block group size as current max chunk size limit is unreliable

2018-12-04 Thread David Sterba
On Tue, Dec 04, 2018 at 06:15:13PM +0800, Qu Wenruo wrote:
> Gentle ping.
> 
> Please put this patch into current release as the new block group size
> limit check introduced in v4.19 is causing at least 2 reports in mail list.

I see, on the way to 4.20-rc with stable tags. Thanks.


Re: btrfs development - question about crypto api integration

2018-12-04 Thread David Sterba
On Fri, Nov 30, 2018 at 06:27:58PM +0200, Nikolay Borisov wrote:
> 
> 
> On 30.11.18 г. 17:22 ч., Chris Mason wrote:
> > On 29 Nov 2018, at 12:37, Nikolay Borisov wrote:
> > 
> >> On 29.11.18 г. 18:43 ч., Jean Fobe wrote:
> >>> Hi all,
> >>> I've been studying LZ4 and other compression algorithms in the
> >>> kernel, and seen other projects such as zram and ubifs using the
> >>> crypto api. Is there a technical reason for not using the crypto api
> >>> for compression (and possibly for encryption) in btrfs?
> >>> I did not find any design/technical implementation choices in
> >>> btrfs development in the developer's FAQ on the wiki. If I completely
> >>> missed it, could someone point me in the right direction?
> >>> Lastly, if there is no technical reason for this, would it be
> >>> something interesting to have implemented?
> >>
> >> I personally think it would be better if btrfs exploited the generic
> >> framework. And in fact when you look at zstd, btrfs does use the
> >> generic, low-level ZSTD routines but not the crypto library wrappers.
> >> If I were you I'd try to convert zstd (since it's the most recently
> >> added algorithm) to using the crypto layer to see if there are any
> >> lurking problems.
> > 
> > Back when I first added the zlib support, the zlib API was both easier 
> > to use and a better fit for our async worker threads.  That doesn't mean 
> > we shouldn't switch, it's just how we got to step one ;)
> 
> And what about zstd? Why is it also using the low-level API and not the
> crypto wrappers?

I think because it just copied the way things already were.

Here's fs/ubifs/compress.c as an example use of the API in a filesystem:

ubifs_compress
  crypto_comp_compress
    crypto_comp_crt        -- address of, dereference
    ->cot_compress         -- takes another address of something, indirect
                              function call with value 'crypto_compress'
                           -- this does 2 pointer dereferences and an
                              indirect call to coa_compress
      coa_compress = lzo_compress
        lzo_compress       -- pointer dereferences for crypto context
          lzo1x_1_compress -- through static __lzo_compress (no overhead)

while in btrfs:

btrfs_compress_pages
  ->compress_pages
    lzo1x_1_compress

The crypto API adds several pointer and function call indirections; I'm
not sure I want to get rid of the direct calls just yet.
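
For comparison, using the crypto API for compression looks roughly like
this (a generic sketch of the interface discussed above, not proposed
btrfs code):

#include <linux/crypto.h>

static int compress_with_crypto_api(const u8 *src, unsigned int slen,
				    u8 *dst, unsigned int *dlen)
{
	struct crypto_comp *tfm;
	int ret;

	tfm = crypto_alloc_comp("lzo", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	/* goes through the cot_compress/coa_compress indirections above */
	ret = crypto_comp_compress(tfm, src, slen, dst, dlen);

	crypto_free_comp(tfm);
	return ret;
}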


Re: [PATCH 2/5] btrfs: Refactor btrfs_can_relocate

2018-12-04 Thread David Sterba
On Tue, Dec 04, 2018 at 08:34:15AM +0200, Nikolay Borisov wrote:
> 
> 
> On 3.12.18 г. 19:25 ч., David Sterba wrote:
> > On Sat, Nov 17, 2018 at 09:29:27AM +0800, Anand Jain wrote:
> >>> - ret = find_free_dev_extent(trans, device, min_free,
> >>> -            &dev_offset, NULL);
> >>> - if (!ret)
> >>> + if (!find_free_dev_extent(trans, device, min_free,
> >>> +            &dev_offset, NULL))
> >>
> >>   This can return -ENOMEM;
> >>
> >>> @@ -2856,8 +2856,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> >>>    */
> >>>   lockdep_assert_held(&fs_info->delete_unused_bgs_mutex);
> >>>   
> >>> - ret = btrfs_can_relocate(fs_info, chunk_offset);
> >>> - if (ret)
> >>> + if (!btrfs_can_relocate(fs_info, chunk_offset))
> >>>   return -ENOSPC;
> >>
> >>   And ends up converting -ENOMEM to -ENOSPC.
> >>
> >>   Its better to propagate the accurate error.
> > 
> > Right, converting to bool is obscuring the reason why the functions
> > fail. Making the code simpler at this cost does not look like a good
> > idea to me. I'll remove the patch from misc-next for now.
> 
> The patch itself does not make the code more obscure than it currently is,
> because even if ENOMEM is returned it's still converted to ENOSPC in
> btrfs_relocate_chunk.

Sorry, but this can hardly be used as an excuse to keep the code
obscure. btrfs_can_relocate & co are not changed very often, so cruft has
accumulated. The error value propagation was probably not a hot
topic in 2009; btrfs_can_relocate needs the cleanups, but please let's do
that properly.


Re: [PATCH v1.1 9/9] btrfs-progs: Cleanup warning reported by -Wmissing-prototypes

2018-12-04 Thread David Sterba
On Tue, Dec 04, 2018 at 08:24:38PM +0800, Qu Wenruo wrote:
> 
> 
> On 2018/12/4 下午8:22, David Sterba wrote:
> > On Fri, Nov 16, 2018 at 04:04:51PM +0800, Qu Wenruo wrote:
> >> The following missing prototypes will be fixed:
> >> 1)  btrfs.c::handle_special_globals()
> >> 2)  check/mode-lowmem.c::repair_ternary_lowmem()
> >> 3)  extent-tree.c::btrfs_search_overlap_extent()
> >> Above 3 can be fixed by making them static
> >>
> >> 4)  utils.c::btrfs_check_nodesize()
> >> Fixed by moving it to fsfeatures.c
> >>
> >> 5)  chunk-recover.c::btrfs_recover_chunk_tree()
> >> 6)  super-recover.c::btrfs_recover_superblocks()
> >> Fixed by moving the declaration from cmds-rescue.c to rescue.h
> >>
> >> 7)  utils-lib.c::arg_strtou64()
> >> 8)  utils-lib.c::lookup_path_rootid()
> >> Fixed by include "utils.h"
> >>
> >> 9)  free-space-tree.c::set_free_space_tree_thresholds()
> >> Fixed by deleting it, as there is no user.
> >>
> >> 10) free-space-tree.c::convert_free_space_to_bitmaps()
> >> 11) free-space-tree.c::convert_free_space_to_extents()
> >> 12) free-space-tree.c::__remove_from_free_space_tree()
> >> 13) free-space-tree.c::__add_to_free_space_tree()
> >> 14) free-space-tree.c::btrfs_create_tree()
> >> Fixed by making them static.
> > 
> > Please split this to more patches grouped by the type of change.
> 
> No problem, just as the numbering is already grouped.

Note that 1-3 and 10-14 are the same group.


Re: [PATCH v1.1 9/9] btrfs-progs: Cleanup warning reported by -Wmissing-prototypes

2018-12-04 Thread David Sterba
On Fri, Nov 16, 2018 at 04:04:51PM +0800, Qu Wenruo wrote:
> The following missing prototypes will be fixed:
> 1)  btrfs.c::handle_special_globals()
> 2)  check/mode-lowmem.c::repair_ternary_lowmem()
> 3)  extent-tree.c::btrfs_search_overlap_extent()
> Above 3 can be fixed by making them static
> 
> 4)  utils.c::btrfs_check_nodesize()
> Fixed by moving it to fsfeatures.c
> 
> 5)  chunk-recover.c::btrfs_recover_chunk_tree()
> 6)  super-recover.c::btrfs_recover_superblocks()
> Fixed by moving the declaration from cmds-rescue.c to rescue.h
> 
> 7)  utils-lib.c::arg_strtou64()
> 8)  utils-lib.c::lookup_path_rootid()
> Fixed by include "utils.h"
> 
> 9)  free-space-tree.c::set_free_space_tree_thresholds()
> Fixed by deleting it, as there is no user.
> 
> 10) free-space-tree.c::convert_free_space_to_bitmaps()
> 11) free-space-tree.c::convert_free_space_to_extents()
> 12) free-space-tree.c::__remove_from_free_space_tree()
> 13) free-space-tree.c::__add_to_free_space_tree()
> 14) free-space-tree.c::btrfs_create_tree()
> Fixed by making them static.

Please split this to more patches grouped by the type of change.

> --- /dev/null
> +++ b/rescue.h
> @@ -0,0 +1,14 @@
> +/*
> + * Copyright (C) 2018 SUSE.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + */

Missing ifdef to prevent multiple inclusion

> +int btrfs_recover_superblocks(const char *path, int verbose, int yes);
> +int btrfs_recover_chunk_tree(const char *path, int verbose, int yes);
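
The guard being asked for is the usual pattern (the guard macro name here
is illustrative):

#ifndef __BTRFS_PROGS_RESCUE_H__
#define __BTRFS_PROGS_RESCUE_H__

int btrfs_recover_superblocks(const char *path, int verbose, int yes);
int btrfs_recover_chunk_tree(const char *path, int verbose, int yes);

#endif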


Re: [PATCH 7/9] btrfs-progs: Fix Wmaybe-uninitialized warning

2018-12-04 Thread David Sterba
On Fri, Nov 16, 2018 at 03:54:24PM +0800, Qu Wenruo wrote:
> The only location is the following code:
> 
>   int level = path->lowest_level + 1;
>   BUG_ON(path->lowest_level + 1 >= BTRFS_MAX_LEVEL);
>   while(level < BTRFS_MAX_LEVEL) {
>   slot = path->slots[level] + 1;
>   ...
>   }
>   path->slots[level] = slot;
> 
> Again, the stupid compiler needs some hint for the fact that we will
> always enter the while loop at least once, thus @slot will always be
> initialized.

Harsh words for the compiler, and I say not deserved. The same code
pasted into the kernel and built with the same compiler version does not
report the warning, so it's apparently a missing annotation of BUG_ON in
btrfs-progs that does not give the right hint.
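
The missing annotation could be supplied along these lines in btrfs-progs
(a hypothetical sketch; the helper name is made up — the kernel gets the
same effect because BUG() ends in unreachable()):

#include <stdio.h>
#include <stdlib.h>

/* hypothetical noreturn trap so the compiler can prune the failure path */
static void __attribute__((noreturn)) bugon_trap(const char *expr)
{
	fprintf(stderr, "BUG_ON hit: %s\n", expr);
	abort();
}

#define BUG_ON(c)				\
	do {					\
		if (c)				\
			bugon_trap(#c);		\
	} while (0)

With this, after BUG_ON(level >= BTRFS_MAX_LEVEL) the compiler knows the
condition is false and the maybe-uninitialized warning goes away.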


Re: [PATCH v3 1/2] btrfs: scrub: fix circular locking dependency warning

2018-12-04 Thread David Sterba
On Fri, Nov 30, 2018 at 01:15:23PM +0800, Anand Jain wrote:
> @@ -3757,10 +3757,13 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
>  
>  static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info *fs_info)
>  {
> + lockdep_assert_held(&fs_info->scrub_lock);
>   if (--fs_info->scrub_workers_refcnt == 0) {
> + mutex_unlock(&fs_info->scrub_lock);
>   btrfs_destroy_workqueue(fs_info->scrub_workers);
>   btrfs_destroy_workqueue(fs_info->scrub_wr_completion_workers);
>   btrfs_destroy_workqueue(fs_info->scrub_parity_workers);
> + mutex_lock(&fs_info->scrub_lock);
>   }
>   WARN_ON(fs_info->scrub_workers_refcnt < 0);
>  }

btrfs/011 lockdep warning is gone, but now there's a list corruption
reported by btrfs/073. I'm testing the 2 patches on top of current
master to avoid interference with misc-next patches.

btrfs/073   [11:07:19]
[ 3580.466293] run fstests btrfs/073at 2018-12-04 11:07:19
[ 3580.610367] BTRFS info (device vda): disk space caching is enabled
[ 3580.612809] BTRFS info (device vda): has skinny extents
[ 3580.876639] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 1 
transid 5 /dev/vdb
[ 3580.880569] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 2 
transid 5 /dev/vdc
[ 3580.882947] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 3 
transid 5 /dev/vdd
[ 3580.885499] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 4 
transid 5 /dev/vde
[ 3580.887972] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 5 
transid 5 /dev/vdf
[ 3580.890394] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 6 
transid 5 /dev/vdg
[ 3580.893729] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 7 
transid 5 /dev/vdh
[ 3580.903180] BTRFS info (device vdb): disk space caching is enabled
[ 3580.904538] BTRFS info (device vdb): has skinny extents
[ 3580.905322] BTRFS info (device vdb): flagging fs with big metadatafeature
[ 3580.908555] BTRFS info (device vdb): checking UUID tree
[ 3580.951440] [ cut here ]
[ 3580.954122] list_add corruption. prev->next should be 
next(a189faa9acc8), but was a189faa9adc0. (prev=a189faa9a480).
[ 3580.960061] WARNING: CPU: 0 PID: 24578 at 
lib/list_debug.c:28__list_add_valid+0x4d/0x70
[ 3580.962346] BTRFS info (device vdb): use no compression, level 0
[ 3580.963694] Modules linked in: dm_flakey dm_mod btrfs libcrc32c 
xorzstd_decompress zstd_compress xxhash raid6_pq loop
[ 3580.963702] CPU: 0 PID: 24578 Comm: btrfs Not tainted4.20.0-rc5-default+ #366
[ 3580.963703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 
rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 3580.963706] RIP: 0010:__list_add_valid+0x4d/0x70
[ 3580.963708] RSP: 0018:b72d88817b90 EFLAGS: 00010286
[ 3580.963709] RAX:  RBX: b72d88817c10 RCX:
[ 3580.963710] RDX: 0002 RSI: 0001 RDI:860c4c1d
[ 3580.963710] RBP: a189faa9acc8 R08: 0001 R09:
[ 3580.963714] R10:  R11: 882a0a2d R12:a189faa9ac70
[ 3580.966020] BTRFS info (device vdb): disk space caching is enabled
[ 3580.967721] R13: a189faa9a480 R14: a189faa9ac70 R15:a189faa9ac78
[ 3580.967724] FS:  7f04289d1700() 
GS:a189fd40()knlGS:
[ 3580.967725] CS:  0010 DS:  ES:  CR0: 80050033
[ 3580.967725] CR2: 7f7b58b034ac CR3: 699d9000 CR4:06f0
[ 3580.967729] Call Trace:
[ 3580.967736]  __mutex_add_waiter+0x34/0x70
[ 3580.967743]  ? drain_workqueue+0x1e/0x180
[ 3580.994465]  __mutex_lock+0x134/0x9d0
[ 3580.995526]  ? __schedule+0x2eb/0xb20
[ 3580.996584]  ? drain_workqueue+0x1e/0x180
[ 3580.997727]  drain_workqueue+0x1e/0x180
[ 3580.998793]  destroy_workqueue+0x17/0x240
[ 3580.999879]  btrfs_destroy_workqueue+0x57/0x200 [btrfs]
[ 3581.001148]  scrub_workers_put+0x6c/0x90 [btrfs]
[ 3581.002257]  btrfs_scrub_dev+0x2f6/0x590 [btrfs]
[ 3581.003370]  ? __sb_start_write+0x12c/0x1d0
[ 3581.004450]  ? mnt_want_write_file+0x24/0x60
[ 3581.005613]  btrfs_ioctl+0xfc7/0x2f00 [btrfs]
[ 3581.006361]  ? get_task_io_context+0x2d/0x90
[ 3581.007025]  ? do_vfs_ioctl+0xa2/0x6d0
[ 3581.007699]  do_vfs_ioctl+0xa2/0x6d0
[ 3581.008603]  ? __fget+0x109/0x1e0
[ 3581.009413]  ksys_ioctl+0x3a/0x70
[ 3581.010326]  __x64_sys_ioctl+0x16/0x20
[ 3581.011234]  do_syscall_64+0x54/0x180
[ 3581.012143]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3581.013213] RIP: 0033:0x7f042bac9aa7
[ 3581.014133] Code: 00 00 90 48 8b 05 f1 83 2c 00 64 c7 00 26 00 00 0048 c7 c0 
ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f05 <48> 3d 01 f0 
ff ff 73 01 c3 48 8b 0d c1 83 2c 00 f7 d8 64 89 01 48
[ 3581.018486] RSP: 002b:7f04289d0d38 EFLAGS: 0246 
ORIG_RAX:0010
[ 3581.019703] RAX: ffda RBX: 55c05475de10 

Re: [PATCH v1.1 3/9] btrfs-progs: Makefile.extrawarn: Don't warn on sign compare

2018-12-04 Thread David Sterba
On Fri, Nov 16, 2018 at 04:00:40PM +0800, Qu Wenruo wrote:
> In most cases, we are just using 'int' for 'unsigned int' and don't
> care about the sign.
> 
> The Wsign-compare warning is causing tons of false alerts.
> Suppressing it would make W=1 less noisy.
> 
> Signed-off-by: Qu Wenruo 
> ---
> changelog:
> v1.1:
>Use cc-disable-warning to provide much better compatibility against
>older compiler
> ---
>  Makefile.extrawarn | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/Makefile.extrawarn b/Makefile.extrawarn
> index bbb2d5173846..736bee82759f 100644
> --- a/Makefile.extrawarn
> +++ b/Makefile.extrawarn
> @@ -54,6 +54,7 @@ warning-1 += $(call cc-option, -Wmissing-include-dirs)
>  warning-1 += $(call cc-option, -Wunused-but-set-variable)
>  warning-1 += $(call cc-disable-warning, missing-field-initializers)
>  warning-1 += $(call cc-disable-warning, format-truncation)
> +warning-1 += $(call cc-disable-warning, sign-compare)

Please enable it for W=3


Re: [PATCH 2/9] btrfs-progs: fix gcc8 default build warning caused by '-Wformat-truncation'

2018-12-04 Thread David Sterba
On Fri, Nov 16, 2018 at 03:54:19PM +0800, Qu Wenruo wrote:
> From: Su Yanjun 
> 
> When using gcc8 to compile utils.c, it complains as below:
> 
> utils.c:852:45: warning: '%s' directive output may be truncated writing
> up to 4095 bytes into a region of size 4084 [-Wformat-truncation=]
>snprintf(path, sizeof(path), "/dev/mapper/%s", name);
>  ^~   
> In file included from /usr/include/stdio.h:873,
>  from utils.c:20:
> /usr/include/bits/stdio2.h:67:10: note: '__builtin___snprintf_chk'
> output between 13 and 4108 bytes into a destination of size 4096
>return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
>   ^~~~
> __bos (__s), __fmt, __va_arg_pack ());
> ~
> 
> This isn't a type of warning we care about, particularly when PATH_MAX
> is much less than either.
> 
> Using the GCC option -Wno-format-truncation to disable this for default
> build.
> 
> Signed-off-by: Su Yanjun 
> [Use cc-disable-warning to fix the not working CFLAGS setting in configure.ac]
> Signed-off-by: Qu Wenruo 
> ---
>  Makefile   | 1 +
>  Makefile.extrawarn | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index f4ab14ea74c8..252299f8869f 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -49,6 +49,7 @@ CSCOPE_CMD := cscope -u -b -c -q
>  include Makefile.extrawarn
>  
>  EXTRA_CFLAGS :=
> +EXTRA_CFLAGS += $(call cc-disable-warning, format-truncation)

Please don't touch EXTRA_CFLAGS, this is for users who want to override
any defaults that are set by build. This should go to CFLAGS.

>  EXTRA_LDFLAGS :=
>  
>  DEBUG_CFLAGS_DEFAULT = -O0 -U_FORTIFY_SOURCE -ggdb3
> diff --git a/Makefile.extrawarn b/Makefile.extrawarn
> index 5849036fd166..bbb2d5173846 100644
> --- a/Makefile.extrawarn
> +++ b/Makefile.extrawarn
> @@ -53,6 +53,7 @@ warning-1 += -Wold-style-definition
>  warning-1 += $(call cc-option, -Wmissing-include-dirs)
>  warning-1 += $(call cc-option, -Wunused-but-set-variable)
>  warning-1 += $(call cc-disable-warning, missing-field-initializers)
> +warning-1 += $(call cc-disable-warning, format-truncation)

It's ok to disable for W=1 but eg. the W=3 level could take all
previously disabled warnings.

>  
>  warning-2 := -Waggregate-return
>  warning-2 += -Wcast-align
> -- 
> 2.19.1


Re: [PATCH 1/9] btrfs-progs: Makefile.extrawarn: Import cc-disable-warning

2018-12-04 Thread David Sterba
On Fri, Nov 16, 2018 at 03:54:18PM +0800, Qu Wenruo wrote:
> We imported cc-option but forgot to import cc-disable-warning.
> 
> Fixes: b556a992c3ad ("btrfs-progs: build: allow to build with various 
> compiler warnings")
> Signed-off-by: Qu Wenruo 
> ---
>  Makefile.extrawarn | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/Makefile.extrawarn b/Makefile.extrawarn
> index 1f4bda94a167..5849036fd166 100644
> --- a/Makefile.extrawarn
> +++ b/Makefile.extrawarn
> @@ -19,6 +19,12 @@ try-run = $(shell set -e;   \
>   cc-option = $(call try-run,\
>   $(CC) $(CFLAGS) $(1) -c -x c /dev/null -o "$$TMP",$(1),$(2))
>  
> +# cc-disable-warning
> +# Usage: cflags-y += $(call cc-disable-warning,unused-but-set-variable)
> +cc-disable-warning = $(call try-run,\
> + $(CC) -Werror $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -W$(strip $(1)) -c -x c /dev/null -o "$$TMP",-Wno-$(strip $(1)))

btrfs-progs doesn't use KBUILD_CPPFLAGS nor CC_OPTION_CFLAGS, so this needs
to be properly ported from the kernel.


Re: [PATCH v2 0/5] btrfs-progs: image: Fix error when restoring multi-disk image to single disk

2018-12-04 Thread David Sterba
On Tue, Nov 27, 2018 at 04:38:23PM +0800, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/image_recover
> 
> The base commit is as usual, the latest stable tag, v4.19.
> 
> 
> Test case misc/021 will fail when using the latest upstream kernel.
> 
> This is due to the enhanced code in the kernel that checks the block
> group <-> chunk <-> dev extent mapping.
> 
> This means that, from the very beginning, btrfs-image couldn't really
> restore a multi-disk image to a single-disk one correctly.
> 
> The problem is that, although we modified the chunk items, we didn't
> modify the block group items' flags or the dev extents.
> 
> This patchset resets the block group flags, then rebuilds the dev
> extents by removing the existing ones first and re-adding the correct
> dev extents calculated from the chunk map.
> 
> Now it passes all misc tests and fsck tests.
> 
> Changelog:
> v2:
> - Parameter list cleanup
>   * Use trans->fs_info to remove fs_info parameter
>   * Remove trans parameter for function who doesn't need
> - Merge dev extents removal code with rebuild code
> - Refactor btrfs_alloc_dev_extent() into 2 functions
>   * btrfs_insert_dev_extent() for convert and dev extent rebuild
>   * btrfs_alloc_dev_extent() for old use case
>   
> Qu Wenruo (5):
>   btrfs-progs: image: Refactor fixup_devices() to
> fixup_chunks_and_devices()
>   btrfs-progs: image: Fix block group item flags when restoring
> multi-device image to single device
>   btrfs-progs: volumes: Refactor btrfs_alloc_dev_extent() into two
> functions
>   btrfs-progs: image: Remove all existing dev extents for later rebuild
>   btrfs-progs: misc-tests/021: Do extra btrfs check before mounting

Merged to devel, thanks. I've switched strerror(-ret) to the errno
pattern.


Re: [PATCH v2 1/5] btrfs-progs: image: Refactor fixup_devices() to fixup_chunks_and_devices()

2018-12-04 Thread David Sterba
On Tue, Nov 27, 2018 at 04:38:24PM +0800, Qu Wenruo wrote:
> +error:
> + error(
> +"failed to fix chunks and devices mapping, the fs may not be mountable: %s",
> + strerror(-ret));

Recently the strerror(error code) calls have been converted to %m, so I
changed this to

errno = -ret;
error("... %m");


Re: [PATCH v2 1/5] btrfs-progs: image: Refactor fixup_devices() to fixup_chunks_and_devices()

2018-12-04 Thread David Sterba
On Tue, Nov 27, 2018 at 04:50:57PM +0800, Qu Wenruo wrote:
> >> -static int fixup_devices(struct btrfs_fs_info *fs_info,
> >> -   struct mdrestore_struct *mdres, off_t dev_size)
> >> +static int fixup_device_size(struct btrfs_trans_handle *trans,
> >> +   struct mdrestore_struct *mdres,
> >> +   off_t dev_size)
> >>  {
> >> -  struct btrfs_trans_handle *trans;
> >> +  struct btrfs_fs_info *fs_info = trans->fs_info;
> >>struct btrfs_dev_item *dev_item;
> >>struct btrfs_path path;
> >> -  struct extent_buffer *leaf;
> >>struct btrfs_root *root = fs_info->chunk_root;
> >>struct btrfs_key key;
> >> +  struct extent_buffer *leaf;
> > 
> > nit: Unnecessary change
> 
> Doesn't it look better when all btrfs_ prefix get batched together? :)

Please don't do unrelated changes like that.


Re: [PATCH] btrfs-progs: trivial fix about line breaker in repair_inode_nbytes_lowmem()

2018-12-03 Thread David Sterba
On Sun, Dec 02, 2018 at 03:08:36PM +, damenly...@gmail.com wrote:
> From: Su Yue 
> 
> Move "\n" at end of the sentence to print.
> 
> Fixes: 281eec7a9ddf ("btrfs-progs: check: repair inode nbytes in lowmem mode")
> Signed-off-by: Su Yue 

Applied, thanks.


Re: [PATCH] btrfs-progs: fsck-tests: Move reloc tree images to 020-extent-ref-cases

2018-12-03 Thread David Sterba
On Mon, Dec 03, 2018 at 12:39:57PM +0800, Qu Wenruo wrote:
> For the reloc tree, despite its short lifespan, it's still the backref,
> where the reloc tree root backref points back to itself, that makes it special.
> 
> So it's more appropriate to put them into 020-extent-ref-cases.
> 
> Signed-off-by: Qu Wenruo 

Applied, thanks.


Re: [PATCH 2/5] btrfs: Refactor btrfs_can_relocate

2018-12-03 Thread David Sterba
On Sat, Nov 17, 2018 at 09:29:27AM +0800, Anand Jain wrote:
> > -   ret = find_free_dev_extent(trans, device, min_free,
> > -  &dev_offset, NULL);
> > -   if (!ret)
> > +   if (!find_free_dev_extent(trans, device, min_free,
> > +  &dev_offset, NULL))
> 
>   This can return -ENOMEM;
> 
> > @@ -2856,8 +2856,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info 
> > *fs_info, u64 chunk_offset)
> >  */
> > lockdep_assert_held(&fs_info->delete_unused_bgs_mutex);
> >   
> > -   ret = btrfs_can_relocate(fs_info, chunk_offset);
> > -   if (ret)
> > +   if (!btrfs_can_relocate(fs_info, chunk_offset))
> > return -ENOSPC;
> 
>   And ends up converting -ENOMEM to -ENOSPC.
> 
>   Its better to propagate the accurate error.

Right, converting to bool is obscuring the reason why the functions
fail. Making the code simpler at this cost does not look like a good
idea to me. I'll remove the patch from misc-next for now.
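
The alternative being asked for is to keep an int return and pass the
errno through untouched, so -ENOMEM stays distinct from -ENOSPC. A
minimal sketch of that shape (illustrative only; the real return-value
conventions of these helpers are more involved):

	ret = btrfs_can_relocate(fs_info, chunk_offset);
	if (ret)
		return ret;	/* propagate -ENOSPC, -ENOMEM, ... unchanged */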


Re: [PATCH 0/4] Replace custom device-replace locking with rwsem

2018-12-03 Thread David Sterba
On Tue, Nov 20, 2018 at 01:50:54PM +0100, David Sterba wrote:
> The first cleanup part went to 4.19, the actual switch from the custom
> locking to rswem was postponed as I found performance degradation. This
> turned out to be related to VM cache settings, so I'm resending the
> series again.
> 
> The custom locking is based on rwlock protected reader/writer counters,
> waitqueues, which essentially is what the readwrite semaphore does.
> 
> Previous patchset:
> https://lore.kernel.org/linux-btrfs/cover.1536331604.git.dste...@suse.com/
> 
> Patches correspond to 8/11-11/11 and there's no change besides
> refreshing on top of current misc-next.
> 
> David Sterba (4):
>   btrfs: reada: reorder dev-replace locks before radix tree preload
>   btrfs: dev-replace: switch locking to rw semaphore
>   btrfs: dev-replace: remove custom read/write blocking scheme
>   btrfs: dev-replace: open code trivial locking helpers

This has been sitting in for-next for some time, no problems reported.
If anybody wants to add a review tag, please let me know. I'm going to
add the patchset to misc-next soon.
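
For context on the equivalence mentioned in the cover letter: the
open-coded reader/writer counters plus waitqueue map directly onto an
rw_semaphore. A schematic sketch of the target locking pattern, not the
actual patches:

#include <linux/rwsem.h>

/* replaces the rwlock, the read/write counters and the waitqueue */
struct btrfs_dev_replace {
	struct rw_semaphore rwsem;
};

static void dev_replace_read_state(struct btrfs_dev_replace *dev_replace)
{
	down_read(&dev_replace->rwsem);
	/* read source/target device pointers, progress, etc. */
	up_read(&dev_replace->rwsem);
}

static void dev_replace_change_state(struct btrfs_dev_replace *dev_replace)
{
	down_write(&dev_replace->rwsem);
	/* start/finish/cancel the replace, update device pointers */
	up_write(&dev_replace->rwsem);
}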


Btrfs progs pre-release 4.19.1-rc1

2018-12-03 Thread David Sterba
Hi,

this is a pre-release of btrfs-progs, 4.19.1-rc1. There are build fixes, minor
update to libbtrfsutil and documentation updates.

The 4.19.1 release is scheduled to this Wednesday, +2 days (2018-12-05).

Changelog:
  * build fixes
* big-endian builds fail due to bswap helpers clash
* 'swap' macro is too generic, renamed to prevent build failures
  * libbtrfs
* minor version update to 1.1.0
* fix default search to top=0 as documented
* rename 'async' to avoid future python binding problems
* add support for unprivileged subvolume listing ioctls
* added tests, API docs
  * other
* lot of typos fixed
* warning cleanups
* doc formatting updates
* CI tests against zstd 1.3.7

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

David Sterba (7):
  btrfs-progs: kerncompat: rename swap to __swap
  btrfs-progs: README: add link to INSTALL
  btrfs-progs: docs: fix rendering of exponents in manual pages
  btrfs-progs: link to libbtrfsutil/README from the main README
  btrfs-progs: tests: pull zstd version 1.3.7 to the travis CI
  btrfs-progs: update CHANGES for v4.19.1
  Btrfs progs v4.19.1-rc1

Josh Soref (11):
  btrfs-progs: docs: fix typos in Documentation
  btrfs-progs: docs: fix typos in READMEs, INSTALL and CHANGES
  btrfs-progs: fix typos in Makefile
  btrfs-progs: tests: fix typos in test comments
  btrfs-progs: tests: fsck/025, fix typo in helper name
  btrfs-progs: fix typos in comments
  btrfs-progs: fix typos in user-visible strings
  btrfs-progs: check: fix typo in device_extent_record::chunk_objectid
  btrfs-progs: utils: fix typo in a variable
  btrfs-progs: mkfs: fix typo in "multipler" variables
  btrfs-progs: fix typo in btrfs-list function export

Omar Sandoval (10):
  libbtrfsutil: use top=0 as default for SubvolumeIterator()
  libbtrfsutil: change async parameters to async_ in Python bindings
  libbtrfsutil: document qgroup_inherit parameter in Python bindings
  libbtrfsutil: use SubvolumeIterator as context manager in tests
  libbtrfsutil: add test helpers for dropping privileges
  libbtrfsutil: allow tests to create multiple Btrfs instances
  libbtrfsutil: relax the privileges of subvolume_info()
  libbtrfsutil: relax the privileges of subvolume iterator
  libbtrfsutil: bump version to 1.1.0
  libbtrfsutil: document API in README

Rosen Penev (3):
  btrfs-progs: kernel-lib: bitops: Fix big endian compilation
  btrfs-progs: task-utils: Fix comparison between pointer and integer
  btrfs-progs: treewide: Fix missing declarations



Re: [PATCH] btrfs: Refactor btrfs_merge_bio_hook

2018-11-29 Thread David Sterba
On Wed, Nov 28, 2018 at 06:51:59PM +0200, Nikolay Borisov wrote:
> > Got me curious if we could get rid of the size parameter, it's 2x
> > PAGE_SIZE and could be something else in one case but it's not obvious
> > if it really happens.
> > 
> > Another thing I noticed is lack of proper error handling in all callers,
> > as it's 0, 1, and negative errno. The error would be interpreted as true
> > ie. add page to bio and continue.
> 
> Actually if anything other than 0 is returned then the current bio is
> actually submitted (I presume you refer to the code in compression.c).
> As a matter of fact I think btrfs_bio_fits_in_stripe could even be
> turned to return a bool value.
> 
> THe only time this function could return an error is if the mapping
> logic goes haywire which could happen 'if (offset < stripe_offset) {' or
> we don't find a chunk for the given offset, which is unlikely.

Unlikely yes, but if it's possible to trigger the mapping failure eg. by
a crafted image, it should be handled. Besides the mapping errors, there
are EIO or ENOMEM in __btrfs_map_block or its callees. I see all other
callers of map block handle the errors.
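
To spell out the caller-side concern: the callers currently treat any
non-zero return as "the bio is full, submit it", so a negative errno
would silently be taken as "submit and carry on". A sketch of how a
caller could keep the error distinct (illustrative, not a proposed patch):

	ret = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0);
	if (ret < 0)
		goto out;	/* mapping failure, -EIO, -ENOMEM, ... */
	submit = ret;		/* 1 means the bio would straddle a stripe */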


Re: [PATCH] Btrfs: bring back key search optimization to btrfs_search_old_slot()

2018-11-29 Thread David Sterba
On Mon, Nov 26, 2018 at 04:53:11PM +, Filipe Manana wrote:
> On Fri, Nov 16, 2018 at 11:09 AM  wrote:
> >
> > From: Filipe Manana 
> >
> > Commit d7396f07358a ("Btrfs: optimize key searches in btrfs_search_slot"),
> > dated from August 2013, introduced an optimization to search for keys in a
> > node/leaf to both btrfs_search_slot() and btrfs_search_old_slot(). For the
> > later, it ended up being reverted in commit d4b4087c43cc ("Btrfs: do a
> > full search everytime in btrfs_search_old_slot"), from September 2013,
> > because the content of extent buffers were often inconsistent during
> > replay. It turned out that the reason why they were often inconsistent was
> > because the extent buffer replay stopped being done atomically, and got
> > broken after commit c8cc63416537 ("Btrfs: stop using GFP_ATOMIC for the
> > tree mod log allocations"), introduced in July 2013. The extent buffer
> > replay issue was then found and fixed by commit 5de865eebb83 ("Btrfs: fix
> > tree mod logging"), dated from December 2013.
> >
> > So bring back the optimization to btrfs_search_old_slot() as skipping it
> > and its comment about disabling it confusing. After all, if unwinding
> > extent buffers resulted in some inconsistency, the normal searches (binary
> > searches) would also not always work.
> >
> > Signed-off-by: Filipe Manana 
> 
> David, please remove this change from the integration branch.
> 
> It turns out after 3 weeks of stress tests it finally triggered an
> assertion failure (hard to hit) and
> it's indeed not reliable to use the search optimization because of how
> the mod log tree currently works.
> The idea was just to not make it different from btrfs_search_slot().
> Use of the mod log tree is limited
> to some cases where an occasional faster search wouldn't bring much benefit.

Understood, thanks. Patch removed from misc-next.


Re: [PATCH 2/2] btrfs: scrub: fix circular locking dependency warning

2018-11-29 Thread David Sterba
On Wed, Nov 28, 2018 at 04:47:27PM +0800, Anand Jain wrote:
> > 2. scrub_workers_refcnt must eventually be converted to refcount_t type
> 
>   ok. Added in v2 patch set.

No such thing is in v2 and this would actually get rid of the need to
hold scrub_lock in scrub_workers_put. Which in turn can be moved out of
the locked section in btrfs_scrub_dev and the warning is gone. Problem
solved.
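
What the refcount_t conversion would buy, roughly: the "last user tears
down the workers" decision becomes atomic, so it no longer has to happen
under scrub_lock. A sketch assuming scrub_workers_refcnt is converted to
refcount_t (not the posted patch):

static void scrub_workers_put(struct btrfs_fs_info *fs_info)
{
	/* no scrub_lock needed around the refcount decision */
	if (refcount_dec_and_test(&fs_info->scrub_workers_refcnt)) {
		btrfs_destroy_workqueue(fs_info->scrub_workers);
		btrfs_destroy_workqueue(fs_info->scrub_wr_completion_workers);
	}
}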


Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput

2018-11-28 Thread David Sterba
On Tue, Nov 27, 2018 at 03:08:08PM -0500, Josef Bacik wrote:
> On Tue, Nov 27, 2018 at 07:59:42PM +, Chris Mason wrote:
> > On 27 Nov 2018, at 14:54, Josef Bacik wrote:
> > 
> > > On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote:
> > >>
> > >>
> > >> On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
> > >>> The cleaner thread usually takes care of delayed iputs, with the
> > >>> exception of the btrfs_end_transaction_throttle path.  The cleaner
> > >>> thread only gets woken up every 30 seconds, so instead wake it up to 
> > >>> do
> > >>> its work so that we can free up that space as quickly as possible.
> > >>
> > >> Have you done any measurements how this affects the overall system. I
> > >> suspect this introduces a lot of noise since now we are going to be
> > >> doing a thread wakeup on every iput, does this give a chance to have
> > >> nice, large batches of iputs that  the cost of wake up can be 
> > >> amortized
> > >> across?
> > >
> > > I ran the whole patchset with our A/B testing stuff and the patchset 
> > > was a 5%
> > > win overall, so I'm inclined to think it's fine.  Thanks,
> > 
> > It's a good point though, a delayed wakeup may be less overhead.
> 
> Sure, but how do we go about that without it sometimes messing up?  In 
> practice
> the only time we're doing this is at the end of finish_ordered_io, so likely 
> to
> not be a constant stream.  I suppose since we have places where we force it to
> run that we don't really need this.  IDK, I'm fine with dropping it.  Thanks,

The transaction thread wakes up cleaner periodically (commit interval,
30s by default), so the time to process iputs is not unbounded.

I have the same concerns as Nikolay, coupling the wakeup to all delayed
iputs could result in smaller batches. But some of the callers of
btrfs_add_delayed_iput might benefit from the extra wakeup, like
btrfs_remove_block_group, so I don't want to leave the idea yet.
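
One way to keep the batching while still helping the callers that care
would be a targeted wakeup only at those call sites, e.g. (a sketch of the
idea being floated, not a patch):

	/* in a space-freeing path such as btrfs_remove_block_group() */
	btrfs_add_delayed_iput(inode);
	/* kick the cleaner now instead of waiting for the periodic wakeup */
	wake_up_process(fs_info->cleaner_kthread);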


Re: [PATCH] btrfs: Remove unnecessary code from __btrfs_rebalance

2018-11-28 Thread David Sterba
On Thu, Nov 22, 2018 at 10:17:33AM +0200, Nikolay Borisov wrote:
> The first step of the rebalance process, ensuring there is 1MB free on
> each device. This number seems rather small. And in fact when talking
> to the original authors their opinions were:
> 
> "man that's a little bonkers"
> "i don't think we even need that code anymore"
> "I think it was there to make sure we had room for the blank 1M at the
> beginning. I bet it goes all the way back to v0"
> "we just don't need any of that tho, i say we just delete it"
>
> Clearly, this piece of code has lost its original intent throughout
> the years. It doesn't really bring any real practical benefits to the
> relocation process.

That I would agree, 

> No functional changes.

though this sounds too bold given what code it removes. Shrink, grow in
a transaction, all of that can fail for various reasons and would have
visible effects. I'd rather reserve the 'no functional changes'
statement for cases where it's really just renaming or restructuring the
code without any side effects.

> - ret = btrfs_shrink_device(device, old_size - size_to_free);
> - if (ret == -ENOSPC)
> - break;

Shrink can be pretty heavy even before it fails with ENOSPC, I think the
runtime of balance will go down. And it could be even noticeable, as it
has to go through all chunks twice before it really fails.

Measuring that would be good, as it's a thing that can be a highlight of
pull request/change overviews.

I'll add the patch to for-next for testing; please update the changelog
with the potential effects. Thanks.


Re: [PATCH v2] btrfs: improve error handling of btrfs_add_link()

2018-11-28 Thread David Sterba
On Tue, Nov 27, 2018 at 09:56:43AM +0100, Johannes Thumshirn wrote:
> On 26/11/2018 19:37, David Sterba wrote:
> > I thought a transaction abort would be here, as the state is not
> > consistent. Also I'm not sure what I as a user would get from such error
> > message after calling link(). If the error handling in the error
> > handling fails, there's not much left to do and the abort either
> > happened earlier in the callees or is necessary here.
> 
> OK, now you've lost me. Isn't that what my previous patch did?

Yes (call abort in fail_dir_item:) but it also merged all the abort
calls -- this is the part that I wanted to be updated.


Re: [PATCH] Fix typos

2018-11-28 Thread David Sterba
On Wed, Nov 28, 2018 at 05:17:09PM +0100, Andrea Gelmini wrote:
> On Wed, Nov 28, 2018 at 04:56:20PM +0100, David Sterba wrote:
> > Thanks, such patches are accepted once in a while when the amount of new
> > typo fixes becomes noticeable.
> 
> About this patch (and the others sent to this m/l), I don't
> care to have them committed with my name.

You've put your time to find and fix the typos so you deserve the
credit. Because you sent them with the signed-off-by line I will use
that and maybe write a short changelog but otherwise I don't see a
reason why you should not be mentioned.

> Feel free to copy/integrate/commit with your/someone else name.

Unless you really want to stay out of the linux git history, I'm somehow
obliged by the development process to stick with the patch author name.

> It's just - probably - the best way to let the real devs know
> about the typos, but it's not a valuable effort that must be
> recognized.

The typos can be annoying and distracting while reading code or prevent
some function names to be greppable (which is usually the reason to fix
them). My first patch to linux was also fixing typos, and see where I am
now :)


Re: [PATCH] btrfs: Refactor btrfs_merge_bio_hook

2018-11-28 Thread David Sterba
On Tue, Nov 27, 2018 at 08:57:58PM +0200, Nikolay Borisov wrote:
> This function really checks whether adding more data to the bio will
> straddle a stripe/chunk. So first let's give it a more appropriate
> name - btrfs_bio_fits_in_stripe. Secondly, the offset parameter was
> never used, so just remove it. Thirdly, pages are submitted to either
> btree or data inodes so it's guaranteed that tree->ops is set so replace
> the check with an ASSERT. Finally, document the parameters of the
> function. No functional changes.
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: David Sterba 

> - submit = btrfs_merge_bio_hook(page, 0, PAGE_SIZE, bio, 0);
> + submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio,
> +   0);

> - submit = btrfs_merge_bio_hook(page, 0, PAGE_SIZE,
> - comp_bio, 0);
> + submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE,
> +   comp_bio, 0);


> - if (tree->ops && btrfs_merge_bio_hook(page, offset, page_size,
> -   bio, bio_flags))
> + ASSERT(tree->ops);
> + if (btrfs_bio_fits_in_stripe(page, page_size, bio, bio_flags))
>   can_merge = false;

Got me curious if we could get rid of the size parameter, it's 2x
PAGE_SIZE and could be something else in one case but it's not obvious
if it really happens.

Another thing I noticed is lack of proper error handling in all callers,
as it's 0, 1, and negative errno. The error would be interpreted as true
ie. add page to bio and continue.


Re: [PATCH 2/3] btrfs: cleanup the useless DEFINE_WAIT in cleanup_transaction

2018-11-28 Thread David Sterba
On Wed, Nov 28, 2018 at 11:22:54AM +0800, Lu Fengqi wrote:
> When it was introduced in commit f094ac32aba3 ("Btrfs: fix NULL pointer
> after aborting a transaction"), it's useless.
> 
> Signed-off-by: Lu Fengqi 

Added to misc-next, thanks.


Re: [PATCH 3/3] btrfs: remove redundant nowait check for buffered_write

2018-11-28 Thread David Sterba
On Wed, Nov 28, 2018 at 09:04:31AM +0200, Nikolay Borisov wrote:
> On 28.11.18 г. 5:23 ч., Lu Fengqi wrote:
> > The generic_write_checks will check the combination of IOCB_NOWAIT and
> > !IOCB_DIRECT.
> 
> True, however btrfs will return ENOSUPP whereas the generic code returns
> EINVAL. I guess this is not a big deal and it's likely generic code is
> correct, so:

I tried to check the git history for clues about that. The duplicate
checks could be an intermediate step from before the aio/nowait feature
was complete that somehow was not removed at the end.

The btrfs part was added in commit
91f9943e1c7b6638f27312d03fe71fcc67b23571 "fs: support RWF_NOWAIT for
buffered reads" (4.13)

The generic checks were added in
6be96d3ad34a124450028dabba43f07fe1d0c86d (4.12), so from that it seems
that it was added to btrfs as redundant already.
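
For reference, the duplicated test in question, roughly (the btrfs copy is
the one being removed; generic_write_checks() performs the equivalent
check and returns -EINVAL instead of -EOPNOTSUPP):

	/* in btrfs_file_write_iter(), redundant since the generic check */
	if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
		return -EOPNOTSUPP;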


Re: [PATCH] Fix typos

2018-11-28 Thread David Sterba
On Wed, Nov 28, 2018 at 12:05:13PM +0100, Andrea Gelmini wrote:
> Signed-off-by: Andrea Gelmini 
> ---
> 
> Stupid fixes. Made on 4.20-rc4, and ported on linux-next (next-20181128).

Thanks, such patches are accepted once in a while when the amount of new
typo fixes becomes noticeable.

>  fs/btrfs/backref.c |  4 ++--
>  fs/btrfs/check-integrity.c |  2 +-
>  fs/btrfs/compression.c |  4 ++--
>  fs/btrfs/ctree.c   |  4 ++--
>  fs/btrfs/dev-replace.c |  2 +-
>  fs/btrfs/disk-io.c |  4 ++--
>  fs/btrfs/extent-tree.c | 28 ++--
>  fs/btrfs/extent_io.c   |  4 ++--
>  fs/btrfs/extent_io.h   |  2 +-
>  fs/btrfs/extent_map.c  |  2 +-
>  fs/btrfs/file.c|  6 +++---
>  fs/btrfs/free-space-tree.c |  2 +-
>  fs/btrfs/inode.c   | 10 +-
>  fs/btrfs/ioctl.c   |  4 ++--
>  fs/btrfs/lzo.c |  2 +-
>  fs/btrfs/qgroup.c  | 14 +++---
>  fs/btrfs/qgroup.h  |  4 ++--
>  fs/btrfs/raid56.c  |  2 +-
>  fs/btrfs/ref-verify.c  |  6 +++---
>  fs/btrfs/relocation.c  |  2 +-
>  fs/btrfs/scrub.c   |  2 +-
>  fs/btrfs/send.c|  4 ++--
>  fs/btrfs/super.c   |  8 
>  fs/btrfs/transaction.c |  4 ++--
>  fs/btrfs/tree-checker.c|  6 +++---
>  fs/btrfs/tree-log.c|  4 ++--
>  fs/btrfs/volumes.c |  8 
>  27 files changed, 72 insertions(+), 72 deletions(-)

Approves.

> @@ -1010,14 +1010,14 @@ int btrfs_lookup_extent_info(struct 
> btrfs_trans_handle *trans,
>   *
>   * - multiple snapshots, subvolumes, or different generations in one subvol
>   * - different files inside a single subvolume
> - * - different offsets inside a file (bookend extents in file.c)
> + * - different offsets inside a file (booked extents in file.c)
>   *
>   * The extent ref structure for the implicit back refs has fields for:
>   *
>   * - Objectid of the subvolume root
>   * - objectid of the file holding the reference
>   * - original offset in the file
> - * - how many bookend extents
> + * - how many booked extents

Please note that 'bookend' is not a typo here. The meaning used for
btrfs is close to 2a or 2b in https://www.merriam-webster.com/dictionary/bookend

and refers to the mechanism by which blocks are shared by extents, and a
lengthier explanation would show why snapshots are so cheap (I really
tried to write a tl;dr version, but nope).


Re: [PATCH v2 0/3] Misc cosmetic changes for map_private_extent_buffer

2018-11-28 Thread David Sterba
On Wed, Nov 28, 2018 at 09:54:53AM +0100, Johannes Thumshirn wrote:
> While trying to understand the checksum code I came across some oddities
> regarding map_private_extent_buffer() and its interaction with
> csum_tree_block().
> 
> These patches address them but are either purely cosmetic or only add a
> comment documenting behaviour.
> 
> Changes to v1:
> * Add Reviewed-by tags
> * Change wording of commit message in patch 3/3
> 
> Johannes Thumshirn (3):
>   btrfs: don't initialize 'offset' in map_private_extent_buffer()
>   btrfs: use offset_in_page for start_offset in
> map_private_extent_buffer()
>   btrfs: document extent mapping assumptions in checksum

1 and 3 applied, thanks.


Re: [PATCH v2 2/3] btrfs: use offset_in_page for start_offset in map_private_extent_buffer()

2018-11-28 Thread David Sterba
On Wed, Nov 28, 2018 at 09:54:55AM +0100, Johannes Thumshirn wrote:
> In map_private_extent_buffer() use offset_in_page() to initialize
> 'start_offset' instead of open-coding it.

Can you please fix all instances where it's opencoded? Grepping for
'PAGE_SIZE - 1' finds a number of them. Thanks.
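
The open-coded form and the helper, for the grep-and-replace suggested
above (illustrative):

	size_t start_offset;

	/* open-coded */
	start_offset = eb->start & ((u64)PAGE_SIZE - 1);

	/* same thing using the helper from linux/mm.h */
	start_offset = offset_in_page(eb->start);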


Re: [PATCH 1/9] btrfs: switch BTRFS_FS_STATE_* to enums

2018-11-28 Thread David Sterba
On Tue, Nov 27, 2018 at 04:24:03PM -0800, Omar Sandoval wrote:
> On Tue, Nov 27, 2018 at 08:53:41PM +0100, David Sterba wrote:
> > We can use simple enum for values that are not part of on-disk format:
> > global filesystem states.
> 
> Reviewed-by: Omar Sandoval 
> 
> Some typos/wording suggestions below.
> 
> > Signed-off-by: David Sterba 
> > ---
> >  fs/btrfs/ctree.h | 25 +++--
> >  1 file changed, 19 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index a98507fa9192..f82ec5e41b0c 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -109,13 +109,26 @@ static inline unsigned long btrfs_chunk_item_size(int 
> > num_stripes)
> >  }
> >  
> >  /*
> > - * File system states
> > + * Runtime (in-memory) states of filesystem
> >   */
> > -#define BTRFS_FS_STATE_ERROR   0
> > -#define BTRFS_FS_STATE_REMOUNTING  1
> > -#define BTRFS_FS_STATE_TRANS_ABORTED   2
> > -#define BTRFS_FS_STATE_DEV_REPLACING   3
> > -#define BTRFS_FS_STATE_DUMMY_FS_INFO   4
> > +enum {
> > +   /* Global indicator of serious filesysystem errors */
> 
> filesysystem -> filesystem
> 
> > +   BTRFS_FS_STATE_ERROR,
> > +   /*
> > +* Filesystem is being remounted, allow to skip some operations, like
> > +* defrag
> > +*/
> > +   BTRFS_FS_STATE_REMOUNTING,
> > +   /* Track if the transaction abort has been reported */
> 
> Which one is "the" transaction abort? This gives me the impression that
> this is a flag on the transaction, but it's actually filesystem state.
> Maybe "Track if a transaction abort has been reported on this
> filesystem"?

Your wording is more clear and I've used it in the patch.

> > +   BTRFS_FS_STATE_TRANS_ABORTED,
> > +   /*
> > +* Indicate that replace source or target device state is changed and
> > +* allow to block bio operations
> > +*/
> 
> Again, this makes it sound like it's device state, but it's actually
> filesystem state. How about "Bio operations should be blocked on this
> filesystem because a source or target device is being destroyed as part
> of a device replace"?

Same. Thanks.


Re: [PATCH v4] Btrfs: fix deadlock with memory reclaim during scrub

2018-11-28 Thread David Sterba
On Mon, Nov 26, 2018 at 08:10:30PM +, Filipe Manana wrote:
> On Mon, Nov 26, 2018 at 6:17 PM David Sterba  wrote:
> >
> > On Fri, Nov 23, 2018 at 06:25:40PM +, fdman...@kernel.org wrote:
> > > From: Filipe Manana 
> > >
> > > When a transaction commit starts, it attempts to pause scrub and it blocks
> > > until the scrub is paused. So while the transaction is blocked waiting for
> > > scrub to pause, we can not do memory allocation with GFP_KERNEL from 
> > > scrub,
> > > otherwise we risk getting into a deadlock with reclaim.
> > >
> > > Checking for scrub pause requests is done early at the beginning of the
> > > while loop of scrub_stripe() and later in the loop, scrub_extent() and
> > > scrub_raid56_parity() are called, which in turn call scrub_pages() and
> > > scrub_pages_for_parity() respectively. These last two functions do memory
> > > allocations using GFP_KERNEL. Same problem could happen while scrubbing
> > > the super blocks, since it calls scrub_pages().
> > >
> > > So make sure GFP_NOFS is used for the memory allocations because at any
> > > time a scrub pause request can happen from another task that started to
> > > commit a transaction.
> > >
> > > Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission 
> > > path")
> > > Signed-off-by: Filipe Manana 
> > > ---
> > >
> > > V2: Make using GFP_NOFS unconditional. Previous version was racy, as 
> > > pausing
> > > requests might happen just after we checked for them.
> > >
> > > V3: Use memalloc_nofs_save() just like V1 did.
> > >
> > > V4: Similar problem happened for raid56, which was previously missed, so
> > > deal with it as well as the case for scrub_supers().
> >
> > Enclosing the whole scrub to 'nofs' seems like the best option and
> > future proof. What I missed in 58c4e173847a was the "don't hold big lock
> > under GFP_KERNEL allocation" pattern.
> >
> > >  fs/btrfs/scrub.c | 12 
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> > > index 3be1456b5116..e08b7502d1f0 100644
> > > --- a/fs/btrfs/scrub.c
> > > +++ b/fs/btrfs/scrub.c
> > > @@ -3779,6 +3779,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, 
> > > u64 devid, u64 start,
> > >   struct scrub_ctx *sctx;
> > >   int ret;
> > >   struct btrfs_device *dev;
> > > + unsigned int nofs_flag;
> > >
> > >   if (btrfs_fs_closing(fs_info))
> > >   return -EINVAL;
> > > @@ -3882,6 +3883,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, 
> > > u64 devid, u64 start,
> > >   atomic_inc(&fs_info->scrubs_running);
> > >   mutex_unlock(&fs_info->scrub_lock);
> > >
> > > + /*
> > > +  * In order to avoid deadlock with reclaim when there is a 
> > > transaction
> > > +  * trying to pause scrub, make sure we use GFP_NOFS for all the
> > > +  * allocations done at btrfs_scrub_pages() and 
> > > scrub_pages_for_parity()
> > > +  * invoked by our callees. The pausing request is done when the
> > > +  * transaction commit starts, and it blocks the transaction until 
> > > scrub
> > > +  * is paused (done at specific points at scrub_stripe() or right 
> > > above
> > > +  * before incrementing fs_info->scrubs_running).
> >
> > This highlights why there's perhaps no point in trying to make the nofs
> > section smaller, handling all the interactions between scrub and
> > transaction would be too complex.
> >
> > Reviewed-by: David Sterba 
> 
> Well, the worker tasks can also not use gfp_kernel, since the scrub
> task waits for them to complete before pausing.
> I missed this, and 2 reviewers as well, so perhaps it wasn't that
> trivial and I shouldn't feel that I miserably failed to identify all
> cases for something rather trivial. V5 sent.

You can say that you left it there intentionally, such cookies are a
good drill for reviewers to stay sharp.

When I started the conversions of GFP_NOFS -> GFP_KERNEL, scrub looked
quite safe in this respect but turns out it's not. I was wondering if we
could add some lock assertions before GFP_KERNEL allocations, like:

assert_lock_not_held(fs_info->device_list_mutex)
kmalloc(GFP_KERNEL)

and add more locks per subsystem (eg. the scrub lock) and possibly some
convenience wrappers. Michal's scope GFP_NOFS patch series has a
debugging warning where NOFS is used in a context where it does not need
to be, while having a 'must not hold an important lock' assertion would be a good
debugging helper too.
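
A hypothetical shape for such an assertion, built on lockdep_is_held();
the helper name is made up, nothing like it exists yet:

#ifdef CONFIG_LOCKDEP
#define assert_lock_not_held(lock)	WARN_ON(lockdep_is_held(lock))
#else
#define assert_lock_not_held(lock)	do { } while (0)
#endif

	/* before a sleeping allocation that reclaim could re-enter through */
	assert_lock_not_held(&fs_info->device_list_mutex);
	ptr = kmalloc(size, GFP_KERNEL);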


Re: [PATCH] btrfs: adjust order of unlocks in do_trimming()

2018-11-28 Thread David Sterba
On Wed, Nov 28, 2018 at 11:21:12AM +0800, Su Yue wrote:
> In function do_trimming(), block_group->lock should be unlocked first.

Please also write why this is correct and if there are any bad
consequences of the current behaviour. Thanks.


Re: [PATCH 0/9] Switch defines to enums

2018-11-28 Thread David Sterba
On Wed, Nov 28, 2018 at 09:33:50AM +0800, Qu Wenruo wrote:
> On 2018/11/28 上午3:53, David Sterba wrote:
> > This is motivated by a merging mistake that happened a few releases ago.
> > Two patches updated BTRFS_FS_* flags independently to the same value,
> > git did not see any direct merge conflict. The two values got mixed at
> > runtime and caused crash.
> > 
> > Update all #define sequential values, the above merging problem would
> > not happen as there would be a conflict and the enum value
> > auto-increment would prevent duplicated values anyway.
> 
> Just one small question for the bitmap usage.
> 
> For an enum we won't directly know what the last number is. My concern is, if
> we're using u16 as a bitmap and there is some enum value over 15, will we get a
> warning at compile time or would some bug just sneak into the kernel?

Do you have a concrete example where this would happen? Most bits are
used in a 'long', which should be at least 32 bits. I'm not aware of 16-bit bit
flags.
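
If a narrower flags word were ever used, a compile-time guard could catch
an overflowing enum. A sketch, assuming a terminating member is added
(hypothetical names, not part of the series):

enum {
	BTRFS_FS_BARRIER,
	/* ... the existing flags ... */
	BTRFS_FS_BALANCE_RUNNING,
	BTRFS_FS_FLAG_COUNT,	/* hypothetical terminator */
};

static inline void btrfs_check_fs_flag_count(void)
{
	/* fails the build if the flags outgrow the word holding them */
	BUILD_BUG_ON(BTRFS_FS_FLAG_COUNT > BITS_PER_LONG);
}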


Re: [PATCH 7/9] btrfs: switch BTRFS_*_LOCK to enums

2018-11-28 Thread David Sterba
On Tue, Nov 27, 2018 at 04:37:16PM -0800, Omar Sandoval wrote:
> On Tue, Nov 27, 2018 at 08:53:55PM +0100, David Sterba wrote:
> > We can use simple enum for values that are not part of on-disk format:
> > tree lock types.
> > 
> > Signed-off-by: David Sterba 
> > ---
> >  fs/btrfs/locking.h | 10 ++
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
> > index 29135def468e..684d0ef4faa4 100644
> > --- a/fs/btrfs/locking.h
> > +++ b/fs/btrfs/locking.h
> > @@ -6,10 +6,12 @@
> >  #ifndef BTRFS_LOCKING_H
> >  #define BTRFS_LOCKING_H
> >  
> > -#define BTRFS_WRITE_LOCK 1
> > -#define BTRFS_READ_LOCK 2
> > -#define BTRFS_WRITE_LOCK_BLOCKING 3
> > -#define BTRFS_READ_LOCK_BLOCKING 4
> > +enum {
> > +   BTRFS_WRITE_LOCK,
> 
> See btrfs_set_path_blocking() and btrfs_release_path(); 0 means no lock,
> so this needs to be BTRFS_WRITE_LOCK = 1. I imagine that lockdep would
> catch this.

Oh right of course, thanks for catching it. I'll drop the patch for now,
0 could be added to the set as BTRFS_NO_LOCK and replaced in the code
which is beyond what this series does.
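
The follow-up being described would look roughly like this (sketch):

enum {
	/* path->locks[level] == 0 currently means 'not locked' */
	BTRFS_NO_LOCK = 0,
	BTRFS_WRITE_LOCK,
	BTRFS_READ_LOCK,
	BTRFS_WRITE_LOCK_BLOCKING,
	BTRFS_READ_LOCK_BLOCKING,
};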


[PATCH 9/9] btrfs: drop extra enum initialization where using defaults

2018-11-27 Thread David Sterba
The first auto-assigned value to enum is 0, we can use that and not
initialize all members where the auto-increment does the same. This is
used for values that are not part of on-disk format.

Signed-off-by: David Sterba 
---
 fs/btrfs/btrfs_inode.h |  2 +-
 fs/btrfs/ctree.h   | 28 ++--
 fs/btrfs/disk-io.h | 10 +-
 fs/btrfs/qgroup.h  |  2 +-
 fs/btrfs/sysfs.h   |  2 +-
 fs/btrfs/transaction.h | 14 +++---
 6 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4de321aee7a5..fc25607304f2 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -20,7 +20,7 @@
  * new data the application may have written before commit.
  */
 enum {
-   BTRFS_INODE_ORDERED_DATA_CLOSE = 0,
+   BTRFS_INODE_ORDERED_DATA_CLOSE,
BTRFS_INODE_DUMMY,
BTRFS_INODE_IN_DEFRAG,
BTRFS_INODE_HAS_ASYNC_EXTENT,
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4bb0ac3050ff..f1d1c6ba3aa1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -334,7 +334,7 @@ struct btrfs_node {
  * The slots array records the index of the item or block pointer
  * used while walking the tree.
  */
-enum { READA_NONE = 0, READA_BACK, READA_FORWARD };
+enum { READA_NONE, READA_BACK, READA_FORWARD };
 struct btrfs_path {
struct extent_buffer *nodes[BTRFS_MAX_LEVEL];
int slots[BTRFS_MAX_LEVEL];
@@ -532,18 +532,18 @@ struct btrfs_free_cluster {
 };
 
 enum btrfs_caching_type {
-   BTRFS_CACHE_NO  = 0,
-   BTRFS_CACHE_STARTED = 1,
-   BTRFS_CACHE_FAST= 2,
-   BTRFS_CACHE_FINISHED= 3,
-   BTRFS_CACHE_ERROR   = 4,
+   BTRFS_CACHE_NO,
+   BTRFS_CACHE_STARTED,
+   BTRFS_CACHE_FAST,
+   BTRFS_CACHE_FINISHED,
+   BTRFS_CACHE_ERROR,
 };
 
 enum btrfs_disk_cache_state {
-   BTRFS_DC_WRITTEN= 0,
-   BTRFS_DC_ERROR  = 1,
-   BTRFS_DC_CLEAR  = 2,
-   BTRFS_DC_SETUP  = 3,
+   BTRFS_DC_WRITTEN,
+   BTRFS_DC_ERROR,
+   BTRFS_DC_CLEAR,
+   BTRFS_DC_SETUP,
 };
 
 struct btrfs_caching_control {
@@ -2621,10 +2621,10 @@ static inline gfp_t btrfs_alloc_write_mask(struct 
address_space *mapping)
 /* extent-tree.c */
 
 enum btrfs_inline_ref_type {
-   BTRFS_REF_TYPE_INVALID = 0,
-   BTRFS_REF_TYPE_BLOCK =   1,
-   BTRFS_REF_TYPE_DATA =2,
-   BTRFS_REF_TYPE_ANY = 3,
+   BTRFS_REF_TYPE_INVALID,
+   BTRFS_REF_TYPE_BLOCK,
+   BTRFS_REF_TYPE_DATA,
+   BTRFS_REF_TYPE_ANY,
 };
 
 int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb,
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 4cccba22640f..987a64bc0c66 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -21,11 +21,11 @@
 #define BTRFS_BDEV_BLOCKSIZE   (4096)
 
 enum btrfs_wq_endio_type {
-   BTRFS_WQ_ENDIO_DATA = 0,
-   BTRFS_WQ_ENDIO_METADATA = 1,
-   BTRFS_WQ_ENDIO_FREE_SPACE = 2,
-   BTRFS_WQ_ENDIO_RAID56 = 3,
-   BTRFS_WQ_ENDIO_DIO_REPAIR = 4,
+   BTRFS_WQ_ENDIO_DATA,
+   BTRFS_WQ_ENDIO_METADATA,
+   BTRFS_WQ_ENDIO_FREE_SPACE,
+   BTRFS_WQ_ENDIO_RAID56,
+   BTRFS_WQ_ENDIO_DIO_REPAIR,
 };
 
 static inline u64 btrfs_sb_offset(int mirror)
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index d8f78f5ab854..e4e6ee44073a 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -70,7 +70,7 @@ struct btrfs_qgroup_extent_record {
  * be converted into META_PERTRANS.
  */
 enum btrfs_qgroup_rsv_type {
-   BTRFS_QGROUP_RSV_DATA = 0,
+   BTRFS_QGROUP_RSV_DATA,
BTRFS_QGROUP_RSV_META_PERTRANS,
BTRFS_QGROUP_RSV_META_PREALLOC,
BTRFS_QGROUP_RSV_LAST,
diff --git a/fs/btrfs/sysfs.h b/fs/btrfs/sysfs.h
index c6ee600aff89..40716b357c1d 100644
--- a/fs/btrfs/sysfs.h
+++ b/fs/btrfs/sysfs.h
@@ -9,7 +9,7 @@
 extern u64 btrfs_debugfs_test;
 
 enum btrfs_feature_set {
-   FEAT_COMPAT = 0,
+   FEAT_COMPAT,
FEAT_COMPAT_RO,
FEAT_INCOMPAT,
FEAT_MAX
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 703d5116a2fc..f1ba78949d1b 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -12,13 +12,13 @@
 #include "ctree.h"
 
 enum btrfs_trans_state {
-   TRANS_STATE_RUNNING = 0,
-   TRANS_STATE_BLOCKED = 1,
-   TRANS_STATE_COMMIT_START= 2,
-   TRANS_STATE_COMMIT_DOING= 3,
-   TRANS_STATE_UNBLOCKED   = 4,
-   TRANS_STATE_COMPLETED   = 5,
-   TRANS_STATE_MAX = 6,
+   TRANS_STATE_RUNNING,
+   TRANS_STATE_BLOCKED,
+   TRANS_STATE_COMMIT_START,
+   TRANS_STATE_COMMIT_DOING,
+   TRANS_STATE_UNBLOCKED,
+   TRANS_STATE_COMPLETED,
+   TRANS_STATE_MAX,
 };
 
 #define BTRFS_TRANS_HAVE_FREE_BGS  0
-- 
2.19.1



[PATCH 4/9] btrfs: switch BTRFS_ROOT_* to enums

2018-11-27 Thread David Sterba
We can use simple enum for values that are not part of on-disk format:
root tree flags.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7176b95b40e7..4bb0ac3050ff 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1180,22 +1180,23 @@ struct btrfs_subvolume_writers {
 /*
  * The state of btrfs root
  */
-/*
- * btrfs_record_root_in_trans is a multi-step process,
- * and it can race with the balancing code.   But the
- * race is very small, and only the first time the root
- * is added to each transaction.  So IN_TRANS_SETUP
- * is used to tell us when more checks are required
- */
-#define BTRFS_ROOT_IN_TRANS_SETUP  0
-#define BTRFS_ROOT_REF_COWS1
-#define BTRFS_ROOT_TRACK_DIRTY 2
-#define BTRFS_ROOT_IN_RADIX3
-#define BTRFS_ROOT_ORPHAN_ITEM_INSERTED4
-#define BTRFS_ROOT_DEFRAG_RUNNING  5
-#define BTRFS_ROOT_FORCE_COW   6
-#define BTRFS_ROOT_MULTI_LOG_TASKS 7
-#define BTRFS_ROOT_DIRTY   8
+enum {
+   /*
+* btrfs_record_root_in_trans is a multi-step process, and it can race
+* with the balancing code.   But the race is very small, and only the
+* first time the root is added to each transaction.  So IN_TRANS_SETUP
+* is used to tell us when more checks are required
+*/
+   BTRFS_ROOT_IN_TRANS_SETUP,
+   BTRFS_ROOT_REF_COWS,
+   BTRFS_ROOT_TRACK_DIRTY,
+   BTRFS_ROOT_IN_RADIX,
+   BTRFS_ROOT_ORPHAN_ITEM_INSERTED,
+   BTRFS_ROOT_DEFRAG_RUNNING,
+   BTRFS_ROOT_FORCE_COW,
+   BTRFS_ROOT_MULTI_LOG_TASKS,
+   BTRFS_ROOT_DIRTY,
+};
 
 /*
  * in ram representation of the tree.  extent_root is used for all allocations
-- 
2.19.1



[PATCH 3/9] btrfs: switch BTRFS_FS_* to enums

2018-11-27 Thread David Sterba
We can use simple enum for values that are not part of on-disk format:
internal filesystem states.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h | 63 
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 40c405d74a01..7176b95b40e7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -757,38 +757,37 @@ struct btrfs_swapfile_pin {
 
 bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
 
-#define BTRFS_FS_BARRIER   1
-#define BTRFS_FS_CLOSING_START 2
-#define BTRFS_FS_CLOSING_DONE  3
-#define BTRFS_FS_LOG_RECOVERING4
-#define BTRFS_FS_OPEN  5
-#define BTRFS_FS_QUOTA_ENABLED 6
-#define BTRFS_FS_UPDATE_UUID_TREE_GEN  9
-#define BTRFS_FS_CREATING_FREE_SPACE_TREE  10
-#define BTRFS_FS_BTREE_ERR 11
-#define BTRFS_FS_LOG1_ERR  12
-#define BTRFS_FS_LOG2_ERR  13
-#define BTRFS_FS_QUOTA_OVERRIDE14
-/* Used to record internally whether fs has been frozen */
-#define BTRFS_FS_FROZEN15
-
-/*
- * Indicate that a whole-filesystem exclusive operation is running
- * (device replace, resize, device add/delete, balance)
- */
-#define BTRFS_FS_EXCL_OP   16
-
-/*
- * To info transaction_kthread we need an immediate commit so it doesn't
- * need to wait for commit_interval
- */
-#define BTRFS_FS_NEED_ASYNC_COMMIT 17
-
-/*
- * Indicate that balance has been set up from the ioctl and is in the main
- * phase. The fs_info::balance_ctl is initialized.
- */
-#define BTRFS_FS_BALANCE_RUNNING   18
+enum {
+   BTRFS_FS_BARRIER,
+   BTRFS_FS_CLOSING_START,
+   BTRFS_FS_CLOSING_DONE,
+   BTRFS_FS_LOG_RECOVERING,
+   BTRFS_FS_OPEN,
+   BTRFS_FS_QUOTA_ENABLED,
+   BTRFS_FS_UPDATE_UUID_TREE_GEN,
+   BTRFS_FS_CREATING_FREE_SPACE_TREE,
+   BTRFS_FS_BTREE_ERR,
+   BTRFS_FS_LOG1_ERR,
+   BTRFS_FS_LOG2_ERR,
+   BTRFS_FS_QUOTA_OVERRIDE,
+   /* Used to record internally whether fs has been frozen */
+   BTRFS_FS_FROZEN,
+   /*
+* Indicate that a whole-filesystem exclusive operation is running
+* (device replace, resize, device add/delete, balance)
+*/
+   BTRFS_FS_EXCL_OP,
+   /*
+* To info transaction_kthread we need an immediate commit so it
+* doesn't need to wait for commit_interval
+*/
+   BTRFS_FS_NEED_ASYNC_COMMIT,
+   /*
+* Indicate that balance has been set up from the ioctl and is in the
+* main phase. The fs_info::balance_ctl is initialized.
+*/
+   BTRFS_FS_BALANCE_RUNNING,
+};
 
 struct btrfs_fs_info {
u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
-- 
2.19.1



[PATCH 5/9] btrfs: switch EXTENT_BUFFER_* to enums

2018-11-27 Thread David Sterba
We can use simple enum for values that are not part of on-disk format:
extent buffer flags;

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.h | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a1d3ea5a0d32..fd42492e62e5 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -37,18 +37,22 @@
 #define EXTENT_BIO_COMPRESSED 1
 #define EXTENT_BIO_FLAG_SHIFT 16
 
-/* these are bit numbers for test/set bit */
-#define EXTENT_BUFFER_UPTODATE 0
-#define EXTENT_BUFFER_DIRTY 2
-#define EXTENT_BUFFER_CORRUPT 3
-#define EXTENT_BUFFER_READAHEAD 4  /* this got triggered by readahead */
-#define EXTENT_BUFFER_TREE_REF 5
-#define EXTENT_BUFFER_STALE 6
-#define EXTENT_BUFFER_WRITEBACK 7
-#define EXTENT_BUFFER_READ_ERR 8/* read IO error */
-#define EXTENT_BUFFER_UNMAPPED 9
-#define EXTENT_BUFFER_IN_TREE 10
-#define EXTENT_BUFFER_WRITE_ERR 11/* write IO error */
+enum {
+   EXTENT_BUFFER_UPTODATE,
+   EXTENT_BUFFER_DIRTY,
+   EXTENT_BUFFER_CORRUPT,
+   /* this got triggered by readahead */
+   EXTENT_BUFFER_READAHEAD,
+   EXTENT_BUFFER_TREE_REF,
+   EXTENT_BUFFER_STALE,
+   EXTENT_BUFFER_WRITEBACK,
+   /* read IO error */
+   EXTENT_BUFFER_READ_ERR,
+   EXTENT_BUFFER_UNMAPPED,
+   EXTENT_BUFFER_IN_TREE,
+   /* write IO error */
+   EXTENT_BUFFER_WRITE_ERR,
+};
 
 /* these are flags for __process_pages_contig */
 #define PAGE_UNLOCK(1 << 0)
-- 
2.19.1



[PATCH 8/9] btrfs: switch BTRFS_ORDERED_* to enums

2018-11-27 Thread David Sterba
We can use simple enum for values that are not part of on-disk format:
ordered extent flags.

Signed-off-by: David Sterba 
---
 fs/btrfs/ordered-data.h | 45 +++--
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index b10e6765d88f..fb9a161f0215 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -37,26 +37,31 @@ struct btrfs_ordered_sum {
  * rbtree, just before waking any waiters.  It is used to indicate the
  * IO is done and any metadata is inserted into the tree.
  */
-#define BTRFS_ORDERED_IO_DONE 0 /* set when all the pages are written */
-
-#define BTRFS_ORDERED_COMPLETE 1 /* set when removed from the tree */
-
-#define BTRFS_ORDERED_NOCOW 2 /* set when we want to write in place */
-
-#define BTRFS_ORDERED_COMPRESSED 3 /* writing a zlib compressed extent */
-
-#define BTRFS_ORDERED_PREALLOC 4 /* set when writing to preallocated extent */
-
-#define BTRFS_ORDERED_DIRECT 5 /* set when we're doing DIO with this extent */
-
-#define BTRFS_ORDERED_IOERR 6 /* We had an io error when writing this out */
-
-#define BTRFS_ORDERED_UPDATED_ISIZE 7 /* indicates whether this ordered extent
-  * has done its due diligence in updating
-  * the isize. */
-#define BTRFS_ORDERED_TRUNCATED 8 /* Set when we have to truncate an extent */
-
-#define BTRFS_ORDERED_REGULAR 10 /* Regular IO for COW */
+enum {
+   /* set when all the pages are written */
+   BTRFS_ORDERED_IO_DONE,
+   /* set when removed from the tree */
+   BTRFS_ORDERED_COMPLETE,
+   /* set when we want to write in place */
+   BTRFS_ORDERED_NOCOW,
+   /* writing a zlib compressed extent */
+   BTRFS_ORDERED_COMPRESSED,
+   /* set when writing to preallocated extent */
+   BTRFS_ORDERED_PREALLOC,
+   /* set when we're doing DIO with this extent */
+   BTRFS_ORDERED_DIRECT,
+   /* We had an io error when writing this out */
+   BTRFS_ORDERED_IOERR,
+   /*
+* indicates whether this ordered extent has done its due diligence in
+* updating the isize
+*/
+   BTRFS_ORDERED_UPDATED_ISIZE,
+   /* Set when we have to truncate an extent */
+   BTRFS_ORDERED_TRUNCATED,
+   /* Regular IO for COW */
+   BTRFS_ORDERED_REGULAR,
+};
 
 struct btrfs_ordered_extent {
/* logical offset in the file */
-- 
2.19.1



[PATCH 7/9] btrfs: switch BTRFS_*_LOCK to enums

2018-11-27 Thread David Sterba
We can use simple enum for values that are not part of on-disk format:
tree lock types.

Signed-off-by: David Sterba 
---
 fs/btrfs/locking.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
index 29135def468e..684d0ef4faa4 100644
--- a/fs/btrfs/locking.h
+++ b/fs/btrfs/locking.h
@@ -6,10 +6,12 @@
 #ifndef BTRFS_LOCKING_H
 #define BTRFS_LOCKING_H
 
-#define BTRFS_WRITE_LOCK 1
-#define BTRFS_READ_LOCK 2
-#define BTRFS_WRITE_LOCK_BLOCKING 3
-#define BTRFS_READ_LOCK_BLOCKING 4
+enum {
+   BTRFS_WRITE_LOCK,
+   BTRFS_READ_LOCK,
+   BTRFS_WRITE_LOCK_BLOCKING,
+   BTRFS_READ_LOCK_BLOCKING,
+};
 
 void btrfs_tree_lock(struct extent_buffer *eb);
 void btrfs_tree_unlock(struct extent_buffer *eb);
-- 
2.19.1



[PATCH 6/9] btrfs: switch EXTENT_FLAG_* to enums

2018-11-27 Thread David Sterba
We can use simple enum for values that are not part of on-disk format:
extent map flags.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_map.h | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 31977ffd6190..ef05a0121652 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -11,13 +11,20 @@
 #define EXTENT_MAP_INLINE ((u64)-2)
 #define EXTENT_MAP_DELALLOC ((u64)-1)
 
-/* bits for the flags field */
-#define EXTENT_FLAG_PINNED 0 /* this entry not yet on disk, don't free it */
-#define EXTENT_FLAG_COMPRESSED 1
-#define EXTENT_FLAG_PREALLOC 3 /* pre-allocated extent */
-#define EXTENT_FLAG_LOGGING 4 /* Logging this extent */
-#define EXTENT_FLAG_FILLING 5 /* Filling in a preallocated extent */
-#define EXTENT_FLAG_FS_MAPPING 6 /* filesystem extent mapping type */
+/* bits for the extent_map::flags field */
+enum {
+   /* this entry not yet on disk, don't free it */
+   EXTENT_FLAG_PINNED,
+   EXTENT_FLAG_COMPRESSED,
+   /* pre-allocated extent */
+   EXTENT_FLAG_PREALLOC,
+   /* Logging this extent */
+   EXTENT_FLAG_LOGGING,
+   /* Filling in a preallocated extent */
+   EXTENT_FLAG_FILLING,
+   /* filesystem extent mapping type */
+   EXTENT_FLAG_FS_MAPPING,
+};
 
 struct extent_map {
struct rb_node rb_node;
-- 
2.19.1



[PATCH 1/9] btrfs: switch BTRFS_FS_STATE_* to enums

2018-11-27 Thread David Sterba
We can use simple enum for values that are not part of on-disk format:
global filesystem states.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a98507fa9192..f82ec5e41b0c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -109,13 +109,26 @@ static inline unsigned long btrfs_chunk_item_size(int 
num_stripes)
 }
 
 /*
- * File system states
+ * Runtime (in-memory) states of filesystem
  */
-#define BTRFS_FS_STATE_ERROR   0
-#define BTRFS_FS_STATE_REMOUNTING  1
-#define BTRFS_FS_STATE_TRANS_ABORTED   2
-#define BTRFS_FS_STATE_DEV_REPLACING   3
-#define BTRFS_FS_STATE_DUMMY_FS_INFO   4
+enum {
+   /* Global indicator of serious filesysystem errors */
+   BTRFS_FS_STATE_ERROR,
+   /*
+* Filesystem is being remounted, allow to skip some operations, like
+* defrag
+*/
+   BTRFS_FS_STATE_REMOUNTING,
+   /* Track if the transaction abort has been reported */
+   BTRFS_FS_STATE_TRANS_ABORTED,
+   /*
+* Indicate that replace source or target device state is changed and
+* allow to block bio operations
+*/
+   BTRFS_FS_STATE_DEV_REPLACING,
+   /* The btrfs_fs_info created for self-tests */
+   BTRFS_FS_STATE_DUMMY_FS_INFO,
+};
 
 #define BTRFS_BACKREF_REV_MAX  256
 #define BTRFS_BACKREF_REV_SHIFT56
-- 
2.19.1



[PATCH 2/9] btrfs: switch BTRFS_BLOCK_RSV_* to enums

2018-11-27 Thread David Sterba
We can use simple enum for values that are not part of on-disk format:
block reserve types.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f82ec5e41b0c..40c405d74a01 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -461,13 +461,18 @@ struct btrfs_space_info {
struct kobject *block_group_kobjs[BTRFS_NR_RAID_TYPES];
 };
 
-#defineBTRFS_BLOCK_RSV_GLOBAL  1
-#defineBTRFS_BLOCK_RSV_DELALLOC2
-#defineBTRFS_BLOCK_RSV_TRANS   3
-#defineBTRFS_BLOCK_RSV_CHUNK   4
-#defineBTRFS_BLOCK_RSV_DELOPS  5
-#defineBTRFS_BLOCK_RSV_EMPTY   6
-#defineBTRFS_BLOCK_RSV_TEMP7
+/*
+ * Types of block reserves
+ */
+enum {
+   BTRFS_BLOCK_RSV_GLOBAL,
+   BTRFS_BLOCK_RSV_DELALLOC,
+   BTRFS_BLOCK_RSV_TRANS,
+   BTRFS_BLOCK_RSV_CHUNK,
+   BTRFS_BLOCK_RSV_DELOPS,
+   BTRFS_BLOCK_RSV_EMPTY,
+   BTRFS_BLOCK_RSV_TEMP,
+};
 
 struct btrfs_block_rsv {
u64 size;
-- 
2.19.1



[PATCH 0/9] Switch defines to enums

2018-11-27 Thread David Sterba
This is motivated by a merging mistake that happened a few releases ago.
Two patches updated BTRFS_FS_* flags independently to the same value,
git did not see any direct merge conflict. The two values got mixed at
runtime and caused crash.

Update all #define sequential values, the above merging problem would
not happen as there would be a conflict and the enum value
auto-increment would prevent duplicated values anyway.
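
To make the failure mode concrete: two branches that each append a #define
with the next free number merge cleanly and end up sharing a bit, while an
enum either conflicts textually or auto-increments to distinct values
(the flag names below are made up):

/* branch A adds: */
#define BTRFS_FS_FEATURE_FOO	19
/* branch B independently adds: */
#define BTRFS_FS_FEATURE_BAR	19
/* git merges both without conflict -> two flags silently share bit 19 */

/* with an enum, appended members can never end up with the same value */
enum {
	/* ... existing members ... */
	BTRFS_FS_FEATURE_FOO,
	BTRFS_FS_FEATURE_BAR,
};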

David Sterba (9):
  btrfs: switch BTRFS_FS_STATE_* to enums
  btrfs: switch BTRFS_BLOCK_RSV_* to enums
  btrfs: switch BTRFS_FS_* to enums
  btrfs: switch BTRFS_ROOT_* to enums
  btrfs: switch EXTENT_BUFFER_* to enums
  btrfs: switch EXTENT_FLAG_* to enums
  btrfs: switch BTRFS_*_LOCK to enums
  btrfs: switch BTRFS_ORDERED_* to enums
  btrfs: drop extra enum initialization where using defaults

 fs/btrfs/btrfs_inode.h  |   2 +-
 fs/btrfs/ctree.h| 168 ++--
 fs/btrfs/disk-io.h  |  10 +--
 fs/btrfs/extent_io.h|  28 ---
 fs/btrfs/extent_map.h   |  21 +++--
 fs/btrfs/locking.h  |  10 ++-
 fs/btrfs/ordered-data.h |  45 ++-
 fs/btrfs/qgroup.h   |   2 +-
 fs/btrfs/sysfs.h|   2 +-
 fs/btrfs/transaction.h  |  14 ++--
 10 files changed, 169 insertions(+), 133 deletions(-)

-- 
2.19.1



Re: [PATCH 5/6] btrfs: introduce delayed_refs_rsv

2018-11-27 Thread David Sterba
On Mon, Nov 26, 2018 at 11:14:12AM +0200, Nikolay Borisov wrote:
> > -   if (global_rsv->space_info->full) {
> > -   num_dirty_bgs_bytes <<= 1;
> > -   num_bytes <<= 1;
> > -   }
> > +   struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
> > +   struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
> > +   u64 reserved;
> > +   bool ret = false;
> 
> nit: Reverse christmas tree:

Let it be known that the reverse-xmas-tree declaration style will _not_
be enforced in new patches and cleanup patches only reordering
declarations will not be accepted.

You're free to use it but I may edit your patch so the declaration
conforms to the style used in the file or to a pattern that's common in the
subsystem, eg. global pointers like fs_info or root first in the list.


Re: [PATCH v2] btrfs: improve error handling of btrfs_add_link()

2018-11-26 Thread David Sterba
On Fri, Nov 23, 2018 at 09:42:43AM +0100, Johannes Thumshirn wrote:
> err holds the return value of either btrfs_del_root_ref() or
> btrfs_del_inode_ref() but it hasn't been checked since its
> introduction with commit fe66a05a0679 (Btrfs: improve error handling
> for btrfs_insert_dir_item callers) in 2012.
> 
> Instead of silently ignoring the return values, print a message so the user
> knows what kind of error was encountered.
> 
> Link: https://lore.kernel.org/linux-btrfs/20181119141323.gc24...@twin.jikos.cz
> Signed-off-by: Johannes Thumshirn 
> ---
> 
> Changes to v1:
> * Only print an error message and let the callers abort the transaction (Dave)
> ---
>  fs/btrfs/inode.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 9becf8543489..8ca2f82b35a3 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6351,6 +6351,7 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,
>   struct btrfs_root *root = parent_inode->root;
>   u64 ino = btrfs_ino(inode);
>   u64 parent_ino = btrfs_ino(parent_inode);
> + int err;
>  
>   if (unlikely(ino == BTRFS_FIRST_FREE_OBJECTID)) {
>   memcpy(, >root->root_key, sizeof(key));
> @@ -6395,17 +6396,25 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,
>  fail_dir_item:
>   if (unlikely(ino == BTRFS_FIRST_FREE_OBJECTID)) {
>   u64 local_index;
> - int err;
> +
>   err = btrfs_del_root_ref(trans, key.objectid,
> +  root->root_key.objectid, parent_ino,
> +  &local_index, name, name_len);
> + if (err)
> + btrfs_info(trans->fs_info,
> +"failed to delete reference to %.*s, root %llu ref %llu",
> +name_len, name, key.objectid,
> +root->root_key.objectid);

I thought a transaction abort would be here, as the state is not
consistent. Also I'm not sure what I as a user would get from such error
message after calling link(). If the error handling in the error
handling fails, there's not much left to do and the abort either
happened earlier in the callees or is necessary here.
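
The shape being suggested, i.e. treat a failure of the cleanup itself as
fatal for the transaction, would be roughly (sketch, not the final patch):

fail_dir_item:
	if (unlikely(ino == BTRFS_FIRST_FREE_OBJECTID)) {
		u64 local_index;

		err = btrfs_del_root_ref(trans, key.objectid,
					 root->root_key.objectid, parent_ino,
					 &local_index, name, name_len);
		if (err)
			btrfs_abort_transaction(trans, err);
	}
	/* the btrfs_del_inode_ref() branch would be handled the same way */
	return ret;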


Re: [PATCH v4] Btrfs: fix deadlock with memory reclaim during scrub

2018-11-26 Thread David Sterba
On Fri, Nov 23, 2018 at 06:25:40PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> When a transaction commit starts, it attempts to pause scrub and it blocks
> until the scrub is paused. So while the transaction is blocked waiting for
> scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
> otherwise we risk getting into a deadlock with reclaim.
> 
> Checking for scrub pause requests is done early at the beginning of the
> while loop of scrub_stripe() and later in the loop, scrub_extent() and
> scrub_raid56_parity() are called, which in turn call scrub_pages() and
> scrub_pages_for_parity() respectively. These last two functions do memory
> allocations using GFP_KERNEL. Same problem could happen while scrubbing
> the super blocks, since it calls scrub_pages().
> 
> So make sure GFP_NOFS is used for the memory allocations because at any
> time a scrub pause request can happen from another task that started to
> commit a transaction.
> 
> Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
> Signed-off-by: Filipe Manana 
> ---
> 
> V2: Make using GFP_NOFS unconditional. Previous version was racy, as pausing
> requests might happen just after we checked for them.
> 
> V3: Use memalloc_nofs_save() just like V1 did.
> 
> V4: Similar problem happened for raid56, which was previously missed, so
> deal with it as well as the case for scrub_supers().

Enclosing the whole scrub to 'nofs' seems like the best option and
future proof. What I missed in 58c4e173847a was the "don't hold big lock
under GFP_KERNEL allocation" pattern.

>  fs/btrfs/scrub.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 3be1456b5116..e08b7502d1f0 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -3779,6 +3779,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 
> devid, u64 start,
>   struct scrub_ctx *sctx;
>   int ret;
>   struct btrfs_device *dev;
> + unsigned int nofs_flag;
>  
>   if (btrfs_fs_closing(fs_info))
>   return -EINVAL;
> @@ -3882,6 +3883,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 
> devid, u64 start,
>   atomic_inc(&fs_info->scrubs_running);
>   mutex_unlock(&fs_info->scrub_lock);
>  
> + /*
> +  * In order to avoid deadlock with reclaim when there is a transaction
> +  * trying to pause scrub, make sure we use GFP_NOFS for all the
> +  * allocations done at btrfs_scrub_pages() and scrub_pages_for_parity()
> +  * invoked by our callees. The pausing request is done when the
> +  * transaction commit starts, and it blocks the transaction until scrub
> +  * is paused (done at specific points at scrub_stripe() or right above
> +  * before incrementing fs_info->scrubs_running).

This highlights why there's perhaps no point in trying to make the nofs
section smaller, handling all the interactions between scrub and
transaction would be too complex.

Reviewed-by: David Sterba 
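
The resulting pattern in btrfs_scrub_dev() is the usual scoped-NOFS one;
schematically (the placeholder stands in for the scrub calls covered by
the patch):

	nofs_flag = memalloc_nofs_save();
	/*
	 * Every allocation in this scope, including GFP_KERNEL ones done by
	 * the scrub helpers, is implicitly degraded to GFP_NOFS.
	 */
	ret = do_scrub_work(sctx);	/* placeholder for the scrub calls */
	memalloc_nofs_restore(nofs_flag);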


Re: [PATCH 00/10] btrfs-progs: my libbtrfsutil patch queue

2018-11-26 Thread David Sterba
On Tue, Nov 13, 2018 at 11:46:55PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> Hi,
> 
> This series contains my backlog of libbtrfsutil changes which I've been
> collecting over the past few weeks.
> 
> Patches 1-4 are fixes. Patches 5-6 add functionality to the unit tests
> which is needed for patches 7-8. Patches 7-8 add support for the
> unprivileged ioctls added in Linux 4.18; more on those below. Patch 9
> bumps the library version. Patch 10 adds documentation for the available
> API along with examples.
> 
> Patches 7-8 are based on Misono Tomohiro's previous patch series [1],
> with a few important changes.
> 
> - Both subvolume_info() and create_subvolume_iterator() now have unit
>   tests for the unprivileged case.
> - Both no longer explicitly check that top == 0 in the unprivileged
>   case, since that will already fail with a clear permission error.
> - Unprivileged iteration is much simpler: it uses openat() instead of
>   fchdir() and is based more closely on the original tree search
>   variant. This fixes a bug in post-order iteration in Misono's version.
> - Unprivileged iteration does _not_ support passing in a non-subvolume
>   path; if this behavior is desired, I'd like it to be a separate change
>   with an explicit flag.

Series merged to devel, thanks. I've added a link from the main README now
that there's the API documentation.

The test-libbtrfsutil is missing from the travis CI for some reason, I
was about to add it.  So far the testing environment does not provide
'umount' that knows about '-R' so the tests fail. I'll have a look if
there's a newer base image provided, otherwise a workaround would be
necessary.

As for the unprivileged subvolume listing ioctls, the functionality in
the util library is self-contained and the interface is up to you to
design properly, so this does not depend on the 'btrfs subvolume list'
command. That one has unfortunately not bubbled high enough in my todo.
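
For anyone picking up the new documentation, a minimal usage sketch of the
subvolume iterator API follows; the entry points are assumed from the
public btrfsutil.h header, and top == 0 matches the unprivileged case
discussed above:

    #include <stdio.h>
    #include <stdlib.h>
    #include <inttypes.h>
    #include <btrfsutil.h>

    int main(void)
    {
        struct btrfs_util_subvolume_iterator *iter;
        enum btrfs_util_error err;

        /* top == 0: iterate beneath the given path */
        err = btrfs_util_create_subvolume_iterator("/mnt", 0, 0, &iter);
        if (err) {
            fprintf(stderr, "%s\n", btrfs_util_strerror(err));
            return 1;
        }

        for (;;) {
            char *path;
            uint64_t id;

            err = btrfs_util_subvolume_iterator_next(iter, &path, &id);
            if (err == BTRFS_UTIL_ERROR_STOP_ITERATION)
                break;
            if (err) {
                fprintf(stderr, "%s\n", btrfs_util_strerror(err));
                break;
            }
            printf("%" PRIu64 " %s\n", id, path);
            free(path);
        }

        btrfs_util_destroy_subvolume_iterator(iter);
        return 0;
    }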


Re: [PATCH] btrfs: Remove extent_io_ops::readpage_io_failed_hook

2018-11-26 Thread David Sterba
On Thu, Nov 22, 2018 at 10:17:49AM +0200, Nikolay Borisov wrote:
> For data inodes this hook does nothing but return -EAGAIN which is
> used to signal to the endio routines that this bio belongs to a data
> inode. If this is the case the actual retrying is handled by
> bio_readpage_error. Alternatively, if this bio belongs to the btree
> inode then btree_io_failed_hook just does some cleanup and doesn't
> retry anything.
> 
> This patch simplifies the code flow by eliminating
> readpage_io_failed_hook and instead open-coding btree_io_failed_hook in
> end_bio_extent_readpage. Also eliminate some needless checks since IO is
> always performed on either data inode or btree inode, both of which are
> guaranteed to have their extent_io_tree::ops set.
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: David Sterba 


Re: [PATCH 0/6] Delayed refs rsv

2018-11-23 Thread David Sterba
On Wed, Nov 21, 2018 at 01:59:06PM -0500, Josef Bacik wrote:
> This patchset changes how we do space reservations for delayed refs.  We were
> hitting probably 20-40 enospc aborts per day in production while running
> delayed refs at transaction commit time.  This means we ran out of space in
> the global reserve and couldn't easily get more space in use_block_rsv().
> 
> The global reserve has grown to cover everything we don't reserve space
> explicitly for, and we've grown a lot of weird ad-hoc heuristics to know if
> we're running short on space and when it's time to force a commit.  A failure
> rate of 20-40 file systems when we run hundreds of thousands of them isn't
> super high, but cleaning up this code will make things less ugly and more
> predictable.
> 
> Thus the delayed refs rsv.  We always know how many delayed refs we have
> outstanding, and although running them generates more we can use the global
> reserve for that spill over, which fits better into its desired use than a
> full blown reservation.  This first approach is to simply take how many times
> we're reserving space for and multiply that by 2 in order to save enough
> space for the delayed refs that could be generated.  This is a naive approach
> and will probably evolve, but for now it works.
> 
> With this patchset we've gone down to 2-8 failures per week.  It's not
> perfect, there are some corner cases that still need to be addressed, but is
> significantly better than what we had.  Thanks,

Parts of this cover letter could go to the main patch 5/6 to extend the
background on why the change is made.

As the other patchsets build on top of this, I'll merge this to
misc-next soon. The testing so far looks ok, some patches might need
fixups but nothing big that I can't do at commit time.

The patch 5/6 still seems too big, some parts could be split as
preparatory work, otherwise the main idea where to wire in the special
block reserve seems clear. If anybody wants to do a review there, please
do, I did only a high-level and did not check all the calculations etc.
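
As a rough illustration, the '2x' sizing heuristic from the cover letter
boils down to something like the following (a sketch only, not the actual
hunk; num_refs stands for the count of outstanding delayed ref updates and
btrfs_calc_trans_metadata_size() is the existing per-item helper):

    /* reserve for the refs we know about, doubled to cover the refs
     * that running them can in turn generate */
    u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, num_refs) * 2;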


Re: [PATCH 3/6] btrfs: cleanup extent_op handling

2018-11-23 Thread David Sterba
On Wed, Nov 21, 2018 at 01:59:09PM -0500, Josef Bacik wrote:
> From: Josef Bacik 
> 
> The cleanup_extent_op function actually would run the extent_op if it
> needed running, which made the name sort of a misnomer.  Change it to
> run_and_cleanup_extent_op, and move the actual cleanup work to
> cleanup_extent_op so it can be used by check_ref_cleanup() in order to
> unify the extent op handling.
> 
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/extent-tree.c | 36 +++-
>  1 file changed, 23 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index e3ed3507018d..8a776dc9cb38 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2424,19 +2424,33 @@ static void unselect_delayed_ref_head(struct 
> btrfs_delayed_ref_root *delayed_ref
>   btrfs_delayed_ref_unlock(head);
>  }
>  
> -static int cleanup_extent_op(struct btrfs_trans_handle *trans,
> -  struct btrfs_delayed_ref_head *head)
> +static struct btrfs_delayed_extent_op *
> +cleanup_extent_op(struct btrfs_trans_handle *trans,
> +   struct btrfs_delayed_ref_head *head)

Please keep the type and function name on one line, if the arguments
would overflow, then move it on the next line and un-indent as
necessary. I fix such things when merging, no need to resend.

>  {
>   struct btrfs_delayed_extent_op *extent_op = head->extent_op;
> - int ret;
>  
>   if (!extent_op)
> - return 0;
> - head->extent_op = NULL;
> + return NULL;
> +
>   if (head->must_insert_reserved) {
> + head->extent_op = NULL;
>   btrfs_free_delayed_extent_op(extent_op);
> - return 0;
> + return NULL;
>   }
> + return extent_op;
> +}
> +
> +static int run_and_cleanup_extent_op(struct btrfs_trans_handle *trans,
> +  struct btrfs_delayed_ref_head *head)
> +{
> + struct btrfs_delayed_extent_op *extent_op =
> + cleanup_extent_op(trans, head);

I prefer to do non-trivial initializations in the statement block, it's
easy to overlook them among the declarations. Simple variable init,
pointer dereferences or using simple helpers is fine but cleanup_extent_op
does not seem to fit that. It's in other patches too so I can update
that unless there are other things that would need a resend.
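
In other words, the preferred shape would be roughly the following (a
sketch of the style only, not a replacement hunk):

    struct btrfs_delayed_extent_op *extent_op;

    /* non-trivial call kept out of the declaration block */
    extent_op = cleanup_extent_op(trans, head);
    if (!extent_op)
            return 0;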


Re: [PATCH 2/6] btrfs: add cleanup_ref_head_accounting helper

2018-11-23 Thread David Sterba
On Thu, Nov 22, 2018 at 09:06:30AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018/11/22 上午2:59, Josef Bacik wrote:
> > From: Josef Bacik 
> > 
> > We were missing some quota cleanups in check_ref_cleanup, so break the
> > ref head accounting cleanup into a helper and call that from both
> > check_ref_cleanup and cleanup_ref_head.  This will hopefully ensure that
> > we don't screw up accounting in the future for other things that we add.
> > 
> > Reviewed-by: Omar Sandoval 
> > Reviewed-by: Liu Bo 
> > Signed-off-by: Josef Bacik 
> > ---
> >  fs/btrfs/extent-tree.c | 67 
> > +-
> >  1 file changed, 39 insertions(+), 28 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index c36b3a42f2bb..e3ed3507018d 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -2443,6 +2443,41 @@ static int cleanup_extent_op(struct 
> > btrfs_trans_handle *trans,
> > return ret ? ret : 1;
> >  }
> >  
> > +static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans,
> > +   struct btrfs_delayed_ref_head *head)
> > +{
> > +   struct btrfs_fs_info *fs_info = trans->fs_info;
> > +   struct btrfs_delayed_ref_root *delayed_refs =
> > +   >transaction->delayed_refs;
> > +
> > +   if (head->total_ref_mod < 0) {
> > +   struct btrfs_space_info *space_info;
> > +   u64 flags;
> > +
> > +   if (head->is_data)
> > +   flags = BTRFS_BLOCK_GROUP_DATA;
> > +   else if (head->is_system)
> > +   flags = BTRFS_BLOCK_GROUP_SYSTEM;
> > +   else
> > +   flags = BTRFS_BLOCK_GROUP_METADATA;
> > +   space_info = __find_space_info(fs_info, flags);
> > +   ASSERT(space_info);
> > +   percpu_counter_add_batch(_info->total_bytes_pinned,
> > +  -head->num_bytes,
> > +  BTRFS_TOTAL_BYTES_PINNED_BATCH);
> > +
> > +   if (head->is_data) {
> > +   spin_lock(_refs->lock);
> > +   delayed_refs->pending_csums -= head->num_bytes;
> > +   spin_unlock(_refs->lock);
> > +   }
> > +   }
> > +
> > +   /* Also free its reserved qgroup space */
> > +   btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
> > + head->qgroup_reserved);
> 
> This part will be removed in patch "[PATCH] btrfs: qgroup: Move reserved
> data account from btrfs_delayed_ref_head to btrfs_qgroup_extent_record".

We need to decide the order of merging if the patches touch the same
code. Your patch looks easier to refresh on top of this series, so
please hold on until this gets merged.


Re: [PATCH 1/6] btrfs: add btrfs_delete_ref_head helper

2018-11-23 Thread David Sterba
On Thu, Nov 22, 2018 at 11:42:28AM +0200, Nikolay Borisov wrote:
> 
> 
> On 22.11.18 г. 11:12 ч., Nikolay Borisov wrote:
> > 
> > 
> > On 21.11.18 г. 20:59 ч., Josef Bacik wrote:
> >> From: Josef Bacik 
> >>
> >> We do this dance in cleanup_ref_head and check_ref_cleanup, unify it
> >> into a helper and cleanup the calling functions.
> >>
> >> Signed-off-by: Josef Bacik 
> >> Reviewed-by: Omar Sandoval 
> >> ---
> >>  fs/btrfs/delayed-ref.c | 14 ++
> >>  fs/btrfs/delayed-ref.h |  3 ++-
> >>  fs/btrfs/extent-tree.c | 22 +++---
> >>  3 files changed, 19 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> >> index 9301b3ad9217..b3e4c9fcb664 100644
> >> --- a/fs/btrfs/delayed-ref.c
> >> +++ b/fs/btrfs/delayed-ref.c
> >> @@ -400,6 +400,20 @@ struct btrfs_delayed_ref_head *btrfs_select_ref_head(
> >>return head;
> >>  }
> >>  
> >> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
> >> + struct btrfs_delayed_ref_head *head)
> >> +{
> >> +  lockdep_assert_held(_refs->lock);
> >> +  lockdep_assert_held(>lock);
> >> +
> >> +  rb_erase_cached(>href_node, _refs->href_root);
> >> +  RB_CLEAR_NODE(>href_node);
> >> +  atomic_dec(_refs->num_entries);
> >> +  delayed_refs->num_heads--;
> >> +  if (head->processing == 0)
> >> +  delayed_refs->num_heads_ready--;
> > 
> > The num_heads_ready decrement will never execute in cleanup_ref_head, since
> > processing == 0 only when the ref head is unselected. Perhaps those 2
> > lines shouldn't be in this function? I find it a bit confusing that if
> > processing is 0 we decrement num_heads_ready in check_ref_cleanup, but
> > in unselect_delayed_ref_head we set it to 0 and increment it.
> > 
> >> +}
> >> +
> >>  /*
> >>   * Helper to insert the ref_node to the tail or merge with tail.
> >>   *
> >> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> >> index 8e20c5cb5404..d2af974f68a1 100644
> >> --- a/fs/btrfs/delayed-ref.h
> >> +++ b/fs/btrfs/delayed-ref.h
> >> @@ -261,7 +261,8 @@ static inline void btrfs_delayed_ref_unlock(struct 
> >> btrfs_delayed_ref_head *head)
> >>  {
> >>mutex_unlock(>mutex);
> >>  }
> >> -
> >> +void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
> >> + struct btrfs_delayed_ref_head *head);
> >>  
> >>  struct btrfs_delayed_ref_head *btrfs_select_ref_head(
> >>struct btrfs_delayed_ref_root *delayed_refs);
> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> index d242a1174e50..c36b3a42f2bb 100644
> >> --- a/fs/btrfs/extent-tree.c
> >> +++ b/fs/btrfs/extent-tree.c
> >> @@ -2474,12 +2474,9 @@ static int cleanup_ref_head(struct 
> >> btrfs_trans_handle *trans,
> >>spin_unlock(_refs->lock);
> >>return 1;
> >>}
> >> -  delayed_refs->num_heads--;
> >> -  rb_erase_cached(>href_node, _refs->href_root);
> >> -  RB_CLEAR_NODE(>href_node);
> >> +  btrfs_delete_ref_head(delayed_refs, head);
> >>spin_unlock(>lock);
> >>spin_unlock(_refs->lock);
> >> -  atomic_dec(_refs->num_entries);
> >>  
> >>trace_run_delayed_ref_head(fs_info, head, 0);
> >>  
> >> @@ -6984,22 +6981,9 @@ static noinline int check_ref_cleanup(struct 
> >> btrfs_trans_handle *trans,
> >>if (!mutex_trylock(>mutex))
> >>goto out;
> >>  
> >> -  /*
> >> -   * at this point we have a head with no other entries.  Go
> >> -   * ahead and process it.
> >> -   */
> >> -  rb_erase_cached(>href_node, _refs->href_root);
> >> -  RB_CLEAR_NODE(>href_node);
> >> -  atomic_dec(_refs->num_entries);
> >> -
> >> -  /*
> >> -   * we don't take a ref on the node because we're removing it from the
> >> -   * tree, so we just steal the ref the tree was holding.
> >> -   */
> >> -  delayed_refs->num_heads--;
> >> -  if (head->processing == 0)
> >> -  delayed_refs->num_heads_ready--;
> >> +  btrfs_delete_ref_head(delayed_refs, head);
> >>head->processing = 0;
> 
> On a closer inspection I think here we can do:
> 
> ASSERT(head->processing == 0) because at that point we've taken the
> head->lock spinlock which is held during ordinary delayed refs
> processing (in __btrfs_run_delayed_refs) when the head is selected (and
> processing is 1). So head->processing == 0 here I think is a hard
> invariant of the code. The decrement here should pair with the increment
> when the head was initially added to the tree.
> 
> In cleanup_ref_head we don't need to ever worry about num_heads_ready
> since it has already been decremented by btrfs_select_ref_head.
> 
> As a matter of fact this counter is not used anywhere so we might as well
> just remove it.

The logic does not use it, there's only a WARN_ON in
btrfs_select_ref_head, that's more like a debugging or assertion that
everything is fine. So the question is whether to keep it as a
consistency check (and add comments) or remove it and simplify the code.
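
If the assertion route is taken, the check_ref_cleanup() hunk could look
roughly like this (a sketch of the suggestion above, not a posted patch):

    btrfs_delete_ref_head(delayed_refs, head);
    /* head->lock is held, so the head cannot currently be selected for
     * running and processing must still be zero */
    ASSERT(head->processing == 0);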


Re: [PATCH] btrfs: remove btrfs_bio_end_io_t

2018-11-23 Thread David Sterba
On Fri, Nov 23, 2018 at 09:42:27AM +0100, Johannes Thumshirn wrote:
> The btrfs_bio_end_io_t typedef was introduced with commit
> a1d3c4786a4b ("btrfs: btrfs_multi_bio replaced with btrfs_bio")
> but never used anywhere. This commit also introduced a forward declaration
> of 'struct btrfs_bio' which is only needed for btrfs_bio_end_io_t.
> 
> Remove both as they're not needed anywhere.
> 
> Signed-off-by: Johannes Thumshirn 

Added to misc-next, thanks.


[PATCH 2/4] btrfs: replace async_cow::root with fs_info

2018-11-22 Thread David Sterba
The async_cow::root is used to propagate fs_info to async_cow_submit.
We can't use the inode to reach it because async_cow::inode may have been
set to NULL by async_cow_start for a write without compression.

Signed-off-by: David Sterba 
---
 fs/btrfs/inode.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a88122b89f50..26b8bec7c2dc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -358,7 +358,7 @@ struct async_extent {
 
 struct async_cow {
struct inode *inode;
-   struct btrfs_root *root;
+   struct btrfs_fs_info *fs_info;
struct page *locked_page;
u64 start;
u64 end;
@@ -1144,13 +1144,11 @@ static noinline void async_cow_submit(struct btrfs_work 
*work)
 {
struct btrfs_fs_info *fs_info;
struct async_cow *async_cow;
-   struct btrfs_root *root;
unsigned long nr_pages;
 
async_cow = container_of(work, struct async_cow, work);
 
-   root = async_cow->root;
-   fs_info = root->fs_info;
+   fs_info = async_cow->fs_info;
nr_pages = (async_cow->end - async_cow->start + PAGE_SIZE) >>
PAGE_SHIFT;
 
@@ -1179,7 +1177,6 @@ static int cow_file_range_async(struct inode *inode, 
struct page *locked_page,
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct async_cow *async_cow;
-   struct btrfs_root *root = BTRFS_I(inode)->root;
unsigned long nr_pages;
u64 cur_end;
 
@@ -1189,7 +1186,7 @@ static int cow_file_range_async(struct inode *inode, 
struct page *locked_page,
async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
BUG_ON(!async_cow); /* -ENOMEM */
async_cow->inode = igrab(inode);
-   async_cow->root = root;
+   async_cow->fs_info = fs_info;
async_cow->locked_page = locked_page;
async_cow->start = start;
async_cow->write_flags = write_flags;
-- 
2.19.1



[PATCH 3/4] btrfs: remove redundant csum buffer in btrfs_io_bio

2018-11-22 Thread David Sterba
The io_bio tracks checksums and has an inline buffer or an allocated
one. And there's a third member that points to the right one, but we
don't need to use an extra pointer for that. Let btrfs_io_bio::csum
point to the right buffer and check that the inline buffer is not
accidentally freed.

This shrinks struct btrfs_io_bio by 8 bytes.

Signed-off-by: David Sterba 
---
 fs/btrfs/file-item.c | 12 +++-
 fs/btrfs/volumes.h   |  1 -
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index ba74827beb32..1f2d0a6ab634 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -144,7 +144,10 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle 
*trans,
 
 static void btrfs_io_bio_endio_readpage(struct btrfs_io_bio *bio, int err)
 {
-   kfree(bio->csum_allocated);
+   if (bio->csum != bio->csum_inline) {
+   kfree(bio->csum);
+   bio->csum = NULL;
+   }
 }
 
 static blk_status_t __btrfs_lookup_bio_sums(struct inode *inode, struct bio 
*bio,
@@ -175,13 +178,12 @@ static blk_status_t __btrfs_lookup_bio_sums(struct inode 
*inode, struct bio *bio
nblocks = bio->bi_iter.bi_size >> inode->i_sb->s_blocksize_bits;
if (!dst) {
if (nblocks * csum_size > BTRFS_BIO_INLINE_CSUM_SIZE) {
-   btrfs_bio->csum_allocated = kmalloc_array(nblocks,
-   csum_size, GFP_NOFS);
-   if (!btrfs_bio->csum_allocated) {
+   btrfs_bio->csum = kmalloc_array(nblocks, csum_size,
+   GFP_NOFS);
+   if (!btrfs_bio->csum) {
btrfs_free_path(path);
return BLK_STS_RESOURCE;
}
-   btrfs_bio->csum = btrfs_bio->csum_allocated;
btrfs_bio->end_io = btrfs_io_bio_endio_readpage;
} else {
btrfs_bio->csum = btrfs_bio->csum_inline;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1d936ce282c3..9a764f2d462e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -274,7 +274,6 @@ struct btrfs_io_bio {
u64 logical;
u8 *csum;
u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
-   u8 *csum_allocated;
btrfs_io_bio_end_io_t *end_io;
struct bvec_iter iter;
/*
-- 
2.19.1



[PATCH 4/4] btrfs: replace btrfs_io_bio::end_io with a simple helper

2018-11-22 Thread David Sterba
The end_io callback implemented as btrfs_io_bio_endio_readpage only
calls kfree. Also the callback is set only in case the csum buffer is
allocated and not pointing to the inline buffer. We can use that
information to drop the indirection and call a helper that will free the
csums only in the right case.

This shrinks struct btrfs_io_bio by 8 bytes.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c |  3 +--
 fs/btrfs/file-item.c |  9 -
 fs/btrfs/inode.c |  7 ++-
 fs/btrfs/volumes.h   | 10 --
 4 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4ea808d6cfbc..aef3c9866ff0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2623,8 +2623,7 @@ static void end_bio_extent_readpage(struct bio *bio)
if (extent_len)
endio_readpage_release_extent(tree, extent_start, extent_len,
  uptodate);
-   if (io_bio->end_io)
-   io_bio->end_io(io_bio, blk_status_to_errno(bio->bi_status));
+   btrfs_io_bio_free_csum(io_bio);
bio_put(bio);
 }
 
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 1f2d0a6ab634..920bf3b4b0ef 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -142,14 +142,6 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle 
*trans,
return ret;
 }
 
-static void btrfs_io_bio_endio_readpage(struct btrfs_io_bio *bio, int err)
-{
-   if (bio->csum != bio->csum_inline) {
-   kfree(bio->csum);
-   bio->csum = NULL;
-   }
-}
-
 static blk_status_t __btrfs_lookup_bio_sums(struct inode *inode, struct bio 
*bio,
   u64 logical_offset, u32 *dst, int dio)
 {
@@ -184,7 +176,6 @@ static blk_status_t __btrfs_lookup_bio_sums(struct inode 
*inode, struct bio *bio
btrfs_free_path(path);
return BLK_STS_RESOURCE;
}
-   btrfs_bio->end_io = btrfs_io_bio_endio_readpage;
} else {
btrfs_bio->csum = btrfs_bio->csum_inline;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 26b8bec7c2dc..6bfd37e58924 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8017,9 +8017,7 @@ static void btrfs_endio_direct_read(struct bio *bio)
 
dio_bio->bi_status = err;
dio_end_io(dio_bio);
-
-   if (io_bio->end_io)
-   io_bio->end_io(io_bio, blk_status_to_errno(err));
+   btrfs_io_bio_free_csum(io_bio);
bio_put(bio);
 }
 
@@ -8372,8 +8370,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, 
struct inode *inode,
if (!ret)
return;
 
-   if (io_bio->end_io)
-   io_bio->end_io(io_bio, ret);
+   btrfs_io_bio_free_csum(io_bio);
 
 free_ordered:
/*
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 9a764f2d462e..a13045fcfc45 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -267,14 +267,12 @@ struct btrfs_fs_devices {
  * we allocate are actually btrfs_io_bios.  We'll cram as much of
  * struct btrfs_bio as we can into this over time.
  */
-typedef void (btrfs_io_bio_end_io_t) (struct btrfs_io_bio *bio, int err);
 struct btrfs_io_bio {
unsigned int mirror_num;
unsigned int stripe_index;
u64 logical;
u8 *csum;
u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
-   btrfs_io_bio_end_io_t *end_io;
struct bvec_iter iter;
/*
 * This member must come last, bio_alloc_bioset will allocate enough
@@ -288,6 +286,14 @@ static inline struct btrfs_io_bio *btrfs_io_bio(struct bio 
*bio)
return container_of(bio, struct btrfs_io_bio, bio);
 }
 
+static inline void btrfs_io_bio_free_csum(struct btrfs_io_bio *io_bio)
+{
+   if (io_bio->csum != io_bio->csum_inline) {
+   kfree(io_bio->csum);
+   io_bio->csum = NULL;
+   }
+}
+
 struct btrfs_bio_stripe {
struct btrfs_device *dev;
u64 physical;
-- 
2.19.1



[PATCH 0/4] Minor helper and struct member cleanups

2018-11-22 Thread David Sterba
Assorted updates to code vaguely related to the bio callbacks and
structures. Remove call indirection, remove redundant struct members,
etc.

David Sterba (4):
  btrfs: merge btrfs_submit_bio_done to its caller
  btrfs: replace async_cow::root with fs_info
  btrfs: remove redundant csum buffer in btrfs_io_bio
  btrfs: replace btrfs_io_bio::end_io with a simple helper

 fs/btrfs/disk-io.c   | 18 +-
 fs/btrfs/extent_io.c |  3 +--
 fs/btrfs/file-item.c | 13 +++--
 fs/btrfs/inode.c | 39 +--
 fs/btrfs/volumes.h   | 11 ---
 5 files changed, 34 insertions(+), 50 deletions(-)

-- 
2.19.1



[PATCH 1/4] btrfs: merge btrfs_submit_bio_done to its caller

2018-11-22 Thread David Sterba
There's one caller and its code is simple, we can open code it in
run_one_async_done. The errors are passed through bio.

Signed-off-by: David Sterba 
---
 fs/btrfs/disk-io.c | 18 +-
 fs/btrfs/inode.c   | 23 ---
 2 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index feb67dfd663d..2f5c5442cf00 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -764,11 +764,22 @@ static void run_one_async_start(struct btrfs_work *work)
async->status = ret;
 }
 
+/*
+ * In order to insert checksums into the metadata in large chunks, we wait
+ * until bio submission time.   All the pages in the bio are checksummed and
+ * sums are attached onto the ordered extent record.
+ *
+ * At IO completion time the csums attached on the ordered extent record are
+ * inserted into the tree.
+ */
 static void run_one_async_done(struct btrfs_work *work)
 {
struct async_submit_bio *async;
+   struct inode *inode;
+   blk_status_t ret;
 
async = container_of(work, struct  async_submit_bio, work);
+   inode = async->private_data;
 
/* If an error occurred we just want to clean up the bio and move on */
if (async->status) {
@@ -777,7 +788,12 @@ static void run_one_async_done(struct btrfs_work *work)
return;
}
 
-   btrfs_submit_bio_done(async->private_data, async->bio, 
async->mirror_num);
+   ret = btrfs_map_bio(btrfs_sb(inode->i_sb), async->bio,
+   async->mirror_num, 1);
+   if (ret) {
+   async->bio->bi_status = ret;
+   bio_endio(async->bio);
+   }
 }
 
 static void run_one_async_free(struct btrfs_work *work)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9becf8543489..a88122b89f50 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1924,29 +1924,6 @@ static blk_status_t btrfs_submit_bio_start(void 
*private_data, struct bio *bio,
return 0;
 }
 
-/*
- * in order to insert checksums into the metadata in large chunks,
- * we wait until bio submission time.   All the pages in the bio are
- * checksummed and sums are attached onto the ordered extent record.
- *
- * At IO completion time the cums attached on the ordered extent record
- * are inserted into the btree
- */
-blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio,
- int mirror_num)
-{
-   struct inode *inode = private_data;
-   struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-   blk_status_t ret;
-
-   ret = btrfs_map_bio(fs_info, bio, mirror_num, 1);
-   if (ret) {
-   bio->bi_status = ret;
-   bio_endio(bio);
-   }
-   return ret;
-}
-
 /*
  * extent_io.c submission hook. This does the right thing for csum calculation
  * on write, or reading the csums from the tree before a read.
-- 
2.19.1



Re: [PATCH v2] Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation

2018-11-22 Thread David Sterba
On Thu, Nov 22, 2018 at 09:46:44PM +0800, Qu Wenruo wrote:
> >>> it and after we started (or joined) a transaction, a lot of
> >>> modifications may have happened.
> >>> Nevertheless I don't think it's unexpected for anyone to have the
> >>> accounting happening only after the quota enable ioctl returns success.
> >>
> >> The problem is not accounting, the qgroup numbers won't cause problems.
> >>
> >> It's the reserved space. Like some dirty pages are dirtied before quota
> >> is enabled, but at run_delalloc() time quota is enabled.
> >>
> >> For such a case we have the io_tree based method to avoid underflow so
> >> it should be OK.
> >>
> >> So v2 patch looks OK.
> > 
> > Does that mean reviewed-by? In case there's an evolved discussion under a
> > patch, a clear yes/no is appreciated and an explicit Reviewed-by even
> > more. I'm about to add this patch to the rc4 pull, there's still some time
> > to add the tag. Thanks.
> 
> I'd like to add a reviewed-by tag, but I'm still not 100% sure if this
> will cause extra qgroup reserved space related problems.
> 
> At least from my side, I can't directly see a case where it will cause
> a problem.
> 
> Does such a case mean a reviewed-by tag? Or something LGTM-but-uncertain?

It means that we can keep the patch in the testing branch for a bit longer,
there's still time to put it into a later rc once there's enough
certainty.


Re: [PATCH] btrfs: improve error handling of btrfs_add_link()

2018-11-22 Thread David Sterba
On Thu, Nov 22, 2018 at 02:44:35PM +0100, Johannes Thumshirn wrote:
> On 22/11/2018 14:35, David Sterba wrote:
> > On Thu, Nov 22, 2018 at 02:15:28PM +0100, Johannes Thumshirn wrote:
> >> err holds the return value of either btrfs_del_root_ref() or
> >> btrfs_del_inode_ref() but it hasn't been checked since its
> >> introduction with commit fe66a05a0679 (Btrfs: improve error handling
> >> for btrfs_insert_dir_item callers) in 2012.
> >>
> >> The first attempt at removing the variable was rejected, so instead of
> >> removing 'err', start handling the errors of
> >> btrfs_del_root_ref()/btrfs_del_inode_ref() by aborting the transaction in
> >> btrfs_add_link() directly instead of the call-sites.
> > 
> > ... which is an anti-pattern.
> > 
> > https://btrfs.wiki.kernel.org/index.php/Development_notes#Error_handling_and_transaction_abort
> > 
> > In case there are several conditions that can fail we want to see the
> > first one for later analysis and debugging.
> 
> OK, but then handling the error of either btrfs_del_root_ref() or
> btrfs_del_inode_ref() will essentially only be printing an error message
> as anything else would hide the error condition which has led us into
> this path (i.e. btrfs_insert_dir_item() returning EEXIST or EOVERFLOW),
> wouldn't it?

So this seems to point out there are more things to resolve here:

* how to recover from errors of btrfs_del_root_ref or btrfs_del_inode_ref
* what to report back to the user

When insert_dir_item fails, there are the inode references to be cleaned
up, and error reported as EEXIST or EOVERFLOW.

When btrfs_del_root_ref or btrfs_del_inode_ref fail, the cleanup is not
possible and there's no other option than the transaction abort. As the
original reason was the overflow, this should be reported to back to the
user.

The reason of the del_*_ref failures can be misleading, in case it's eg.
ENOMEM. This would mean that the add-link operation is possible but
there was not enough memory and restarting later would let it possibly
succeed.

So the missing part of error handling is to add two "if (err) abort" and
then still return 'ret'. As it is not at all obvious why, a comment would
be good there.
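
Concretely, the missing part would look roughly like this (argument lists
elided; a hedged sketch rather than a posted hunk):

    /* cleanup after btrfs_insert_dir_item() failed with EEXIST/EOVERFLOW */
    err = btrfs_del_root_ref(/* ... */);    /* or btrfs_del_inode_ref() */
    if (err)
            btrfs_abort_transaction(trans, err);

    /* still report the original failure, not the cleanup result */
    return ret;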


Re: [PATCH] btrfs: improve error handling of btrfs_add_link()

2018-11-22 Thread David Sterba
On Thu, Nov 22, 2018 at 02:15:28PM +0100, Johannes Thumshirn wrote:
> err holds the return value of either btrfs_del_root_ref() or
> btrfs_del_inode_ref() but it hasn't been checked since its
> introduction with commit fe66a05a0679 (Btrfs: improve error handling
> for btrfs_insert_dir_item callers) in 2012.
> 
> The first attempt at removing the variable was rejected, so instead of
> removing 'err', start handling the errors of
> btrfs_del_root_ref()/btrfs_del_inode_ref() by aborting the transaction in
> btrfs_add_link() directly instead of the call-sites.

... which is an anti-pattern.

https://btrfs.wiki.kernel.org/index.php/Development_notes#Error_handling_and_transaction_abort

In case there are several conditions that can fail we want to see the
first one for later analysis and debugging.


Re: [PATCH v2] Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation

2018-11-22 Thread David Sterba
On Tue, Nov 20, 2018 at 08:32:42AM +0800, Qu Wenruo wrote:
> >>> @@ -1013,16 +1013,22 @@ int btrfs_quota_enable(struct btrfs_fs_info 
> >>> *fs_info)
> >>>   btrfs_abort_transaction(trans, ret);
> >>>   goto out_free_path;
> >>>   }
> >>> - spin_lock(_info->qgroup_lock);
> >>> - fs_info->quota_root = quota_root;
> >>> - set_bit(BTRFS_FS_QUOTA_ENABLED, _info->flags);
> >>> - spin_unlock(_info->qgroup_lock);
> >>>
> >>>   ret = btrfs_commit_transaction(trans);
> >>>   trans = NULL;
> >>>   if (ret)
> >>>   goto out_free_path;
> >>
> >> The main concern here is, if we don't set qgroup enabled bit before we
> >> commit transaction, there will be a window where new tree modification
> >> could happen before QGROUP_ENABLED bit set.
> > 
> > That doesn't seem to make much sense to me, if I understood correctly.
> > Essentially you're saying stuff done to any tree in the transaction we
> > use to enable quotas must be accounted for. In that case the quota
> > enabled bit should be set as soon as the transaction is started, because
> > before we set it and after we started (or joined) a transaction, a lot
> > of modifications may have happened.
> > Nevertheless I don't think it's unexpected for anyone to have the
> > accounting happening only after the quota enable ioctl returns success.
> 
> The problem is not accounting, the qgroup numbers won't cause problems.
> 
> It's the reserved space. Like some dirty pages are dirtied before quota
> is enabled, but at run_delalloc() time quota is enabled.
> 
> For such a case we have the io_tree based method to avoid underflow so
> it should be OK.
> 
> So v2 patch looks OK.

Does that mean reviewed-by? In case there's an evolved discussion under a
patch, a clear yes/no is appreciated and an explicit Reviewed-by even
more. I'm about to add this patch to the rc4 pull, there's still some time
to add the tag. Thanks.


Re: [PATCH] Btrfs: fix access to available allocation bits when starting balance

2018-11-21 Thread David Sterba
On Mon, Nov 19, 2018 at 09:48:12AM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> The available allocation bits members from struct btrfs_fs_info are
> protected by a sequence lock, and when starting balance we access them
> incorrectly in two different ways:
> 
> 1) In the read sequence lock loop at btrfs_balance() we use the values we
>read from fs_info->avail_*_alloc_bits and we can immediately do actions
>that have side effects and can not be undone (printing a message and
>jumping to a label). This is wrong because a retry might be needed, so
>our actions must not have side effects and must be repeatable as long
>as read_seqretry() returns a non-zero value. In other words, we were
>essentially ignoring the sequence lock;
> 
> 2) Right below the read sequence lock loop, we were reading the values
>from avail_metadata_alloc_bits and avail_data_alloc_bits without any
>protection from concurrent writers, that is, reading them outside of
>the read sequence lock critical section.
> 
> So fix this by making sure we only read the available allocation bits
> while in a read sequence lock critical section and that what we do in the
> critical section is repeatable (has nothing that can not be undone) so
> that any eventual retry that is needed is handled properly.
> 
> Fixes: de98ced9e743 ("Btrfs: use seqlock to protect fs_info->avail_{data, 
> metadata, system}_alloc_bits")
> Fixes: 14506127979a ("btrfs: fix a bogus warning when converting only data or 
> metadata")
> Signed-off-by: Filipe Manana 

Added to misc-next, thanks.
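
For reference, the repeatable read-side pattern the fix converges on is
roughly this (a sketch only; the real loop in btrfs_balance() reads more
state and keeps all side effects outside the critical section):

    unsigned int seq;
    u64 allowed;

    do {
            seq = read_seqbegin(&fs_info->profiles_lock);
            /* only read values here, no side effects, no early exits */
            allowed = fs_info->avail_data_alloc_bits |
                      fs_info->avail_metadata_alloc_bits |
                      fs_info->avail_system_alloc_bits;
    } while (read_seqretry(&fs_info->profiles_lock, seq));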


Re: [PATCH] Btrfs: fix race between enabling quotas and subvolume creation

2018-11-21 Thread David Sterba
On Mon, Nov 19, 2018 at 04:20:34PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> We have a race between enabling quotas and subvolume creation that causes
> subvolume creation to fail with -EINVAL, and the following diagram shows
> how it happens:
> 
>   CPU 0  CPU 1
> 
>  btrfs_ioctl()
>   btrfs_ioctl_quota_ctl()
>btrfs_quota_enable()
> mutex_lock(fs_info->qgroup_ioctl_lock)
> 
>   btrfs_ioctl()
>create_subvol()
> btrfs_qgroup_inherit()
>  -> save 
> fs_info->quota_root
> into quota_root
>  -> stores a NULL value
>  -> tries to lock the 
> mutex
> qgroup_ioctl_lock
> -> blocks waiting for
>the task at CPU0
> 
>-> sets BTRFS_FS_QUOTA_ENABLED in fs_info
>-> sets quota_root in fs_info->quota_root
>   (non-NULL value)
> 
>mutex_unlock(fs_info->qgroup_ioctl_lock)
> 
>  -> checks quota enabled
> flag is set
>  -> returns -EINVAL 
> because
> fs_info->quota_root 
> was
> NULL before it 
> acquired
> the mutex
> qgroup_ioctl_lock
>-> ioctl returns -EINVAL
> 
> Returning -EINVAL to user space will be confusing if all the arguments
> passed to the subvolume creation ioctl were valid.
> 
> Fix it by grabbing the value from fs_info->quota_root after acquiring
> the mutex.
> 
> Signed-off-by: Filipe Manana 

Reviewed-by: David Sterba 

Added to misc-next, thanks.
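
For reference, the shape of the fix in btrfs_qgroup_inherit() is roughly
the following (a simplified sketch, not the exact hunk):

    mutex_lock(&fs_info->qgroup_ioctl_lock);
    /* read quota_root only after the lock, so a concurrent quota enable
     * cannot leave us with a stale NULL value */
    quota_root = fs_info->quota_root;
    if (!quota_root) {
            ret = -EINVAL;
            goto out;
    }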


Re: [PATCH v6 0/3] btrfs: balance: improve kernel logs

2018-11-20 Thread David Sterba
On Tue, Nov 20, 2018 at 04:12:54PM +0800, Anand Jain wrote:
> v5->v6:
>  Mostly non-functional changes to the defines.
>  Use goto instead of return in the middle of the define.
>  Pls ref individual patches 1/3 and 2/3 for more info.
> 
> v4->v5:
>  Mainly address David review comment [1].
>  [1]
>   https://patchwork.kernel.org/patch/10425987/
>  pls ref to individual patch 2/3 for details.
> 
> v3->v4:
>  Pls ref to individual patches.
> 
> Based on misc-next.
> 
> v2->v3:
>   Inspired by describe_relocation(), improve it, make it a helper
>   function and use it to log the balance operations.
> 
> Kernel logs are very important for the forensic investigation of
> issues; these patches make balance logs easy to review.
> 
> Anand Jain (3):
>   btrfs: add helper function describe_block_group()
>   btrfs: balance: add args info during start and resume
>   btrfs: balance: add kernel log for end or paused

Now in topic branch ext/anand/balance-description in my devel repos, I
made some changes as I don't want to take the rounds through mail. There
are some cosmetic changes like reordering the balance args, added
comments and aligned the \ even farther to the right. Please check the
committed patches for some inspiration.

I'll add that to misc-next after passes some tests, until then it'll be
in for-next. Thanks.


Re: [PATCH RESEND 0/9 v2] fix warn_on for replace cancel

2018-11-20 Thread David Sterba
On Tue, Nov 20, 2018 at 07:56:14PM +0800, Anand Jain wrote:
> These two patches were sent as part of
>[PATCH 0/9 v2] fix replace-start and replace-cancel racing
> before but as these aren't integrated so I am sending these again.
> 
> The patch [1], which is in misc-next, calls btrfs_dev_replace_finishing()
> after replace is canceled, so the ret argument passed to
> btrfs_dev_replace_finishing() is -ECANCELED, which is not a real error.
> So these patches quieten the warn and error log if it is -ECANCELED.
> 
> These should be integrated otherwise we see the WARN_ON and btrfs_error()
> after replace cancel.
> 
> [1] 
> 08bdfc3a0611 btrfs: fix use-after-free due to race between replace start and 
> cancel
> 
> Anand Jain (2):
>   btrfs: quieten warn if the replace is canceled at finish
>   btrfs: user requested replace cancel is not an error

Thanks, patches added to misc-next, right after the other dev-replace
patches.


Re: [PATCH RESEND v2 1/2] btrfs: quieten warn if replace is canceled at finish

2018-11-20 Thread David Sterba
On Tue, Nov 20, 2018 at 07:56:15PM +0800, Anand Jain wrote:
> When we successfully cancel the replace, its scrub returns -ECANCELED,
> which is then passed to btrfs_dev_replace_finishing(); it cleans up based
> on the returned scrub status and propagates the same -ECANCELED back to
> the parent function. As of now only the user can cancel the replace
> scrub, so it's ok to quieten the warn here.
> 
> Signed-off-by: Anand Jain 

Ok for getting rid of the ECANCELED warning, though it would be better
to replace the WARN_ON too.

Reviewed-by: David Sterba 
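
One possible shape for also covering the WARN_ON (names illustrative, a
sketch rather than a concrete patch):

    /* a user-requested cancel is not an error, don't warn on it */
    WARN_ON(scrub_ret && scrub_ret != -ECANCELED);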


[PATCH 4/4] btrfs: dev-replace: open code trivial locking helpers

2018-11-20 Thread David Sterba
The dev-replace locking functions are now trivial wrappers around rw
semaphore that can be used directly everywhere. No functional change.

Signed-off-by: David Sterba 
---
 fs/btrfs/dev-replace.c | 81 --
 fs/btrfs/dev-replace.h |  4 ---
 fs/btrfs/reada.c   | 12 +++
 fs/btrfs/scrub.c   | 15 
 fs/btrfs/volumes.c | 14 
 5 files changed, 52 insertions(+), 74 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index a37af7aaca86..9b8ab26deb41 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -284,13 +284,13 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle 
*trans,
struct btrfs_dev_replace_item *ptr;
struct btrfs_dev_replace *dev_replace = _info->dev_replace;
 
-   btrfs_dev_replace_read_lock(dev_replace);
+   down_read(_replace->rwsem);
if (!dev_replace->is_valid ||
!dev_replace->item_needs_writeback) {
-   btrfs_dev_replace_read_unlock(dev_replace);
+   up_read(_replace->rwsem);
return 0;
}
-   btrfs_dev_replace_read_unlock(dev_replace);
+   up_read(_replace->rwsem);
 
key.objectid = 0;
key.type = BTRFS_DEV_REPLACE_KEY;
@@ -348,7 +348,7 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
ptr = btrfs_item_ptr(eb, path->slots[0],
 struct btrfs_dev_replace_item);
 
-   btrfs_dev_replace_write_lock(dev_replace);
+   down_write(_replace->rwsem);
if (dev_replace->srcdev)
btrfs_set_dev_replace_src_devid(eb, ptr,
dev_replace->srcdev->devid);
@@ -371,7 +371,7 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
btrfs_set_dev_replace_cursor_right(eb, ptr,
dev_replace->cursor_right);
dev_replace->item_needs_writeback = 0;
-   btrfs_dev_replace_write_unlock(dev_replace);
+   up_write(_replace->rwsem);
 
btrfs_mark_buffer_dirty(eb);
 
@@ -432,7 +432,7 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info 
*fs_info,
}
 
need_unlock = true;
-   btrfs_dev_replace_write_lock(dev_replace);
+   down_write(_replace->rwsem);
switch (dev_replace->replace_state) {
case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
@@ -470,7 +470,7 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info 
*fs_info,
dev_replace->item_needs_writeback = 1;
atomic64_set(_replace->num_write_errors, 0);
atomic64_set(_replace->num_uncorrectable_read_errors, 0);
-   btrfs_dev_replace_write_unlock(dev_replace);
+   up_write(_replace->rwsem);
need_unlock = false;
 
ret = btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device);
@@ -484,7 +484,7 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info 
*fs_info,
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
need_unlock = true;
-   btrfs_dev_replace_write_lock(dev_replace);
+   down_write(_replace->rwsem);
dev_replace->replace_state =
BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED;
dev_replace->srcdev = NULL;
@@ -511,7 +511,7 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info 
*fs_info,
 
 leave:
if (need_unlock)
-   btrfs_dev_replace_write_unlock(dev_replace);
+   up_write(_replace->rwsem);
btrfs_destroy_dev_replace_tgtdev(tgt_device);
return ret;
 }
@@ -579,18 +579,18 @@ static int btrfs_dev_replace_finishing(struct 
btrfs_fs_info *fs_info,
/* don't allow cancel or unmount to disturb the finishing procedure */
mutex_lock(_replace->lock_finishing_cancel_unmount);
 
-   btrfs_dev_replace_read_lock(dev_replace);
+   down_read(_replace->rwsem);
/* was the operation canceled, or is it finished? */
if (dev_replace->replace_state !=
BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED) {
-   btrfs_dev_replace_read_unlock(dev_replace);
+   up_read(_replace->rwsem);
mutex_unlock(_replace->lock_finishing_cancel_unmount);
return 0;
}
 
tgt_device = dev_replace->tgtdev;
src_device = dev_replace->srcdev;
-   btrfs_dev_replace_read_unlock(dev_replace);
+   up_read(_replace->rwsem);
 
/*
 * flush all outstanding I/O and inode extent mappings before the
@@ -614,7 +614,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info 
*fs_info,
/* keep away write_all_supers() during the finishing procedure */
mutex_lock(_info->fs_devices->device_list_mutex);
mutex_lock(_info->chunk_mutex);
-   btrfs_dev_replace_write_lock(dev_r

[PATCH 1/4] btrfs: reada: reorder dev-replace locks before radix tree preload

2018-11-20 Thread David Sterba
The read lock is going to use an rw semaphore that might sleep, which is
not possible in the radix tree preload section. The lock nesting is now:

* device replace
  * radix tree preload
* readahead spinlock

Signed-off-by: David Sterba 
---
 fs/btrfs/reada.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index dec14b739b10..6c4fdd98582d 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -376,26 +376,28 @@ static struct reada_extent *reada_find_extent(struct 
btrfs_fs_info *fs_info,
goto error;
}
 
+   /* insert extent in reada_tree + all per-device trees, all or nothing */
+   btrfs_dev_replace_read_lock(_info->dev_replace);
ret = radix_tree_preload(GFP_KERNEL);
-   if (ret)
+   if (ret) {
+   btrfs_dev_replace_read_unlock(_info->dev_replace);
goto error;
+   }
 
-   /* insert extent in reada_tree + all per-device trees, all or nothing */
-   btrfs_dev_replace_read_lock(_info->dev_replace);
spin_lock(_info->reada_lock);
ret = radix_tree_insert(_info->reada_tree, index, re);
if (ret == -EEXIST) {
re_exist = radix_tree_lookup(_info->reada_tree, index);
re_exist->refcnt++;
spin_unlock(_info->reada_lock);
-   btrfs_dev_replace_read_unlock(_info->dev_replace);
radix_tree_preload_end();
+   btrfs_dev_replace_read_unlock(_info->dev_replace);
goto error;
}
if (ret) {
spin_unlock(_info->reada_lock);
-   btrfs_dev_replace_read_unlock(_info->dev_replace);
radix_tree_preload_end();
+   btrfs_dev_replace_read_unlock(_info->dev_replace);
goto error;
}
radix_tree_preload_end();
-- 
2.19.1



[PATCH 2/4] btrfs: dev-replace: switch locking to rw semaphore

2018-11-20 Thread David Sterba
This is the first part of removing the custom locking and waiting scheme
used for device replace. It was probably copied from extent buffer
locking, but there's nothing that would require more than is provided by
the common locking primitives.

The rw spinlock protects the waiting tasks counter in case of incompatible
locks, and the waitqueue. Same as the rw semaphore.

This patch only switches the locking primitive, for better
bisectability.  There should be no functional change other than the
overhead of the locking and potential sleeping instead of spinning when
the lock is contended.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/dev-replace.c | 12 ++--
 fs/btrfs/disk-io.c |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b58c5e73e458..01efe3849968 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -360,7 +360,7 @@ struct btrfs_dev_replace {
struct btrfs_device *tgtdev;
 
struct mutex lock_finishing_cancel_unmount;
-   rwlock_t lock;
+   struct rw_semaphore rwsem;
atomic_t blocking_readers;
wait_queue_head_t read_lock_wq;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 85d93bd3b27a..387f85fcc26e 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -1001,12 +1001,12 @@ int btrfs_dev_replace_is_ongoing(struct 
btrfs_dev_replace *dev_replace)
 
 void btrfs_dev_replace_read_lock(struct btrfs_dev_replace *dev_replace)
 {
-   read_lock(_replace->lock);
+   down_read(_replace->rwsem);
 }
 
 void btrfs_dev_replace_read_unlock(struct btrfs_dev_replace *dev_replace)
 {
-   read_unlock(_replace->lock);
+   up_read(_replace->rwsem);
 }
 
 void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace)
@@ -1014,16 +1014,16 @@ void btrfs_dev_replace_write_lock(struct 
btrfs_dev_replace *dev_replace)
 again:
wait_event(dev_replace->read_lock_wq,
   atomic_read(_replace->blocking_readers) == 0);
-   write_lock(_replace->lock);
+   down_write(_replace->rwsem);
if (atomic_read(_replace->blocking_readers)) {
-   write_unlock(_replace->lock);
+   up_write(_replace->rwsem);
goto again;
}
 }
 
 void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace)
 {
-   write_unlock(_replace->lock);
+   up_write(_replace->rwsem);
 }
 
 /* inc blocking cnt and release read lock */
@@ -1032,7 +1032,7 @@ void btrfs_dev_replace_set_lock_blocking(
 {
/* only set blocking for read lock */
atomic_inc(_replace->blocking_readers);
-   read_unlock(_replace->lock);
+   up_read(_replace->rwsem);
 }
 
 void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c1d127decc8d..38717aa9c47d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2119,7 +2119,7 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info 
*fs_info)
 static void btrfs_init_dev_replace_locks(struct btrfs_fs_info *fs_info)
 {
mutex_init(_info->dev_replace.lock_finishing_cancel_unmount);
-   rwlock_init(_info->dev_replace.lock);
+   init_rwsem(_info->dev_replace.rwsem);
atomic_set(_info->dev_replace.blocking_readers, 0);
init_waitqueue_head(_info->dev_replace.replace_wait);
init_waitqueue_head(_info->dev_replace.read_lock_wq);
-- 
2.19.1



[PATCH 0/4] Replace custom device-replace locking with rwsem

2018-11-20 Thread David Sterba
The first cleanup part went to 4.19, the actual switch from the custom
locking to rwsem was postponed as I found a performance degradation. This
turned out to be related to VM cache settings, so I'm resending the
series again.

The custom locking is based on rwlock protected reader/writer counters,
waitqueues, which essentially is what the readwrite semaphore does.

Previous patchset:
https://lore.kernel.org/linux-btrfs/cover.1536331604.git.dste...@suse.com/

Patches correspond to 8/11-11/11 and there's no change besides
refreshing on top of current misc-next.

David Sterba (4):
  btrfs: reada: reorder dev-replace locks before radix tree preload
  btrfs: dev-replace: switch locking to rw semaphore
  btrfs: dev-replace: remove custom read/write blocking scheme
  btrfs: dev-replace: open code trivial locking helpers

 fs/btrfs/ctree.h   |  4 +-
 fs/btrfs/dev-replace.c | 97 ++
 fs/btrfs/dev-replace.h |  5 ---
 fs/btrfs/disk-io.c |  4 +-
 fs/btrfs/reada.c   | 16 ---
 fs/btrfs/scrub.c   | 15 ---
 fs/btrfs/volumes.c | 27 ++--
 7 files changed, 63 insertions(+), 105 deletions(-)

-- 
2.19.1



[PATCH 3/4] btrfs: dev-replace: remove custom read/write blocking scheme

2018-11-20 Thread David Sterba
After the rw semaphore has been added, the custom blocking using
::blocking_readers and ::read_lock_wq is redundant.

The blocking logic in __btrfs_map_block is replaced by extending the
time the semaphore is held, which has the same blocking effect on writes
as the previous custom scheme that waited until ::blocking_readers was
zero.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h   |  2 --
 fs/btrfs/dev-replace.c | 16 
 fs/btrfs/dev-replace.h |  1 -
 fs/btrfs/disk-io.c |  2 --
 fs/btrfs/volumes.c | 13 ++---
 5 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 01efe3849968..b711d4ee4c83 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -361,8 +361,6 @@ struct btrfs_dev_replace {
 
struct mutex lock_finishing_cancel_unmount;
struct rw_semaphore rwsem;
-   atomic_t blocking_readers;
-   wait_queue_head_t read_lock_wq;
 
struct btrfs_scrub_progress scrub_progress;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 387f85fcc26e..a37af7aaca86 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -1011,14 +1011,7 @@ void btrfs_dev_replace_read_unlock(struct 
btrfs_dev_replace *dev_replace)
 
 void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace)
 {
-again:
-   wait_event(dev_replace->read_lock_wq,
-  atomic_read(_replace->blocking_readers) == 0);
down_write(_replace->rwsem);
-   if (atomic_read(_replace->blocking_readers)) {
-   up_write(_replace->rwsem);
-   goto again;
-   }
 }
 
 void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace)
@@ -1026,15 +1019,6 @@ void btrfs_dev_replace_write_unlock(struct 
btrfs_dev_replace *dev_replace)
up_write(_replace->rwsem);
 }
 
-/* inc blocking cnt and release read lock */
-void btrfs_dev_replace_set_lock_blocking(
-   struct btrfs_dev_replace *dev_replace)
-{
-   /* only set blocking for read lock */
-   atomic_inc(_replace->blocking_readers);
-   up_read(_replace->rwsem);
-}
-
 void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
 {
percpu_counter_inc(_info->dev_replace.bio_counter);
diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index 27e3bb0cca11..dd1dcb22c1e3 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -23,6 +23,5 @@ void btrfs_dev_replace_read_lock(struct btrfs_dev_replace 
*dev_replace);
 void btrfs_dev_replace_read_unlock(struct btrfs_dev_replace *dev_replace);
 void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace);
 void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace);
-void btrfs_dev_replace_set_lock_blocking(struct btrfs_dev_replace 
*dev_replace);
 
 #endif
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 38717aa9c47d..39fd33ef8f80 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2120,9 +2120,7 @@ static void btrfs_init_dev_replace_locks(struct 
btrfs_fs_info *fs_info)
 {
mutex_init(_info->dev_replace.lock_finishing_cancel_unmount);
init_rwsem(_info->dev_replace.rwsem);
-   atomic_set(_info->dev_replace.blocking_readers, 0);
init_waitqueue_head(_info->dev_replace.replace_wait);
-   init_waitqueue_head(_info->dev_replace.read_lock_wq);
 }
 
 static void btrfs_init_qgroup(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1f476669479b..dc09c6abaf6d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5759,10 +5759,12 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info,
 
btrfs_dev_replace_read_lock(dev_replace);
dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
+   /*
+* Hold the semaphore for read during the whole operation, write is
+* requested at commit time but must wait.
+*/
if (!dev_replace_is_ongoing)
btrfs_dev_replace_read_unlock(dev_replace);
-   else
-   btrfs_dev_replace_set_lock_blocking(dev_replace);
 
if (dev_replace_is_ongoing && mirror_num == map->num_stripes + 1 &&
!need_full_stripe(op) && dev_replace->tgtdev != NULL) {
@@ -5957,11 +5959,8 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info,
}
 out:
if (dev_replace_is_ongoing) {
-   ASSERT(atomic_read(_replace->blocking_readers) > 0);
-   btrfs_dev_replace_read_lock(dev_replace);
-   /* Barrier implied by atomic_dec_and_test */
-   if (atomic_dec_and_test(_replace->blocking_readers))
-   cond_wake_up_nomb(_replace->read_lock_wq);
+   lockdep_assert_held(_replace->rwsem);
+   /* Unlock and let waiting writers proceed */
btrfs_dev_replace_read_unlock(dev_replace);
}
free_extent_map(em);
-- 
2.19.1



Re: [PATCH v5 1/3] btrfs: add helper function describe_block_group()

2018-11-19 Thread David Sterba
On Wed, Nov 14, 2018 at 09:17:10PM +0800, Anand Jain wrote:
> Improve on describe_relocation() and add a common helper function to
> describe the block groups.
> 
> Signed-off-by: Anand Jain 
> Reviewed-by: David Sterba 
> ---
> v4.1->v5: Initialize buf[128] to null.
> v4->v4.1: Use strcpy(buf, "|NONE"); as in the original
> v3->v4: Just pass full flag name in the define DESCRIBE_FLAG(flag,..),
>so that it can be used at couple of more places.
>   Rename describe_block_groups() to btrfs_describe_block_groups().
>   Drop useless return u32.
> v3:   Born.
> 
>  fs/btrfs/relocation.c | 30 +++---
>  fs/btrfs/volumes.c| 43 +++
>  fs/btrfs/volumes.h|  1 +
>  3 files changed, 47 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 924116f654a1..0373b3cc1d36 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4184,37 +4184,13 @@ static struct reloc_control *alloc_reloc_control(void)
>  static void describe_relocation(struct btrfs_fs_info *fs_info,
>   struct btrfs_block_group_cache *block_group)
>  {
> - char buf[128];  /* prefixed by a '|' that'll be dropped */
> - u64 flags = block_group->flags;
> + char buf[128] = {'\0'};
>  
> - /* Shouldn't happen */
> - if (!flags) {
> - strcpy(buf, "|NONE");
> - } else {
> - char *bp = buf;
> -
> -#define DESCRIBE_FLAG(f, d) \
> - if (flags & BTRFS_BLOCK_GROUP_##f) { \
> - bp += snprintf(bp, buf - bp + sizeof(buf), "|%s", d); \
> - flags &= ~BTRFS_BLOCK_GROUP_##f; \
> - }
> - DESCRIBE_FLAG(DATA, "data");
> - DESCRIBE_FLAG(SYSTEM,   "system");
> - DESCRIBE_FLAG(METADATA, "metadata");
> - DESCRIBE_FLAG(RAID0,"raid0");
> - DESCRIBE_FLAG(RAID1,"raid1");
> - DESCRIBE_FLAG(DUP,  "dup");
> - DESCRIBE_FLAG(RAID10,   "raid10");
> - DESCRIBE_FLAG(RAID5,"raid5");
> - DESCRIBE_FLAG(RAID6,"raid6");
> - if (flags)
> - snprintf(bp, buf - bp + sizeof(buf), "|0x%llx", flags);
> -#undef DESCRIBE_FLAG
> - }
> + btrfs_describe_block_groups(block_group->flags, buf, sizeof(buf));
>  
>   btrfs_info(fs_info,
>  "relocating block group %llu flags %s",
> -block_group->key.objectid, buf + 1);
> +block_group->key.objectid, buf);
>  }
>  
>  /*
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f435d397019e..12b3d0625c6a 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -123,6 +123,49 @@ const char *get_raid_name(enum btrfs_raid_types type)
>   return btrfs_raid_array[type].raid_name;
>  }
>  
> +void btrfs_describe_block_groups(u64 bg_flags, char *buf, u32 size_buf)
> +{
> + int i;
> + int ret;
> + char *bp = buf;
> + u64 flags = bg_flags;
> + u32 size_bp = size_buf;
> +
> + if (!flags) {
> + strcpy(bp, "NONE");
> + return;
> + }
> +
> +#define DESCRIBE_FLAG(f, d) \

Macro arguments should be in ( ) in the body

> + do { \
> + if (flags & f) { \
> + ret = snprintf(bp, size_bp, "%s|", d); \
> + if (ret < 0 || ret >= size_bp) \
> + return; \

Ok, this seems to be safe. snprintf will not write more than size_bp and
this will be caught if it overflows.

The return is obscured by the macro, I'd rather make it more visible
that there is a potential change in the code flow. The proposed change:

goto out_overflow;
...

and define out_overflow at the end of function. Same for the other
helper macros.

Also please align the backslashes of the macro a few tabs to the right.

> + size_bp -= ret; \
> + bp += ret; \
> + flags &= ~f; \
> + } \
> + } while (0)
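
Put together, the suggested shape would be roughly the following (a sketch
only; out_overflow is a label defined at the end of the function as
proposed above):

    #define DESCRIBE_FLAG(flag, desc)                                     \
            do {                                                          \
                    if (flags & (flag)) {                                 \
                            ret = snprintf(bp, size_bp, "%s|", (desc));   \
                            if (ret < 0 || ret >= size_bp)                \
                                    goto out_overflow;                    \
                            size_bp -= ret;                               \
                            bp += ret;                                    \
                            flags &= ~(flag);                             \
                    }                                                     \
            } while (0)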


Re: [PATCH v5 2/3] btrfs: balance: add args info during start and resume

2018-11-19 Thread David Sterba
On Wed, Nov 14, 2018 at 09:17:11PM +0800, Anand Jain wrote:
> Balance args are important information to review during a system audit,
> so this patch adds them to the kernel log.
> 
> Example:
> ->btrfs bal start -f -mprofiles=raid1,convert=single,soft 
> -dlimit=10..20,usage=50 /btrfs
> 
> kernel: BTRFS info (device sdb): balance: start -f -dusage=50,limit=10..20 
> -msoft,profiles=raid1,convert=single -ssoft,profiles=raid1,convert=single
> 
> Signed-off-by: Anand Jain 
> ---
> v4.1->v5: Per David's review comment, the code
>     bp += snprintf(bp, buf - bp + size_buf, "soft,");
> is not safe if 'buf - bp + size_buf' accidentally becomes
> negative, so fix this by using the following snprintf wrapper:
> 
>  #define CHECK_UPDATE_BP_1(a) \
>      do { \
>              ret = snprintf(bp, size_bp, a); \
>              if (ret < 0 || ret >= size_bp) \
>                      goto out; \
>              size_bp -= ret; \
>              bp += ret; \
>      } while (0)
> 
>   And initialize the tmp_buf to null.
> 
> v4->v4.1: Rename last missed sz to size_buf.
> 
> v3->v4: Change log updated with new example.
>   Log format is changed to almost match with the cli.
>   snprintf drop %s for inline string.
>   Print range as =%u..%u instead of min=%umax=%u.
>   soft is under the args now.
>   force is printed as -f.
> 
> v2->v3: Change log updated.
> Make use of describe_block_group() added in 1/4
> Drop using strcat instead use snprintf.
> Logs string format updated as shown in the example above.
> 
> v1->v2: Change log update.
> Move adding the logs for balance complete and end to a new patch
> 
>  fs/btrfs/volumes.c | 172 
> -
>  1 file changed, 169 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 12b3d0625c6a..8397f72bdac7 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3754,6 +3754,173 @@ static inline int validate_convert_profile(struct 
> btrfs_balance_args *bctl_arg,
>(bctl_arg->target & ~allowed)));
>  }
>  
> +static void describe_balance_args(struct btrfs_balance_args *bargs, char 
> *buf,
> +  u32 size_buf)
> +{
> + int ret;
> + u32 size_bp = size_buf;
> + char *bp = buf;
> + u64 flags = bargs->flags;
> + char tmp_buf[128] = {'\0'};
> +
> + if (!flags)
> + return;
> +
> +#define CHECK_UPDATE_BP_1(a) \

I'd call this one CHECK_UPDATE_BP_NOARG and the others
CHECK_UPDATE_BP_1ARG, though I'm also thinking about picking another
name for the macros, like CHECK_APPEND_2ARG etc.

> + do { \
> + ret = snprintf(bp, size_bp, a); \
> + if (ret < 0 || ret >= size_bp) \
> + goto out; \

Same here: goto out_overflow, and please align the \ to the right.

> + size_bp -= ret; \
> + bp += ret; \
> + } while (0)
> +
> +#define CHECK_UPDATE_BP_2(a, v1) \
> + do { \
> + ret = snprintf(bp, size_bp, a, v1); \
> + if (ret < 0 || ret >= size_bp) \
> + goto out; \
> + size_bp -= ret; \
> + bp += ret; \
> + } while (0)
> +
> +#define CHECK_UPDATE_BP_3(a, v1, v2) \
> + do { \
> + ret = snprintf(bp, size_bp, a, v1, v2); \
> + if (ret < 0 || ret >= size_bp) \
> + goto out; \
> + size_bp -= ret; \
> + bp += ret; \
> + } while (0)
> +
> + if (flags & BTRFS_BALANCE_ARGS_SOFT) {
> + CHECK_UPDATE_BP_1("soft,");
> + }

no { } around single statement

> +
> + if (flags & BTRFS_BALANCE_ARGS_PROFILES) {
> + btrfs_describe_block_groups(bargs->profiles, tmp_buf,
> + sizeof(tmp_buf));
> + CHECK_UPDATE_BP_2("profiles=%s,", tmp_buf);
> + }
> +
> + if (flags & BTRFS_BALANCE_ARGS_USAGE) {
> + CHECK_UPDATE_BP_2("usage=%llu,", bargs->usage);
> + }
> +
> + if (flags & BTRFS_BALANCE_ARGS_USAGE_RANGE) {
> + CHECK_UPDATE_BP_3("usage=%u..%u,",
> +   bargs->usage_min, bargs->usage_max);
> + }
> +
> + if (flags & BTRFS_BALANCE_ARGS_DEVID) {
> + CHECK_UPDATE_BP_2("devid=%llu,", bargs->devid);
> + }
> +
> + if (flags & BTRFS_BALANCE_ARGS_DRANGE) {
> + CHECK_UPDATE_BP_3("drange=%llu..%llu,",
> +   bargs->pstart, bargs->pend);
> + }
> +
> + if (flags & BTRFS_BALANCE_ARGS_VRANGE) {
> + CHECK_UPDATE_BP_3("vrange=%llu..%llu,",
> +   bargs->vstart, bargs->vend);
> + }
> +
> + if (flags & BTRFS_BALANCE_ARGS_LIMIT) {
> + CHECK_UPDATE_BP_2("limit=%llu,", bargs->limit);
> + }
> +
> + if (flags & BTRFS_BALANCE_ARGS_LIMIT_RANGE) {
> + CHECK_UPDATE_BP_3("limit=%u..%u,",
> + 

Re: [PATCH] btrfs: Add sysfs support for metadata_uuid feature

2018-11-19 Thread David Sterba
On Mon, Nov 19, 2018 at 05:37:45PM +0200, Nikolay Borisov wrote:
> Since metadata_uuid is a new incompat feature, it requires the
> respective sysfs hooks. This patch adds the 'metadata_uuid' feature to
> be shown if it is supported by the kernel. Additionally it adds the
> /sys/fs/btrfs/UUID/metadata_uuid attribute, which allows one to read
> the current metadata_uuid.
> 
> Signed-off-by: Nikolay Borisov 
> ---
> 
> I completely forgot sysfs also needs to be hooked so here it is. 

Thanks,

Reviewed-by: David Sterba 

I'll add it right after the commit that introduces the METADATA_UUID
bit.
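
For readers who want to see what such a hook looks like, here is a rough
sketch of a read-only attribute of this kind. This is a hypothetical
rendering, not the patch itself; it assumes it lives in fs/btrfs/sysfs.c
next to the existing attributes, so to_fs_info() and BTRFS_ATTR() are
available, and that fs_devices carries the new metadata_uuid field:

static ssize_t btrfs_metadata_uuid_show(struct kobject *kobj,
					struct kobj_attribute *a, char *buf)
{
	struct btrfs_fs_info *fs_info = to_fs_info(kobj);

	/* 16-byte UUID, printed in canonical form via %pU */
	return snprintf(buf, PAGE_SIZE, "%pU\n",
			fs_info->fs_devices->metadata_uuid);
}
BTRFS_ATTR(, metadata_uuid, btrfs_metadata_uuid_show);

The attribute would then have to be wired into the per-filesystem
attribute list so that it shows up under /sys/fs/btrfs/UUID/.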


Re: [PATCH v2 0/5] btrfs: fix compiler warning with make W=1

2018-11-19 Thread David Sterba
On Mon, Nov 19, 2018 at 10:38:12AM +0100, Johannes Thumshirn wrote:
> This patchset fixes most of the compiler warnings encountered when building
> btrfs with make W=1.
> 
> There are two more compiler warnings left in raid56.c:
>   CC [M]  fs/btrfs/raid56.o
> fs/btrfs/raid56.c: In function ‘finish_rmw’:
> fs/btrfs/raid56.c:1185:6: warning: variable ‘p_stripe’ set but not used 
> [-Wunused-but-set-variable]
>   int p_stripe = -1;
>   ^
> fs/btrfs/raid56.c: In function ‘finish_parity_scrub’:
> fs/btrfs/raid56.c:2343:6: warning: variable ‘p_stripe’ set but not used 
> [-Wunused-but-set-variable]
>   int p_stripe = -1;
>   ^
> but I'm currently unsure what an appropriate fix would look like. As far as I
> can tell these variables have been unused ever since they were
> introduced.

I don't know either; it looks like they were calculated as indexes into
the stripe array but never used. I don't see any code that would miss
them, so removing them is probably ok. The RAID56 code hasn't been
touched for years, so such gems can be found there.

> There are still warnings emitted by kernel-doc, but those are the subject of
> another patchset; this one only addresses the warnings generated by gcc.

The kernel-doc warnings are an interesting topic. As there are no public
functions that would end up in some nice documentation, their use is
sparse and inconsistent, and my first idea was to drop them completely.

OTOH, if there are kdoc comments, the build checks will make sure the
argument names match the documentation; besides that, we can also write
more comments for functions where it matters.

So, let's fix them, add the "/**" marker where a comment is already in
kdoc form but just lacks it, and then incrementally update the rest.

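For reference, kernel-doc only parses comments that start with the "/**"
marker and checks that the documented parameter names match the function
signature. A minimal example on a hypothetical helper (the name and
arguments are made up purely for illustration):

/**
 * btrfs_example_helper - one-line summary of what the helper does
 * @fs_info:	the filesystem being operated on
 * @flags:	behaviour flags
 *
 * Optional longer description of the helper and when to use it.
 *
 * Return: 0 on success, a negative errno on failure.
 */
static int btrfs_example_helper(struct btrfs_fs_info *fs_info, u64 flags)
{
	return 0;
}
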
> Changes to v1:
> * Added Reviews
> * Dropped already applied patches 4 + 5
> * Added EXPORT_FOR_TESTS + users
> 
> Johannes Thumshirn (5):
>   btrfs: remove unused drop_on_err in btrfs_mkdir()
>   btrfs: remove set but not used variable err in btrfs_add_link
>   btrfs: remove unused function btrfs_sysfs_feature_update()
>   btrfs: introduce EXPORT_FOR_TESTS macro
>   btrfs: use EXPORT_FOR_TESTS for conditionally shared functions

Patches 1, 4 and 5 merged, the other have comments.

