On Thu, Mar 11, 2021 at 01:10:03PM -0500, Josef Bacik wrote: > On 2/26/21 1:30 PM, David Sterba wrote: > > On Wed, Dec 16, 2020 at 11:26:19AM -0500, Josef Bacik wrote: > >> Currently select_reloc_root() doesn't return an error, but followup > >> patches will make it possible for it to return an error. We do have > >> proper error recovery in do_relocation however, so handle the > >> possibility of select_reloc_root() having an error properly instead of > >> BUG_ON(!root). I've also adjusted select_reloc_root() to return > >> ERR_PTR(-ENOENT) if we don't find a root, instead of NULL, to make the > >> error case easier to deal with. I've replaced the BUG_ON(!root) with an > >> ASSERT(0) for this case as it indicates we messed up the backref walking > >> code, but it could also indicate corruption. > >> > >> Signed-off-by: Josef Bacik <jo...@toxicpanda.com> > >> --- > >> fs/btrfs/relocation.c | 15 ++++++++++++--- > >> 1 file changed, 12 insertions(+), 3 deletions(-) > >> > >> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > >> index 08ffaa93b78f..741068580455 100644 > >> --- a/fs/btrfs/relocation.c > >> +++ b/fs/btrfs/relocation.c > >> @@ -2024,8 +2024,14 @@ struct btrfs_root *select_reloc_root(struct > >> btrfs_trans_handle *trans, > >> if (!next || next->level <= node->level) > >> break; > >> } > >> - if (!root) > >> - return NULL; > >> + if (!root) { > >> + /* > >> + * This can happen if there's fs corruption or if there's a bug > >> + * in the backref lookup code. > >> + */ > >> + ASSERT(0); > > > > You've added these assert(0) to several patches and I think it's wrong. > > The asserts are supposed to verify invariants, you can hardly say that > > we're expecting 0 to be true, so the construct serves as "please crash > > here", which is no better than BUG(). It's been spreading, there are > > like 25 now. > > They are much better than a BUG_ON(), as they won't trip in production, > they'll > only trip for developers.
I'm not suggesting to use BUG_ON, only in rare cases (ideally). For developers what should blow up are conditions that validate the invariants, either in advance or after some action. > And we need them in many of these cases, and this > example you've called out is a clear example of where we absolutely want to > differentiate, because we have 3 different failure modes that will return > -EUCLEAN. If I add a ASSERT(ret != -EUCLEAN) to all of the callers then when > the developer hits a logic bug they'll have to go and manually add in their > own > assert to figure out which error condition failed. The idea is to add more conditions to differentiate if it's corrupted fs or if it's a development bug. But for testing I'd rather see a way we can exercise the corruption path, eg. fuzzing or crafted images, up to the point where EUCLEAN is encountered by some transaction abort or normal error. > Instead I've added > explanations in the comments for each assert and then have an assert for > every > error condition so that it's clear where things went wrong. There can be exceptional cases where distinguishing can't be done easily or at all so the comments are fine but I'd rather not encourage the ASSERT(0) pattern instead of thinking hard if it's really the right way to handle the errors. It too much resembles the BUG_ON() anti-pattern.