On Sun, Jun 15, 2025 at 12:27:40AM +0530, Bharadwaj Raju wrote:
> After cd3cdb1ef706 ("Single err message for btree node reads"),
> all errors caused __btree_err to return BCH_ERR_fsck_fix no matter what
> the actual error type was if the recovery pass was scanning for btree
> nodes. This lead to the code continuing despite things like bad node
> formats when they earlier would have caused a jump to fsck_err, because
> btree_err only jumps when the return from __btree_err does not match
> fsck_fix. Ultimately this lead to undefined behavior by attempting to
> unpack a key based on an invalid format.Hang on, -BCH_ERR_fsck_fix should've caused us to fix fixable errors, not cause undefined behaviour. Or is the issue that we're returning -BCH_ERR_fsck_fix for non-fixable errors? Glancing at the code, I think the bug might not be limited to btree node scan; we now seem to be passing FSCK_CAN_FIX for all errors in the non-btree-node-scan case, and I don't think that's right for BCH_ERR_btree_node_read_err_must_retry cases. But I'll have to go digging through the git history to confirm that, and it sounds like you've already looked - does that sound like it? > > Make errors of type BCH_ERR_btree_node_read_err_bad_node (only if > __bch2_topology_error) or BCH_ERR_btree_node_read_err_incompatible go > through the full __btree_err function instead of returning fsck_fix even > when we are in that recovery phase. > > Reported-by: [email protected] > Fixes: cd3cdb1ef706 ("bcachefs: Single err message for btree node reads") > Signed-off-by: Bharadwaj Raju <[email protected]> > --- > fs/bcachefs/btree_io.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c > index d8f3c4c65e90..e010ae94f1e1 100644 > --- a/fs/bcachefs/btree_io.c > +++ b/fs/bcachefs/btree_io.c > @@ -556,7 +556,10 @@ static int __btree_err(int ret, > struct printbuf *err_msg, > const char *fmt, ...) > { > - if (c->recovery.curr_pass == BCH_RECOVERY_PASS_scan_for_btree_nodes) > + if (c->recovery.curr_pass == BCH_RECOVERY_PASS_scan_for_btree_nodes && > + !(ret == -BCH_ERR_btree_node_read_err_bad_node && > + __bch2_topology_error(c, err_msg)) && > + ret != -BCH_ERR_btree_node_read_err_incompatible) > return bch_err_throw(c, fsck_fix); > > bool have_retry = false; > -- > 2.49.0 >
