On Tue, Mar 18, 2025 at 10:26:03AM -0400, John Stoffel wrote:
> >>>>> "Kent" == Kent Overstreet <[email protected]> writes:
> 
> > On Mon, Mar 17, 2025 at 04:58:26PM -0400, John Stoffel wrote:
> >> >>>>> "Alan" == Alan Huang <[email protected]> writes:
> >> 
> >> > Now there are 16 journal buffers, 8 is too small to be enough.
> >> > Signed-off-by: Alan Huang <[email protected]>
> >> > ---
> >> >  fs/bcachefs/recovery.c | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> > diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
> >> > index 71c786cdb192..a6e26733854d 100644
> >> > --- a/fs/bcachefs/recovery.c
> >> > +++ b/fs/bcachefs/recovery.c
> >> > @@ -899,7 +899,7 @@ int bch2_fs_recovery(struct bch_fs *c)
> >> >  	 * journal sequence numbers:
> >> >  	 */
> >> >  	if (!c->sb.clean)
> >> > -		journal_seq += 8;
> >> > +		journal_seq += JOURNAL_BUF_NR * 4;
> >> 
> >> Instead of magic numbers, could you put in a define with an
> >> explanation of how you arrived at this number?  Just to document the
> >> assumptions better?
> >> 
> >> John
> 
> > The * 4 is a fudge factor.
> 
> Ok.
> 
> > But actually, I was giving this more thought and I don't think we have
> > the correct number.
> 
> We have a WAG here.  :-)
> 
> > The real bound is "number of unflushed journal entries that might have
> > been allocated, and have other items (btree nodes) referring to that
> > sequence number, but which never hit the disk because they weren't
> > flushed".
> 
> > And we don't have an actual bound on that.
> 
> So what happens if journal_seq overflows?  I don't know the code and
> haven't looked.
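For the record, the define-plus-comment John asked for would look roughly
like the sketch below. The name and the comment are made up here just to
show the shape of it (only JOURNAL_BUF_NR, journal_seq and c->sb.clean are
from the patch; nothing like this exists in the tree), and as I explain
next, I'm no longer convinced we need the bound at all:

	/*
	 * If the fs wasn't shut down cleanly, journal entries may have been
	 * allocated and referenced by btree node writes without ever being
	 * flushed; skip ahead by the number of in-flight journal buffers
	 * times a fudge factor so we never reuse such a sequence number:
	 */
	#define JOURNAL_SEQ_UNCLEAN_SKIP	(JOURNAL_BUF_NR * 4)

	if (!c->sb.clean)
		journal_seq += JOURNAL_SEQ_UNCLEAN_SKIP;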
In the olden days, in the before times, blacklisting too few sequence
numbers would mean that btree node entries which should have been ignored
could become visible after a crash, because they were newer than the
newest (flush) journal entry.

That can't happen anymore: pointers to btree nodes now record how many
sectors we've written and completed within that node, and they're updated
after every btree node write (log append). IOW - our mechanism for
sequential consistency is now that the btree behaves like a pure COW
btree.

So I retract what I said before :)

Having just checked the journal read path, and the recovery path that
creates the blacklist entries, I think we're good. The only reason to
blacklist more would be as a double check on the modern sequential
consistency mechanism, and I don't think we need that.
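To make that concrete, here's a toy sketch of the idea in C. None of these
names are real bcachefs types or helpers - they're invented for
illustration (in the real code the role is played by the sectors-written
count carried in the btree node pointer, as described above). The point is
just that recovery only looks at what the parent pointer vouches for, so
appended entries referencing unflushed journal sequence numbers are never
read back:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct node_ptr {
	uint64_t	node_offset;		/* where the node starts on disk */
	uint32_t	sectors_written;	/* extent vouched for by the parent */
};

struct bset_hdr {
	uint64_t	journal_seq;	/* journal seq this bset was written against */
	uint32_t	sectors;	/* size of this bset, in sectors */
};

/* Is this bset within the extent the parent pointer vouches for? */
static bool bset_visible(const struct node_ptr *ptr,
			 uint32_t bset_start,
			 const struct bset_hdr *hdr)
{
	return bset_start + hdr->sectors <= ptr->sectors_written;
}

int main(void)
{
	/* Parent says 8 sectors of this node were written and completed: */
	struct node_ptr ptr = { .node_offset = 1024, .sectors_written = 8 };

	/* A bset inside that extent is read back... */
	struct bset_hdr flushed   = { .journal_seq = 100, .sectors = 8 };
	/* ...a later append past it (possibly referencing an unflushed seq) is not: */
	struct bset_hdr unflushed = { .journal_seq = 105, .sectors = 4 };

	printf("bset @0: %s\n", bset_visible(&ptr, 0, &flushed)   ? "visible" : "ignored");
	printf("bset @8: %s\n", bset_visible(&ptr, 8, &unflushed) ? "visible" : "ignored");
	return 0;
}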
