> "Stephen C. Tweedie" <[EMAIL PROTECTED]> wrote:
> >
> > For the past few months there has been a slow but steady trickle of
> > reports of oopses in kjournald.
> 
> Yes, really tenuous stuff.  Very glad if this is the fix!
> 
> >  Recently I got a couple of reports that
> > were repeatable enough to rerun with extra debugging code.
> > 
> > It turns out that we're releasing a journal_head while it is still
> > linked onto the transaction's t_locked_list.  The exact location is in
> > journal_unmap_buffer().  On several exit paths, that does:
> > 
> >             spin_unlock(&journal->j_list_lock); 
> >             jbd_unlock_bh_state(bh);
> >             spin_unlock(&journal->j_state_lock);
> >             journal_put_journal_head(jh);
> > 
> > releasing the jh *after* dropping the j_list_lock and j_state_lock.
> > 
> > kjournald can then be doing journal_commit_transaction():
> > 
> >     spin_lock(&journal->j_list_lock);
> > ...
> >             if (buffer_locked(bh)) {
> >                     BUFFER_TRACE(bh, "locked");
> >                     if (!inverted_lock(journal, bh))
> >                             goto write_out_data;
> >                     __journal_unfile_buffer(jh);
> >                     __journal_file_buffer(jh, commit_transaction,
> >                                             BJ_Locked);
> >                     jbd_unlock_bh_state(bh);
> > 
> > The problem happens if journal_unmap_buffer()'s own put_journal_head()
> > manages to get in between kjournald's *unfile_buffer and the following
> > *file_buffer.  Because journal_unmap_buffer() has dropped its bh_state
> > lock by this point, there's nothing to prevent this, leading to a
> > variety of unpleasant situations.  In particular, the jh is unfiled at
> > this point, so there's nothing to stop the put_journal_head() from
> > freeing the memory we're just about to link onto the BJ_Locked list.
> 
> Right.  I don't know why journal_put_journal_head() looks at
> ->b_transaction, really.  Should have made presence on a list contribute to
> b_jcount.  Oh well, it's been that way since 2.5.0 or older..
> 
> Don't we have the same race anywhere where we're doing a
> journal_refile_buffer() (or equiv) in parallel with a
> journal_put_journal_head() outside locks?  There seem to be many such.
  I believe the other places should be safe (mostly by luck) as the
caller has made sure that __journal_remove_journal_head() won't do
anything (e.g. set b_transaction, b_next_transaction or such).
Anyway it doesn't seem too safe to me...

                                                                Honza
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to