[Devel] [PATCH RHEL7 COMMIT] ms/jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path

Konstantin Khorenko Thu, 16 Mar 2017 05:46:02 -0700

The commit is pushed to "branch-rh7-3.10.0-514.10.2.vz7.29.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.10.2.vz7.29.3
------>
commit 6729ff4cc8df023add27a85eb8d1f0ec3b834a76
Author: OGAWA Hirofumi <hirof...@mail.parknet.co.jp>
Date:   Thu Mar 16 15:42:19 2017 +0400


    ms/jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount 
path
    
    On umount path, jbd2_journal_destroy() writes latest transaction ID
    (->j_tail_sequence) to be used at next mount.
    
    The bug is that ->j_tail_sequence is not holding latest transaction ID
    in some cases. So, at next mount, there is chance to conflict with
    remaining (not overwritten yet) transactions.
    
        mount (id=10)
        write transaction (id=11)
        write transaction (id=12)
        umount (id=10) <= the bug doesn't write latest ID
    
        mount (id=10)
        write transaction (id=11)
        crash
    
        mount
        [recovery process]
                transaction (id=11)
                transaction (id=12) <= valid transaction ID, but old commit
                                           must not replay
    
    Like above, this bug become the cause of recovery failure, or FS
    corruption.
    
    So why ->j_tail_sequence doesn't point latest ID?
    
    Because if checkpoint transactions was reclaimed by memory pressure
    (i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated.
    (And another case is, __jbd2_journal_clean_checkpoint_list() is called
    with empty transaction.)
    
    So in above cases, ->j_tail_sequence is not pointing latest
    transaction ID at umount path. Plus, REQ_FLUSH for checkpoint is not
    done too.
    
    So, to fix this problem with minimum changes, this patch updates
    ->j_tail_sequence, and issue REQ_FLUSH.  (With more complex changes,
    some optimizations would be possible to avoid unnecessary REQ_FLUSH
    for example though.)
    
    BTW,
    
        journal->j_tail_sequence =
                ++journal->j_transaction_sequence;
    
    Increment of ->j_transaction_sequence seems to be unnecessary, but
    ext3 does this.
    
    Signed-off-by: OGAWA Hirofumi <hirof...@mail.parknet.co.jp>
    Signed-off-by: Theodore Ts'o <ty...@mit.edu>
    Cc: sta...@vger.kernel.org
    
    ms commit: c0a2ad9 ("jbd2: fix FS corruption possibility in
    jbd2_journal_destroy() on umount path")
    
    Signed-off-by: Anatoly Stepanov <astepa...@cloudlinux.com>
---
 fs/jbd2/journal.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 6c54b78..868a923 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1406,11 +1406,12 @@ void jbd2_journal_update_sb_log_tail(journal_t 
*journal, tid_t tail_tid,
 /**
  * jbd2_mark_journal_empty() - Mark on disk journal as empty.
  * @journal: The journal to update.
+ * @write_op: With which operation should we write the journal sb
  *
  * Update a journal's dynamic superblock fields to show that journal is empty.
  * Write updated superblock to disk waiting for IO to complete.
  */
-static void jbd2_mark_journal_empty(journal_t *journal)
+static void jbd2_mark_journal_empty(journal_t *journal, int write_op)
 {
        journal_superblock_t *sb = journal->j_superblock;
 
@@ -1428,7 +1429,7 @@ static void jbd2_mark_journal_empty(journal_t *journal)
        sb->s_start    = cpu_to_be32(0);
        read_unlock(&journal->j_state_lock);
 
-       jbd2_write_superblock(journal, WRITE_FUA);
+       jbd2_write_superblock(journal, write_op);
 
        /* Log is no longer empty */
        write_lock(&journal->j_state_lock);
@@ -1704,7 +1705,13 @@ int jbd2_journal_destroy(journal_t *journal)
        if (journal->j_sb_buffer) {
                if (!is_journal_aborted(journal)) {
                        mutex_lock(&journal->j_checkpoint_mutex);
-                       jbd2_mark_journal_empty(journal);
+
+                       write_lock(&journal->j_state_lock);
+                       journal->j_tail_sequence =
+                               ++journal->j_transaction_sequence;
+                       write_unlock(&journal->j_state_lock);
+
+                       jbd2_mark_journal_empty(journal, WRITE_FLUSH_FUA);
                        mutex_unlock(&journal->j_checkpoint_mutex);
                } else
                        err = -EIO;
@@ -1956,7 +1963,7 @@ int jbd2_journal_flush(journal_t *journal)
         * the magic code for a fully-recovered superblock.  Any future
         * commits of data to the journal will restore the current
         * s_start value. */
-       jbd2_mark_journal_empty(journal);
+       jbd2_mark_journal_empty(journal, WRITE_FUA);
        mutex_unlock(&journal->j_checkpoint_mutex);
        write_lock(&journal->j_state_lock);
        J_ASSERT(!journal->j_running_transaction);
@@ -2001,7 +2008,7 @@ int jbd2_journal_wipe(journal_t *journal, int write)
        if (write) {
                /* Lock to make assertions happy... */
                mutex_lock(&journal->j_checkpoint_mutex);
-               jbd2_mark_journal_empty(journal);
+               jbd2_mark_journal_empty(journal, WRITE_FUA);
                mutex_unlock(&journal->j_checkpoint_mutex);
        }
 
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel

[Devel] [PATCH RHEL7 COMMIT] ms/jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path

Reply via email to