Delayed alloc for ordered-mode

Suparna Bhattacharya Sun, 13 Mar 2005 06:32:12 -0800

What would be really nice is if we could do this in a way that
enables reuse of the generic paths even for ordered mode. One thought
that comes to mind is to have the journal commit wait for writeback
to complete on the data pages that must be flushed to disk before the
metadata can be committed, much as we do for O_SYNC.


I realise that JBD is intended to work at a level of abstraction
where it has no awareness of filesystems - hence it deals in buffer
heads throughout. So would the above be a complete no-no?
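To make the suggestion above concrete, here is a minimal userspace sketch, not JBD code. It assumes each data page carries a writeback bit (as page cache pages do) and that the committing thread is handed the list of pages its transaction depends on. `struct sim_page`, `sim_start_writeback()`, `sim_end_writeback()`, `sim_io_complete()` and `sim_commit_wait_for_data()` are hypothetical names, and a single pthread mutex/condition variable stands in for the kernel's page wait queues.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page cache page: only the writeback bit. */
struct sim_page {
    bool under_writeback;
};

/* One lock/condvar pair guards every page's writeback bit in this model. */
static pthread_mutex_t wb_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wb_done = PTHREAD_COND_INITIALIZER;

/* Mark a page as under writeback when its data I/O is submitted. */
void sim_start_writeback(struct sim_page *page)
{
    pthread_mutex_lock(&wb_lock);
    page->under_writeback = true;
    pthread_mutex_unlock(&wb_lock);
}

/* I/O completion: clear the bit and wake anyone waiting on it. */
void sim_end_writeback(struct sim_page *page)
{
    pthread_mutex_lock(&wb_lock);
    assert(page->under_writeback);   /* completing idle I/O is a bug */
    page->under_writeback = false;
    pthread_cond_broadcast(&wb_done);
    pthread_mutex_unlock(&wb_lock);
}

/* Thread body standing in for the device completing one page's I/O. */
void *sim_io_complete(void *arg)
{
    sim_end_writeback(arg);
    return NULL;
}

/*
 * Commit side: before the commit record is written, wait until every
 * data page the transaction depends on has left writeback, much as an
 * O_SYNC write waits on the pages it dirtied.
 */
void sim_commit_wait_for_data(struct sim_page **pages, size_t npages)
{
    pthread_mutex_lock(&wb_lock);
    for (size_t i = 0; i < npages; i++)
        while (pages[i]->under_writeback)
            pthread_cond_wait(&wb_done, &wb_lock);
    pthread_mutex_unlock(&wb_lock);
}
```

In this model the caller must supply the page list; that is exactly the filesystem-level knowledge which, per the question above, JBD is designed not to have.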

Regards
Suparna

On Fri, Mar 04, 2005 at 06:02:35PM +0300, Alex Tomas wrote:
> On 03 Mar 2005 17:12:14 -0800
> Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> 
> > One more thing we need to keep in mind: we need to make sure
> > that "ordered" mode is also improved, since all our test code
> > focuses on "writeback" mode and the default mode is "ordered" :(
> > 
> 
> I've just cooked up a patch to implement ordered mode for the delayed
> allocation path. Please take it:
> 
> ftp://ftp.clusterfs.com/pub/people/alex/2.6.11/ext3-delalloc-ordered-2.6.11-0.1.patch
> 
> Stephen, Andrew, could you review it, please?
> 
> thanks, Alex
> 
> 
> Index: linux-2.6.11/include/linux/jbd.h
> ===================================================================
> --- linux-2.6.11.orig/include/linux/jbd.h     2005-03-02 20:49:13.000000000 +0300
> +++ linux-2.6.11/include/linux/jbd.h  2005-03-04 17:03:52.000000000 +0300
> @@ -486,6 +486,12 @@
>       struct journal_head     *t_sync_datalist;
>  
>       /*
> +      * Number of BIOs submitted in the context of this transaction
> +      * that must complete before it can be committed
> +      */
> +      atomic_t               t_bios_in_flight;
> +
> +     /*
>        * Doubly-linked circular list of all forget buffers (superseded
>        * buffers which we can un-checkpoint once this transaction commits)
>        * [j_list_lock]
> @@ -678,6 +684,9 @@
>       /* Wait queue to wait for updates to complete */
>       wait_queue_head_t       j_wait_updates;
>  
> +     /* Wait queue to wait for all BIOs to complete */
> +     wait_queue_head_t       j_wait_bios;
> +
>       /* Semaphore for locking against concurrent checkpoints */
>       struct semaphore        j_checkpoint_sem;
>  
> Index: linux-2.6.11/fs/jbd/commit.c
> ===================================================================
> --- linux-2.6.11.orig/fs/jbd/commit.c 2005-03-02 20:49:09.000000000 +0300
> +++ linux-2.6.11/fs/jbd/commit.c      2005-03-04 17:53:52.000000000 +0300
> @@ -619,6 +620,13 @@
>       if (is_journal_aborted(journal))
>               goto skip_commit;
>  
> +     /*
> +      * Before the commit record, wait for completion of all bios
> +      * ext3_wb_writepages() issued against newly-allocated blocks
> +      */
> +     wait_event(journal->j_wait_bios, 
> +             atomic_read(&commit_transaction->t_bios_in_flight) == 0);
> +
>       /* Done it all: now write the commit record.  We should have
>        * cleaned up our previous buffers by now, so if we are in abort
>        * mode we can now just skip the rest of the journal write
> Index: linux-2.6.11/fs/jbd/transaction.c
> ===================================================================
> --- linux-2.6.11.orig/fs/jbd/transaction.c    2005-03-02 20:49:09.000000000 +0300
> +++ linux-2.6.11/fs/jbd/transaction.c 2005-03-04 17:05:28.000000000 +0300
> @@ -51,6 +51,7 @@
>       transaction->t_tid = journal->j_transaction_sequence++;
>       transaction->t_expires = jiffies + journal->j_commit_interval;
>       spin_lock_init(&transaction->t_handle_lock);
> +     atomic_set(&transaction->t_bios_in_flight, 0);
>  
>       /* Set up the commit timer for the new transaction. */
>       journal->j_commit_timer->expires = transaction->t_expires;
> Index: linux-2.6.11/fs/jbd/journal.c
> ===================================================================
> --- linux-2.6.11.orig/fs/jbd/journal.c        2005-03-04 17:04:29.000000000 +0300
> +++ linux-2.6.11/fs/jbd/journal.c     2005-03-04 17:04:40.000000000 +0300
> @@ -671,6 +671,7 @@
>       init_waitqueue_head(&journal->j_wait_checkpoint);
>       init_waitqueue_head(&journal->j_wait_commit);
>       init_waitqueue_head(&journal->j_wait_updates);
> +     init_waitqueue_head(&journal->j_wait_bios);
>       init_MUTEX(&journal->j_barrier);
>       init_MUTEX(&journal->j_checkpoint_sem);
>       spin_lock_init(&journal->j_revoke_lock);
> Index: linux-2.6.11/fs/ext3/writeback.c
> ===================================================================
> --- linux-2.6.11.orig/fs/ext3/writeback.c     2005-03-04 15:10:01.000000000 +0300
> +++ linux-2.6.11/fs/ext3/writeback.c  2005-03-04 17:33:05.000000000 +0300
> @@ -145,6 +145,17 @@
>       if (bio->bi_size)
>               return 1;
>  
> +     if (bio->bi_private) {
> +             transaction_t *transaction = bio->bi_private;
> +
> +             /* 
> +              * journal_commit_transaction() may be awaiting
> +              * the bio to complete.
> +              */
> +             if (atomic_dec_and_test(&transaction->t_bios_in_flight))
> +                     wake_up(&transaction->t_journal->j_wait_bios);
> +     }
> +
>       do {
>               struct page *page = bvec->bv_page;
>  
> @@ -162,6 +173,16 @@
>  static struct bio *ext3_wb_bio_submit(struct bio *bio, handle_t *handle)
>  {
>       bio->bi_end_io = ext3_wb_end_io;
> +     if (handle) {
> +             /*
> +              * In data=ordered mode we must not commit the transaction
> +              * until all data related to the transaction has reached
> +              * the platter.
> +              */
> +             atomic_inc(&handle->h_transaction->t_bios_in_flight);
> +             bio->bi_private = handle->h_transaction;
> +     } else
> +             bio->bi_private = NULL;
>       submit_bio(WRITE, bio);
>       return NULL;
>  }
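The accounting the patch adds (count a bio against `h_transaction` in `ext3_wb_bio_submit()`, drop the count in `ext3_wb_end_io()`, and have `journal_commit_transaction()` sleep on `j_wait_bios` until `t_bios_in_flight` reaches zero) can be modelled in userspace roughly as below. This is a sketch under stated assumptions, not the patch itself: the `sim_*` types and functions are hypothetical, C11 atomics stand in for `atomic_t`, and a pthread mutex/condition variable stands in for `wait_event()`/`wake_up()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for journal_t and transaction_t. */
struct sim_journal {
    pthread_mutex_t lock;      /* serialises sleeping and waking */
    pthread_cond_t wait_bios;  /* plays the role of j_wait_bios */
};

struct sim_transaction {
    struct sim_journal *journal; /* plays the role of t_journal */
    atomic_int bios_in_flight;   /* plays the role of t_bios_in_flight */
};

/* A bio reduced to its bi_private pointer: owning transaction or NULL. */
struct sim_bio {
    struct sim_transaction *bi_private;
};

/* Submit side, after ext3_wb_bio_submit(): count before I/O starts. */
void sim_bio_submit(struct sim_bio *bio, struct sim_transaction *t)
{
    if (t)
        atomic_fetch_add(&t->bios_in_flight, 1);
    bio->bi_private = t;
    /* A real submit path would now hand the bio to the block layer. */
}

/* Completion side, after ext3_wb_end_io(). */
void sim_bio_end_io(struct sim_bio *bio)
{
    struct sim_transaction *t = bio->bi_private;
    if (!t)
        return;                          /* not tied to a transaction */
    struct sim_journal *j = t->journal;  /* read before the decrement */
    int prev = atomic_fetch_sub(&t->bios_in_flight, 1);
    assert(prev > 0);
    if (prev == 1) {
        /* Last bio: wake under the lock so the wakeup cannot be lost. */
        pthread_mutex_lock(&j->lock);
        pthread_cond_broadcast(&j->wait_bios);
        pthread_mutex_unlock(&j->lock);
    }
}

/* Thread body standing in for interrupt-time bio completion. */
void *sim_io_thread(void *arg)
{
    sim_bio_end_io(arg);
    return NULL;
}

/* Commit side: sleep until every counted bio has completed. */
void sim_commit_wait_bios(struct sim_transaction *t)
{
    struct sim_journal *j = t->journal;
    pthread_mutex_lock(&j->lock);
    while (atomic_load(&t->bios_in_flight) != 0)
        pthread_cond_wait(&j->wait_bios, &j->lock);
    pthread_mutex_unlock(&j->lock);
}
```

Two details are deliberate in this model. The completion path reads the journal pointer before its decrement, so it never touches the transaction after the count may have released the committer. The broadcast is issued under the same mutex the waiter checks the count under, which is what prevents a lost wakeup with condition variables; in the kernel, `wait_event()` and `wake_up()` handle that ordering internally.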

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India
