Re: [Ext2-devel] Re: Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-04 Thread Suparna Bhattacharya
On Thu, Mar 03, 2005 at 02:40:21AM -0700, Andreas Dilger wrote:
 On Mar 03, 2005  14:03 +0530, Suparna Bhattacharya wrote:
  diffstat of the 3 patches: 22 files changed, 5920 insertions(+), 
  47 deletions(-). The largest is the extents patch (2743 lines inserted),
  mballoc is 1968, and delalloc is 1209. To use delalloc, which gives us
  all the performance benefits, we currently need all 3 patches
  used in conjunction. Supporting extent map btrees as well 
  as traditional indexing, and the associated compatibility options,
  is perhaps the most invasive of the changes. Given that keeping ext3 
  stable and maintainable is a key concern (that is, after all, a major 
  reason why a lot of users rely on ext3), a somewhat incremental 
  approach is desirable. 
  
  So, I'll start from the direction that has been suggested by
  some -- (1) delayed allocation without changing the
  on-disk format. And then later (2) go on to breaking format with 
  all changes for scalability to larger files with full extents 
  support (haven't thought enough about this yet - maybe in a
  separate mail)
 
 Well, for starters, the extents format changes are not forced on
 users: only if they mount with -o extents and write files will
 it mark the superblock incompatible and start allocating files
 this way.  I believe (though I have never tested) that even if
 extents are enabled, writes to a block-mapped file will continue
 to work and that file will not be converted to an extent file.

Files that are created with extents will not be viewable by an older
kernel, though (I think), which is where the format breakage comes
in (is that correct?). But I don't see this as a major issue, since 
it can perhaps be taken care of through a little bit of migration 
tooling as Ted indicated. 

So, compatibility in itself wasn't the main concern bothering me, 
but rather how we could make it easier to assure stability and
maintainability even with all the cool stuff. For example, if we have
both mballoc and regular balloc, and similarly extents and regular
indexing chosen based on growth patterns (a nice idea, btw), does that
multiply the scenarios to verify on the testing front? Or in dealing
with changes in the future? I'm guessing that this might be one of the
things (besides agreement on the disk layout) holding up inclusion of
extents, despite the patches being around for a while now... but then
I could be wrong. B-tree based extent maps were mentioned by sct way
back in his 2000 paper! And of course every filesystem out there
implements B-trees in its own way.

I can see arguments flying both ways... at what point do we decide
to break towards an ext4?

BTW, has anyone played with the idea of building ext4 not as a
cp -r fs/ext3 fs/ext4 followed by edits, but, if possible, by using
layered filesystem techniques to reuse much of ext3 directly and
override just the few operations where there is a layout impact
(like get_blocks for extents)?

Alex, have you had a chance to prototype your idea of rooting extents
in ea ?

 
  A few random things that come to mind for (1), going through the code:
  
  - There might be possibilities for code reduction, by extending
    generic routines as far as possible, e.g. ext3_wb_writepages
    has a lot in common with generic writepages. That would
    also make it easier to maintain.
 
 I'm sure some support for this could be gotten from e.g. XFS as well,
 since their filesystem (on Irix at least) was all about delayed alloc
 (not sure what it does under Linux), and I believe ReiserFS/Reiser4
 also desire the ability to have delayed allocation from the VFS (i.e.
 some sort of light-weight reserve space call for each page dirtied,
 and then getting the actual file + offsets en masse later, if the
 VFS/VM doesn't discard the whole thing).

*nod*

Regards
Suparna

 
 Cheers, Andreas
 --
 Andreas Dilger
 http://sourceforge.net/projects/ext2resize/
 http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/
 



-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Ext2-devel] Re: Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-04 Thread Alex Tomas
On Fri, 4 Mar 2005 16:43:31 +0530
Suparna Bhattacharya [EMAIL PROTECTED] wrote:

 Alex, have you had a chance to prototype your idea of rooting extents
 in ea ?

I think all you need for this are:

1) allocate EA in ext3_new_inode()
2) write a replacement for ext3_init_tree_desc()
   just a few lines of code
3) write .get_write_access and .mark_buffer_dirty methods
   again, a few lines
4) use the replacement of ext3_init_tree_desc() in a few places

thanks, Alex


Re: [Ext2-devel] Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-04 Thread Alex Tomas
On 03 Mar 2005 17:12:14 -0800
Badari Pulavarty [EMAIL PROTECTED] wrote:

 One more thing we need to keep in mind: we need to make sure
 that ordered mode is also improved, since all our test code 
 focuses on writeback mode and the default mode is ordered :(
 

I've just cooked up a patch to implement ordered mode for the delayed
allocation path. Please take it:

ftp://ftp.clusterfs.com/pub/people/alex/2.6.11/ext3-delalloc-ordered-2.6.11-0.1.patch

Stephen, Andrew, could you review it, please?

thanks, Alex


Index: linux-2.6.11/include/linux/jbd.h
===================================================================
--- linux-2.6.11.orig/include/linux/jbd.h	2005-03-02 20:49:13.0 +0300
+++ linux-2.6.11/include/linux/jbd.h	2005-03-04 17:03:52.0 +0300
@@ -486,6 +486,12 @@
 	struct journal_head	*t_sync_datalist;
 
 	/*
+	 * Number of BIOs submitted in context of the transaction we
+	 * want to complete before committing
+	 */
+	atomic_t		t_bios_in_flight;
+
+	/*
 	 * Doubly-linked circular list of all forget buffers (superseded
 	 * buffers which we can un-checkpoint once this transaction commits)
 	 * [j_list_lock]
@@ -678,6 +684,9 @@
 	/* Wait queue to wait for updates to complete */
 	wait_queue_head_t	j_wait_updates;
 
+	/* Wait queue to wait for all BIOs to complete */
+	wait_queue_head_t	j_wait_bios;
+
 	/* Semaphore for locking against concurrent checkpoints */
 	struct semaphore	j_checkpoint_sem;
 
Index: linux-2.6.11/fs/jbd/commit.c
===================================================================
--- linux-2.6.11.orig/fs/jbd/commit.c	2005-03-02 20:49:09.0 +0300
+++ linux-2.6.11/fs/jbd/commit.c	2005-03-04 17:53:52.0 +0300
@@ -619,6 +620,13 @@
 	if (is_journal_aborted(journal))
 		goto skip_commit;
 
+	/*
+	 * Before the commit record, we have to wait for all bio's
+	 * ext3_wb_writepages() issued against newly-allocated blocks
+	 */
+	wait_event(journal->j_wait_bios, 
+		atomic_read(&commit_transaction->t_bios_in_flight) == 0);
+
 	/* Done it all: now write the commit record.  We should have
 	 * cleaned up our previous buffers by now, so if we are in abort
 	 * mode we can now just skip the rest of the journal write
Index: linux-2.6.11/fs/jbd/transaction.c
===================================================================
--- linux-2.6.11.orig/fs/jbd/transaction.c	2005-03-02 20:49:09.0 +0300
+++ linux-2.6.11/fs/jbd/transaction.c	2005-03-04 17:05:28.0 +0300
@@ -51,6 +51,7 @@
 	transaction->t_tid = journal->j_transaction_sequence++;
 	transaction->t_expires = jiffies + journal->j_commit_interval;
 	spin_lock_init(&transaction->t_handle_lock);
+	atomic_set(&transaction->t_bios_in_flight, 0);
 
 	/* Set up the commit timer for the new transaction. */
 	journal->j_commit_timer->expires = transaction->t_expires;
Index: linux-2.6.11/fs/jbd/journal.c
===================================================================
--- linux-2.6.11.orig/fs/jbd/journal.c	2005-03-04 17:04:29.0 +0300
+++ linux-2.6.11/fs/jbd/journal.c	2005-03-04 17:04:40.0 +0300
@@ -671,6 +671,7 @@
 	init_waitqueue_head(&journal->j_wait_checkpoint);
 	init_waitqueue_head(&journal->j_wait_commit);
 	init_waitqueue_head(&journal->j_wait_updates);
+	init_waitqueue_head(&journal->j_wait_bios);
 	init_MUTEX(&journal->j_barrier);
 	init_MUTEX(&journal->j_checkpoint_sem);
 	spin_lock_init(&journal->j_revoke_lock);
Index: linux-2.6.11/fs/ext3/writeback.c
===================================================================
--- linux-2.6.11.orig/fs/ext3/writeback.c	2005-03-04 15:10:01.0 +0300
+++ linux-2.6.11/fs/ext3/writeback.c	2005-03-04 17:33:05.0 +0300
@@ -145,6 +145,17 @@
 	if (bio->bi_size)
 		return 1;
 
+	if (bio->bi_private) {
+		transaction_t *transaction = bio->bi_private;
+
+		/* 
+		 * journal_commit_transaction() may be awaiting
+		 * the bio to complete.
+		 */
+		if (atomic_dec_and_test(&transaction->t_bios_in_flight))
+			wake_up(&transaction->t_journal->j_wait_bios);
+	}
+
 	do {
 		struct page *page = bvec->bv_page;
 
@@ -162,6 +173,16 @@
 static struct bio *ext3_wb_bio_submit(struct bio *bio, handle_t *handle)
 {
 	bio->bi_end_io = ext3_wb_end_io;
+	if (handle) {
+		/*
+		 * In data=ordered we shouldn't commit the transaction
+		 * until all data related to the transaction get on a
+		 * platter.
+		 */
+		atomic_inc(&handle->h_transaction->t_bios_in_flight);
+		bio->bi_private = handle->h_transaction;
+	} else
+   

Re: [Ext2-devel] Re: Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-04 Thread Andreas Dilger
On Mar 04, 2005  15:29 +0300, Alex Tomas wrote:
 On Fri, 4 Mar 2005 16:43:31 +0530
 Suparna Bhattacharya [EMAIL PROTECTED] wrote:
 
  Alex, have you had a chance to prototype your idea of rooting extents
  in ea ?
 
 I think all you need for this are:
 
 1) allocate EA in ext3_new_inode()
 2) write a replacement for ext3_init_tree_desc()
    just a few lines of code
 3) write .get_write_access and .mark_buffer_dirty methods
    again, a few lines
 4) use the replacement of ext3_init_tree_desc() in a few places

This should of course only be done for large inodes.  Also, at some
point it will consume all of the EA space and we need to use an
external block.  It might help in some middle cases (i.e. files with
more extents than can fit in i_block (60 bytes), but fewer than fit
into the large inode space (128 or maybe 384 bytes)), but it might
also hurt other things if we need to allocate an EA block for another
EA...

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/


