Re: Patch queue update

2008-01-24 Thread Mingming Cao
On Thu, 2008-01-24 at 20:20 +0530, Aneesh Kumar K.V wrote:
> I have updated patches based on the review feedback from Andrew.
> 
> I have tested this on 
> 128(64p) ppc64sles
> 4(2p)ppc64  debian
> 4(2p)  x86_64 ubuntu-gutsy
> 
> Updated patches are at
> http://www.radian.org/~kvaneesh/ext4/jan-24-2008/
> http://www.radian.org/~kvaneesh/ext4/jan-24-2008/patches.tar
> 

Thanks, updated ext4 patch queue with your changes. And fixed the
checkpatch warnings with mballoc-core.patch.

Mingming

> Diff for reference
> 
> diff --git a/Documentation/filesystems/ext4.txt 
> b/Documentation/filesystems/ext4.txt
> index 4f329af..ec7d349 100644
> --- a/Documentation/filesystems/ext4.txt
> +++ b/Documentation/filesystems/ext4.txt
> @@ -89,6 +89,8 @@ When mounting an ext4 filesystem, the following option are 
> accepted:
>  extents  ext4 will use extents to address file data.  The
>   file system will no longer be mountable by ext3.
> 
> +noextentsext4 will not use extents for new files created.
> +
>  journal_checksum Enable checksumming of the journal transactions.
>   This will allow the recovery code in e2fsck and the
>   kernel to detect corruption in the kernel.  It is a
> @@ -206,6 +208,10 @@ nobh (a) cache disk block mapping 
> information
>   "nobh" option tries to avoid associating buffer
>   heads (supported only for "writeback" mode).
> 
> +mballoc  (*) Use the mutliblock allocator for block 
> allocation
> +nomballocdisabled multiblock allocator for block allocation.
> +stripe=n filesystem blocks per stripe for a RAID configuration.
> +
> 
>  Data Mode
>  -
> diff --git a/Documentation/filesystems/proc.txt 
> b/Documentation/filesystems/proc.txt
> index dec9945..4413a2d 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -857,6 +857,45 @@ CPUs.
>  The   "procs_blocked" line gives  the  number of  processes currently 
> blocked,
>  waiting for I/O to complete.
> 
> +1.9 Ext4 file system parameters
> +--
> +Ext4 file system have one directory per partition under /proc/fs/ext4/
> +# ls /proc/fs/ext4/hdc/
> +group_prealloc  max_to_scan  mb_groups  mb_history  min_to_scan  order2_req
> +stats  stream_req
> +
> +mb_groups:
> +This file gives the details of mutiblock allocator buddy cache of free blocks
> +
> +mb_history:
> +Multiblock allocation history.
> +
> +stats:
> +This file indicate whether the multiblock allocator should start collecting
> +statistics. The statistics are shown during unmount
> +
> +group_prealloc:
> +The multiblock allocator normalize the block allocation request to
> +group_prealloc filesystem blocks if we don't have strip value set.
> +The stripe value can be specified at mount time or during mke2fs.
> +
> +max_to_scan:
> +How long multiblock allocator can look for a best extent (in found extents)
> +
> +min_to_scan:
> +How long multiblock allocator  must look for a best extent
> +
> +order2_req:
> +Multiblock allocator use  2^N search using buddies only for requests greater
> +than or equal to order2_req. The request size is specfied in file system
> +blocks. A value of 2 indicate only if the requests are greater than or equal
> +to 4 blocks.
> +
> +stream_req:
> +Files smaller than stream_req are served by the stream allocator, whose
> +purpose is to pack requests as close each to other as possible to
> +produce smooth I/O traffic. Avalue of 16 indicate that file smaller than 16
> +filesystem block size will use group based preallocation.
> 
>  
> --
>  Summary
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 982cf1a..921eeec 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3232,19 +3232,21 @@ int bh_uptodate_or_lock(struct buffer_head *bh)
>   return 1;
>  }
>  EXPORT_SYMBOL(bh_uptodate_or_lock);
> +
>  /**
>   * bh_submit_read: Submit a locked buffer for reading
>   * @bh: struct buffer_head
>   *
> - * Returns a negative error
> + * Returns zero on success and -EIO on error.
>   */
>  int bh_submit_read(struct buffer_head *bh)
>  {
> - if (!buffer_locked(bh))
> - lock_buffer(bh);
> + BUG_ON(!buffer_locked(bh));
> 
> - if (buffer_uptodate(bh))
> + if (buffer_uptodate(bh)) {
> + unlock_buffer(bh);
>   return 0;
> + }
> 
>   get_bh(bh);
>   bh->b_end_io = end_buffer_read_sync;
> @@ -3255,6 +3257,7 @@ int bh_submit_read(struct buffer_head *bh)
>   return -EIO;
>  }
>  EXPORT_SYMBOL(bh_submit_read);
> +
>  void __init buffer_init(void)
>  {
>   int nrpages;
> diff --git a/fs/ext4/defrag.c b/fs/ext4/defrag.c
> index 4ef3dc0..0d76c74 100644
> --- a/fs/ext4/defrag.c
> +++ b/fs/ext4/defrag.c
> @@ -30,14 +30,6 @@ ext4_fsblk_t idx_pblock(struct ext4_extent

Re: Patch queue update

2008-01-24 Thread Eric Sandeen
Andreas Dilger wrote:
> On Jan 24, 2008  20:20 +0530, Aneesh Kumar K.V wrote:
>> @@ -89,6 +89,8 @@ When mounting an ext4 filesystem, the following option are 
>> accepted:
>>  extents ext4 will use extents to address file data.  The
>>  file system will no longer be mountable by ext3.
>>  
>> +noextents   ext4 will not use extents for new files created.
>> +
> 
> s/new files created/newly created files/

Would a blurb about keeping ext3 disk-format compatibility be worthwhile
here?

>>  journal_checksumEnable checksumming of the journal transactions.
>>  This will allow the recovery code in e2fsck and the
>>  kernel to detect corruption in the kernel.  It is a
>> @@ -206,6 +208,10 @@ nobh(a) cache disk block mapping 
>> information
>>  "nobh" option tries to avoid associating buffer
>>  heads (supported only for "writeback" mode).
>>  
>> +mballoc (*) Use the mutliblock allocator for block 
>> allocation

spling on "mutliblock" too :)

-Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch queue update

2008-01-24 Thread Andreas Dilger
On Jan 24, 2008  20:20 +0530, Aneesh Kumar K.V wrote:
> @@ -89,6 +89,8 @@ When mounting an ext4 filesystem, the following option are 
> accepted:
>  extents  ext4 will use extents to address file data.  The
>   file system will no longer be mountable by ext3.
>  
> +noextentsext4 will not use extents for new files created.
> +

s/new files created/newly created files/

>  journal_checksum Enable checksumming of the journal transactions.
>   This will allow the recovery code in e2fsck and the
>   kernel to detect corruption in the kernel.  It is a
> @@ -206,6 +208,10 @@ nobh (a) cache disk block mapping 
> information
>   "nobh" option tries to avoid associating buffer
>   heads (supported only for "writeback" mode).
>  
> +mballoc  (*) Use the mutliblock allocator for block 
> allocation
> +nomballocdisabled multiblock allocator for block allocation.
> +stripe=n filesystem blocks per stripe for a RAID configuration.

Please provide a more verbose description of what a "stripe" is, since the
RAID terminology is sadly vague.  Something like "number of filesystem blocks 
that mballoc will try to use for allocation size and alignment.  For RAID5/6
systems this should be the number of data disks * number of filesystem blocks
per data disk."


> @@ -3948,9 +3942,8 @@ repeat:
>   spin_unlock(&pa->pa_lock);
>   spin_unlock(&ei->i_prealloc_lock);
>   printk(KERN_ERR "uh-oh! used pa while discarding\n");
> - dump_stack();
> - current->state = TASK_UNINTERRUPTIBLE;
> - schedule_timeout(HZ);
> + WARN_ON(1);

This printk and dump stack can just go away, we have removed it from our
mballoc patch as well because it was only needed for determining how often
this condition is hit and is otherwise useless.

> @@ -577,14 +529,12 @@ err_out:
>*
>* FIXME!! we may be touching bitmaps in different block groups.
>*/
> -
>   if (ext4_journal_extend(handle,
>   4 + 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb)) != 0) {
>  
>   ext4_journal_restart(handle,
>   4 + 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb));
>   }
> -

There don't actually need to be braces here either.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch queue update

2008-01-10 Thread Aneesh Kumar K.V
On Thu, Jan 10, 2008 at 02:43:23PM -0700, Andreas Dilger wrote:
> On Jan 10, 2008  21:03 +0530, Aneesh Kumar K.V wrote:
> > if (i >= sbi->s_mb_order2_reqs) {
> > -   i--;
> > -   if ((ac->ac_g_ex.fe_len & (~(1 << i))) == 0)
> > +   /*
> > +* This should tell if fe_len is exactly power of 2
> > +*/
> > +   if ((ac->ac_g_ex.fe_len & (~(1 << (i - 1 == 0)
> > ac->ac_2order = i;
> 
> While you changed i to (i - 1) in the "if" you didn't change it when
> setting ac_2order...  Is that incorrect?

Yes that ac_2order should be i - 1;
Will fix it in the next update.

I see that the patch queue update doesn't have most of the changes I
have placed at http://www.radian.org/~kvaneesh/ext4/jan-10-2008-ver2/

> 
> > /*
> > +* Yield the CPU here so that we don't get soft lockup
> >  */
> > -   schedule_timeout(HZ);
> > +   schedule();
> > goto repeat;
> > }
> >  
> > @@ -3808,7 +3820,7 @@ repeat:
> > printk(KERN_ERR "uh-oh! used pa while discarding\n");
> > dump_stack();
> > current->state = TASK_UNINTERRUPTIBLE;
> > -   schedule();
> > +   schedule_timeout(HZ);
> > goto repeat;
> 
> Is this change to schedule_timeout() intentional?  The earlier code is
> removing the use of schedule_timeout.  I could be wrong, as I didn't
> follow this discussion closely, but sometimes changes like this happen
> accidentally and people don't look at the patch itself...


The patch queue had it modified from schedule_timeout to schedule(). I
am moving it back to the original version. If we have set the task state
to TASK_UNINTERRUPTIBLE it should be schedule_timeout. And at these
place we intent to wait uninterrupted for 1 sec. The place where we
wanted to just yield is ext4_mb_discard_group_preallocations.


> > +static unsigned long ext4_get_stripe_size(struct ext4_sb_info *sbi)
> > +{
> > +   unsigned long stride = le16_to_cpu(sbi->s_es->s_raid_stride);
> > +   unsigned long stripe_width = 
> > le32_to_cpu(sbi->s_es->s_raid_stripe_width);
> > +
> > +   if (sbi->s_stripe && sbi->s_stripe <= sbi->s_blocks_per_group) {
> > +   return sbi->s_stripe;
> > +   } else if (stripe_width <= sbi->s_blocks_per_group) {
> > +   return stripe_width;
> > +   } else if (stride <= sbi->s_blocks_per_group) {
> > +   return stride;
> > +   }
> 
> If you are doing "return XXX" you don't need "else".
> 
> > +   /*
> > +* set the stripe size. If we have specified it via mount option, then
> > +* use the mount option value. If the value specified at mount time is
> > +* greater than the blocks per group use the super block value.
> > +* Allocator needs it be less than blocks per group.
> > +*/
> > +   sbi->s_stripe = ext4_get_stripe_size(sbi);
> 
> This comment should probably go by ext4_get_stripe_size() definition instead
> of here at the caller.

Will move that to the function definition.

-aneesh
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch queue update

2008-01-10 Thread Andreas Dilger
On Jan 10, 2008  21:03 +0530, Aneesh Kumar K.V wrote:
>   if (i >= sbi->s_mb_order2_reqs) {
> - i--;
> - if ((ac->ac_g_ex.fe_len & (~(1 << i))) == 0)
> + /*
> +  * This should tell if fe_len is exactly power of 2
> +  */
> + if ((ac->ac_g_ex.fe_len & (~(1 << (i - 1 == 0)
>   ac->ac_2order = i;

While you changed i to (i - 1) in the "if" you didn't change it when
setting ac_2order...  Is that incorrect?

>   /*
> +  * Yield the CPU here so that we don't get soft lockup
>*/
> - schedule_timeout(HZ);
> + schedule();
>   goto repeat;
>   }
>  
> @@ -3808,7 +3820,7 @@ repeat:
>   printk(KERN_ERR "uh-oh! used pa while discarding\n");
>   dump_stack();
>   current->state = TASK_UNINTERRUPTIBLE;
> - schedule();
> + schedule_timeout(HZ);
>   goto repeat;

Is this change to schedule_timeout() intentional?  The earlier code is
removing the use of schedule_timeout.  I could be wrong, as I didn't
follow this discussion closely, but sometimes changes like this happen
accidentally and people don't look at the patch itself...

> +static unsigned long ext4_get_stripe_size(struct ext4_sb_info *sbi)
> +{
> + unsigned long stride = le16_to_cpu(sbi->s_es->s_raid_stride);
> + unsigned long stripe_width = 
> le32_to_cpu(sbi->s_es->s_raid_stripe_width);
> +
> + if (sbi->s_stripe && sbi->s_stripe <= sbi->s_blocks_per_group) {
> + return sbi->s_stripe;
> + } else if (stripe_width <= sbi->s_blocks_per_group) {
> + return stripe_width;
> + } else if (stride <= sbi->s_blocks_per_group) {
> + return stride;
> + }

If you are doing "return XXX" you don't need "else".

> + /*
> +  * set the stripe size. If we have specified it via mount option, then
> +  * use the mount option value. If the value specified at mount time is
> +  * greater than the blocks per group use the super block value.
> +  * Allocator needs it be less than blocks per group.
> +  */
> + sbi->s_stripe = ext4_get_stripe_size(sbi);

This comment should probably go by ext4_get_stripe_size() definition instead
of here at the caller.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html