Re: [2.6.25-rc2-mm1] Oops in __kmalloc

2008-02-26 Thread Andrew Morton
On Tue, 26 Feb 2008 16:18:55 +0100 Jiri Slaby <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> while booting up a notebook on 32 bit, this oopses appeared on the console
> after ext3 fsck:
> http://www.fi.muni.cz/~xslaby/sklad/mem_oops/
> 
> It's 2.6.25-rc2-mm1, I can't find similar reports, is this known or hardware
> issue (unlikely, 2.6.24.2 seems to be OK)?

I don't recall seeing a similar report and yes, it'll be a kernel bug.

We've fixed a few things and it could be that this will just go away in
next -mm.  If it doesn't, a bisection search would be good, thanks.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] make ext{3,4}_xattr_list() static

2008-02-18 Thread Andrew Morton
On Sun, 17 Feb 2008 10:18:23 +0200 Adrian Bunk <[EMAIL PROTECTED]> wrote:

>  fs/ext3/xattr.c |4 +++-
>  fs/ext3/xattr.h |7 ---
>  fs/ext4/xattr.c |4 +++-
>  fs/ext4/xattr.h |7 ---

one patch per fs, please.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Linux 2.6.25-rc2 - Kernel Ooops while running dbench

2008-02-18 Thread Andrew Morton
On Sat, 16 Feb 2008 11:14:46 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:

> The 2.6.25-rc2 kernel oopses while running dbench on ext3 filesystem
> mounted with mount -o data=writeback,nobh option on the x86_64 box
> 
> BUG: unable to handle kernel NULL pointer dereference at 
> IP: [] kmem_cache_alloc+0x3a/0x6c
> PGD 1f6860067 PUD 1f5d64067 PMD 0 
> Oops:  [1] SMP 
> CPU 3 
> Modules linked in:
> Pid: 4271, comm: dbench Not tainted 2.6.25-rc2-autotest #1
> RIP: 0010:[]  [] 
> kmem_cache_alloc+0x3a/0x6c
> RSP: :8101fb041dc8  EFLAGS: 00010246
> RAX:  RBX: 810180033c00 RCX: 8027b269
> RDX:  RSI: 80d0 RDI: 80632d70
> RBP: 80d0 R08: 0001 R09: 
> R10: 8101feb36e50 R11: 0190 R12: 0001
> R13:  R14: 8101f8f38000 R15: ff9c
> FS:  () GS:8101fff0f000(0063) knlGS:f7e41460
> CS:  0010 DS: 002b ES: 002b CR0: 80050033
> CR2:  CR3: 0001f562 CR4: 06e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: 0ff0 DR7: 0400
> Process dbench (pid: 4271, threadinfo 8101fb04, task 8101fb18)
> Stack:  0001 8101fb041ea8 0001 8027b269
>  8101fb041ea8 80281fe8 0001 
>  8101fb041ea8 ff9c 000b 0001
> Call Trace:
>  [] get_empty_filp+0x55/0xf9
>  [] __path_lookup_intent_open+0x22/0x8f
>  [] open_namei+0x86/0x5a7
>  [] vfs_stat_fd+0x3c/0x4a
>  [] do_filp_open+0x1c/0x3d
>  [] get_unused_fd_flags+0x79/0x111
>  [] do_sys_open+0x46/0xca
>  [] ia32_sysret+0x0/0xa
> 

Looks to me like we broke slab.  Christoph is offline until the 27th..
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH resend] ext2/3/4: convert byte order of constant instead of variable

2008-02-14 Thread Andrew Morton
On Sun, 10 Feb 2008 11:10:15 +0100
Marcin Slusarz <[EMAIL PROTECTED]> wrote:

>  fs/ext2/super.c |8 +++-
>  fs/ext3/super.c |2 +-
>  fs/ext4/super.c |2 +-

Please don't bundle the filesystem patches in this manner.  I split
it into three patches.

Thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 9911] New: fsync blocks concurrent writes to file

2008-02-07 Thread Andrew Morton
On Thu,  7 Feb 2008 17:40:57 -0800 (PST) [EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9911
> 
>Summary: fsync blocks concurrent writes to file
>Product: File System
>Version: 2.5
>  KernelVersion: 2.6.23.8
>   Platform: All
> OS/Version: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: ext3
> AssignedTo: [EMAIL PROTECTED]
> ReportedBy: [EMAIL PROTECTED]
> 
> 
> Latest working kernel version: none
> Earliest failing kernel version: 2.6.18
> Distribution: RHEL 4 & Fedora 8
> Hardware Environment: Dual Xeon & dual-core Athlon
> Software Environment: Java 1.5.0_14
> Problem Description: Multiple threads append transactions to a single file.
> When fsyncs are issued, no writes complete until the fsync completes. This has
> the effect of yielding the same write rate regardless of how many threads are
> writing to the file. Solaris 10 does not exhibit this problem and scales well
> as additional threads are added.
> 
> I have confirmed this by running strace and witnessing that writes block until
> immediately after the fsync completes.
> 
> When using truss on Solaris I have witnessed multiple writes completing while
> an fsync was in progress.
> 
> Steps to reproduce:
> 
> Open a file for writing.
> Launch multiple threads with each one writing data and followed by an fsync,
> but only if data has been written since the last fsync.
> 
> 
> I have googled this issue substantially and have found no answers.
> 

I supposing teaching java about sync_file_range() would be all too hard.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-05 Thread Andrew Morton
On Tue, 05 Feb 2008 17:02:20 -0800 Mingming Cao <[EMAIL PROTECTED]> wrote:

> On Tue, 2008-02-05 at 13:26 -0800, Mingming Cao wrote:
> > On Mon, 2008-02-04 at 10:00 -0500, Theodore Tso wrote:
> > > On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
> > > > On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso <[EMAIL PROTECTED]> 
> > > > wrote:
> > > > 
> > > > > On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
> > > > > > When I merge David's iget coversion patches this will instead wreck 
> > > > > > the
> > > > > > ext4 patchset.
> > > > > 
> > > > > That's ok, it shouldn't be hard for me to fix this up.  How quickly
> > > > > will you be able to merge David's iget converstion patches?
> > > > 
> > > > They're about 1,000 patches back.
> > > 
> > > OK, if you're not planning on pushing David's changes to Linus right
> > > away, what if I pull in David's
> > > 
> > >   iget-stop-ext4-from-using-iget-and-read_inode-try.patch 
> > > 
> > > and push it plus some other ext4 bug fixes directly to Linus, and let
> > > you know when that has happened so you can drop David's patch from
> > > your queue?
> > > 
> > > David's changes to ext4 can be applied standalone without the rest of
> > > his series, so it would be safe to push that to Linus independently
> > > and in advance of the rest of his series.  
> > 
> > I get compile error when builing ext4 patch queue with
> > iget-stop-ext4-from-using-iget-and-read_inode-try.patch applied, against
> > 2.6.24-git14.
> > 
> > It seems iget-stop-ext4-from-using-iget-and-read_inode-try.patch depends
> > on patches:  
> > [PATCH 01/31] Add an ERR_CAST() macro to complement ERR_PTR and co.
> > [PATCH 03/32] IGET: Introduce a function to register iget failure
> 
> It seems to me the easiest way to bring ext4 patches back to mm tree, is
> to carry above two patches in ext4 patch queue, like we did for other
> ext4 patches that depend on generic code in the past.

It doesn't matter a lot because I won't be doing another -mm until all this
is merged up anyway.

> So I added above two patches to ext4 patch queue, now that ext4 patches
> could apply cleanly to linus git tree, and Andrew should able to easily
> pull ext4 patches after removing the duplicated patches. 
> 
> Ted, I have the ext4 patch queue updated for this. 

This means that I merge part of the iget patch series, then twiddle thumbs
until the ext4 tree merges, then merge the remainder of the iget series.

So unless Ted intends to merge RSN I think it'd be preferable if I were to
just merge the lot, sorry.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: merge plans, was Re: - disable-ext4.patch removed from -mm tree

2008-02-05 Thread Andrew Morton
On Tue, 5 Feb 2008 10:44:54 +0100 Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Mon, Feb 04, 2008 at 02:35:29PM -0800, Andrew Morton wrote:
> > On Mon, 4 Feb 2008 15:24:18 -0500
> > Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> > 
> > > On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
> > > > On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso <[EMAIL PROTECTED]> 
> > > > wrote:
> > > > 
> > > > > On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
> > > > > > When I merge David's iget coversion patches this will instead wreck 
> > > > > > the
> > > > > > ext4 patchset.
> > > > > 
> > > > > That's ok, it shouldn't be hard for me to fix this up.  How quickly
> > > > > will you be able to merge David's iget converstion patches?
> > > > 
> > > > They're about 1,000 patches back
> > > 
> > > Care to post a merge plan so we have a slight chance to make sure not
> > > too much crap is hiding in these 1000 patches?
> > 
> > Pretty much everything up to
> > 
> > #
> > # end
> > #
> > reiser4-sb_sync_inodes.patch
> 
> That includes the git trees?

No, I don't merge git trees.

>  Defintive NACK to unionfs.

Agree, but it'd be nice to get some movement and resolution here.  The
thing's actively and enthusiastically maintained and is apparently useful. 
There's not much point in allowing developers to expend cycles on something
which doesn't have a future.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: merge plans, was Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Andrew Morton
On Mon, 4 Feb 2008 15:24:18 -0500
Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
> > On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso <[EMAIL PROTECTED]> wrote:
> > 
> > > On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
> > > > When I merge David's iget coversion patches this will instead wreck the
> > > > ext4 patchset.
> > > 
> > > That's ok, it shouldn't be hard for me to fix this up.  How quickly
> > > will you be able to merge David's iget converstion patches?
> > 
> > They're about 1,000 patches back
> 
> Care to post a merge plan so we have a slight chance to make sure not
> too much crap is hiding in these 1000 patches?

Pretty much everything up to

#
# end
#
reiser4-sb_sync_inodes.patch


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Andrew Morton
On Mon, 4 Feb 2008 10:00:44 -0500 Theodore Tso <[EMAIL PROTECTED]> wrote:

> On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
> > On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso <[EMAIL PROTECTED]> wrote:
> > 
> > > On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
> > > > When I merge David's iget coversion patches this will instead wreck the
> > > > ext4 patchset.
> > > 
> > > That's ok, it shouldn't be hard for me to fix this up.  How quickly
> > > will you be able to merge David's iget converstion patches?
> > 
> > They're about 1,000 patches back.
> 
> OK, if you're not planning on pushing David's changes to Linus right
> away, what if I pull in David's
> 
>   iget-stop-ext4-from-using-iget-and-read_inode-try.patch 
> 
> and push it plus some other ext4 bug fixes directly to Linus, and let
> you know when that has happened so you can drop David's patch from
> your queue?

Sure, go for it.

> David's changes to ext4 can be applied standalone without the rest of
> his series, so it would be safe to push that to Linus independently
> and in advance of the rest of his series.  That should also help
> reduce the number of inter-patch queue dependencies.

That patch series is kind of logjammed anyway because it breaks isofs. 
Last time I discussed this with David he seemed to find this amusing rather
than an urgent problem.  I'd drop the whole lot if there weren't lots of
other patches dependent upon them.  Mayeb I can do a selective droppage,
but I hate going out-of-order, and merging untested patch combinations.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-03 Thread Andrew Morton
On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso <[EMAIL PROTECTED]> wrote:

> On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
> > When I merge David's iget coversion patches this will instead wreck the
> > ext4 patchset.
> 
> That's ok, it shouldn't be hard for me to fix this up.  How quickly
> will you be able to merge David's iget converstion patches?

They're about 1,000 patches back.

>  Could
> this get pushed to Linus, say, tomorrow?

No.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-03 Thread Andrew Morton
On Sun, 03 Feb 2008 12:18:35 -0800 [EMAIL PROTECTED] wrote:

> 
> The patch titled
>  disable-ext4
> has been removed from the -mm tree.  Its filename was
>  disable-ext4.patch

I dropped the entire ext4 patch series, because the newly-added
convert-to-iget_locked patch has wrecked the iget-coversion patch which I
have queued.  Please do not merge that patch under my feet.

When I merge David's iget coversion patches this will instead wreck the
ext4 patchset.

Dunno what to do about this sort of thing, apart from encouraging
developers to take a look at what's happening in other trees before merging
stuff.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 9866] New: chattr sticky behaviour and Orlov block allocator

2008-02-01 Thread Andrew Morton
On Fri,  1 Feb 2008 07:25:09 -0800 (PST) [EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9866
> 
>Summary: chattr sticky behaviour and Orlov block allocator
>Product: File System
>Version: 2.5
>  KernelVersion: 2.6.22
>   Platform: All
> OS/Version: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: ext3
> AssignedTo: [EMAIL PROTECTED]
> ReportedBy: [EMAIL PROTECTED]
> 
> 
> Latest working kernel version: 2.6.22
> Earliest failing kernel version: 2.6.18 (oldest tested)
> Distribution: Ubuntu Gutsy
> Hardware Environment: Fujitsu Siemens C1110 laptop
> Software Environment: chattr 1.40.2 (12-Jul-2007), mke2fs 1.40.2 (12-Jul-2007)
> Problem Description:
> Some file attributes have sticky behaviour in ext2/3 whereas I believe
> it should not. Namely, a directory marked with chattr +T serves "to be
> the top of directory hierarchies for the purposes of the Orlov block
> allocator", as the manpage says.
> 
> It seems wrong that subdirectories stickily inherit the +T flag: they
> are obviously not top-level! Even files get the T attribute this way,
> even though the chattr manpages only documents it for directories.
> 
> References:
> http://lwn.net/Articles/14633/
> http://lwn.net/Articles/14447/ Theodore Ts'o ext3 patch
> http://en.wikipedia.org/wiki/Orlov_block_allocator
> 
> 
> Steps to reproduce:
> The following shows the sticky behaviour:
> $ mkdir tests
> chattr -V +T tests
> chattr 1.40.2 (12-Jul-2007)
> Flags von tests wie folgt gesetzt: T-
> $ mkdir tests/src
> $ cp -dp /etc/fstab tests/src
> $ lsattr . tests tests/src
> T- ./tests
> T- tests/src
> T- tests/src/fstab
> 
> I verified (using cp -dpr and ls -lifR) that indeed, the inherited T
> attribute causes all subdirectories to be created in inodes far apart
> (like with mount -o remount,oldalloc), whereas the Orlov allocator
> normally would put the directories next to each other.  My expectation
> is that only e.g. /home/user{1,2,3,4,5,6} should be created spread
> out, whereas each user home's contents would be next each other
> (/home/user1/{foo,bar,etc}).
> 
> The effect of this bug makes chattr +T unusable, because Orlov
> optimization is de facto eliminated.
> 
> Regards,
>  Jörg Höhle
> 
> 
> -- 
> Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
> --- You are receiving this mail because: ---
> You are the assignee for the bug, or are watching the assignee.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: [Bugme-new] [Bug 9849] New: NULL pointer deref in journal_wait_on_commit_record

2008-01-30 Thread Andrew Morton
On Wed, 30 Jan 2008 15:17:57 -0800
Mingming Cao <[EMAIL PROTECTED]> wrote:

> The buufer head pointer passed to journal_wait_on_commit_record()
> could be NULL if the previous journal_submit_commit_record() failed
> or journal has already aborted.
> 
> We need to check the error returns from journal_submit_commit_record()
> and avoid calling journal_wait_on_commit_record() in the failure case.
> 
> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
> ---
>  fs/jbd2/commit.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.24-rc8/fs/jbd2/commit.c
> ===
> --- linux-2.6.24-rc8.orig/fs/jbd2/commit.c2008-01-30 14:12:10.0 
> -0800
> +++ linux-2.6.24-rc8/fs/jbd2/commit.c 2008-01-30 15:09:50.0 -0800
> @@ -872,7 +872,8 @@ wait_for_iobuf:
>   if (err)
>   __jbd2_journal_abort_hard(journal);
>   }
> - err = journal_wait_on_commit_record(cbh);
> + if (!err && !is_journal_aborted(journal))
> + err = journal_wait_on_commit_record(cbh);
>  
>   if (err)
>   jbd2_journal_abort(journal, err);

Thanks.  Please note that I Cc'ed [EMAIL PROTECTED] on this, for a 2.6.24.x
backport.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 9855] New: ext3 ACL corruption

2008-01-30 Thread Andrew Morton
On Wed, 30 Jan 2008 14:29:27 -0800 (PST)
[EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9855
> 
>Summary: ext3 ACL corruption
>Product: File System
>Version: 2.5
>  KernelVersion: 2.6.23
>   Platform: All
> OS/Version: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: ext3
> AssignedTo: [EMAIL PROTECTED]
> ReportedBy: [EMAIL PROTECTED]
> 
> 
> Latest working kernel version: Unknown
> Earliest failing kernel version: Definitely 2.6.23 and 2.6.23.8 but earlier is
> possible
> Distribution: Debian Etch
> Hardware Environment: Multiple x86 machines
> 
> Software Environment:
> Filesystem is Ext3 on LVM on RAID-1 (on SATA).
> # e2fsck -V
> e2fsck 1.40-WIP (14-Nov-2006)
> Using EXT2FS Library version 1.40-WIP, 14-Nov-2006
> 
> Problem Description:
> On several occasions now I have had e2fsck prune away ACLs on my file systems
> during a file system check after rebooting a number of (reasonably) long
> running Samba servers. This morning I decided to manually run fsck before
> rebooting one of these:
> 
> # e2fsck -pfv /dev/mapper/vg_main-lv_samba
> (entry->e_value_offs + entry->e_value_size: 116, offs: 120)
> /dev/mapper/vg_main-lv_samba: Extended attribute in inode 163841 has a value
> offset (56) which is invalid
> CLEARED.
> (entry->e_value_offs + entry->e_value_size: 116, offs: 120)
> /dev/mapper/vg_main-lv_samba: Extended attribute in inode 262146 has a value
> offset (56) which is invalid
> CLEARED.
> [ snip lots of (near) identical errors]
> 
> 8301 inodes used (0.08%)
> 1621 non-contiguous inodes (19.5%)
>  # of inodes with ind/dind/tind blocks: 3837/24/0
>  1108478 blocks used (5.29%)
>0 bad blocks
>1 large file
> 
> 7590 regular files
>  662 directories
>0 character device files
>0 block device files
>0 fifos
>0 links
>   40 symbolic links (38 fast symbolic links)
>0 sockets
> 
> 8292 files
> 
> (Note: after remounting)
> # tune2fs -l /dev/mapper/vg_main-lv_samba 
> tune2fs 1.40-WIP (14-Nov-2006)
> Filesystem volume name:   
> Last mounted on:  
> Filesystem UUID:  88677414-c1f8-41ba-b737-d9f6170d771b
> Filesystem magic number:  0xEF53
> Filesystem revision #:1 (dynamic)
> Filesystem features:  has_journal ext_attr resize_inode dir_index filetype
> needs_recovery sparse_super large_file
> Filesystem flags: signed directory hash 
> Default mount options:(none)
> Filesystem state: clean
> Errors behavior:  Continue
> Filesystem OS type:   Linux
> Inode count:  10485760
> Block count:  20971520
> Reserved block count: 1048576
> Free blocks:  19863038
> Free inodes:  10477459
> First block:  0
> Block size:   4096
> Fragment size:4096
> Reserved GDT blocks:  1019
> Blocks per group: 32768
> Fragments per group:  32768
> Inodes per group: 16384
> Inode blocks per group:   1024
> Filesystem created:   Wed Feb 21 21:38:33 2007
> Last mount time:  Thu Jan 31 03:18:54 2008
> Last write time:  Thu Jan 31 03:18:54 2008
> Mount count:  1
> Maximum mount count:  30
> Last checked: Thu Jan 31 03:16:51 2008
> Check interval:   15552000 (6 months)
> Next check after: Tue Jul 29 02:16:51 2008
> Reserved blocks uid:  0 (user root)
> Reserved blocks gid:  0 (group root)
> First inode:  11
> Inode size:   256
> Journal inode:8
> Default directory hash:   tea
> Directory Hash Seed:  be8c201b-3563-4fa5-a2a6-e2864e4b73e2
> Journal backup:   inode blocks
> 
> 
> Steps to reproduce:
> Unfortunately, precise steps are not known. Restoring all the filesystem's 
> ACLs
> from a recent dump made using "getfacl -RP" fixes the ACLs without causing the
> corruption to return.
> 
> These are production Samba servers making fairly extensive use of file and
> directory ACLs. Thus far, I've only noticed the corruptions when it came time
> to upgrade to a new kernel and reboot (and the boot scripts then run fsck).
> Note that I've never noticed any issues at runtime because of this - only when
> I later realised that ACLs had been removed from random files and/or
> directories.
> 
> I think I will implement some scripts to unmount and run fsck nightly from
> cron, so I can at least detect the corruption a little earlier. If there is
> some more helpful debugging output I can provide, please let me know.
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fw: [Bugme-new] [Bug 9849] New: NULL pointer deref in journal_wait_on_commit_record

2008-01-30 Thread Andrew Morton


Begin forwarded message:

Date: Wed, 30 Jan 2008 03:24:08 -0800 (PST)
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bugme-new] [Bug 9849] New: NULL pointer deref in 
journal_wait_on_commit_record


http://bugzilla.kernel.org/show_bug.cgi?id=9849

   Summary: NULL pointer deref in journal_wait_on_commit_record
   Product: File System
   Version: 2.5
 KernelVersion: 2.6.24-03997-g85004cc
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: ext4
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]


Latest working kernel version: -
Earliest failing kernel version: 2.6.24-03863-g0ba6c33
Distribution: Ubuntu
Problem Description:

using a corrupted image causes an oops in unmount, seems as if
journal_wait_on_commit_record() gets passed a NULL pointer

Steps to reproduce:

using fsfuzz with ext4, I'll attach the image which causes this for me

one oops can be found here
http://kerneloops.org/raw.php?rawid=3160&msgid=

here is another one with full jbd2 debugging enabled (there are a lot of
log_do_checkpoint messages above this)

[  242.863778] (fs/jbd2/checkpoint.c, 308): jbd2_log_do_checkpoint: Start
checkpoint
[  242.863790] (fs/jbd2/checkpoint.c, 316): jbd2_log_do_checkpoint:
cleanup_journal_tail returned 1
[  242.863810] (fs/jbd2/checkpoint.c, 308): jbd2_log_do_checkpoint: Start
checkpoint
[  242.863822] (fs/jbd2/checkpoint.c, 316): jbd2_log_do_checkpoint:
cleanup_journal_tail returned 1
[  242.863842] (fs/jbd2/checkpoint.c, 308): jbd2_log_do_checkpoint: Start
checkpoint
[  242.863854] (fs/jbd2/checkpoint.c, 316): jbd2_log_do_checkpoint:
cleanup_journal_tail returned 1
[  242.863874] (fs/jbd2/checkpoint.c, 308): jbd2_log_do_checkpoint: Start
checkpoint
[  242.863886] (fs/jbd2/checkpoint.c, 316): jbd2_log_do_checkpoint:
cleanup_journal_tail returned 1
[  242.864017] (fs/jbd2/journal.c, 193): kjournald2: kjournald2 wakes
[  242.864027] (fs/jbd2/journal.c, 201): kjournald2: woke because of timeout
[  242.864035] (fs/jbd2/journal.c, 145): kjournald2: commit_sequence=1,
commit_request=2
[  242.864044] (fs/jbd2/journal.c, 148): kjournald2: OK, requests differ
[  242.864055] (fs/jbd2/commit.c, 415): jbd2_journal_commit_transaction: super
block updated
[  242.864066] (fs/jbd2/journal.c, 1264): jbd2_journal_update_superblock: JBD:
updating superblock (start 15335425, seq 2, errno 0)
[  242.864385] (fs/jbd2/commit.c, 428): jbd2_journal_commit_transaction: JBD:
starting commit of transaction 2
[  242.864409] (fs/jbd2/commit.c, 501): jbd2_journal_commit_transaction: JBD:
commit phase 1
[  242.864428] (fs/jbd2/commit.c, 519): jbd2_journal_commit_transaction: JBD:
commit phase 2
[  242.864459] (fs/jbd2/revoke.c, 537): jbd2_journal_write_revoke_records:
Wrote 0 revoke records
[  242.864469] (fs/jbd2/commit.c, 561): jbd2_journal_commit_transaction: JBD:
commit phase 2
[  242.864478] (fs/jbd2/commit.c, 571): jbd2_journal_commit_transaction: JBD:
commit phase 3
[  242.864487] (fs/jbd2/commit.c, 780): jbd2_journal_commit_transaction: JBD:
commit phase 4
[  242.864496] (fs/jbd2/commit.c, 839): jbd2_journal_commit_transaction: JBD:
commit phase 5
[  242.864505] (fs/jbd2/commit.c, 866): jbd2_journal_commit_transaction: JBD:
commit phase 6
[  242.864599] attempt to access beyond end of device
[  242.864609] loop0: rw=0, want=200708, limit=16384
[  242.864633] jbd2_journal_bmap: journal block not found at offset 15335425 on
loop0
[  242.864680] Aborting journal on device loop0.
[  242.864733] (fs/jbd2/journal.c, 1264): jbd2_journal_update_superblock: JBD:
updating superblock (start 15335425, seq 2, errno -5)
[  242.864868] BUG: unable to handle kernel NULL pointer dereference at virtual
address 
[  242.864962] printing eip: c023c2a7 *pde =  
[  242.865048] Oops: 0002 [#1] PREEMPT 
[  242.865108] Modules linked in:
[  242.865218] 
[  242.865243] Pid: 3698, comm: kjournald2 Not tainted (2.6.24-03997-g85004cc
#16)
[  242.865268] EIP: 0060:[] EFLAGS: 00010202 CPU: 0
[  242.865382] EIP is at journal_wait_on_commit_record+0x7/0x50
[  242.865407] EAX:  EBX:  ECX: 0001 EDX: 0001
[  242.865431] ESI:  EDI: c07835d2 EBP: cb229ee4 ESP: cb229edc
[  242.865455]  DS: 007b ES: 007b FS:  GS:  SS: 0068
[  242.865539] Process kjournald2 (pid: 3698, ti=cb229000 task=cb208000
task.ti=cb229000)
[  242.865564] Stack:   cb229f88 c023cb07  c07835d2
0362 c069c620 
[  242.865864]cb2316e0 cb231504 cb2314f0 cb134960  cb231920
  
[  242.865918]cb208000   0008  
  
[  242.865918] Call Trace:
[  242.865918]  [] show_trace_log_lvl+0x1a/0x30
[  242.865918]  [] show_stack_log_lvl+0xa9/0xd0
[  242.865918]  [] show_registers+0xca/0x250
[  242.865918]  [] die+0x101/0x220
[  242.865918]  [] do_page_fault+0x28b/0x630
[ 

Re: [PATCH] [3/18] BKL-removal: Convert ext3 to use unlocked_ioctl

2008-01-27 Thread Andrew Morton
On Sun, 27 Jan 2008 03:17:09 +0100 (CET) Andi Kleen <[EMAIL PROTECTED]> wrote:

> 
> I checked ext3_ioctl and it looked largely safe to not be used
> without BKL.  So convert it over to unlocked_ioctl.
> 
> The only case where I wasn't quite sure was for the
> dynamic fs grow ioctls versus umounting -- I kept the BKL for those. 
> 

Please cpoy linux-ext4 on ext2/3/4 material.

I skippped a lot of these patches because I just got bored of fixing
rejects.  Now is a very optimistic time to be raising patches against
mainline.

I'm going to work on getting a unified devel tree operating: one which
contains everyone's latest stuff and is updated daily.  Basically it'll be
-mm without a couple of the quilt trees.  People can then prepare patches
against that, as it seems that most can't be bothered patching against -mm,
let alone building and testing it.  More later.

> + /* AK: not sure the BKL is needed, but this might prevent
> +  * races against umount */
> + lock_kernel();
>   err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
>   journal_lock_updates(EXT3_SB(sb)->s_journal);
>   journal_flush(EXT3_SB(sb)->s_journal);
>   journal_unlock_updates(EXT3_SB(sb)->s_journal);
> + unlock_kernel();
>  
>   return err;
>   }
> @@ -245,11 +249,14 @@ flags_err:
>   if (copy_from_user(&input, (struct ext3_new_group_input __user 
> *)arg,
>   sizeof(input)))
>   return -EFAULT;
> -
> + /* AK: not sure the BKL is needed, but this might prevent
> +  * races against umount */
> + lock_kernel();
>   err = ext3_group_add(sb, &input);
>   journal_lock_updates(EXT3_SB(sb)->s_journal);
>   journal_flush(EXT3_SB(sb)->s_journal);
>   journal_unlock_updates(EXT3_SB(sb)->s_journal);
> + unlock_kernel();
>  

The ext3_ioctl() caller has an open fd against the fs - should be
sufficient to keep unmount away?

(gets even more rejects, drops all the fasync patches too)

It's all reached the stage of stupid.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix commit block write in JBD

2008-01-26 Thread Andrew Morton
> On Wed, 23 Jan 2008 20:09:43 +0100 Jan Kara <[EMAIL PROTECTED]> wrote:
> 
> Commit block is expected to have several copies of the header. Fix the
> bug Andrew has spotted ages ago.
> 

"ages" indeed.

> 
> diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
> index 610264b..a69b240 100644
> --- a/fs/jbd/commit.c
> +++ b/fs/jbd/commit.c
> @@ -116,9 +116,8 @@ static int journal_write_commit_record(journal_t *journal,
>  
>   bh = jh2bh(descriptor);
>  
> - /* AKPM: buglet - add `i' to tmp! */
>   for (i = 0; i < bh->b_size; i += 512) {
> - journal_header_t *tmp = (journal_header_t*)bh->b_data;
> + journal_header_t *tmp = (journal_header_t*)(bh->b_data+i);
>   tmp->h_magic = cpu_to_be32(JFS_MAGIC_NUMBER);
>   tmp->h_blocktype = cpu_to_be32(JFS_COMMIT_BLOCK);
>   tmp->h_sequence = cpu_to_be32(commit_transaction->t_tid);

But I don't think we can _use_ this feature now.  Because there are
1000 disks out there which didn't implement it.

So why not just remove the loop and do a single write?
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 23/49] Add buffer head related helper functions

2008-01-24 Thread Andrew Morton
> On Thu, 24 Jan 2008 10:52:27 +0530 "Aneesh Kumar K.V" <[EMAIL PROTECTED]> 
> wrote:
> + * Returns zero on success and -EIO on error.If the input
> + * buffer is not locked returns -EINVAL
> + *
> + */
> +int bh_submit_read(struct buffer_head *bh)
> +{
> + if (!buffer_locked(bh))
> + return -EINVAL;

Is this case just catching a programming bug?

If so, a plain old BUG_ON would be better.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CALL FOR TESTING] Make Ext3 fsck way faster [2.6.24-rc6 -mm patch]

2008-01-23 Thread Andrew Morton
> On Wed, 23 Jan 2008 04:12:16 -0500 Abhishek Rai <[EMAIL PROTECTED]> wrote:
> > I'm wondering about the interaction between this code and the
> > buffer_boundary() logic.  I guess we should disable the buffer_boundary()
> > handling when this code is in effect.  Have you reviewed and tested that
> > aspect?
> 
> Thanks for pointing this out, I had totally missed this issue in my change. 
> I've now made the call to set_buffer_boundary() in ext3_get_blocks_handle() 
> subject to metacluster option being set.
> 

Did it make any performance difference?  iirc the buffer_boundary stuff was
worth around 10% on a single linear read of a large, well-laid-out file.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 36/49] ext4: Add EXT4_IOC_MIGRATE ioctl

2008-01-23 Thread Andrew Morton
> On Mon, 21 Jan 2008 22:02:15 -0500 "Theodore Ts'o" <[EMAIL PROTECTED]> wrote:
> The below patch add ioctl for migrating ext3 indirect block mapped inode
> to ext4 extent mapped inode.

This patch adds lots of weird and inexplicable single- and double-newlines
in inappropriate places.  However it frequently forgets to add newlines
between end-of-locals and start-of-code, which is usual practice.


+struct list_blocks_struct {
+   ext4_lblk_t first_block, last_block;
+   ext4_fsblk_t first_pblock, last_pblock;
+};

This structure would benefit from some code comments.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 30/49] ext4: Convert truncate_mutex to read write semaphore.

2008-01-23 Thread Andrew Morton
> On Mon, 21 Jan 2008 22:02:09 -0500 "Theodore Ts'o" <[EMAIL PROTECTED]> wrote:
> +int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t 
> block,
> + unsigned long max_blocks, struct buffer_head *bh,
> + int create, int extend_disksize)
> +{
> + int retval;
> + if (create) {
> + down_write((&EXT4_I(inode)->i_data_sem));
> + } else {
> + down_read((&EXT4_I(inode)->i_data_sem));
> + }
> + if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
> + retval =  ext4_ext_get_blocks(handle, inode, block, max_blocks,
> + bh, create, extend_disksize);
> + } else {
> + retval = ext4_get_blocks_handle(handle, inode, block,
> + max_blocks, bh, create, extend_disksize);
> + }
> + if (create) {
> + up_write((&EXT4_I(inode)->i_data_sem));
> + } else {
> + up_read((&EXT4_I(inode)->i_data_sem));
> + }

This function has many unneeded braces.  checkpatch used to detect this
but it seems to have broken.

> + return retval;
> +}
>  static int ext4_get_block(struct inode *inode, sector_t iblock,
>   struct buffer_head *bh_result, int create)

Mising newline.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 41/49] ext4: Add multi block allocator for ext4

2008-01-23 Thread Andrew Morton
> On Mon, 21 Jan 2008 22:02:20 -0500 "Theodore Ts'o" <[EMAIL PROTECTED]> wrote:
> From: Alex Tomas <[EMAIL PROTECTED]>
> 
> Signed-off-by: Alex Tomas <[EMAIL PROTECTED]>
> Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
> Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
> Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
> Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
>
> ...
>
> +#if BITS_PER_LONG == 64
> +#define mb_correct_addr_and_bit(bit, addr)   \
> +{\
> + bit += ((unsigned long) addr & 7UL) << 3;   \
> + addr = (void *) ((unsigned long) addr & ~7UL);  \
> +}
> +#elif BITS_PER_LONG == 32
> +#define mb_correct_addr_and_bit(bit, addr)   \
> +{\
> + bit += ((unsigned long) addr & 3UL) << 3;   \
> + addr = (void *) ((unsigned long) addr & ~3UL);  \
> +}
> +#else
> +#error "how many bits you are?!"
> +#endif

Why do these exist?

> +static inline int mb_test_bit(int bit, void *addr)
> +{
> + mb_correct_addr_and_bit(bit, addr);
> + return ext4_test_bit(bit, addr);
> +}

ext2_test_bit() already handles bitnum > wordsize.

If mb_correct_addr_and_bit() is actually needed then some suitable comment
would help.

> +static inline void mb_set_bit(int bit, void *addr)
> +{
> + mb_correct_addr_and_bit(bit, addr);
> + ext4_set_bit(bit, addr);
> +}
> +
> +static inline void mb_set_bit_atomic(spinlock_t *lock, int bit, void *addr)
> +{
> + mb_correct_addr_and_bit(bit, addr);
> + ext4_set_bit_atomic(lock, bit, addr);
> +}
> +
> +static inline void mb_clear_bit(int bit, void *addr)
> +{
> + mb_correct_addr_and_bit(bit, addr);
> + ext4_clear_bit(bit, addr);
> +}
> +
> +static inline void mb_clear_bit_atomic(spinlock_t *lock, int bit, void *addr)
> +{
> + mb_correct_addr_and_bit(bit, addr);
> + ext4_clear_bit_atomic(lock, bit, addr);
> +}
> +
> +static inline void *mb_find_buddy(struct ext4_buddy *e4b, int order, int 
> *max)

uninlining this will save about eighty squigabytes of text.

Please review all of ext4/jbd2 with a view to removig unnecessary and wrong
inlings.

> +{
> + char *bb;
> +
> + /* FIXME!! is this needed */
> + BUG_ON(EXT4_MB_BITMAP(e4b) == EXT4_MB_BUDDY(e4b));
> + BUG_ON(max == NULL);
> +
> + if (order > e4b->bd_blkbits + 1) {
> + *max = 0;
> + return NULL;
> + }
> +
> + /* at order 0 we see each particular block */
> + *max = 1 << (e4b->bd_blkbits + 3);
> + if (order == 0)
> + return EXT4_MB_BITMAP(e4b);
> +
> + bb = EXT4_MB_BUDDY(e4b) + EXT4_SB(e4b->bd_sb)->s_mb_offsets[order];
> + *max = EXT4_SB(e4b->bd_sb)->s_mb_maxs[order];
> +
> + return bb;
> +}
> +
>
> ...
>
> +#else
> +#define mb_free_blocks_double(a, b, c, d)
> +#define mb_mark_used_double(a, b, c)
> +#define mb_cmp_bitmaps(a, b)
> +#endif

Please use the do{}while(0) thing.  Or, better, proper C functions which
have typechecking (unless this will cause undefined-var compile errors,
which happens sometimes)

> +/* find most significant bit */
> +static int fmsb(unsigned short word)
> +{
> + int order;
> +
> + if (word > 255) {
> + order = 7;
> + word >>= 8;
> + } else {
> + order = -1;
> + }
> +
> + do {
> + order++;
> + word >>= 1;
> + } while (word != 0);
> +
> + return order;
> +}

Did we just reinvent fls()?

> +/* FIXME!! need more doc */
> +static void ext4_mb_mark_free_simple(struct super_block *sb,
> + void *buddy, unsigned first, int len,
> + struct ext4_group_info *grp)
> +{
> + struct ext4_sb_info *sbi = EXT4_SB(sb);
> + unsigned short min;
> + unsigned short max;
> + unsigned short chunk;
> + unsigned short border;
> +
> + BUG_ON(len >= EXT4_BLOCKS_PER_GROUP(sb));
> +
> + border = 2 << sb->s_blocksize_bits;

Won't this explode with >= 32k blocksize?

> + while (len > 0) {
> + /* find how many blocks can be covered since this position */
> + max = ffs(first | border) - 1;
> +
> + /* find how many blocks of power 2 we need to mark */
> + min = fmsb(len);
> +
> + if (max < min)
> + min = max;
> + chunk = 1 << min;
> +
> + /* mark multiblock chunks only */
> + grp->bb_counters[min]++;
> + if (min > 0)
> + mb_clear_bit(first >> min,
> +  buddy + sbi->s_mb_offsets[min]);
> +
> + len -= chunk;
> + first += chunk;
> + }
> +}
> +
> 
> ...
>
> +static int ext4_mb_init_cache(struct page *page, char *incore)
> +{
> + int blocksize;
> + int blocks_per_page;
> + int groups_per_page;
> + int err = 0;
> + int i;
> + ext4_group_t first_group;
> + int first_block;

Re: [PATCH 24/49] ext4: add block bitmap validation

2008-01-23 Thread Andrew Morton
> On Mon, 21 Jan 2008 22:02:03 -0500 "Theodore Ts'o" <[EMAIL PROTECTED]> wrote:
> + if (bh_submit_read(bh) < 0) {
> + brelse(bh);
> + ext4_error(sb, __FUNCTION__,
>   "Cannot read block bitmap - "
> - "block_group = %lu, block_bitmap = %llu",
> - block_group, bitmap_blk);
> + "block_group = %d, block_bitmap = %llu",
> + (int)block_group, (unsigned long long)bitmap_blk);
> + return NULL;
> + }
> + if (!ext4_valid_block_bitmap(sb, desc, block_group, bh)) {
> + brelse(bh);
> + return NULL;
> + }

brelse() should only be used when the bh might be NULL - put_bh()
can be used here.

Please review all ext4/jbd2 code for this trivial speedup.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 23/49] Add buffer head related helper functions

2008-01-23 Thread Andrew Morton
> On Mon, 21 Jan 2008 22:02:02 -0500 "Theodore Ts'o" <[EMAIL PROTECTED]> wrote:
> +}
> +EXPORT_SYMBOL(bh_uptodate_or_lock);
> +/**

Missing newline.

> + * bh_submit_read: Submit a locked buffer for reading
> + * @bh: struct buffer_head
> + *
> + * Returns a negative error
> + */
> +int bh_submit_read(struct buffer_head *bh)
> +{
> + if (!buffer_locked(bh))
> + lock_buffer(bh);
> +
> + if (buffer_uptodate(bh))
> + return 0;

Here it can lock the buffer then return zero

> + get_bh(bh);
> + bh->b_end_io = end_buffer_read_sync;
> + submit_bh(READ, bh);
> + wait_on_buffer(bh);
> + if (buffer_uptodate(bh))
> + return 0;

Here it will unlock the buffer and return zero.

This function is unusable when passed an unlocked buffer.

The return value should (always) be documented.

> + return -EIO;
> +}
> +EXPORT_SYMBOL(bh_submit_read);
>  void __init buffer_init(void)

Missing newline.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 33/49] ext4: Add the journal checksum feature

2008-01-23 Thread Andrew Morton
> On Mon, 21 Jan 2008 22:02:12 -0500 "Theodore Ts'o" <[EMAIL PROTECTED]> wrote:
> From: Girish Shilamkar <[EMAIL PROTECTED]>
> 
> The journal checksum feature adds two new flags i.e
> JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.
> 
> JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the
> checksum for the blocks described by the descriptor blocks.
> Due to checksums, writing of the commit record no longer needs to be
> synchronous. Now commit record can be sent to disk without waiting for
> descriptor blocks to be written to disk. This behavior is controlled
> using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be
> able to recover the journal with _ASYNC_COMMIT hence it is made
> incompat.
> The commit header has been extended to hold the checksum along with the
> type of the checksum.
> 
> For recovery in pass scan checksums are verified to ensure the sanity
> and completeness(in case of _ASYNC_COMMIT) of every transaction.
> 
>  ...
>
> +static inline __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head 
> *bh)

unneeded inlining.

> +{
> + struct page *page = bh->b_page;
> + char *addr;
> + __u32 checksum;
> +
> + addr = kmap_atomic(page, KM_USER0);
> + checksum = crc32_be(crc32_sum,
> + (void *)(addr + offset_in_page(bh->b_data)), bh->b_size);
> + kunmap_atomic(addr, KM_USER0);
> +
> + return checksum;
> +}

Can this buffer actually be in highmem?

>  static inline void write_tag_block(int tag_bytes, journal_block_tag_t *tag,
>  unsigned long long block)

More unnecessary inlining.

> +/*
> + * jbd2_journal_clear_features () - Clear a given journal feature in the
> + *   superblock
> + * @journal: Journal to act on.
> + * @compat: bitmask of compatible features
> + * @ro: bitmask of features that force read-only mount
> + * @incompat: bitmask of incompatible features
> + *
> + * Clear a given journal feature as present on the
> + * superblock.  Returns true if the requested features could be reset.
> + */
> +int jbd2_journal_clear_features(journal_t *journal, unsigned long compat,
> + unsigned long ro, unsigned long incompat)
> +{
> + journal_superblock_t *sb;
> +
> + jbd_debug(1, "Clear features 0x%lx/0x%lx/0x%lx\n",
> +   compat, ro, incompat);
> +
> + sb = journal->j_superblock;
> +
> + sb->s_feature_compat&= ~cpu_to_be32(compat);
> + sb->s_feature_ro_compat &= ~cpu_to_be32(ro);
> + sb->s_feature_incompat  &= ~cpu_to_be32(incompat);
> +
> + return 1;
> +}
> +EXPORT_SYMBOL(jbd2_journal_clear_features);

Kernel usually returns 0 on success.  So we can return a useful errno on
failure.

> +/*
> + * calc_chksums calculates the checksums for the blocks described in the
> + * descriptor block.
> + */
> +static int calc_chksums(journal_t *journal, struct buffer_head *bh,
> + unsigned long *next_log_block, __u32 *crc32_sum)
> +{
> + int i, num_blks, err;
> + unsigned io_block;
> + struct buffer_head *obh;
> +
> + num_blks = count_tags(journal, bh);
> + /* Calculate checksum of the descriptor block. */
> + *crc32_sum = crc32_be(*crc32_sum, (void *)bh->b_data, bh->b_size);
> +
> + for (i = 0; i < num_blks; i++) {
> + io_block = (*next_log_block)++;

 unsigned <- unsigned long.

Are all the types appropriate in here?

> + wrap(journal, *next_log_block);
> + err = jread(&obh, journal, io_block);
> + if (err) {
> + printk(KERN_ERR "JBD: IO error %d recovering block "
> + "%u in log\n", err, io_block);
> + return 1;
> + } else {
> + *crc32_sum = crc32_be(*crc32_sum, (void *)obh->b_data,
> +  obh->b_size);
> + }
> + }
> + return 0;
> +}
> +
>  static int do_one_pass(journal_t *journal,
>   struct recovery_info *info, enum passtype pass)
>  {
> @@ -328,6 +360,7 @@ static int do_one_pass(journal_t *journal,
>   unsigned intsequence;
>   int blocktype;
>   int tag_bytes = journal_tag_bytes(journal);
> + __u32   crc32_sum = ~0; /* Transactional Checksums */
>  
>   /* Precompute the maximum metadata descriptors in a descriptor block */
>   int MAX_BLOCKS_PER_DESC;
> @@ -419,9 +452,23 @@ static int do_one_pass(journal_t *journal,
>   switch(blocktype) {
>   case JBD2_DESCRIPTOR_BLOCK:
>   /* If it is a valid descriptor block, replay it
> -  * in pass REPLAY; otherwise, just skip over the
> -  * blocks it describes. */
> +  * in pass REPLAY; if journal_checksums enabled, then
> +  * calcul

Re: [PATCH] Do not try lock_acquire after handle made invalid

2008-01-15 Thread Andrew Morton
On Wed, 16 Jan 2008 00:39:26 +0100
Jonas Bonn <[EMAIL PROTECTED]> wrote:

> This likely fixes the oops in __lock_acquire reported as:
> 
> http://www.kerneloops.org/raw.php?rawid=2753&msgid=
> http://www.kerneloops.org/raw.php?rawid=2749&msgid=
> 
> In these reported oopses, start_this_handle is returning -EROFS.
> 
> Signed-off-by: Jonas Bonn <[EMAIL PROTECTED]>
> ---
>  fs/jbd/transaction.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
> index 08ff6c7..038ed74 100644
> --- a/fs/jbd/transaction.c
> +++ b/fs/jbd/transaction.c
> @@ -288,10 +288,12 @@ handle_t *journal_start(journal_t *journal, int nblocks)
>   jbd_free_handle(handle);
>   current->journal_info = NULL;
>   handle = ERR_PTR(err);
> + goto out;
>   }
>  
>   lock_acquire(&handle->h_lockdep_map, 0, 0, 0, 2, _THIS_IP_);
>  
> +out:
>   return handle;
>  }

Yeah, thanks.

It looks like we forgot to port this lockdep support into ext4.  This is
bad.  What else got lost?

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CALL FOR TESTING] Make Ext3 fsck way faster [2.6.24-rc6 -mm patch]

2008-01-15 Thread Andrew Morton

I'm wondering about the real value of this change, really.

In any decent environment, people will fsck their ext3 filesystems during
planned downtime, and the benefit of reducing that downtime from 6
hours/machine to 2 hours/machine is probably fairly small, given that there
is no service interruption.  (The same applies to desktops and laptops).

Sure, the benefit is not *zero*, but it's small.  Much less than it would
be with ext2.  I mean, the "avoid unplanned fscks" feature is the whole
reason why ext3 has journalling (and boy is that feature expensive during
normal operation).

So...  it's unobvious that the benefit of this feature is worth its risks
and costs?

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CALL FOR TESTING] Make Ext3 fsck way faster [2.6.24-rc6 -mm patch]

2008-01-14 Thread Andrew Morton
On Mon, 14 Jan 2008 08:39:01 -0500
Abhishek Rai <[EMAIL PROTECTED]> wrote:

> This is the patch for 2.6.24-rc6 -mm tree, please let me know if anyone would 
> like a patch against another recent kernel. Ingo Molnar has already posted a 
> patch for 2.6.24-rc7.
> 

Please always retain and maintain the full changelog with each version of a
patch.  The changelog in your earlier "Clustering indirect blocks in Ext3"
patch was good.  I retained that (after renaming it to "ext3: indirect
block clustering" rather than this dopey name) and then, err, dropped the
patch ;)

Please cc linux-ext4 on ext2/3/4-related work.

Please update Documentation/filesystems/ext3.txt when adding new mount
options.

Here's a quick low-levelish nitpickingish implementation review.  I haven't
thought about the design-level things yet.

This code will need careful checking regarding the journalling side of
things - to verify that if the machine crashes at any stage, recovery will
produce a consistent and secure fs.  It is all-nigh impossible to spot bugs
in this area via the usual kernel bug reporting process and it is very hard
to effectively runtime test recovery even in a development situation. 
Careful review unusually important here.

> +/
> + * Count number of free blocks in a block group that don't lie in the
> + * metacluster region of the block group.
> + */
> +static void
> +ext3_init_grp_free_nonmc_blocks(struct super_block *sb,
> + struct buffer_head *bitmap_bh,
> + unsigned long block_group)
> +{
> + struct ext3_sb_info *sbi = EXT3_SB(sb);
> + struct ext3_bg_info *bgi = &sbi->s_bginfo[block_group];
> +
> + BUG_ON(!test_opt(sb, METACLUSTER));
> +
> + spin_lock(sb_bgl_lock(sbi, block_group));
> + if (bgi->bgi_free_nonmc_blocks_count >= 0)
> + goto out;
> +
> + bgi->bgi_free_nonmc_blocks_count =
> + ext3_count_free(bitmap_bh, sbi->s_nonmc_blocks_per_group/8);
> +
> +out:
> + spin_unlock(sb_bgl_lock(sbi, block_group));
> + BUG_ON(bgi->bgi_free_nonmc_blocks_count >
> + sbi->s_nonmc_blocks_per_group);
> +}

hm, so ext3_count_free() hits prime time.

ext3_count_free() should be converted to use the standard hweight8(). 
Really it should be converted to hweight_long() or hweight32() or something
- it is presently probably inefficient.

This function appears to be called frequently and it calls the slow-looking
ext3_count_free() under spinlock.  This might have scalability problems on
the right machine running the right workload.

Can we address that before we hit it?

> +/*
> + * ext3_update_nonmc_block_count:
> + *   Update bgi_free_nonmc_blocks_count for block group 'group_no' following
> + *   an allocation or deallocation.
> + *
> + *   @group_no:  affected block group
> + *   @start: start of the [de]allocated range
> + *   @count: number of blocks [de]allocated
> + *   @allocation:1 if blocks were allocated, 0 otherwise.
> + */
> +static inline void
> +ext3_update_nonmc_block_count(struct ext3_sb_info *sbi, unsigned long 
> group_no,
> + ext3_grpblk_t start, unsigned long count,
> + int allocation)
> +{
> + struct ext3_bg_info *bginfo = &sbi->s_bginfo[group_no];
> + ext3_grpblk_t change;
> +
> + BUG_ON(bginfo->bgi_free_nonmc_blocks_count < 0);
> + BUG_ON(start >= sbi->s_nonmc_blocks_per_group);
> +
> + change = min_t(ext3_grpblk_t, start + count,
> + sbi->s_nonmc_blocks_per_group) - start;
> +
> + spin_lock(sb_bgl_lock(sbi, group_no));
> + BUG_ON(bginfo->bgi_free_nonmc_blocks_count >
> + sbi->s_nonmc_blocks_per_group);
> + BUG_ON(allocation && bginfo->bgi_free_nonmc_blocks_count < change);
> +
> + bginfo->bgi_free_nonmc_blocks_count += (allocation ? -change : change);
> +
> + BUG_ON(bginfo->bgi_free_nonmc_blocks_count >
> + sbi->s_nonmc_blocks_per_group);
> + spin_unlock(sb_bgl_lock(sbi, group_no));
> +}

Far too large to inline.  Please just remove all inlines from the patch. 
The compiler will sort it out.

> +static ext3_grpblk_t
> +bitmap_find_prev_zero_bit(char *map, ext3_grpblk_t start, ext3_grpblk_t 
> lowest)
> +{
> + ext3_grpblk_t k, blk;
> +
> + k = start & ~7;
> + while (lowest <= k) {
> + if (map[k/8] != '\255' &&
> + (blk = ext3_find_next_zero_bit(map, k + 8, k))
> +  < (k + 8))
> + return blk;
> +
> + k -= 8;
> + }
> + return -1;
> +}

Please rename `k' to something meaningful.

The logic in here looks odd.  If (map[k/8] != 0xff) then the
ext3_find_next_zero_bit() cannot fail.  So why do we test for it in this
fashion?

Perhaps some commnetary is needed here.

> +static ext3_grpblk_t
> +bitmap_search_prev_usable_block(ext3_grpblk_t start, struct buffer_head *bh,
> + ext3_grpblk_

Re: [Bug 9692] New: journal_data mount option causes filesystem corruption with blocksize != 4096

2008-01-05 Thread Andrew Morton
On Sat,  5 Jan 2008 09:52:15 -0800 (PST) [EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9692
>
>Summary: journal_data mount option causes filesystem corruption
> with blocksize != 4096
>Product: File System
>Version: 2.5
>  KernelVersion: 2.6.23.9
>   Platform: All
> OS/Version: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: high
>   Priority: P1
>  Component: ext3
> AssignedTo: [EMAIL PROTECTED]
> ReportedBy: [EMAIL PROTECTED]
> 
> 
> Most recent kernel where this bug did not occur: -
> Older kernels have this problem too (I think I noticed this booting >= 2.6.21,
> definitely 2.6.22).

I'm getting the feeling that we should just disable data=journal.  Make it
use data=ordered mode instead.  It isn't getting a lot of attention..

> Distribution: Gentoo Linux x86
> This bug seems to be hardware-independent (tested on three different machines
> which all use quite different drivers). If you need hardware information or 
> any
> other log or configuration files, let me know please.
> 
> Problem Description:
> When creating an ext3 filesystem with journal_data option and block sizes
> different than 4096 (tested: 1024, 2048) filesystem corruption will occur if
> certain operations are performed (see below).
> Corruption will not occur if 4096 block size is used, or if any other block
> size is used together with journal_data_ordered or journal_data_writeback.
> No errors in dmesg.
> 
> Steps to reproduce:
> I found this bug using an audio file tagger, so you need exfalso which is part
> of quodlibet (http://www.sacredchao.net/quodlibet/). No other file tagger I
> used produced this kind of problem. Still, this has to be a kernel problem,
> right??
> 
> 1. Create ext3 file system:
> mkfs.ext3 -O has_journal,dir_index -b 1024 /dev/sdd1
> tune2fs -c 0 -i 0 -m 0 -o journal_data /dev/sdd1
> 
> tune2fs 1.40.3 (05-Dec-2007)  (filtered)
> Filesystem volume name:   
> Last mounted on:  
> Filesystem magic number:  0xEF53
> Filesystem revision #:1 (dynamic)
> Filesystem features:  has_journal resize_inode dir_index filetype
> sparse_super
> Filesystem flags: signed directory hash
> Default mount options:journal_data
> Filesystem state: clean
> Errors behavior:  Continue
> Filesystem OS type:   Linux
> Inode count:  126976
> Block count:  1012060
> Reserved block count: 0
> Free blocks:  976865
> Free inodes:  126965
> First block:  1
> Block size:   1024
> Fragment size:1024
> Reserved GDT blocks:  256
> Blocks per group: 8192
> Fragments per group:  8192
> Inodes per group: 1024
> Inode blocks per group:   128
> Last mount time:  n/a
> Mount count:  0
> Maximum mount count:  -1
> Check interval:   0 ()
> Reserved blocks uid:  0 (user root)
> Reserved blocks gid:  0 (group root)
> First inode:  11
> Inode size:   128
> Journal inode:8
> Default directory hash:   tea
> Journal backup:   inode blocks
> 
> 2. Mount it and copy mp3,ogg,... files to it. This does not cause any file
> system corruption (which you can confirm by running fsck).
> 
> pmount /dev/sdd1:
> /dev/sdd1 on /media/sdd1 type ext3 (rw,noexec,nosuid,nodev,errors=continue)
> 
> 3. Use quodlibet/exfalso to change the id3 tags. Add tags to it if not 
> present,
> or delete them if already present. This will lead to file system corruption.
> 
> brw-r- 1 root disk 8, 49 /dev/sdd1
> 
> 4. Unmount the volume.
> pumount /dev/sdd1
> 
> 5. Run fsck -fvD /dev/sdd1. It will complain about wrong i_size.
> 
> e2fsck 1.40.3 (05-Dec-2007)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 47106, i_size is 5015509, should be 5017600.  Fix? yes
> Inode 47107, i_size is 4657736, should be 4661248.  Fix? yes
> Inode 47109, i_size is 11928555, should be 11931648.  Fix? yes
> Inode 47111, i_size is 5698454, should be 5701632.  Fix? yes
> Inode 47112, i_size is 9384018, should be 9388032.  Fix? yes
> Inode 47114, i_size is 5679228, should be 5681152.  Fix? yes
> Inode 47115, i_size is 6107218, should be 6111232.  Fix? yes
> Inode 47117, i_size is 4354297, should be 4358144.  Fix? yes
> Inode 47118, i_size is 4512286, should be 4513792.  Fix? yes
> Inode 47120, i_size is 7010846, should be 7012352.  Fix? yes
> 
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 3A: Optimizing directories
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> 
> /dev/sdd1: * FILE SYSTEM WAS MODIFIED *
> 
>   28 inodes used (0.02%)
>   14 non-contiguous inodes (50.0%)
>  # of inodes with ind/dind/tind blocks: 15/15/0
>   123417 blocks used (12.19%)
>0 bad blocks
>0 large files
> 
>   16

ext4 still broken on multiple architectures

2007-12-12 Thread Andrew Morton
fs/ext4/mballoc.c: In function 'ext4_mb_generate_buddy':
fs/ext4/mballoc.c:836: error: implicit declaration of function 
'ext2_find_next_bit'

Can someone please get this fixed?
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext[234]: cleanup ext[234]_bg_num_gdb()

2007-12-11 Thread Andrew Morton
On Sun, 9 Dec 2007 00:31:56 +0900 Akinobu Mita <[EMAIL PROTECTED]> wrote:

> Use ext[234]_bg_has_super() to remove duplicate code.

Would prefer one patch per filesystem, please.

After fixing the rejects, I now have patches which affect ext2 and ext3
but which have a merging dependency upon ext4.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 9546] New: Huge latency in concurrent I/O when using data=ordered

2007-12-11 Thread Andrew Morton

(switching to email - please respond via emailed reply-to-all, not via the
bugzilla web interface)

On Tue, 11 Dec 2007 11:36:39 -0800 (PST)
[EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9546
> 
>Summary: Huge latency in concurrent I/O when using data=ordered
>Product: File System
>Version: 2.5
>  KernelVersion: 2.6.24-rc4
>   Platform: All
> OS/Version: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: ext3
> AssignedTo: [EMAIL PROTECTED]
> ReportedBy: [EMAIL PROTECTED]
> 
> 
> Most recent kernel where this bug did not occur:
> Unknown, certainly not a regression, but something specific to ext3 algorithm
> 
> Distribution:
> Bluewhite64 12 (Slackware 12 64 bits port) and Slackware 12
> 
> Hardware Environment:
> Athlon 64 3000+laptop IDE 5400 80GB+1.2GB RAM
> Athlon 64X2 4200+SATA 7200 200GB drive+1GB
> Athlon 2800+IDE 7200 40GB drive+512MB
> 
> Software Environment:
> dd, cp, konqueror/KDE, mount/tune2fs
> 
> Problem Description:
> When the system does heavy input/output operations on big files, small files
> access from other applications are always not served for very long time. This
> can cause huge latencies. The system is really not usable at all, even with 
> all
> the recent improvements done to increase interactivity on desktop.
> 
> This behaviour is very visible with the simple following test case:
> 1. Build a DVD structure from big MPEG+PS files with dvdauthor (it copies the
> files in the DVD stucture, then pass on them to fix VOBUs, but this part is 
> not
> very long so this is not the main problem).
> 2. While the computer is doing this, try to open a web browser such as
> konqueror. Then open a page from bookmark. Then open a new tab, then open
> another page from bookmark. Switch bak to first page.
> 
> What I get is:
> 35 seconds to open Konqueror.
> 8 seconds to open the "bookmark menu". Incredible.
> 30 seconds to open the web page (DSL/10GBits).
> 5 seconds to open the second tab.
> 6 seconds to reopen the menu.
> 36 seconds to open the second page.
> 14 seconds to come back to first tab.
> This is unbelievable! The system is completely trashed, with more than 1GB 
> RAM,
> whatever the hardware configuration is used.
> 
> Of course, I investigated the problem... First, DMA is OK. Second, I thought
> cache would make memory swapped. So I used echo 0 > swapiness. Then (of 
> course,
> the system was not swapping at all), I thought TEXT sections from software
> discarded (that would be simply stupid, but who knows?). I then tried to make
> the writing process throttled with dirty_background_ratio (say 10%) while
> reserving a greater RAM portion for the rest of the system with dirty_ratio
> (say 70%). No way. Then I launched top, and looked at the WCHAN to see what 
> was
> the problem for the frozen process (ie: konqueror). The I saw the faulty guy:
> log_wait_commit!
> 
> So I concluded there is unfair access to the filesystem journal. So I tried
> other journaling options than the default "ordered" data mode. The results 
> were
> really different: "5s, 2s, 4s, etc.", both with journal and write back mode!
> 
> I therefore think there is a great lock and even maybe a priority inversion in
> log_wait_commit of the ext3 filesystem. I think that, even if it is throttled,
> the writing process always get access to the journal in ordered mode, simply
> because it writes many pages at a time and because the ordered mode indeed
> implies... ordering of requests (as I understand it).
> 
> It's sad this is the default option that gives the worst interactivity
> problems. Indeed, this messes all previous work done to enhance desktop
> experience I think, too bad!
> 
> Btw, I've also seen on Internet that some people reported that journal data
> mode gives "better" performance. I think the problem was indeed related to
> latency rather than performance (timing the writing process effectively shows 
> a
> output rate halved with journal data mode, and twice the time to process).
> 
> Steps to reproduce:
> I did a simple script:
> #!/bin/bash
> 
> SRC1=src1.bin
> SRC2=src2.bin
> DEST_DIR=tmpdir
> DST1=dst.bin
> 
> # First, create the source files:
> if [ ! -e $SRC1 ] ; then
> dd if=/dev/zero of=$SRC1 bs=10k count=15
> fi
> if [ ! -e $SRC2 ] ; then
> dd if=/dev/zero of=$SRC2 bs=10k count=15
> fi
> mkdir $DEST_DIR > /dev/null 2>&1
> sync
> 
> # Do the test:
> echo "Trashing the system..."
> rm $DEST_DIR/$DST1 > /dev/null 2>&1
> cp $SRC1 $DEST_DIR/$DST1
> cat $SRC2 >> $DEST_DIR/$DST1
> echo "Done!"
> 
> #rm -rf $DEST_DIR $SRC1 $SRC2
> 
> While running it, try to use "normally" the interactive programs, such as
> konqueror (the program should have to access files, such as cookies, cache and
> so for konqueror). Then remount/tune the filesystem to use another data mode
> for ext3.
> 
> I didn

Re: 2.6.24-rc4-mm1: some issues on sparc64

2007-12-09 Thread Andrew Morton
On Sun, 09 Dec 2007 00:45:17 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Sat, 8 Dec 2007 10:22:39 -0800
> 
> > That's
> > 
> > J_ASSERT_BH(bh, !buffer_jbddirty(bh));
> > 
> > at the end of journal_unmap_buffer().
> > 
> > I don't recall seeing that before and I can't think of anything we've
> > done recently which could cause it, sorry.
> 
> If the per-cpu data patches are in the -mm tree that is the first
> place I would start looking at for possible cause.

They aren't.  The dust hadn't settled enough on those when Christoph shot
through on vacation.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.24-rc4-mm1: some issues on sparc64

2007-12-08 Thread Andrew Morton
On Sat, 8 Dec 2007 19:20:28 +0100 Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:

>   The box is sun ultra 60 (dual sparc64). This was caught when
> system (gentoo) was emerging some package. 
> 
> [27006.402237] kernel BUG at fs/jbd/transaction.c:1894!

That's

J_ASSERT_BH(bh, !buffer_jbddirty(bh));

at the end of journal_unmap_buffer().

I don't recall seeing that before and I can't think of anything we've
done recently which could cause it, sorry.

> [27006.402268]   \|/  \|/
> [27006.402274]   "@'/ .. \`@"
> [27006.402279]   /_| \__/ |_\
> [27006.402285]  \__U_/

x86 needs that.

> [27006.402298] rm(4713): Kernel bad sw trap 5 [#1]
> [27006.402538] TSTATE: 009911009605 TPC: 0053b1cc TNPC: 
> 0053b1d0 Y: Not tainted
> [27006.402579] TPC: 
> [27006.402593] g0: 0002 g1:  g2: 0001 
> g3: f800a7d9
> [27006.402610] g4: f800b54ea460 g5: f8007f832000 g6: f800a7d9 
> g7: 0076d868
> [27006.402627] o0: 0072b660 o1: 0766 o2: 0002 
> o3: 0001
> [27006.402644] o4: 008a2940 o5:  sp: f800a7d92c91 
> ret_pc: 0053b1c4
> [27006.402665] RPC: 
> [27006.402679] l0: f800afbf4070 l1: 0069511c l2: 2000 
> l3: 
> [27006.402696] l4: 0001 l5: f800ba4cb730 l6: f800bf1cd338 
> l7: 0001
> [27006.402713] i0: f800bf1cd000 i1: 000201db2708 i2:  
> i3: 00727000
> [27006.402730] i4: 0020 i5: f800bf1cd028 i6: f800a7d92d51 
> i7: 00529254
> [27006.402763] I7: 
> [27006.402776] Caller[00529254]: ext3_invalidatepage+0x3c/0x60
> [27006.402800] Caller[004b22fc]: do_invalidatepage+0x24/0x60
> [27006.402826] Caller[004b29c4]: truncate_complete_page+0x6c/0x80
> [27006.402849] Caller[004b2a6c]: truncate_inode_pages_range+0x94/0x440
> [27006.402872] Caller[004b2e2c]: truncate_inode_pages+0x14/0x20
> [27006.402894] Caller[00529888]: ext3_delete_inode+0x10/0x160
> [27006.402918] Caller[004e7ca0]: generic_delete_inode+0x88/0x120
> [27006.402949] Caller[004e7e60]: generic_drop_inode+0x128/0x1c0
> [27006.402971] Caller[004e75d4]: iput+0x7c/0xa0
> [27006.402992] Caller[004dd680]: do_unlinkat+0x108/0x1a0
> [27006.403024] Caller[004dd884]: sys_unlinkat+0x2c/0x60
> [27006.403047] Caller[004062d4]: linux_sparc_syscall32+0x3c/0x40
> [27006.403081] Caller[f7e7d0ec]: 0xf7e7d0f4
> [27006.403102] Instruction DUMP: 92102766  7ffbbeaf  90122260 <91d02005> 
> 92102780  7ffbbeab  90122260  91d02005  7ffbbea8
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 9483] circular locking dependency detected

2007-12-04 Thread Andrew Morton
On Tue, 4 Dec 2007 22:25:18 +0100
Ingo Molnar <[EMAIL PROTECTED]> wrote:

> 
> * Aneesh Kumar K.V <[EMAIL PROTECTED]> wrote:
> 
> > ===
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.24-rc3 #6
> > ---
> > bash/2294 is trying to acquire lock:
> > (&journal->j_list_lock){--..}, at: [] 
> > journal_try_to_free_buffers+0x76/0x10c
> > 
> > but task is already holding lock:
> > (inode_lock){--..}, at: [] drop_pagecache+0x48/0xd8
> > 
> > which lock already depends on the new lock.
> 
> Andrew, drop_pagecache() is root-only and it has some known deadlock, 
> right?
> 

yup.   It takes inode_lock at too high a level so it can walk the per-sb inode
lists.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/8] ext4: fix MB_DEBUG format warnings

2007-11-30 Thread Andrew Morton
On Fri, 30 Nov 2007 16:17:32 -0800
Mingming Cao <[EMAIL PROTECTED]> wrote:

> Thanks for resending the patches. The first three patches in this series
> were missed in the ext4 patch queue, the rest of them are already queued
> there.  They are all in ext4 patch queue now. I will check other ext4
> patches you plan to merge to mm tree and get them sync in ext4 tree.
> 
> http://repo.or.cz/w/ext4-patch-queue.git
> git://repo.or.cz/ext4-patch-queue.git

What's all this about?  I've always been pulling
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ext4-patches/LATEST/broken-out/
(and continuously fixing the same rejects each time).


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ext4 still broken on arm (at least)

2007-11-27 Thread Andrew Morton
On Wed, 28 Nov 2007 11:00:20 +0530 "Aneesh Kumar K.V" <[EMAIL PROTECTED]> wrote:

> On Tue, Nov 27, 2007 at 09:14:44PM -0800, Andrew Morton wrote:
> > fs/ext4/mballoc.c: In function `ext4_mb_generate_buddy':
> > fs/ext4/mballoc.c:836: error: implicit declaration of function 
> > `ext2_find_next_bit'
> >  
> 
> 
> I actually sent in a patch which changes asking for review to
> linux-arch. I haven't got the response yet.

Don't expect one...

> Attaching the patch
> below 


> Introduce ext4_find_next_bit
> 
> From: Aneesh Kumar K.V <[EMAIL PROTECTED]>
> 
> This gets used by the ext4 multi block allocator patches.
> 
> Also add generic_find_next_le_bit
> 
> Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
> Cc: 
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
> 
>  include/asm-arm/bitops.h |2 +
>  include/asm-generic/bitops/ext2-non-atomic.h |2 +
>  include/asm-generic/bitops/le.h  |4 ++
>  include/asm-m68k/bitops.h|2 +
>  include/asm-m68knommu/bitops.h   |2 +
>  include/asm-powerpc/bitops.h |4 ++
>  include/asm-s390/bitops.h|2 +
>  include/linux/ext4_fs.h  |1 +
>  lib/find_next_bit.c  |   43 
> ++

May as well merge it I guess.  I'll do that.  Possibly it should be
merged into the ext4 tree instead, so that people can build the ext4 devel
tree on other architectures but obviously there's no call for that.


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ext4 still broken on arm (at least)

2007-11-27 Thread Andrew Morton
fs/ext4/mballoc.c: In function `ext4_mb_generate_buddy':
fs/ext4/mballoc.c:836: error: implicit declaration of function 
`ext2_find_next_bit'
 
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] Add buffer head related helper functions

2007-11-16 Thread Andrew Morton
On Thu, 15 Nov 2007 17:40:43 +0530
"Aneesh Kumar K.V" <[EMAIL PROTECTED]> wrote:

> Add buffer head related helper function
> bh_uptodate_or_lock and bh_submit_read
> which can be used by file system
> 

The patches look sane.

> ---
>  include/linux/buffer_head.h |   29 +
>  1 files changed, 29 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index da0d83f..82cc9ef 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -327,6 +327,35 @@ static inline void lock_buffer(struct buffer_head *bh)
>  
>  extern int __set_page_dirty_buffers(struct page *page);
>  
> +/* Return true if the buffer is up-to-date.
> + * Return false, with the buffer locked, if not.
> + */
> +static inline int bh_uptodate_or_lock(struct buffer_head *bh)
> +{
> + if (!buffer_uptodate(bh)) {
> + lock_buffer(bh);
> + if (!buffer_uptodate(bh))
> + return 0;
> + unlock_buffer(bh);
> + }
> + return 1;
> +}
> +/*
> + * Submit a locked buffer for reading,
> + * return a negative error and release
> + * the buffer if failed.
> + */
> +static inline int bh_submit_read(struct buffer_head *bh)
> +{
> + get_bh(bh);
> + bh->b_end_io = end_buffer_read_sync;
> + submit_bh(READ, bh);
> + wait_on_buffer(bh);
> + if (buffer_uptodate(bh))
> + return 0;
> + brelse(bh);
> + return -EIO;
> +}
>  #else /* CONFIG_BLOCK */

But these are waay too big to be inlined.  Could I ask that they be
turned into regular EXPORT_SYMBOL()ed functions in buffer.c?

Might as well turn the comments into regular kerneldoc format, too.

The function names are a bit awkward, but I can't think of anything better.

I'm surprised that we don't already have a function which does what
bh_submit_read() does.

bh_submit_read() might become a bit simpler if it called ll_rw_block().

I'd have thought that to be more general, bh_submit_read() should return
immediately if the buffer is uptodate.  ll_rw_block() sort of helps there.


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext3,4:fdatasync should skip metadata writeout

2007-11-15 Thread Andrew Morton
On Thu, 15 Nov 2007 22:47:40 -0500 Wendy Cheng <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> 
> >On Fri, 16 Nov 2007 11:47:27 +0900 Hisashi Hifumi <[EMAIL PROTECTED]> wrote:
> >
> >  
> >
> >>Currently fdatasync is identical to fsync in ext3,4.
> >>I think fdatasync should skip journal flush in data=ordered and 
> >>data=writeback mode
> >>because this syscall is not required to synchronize the metadata.
> >>
> >>
> >
> >I suppose so.  Although one wonders what earthly point there is in syncing
> >a file's data if we haven't yet written out the metadata which is required
> >for locating that data.
> >
> >IOW, fdatasync() is only useful if the application knows that it is 
> >overwriting
> >already-instantiated blocks.
> >
> >In which case it might as well have used fsync().  For ext2-style 
> >filesystems,
> >anyway.
> >
> >hm.  It needs some thought.
> >
> >  
> >
> 
> There are non-trivial amount of performance critical programs, 
> particularly in financial application segment ported from legacy UNIX 
> platforms, know the difference between fsync() and fdatasync(). Those 
> can certainly take advantages of this separation. Don't underestimate 
> the talents of these application programmers.
> 

If they're that good, they'll be using sync_file_range() ;)
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext3,4:fdatasync should skip metadata writeout

2007-11-15 Thread Andrew Morton
On Fri, 16 Nov 2007 11:47:27 +0900 Hisashi Hifumi <[EMAIL PROTECTED]> wrote:

> Currently fdatasync is identical to fsync in ext3,4.
> I think fdatasync should skip journal flush in data=ordered and 
> data=writeback mode
> because this syscall is not required to synchronize the metadata.

I suppose so.  Although one wonders what earthly point there is in syncing
a file's data if we haven't yet written out the metadata which is required
for locating that data.

IOW, fdatasync() is only useful if the application knows that it is overwriting
already-instantiated blocks.

In which case it might as well have used fsync().  For ext2-style filesystems,
anyway.

hm.  It needs some thought.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Introduce ext4_find_next_bit

2007-11-13 Thread Andrew Morton
On Wed, 14 Nov 2007 00:41:03 +0530 "Aneesh Kumar K.V" <[EMAIL PROTECTED]> wrote:

> 
> 
> Andrew Morton wrote:
> > On Fri, 21 Sep 2007 10:55:05 +0530 "Aneesh Kumar K.V" <[EMAIL PROTECTED]> 
> > wrote:
> > 
> >> Also add generic_find_next_le_bit
> >>
> >> This gets used by the ext4 multi block allocator patches.
> >>
> > 
> > arm allmodconfig:
> > 
> > fs/ext4/mballoc.c: In function `ext4_mb_generate_buddy':
> > fs/ext4/mballoc.c:836: error: implicit declaration of function 
> > `ext2_find_next_bit'
> > 
> > This patch makes my head spin.
> > 
> > Why did we declare generic_find_next_le_bit() in
> > include/asm-powerpc/bitops.h (wrong) as well as in
> > include/asm-generic/bitops/le.h (presumably correct)?
> > 
> 
> I was following the coding style used for rest of the APIs
> like ext4_set_bit.

Well.  There's quite a bit of cruft in there.  If you do come across
something which isn't right, please do try to find the time to fix it up
first.

That might be non-trivial - powerpc does seem to have gone off on a strange
tangent there.

> 
> > Why is it touching a powerpc file and no any other architectures? 
> > Something screwed up in powerpc land?
> > 
> > And why did arm break?
> 
> arm and below list of arch doesn't include the 
> asm-generic/bitops/ext2-non-atomic.h 
> 
> I did a grep and that list the below architectures as also affected.
> arm, m68k, m68knommu, s390
> 
> > 
> > Shudder.  Anyway, please fix, and if that fix requires that various
> > braindamaged be repaired, please repair the braindamage rather than going
> > along with it.
> > 
> > 
> 
> That should be a separate patch altogether. I wanted to do the cleanup
> along with the usages such as but never got time to do the same.
> 
> #define ocfs2_set_bit ext2_set_bit
> #define udf_set_bit(nr,addr) ext2_set_bit(nr,addr)
> direct usage in mb
> md/bitmap.c +799
> md/dm-log.c +177
> 
> I will send a patch tomorrow that fix  arm and other architectures. I guess 
> the cleanup
> can be a separate patch ? 
> 

Yes, that's a separate work, thanks.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Introduce ext4_find_next_bit

2007-11-12 Thread Andrew Morton
On Fri, 21 Sep 2007 10:55:05 +0530 "Aneesh Kumar K.V" <[EMAIL PROTECTED]> wrote:

> Also add generic_find_next_le_bit
> 
> This gets used by the ext4 multi block allocator patches.
> 

arm allmodconfig:

fs/ext4/mballoc.c: In function `ext4_mb_generate_buddy':
fs/ext4/mballoc.c:836: error: implicit declaration of function 
`ext2_find_next_bit'

This patch makes my head spin.

Why did we declare generic_find_next_le_bit() in
include/asm-powerpc/bitops.h (wrong) as well as in
include/asm-generic/bitops/le.h (presumably correct)?

Why is it touching a powerpc file and no any other architectures? 
Something screwed up in powerpc land?

And why did arm break?

Shudder.  Anyway, please fix, and if that fix requires that various
braindamaged be repaired, please repair the braindamage rather than going
along with it.

Thanks.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 9329] New: ext4: delalloc space accounting problem drops data

2007-11-08 Thread Andrew Morton
> On Thu,  8 Nov 2007 09:42:10 -0800 (PST) [EMAIL PROTECTED] wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9329
> 
>Summary: ext4: delalloc space accounting problem drops data
>Product: File System
>Version: 2.5
>  KernelVersion: 2.6.24-rc1
>   Platform: All
> OS/Version: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: ext4
> AssignedTo: [EMAIL PROTECTED]
> ReportedBy: [EMAIL PROTECTED]
> 
> 
> 2.6.24-rc1 + ext4 git patch queue from last week or so.
> 
> It appears that delalloc does not track used space properly, and fails to
> return ENOSPC as appropriate:
> 
> [EMAIL PROTECTED] ~]# mkfs.ext3 -I 256 /dev/sdb7 32768
> [EMAIL PROTECTED] ~]# mount -t ext4dev -o 
> data=writeback,delalloc,extents,mballoc
> /dev/sdb7 /mnt/test
> [EMAIL PROTECTED] ~]# df -h /mnt/test
> FilesystemSize  Used Avail Use% Mounted on
> /dev/sdb7  30M  4.5M   24M  16% /mnt/test
> [EMAIL PROTECTED] ~]# du -h /tmp/1Mfile 
> 1.1M/tmp/1Mfile
> [EMAIL PROTECTED] ~]# for I in `seq 1 50`; do cp /tmp/1Mfile 
> /mnt/test/1Mfile-$I;
> done
> [EMAIL PROTECTED] ~]# df -h /mnt/test
> FilesystemSize  Used Avail Use% Mounted on
> /dev/sdb7  30M   30M 0 100% /mnt/test
> 
> all resulting files are 1M in length:
> [EMAIL PROTECTED] ~]# ls -l /mnt/test/1M* | grep -v 1048576
> [EMAIL PROTECTED] ~]# ls -l /mnt/test/1M* | grep 1048576 | wc -l
> 50
> but many of them have silently dropped data on the floor:
> [EMAIL PROTECTED] ~]# du -hc /mnt/test/1Mfile-* | grep -v "1.0M"
> 596K/mnt/test/1Mfile-26
> 0   /mnt/test/1Mfile-27
> 0   /mnt/test/1Mfile-28
> 0   /mnt/test/1Mfile-29
> 0   /mnt/test/1Mfile-30
> 
> 
> When mounted with nodelalloc, I get proper behavior:
> 
> [EMAIL PROTECTED] ~]# for I in `seq 1 50`; do cp /tmp/1Mfile 
> /mnt/test/1Mfile-$I;
> done
> cp: writing `/mnt/test/1Mfile-26': No space left on device
> cp: writing `/mnt/test/1Mfile-27': No space left on device
> cp: writing `/mnt/test/1Mfile-28': No space left on device
> cp: writing `/mnt/test/1Mfile-29': No space left on device
> 
> 
> 
> -- 
> Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
> --- You are receiving this mail because: ---
> You are on the CC list for the bug, or are watching someone who is.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] ext4 update

2007-10-25 Thread Andrew Morton
On Thu, 25 Oct 2007 16:44:21 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Thu, 25 Oct 2007, Andrew Morton wrote:
> > 
> > There shouldn't have been conflicts here - if there were I wouldn't have
> > sent those patches.  Unless there were things in the ext4 pull which
> > weren't present in the ext4 quilt tree which I included in 2.6.23-mm1?
> 
> Well, you merge your patch-series by patching.
> 
> You should have noticed by now that GNU patch in particular will happily 
> apply a patch whether it conflicts or not. So it's entirely possible that 
> it didn't conflict for you, but applied cleanly and sanely.
> 

hrm, could be.  It would be strange for that to happen quietly with fuzz=1
and to still produce a compileable result.

But there weren't any patches in this git-merge which weren't in 2.6.23-mm1
so maybe something like that happened.  Or maybe that fact that this pull
only contained _some_ of the ext4 patches which were in -mm somehow affected
things.

Oh well, I should have sent the ext4 changes via Ted anyway.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] ext4 update

2007-10-25 Thread Andrew Morton
On Wed, 17 Oct 2007 08:59:53 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Wed, 17 Oct 2007, Theodore Ts'o wrote:
> > 
> > Please pull from:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git 
> > for_linus
> 
> This conflicts in nontrivial ways with
> 
>   commit 7c9e69faa28027913ee059c285a5ea8382e24b5d
>   Author: Aneesh Kumar K.V <[EMAIL PROTECTED]>
>   Date:   Tue Oct 16 23:27:02 2007 -0700
> 
>   ext2/ext3/ext4: add block bitmap validation
> 
> which I just merged from -mm.



There shouldn't have been conflicts here - if there were I wouldn't have
sent those patches.  Unless there were things in the ext4 pull which
weren't present in the ext4 quilt tree which I included in 2.6.23-mm1?

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -mm] Split fs/Kconfig: ext[234]

2007-10-22 Thread Andrew Morton
On Sat, 6 Oct 2007 12:15:08 +0400
Alexey Dobriyan <[EMAIL PROTECTED]> wrote:

> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
> ---
> 
>  fs/Kconfig  |  191 --
>  fs/ext2/Kconfig |   55 +
>  fs/ext3/Kconfig |   67 
>  fs/ext4/Kconfig |   65 +++
>  4 files changed, 190 insertions(+), 188 deletions(-)

A reasonable thing to do, but poorly timed.  I'd prefer not to carry a
patch like this through two months of development, please.

Late in the -rc timeframe would be a better time.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ext2: Avoid rec_len overflow with 64KB block size

2007-10-18 Thread Andrew Morton
On Thu, 18 Oct 2007 02:03:39 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> 
wrote:

> On Wed, 17 Oct 2007, Andrew Morton wrote:
> 
> > b) what happens when an old ext2 driver tries to read and/or write this
> >directory entry?  Do we need a compat flag for it?
> 
> Old ext2 only supports up to 4k
> 
> include/linux/ext2_fs.h:
> 
> #define EXT2_MIN_BLOCK_SIZE 1024
> #define EXT2_MAX_BLOCK_SIZE 4096
> #define EXT2_MIN_BLOCK_LOG_SIZE   10
> 
> Should fail to mount the volume since the block size is too large.

should, but does it?

box:/usr/src/25> grep MAX_BLOCK_SIZE fs/ext2/*.[ch] include/linux/ext2*
include/linux/ext2_fs.h:#define EXT2_MAX_BLOCK_SIZE 4096
box:/usr/src/25> 

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ext2: Avoid rec_len overflow with 64KB block size

2007-10-17 Thread Andrew Morton
On Thu, 11 Oct 2007 13:18:49 +0200 Jan Kara <[EMAIL PROTECTED]> wrote:

> With 64KB blocksize, a directory entry can have size 64KB which does not fit
> into 16 bits we have for entry lenght. So we store 0x instead and convert
> value when read from / written to disk.

btw, this changes ext2's on-disk format.

a) is the ext2 format documented anywhere?  If so, that document will
   need updating.

b) what happens when an old ext2 driver tries to read and/or write this
   directory entry?  Do we need a compat flag for it?

c) what happens when old and new ext3 or ext4 try to read/write this
   directory entry?


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ext2: Avoid rec_len overflow with 64KB block size

2007-10-17 Thread Andrew Morton
On Thu, 11 Oct 2007 13:18:49 +0200 Jan Kara <[EMAIL PROTECTED]> wrote:

> +static inline __le16 ext2_rec_len_to_disk(unsigned len)
> +{
> + if (len == (1 << 16))
> + return cpu_to_le16(EXT2_MAX_REC_LEN);
> + else if (len > (1 << 16))
> + BUG();
> + return cpu_to_le16(len);
> +}

Of course, ext2 shouldn't be trying to write a bad record length into a
directory entry.  But are we sure that there is no way in which this
situation could occur is the on-disk data was _already_ bad?

Because it is very bad for a fileysstem to go BUG in response to unexpected
data on the disk.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext2 statfs improvement for block and inode free count

2007-10-16 Thread Andrew Morton
On Fri, 13 Jul 2007 18:36:54 -0700
Badari Pulavarty <[EMAIL PROTECTED]> wrote:

> 
> More statfs() improvements for ext2. ext2 already maintains
> percpu counters for free blocks and inodes. Derive free
> block count and inode count by summing up percpu counters,
> instead of counting up all the groups in the filesystem
> each time.
> 
> 
> Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> Acked-by: Andreas Dilger <[EMAIL PROTECTED]>
> 
>  fs/ext2/super.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.22/fs/ext2/super.c
> ===
> --- linux-2.6.22.orig/fs/ext2/super.c 2007-07-13 20:06:38.0 -0700
> +++ linux-2.6.22/fs/ext2/super.c  2007-07-13 20:06:51.0 -0700
> @@ -1136,12 +1136,12 @@ static int ext2_statfs (struct dentry * 
>   buf->f_type = EXT2_SUPER_MAGIC;
>   buf->f_bsize = sb->s_blocksize;
>   buf->f_blocks = le32_to_cpu(es->s_blocks_count) - overhead;
> - buf->f_bfree = ext2_count_free_blocks(sb);
> + buf->f_bfree = percpu_counter_sum(&sbi->s_freeblocks_counter);
>   buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count);
>   if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count))
>   buf->f_bavail = 0;
>   buf->f_files = le32_to_cpu(es->s_inodes_count);
> - buf->f_ffree = ext2_count_free_inodes(sb);
> + buf->f_ffree = percpu_counter_sum(&sbi->s_freeinodes_counter);
>   buf->f_namelen = EXT2_NAME_LEN;
>   fsid = le64_to_cpup((void *)es->s_uuid) ^
>  le64_to_cpup((void *)es->s_uuid + sizeof(u64));
> 

Guys, I have this patch in a stalled state pending some convincing
demonstration/argument that it's actually a worthwhile change.

How much hurt will it cause on large-numa, and how much benefit do we get
for that hurt? 
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ext2: Avoid rec_len overflow with 64KB block size

2007-10-11 Thread Andrew Morton
On Thu, 11 Oct 2007 12:30:03 +0200 Jan Kara <[EMAIL PROTECTED]> wrote:

> On Thu 04-10-07 16:11:21, Andrew Morton wrote:
> > On Thu, 4 Oct 2007 16:40:44 -0600
> > Andreas Dilger <[EMAIL PROTECTED]> wrote:
> > 
> > > On Oct 04, 2007  13:12 -0700, Andrew Morton wrote:
> > > > On Mon, 01 Oct 2007 17:35:46 -0700
> > > > > ext2: Avoid rec_len overflow with 64KB block size
> > > > > 
> > > > > into 16 bits we have for entry lenght. So we store 0x instead and
> > > > > convert value when read from / written to disk.
> > > > 
> > > > This patch clashes in non-trivial ways with
> > > > ext2-convert-to-new-aops-fix.patch and perhaps other things which are
> > > > already queued for 2.6.24 inclusion, so I'll need to ask for an updated
> > > > patch, please.
> > > 
> > > If the rel_len overflow patch isn't going to make it, then we also need
> > > to revert the EXT*_MAX_BLOCK_SIZE change to 65536.  It would be possible
> > > to allow this to be up to 32768 w/o the rec_len overflow fix however.
> > > 
> > 
> > Ok, thanks, I dropped ext3-support-large-blocksize-up-to-pagesize.patch and
> > ext2-support-large-blocksize-up-to-pagesize.patch.
>   Sorry, for the delayed answer but I had some urgent bugs to fix...

You exceeded my memory span.

> Why did you drom ext3-support-large-blocksize-up-to-pagesize.patch?

I forget.  I'll bring it back and see what happens.

> As far
> as I understand your previous email (and also as I've checked against
> 2.6.23-rc8-mm2), the patch fixing rec_len overflow clashes only for ext2...
>   I'll send you an updated patch for ext2 in a moment.

ok..  I'm basically not applying anything any more - the whole thing
is a teetering wreck.   I need to go through the input queue delicately
adding things which look important or relatively non-injurious.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ext2: Avoid rec_len overflow with 64KB block size

2007-10-04 Thread Andrew Morton
On Thu, 4 Oct 2007 16:40:44 -0600
Andreas Dilger <[EMAIL PROTECTED]> wrote:

> On Oct 04, 2007  13:12 -0700, Andrew Morton wrote:
> > On Mon, 01 Oct 2007 17:35:46 -0700
> > > ext2: Avoid rec_len overflow with 64KB block size
> > > 
> > > into 16 bits we have for entry lenght. So we store 0x instead and
> > > convert value when read from / written to disk.
> > 
> > This patch clashes in non-trivial ways with
> > ext2-convert-to-new-aops-fix.patch and perhaps other things which are
> > already queued for 2.6.24 inclusion, so I'll need to ask for an updated
> > patch, please.
> 
> If the rel_len overflow patch isn't going to make it, then we also need
> to revert the EXT*_MAX_BLOCK_SIZE change to 65536.  It would be possible
> to allow this to be up to 32768 w/o the rec_len overflow fix however.
> 

Ok, thanks, I dropped ext3-support-large-blocksize-up-to-pagesize.patch and
ext2-support-large-blocksize-up-to-pagesize.patch.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ext2: Avoid rec_len overflow with 64KB block size

2007-10-04 Thread Andrew Morton
On Mon, 01 Oct 2007 17:35:46 -0700
Mingming Cao <[EMAIL PROTECTED]> wrote:

> ext2: Avoid rec_len overflow with 64KB block size
> 
> From: Jan Kara <[EMAIL PROTECTED]>
> 
> With 64KB blocksize, a directory entry can have size 64KB which does not fit
> into 16 bits we have for entry lenght. So we store 0x instead and convert
> value when read from / written to disk.

This patch clashes in non-trivial ways with
ext2-convert-to-new-aops-fix.patch and perhaps other things which are
already queued for 2.6.24 inclusion, so I'll need to ask for an updated
patch, please.

Also, I'm planing on merging the ext2 reservations code into 2.6.24, so if
we're aiming for complete support of 64k blocksize in 2.6.24's ext2,
additional testing and checking will be needed.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: new mballoc patches.

2007-09-26 Thread Andrew Morton
On Tue, 11 Sep 2007 13:59:26 +0530 "Aneesh Kumar K.V" <[EMAIL PROTECTED]> wrote:

> I have updated the mballoc patches.

Has anyone reviewed this stuff?  I don't see much evidence of it here?

Just a quick scan shows up heavy over-inlining, many
macros-which-should-be-functions and numerous needlessly global symbols. 
Whether there are more substantial problems in there I have not the time to
tell.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jbd : config_jbd_debug cannot create /proc entry

2007-09-26 Thread Andrew Morton
On Tue, 25 Sep 2007 16:36:08 +0200
Jan Kara <[EMAIL PROTECTED]> wrote:

> > On Tue, 25 Sep 2007 07:49:38 -0500
> > "Jose R. Santos" <[EMAIL PROTECTED]> wrote:
> > 
> > > On Tue, 25 Sep 2007 13:50:46 +0200
> > > Jan Kara <[EMAIL PROTECTED]> wrote:
> > > > > Jan Kara wrote:
> > > > > >>
> > > > > >-#define create_jbd_proc_entry() do {} while (0)
> > > > > >-#define remove_jbd_proc_entry() do {} while (0)
> > > > > >+static ctl_table fs_table[] = {
> > > > > >+{
> > > > > >+.ctl_name   = -1,   /* Don't want it */
> > > > > 
> > > > > 
> > > > > 
> > > > > shouldn't this be CTL_UNNUMBERED ?
> > > >   Oh, it should be. I didn't notice we have this :) Thanks for notifying
> > > > me. Attached is a fixed version.
> > > 
> > > This was fixed in JBD2 by moving the jbd-debug file to debugfs:
> > > http://lkml.org/lkml/2007/7/11/334
> > > 
> > > Since this code is already in the kernel, we should keep it consistent. 
> > > 
> > 
> > OK.  Here's a quick patch to fix this.  Adapted from the JBD2 patch.
> > Let me know what you think.
>   Looks fine - exactly what I've just done here :).

hm.  I found rather a lot of issues.  If this patch is derived from the
JBD2 patch then perhaps the JBD2 patch needs some looking at.

> > Signed-off-by: Jose R. Santos <[EMAIL PROTECTED]>
>   You can add Signed-off-by: Jan Kara <[EMAIL PROTECTED]>

I suspect you might be getting your signed-off-bys and acked-bys mixed up. 
(If not this patch, then the previous one).  Please see
Documentation/SubmittingPatches section 13 for the difference.

Jose, please review and if possible runtime test these proposed changes?



From: Andrew Morton <[EMAIL PROTECTED]>

- use `#ifdef foo' instead of `#if defined(foo)'

- CONFIG_JBD_DEBUG depends on CONFIG_DEBUG_FS so we don't need to duplicate
  that logic in the .c file ifdefs

- Make journal_enable_debug __read_mostly just for the heck of it

- Make jbd_debugfs_dir and jbd_debug static

- debugfs_remove(NULL) is legal: remove unneeded tests

- jbd_create_debugfs_entry is a better name than create_jbd_debugfs_entry

- ditto remove_jbd_debugfs_entry

- C functions are preferred over macros

Cc: "Jose R. Santos" <[EMAIL PROTECTED]>
Cc: 
Cc: Jan Kara <[EMAIL PROTECTED]>
Cc: Jose R. Santos <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---


diff -puN fs/jbd/journal.c~jbd-config_jbd_debug-cannot-create-proc-entry-fix 
fs/jbd/journal.c
--- a/fs/jbd/journal.c~jbd-config_jbd_debug-cannot-create-proc-entry-fix
+++ a/fs/jbd/journal.c
@@ -1853,16 +1853,15 @@ void journal_put_journal_head(struct jou
 /*
  * debugfs tunables
  */
-#if defined(CONFIG_JBD_DEBUG)
-u8 journal_enable_debug;
-EXPORT_SYMBOL(journal_enable_debug);
-#endif
+#ifdef CONFIG_JBD_DEBUG
 
-#if defined(CONFIG_JBD_DEBUG) && defined(CONFIG_DEBUG_FS)
+u8 journal_enable_debug __read_mostly;
+EXPORT_SYMBOL(journal_enable_debug);
 
-struct dentry  *jbd_debugfs_dir, *jbd_debug;
+static struct dentry *jbd_debugfs_dir;
+static struct dentry *jbd_debug;
 
-static void __init create_jbd_debugfs_entry(void)
+static void __init jbd_create_debugfs_entry(void)
 {
jbd_debugfs_dir = debugfs_create_dir("jbd", NULL);
if (jbd_debugfs_dir)
@@ -1871,18 +1870,21 @@ static void __init create_jbd_debugfs_en
   &journal_enable_debug);
 }
 
-static void __exit remove_jbd_debugfs_entry(void)
+static void __exit jbd_remove_debugfs_entry(void)
 {
-   if (jbd_debug)
-   debugfs_remove(jbd_debug);
-   if (jbd_debugfs_dir)
-   debugfs_remove(jbd_debugfs_dir);
+   debugfs_remove(jbd_debug);
+   debugfs_remove(jbd_debugfs_dir);
 }
 
 #else
 
-#define create_jbd_debugfs_entry() do {} while (0)
-#define remove_jbd_debugfs_entry() do {} while (0)
+static inline void jbd_create_debugfs_entry(void)
+{
+}
+
+static inline void jbd_remove_debugfs_entry(void)
+{
+}
 
 #endif
 
@@ -1940,7 +1942,7 @@ static int __init journal_init(void)
ret = journal_init_caches();
if (ret != 0)
journal_destroy_caches();
-   create_jbd_debugfs_entry();
+   jbd_create_debugfs_entry();
return ret;
 }
 
@@ -1951,7 +1953,7 @@ static void __exit journal_exit(void)
if (n)
printk(KERN_EMERG "JBD: leaked %d journal_heads!\n", n);
 #endif
-   remove_jbd_debugfs_entry();
+   jbd_remove_debugfs_entry();
journal_destroy_caches();
 }
 
_

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD/ext34 cleanups: convert to kzalloc

2007-09-26 Thread Andrew Morton
On Fri, 21 Sep 2007 16:13:56 -0700
Mingming Cao <[EMAIL PROTECTED]> wrote:

> Convert kmalloc to kzalloc() and get rid of the memset().

I split this into separate ext3/jbd and ext4/jbd2 patches.  It's generally
better to raise separate patches, please - the ext3 patches I'll merge
directly but the ext4 patches should go through (and be against) the ext4
devel tree.

I fixed lots of rejects against the already-pending changes to these
filesystems.

You forgot to remove the memsets in both start_this_handle()s.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc6: hanging ext3 dbench tests

2007-09-24 Thread Andrew Morton
On Mon, 24 Sep 2007 11:01:58 -0700
Badari Pulavarty <[EMAIL PROTECTED]> wrote:

> Hi Andy,
> 
> I managed to reproduce the dbench problem. (not sure if its the same
> thing or not - but symptoms are same). My problem has nothing to do 
> with ext3. I can produce it on ext2, jfs also.
> 
> Whats happening on my machine is ..
> 
> dbench forks of 4 children and sends them a signal to start the work.
> 3 out of 4 children gets the signal and does the work. One of the child
> never gets the signal so, it waits forever in pause(). So, parent waits
> for a longtime to kill it.
> 
> BTW, I was trying to find out when this problem started showing up.
> So far, I managed to track it to 2.6.23-rc4. (2.6.23-rc3 doesn't seem
> to have this problem). I am going to do bi-sect and find out which
> patch caused this.
> 
> I am using dbench-2.0 which consistently reproduces the problem on
> my x86-64 box. Did you find anything new with your setup ?
> 

Thanks, Badari.  That sounds like a bit of a showstopper for 2.6.23.
Michal, do you have this regression in the dirt file?
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ext4: Uninitialized Block Groups

2007-09-20 Thread Andrew Morton
On Tue, 18 Sep 2007 17:25:31 -0700
Avantika Mathur <[EMAIL PROTECTED]> wrote:

> In pass1 of e2fsck, every inode table in the fileystem is scanned and 
> checked, 
> regardless of whether it is in use.  This is this the most time consuming 
> part 
> of the filesystem check.  The unintialized block group feature can greatly 
> reduce e2fsck time by eliminating checking of uninitialized inodes.  
> 
> With this feature, there is a a high water mark of used inodes for each block 
> group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
> group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
> of each group descriptor is used to ensure that corruption in the group
> descriptor's bit flags does not cause incorrect operation.

This needed a few fixups due to conflicts with
ext2-ext3-ext4-add-block-bitmap-validation.patch but they were pretty
straightforward.  Please check that the result is OK.


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD: use GFP_NOFS in kmalloc

2007-09-19 Thread Andrew Morton
On Wed, 19 Sep 2007 12:22:09 -0700
Mingming Cao <[EMAIL PROTECTED]> wrote:

> Convert the GFP_KERNEL flag used in JBD/JBD2 to GFP_NOFS, consistent
> with the rest of kmalloc flag used in the JBD/JBD2 layer.
> 
> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
> 
> ---
>  fs/jbd/journal.c  |6 +++---
>  fs/jbd/revoke.c   |8 
>  fs/jbd2/journal.c |6 +++---
>  fs/jbd2/revoke.c  |8 
>  4 files changed, 14 insertions(+), 14 deletions(-)
> 
> Index: linux-2.6.23-rc6/fs/jbd/journal.c
> ===
> --- linux-2.6.23-rc6.orig/fs/jbd/journal.c2007-09-19 11:51:10.0 
> -0700
> +++ linux-2.6.23-rc6/fs/jbd/journal.c 2007-09-19 11:51:57.0 -0700
> @@ -653,7 +653,7 @@ static journal_t * journal_init_common (
>   journal_t *journal;
>   int err;
>  
> - journal = kmalloc(sizeof(*journal), GFP_KERNEL);
> + journal = kmalloc(sizeof(*journal), GFP_NOFS);
>   if (!journal)
>   goto fail;
>   memset(journal, 0, sizeof(*journal));
> @@ -723,7 +723,7 @@ journal_t * journal_init_dev(struct bloc
>   journal->j_blocksize = blocksize;
>   n = journal->j_blocksize / sizeof(journal_block_tag_t);
>   journal->j_wbufsize = n;
> - journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
> + journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
>   if (!journal->j_wbuf) {
>   printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
>   __FUNCTION__);
> @@ -777,7 +777,7 @@ journal_t * journal_init_inode (struct i
>   /* journal descriptor can store up to n blocks -bzzz */
>   n = journal->j_blocksize / sizeof(journal_block_tag_t);
>   journal->j_wbufsize = n;
> - journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
> + journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
>   if (!journal->j_wbuf) {
>   printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
>   __FUNCTION__);
> Index: linux-2.6.23-rc6/fs/jbd/revoke.c
> ===
> --- linux-2.6.23-rc6.orig/fs/jbd/revoke.c 2007-09-19 11:51:30.0 
> -0700
> +++ linux-2.6.23-rc6/fs/jbd/revoke.c  2007-09-19 11:52:34.0 -0700
> @@ -206,7 +206,7 @@ int journal_init_revoke(journal_t *journ
>   while((tmp >>= 1UL) != 0UL)
>   shift++;
>  
> - journal->j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, 
> GFP_KERNEL);
> + journal->j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, 
> GFP_NOFS);
>   if (!journal->j_revoke_table[0])
>   return -ENOMEM;
>   journal->j_revoke = journal->j_revoke_table[0];
> @@ -219,7 +219,7 @@ int journal_init_revoke(journal_t *journ
>   journal->j_revoke->hash_shift = shift;
>  
>   journal->j_revoke->hash_table =
> - kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
> + kmalloc(hash_size * sizeof(struct list_head), GFP_NOFS);
>   if (!journal->j_revoke->hash_table) {
>   kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
>   journal->j_revoke = NULL;
> @@ -229,7 +229,7 @@ int journal_init_revoke(journal_t *journ
>   for (tmp = 0; tmp < hash_size; tmp++)
>   INIT_LIST_HEAD(&journal->j_revoke->hash_table[tmp]);
>  
> - journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, 
> GFP_KERNEL);
> + journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, 
> GFP_NOFS);
>   if (!journal->j_revoke_table[1]) {
>   kfree(journal->j_revoke_table[0]->hash_table);
>   kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
> @@ -246,7 +246,7 @@ int journal_init_revoke(journal_t *journ
>   journal->j_revoke->hash_shift = shift;
>  
>   journal->j_revoke->hash_table =
> - kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
> + kmalloc(hash_size * sizeof(struct list_head), GFP_NOFS);
>   if (!journal->j_revoke->hash_table) {
>   kfree(journal->j_revoke_table[0]->hash_table);
>   kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);

These were all OK using GFP_KERNEL.

GFP_NOFS should only be used when the caller is holding some fs locks which
might cause a deadlock if that caller reentered the fs in ->writepage (and
maybe put_inode and such).  That isn't the case in any of the above code,
which is all mount time stuff (I think).

ext3/4 should be using GFP_NOFS when the caller has a transaction open, has
a page locked, is holding i_mutex, etc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ext4: Uninitialized Block Groups

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur <[EMAIL PROTECTED]> wrote:

> +
> +__u16 crc16(__u16 crc, __u8 const *buffer, size_t len)

And is we really really have to do this, then the ext4-private crc16() 
should have static scope.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ext4: Uninitialized Block Groups

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur <[EMAIL PROTECTED]> wrote:

> +#if !defined(CONFIG_CRC16)
> +/** CRC table for the CRC-16. The poly is 0x8005 (x16 + x15 + x2 + 1) */
> +__u16 const crc16_table[256] = {
> + 0x, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241,
> + 0xC601, 0x06C0, 0x0780, 0xC741, 0x0500, 0xC5C1, 0xC481, 0x0440,
> + 0xCC01, 0x0CC0, 0x0D80, 0xCD41, 0x0F00, 0xCFC1, 0xCE81, 0x0E40,
> + 0x0A00, 0xCAC1, 0xCB81, 0x0B40, 0xC901, 0x09C0, 0x0880, 0xC841,
> + 0xD801, 0x18C0, 0x1980, 0xD941, 0x1B00, 0xDBC1, 0xDA81, 0x1A40,
> + 0x1E00, 0xDEC1, 0xDF81, 0x1F40, 0xDD01, 0x1DC0, 0x1C80, 0xDC41,

That's rather sad.  A plain old "depends on" would be better.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 18:00:01 -0700 Mingming Cao <[EMAIL PROTECTED]> wrote:

> JBD: Replace slab allocations with page cache allocations
> 
> JBD allocate memory for committed_data and frozen_data from slab. However
> JBD should not pass slab pages down to the block layer. Use page allocator 
> pages instead. This will also prepare JBD for the large blocksize patchset.
> 
> 
> Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly

__GFP_NOFAIL should only be used when we have no way of recovering
from failure.  The allocation in journal_init_common() (at least)
_can_ recover and hence really shouldn't be using __GFP_NOFAIL.

(Actually, nothing in the kernel should be using __GFP_NOFAIL.  It is 
there as a marker which says "we really shouldn't be doing this but
we don't know how to fix it").

So sometime it'd be good if you could review all the __GFP_NOFAILs in
there and see if we can remove some, thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext34: ensure do_split leaves enough free space in both blocks

2007-09-17 Thread Andrew Morton
On Mon, 17 Sep 2007 12:21:40 -0500
Eric Sandeen <[EMAIL PROTECTED]> wrote:

> Eric Sandeen wrote:
> > The do_split() function for htree dir blocks is intended to split a
> > leaf block to make room for a new entry.  It sorts the entries in the
> > original block by hash value, then moves the last half of the entries to 
> > the new block - without accounting for how much space this actually moves.  
> > (IOW, it moves half of the entry *count* not half of the entry *space*).
> > If by chance we have both large & small entries, and we move only the 
> > smallest entries, and we have a large new entry to insert, we may not have 
> > created enough space for it.
> 
> (btw, the upshot of this is that in add_dirent_to_buf(),
> memcpy(de->name, name, namelen) will overshoot the buffer and actually
> corrupt memory.)

Nice!

So this looks like 2.6.22 and 2.6.23 material, but the timing is getting
pretty squeezy.  Could people please give this change an extra-close
review, let me know?

Thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)

2007-08-17 Thread Andrew Morton
On Fri, 17 Aug 2007 12:36:32 +0400 Alex Tomas <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > Sort-of.  But the per-superpblock, per-inode writeback code is pretty
> > careful to avoid livelocks.  The per-inode writeback is a strict single
> > linear sweep across the file.  It'll basically write out anything which was
> > dirty when it was called.  The per-superblock inode walk isn't as accurate
> > as that, becuase of the difficulties of juggling list_heads.  But we're
> > slowly working on that, and I suspect it'll be ggod enough for ext3
> > purposes already.
> 
> I'd say that these are two different mechanism solving different problems:
> 1) VFS/MM does periodic updates and uses regular writeback
> 2) data=ordered is to avoid metadata pointing to not-written-yet data

VFS/MM can do _much_ more than that!  Look at struct writeback_control.

That code path has many different modes of operation: it is used for
regular pdflush writeback, sync, fsync, throttling, etc.  Probably one of
its modes will be sufficient.  If we want to change ext3's existing
semantics and add an "only writeback uninitialised blocks" mode then
that'll be pretty straightforward: add more control information to
writeback_control and go for it.

> we can't use regular writeback in commit thread as long as it can fall into
> allocation. so, we'd have to add one more WB mode (btw, i have a patch which
> skips non-allocated blocks in writeback if special WB mode is requested).

yup

> OTOH, the faster we go through data sync part of commit, the better. given
> that lots of inodes can be dirty with no data to sync, it's going to take
> long in some cases. it's especially bad because commit doesn't scale to many
> CPUs.

eh?

> also, why would we need to flush *everything* every 5s? just because ext3 does
> this? sounds strange. if somebody really need this we could add this 
> possibility
> to regular writeback path (making it tunable). but I'd rather prefer to have
> a separate (fast, lightweight, scalable) mechanism to support data=ordered.
> 

Yeah, that would make sense, perhaps.

Or just speed the existing stuff up.  iirc the main problem in there is 
unrelated
to data writeback.  There are situations where the running transaction has to 
block
behind metadata writeout which the committing transaction is performing.  I
reluctantly put that in years ago to get us out of a tight spot and it
never got optimised.


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)

2007-08-16 Thread Andrew Morton
On Fri, 17 Aug 2007 06:24:47 +0400 Alex Tomas <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Thu, 16 Aug 2007 22:20:06 +0400
> > Alex Tomas <[EMAIL PROTECTED]> wrote:
> > 
> >> Andrew Morton wrote:
> >>>>> But under this proposal, t_sync_datalist just gets removed: the new
> >>>>> ordered-data mode _only_ need to do the sb->inode->page walk.  So if I'm
> >>>>> understanding you, the way in which we'd handle any such race is to make
> >>>>> kjournald's writeback of the dirty pages block in lock_page().  Once it
> >>>>> gets the page lock it can look to see if some other thread has mapped 
> >>>>> the
> >>>>> page to disk.
> >>>> if I'm right holding number of pages locked, then they won't be locked, 
> >>>> but
> >>>> writeback. of course kjournald can block on writeback as well, but how 
> >>>> does
> >>>> it find pages with *newly allocated* blocks only?
> >>> I don't think we'd want kjournald to do that.  Even if a page was dirtied
> >>> by an overwrite, we'd want to write it back during commit, just from a
> >>> quality-of-implementation point of view.  If we were to leave these pages
> >>> unwritten during commit then a post-recovery file could have a mix of
> >>> up-to-five-second-old data and up-to-30-seconds-old data.
> >> trying to implement this I've got to think that there is one significant
> >> difference between t_sync_datalist and sb->inode->page walk: 
> >> t_sync_datalist
> >> is per-transaction. IOW, it doesn't change once transaction is closed. in
> >> contrast, nothing (currently) would prevent others to modify pages while
> >> commit is in progress.
> > 
> > That can happen at present - there's nothing to stop a process from 
> > modifying
> > a page which is undergoing ordered-data commit-time writeout.
> 
> I tend to think it's still a bit different: set of pages doesn't change with
> t_sync_datalist. with sb->inode->page approach even silly dd will be able to
> *add* a bunch of new pages while we're syncing first ones. why shouldn't we
> fix this?
> 

Sort-of.  But the per-superpblock, per-inode writeback code is pretty
careful to avoid livelocks.  The per-inode writeback is a strict single
linear sweep across the file.  It'll basically write out anything which was
dirty when it was called.  The per-superblock inode walk isn't as accurate
as that, becuase of the difficulties of juggling list_heads.  But we're
slowly working on that, and I suspect it'll be ggod enough for ext3
purposes already.


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)

2007-08-16 Thread Andrew Morton
On Thu, 16 Aug 2007 22:20:06 +0400
Alex Tomas <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> >>> But under this proposal, t_sync_datalist just gets removed: the new
> >>> ordered-data mode _only_ need to do the sb->inode->page walk.  So if I'm
> >>> understanding you, the way in which we'd handle any such race is to make
> >>> kjournald's writeback of the dirty pages block in lock_page().  Once it
> >>> gets the page lock it can look to see if some other thread has mapped the
> >>> page to disk.
> >> if I'm right holding number of pages locked, then they won't be locked, but
> >> writeback. of course kjournald can block on writeback as well, but how does
> >> it find pages with *newly allocated* blocks only?
> > 
> > I don't think we'd want kjournald to do that.  Even if a page was dirtied
> > by an overwrite, we'd want to write it back during commit, just from a
> > quality-of-implementation point of view.  If we were to leave these pages
> > unwritten during commit then a post-recovery file could have a mix of
> > up-to-five-second-old data and up-to-30-seconds-old data.
> 
> trying to implement this I've got to think that there is one significant
> difference between t_sync_datalist and sb->inode->page walk: t_sync_datalist
> is per-transaction. IOW, it doesn't change once transaction is closed. in
> contrast, nothing (currently) would prevent others to modify pages while
> commit is in progress.

That can happen at present - there's nothing to stop a process from modifying
a page which is undergoing ordered-data commit-time writeout.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

2007-08-08 Thread Andrew Morton
On Wed, 8 Aug 2007 08:54:35 -0400 Jeff Layton <[EMAIL PROTECTED]> wrote:

> On Tue, 7 Aug 2007 17:15:01 -0700
> Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > On Mon, 6 Aug 2007 09:54:03 -0400
> > Jeff Layton <[EMAIL PROTECTED]> wrote:
> > 
> > Is there any way in which we can prevent these problems?  Say
> > 
> > - rename something so that unconverted filesystems will reliably fail to
> >   compile?
> > 
> 
> I suppose we could rename the .setattr inode operation to something
> else, but then we'll be stuck with it for at least a while. That seems
> sort of kludgey too...

Sure.  We're changing the required behaviour of .setattr.  Changing its
name is a fine and reasonably reliable way to communicate that fact.

> > - leave existing filesystems alone, but add a new
> >   inode_operations.setattr_jeff, which the networked filesytems can
> >   implement, and teach core vfs to call setattr_jeff in preference to
> >   setattr?
> > 
> > Something else?
> 
> There's also the approach suggested by Miklos: Add a new inode flag that
> tells notify_change not to convert ATTR_KILL_S* flags into a mode
> change. Basically, allow filesystems to "opt out" of that behavior. 
> 
> I'd definitly pick that over a new inode op. That would also allow the
> default case be for the VFS to continue handling these flags.
> Everything would continue to work but filesystems that need to handle
> these flags differently would be able to do so.
> 

We should opt for whatever produces the best end state in the kernel tree. 
ie: if it takes more work and a larger patch to create a better result,
let's go for the better result.  We merge large patches all the time.  We
prefer to smash through, get it right whatever the transient cost.  But
quietly making out-of-tree filesystems less secure is a pretty high cost.

I'm suspecting that adding more flags and some code to test them purely to
minimise the size of the patch and to retain compatibility with the old
.setattr is not a good tradeoff, given that we'd carry the flags and tests
for evermore.

So I'd suggest s/setattr/something_else/g.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

2007-08-07 Thread Andrew Morton
On Tue, 07 Aug 2007 20:45:34 -0400
Trond Myklebust <[EMAIL PROTECTED]> wrote:

> > - rename something so that unconverted filesystems will reliably fail to
> >   compile?
> > 
> > - leave existing filesystems alone, but add a new
> >   inode_operations.setattr_jeff, which the networked filesytems can
> >   implement, and teach core vfs to call setattr_jeff in preference to
> >   setattr?
> 
> If you really need to know that the filesystem is handling the flags,
> then how about instead having ->setattr() return something which
> indicates which flags it actually handled? That is likely to be a far
> more intrusive change, but it is one which is future-proof.

If we change ->setattr so that it will return a positive, non-zero value
which the caller can then check and reliably do printk("that filesystem
needs updating") then that addresses my concern, sure.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

2007-08-07 Thread Andrew Morton
On Mon, 6 Aug 2007 09:54:03 -0400
Jeff Layton <[EMAIL PROTECTED]> wrote:

> Apologies for the resend, but the original sending had the date in the
> email header and it caused some of these to bounce...
> 
> ( Please consider trimming the Cc list if discussing some aspect of this
> that doesn't concern everyone.)
> 
> When an unprivileged process attempts to modify a file that has the
> setuid or setgid bits set, the VFS will attempt to clear these bits. The
> VFS will set the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid
> mask, and then call notify_change to clear these bits and set the mode
> accordingly.
> 
> With a networked filesystem (NFS in particular but most likely others),
> the client machine may not have credentials that allow for the clearing
> of these bits. In some situations, this can lead to file corruption, or
> to an operation failing outright because the setattr fails.
> 
> In this situation, we'd like to just leave the handling of this to
> the server and ignore these bits. The problem is that by the time
> nfs_setattr is called, the VFS has already reinterpreted the ATTR_KILL_*
> bits into a mode change. We can't fix this in the filesystems where
> this is a problem, as doing so would leave us having to second-guess
> what the VFS wants us to do. So we need to change it so that filesystems
> have more flexibility in how to interpret the ATTR_KILL_* bits.
> 
> The first patch in the following patchset moves this logic into a helper
> function, and then only calls this helper function for inodes that do
> not have a setattr operation defined. The subsequent patches fix up
> individual filesystem setattr functions to call this helper function.
> 
> The upshot of this is that with this change, filesystems that define
> a setattr inode operation are now responsible for handling the ATTR_KILL
> bits as well. They can trivially do so by calling the helper, but they
> must do so.
> 
> Some of the follow-on patches may not be strictly necessary, but I
> decided that it was better to take the conservative approach and call
> the helper when I wasn't sure. I've tried to CC the maintainers
> for the individual filesystems as well where I could find them,
> please let me know if there are others who should be informed.
> 
> Comments and suggestions appreciated...
> 

>From a purely practical standpoint: it's a concern that all filesytems need
patching to continue to correctly function after this change.  There might
be filesystems which you missed, and there are out-of-tree filesystems
which won't be updated.

And I think the impact upon the out-of-tree filesystems would be fairly
severe: they quietly and subtly get their secutiry guarantees broken (I
think?)

Is there any way in which we can prevent these problems?  Say

- rename something so that unconverted filesystems will reliably fail to
  compile?

- leave existing filesystems alone, but add a new
  inode_operations.setattr_jeff, which the networked filesytems can
  implement, and teach core vfs to call setattr_jeff in preference to
  setattr?

Something else?
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] basic delayed allocation in VFS

2007-07-30 Thread Andrew Morton
On Mon, 30 Jul 2007 10:49:14 -0700
Mingming Cao <[EMAIL PROTECTED]> wrote:

> On Sun, 2007-07-29 at 20:24 +0100, Christoph Hellwig wrote:
> > On Sun, Jul 29, 2007 at 11:30:36AM -0600, Andreas Dilger wrote:
> > > Sigh, we HAVE a patch that was only adding delalloc to ext4, but it
> > > was rejected because "that functionality should go into the VFS".
> > > Since the performance improvement of delalloc is quite large, we'd
> > > like to get this into the kernel one way or another.  Can we make a
> > > decision if the ext4-specific delalloc is acceptable?
> > 
> > I'm a big proponent of having proper common delalloc code, but the
> > one proposed here is not generic for the existing filesystem using
> > delalloc.  
> 
> To be fair, what Alex have so far is probably good enough for ext2/3
> delayed allocation.
> 
> > It's still on my todo list to revamp the xfs code to get
> > rid of some of the existing mess and make it useable genericly.  If
> > the ext4 users are fine with the end result we could move to generic
> > code.
> > 
> 
> Are you okay with having a ext4 delayed allocation implementation (i.e.
> moving the code proposed in this thread to fs/ext4) first?  Then later
> when you come up with a generic delayed allocation for both ext4 and xfs
> we could make use of that generic implementation. Is that a acceptable
> approach? 
> 
> Andrew, what do you think?
> 

There's a decent risk that the generic implementation would never happen. 

I'd have thought that it'd be pretty tricky to make anything which is in
XFS suitable for general use, because after years of tuning and tweaking
it'll be full of xfs-specific things, but I haven't looked.

And a similar thing will happen if an ext4-specific version is merged.

The sad fact is that if we have a generic version, it turns out being a
least-common-denominator thing which never fully meets the requirements of
any of its users.  We end up filling the generic code up with
caller-selectable optional functionality for each filesystem.  (See
fs/direct-io.c).

The whole approach of making the pagecache/data handling be part of the VFS
hasn't been a great success, IMO.  It was fine for ext2 and similar (jfs,
minix, etc).  But for filesytems which do fancier things with data it
hasn't worked out well.  otoh, moving it all into the fs would have been a
bad decision too, so we just muddle through, making compromises.

So, umm, yes, on balance I do agree that we should explore doing some of
this in the VFS, and I believe that we should do it on the initial merge
rather than promising to ourselves that we'll fix it up later.  This will
devolve into the ext4 and xfs people working out which bits can and should
be moved into the VFS, and working out what they should look like.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] readahead drop behind and size adjustment

2007-07-23 Thread Andrew Morton
On Tue, 24 Jul 2007 08:47:28 +0800
Fengguang Wu <[EMAIL PROTECTED]> wrote:

> Subject: convert ill defined log2() to ilog2()
> 
> It's *wrong* to have
>   #define log2(n) ffz(~(n))
> It should be *reversed*:
>   #define log2(n) flz(~(n))
> or
>   #define log2(n) fls(n)
> or just use
>   ilog2(n) defined in linux/log2.h.
> 
> This patch follows the last solution, recommended by Andrew Morton.
> 
> //Or are they simply the wrong naming, and is in fact correct in behavior?
> 
> Cc: linux-ext4@vger.kernel.org
> Cc: Mingming Cao <[EMAIL PROTECTED]>
> Cc: Bjorn Helgaas <[EMAIL PROTECTED]>
> Cc: Chris Ahna <[EMAIL PROTECTED]>
> Cc: David Mosberger-Tang <[EMAIL PROTECTED]>
> Cc: Kyle McMartin <[EMAIL PROTECTED]>
> Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
> ---
>  drivers/char/agp/hp-agp.c |9 +++--
>  drivers/char/agp/i460-agp.c   |5 ++---
>  drivers/char/agp/parisc-agp.c |7 ++-
>  fs/ext4/super.c   |6 ++
>  4 files changed, 9 insertions(+), 18 deletions(-)

hm, yes, there is a risk that the code was accidentally correct.  Or the
code has only ever dealt with power-of-2 inputs, in which case it happens
to work either way.

David(s) and ext4-people: could we please have a close review of these
changes?

Thanks.

> --- linux-2.6.22-rc6-mm1.orig/drivers/char/agp/hp-agp.c
> +++ linux-2.6.22-rc6-mm1/drivers/char/agp/hp-agp.c
> @@ -14,15 +14,12 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
>  #include "agp.h"
>  
> -#ifndef log2
> -#define log2(x)  ffz(~(x))
> -#endif
> -
>  #define HP_ZX1_IOC_OFFSET0x1000  /* ACPI reports SBA, we want IOC */
>  
>  /* HP ZX1 IOC registers */
> @@ -256,7 +253,7 @@ hp_zx1_configure (void)
>   readl(hp->ioc_regs+HP_ZX1_IMASK);
>   writel(hp->iova_base|1, hp->ioc_regs+HP_ZX1_IBASE);
>   readl(hp->ioc_regs+HP_ZX1_IBASE);
> - writel(hp->iova_base|log2(HP_ZX1_IOVA_SIZE), 
> hp->ioc_regs+HP_ZX1_PCOM);
> + writel(hp->iova_base|ilog2(HP_ZX1_IOVA_SIZE), 
> hp->ioc_regs+HP_ZX1_PCOM);
>   readl(hp->ioc_regs+HP_ZX1_PCOM);
>   }
>  
> @@ -284,7 +281,7 @@ hp_zx1_tlbflush (struct agp_memory *mem)
>  {
>   struct _hp_private *hp = &hp_private;
>  
> - writeq(hp->gart_base | log2(hp->gart_size), hp->ioc_regs+HP_ZX1_PCOM);
> + writeq(hp->gart_base | ilog2(hp->gart_size), hp->ioc_regs+HP_ZX1_PCOM);
>   readq(hp->ioc_regs+HP_ZX1_PCOM);
>  }
>  
> --- linux-2.6.22-rc6-mm1.orig/drivers/char/agp/i460-agp.c
> +++ linux-2.6.22-rc6-mm1/drivers/char/agp/i460-agp.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "agp.h"
>  
> @@ -59,8 +60,6 @@
>   */
>  #define WR_FLUSH_GATT(index) RD_GATT(index)
>  
> -#define log2(x)  ffz(~(x))
> -
>  static struct {
>   void *gatt; /* ioremap'd GATT area */
>  
> @@ -148,7 +147,7 @@ static int i460_fetch_size (void)
>* values[i].size.
>*/
>   values[i].num_entries = (values[i].size << 8) >> 
> (I460_IO_PAGE_SHIFT - 12);
> - values[i].page_order = log2((sizeof(u32)*values[i].num_entries) 
> >> PAGE_SHIFT);
> + values[i].page_order = 
> ilog2((sizeof(u32)*values[i].num_entries) >> PAGE_SHIFT);
>   }
>  
>   for (i = 0; i < agp_bridge->driver->num_aperture_sizes; i++) {
> --- linux-2.6.22-rc6-mm1.orig/drivers/char/agp/parisc-agp.c
> +++ linux-2.6.22-rc6-mm1/drivers/char/agp/parisc-agp.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -27,10 +28,6 @@
>  #define DRVNAME  "quicksilver"
>  #define DRVPFX   DRVNAME ": "
>  
> -#ifndef log2
> -#define log2(x)  ffz(~(x))
> -#endif
> -
>  #define AGP8X_MODE_BIT   3
>  #define AGP8X_MODE   (1 << AGP8X_MODE_BIT)
>  
> @@ -92,7 +89,7 @@ parisc_agp_tlbflush(struct agp_memory *m
>  {
>   struct _parisc_agp_info *info = &parisc_agp_info;
>  
> - writeq(info->gart_base | log2(info->gart_size), 
> info->ioc_regs+IOC_PCOM);
> + writeq(info->gart_base | ilog2(info->gart_size), 
> info->ioc_regs+IOC_PCOM);
>   readq(info->ioc_regs+IOC_PCOM); /* flush */
>  }
>  
> --- linux-2.6.22-rc6-mm1.orig/fs/ext4/s

Re: [PATCH 1/3] ext2: show all mount options

2007-07-23 Thread Andrew Morton
On Mon, 23 Jul 2007 22:12:54 +0200
Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> From: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>

Could we have changelogs for these patches, please?

ie: what's wrong with the existing code, what change does this patch make?
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Faster ext2_clear_inode()

2007-07-19 Thread Andrew Morton
On Mon, 9 Jul 2007 22:00:03 +0200
Jörn Engel <[EMAIL PROTECTED]> wrote:

> On Mon, 9 July 2007 22:01:48 +0400, Alexey Dobriyan wrote:
> > 
> > Yes. Note that ext2_clear_inode() is referenced from ext2_sops, so even
> > empty, it leaves traces in resulting kernel.
> 
> Is that your opinion or have you actually measured a difference?
> I strongly suspect that compilers are smart enough to optimize away a
> call to an empty static function.
> 

It saves a big 16 bytes of text here.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext2 statfs improvement for block and inode free count

2007-07-18 Thread Andrew Morton
On Fri, 13 Jul 2007 18:36:54 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:

> More statfs() improvements for ext2. ext2 already maintains
> percpu counters for free blocks and inodes. Derive free
> block count and inode count by summing up percpu counters,
> instead of counting up all the groups in the filesystem
> each time.
> 

hm, another speedup patch with no measurements which demonstrate its
benefit.

> 
> Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> Acked-by: Andreas Dilger <[EMAIL PROTECTED]>
> 
>  fs/ext2/super.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.22/fs/ext2/super.c
> ===
> --- linux-2.6.22.orig/fs/ext2/super.c 2007-07-13 20:06:38.0 -0700
> +++ linux-2.6.22/fs/ext2/super.c  2007-07-13 20:06:51.0 -0700
> @@ -1136,12 +1136,12 @@ static int ext2_statfs (struct dentry * 
>   buf->f_type = EXT2_SUPER_MAGIC;
>   buf->f_bsize = sb->s_blocksize;
>   buf->f_blocks = le32_to_cpu(es->s_blocks_count) - overhead;
> - buf->f_bfree = ext2_count_free_blocks(sb);
> + buf->f_bfree = percpu_counter_sum(&sbi->s_freeblocks_counter);
>   buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count);
>   if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count))
>   buf->f_bavail = 0;
>   buf->f_files = le32_to_cpu(es->s_inodes_count);
> - buf->f_ffree = ext2_count_free_inodes(sb);
> + buf->f_ffree = percpu_counter_sum(&sbi->s_freeinodes_counter);
>   buf->f_namelen = EXT2_NAME_LEN;
>   fsid = le64_to_cpup((void *)es->s_uuid) ^
>  le64_to_cpup((void *)es->s_uuid + sizeof(u64));
> 

Well there's a tradeoff here.  At large CPU counts, percpu_counter_sum()
becomes quite expensive - it takes a global lock and then goes off fishing
in every CPU's percpu_alloced memory.

So there is some value of (num_online_cpus / sb->s_groups_count) at which
this change becomes a loss.  Where does that value lie?

Bear in mind that the global lock in percpu_counter_sum() will tilt the
scales quite a bit.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Andrew Morton
On Sun, 15 Jul 2007 21:21:03 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> Shows the current stacktrace where we violate the previously established
> locking order.

yup, but the lock_page() which we did inside truncate_mutex was a 
lock_page() against a different address_space: the blockdev mapping.

So this is OK - we'll never take truncate_mutex against the blockdev
mapping (it doesn't have one, for a start ;))

This is similar to the quite common case where we take inode A's
i_mutex inside inode B's i_mutex, which needs special lockdep annotations.

I think.  I haven't looked into this in detail.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Andrew Morton
On Sun, 15 Jul 2007 15:02:23 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-07-13 at 14:47 -0700, Zach Brown wrote:
> 
> > Peter, do you have any interest in seeing how far we can get
> > at tracking lock_page()?  I'm not holding my breath, but any little bit
> > would probably help.
> 
> Would this be a valid report? 
> 
> ( /me goes hunt a x86_64 unwinder patch that will apply to this tree.
>   These stacktraces are pain )

They are.  lockdep reports are a pain too.  It's still a struggle to
understand wtf they're trying to tell you.  Mabe it's just me.

> ===
> [ INFO: possible circular locking dependency detected ]
> [ 2.6.22-rt3-dirty #34
> ---
> mount/1296 is trying to acquire lock:
>  (&ei->truncate_mutex){--..}, at: [] 
> ext3_get_blocks_handle+0x1a4/0x8f7
> 
> but task is already holding lock:
>  (lock_page_0){--..}, at: [] 
> generic_file_buffered_write+0x1ee/0x646
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (lock_page_0){--..}:
>[] __lock_acquire+0xa72/0xc35
>[] lock_acquire+0x48/0x61
>[] add_to_page_cache_lru+0xe/0x23
>[] add_to_page_cache+0x1de/0x2c1
>[] add_to_page_cache_lru+0xe/0x23
>[] find_or_create_page+0x4c/0x73
>[] __getblk+0x118/0x23c
>[] __bread+0x6/0x9c
>[] read_block_bitmap+0x34/0x65
>[] ext3_free_blocks_sb+0xec/0x3d4
>[] ext3_free_blocks+0x2e/0x61
>[] ext3_free_data+0xaa/0xda
>[] ext3_truncate+0x4d2/0x84e
>[] pagevec_lookup+0x17/0x1e
>[] truncate_inode_pages_range+0x1f4/0x323
>[] add_preempt_count+0x14/0xe4
>[] journal_stop+0x1fe/0x21d
>[] vmtruncate+0xa2/0xc0
>[] inode_setattr+0x22/0x10a
>[] ext3_setattr+0x136/0x18f
>[] notify_change+0x10a/0x241
>[] notify_change+0x128/0x241
>[] do_truncate+0x56/0x7f
>[] do_truncate+0x61/0x7f
>[] get_write_access+0x3f/0x45
>[] may_open+0x193/0x1af
>[] open_namei+0x2cb/0x63e
>[] rt_up_read+0x53/0x5c
>[] do_page_fault+0x479/0x7cc
>[] do_filp_open+0x1c/0x38
>[] rt_spin_unlock+0x17/0x47
>[] get_unused_fd+0xf9/0x107
>[] do_sys_open+0x48/0xd5
>[] system_call+0x7e/0x83
>[] 0x

I guess we're doing lock_page() against a blockdev pagecache page here
while holding truncate_mutex against some S_ISREG file.

> -> #0 (&ei->truncate_mutex){--..}:
>[] print_circular_bug_header+0xcc/0xd3
>[] __lock_acquire+0x96e/0xc35
>[] lock_acquire+0x48/0x61
>[] ext3_get_blocks_handle+0x1a4/0x8f7
>[] _mutex_lock+0x26/0x52
>[] ext3_get_blocks_handle+0x1a4/0x8f7
>[] find_usage_backwards+0xb0/0xd9
>[] find_usage_backwards+0xb0/0xd9
>[] debug_check_no_locks_freed+0x11d/0x129
>[] trace_hardirqs_on_caller+0x115/0x138
>[] lockdep_init_map+0xac/0x41f
>[] add_preempt_count+0x14/0xe4
>[] ext3_get_block+0xc2/0xe4
>[] __block_prepare_write+0x195/0x442
>[] ext3_get_block+0x0/0xe4
>[] block_prepare_write+0x1a/0x25
>[] ext3_prepare_write+0xb2/0x17b
>[] generic_file_buffered_write+0x298/0x646
>[] current_fs_time+0x3b/0x40
>[] add_preempt_count+0x14/0xe4
>[] __generic_file_aio_write_nolock+0x34f/0x3b9
>[] put_lock_stats+0xe/0x2a
>[] generic_file_aio_write+0x4c/0xc4
>[] generic_file_aio_write+0x61/0xc4
>[] ext3_orphan_del+0x53/0x19f
>[] ext3_file_write+0x1c/0x9d
>[] do_sync_write+0xcc/0x10f
>[] autoremove_wake_function+0x0/0x2e
>[] get_lock_stats+0xe/0x3f
>[] lock_release_holdtime+0x41/0x4f
>[] put_lock_stats+0xe/0x2a
>[] sys_fchmod+0xa3/0xbd
>[] _mutex_unlock+0x17/0x20
>[] vfs_write+0xb6/0x148
>[] sys_write+0x48/0x74
>[] system_call+0x7e/0x83
>[] 0x

Here we're taking a file's truncate_mutex while holding lock_page() against
one of its pages.  This is the correct lock ranking, I suppose.

This is one of those fairly common cross-inode notabugs, I suspect.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-13 Thread Andrew Morton
On Fri, 13 Jul 2007 15:33:41 +0200
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-07-13 at 02:05 -0700, Andrew Morton wrote:
> 
> > Except lockdep doesn't know about journal_start(), which has ranking
> > requirements similar to a semaphore.  
> 
> Something like so?

Looks OK.

> Or can journal_stop() be done by a different task than the one that did
> journal_start()? - in which case nothing much can be done :-/

Yeah, journal_start() and journal_stop() are well-behaved.
 
> This seems to boot... albeit I did not push it hard.

I fear the consequences of this change :(

Oh well, please keep it alive, maybe beat on it a bit, resend it
later on?
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 7][PATCH 1/1]Remove 32000 subdirs limit.

2007-07-13 Thread Andrew Morton
On Fri, 13 Jul 2007 16:00:48 +0530 Kalpak Shah <[EMAIL PROTECTED]> wrote:

> > >  
> > > - if (inode->i_nlink >= EXT4_LINK_MAX)
> > > + if (EXT4_DIR_LINK_MAX(inode))
> > >   return -EMLINK;
> > 
> > argh.  WHY_IS_EXT4_FULL_OF_UPPER_CASE_MACROS_WHICH_COULD_BE_IMPLEMENTED
> > as_lower_case_inlines?  Sigh.  It's all the old-timers, I guess.
> > 
> > EXT4_DIR_LINK_MAX() is buggy: it evaluates its arg twice.
> 
> #define EXT4_DIR_LINK_MAX(dir) (!is_dx(dir) && (dir)->i_nlink >= 
> EXT4_LINK_MAX)
> 
> This just checks if directory has hash indexing in which case we need not 
> worry about EXT4_LINK_MAX subdir limit. If directory is not hash indexed then 
> we will need to enforce a max subdir limit. 
> 
> Sorry, I didn't understand what is the problem with this macro?

Macros should never evaluate their argument more than once, because if they
do they will misbehave when someone passes them an
expression-with-side-effects:

struct inode *p = q;

EXT4_DIR_LINK_MAX(p++);

one expects `p' to have the value q+1 here.  But it might be q+2.

and

EXT4_DIR_LINK_MAX(some_function());

might cause some_function() to be called twice.


This is one of the many problems which gets fixed when we write code in C
rather than in cpp.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-13 Thread Andrew Morton
On Tue, 10 Jul 2007 16:32:47 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote:

> > +   brelse(bh);
> > +   up_write(&EXT4_I(inode)->xattr_sem);
> > +   return error;
> > +}
> > +
> 
> We're doing GFP_KERNEL memory allocations while holding xattr_sem.  This
> can cause the VM to reenter the filesystem, perhaps taking i_mutex and/or
> i_truncate_sem and/or journal_start() (I forget whether this still
> happens).  Have we checked whether this can occur and if so, whether we are
> OK from a lock ranking POV?  Bear in mind that journalled-data mode is more
> complex in this regard.

I notice that everyone carefully avoided addressing this ;)

Oh well, hopefully people are testing with lockdep enabled.  As long
as the fs is put under extreme memory pressure, most bugs should be reported.

Except lockdep doesn't know about journal_start(), which has ranking
requirements similar to a semaphore.  Nor does it know about lock_page().
We already have hard-to-hit but deadlockable bugs in this area.


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 2][PATCH 2/5] cleanups: Add extent sanity checks

2007-07-12 Thread Andrew Morton
On Thu, 12 Jul 2007 08:57:51 -0500 Dave Kleikamp <[EMAIL PROTECTED]> wrote:

> On Thu, 2007-07-12 at 12:38 +0100, Andy Whitcroft wrote:
> > Andrew Morton wrote:
> > >> +if (ext4_ext_check_header(inode, 
> > >> ext_block_hdr(bh),
> > >> +depth - i - 1)) 
> > >> {
> > >> +err = -EIO;
> > >> +break;
> > >> +}
> > >> +path[i+1].p_bh = bh;
> > > 
> > > Really that should have been "i + 1".  checkpatch misses this.  It seems 
> > > to
> > > be missing more stuff that it used to lately.
> > 
> > This one is difficult.  The rules up to now have been consistent spacing
> > is required on both sides of mathematics operators.  I personally like
> > spaces always, but we do tend to use them without spaces too where the
> > binding is effectivly part of the value -- the classic case is something
> > like:
> > 
> > pfn << MAX_ORDER-1
> > 
> > In allowing that sort of thing, we implictly allow the one you note
> > above.  We have tried to be overly annoying on these things, and so the
> > check is consistancy, spaces both or neither.  We could be stricter.
> 
> I personally think stricter is better.  An occasionally false-positive
> isn't going to hurt anyone.  (Well, maybe the checkpatch.pl maintainers
> will get nagged.)  It at least will cause the developer to look at the
> line of code in question and make a conscious decision to leave it as it
> is.  I'm assuming that upstream maintainers use checkpatch.pl with some
> constraint, and don't throw every patch that produces a warning back at
> the submitter.
> 

I'm in two minds.  Missing-the-spaces is pretty damn common and is sometimes
a reasonable way of saving quite a lot of horizontal space.  I spose we could
take it out again if it's causing problems.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-11 Thread Andrew Morton
On Wed, 11 Jul 2007 06:10:56 -0600 Andreas Dilger <[EMAIL PROTECTED]> wrote:

> On Jul 10, 2007  16:32 -0700, Andrew Morton wrote:
> > >   err = ext4_reserve_inode_write(handle, inode, &iloc);
> > > + if (EXT4_I(inode)->i_extra_isize <
> > > + EXT4_SB(inode->i_sb)->s_want_extra_isize &&
> > > + !(EXT4_I(inode)->i_state & EXT4_STATE_NO_EXPAND)) {
> > > + /* We need extra buffer credits since we may write into EA block
> > > +  * with this same handle */
> > > + if ((jbd2_journal_extend(handle,
> > > +  EXT4_DATA_TRANS_BLOCKS(inode->i_sb))) == 0) {
> > > + ret = ext4_expand_extra_isize(inode,
> > > + EXT4_SB(inode->i_sb)->s_want_extra_isize,
> > > + iloc, handle);
> > > + if (ret) {
> > > + EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND;
> > > + if (!expand_message) {
> > > + ext4_warning(inode->i_sb, __FUNCTION__,
> > > + "Unable to expand inode %lu. Delete"
> > > + " some EAs or run e2fsck.",
> > > + inode->i_ino);
> > > + expand_message = 1;
> > > + }
> > > + }
> > > + }
> > > + }
> > 
> > Maybe that message should come out once per mount rather than once per boot
> > (or once per modprobe)?
> 
> Probably true.
> 
> > What are the consequences of a jbd2_journal_extend() failure in here?
> 
> Not fatal, just like every user of i_extra_isize.  If the inode isn't a
> large inode, or it can't be expanded then there will be a minor loss of
> functionality on that inode.  If the i_extra_isize is critical, then
> the sysadmin will have run e2fsck to force s_min_extra_isize large enough.
> 
> Note that this is only applicable for filesystems which are upgraded.  For
> new inodes (i.e. all inodes that exist in the filesystem if it was always
> run on a kernel with the currently understood extra fields) then this will
> never be invoked (until such a time new extra fields are added).

I'd suggest that we get a comment in the code explaining this: this
unchecked error does rather stand out.

> > > + if (EXT4_I(inode)->i_file_acl) {
> > > + bh = sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl);
> > > + error = -EIO;
> > > + if (!bh)
> > > + goto cleanup;
> > > + if (ext4_xattr_check_block(bh)) {
> > > + ext4_error(inode->i_sb, __FUNCTION__,
> > > + "inode %lu: bad block %llu", inode->i_ino,
> > > + EXT4_I(inode)->i_file_acl);
> > > + error = -EIO;
> > > + goto cleanup;
> > > + }
> > > + base = BHDR(bh);
> > > + first = BFIRST(bh);
> > > + end = bh->b_data + bh->b_size;
> > > + min_offs = end - base;
> > > + free = ext4_xattr_free_space(first, &min_offs, base,
> > > +  &total_blk);
> > > + if (free < new_extra_isize) {
> > > + if (!tried_min_extra_isize && s_min_extra_isize) {
> > > + tried_min_extra_isize++;
> > > + new_extra_isize = s_min_extra_isize;
> > 
> > Aren't we missing a brelse(bh) here?
> 
> Seems likely, yes.

OK - could we get a positive ack from someone indicating that this will get
looked at?  Because I am about to forget about it.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 8][PATCH 1/1]Add journal checksums

2007-07-11 Thread Andrew Morton
On Wed, 11 Jul 2007 07:01:08 -0600 Andreas Dilger <[EMAIL PROTECTED]> wrote:

> > > > -   /* AKPM: buglet - add `i' to tmp! */
> > > 
> > > Damn.  After, what, seven years, someone actually fixed it?
> > > 
> > > > for (i = 0; i < bh->b_size; i += 512) {
> > > > -   journal_header_t *tmp = (journal_header_t*)bh->b_data;
> > > > +   struct commit_header *tmp =
> > > > +   (struct commit_header *)(bh->b_data + i);
> > > > tmp->h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER);
> > > > tmp->h_blocktype = cpu_to_be32(JBD2_COMMIT_BLOCK);
> > > > tmp->h_sequence = 
> > > > cpu_to_be32(commit_transaction->t_tid);
> > > > +
> > > > +   if (JBD2_HAS_COMPAT_FEATURE(journal,
> > > > +   
> > > > JBD2_FEATURE_COMPAT_CHECKSUM)) {
> > > > +   tmp->h_chksum_type  = JBD2_CRC32_CHKSUM;
> > > > +   tmp->h_chksum_size  = 
> > > > JBD2_CRC32_CHKSUM_SIZE;
> > > > +   tmp->h_chksum[0]= 
> > > > cpu_to_be32(crc32_sum);
> > > > +   }
> > > > }
> > > 
> > > And in doing so, changed the on-disk format of the journal commit blocks.
> > > 
> > > Surely this was worth a mention in the changelog, if not a standalone 
> > > patch?
> > > 
> > > I don't think this is worth doing, really.  Why not just leave the format
> > > as it was, take the loop out and run this code once rather than eight
> > > times?
> 
> Well, we aren't using the rest of the commit block in any case.  I think
> the original intention was that we'd get 8 copies of the commit block so
> we would be sure to get a good one.
> 
> I don't know whether we'd rather have 8 copies of the commit block, or
> more potential to expand the commit block?  I don't personally have any
> preference, since the checksum should be a more robust way of checking
> validity than having multiple copies, so we may as well remove the loop
> and stick with a single copy for now.

We've never altered any commit block sectors apart from the zeroeth one
(eight times) due to the above bug.  So I'd suggest that we should formalise
the old bug and leave the format as-is.  That'll leave lots of space spare in
the commit block.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Andrew Morton
On Tue, 10 Jul 2007 23:18:50 -0400 Mingming Cao <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-07-10 at 22:17 -0700, Andrew Morton wrote:
> > On Tue, 10 Jul 2007 22:09:08 -0400 Mingming Cao <[EMAIL PROTECTED]> wrote:
> > 
> > > David Chinneer pointed that we need to journal the version number
> > > updates together with the operations that causes the change of the inode
> > > version number, in order to survive server crashes so clients won't see
> > > the counter go backwards.
> > > 
> > > So increment i_version in fs code is probably the place to ensure the
> > > inode version changes are stored to disk. It's seems update the ext4
> > > inode version in every ext4_mark_inode_dirty() is the easiest way.
> > 
> > That still makes us dependent upon _something_ changing the inode.  For
> > overwrites the only something is mtime.
> > 
> > If we don't want to have a peculiar dependency upon s_time_gran=1e9 (and
> > I don't think we do) then I guess we'll need new code in or around
> > file_update_time() to do this.
> 
> do you mean mark inode dirty all the times in file_update_time()? Not
> sure about the overhead for ext3/4.
> 

hm, I hadn't thought about it in any detail.

Maybe something like

--- a/fs/inode.c~a
+++ a/fs/inode.c
@@ -1238,6 +1238,11 @@ void file_update_time(struct file *file)
sync_it = 1;
}
 
+   if (IS_I_VERSION_64(inode)) {
+   inode_inc_iversion(inode);  /* Takes i_lock on 32-bit */
+   sync_it = 1;
+   }
+
if (sync_it)
mark_inode_dirty_sync(inode);
 }
_

?
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 2][PATCH 5/5] cleanups: Export jbd2-debug via debugfs

2007-07-10 Thread Andrew Morton
On Wed, 11 Jul 2007 00:38:09 -0500 "Jose R. Santos" <[EMAIL PROTECTED]> wrote:

> 
> > Alternatively (and preferably) do this via an update to
> > Documentation/filesystems/ext4.txt.
> 
> Seems like I also need to update the doc on Kconfig as well.  Do you
> prefer this in separate patches? (current patch, kconfig patch, ext4
> doc update patch?

All these changes are logically connected (aren't they?).  A single patch
is fine.

> > Shoudln't all this debug info be a per-superblock thing rather than
> > kernel-wide?
> 
> I don't think it is worth pursuing this feature since this seems to
> have been broken for a while now (its been there since the first git
> revission in ext3) and nobody has noticed it until now.  It could be
> address on a later patch though, since the initial purpose of the patch
> was to fix the broken JBD2_DEBUG option. Of course, this may not be
> clearly express in the changelog. :)
> 

I don't think that making it all per-superblock is worth the effort - it's
a developer-only thing and developer will have the knowledge to test ext4
on an otherwise-ext3 setup if they're really fussed about the accuracy.

So yes, a bare make-it-work patch sounds appropriate.  Or remove it, but
hey, it might be useful.  The timestamping stuff certainly looks useful.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 9][PATCH 5/5]Extent micro cleanups

2007-07-10 Thread Andrew Morton
On Sun, 01 Jul 2007 03:38:59 -0400 Mingming Cao <[EMAIL PROTECTED]> wrote:

> From: Dmitry Monakhov <[EMAIL PROTECTED]>
> Subject: ext4: extent macros cleanup
> 
> - Replace math equation to it's macro equivalent

s/it's/its/;)

> - make ext4_ext_grow_indepth() indexes/leaf correct

hm, what was wrong with it?

> Signed-off-by: Dmitry Monakhov <[EMAIL PROTECTED]>
> Acked-by: Alex Tomas <[EMAIL PROTECTED]>
> Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]>
> ---
>  fs/ext4/extents.c |   11 +++
>  1 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 12fe3d7..1fd00ac 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -375,7 +375,7 @@ ext4_ext_binsearch_idx(struct inode *inode, struct 
> ext4_ext_path *path, int bloc
>   ext_debug("binsearch for %d(idx):  ", block);
> 
>   l = EXT_FIRST_INDEX(eh) + 1;
> - r = EXT_FIRST_INDEX(eh) + le16_to_cpu(eh->eh_entries) - 1;
> + r = EXT_LAST_INDEX(eh);
>   while (l <= r) {
>   m = l + (r - l) / 2;
>   if (block < le32_to_cpu(m->ei_block))
> @@ -440,7 +440,7 @@ ext4_ext_binsearch(struct inode *inode, struct 
> ext4_ext_path *path, int block)
>   ext_debug("binsearch for %d:  ", block);
> 
>   l = EXT_FIRST_EXTENT(eh) + 1;
> - r = EXT_FIRST_EXTENT(eh) + le16_to_cpu(eh->eh_entries) - 1;
> + r = EXT_LAST_EXTENT(eh);
> 
>   while (l <= r) {
>   m = l + (r - l) / 2;
> @@ -922,8 +922,11 @@ static int ext4_ext_grow_indepth(handle_t *handle, 
> struct inode *inode,
>   curp->p_hdr->eh_max = cpu_to_le16(ext4_ext_space_root_idx(inode));
>   curp->p_hdr->eh_entries = cpu_to_le16(1);
>   curp->p_idx = EXT_FIRST_INDEX(curp->p_hdr);
> - /* FIXME: it works, but actually path[0] can be index */
> - curp->p_idx->ei_block = EXT_FIRST_EXTENT(path[0].p_hdr)->ee_block;
> + 
> + if (path[0].p_hdr->eh_depth)
> +   curp->p_idx->ei_block = EXT_FIRST_INDEX(path[0].p_hdr)->ei_block;
> + else
> +   curp->p_idx->ei_block = EXT_FIRST_EXTENT(path[0].p_hdr)->ee_block;

whitespace bustage there.


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 9][PATCH 4/5]Morecleanups:ext4_extent_compilation_fixes

2007-07-10 Thread Andrew Morton
On Sun, 01 Jul 2007 03:38:51 -0400 Mingming Cao <[EMAIL PROTECTED]> wrote:

> Subject: [EXT4 set 9][PATCH 4/5]Morecleanups:ext4_extent_compilation_fixes
> Date: Sun, 01 Jul 2007 03:38:51 -0400
> Sender: [EMAIL PROTECTED]
> Organization: IBM Linux Technology Center
> X-Mailer: Evolution 2.8.0 (2.8.0-33.el5) 
> 
> From: Dmitry Monakhov <[EMAIL PROTECTED]>
> Subject: ext4: extent compilation fixes

I hope this patch series will hit git with nice titles like "ext4: extent
compilation fixes" and not funny ones like
"Morecleanups:ext4_extent_compilation_fixes".

> Fix compilation with EXT_DEBUG, also fix leXX_to_cpu convertions.

"conversions".
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 8][PATCH 1/1]Add journal checksums

2007-07-10 Thread Andrew Morton
On Sun, 01 Jul 2007 03:38:25 -0400 Mingming Cao <[EMAIL PROTECTED]> wrote:

> Journal checksum feature has been added to detect corruption of journal.

That was brief.  No description of what it does, how it does it, why it
does it, how one operates it, why (or why not) one would choose to enable
it nor of the costs of enabling it.

> Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
> Signed-off-by: Girish Shilamkar <[EMAIL PROTECTED]>
> Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]>
> 
> diff -Nurp linux024/fs/ext4/super.c linux/fs/ext4/super.c
> --- linux024/fs/ext4/super.c  2007-06-25 16:19:24.0 -0500
> +++ linux/fs/ext4/super.c 2007-06-26 08:35:16.0 -0500
> @@ -721,6 +721,7 @@ enum {
>   Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl,
>   Opt_reservation, Opt_noreservation, Opt_noload, Opt_nobh, Opt_bh,
>   Opt_commit, Opt_journal_update, Opt_journal_inum, Opt_journal_dev,
> + Opt_journal_checksum, Opt_journal_async_commit,

A new user-visible feature should be accompanied by an update to
Documentation/filesystems/ext4.txt?

>   Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
>   Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
>   Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
> @@ -760,6 +761,8 @@ static match_table_t tokens = {
>   {Opt_journal_update, "journal=update"},
>   {Opt_journal_inum, "journal=%u"},
>   {Opt_journal_dev, "journal_dev=%u"},
> + {Opt_journal_checksum, "journal_checksum"},
> + {Opt_journal_async_commit, "journal_async_commit"},

What's journal_async_commit?

>   {Opt_abort, "abort"},
>   {Opt_data_journal, "data=journal"},
>   {Opt_data_ordered, "data=ordered"},
> @@ -948,6 +951,13 @@ static int parse_options (char *options,
>   return 0;
>   *journal_devnum = option;
>   break;
> + case Opt_journal_checksum:
> + set_opt (sbi->s_mount_opt, JOURNAL_CHECKSUM);
> + break;
> + case Opt_journal_async_commit:
> + set_opt (sbi->s_mount_opt, JOURNAL_ASYNC_COMMIT);
> + set_opt (sbi->s_mount_opt, JOURNAL_CHECKSUM);
> + break;
>   case Opt_noload:
>   set_opt (sbi->s_mount_opt, NOLOAD);
>   break;
> @@ -1817,6 +1827,21 @@ static int ext4_fill_super (struct super
>   goto failed_mount4;
>   }
>  
> + if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
> + jbd2_journal_set_features(sbi->s_journal,
> + JBD2_FEATURE_COMPAT_CHECKSUM, 0,
> + JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
> + } else if (test_opt(sb, JOURNAL_CHECKSUM)) {
> + jbd2_journal_set_features(sbi->s_journal,
> + JBD2_FEATURE_COMPAT_CHECKSUM, 0, 0);
> + jbd2_journal_clear_features(sbi->s_journal, 0, 0,
> + JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
> + } else {
> + jbd2_journal_clear_features(sbi->s_journal,
> + JBD2_FEATURE_COMPAT_CHECKSUM, 0,
> + JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
> + }

Some discussion of the forward- and backward- compatibility design would be
appropriate.  Also a description of whether and how fsck can be used to fix
up these feature flags.

>   /* We have now updated the journal if required, so we can
>* validate the data journaling mode. */
>   switch (test_opt(sb, DATA_FLAGS)) {
> diff -Nurp linux024/fs/jbd2/commit.c linux/fs/jbd2/commit.c
> --- linux024/fs/jbd2/commit.c 2007-06-25 16:19:25.0 -0500
> +++ linux/fs/jbd2/commit.c2007-06-26 08:40:03.0 -0500
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

I think we just broke CONFIG_EXT4=y, CONFIG_CRC32=n builds.

>  /*
>   * Default IO end handler for temporary BJ_IO buffer_heads.
> @@ -93,15 +94,18 @@ static int inverted_lock(journal_t *jour
>   return 1;
>  }
>  
> -/* Done it all: now write the commit record.  We should have
> +/*
> + * Done it all: now submit the commit record.  We should have
>   * cleaned up our previous buffers by now, so if we are in abort
>   * mode we can now just skip the rest of the journal write
>   * entirely.
>   *
>   * Returns 1 if the journal needs to be aborted or 0 on success
>   */
> -static int journal_write_commit_record(journal_t *journal,
> - transaction_t *commit_transaction)
> +static int journal_submit_commit_record(journal_t *journal,
> + transaction_t *commit_transaction,
> + struct buffer_head **cbh,
> + __u32 crc32_sum)
>  {
>   struct journal_head *descriptor;
>   struct buffer_head *bh;
> @@ -1

Re: [EXT4 set 7][PATCH 1/1]Remove 32000 subdirs limit.

2007-07-10 Thread Andrew Morton
On Sun, 01 Jul 2007 03:38:18 -0400 Mingming Cao <[EMAIL PROTECTED]> wrote:

> >From [EMAIL PROTECTED] Thu May 17 17:21:08 2007
> Hi,
> 
> I have rebased this patch to 2.6.22-rc1 so that it can be added to the
> ext4 patch queue. It has been tested by creating more than 65000 subdirs
> and then deleting them and checking the nlinks. The e2fsprogs part of
> this patch was sent earlier by me to linux-ext4 and doesn't need any
> changes, so not submitting it again.
> 
> --
> This patch adds support to ext4 for allowing more than 65000
> subdirectories. Currently the maximum number of subdirectories is capped
> at 32000.
> 
> If we exceed 65000 subdirectories in an htree directory it sets the
> inode link count to 1 and no longer counts subdirectories.  The
> directory link count is not actually used when determining if a
> directory is empty, as that only counts subdirectories and not regular
> files that might be in there. 
> 
> A EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if
> the subdir count for any directory crosses 65000.
> 

Would I be correct in assuming that a later fsck will clear
EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there are no longer any >65000 subdir
directories?

If so, that is worth a mention in the changelog, perhaps?

Please remind us what is the behaviour of an RO_COMPAT flag?  It means that
old ext4, ext3 and ext2 can only mount this fs read-only, yes?

> 
> Index: linux-2.6.22-rc4/fs/ext4/namei.c
> ===
> --- linux-2.6.22-rc4.orig/fs/ext4/namei.c 2007-06-14 17:30:47.0 
> -0700
> +++ linux-2.6.22-rc4/fs/ext4/namei.c  2007-06-14 17:32:55.0 -0700
> @@ -1619,6 +1619,27 @@ static int ext4_delete_entry (handle_t *
>   return -ENOENT;
>  }
>  
> +static inline void ext4_inc_count(handle_t *handle, struct inode *inode)
> +{
> + inc_nlink(inode);
> + if (is_dx(inode) && inode->i_nlink > 1) {
> + /* limit is 16-bit i_links_count */
> + if (inode->i_nlink >= EXT4_LINK_MAX || inode->i_nlink == 2) {
> + inode->i_nlink = 1;
> + EXT4_SET_RO_COMPAT_FEATURE(inode->i_sb,
> +   EXT4_FEATURE_RO_COMPAT_DIR_NLINK);
> + }
> + }
> +}

Looks too big to be inlined.

Why do we set EXT4_FEATURE_RO_COMPAT_DIR_NLINK if i_nlink==2?

> +static inline void ext4_dec_count(handle_t *handle, struct inode *inode)
> +{
> + drop_nlink(inode);
> + if (S_ISDIR(inode->i_mode) && inode->i_nlink == 0)
> + inc_nlink(inode);
> +}

Probably too big to inline.

> +
>  static int ext4_add_nondir(handle_t *handle,
>   struct dentry *dentry, struct inode *inode)
>  {
> @@ -1715,7 +1736,7 @@ static int ext4_mkdir(struct inode * dir
>   struct ext4_dir_entry_2 * de;
>   int err, retries = 0;
>  
> - if (dir->i_nlink >= EXT4_LINK_MAX)
> + if (EXT4_DIR_LINK_MAX(dir))
>   return -EMLINK;
>  
>  retry:
> @@ -1738,7 +1759,7 @@ retry:
>   inode->i_size = EXT4_I(inode)->i_disksize = inode->i_sb->s_blocksize;
>   dir_block = ext4_bread (handle, inode, 0, 1, &err);
>   if (!dir_block) {
> - drop_nlink(inode); /* is this nlink == 0? */
> + ext4_dec_count(handle, inode); /* is this nlink == 0? */
>   ext4_mark_inode_dirty(handle, inode);
>   iput (inode);
>   goto out_stop;
> @@ -1770,7 +1791,7 @@ retry:
>   iput (inode);
>   goto out_stop;
>   }
> - inc_nlink(dir);
> + ext4_inc_count(handle, dir);
>   ext4_update_dx_flag(dir);
>   ext4_mark_inode_dirty(handle, dir);
>   d_instantiate(dentry, inode);
> @@ -2035,10 +2056,10 @@ static int ext4_rmdir (struct inode * di
>   retval = ext4_delete_entry(handle, dir, de, bh);
>   if (retval)
>   goto end_rmdir;
> - if (inode->i_nlink != 2)
> - ext4_warning (inode->i_sb, "ext4_rmdir",
> -   "empty directory has nlink!=2 (%d)",
> -   inode->i_nlink);
> + if (!EXT4_DIR_LINK_EMPTY(inode))
> + ext4_warning(inode->i_sb, "ext4_rmdir",
> +  "empty directory has too many links (%d)",
> +  inode->i_nlink);
>   inode->i_version++;
>   clear_nlink(inode);
>   /* There's no need to set i_disksize: the fact that i_nlink is
> @@ -2048,7 +2069,7 @@ static int ext4_rmdir (struct inode * di
>   ext4_orphan_add(handle, inode);
>   inode->i_ctime = dir->i_ctime = dir->i_mtime = ext4_current_time(inode);
>   ext4_mark_inode_dirty(handle, inode);
> - drop_nlink(dir);
> + ext4_dec_count(handle, dir);
>   ext4_update_dx_flag(dir);
>   ext4_mark_inode_dirty(handle, dir);
>  
> @@ -2099,7 +2120,7 @@ static int ext4_unlink(struct inode * di
>   dir->i_ctime = dir->i_mtime

Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Andrew Morton
On Wed, 11 Jul 2007 15:05:27 +1000 Neil Brown <[EMAIL PROTECTED]> wrote:

> 
> It just occurred to me:
> 
>  If i_version is 64bit, then knfsd would need to be careful when
>  reading it on a 32bit host.  What are the locking rules?
> 
>  Presumably it is only updated under i_mutex protection, but having to
>  get i_mutex to read it would seem a little heavy handed.
> 
>  Should it use a seqlock like i_size?
>  Could we use the same seqlock that i_size uses, or would we need a
>  separate one?
> 

seqlocks are a bit of a pain to use (we've had plenty of deadlocks on the
i_size one).  We could reuse inode.i_lock for this modification.  Its
mandate is "general purpose innermost lock to protect stuff in this inode".


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Andrew Morton
On Tue, 10 Jul 2007 22:09:08 -0400 Mingming Cao <[EMAIL PROTECTED]> wrote:

> David Chinneer pointed that we need to journal the version number
> updates together with the operations that causes the change of the inode
> version number, in order to survive server crashes so clients won't see
> the counter go backwards.
> 
> So increment i_version in fs code is probably the place to ensure the
> inode version changes are stored to disk. It's seems update the ext4
> inode version in every ext4_mark_inode_dirty() is the easiest way.

That still makes us dependent upon _something_ changing the inode.  For
overwrites the only something is mtime.

If we don't want to have a peculiar dependency upon s_time_gran=1e9 (and
I don't think we do) then I guess we'll need new code in or around
file_update_time() to do this.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 6][PATCH 1/1]Export jbd stats through procfs

2007-07-10 Thread Andrew Morton
On Tue, 10 Jul 2007 23:21:49 -0400 "Cédric Augonnet" <[EMAIL PROTECTED]> wrote:

> 2007/7/10, Andrew Morton <[EMAIL PROTECTED]>:
> 
> Hi all,
> 
> > > + size = sizeof(struct transaction_stats_s);
> > > + s->stats = kmalloc(size, GFP_KERNEL);
> > > + if (s == NULL) {
> > > + kfree(s);
> > > + return -EIO;
> >
> > ENOMEM
> 
> I'm sorry if i missed some point, but i just don't see the use of such
> a kfree here, not sure Andrew meant you should only return ENOMEM
> instead, but why issuing those kfree(NULL), instead of just a if (!s)
> return ENOMEM ?
> 

You found a bug.  It was meant to be

if (s->stats == NULL)


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Andrew Morton
On Tue, 10 Jul 2007 20:19:16 -0400 Mingming Cao <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-07-10 at 18:22 -0700, Andrew Morton wrote:
> > On Tue, 10 Jul 2007 18:09:40 -0400 Mingming Cao <[EMAIL PROTECTED]> wrote:
> > 
> > > On Tue, 2007-07-10 at 16:30 -0700, Andrew Morton wrote:
> > > > On Sun, 01 Jul 2007 03:37:04 -0400
> > > > Mingming Cao <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > This patch converts the 32-bit i_version in the generic inode to a 
> > > > > 64-bit
> > > > > i_version field.
> > > > > 
> > > > 
> > > > That's obvious from the patch.  But what was the reason for making this
> > > > (unrelated to ext4) change?
> > > > 
> > > 
> > > The need is came from NFSv4
> > > 
> > > On Fri, 2007-05-25 at 18:25 +0200, Jean noel Cordenner wrote: 
> > > > Hi,
> > > > 
> > > > This is an update of the i_version patch.
> > > > The i_version field is a 64bit counter that is set on every inode
> > > > creation and that is incremented every time the inode data is modified
> > > > (similarly to the "ctime" time-stamp).
> > > > The aim is to fulfill a NFSv4 requirement for rfc3530:
> > > > "5.5.  Mandatory Attributes - Definitions
> > > > Name#   DataType   Access   Description
> > > > ___
> > > > change  3   uint64   READ A value created by the
> > > > server that the client can use to determine if file
> > > > data, directory contents or attributes of the object
> > > > have been modified.  The servermay return the object's
> > > > time_metadata attribute for this attribute's value but
> > > > only if the filesystem object can not be updated more
> > > > frequently than the resolution of time_metadata.
> > > > "
> > > > 
> > > 
> > > > Please update the changelog for this.
> > > > 
> > > 
> > > Is above description clear to you?
> > > 
> > 
> > Yes, thanks.  It doesn't actually tell us why we want to implement
> > this attribute and it doesn't tell us what the implications of failing
> > to do so are, but I guess we can take that on trust from the NFS guys.
> > 
> > But I suspect the ext4 implementation doesn't actually do this.  afaict we
> > won't update i_version for file overwrites (especially if s_time_gran can
> > indeed be 1,000,000,000) and of course for MAP_SHARED modifications.  What
> > would be the implications of this?
> > 
> 
> In the case of overwrite (file date updated), I assume the ctime/mtime
> is being updated and the inode is being dirtied, so the version number
> is being updated.
> 
>  vfs_write()->..
> ->__generic_file_aio_write_nolock()
> ->file_update_time()
> ->mark_inode_dirty_sync()
> ->__mark_inode_dirty(I_DIRTY_SYNC)
> ->ext4_dirty_inode()
> ->ext4_mark_inode_dirty()

That assumes an mtime update for every write().  OK, so two writes in a
single nanosecond won't be happening.  But in that case why is this code:

static inline struct timespec ext4_current_time(struct inode *inode)
{
return (inode->i_sb->s_time_gran < NSEC_PER_SEC) ?
current_fs_time(inode->i_sb) : CURRENT_TIME_SEC;
}

checking (s_time_gran < NSEC_PER_SEC) ??

Overall it is a bit unpleasing to rely upon mtime updates for a correct NFS
server implementation: if we were to later decrease s_time_gran (as we
might do, for performance reasons), the NFS server implementation starts
reporting incorrect information.

> > And how does the NFS server know that the filesystem implements i_version? 
> > Will a zero-value of i_version have special significance, telling the
> > server to not send this attribute, perhaps?
> 
> Bruce raised up this question a few days back when he reviewed this
> patch, I think the solution is add a superblock flag for fs support
> inode versioning, probably at VFS layer?

That would work.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 6][PATCH 1/1]Export jbd stats through procfs

2007-07-10 Thread Andrew Morton
On Sun, 01 Jul 2007 03:38:10 -0400 Mingming Cao <[EMAIL PROTECTED]> wrote:

> [PATCH] jbd2 stats through procfs
> 
> The patch below updates the jbd stats patch to 2.6.20/jbd2.
> The initial patch was posted by Alex Tomas in December 2005
> (http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
> It provides statistics via procfs such as transaction lifetime and size.
> 
> [ This probably should be rewritten to use debugfs?   -- Ted]
> 

I suppose that given that we're creating a spot in debugfs for the jbd2
debug code, yes, this also should be moved over.

But the jbd2 debug debugfs entries were kernel-wide whereas this is
per-superblock.  I think.

That email from Alex contains pretty important information.  I suggest that
it be added to the changelog after accuracy-checking.  Addition to
Documentation/filesystems/ext4.txt would be good.

> --
> 
> Index: linux-2.6.22-rc4/include/linux/jbd2.h
> ===
> --- linux-2.6.22-rc4.orig/include/linux/jbd2.h2007-06-11 
> 17:28:17.0 -0700
> +++ linux-2.6.22-rc4/include/linux/jbd2.h 2007-06-13 10:45:21.0 
> -0700
> @@ -408,6 +408,16 @@
>  };
>  
>  
> +/*
> + * Some stats for checkpoint phase
> + */
> +struct transaction_chp_stats_s {
> + unsigned long   cs_chp_time;
> + unsigned long   cs_forced_to_close;
> + unsigned long   cs_written;
> + unsigned long   cs_dropped;
> +};

It would be nice to document what units all these fields are in.  Jiffies,
I assume.

>  /* The transaction_t type is the guts of the journaling mechanism.  It
>   * tracks a compound transaction through its various states:
>   *
> @@ -543,6 +553,21 @@
>   spinlock_t  t_handle_lock;
>  
>   /*
> +  * Longest time some handle had to wait for running transaction
> +  */
> + unsigned long   t_max_wait;
> +
> + /*
> +  * When transaction started
> +  */
> + unsigned long   t_start;
> +
> + /*
> +  * Checkpointing stats [j_checkpoint_sem]
> +  */
> + struct transaction_chp_stats_s t_chp_stats;
> +
> + /*
>* Number of outstanding updates running on this transaction
>* [t_handle_lock]
>*/
> @@ -573,6 +598,57 @@
>  
>  };
>  
> +struct transaction_run_stats_s {
> + unsigned long   rs_wait;
> + unsigned long   rs_running;
> + unsigned long   rs_locked;
> + unsigned long   rs_flushing;
> + unsigned long   rs_logging;
> +
> + unsigned long   rs_handle_count;
> + unsigned long   rs_blocks;
> + unsigned long   rs_blocks_logged;
> +};
> +
> +struct transaction_stats_s
> +{
> + int ts_type;
> + unsigned long   ts_tid;
> + union {
> + struct transaction_run_stats_s run;
> + struct transaction_chp_stats_s chp;
> + } u;
> +};
> +
> +#define JBD2_STATS_RUN   1
> +#define JBD2_STATS_CHECKPOINT2
> +
> +#define ts_wait  u.run.rs_wait
> +#define ts_running   u.run.rs_running
> +#define ts_lockedu.run.rs_locked
> +#define ts_flushing  u.run.rs_flushing
> +#define ts_logging   u.run.rs_logging
> +#define ts_handle_count  u.run.rs_handle_count
> +#define ts_blocksu.run.rs_blocks
> +#define ts_blocks_logged u.run.rs_blocks_logged
> +
> +#define ts_chp_time  u.chp.cs_chp_time
> +#define ts_forced_to_close   u.chp.cs_forced_to_close
> +#define ts_written   u.chp.cs_written
> +#define ts_dropped   u.chp.cs_dropped

That's a bit sleazy.  We can drop the "u" from 'struct transaction_stats_s'
and make it an anonymous union, then open-code foo.run.rs_wait everywhere.

But that's a bit sleazy too, because the reader of the code would not know
that a write to foo.run.rs_wait will stomp on the value of
foo.chp.cs_chp_time.

So to minimize reader confusion it would be best I think to just open-code
the full u.run.rs_wait at all code-sites.

The macros above are the worst possible choice: they hide information from
the code-reader just to save the code-writer a bit of typing.  But we very
much want to optimise for code-readers, not for code-writers.

> +#define CURRENT_MSECS(jiffies_to_msecs(jiffies))

hm, that isn't something which should be in an ext4 header file.  And it
shouldn't be in UPPER CASE and it shouldn't pretend to be a constant (or a
global scalar).

IOW: yuk.

How's about raising a separate, standalone patch which creates a new
kernel-wide coded-in-C function such as

unsigned long jiffies_in_msecs(void);

?  (That's assuming we don't already have one.  Most likely we have seven
of them hiding in various dark corners).

> +static inline unsigned int
> +jbd2_time_diff(unsigned int start, unsigned int end)
> +{
> + if (unlikely(start > end))
> + end = end + (~

Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Andrew Morton
On Tue, 10 Jul 2007 18:09:40 -0400 Mingming Cao <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-07-10 at 16:30 -0700, Andrew Morton wrote:
> > On Sun, 01 Jul 2007 03:37:04 -0400
> > Mingming Cao <[EMAIL PROTECTED]> wrote:
> > 
> > > This patch converts the 32-bit i_version in the generic inode to a 64-bit
> > > i_version field.
> > > 
> > 
> > That's obvious from the patch.  But what was the reason for making this
> > (unrelated to ext4) change?
> > 
> 
> The need is came from NFSv4
> 
> On Fri, 2007-05-25 at 18:25 +0200, Jean noel Cordenner wrote: 
> > Hi,
> > 
> > This is an update of the i_version patch.
> > The i_version field is a 64bit counter that is set on every inode
> > creation and that is incremented every time the inode data is modified
> > (similarly to the "ctime" time-stamp).
> > The aim is to fulfill a NFSv4 requirement for rfc3530:
> > "5.5.  Mandatory Attributes - Definitions
> > Name#   DataType   Access   Description
> > ___
> > change  3   uint64   READ A value created by the
> > server that the client can use to determine if file
> > data, directory contents or attributes of the object
> > have been modified.  The servermay return the object's
> > time_metadata attribute for this attribute's value but
> > only if the filesystem object can not be updated more
> > frequently than the resolution of time_metadata.
> > "
> > 
> 
> > Please update the changelog for this.
> > 
> 
> Is above description clear to you?
> 

Yes, thanks.  It doesn't actually tell us why we want to implement
this attribute and it doesn't tell us what the implications of failing
to do so are, but I guess we can take that on trust from the NFS guys.

But I suspect the ext4 implementation doesn't actually do this.  afaict we
won't update i_version for file overwrites (especially if s_time_gran can
indeed be 1,000,000,000) and of course for MAP_SHARED modifications.  What
would be the implications of this?

And how does the NFS server know that the filesystem implements i_version? 
Will a zero-value of i_version have special significance, telling the
server to not send this attribute, perhaps?


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATHC] Fix for ext2 reservation (Re: -mm merge plans for 2.6.23)

2007-07-10 Thread Andrew Morton
On Tue, 10 Jul 2007 15:15:57 -0700
Badari Pulavarty <[EMAIL PROTECTED]> wrote:

> Well, I looked at the problem now and here is the fix :)

whee, thanks.

> Greg, Please consider this for stable release also.

No, it is only relevant to -mm's ext2-reservations.patch.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-10 Thread Andrew Morton
On Sun, 01 Jul 2007 03:38:01 -0400
Mingming Cao <[EMAIL PROTECTED]> wrote:

> This patch is on top of the nanosecond timestamp and i_version_hi
> patches. 

This sort of information isn't needed (or desired) when this patch hits the
git tree.  Please ensure that things like this are cleaned up before the
patches go to Linus.

> We need to make sure that existing filesystems can also avail the new
> fields that have been added to the inode. We use s_want_extra_isize and
> s_min_extra_isize to decide by how much we should expand the inode. If
> EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set then we expand by
> max(s_want_extra_isize, s_min_extra_isize , sizeof(ext4_inode) -
> EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is still an open question
> about whether users should be able to set s_*_extra_isize smaller than
> the known fields or not. 
> 
> This patch also adds the functionality to expand inodes to include the
> newly added fields. We start by trying to expand by s_want_extra_isize
> bytes and if its fails we try to expand by s_min_extra_isize bytes. This
> is done by changing the i_extra_isize if enough space is available in
> the inode and no EAs are present. If EAs are present and there is enough
> space in the inode then the EAs in the inode are shifted to make space.
> If enough space is not available in the inode due to the EAs then 1 or
> more EAs are shifted to the external EA block. In the worst case when
> even the external EA block does not have enough space we inform the user
> that some EA would need to be deleted or s_min_extra_isize would have to
> be reduced.
> 
> This would be online expansion of inodes. I am also working on adding an
> "expand_inodes" option to e2fsck which will expand all the used inodes.
> 
> Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
> Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]>
> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
> 
> Index: linux-2.6.22-rc4/fs/ext4/inode.c
> ===
> --- linux-2.6.22-rc4.orig/fs/ext4/inode.c 2007-06-14 17:32:27.0 
> -0700
> +++ linux-2.6.22-rc4/fs/ext4/inode.c  2007-06-14 17:32:41.0 -0700
> @@ -3120,6 +3120,40 @@ ext4_reserve_inode_write(handle_t *handl
>  }
>  
>  /*
> + * Expand an inode by new_extra_isize bytes.
> + * Returns 0 on success or negative error number on failure.
> + */
> +int ext4_expand_extra_isize(struct inode *inode, unsigned int 
> new_extra_isize,
> + struct ext4_iloc iloc, handle_t *handle)
> +{
> + struct ext4_inode *raw_inode;
> + struct ext4_xattr_ibody_header *header;
> + struct ext4_xattr_entry *entry;
> +
> + if (EXT4_I(inode)->i_extra_isize >= new_extra_isize) {
> + return 0;
> + }

This patch has lots of unneeded braces in it.  checkpatch appears to catch
them.


> + raw_inode = ext4_raw_inode(&iloc);
> +
> + header = IHDR(inode, raw_inode);
> + entry = IFIRST(header);
> +
> + /* No extended attributes present */
> + if (!(EXT4_I(inode)->i_state & EXT4_STATE_XATTR) ||
> + header->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC)) {
> + memset((void *)raw_inode + EXT4_GOOD_OLD_INODE_SIZE, 0,
> + new_extra_isize);
> + EXT4_I(inode)->i_extra_isize = new_extra_isize;
> + return 0;
> + }
> +
> + /* try to expand with EA present */
> + return ext4_expand_extra_isize_ea(inode, new_extra_isize,
> + raw_inode, handle);
> +}
> +
> +/*
>   * What we do here is to mark the in-core inode as clean with respect to 
> inode
>   * dirtiness (it may still be data-dirty).
>   * This means that the in-core inode may be reaped by prune_icache
> @@ -3143,10 +3177,33 @@ ext4_reserve_inode_write(handle_t *handl
>  int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode)
>  {
>   struct ext4_iloc iloc;
> - int err;
> + int err, ret;
> + static int expand_message;
>  
>   might_sleep();
>   err = ext4_reserve_inode_write(handle, inode, &iloc);
> + if (EXT4_I(inode)->i_extra_isize <
> + EXT4_SB(inode->i_sb)->s_want_extra_isize &&
> + !(EXT4_I(inode)->i_state & EXT4_STATE_NO_EXPAND)) {
> + /* We need extra buffer credits since we may write into EA block
> +  * with this same handle */
> + if ((jbd2_journal_extend(handle,
> +  EXT4_DATA_TRANS_BLOCKS(inode->i_sb))) == 0) {
> + ret = ext4_expand_extra_isize(inode,
> + EXT4_SB(inode->i_sb)->s_want_extra_isize,
> + iloc, handle);
> + if (ret) {
> + EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND;
> + if (!expand_message) {
> + ext4_warning(inode->i_sb, __FUNCTION__,
> + "Unable 

Re: [EXT4 set 4][PATCH 5/5] i_version: noversion mount option to disable inode version updates

2007-07-10 Thread Andrew Morton
On Sun, 01 Jul 2007 03:37:53 -0400
Mingming Cao <[EMAIL PROTECTED]> wrote:

> Add a "noversion" mount option to disable inode version updates.

Why is this option being offered to our users?  To reduce disk traffic,
like noatime?

If so, what are the implications of this?  What would the user lose?

> Index: linux-2.6.21/fs/ext4/super.c
> ===
> --- linux-2.6.21.orig/fs/ext4/super.c
> +++ linux-2.6.21/fs/ext4/super.c
> @@ -725,7 +725,7 @@ enum {
>   Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
>   Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
>   Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
> - Opt_grpquota, Opt_extents, Opt_noextents,
> + Opt_grpquota, Opt_extents, Opt_noextents, Opt_noversion,
>  };
> 
>  static match_table_t tokens = {
> @@ -777,6 +777,7 @@ static match_table_t tokens = {
>   {Opt_barrier, "barrier=%u"},
>   {Opt_extents, "extents"},
>   {Opt_noextents, "noextents"},
> + {Opt_noversion, "noversion"},
>   {Opt_err, NULL},
>   {Opt_resize, "resize"},
>  };
> @@ -1115,6 +1116,9 @@ clear_qf_name:
>   case Opt_noextents:
>   clear_opt (sbi->s_mount_opt, EXTENTS);
>   break;
> + case Opt_noversion:
> + set_opt(sbi->s_mount_opt, NOVERSION);
> + break;
>   default:
>   printk (KERN_ERR
>   "EXT4-fs: Unrecognized mount option \"%s\" "
> Index: linux-2.6.21/include/linux/ext4_fs.h
> ===
> --- linux-2.6.21.orig/include/linux/ext4_fs.h
> +++ linux-2.6.21/include/linux/ext4_fs.h
> @@ -473,6 +473,7 @@ do {  
>\
>  #define EXT4_MOUNT_USRQUOTA  0x10 /* "old" user quota */
>  #define EXT4_MOUNT_GRPQUOTA  0x20 /* "old" group quota */
>  #define EXT4_MOUNT_EXTENTS   0x40 /* Extents support */
> +#define EXT4_MOUNT_NOVERSION 0x80 /* No inode version updates */
> 
>  /* Compatibility, for having both ext2_fs.h and ext4_fs.h included at once */
>  #ifndef _LINUX_EXT2_FS_H
> Index: linux-2.6.21/fs/ext4/inode.c
> ===
> --- linux-2.6.21.orig/fs/ext4/inode.c
> +++ linux-2.6.21/fs/ext4/inode.c
> @@ -3082,7 +3082,9 @@ int ext4_mark_iloc_dirty(handle_t *handl
>  {
>   int err = 0;
> 
> - inode->i_version++;
> + if (!test_opt(inode->i_sb, NOVERSION))
> + inode->i_version++;
> +
>   /* the do_update_inode consumes one bh->b_count */
>   get_bh(iloc->bh);

An update to Documentation/filesystems/ext4.txt would be an appropriate
way in which to address the above questions.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 4/5] i_version:ext4 inode version update

2007-07-10 Thread Andrew Morton
On Sun, 01 Jul 2007 03:37:45 -0400
Mingming Cao <[EMAIL PROTECTED]> wrote:

> This patch is on top of i_version_update_vfs.
> The i_version field of the inode is set on inode creation and incremented when
> the inode is being modified.
> 

Again, I don't think I've ever seen this patch before.  It is at least a
month old.

> 
> Index: linux-2.6.22-rc4/fs/ext4/ialloc.c
> ===
> --- linux-2.6.22-rc4.orig/fs/ext4/ialloc.c2007-06-13 17:16:28.0 
> -0700
> +++ linux-2.6.22-rc4/fs/ext4/ialloc.c 2007-06-13 17:24:45.0 -0700
> @@ -565,6 +565,7 @@ got:
>   inode->i_blocks = 0;
>   inode->i_mtime = inode->i_atime = inode->i_ctime = ei->i_crtime =
>  ext4_current_time(inode);
> + inode->i_version = 1;
>  
>   memset(ei->i_data, 0, sizeof(ei->i_data));
>   ei->i_dir_start_lookup = 0;
> Index: linux-2.6.22-rc4/fs/ext4/inode.c
> ===
> --- linux-2.6.22-rc4.orig/fs/ext4/inode.c 2007-06-13 17:21:29.0 
> -0700
> +++ linux-2.6.22-rc4/fs/ext4/inode.c  2007-06-13 17:24:45.0 -0700
> @@ -3082,6 +3082,7 @@ int ext4_mark_iloc_dirty(handle_t *handl
>  {
>   int err = 0;
>  
> + inode->i_version++;
>   /* the do_update_inode consumes one bh->b_count */
>   get_bh(iloc->bh);
>  
> Index: linux-2.6.22-rc4/fs/ext4/super.c
> ===
> --- linux-2.6.22-rc4.orig/fs/ext4/super.c 2007-06-13 17:19:11.0 
> -0700
> +++ linux-2.6.22-rc4/fs/ext4/super.c  2007-06-13 17:24:45.0 -0700
> @@ -2846,8 +2846,8 @@ out:
>   i_size_write(inode, off+len-towrite);
>   EXT4_I(inode)->i_disksize = inode->i_size;
>   }
> - inode->i_version++;
>   inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> + inode->i_version = 1;
>   ext4_mark_inode_dirty(handle, inode);
>   mutex_unlock(&inode->i_mutex);
>   return len - towrite;

ext4 already has code to update i_version on directories.  Here we appear
to be udpating it on regular files?

But for what reason?  The changelog doesn't say?

AFAICT the code forgets to update i_version during file overwrites (unless
the overwrite was over a hole).  But without a decent description of this
change I cannot tell whether this was a bug.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 3/5] i_version:ext4 inode version read/store

2007-07-10 Thread Andrew Morton
On Sun, 01 Jul 2007 03:37:36 -0400
Mingming Cao <[EMAIL PROTECTED]> wrote:

> This patch adds 64-bit inode version support to ext4. The lower 32 bits
> are stored in the osd1.linux1.l_i_version field while the high 32 bits
> are stored in the i_version_hi field newly created in the ext4_inode.

So reading the code here does serve to answer the question I raised against
the earlier patch.  A bit.

I'd have thought that this patch and the one which adds i_version_hi should
be folded into a single diff?

> 
> Index: linux-2.6.21/fs/ext4/inode.c
> ===
> --- linux-2.6.21.orig/fs/ext4/inode.c
> +++ linux-2.6.21/fs/ext4/inode.c
> @@ -2709,6 +2709,13 @@ void ext4_read_inode(struct inode * inod
>   EXT4_INODE_GET_XTIME(i_atime, inode, raw_inode);
>   EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode);
> 
> + inode->i_version = le32_to_cpu(raw_inode->i_disk_version);
> + if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
> + if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
> + inode->i_version |=
> + (__u64)(le32_to_cpu(raw_inode->i_version_hi)) << 32;





> + }

I don't quite see how the above two tests are sufficient to unambiguously
determine that the i_version_hi field is present on-disk.

I guess we're implicitly assuming that if the on-disk inode is big enough
then it _must_ have i_version_hi in there?  If so, why is the comparison
with EXT4_GOOD_OLD_INODE_SIZE needed?

Some description of the overall approach to inode version identification
would be helpful here.

>   if (S_ISREG(inode->i_mode)) {
>   inode->i_op = &ext4_file_inode_operations;
>   inode->i_fop = &ext4_file_operations;
> @@ -2852,8 +2859,14 @@ static int ext4_do_update_inode(handle_t
>   } else for (block = 0; block < EXT4_N_BLOCKS; block++)
>   raw_inode->i_block[block] = ei->i_data[block];
> 
> - if (ei->i_extra_isize)
> + raw_inode->i_disk_version = cpu_to_le32(inode->i_version);
> + if (ei->i_extra_isize) {
> + if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi)) {

There's no comparison with EXT4_GOOD_OLD_INODE_SIZE here...

> + raw_inode->i_version_hi =
> + cpu_to_le32(inode->i_version >> 32);
> + }
>   raw_inode->i_extra_isize = cpu_to_le16(ei->i_extra_isize);
> + }
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   >