[patch 068/265] Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user

2008-02-04 Thread akpm
From: Christoph Lameter <[EMAIL PROTECTED]>

Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

Zeros two segments of the page. It takes the position where to
start and end the zeroing which avoids length calculations and
makes code clearer.

zero_user_segment(page, start, end)

Same for a single segment.

zero_user(page, start, length)

Length variant for the case where we know the length.

We remove the zero_user_page macro. Issues:

1. Its a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

   Having to treat this special case everywhere makes the
   code needlessly complex. The parameter for zeroing is always
   KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.

Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[EMAIL PROTECTED]: fix nfs and ntfs build]
[EMAIL PROTECTED]: fix ntfs build some more]
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Cc: Steven French <[EMAIL PROTECTED]>
Cc: Michael Halcrow <[EMAIL PROTECTED]>
Cc: 
Cc: Steven Whitehouse <[EMAIL PROTECTED]>
Cc: Trond Myklebust <[EMAIL PROTECTED]>
Cc: "J. Bruce Fields" <[EMAIL PROTECTED]>
Cc: Anton Altaparmakov <[EMAIL PROTECTED]>
Cc: Mark Fasheh <[EMAIL PROTECTED]>
Cc: David Chinner <[EMAIL PROTECTED]>
Cc: Michael Halcrow <[EMAIL PROTECTED]>
Cc: Steven French <[EMAIL PROTECTED]>
Cc: Steven Whitehouse <[EMAIL PROTECTED]>
Cc: Trond Myklebust <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/buffer.c|   44 ++--
 fs/cifs/inode.c|2 -
 fs/direct-io.c |4 +-
 fs/ecryptfs/mmap.c |5 +--
 fs/ext3/inode.c|4 +-
 fs/ext4/inode.c|4 +-
 fs/gfs2/bmap.c |2 -
 fs/gfs2/ops_address.c  |2 -
 fs/libfs.c |   11 ++--
 fs/mpage.c |7 +
 fs/nfs/read.c  |   10 +++
 fs/nfs/write.c |4 --
 fs/ntfs/aops.c |   20 --
 fs/ntfs/compress.c |2 -
 fs/ntfs/file.c |   32 ++-
 fs/ocfs2/alloc.c   |2 -
 fs/ocfs2/aops.c|6 ++--
 fs/reiserfs/inode.c|4 +-
 fs/xfs/linux-2.6/xfs_lrw.c |2 -
 include/linux/highmem.h|   48 +--
 mm/filemap_xip.c   |2 -
 mm/truncate.c  |2 -
 22 files changed, 103 insertions(+), 116 deletions(-)

diff -puN 
fs/buffer.c~pagecache-zeroing-zero_user_segment-zero_user_segments-and-zero_user
 fs/buffer.c
--- 
a/fs/buffer.c~pagecache-zeroing-zero_user_segment-zero_user_segments-and-zero_user
+++ a/fs/buffer.c
@@ -1798,7 +1798,7 @@ void page_zero_new_buffers(struct page *
start = max(from, block_start);
size = min(to, block_end) - start;
 
-   zero_user_page(page, start, size, 
KM_USER0);
+   zero_user(page, start, size);
set_buffer_uptodate(bh);
}
 
@@ -1861,19 +1861,10 @@ static int __block_prepare_write(struct 
mark_buffer_dirty(bh);
continue;
}
-   if (block_end > to || block_start < from) {
-   void *kaddr;
-
-   kaddr = kmap_atomic(page, KM_USER0);
-   if (block_end > to)
-   memset(kaddr+to, 0,
-   block_end-to);
-   if (block_start < from)
-   memset(kaddr+block_start,
-   0, from-block_start);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-   }
+   if (block_end > to || block_start < from)
+   zero_user_segments(page,
+   to, block_end,
+  

Re: [Bug 9855] ext3 ACL corruption

2008-02-04 Thread Andreas Dilger
On Jan 31, 2008  22:44 +1030, Kevin Shanahan wrote:
> On Thu, 2008-01-31 at 03:05 -0700, Andreas Dilger wrote:
> > ...  To get the interesting bits you need:
> > 
> > debugfs: stat <95>   # prints decoded inode, "File ACL:" is a block 
> > number
> > debugfs: imap <95>   # prints inode block number, offset
> > 
> > dd if=/dev/mapper/vg_main-lv_samba of=/tmp/inode.bin bs=4k count=1 
> > skip={iblock}
> > dd if=/dev/mapper/vg_main-lv_samba of=/tmp/inode.bin bs=4k count=1 
> > skip={ACLblk}
> 
> Ah, ok - learning fast. Lets see how I go this time:
> 
> # debugfs /dev/mapper/vg_main-lv_samba
> debugfs:  stat <3342652>
> 
> Inode: 3342652   Type: regularMode:  0770   Flags: 0x0   Generation: 
> 3684645243
> User: 0   Group: 10140   Size: 18432
> File ACL: 0Directory ACL: 0
> Links: 1   Blockcount: 40
> Fragment:  Address: 0Number: 0Size: 0
> ctime: 0x475be06e -- Sun Dec  9 23:02:46 2007
> atime: 0x475d4073 -- Tue Dec 11 00:04:43 2007
> mtime: 0x45d2686a -- Wed Feb 14 12:09:54 2007
> Size of extra inode fields: 4
> Extended attributes stored in inode body: 
>= "01 00 00 00 01 00 07 00 04 00 05 00 08 00 05 00 d6 27 00 00 08 00 07 00 
> 09 28 00 00 08 00 07 00 0a 28 00 00 10 00 07 00 20 00 00 00 " (44)
>   DOSATTRIB = "0x20" (4)
> BLOCKS:
> (0):6713397, (1):6713399, (2):6713395, (3):6713405, (4):6713396
> TOTAL: 5
> 
> debugfs:  imap <3342652>
> Inode 3342652 is part of block group 204
>   located at block 6684693, offset 0x0b00

The hexdump of this data (od -Ax -tx4 -a) shows the EA is in good shape:

000b80 0004ea0200480200
   eot nul nul nul nul nul stx   j nul stx   H nul nul nul nul nul

inode.i_extra_isize=0x0004
ext3_xattr_ibody_header.h_magic=0xea02

[EA1 entry]
ext3_xattr_entry.e_name_len=0x00  (unused for POSIX_ACL_ACCESS)
ext3_xattr_entry.e_name_index=0x02 (EXT3_INDEX_POSIX_ACL_ACCESS)
ext3_xattr_entry.e_value_offs=0x0048 = 72
ext3_xattr_entry.e_value_block=0x  (unused)



000b90 002c00740109
 , nul nul nul nul nul nul nul  ht soh   t nul nul nul nul nul
[EA1 cont.]
ext3_xattr_entry.e_value_size=0x002c = 44
ext3_xattr_entry.e_hash=0x  (currently unused)

[EA2 entry]
ext3_xattr_entry.e_name_len=0x09
ext3_xattr_entry.e_name_index=0x01 (EXT3_INDEX_USER)
ext3_xattr_entry.e_value_offs=0x0074 = 116
ext3_xattr_entry.e_value_block=0x  (unused)

000ba0 000441534f4449525454
   eot nul nul nul nul nul nul nul   D   O   S   A   T   T   R   I
[EA2 cont.]
ext3_xattr_entry.e_value_size=0x002c = 44
ext3_xattr_entry.e_hash=0x  (currently unused)
ext3_xattr_entry.e_name=DOSATTRIB

000bb0 0042
 B nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
000bc0 
   nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul

000bd0 0001000700010005000400050008
   soh nul nul nul soh nul bel nul eot nul enq nul  bs nul enq nul
[EA1 data]
ext3_acl_header.a_version=0x0001
ext3_acl_entry_short[0].e_tag=0x0001
ext3_acl_entry_short[0].e_perm=0x0007
ext3_acl_entry_short[1].e_tag=0x0004
ext3_acl_entry_short[1].e_perm=0x0005
ext3_acl_entry_short[2].e_tag=0x0008
ext3_acl_entry_short[2].e_perm=0x0005

000be0 27d600070008280900070008
 V   ' nul nul  bs nul bel nul  ht   ( nul nul  bs nul bel nul
[EA1 data cont]
ext3_acl_entry_short[2].e_id=0x27d6
ext3_acl_entry[3].e_tag=0x0008
ext3_acl_entry[3].e_perm=0x0007
ext3_acl_entry[3].e_id=0x2809

ext3_acl_entry[5].e_tag=0x0008
ext3_acl_entry[5].e_perm=0x0007
ext3_acl_entry[5].e_id=0x280a

000bf0 280a00070010002030327830
nl   ( nul nul dle nul bel nul  sp nul nul nul   0   x   2   0
[EA1 data cont]
ext3_acl_entry[6].e_tag=0x0010
ext3_acl_entry[6].e_perm=0x0007
ext3_acl_entry[6].e_id=0x0020

[EA2 data]
"0x20"

> I'm assuming that "File ACL: 0" means that there's no ACL block.

Right.

> e2fsck 1.40-WIP (14-Nov-2006)
> Pass 1: Checking inodes, blocks, and sizes
> (entry->e_value_offs + entry->e_value_size: 116, offs: 120)
> Extended attribute in inode 3342652 has a value offset (72) which is
> invalid
> Clear? no
> ...

Hmm, I wonder if e2fsck is calculating the wrong file offset or something?
The kernel appears to be taking the EA data offset from the end of
i_extra_isize and the ext3_xattr_ibody_header fields (so 0x88 + e_value_offs
from the start of the inode).

Conversely, debugfs isn't having any problem with this EA at all.

h, I think I see the problem, and this was fixed in newer e2fsck.
The EAs are stored "out of order" in the inode and older e2fsprogs
considered that an error.  That was fixed in the final 1.40 release:

Remove check in e2fsck which requires EA's in inodes to be sorted;
they don't need to be s

Re: merge plans, was Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Andrew Morton
On Mon, 4 Feb 2008 15:24:18 -0500
Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
> > On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso <[EMAIL PROTECTED]> wrote:
> > 
> > > On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
> > > > When I merge David's iget coversion patches this will instead wreck the
> > > > ext4 patchset.
> > > 
> > > That's ok, it shouldn't be hard for me to fix this up.  How quickly
> > > will you be able to merge David's iget converstion patches?
> > 
> > They're about 1,000 patches back
> 
> Care to post a merge plan so we have a slight chance to make sure not
> too much crap is hiding in these 1000 patches?

Pretty much everything up to

#
# end
#
reiser4-sb_sync_inodes.patch


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Theodore Tso
On Mon, Feb 04, 2008 at 12:11:31PM -0800, Andrew Morton wrote:
> 
> That patch series is kind of logjammed anyway because it breaks isofs. 
> Last time I discussed this with David he seemed to find this amusing rather
> than an urgent problem.  I'd drop the whole lot if there weren't lots of
> other patches dependent upon them.  Mayeb I can do a selective droppage,
> but I hate going out-of-order, and merging untested patch combinations.
> 

All of the patches which move from using iget() to iget_unlocked()
should in theory (if they are well written) be standalone and not have
any prerequisites, and a good thing to apply in and of themselves.  

So maybe the right thing to do is to push all of them right away, and
then if they all can go in safely w/o breaking filesystems, we can
actually do the deed about removing the iget() interface itself, and
if not, it can wait until 2.6.26.  It's not really that big of a deal
to actually nuke the interface itself, as long as we are gradually
reducing our usage of it  Or is there some patch which David is
trying to push that desperately depends on iget() going away?

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


merge plans, was Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Christoph Hellwig
On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
> On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso <[EMAIL PROTECTED]> wrote:
> 
> > On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
> > > When I merge David's iget coversion patches this will instead wreck the
> > > ext4 patchset.
> > 
> > That's ok, it shouldn't be hard for me to fix this up.  How quickly
> > will you be able to merge David's iget converstion patches?
> 
> They're about 1,000 patches back

Care to post a merge plan so we have a slight chance to make sure not
too much crap is hiding in these 1000 patches?
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Andrew Morton
On Mon, 4 Feb 2008 10:00:44 -0500 Theodore Tso <[EMAIL PROTECTED]> wrote:

> On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
> > On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso <[EMAIL PROTECTED]> wrote:
> > 
> > > On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
> > > > When I merge David's iget coversion patches this will instead wreck the
> > > > ext4 patchset.
> > > 
> > > That's ok, it shouldn't be hard for me to fix this up.  How quickly
> > > will you be able to merge David's iget converstion patches?
> > 
> > They're about 1,000 patches back.
> 
> OK, if you're not planning on pushing David's changes to Linus right
> away, what if I pull in David's
> 
>   iget-stop-ext4-from-using-iget-and-read_inode-try.patch 
> 
> and push it plus some other ext4 bug fixes directly to Linus, and let
> you know when that has happened so you can drop David's patch from
> your queue?

Sure, go for it.

> David's changes to ext4 can be applied standalone without the rest of
> his series, so it would be safe to push that to Linus independently
> and in advance of the rest of his series.  That should also help
> reduce the number of inter-patch queue dependencies.

That patch series is kind of logjammed anyway because it breaks isofs. 
Last time I discussed this with David he seemed to find this amusing rather
than an urgent problem.  I'd drop the whole lot if there weren't lots of
other patches dependent upon them.  Mayeb I can do a selective droppage,
but I hate going out-of-order, and merging untested patch combinations.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix ext4 bitops

2008-02-04 Thread Geert Uytterhoeven
On Mon, 4 Feb 2008, Aneesh Kumar K.V wrote:
> On Sun, Feb 03, 2008 at 01:39:02PM +0100, Geert Uytterhoeven wrote:
> > On Sun, 3 Feb 2008, Heiko Carstens wrote:
> > > On Fri, Feb 01, 2008 at 10:04:04PM +0100, Bastian Blank wrote:
> > > > On Fri, Feb 01, 2008 at 12:22:57PM -0800, Andrew Morton wrote:
> > > > > On Fri, 1 Feb 2008 21:02:08 +0100
> > > > > Bastian Blank <[EMAIL PROTECTED]> wrote:
> > > > > > Fix ext4 bitops.
> > > > > 
> > > > > This is incomplete.  Please tell us what was "fixed".
> > > > > 
> > > > > If it was a build error then please quote the compile error output in 
> > > > > the
> > > > > changelog, as well as the usual description of what the problem is, 
> > > > > and how
> > > > > it was fixed.
> > > > 
> > > > | fs/ext4/mballoc.c: In function 'ext4_mb_generate_buddy':
> > > > | fs/ext4/mballoc.c:954: error: implicit declaration of function 
> > > > 'generic_find_next_le_bit'
> > > > 
> > > > The s390 specific bitops uses parts of the generic implementation.
> > > > Include the correct header.
> > > 
> > > That doesn't work:
> > > 
> > > fs/built-in.o: In function `ext4_mb_release_inode_pa':
> > > mballoc.c:(.text+0x95a8a): undefined reference to 
> > > `generic_find_next_le_bit'
> > > fs/built-in.o: In function `ext4_mb_init_cache':
> > > mballoc.c:(.text+0x967ea): undefined reference to 
> > > `generic_find_next_le_bit'
> > > 
> > > This still needs generic_find_next_le_bit which comes
> > > from lib/find_next_bit.c. That one doesn't get built on s390 since we
> > > don't set GENERIC_FIND_NEXT_BIT.
> > > Currently we have the lengthly patch below queued.
> > 
> > Similar issue on m68k. As Bastian also saw it on powerpc, I'm getting the
> > impression the ext4 people don't (compile) test on big endian machines?
> 
> I have sent this patches to linux-arch expecting a review from
> different arch people. It is true that the patches are tested only on
> powerpc, x86-64, x86. That's the primary reason of me sending the
> patches to linux-arch.
> 
> http://marc.info/?l=linux-arch&m=119503501127737&w=2

Sometimes it's difficult to see what can go wrong due to a single patch that
just adds a #define. Sorry, I missed the lack of prototype for
generic_find_next_le_bit() and that we don't set GENERIC_FIND_NEXT_BIT,
just like s390.

And yes, usually I rely on the -mm autocompiler to catch things like
this, but this time it didn't work out...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Jan Kara
On Mon 04-02-08 22:42:08, Aneesh Kumar K.V wrote:
> On Mon, Feb 04, 2008 at 05:31:56PM +0100, Jan Kara wrote:
> >   Hi,
> > 
> > On Mon 04-02-08 15:42:28, Aneesh Kumar K.V wrote:
> > > This is with the new ext3 -> ext4 migrate code added. The recently added
> > > lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
> > > on the ext3 inode during migration to prevent walking the ext3 inode
> > > when it is being converted to ext4 format. Also we want to avoid 
> > > file truncation and new blocks being added while converting to ext4.
> > > Also we dont want to reserve large number of credits for journal.
> > > Any idea how to fix this ?
> >   Hmm, while briefly looking at the code - why do you introduce i_data_sem
> > and not use i_alloc_sem which is already in VFS inode? That is aimed
> > exactly at the serialization of truncates, writes and similar users.
> 
> How about read ? We are changing the format of inode. We don't want even
> the read to go through.
  I just meant that the code could use i_alloc_sem instead of i_data_sem in
all the places, remove i_data_sem and you safe some memory... It's just a
suggestion for a cleanup.

> >   One (stupid) solution to your problem is to make i_data_sem be
> > always locked before the transaction is started. It could possibly have
> > negative performance impact because you'd have to hold the semaphore for
> > a longer time and thus a writer would block readers for longer time. So one
> > would have to measure how big difference that would make.
> >   Another possibility is to start a single transaction for migration and
> > extend it as long as you can (as truncate does it). And when you can't
> > extend any more, you drop the i_data_sem and start a new transaction and
> > acquire the semaphore again. This has the disadvantage that after dropping
> > the semaphore you have to resync your original inode with the temporary
> > one your are building which probably ends up being ugly as night... Hmm,
> > but maybe we could get rid of this - hold i_mutex to protect against all
> > writes (that ranks outside of transaction start so you can hold it for the
> > whole migration time - maybe you even hold it if you are called from the
> > write path...). After dropping i_data_sem you let some readers proceed
> > but writers still wait on i_mutex so the file shouldn't change under you
> > (but I suggest adding some BUG_ONs to verify that the file really doesn't
> > change :).
> 
> A quick look says truncate can happen even when we hold i_mutex ??
  No, it shouldn't happen - do_truncate() in fs/open.c acquires i_mutex
before it calls notify_change().

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SUSE Labs, CR
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Aneesh Kumar K.V
On Mon, Feb 04, 2008 at 05:31:56PM +0100, Jan Kara wrote:
>   Hi,
> 
> On Mon 04-02-08 15:42:28, Aneesh Kumar K.V wrote:
> > This is with the new ext3 -> ext4 migrate code added. The recently added
> > lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
> > on the ext3 inode during migration to prevent walking the ext3 inode
> > when it is being converted to ext4 format. Also we want to avoid 
> > file truncation and new blocks being added while converting to ext4.
> > Also we dont want to reserve large number of credits for journal.
> > Any idea how to fix this ?
>   Hmm, while briefly looking at the code - why do you introduce i_data_sem
> and not use i_alloc_sem which is already in VFS inode? That is aimed
> exactly at the serialization of truncates, writes and similar users.

How about read ? We are changing the format of inode. We don't want even
the read to go through.


> That doesn't solve problems with lock ordering but I was just wondering...
>   Another problem - ext4_fallocate() has the same lock ordering problem as
> the migration code and maybe there are others...


I will look at the same when fixing this.

>   One (stupid) solution to your problem is to make i_data_sem be
> always locked before the transaction is started. It could possibly have
> negative performance impact because you'd have to hold the semaphore for
> a longer time and thus a writer would block readers for longer time. So one
> would have to measure how big difference that would make.
>   Another possibility is to start a single transaction for migration and
> extend it as long as you can (as truncate does it). And when you can't
> extend any more, you drop the i_data_sem and start a new transaction and
> acquire the semaphore again. This has the disadvantage that after dropping
> the semaphore you have to resync your original inode with the temporary
> one your are building which probably ends up being ugly as night... Hmm,
> but maybe we could get rid of this - hold i_mutex to protect against all
> writes (that ranks outside of transaction start so you can hold it for the
> whole migration time - maybe you even hold it if you are called from the
> write path...). After dropping i_data_sem you let some readers proceed
> but writers still wait on i_mutex so the file shouldn't change under you
> (but I suggest adding some BUG_ONs to verify that the file really doesn't
> change :).

A quick look says truncate can happen even when we hold i_mutex ??

But of all this looks like a workable solution. Will try this out.


-aneesh
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Jan Kara
  Hi,

On Mon 04-02-08 15:42:28, Aneesh Kumar K.V wrote:
> This is with the new ext3 -> ext4 migrate code added. The recently added
> lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
> on the ext3 inode during migration to prevent walking the ext3 inode
> when it is being converted to ext4 format. Also we want to avoid 
> file truncation and new blocks being added while converting to ext4.
> Also we dont want to reserve large number of credits for journal.
> Any idea how to fix this ?
  Hmm, while briefly looking at the code - why do you introduce i_data_sem
and not use i_alloc_sem which is already in VFS inode? That is aimed
exactly at the serialization of truncates, writes and similar users.
That doesn't solve problems with lock ordering but I was just wondering...
  Another problem - ext4_fallocate() has the same lock ordering problem as
the migration code and maybe there are others...
  One (stupid) solution to your problem is to make i_data_sem be
always locked before the transaction is started. It could possibly have
negative performance impact because you'd have to hold the semaphore for
a longer time and thus a writer would block readers for longer time. So one
would have to measure how big difference that would make.
  Another possibility is to start a single transaction for migration and
extend it as long as you can (as truncate does it). And when you can't
extend any more, you drop the i_data_sem and start a new transaction and
acquire the semaphore again. This has the disadvantage that after dropping
the semaphore you have to resync your original inode with the temporary
one your are building which probably ends up being ugly as night... Hmm,
but maybe we could get rid of this - hold i_mutex to protect against all
writes (that ranks outside of transaction start so you can hold it for the
whole migration time - maybe you even hold it if you are called from the
write path...). After dropping i_data_sem you let some readers proceed
but writers still wait on i_mutex so the file shouldn't change under you
(but I suggest adding some BUG_ONs to verify that the file really doesn't
change :).

Honza

> ===
> [ INFO: possible circular locking dependency detected ]
> 2.6.24-rc8 #6
> ---
> rm/2401 is trying to acquire lock:
>  (&ei->i_data_sem){}, at: [] ext4_get_blocks_wrap+0x21/0x108
> 
> but task is already holding lock:
>  (jbd2_handle){--..}, at: [] jbd2_journal_start+0xd2/0xff
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (jbd2_handle){--..}:
>[] __lock_acquire+0xa31/0xc1a
>[] lock_acquire+0x7a/0x94
>[] jbd2_journal_start+0xf5/0xff
>[] ext4_journal_start_sb+0x48/0x4a
>[] ext4_ext_migrate+0x7d/0x535
>[] ext4_ioctl+0x528/0x56c
>[] do_ioctl+0x50/0x67
>[] vfs_ioctl+0x237/0x24a
>[] sys_ioctl+0x31/0x4b
>[] sysenter_past_esp+0x5f/0xa5
>[] 0x
> 
> -> #0 (&ei->i_data_sem){}:
>[] __lock_acquire+0x921/0xc1a
>[] lock_acquire+0x7a/0x94
>[] down_read+0x42/0x79
>[] ext4_get_blocks_wrap+0x21/0x108
>[] ext4_getblk+0x62/0x1c4
>[] ext4_find_entry+0x350/0x5b7
>[] ext4_unlink+0x6e/0x1a4
>[] vfs_unlink+0x49/0x89
>[] do_unlinkat+0x96/0x12c
>[] sys_unlink+0x10/0x12
>[] sysenter_past_esp+0x5f/0xa5
>[] 0x
> 
> other info that might help us debug this:
> 
> 3 locks held by rm/2401:
>  #0:  (&type->i_mutex_dir_key#5/1){--..}, at: [] 
> do_unlinkat+0x5e/0x12c
>  #1:  (&sb->s_type->i_mutex_key#8){--..}, at: [] 
> vfs_unlink+0x36/0x89
>  #2:  (jbd2_handle){--..}, at: [] jbd2_journal_start+0xd2/0xff
> 
> stack backtrace:
> Pid: 2401, comm: rm Not tainted 2.6.24-rc8 #6
>  [] show_trace_log_lvl+0x1a/0x2f
>  [] show_trace+0x12/0x14
>  [] dump_stack+0x6c/0x72
>  [] print_circular_bug_tail+0x5f/0x68
>  [] __lock_acquire+0x921/0xc1a
>  [] lock_acquire+0x7a/0x94
>  [] down_read+0x42/0x79
>  [] ext4_get_blocks_wrap+0x21/0x108
>  [] ext4_getblk+0x62/0x1c4
>  [] ext4_find_entry+0x350/0x5b7
>  [] ext4_unlink+0x6e/0x1a4
>  [] vfs_unlink+0x49/0x89
>  [] do_unlinkat+0x96/0x12c
>  [] sys_unlink+0x10/0x12
>  [] sysenter_past_esp+0x5f/0xa5
>  
-- 
Jan Kara <[EMAIL PROTECTED]>
SUSE Labs, CR
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ext4: Use the right macro for testing the incompat feature.

2008-02-04 Thread Aneesh Kumar K.V
Incompat feature need to be checked using JBD2_HAS_INCOMPAT_FEATURE

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/jbd2/commit.c   |2 +-
 fs/jbd2/recovery.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 2b88ab0..9cd9288 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -142,7 +142,7 @@ static int journal_submit_commit_record(journal_t *journal,
bh->b_end_io = journal_end_buffer_io_sync;
 
if (journal->j_flags & JBD2_BARRIER &&
-   !JBD2_HAS_COMPAT_FEATURE(journal,
+   !JBD2_HAS_INCOMPAT_FEATURE(journal,
 JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
set_buffer_ordered(bh);
barrier_done = 1;
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 9216806..f8ec777 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -641,7 +641,7 @@ static int do_one_pass(journal_t *journal,
if (chksum_err) {
info->end_transaction = next_commit_ID;
 
-   if (!JBD2_HAS_COMPAT_FEATURE(journal,
+   if (!JBD2_HAS_INCOMPAT_FEATURE(journal,
   JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)){
printk(KERN_ERR
   "JBD: Transaction %u "
-- 
1.5.4.rc3.24.gb53139-dirty

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ext4: Fix reference counting on buffer head.

2008-02-04 Thread Aneesh Kumar K.V
With journal checksum patch we added asyn commit of journal commit headers.
During the conversion we missed to take a reference on buffer head. Before
the change sync_dirty_buffer did the get_bh(). The associative put_bh is
done by journal_wait_on_commit_record()

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/jbd2/commit.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index da8d0eb..2b88ab0 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -136,7 +136,7 @@ static int journal_submit_commit_record(journal_t *journal,
 
JBUFFER_TRACE(descriptor, "submit commit block");
lock_buffer(bh);
-
+   get_bh(bh);
set_buffer_dirty(bh);
set_buffer_uptodate(bh);
bh->b_end_io = journal_end_buffer_io_sync;
-- 
1.5.4.rc3.24.gb53139-dirty

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Aneesh Kumar K.V
On Mon, Feb 04, 2008 at 10:23:16AM -0500, Josef Bacik wrote:
> On Monday 04 February 2008 5:12:28 am Aneesh Kumar K.V wrote:
> > Hi,
> >
> > This is with the new ext3 -> ext4 migrate code added. The recently added
> > lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
> > on the ext3 inode during migration to prevent walking the ext3 inode
> > when it is being converted to ext4 format. Also we want to avoid
> > file truncation and new blocks being added while converting to ext4.
> > Also we dont want to reserve large number of credits for journal.
> > Any idea how to fix this ?
> >
> 
> Hello,
> 
> Seems you should be taking the i_data_sem after starting the journal in 
> ext4_ext_migrate.  I haven't looked at this stuff too much, but everywhere I 
> see i_data_sem taken its within a journal_start/journal_stop.  Here's the 
> patch.  Let me know if I'm way off btw, like I said I'm new to this :).  
> Thank 
> you,

But that doesn't help. Because we call ext4_journal_restart after taking
i_data_sem. That implies we can sleep waiting for other running
transaction to commit. And if that transaction (in the above example
rm) is waiting for i_data_sem we deadlock.

-aneesh
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Josef Bacik
On Monday 04 February 2008 5:12:28 am Aneesh Kumar K.V wrote:
> Hi,
>
> This is with the new ext3 -> ext4 migrate code added. The recently added
> lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
> on the ext3 inode during migration to prevent walking the ext3 inode
> when it is being converted to ext4 format. Also we want to avoid
> file truncation and new blocks being added while converting to ext4.
> Also we dont want to reserve large number of credits for journal.
> Any idea how to fix this ?
>

Hello,

Seems you should be taking the i_data_sem after starting the journal in 
ext4_ext_migrate.  I haven't looked at this stuff too much, but everywhere I 
see i_data_sem taken its within a journal_start/journal_stop.  Here's the 
patch.  Let me know if I'm way off btw, like I said I'm new to this :).  Thank 
you,

Josef

Index: linux-2.6/fs/ext4/migrate.c
===
--- linux-2.6.orig/fs/ext4/migrate.c
+++ linux-2.6/fs/ext4/migrate.c
@@ -414,7 +414,6 @@ int ext4_ext_migrate(struct inode *inode
if ((EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
return -EINVAL;
 
-   down_write(&EXT4_I(inode)->i_data_sem);
handle = ext4_journal_start(inode,
EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
@@ -424,6 +423,9 @@ int ext4_ext_migrate(struct inode *inode
retval = PTR_ERR(handle);
goto err_out;
}
+
+   down_write(&EXT4_I(inode)->i_data_sem);
+
tmp_inode = ext4_new_inode(handle,
inode->i_sb->s_root->d_inode,
S_IFREG);
@@ -549,10 +551,10 @@ err_out:
 */
tmp_inode->i_nlink = 0;
 
-   ext4_journal_stop(handle);
-
up_write(&EXT4_I(inode)->i_data_sem);
 
+   ext4_journal_stop(handle);
+
if (tmp_inode)
iput(tmp_inode);
 




-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Theodore Tso
On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
> On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso <[EMAIL PROTECTED]> wrote:
> 
> > On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
> > > When I merge David's iget coversion patches this will instead wreck the
> > > ext4 patchset.
> > 
> > That's ok, it shouldn't be hard for me to fix this up.  How quickly
> > will you be able to merge David's iget converstion patches?
> 
> They're about 1,000 patches back.

OK, if you're not planning on pushing David's changes to Linus right
away, what if I pull in David's

iget-stop-ext4-from-using-iget-and-read_inode-try.patch 

and push it plus some other ext4 bug fixes directly to Linus, and let
you know when that has happened so you can drop David's patch from
your queue?

David's changes to ext4 can be applied standalone without the rest of
his series, so it would be safe to push that to Linus independently
and in advance of the rest of his series.  That should also help
reduce the number of inter-patch queue dependencies.

Regards,

  - Ted


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ext4: Don't set EXTENTS_FL flag for fast symlinks

2008-02-04 Thread Valerie Clement
Don't set EXTENTS_FL flag for fast symlinks

From: Valerie Clement <[EMAIL PROTECTED]>

For fast symbolic links, the file content is stored in the i_block[]
array, which is not compatible with the new file extents format.
e2fsck reports error on such files because EXTENTS_FL is set.
Don't set the EXTENTS_FL flag when creating fast symlinks.

In the case of file migration, skip fast symbolic links.

Signed-off-by: Valerie Clement <[EMAIL PROTECTED]>

---

 fs/ext4/migrate.c |6 ++
 fs/ext4/namei.c   |1 +
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 3ebc233..9ee1f7c 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -414,6 +414,12 @@ int ext4_ext_migrate(struct inode *inode, struct file 
*filp,
if ((EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
return -EINVAL;
 
+   if (S_ISLNK(inode->i_mode) && inode->i_blocks == 0)
+   /*
+* don't migrate fast symlink
+*/
+   return retval;
+
down_write(&EXT4_I(inode)->i_data_sem);
handle = ext4_journal_start(inode,
EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 67b6d8a..71273c5 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2234,6 +2234,7 @@ retry:
inode->i_op = &ext4_fast_symlink_inode_operations;
memcpy((char*)&EXT4_I(inode)->i_data,symname,l);
inode->i_size = l-1;
+   EXT4_I(inode)->i_flags &= ~EXT4_EXTENTS_FL;
}
EXT4_I(inode)->i_disksize = inode->i_size;
err = ext4_add_nondir(handle, dentry, inode);


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RESEND] [PATCH] ext3,4:fdatasync should skip metadata writeout when overwriting

2008-02-04 Thread Hisashi Hifumi
Hi.

Currently fdatasync is identical to fsync in ext3,4.
I think fdatasync should skip journal flush in data=ordered and data=writeback 
mode
when it overwrites to already-instantiated blocks on HDD.
When I_DIRTY_DATASYNC flag is not set, fdatasync should skip journal writeout
because this indicates only atime or/and mtime updates.  

Following patch is the same approach of ext2's fsync code(ext2_sync_file).

I did a performance test using the sysbench.

#sysbench --num-threads=128 --max-requests=5 --test=fileio 
--file-total-size=128G 
--file-test-mode=rndwr --file-fsync-mode=fdatasync run

The result was:

-2.6.24
Operations performed:  0 Read, 50080 Write, 59600 Other = 109680 Total
Read 0b  Written 782.5Mb  Total transferred 782.5Mb  (12.116Mb/sec)
  775.45 Requests/sec executed

Test execution summary:
total time:  64.5814s
total number of events:  50080
total time taken by event execution: 3713.9836
per-request statistics:
 min:0.s
 avg:0.0742s
 max:0.9375s
 approx.  95 percentile: 0.2901s

Threads fairness:
events (avg/stddev):   391.2500/23.26
execution time (avg/stddev):   29.0155/1.99


-2.6.24-patched
Operations performed:  0 Read, 50009 Write, 61596 Other = 111605 Total
Read 0b  Written 781.39Mb  Total transferred 781.39Mb  (16.419Mb/sec)
 1050.83 Requests/sec executed

Test execution summary:
total time:  47.5900s
total number of events:  50009
total time taken by event execution: 2934.5768
per-request statistics:
 min:0.s
 avg:0.0587s
 max:0.8938s
 approx.  95 percentile: 0.1993s

Threads fairness:
events (avg/stddev):   390.6953/22.64
execution time (avg/stddev):   22.9264/1.17


Filesystem I/O throughput was improved.

Thanks.

Signed-off-by :Hisashi Hifumi <[EMAIL PROTECTED]>

diff -Nrup linux-2.6.24.org/fs/ext3/fsync.c linux-2.6.24/fs/ext3/fsync.c
--- linux-2.6.24.org/fs/ext3/fsync.c2008-01-25 07:58:37.0 +0900
+++ linux-2.6.24/fs/ext3/fsync.c2008-02-04 12:42:42.0 +0900
@@ -72,6 +72,9 @@ int ext3_sync_file(struct file * file, s
goto out;
}
 
+   if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
+   goto out;
+
/*
 * The VFS has written the file data.  If the inode is unaltered
 * then we need not start a commit.
diff -Nrup linux-2.6.24.org/fs/ext4/fsync.c linux-2.6.24/fs/ext4/fsync.c
--- linux-2.6.24.org/fs/ext4/fsync.c2008-01-25 07:58:37.0 +0900
+++ linux-2.6.24/fs/ext4/fsync.c2008-02-04 12:43:37.0 +0900
@@ -72,6 +72,9 @@ int ext4_sync_file(struct file * file, s
goto out;
}
 
+if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
+goto out;
+
/*
 * The VFS has written the file data.  If the inode is unaltered
 * then we need not start a commit.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Aneesh Kumar K.V
Hi,

This is with the new ext3 -> ext4 migrate code added. The recently added
lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
on the ext3 inode during migration to prevent walking the ext3 inode
when it is being converted to ext4 format. Also we want to avoid 
file truncation and new blocks being added while converting to ext4.
Also we dont want to reserve large number of credits for journal.
Any idea how to fix this ?


-aneesh

===
[ INFO: possible circular locking dependency detected ]
2.6.24-rc8 #6
---
rm/2401 is trying to acquire lock:
 (&ei->i_data_sem){}, at: [] ext4_get_blocks_wrap+0x21/0x108

but task is already holding lock:
 (jbd2_handle){--..}, at: [] jbd2_journal_start+0xd2/0xff

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (jbd2_handle){--..}:
   [] __lock_acquire+0xa31/0xc1a
   [] lock_acquire+0x7a/0x94
   [] jbd2_journal_start+0xf5/0xff
   [] ext4_journal_start_sb+0x48/0x4a
   [] ext4_ext_migrate+0x7d/0x535
   [] ext4_ioctl+0x528/0x56c
   [] do_ioctl+0x50/0x67
   [] vfs_ioctl+0x237/0x24a
   [] sys_ioctl+0x31/0x4b
   [] sysenter_past_esp+0x5f/0xa5
   [] 0x

-> #0 (&ei->i_data_sem){}:
   [] __lock_acquire+0x921/0xc1a
   [] lock_acquire+0x7a/0x94
   [] down_read+0x42/0x79
   [] ext4_get_blocks_wrap+0x21/0x108
   [] ext4_getblk+0x62/0x1c4
   [] ext4_find_entry+0x350/0x5b7
   [] ext4_unlink+0x6e/0x1a4
   [] vfs_unlink+0x49/0x89
   [] do_unlinkat+0x96/0x12c
   [] sys_unlink+0x10/0x12
   [] sysenter_past_esp+0x5f/0xa5
   [] 0x

other info that might help us debug this:

3 locks held by rm/2401:
 #0:  (&type->i_mutex_dir_key#5/1){--..}, at: [] 
do_unlinkat+0x5e/0x12c
 #1:  (&sb->s_type->i_mutex_key#8){--..}, at: [] vfs_unlink+0x36/0x89
 #2:  (jbd2_handle){--..}, at: [] jbd2_journal_start+0xd2/0xff

stack backtrace:
Pid: 2401, comm: rm Not tainted 2.6.24-rc8 #6
 [] show_trace_log_lvl+0x1a/0x2f
 [] show_trace+0x12/0x14
 [] dump_stack+0x6c/0x72
 [] print_circular_bug_tail+0x5f/0x68
 [] __lock_acquire+0x921/0xc1a
 [] lock_acquire+0x7a/0x94
 [] down_read+0x42/0x79
 [] ext4_get_blocks_wrap+0x21/0x108
 [] ext4_getblk+0x62/0x1c4
 [] ext4_find_entry+0x350/0x5b7
 [] ext4_unlink+0x6e/0x1a4
 [] vfs_unlink+0x49/0x89
 [] do_unlinkat+0x96/0x12c
 [] sys_unlink+0x10/0x12
 [] sysenter_past_esp+0x5f/0xa5
 
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: [Bugme-new] [Bug 9849] New: NULL pointer deref in journal_wait_on_commit_record

2008-02-04 Thread Aneesh Kumar K.V
On Wed, Jan 30, 2008 at 03:17:57PM -0800, Mingming Cao wrote:
> 
> The buufer head pointer passed to journal_wait_on_commit_record() could
> be NULL if the previous journal_submit_commit_record() failed or journal
> has already aborted.
> 
> Looking at the jbd2 debug messages, before the oops happen, the jbd2 is
> aborted due to trying to access the next log block beyond the end of
> device. This might be caused by using a corrupted image.
> 
> We need to check the error returns from journal_submit_commit_record()
> and avoid calling journal_wait_on_commit_record() in the failure case.
> 
> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
> The buufer head pointer passed to journal_wait_on_commit_record()
> could be NULL if the previous journal_submit_commit_record() failed
> or journal has already aborted.
> 
> We need to check the error returns from journal_submit_commit_record()
> and avoid calling journal_wait_on_commit_record() in the failure case.
> 
> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
> ---
>  fs/jbd2/commit.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.24-rc8/fs/jbd2/commit.c
> ===
> --- linux-2.6.24-rc8.orig/fs/jbd2/commit.c2008-01-30 14:12:10.0 
> -0800
> +++ linux-2.6.24-rc8/fs/jbd2/commit.c 2008-01-30 15:09:50.0 -0800
> @@ -872,7 +872,8 @@ wait_for_iobuf:
>   if (err)
>   __jbd2_journal_abort_hard(journal);
>   }
> - err = journal_wait_on_commit_record(cbh);
> + if (!err && !is_journal_aborted(journal))
> + err = journal_wait_on_commit_record(cbh);
> 
>   if (err)
>   jbd2_journal_abort(journal, err);
> 
> 

Needs the below small change also. I don't see this patch in the patch
queue. So i guess we can add the below diff to the same. The change was
suggested by Girish. Before journal checksum changes sync_dirty_buffer
did the get_bh.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index da8d0eb..2b88ab0 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -136,7 +136,7 @@ static int journal_submit_commit_record(journal_t *journal,
 
JBUFFER_TRACE(descriptor, "submit commit block");
lock_buffer(bh);
-
+   get_bh(bh);
set_buffer_dirty(bh);
set_buffer_uptodate(bh);
bh->b_end_io = journal_end_buffer_io_sync;
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix ext4 bitops

2008-02-04 Thread Aneesh Kumar K.V
On Mon, Feb 04, 2008 at 10:24:36AM +0100, Heiko Carstens wrote:
> > > > > | fs/ext4/mballoc.c: In function 'ext4_mb_generate_buddy':
> > > > > | fs/ext4/mballoc.c:954: error: implicit declaration of function 
> > > > > 'generic_find_next_le_bit'
> > > > > 
> > > > > The s390 specific bitops uses parts of the generic implementation.
> > > > > Include the correct header.
> > > > 
> > > > That doesn't work:
> > > > 
> > > > fs/built-in.o: In function `ext4_mb_release_inode_pa':
> > > > mballoc.c:(.text+0x95a8a): undefined reference to 
> > > > `generic_find_next_le_bit'
> > > > fs/built-in.o: In function `ext4_mb_init_cache':
> > > > mballoc.c:(.text+0x967ea): undefined reference to 
> > > > `generic_find_next_le_bit'
> > > > 
> > > > This still needs generic_find_next_le_bit which comes
> > > > from lib/find_next_bit.c. That one doesn't get built on s390 since we
> > > > don't set GENERIC_FIND_NEXT_BIT.
> > > > Currently we have the lengthly patch below queued.
> > > 
> > > Similar issue on m68k. As Bastian also saw it on powerpc, I'm getting the
> > > impression the ext4 people don't (compile) test on big endian machines?
> > > 
> > > Gr{oetje,eeting}s,
> > > 
> > 
> > I have sent this patches to linux-arch expecting a review from
> > different arch people. It is true that the patches are tested only on
> > powerpc, x86-64, x86. That's the primary reason of me sending the
> > patches to linux-arch.
> 
> Is there anything special I need to do so the ext4 code actually uses
> ext2_find_next_bit() ? Haven't looked at the ext4 code, but I'd like to
> test if the s390 implementation is ok.

With the latest linus kernel in git you can test it by mounting  ext4

mount -t ext4dev  


-aneesh
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix ext4 bitops

2008-02-04 Thread Heiko Carstens
> > > > | fs/ext4/mballoc.c: In function 'ext4_mb_generate_buddy':
> > > > | fs/ext4/mballoc.c:954: error: implicit declaration of function 
> > > > 'generic_find_next_le_bit'
> > > > 
> > > > The s390 specific bitops uses parts of the generic implementation.
> > > > Include the correct header.
> > > 
> > > That doesn't work:
> > > 
> > > fs/built-in.o: In function `ext4_mb_release_inode_pa':
> > > mballoc.c:(.text+0x95a8a): undefined reference to 
> > > `generic_find_next_le_bit'
> > > fs/built-in.o: In function `ext4_mb_init_cache':
> > > mballoc.c:(.text+0x967ea): undefined reference to 
> > > `generic_find_next_le_bit'
> > > 
> > > This still needs generic_find_next_le_bit which comes
> > > from lib/find_next_bit.c. That one doesn't get built on s390 since we
> > > don't set GENERIC_FIND_NEXT_BIT.
> > > Currently we have the lengthly patch below queued.
> > 
> > Similar issue on m68k. As Bastian also saw it on powerpc, I'm getting the
> > impression the ext4 people don't (compile) test on big endian machines?
> > 
> > Gr{oetje,eeting}s,
> > 
> 
> I have sent this patches to linux-arch expecting a review from
> different arch people. It is true that the patches are tested only on
> powerpc, x86-64, x86. That's the primary reason of me sending the
> patches to linux-arch.

Is there anything special I need to do so the ext4 code actually uses
ext2_find_next_bit() ? Haven't looked at the ext4 code, but I'd like to
test if the s390 implementation is ok.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG_ON at mballoc.c:3752

2008-02-04 Thread Eric Sesterhenn
* Aneesh Kumar K.V ([EMAIL PROTECTED]) wrote:
> On Thu, Jan 31, 2008 at 04:42:07PM +0100, Eric Sesterhenn wrote:
> > * Eric Sesterhenn ([EMAIL PROTECTED]) wrote:
> > > hi,
> > > 
> > > while running a modified version of fsfuzzer i triggered the BUG() in
> > > ext4_mb_release_inode_pa(). Sadly I am not able to reproduce this using
> > > the generated image, but running the fuzzer will usually trigger this in
> > > less than 40 attempts. Increasing the JBD2 Debug level didnt give more
> > > information. The kernel is current git with
> > > ext4-fix-null-pointer-deref-in-journal_wait_on_commit_record.patch
> > > applied. 
> > 
> > I am now able to reproduce this using this image:
> > http://www.cccmz.de/~snakebyte/ext4.24.img.bz2
> > 
> > the following commands will trigger the oops for me
> > 
> > mount cfs/ext4.24.img /media/test -t ext4dev -o extents -o loop
> > mkdir /media/test/stress
> > chown snakebyte:snakebyte /media/test/stress && sudo -u snakebyte fstest -n 
> > 10 -l 10 -f 5 -s 4 -p /media/test/stress/
> > 
> 
> The file system is corrupted. The BUG_ON indicate that the free spcae
> marked in the prealloc space and found by looking at the bitmap are not
> same. Do you have a set of steps that i can follow to reproduce this ?

just compile the mangle.c and run the modified fuzzer for a while (link
below) or use the steps above

> on a clean file system ?

had no luck on a clean fs with this

> Where do i find the fsfuzzer that you are using ?

http://www.cccmz.de/~snakebyte/fsfuzzer-0.6-lmh-eric.tar.bz2
http://www.cccmz.de/~snakebyte/fsfuzz.diff

needs user/group nobody:nobody and stuff like fstest, fsx, iozone,
fsstress

The changes i made are basically changing the fuzzing ratio, adding udf,
hfsplus and ext4, reducing the number of runs to 100 (not unlimited),
saving a backup of the image before mounting and running the tests, and
mounting ext3 with -o debug

Greetings, Eric

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html