Re: BUG_ON at mballoc.c:3752

2008-02-04 Thread Eric Sesterhenn
* Aneesh Kumar K.V ([EMAIL PROTECTED]) wrote:
 On Thu, Jan 31, 2008 at 04:42:07PM +0100, Eric Sesterhenn wrote:
  * Eric Sesterhenn ([EMAIL PROTECTED]) wrote:
   hi,
   
   while running a modified version of fsfuzzer i triggered the BUG() in
   ext4_mb_release_inode_pa(). Sadly I am not able to reproduce this using
   the generated image, but running the fuzzer will usually trigger this in
   less than 40 attempts. Increasing the JBD2 Debug level didnt give more
   information. The kernel is current git with
   ext4-fix-null-pointer-deref-in-journal_wait_on_commit_record.patch
   applied. 
  
  I am now able to reproduce this using this image:
  http://www.cccmz.de/~snakebyte/ext4.24.img.bz2
  
  the following commands will trigger the oops for me
  
  mount cfs/ext4.24.img /media/test -t ext4dev -o extents -o loop
  mkdir /media/test/stress
  chown snakebyte:snakebyte /media/test/stress  sudo -u snakebyte fstest -n 
  10 -l 10 -f 5 -s 4 -p /media/test/stress/
  
 
 The file system is corrupted. The BUG_ON indicate that the free spcae
 marked in the prealloc space and found by looking at the bitmap are not
 same. Do you have a set of steps that i can follow to reproduce this ?

just compile the mangle.c and run the modified fuzzer for a while (link
below) or use the steps above

 on a clean file system ?

had no luck on a clean fs with this

 Where do i find the fsfuzzer that you are using ?

http://www.cccmz.de/~snakebyte/fsfuzzer-0.6-lmh-eric.tar.bz2
http://www.cccmz.de/~snakebyte/fsfuzz.diff

needs user/group nobody:nobody and stuff like fstest, fsx, iozone,
fsstress

The changes i made are basically changing the fuzzing ratio, adding udf,
hfsplus and ext4, reducing the number of runs to 100 (not unlimited),
saving a backup of the image before mounting and running the tests, and
mounting ext3 with -o debug

Greetings, Eric

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix ext4 bitops

2008-02-04 Thread Aneesh Kumar K.V
On Mon, Feb 04, 2008 at 10:24:36AM +0100, Heiko Carstens wrote:
 | fs/ext4/mballoc.c: In function 'ext4_mb_generate_buddy':
 | fs/ext4/mballoc.c:954: error: implicit declaration of function 
 'generic_find_next_le_bit'
 
 The s390 specific bitops uses parts of the generic implementation.
 Include the correct header.

That doesn't work:

fs/built-in.o: In function `ext4_mb_release_inode_pa':
mballoc.c:(.text+0x95a8a): undefined reference to 
`generic_find_next_le_bit'
fs/built-in.o: In function `ext4_mb_init_cache':
mballoc.c:(.text+0x967ea): undefined reference to 
`generic_find_next_le_bit'

This still needs generic_find_next_le_bit which comes
from lib/find_next_bit.c. That one doesn't get built on s390 since we
don't set GENERIC_FIND_NEXT_BIT.
Currently we have the lengthly patch below queued.
   
   Similar issue on m68k. As Bastian also saw it on powerpc, I'm getting the
   impression the ext4 people don't (compile) test on big endian machines?
   
   Gr{oetje,eeting}s,
   
  
  I have sent this patches to linux-arch expecting a review from
  different arch people. It is true that the patches are tested only on
  powerpc, x86-64, x86. That's the primary reason of me sending the
  patches to linux-arch.
 
 Is there anything special I need to do so the ext4 code actually uses
 ext2_find_next_bit() ? Haven't looked at the ext4 code, but I'd like to
 test if the s390 implementation is ok.

With the latest linus kernel in git you can test it by mounting  ext4

mount -t ext4dev device mntpoint


-aneesh
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: [Bugme-new] [Bug 9849] New: NULL pointer deref in journal_wait_on_commit_record

2008-02-04 Thread Aneesh Kumar K.V
On Wed, Jan 30, 2008 at 03:17:57PM -0800, Mingming Cao wrote:
 
 The buufer head pointer passed to journal_wait_on_commit_record() could
 be NULL if the previous journal_submit_commit_record() failed or journal
 has already aborted.
 
 Looking at the jbd2 debug messages, before the oops happen, the jbd2 is
 aborted due to trying to access the next log block beyond the end of
 device. This might be caused by using a corrupted image.
 
 We need to check the error returns from journal_submit_commit_record()
 and avoid calling journal_wait_on_commit_record() in the failure case.
 
 Signed-off-by: Mingming Cao [EMAIL PROTECTED]
 The buufer head pointer passed to journal_wait_on_commit_record()
 could be NULL if the previous journal_submit_commit_record() failed
 or journal has already aborted.
 
 We need to check the error returns from journal_submit_commit_record()
 and avoid calling journal_wait_on_commit_record() in the failure case.
 
 Signed-off-by: Mingming Cao [EMAIL PROTECTED]
 ---
  fs/jbd2/commit.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 Index: linux-2.6.24-rc8/fs/jbd2/commit.c
 ===
 --- linux-2.6.24-rc8.orig/fs/jbd2/commit.c2008-01-30 14:12:10.0 
 -0800
 +++ linux-2.6.24-rc8/fs/jbd2/commit.c 2008-01-30 15:09:50.0 -0800
 @@ -872,7 +872,8 @@ wait_for_iobuf:
   if (err)
   __jbd2_journal_abort_hard(journal);
   }
 - err = journal_wait_on_commit_record(cbh);
 + if (!err  !is_journal_aborted(journal))
 + err = journal_wait_on_commit_record(cbh);
 
   if (err)
   jbd2_journal_abort(journal, err);
 
 

Needs the below small change also. I don't see this patch in the patch
queue. So i guess we can add the below diff to the same. The change was
suggested by Girish. Before journal checksum changes sync_dirty_buffer
did the get_bh.

Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index da8d0eb..2b88ab0 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -136,7 +136,7 @@ static int journal_submit_commit_record(journal_t *journal,
 
JBUFFER_TRACE(descriptor, submit commit block);
lock_buffer(bh);
-
+   get_bh(bh);
set_buffer_dirty(bh);
set_buffer_uptodate(bh);
bh-b_end_io = journal_end_buffer_io_sync;
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Aneesh Kumar K.V
Hi,

This is with the new ext3 - ext4 migrate code added. The recently added
lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
on the ext3 inode during migration to prevent walking the ext3 inode
when it is being converted to ext4 format. Also we want to avoid 
file truncation and new blocks being added while converting to ext4.
Also we dont want to reserve large number of credits for journal.
Any idea how to fix this ?


-aneesh

===
[ INFO: possible circular locking dependency detected ]
2.6.24-rc8 #6
---
rm/2401 is trying to acquire lock:
 (ei-i_data_sem){}, at: [c01dca58] ext4_get_blocks_wrap+0x21/0x108

but task is already holding lock:
 (jbd2_handle){--..}, at: [c01fc4a7] jbd2_journal_start+0xd2/0xff

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

- #1 (jbd2_handle){--..}:
   [c0143a5c] __lock_acquire+0xa31/0xc1a
   [c0143cbf] lock_acquire+0x7a/0x94
   [c01fc4ca] jbd2_journal_start+0xf5/0xff
   [c01e3539] ext4_journal_start_sb+0x48/0x4a
   [c01eb980] ext4_ext_migrate+0x7d/0x535
   [c01df328] ext4_ioctl+0x528/0x56c
   [c0177700] do_ioctl+0x50/0x67
   [c017794e] vfs_ioctl+0x237/0x24a
   [c0177992] sys_ioctl+0x31/0x4b
   [c0104f8a] sysenter_past_esp+0x5f/0xa5
   [] 0x

- #0 (ei-i_data_sem){}:
   [c014394c] __lock_acquire+0x921/0xc1a
   [c0143cbf] lock_acquire+0x7a/0x94
   [c044f247] down_read+0x42/0x79
   [c01dca58] ext4_get_blocks_wrap+0x21/0x108
   [c01dcba1] ext4_getblk+0x62/0x1c4
   [c01e0de9] ext4_find_entry+0x350/0x5b7
   [c01e200c] ext4_unlink+0x6e/0x1a4
   [c017449e] vfs_unlink+0x49/0x89
   [c0175f02] do_unlinkat+0x96/0x12c
   [c0175fa8] sys_unlink+0x10/0x12
   [c0104f8a] sysenter_past_esp+0x5f/0xa5
   [] 0x

other info that might help us debug this:

3 locks held by rm/2401:
 #0:  (type-i_mutex_dir_key#5/1){--..}, at: [c0175eca] 
do_unlinkat+0x5e/0x12c
 #1:  (sb-s_type-i_mutex_key#8){--..}, at: [c017448b] vfs_unlink+0x36/0x89
 #2:  (jbd2_handle){--..}, at: [c01fc4a7] jbd2_journal_start+0xd2/0xff

stack backtrace:
Pid: 2401, comm: rm Not tainted 2.6.24-rc8 #6
 [c0106017] show_trace_log_lvl+0x1a/0x2f
 [c0106893] show_trace+0x12/0x14
 [c0106b89] dump_stack+0x6c/0x72
 [c0141b26] print_circular_bug_tail+0x5f/0x68
 [c014394c] __lock_acquire+0x921/0xc1a
 [c0143cbf] lock_acquire+0x7a/0x94
 [c044f247] down_read+0x42/0x79
 [c01dca58] ext4_get_blocks_wrap+0x21/0x108
 [c01dcba1] ext4_getblk+0x62/0x1c4
 [c01e0de9] ext4_find_entry+0x350/0x5b7
 [c01e200c] ext4_unlink+0x6e/0x1a4
 [c017449e] vfs_unlink+0x49/0x89
 [c0175f02] do_unlinkat+0x96/0x12c
 [c0175fa8] sys_unlink+0x10/0x12
 [c0104f8a] sysenter_past_esp+0x5f/0xa5
 
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RESEND] [PATCH] ext3,4:fdatasync should skip metadata writeout when overwriting

2008-02-04 Thread Hisashi Hifumi
Hi.

Currently fdatasync is identical to fsync in ext3,4.
I think fdatasync should skip journal flush in data=ordered and data=writeback 
mode
when it overwrites to already-instantiated blocks on HDD.
When I_DIRTY_DATASYNC flag is not set, fdatasync should skip journal writeout
because this indicates only atime or/and mtime updates.  

Following patch is the same approach of ext2's fsync code(ext2_sync_file).

I did a performance test using the sysbench.

#sysbench --num-threads=128 --max-requests=5 --test=fileio 
--file-total-size=128G 
--file-test-mode=rndwr --file-fsync-mode=fdatasync run

The result was:

-2.6.24
Operations performed:  0 Read, 50080 Write, 59600 Other = 109680 Total
Read 0b  Written 782.5Mb  Total transferred 782.5Mb  (12.116Mb/sec)
  775.45 Requests/sec executed

Test execution summary:
total time:  64.5814s
total number of events:  50080
total time taken by event execution: 3713.9836
per-request statistics:
 min:0.s
 avg:0.0742s
 max:0.9375s
 approx.  95 percentile: 0.2901s

Threads fairness:
events (avg/stddev):   391.2500/23.26
execution time (avg/stddev):   29.0155/1.99


-2.6.24-patched
Operations performed:  0 Read, 50009 Write, 61596 Other = 111605 Total
Read 0b  Written 781.39Mb  Total transferred 781.39Mb  (16.419Mb/sec)
 1050.83 Requests/sec executed

Test execution summary:
total time:  47.5900s
total number of events:  50009
total time taken by event execution: 2934.5768
per-request statistics:
 min:0.s
 avg:0.0587s
 max:0.8938s
 approx.  95 percentile: 0.1993s

Threads fairness:
events (avg/stddev):   390.6953/22.64
execution time (avg/stddev):   22.9264/1.17


Filesystem I/O throughput was improved.

Thanks.

Signed-off-by :Hisashi Hifumi [EMAIL PROTECTED]

diff -Nrup linux-2.6.24.org/fs/ext3/fsync.c linux-2.6.24/fs/ext3/fsync.c
--- linux-2.6.24.org/fs/ext3/fsync.c2008-01-25 07:58:37.0 +0900
+++ linux-2.6.24/fs/ext3/fsync.c2008-02-04 12:42:42.0 +0900
@@ -72,6 +72,9 @@ int ext3_sync_file(struct file * file, s
goto out;
}
 
+   if (datasync  !(inode-i_state  I_DIRTY_DATASYNC))
+   goto out;
+
/*
 * The VFS has written the file data.  If the inode is unaltered
 * then we need not start a commit.
diff -Nrup linux-2.6.24.org/fs/ext4/fsync.c linux-2.6.24/fs/ext4/fsync.c
--- linux-2.6.24.org/fs/ext4/fsync.c2008-01-25 07:58:37.0 +0900
+++ linux-2.6.24/fs/ext4/fsync.c2008-02-04 12:43:37.0 +0900
@@ -72,6 +72,9 @@ int ext4_sync_file(struct file * file, s
goto out;
}
 
+if (datasync  !(inode-i_state  I_DIRTY_DATASYNC))
+goto out;
+
/*
 * The VFS has written the file data.  If the inode is unaltered
 * then we need not start a commit.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Aneesh Kumar K.V
On Mon, Feb 04, 2008 at 05:31:56PM +0100, Jan Kara wrote:
   Hi,
 
 On Mon 04-02-08 15:42:28, Aneesh Kumar K.V wrote:
  This is with the new ext3 - ext4 migrate code added. The recently added
  lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
  on the ext3 inode during migration to prevent walking the ext3 inode
  when it is being converted to ext4 format. Also we want to avoid 
  file truncation and new blocks being added while converting to ext4.
  Also we dont want to reserve large number of credits for journal.
  Any idea how to fix this ?
   Hmm, while briefly looking at the code - why do you introduce i_data_sem
 and not use i_alloc_sem which is already in VFS inode? That is aimed
 exactly at the serialization of truncates, writes and similar users.

How about read ? We are changing the format of inode. We don't want even
the read to go through.


 That doesn't solve problems with lock ordering but I was just wondering...
   Another problem - ext4_fallocate() has the same lock ordering problem as
 the migration code and maybe there are others...


I will look at the same when fixing this.

   One (stupid) solution to your problem is to make i_data_sem be
 always locked before the transaction is started. It could possibly have
 negative performance impact because you'd have to hold the semaphore for
 a longer time and thus a writer would block readers for longer time. So one
 would have to measure how big difference that would make.
   Another possibility is to start a single transaction for migration and
 extend it as long as you can (as truncate does it). And when you can't
 extend any more, you drop the i_data_sem and start a new transaction and
 acquire the semaphore again. This has the disadvantage that after dropping
 the semaphore you have to resync your original inode with the temporary
 one your are building which probably ends up being ugly as night... Hmm,
 but maybe we could get rid of this - hold i_mutex to protect against all
 writes (that ranks outside of transaction start so you can hold it for the
 whole migration time - maybe you even hold it if you are called from the
 write path...). After dropping i_data_sem you let some readers proceed
 but writers still wait on i_mutex so the file shouldn't change under you
 (but I suggest adding some BUG_ONs to verify that the file really doesn't
 change :).

A quick look says truncate can happen even when we hold i_mutex ??

But of all this looks like a workable solution. Will try this out.


-aneesh
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Jan Kara
On Mon 04-02-08 22:42:08, Aneesh Kumar K.V wrote:
 On Mon, Feb 04, 2008 at 05:31:56PM +0100, Jan Kara wrote:
Hi,
  
  On Mon 04-02-08 15:42:28, Aneesh Kumar K.V wrote:
   This is with the new ext3 - ext4 migrate code added. The recently added
   lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
   on the ext3 inode during migration to prevent walking the ext3 inode
   when it is being converted to ext4 format. Also we want to avoid 
   file truncation and new blocks being added while converting to ext4.
   Also we dont want to reserve large number of credits for journal.
   Any idea how to fix this ?
Hmm, while briefly looking at the code - why do you introduce i_data_sem
  and not use i_alloc_sem which is already in VFS inode? That is aimed
  exactly at the serialization of truncates, writes and similar users.
 
 How about read ? We are changing the format of inode. We don't want even
 the read to go through.
  I just meant that the code could use i_alloc_sem instead of i_data_sem in
all the places, remove i_data_sem and you safe some memory... It's just a
suggestion for a cleanup.

One (stupid) solution to your problem is to make i_data_sem be
  always locked before the transaction is started. It could possibly have
  negative performance impact because you'd have to hold the semaphore for
  a longer time and thus a writer would block readers for longer time. So one
  would have to measure how big difference that would make.
Another possibility is to start a single transaction for migration and
  extend it as long as you can (as truncate does it). And when you can't
  extend any more, you drop the i_data_sem and start a new transaction and
  acquire the semaphore again. This has the disadvantage that after dropping
  the semaphore you have to resync your original inode with the temporary
  one your are building which probably ends up being ugly as night... Hmm,
  but maybe we could get rid of this - hold i_mutex to protect against all
  writes (that ranks outside of transaction start so you can hold it for the
  whole migration time - maybe you even hold it if you are called from the
  write path...). After dropping i_data_sem you let some readers proceed
  but writers still wait on i_mutex so the file shouldn't change under you
  (but I suggest adding some BUG_ONs to verify that the file really doesn't
  change :).
 
 A quick look says truncate can happen even when we hold i_mutex ??
  No, it shouldn't happen - do_truncate() in fs/open.c acquires i_mutex
before it calls notify_change().

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Josef Bacik
On Monday 04 February 2008 5:12:28 am Aneesh Kumar K.V wrote:
 Hi,

 This is with the new ext3 - ext4 migrate code added. The recently added
 lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
 on the ext3 inode during migration to prevent walking the ext3 inode
 when it is being converted to ext4 format. Also we want to avoid
 file truncation and new blocks being added while converting to ext4.
 Also we dont want to reserve large number of credits for journal.
 Any idea how to fix this ?


Hello,

Seems you should be taking the i_data_sem after starting the journal in 
ext4_ext_migrate.  I haven't looked at this stuff too much, but everywhere I 
see i_data_sem taken its within a journal_start/journal_stop.  Here's the 
patch.  Let me know if I'm way off btw, like I said I'm new to this :).  Thank 
you,

Josef

Index: linux-2.6/fs/ext4/migrate.c
===
--- linux-2.6.orig/fs/ext4/migrate.c
+++ linux-2.6/fs/ext4/migrate.c
@@ -414,7 +414,6 @@ int ext4_ext_migrate(struct inode *inode
if ((EXT4_I(inode)-i_flags  EXT4_EXTENTS_FL))
return -EINVAL;
 
-   down_write(EXT4_I(inode)-i_data_sem);
handle = ext4_journal_start(inode,
EXT4_DATA_TRANS_BLOCKS(inode-i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
@@ -424,6 +423,9 @@ int ext4_ext_migrate(struct inode *inode
retval = PTR_ERR(handle);
goto err_out;
}
+
+   down_write(EXT4_I(inode)-i_data_sem);
+
tmp_inode = ext4_new_inode(handle,
inode-i_sb-s_root-d_inode,
S_IFREG);
@@ -549,10 +551,10 @@ err_out:
 */
tmp_inode-i_nlink = 0;
 
-   ext4_journal_stop(handle);
-
up_write(EXT4_I(inode)-i_data_sem);
 
+   ext4_journal_stop(handle);
+
if (tmp_inode)
iput(tmp_inode);
 




-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ext4: Fix reference counting on buffer head.

2008-02-04 Thread Aneesh Kumar K.V
With journal checksum patch we added asyn commit of journal commit headers.
During the conversion we missed to take a reference on buffer head. Before
the change sync_dirty_buffer did the get_bh(). The associative put_bh is
done by journal_wait_on_commit_record()

Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
---
 fs/jbd2/commit.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index da8d0eb..2b88ab0 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -136,7 +136,7 @@ static int journal_submit_commit_record(journal_t *journal,
 
JBUFFER_TRACE(descriptor, submit commit block);
lock_buffer(bh);
-
+   get_bh(bh);
set_buffer_dirty(bh);
set_buffer_uptodate(bh);
bh-b_end_io = journal_end_buffer_io_sync;
-- 
1.5.4.rc3.24.gb53139-dirty

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Jan Kara
  Hi,

On Mon 04-02-08 15:42:28, Aneesh Kumar K.V wrote:
 This is with the new ext3 - ext4 migrate code added. The recently added
 lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
 on the ext3 inode during migration to prevent walking the ext3 inode
 when it is being converted to ext4 format. Also we want to avoid 
 file truncation and new blocks being added while converting to ext4.
 Also we dont want to reserve large number of credits for journal.
 Any idea how to fix this ?
  Hmm, while briefly looking at the code - why do you introduce i_data_sem
and not use i_alloc_sem which is already in VFS inode? That is aimed
exactly at the serialization of truncates, writes and similar users.
That doesn't solve problems with lock ordering but I was just wondering...
  Another problem - ext4_fallocate() has the same lock ordering problem as
the migration code and maybe there are others...
  One (stupid) solution to your problem is to make i_data_sem be
always locked before the transaction is started. It could possibly have
negative performance impact because you'd have to hold the semaphore for
a longer time and thus a writer would block readers for longer time. So one
would have to measure how big difference that would make.
  Another possibility is to start a single transaction for migration and
extend it as long as you can (as truncate does it). And when you can't
extend any more, you drop the i_data_sem and start a new transaction and
acquire the semaphore again. This has the disadvantage that after dropping
the semaphore you have to resync your original inode with the temporary
one your are building which probably ends up being ugly as night... Hmm,
but maybe we could get rid of this - hold i_mutex to protect against all
writes (that ranks outside of transaction start so you can hold it for the
whole migration time - maybe you even hold it if you are called from the
write path...). After dropping i_data_sem you let some readers proceed
but writers still wait on i_mutex so the file shouldn't change under you
(but I suggest adding some BUG_ONs to verify that the file really doesn't
change :).

Honza

 ===
 [ INFO: possible circular locking dependency detected ]
 2.6.24-rc8 #6
 ---
 rm/2401 is trying to acquire lock:
  (ei-i_data_sem){}, at: [c01dca58] ext4_get_blocks_wrap+0x21/0x108
 
 but task is already holding lock:
  (jbd2_handle){--..}, at: [c01fc4a7] jbd2_journal_start+0xd2/0xff
 
 which lock already depends on the new lock.
 
 
 the existing dependency chain (in reverse order) is:
 
 - #1 (jbd2_handle){--..}:
[c0143a5c] __lock_acquire+0xa31/0xc1a
[c0143cbf] lock_acquire+0x7a/0x94
[c01fc4ca] jbd2_journal_start+0xf5/0xff
[c01e3539] ext4_journal_start_sb+0x48/0x4a
[c01eb980] ext4_ext_migrate+0x7d/0x535
[c01df328] ext4_ioctl+0x528/0x56c
[c0177700] do_ioctl+0x50/0x67
[c017794e] vfs_ioctl+0x237/0x24a
[c0177992] sys_ioctl+0x31/0x4b
[c0104f8a] sysenter_past_esp+0x5f/0xa5
[] 0x
 
 - #0 (ei-i_data_sem){}:
[c014394c] __lock_acquire+0x921/0xc1a
[c0143cbf] lock_acquire+0x7a/0x94
[c044f247] down_read+0x42/0x79
[c01dca58] ext4_get_blocks_wrap+0x21/0x108
[c01dcba1] ext4_getblk+0x62/0x1c4
[c01e0de9] ext4_find_entry+0x350/0x5b7
[c01e200c] ext4_unlink+0x6e/0x1a4
[c017449e] vfs_unlink+0x49/0x89
[c0175f02] do_unlinkat+0x96/0x12c
[c0175fa8] sys_unlink+0x10/0x12
[c0104f8a] sysenter_past_esp+0x5f/0xa5
[] 0x
 
 other info that might help us debug this:
 
 3 locks held by rm/2401:
  #0:  (type-i_mutex_dir_key#5/1){--..}, at: [c0175eca] 
 do_unlinkat+0x5e/0x12c
  #1:  (sb-s_type-i_mutex_key#8){--..}, at: [c017448b] 
 vfs_unlink+0x36/0x89
  #2:  (jbd2_handle){--..}, at: [c01fc4a7] jbd2_journal_start+0xd2/0xff
 
 stack backtrace:
 Pid: 2401, comm: rm Not tainted 2.6.24-rc8 #6
  [c0106017] show_trace_log_lvl+0x1a/0x2f
  [c0106893] show_trace+0x12/0x14
  [c0106b89] dump_stack+0x6c/0x72
  [c0141b26] print_circular_bug_tail+0x5f/0x68
  [c014394c] __lock_acquire+0x921/0xc1a
  [c0143cbf] lock_acquire+0x7a/0x94
  [c044f247] down_read+0x42/0x79
  [c01dca58] ext4_get_blocks_wrap+0x21/0x108
  [c01dcba1] ext4_getblk+0x62/0x1c4
  [c01e0de9] ext4_find_entry+0x350/0x5b7
  [c01e200c] ext4_unlink+0x6e/0x1a4
  [c017449e] vfs_unlink+0x49/0x89
  [c0175f02] do_unlinkat+0x96/0x12c
  [c0175fa8] sys_unlink+0x10/0x12
  [c0104f8a] sysenter_past_esp+0x5f/0xa5
  
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ext4: Don't set EXTENTS_FL flag for fast symlinks

2008-02-04 Thread Valerie Clement
Don't set EXTENTS_FL flag for fast symlinks

From: Valerie Clement [EMAIL PROTECTED]

For fast symbolic links, the file content is stored in the i_block[]
array, which is not compatible with the new file extents format.
e2fsck reports error on such files because EXTENTS_FL is set.
Don't set the EXTENTS_FL flag when creating fast symlinks.

In the case of file migration, skip fast symbolic links.

Signed-off-by: Valerie Clement [EMAIL PROTECTED]

---

 fs/ext4/migrate.c |6 ++
 fs/ext4/namei.c   |1 +
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 3ebc233..9ee1f7c 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -414,6 +414,12 @@ int ext4_ext_migrate(struct inode *inode, struct file 
*filp,
if ((EXT4_I(inode)-i_flags  EXT4_EXTENTS_FL))
return -EINVAL;
 
+   if (S_ISLNK(inode-i_mode)  inode-i_blocks == 0)
+   /*
+* don't migrate fast symlink
+*/
+   return retval;
+
down_write(EXT4_I(inode)-i_data_sem);
handle = ext4_journal_start(inode,
EXT4_DATA_TRANS_BLOCKS(inode-i_sb) +
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 67b6d8a..71273c5 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2234,6 +2234,7 @@ retry:
inode-i_op = ext4_fast_symlink_inode_operations;
memcpy((char*)EXT4_I(inode)-i_data,symname,l);
inode-i_size = l-1;
+   EXT4_I(inode)-i_flags = ~EXT4_EXTENTS_FL;
}
EXT4_I(inode)-i_disksize = inode-i_size;
err = ext4_add_nondir(handle, dentry, inode);


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Theodore Tso
On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
 On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso [EMAIL PROTECTED] wrote:
 
  On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
   When I merge David's iget coversion patches this will instead wreck the
   ext4 patchset.
  
  That's ok, it shouldn't be hard for me to fix this up.  How quickly
  will you be able to merge David's iget converstion patches?
 
 They're about 1,000 patches back.

OK, if you're not planning on pushing David's changes to Linus right
away, what if I pull in David's

iget-stop-ext4-from-using-iget-and-read_inode-try.patch 

and push it plus some other ext4 bug fixes directly to Linus, and let
you know when that has happened so you can drop David's patch from
your queue?

David's changes to ext4 can be applied standalone without the rest of
his series, so it would be safe to push that to Linus independently
and in advance of the rest of his series.  That should also help
reduce the number of inter-patch queue dependencies.

Regards,

  - Ted


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jbd2_handle and i_data_sem circular locking dependency detected

2008-02-04 Thread Aneesh Kumar K.V
On Mon, Feb 04, 2008 at 10:23:16AM -0500, Josef Bacik wrote:
 On Monday 04 February 2008 5:12:28 am Aneesh Kumar K.V wrote:
  Hi,
 
  This is with the new ext3 - ext4 migrate code added. The recently added
  lockdep for jbd2 helped to find this out. We want to hold the i_data_sem
  on the ext3 inode during migration to prevent walking the ext3 inode
  when it is being converted to ext4 format. Also we want to avoid
  file truncation and new blocks being added while converting to ext4.
  Also we dont want to reserve large number of credits for journal.
  Any idea how to fix this ?
 
 
 Hello,
 
 Seems you should be taking the i_data_sem after starting the journal in 
 ext4_ext_migrate.  I haven't looked at this stuff too much, but everywhere I 
 see i_data_sem taken its within a journal_start/journal_stop.  Here's the 
 patch.  Let me know if I'm way off btw, like I said I'm new to this :).  
 Thank 
 you,

But that doesn't help. Because we call ext4_journal_restart after taking
i_data_sem. That implies we can sleep waiting for other running
transaction to commit. And if that transaction (in the above example
rm) is waiting for i_data_sem we deadlock.

-aneesh
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Andrew Morton
On Mon, 4 Feb 2008 10:00:44 -0500 Theodore Tso [EMAIL PROTECTED] wrote:

 On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
  On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso [EMAIL PROTECTED] wrote:
  
   On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
When I merge David's iget coversion patches this will instead wreck the
ext4 patchset.
   
   That's ok, it shouldn't be hard for me to fix this up.  How quickly
   will you be able to merge David's iget converstion patches?
  
  They're about 1,000 patches back.
 
 OK, if you're not planning on pushing David's changes to Linus right
 away, what if I pull in David's
 
   iget-stop-ext4-from-using-iget-and-read_inode-try.patch 
 
 and push it plus some other ext4 bug fixes directly to Linus, and let
 you know when that has happened so you can drop David's patch from
 your queue?

Sure, go for it.

 David's changes to ext4 can be applied standalone without the rest of
 his series, so it would be safe to push that to Linus independently
 and in advance of the rest of his series.  That should also help
 reduce the number of inter-patch queue dependencies.

That patch series is kind of logjammed anyway because it breaks isofs. 
Last time I discussed this with David he seemed to find this amusing rather
than an urgent problem.  I'd drop the whole lot if there weren't lots of
other patches dependent upon them.  Mayeb I can do a selective droppage,
but I hate going out-of-order, and merging untested patch combinations.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: merge plans, was Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Andrew Morton
On Mon, 4 Feb 2008 15:24:18 -0500
Christoph Hellwig [EMAIL PROTECTED] wrote:

 On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
  On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso [EMAIL PROTECTED] wrote:
  
   On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
When I merge David's iget coversion patches this will instead wreck the
ext4 patchset.
   
   That's ok, it shouldn't be hard for me to fix this up.  How quickly
   will you be able to merge David's iget converstion patches?
  
  They're about 1,000 patches back
 
 Care to post a merge plan so we have a slight chance to make sure not
 too much crap is hiding in these 1000 patches?

Pretty much everything up to

#
# end
#
reiser4-sb_sync_inodes.patch


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix ext4 bitops

2008-02-04 Thread Geert Uytterhoeven
On Mon, 4 Feb 2008, Aneesh Kumar K.V wrote:
 On Sun, Feb 03, 2008 at 01:39:02PM +0100, Geert Uytterhoeven wrote:
  On Sun, 3 Feb 2008, Heiko Carstens wrote:
   On Fri, Feb 01, 2008 at 10:04:04PM +0100, Bastian Blank wrote:
On Fri, Feb 01, 2008 at 12:22:57PM -0800, Andrew Morton wrote:
 On Fri, 1 Feb 2008 21:02:08 +0100
 Bastian Blank [EMAIL PROTECTED] wrote:
  Fix ext4 bitops.
 
 This is incomplete.  Please tell us what was fixed.
 
 If it was a build error then please quote the compile error output in 
 the
 changelog, as well as the usual description of what the problem is, 
 and how
 it was fixed.

| fs/ext4/mballoc.c: In function 'ext4_mb_generate_buddy':
| fs/ext4/mballoc.c:954: error: implicit declaration of function 
'generic_find_next_le_bit'

The s390 specific bitops uses parts of the generic implementation.
Include the correct header.
   
   That doesn't work:
   
   fs/built-in.o: In function `ext4_mb_release_inode_pa':
   mballoc.c:(.text+0x95a8a): undefined reference to 
   `generic_find_next_le_bit'
   fs/built-in.o: In function `ext4_mb_init_cache':
   mballoc.c:(.text+0x967ea): undefined reference to 
   `generic_find_next_le_bit'
   
   This still needs generic_find_next_le_bit which comes
   from lib/find_next_bit.c. That one doesn't get built on s390 since we
   don't set GENERIC_FIND_NEXT_BIT.
   Currently we have the lengthly patch below queued.
  
  Similar issue on m68k. As Bastian also saw it on powerpc, I'm getting the
  impression the ext4 people don't (compile) test on big endian machines?
 
 I have sent this patches to linux-arch expecting a review from
 different arch people. It is true that the patches are tested only on
 powerpc, x86-64, x86. That's the primary reason of me sending the
 patches to linux-arch.
 
 http://marc.info/?l=linux-archm=119503501127737w=2

Sometimes it's difficult to see what can go wrong due to a single patch that
just adds a #define. Sorry, I missed the lack of prototype for
generic_find_next_le_bit() and that we don't set GENERIC_FIND_NEXT_BIT,
just like s390.

And yes, usually I rely on the -mm autocompiler to catch things like
this, but this time it didn't work out...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


merge plans, was Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Christoph Hellwig
On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
 On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso [EMAIL PROTECTED] wrote:
 
  On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
   When I merge David's iget coversion patches this will instead wreck the
   ext4 patchset.
  
  That's ok, it shouldn't be hard for me to fix this up.  How quickly
  will you be able to merge David's iget converstion patches?
 
 They're about 1,000 patches back

Care to post a merge plan so we have a slight chance to make sure not
too much crap is hiding in these 1000 patches?
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Theodore Tso
On Mon, Feb 04, 2008 at 12:11:31PM -0800, Andrew Morton wrote:
 
 That patch series is kind of logjammed anyway because it breaks isofs. 
 Last time I discussed this with David he seemed to find this amusing rather
 than an urgent problem.  I'd drop the whole lot if there weren't lots of
 other patches dependent upon them.  Mayeb I can do a selective droppage,
 but I hate going out-of-order, and merging untested patch combinations.
 

All of the patches which move from using iget() to iget_unlocked()
should in theory (if they are well written) be standalone and not have
any prerequisites, and a good thing to apply in and of themselves.  

So maybe the right thing to do is to push all of them right away, and
then if they all can go in safely w/o breaking filesystems, we can
actually do the deed about removing the iget() interface itself, and
if not, it can wait until 2.6.26.  It's not really that big of a deal
to actually nuke the interface itself, as long as we are gradually
reducing our usage of it  Or is there some patch which David is
trying to push that desperately depends on iget() going away?

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 9855] ext3 ACL corruption

2008-02-04 Thread Andreas Dilger
On Jan 31, 2008  22:44 +1030, Kevin Shanahan wrote:
 On Thu, 2008-01-31 at 03:05 -0700, Andreas Dilger wrote:
  ...  To get the interesting bits you need:
  
  debugfs: stat 95   # prints decoded inode, File ACL: is a block 
  number
  debugfs: imap 95   # prints inode block number, offset
  
  dd if=/dev/mapper/vg_main-lv_samba of=/tmp/inode.bin bs=4k count=1 
  skip={iblock}
  dd if=/dev/mapper/vg_main-lv_samba of=/tmp/inode.bin bs=4k count=1 
  skip={ACLblk}
 
 Ah, ok - learning fast. Lets see how I go this time:
 
 # debugfs /dev/mapper/vg_main-lv_samba
 debugfs:  stat 3342652
 
 Inode: 3342652   Type: regularMode:  0770   Flags: 0x0   Generation: 
 3684645243
 User: 0   Group: 10140   Size: 18432
 File ACL: 0Directory ACL: 0
 Links: 1   Blockcount: 40
 Fragment:  Address: 0Number: 0Size: 0
 ctime: 0x475be06e -- Sun Dec  9 23:02:46 2007
 atime: 0x475d4073 -- Tue Dec 11 00:04:43 2007
 mtime: 0x45d2686a -- Wed Feb 14 12:09:54 2007
 Size of extra inode fields: 4
 Extended attributes stored in inode body: 
= 01 00 00 00 01 00 07 00 04 00 05 00 08 00 05 00 d6 27 00 00 08 00 07 00 
 09 28 00 00 08 00 07 00 0a 28 00 00 10 00 07 00 20 00 00 00  (44)
   DOSATTRIB = 0x20 (4)
 BLOCKS:
 (0):6713397, (1):6713399, (2):6713395, (3):6713405, (4):6713396
 TOTAL: 5
 
 debugfs:  imap 3342652
 Inode 3342652 is part of block group 204
   located at block 6684693, offset 0x0b00

The hexdump of this data (od -Ax -tx4 -a) shows the EA is in good shape:

000b80 0004ea0200480200
   eot nul nul nul nul nul stx   j nul stx   H nul nul nul nul nul

inode.i_extra_isize=0x0004
ext3_xattr_ibody_header.h_magic=0xea02

[EA1 entry]
ext3_xattr_entry.e_name_len=0x00  (unused for POSIX_ACL_ACCESS)
ext3_xattr_entry.e_name_index=0x02 (EXT3_INDEX_POSIX_ACL_ACCESS)
ext3_xattr_entry.e_value_offs=0x0048 = 72
ext3_xattr_entry.e_value_block=0x  (unused)



000b90 002c00740109
 , nul nul nul nul nul nul nul  ht soh   t nul nul nul nul nul
[EA1 cont.]
ext3_xattr_entry.e_value_size=0x002c = 44
ext3_xattr_entry.e_hash=0x  (currently unused)

[EA2 entry]
ext3_xattr_entry.e_name_len=0x09
ext3_xattr_entry.e_name_index=0x01 (EXT3_INDEX_USER)
ext3_xattr_entry.e_value_offs=0x0074 = 116
ext3_xattr_entry.e_value_block=0x  (unused)

000ba0 000441534f4449525454
   eot nul nul nul nul nul nul nul   D   O   S   A   T   T   R   I
[EA2 cont.]
ext3_xattr_entry.e_value_size=0x002c = 44
ext3_xattr_entry.e_hash=0x  (currently unused)
ext3_xattr_entry.e_name=DOSATTRIB

000bb0 0042
 B nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
000bc0 
   nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul

000bd0 0001000700010005000400050008
   soh nul nul nul soh nul bel nul eot nul enq nul  bs nul enq nul
[EA1 data]
ext3_acl_header.a_version=0x0001
ext3_acl_entry_short[0].e_tag=0x0001
ext3_acl_entry_short[0].e_perm=0x0007
ext3_acl_entry_short[1].e_tag=0x0004
ext3_acl_entry_short[1].e_perm=0x0005
ext3_acl_entry_short[2].e_tag=0x0008
ext3_acl_entry_short[2].e_perm=0x0005

000be0 27d600070008280900070008
 V   ' nul nul  bs nul bel nul  ht   ( nul nul  bs nul bel nul
[EA1 data cont]
ext3_acl_entry_short[2].e_id=0x27d6
ext3_acl_entry[3].e_tag=0x0008
ext3_acl_entry[3].e_perm=0x0007
ext3_acl_entry[3].e_id=0x2809

ext3_acl_entry[5].e_tag=0x0008
ext3_acl_entry[5].e_perm=0x0007
ext3_acl_entry[5].e_id=0x280a

000bf0 280a00070010002030327830
nl   ( nul nul dle nul bel nul  sp nul nul nul   0   x   2   0
[EA1 data cont]
ext3_acl_entry[6].e_tag=0x0010
ext3_acl_entry[6].e_perm=0x0007
ext3_acl_entry[6].e_id=0x0020

[EA2 data]
0x20

 I'm assuming that File ACL: 0 means that there's no ACL block.

Right.

 e2fsck 1.40-WIP (14-Nov-2006)
 Pass 1: Checking inodes, blocks, and sizes
 (entry-e_value_offs + entry-e_value_size: 116, offs: 120)
 Extended attribute in inode 3342652 has a value offset (72) which is
 invalid
 Clear? no
 ...

Hmm, I wonder if e2fsck is calculating the wrong file offset or something?
The kernel appears to be taking the EA data offset from the end of
i_extra_isize and the ext3_xattr_ibody_header fields (so 0x88 + e_value_offs
from the start of the inode).

Conversely, debugfs isn't having any problem with this EA at all.

h, I think I see the problem, and this was fixed in newer e2fsck.
The EAs are stored out of order in the inode and older e2fsprogs
considered that an error.  That was fixed in the final 1.40 release:

Remove check in e2fsck which requires EA's in inodes to be sorted;
they don't need to be sorted, and e2fsck was previously wrongly
clearing unsorted EA's 

[patch 068/265] Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user

2008-02-04 Thread akpm
From: Christoph Lameter [EMAIL PROTECTED]

Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

Zeros two segments of the page. It takes the position where to
start and end the zeroing which avoids length calculations and
makes code clearer.

zero_user_segment(page, start, end)

Same for a single segment.

zero_user(page, start, length)

Length variant for the case where we know the length.

We remove the zero_user_page macro. Issues:

1. Its a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

   Having to treat this special case everywhere makes the
   code needlessly complex. The parameter for zeroing is always
   KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.

Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[EMAIL PROTECTED]: fix nfs and ntfs build]
[EMAIL PROTECTED]: fix ntfs build some more]
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
Cc: Steven French [EMAIL PROTECTED]
Cc: Michael Halcrow [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Cc: Steven Whitehouse [EMAIL PROTECTED]
Cc: Trond Myklebust [EMAIL PROTECTED]
Cc: J. Bruce Fields [EMAIL PROTECTED]
Cc: Anton Altaparmakov [EMAIL PROTECTED]
Cc: Mark Fasheh [EMAIL PROTECTED]
Cc: David Chinner [EMAIL PROTECTED]
Cc: Michael Halcrow [EMAIL PROTECTED]
Cc: Steven French [EMAIL PROTECTED]
Cc: Steven Whitehouse [EMAIL PROTECTED]
Cc: Trond Myklebust [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 fs/buffer.c|   44 ++--
 fs/cifs/inode.c|2 -
 fs/direct-io.c |4 +-
 fs/ecryptfs/mmap.c |5 +--
 fs/ext3/inode.c|4 +-
 fs/ext4/inode.c|4 +-
 fs/gfs2/bmap.c |2 -
 fs/gfs2/ops_address.c  |2 -
 fs/libfs.c |   11 ++--
 fs/mpage.c |7 +
 fs/nfs/read.c  |   10 +++
 fs/nfs/write.c |4 --
 fs/ntfs/aops.c |   20 --
 fs/ntfs/compress.c |2 -
 fs/ntfs/file.c |   32 ++-
 fs/ocfs2/alloc.c   |2 -
 fs/ocfs2/aops.c|6 ++--
 fs/reiserfs/inode.c|4 +-
 fs/xfs/linux-2.6/xfs_lrw.c |2 -
 include/linux/highmem.h|   48 +--
 mm/filemap_xip.c   |2 -
 mm/truncate.c  |2 -
 22 files changed, 103 insertions(+), 116 deletions(-)

diff -puN 
fs/buffer.c~pagecache-zeroing-zero_user_segment-zero_user_segments-and-zero_user
 fs/buffer.c
--- 
a/fs/buffer.c~pagecache-zeroing-zero_user_segment-zero_user_segments-and-zero_user
+++ a/fs/buffer.c
@@ -1798,7 +1798,7 @@ void page_zero_new_buffers(struct page *
start = max(from, block_start);
size = min(to, block_end) - start;
 
-   zero_user_page(page, start, size, 
KM_USER0);
+   zero_user(page, start, size);
set_buffer_uptodate(bh);
}
 
@@ -1861,19 +1861,10 @@ static int __block_prepare_write(struct 
mark_buffer_dirty(bh);
continue;
}
-   if (block_end  to || block_start  from) {
-   void *kaddr;
-
-   kaddr = kmap_atomic(page, KM_USER0);
-   if (block_end  to)
-   memset(kaddr+to, 0,
-   block_end-to);
-   if (block_start  from)
-   memset(kaddr+block_start,
-   0, from-block_start);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-   }
+   if (block_end  to || block_start  from)
+   zero_user_segments(page,
+   to, block_end,
+