Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Peter Zijlstra
On Fri, 2007-07-13 at 14:47 -0700, Zach Brown wrote:

> Peter, do you have any interest in seeing how far we can get
> at tracking lock_page()?  I'm not holding my breath, but any little bit
> would probably help.

I ran headfirst into the fact the unlock_page() need not be called by
the same task that did lock_page().

Esp IO-completion interrupts love to unlock pages they did not lock
themselves. Not at all sure that is fixable, it seems to be the nature
of the async structure of the problem :-(



-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Peter Zijlstra
On Fri, 2007-07-13 at 14:47 -0700, Zach Brown wrote:

> Peter, do you have any interest in seeing how far we can get
> at tracking lock_page()?  I'm not holding my breath, but any little bit
> would probably help.

Would this be a valid report? 

( /me goes hunt a x86_64 unwinder patch that will apply to this tree.
  These stacktraces are pain )

===
[ INFO: possible circular locking dependency detected ]
[ 2.6.22-rt3-dirty #34
---
mount/1296 is trying to acquire lock:
 (&ei->truncate_mutex){--..}, at: [] 
ext3_get_blocks_handle+0x1a4/0x8f7

but task is already holding lock:
 (lock_page_0){--..}, at: [] 
generic_file_buffered_write+0x1ee/0x646

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (lock_page_0){--..}:
   [] __lock_acquire+0xa72/0xc35
   [] lock_acquire+0x48/0x61
   [] add_to_page_cache_lru+0xe/0x23
   [] add_to_page_cache+0x1de/0x2c1
   [] add_to_page_cache_lru+0xe/0x23
   [] find_or_create_page+0x4c/0x73
   [] __getblk+0x118/0x23c
   [] __bread+0x6/0x9c
   [] read_block_bitmap+0x34/0x65
   [] ext3_free_blocks_sb+0xec/0x3d4
   [] ext3_free_blocks+0x2e/0x61
   [] ext3_free_data+0xaa/0xda
   [] ext3_truncate+0x4d2/0x84e
   [] pagevec_lookup+0x17/0x1e
   [] truncate_inode_pages_range+0x1f4/0x323
   [] add_preempt_count+0x14/0xe4
   [] journal_stop+0x1fe/0x21d
   [] vmtruncate+0xa2/0xc0
   [] inode_setattr+0x22/0x10a
   [] ext3_setattr+0x136/0x18f
   [] notify_change+0x10a/0x241
   [] notify_change+0x128/0x241
   [] do_truncate+0x56/0x7f
   [] do_truncate+0x61/0x7f
   [] get_write_access+0x3f/0x45
   [] may_open+0x193/0x1af
   [] open_namei+0x2cb/0x63e
   [] rt_up_read+0x53/0x5c
   [] do_page_fault+0x479/0x7cc
   [] do_filp_open+0x1c/0x38
   [] rt_spin_unlock+0x17/0x47
   [] get_unused_fd+0xf9/0x107
   [] do_sys_open+0x48/0xd5
   [] system_call+0x7e/0x83
   [] 0x

-> #0 (&ei->truncate_mutex){--..}:
   [] print_circular_bug_header+0xcc/0xd3
   [] __lock_acquire+0x96e/0xc35
   [] lock_acquire+0x48/0x61
   [] ext3_get_blocks_handle+0x1a4/0x8f7
   [] _mutex_lock+0x26/0x52
   [] ext3_get_blocks_handle+0x1a4/0x8f7
   [] find_usage_backwards+0xb0/0xd9
   [] find_usage_backwards+0xb0/0xd9
   [] debug_check_no_locks_freed+0x11d/0x129
   [] trace_hardirqs_on_caller+0x115/0x138
   [] lockdep_init_map+0xac/0x41f
   [] add_preempt_count+0x14/0xe4
   [] ext3_get_block+0xc2/0xe4
   [] __block_prepare_write+0x195/0x442
   [] ext3_get_block+0x0/0xe4
   [] block_prepare_write+0x1a/0x25
   [] ext3_prepare_write+0xb2/0x17b
   [] generic_file_buffered_write+0x298/0x646
   [] current_fs_time+0x3b/0x40
   [] add_preempt_count+0x14/0xe4
   [] __generic_file_aio_write_nolock+0x34f/0x3b9
   [] put_lock_stats+0xe/0x2a
   [] generic_file_aio_write+0x4c/0xc4
   [] generic_file_aio_write+0x61/0xc4
   [] ext3_orphan_del+0x53/0x19f
   [] ext3_file_write+0x1c/0x9d
   [] do_sync_write+0xcc/0x10f
   [] autoremove_wake_function+0x0/0x2e
   [] get_lock_stats+0xe/0x3f
   [] lock_release_holdtime+0x41/0x4f
   [] put_lock_stats+0xe/0x2a
   [] sys_fchmod+0xa3/0xbd
   [] _mutex_unlock+0x17/0x20
   [] vfs_write+0xb6/0x148
   [] sys_write+0x48/0x74
   [] system_call+0x7e/0x83
   [] 0x

other info that might help us debug this:

2 locks held by mount/1296:
 #0:  (&inode->i_mutex){--..}, at: [] 
generic_file_aio_write+0x4c/0xc4
 #1:  (lock_page_0){--..}, at: [] 
generic_file_buffered_write+0x1ee/0x646

stack backtrace:

Call Trace:
 [] print_circular_bug_tail+0x69/0x72
 [] print_circular_bug_header+0xcc/0xd3
 [] __lock_acquire+0x96e/0xc35
 [] lock_acquire+0x48/0x61
 [] ext3_get_blocks_handle+0x1a4/0x8f7
 [] _mutex_lock+0x26/0x52
 [] ext3_get_blocks_handle+0x1a4/0x8f7
 [] find_usage_backwards+0xb0/0xd9
 [] find_usage_backwards+0xb0/0xd9
 [] debug_check_no_locks_freed+0x11d/0x129
 [] trace_hardirqs_on_caller+0x115/0x138
 [] lockdep_init_map+0xac/0x41f
 [] add_preempt_count+0x14/0xe4
 [] ext3_get_block+0xc2/0xe4
 [] __block_prepare_write+0x195/0x442
 [] ext3_get_block+0x0/0xe4
 [] block_prepare_write+0x1a/0x25
 [] ext3_prepare_write+0xb2/0x17b
 [] generic_file_buffered_write+0x298/0x646
 [] current_fs_time+0x3b/0x40
 [] add_preempt_count+0x14/0xe4
 [] __generic_file_aio_write_nolock+0x34f/0x3b9
 [] put_lock_stats+0xe/0x2a
 [] generic_file_aio_write+0x4c/0xc4
 [] generic_file_aio_write+0x61/0xc4
 [] ext3_orphan_del+0x53/0x19f
 [] ext3_file_write+0x1c/0x9d
 [] do_sync_write+0xcc/0x10f
 [] autoremove_wake_function+0x0/0x2e
 [] get_lock_stats+0xe/0x3f
 [] lock_release_holdtime+0x41/0x4f
 [] put_lock_stats+0xe/0x2a
 [] sys_fchmod+0xa3/0xbd
 [] _mutex_unlock+0x17/0x20
 [] vfs_writ

Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Peter Zijlstra
On Sun, 2007-07-15 at 15:02 +0200, Peter Zijlstra wrote:
> On Fri, 2007-07-13 at 14:47 -0700, Zach Brown wrote:
> 
> > Peter, do you have any interest in seeing how far we can get
> > at tracking lock_page()?  I'm not holding my breath, but any little bit
> > would probably help.
> 
> Would this be a valid report? 

===
[ INFO: possible circular locking dependency detected ]
[ 2.6.22-rt3-dirty #35
---
mkdir/1662 is trying to acquire lock:
 (lock_page_0){--..}, at: [] add_to_page_cache_lru+0xe/0x23

but task is already holding lock:
 (jbd_handle){--..}, at: [] journal_start+0x108/0x12c

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (jbd_handle){--..}:
   [] __lock_acquire+0xa72/0xc35
   [] lock_acquire+0x48/0x61
   [] journal_start+0x108/0x12c
   [] journal_start+0x124/0x12c
   [] ext3_prepare_write+0x42/0x17b
   [] generic_file_buffered_write+0x298/0x646
   [] current_fs_time+0x3b/0x40
   [] add_preempt_count+0x14/0xe4
   [] __generic_file_aio_write_nolock+0x34f/0x3b9
   [] put_lock_stats+0xe/0x2a
   [] generic_file_aio_write+0x4c/0xc4
   [] generic_file_aio_write+0x61/0xc4
   [] ext3_orphan_del+0x53/0x19f
   [] ext3_file_write+0x1c/0x9d
   [] do_sync_write+0xcc/0x10f
   [] autoremove_wake_function+0x0/0x2e
   [] get_lock_stats+0xe/0x3f
   [] lock_release_holdtime+0x41/0x4f
   [] put_lock_stats+0xe/0x2a
   [] sys_fchmod+0xa3/0xbd
   [] _mutex_unlock+0x17/0x20
   [] vfs_write+0xb6/0x148
   [] sys_write+0x48/0x74
   [] system_call+0x7e/0x83
   [] 0x

-> #0 (lock_page_0){--..}:
   [] print_circular_bug_header+0xcc/0xd3
   [] __lock_acquire+0x96e/0xc35
   [] lock_acquire+0x48/0x61
   [] add_to_page_cache_lru+0xe/0x23
   [] add_to_page_cache+0x1de/0x2c1
   [] add_to_page_cache_lru+0xe/0x23
   [] find_or_create_page+0x4c/0x73
   [] __getblk+0x118/0x23c
   [] ext3_getblk+0xf2/0x23b
   [] journal_dirty_metadata+0x1a8/0x1b3
   [] __ext3_journal_dirty_metadata+0x1e/0x46
   [] ext3_mark_iloc_dirty+0x293/0x30a
   [] ext3_mark_inode_dirty+0x3f/0x48
   [] ext3_new_inode+0x8ff/0x943
   [] ext3_bread+0x11/0x84
   [] ext3_mkdir+0xdd/0x24f
   [] vfs_mkdir+0x6d/0xb5
   [] sys_mkdirat+0xa1/0xec
   [] trace_hardirqs_on_thunk+0x3a/0x3c
   [] trace_hardirqs_on_caller+0x115/0x138
   [] trace_hardirqs_on_thunk+0x3a/0x3c
   [] system_call+0x7e/0x83
   [] 0x

other info that might help us debug this:

2 locks held by mkdir/1662:
 #0:  (&inode->i_mutex/1){--..}, at: [] 
lookup_create+0x26/0x8b
 #1:  (jbd_handle){--..}, at: [] journal_start+0x108/0x12c

stack backtrace:

Call Trace:
 [] print_circular_bug_tail+0x69/0x72
 [] print_circular_bug_header+0xcc/0xd3
 [] __lock_acquire+0x96e/0xc35
 [] lock_acquire+0x48/0x61
 [] add_to_page_cache_lru+0xe/0x23
 [] add_to_page_cache+0x1de/0x2c1
 [] add_to_page_cache_lru+0xe/0x23
 [] find_or_create_page+0x4c/0x73
 [] __getblk+0x118/0x23c
 [] ext3_getblk+0xf2/0x23b
 [] journal_dirty_metadata+0x1a8/0x1b3
 [] __ext3_journal_dirty_metadata+0x1e/0x46
 [] ext3_mark_iloc_dirty+0x293/0x30a
 [] ext3_mark_inode_dirty+0x3f/0x48
 [] ext3_new_inode+0x8ff/0x943
 [] ext3_bread+0x11/0x84
 [] ext3_mkdir+0xdd/0x24f
 [] vfs_mkdir+0x6d/0xb5
 [] sys_mkdirat+0xa1/0xec
 [] trace_hardirqs_on_thunk+0x3a/0x3c
 [] trace_hardirqs_on_caller+0x115/0x138
 [] trace_hardirqs_on_thunk+0x3a/0x3c
 [] system_call+0x7e/0x83

INFO: lockdep is turned off.
---
| preempt count:  ]
| 0-level deep critical section nesting:




-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Andrew Morton
On Sun, 15 Jul 2007 15:02:23 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-07-13 at 14:47 -0700, Zach Brown wrote:
> 
> > Peter, do you have any interest in seeing how far we can get
> > at tracking lock_page()?  I'm not holding my breath, but any little bit
> > would probably help.
> 
> Would this be a valid report? 
> 
> ( /me goes hunt a x86_64 unwinder patch that will apply to this tree.
>   These stacktraces are pain )

They are.  lockdep reports are a pain too.  It's still a struggle to
understand wtf they're trying to tell you.  Mabe it's just me.

> ===
> [ INFO: possible circular locking dependency detected ]
> [ 2.6.22-rt3-dirty #34
> ---
> mount/1296 is trying to acquire lock:
>  (&ei->truncate_mutex){--..}, at: [] 
> ext3_get_blocks_handle+0x1a4/0x8f7
> 
> but task is already holding lock:
>  (lock_page_0){--..}, at: [] 
> generic_file_buffered_write+0x1ee/0x646
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (lock_page_0){--..}:
>[] __lock_acquire+0xa72/0xc35
>[] lock_acquire+0x48/0x61
>[] add_to_page_cache_lru+0xe/0x23
>[] add_to_page_cache+0x1de/0x2c1
>[] add_to_page_cache_lru+0xe/0x23
>[] find_or_create_page+0x4c/0x73
>[] __getblk+0x118/0x23c
>[] __bread+0x6/0x9c
>[] read_block_bitmap+0x34/0x65
>[] ext3_free_blocks_sb+0xec/0x3d4
>[] ext3_free_blocks+0x2e/0x61
>[] ext3_free_data+0xaa/0xda
>[] ext3_truncate+0x4d2/0x84e
>[] pagevec_lookup+0x17/0x1e
>[] truncate_inode_pages_range+0x1f4/0x323
>[] add_preempt_count+0x14/0xe4
>[] journal_stop+0x1fe/0x21d
>[] vmtruncate+0xa2/0xc0
>[] inode_setattr+0x22/0x10a
>[] ext3_setattr+0x136/0x18f
>[] notify_change+0x10a/0x241
>[] notify_change+0x128/0x241
>[] do_truncate+0x56/0x7f
>[] do_truncate+0x61/0x7f
>[] get_write_access+0x3f/0x45
>[] may_open+0x193/0x1af
>[] open_namei+0x2cb/0x63e
>[] rt_up_read+0x53/0x5c
>[] do_page_fault+0x479/0x7cc
>[] do_filp_open+0x1c/0x38
>[] rt_spin_unlock+0x17/0x47
>[] get_unused_fd+0xf9/0x107
>[] do_sys_open+0x48/0xd5
>[] system_call+0x7e/0x83
>[] 0x

I guess we're doing lock_page() against a blockdev pagecache page here
while holding truncate_mutex against some S_ISREG file.

> -> #0 (&ei->truncate_mutex){--..}:
>[] print_circular_bug_header+0xcc/0xd3
>[] __lock_acquire+0x96e/0xc35
>[] lock_acquire+0x48/0x61
>[] ext3_get_blocks_handle+0x1a4/0x8f7
>[] _mutex_lock+0x26/0x52
>[] ext3_get_blocks_handle+0x1a4/0x8f7
>[] find_usage_backwards+0xb0/0xd9
>[] find_usage_backwards+0xb0/0xd9
>[] debug_check_no_locks_freed+0x11d/0x129
>[] trace_hardirqs_on_caller+0x115/0x138
>[] lockdep_init_map+0xac/0x41f
>[] add_preempt_count+0x14/0xe4
>[] ext3_get_block+0xc2/0xe4
>[] __block_prepare_write+0x195/0x442
>[] ext3_get_block+0x0/0xe4
>[] block_prepare_write+0x1a/0x25
>[] ext3_prepare_write+0xb2/0x17b
>[] generic_file_buffered_write+0x298/0x646
>[] current_fs_time+0x3b/0x40
>[] add_preempt_count+0x14/0xe4
>[] __generic_file_aio_write_nolock+0x34f/0x3b9
>[] put_lock_stats+0xe/0x2a
>[] generic_file_aio_write+0x4c/0xc4
>[] generic_file_aio_write+0x61/0xc4
>[] ext3_orphan_del+0x53/0x19f
>[] ext3_file_write+0x1c/0x9d
>[] do_sync_write+0xcc/0x10f
>[] autoremove_wake_function+0x0/0x2e
>[] get_lock_stats+0xe/0x3f
>[] lock_release_holdtime+0x41/0x4f
>[] put_lock_stats+0xe/0x2a
>[] sys_fchmod+0xa3/0xbd
>[] _mutex_unlock+0x17/0x20
>[] vfs_write+0xb6/0x148
>[] sys_write+0x48/0x74
>[] system_call+0x7e/0x83
>[] 0x

Here we're taking a file's truncate_mutex while holding lock_page() against
one of its pages.  This is the correct lock ranking, I suppose.

This is one of those fairly common cross-inode notabugs, I suspect.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Peter Zijlstra
On Sun, 2007-07-15 at 11:11 -0700, Andrew Morton wrote:
> On Sun, 15 Jul 2007 15:02:23 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, 2007-07-13 at 14:47 -0700, Zach Brown wrote:
> > 
> > > Peter, do you have any interest in seeing how far we can get
> > > at tracking lock_page()?  I'm not holding my breath, but any little bit
> > > would probably help.
> > 
> > Would this be a valid report? 
> > 
> > ( /me goes hunt a x86_64 unwinder patch that will apply to this tree.
> >   These stacktraces are pain )
> 
> They are.  lockdep reports are a pain too.  It's still a struggle to
> understand wtf they're trying to tell you.  Mabe it's just me.

It got you confused alright,..

> > ===
> > [ INFO: possible circular locking dependency detected ]
> > [ 2.6.22-rt3-dirty #34
> > ---
> > mount/1296 is trying to acquire lock:
> >  (&ei->truncate_mutex){--..}, at: [] 
> > ext3_get_blocks_handle+0x1a4/0x8f7
> > 
> > but task is already holding lock:
> >  (lock_page_0){--..}, at: [] 
> > generic_file_buffered_write+0x1ee/0x646
> >
> > which lock already depends on the new lock.
> >

So, the offence is trying to acquire &ei->truncate_mutex while already
holding lock_page_0.

These traces show how the previous (reverse) dependancy came into being

> > 
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #1 (lock_page_0){--..}:
> >[] __lock_acquire+0xa72/0xc35
> >[] lock_acquire+0x48/0x61
> >[] add_to_page_cache_lru+0xe/0x23
> >[] add_to_page_cache+0x1de/0x2c1
> >[] add_to_page_cache_lru+0xe/0x23
> >[] find_or_create_page+0x4c/0x73
> >[] __getblk+0x118/0x23c
> >[] __bread+0x6/0x9c
> >[] read_block_bitmap+0x34/0x65
> >[] ext3_free_blocks_sb+0xec/0x3d4
> >[] ext3_free_blocks+0x2e/0x61
> >[] ext3_free_data+0xaa/0xda
> >[] ext3_truncate+0x4d2/0x84e
> >[] pagevec_lookup+0x17/0x1e
> >[] truncate_inode_pages_range+0x1f4/0x323
> >[] add_preempt_count+0x14/0xe4
> >[] journal_stop+0x1fe/0x21d
> >[] vmtruncate+0xa2/0xc0
> >[] inode_setattr+0x22/0x10a
> >[] ext3_setattr+0x136/0x18f
> >[] notify_change+0x10a/0x241
> >[] notify_change+0x128/0x241
> >[] do_truncate+0x56/0x7f
> >[] do_truncate+0x61/0x7f
> >[] get_write_access+0x3f/0x45
> >[] may_open+0x193/0x1af
> >[] open_namei+0x2cb/0x63e
> >[] rt_up_read+0x53/0x5c
> >[] do_page_fault+0x479/0x7cc
> >[] do_filp_open+0x1c/0x38
> >[] rt_spin_unlock+0x17/0x47
> >[] get_unused_fd+0xf9/0x107
> >[] do_sys_open+0x48/0xd5
> >[] system_call+0x7e/0x83
> >[] 0x
> 
> I guess we're doing lock_page() against a blockdev pagecache page here
> while holding truncate_mutex against some S_ISREG file.

So this trace ( -> #1 ) shows how lock_page_0 became to depend on
&ei->truncate_mutex ( -> #0 ).

> > -> #0 (&ei->truncate_mutex){--..}:
> >[] print_circular_bug_header+0xcc/0xd3
> >[] __lock_acquire+0x96e/0xc35
> >[] lock_acquire+0x48/0x61
> >[] ext3_get_blocks_handle+0x1a4/0x8f7
> >[] _mutex_lock+0x26/0x52
> >[] ext3_get_blocks_handle+0x1a4/0x8f7
> >[] find_usage_backwards+0xb0/0xd9
> >[] find_usage_backwards+0xb0/0xd9
> >[] debug_check_no_locks_freed+0x11d/0x129
> >[] trace_hardirqs_on_caller+0x115/0x138
> >[] lockdep_init_map+0xac/0x41f
> >[] add_preempt_count+0x14/0xe4
> >[] ext3_get_block+0xc2/0xe4
> >[] __block_prepare_write+0x195/0x442
> >[] ext3_get_block+0x0/0xe4
> >[] block_prepare_write+0x1a/0x25
> >[] ext3_prepare_write+0xb2/0x17b
> >[] generic_file_buffered_write+0x298/0x646
> >[] current_fs_time+0x3b/0x40
> >[] add_preempt_count+0x14/0xe4
> >[] __generic_file_aio_write_nolock+0x34f/0x3b9
> >[] put_lock_stats+0xe/0x2a
> >[] generic_file_aio_write+0x4c/0xc4
> >[] generic_file_aio_write+0x61/0xc4
> >[] ext3_orphan_del+0x53/0x19f
> >[] ext3_file_write+0x1c/0x9d
> >[] do_sync_write+0xcc/0x10f
> >[] autoremove_wake_function+0x0/0x2e
> >[] get_lock_stats+0xe/0x3f
> >[] lock_release_holdtime+0x41/0x4f
> >[] put_lock_stats+0xe/0x2a
> >[] sys_fchmod+0xa3/0xbd
> >[] _mutex_unlock+0x17/0x20
> >[] vfs_write+0xb6/0x148
> >[] sys_write+0x48/0x74
> >[] system_call+0x7e/0x83
> >[] 0x
> 
> Here we're taking a file's truncate_mutex while holding lock_page() against
> one of its pages.  This is the correct lock ranking, I suppose.
> 
> This is one of those fairly common cross-inode notabugs, I suspect.

So #0 and #1 are compatible, and build the dependancy:

  &ei->truncate_mutex
   

Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Andrew Morton
On Sun, 15 Jul 2007 21:21:03 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> Shows the current stacktrace where we violate the previously established
> locking order.

yup, but the lock_page() which we did inside truncate_mutex was a 
lock_page() against a different address_space: the blockdev mapping.

So this is OK - we'll never take truncate_mutex against the blockdev
mapping (it doesn't have one, for a start ;))

This is similar to the quite common case where we take inode A's
i_mutex inside inode B's i_mutex, which needs special lockdep annotations.

I think.  I haven't looked into this in detail.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Peter Zijlstra
On Sun, 2007-07-15 at 12:59 -0700, Andrew Morton wrote:
> On Sun, 15 Jul 2007 21:21:03 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> 
> > Shows the current stacktrace where we violate the previously established
> > locking order.
> 
> yup, but the lock_page() which we did inside truncate_mutex was a 
> lock_page() against a different address_space: the blockdev mapping.
> 
> So this is OK - we'll never take truncate_mutex against the blockdev
> mapping (it doesn't have one, for a start ;))
> 
> This is similar to the quite common case where we take inode A's
> i_mutex inside inode B's i_mutex, which needs special lockdep annotations.
> 
> I think.  I haven't looked into this in detail.

Right, I can make lock_page classes per address space. Lets see if this
one goes away.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6][TAKE7] manpage for fallocate

2007-07-15 Thread Amit K. Arora
On Sat, Jul 14, 2007 at 10:23:42AM +0200, Michael Kerrisk wrote:
> [CC += [EMAIL PROTECTED]
> 
> Amit,
 
Hi Michael,

> Thanks for this page.  I will endeavour to review it in 
> the coming days.  In the meantime, the better address to CC
> me on fot man pages stuff is [EMAIL PROTECTED]

Sure.

BTW, this man page has changed a bit and the one in TAKE8 of fallocate
patches is the latest one. You are copied on that too.
I will forward that mail to "[EMAIL PROTECTED]" id also, so that you
do not miss it. Thanks!

--
Regards,
Amit Arora

> 
> Cheers,
> 
> Michael
> 
> > Following is the modified version of the manpage originally submitted by
> > David Chinner. Please use `nroff -man fallocate.2 | less` to view.
> > 
> > This includes changes suggested by Heikki Orsila and Barry Naujok.
> > 
> > 
> > .TH fallocate 2
> > .SH NAME
> > fallocate \- allocate or remove file space
> > .SH SYNOPSIS
> > .nf
> > .B #include 
> > .PP
> > .BI "long fallocate(int " fd ", int " mode ", loff_t " offset ", loff_t "
> > len);
> > .SH DESCRIPTION
> > The
> > .B fallocate
> > syscall allows a user to directly manipulate the allocated disk space
> > for the file referred to by
> > .I fd
> > for the byte range starting at
> > .I offset
> > and continuing for
> > .I len
> > bytes.
> > The
> > .I mode
> > parameter determines the operation to be performed on the given range.
> > Currently there are two modes:
> > .TP
> > .B FALLOC_ALLOCATE
> > allocates and initialises to zero the disk space within the given range.
> > After a successful call, subsequent writes are guaranteed not to fail
> > because
> > of lack of disk space.  If the size of the file is less than
> > .IR offset + len ,
> > then the file is increased to this size; otherwise the file size is left
> > unchanged.
> > .B FALLOC_ALLOCATE
> > closely resembles
> > .BR posix_fallocate (3)
> > and is intended as a method of optimally implementing this function.
> > .B FALLOC_ALLOCATE
> > may allocate a larger range than that was specified.
> > .TP
> > .B FALLOC_RESV_SPACE
> > provides the same functionality as
> > .B FALLOC_ALLOCATE
> > except it does not ever change the file size. This allows allocation
> > of zero blocks beyond the end of file and is useful for optimising
> > append workloads.
> > .SH RETURN VALUE
> > .B fallocate
> > returns zero on success, or an error number on failure.
> > Note that
> > .I errno
> > is not set.
> > .SH ERRORS
> > .TP
> > .B EBADF
> > .I fd
> > is not a valid file descriptor, or is not opened for writing.
> > .TP
> > .B EFBIG
> > .IR offset + len
> > exceeds the maximum file size.
> > .TP
> > .B EINVAL
> > .I offset
> > was less than 0, or
> > .I len
> > was less than or equal to 0.
> > .TP
> > .B ENODEV
> > .I fd
> > does not refer to a regular file or a directory.
> > .TP
> > .B ENOSPC
> > There is not enough space left on the device containing the file
> > referred to by
> > .IR fd .
> > .TP
> > .B ESPIPE
> > .I fd
> > refers to a pipe of file descriptor.
> > .TP
> > .B ENOSYS
> > The filesystem underlying the file descriptor does not support this
> > operation.
> > .TP
> > .B EINTR
> > A signal was caught during execution
> > .TP
> > .B EIO
> > An I/O error occurred while reading from or writing to a file system.
> > .TP
> > .B EOPNOTSUPP
> > The mode is not supported on the file descriptor.
> > .SH AVAILABILITY
> > The
> > .B fallocate
> > system call is available since 2.6.XX
> > .SH SEE ALSO
> > .BR syscall (2),
> > .BR posix_fadvise (3),
> > .BR ftruncate (3).
> 
> -- 
> Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten 
> Browser-Versionen downloaden: http://www.gmx.net/de/go/browser
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ia64 fallocate system call

2007-07-15 Thread David Chinner
sys_fallocate for ia64. This uses the empty slot originally
reserved for move_pages.

Signed-Off-By: Dave Chinner <[EMAIL PROTECTED]>

---
 arch/ia64/kernel/entry.S  |2 +-
 include/asm-ia64/unistd.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: 2.6.x-xfs-new/arch/ia64/kernel/entry.S
===
--- 2.6.x-xfs-new.orig/arch/ia64/kernel/entry.S 2007-07-16 14:18:51.432168485 
+1000
+++ 2.6.x-xfs-new/arch/ia64/kernel/entry.S  2007-07-16 14:22:08.582454284 
+1000
@@ -1581,7 +1581,7 @@ sys_call_table:
data8 sys_sync_file_range   // 1300
data8 sys_tee
data8 sys_vmsplice
-   data8 sys_ni_syscall// reserved for move_pages
+   data8 sys_fallocate
data8 sys_getcpu
data8 sys_epoll_pwait   // 1305
data8 sys_utimensat
Index: 2.6.x-xfs-new/include/asm-ia64/unistd.h
===
--- 2.6.x-xfs-new.orig/include/asm-ia64/unistd.h2007-06-08 
21:36:31.0 +1000
+++ 2.6.x-xfs-new/include/asm-ia64/unistd.h 2007-07-16 14:22:41.166204402 
+1000
@@ -292,7 +292,7 @@
 #define __NR_sync_file_range   1300
 #define __NR_tee   1301
 #define __NR_vmsplice  1302
-/* 1303 reserved for move_pages */
+#define __NR_fallocate 1303
 #define __NR_getcpu1304
 #define __NR_epoll_pwait   1305
 #define __NR_utimensat 1306
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] xfs: implement fallocate V2

2007-07-15 Thread David Chinner
Initial implementation of ->fallocate for XFS.

Version 2:

o Make allocation and setting the file size atomic.
o Drop deallocate/punch functionality
o use mode field appropriately to determine if size needs changing.

---
 fs/xfs/linux-2.6/xfs_iops.c |   47 
 1 file changed, 47 insertions(+)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_iops.c
===
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_iops.c  2007-07-16 
14:16:02.090255611 +1000
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_iops.c   2007-07-16 14:50:07.087885337 
+1000
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Get a XFS inode from a given vnode.
@@ -812,6 +813,51 @@ xfs_vn_removexattr(
return namesp->attr_remove(vp, attr, xflags);
 }
 
+/*
+ * generic space allocation vector.
+ *
+ * This should really through a bhv_vop before stuffing around
+ * with xfs_inodes and such.
+ */
+STATIC long
+xfs_vn_fallocate(
+   struct inode*inode,
+   int mode,
+   loff_t  offset,
+   loff_t  len)
+{
+   longerror = -EOPNOTSUPP;
+   bhv_vnode_t *vp = vn_from_inode(inode);
+   bhv_desc_t  *bdp;
+   loff_t  new_size = 0;
+   xfs_flock64_t   bf;
+
+   bf.l_whence = 0;
+   bf.l_start = offset;
+   bf.l_len = len;
+
+   bdp = bhv_lookup_range(VN_BHV_HEAD(vp), VNODE_POSITION_XFS,
+   VNODE_POSITION_XFS);
+
+   xfs_ilock(xfs_vtoi(vp), XFS_IOLOCK_EXCL);
+   error = xfs_change_file_space(bdp, XFS_IOC_RESVSP, &bf, 0, NULL,
+   ATTR_NOLOCK);
+   if (!error && !(mode & FALLOC_FL_KEEP_SIZE) &&
+   offset + len > i_size_read(inode))
+   new_size = offset + len;
+
+   /* Change file size if needed */
+   if (new_size) {
+   bhv_vattr_t va;
+
+   va.va_mask = XFS_AT_SIZE;
+   va.va_size = new_size;
+   error = bhv_vop_setattr(vp, &va, ATTR_NOLOCK, NULL);
+   }
+
+   xfs_iunlock(xfs_vtoi(vp), XFS_IOLOCK_EXCL);
+   return error;
+}
 
 const struct inode_operations xfs_inode_operations = {
.permission = xfs_vn_permission,
@@ -822,6 +868,7 @@ const struct inode_operations xfs_inode_
.getxattr   = xfs_vn_getxattr,
.listxattr  = xfs_vn_listxattr,
.removexattr= xfs_vn_removexattr,
+   .fallocate  = xfs_vn_fallocate,
 };
 
 const struct inode_operations xfs_dir_inode_operations = {
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] introduce fallocate support into xfs_io

2007-07-15 Thread David Chinner
FYI.

Initial support for fallocate-based pre-allocation in
xfs_io for testing. This currently only works on ia64 because
of the hard coded syscall number and will require autoconf
magic to conditionally compile in this support.

This allows simple command-line based testing of fallocate
based allocation such as:

# ~/xfs_io -f -c "falloc_resvsp 0 1024k" -c "bmap -vp" -c stat /mnt/scratch/fred
/mnt/scratch/fred:
 EXT: FILE-OFFSET  BLOCK-RANGE  AG AG-OFFSETTOTAL FLAGS
   0: [0..2047]:   96..2143  0 (96..2143)2048 1
fd.path = "/mnt/scratch/fred"
fd.flags = non-sync,non-direct,read-write
stat.ino = 131
stat.type = regular file
stat.size = 0
stat.blocks = 2048
fsxattr.xflags = 0x2 [-p]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.nextents = 1
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136

Or more complex cases:

# ~/xfs_io -f \
> -c "falloc_allocsp 0 1024k" \
> -c "unresvsp 32k 32k" \
> -c "unresvsp 128k 64k" \
> -c "unresvsp 512k 256k" \
> -c"pwrite 0 16k" \
> -c "pwrite 96k 128k" \
> -c "pwrite 640k 384k" \
> -c "bmap -vp" \
> -c "falloc_resvsp 0 1024k" \
> -c "bmap -vvp" /mnt/scratch/fred
wrote 16384/16384 bytes at offset 0
16 KiB, 4 ops; 0. sec (274.123 MiB/sec and 70175.4386 ops/sec)
wrote 131072/131072 bytes at offset 98304
128 KiB, 32 ops; 0. sec (338.753 MiB/sec and 86720.8672 ops/sec)
wrote 393216/393216 bytes at offset 655360
384 KiB, 96 ops; 0. sec (386.200 MiB/sec and 98867.1473 ops/sec)
/mnt/scratch/fred:
 EXT: FILE-OFFSET  BLOCK-RANGE  AG AG-OFFSETTOTAL FLAGS
   0: [0..31]: 96..127   0 (96..127)   32
   1: [32..63]:128..159  0 (128..159)  32 1
   2: [64..127]:   hole64
   3: [128..191]:  224..287  0 (224..287)  64 1
   4: [192..447]:  288..543  0 (288..543) 256
   5: [448..1023]: 544..1119 0 (544..1119)576 1
   6: [1024..1279]:hole   256
   7: [1280..2047]:1376..21430 (1376..2143)   768
/mnt/scratch/fred:
 EXT: FILE-OFFSET  BLOCK-RANGE  AG AG-OFFSETTOTAL FLAGS
   0: [0..31]: 96..127   0 (96..127)   32
   1: [32..191]:   128..287  0 (128..287) 160 1
   2: [192..447]:  288..543  0 (288..543) 256
   3: [448..1279]: 544..1375 0 (544..1375)832 1
   4: [1280..2047]:1376..21430 (1376..2143)   768
 FLAG Values:
01 Unwritten preallocated extent
001000 Doesn't begin on stripe unit
000100 Doesn't end   on stripe unit
10 Doesn't begin on stripe width
01 Doesn't end   on stripe width

Yes, that looks like it filled all the holes properly, and the allocator
allocated the right holes on disk to merge adjacent extents when hole
filling. ;)

---
 xfsprogs/io/prealloc.c |   72 +
 1 file changed, 72 insertions(+)

Index: xfs-cmds/xfsprogs/io/prealloc.c
===
--- xfs-cmds.orig/xfsprogs/io/prealloc.c2006-11-15 19:00:31.0 
+1100
+++ xfs-cmds/xfsprogs/io/prealloc.c 2007-07-16 15:25:44.041513574 +1000
@@ -26,6 +26,8 @@ static cmdinfo_t allocsp_cmd;
 static cmdinfo_t freesp_cmd;
 static cmdinfo_t resvsp_cmd;
 static cmdinfo_t unresvsp_cmd;
+static cmdinfo_t falloc_allocsp_cmd;
+static cmdinfo_t falloc_resvsp_cmd;
 
 static int
 offset_length(
@@ -119,6 +121,56 @@ unresvsp_f(
return 0;
 }
 
+/*
+ * Hack, hack, hackety-hack-hack.
+ *
+ * This only works for ia64...
+ */
+#define __NR_fallocate1303
+
+/*
+ * someday there'll be a real header file
+ */
+#define FALLOC_FL_KEEP_SIZE 0x01
+#define FALLOC_ALLOCATE 0x0
+#define FALLOC_RESV_SPACE   FALLOC_FL_KEEP_SIZE
+
+static int
+fallocate_allocsp_f(
+   int argc,
+   char**argv)
+{
+   xfs_flock64_t   segment;
+
+   if (!offset_length(argv[1], argv[2], &segment))
+   return 0;
+
+   if (syscall(__NR_fallocate, file->fd, FALLOC_ALLOCATE,
+   segment.l_start, segment.l_len)) {
+   perror("FALLOC_ALLOCATE");
+   return 0;
+   }
+   return 0;
+}
+
+static int
+fallocate_resvsp_f(
+   int argc,
+   char**argv)
+{
+   xfs_flock64_t   segment;
+
+   if (!offset_length(argv[1], argv[2], &segment))
+   return 0;
+
+   if (syscall(__NR_fallocate, file->fd, FALLOC_RESV_SPACE,
+   segment.l_start, segment.l_len)) {
+   perror("FALLOC_ALLOCATE");
+   return 0;
+   }
+   return 0;
+}
+
 void
 prealloc_init(void)
 {
@@ -156,8 +208,28 @@ prealloc_init(void)
unresvsp_cmd.oneline =
_("frees reserved space

Re: Initial results of FLEX_BG feature.

2007-07-15 Thread Andreas Dilger
On Jul 12, 2007  10:09 -0500, Jose R. Santos wrote:
> @@ -1271,6 +1271,9 @@ static int ext4_check_descriptors (struc
>  
>   ext4_debug ("Checking group descriptors");
>  
> + if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
> + return 1;
> +
>   for (i = 0; i < sbi->s_groups_count; i++)
>   {
>   if (i == sbi->s_groups_count - 1)

It looks pretty straight forward to just change this code to leave
first_block at s_first_data_block, and leave last_block at ext4_blocks_count()
if FLEX_BG is set.

Even with FLEX_BG we want to keep the group metadata within the bounds of
the filesystem.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 003/268] jbd2 commit: fix transaction dropping

2007-07-15 Thread akpm
From: Jan Kara <[EMAIL PROTECTED]>

We have to check that also the second checkpoint list is non-empty before
dropping the transaction.

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>
Cc: Chuck Ebbert <[EMAIL PROTECTED]>
Cc: Kirill Korotaev <[EMAIL PROTECTED]>
Cc: 
Cc: <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/jbd2/commit.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -puN fs/jbd2/commit.c~jbd2-commit-fix-transaction-dropping fs/jbd2/commit.c
--- a/fs/jbd2/commit.c~jbd2-commit-fix-transaction-dropping
+++ a/fs/jbd2/commit.c
@@ -896,7 +896,8 @@ restart_loop:
journal->j_committing_transaction = NULL;
spin_unlock(&journal->j_state_lock);
 
-   if (commit_transaction->t_checkpoint_list == NULL) {
+   if (commit_transaction->t_checkpoint_list == NULL &&
+   commit_transaction->t_checkpoint_io_list == NULL) {
__jbd2_journal_drop_transaction(journal, commit_transaction);
} else {
if (journal->j_checkpoint_transactions == NULL) {
_
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 002/268] jbd commit: fix transaction dropping

2007-07-15 Thread akpm
From: Jan Kara <[EMAIL PROTECTED]>

We have to check that also the second checkpoint list is non-empty before
dropping the transaction.

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>
Cc: Chuck Ebbert <[EMAIL PROTECTED]>
Cc: Kirill Korotaev <[EMAIL PROTECTED]>
Cc: 
Cc: <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/jbd/commit.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -puN fs/jbd/commit.c~jbd-commit-fix-transaction-dropping fs/jbd/commit.c
--- a/fs/jbd/commit.c~jbd-commit-fix-transaction-dropping
+++ a/fs/jbd/commit.c
@@ -887,7 +887,8 @@ restart_loop:
journal->j_committing_transaction = NULL;
spin_unlock(&journal->j_state_lock);
 
-   if (commit_transaction->t_checkpoint_list == NULL) {
+   if (commit_transaction->t_checkpoint_list == NULL &&
+   commit_transaction->t_checkpoint_io_list == NULL) {
__journal_drop_transaction(journal, commit_transaction);
} else {
if (journal->j_checkpoint_transactions == NULL) {
_
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 243/268] ext4: fix error handling in ext4_create_journal

2007-07-15 Thread akpm
From: Borislav Petkov <[EMAIL PROTECTED]>

Fix error handling in ext4_create_journal according to kernel conventions.

Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
Cc: 
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/ext4/super.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff -puN fs/ext4/super.c~ext4-fix-error-handling-in-ext4_create_journal 
fs/ext4/super.c
--- a/fs/ext4/super.c~ext4-fix-error-handling-in-ext4_create_journal
+++ a/fs/ext4/super.c
@@ -2158,6 +2158,7 @@ static int ext4_create_journal(struct su
   unsigned int journal_inum)
 {
journal_t *journal;
+   int err;
 
if (sb->s_flags & MS_RDONLY) {
printk(KERN_ERR "EXT4-fs: readonly filesystem when trying to "
@@ -2165,13 +2166,15 @@ static int ext4_create_journal(struct su
return -EROFS;
}
 
-   if (!(journal = ext4_get_journal(sb, journal_inum)))
+   journal = ext4_get_journal(sb, journal_inum);
+   if (!journal)
return -EINVAL;
 
printk(KERN_INFO "EXT4-fs: creating new journal on inode %u\n",
   journal_inum);
 
-   if (jbd2_journal_create(journal)) {
+   err = jbd2_journal_create(journal);
+   if (err) {
printk(KERN_ERR "EXT4-fs: error creating journal.\n");
jbd2_journal_destroy(journal);
return -EIO;
_
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 203/268] ext3: remove extra IS_RDONLY() check

2007-07-15 Thread akpm
From: Dave Hansen <[EMAIL PROTECTED]>

ext3_change_inode_journal_flag() is only called from one location:
ext3_ioctl(EXT3_IOC_SETFLAGS).  That ioctl case already has a IS_RDONLY()
call in it so this one is superfluous.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
Cc: 
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/ext3/inode.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN fs/ext3/inode.c~ext3-remove-extra-is_rdonly-check fs/ext3/inode.c
--- a/fs/ext3/inode.c~ext3-remove-extra-is_rdonly-check
+++ a/fs/ext3/inode.c
@@ -3195,7 +3195,7 @@ int ext3_change_inode_journal_flag(struc
 */
 
journal = EXT3_JOURNAL(inode);
-   if (is_journal_aborted(journal) || IS_RDONLY(inode))
+   if (is_journal_aborted(journal))
return -EROFS;
 
journal_lock_updates(journal);
_
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 195/268] ext3: fix deadlock in ext3_remount() and orphan list handling

2007-07-15 Thread akpm
From: Jan Kara <[EMAIL PROTECTED]>

ext3_orphan_add() and ext3_orphan_del() functions lock sb->s_lock with a
transaction started with ext3_mark_recovery_complete() waits for a transaction
holding sb->s_lock, thus leading to a possible deadlock.  At the moment we
call ext3_mark_recovery_complete() from ext3_remount() we have done all the
work needed for remounting and thus we are safe to drop sb->s_lock before we
wait for transactions to commit.  Note that at this moment we are still
guarded by s_umount lock against other remounts/umounts.

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>
Cc: Eric Sandeen <[EMAIL PROTECTED]>
Cc: 
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/ext3/super.c |8 
 1 file changed, 8 insertions(+)

diff -puN 
fs/ext3/super.c~ext3-fix-deadlock-in-ext3_remount-and-orphan-list-handling 
fs/ext3/super.c
--- a/fs/ext3/super.c~ext3-fix-deadlock-in-ext3_remount-and-orphan-list-handling
+++ a/fs/ext3/super.c
@@ -2147,12 +2147,14 @@ static void ext3_mark_recovery_complete(
 
journal_lock_updates(journal);
journal_flush(journal);
+   lock_super(sb);
if (EXT3_HAS_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER) &&
sb->s_flags & MS_RDONLY) {
EXT3_CLEAR_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER);
sb->s_dirt = 0;
ext3_commit_super(sb, es, 1);
}
+   unlock_super(sb);
journal_unlock_updates(journal);
 }
 
@@ -2341,7 +2343,13 @@ static int ext3_remount (struct super_bl
(sbi->s_mount_state & EXT3_VALID_FS))
es->s_state = cpu_to_le16(sbi->s_mount_state);
 
+   /*
+* We have to unlock super so that we can wait for
+* transactions.
+*/
+   unlock_super(sb);
ext3_mark_recovery_complete(sb, es);
+   lock_super(sb);
} else {
__le32 ret;
if ((ret = EXT3_HAS_RO_COMPAT_FEATURE(sb,
_
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 196/268] ext4: fix deadlock in ext4_remount() and orphan list handling

2007-07-15 Thread akpm
From: Jan Kara <[EMAIL PROTECTED]>

ext4_orphan_add() and ext4_orphan_del() functions lock sb->s_lock with a
transaction started with ext4_mark_recovery_complete() waits for a transaction
holding sb->s_lock, thus leading to a possible deadlock.  At the moment we
call ext4_mark_recovery_complete() from ext4_remount() we have done all the
work needed for remounting and thus we are safe to drop sb->s_lock before we
wait for transactions to commit.  Note that at this moment we are still
guarded by s_umount lock against other remounts/umounts.

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>
Cc: Eric Sandeen <[EMAIL PROTECTED]>
Cc: 
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/ext4/super.c |8 
 1 file changed, 8 insertions(+)

diff -puN 
fs/ext4/super.c~ext4-fix-deadlock-in-ext4_remount-and-orphan-list-handling 
fs/ext4/super.c
--- a/fs/ext4/super.c~ext4-fix-deadlock-in-ext4_remount-and-orphan-list-handling
+++ a/fs/ext4/super.c
@@ -,12 +,14 @@ static void ext4_mark_recovery_complete(
 
jbd2_journal_lock_updates(journal);
jbd2_journal_flush(journal);
+   lock_super(sb);
if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER) &&
sb->s_flags & MS_RDONLY) {
EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
sb->s_dirt = 0;
ext4_commit_super(sb, es, 1);
}
+   unlock_super(sb);
jbd2_journal_unlock_updates(journal);
 }
 
@@ -2416,7 +2418,13 @@ static int ext4_remount (struct super_bl
(sbi->s_mount_state & EXT4_VALID_FS))
es->s_state = cpu_to_le16(sbi->s_mount_state);
 
+   /*
+* We have to unlock super so that we can wait for
+* transactions.
+*/
+   unlock_super(sb);
ext4_mark_recovery_complete(sb, es);
+   lock_super(sb);
} else {
__le32 ret;
if ((ret = EXT4_HAS_RO_COMPAT_FEATURE(sb,
_
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html