Re: [PATCH] isofs: mounting to regular file may succeed

2007-07-15 Thread Kirill Kuvaldin
On Sat, Jul 14, 2007 at 09:16:51PM +0200, Jan Engelhardt wrote:
 
 On Jul 14 2007 03:47, Kirill Kuvaldin wrote:
 
 We then can mount it to a regular file:
 
 Wow, this is news to me. Since when is it possible to mount files to files?
 

It is possible to mount a regular file onto another one with --bind.
The problem in question is that mounting a malformed ISO 9660 image on a
directory fails, while mounting it on a regular file succeeds.

Kirill
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Peter Zijlstra
On Fri, 2007-07-13 at 14:47 -0700, Zach Brown wrote:

 Peter, do you have any interest in seeing how far we can get
 at tracking lock_page()?  I'm not holding my breath, but any little bit
 would probably help.

I ran headfirst into the fact that unlock_page() need not be called by
the same task that did lock_page().

Especially IO-completion interrupts love to unlock pages they did not lock
themselves. I am not at all sure that is fixable; it seems to be the nature
of the async structure of the problem :-(





Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Peter Zijlstra
On Fri, 2007-07-13 at 14:47 -0700, Zach Brown wrote:

 Peter, do you have any interest in seeing how far we can get
 at tracking lock_page()?  I'm not holding my breath, but any little bit
 would probably help.

Would this be a valid report? 

( /me goes hunt a x86_64 unwinder patch that will apply to this tree.
  These stacktraces are pain )

===
[ INFO: possible circular locking dependency detected ]
[ 2.6.22-rt3-dirty #34
---
mount/1296 is trying to acquire lock:
 (&ei->truncate_mutex){--..}, at: [802f75e5] ext3_get_blocks_handle+0x1a4/0x8f7

but task is already holding lock:
 (lock_page_0){--..}, at: [80267107] generic_file_buffered_write+0x1ee/0x646

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (lock_page_0){--..}:
   [80251b26] __lock_acquire+0xa72/0xc35
   [802520c9] lock_acquire+0x48/0x61
   [80265e22] add_to_page_cache_lru+0xe/0x23
   [80265d31] add_to_page_cache+0x1de/0x2c1
   [80265e22] add_to_page_cache_lru+0xe/0x23
   [80266985] find_or_create_page+0x4c/0x73
   [802ae716] __getblk+0x118/0x23c
   [802afa91] __bread+0x6/0x9c
   [802f382d] read_block_bitmap+0x34/0x65
   [802f3e1b] ext3_free_blocks_sb+0xec/0x3d4
   [802f4131] ext3_free_blocks+0x2e/0x61
   [802f82bc] ext3_free_data+0xaa/0xda
   [802f8976] ext3_truncate+0x4d2/0x84e
   [8026df5a] pagevec_lookup+0x17/0x1e
   [8026e7b1] truncate_inode_pages_range+0x1f4/0x323
   [802614b4] add_preempt_count+0x14/0xe4
   [80304d13] journal_stop+0x1fe/0x21d
   [8027661a] vmtruncate+0xa2/0xc0
   [802a292b] inode_setattr+0x22/0x10a
   [802f9b51] ext3_setattr+0x136/0x18f
   [802a2b1d] notify_change+0x10a/0x241
   [802a2b3b] notify_change+0x128/0x241
   [8028e35e] do_truncate+0x56/0x7f
   [8028e369] do_truncate+0x61/0x7f
   [80296278] get_write_access+0x3f/0x45
   [802973c7] may_open+0x193/0x1af
   [80299869] open_namei+0x2cb/0x63e
   [8025718b] rt_up_read+0x53/0x5c
   [8056da59] do_page_fault+0x479/0x7cc
   [8028dce1] do_filp_open+0x1c/0x38
   [8056a4f9] rt_spin_unlock+0x17/0x47
   [8028da05] get_unused_fd+0xf9/0x107
   [8028dd45] do_sys_open+0x48/0xd5
   [8020950e] system_call+0x7e/0x83
   [] 0x

-> #0 (&ei->truncate_mutex){--..}:
   [802503b9] print_circular_bug_header+0xcc/0xd3
   [80251a22] __lock_acquire+0x96e/0xc35
   [802520c9] lock_acquire+0x48/0x61
   [802f75e5] ext3_get_blocks_handle+0x1a4/0x8f7
   [8056a6d4] _mutex_lock+0x26/0x52
   [802f75e5] ext3_get_blocks_handle+0x1a4/0x8f7
   [802504b2] find_usage_backwards+0xb0/0xd9
   [802504b2] find_usage_backwards+0xb0/0xd9
   [80250d7c] debug_check_no_locks_freed+0x11d/0x129
   [80250c33] trace_hardirqs_on_caller+0x115/0x138
   [8024efdc] lockdep_init_map+0xac/0x41f
   [802614b4] add_preempt_count+0x14/0xe4
   [802f8035] ext3_get_block+0xc2/0xe4
   [802aeed3] __block_prepare_write+0x195/0x442
   [802f7f73] ext3_get_block+0x0/0xe4
   [802af19a] block_prepare_write+0x1a/0x25
   [802f93e9] ext3_prepare_write+0xb2/0x17b
   [802671b1] generic_file_buffered_write+0x298/0x646
   [8023944e] current_fs_time+0x3b/0x40
   [802614b4] add_preempt_count+0x14/0xe4
   [802678ae] __generic_file_aio_write_nolock+0x34f/0x3b9
   [8024ed3d] put_lock_stats+0xe/0x2a
   [80267964] generic_file_aio_write+0x4c/0xc4
   [80267979] generic_file_aio_write+0x61/0xc4
   [802fcf18] ext3_orphan_del+0x53/0x19f
   [802f5768] ext3_file_write+0x1c/0x9d
   [8028ef31] do_sync_write+0xcc/0x10f
   [80246f9c] autoremove_wake_function+0x0/0x2e
   [8024ecfe] get_lock_stats+0xe/0x3f
   [8024ed9a] lock_release_holdtime+0x41/0x4f
   [8024ed3d] put_lock_stats+0xe/0x2a
   [8028dfb1] sys_fchmod+0xa3/0xbd
   [8056a717] _mutex_unlock+0x17/0x20
   [8028f6cd] vfs_write+0xb6/0x148
   [8028fc61] sys_write+0x48/0x74
   [8020950e] system_call+0x7e/0x83
   [] 0x

other info that might help us debug this:

2 locks held by mount/1296:
 #0:  (&inode->i_mutex){--..}, at: [80267964] generic_file_aio_write+0x4c/0xc4
 #1:  (lock_page_0){--..}, at: [80267107] 

Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Peter Zijlstra
On Sun, 2007-07-15 at 15:02 +0200, Peter Zijlstra wrote:
 On Fri, 2007-07-13 at 14:47 -0700, Zach Brown wrote:
 
  Peter, do you have any interest in seeing how far we can get
  at tracking lock_page()?  I'm not holding my breath, but any little bit
  would probably help.
 
 Would this be a valid report? 

===
[ INFO: possible circular locking dependency detected ]
[ 2.6.22-rt3-dirty #35
---
mkdir/1662 is trying to acquire lock:
 (lock_page_0){--..}, at: [80265df6] add_to_page_cache_lru+0xe/0x23

but task is already holding lock:
 (jbd_handle){--..}, at: [80305797] journal_start+0x108/0x12c

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (jbd_handle){--..}:
   [80251b16] __lock_acquire+0xa72/0xc35
   [802520b9] lock_acquire+0x48/0x61
   [80305797] journal_start+0x108/0x12c
   [803057b3] journal_start+0x124/0x12c
   [802f92e9] ext3_prepare_write+0x42/0x17b
   [80267185] generic_file_buffered_write+0x298/0x646
   [8023943e] current_fs_time+0x3b/0x40
   [802614a4] add_preempt_count+0x14/0xe4
   [80267882] __generic_file_aio_write_nolock+0x34f/0x3b9
   [8024ed2d] put_lock_stats+0xe/0x2a
   [80267938] generic_file_aio_write+0x4c/0xc4
   [8026794d] generic_file_aio_write+0x61/0xc4
   [802fce88] ext3_orphan_del+0x53/0x19f
   [802f56d8] ext3_file_write+0x1c/0x9d
   [8028eedd] do_sync_write+0xcc/0x10f
   [80246f8d] autoremove_wake_function+0x0/0x2e
   [8024ecee] get_lock_stats+0xe/0x3f
   [8024ed8a] lock_release_holdtime+0x41/0x4f
   [8024ed2d] put_lock_stats+0xe/0x2a
   [8028df5d] sys_fchmod+0xa3/0xbd
   [8056a6d7] _mutex_unlock+0x17/0x20
   [8028f679] vfs_write+0xb6/0x148
   [8028fc0d] sys_write+0x48/0x74
   [8020950e] system_call+0x7e/0x83
   [] 0x

-> #0 (lock_page_0){--..}:
   [802503a9] print_circular_bug_header+0xcc/0xd3
   [80251a12] __lock_acquire+0x96e/0xc35
   [802520b9] lock_acquire+0x48/0x61
   [80265df6] add_to_page_cache_lru+0xe/0x23
   [80265d05] add_to_page_cache+0x1de/0x2c1
   [80265df6] add_to_page_cache_lru+0xe/0x23
   [80266959] find_or_create_page+0x4c/0x73
   [802ae6c2] __getblk+0x118/0x23c
   [802f7d9a] ext3_getblk+0xf2/0x23b
   [80306337] journal_dirty_metadata+0x1a8/0x1b3
   [80301e3e] __ext3_journal_dirty_metadata+0x1e/0x46
   [802f6c63] ext3_mark_iloc_dirty+0x293/0x30a
   [802f70a1] ext3_mark_inode_dirty+0x3f/0x48
   [802f644e] ext3_new_inode+0x8ff/0x943
   [802f8c9c] ext3_bread+0x11/0x84
   [802fccc3] ext3_mkdir+0xdd/0x24f
   [80296893] vfs_mkdir+0x6d/0xb5
   [8029902e] sys_mkdirat+0xa1/0xec
   [80569f4d] trace_hardirqs_on_thunk+0x3a/0x3c
   [80250c23] trace_hardirqs_on_caller+0x115/0x138
   [80569f4d] trace_hardirqs_on_thunk+0x3a/0x3c
   [8020950e] system_call+0x7e/0x83
   [] 0x

other info that might help us debug this:

2 locks held by mkdir/1662:
 #0:  (&inode->i_mutex/1){--..}, at: [80297154] lookup_create+0x26/0x8b
 #1:  (jbd_handle){--..}, at: [80305797] journal_start+0x108/0x12c

stack backtrace:

Call Trace:
 [8024ffad] print_circular_bug_tail+0x69/0x72
 [802503a9] print_circular_bug_header+0xcc/0xd3
 [80251a12] __lock_acquire+0x96e/0xc35
 [802520b9] lock_acquire+0x48/0x61
 [80265df6] add_to_page_cache_lru+0xe/0x23
 [80265d05] add_to_page_cache+0x1de/0x2c1
 [80265df6] add_to_page_cache_lru+0xe/0x23
 [80266959] find_or_create_page+0x4c/0x73
 [802ae6c2] __getblk+0x118/0x23c
 [802f7d9a] ext3_getblk+0xf2/0x23b
 [80306337] journal_dirty_metadata+0x1a8/0x1b3
 [80301e3e] __ext3_journal_dirty_metadata+0x1e/0x46
 [802f6c63] ext3_mark_iloc_dirty+0x293/0x30a
 [802f70a1] ext3_mark_inode_dirty+0x3f/0x48
 [802f644e] ext3_new_inode+0x8ff/0x943
 [802f8c9c] ext3_bread+0x11/0x84
 [802fccc3] ext3_mkdir+0xdd/0x24f
 [80296893] vfs_mkdir+0x6d/0xb5
 [8029902e] sys_mkdirat+0xa1/0xec
 [80569f4d] trace_hardirqs_on_thunk+0x3a/0x3c
 [80250c23] trace_hardirqs_on_caller+0x115/0x138
 [80569f4d] trace_hardirqs_on_thunk+0x3a/0x3c
 [8020950e] system_call+0x7e/0x83

INFO: lockdep is turned off.
---
| preempt count:  ]
| 0-level deep critical section nesting:



Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Peter Zijlstra
On Sun, 2007-07-15 at 11:11 -0700, Andrew Morton wrote:
 On Sun, 15 Jul 2007 15:02:23 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote:
 
  On Fri, 2007-07-13 at 14:47 -0700, Zach Brown wrote:
  
   Peter, do you have any interest in seeing how far we can get
   at tracking lock_page()?  I'm not holding my breath, but any little bit
   would probably help.
  
  Would this be a valid report? 
  
  ( /me goes hunt a x86_64 unwinder patch that will apply to this tree.
These stacktraces are pain )
 
 They are.  lockdep reports are a pain too.  It's still a struggle to
 understand wtf they're trying to tell you.  Maybe it's just me.

It got you confused alright,..

  ===
  [ INFO: possible circular locking dependency detected ]
  [ 2.6.22-rt3-dirty #34
  ---
  mount/1296 is trying to acquire lock:
   (&ei->truncate_mutex){--..}, at: [802f75e5] ext3_get_blocks_handle+0x1a4/0x8f7
  
  but task is already holding lock:
   (lock_page_0){--..}, at: [80267107] generic_file_buffered_write+0x1ee/0x646
 
  which lock already depends on the new lock.
 

So, the offence is trying to acquire &ei->truncate_mutex while already
holding lock_page_0.

These traces show how the previous (reverse) dependency came into being:

  
  the existing dependency chain (in reverse order) is:
  
  -> #1 (lock_page_0){--..}:
 [80251b26] __lock_acquire+0xa72/0xc35
 [802520c9] lock_acquire+0x48/0x61
 [80265e22] add_to_page_cache_lru+0xe/0x23
 [80265d31] add_to_page_cache+0x1de/0x2c1
 [80265e22] add_to_page_cache_lru+0xe/0x23
 [80266985] find_or_create_page+0x4c/0x73
 [802ae716] __getblk+0x118/0x23c
 [802afa91] __bread+0x6/0x9c
 [802f382d] read_block_bitmap+0x34/0x65
 [802f3e1b] ext3_free_blocks_sb+0xec/0x3d4
 [802f4131] ext3_free_blocks+0x2e/0x61
 [802f82bc] ext3_free_data+0xaa/0xda
 [802f8976] ext3_truncate+0x4d2/0x84e
 [8026df5a] pagevec_lookup+0x17/0x1e
 [8026e7b1] truncate_inode_pages_range+0x1f4/0x323
 [802614b4] add_preempt_count+0x14/0xe4
 [80304d13] journal_stop+0x1fe/0x21d
 [8027661a] vmtruncate+0xa2/0xc0
 [802a292b] inode_setattr+0x22/0x10a
 [802f9b51] ext3_setattr+0x136/0x18f
 [802a2b1d] notify_change+0x10a/0x241
 [802a2b3b] notify_change+0x128/0x241
 [8028e35e] do_truncate+0x56/0x7f
 [8028e369] do_truncate+0x61/0x7f
 [80296278] get_write_access+0x3f/0x45
 [802973c7] may_open+0x193/0x1af
 [80299869] open_namei+0x2cb/0x63e
 [8025718b] rt_up_read+0x53/0x5c
 [8056da59] do_page_fault+0x479/0x7cc
 [8028dce1] do_filp_open+0x1c/0x38
 [8056a4f9] rt_spin_unlock+0x17/0x47
 [8028da05] get_unused_fd+0xf9/0x107
 [8028dd45] do_sys_open+0x48/0xd5
 [8020950e] system_call+0x7e/0x83
 [] 0x
 
 I guess we're doing lock_page() against a blockdev pagecache page here
 while holding truncate_mutex against some S_ISREG file.

So this trace ( -> #1 ) shows how lock_page_0 came to depend on
&ei->truncate_mutex ( -> #0 ).

  -> #0 (&ei->truncate_mutex){--..}:
 [802503b9] print_circular_bug_header+0xcc/0xd3
 [80251a22] __lock_acquire+0x96e/0xc35
 [802520c9] lock_acquire+0x48/0x61
 [802f75e5] ext3_get_blocks_handle+0x1a4/0x8f7
 [8056a6d4] _mutex_lock+0x26/0x52
 [802f75e5] ext3_get_blocks_handle+0x1a4/0x8f7
 [802504b2] find_usage_backwards+0xb0/0xd9
 [802504b2] find_usage_backwards+0xb0/0xd9
 [80250d7c] debug_check_no_locks_freed+0x11d/0x129
 [80250c33] trace_hardirqs_on_caller+0x115/0x138
 [8024efdc] lockdep_init_map+0xac/0x41f
 [802614b4] add_preempt_count+0x14/0xe4
 [802f8035] ext3_get_block+0xc2/0xe4
 [802aeed3] __block_prepare_write+0x195/0x442
 [802f7f73] ext3_get_block+0x0/0xe4
 [802af19a] block_prepare_write+0x1a/0x25
 [802f93e9] ext3_prepare_write+0xb2/0x17b
 [802671b1] generic_file_buffered_write+0x298/0x646
 [8023944e] current_fs_time+0x3b/0x40
 [802614b4] add_preempt_count+0x14/0xe4
 [802678ae] __generic_file_aio_write_nolock+0x34f/0x3b9
 [8024ed3d] put_lock_stats+0xe/0x2a
 [80267964] generic_file_aio_write+0x4c/0xc4
 [80267979] generic_file_aio_write+0x61/0xc4
 [802fcf18] 

*at syscalls for xattrs?

2007-07-15 Thread Jan Engelhardt
Hi,


Recently, the family of *at() syscalls and functions (openat, fstatat, 
etc.) has been added to Linux and Glibc, respectively.
In short: I am missing *at functions for xattrs :)

BTW, why is fstatat called fstatat and not statat? (Same goes for 
futimesat.) It does not take a file descriptor for the file argument. 
Otherwise we'd also need fopenat/funlinkat, etc. Any reasons?


Thanks,
Jan


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Andrew Morton
On Sun, 15 Jul 2007 21:21:03 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote:

 Shows the current stacktrace where we violate the previously established
 locking order.

yup, but the lock_page() which we did inside truncate_mutex was a 
lock_page() against a different address_space: the blockdev mapping.

So this is OK - we'll never take truncate_mutex against the blockdev
mapping (it doesn't have one, for a start ;))

This is similar to the quite common case where we take inode A's
i_mutex inside inode B's i_mutex, which needs special lockdep annotations.

I think.  I haven't looked into this in detail.


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-15 Thread Peter Zijlstra
On Sun, 2007-07-15 at 12:59 -0700, Andrew Morton wrote:
 On Sun, 15 Jul 2007 21:21:03 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote:
 
  Shows the current stacktrace where we violate the previously established
  locking order.
 
 yup, but the lock_page() which we did inside truncate_mutex was a 
 lock_page() against a different address_space: the blockdev mapping.
 
 So this is OK - we'll never take truncate_mutex against the blockdev
 mapping (it doesn't have one, for a start ;))
 
 This is similar to the quite common case where we take inode A's
 i_mutex inside inode B's i_mutex, which needs special lockdep annotations.
 
 I think.  I haven't looked into this in detail.

Right, I can make lock_page classes per address space. Let's see if this
one goes away.
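
As a rough illustration of why per-address-space classes should silence
this report, here is a toy order checker. It is only a sketch and not how
lockdep is implemented: the graph layout, the class names in the enum and
record_acquire() are all invented for the example. With a single lock_page
class, the truncate path (page lock taken under truncate_mutex) and the
write path (truncate_mutex taken under the page lock) form a cycle. Once
the blockdev mapping and the file mapping get separate classes, no cycle
exists.

```c
#include <assert.h>
#include <string.h>

#define MAX_CLASSES 8

/* edge[a][b] != 0 means: class b was acquired while holding class a. */
struct lock_graph {
    unsigned char edge[MAX_CLASSES][MAX_CLASSES];
};

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
static int reachable(const struct lock_graph *g, int from, int to,
                     unsigned char *seen)
{
    int n;

    if (from == to)
        return 1;
    seen[from] = 1;
    for (n = 0; n < MAX_CLASSES; n++)
        if (g->edge[from][n] && !seen[n] && reachable(g, n, to, seen))
            return 1;
    return 0;
}

/*
 * Record "acquired `next` while holding `held`".  Returns 1 if that
 * inverts an order already in the graph (including taking the same class
 * twice), 0 if the new dependency was added cleanly.
 */
int record_acquire(struct lock_graph *g, int held, int next)
{
    unsigned char seen[MAX_CLASSES] = {0};

    assert(held >= 0 && held < MAX_CLASSES);
    assert(next >= 0 && next < MAX_CLASSES);

    if (reachable(g, next, held, seen))
        return 1;
    g->edge[held][next] = 1;
    return 0;
}

/* Hypothetical class ids for the two scenarios discussed in the thread. */
enum { TRUNCATE_MUTEX, LOCK_PAGE, LOCK_PAGE_FILE, LOCK_PAGE_BLOCKDEV };

/* One class for every page lock: the write path closes the cycle. */
int single_class_reports_inversion(void)
{
    struct lock_graph g;

    memset(&g, 0, sizeof(g));
    record_acquire(&g, TRUNCATE_MUTEX, LOCK_PAGE);        /* truncate path */
    return record_acquire(&g, LOCK_PAGE, TRUNCATE_MUTEX); /* write path */
}

/* One class per mapping: the two paths touch different page-lock classes. */
int per_mapping_classes_report_inversion(void)
{
    struct lock_graph g;

    memset(&g, 0, sizeof(g));
    record_acquire(&g, TRUNCATE_MUTEX, LOCK_PAGE_BLOCKDEV);
    return record_acquire(&g, LOCK_PAGE_FILE, TRUNCATE_MUTEX);
}
```

In this model the first scenario returns 1 and the second returns 0, which
matches the expectation that the report above is a false positive caused by
lumping all page locks into one class.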



Re: *at syscalls for xattrs?

2007-07-15 Thread Al Viro
On Sun, Jul 15, 2007 at 09:46:27PM +0200, Jan Engelhardt wrote:
 Hi,
 
 
 recently, the family of *at() syscalls and functions (openat, fstatat, 
 etc.) have been added to Linux and Glibc, respectively.
 In short: I am missing xattr at functions :)

No.  They are not fscking forks.  They are almost as revolting, but
not quite on the same level.

 BTW, why is fstatat called fstatat and not statat? (Same goes for 
 futimesat.) It does not take a file descriptor for the file argument. 
 Otherwise we'd also need fopenat/funlinkat, etc. Any reasons?

Ulrich having an odd taste?


Re: *at syscalls for xattrs?

2007-07-15 Thread Al Viro
On Sun, Jul 15, 2007 at 02:13:21PM -0700, Nicholas Miell wrote:
 
 I suspect he was asking for 
 
 int getxattrat(int fd, const char *path, const char *name, void *value, 
   size_t size, int flags)
 int setxattrat(int fd, const char *path, const char *name, void *value,
   size_t size, int xattrflags, int atflags)
 
 rather than the ability to access xattrs as files.

Just one question: what the bleeding hell for?  Not that the rest of
..at() family made any damn sense as an interface...

   BTW, why is fstatat called fstatat and not statat? (Same goes for 
   futimesat.) It does not take a file descriptor for the file argument. 
   Otherwise we'd also need fopenat/funlinkat, etc. Any reasons?
  
  Ulrich having an odd taste?
 
 Solaris compatibility.

Sun having no taste whatsoever


Re: *at syscalls for xattrs?

2007-07-15 Thread H. Peter Anvin
Al Viro wrote:
 
 BTW, why is fstatat called fstatat and not statat? (Same goes for 
 futimesat.) It does not take a file descriptor for the file argument. 
 Otherwise we'd also need fopenat/funlinkat, etc. Any reasons?
 Ulrich having an odd taste?
 Solaris compatibility.
 
 Sun having no taste whatsoever

Yup.  I filed an objection to this with the POSIX committee, but it was
rejected :(

-hpa


Re: [PATCH 1/6][TAKE7] manpage for fallocate

2007-07-15 Thread Amit K. Arora
On Sat, Jul 14, 2007 at 10:23:42AM +0200, Michael Kerrisk wrote:
 [CC += [EMAIL PROTECTED]
 
 Amit,
 
Hi Michael,

 Thanks for this page.  I will endeavour to review it in 
 the coming days.  In the meantime, the better address to CC
 me on for man pages stuff is [EMAIL PROTECTED]

Sure.

BTW, this man page has changed a bit and the one in TAKE8 of fallocate
patches is the latest one. You are copied on that too.
I will forward that mail to [EMAIL PROTECTED] id also, so that you
do not miss it. Thanks!

--
Regards,
Amit Arora

 
 Cheers,
 
 Michael
 
  Following is the modified version of the manpage originally submitted by
  David Chinner. Please use `nroff -man fallocate.2 | less` to view.
  
  This includes changes suggested by Heikki Orsila and Barry Naujok.
  
  
  .TH fallocate 2
  .SH NAME
  fallocate \- allocate or remove file space
  .SH SYNOPSIS
  .nf
  .B #include <fcntl.h>
  .PP
  .BI "long fallocate(int " fd ", int " mode ", loff_t " offset ", loff_t " len ");"
  .SH DESCRIPTION
  The
  .B fallocate
  syscall allows a user to directly manipulate the allocated disk space
  for the file referred to by
  .I fd
  for the byte range starting at
  .I offset
  and continuing for
  .I len
  bytes.
  The
  .I mode
  parameter determines the operation to be performed on the given range.
  Currently there are two modes:
  .TP
  .B FALLOC_ALLOCATE
  allocates and initialises to zero the disk space within the given range.
  After a successful call, subsequent writes are guaranteed not to fail
  because
  of lack of disk space.  If the size of the file is less than
  .IR offset + len ,
  then the file is increased to this size; otherwise the file size is left
  unchanged.
  .B FALLOC_ALLOCATE
  closely resembles
  .BR posix_fallocate (3)
  and is intended as a method of optimally implementing this function.
  .B FALLOC_ALLOCATE
  may allocate a larger range than was specified.
  .TP
  .B FALLOC_RESV_SPACE
  provides the same functionality as
  .B FALLOC_ALLOCATE
  except it does not ever change the file size. This allows allocation
  of zero blocks beyond the end of file and is useful for optimising
  append workloads.
  .SH RETURN VALUE
  .B fallocate
  returns zero on success, or an error number on failure.
  Note that
  .I errno
  is not set.
  .SH ERRORS
  .TP
  .B EBADF
  .I fd
  is not a valid file descriptor, or is not opened for writing.
  .TP
  .B EFBIG
  .IR offset + len
  exceeds the maximum file size.
  .TP
  .B EINVAL
  .I offset
  was less than 0, or
  .I len
  was less than or equal to 0.
  .TP
  .B ENODEV
  .I fd
  does not refer to a regular file or a directory.
  .TP
  .B ENOSPC
  There is not enough space left on the device containing the file
  referred to by
  .IR fd .
  .TP
  .B ESPIPE
  .I fd
  refers to a pipe or FIFO.
  .TP
  .B ENOSYS
  The filesystem underlying the file descriptor does not support this
  operation.
  .TP
  .B EINTR
  A signal was caught during execution.
  .TP
  .B EIO
  An I/O error occurred while reading from or writing to a file system.
  .TP
  .B EOPNOTSUPP
  The mode is not supported on the file descriptor.
  .SH AVAILABILITY
  The
  .B fallocate
  system call is available since 2.6.XX
  .SH SEE ALSO
  .BR syscall (2),
  .BR posix_fadvise (2),
  .BR ftruncate (2).
 
 -- 
 Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten 
 Browser-Versionen downloaden: http://www.gmx.net/de/go/browser


[PATCH] ia64 fallocate system call

2007-07-15 Thread David Chinner
sys_fallocate for ia64. This uses the empty slot originally
reserved for move_pages.

Signed-Off-By: Dave Chinner [EMAIL PROTECTED]

---
 arch/ia64/kernel/entry.S  |    2 +-
 include/asm-ia64/unistd.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: 2.6.x-xfs-new/arch/ia64/kernel/entry.S
===
--- 2.6.x-xfs-new.orig/arch/ia64/kernel/entry.S 2007-07-16 14:18:51.432168485 +1000
+++ 2.6.x-xfs-new/arch/ia64/kernel/entry.S      2007-07-16 14:22:08.582454284 +1000
@@ -1581,7 +1581,7 @@ sys_call_table:
        data8 sys_sync_file_range               // 1300
        data8 sys_tee
        data8 sys_vmsplice
-       data8 sys_ni_syscall                    // reserved for move_pages
+       data8 sys_fallocate
        data8 sys_getcpu
        data8 sys_epoll_pwait                   // 1305
        data8 sys_utimensat
Index: 2.6.x-xfs-new/include/asm-ia64/unistd.h
===
--- 2.6.x-xfs-new.orig/include/asm-ia64/unistd.h        2007-06-08 21:36:31.0 +1000
+++ 2.6.x-xfs-new/include/asm-ia64/unistd.h     2007-07-16 14:22:41.166204402 +1000
@@ -292,7 +292,7 @@
 #define __NR_sync_file_range           1300
 #define __NR_tee                       1301
 #define __NR_vmsplice                  1302
-/* 1303 reserved for move_pages */
+#define __NR_fallocate                 1303
 #define __NR_getcpu                    1304
 #define __NR_epoll_pwait               1305
 #define __NR_utimensat                 1306


[PATCH] xfs: implement fallocate V2

2007-07-15 Thread David Chinner
Initial implementation of ->fallocate for XFS.

Version 2:

o Make allocation and setting the file size atomic.
o Drop deallocate/punch functionality
o use mode field appropriately to determine if size needs changing.

---
 fs/xfs/linux-2.6/xfs_iops.c |   47 
 1 file changed, 47 insertions(+)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_iops.c
===
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_iops.c      2007-07-16 14:16:02.090255611 +1000
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_iops.c   2007-07-16 14:50:07.087885337 +1000
@@ -51,6 +51,7 @@
 #include <linux/xattr.h>
 #include <linux/namei.h>
 #include <linux/security.h>
+#include <linux/falloc.h>
 
 /*
  * Get a XFS inode from a given vnode.
@@ -812,6 +813,51 @@ xfs_vn_removexattr(
        return namesp->attr_remove(vp, attr, xflags);
 }
 
+/*
+ * generic space allocation vector.
+ *
+ * This should really go through a bhv_vop before stuffing around
+ * with xfs_inodes and such.
+ */
+STATIC long
+xfs_vn_fallocate(
+       struct inode    *inode,
+       int             mode,
+       loff_t          offset,
+       loff_t          len)
+{
+       long            error = -EOPNOTSUPP;
+       bhv_vnode_t     *vp = vn_from_inode(inode);
+       bhv_desc_t      *bdp;
+       loff_t          new_size = 0;
+       xfs_flock64_t   bf;
+
+       bf.l_whence = 0;
+       bf.l_start = offset;
+       bf.l_len = len;
+
+       bdp = bhv_lookup_range(VN_BHV_HEAD(vp), VNODE_POSITION_XFS,
+                                               VNODE_POSITION_XFS);
+
+       xfs_ilock(xfs_vtoi(vp), XFS_IOLOCK_EXCL);
+       error = xfs_change_file_space(bdp, XFS_IOC_RESVSP, &bf, 0, NULL,
+                                               ATTR_NOLOCK);
+       if (!error && !(mode & FALLOC_FL_KEEP_SIZE) &&
+           offset + len > i_size_read(inode))
+               new_size = offset + len;
+
+       /* Change file size if needed */
+       if (new_size) {
+               bhv_vattr_t     va;
+
+               va.va_mask = XFS_AT_SIZE;
+               va.va_size = new_size;
+               error = bhv_vop_setattr(vp, &va, ATTR_NOLOCK, NULL);
+       }
+
+       xfs_iunlock(xfs_vtoi(vp), XFS_IOLOCK_EXCL);
+       return error;
+}
 
 const struct inode_operations xfs_inode_operations = {
        .permission             = xfs_vn_permission,
@@ -822,6 +868,7 @@ const struct inode_operations xfs_inode_
        .getxattr               = xfs_vn_getxattr,
        .listxattr              = xfs_vn_listxattr,
        .removexattr            = xfs_vn_removexattr,
+       .fallocate              = xfs_vn_fallocate,
 };
 
 const struct inode_operations xfs_dir_inode_operations = {
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


[PATCH] introduce fallocate support into xfs_io

2007-07-15 Thread David Chinner
FYI.

Initial support for fallocate-based pre-allocation in
xfs_io for testing. This currently only works on ia64 because
of the hard coded syscall number and will require autoconf
magic to conditionally compile in this support.

This allows simple command-line based testing of fallocate
based allocation such as:

# ~/xfs_io -f -c "falloc_resvsp 0 1024k" -c "bmap -vp" -c stat /mnt/scratch/fred
/mnt/scratch/fred:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..2047]:       96..2143          0 (96..2143)        2048 10000
fd.path = /mnt/scratch/fred
fd.flags = non-sync,non-direct,read-write
stat.ino = 131
stat.type = regular file
stat.size = 0
stat.blocks = 2048
fsxattr.xflags = 0x2 [-p]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.nextents = 1
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136
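
The stat.size = 0 alongside stat.blocks = 2048 above is the whole point of
the reserve-space mode. A minimal userspace sketch of the same observation
follows. It assumes a C library that provides a fallocate() wrapper and the
FALLOC_FL_KEEP_SIZE flag via <fcntl.h>, which is not what the hard-coded
syscall number in the patch below relies on. Mode 0 plays the role of
FALLOC_ALLOCATE and FALLOC_FL_KEEP_SIZE the role of FALLOC_RESV_SPACE, as
in the defines of that patch. The helper name size_after_prealloc() and the
/tmp path are made up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: preallocate `len` bytes at offset 0 of a fresh
 * temporary file using `mode`, and return the resulting st_size.
 * Returns -1 if the file cannot be created or the filesystem refuses
 * the request (for example with EOPNOTSUPP).
 */
off_t size_after_prealloc(int mode, off_t len)
{
    char path[] = "/tmp/falloc-demo-XXXXXX";
    struct stat st;
    off_t result = -1;
    int fd;

    assert(len > 0);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* the file disappears once fd is closed */

    if (fallocate(fd, mode, 0, len) == 0 && fstat(fd, &st) == 0)
        result = st.st_size;

    close(fd);
    return result;
}
```

On a filesystem that supports preallocation, size_after_prealloc(0, 1 << 20)
should report a 1 MiB file, while passing FALLOC_FL_KEEP_SIZE should leave
the size at 0 even though blocks have been reserved.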

Or more complex cases:

# ~/xfs_io -f \
 -c "falloc_allocsp 0 1024k" \
 -c "unresvsp 32k 32k" \
 -c "unresvsp 128k 64k" \
 -c "unresvsp 512k 256k" \
 -c "pwrite 0 16k" \
 -c "pwrite 96k 128k" \
 -c "pwrite 640k 384k" \
 -c "bmap -vp" \
 -c "falloc_resvsp 0 1024k" \
 -c "bmap -vvp" /mnt/scratch/fred
wrote 16384/16384 bytes at offset 0
16 KiB, 4 ops; 0. sec (274.123 MiB/sec and 70175.4386 ops/sec)
wrote 131072/131072 bytes at offset 98304
128 KiB, 32 ops; 0. sec (338.753 MiB/sec and 86720.8672 ops/sec)
wrote 393216/393216 bytes at offset 655360
384 KiB, 96 ops; 0. sec (386.200 MiB/sec and 98867.1473 ops/sec)
/mnt/scratch/fred:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..31]:         96..127           0 (96..127)           32
   1: [32..63]:        128..159          0 (128..159)          32 10000
   2: [64..127]:       hole                                    64
   3: [128..191]:      224..287          0 (224..287)          64 10000
   4: [192..447]:      288..543          0 (288..543)         256
   5: [448..1023]:     544..1119         0 (544..1119)        576 10000
   6: [1024..1279]:    hole                                   256
   7: [1280..2047]:    1376..2143        0 (1376..2143)       768
/mnt/scratch/fred:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..31]:         96..127           0 (96..127)           32
   1: [32..191]:       128..287          0 (128..287)         160 10000
   2: [192..447]:      288..543          0 (288..543)         256
   3: [448..1279]:     544..1375         0 (544..1375)        832 10000
   4: [1280..2047]:    1376..2143        0 (1376..2143)       768
 FLAG Values:
    010000 Unwritten preallocated extent
    001000 Doesn't begin on stripe unit
    000100 Doesn't end   on stripe unit
    000010 Doesn't begin on stripe width
    000001 Doesn't end   on stripe width

Yes, that looks like it filled all the holes properly, and the allocator
allocated the right holes on disk to merge adjacent extents when hole
filling. ;)

---
 xfsprogs/io/prealloc.c |   72 +
 1 file changed, 72 insertions(+)

Index: xfs-cmds/xfsprogs/io/prealloc.c
===
--- xfs-cmds.orig/xfsprogs/io/prealloc.c        2006-11-15 19:00:31.0 +1100
+++ xfs-cmds/xfsprogs/io/prealloc.c     2007-07-16 15:25:44.041513574 +1000
@@ -26,6 +26,8 @@ static cmdinfo_t allocsp_cmd;
 static cmdinfo_t freesp_cmd;
 static cmdinfo_t resvsp_cmd;
 static cmdinfo_t unresvsp_cmd;
+static cmdinfo_t falloc_allocsp_cmd;
+static cmdinfo_t falloc_resvsp_cmd;
 
 static int
 offset_length(
@@ -119,6 +121,56 @@ unresvsp_f(
return 0;
 }
 
+/*
+ * Hack, hack, hackety-hack-hack.
+ *
+ * This only works for ia64...
+ */
+#define __NR_fallocate 1303
+
+/*
+ * someday there'll be a real header file
+ */
+#define FALLOC_FL_KEEP_SIZE 0x01
+#define FALLOC_ALLOCATE 0x0
+#define FALLOC_RESV_SPACE   FALLOC_FL_KEEP_SIZE
+
+static int
+fallocate_allocsp_f(
+       int             argc,
+       char            **argv)
+{
+       xfs_flock64_t   segment;
+
+       if (!offset_length(argv[1], argv[2], &segment))
+               return 0;
+
+       if (syscall(__NR_fallocate, file->fd, FALLOC_ALLOCATE,
+                       segment.l_start, segment.l_len)) {
+               perror("FALLOC_ALLOCATE");
+               return 0;
+       }
+       return 0;
+}
+
+static int
+fallocate_resvsp_f(
+       int             argc,
+       char            **argv)
+{
+       xfs_flock64_t   segment;
+
+       if (!offset_length(argv[1], argv[2], &segment))
+               return 0;
+
+       if (syscall(__NR_fallocate, file->fd, FALLOC_RESV_SPACE,
+                       segment.l_start, segment.l_len)) {
+               perror("FALLOC_RESV_SPACE");
+               return 0;
+       }
+       return 0;
+}
+
 void
 prealloc_init(void)
 {
@@ -156,8 +208,28 @@ prealloc_init(void)
        unresvsp_cmd.oneline =
                _("frees reserved space associated with part of a file");
 
+