Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc

2007-05-11 Thread Suparna Bhattacharya
On Fri, May 11, 2007 at 08:39:50AM +1000, David Chinner wrote:
 On Thu, May 10, 2007 at 05:26:20PM +0530, Amit K. Arora wrote:
  On Thu, May 10, 2007 at 10:59:26AM +1000, David Chinner wrote:
   On Wed, May 09, 2007 at 09:31:02PM +0530, Amit K. Arora wrote:
I have the updated patches ready which take care of Andrew's comments.
Will run some tests and post them soon.

But, before submitting these patches, I think it will be better to
finalize on certain things which might be worth some discussion here:

1) Should the file size change when preallocation is done beyond EOF ?
- Andreas and Chris Wedgwood are in favor of not changing the file size
in this case. I also tend to agree with them. Does anyone have an
argument in favor of changing the filesize ?  If not, I will remove the
code which changes the filesize, before I resubmit the concerned ext4
patch.
   
   I think there needs to be both. If we don't have a mechanism to atomically
   change the file size with the preallocation, then applications that use
   stat() to work out if they need to preallocate more space will end up
   racing.
  
  By "both" above, do you mean we should let the user choose whether the
  filesize is changed or not ? It can be done by having *two* modes for
  preallocation in the system call - say FA_PREALLOCATE and FA_ALLOCATE. If we
  use FA_PREALLOCATE mode, fallocate() will allocate blocks, but will not
  change the filesize and [cm]time. If FA_ALLOCATE mode is used, fallocate()
  will change the filesize if required (i.e.  when allocation is beyond EOF)
  and also update [cm]time.  This way, the application can decide what it
  wants.
 
 Yes, that's right.
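
The difference between the two proposed modes can be sketched with a toy model. This is only an illustration of the semantics described above, not code from the patches: the numeric mode values, struct model_inode and model_fallocate() are all hypothetical names made up for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mode values; the thread only proposes the names. */
enum fa_mode { FA_ALLOCATE = 1, FA_PREALLOCATE = 2 };

/* Toy model of the inode state the two modes affect (not a kernel struct). */
struct model_inode {
    uint64_t size;          /* file size as reported by stat() */
    uint64_t alloc_end;     /* end of the highest allocated byte range */
    unsigned time_updates;  /* how often [cm]time would have been updated */
};

/* Model a preallocation of [offset, offset + len) under the given mode. */
static int model_fallocate(struct model_inode *ino, enum fa_mode mode,
                           uint64_t offset, uint64_t len)
{
    uint64_t end = offset + len;

    if (len == 0 || end < offset)           /* empty range or overflow */
        return -EINVAL;
    if (mode != FA_ALLOCATE && mode != FA_PREALLOCATE)
        return -EINVAL;

    /* Both modes reserve the blocks. */
    if (end > ino->alloc_end)
        ino->alloc_end = end;

    /* Only FA_ALLOCATE touches the size (when beyond EOF) and [cm]time. */
    if (mode == FA_ALLOCATE) {
        if (end > ino->size)
            ino->size = end;
        ino->time_updates++;
    }
    return 0;
}
```

In this model, an application polling stat() sees the new size only after an FA_ALLOCATE call; after FA_PREALLOCATE the blocks are reserved but the size is unchanged.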
 
  This will be helpful for the partial allocation scenario also. Think of the
  case when we do not change the filesize in fallocate() and expect
  applications/posix_fallocate() to do ftruncate() after fallocate() for this.
  Now if fallocate() results in a partial allocation with -ENOSPC error
  returned, applications/posix_fallocate() will not know for what length
  ftruncate() has to be called.  :(
 
 Well, posix_fallocate() either gets all the space or it fails. If
 you truncate to extend the file size after an ENOSPC, then that is
 a buggy implementation.
 
 The same could be said for any application, or even the fallocate()
 call itself if it changes the filesize without having completely
 preallocated the space asked.
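
The all-or-nothing rule argued for here can be written down as a small decision function. This is a sketch under assumptions: struct prealloc_result and size_after_prealloc() are hypothetical names, and the "allocated" field stands in for whatever partial-progress information an implementation might have.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical outcome of one preallocation attempt. */
struct prealloc_result {
    int err;             /* 0 on success, negative errno such as -ENOSPC */
    uint64_t allocated;  /* bytes actually reserved, may be < requested */
};

/*
 * Return the file size that should be visible after the attempt.
 * The size is extended only if the whole request was satisfied;
 * a partial allocation or any error leaves the old size in place.
 */
static uint64_t size_after_prealloc(uint64_t old_size, uint64_t offset,
                                    uint64_t requested,
                                    struct prealloc_result res)
{
    uint64_t end = offset + requested;

    if (res.err != 0 || res.allocated < requested)
        return old_size;
    return end > old_size ? end : old_size;
}
```

With this rule, a caller never needs to know how much of a failed request was allocated in order to pick an ftruncate() length, because the size is simply not extended on failure.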
 
  Hence it may be a good idea to let the user choose whether to atomically
  change the file size with preallocation or not. But, with more
  flexibility there comes inconsistency in behavior, which is worth
  considering.
 
 We've got different modes to specify different behaviour. That's
 what the mode field was put there for in the first place - the
 interface is *designed* to support different preallocation
 behaviours.
 
2) For FA_UNALLOCATE mode, should the file system allow unallocation of
normal (non-preallocated) blocks (blocks allocated via regular
write/truncate operations) also (i.e. work as punch()) ?
   
   Yes. That is the current XFS implementation for XFS_IOC_UNRESVSP, and what
   I did for FA_UNALLOCATE as well.
  
  Ok. But, some people may not expect/like this. I think, we can keep it on
  the backburner for a while, till other issues are sorted out.
 
 How can it be a backburner issue when it defines the
 implementation?  I've already implemented something in XFS that
 sort of does what I think that the interface is supposed to do, but
 I need that interface to be nailed down before proceeding any
 further.
 
 All I'm really interested in right now is that the fallocate
 _interface_ can be used as a *complete replacement* for the
 pre-existing XFS-specific ioctls that are already used by
 applications.  What ext4 can or can't do right now is irrelevant to
 this discussion - the interface definition needs to take priority
 over implementation.

Would you like to write up an interface definition description (likely
man page) and post it for review, possibly with a mention of apps using
it today ?

One reason for introducing the mode parameter was to allow the interface to
evolve incrementally as more options / semantic questions are proposed, so
that we don't have to make all the decisions right now. 
So it would be good to start with a *minimal* definition, even just one mode.
The rest could follow as subsequent patches, each being reviewed and debated
separately. Otherwise this discussion can drag on for a long time.

Regards
Suparna

 
 Cheers,
 
 Dave,
 -- 
 Dave Chinner
 Principal Engineer
 SGI Australian Software Group
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India


[RFC] store RAID stride in superblock

2007-05-11 Thread Andreas Dilger
It is possible to specify the RAID stride to mke2fs to allow it to optimize
the layout of the bitmaps.  With the new mballoc it is also possible to
tell it via a mount option to do large allocations aligned on the RAID
stride (by default it aligns on 1MB boundaries from the start of the LUN).
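
As a rough illustration of what "aligned on the RAID stride" means, the arithmetic is just rounding a block number up to the next stride multiple. This is a sketch, not mballoc code: align_to_stride() is a hypothetical helper, and the stride is assumed to be expressed in filesystem blocks.

```c
#include <stdint.h>

/*
 * Round a block number up to the next multiple of the stride.
 * Both values are in filesystem blocks (an assumption for this example).
 * A stride of 0 or 1 means "no alignment requested".
 */
static uint64_t align_to_stride(uint64_t block, uint64_t stride)
{
    uint64_t rem;

    if (stride <= 1)
        return block;
    rem = block % stride;
    return rem ? block + (stride - rem) : block;
}
```

For example, with 4 KiB blocks the default 1MB alignment corresponds to a stride of 256 blocks, while a 64 KiB RAID chunk corresponds to a stride of 16 blocks.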

What would be rather convenient is to store the RAID stride value in the
superblock.  That would spare a lot of hassle on the part of the admin
to tune the filesystem optimally for the underlying storage.  There is
also a library used in the XFS tools that knows how to probe various
kinds of block devices (e.g. MD RAID, LVM/DM, etc) to get their storage
layout that would avoid the need for the user to specify anything.

Any thoughts on this?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
