Re: [PATCH 1/5][TAKE8] manpage for fallocate
On Thu, Jul 19, 2007 at 03:10:52PM +1000, David Chinner wrote:
> % git-log 84e1e99f112dead8f9ba036c02d24a9f5ce7f544 |head -10
> commit 84e1e99f112dead8f9ba036c02d24a9f5ce7f544
> Author: David Chinner [EMAIL PROTECTED]
> Date:   Mon Jun 18 16:50:27 2007 +1000
>
>     [XFS] Prevent ENOSPC from aborting transactions that need to succeed
>
>     During delayed allocation extent conversion or unwritten extent
>     conversion, we need to reserve some blocks for transactions
>     reservations. We need to reserve these blocks in case a btree
>     split occurs and we need to allocate some blocks.
> --
>
> IOWs, XFS didn't provide this guarantee until about a month ago

Ok, once again XFS is ahead of the curve ;) Comment rescinded then...
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5][TAKE8] manpage for fallocate
On Jul 18, 2007 20:41 -0700, Mark Fasheh wrote:
> On Sat, Jul 14, 2007 at 12:16:25AM +0530, Amit K. Arora wrote:
> > After a successful call, subsequent writes are guaranteed not to fail
> > because of lack of disk space.
>
> If a write to an unwritten region requires a node split, that could result
> in the allocation of new meta data which obviously could fail if the disk
> is truly full.
>
> Granted that's unlikely to happen but maybe we should be conservative and
> say something like:
>
> After a successful call, subsequent writes are guaranteed to never
> require allocation of file data.
>
> ?
> 	--Mark

In the worst case, the unwritten extent could be zero-filled before the
write is done, so no extent split is needed. We discussed this recently for
the ext4 fallocate, but didn't consider it important enough to hold the code.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
Re: [PATCH 1/5][TAKE8] manpage for fallocate
On Sat, Jul 14, 2007 at 12:16:25AM +0530, Amit K. Arora wrote:
> After a successful call, subsequent writes are guaranteed not to fail
> because of lack of disk space.

If a write to an unwritten region requires a node split, that could result
in the allocation of new meta data which obviously could fail if the disk is
truly full.

Granted that's unlikely to happen but maybe we should be conservative and
say something like:

After a successful call, subsequent writes are guaranteed to never
require allocation of file data.

?
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
Re: [PATCH 1/5][TAKE8] manpage for fallocate
On Wed, Jul 18, 2007 at 08:41:55PM -0700, Mark Fasheh wrote:
> On Sat, Jul 14, 2007 at 12:16:25AM +0530, Amit K. Arora wrote:
> > After a successful call, subsequent writes are guaranteed not to fail
> > because of lack of disk space.
>
> If a write to an unwritten region requires a node split, that could result
> in the allocation of new meta data which obviously could fail if the disk
> is truly full.

% git-log 84e1e99f112dead8f9ba036c02d24a9f5ce7f544 |head -10
commit 84e1e99f112dead8f9ba036c02d24a9f5ce7f544
Author: David Chinner [EMAIL PROTECTED]
Date:   Mon Jun 18 16:50:27 2007 +1000

    [XFS] Prevent ENOSPC from aborting transactions that need to succeed

    During delayed allocation extent conversion or unwritten extent
    conversion, we need to reserve some blocks for transactions
    reservations. We need to reserve these blocks in case a btree
    split occurs and we need to allocate some blocks.
--

IOWs, XFS didn't provide this guarantee until about a month ago

> Granted that's unlikely to happen but maybe we should be conservative and
> say something like:
>
> After a successful call, subsequent writes are guaranteed to never
> require allocation of file data.
>
> ?

Well, the above phrasing is taken directly from the posix_fallocate() man
page, and it is intended that sys_fallocate() is used to implement
posix_fallocate(). In that case, the semantics we have to provide are
"writes are guaranteed not to fail due to lack of disk space".

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
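
For readers following the semantics argument, here is a minimal C sketch of the posix_fallocate() behaviour Dave refers to: the call reserves space for the range and, unlike FALLOC_FL_KEEP_SIZE, extends the file size to cover it. The helper name reserve_with_posix_fallocate and the scratch path passed to it are hypothetical and not from the thread. Whether later writes can truly never hit ENOSPC depends on the filesystem, as discussed above.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, reserve `len` bytes from offset 0
 * with posix_fallocate(), and report the resulting file size.
 * Returns 0 on success or an error number on failure.
 */
static int reserve_with_posix_fallocate(const char *path, off_t len,
                                        off_t *size_after)
{
    struct stat st;
    int err;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

    if (fd < 0)
        return errno;

    /* posix_fallocate() returns the error number directly; errno is not set. */
    err = posix_fallocate(fd, 0, len);
    if (err == 0) {
        if (fstat(fd, &st) != 0)
            err = errno;
        else
            *size_after = st.st_size;   /* grown to cover offset + len */
    }

    close(fd);
    unlink(path);                       /* demo only: remove the scratch file */
    return err;
}
```

Calling it with a 1 MiB length on a filesystem that supports preallocation should leave *size_after equal to the requested length.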
[PATCH 1/5][TAKE8] manpage for fallocate
Following is the modified version of the manpage originally submitted by
David Chinner. Please use `nroff -man fallocate.2 | less` to view.

Following changed from TAKE7:
* Removed FALLOC_ALLOCATE and FALLOCATE_RESV_SPACE modes.
* Described only single flag for mode, i.e. FALLOC_FL_KEEP_SIZE.
* s/zero blocks/zeroed blocks/ as suggested by Dave.
* Included linux/falloc.h instead of fcntl.h.

Following changed from TAKE6 to TAKE7:
Included changes suggested by Heikki Orsila and Barry Naujok.


.TH fallocate 2
.SH NAME
fallocate \- manipulate file space
.SH SYNOPSIS
.nf
.B #include <linux/falloc.h>
.PP
.BI "long fallocate(int " fd ", int " mode ", loff_t " offset ", loff_t " len ");
.SH DESCRIPTION
The
.B fallocate
syscall allows a user to directly manipulate the allocated disk space for the
file referred to by
.I fd
for the byte range starting at
.I offset
and continuing for
.I len
bytes.
The
.I mode
parameter determines the operation to be performed on the given range.
Currently there is only one flag supported for the mode argument.
.TP
.B FALLOC_FL_KEEP_SIZE
allocates and initialises to zero the disk space within the given range.
After a successful call, subsequent writes are guaranteed not to fail because
of lack of disk space. Even if the size of the file is less than
.IR offset + len ,
the file size is not changed. This allows allocation of zeroed blocks beyond
the end of file and is useful for optimising append workloads.
.PP
If the
.B FALLOC_FL_KEEP_SIZE
flag is not specified in the mode argument, the default behavior of this system
call is almost the same as when this flag is passed. The only difference is
that on success, the file size will be changed if
.IR offset + len
is greater than the file size. This default behavior closely resembles
.BR posix_fallocate (3)
and is intended as a method of optimally implementing this function.
.PP
.B fallocate
may allocate a larger range than was specified.
.SH RETURN VALUE
.B fallocate
returns zero on success, or an error number on failure.
Note that
.I errno
is not set.
.SH ERRORS
.TP
.B EBADF
.I fd
is not a valid file descriptor, or is not opened for writing.
.TP
.B EFBIG
.IR offset + len
exceeds the maximum file size.
.TP
.B EINVAL
.I offset
was less than 0, or
.I len
was less than or equal to 0.
.TP
.B ENODEV
.I fd
does not refer to a regular file or a directory.
.TP
.B ENOSPC
There is not enough space left on the device containing the file
referred to by
.IR fd .
.TP
.B ESPIPE
.I fd
refers to a pipe or FIFO.
.TP
.B ENOSYS
The filesystem underlying the file descriptor does not support this
operation.
.TP
.B EINTR
A signal was caught during execution.
.TP
.B EIO
An I/O error occurred while reading from or writing to a file system.
.TP
.B EOPNOTSUPP
The mode is not supported on the file descriptor.
.SH AVAILABILITY
The
.B fallocate
system call is available since 2.6.XX
.SH SEE ALSO
.BR posix_fallocate (3),
.BR posix_fadvise (3),
.BR ftruncate (2).
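
To illustrate the FALLOC_FL_KEEP_SIZE behaviour described in the draft, the following is a minimal C sketch that invokes the system call through syscall(2) and then checks that the file size is unchanged. Several things here are assumptions rather than part of the patch: the helper name preallocate_keep_size is hypothetical, the sketch assumes a 64-bit Linux target whose headers define SYS_fallocate, and syscall() reports failure by returning -1 and setting errno, which differs from the return convention stated in the draft RETURN VALUE section.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>   /* FALLOC_FL_KEEP_SIZE */
#include <sys/stat.h>
#include <sys/syscall.h>    /* SYS_fallocate */
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, preallocate `len` bytes from
 * offset 0 without changing the file size, and report st_size.
 * Returns 0 on success or an errno value on failure.
 */
static int preallocate_keep_size(const char *path, off_t len,
                                 off_t *size_after)
{
    struct stat st;
    int err = 0;
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);

    if (fd < 0)
        return errno;

    /* Assumes a 64-bit target, where off_t arguments pass through unchanged. */
    if (syscall(SYS_fallocate, fd, FALLOC_FL_KEEP_SIZE, (off_t)0, len) != 0)
        err = errno;
    else if (fstat(fd, &st) != 0)
        err = errno;
    else
        *size_after = st.st_size;   /* KEEP_SIZE: the size is left untouched */

    close(fd);
    unlink(path);                   /* demo only: remove the scratch file */
    return err;
}
```

On a filesystem without preallocation support the helper is expected to return EOPNOTSUPP (or ENOSYS on kernels lacking the call), matching the ERRORS list above. On success, *size_after stays at 0 for a freshly created file even though blocks have been reserved.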