Re: [RFC] add FIEMAP ioctl to efficiently map file allocation
--On 18 April 2007 6:21:39 PM -0600 Andreas Dilger <[EMAIL PROTECTED]> wrote:

> Below is an aggregation of the comments in this thread:
>
> struct fiemap_extent {
>     __u64 fe_start; /* starting offset in bytes */
>     __u64 fe_len;   /* length in bytes */
>     __u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
>     __u32 fe_lun;   /* logical storage device number in array */
> }
>
> struct fiemap {
>     __u64 fm_start;        /* logical start offset of mapping (in/out) */
>     __u64 fm_len;          /* logical length of mapping (in/out) */
>     __u32 fm_flags;        /* FIEMAP_FLAG_* flags for request (in/out) */
>     __u32 fm_extent_count; /* number of extents in fm_extents (in/out) */
>     __u64 fm_unused;
>     struct fiemap_extent fm_extents[0];
> }
>
> /* flags for the fiemap request */
> #define FIEMAP_FLAG_SYNC        0x0001 /* flush delalloc data to disk*/
> #define FIEMAP_FLAG_HSM_READ    0x0002 /* retrieve data from HSM */
> #define FIEMAP_FLAG_INCOMPAT    0xff00 /* must understand these flags*/
>
> /* flags for the returned extents */
> #define FIEMAP_EXTENT_HOLE      0x0001 /* no space allocated */
> #define FIEMAP_EXTENT_UNWRITTEN 0x0002 /* uninitialized space */
> #define FIEMAP_EXTENT_UNKNOWN   0x0004 /* in use, location unknown */
> #define FIEMAP_EXTENT_ERROR     0x0008 /* error mapping space */
> #define FIEMAP_EXTENT_NO_DIRECT 0x0010 /* no direct data access */
>
> SUMMARY OF CHANGES
> ==================
> - use fm_* fields directly in request instead of making it a fiemap_extent
>   (though they are laid out identically)

I much prefer that - it makes it a lot clearer to me to have fiemap_extent just for fm_extents (no different meanings now). (Don't like the word "offset" in comment without "physical" or some such but whatever;-)

I also prefer the flags as separate fields too :)

--Tim
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] add FIEMAP ioctl to efficiently map file allocation
On Wed, Apr 18, 2007 at 06:21:39PM -0600, Andreas Dilger wrote:
> On Apr 16, 2007 21:22 +1000, David Chinner wrote:
> > On Thu, Apr 12, 2007 at 05:05:50AM -0600, Andreas Dilger wrote:
> > > struct fiemap_extent {
> > >     __u64 fe_start; /* starting offset in bytes */
> > >     __u64 fe_len;   /* length in bytes */
> > > }
> > >
> > > struct fiemap {
> > >     struct fiemap_extent fm_start; /* offset, length of desired mapping */
> > >     __u32 fm_extent_count;         /* number of extents in array */
> > >     __u32 fm_flags;                /* flags (similar to XFS_IOC_GETBMAP) */
> > >     __u64 unused;
> > >     struct fiemap_extent fm_extents[0];
> > > }
> > >
> > > #define FIEMAP_LEN_MASK      0xff
> > > #define FIEMAP_LEN_HOLE      0x01
> > > #define FIEMAP_LEN_UNWRITTEN 0x02
> >
> > I'm not sure I like stealing bits from the length to use as flags -
> > I'd prefer an explicit field per fiemap_extent for this.
>
> Christoph expressed the same concern. I'm not dead set against having an
> extra 8 bytes per extent (32-bit flags, 32-bit reserved), though it may
> mean the need for 50% more ioctls if the file is large.

I don't think this overhead is a huge problem - just pass in a larger buffer (e.g. xfs_bmap can ask for thousands of extents in a single ioctl call as we can extract the number of extents in an inode via XFS_IOC_FSGETXATTRA).
> Below is an aggregation of the comments in this thread:
>
> struct fiemap_extent {
>     __u64 fe_start; /* starting offset in bytes */
>     __u64 fe_len;   /* length in bytes */
>     __u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
>     __u32 fe_lun;   /* logical storage device number in array */
> }

Oh, I missed the bit about the fe_lun - I was thinking something like that might be useful in future.

> struct fiemap {
>     __u64 fm_start;        /* logical start offset of mapping (in/out) */
>     __u64 fm_len;          /* logical length of mapping (in/out) */
>     __u32 fm_flags;        /* FIEMAP_FLAG_* flags for request (in/out) */
>     __u32 fm_extent_count; /* number of extents in fm_extents (in/out) */
>     __u64 fm_unused;
>     struct fiemap_extent fm_extents[0];
> }
>
> /* flags for the fiemap request */
> #define FIEMAP_FLAG_SYNC        0x0001 /* flush delalloc data to disk*/
> #define FIEMAP_FLAG_HSM_READ    0x0002 /* retrieve data from HSM */
> #define FIEMAP_FLAG_INCOMPAT    0xff00 /* must understand these flags*/

No flags in the INCOMPAT range - shouldn't it be 0x3 at this point?

> /* flags for the returned extents */
> #define FIEMAP_EXTENT_HOLE      0x0001 /* no space allocated */
> #define FIEMAP_EXTENT_UNWRITTEN 0x0002 /* uninitialized space */
> #define FIEMAP_EXTENT_UNKNOWN   0x0004 /* in use, location unknown */
> #define FIEMAP_EXTENT_ERROR     0x0008 /* error mapping space */
> #define FIEMAP_EXTENT_NO_DIRECT 0x0010 /* no direct data access */

So, there's a HSM_READ flag above. If we are going to make this interface useful for filesystems that have HSMs interacting with their extents, the HSM needs to be able to query whether the extent is online (on disk), has been migrated offline (on tape) or in dual-state (i.e. both online and offline).
> SUMMARY OF CHANGES
> ==================
> - use fm_* fields directly in request instead of making it a fiemap_extent
>   (though they are laid out identically)
>
> - separate flags word for fm_flags:
>   - FIEMAP_FLAG_SYNC = range should be synced to disk before returning
>     mapping, may return FIEMAP_EXTENT_UNKNOWN for delalloc writes otherwise
>   - FIEMAP_FLAG_HSM_READ = force retrieval + mapping from HSM if specified
>     (this has the opposite meaning of XFS's BMV_IF_NO_DMAPI_READ flag)
>   - FIEMAP_FLAG_XATTR = omitted for now, can address that in the future
>     if there is agreement on whether that is desirable to have or if it is
>     better to call ioctl(FIEMAP) on an XATTR fd.
>   - FIEMAP_FLAG_INCOMPAT = if flags are set in this mask in request, kernel
>     must understand them, or fail ioctl with e.g. EOPNOTSUPP, so that we
>     don't request e.g. FIEMAP_FLAG_XATTR and kernel ignores it
>
> - __u64 fm_unused does not take up an extra space on all power-of-two buffer
>   sizes (would otherwise be at end of buffer), and may be handy in the future.
>
> - add separate fe_flags word with flags from various suggestions:
>   - FIEMAP_EXTENT_HOLE = extent has no space allocation
>   - FIEMAP_EXTENT_UNWRITTEN = extent space allocation but contains no data
>   - FIEMAP_EXTENT_UNKNOWN = extent contains data, but location is unknown
>     (e.g. HSM, delalloc awaiting sync, etc)

I'd like an explicit delalloc flag, not lumping it in with "unknown". We *know* the extent is delalloc ;)

>   - FIEMAP_EXTENT_ERROR = error mapping extent. Should fe_lun == e
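
A minimal sketch of the FIEMAP_FLAG_INCOMPAT rule from the summary quoted above: a request that sets a flag inside the INCOMPAT mask which the implementation does not understand is failed with EOPNOTSUPP instead of being silently ignored. The flag values are the ones quoted in this thread; `fiemap_check_flags` and `FIEMAP_KNOWN_FLAGS` are names made up for illustration, and nothing here is taken from an actual kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values as quoted in the proposal. */
#define FIEMAP_FLAG_SYNC      0x0001u
#define FIEMAP_FLAG_HSM_READ  0x0002u
#define FIEMAP_FLAG_INCOMPAT  0xff00u

/* Hypothetical: the set of request flags this implementation understands. */
#define FIEMAP_KNOWN_FLAGS    (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_HSM_READ)

/*
 * Return 0 if the request may proceed, or -EOPNOTSUPP if it sets a flag
 * in the INCOMPAT range that is not understood. Unknown flags outside
 * the INCOMPAT range are tolerated (assumed here to be safe to ignore).
 */
int fiemap_check_flags(uint32_t fm_flags)
{
    uint32_t unknown = fm_flags & ~FIEMAP_KNOWN_FLAGS;

    if (unknown & FIEMAP_FLAG_INCOMPAT)
        return -EOPNOTSUPP;
    return 0;
}
```

With this split, a future flag such as FIEMAP_FLAG_XATTR would be given a value inside 0xff00, so that an older kernel rejects it rather than returning a mapping the caller did not ask for.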
Re: [RFC] add FIEMAP ioctl to efficiently map file allocation
On Apr 16, 2007 21:22 +1000, David Chinner wrote:
> On Thu, Apr 12, 2007 at 05:05:50AM -0600, Andreas Dilger wrote:
> > struct fiemap_extent {
> >     __u64 fe_start; /* starting offset in bytes */
> >     __u64 fe_len;   /* length in bytes */
> > }
> >
> > struct fiemap {
> >     struct fiemap_extent fm_start; /* offset, length of desired mapping */
> >     __u32 fm_extent_count;         /* number of extents in array */
> >     __u32 fm_flags;                /* flags (similar to XFS_IOC_GETBMAP) */
> >     __u64 unused;
> >     struct fiemap_extent fm_extents[0];
> > }
> >
> > #define FIEMAP_LEN_MASK      0xff
> > #define FIEMAP_LEN_HOLE      0x01
> > #define FIEMAP_LEN_UNWRITTEN 0x02
>
> I'm not sure I like stealing bits from the length to use as flags -
> I'd prefer an explicit field per fiemap_extent for this.

Christoph expressed the same concern. I'm not dead set against having an extra 8 bytes per extent (32-bit flags, 32-bit reserved), though it may mean the need for 50% more ioctls if the file is large.

Below is an aggregation of the comments in this thread:

struct fiemap_extent {
    __u64 fe_start; /* starting offset in bytes */
    __u64 fe_len;   /* length in bytes */
    __u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
    __u32 fe_lun;   /* logical storage device number in array */
}

struct fiemap {
    __u64 fm_start;        /* logical start offset of mapping (in/out) */
    __u64 fm_len;          /* logical length of mapping (in/out) */
    __u32 fm_flags;        /* FIEMAP_FLAG_* flags for request (in/out) */
    __u32 fm_extent_count; /* number of extents in fm_extents (in/out) */
    __u64 fm_unused;
    struct fiemap_extent fm_extents[0];
}

/* flags for the fiemap request */
#define FIEMAP_FLAG_SYNC        0x0001 /* flush delalloc data to disk*/
#define FIEMAP_FLAG_HSM_READ    0x0002 /* retrieve data from HSM */
#define FIEMAP_FLAG_INCOMPAT    0xff00 /* must understand these flags*/

/* flags for the returned extents */
#define FIEMAP_EXTENT_HOLE      0x0001 /* no space allocated */
#define FIEMAP_EXTENT_UNWRITTEN 0x0002 /* uninitialized space */
#define FIEMAP_EXTENT_UNKNOWN   0x0004 /* in use, location unknown */
#define FIEMAP_EXTENT_ERROR     0x0008 /* error mapping space */
#define FIEMAP_EXTENT_NO_DIRECT 0x0010 /* no direct data access */

SUMMARY OF CHANGES
==================
- use fm_* fields directly in request instead of making it a fiemap_extent
  (though they are laid out identically)

- separate flags word for fm_flags:
  - FIEMAP_FLAG_SYNC = range should be synced to disk before returning
    mapping, may return FIEMAP_EXTENT_UNKNOWN for delalloc writes otherwise
  - FIEMAP_FLAG_HSM_READ = force retrieval + mapping from HSM if specified
    (this has the opposite meaning of XFS's BMV_IF_NO_DMAPI_READ flag)
  - FIEMAP_FLAG_XATTR = omitted for now, can address that in the future
    if there is agreement on whether that is desirable to have or if it is
    better to call ioctl(FIEMAP) on an XATTR fd.
  - FIEMAP_FLAG_INCOMPAT = if flags are set in this mask in request, kernel
    must understand them, or fail ioctl with e.g. EOPNOTSUPP, so that we
    don't request e.g. FIEMAP_FLAG_XATTR and kernel ignores it

- __u64 fm_unused does not take up an extra space on all power-of-two buffer
  sizes (would otherwise be at end of buffer), and may be handy in the future.

- add separate fe_flags word with flags from various suggestions:
  - FIEMAP_EXTENT_HOLE = extent has no space allocation
  - FIEMAP_EXTENT_UNWRITTEN = extent space allocation but contains no data
  - FIEMAP_EXTENT_UNKNOWN = extent contains data, but location is unknown
    (e.g. HSM, delalloc awaiting sync, etc)
  - FIEMAP_EXTENT_ERROR = error mapping extent. Should fe_lun == errno?
  - FIEMAP_EXTENT_NO_DIRECT = data cannot be directly accessed (e.g. data
    encrypted, compressed, etc), may want separate flags for these?

- add new fe_lun word per extent for filesystems that manage multiple devices
  (e.g. OCFS, GFS, ZFS, Lustre). This would otherwise have been unused.

> Given that xfs_bmap uses extra information from the filesystem
> (geometry) to display extra (and frequently used) information
> about the alignment of extents. ie:
>
> chook 681% xfs_bmap -vv fred
> fred:
>  EXT: FILE-OFFSET  BLOCK-RANGE           AG AG-OFFSET           TOTAL FLAGS
>    0: [0..151]:    288444888..288445039   8 (1696536..1696687)    152 00010
>  FLAG Values:
>     010000 Unwritten preallocated extent
>     001000 Doesn't begin on stripe unit
>     000100 Doesn't end   on stripe unit
>     000010 Doesn't begin on stripe width
>     000001 Doesn't end   on stripe width

Can you clarify the terminology here? What is a "stripe unit" and what is a "stripe width"? Are
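
The two size claims in the message above (that fm_unused costs nothing on power-of-two buffers, and that 24-byte extents mean roughly 50% more ioctls than 16-byte ones) can be checked with a little arithmetic. The sketch below restates the proposed layouts with `<stdint.h>` types in place of `__u64`/`__u32`; `extents_per_buffer` is a hypothetical helper, not part of any proposed interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layouts restated from the proposal, using fixed-width C99 types. */
struct fiemap_extent {
    uint64_t fe_start;
    uint64_t fe_len;
    uint32_t fe_flags;
    uint32_t fe_lun;
};                              /* 24 bytes */

struct fiemap {
    uint64_t fm_start;
    uint64_t fm_len;
    uint32_t fm_flags;
    uint32_t fm_extent_count;
    uint64_t fm_unused;
    struct fiemap_extent fm_extents[];
};                              /* 32-byte header */

/* Number of whole extents that fit after a header of hdr_size bytes. */
size_t extents_per_buffer(size_t buf_size, size_t hdr_size, size_t ext_size)
{
    assert(ext_size > 0);
    if (buf_size < hdr_size)
        return 0;
    return (buf_size - hdr_size) / ext_size;
}
```

For a 4096-byte buffer, `extents_per_buffer(4096, 32, 24)` and `extents_per_buffer(4096, 24, 24)` both give 169, so dropping fm_unused would only leave the 8 bytes unused at the end of the buffer. With the original 16-byte extents and the same 32-byte header the buffer holds 254 extents, about 1.5 times as many, which is where the "50% more ioctls" estimate comes from.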
Re: [RFC] add FIEMAP ioctl to efficiently map file allocation
On Apr 16, 2007 18:01 +1000, Timothy Shimmin wrote:
> --On 12 April 2007 5:05:50 AM -0600 Andreas Dilger <[EMAIL PROTECTED]>
> wrote:
> > struct fiemap_extent {
> >     __u64 fe_start; /* starting offset in bytes */
> >     __u64 fe_len;   /* length in bytes */
> > }
> >
> > struct fiemap {
> >     struct fiemap_extent fm_start; /* offset, length of desired mapping */
> >     __u32 fm_extent_count;         /* number of extents in array */
> >     __u32 fm_flags;                /* flags (similar to XFS_IOC_GETBMAP) */
> >     __u64 unused;
> >     struct fiemap_extent fm_extents[0];
> > }
> >
> > #define FIEMAP_LEN_MASK      0xff
> > #define FIEMAP_LEN_HOLE      0x01
> > #define FIEMAP_LEN_UNWRITTEN 0x02
> >
> > All offsets are in bytes to allow cases where filesystems are not doing
> > block-aligned/sized allocations (e.g. tail packing). The fm_extents array
> > returned contains the packed list of allocation extents for the file,
> > including entries for holes (which have fe_start == 0, and a flag).
> >
> > The ->fm_extents[] array includes all of the holes in addition to
> > allocated extents because this avoids the need to return both the logical
> > and physical address for every extent and does not make processing any
> > harder.
>
> Well, that's what stood out for me. I was wondering where the "fe_block"
> field had gone - the "physical address".
> So is your "fe_start; /* starting offset */" actually the disk location
> (not a logical file offset)
> _except_ in the header (fiemap) where it is the desired logical offset.

Correct. The fm_extent in the request contains the logical start offset and length in bytes of the requested fiemap region. In the returned header it represents the logical start offset of the extent that contained the requested start offset, and the logical length of all the returned extents. I haven't decided whether the returned length should be until EOF, or have the "virtual hole" at the end of the file. I think EOF makes more sense.
The fe_start + fe_len in the fm_extents represent the physical location on the block device for that extent. fm_extent[i].fe_start (per Anton) is undefined if FIEMAP_LEN_HOLE is set, and .fe_len is the length of the hole.

> Okay, looking at your example use below that's what it looks like.
> And when you refer to fm_start below, you mean fm_start.fe_start?
> Sorry, I realise this is just an approximation but this part confused me.

Right, I'll write up a new RFC based on feedback here, and correcting the various errors in the original proposal.

> So you get rid of all the logical file offsets in the extents because we
> report holes explicitly (and we know everything is contiguous if you
> include the holes).

Correct. It saves space in the common case.

> > Caller works something like:
> >
> >     char buf[4096];
> >     struct fiemap *fm = (struct fiemap *)buf;
> >     int count = (sizeof(buf) - sizeof(*fm)) / sizeof(fm_extent);
> >
> >     fm->fm_start.fe_start = 0;   /* start of file */
> >     fm->fm_start.fe_len = -1;    /* end of file */
> >     fm->fm_extent_count = count; /* max extents in fm_extents[] array */
> >     fm->fm_flags = 0;            /* maybe "no DMAPI", etc like XFS */
> >
> >     fd = open(path, O_RDONLY);
> >     printf("logical\t\tphysical\t\tbytes\n");
> >
> >     /* The last entry will have less extents than the maximum */
> >     while (fm->fm_extent_count == count) {
> >         rc = ioctl(fd, FIEMAP, fm);
> >         if (rc)
> >             break;
> >
> >         /* kernel filled in fm_extents[] array, set fm_extent_count
> >          * to be actual number of extents returned, leaves
> >          * fm_start.fe_start alone (unlike XFS_IOC_GETBMAP). */
> >
> >         for (i = 0; i < fm->fm_extent_count; i++) {
> >             __u64 len = fm->fm_extents[i].fe_len & FIEMAP_LEN_MASK;
> >             __u64 fm_next = fm->fm_start.fe_start + len;
> >             int hole = fm->fm_extents[i].fe_len & FIEMAP_LEN_HOLE;
> >             int unwr = fm->fm_extents[i].fe_len & FIEMAP_LEN_UNWRITTEN;
> >
> >             printf("%llu-%llu\t%llu-%llu\t%llu\t%s%s\n",
> >                    fm->fm_start.fe_start, fm_next - 1,
> >                    hole ? 0 : fm->fm_extents[i].fe_start,
> >                    hole ? 0 : fm->fm_extents[i].fe_start +
> >                               fm->fm_extents[i].fe_len - 1,
> >                    len, hole ? "(hole) " : "",
> >                    unwr ? "(unwritten) " : "");
> >
> >             /* get ready for printing next extent, or next ioctl */
> >             fm->fm_start.fe_start = fm_next;
> >         }
> >     }

Cheers, Andreas
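
A minimal sketch of how a caller could rebuild logical offsets from the packed extent list discussed above, where holes are reported explicitly and each entry therefore begins where the previous one ended. It assumes the later layout from this thread with a separate fe_flags word, rather than flag bits stored in fe_len as in the quoted pseudocode. `fiemap_print_extents` is a hypothetical helper; no ioctl is issued, since the interface was still only a proposal at this point.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FIEMAP_EXTENT_HOLE      0x0001u
#define FIEMAP_EXTENT_UNWRITTEN 0x0002u

struct fiemap_extent {
    uint64_t fe_start;  /* physical offset in bytes; undefined for holes */
    uint64_t fe_len;    /* length in bytes */
    uint32_t fe_flags;  /* FIEMAP_EXTENT_* */
    uint32_t fe_lun;
};

/*
 * Print one line per extent, starting at logical offset `logical`.
 * Returns the logical offset just past the last extent, which is the
 * start offset to pass in the next request.
 */
uint64_t fiemap_print_extents(const struct fiemap_extent *ext, size_t count,
                              uint64_t logical)
{
    assert(ext != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        uint64_t len = ext[i].fe_len;
        int hole = (ext[i].fe_flags & FIEMAP_EXTENT_HOLE) != 0;
        int unwr = (ext[i].fe_flags & FIEMAP_EXTENT_UNWRITTEN) != 0;

        if (len == 0)           /* nothing to print, avoids len - 1 wrap */
            continue;

        printf("%" PRIu64 "-%" PRIu64 "\t", logical, logical + len - 1);
        if (hole)
            printf("-\t");      /* no physical location for a hole */
        else
            printf("%" PRIu64 "-%" PRIu64 "\t",
                   ext[i].fe_start, ext[i].fe_start + len - 1);
        printf("%" PRIu64 "\t%s%s\n", len,
               hole ? "(hole) " : "", unwr ? "(unwritten) " : "");

        logical += len;         /* next extent is logically contiguous */
    }
    return logical;
}
```

Because the running logical offset is kept by the caller, the extent records themselves never need a logical-offset field, which is the space saving the reply refers to.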
Re: Ext3 behavior on power failure
On Wed, Mar 28, 2007 at 09:17:27 -0400, "John Anthony Kazos Jr." <[EMAIL PROTECTED]> wrote:
> > If you fsync() your data, you are guaranteed that also your data are
> > safely on disk when fsync returns. So what is the question here?
>
> Pardon a newbie's intrusion, but I do know this isn't true. There is a
> window of possible loss because of the multitude of layers of caching,
> especially within the drive itself. Unless there is a super_duper_fsync()
> that is able to actually poll the hardware and get a confirmation that the
> internal buffers are purged?

That is why you need to disable write caching of the drives or use cache flushes via write barriers (if the stack of block devices all support them) if the hardware cache isn't battery backed or the device doesn't support returning the status of particular commands.

Of course nothing is perfectly safe.
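
For reference, the application side of this exchange amounts to the sketch below: write the data, call fsync(), and treat a failure from either as data not safely stored. `write_durably` is a hypothetical helper name. As the reply above points out, a successful return only means the kernel has pushed the data to the device; whether it survives a power cut still depends on the drive's write cache and on barrier support in the block stack, which this code cannot verify.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write len bytes from buf to path and fsync() before closing.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int err = 0;
    int fd;

    assert(path != NULL && (buf != NULL || len == 0));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same write */
            err = -errno;
            break;
        }
        if (n == 0) {           /* no progress: report as an I/O error */
            err = -EIO;
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    /* fsync() can report write errors that write() itself did not. */
    if (err == 0 && fsync(fd) < 0)
        err = -errno;
    if (close(fd) < 0 && err == 0)
        err = -errno;
    return err;
}
```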
Re: Performance degradation with FFSB between 2.6.20 and 2.6.21-rc7
> On Wed, 18 Apr 2007 15:54:00 +0200 Valerie Clement <[EMAIL PROTECTED]> wrote:
>
> Running benchmark tests (FFSB) on an ext4 filesystem, I noticed a
> performance degradation (about 15-20 percent) in sequential write tests
> between 2.6.19-rc6 and 2.6.21-rc4 kernels.
>
> I ran the same tests on ext3 and XFS filesystems and I saw the same
> performance difference between the two kernel versions for these two
> filesystems.
>
> I have also reproduced it between 2.6.20.7 and 2.6.21-rc7.
> The FFSB tests run 16 threads, each creating 1GB files. The tests were
> done on the same x86_64 system, with the same kernel configuration and
> on the same scsi device. Below are the throughput values given by FFSB.
>
>    kernel        XFS          ext3
>    -----------------------------------
>    2.6.20.7      48 MB/sec    44 MB/sec
>    2.6.21-rc7    38 MB/sec    37 MB/sec
>
> Did anyone else run across the problem?
> Is there a known issue?
>

That's a new discovery, thanks.

It could be due to I/O scheduler changes. Which one are you using? CFQ?

Or it could be that there has been some changed behaviour at the VFS/pagecache layer: the VFS might be submitting little hunks of lots of files, rather than large hunks of few files.

Or it could be a block-layer thing: perhaps some driver change has caused us to be placing less data into the queue. Which device driver is that machine using?

Being a simple soul, the first thing I'll try when I get near a test box will be

for i in $(seq 1 16)
do
        time dd if=/dev/zero of=$i bs=1M count=1024 &
done
Re: tiny e2fsprogs fix, bitmaps question
On Wed, Apr 18, 2007 at 12:54:44AM -0600, Andreas Dilger wrote:

[ I'm quoting out of order here, and cc'ing the linux-ext4 list with permission since I think the topics under discussion have a more general interest. --Ted]

> Just reading the updated e2fsck.conf.5 man page, and noticed in [scratch
> files]:
>
> numdirs_threshold:
> s/numbers of directory/number of directories/

Oops, thanks for catching that. I implemented the in-core memory reduction patches in somewhat of a hurry because I had a number of users who had been using BackupPC or other similar hard-link intensive backup programs, and they were running into massive memory usage issues. So this took priority over the extents refactorization work (which is currently on the top of my e2fsprogs work queue).

> We are also looking to implement something better than raw bitmaps for
> cases where the set bits are expected to be sparse (e.g. block_dup_map,
> inode_bad_map, inode_bb_map, inode_imagic_map), instead of just going
> wholesale to on-disk storage (which is just going to slow things down).

The other type of bitmap implementation which I had been thinking about asking people to implement is one which works well in the case where the set bits are expected to be mostly contiguous --- i.e., the block allocation map --- so some kind of extent-based data structure indexed using an in-memory b-tree would be ideal.

Note that for block_dup_map, inode_bad_map, inode_bb_map, inode_imagic_map, et al., they are usually not allocated at all, and if they are allocated, there are usually very few set bits, so using a tree-indexed, extent-based data structure should work just fine for this sort of implementation. Yes, it's not as efficient as an array of integers, but it's much more efficient.

> The current API doesn't match that of the normal bitmap routines,
> but I think that is desirable. What do you think? The other
> thing that is lacking is an efficient and generic bitmap iterator
> instead of just walking [0 ... nbits], because in the sparse case
> the full-range walking can be grossly inefficient.

I agree it's desirable, and I had been planning a bitmap API revision anyway. The big changes I had been thinking for the new interface were:

* 64-bit enabled
* no inline functions
* pluggable back-ends (for multiple implementations, traditional, tree-based extents, disk-backed, etc.)
* extent-based set/clear functions (so you can take an extent map from an inode and mark the entire extent as allocated in a block bitmap)
* To provide backwards ABI compatibility, if the magic number in the first word of the bitmap indicates an old-style bitmap, dispatch to the old inline-style bitmap operators

An iterator makes a lot of sense and I hadn't thought of it, but we should definitely add it. It might also be a good idea to add an extent-based iterator, as well, since that would be even more CPU efficient for some callers.

> Some things we are targeting with the design:
> - use less RAM for sparsely populated bitmaps
> - be not much worse than bitmaps if they turn out not to be sparse
> - avoid allocating gigantic or many tiny chunks of memory
> - be dynamic in choosing the method for storing "set" bits

Yep, all good things. I hadn't really considered the requirement of dynamically choosing a method, but that's because I figured the tree-indexed extents data structure would hopefully be general purpose enough to work with a very large range of filesystems, and dynamism wasn't something I wanted to try to solve the first time around.

My current thinking favors a design like the io_manager, where you can have one io_manager (test_io) provide services where the actual back end work is done by another io_manager (i.e., unix_io). So I could imagine an "auto" bitmap type which automatically converts bitmap representations behind the scenes from an in-memory to an on-disk format, hopefully using a single in-memory format which is generally applicable to most cases (such as tree-indexed extents), and then once you go out to disk, it's all about correctness and completing the task, and only secondarily about performance.

But part of that is that while your ebitmap implementation has desirable properties in terms of scaling from in-memory sparse arrays to full bitmaps, I suspect a tree-indexed extents implementation has a wider range of applicability, so I was assuming that we wouldn't have to get to the dynamic switching part of the program for quite some time. (IIRC, all xfs_repair has right now is a simple bit-count compression scheme, and that seems to have been sufficient for them.)

BTW, if you're interested in implementing an extent-based b-tree, which will be the next low-hanging fruit in terms of reducing e2fsprogs's memory usage, note that we already have a red-black tree implementat
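
A rough sketch of the extent-based bitmap idea discussed above: set bits are stored as a sorted list of disjoint runs, with an extent-based "mark range" operation and an extent iterator. To stay short it uses a sorted array with binary search where the message proposes a b-tree or red-black tree index, so insertion here is linear in the number of runs. All names (`ext_bitmap`, `eb_mark_range`, `eb_test`, `eb_next_extent`) are hypothetical and this is not the e2fsprogs API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct ext { uint64_t start, len; };    /* bits [start, start+len) are set */

struct ext_bitmap {
    struct ext *v;      /* sorted by start; runs never overlap or touch */
    size_t n, cap;
};

/* Index of the first run whose end is >= pos (binary search). */
static size_t eb_lower(const struct ext_bitmap *b, uint64_t pos)
{
    size_t lo = 0, hi = b->n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (b->v[mid].start + b->v[mid].len < pos)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Set bits [start, start+len), merging with overlapping or adjacent runs.
 * Returns 0 on success, -1 if memory could not be allocated. */
int eb_mark_range(struct ext_bitmap *b, uint64_t start, uint64_t len)
{
    uint64_t end = start + len;
    size_t i, j;

    assert(len > 0 && end > start);     /* range must not wrap around */

    i = j = eb_lower(b, start);
    while (j < b->n && b->v[j].start <= end) {  /* absorb runs we touch */
        if (b->v[j].start < start)
            start = b->v[j].start;
        if (b->v[j].start + b->v[j].len > end)
            end = b->v[j].start + b->v[j].len;
        j++;
    }

    if (j == i) {                       /* nothing absorbed: open a slot */
        if (b->n == b->cap) {
            size_t cap = b->cap ? b->cap * 2 : 8;
            struct ext *v = realloc(b->v, cap * sizeof(*v));

            if (v == NULL)
                return -1;
            b->v = v;
            b->cap = cap;
        }
        memmove(&b->v[i + 1], &b->v[i], (b->n - i) * sizeof(*b->v));
        b->n++;
    } else if (j > i + 1) {             /* collapse runs i..j-1 into one */
        memmove(&b->v[i + 1], &b->v[j], (b->n - j) * sizeof(*b->v));
        b->n -= j - i - 1;
    }
    b->v[i].start = start;
    b->v[i].len = end - start;
    return 0;
}

/* Return 1 if the given bit is set, 0 otherwise. */
int eb_test(const struct ext_bitmap *b, uint64_t bit)
{
    size_t i;

    if (bit == UINT64_MAX)              /* cannot be set: ranges never wrap */
        return 0;
    i = eb_lower(b, bit + 1);           /* first run ending after bit */
    return i < b->n && b->v[i].start <= bit;
}

/* Extent iterator: start with *cursor == 0; returns 0 when exhausted. */
int eb_next_extent(const struct ext_bitmap *b, size_t *cursor,
                   uint64_t *start, uint64_t *len)
{
    if (*cursor >= b->n)
        return 0;
    *start = b->v[*cursor].start;
    *len = b->v[*cursor].len;
    (*cursor)++;
    return 1;
}
```

Starting from a zero-initialized `struct ext_bitmap`, marking [100, 110), then [200, 205), then [110, 200) leaves a single run starting at 100 with length 105, so memory use follows the number of runs rather than the number of bits. A caller walking the bitmap with `eb_next_extent` visits one entry per run instead of scanning [0 ... nbits].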
Performance degradation with FFSB between 2.6.20 and 2.6.21-rc7
Running benchmark tests (FFSB) on an ext4 filesystem, I noticed a performance degradation (about 15-20 percent) in sequential write tests between 2.6.19-rc6 and 2.6.21-rc4 kernels.

I ran the same tests on ext3 and XFS filesystems and I saw the same performance difference between the two kernel versions for these two filesystems.

I have also reproduced it between 2.6.20.7 and 2.6.21-rc7.
The FFSB tests run 16 threads, each creating 1GB files. The tests were done on the same x86_64 system, with the same kernel configuration and on the same scsi device. Below are the throughput values given by FFSB.

   kernel        XFS          ext3
   -----------------------------------
   2.6.20.7      48 MB/sec    44 MB/sec
   2.6.21-rc7    38 MB/sec    37 MB/sec

Did anyone else run across the problem?
Is there a known issue?

Valérie

(in attachment, my kernel configuration file)

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21-rc7
# Wed Apr 18 11:29:53 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_CPUSETS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_KMOD is not set
CONFIG_STOP_MACHINE=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_IO_TRACE is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_INTERNODE_CACHE_BYTES=128
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_HT=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
CONFIG_SMP=y
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_NODES_SHIFT=6
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NUMA_EMU=y
CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
CONFIG_DISCONTIGMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_DISCONTIGMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_NEED_MULTIPLE_NODES=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y
CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y
CONFIG_NR_CPUS=32
CONFIG_HOTPLUG_CPU=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x200000
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_REORD
Re: Interface for the new fallocate() system call
On Apr 17, 2007 18:25 +0530, Amit K. Arora wrote:
> On Fri, Mar 30, 2007 at 02:14:17AM -0500, Jakub Jelinek wrote:
> > Wouldn't
> > int fallocate(loff_t offset, loff_t len, int fd, int mode)
> > work on both s390 and ppc/arm? glibc will certainly wrap it and
> > reorder the arguments as needed, so there is no need to keep fd first.
>
> I think more people are comfortable with this approach.

Really? I thought from the last postings that "fd first, wrap on s390" was better.

> Since glibc
> will wrap the system call and export the "conventional" interface
> (with fd first) to applications, we may not worry about keeping fd first
> in kernel code. I am personally fine with this approach.

It would seem to make more sense to wrap the syscall on those architectures that can't handle the "conventional" interface (fd first).

> Still, if people have major concerns, we can think of getting rid of the
> "mode" argument itself. Anyhow we may, in future, need to have a policy
> based system call (say, for providing the goal block by applications for
> performance reasons). "mode" can then be made part of it.

We need at least mode="unallocate" or a separate funallocate() call to allow allocated-but-unwritten blocks to be unallocated without actually punching out written data.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
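
To illustrate the "wrap and reorder" idea being debated above: the sketch below shows a library-side wrapper that exports the conventional fd-first order while passing the two 64-bit arguments first to the kernel-facing entry point, as in Jakub's suggested prototype. `raw_fallocate` is a stand-in that only records its arguments; it is not a real syscall, and `fallocate_fd_first`, `struct raw_call` and `last_raw_call` are likewise hypothetical names. `int64_t` stands in for `loff_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of what reached the kernel-facing entry point. */
struct raw_call {
    int64_t offset, len;
    int fd, mode;
    int calls;
};

struct raw_call last_raw_call;

/* Stand-in for the kernel entry point with the 64-bit arguments first.
 * It only records its arguments; a real wrapper would trap into the
 * kernel here. */
long raw_fallocate(int64_t offset, int64_t len, int fd, int mode)
{
    last_raw_call.offset = offset;
    last_raw_call.len = len;
    last_raw_call.fd = fd;
    last_raw_call.mode = mode;
    last_raw_call.calls++;
    return 0;
}

/* Library-side wrapper: applications see fd first, the reordering is
 * hidden inside the C library. */
int fallocate_fd_first(int fd, int mode, int64_t offset, int64_t len)
{
    return (int)raw_fallocate(offset, len, fd, mode);
}
```

Either position in the thread ends with the same application-visible prototype; the disagreement is only about whether every architecture's kernel entry point takes the reordered arguments, or just the ones that cannot handle fd first.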