Re: nocow flags
Kyle Gates posted on Fri, 02 Mar 2012 11:29:40 -0600 as excerpted:

> I set the C (NOCOW) and z (Not_Compressed) flags on a folder but the
> extent counts of files contained there keep increasing.
> Said files are large and frequently modified but not changing in size.
> This does not happen when the filesystem is mounted with nodatacow.
>
> I'm using this as a workaround since subvolumes can't be mounted with
> different options simultaneously. ie. one with COW, one with nodatacow
>
> Any ideas why the flags are being ignored?
>
> I'm running 32bit 3.3rc4 with
> noatime,nodatasum,space_cache,autodefrag,inode_cache on a 3 disk RAID0
> data RAID1 metadata filesystem.

I'm not sure if it applies here or not, but there's a note on the wiki
under the defrag discussion that mentions that defrag, anyway, is
per-file, and defragging a dir doesn't defrag the files in that dir.
I'm /guessing/ the same thing may apply here, since these are per-file
flags.  There's a workaround suggested.  Let me see if I can find that
note again...

Found it in the problem FAQ:
http://btrfs.ipv5.de/index.php?title=Problem_FAQ#Defragmenting_a_directory_doesn.27t_work

> Defragmenting a directory doesn't work
>
> Running this:
>
>   # btrfs filesystem defragment ~/stuff
>
> doesn't defragment the contents of the directory.  This is by design.
> btrfs fi defrag operates on the single filesystem object passed to it.
> This means that the command defragments just the metadata held by the
> directory object, and not the contents of the directory.
>
> If you want to defragment the contents of the directory, something
> like this would be more useful:
>
>   # find -type f -xdev -print0 | xargs -0 btrfs fi defrag

Perhaps you need to do something similar to set the flags on all the
files under a specific dir?

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."
  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
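By analogy with the FAQ's find|xargs defrag workaround, the per-file flags could be applied the same way. A hedged sketch (not from the thread): note that on btrfs the chattr +C (NOCOW) attribute is generally only honoured for files that are empty when the flag is set, which may itself explain the growing extent counts Kyle reports. The demo below runs against a scratch directory and guards the chattr calls so it is a no-op outside btrfs.

```shell
# Illustration only: apply the NOCOW flag per file, mirroring the
# find|xargs defrag workaround from the FAQ above.
dir=$(mktemp -d)                        # stand-in for the real directory
touch "$dir/file1" "$dir/file2"
chattr +C "$dir" 2>/dev/null || true    # new files created here inherit +C; no-op outside btrfs
# Per-file pass over existing files (on btrfs, only effective if they are empty):
find "$dir" -xdev -type f -print0 | xargs -0 -r chattr +C 2>/dev/null || true
flagged=$(find "$dir" -xdev -type f | wc -l)
echo "$flagged files visited"
```

On a real btrfs mount, drop the guards and point `dir` at the directory in question; for files that already contain data, re-creating them after flagging the directory is the reliable way to get NOCOW behaviour.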
Re: subvolume nomenclature
On Fri, Mar 2, 2012 at 1:44 PM, Brian J. Murrell wrote:
> On 12-03-02 08:36 AM, cwillu wrote:
>>
>> Try btrfs sub delete /etc/apt/oneiric, assuming that that's the path
>> where you actually see it.
>
> Well, there is a root filesystem at /etc/apt/oneiric:
>
> # ls /etc/apt/oneiric/
> bin   etc         initrd.img.old  mnt   root  selinux  tmp  vmlinuz
> boot  home        lib             opt   run   srv      usr  vmlinuz.old
> dev   initrd.img  media           proc  sbin  sys      var
>
> but it doesn't delete:
>
> # btrfs subvolume delete /etc/apt/oneiric
> Delete subvolume '/etc/apt/oneiric'
> ERROR: cannot delete '/etc/apt/oneiric' - Device or resource busy

root@repository:~/foo$ btrfs sub create bar
Create subvolume './bar'
root@repository:~/foo$ cd bar
root@repository:~/foo/bar$ btrfs sub del /home/cwillu/foo/bar
Delete subvolume '/home/cwillu/foo/bar'
ERROR: cannot delete '/home/cwillu/foo/bar' - Device or resource busy

It's likely there's at least one process with an open handle to that
subvolume (even just a shell that has it as its current working
directory).

  lsof | grep /etc/apt/oneiric

should tell you what you need.
Re: getdents - ext4 vs btrfs performance
On Fri, Mar 02, 2012 at 02:32:15PM -0500, Ted Ts'o wrote:
> On Fri, Mar 02, 2012 at 09:26:51AM -0500, Chris Mason wrote:
> >
> > filefrag will tell you how many extents each file has, any file with
> > more than one extent is interesting.  (The ext4 crowd may have better
> > suggestions on measuring fragmentation.)
>
> You can get a *huge* amount of information (probably more than you'll
> want to analyze) by doing this:
>
>   e2fsck -nf -E fragcheck /dev/ >& /tmp/fragcheck.out
>
> I haven't had time to do this in a while, but a while back I used this
> to debug the writeback code with an eye towards reducing
> fragmentation.  At the time I was trying to optimize the easiest case
> possible: you start with an empty file system, and then copy all of
> the data from another file system onto it using rsync -avH.
>
> It would be worthwhile to see what happens with files written by the
> compiler and linker.  Given that libelf tends to write .o files
> non-sequentially, and without telling us how big the space is in
> advance, I could well imagine that we're not doing the best job
> avoiding free space fragmentation, which eventually leads to extra
> file system aging.

I just realized that I confused things.  He's doing a read on the
results of a cp -a to a fresh FS, so there's no way the compiler/linker
are causing trouble.

> It would be interesting to have a project where someone added
> fallocate() support to libelf, and then added some heuristics to
> ext4 so that if a file is fallocated to a precise size, or if the file
> is fully written and closed before writeback begins, we use this to
> more efficiently pack the space used by the files in the block
> allocator.  This is a place where I would not be surprised if XFS
> had better code for avoiding accelerated file system aging, and where
> we could do better with ext4 with some development effort.

The part I don't think any of us have solved is writing back the files
in a good order after we've fallocated the blocks.  So this will
probably be great for reads and not so good for writes.

> Of course, it might also be possible to hack around this by simply
> using VPATH and dropping your build files in a separate place from
> your source files, and periodically reformatting the file system where
> your build tree lives.  (As a side note, something that works well for
> me is to use an SSD for my source files, and a separate 5400rpm HDD
> for my build tree.  That allows me to use a smaller and more
> affordable SSD, and since the object files can be written
> asynchronously by the writeback threads, while the compiler can't move
> forward until it gets file data from the .c or .h file, it gets me the
> best price/performance for a laptop build environment.)

mkfs for defrag ;)  It's the only way to be sure.

> BTW, I suspect we could make acp even more efficient by teaching it to
> use the FIEMAP ioctl to map out the data blocks for all of the files
> in the source file system, and then copy the files (or perhaps even
> parts of files) in a read order that reduces seeking on the source
> drive.

acp does have a -b mode where it fibmaps (I was either lazy or it is
older than fiemap, I forget) the first block in each file, and uses
that to sort.  It does help if the file blocks aren't ordered well wrt
their inode numbers, but not if the files are fragmented.

It's also worth mentioning that acp doesn't actually cp.  I never got
that far.  It was supposed to be the perfect example of why everything
should be done via aio, but it just ended up demonstrating that
ordering by inode number and leveraging kernel/hardware readahead were
more important.

-chris
Re: subvolume nomenclature
On 12-03-02 08:36 AM, cwillu wrote:
>
> Try btrfs sub delete /etc/apt/oneiric, assuming that that's the path
> where you actually see it.

Well, there is a root filesystem at /etc/apt/oneiric:

# ls /etc/apt/oneiric/
bin   etc         initrd.img.old  mnt   root  selinux  tmp  vmlinuz
boot  home        lib             opt   run   srv      usr  vmlinuz.old
dev   initrd.img  media           proc  sbin  sys      var

but it doesn't delete:

# btrfs subvolume delete /etc/apt/oneiric
Delete subvolume '/etc/apt/oneiric'
ERROR: cannot delete '/etc/apt/oneiric' - Device or resource busy

and doesn't unmount:

# umount /etc/apt/oneiric
umount: /etc/apt/oneiric: not mounted

Cheers,
b.
Re: getdents - ext4 vs btrfs performance
On Fri, Mar 02, 2012 at 09:26:51AM -0500, Chris Mason wrote:
>
> filefrag will tell you how many extents each file has, any file with
> more than one extent is interesting.  (The ext4 crowd may have better
> suggestions on measuring fragmentation.)

You can get a *huge* amount of information (probably more than you'll
want to analyze) by doing this:

  e2fsck -nf -E fragcheck /dev/ >& /tmp/fragcheck.out

I haven't had time to do this in a while, but a while back I used this
to debug the writeback code with an eye towards reducing fragmentation.
At the time I was trying to optimize the easiest case possible: you
start with an empty file system, and then copy all of the data from
another file system onto it using rsync -avH.

It would be worthwhile to see what happens with files written by the
compiler and linker.  Given that libelf tends to write .o files
non-sequentially, and without telling us how big the space is in
advance, I could well imagine that we're not doing the best job
avoiding free space fragmentation, which eventually leads to extra
file system aging.

It would be interesting to have a project where someone added
fallocate() support to libelf, and then added some heuristics to ext4
so that if a file is fallocated to a precise size, or if the file is
fully written and closed before writeback begins, we use this to more
efficiently pack the space used by the files in the block allocator.
This is a place where I would not be surprised if XFS had better code
for avoiding accelerated file system aging, and where we could do
better with ext4 with some development effort.

Of course, it might also be possible to hack around this by simply
using VPATH and dropping your build files in a separate place from your
source files, and periodically reformatting the file system where your
build tree lives.  (As a side note, something that works well for me is
to use an SSD for my source files, and a separate 5400rpm HDD for my
build tree.  That allows me to use a smaller and more affordable SSD,
and since the object files can be written asynchronously by the
writeback threads, while the compiler can't move forward until it gets
file data from the .c or .h file, it gets me the best price/performance
for a laptop build environment.)

BTW, I suspect we could make acp even more efficient by teaching it to
use the FIEMAP ioctl to map out the data blocks for all of the files in
the source file system, and then copy the files (or perhaps even parts
of files) in a read order that reduces seeking on the source drive.

- Ted
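The fallocate() idea above can be illustrated from userspace with the fallocate(1) utility (a sketch of the allocation hint itself, not of actual libelf changes): preallocating the final size up front lets the block allocator reserve space in one contiguous region before any data is written.

```shell
# Sketch: tell the allocator the final size before writing any data.
# A linker doing the same via posix_fallocate() would give the block
# allocator the same hint.
out=$(mktemp)
fallocate -l 1M "$out" 2>/dev/null \
  || truncate -s 1M "$out"       # fall back where fallocate is unsupported
size=$(stat -c %s "$out")
echo "$size"                     # 1048576
```

A file preallocated this way occupies (ideally) a single extent, which `filefrag` on the resulting file can confirm on filesystems that support the call.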
nocow flags
I set the C (NOCOW) and z (Not_Compressed) flags on a folder but the
extent counts of files contained there keep increasing.  Said files are
large and frequently modified but not changing in size.  This does not
happen when the filesystem is mounted with nodatacow.

I'm using this as a workaround since subvolumes can't be mounted with
different options simultaneously, i.e. one with COW, one with
nodatacow.

Any ideas why the flags are being ignored?

I'm running 32bit 3.3rc4 with
noatime,nodatasum,space_cache,autodefrag,inode_cache on a 3 disk RAID0
data / RAID1 metadata filesystem.

Thanks,
Kyle
Re: getdents - ext4 vs btrfs performance
On Fri, Mar 02, 2012 at 03:16:12PM +0100, Jacek Luczak wrote:
> 2012/3/2 Chris Mason :
> > On Fri, Mar 02, 2012 at 11:05:56AM +0100, Jacek Luczak wrote:
> >>
> >> I've tried both in tests.  The subject is acp and spd_readdir used
> >> with tar, all on ext4:
> >> 1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
> >> 2) spd_readdir: http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readir.png
> >> 3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png
> >>
> >> The acp looks much better than spd_readdir, but the directory copy
> >> with spd_readdir decreased to 52m 39s (30 min less).
> >
> > Do you have stats on how big these files are, and how fragmented
> > they are?  For acp and spd to give us this, I think something has
> > gone wrong at writeback time (creating individual fragmented files).
>
> How big? Which files?

All the files you're reading ;)

filefrag will tell you how many extents each file has; any file with
more than one extent is interesting.  (The ext4 crowd may have better
suggestions on measuring fragmentation.)

Since you mention this is a compile farm, I'm guessing there are a
bunch of .o files created by parallel builds.  There are a lot of
chances for delalloc and the kernel writeback code to do the wrong
thing here.

-chris
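Chris's filefrag suggestion can be scripted to flag only the interesting files, i.e. those with more than one extent. A sketch (the awk field split assumes e2fsprogs filefrag's usual "path: N extents found" output; the demo runs on a throwaway directory, and a freshly written small file should report a single extent and thus produce no output):

```shell
# List files with more than one extent under a directory tree.
dir=$(mktemp -d)                 # stand-in for the real tree
printf 'hello\n' > "$dir/small"  # a fresh one-extent file
frag=$(find "$dir" -xdev -type f -print0 \
  | xargs -0 -r filefrag 2>/dev/null \
  | awk -F': ' '$2+0 > 1') || true     # keep only ">1 extents" lines
echo "fragmented: ${frag:-none}"
```

On the compile farm in question, pointing `dir` at the build tree and counting the output lines would give a quick fragmentation census.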
Re: getdents - ext4 vs btrfs performance
2012/3/2 Chris Mason :
> On Fri, Mar 02, 2012 at 11:05:56AM +0100, Jacek Luczak wrote:
>>
>> I've tried both in tests.  The subject is acp and spd_readdir used
>> with tar, all on ext4:
>> 1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
>> 2) spd_readdir: http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readir.png
>> 3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png
>>
>> The acp looks much better than spd_readdir, but the directory copy
>> with spd_readdir decreased to 52m 39s (30 min less).
>
> Do you have stats on how big these files are, and how fragmented they
> are?  For acp and spd to give us this, I think something has gone
> wrong at writeback time (creating individual fragmented files).

How big? Which files?

-Jacek
Re: getdents - ext4 vs btrfs performance
On Fri, Mar 02, 2012 at 11:05:56AM +0100, Jacek Luczak wrote:
>
> I've tried both in tests.  The subject is acp and spd_readdir used
> with tar, all on ext4:
> 1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
> 2) spd_readdir: http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readir.png
> 3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png
>
> The acp looks much better than spd_readdir, but the directory copy
> with spd_readdir decreased to 52m 39s (30 min less).

Do you have stats on how big these files are, and how fragmented they
are?  For acp and spd to give us this, I think something has gone wrong
at writeback time (creating individual fragmented files).

-chris
Re: subvolume nomenclature
On Fri, Mar 2, 2012 at 7:31 AM, Brian J. Murrell wrote:
> I seem to have the following subvolumes of my filesystem:
>
> # btrfs sub li /
> ID 256 top level 5 path @
> ID 257 top level 5 path @home
> ID 258 top level 5 path @/etc/apt/oneiric
>
> I *think* the last one is there due to a:
>
> # btrfsctl -s oneiric /
>
> that I did prior to doing an upgrade.  I can't seem to figure out the
> nomenclature to delete it though:
>
> # btrfs sub de /@/etc/apt/oneiric

Try btrfs sub delete /etc/apt/oneiric, assuming that that's the path
where you actually see it.
subvolume nomenclature
I seem to have the following subvolumes of my filesystem:

# btrfs sub li /
ID 256 top level 5 path @
ID 257 top level 5 path @home
ID 258 top level 5 path @/etc/apt/oneiric

I *think* the last one is there due to a:

# btrfsctl -s oneiric /

that I did prior to doing an upgrade.  I can't seem to figure out the
nomenclature to delete it though:

# btrfs sub de /@/etc/apt/oneiric
ERROR: error accessing '/@/etc/apt/oneiric'

I've tried lots of other combinations with no luck.  Can anyone give me
a hint (or the answer :-) )?

Cheers,
b.
Re: [PATCH] [RFC] Add btrfs autosnap feature
cwillu wrote (ao):
> > While developing snapper I faced similar problems and looked at
> > find-new but unfortunately it is not sufficient.  E.g. when a file
> > is deleted find-new does not report anything, see the reply to my
> > mail here one year ago [1].  Also for newly created empty files
> > find-new reports nothing, the same with metadata changes.
>
> For a system-wide undo'ish sort of thing that I think autosnapper is
> going for, it should work quite nicely, but you're right that it
> doesn't help a whole lot with a backup system.  It can't tell you
> which files were touched or deleted, but it will still tell you that
> _something_ in the subvolume was touched, modified or deleted (at
> least, as of the last commit), which is all you need if you're only
> ever comparing it to its source.

Tar can remove deleted files for you during a restore.  This is (imho)
a really cool feature of tar, and I use it in combination with btrfs
snapshots.

https://www.gnu.org/software/tar/manual/tar.html#SEC94

"The option `--listed-incremental' instructs tar to operate on an
incremental archive with additional metadata stored in a standalone
file, called a snapshot file.  The purpose of this file is to help
determine which files have been changed, added or deleted since the
last backup."

"When extracting from the incremental backup GNU tar attempts to
restore the exact state the file system had when the archive was
created.  In particular, it will delete those files in the file system
that did not exist in their directories when the archive was created."

	Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net
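A minimal end-to-end sketch of the behaviour Sander quotes, using GNU tar: a level-0 dump, a deletion, a level-1 dump, and a restore in which extracting the incremental removes the deleted file. All paths here are scratch temporaries, purely illustrative.

```shell
# GNU tar incremental dump/restore round-trip demonstrating file deletion.
src=$(mktemp -d); dst=$(mktemp -d)
snap=$(mktemp -u)                 # snapshot file must not exist before level 0
arc0=$(mktemp -u); arc1=$(mktemp -u)
echo one > "$src/keep"
echo two > "$src/gone"
tar -C "$src" -cf "$arc0" --listed-incremental="$snap" .   # level-0 dump
rm "$src/gone"
tar -C "$src" -cf "$arc1" --listed-incremental="$snap" .   # level-1 dump
# Restore: passing --listed-incremental on extraction enables the
# "delete files that no longer exist" behaviour; /dev/null is conventional.
tar -C "$dst" -xf "$arc0" --listed-incremental=/dev/null
tar -C "$dst" -xf "$arc1" --listed-incremental=/dev/null
ls "$dst"                         # only 'keep' remains; 'gone' was deleted
```

Combined with read-only btrfs snapshots as the dump source, each level-N archive is taken against a stable view of the filesystem, which is presumably how Sander uses it.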
Re: filesystem full when it's not? out of inodes? huh?
On Fri, Mar 2, 2012 at 6:50 PM, Brian J. Murrell wrote:
> Is 2010-06-01 really the last time the tools were considered
> stable or are Ubuntu just being conservative and/or lazy about
> updating?

The last one :)

Or probably no one has bugged them enough and pointed out that they're
already using a git snapshot anyway, and that there are many new
features in the current git version of btrfs-tools.

I have been compiling my own kernel (though I just recently switched to
Precise's kernel) and btrfs-progs for quite some time, so even if
Ubuntu doesn't provide an updated package it wouldn't matter much to
me.

If it's important to you, you could file a bug report in Launchpad
asking for an update.  Even Debian testing has an updated version,
which you might be able to use: http://packages.debian.org/btrfs-tools
Or create your own PPA with an updated version (or at least a rebuild
of Debian's version).

-- 
Fajar
Re: [PATCH] [RFC] Add btrfs autosnap feature
>> Perhaps all that is unnecessary: rather than doing the walk, why not
>> make use of btrfs subvolume find-new (or rather, the syscalls it
>> uses)?
>
> While developing snapper I faced similar problems and looked at
> find-new but unfortunately it is not sufficient.  E.g. when a file
> is deleted find-new does not report anything, see the reply to my
> mail here one year ago [1].  Also for newly created empty files
> find-new reports nothing, the same with metadata changes.
>
> If I'm wrong or find-new gets extended I'm happy to implement it in
> snapper.

For a system-wide undo'ish sort of thing that I think autosnapper is
going for, it should work quite nicely, but you're right that it
doesn't help a whole lot with a backup system.  It can't tell you which
files were touched or deleted, but it will still tell you that
_something_ in the subvolume was touched, modified or deleted (at
least, as of the last commit), which is all you need if you're only
ever comparing it to its source.

-- 
Carey
Re: filesystem full when it's not? out of inodes? huh?
On 12-02-26 06:00 AM, Hugo Mills wrote:
>
> The option that nobody's mentioned yet is to use mixed mode.  This
> is the -M or --mixed option when you create the filesystem.  It's
> designed specifically for small filesystems, and removes the
> data/metadata split for more efficient packing.

Cool.

> As mentioned before, you probably need to upgrade to 3.2 or 3.3-rc5
> anyway.  There were quite a few fixes in the ENOSPC/allocation area
> since then.

I've upgraded to the Ubuntu Precise kernel, which looks to be 3.2.6,
with btrfs-tools 0.19+20100601-3ubuntu3 -- which would appear to be a
btrfs-progs snapshot from 2010-06-01 -- and (unsurprisingly) I don't
see the -M option in mkfs.btrfs.

So I went digging, and I just wanted to verify what I think I am
seeing.  Looking at

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=67377734fd24c32cbdfeb697c2e2bd7fed519e75

it would appear that the mixed data+metadata code landed in the kernel
back in Sep. of 2010, is that correct?  And looking at

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commit;h=b8802ae3fa0c70d4cfc3287ed07479925973b0ac

the userspace support for this landed in Dec. of 2010, is that right?

If my archaeology is correct, then I only need to update my
btrfs-tools, yes?  Is 2010-06-01 really the last time the tools were
considered stable, or are Ubuntu just being conservative and/or lazy
about updating?

Cheers,
b.
Re: [PATCH] [RFC] Add btrfs autosnap feature
On Thu, Mar 01, 2012 at 05:54:40AM -0600, cwillu wrote:
> There doesn't appear to be any reason for the scratch file to exist at
> all (one can build up the hash while reading the directories), and
> keeping a scratch file in /etc/ is poor practice in the first place
> (that's what /tmp and/or /var/run is for).  It's also a lot of io to
> stat every file in the subvolume every time you make a snapshot, and
> I'm not convinced that the walk is actually correctly implemented:
> what stops an autosnap of / from including all of /proc and /sys in
> the hash?
>
> Perhaps all that is unnecessary: rather than doing the walk, why not
> make use of btrfs subvolume find-new (or rather, the syscalls it
> uses)?

While developing snapper I faced similar problems and looked at
find-new, but unfortunately it is not sufficient.  E.g. when a file is
deleted, find-new does not report anything; see the reply to my mail
here one year ago [1].  Also for newly created empty files find-new
reports nothing, the same with metadata changes.

If I'm wrong or find-new gets extended, I'm happy to implement it in
snapper.

Regards,
Arvin

[1] http://www.spinics.net/lists/linux-btrfs/msg08683.html

-- 
Arvin Schnell, Senior Software Engineer, Research & Development
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild,
Felix Imendörffer, HRB 16746 (AG Nürnberg)
Maxfeldstraße 5, 90409 Nürnberg, Germany
Re: getdents - ext4 vs btrfs performance
2012/3/1 Chris Mason :
> On Wed, Feb 29, 2012 at 11:44:31PM -0500, Theodore Tso wrote:
>> You might try sorting the entries returned by readdir by inode number
>> before you stat them.  This is a long-standing weakness in ext3/ext4,
>> and it has to do with how we added hashed tree indexes to directories
>> in (a) a backwards compatible way, that (b) was POSIX compliant with
>> respect to adding and removing directory entries concurrently with
>> reading all of the directory entries using readdir.
>>
>> You might try compiling spd_readdir from the e2fsprogs source tree
>> (in the contrib directory):
>>
>> http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob;f=contrib/spd_readdir.c;h=f89832cd7146a6f5313162255f057c5a754a4b84;hb=d9a5d37535794842358e1cfe4faa4a89804ed209
>>
>> ... and then using that as an LD_PRELOAD, and see how that changes
>> things.
>>
>> The short version is that we can't easily do this in the kernel since
>> it's a problem that primarily shows up with very big directories, and
>> using non-swappable kernel memory to store all of the directory
>> entries and then sort them so they can be returned in inode number
>> order just isn't practical.  It is something which can easily be done
>> in userspace, though, and a number of programs (including mutt for
>> its Maildir support) do it, and it helps greatly for workloads where
>> you are calling readdir() followed by something that needs to access
>> the inode (i.e., stat, unlink, etc.)
>
> For reading the files, the acp program I sent him tries to do
> something similar.  I had forgotten about spd_readdir though; we
> should consider hacking that into cp and tar.
>
> One interesting note is that the page cache used to help here.
> Picture two tests:
>
> A) time tar cf /dev/zero /home
>
> and
>
> cp -a /home /new_dir_in_new_fs
> unmount / flush caches
> B) time tar cf /dev/zero /new_dir_in_new_fs
>
> On ext, the time for B used to be much faster than the time for A,
> because the files would get written back to disk in roughly htree
> order.  Based on Jacek's data, that isn't true anymore.

I've tried both in tests.  The subject is acp and spd_readdir used with
tar, all on ext4:

1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
2) spd_readdir: http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readir.png
3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png

The acp looks much better than spd_readdir, but the directory copy with
spd_readdir decreased to 52m 39s (30 min less).

-Jacek
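Ted's sort-by-inode trick, which spd_readdir implements as an LD_PRELOAD, can also be approximated in plain shell with GNU find (a sketch; a real run would point `dir` at the large directory being scanned rather than the three-file demo used here):

```shell
# Stat directory entries in inode order rather than readdir order, so
# the inode-table reads stay roughly sequential on ext3/4.
# GNU find's -printf '%i' prints each entry's inode number.
dir=$(mktemp -d)                       # demo directory; use a real one
touch "$dir/a" "$dir/b" "$dir/c"
ordered=$(find "$dir" -maxdepth 1 -type f -printf '%i %p\n' \
  | sort -n | cut -d' ' -f2-)          # paths, ascending inode number
printf '%s\n' "$ordered" | xargs -d '\n' -r stat --format='%i %n' > /dev/null
```

The same ordering is what acp applies before reading file contents; mutt does the equivalent internally for Maildir folders.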
Re: getdents - ext4 vs btrfs performance
2012/3/1 Ted Ts'o :
> On Thu, Mar 01, 2012 at 03:43:41PM +0100, Jacek Luczak wrote:
>>
>> Yep, ext4 is close to my wife's closet.
>>
>
> Were all of the file systems freshly laid down, or was this an aged
> ext4 file system?

Always fresh, recreated for each test -- that's why it takes quite a
lot of time, as I had to copy the test dir back in place each time.
The environment is kept as consistent as possible across all tests, so
the values should be credible.

> Also you should beware that if you have a workload which is heavy
> parallel I/O, with lots of random read/write accesses to small files,
> a benchmark using tar might not be representative of what you will
> see in production --- different file systems have different strengths
> and weaknesses --- and the fact that ext3/ext4's readdir() returns
> inodes in a non-optimal order for stat(2) or unlink(2) or file copy
> in the cold cache case may not matter as much as you think in a build
> server.  (i.e., the directories that do need to be searched will
> probably be serviced out of the dentry cache, etc.)

The set of tests was not chosen just to find the best fs for build
purposes.  For pure builds ext4 is, as of now, the best, and most of
the points you've made above apply here.  We've performed real tests in
a clone of the production environment.  The results are not that
surprising; one can find similar ones in e.g. the Phoronix tests.
We've done tests of XFS vs ext[34] and of ext4 vs btrfs.  Taking into
account only software compilation, ext4 rocks.  Btrfs is only a few
seconds slower (at most 5 on average).  The choice back then was ext4,
due to its more mature foundations and its support in RHEL.

Why are we looking for a new one?  Well, the build environment is not
only doing software builds.  There are e.g. some strange tests running
in parallel, code analysis, etc.  Users are doing damn odd things
there ... they are using Java; you can imagine how bad a zombie Java
process can be.  We've failed to isolate each use case and create a
profiled environment, thus we need to find something that makes sense
for all use cases.  The previous tests showed that ext[34] rocks on
compilation timing, but not all-around.  Also, one needs to remember
that the fs content changes often.  The ext3 was ageing within 4-6
months of use.  XFS, on the other hand, was great all-around, though
not in compilation timings.  Roughly 10% is not that much, but if a
host is doing builds 24h/7, then after a few days we'd be well behind
the ext[34] clone environment.  Btrfs was only tested on compilation
timings, not in general use.

We've created a simple test case for compilation.  It's not quite the
same as what we have in the real environment, but it's a good baseline
(the kernel build system is too perfect).  Simply parallel kernel
builds with a randomly chosen allyesconfig or allmodconfig.  Below are
the seekwatcher graphs of around 1h of the tests running.  There were
10 builds (kernels 2.6.20-2.6.29) running with three parallel threads:

1) ext4: http://91.234.146.107/~difrost/seekwatcher/kt_ext4.png
2) btrfs: http://91.234.146.107/~difrost/seekwatcher/kt_btrfs.png
3) both: http://91.234.146.107/~difrost/seekwatcher/kt_btrfs_ext4.png

The above graphs show that ext4 is ahead in this ,,competition''.  I
will try to set up a real build env there to see how those two compare.

-Jacek