Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Tue, 14 Aug 2012 15:54:17 +0200, Daniel Troeder wrote: sys-process/incron ? Uh... didn't know that one! ... very interesting :) Have you used it? Yes, but only for fairly infrequently written locations. How does it perform if there are lots of modifications going on? Does it have a throttle against fork bombing? I have no idea how well it scales. -- Neil Bothwick Hors d'oeuvres: 3 sandwiches cut into 40 pieces.
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Tue, 2012-08-14 at 18:36 +0200, Helmut Jarausch wrote: On 08/14/2012 04:07:39 AM, Adam Carter wrote: I think btrfs probably is meant to provide a lot of the modern features like reiser4 or xfs Unfortunately btrfs is still generally slower than ext4, for example. Check out http://openbenchmarking.org/, eg http://openbenchmarking.org/s/ext4%20btrfs The OS will use any spare RAM for disk caching, so if there's not much else running on that box, most of your content will be served from RAM. It may be that whatever fs you choose won't make that much of a difference anyway. If one can run a recent kernel (3.5.x), btrfs seems quite stable (it's used by some distributions and by Oracle for real work). Most benchmarks don't use compression since other filesystems can't use it. But that's unfair: with compression, one needs to read much less data (my /usr partition is less than 50% of the size of an ext4 partition; savings on the root partition are even higher). I'm using the mount options compress=lzo,noacl,noatime,autodefrag,space_cache which require a recent kernel. I'd give it a try. Helmut. What's the latest on fsck tools for BTRFS? Useful ones are still not available, right? The reason I am asking is that it is not an easy question to google, and my last attempt to use BTRFS for serious work ended in tears when I couldn't rescue a corrupted file system. BillK
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Wed, 2012-08-15 at 15:31 +0800, Bill Kenworthy wrote: [...] What's the latest on fsck tools for BTRFS? Useful ones are still not available, right? [...] Sorry, replying to myself to clarify: I sent this while I was reading the backlog, before the statement that the tools are incomplete. My question is more along the lines of: do they work? (Which was answered as "I do not know" in the posted links, which are probably old.) Another point I just saw is its inability to support swapfiles. Also, in the past OO would not compile on a btrfs (/tmp/portage) filesystem, as it did something that basically killed everything. Other packages were fine.
Then there was a certain man page I couldn't back up to a btrfs file system, and ~/.gvfs files that hung the system when I tried to put them on btrfs. Hopefully these have been fixed. BillK
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On 13.08.2012 16:53, Michael Hampicke wrote: 2012/8/13 Daniel Troeder dan...@admin-box.com 3rd thought: purging old files with find? your cache system should have some kind of DB that holds that information. 3: Well, it's a 3rd party application that - in theory - should take care of removing old files. Sadly, it does not work as it's supposed to; as time passes, the number of orphans grows :( There is also the possibility to write a really small daemon (less than 50 lines of C) that registers with inotify for the entire fs and journals the file activity to a sqlite-db. A simple SQL query from a cron/bash script will then give you all the files to delete, with paths. It will probably be less work to write the daemon than to do 40 fs-benchmarks - and the result will be the most efficient.
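A hedged sketch of the sqlite half of this idea; the table name, paths and 30-day cutoff are all invented for illustration, and the daemon (or incron/inotifywait) would be doing the INSERTs as events arrive:

```shell
# Throwaway journal DB with one row per cached file and its creation time.
DB=$(mktemp /tmp/cache-journal.XXXXXX.db)
sqlite3 "$DB" 'CREATE TABLE files (path TEXT PRIMARY KEY, ctime INTEGER);'

# Simulate what the inotify daemon would have recorded: one old, one new file.
now=$(date +%s)
sqlite3 "$DB" "INSERT INTO files VALUES ('/cache/20/2022/old.jpg', $now - 40*86400);"
sqlite3 "$DB" "INSERT INTO files VALUES ('/cache/20/2022/new.jpg', $now - 1*86400);"

# The cron side: everything older than 30 days. Pipe this into
# "xargs -d '\n' rm -f" (and a DELETE on the table) in the real job.
expired=$(sqlite3 "$DB" \
    "SELECT path FROM files WHERE ctime < strftime('%s','now') - 30*86400;")
echo "$expired"
rm -f "$DB"
```

The point is that the expiry scan becomes an index lookup instead of a 2-million-inode `find` walk.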
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Tue, 14 Aug 2012 10:21:54 +0200, Daniel Troeder wrote: There is also the possibility to write a really small daemon (less than 50 lines of C) that registers with inotify for the entire fs and journals the file activity to a sqlite-db. sys-process/incron ? -- Neil Bothwick A friend of mine sent me a postcard with a satellite photo of the entire planet on it, and on the back he wrote, Wish you were here.
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am 14.08.2012 11:46, schrieb Neil Bothwick: On Tue, 14 Aug 2012 10:21:54 +0200, Daniel Troeder wrote: There is also the possibility to write a really small daemon (less than 50 lines of C) that registers with inotify for the entire fs and journals the file activity to a sqlite-db. sys-process/incron ? I think in order to make it work, you have to increase the number of watches available to inotify. See /proc/sys/fs/inotify/max_user_watches Regards, Florian Philipp
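A quick way to check and raise that limit; the 2097152 value below is only a guess scaled to the ~2 million files in this thread, not a recommendation:

```shell
# current per-user inotify watch limit (often a low default like 8192)
limit=$(cat /proc/sys/fs/inotify/max_user_watches)
echo "$limit"

# raising it needs root, e.g.:
#   sysctl fs.inotify.max_user_watches=2097152
# and to persist across reboots:
#   echo 'fs.inotify.max_user_watches = 2097152' >> /etc/sysctl.conf
```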
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On 14.08.2012 11:46, Neil Bothwick wrote: On Tue, 14 Aug 2012 10:21:54 +0200, Daniel Troeder wrote: There is also the possibility to write a really small daemon (less than 50 lines of C) that registers with inotify for the entire fs and journals the file activity to a sqlite-db. sys-process/incron ? Uh... didn't know that one! ... very interesting :) Have you used it? How does it perform if there are lots of modifications going on? Does it have a throttle against fork bombing? must-read-myself-a-little. An incron line like # sqlite3 /file.sql 'INSERT filename, date INTO table' would be inefficient, because it spawns lots of processes, but it would be very nice for simply testing out the idea. Then a # sqlite3 /file.sql 'SELECT filename FROM table WHERE date < date-30days' or something to get the files older than 30 days, and voilà :)
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am 13.08.2012 20:18, schrieb Michael Hampicke: Am 13.08.2012 19:14, schrieb Florian Philipp: Am 13.08.2012 16:52, schrieb Michael Mol: On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke mgehampi...@gmail.com wrote: Have you indexed your ext4 partition? # tune2fs -O dir_index /dev/your_partition # e2fsck -D /dev/your_partition Hi, the dir_index is active. I guess that's why delete operations take as long as they take (the index has to be updated every time) 1) Scan for files to remove 2) disable index 3) Remove files 4) enable index ? -- :wq Other things to think about:

1. Play around with data=journal/writeback/ordered. IIRC, data=journal actually used to improve performance depending on the workload, as it delays random IO in favor of sequential IO (when updating the journal).
2. Increase the journal size.
3. Take a look at `man 1 chattr`, especially the 'T' attribute. Of course this only helps after re-allocating everything.
4. Try parallelizing. Ext4 requires relatively few locks nowadays (since 2.6.39 IIRC). For example:
   find $TOP_DIR -mindepth 1 -maxdepth 1 -print0 | \
     xargs -0 -n 1 -r -P 4 -I '{}' find '{}' -type f
5. Use a separate device for the journal.
6. Temporarily deactivate the journal with tune2fs, similar to MM's idea.

Regards, Florian Philipp

Trying out different journal options was already on my list, but the man page on chattr regarding the 'T' attribute is an interesting read. Definitely worth trying. Parallelizing multiple finds was something I already did, but the only thing that increased was the IO wait :) But now, having read all the suggestions in this thread, I might try it again. A separate device for the journal is a good idea, but not possible atm (the machine is abroad in a data center). Something else I just remembered. I guess it doesn't help you with your current problem, but it might come in handy when working with such large cache dirs: I once wrote a script that sorts files by their starting physical block.
This improved reading them quite a bit (2 minutes instead of 11 minutes for copying the portage tree). It's a terrible kludge, will probably fail when passing FS boundaries or a thousand other oddities, and requires root for some very scary programs. I never had the time to finish an improved C version. Anyway, maybe it helps you:

#!/bin/bash
#
# Example below copies /usr/portage/* to /tmp/portage.
# Replace /usr/portage with the input directory.
# Replace `cpio` with whatever does the actual work. Input is a
# \0-delimited file list.
#
FIFO=/tmp/$(uuidgen).fifo
mkfifo $FIFO
find /usr/portage -type f -fprintf $FIFO 'bmap <%i> 0\n' -print0 |
  tr '\n\0' '\0\n' |
  paste <( debugfs -f $FIFO /dev/mapper/vg-portage | grep -E '^[[:digit:]]+' ) - |
  sort -k 1,1n |
  cut -f 2- |
  tr '\n\0' '\0\n' |
  cpio -p0 --make-directories /tmp/portage/
unlink $FIFO
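Point 4 of Florian's list (parallel traversal) can be tried risk-free on a throwaway tree first; this sketch only verifies the pipeline's plumbing, with invented directory names mimicking the 20/2022/... cache layout:

```shell
# Build a tiny fake cache tree.
TOP_DIR=$(mktemp -d)
mkdir -p "$TOP_DIR"/20/2022 "$TOP_DIR"/21/2122
touch "$TOP_DIR"/20/2022/a.jpg "$TOP_DIR"/21/2122/b.jpg

# One outer find lists the top-level dirs; xargs runs up to 4 inner
# finds in parallel, one per top-level dir. (GNU xargs warns that -n is
# ignored together with -I; stderr is silenced for that reason.)
found=$(find "$TOP_DIR" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -n 1 -r -P 4 -I '{}' find '{}' -type f 2>/dev/null | sort)
echo "$found"
rm -rf "$TOP_DIR"
```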
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am 14.08.2012 15:54, schrieb Daniel Troeder: On 14.08.2012 11:46, Neil Bothwick wrote: On Tue, 14 Aug 2012 10:21:54 +0200, Daniel Troeder wrote: There is also the possibility to write a really small daemon (less than 50 lines of C) that registers with inotify for the entire fs and journals the file activity to a sqlite-db. sys-process/incron ? Uh... didn't know that one! ... very interesting :) Have you used it? How does it perform if there are lots of modifications going on? Does it have a throttle against fork bombing? must-read-myself-a-little. An incron line like # sqlite3 /file.sql 'INSERT filename, date INTO table' would be inefficient, because it spawns lots of processes, but it would be very nice to simply test out the idea. Then a # sqlite3 /file.sql 'SELECT filename FROM table ...' or something to get the files older than 30 days, and voilà :)

Maybe inotifywait is better for this kind of batch job.

Collecting events:

inotifywait -rm -e create,delete --timefmt '%s' --format \
  "$(printf '%%T\t%%e\t%%w%%f')" /tmp > events.tbl
# the printf is there because inotifywait's format does not
# recognize common escapes like \t
# Output format:
# Seconds since epoch \t CREATE/DELETE \t file name \n

Filtering events:

sort --stable -k3 events.tbl | awk '
function update() { line=$0; exists = ($2=="DELETE" ? 0 : 1); file=$3 }
NR==1 { update(); next }
{ if($3!=file && exists==1){ print line } update() }
END { if(exists==1) print line }'
# Sorts by file name while preserving temporal order.
# Uses awk to suppress output of files that have been deleted.
# Output: last CREATE event for each existing file

Retrieving files created 30+ days ago:

awk -v newest=$(date -d -5seconds +%s) '
$1 > newest { nextfile }
{ print $3 }'

Remarks: The awk scripts need some improvement if you have to handle whitespace in filenames, but with this input format they should work with everything except newlines.
Inotifywait itself is utterly useless when dealing with newlines in file names unless you want to put some serious effort into sanitizing the output. Regards, Florian Philipp
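The filter stage can be sanity-checked without inotifywait by feeding it synthetic events (shell quoting restored from the archive mangling; an END rule ensures the last file group is also emitted). Two files are created, one is later deleted, so only the survivor should come out:

```shell
# Synthetic event log: epoch, event, path (whitespace-separated).
cat > /tmp/events.tbl <<'EOF'
100 CREATE /cache/a.jpg
110 CREATE /cache/b.jpg
120 DELETE /cache/a.jpg
EOF

# Group by file name (stable sort keeps temporal order within a file),
# then keep only the last event per file if that event was a CREATE.
survivors=$(sort --stable -k3 /tmp/events.tbl | awk '
function update() { line=$0; exists = ($2=="DELETE" ? 0 : 1); file=$3 }
NR==1 { update(); next }
{ if($3!=file && exists==1){ print line } update() }
END { if(exists==1) print line }')
echo "$survivors"
rm -f /tmp/events.tbl
```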
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am 14.08.2012 17:09, schrieb Florian Philipp: Retrieving files created 30+ days ago: awk -v newest=$(date -d -5seconds +%s) ' $1 > newest { nextfile } { print $3 }' s/-5seconds/-30days/
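With the -30days correction applied, the retrieval stage behaves like this on synthetic, time-ordered input (file names invented): everything older than the cutoff is printed, and `nextfile` skips the rest once a too-new entry is reached.

```shell
newest=$(date -d '-30 days' +%s)

# Epoch 100 is ancient; the second entry is "now", so it is skipped.
old=$(printf '100 CREATE /cache/ancient.jpg\n%s CREATE /cache/fresh.jpg\n' "$(date +%s)" |
      awk -v newest="$newest" '$1 > newest { nextfile } { print $3 }')
echo "$old"
```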
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On 08/14/2012 04:07:39 AM, Adam Carter wrote: I think btrfs probably is meant to provide a lot of the modern features like reiser4 or xfs Unfortunately btrfs is still generally slower than ext4, for example. Check out http://openbenchmarking.org/, eg http://openbenchmarking.org/s/ext4%20btrfs The OS will use any spare RAM for disk caching, so if there's not much else running on that box, most of your content will be served from RAM. It may be that whatever fs you choose won't make that much of a difference anyway. If one can run a recent kernel (3.5.x), btrfs seems quite stable (it's used by some distributions and by Oracle for real work). Most benchmarks don't use compression since other filesystems can't use it. But that's unfair: with compression, one needs to read much less data (my /usr partition is less than 50% of the size of an ext4 partition; savings on the root partition are even higher). I'm using the mount options compress=lzo,noacl,noatime,autodefrag,space_cache which require a recent kernel. I'd give it a try. Helmut.
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Aug 14, 2012 11:42 PM, Helmut Jarausch jarau...@igpm.rwth-aachen.de wrote: [...] I'm using the mount options compress=lzo,noacl,noatime,autodefrag,space_cache which require a recent kernel. I'd give it a try. Helmut. Are the support tools for btrfs (fsck, defrag, etc.) already complete? If so, I certainly would like to take it out for a spin... Rgds,
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Sure, but wouldn't compression make write operations slower? And isn't he looking for performance? On Aug 14, 2012 1:14 PM, Pandu Poluan pa...@poluan.info wrote: [...] Are the support tools for btrfs (fsck, defrag, etc.) already complete? If so, I certainly would like to take it out for a spin... Rgds,
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am Dienstag, 14. August 2012, 13:21:35 schrieb Jason Weisberger: Sure, but wouldn't compression make write operations slower? And isn't he looking for performance? Not really, as long as the CPU can compress faster than the disk can write stuff. More interesting: is btrfs trying to be smart and only compressing compressible stuff? -- #163933
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am 14.08.2012 16:00, schrieb Florian Philipp: [...] I once wrote a script that sorts files by their starting physical block. This improved reading them quite a bit (2 minutes instead of 11 minutes for copying the portage tree). [...] No, I don't think that's practicable with the number of files in my setup. To be honest, currently I am quite happy with the performance of btrfs. Running through the directory tree only takes 1/10th of the time it took with ext4, and deletes are pretty fast as well. I'm sure there's still room for more improvement, but right now it's much better than it was before.
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am Mittwoch, 15. August 2012, 00:05:40 schrieb Pandu Poluan: Are the support tools for btrfs (fsck, defrag, etc.) already complete? no -- #163933
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am 14.08.2012 10:21, schrieb Daniel Troeder: [...] There is also the possibility to write a really small daemon (less than 50 lines of C) that registers with inotify for the entire fs and journals the file activity to a sqlite-db. A simple SQL query from a cron/bash script will then give you all the files to delete, with paths. [...] That is an interesting idea, but I have never used inotify on such a huge file base; I am not sure what impact that has in terms of CPU cycles being used. But I am going to try this on some snowy winter weekend :)
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am 14.08.2012 19:21, schrieb Jason Weisberger: Sure, but wouldn't compression make write operations slower? And isn't he looking for performance? [...] I have enough CPU power at hand for compression, I guess that should not be the issue. But the cache dir mostly consists of prescaled JPEG images, so compressing them again would not give me any benefits, speed- or size-wise.
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am 14.08.2012 19:42, schrieb Volker Armin Hemmann: Am Dienstag, 14. August 2012, 13:21:35 schrieb Jason Weisberger: Sure, but wouldn't compression make write operations slower? And isn't he looking for performance? Not really, as long as the CPU can compress faster than the disk can write stuff. More interesting: is btrfs trying to be smart and only compressing compressible stuff? It does do that, but letting btrfs check whether the files are already compressed, when you know that they are, is a waste of CPU cycles :)
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Tue, Aug 14, 2012 at 12:05 PM, Pandu Poluan pa...@poluan.info wrote: [...] Are the support tools for btrfs (fsck, defrag, etc.) already complete? Do they exist? Yes (sys-fs/btrfs-progs). Are they complete? Probably not...
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Tue, Aug 14, 2012 at 12:50 PM, Michael Hampicke gentoo-u...@hadt.biz wrote: [...] It does do that, but letting btrfs check if the files are already compressed, if you know that they are compressed, is a waste of cpu cycles :) Also look into the difference between compress and compress-force[0]. I wonder how much overhead checking whether or not to compress a file costs. I use mount options similar to Helmut's and get great results: defaults,autodefrag,space_cache,compress=lzo,subvol=@,relatime But most of my data is compressible; compression makes such a huge difference that it surprises me. Apparently this Ubuntu system automatically mounts / as the subvolume @. Interesting. Anyway, btrfs-progs does include a basic fsck now, but I wouldn't use it for anything serious[1]. [0] https://btrfs.wiki.kernel.org/index.php/Mount_options [1] https://btrfs.wiki.kernel.org/index.php/Btrfsck
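For reference, the kind of /etc/fstab entry this discussion implies, as a sketch only: the device, mountpoint and option mix are placeholders, not a recommendation, and for a cache of already-compressed JPEGs one would likely drop compress= entirely:

```shell
# plain "compress" lets btrfs skip files it detects as incompressible;
# "compress-force" compresses unconditionally.
# /dev/sdb1  /var/cache/images  btrfs  noatime,autodefrag,space_cache,compress=lzo  0 0
```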
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Tue, Aug 14, 2012 at 3:55 PM, Alecks Gates aleck...@gmail.com wrote: [...] I use mount options similar to Helmut and get great results: defaults,autodefrag,space_cache,compress=lzo,subvol=@,relatime But most of my data is compressible. Compression makes such a huge difference, it surprises me. [...] Huge difference, how? Could we see some bonnie++ comparisons between the various configurations we've discussed for ext4 and btrfs? Depending on the results, it might be getting time for me to take the plunge myself. -- :wq
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Tue, Aug 14, 2012 at 3:17 PM, Michael Mol mike...@gmail.com wrote: [...] Huge difference, how? Could we see some bonnie++ comparisons between the various configurations we've discussed for ext4 and btrfs? [...] Check out some of the benchmarks on Phoronix[0]. It's definitely not a win-win scenario, but it seems to be great at random writes and compiling. And a lot of those wins are without compress=lzo enabled, so it only gets better. I'm not going to say it's the absolute best out there (because it isn't, of course), but it's at least worth checking into. I'm using a standard 2.5" HDD like in this article[1], so perhaps that's why I see the results.
[0] http://www.phoronix.com/scan.php?page=search&q=Btrfs [1] http://www.phoronix.com/scan.php?page=article&item=btrfs_old_linux31
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Mon 13 Aug 2012 06:46:53 PM IST, Michael Hampicke wrote: Howdy gentooers, I am looking for a filesystem that performs well for a cache directory. Here's some data on that dir: - cache for prescaled image files + metadata files - nested directory structure ( 20/2022/202231/*files* ) - about 20GB - 100.000 directories - about 2 million files The system has 2x Intel Xeon quad-cores (Nehalem), 16GB of RAM and two 10.000rpm hard drives running a RAID1. Up until now I was using ext4 with noatime, but I am not happy with its performance. Finding and deleting old files with 'find' is incredibly slow, so I am looking for a filesystem that performs better. First candidate that came to mind was reiserfs, but last time I tried it, it became slower over time (fragmentation?). Currently I am running a test with btrfs and so far I am quite happy with it, as it is much faster in my use case. Do you guys have any other suggestions? How about JFS? I used that on my old NAS box because of its low CPU usage. Should I give reiser4 a try, or better leave it be given Hans Reiser's current status? Thx in advance, Mike You should have a look at xfs. I used to use ext4 earlier, and traversing through /usr/portage used to be very slow. When I switched to xfs, speed increased drastically. This might be kind of unrelated, but it makes sense. -- Nilesh Govindrajan http://nileshgr.com
Re: [gentoo-user] Fast file system for cache directory with lot's of files
You should have a look at xfs. I used to use ext4 earlier, traversing through /usr/portage used to be very slow. When I switched to xfs, speed increased drastically. This might be kind of unrelated, but makes sense. I guess traversing through directories may be faster with XFS, but in my experience ext4 performs better than XFS in regard to operations (cp, rm) on small files. I read that there are some tuning options for XFS and small files, but never tried them. But if someone seconds XFS, I will try it too.
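The small-file tuning knobs alluded to are, as far as I know, along these lines; this is a sketch of commonly suggested starting points with a placeholder device, not a tested recipe:

```shell
# larger inodes leave more room for inline extents/metadata
#   mkfs.xfs -i size=512 /dev/sdb1
# bigger in-memory log buffers help metadata-heavy workloads
#   mount -o noatime,logbsize=256k /dev/sdb1 /mnt/cache
```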
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Aug 13, 2012 9:01 PM, Michael Hampicke mgehampi...@gmail.com wrote: You should have a look at xfs. I used to use ext4 earlier; traversing through /usr/portage used to be very slow. When I switched to xfs, speed increased drastically. This might be kind of unrelated, but makes sense. I guess traversing through directories may be faster with XFS, but in my experience ext4 performs better than XFS in regard to operations (cp, rm) on small files. I read that there are some tuning options for XFS and small files, but never tried them. But if someone seconds XFS I will try it too. Have you indexed your ext4 partition? # tune2fs -O dir_index /dev/your_partition # e2fsck -D /dev/your_partition Rgds,
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On 13.08.2012 15:16, Michael Hampicke wrote: - about 20GB - 100.000 directories - about 2 million files The system has 2x Intel Xeon quad-cores (Nehalem), 16GB of RAM and two 10.000rpm hard drives running a RAID1. 1st thought: switch to SSDs 2nd thought: maybe lots of writes? - get an SSD for the fs metadata 3rd thought: purging old files with find? your cache system should have some kind of DB that holds that information.
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Michael Hampicke wrote: You should have a look at xfs. I used to use ext4 earlier; traversing through /usr/portage used to be very slow. When I switched to xfs, speed increased drastically. This might be kind of unrelated, but makes sense. I guess traversing through directories may be faster with XFS, but in my experience ext4 performs better than XFS in regard to operations (cp, rm) on small files. I read that there are some tuning options for XFS and small files, but never tried them. But if someone seconds XFS I will try it too. It's been a while since I messed with this, but isn't XFS the one that hates power failures and such? Dale :-) :-) -- I am only responsible for what I said ... Not for what you understood or how you interpreted my words!
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Have you indexed your ext4 partition? # tune2fs -O dir_index /dev/your_partition # e2fsck -D /dev/your_partition Hi, the dir_index is active. I guess that's why delete operations take as long as they do (the index has to be updated every time)
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke mgehampi...@gmail.com wrote: Have you indexed your ext4 partition? # tune2fs -O dir_index /dev/your_partition # e2fsck -D /dev/your_partition Hi, the dir_index is active. I guess that's why delete operations take as long as they do (the index has to be updated every time) 1) Scan for files to remove 2) disable index 3) Remove files 4) enable index ? -- :wq
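Michael Mol's four steps map onto tune2fs/e2fsck roughly as follows. This is a sketch rather than a tested procedure: /dev/your_partition is a placeholder, the path and age cutoff are made up, and the filesystem must be unmounted (or mounted read-only) for the tune2fs and e2fsck steps:

```shell
# 1) Scan for files to remove while the fs is still mounted.
find /cache -type f -mtime +30 > /tmp/to_remove

# 2) Disable the htree directory index (unmount the fs first).
tune2fs -O ^dir_index /dev/your_partition

# 3) Remount, then remove the files without paying for per-file
#    index updates. (Assumes no newlines in file names.)
xargs -d '\n' rm -f < /tmp/to_remove

# 4) Unmount again, re-enable the index, and rebuild it.
tune2fs -O dir_index /dev/your_partition
e2fsck -D /dev/your_partition
```

Whether the rebuild in step 4 costs more than the bulk delete saves is workload-dependent, which is presumably why the thread treats it as an experiment.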
Re: [gentoo-user] Fast file system for cache directory with lot's of files
2012/8/13 Daniel Troeder dan...@admin-box.com On 13.08.2012 15:16, Michael Hampicke wrote: - about 20GB - 100.000 directories - about 2 million files The system has 2x Intel Xeon quad-cores (Nehalem), 16GB of RAM and two 10.000rpm hard drives running a RAID1. 1st thought: switch to SSDs 2nd thought: maybe lots of writes? - get an SSD for the fs metadata 3rd thought: purging old files with find? your cache system should have some kind of DB that holds that information. 1: SSDs are not possible atm. The machine is in a data center abroad. 2: Writes are not that much of a problem at this time. 3: Well, it's a 3rd party application that - in theory - should take care of removing old files. Sadly, it does not work as it's supposed to. As time passes, the number of orphans grows :(
Re: [gentoo-user] Fast file system for cache directory with lot's of files
I guess traversing through directories may be faster with XFS, but in my experience ext4 performs better than XFS in regard to operations (cp, rm) on small files. I read that there are some tuning options for XFS and small files, but never tried them. But if someone seconds XFS I will try it too. It's been a while since I messed with this, but isn't XFS the one that hates power failures and such? Dale :-) :-) -- I am only responsible for what I said ... Not for what you understood or how you interpreted my words! Well, it's the delayed allocation of XFS (which prevents fragmentation) that does not like sudden power losses :) But ext4 has that too; you can disable it though - that should be true for XFS too. But the power situation in the datacenter has never been a problem so far, and even if the cache partition gets screwed, we can always rebuild it. Takes a few hours, but it would not be the end of the world :)
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Mon 13 Aug 2012 08:28:15 PM IST, Michael Hampicke wrote: I guess traversing through directories may be faster with XFS, but in my experience ext4 performs better than XFS in regard to operations (cp, rm) on small files. I read that there are some tuning options for XFS and small files, but never tried them. But if someone seconds XFS I will try it too. It's been a while since I messed with this, but isn't XFS the one that hates power failures and such? Dale :-) :-) -- I am only responsible for what I said ... Not for what you understood or how you interpreted my words! Well, it's the delayed allocation of XFS (which prevents fragmentation) that does not like sudden power losses :) But ext4 has that too; you can disable it though - that should be true for XFS too. But the power situation in the datacenter has never been a problem so far, and even if the cache partition gets screwed, we can always rebuild it. Takes a few hours, but it would not be the end of the world :) Yes, XFS hates power failures. I got a giant UPS for my home desktop to use XFS because of its excellent performance ;-) -- Nilesh Govindrajan http://nileshgr.com
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am 13.08.2012 16:52, schrieb Michael Mol: On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke mgehampi...@gmail.com wrote: Have you indexed your ext4 partition? # tune2fs -O dir_index /dev/your_partition # e2fsck -D /dev/your_partition Hi, the dir_index is active. I guess that's why delete operations take as long as they do (the index has to be updated every time) 1) Scan for files to remove 2) disable index 3) Remove files 4) enable index ? That's what I love about gentoo-users :) , I would never have thought of that myself. I will try this and see how much of a performance gain there is. Disabling the index should only require a 'mount -o remount' I guess.
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Mon, Aug 13, 2012 at 11:26 AM, Michael Hampicke mgehampi...@gmail.com wrote: Am 13.08.2012 16:52, schrieb Michael Mol: On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke mgehampi...@gmail.com wrote: Have you indexed your ext4 partition? # tune2fs -O dir_index /dev/your_partition # e2fsck -D /dev/your_partition Hi, the dir_index is active. I guess that's why delete operations take as long as they do (the index has to be updated every time) 1) Scan for files to remove 2) disable index 3) Remove files 4) enable index ? That's what I love about gentoo-users :) , I would never have thought of that myself. I will try this and see how much of a performance gain there is. Disabling the index should only require a 'mount -o remount' I guess. It's the same logic as behind database programming; do a bulk modification, then update the index afterwards. The index update will take longer than that for a single file, but it has the potential to be more efficient in bulk operations. (It'd be nice if ext4 supported transactional behaviors, where you could defer index updates until after a commit, but I don't think it (or any filesystem on Linux) does.) You *should* be able to enable/disable indexes on a per-directory basis, so if your search pattern is confined, I'd go that route. -- :wq
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am 13.08.2012 16:52, schrieb Michael Mol: On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke mgehampi...@gmail.com wrote: Have you indexed your ext4 partition? # tune2fs -O dir_index /dev/your_partition # e2fsck -D /dev/your_partition Hi, the dir_index is active. I guess that's why delete operations take as long as they do (the index has to be updated every time) 1) Scan for files to remove 2) disable index 3) Remove files 4) enable index ? -- :wq Other things to think about: 1. Play around with data=journal/writeback/ordered. IIRC, data=journal actually used to improve performance depending on the workload as it delays random IO in favor of sequential IO (when updating the journal). 2. Increase the journal size. 3. Take a look at `man 1 chattr`, especially the 'T' attribute. Of course this only helps after re-allocating everything. 4. Try parallelizing. Ext4 requires relatively few locks nowadays (since 2.6.39 IIRC). For example: find $TOP_DIR -mindepth 1 -maxdepth 1 -print0 | \ xargs -0 -n 1 -r -P 4 -I '{}' find '{}' -type f 5. Use a separate device for the journal. 6. Temporarily deactivate the journal with tune2fs similar to MM's idea. Regards, Florian Philipp
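Florian's parallelization idea (point 4) can be exercised end-to-end like this; TOP_DIR is a made-up stand-in for the real cache tree, and the `-n 1` from his one-liner is dropped here because GNU xargs ignores it when `-I` is given:

```shell
#!/bin/sh
# Sketch of point 4: fan the traversal out across 4 parallel
# `find` processes, one per first-level directory. TOP_DIR is a
# hypothetical path, not one from the thread.
TOP_DIR="${TOP_DIR:-/var/cache/images}"

# The outer find lists the first-level directories; xargs -P 4
# runs one inner find per directory, up to 4 at a time.
find "$TOP_DIR" -mindepth 1 -maxdepth 1 -type d -print0 | \
    xargs -0 -r -P 4 -I '{}' find '{}' -type f
```

As Michael notes later in the thread, on spinning disks this tends to trade CPU idle time for IO wait, so the parallelism factor is worth tuning rather than fixing at 4.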
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am 13.08.2012 19:14, schrieb Florian Philipp: Am 13.08.2012 16:52, schrieb Michael Mol: On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke mgehampi...@gmail.com wrote: Have you indexed your ext4 partition? # tune2fs -O dir_index /dev/your_partition # e2fsck -D /dev/your_partition Hi, the dir_index is active. I guess that's why delete operations take as long as they do (the index has to be updated every time) 1) Scan for files to remove 2) disable index 3) Remove files 4) enable index ? -- :wq Other things to think about: 1. Play around with data=journal/writeback/ordered. IIRC, data=journal actually used to improve performance depending on the workload as it delays random IO in favor of sequential IO (when updating the journal). 2. Increase the journal size. 3. Take a look at `man 1 chattr`, especially the 'T' attribute. Of course this only helps after re-allocating everything. 4. Try parallelizing. Ext4 requires relatively few locks nowadays (since 2.6.39 IIRC). For example: find $TOP_DIR -mindepth 1 -maxdepth 1 -print0 | \ xargs -0 -n 1 -r -P 4 -I '{}' find '{}' -type f 5. Use a separate device for the journal. 6. Temporarily deactivate the journal with tune2fs similar to MM's idea. Regards, Florian Philipp Trying out different journal options was already on my list, but the manpage on chattr regarding the T attribute is an interesting read. Definitely worth trying. Parallelizing multiple finds was something I already did, but the only thing that increased was the IO wait :) But now having read all the suggestions in this thread, I might try it again. Separate device for the journal is a good idea, but not possible atm (machine is abroad in a data center)
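The 'T' attribute Florian points to marks a directory as the top of a directory hierarchy, which hints ext4's block allocator to spread its subdirectories across block groups. A minimal sketch, with the cache path as an assumption (this only affects allocations made after the attribute is set, hence Florian's re-allocation caveat):

```shell
# Mark the cache root as a top-of-hierarchy directory so ext4
# spreads its subdirectories across block groups.
# See `man 1 chattr`, attribute 'T'. Path is hypothetical.
chattr +T /var/cache/images

# Verify: lsattr shows 'T' among the attributes.
lsattr -d /var/cache/images
```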
Re: [gentoo-user] Fast file system for cache directory with lot's of files
On Mon, Aug 13, 2012 at 8:16 AM, Michael Hampicke mgehampi...@gmail.com wrote: Howdy gentooers, I am looking for a filesystem that performs well for a cache directory. Here's some data on that dir: - cache for prescaled image files + metadata files - nested directory structure ( 20/2022/202231/*files* ) - about 20GB - 100.000 directories - about 2 million files The system has 2x Intel Xeon quad-cores (Nehalem), 16GB of RAM and two 10.000rpm hard drives running a RAID1. Up until now I was using ext4 with noatime, but I am not happy with its performance. Finding and deleting old files with 'find' is incredibly slow, so I am looking for a filesystem that performs better. First candidate that came to mind was reiserfs, but last time I tried it, it became slower over time (fragmentation?). Currently I am running a test with btrfs and so far I am quite happy with it as it is much faster in my use case. Do you guys have any other suggestions? How about JFS? I used that on my old NAS box because of its low CPU usage. Should I give reiser4 a try, or better leave it be given Hans Reiser's current status? I think btrfs probably is meant to provide a lot of the modern features like reiser4 or xfs (tail-packing, indexing, compression, snapshots, subvolumes, etc). Don't know if it is considered stable enough for your usage, but at least it is under active development and funded by large names. I think if you would consider reiser4 as a possibility then you should consider btrfs as well.
Re: [gentoo-user] Fast file system for cache directory with lot's of files
Am Montag, 13. August 2012, 15:13:03 schrieb Paul Hartman: On Mon, Aug 13, 2012 at 8:16 AM, Michael Hampicke mgehampi...@gmail.com wrote: Howdy gentooers, I am looking for a filesystem that performs well for a cache directory. Here's some data on that dir: - cache for prescaled image files + metadata files - nested directory structure ( 20/2022/202231/*files* ) - about 20GB - 100.000 directories - about 2 million files The system has 2x Intel Xeon quad-cores (Nehalem), 16GB of RAM and two 10.000rpm hard drives running a RAID1. Up until now I was using ext4 with noatime, but I am not happy with its performance. Finding and deleting old files with 'find' is incredibly slow, so I am looking for a filesystem that performs better. First candidate that came to mind was reiserfs, but last time I tried it, it became slower over time (fragmentation?). Currently I am running a test with btrfs and so far I am quite happy with it as it is much faster in my use case. Do you guys have any other suggestions? How about JFS? I used that on my old NAS box because of its low CPU usage. Should I give reiser4 a try, or better leave it be given Hans Reiser's current status? I think btrfs probably is meant to provide a lot of the modern features like reiser4 or xfs (tail-packing, indexing, compression, snapshots, subvolumes, etc). Don't know if it is considered stable enough for your usage, but at least it is under active development and funded by large names. I think if you would consider reiser4 as a possibility then you should consider btrfs as well. reiser4 has one feature btrfs and every other fs is missing: atomic operations. Which is a wonderful feature. Too bad 'politics' killed reiser4. -- #163933
Re: [gentoo-user] Fast file system for cache directory with lot's of files
I think btrfs probably is meant to provide a lot of the modern features like reiser4 or xfs Unfortunately btrfs is still generally slower than ext4, for example. Check out http://openbenchmarking.org/, eg http://openbenchmarking.org/s/ext4%20btrfs The OS will use any spare RAM for disk caching, so if there's not much else running on that box, most of your content will be served from RAM. It may be that whatever fs you choose won't make that much of a difference anyway.