Re: File system for large directories?
On 22/04/07, Gil Freund <[EMAIL PROTECTED]> wrote:
> On 4/21/07, Amos Shapira <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Our servers have to deal with huge amounts of small files (tens, sometimes
> > hundreds of thousands of files IN ONE DIRECTORY).
>
> Do you access them locally or remotely, if so, how?

They are written locally, then transferred over FTP to Windows machines.

> > I used to be fond of ReiserFS v3 until I got bitten by it not recovering
> > from a partition resizing exercise.
>
> Resizing a partition is not a good indicator. There are too many other
> factors involved.

I just became wary of the admin tools available for ReiserFS. It's wonderful when everything is dandy (and it survived many power failures at my previous home), but when I needed to do something else, which I hear is trivial with ext3 for instance, it failed miserably.

> Benchmark your own environment. Hardware specs (RAID, RAM, CPU, etc.) can
> tilt the results. Repeat the benchmarks about 3-5 times. I like Bonnie++.
> Look for the results that best match your environment (Read, Write,
> Create, etc.).

It looks like Bonnie++ is what everyone and his dog are using. I'll try to see how I can do that (there is hardly any headroom in terms of spare hardware to shift things around).

Cheers,

--Amos
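Where spare hardware for a full Bonnie++ run is scarce, a rough first impression of the create/delete cost can be had from a small script. The following is only a minimal sketch and not a substitute for Bonnie++: the function names, the `msgNNNNNNN.eml` naming scheme, the 4 KiB payload and the file count are all assumptions made for illustration, and the timings include page-cache effects.

```python
import os
import tempfile
import time


def bench_create_delete(directory, count=10000, size=4096):
    """Create `count` files of `size` bytes in one directory, then delete them.

    Returns (create_seconds, delete_seconds).
    """
    payload = b"x" * size
    # Hypothetical naming scheme; substitute whatever the real software uses.
    names = [os.path.join(directory, f"msg{i:07d}.eml") for i in range(count)]

    start = time.perf_counter()
    for name in names:
        with open(name, "wb") as fh:
            fh.write(payload)
    create_s = time.perf_counter() - start

    start = time.perf_counter()
    for name in names:
        os.unlink(name)
    delete_s = time.perf_counter() - start
    return create_s, delete_s


def run(base_dir=None, repeats=3, count=10000):
    """Repeat the benchmark in a fresh temporary directory under `base_dir`."""
    results = []
    for _ in range(repeats):
        with tempfile.TemporaryDirectory(dir=base_dir) as workdir:
            results.append(bench_create_delete(workdir, count))
    return results


if __name__ == "__main__":
    for i, (create_s, delete_s) in enumerate(run(), 1):
        print(f"run {i}: create {create_s:.2f}s, delete {delete_s:.2f}s")
```

Pointing `base_dir` at a mount point of each candidate filesystem in turn, and raising `count` towards the real directory sizes, gives numbers that can at least be compared with each other on the same machine.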
Re: File system for large directories?
On 22/04/07, Marc A. Volovic <[EMAIL PROTECTED]> wrote:
> Quoth Amos Shapira:
> > Hi,
> >
> > Our servers have to deal with huge amounts of small files (tens,
> > sometimes hundreds of thousands of files IN ONE DIRECTORY).
> >
> > Currently they use ext3 but I wonder whether this is the preferred FS.
>
> Ext3 is - last I checked (about two years ago) - possibly the worst
> filesystem for dealing with LOTS of files in a single directory. Reiser 3
> was very good (did not try reiser 4). However, I am very wary of reiser
> now - what with poor (or, maybe, not so poor) Hans being in jail, reiserfs
> may be going the way of the dodo.

If Reiser3 is already in mainline and stable, wouldn't it be supported even if Hans/Namesys vanishes? Reiser4 is not relevant because I want to stick to mainline kernels, preferably Debian-supplied kernels.

> I'd run bonnie (just the creation/deletion tests) for JFS, XFS and Ext4
> (which is starting to make an appearance here and there). IIRC - XFS is
> ALSO not very good with lots of small files.

Will try to do that, though again: if ext4 isn't in the mainline yet then it's not relevant for me.

> > I'm also thinking about better ways to handle the files (e.g. putting every
> > few thousand of them in a .zip file to transfer, spreading them across a
> > two-level directory tree etc.) but I'd rather try to keep the changes to the
> > existing software and scripts to the minimum required to speed
> > things up.
>
> B-sort em? Switch the back-end to database (assuming the blobs are small)?

I think about databases sometimes (the files are around 4k on average), but it feels like Hans Reiser was sort of right about that: a filesystem can be used as a database for this sort of data.

Cheers,

--Amos
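For comparison, the database back-end idea could look roughly like the sketch below, which keeps each small message as a blob in a single SQLite file. This is only an illustration under assumptions: the `messages` table, the function names and the choice of SQLite are hypothetical and do not come from the thread, and whether this beats a filesystem for roughly 4k messages would have to be measured.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a blob store; the schema here is purely illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " name TEXT PRIMARY KEY,"
        " body BLOB NOT NULL)"
    )
    return conn


def put(conn, name, body):
    """Write one message; the `with` block commits the transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO messages (name, body) VALUES (?, ?)",
            (name, body),
        )


def take(conn, name):
    """Fetch a message and delete it, mirroring write -> transfer -> delete."""
    with conn:
        row = conn.execute(
            "SELECT body FROM messages WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM messages WHERE name = ?", (name,))
    return bytes(row[0])


if __name__ == "__main__":
    store = open_store()
    put(store, "msg0000001.eml", b"Subject: test\r\n\r\nhello\r\n")
    print(take(store, "msg0000001.eml"))
```

The lookup by primary key replaces the directory scan, at the cost of changing every script that currently reads the files directly.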
Re: File system for large directories?
On 4/21/07, Amos Shapira <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Our servers have to deal with huge amounts of small files (tens, sometimes
> hundreds of thousands of files IN ONE DIRECTORY).

Do you access them locally or remotely, if so, how?

> Currently they use ext3 but I wonder whether this is the preferred FS.

I have found that ReiserFS outperformed ext3 at a similar site; this, however, was made irrelevant, as access was done mainly via NFS.

> I used to be fond of ReiserFS v3 until I got bitten by it not recovering
> from a partition resizing exercise.

Resizing a partition is not a good indicator. There are too many other factors involved.

> Trying to find the answer on the net I found:
> http://librenix.com/?inode=3296
> (Circa 2003, recommends ReiserFS v4, which isn't in the mainstream kernel yet).
> and
> http://www.debian-administration.org/articles/388
> (Circa 2006, recommends XFS).
> The latter compared handling of large trees (i.e. not necessarily a single
> directory with lots of files in it).
>
> Does anyone have good and up-to-date recommendations for such a situation?
>
> The files are e-mail messages which are written, transferred, then deleted.
>
> I'm also thinking about better ways to handle the files (e.g. putting every
> few thousand of them in a .zip file to transfer, spreading them across a
> two-level directory tree etc.) but I'd rather try to keep the changes to the
> existing software and scripts to the minimum required to speed
> things up.

Benchmark your own environment. Hardware specs (RAID, RAM, CPU, etc.) can tilt the results. Repeat the benchmarks about 3-5 times. I like Bonnie++. Look for the results that best match your environment (Read, Write, Create, etc.).

> Thanks,
>
> --Amos

--
Gil Freund, Systems Analyst
---
Sysnet consulting
[EMAIL PROTECTED], http://www.sysnet.co.il
voice: +972-54-2035888, Fax: +972-8-9356026

=
To unsubscribe, send mail to [EMAIL PROTECTED] with
the word "unsubscribe" in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]
Re: File system for large directories?
Quoth Amos Shapira:
> Hi,
>
> Our servers have to deal with huge amounts of small files (tens,
> sometimes hundreds of thousands of files IN ONE DIRECTORY).
>
> Currently they use ext3 but I wonder whether this is the preferred FS.

Ext3 is - last I checked (about two years ago) - possibly the worst filesystem for dealing with LOTS of files in a single directory. Reiser 3 was very good (did not try reiser 4). However, I am very wary of reiser now - what with poor (or, maybe, not so poor) Hans being in jail, reiserfs may be going the way of the dodo.

I'd run bonnie (just the creation/deletion tests) for JFS, XFS and Ext4 (which is starting to make an appearance here and there). IIRC - XFS is ALSO not very good with lots of small files.

> I'm also thinking about better ways to handle the files (e.g. putting every
> few thousand of them in a .zip file to transfer, spreading them across a
> two-level directory tree etc.) but I'd rather try to keep the changes to the
> existing software and scripts to the minimum required to speed
> things up.

B-sort em? Switch the back-end to database (assuming the blobs are small)?

--
---MAV
Marc A. Volovic [EMAIL PROTECTED]
Swiftouch, LTD +972-544-676764
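The two-level directory tree mentioned in the quoted question can be sketched as follows. This is a minimal illustration, assuming file names are unique and that hashing the name is an acceptable way to pick the subdirectory; `sharded_path`, `store` and the use of SHA-256 are choices made for the example, not something proposed in the thread.

```python
import hashlib
import os


def sharded_path(root, filename):
    """Map a file name to root/<aa>/<bb>/<filename> using a hash of the name.

    Two hex characters per level give 256 * 256 = 65,536 leaf directories,
    so a few hundred thousand files leave only a handful in each.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], filename)


def store(root, filename, data):
    """Write `data` under the sharded path, creating directories as needed."""
    path = sharded_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        print(store(root, "msg0000001.eml", b"hello\n"))
```

Because the path is derived from the name alone, the reader side (for example the script that feeds the FTP transfer) can recompute it without scanning directories; using one hex character per level instead would give 256 leaf directories.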