> Perhaps think about running tune2fs, maybe also consider adding noatime

Yes, I added it and got a performance increase, but as the number of files 
grows the speed still drops below an acceptable level.

>I saw this article some time back.

> http://www.linux.com/archive/feature/127055
Good idea, I already use mysql for indexing the files, so every time I need to 
make a lookup I don't have to scan the entire dir to find the file. However, my 
requirement is to keep the files themselves on disk.
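
Just to illustrate the kind of lookup I mean, a minimal Python sketch; the 
file_index table, its columns and the connection details are made-up 
placeholders, not my actual schema:

    import mysql.connector  # pip install mysql-connector-python

    def lookup_path(name):
        # Hypothetical schema: file_index(name VARCHAR PRIMARY KEY, path VARCHAR)
        conn = mysql.connector.connect(host="localhost", user="cache",
                                       password="secret", database="filecache")
        try:
            cur = conn.cursor()
            cur.execute("SELECT path FROM file_index WHERE name = %s", (name,))
            row = cur.fetchone()
            # Return the on-disk path if the name is indexed, otherwise None
            return row[0] if row else None
        finally:
            conn.close()

The point is that the lookup never has to list the huge directory; the 
directory only gets touched when the file itself is opened.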

>The only way to deal with it (especially if the
> application adds and removes these files regularly) is to every once in a
> while copy the files to another directory, nuke the directory and restore
> from the copy.

Thanks, but there will not be many file updates once the cache is built, so 
recreating directories would not help much here. The issue is that as the 
number of files grows, both reads from existing files and new insertions get 
slower and slower.

>I haven't done, or even seen, any recent benchmarks but I'd expect
> reiserfs to still be the best at that sort of thing.

I've looked at some benchmarks and reiser seems a bit faster in my scenario, 
but my problem shows up with a large number of files, and from what I have 
seen I'm not sure reiser would be a fix...
>However even if 
> you can improve things slightly, do not let whoever is responsible for 
> that application ignore the fact that it is a horrible design that 
> ignores a very well known problem that has easy solutions.

My original idea was storing each file under a hash of its name, and keeping a 
hash->real filename mapping in mysql. That way I have direct access to the 
file and I can build a directory hierarchy from the first characters of the 
hash (/c/0/2/a), so I would have 16^4 = 65536 leaves in the directory tree, 
and the files would be evenly distributed, with around 200 files per dir 
(which should not give any performance issues). But the requirement is to use 
the real file name for the directory tree, which is what causes the issue.
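
Something like this rough sketch is what I had in mind; the hash choice and 
the base directory are only examples:

    import hashlib
    import os

    BASE = "/var/cache/myapp"  # example base directory

    def hashed_path(real_name):
        # Hash the real file name and use the first 4 hex chars as
        # directory levels: /c/0/2/a -> 16^4 = 65536 leaf directories.
        h = hashlib.md5(real_name.encode("utf-8")).hexdigest()
        return os.path.join(BASE, h[0], h[1], h[2], h[3], h)

    def store(real_name, data):
        path = hashed_path(real_name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        # The hash -> real filename mapping would be kept in mysql so the
        # original name can still be looked up.

With ~15 million files spread over 65536 leaves that would be roughly 230 
files per directory, which no filesystem should have trouble with.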


>Did that program also write your address header ?
:)

Thanks for the help.


----------------------------------------
> From: hhh...@hotmail.com
> To: centos@centos.org
> Date: Wed, 8 Jul 2009 06:27:40 +0000
> Subject: [CentOS] Question about optimal filesystem with many small files.
>
>
> Hi,
>
> I have a program that writes lots of files to a directory tree (around 15 
> million files), and a node can have up to 400000 files (and I don't have 
> any way to split this amount into smaller ones). As the number of files grows, 
> my application gets slower and slower (the app works something like a 
> cache for another app and I can't redesign the way it distributes files on 
> disk due to the other app's requirements).
>
> The filesystem I use is ext3 with the following options enabled:
>
> Filesystem features: has_journal resize_inode dir_index filetype 
> needs_recovery sparse_super large_file
>
> Is there any way to improve performance in ext3? Would you suggest another FS 
> for this situation (this is a production server, so I need a stable one)?
>
> Thanks in advance (and please excuse my bad English).
>
>
