oooooooooooo ooooooooooooo wrote:
>> You can hash it and still keep the original filename, and you don't
>> even need a MySQL database to do lookups.
>
> There is an issue I forgot to mention: the original file name can be up to
> 1023 characters long. As Linux only allows 255 characters in a single file
> name, I could have a (very small) number of collisions, which is why my
> original idea was to use a hash->filename table. So I'm not sure I could
> implement that idea in my scenario.
>
>> For instance: example.txt ->
>> e7/6f/example.txt. That might (or might not) give you better
>> performance.
>
> After a quick calculation, that could put around 3200 files per directory (I
> have around 15 million files). I think that above 1000 files per directory,
> performance will start to degrade significantly; anyway, it would be a
> matter of doing some benchmarks.
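
To make the e7/6f layout concrete, here is a minimal C sketch. The 32-bit
FNV-1a hash is only a placeholder (not necessarily what squid or backuppc
chose), and names are truncated to Linux's 255-character NAME_MAX:

    #include <stdio.h>
    #include <stdint.h>

    /* 32-bit FNV-1a over the file name (placeholder hash) */
    static uint32_t fnv1a(const char *s)
    {
        uint32_t h = 2166136261u;
        for (; *s; s++)
            h = (h ^ (uint8_t)*s) * 16777619u;
        return h;
    }

    /* Build "ab/cd/<name>" from the top two bytes of the hash.  The
     * name is truncated to 255 characters (NAME_MAX), which is exactly
     * where the rare collisions mentioned above would come from. */
    static void shard_path(const char *name, char *out, size_t outlen)
    {
        uint32_t h = fnv1a(name);
        snprintf(out, outlen, "%02x/%02x/%.255s",
                 (unsigned)(h >> 24), (unsigned)((h >> 16) & 0xff), name);
    }

    int main(void)
    {
        char path[512];
        shard_path("example.txt", path, sizeof path);
        puts(path);    /* prints something like "8e/3a/example.txt" */
        return 0;
    }

On the fan-out arithmetic: two one-byte levels give 256 * 256 = 65536 leaf
directories, so 15 million files average out to roughly 230 per directory;
a figure of ~3200 per directory would instead correspond to a shallower
split of around 4096 buckets. Either way, benchmarks are what settle it.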
There's C code to do this in squid, and backuppc does it in Perl (for a
pool directory where all identical files are hardlinked). Source for both
is available, and it might be worth a look at their choices for tree depth
and collision handling (backuppc actually hashes the file content, not the
name, though).

--
Les Mikesell
 lesmikes...@gmail.com
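
For the content-hash variant, a rough sketch (backuppc's real pool code is
Perl and uses a proper message digest with collision chains; the FNV-1a
here is again just a stand-in):

    #include <stdio.h>
    #include <stdint.h>

    /* 32-bit FNV-1a over a file's bytes.  Identical content hashes the
     * same no matter what the file is named, which is what lets a pool
     * hardlink duplicates together. */
    static int hash_file(const char *path, uint32_t *out)
    {
        FILE *f = fopen(path, "rb");
        if (!f)
            return -1;
        uint32_t h = 2166136261u;
        int c;
        while ((c = fgetc(f)) != EOF)
            h = (h ^ (uint32_t)c) * 16777619u;
        fclose(f);
        *out = h;
        return 0;
    }

    int main(int argc, char **argv)
    {
        uint32_t h;
        if (argc < 2 || hash_file(argv[1], &h) != 0) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        /* same two-level split as before, keyed on content, with the
         * full hash as the pool file name */
        printf("%02x/%02x/%08x\n",
               (unsigned)(h >> 24), (unsigned)((h >> 16) & 0xff),
               (unsigned)h);
        return 0;
    }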