> I have thousands of files that I need to analyze with Perl and discard any
> duplicates. I also need to implement a way to *not* save on disk any file
> that a visitor uploads on the website in the case it's a file we already
> have on disk.
>
> So, I need to compare files and have some kind of identifiers in a database
> that can help me quickly identify when a duplicate file is received (so [...]

See the Digest module (http://search.cpan.org/perldoc?Digest).

> Someone told me that CRC can sometimes make you believe it's a duplicate
> when it's not (that it can give you the same result with two different
> files), and I need to be 100% certain that a file is not a duplicate of
> another already on the server.

It's true; any checksum or digest can give the same result for two
different files. For a 256-bit digest (like SHA-256), though, there's
something like a 1 in 2^256 chance that two given different files will
have the same digest. It's probably a chance you can live with.

-- Eric
_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
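
The Digest-based approach suggested in the reply could be sketched roughly as follows. This is only a minimal illustration, not a tested production setup: the names file_digest and is_duplicate are hypothetical, and the %seen hash is an assumed in-memory stand-in for the database table mentioned in the question. The byte-for-byte check with File::Compare is an optional extra step for anyone who wants certainty beyond the digest match.

```perl
use strict;
use warnings;
use Digest;
use File::Compare qw(compare);

# Return the hex SHA-256 digest of a file's contents.
# (file_digest is a hypothetical helper name.)
sub file_digest {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # read raw bytes, important on Windows
    my $hex = Digest->new('SHA-256')->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Return 1 if $path duplicates a file already recorded in %$seen,
# otherwise record it and return 0.
# %$seen maps digest => path and stands in for a database table
# (an assumption made for this sketch).
sub is_duplicate {
    my ($seen, $path) = @_;
    my $digest = file_digest($path);

    if (exists $seen->{$digest}) {
        # Optional: confirm byte for byte. compare() returns 0
        # when the two files are identical.
        return 1 if compare($path, $seen->{$digest}) == 0;

        # Same digest, different bytes: a true collision.
        # Treat it as a new file; how to store it is left open here.
        return 0;
    }

    $seen->{$digest} = $path;
    return 0;
}
```

A caller would keep one %seen hash (or database table) and call is_duplicate(\%seen, $uploaded_path) before saving an upload; a return value of 1 means the file is already on disk.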
