> I have thousands of files that I need to analyze with Perl and discard any
> duplicates. I also need to implement a way to *not* save on disk any file
> that a visitor uploads on the website in the case it's a file we already
> have on disk.
>
> So, I need to compare files and have some kind of identifiers in a database
> that can help me quickly identify when a duplicate file is received (so
[...]

See the Digest module (http://search.cpan.org/perldoc?Digest).
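For instance, `Digest->new("SHA-256")` (backed by Digest::SHA) gives you a SHA-256 hex digest you can store as the identifier in your database. To show the whole flow, here's a rough sketch in Python's standard hashlib rather than Perl. It assumes a plain dict stands in for your database table and that uploads are saved under their digest as the filename. The function names and that layout are hypothetical, not anything from the Digest module:

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # read in 64 KiB pieces so large files need not fit in memory


def file_digest(path: str) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths by digest; return {digest: [path, ...]} for groups of 2+ files."""
    seen = {}
    for p in paths:
        seen.setdefault(file_digest(p), []).append(p)
    return {d: ps for d, ps in seen.items() if len(ps) > 1}


def store_upload(data: bytes, known: dict, upload_dir: str):
    """Save uploaded `data` only if its digest is not already in `known`.

    `known` is a hypothetical digest -> path index standing in for a database.
    Returns (digest, path, is_new).
    """
    digest = hashlib.sha256(data).hexdigest()
    if digest in known:
        # Already on disk: skip the write and point at the existing copy.
        return digest, known[digest], False
    path = os.path.join(upload_dir, digest)
    with open(path, "wb") as f:
        f.write(data)
    known[digest] = path
    return digest, path, True
```

Indexing the digest column in the database keeps the "have we seen this?" lookup fast no matter how many files you already have.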


> Someone told me that CRC can sometimes make you believe it's a duplicate
> when it's not (that it can give you the same result with two different
> files), and I need to be 100% certain that a file is not a duplicate of
> another already on the server.

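It's true, and with a 32-bit CRC over many thousands of files a collision is
a real possibility.  A longer cryptographic digest makes it vanishingly rare:
for a 256-bit digest (like SHA-256), the chance that two particular different
files share a digest is about 1 in 2^256, and even across millions of files
the odds of any collision stay negligible.  It's probably a chance you can
live with.  If you really need 100% certainty, treat a digest match as a
candidate duplicate and confirm it with a byte-for-byte comparison of the
two files.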
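To put rough numbers on that, here's a small Python sketch (again not from the Digest module; the function names are hypothetical). It uses the standard birthday approximation p ≈ 1 - exp(-n(n-1)/2^(b+1)) for the chance that any two of n files share a b-bit digest. For the "100% certain" confirmation step, it uses the standard filecmp module to compare two candidate files byte for byte:

```python
import filecmp
import math


def collision_probability(n_files: int, bits: int) -> float:
    """Approximate chance that ANY two of n_files share a `bits`-bit digest.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    expm1 keeps the result accurate when the probability is tiny.
    """
    pairs = n_files * (n_files - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)


def is_true_duplicate(path_a: str, path_b: str) -> bool:
    """Confirm a digest match by comparing file contents byte for byte."""
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For 10,000 files, `collision_probability(10_000, 32)` (a CRC-32-sized value) comes out at about 0.012, roughly a 1.2% chance of at least one collision, while `collision_probability(10_000, 256)` is around 4e-70.  Since a digest match is rare, running `is_true_duplicate` only on those matches costs almost nothing.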

-- Eric


_______________________________________________
ActivePerl mailing list