this has nothing to do with perl. 

there is no practical way to "be 100% certain that a file is not a duplicate of 
another already on the server"

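use a strong checksum like sha256 and you'll be fine: store each file's digest
in the db and look it up on every upload. the odds of two different files
sharing a sha256 digest by accident are negligible.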
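
a minimal sketch of the idea, in python only because the approach is not tied to any one language (in perl, the Digest::SHA module offers sha256 as well). the names sha256_of_file, is_duplicate and known_digests are made up for illustration, and the plain set stands in for whatever db table you end up using:

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large uploads never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(path, known_digests):
    """Return True if the file's digest is already known, else record it.

    known_digests is a hypothetical stand-in (a set) for an indexed
    database column holding one digest per stored file.
    """
    digest = sha256_of_file(path)
    if digest in known_digests:
        return True
    known_digests.add(digest)
    return False
```

run is_duplicate once per existing file to fill the table, then once per upload before saving anything to disk. if you want extra reassurance, you could byte-compare the two files whenever the digests match, which only costs one extra read in the rare case of a hit.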

________________________________________
From: [email protected] 
[[email protected]] On Behalf Of Francisco Zarabozo 
[[email protected]]
Sent: Thursday, September 23, 2010 12:29 PM
To: Active State Perl Mailing List
Subject: Best way to compare two files in Perl

Hello All,


I have thousands of files that I need to analyze with Perl and discard any
duplicates. I also need to implement a way to *not* save on disk any file
that a visitor uploads on the website in the case it's a file we already
have on disk.

So, I need to compare files and have some kind of identifiers in a database
that can help me quickly identify when a duplicate file is received (so
comparing the whole files against each file in the server in every upload is
not really an option since it could take forever). I've heard a little about
CRC and checksum (about how you can obtain a little identifier/result that
can be stored in the DB) but I'm not really sure how to use it in Perl for
file comparison, or whether that's the best way to do this.

Someone told me that CRC can sometimes make you believe it's a duplicate
when it's not (that it can give you the same result with two different
files), and I need to be 100% certain that a file is not a duplicate of
another already on the server.

Can you guys please give me some advice on how to do this and maybe point me
to the right modules?

Thanks a lot! :-)

Francisco

_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
