Because it's all backwards!
Why is that?
Because it's hard to read.
Why?
Please do not top post!


> Francisco Zarabozo
> I have thousands of files that I need to analyze with Perl and discard any 
> duplicates. I also need to implement a way to *not* save on disk any file 
> that a visitor uploads on the website in the case it's a file we already 
> have on disk.
> 
> So, I need to compare files and have some kind of identifiers in a database 
> that can help me quickly identify when a duplicate file is received (so 
> comparing the whole files against each file in the server in every upload is 
> not really an option since it could take forever). I've heard a little about 
> CRC and checksum (about how you can obtain a little identifier/result that 
> can be stored in the DB) but I'm not really sure how to use it in Perl for 
> file comparison and if that's the best way to do this.
> 
> Someone told me that CRC can sometimes make you believe it's a duplicate 
> when it's not (that it can give you the same result with two different 
> files), and I need to be 100% certain that a file is not a duplicate of 
> another already on the server.

From: Ken Cornetet <ken.corne...@kimball.com>
> Your requirements are impossible to fulfill.
> 
> Think about this for a minute. There are an infinite possible number
> of input files, but only a finite number of digests or checksums of
> any given fixed length. Hence, no way to make this work.

But of course his requirements can be fulfilled. Think about this for 
a minute! He's got the CRCs (or MD5/SHA/... hashes) of the old files. 
He computes the CRC/hash of the new file. From time to time he gets a 
positive match. All he has to do at that moment is to compare the new 
file with a single old one. I would not call that a huge deal.
Comparing with all of them would be too expensive, comparing with one 
is not. OK, in the extremely unlikely case that he gets a positive 
match on files whose contents are not equal, he'll then have to 
compare with two files instead of one. Still no huge deal.
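
A minimal sketch of that idea, not a finished implementation. It assumes 
SHA-256 via Digest::SHA and a byte-for-byte check via File::Compare (both 
ship with recent Perls), and it uses an in-memory hash where the real 
thing would use a database table. The names %index, file_digest and 
store_unless_duplicate are made up for illustration.

```perl
use strict;
use warnings;
use Digest::SHA;      # computes the digest
use File::Compare;    # compare() returns 0 when two files are identical

# Hypothetical stand-in for the database table:
# digest => [ paths of stored files that have this digest ]
my %index;

# Return the hex SHA-256 digest of a file, read in binary mode.
sub file_digest {
    my ($path) = @_;
    my $sha = Digest::SHA->new(256);
    $sha->addfile($path, 'b');
    return $sha->hexdigest;
}

# Return the path of an identical file we already have, or register
# $path as new and return it.
sub store_unless_duplicate {
    my ($path) = @_;
    my $digest = file_digest($path);

    # A matching digest is only a hint, so confirm byte for byte.
    # Normally this list holds one file; after a collision it holds more.
    for my $old (@{ $index{$digest} || [] }) {
        return $old if compare($path, $old) == 0;
    }

    push @{ $index{$digest} }, $path;
    return $path;
}
```

If the returned path differs from the one passed in, the upload is a 
duplicate and can be thrown away. With a database, the lookup would be a 
query on an indexed digest column instead of a hash access; the 
confirming comparison stays the same.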

Jenda
===== je...@krynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery

_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
