Because it's all backwards! Why is that? Because it's hard to read. Why? Please do not top post!
> Francisco Zarabozo
> I have thousands of files that I need to analyze with Perl and discard any
> duplicates. I also need to implement a way to *not* save on disk any file
> that a visitor uploads on the website in case it's a file we already
> have on disk.
>
> So, I need to compare files and have some kind of identifiers in a database
> that can help me quickly identify when a duplicate file is received (so
> comparing the whole file against each file on the server in every upload is
> not really an option since it could take forever). I've heard a little about
> CRC and checksums (about how you can obtain a little identifier/result that
> can be stored in the DB) but I'm not really sure how to use them in Perl for
> file comparison and if that's the best way to do this.
>
> Someone told me that CRC can sometimes make you believe it's a duplicate
> when it's not (that it can give you the same result with two different
> files), and I need to be 100% certain that a file is not a duplicate of
> another already on the server.

From: Ken Cornetet <ken.corne...@kimball.com>
> Your requirements are impossible to fulfill.
>
> Think about this for a minute. There is an infinite number of possible
> input files, but only a finite number of digests or checksums of
> any given fixed length. Hence, no way to make this work.

But of course his requirements can be fulfilled. Think about this for a minute!

He's got the CRCs (or MD5/SHA/... hashes) of the old files. He computes the
CRC/hash of the new file. From time to time he gets a positive match, and only
then does he have to compare the new file, byte by byte, with the single old
file that has the same hash. I would not call that a huge deal: comparing with
all of the stored files would be too expensive, comparing with one is not.

OK, in the extremely unlikely case that he gets a positive match on files whose
contents are not equal, he'll then have to compare with two files instead of
one. Still no huge deal.
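A minimal Perl sketch of the approach above, assuming the core modules Digest::SHA and File::Compare are available. It swaps in SHA-256 for CRC, since any digest works the same way here and SHA-256 collides far less often; the full byte comparison is what actually guarantees correctness. The %index hash is a hypothetical in-memory stand-in for the database table, and file_digest, find_duplicate and remember_file are illustrative names, not anything from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA;                  # core module since Perl 5.10
use File::Compare qw(compare);    # core module; compare() returns 0 if equal

# Hypothetical in-memory stand-in for the database table:
# maps a hex SHA-256 digest to the list of kept files with that digest.
my %index;

# Hex SHA-256 digest of a file's contents, read in binary mode ('b'),
# which matters on Windows where text mode would translate CRLF.
sub file_digest {
    my ($path) = @_;
    my $sha = Digest::SHA->new(256);
    $sha->addfile($path, 'b');
    return $sha->hexdigest;
}

# Return the path of a kept file that is byte-for-byte identical to $path,
# or undef if there is none. Only files sharing the digest are compared,
# so a (rare) digest collision costs an extra comparison, never a false match.
sub find_duplicate {
    my ($path) = @_;
    my $digest = file_digest($path);
    for my $candidate (@{ $index{$digest} || [] }) {
        # compare() returns 0 (equal), 1 (different) or -1 (error);
        # anything but 0 is treated as "not a duplicate".
        return $candidate if compare($path, $candidate) == 0;
    }
    return;
}

# Record a kept file under its digest (in real code, an INSERT into the DB).
sub remember_file {
    my ($path) = @_;
    push @{ $index{ file_digest($path) } }, $path;
}

# Example driver: report which command-line files duplicate an earlier one.
for my $path (@ARGV) {
    if (defined(my $dup = find_duplicate($path))) {
        print "$path duplicates $dup\n";
    }
    else {
        remember_file($path);
        print "$path is new\n";
    }
}
```

For the upload case, the same find_duplicate check would run on the uploaded temporary file before it is moved into permanent storage, with the digest lookup and insert going against the database instead of %index.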
Jenda
===== je...@krynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
	-- Terry Pratchett in Sourcery

_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs