On Wed, Jan 13, 2010 at 12:44:47PM +0000, Dermot wrote:

>I have a lots of PDFs that I need to catalogue and I want to ensure
>the uniqueness of each PDF. At LWP, Jonathan Rockway mentioned
>something similar with SHA1 and binary files. Am I right in thinking
>that the code below is only taking the SHA on the name of the file and
>if I want to ensure uniqueness of the content I need to do something
>similar but as a file blob?
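
Yes. You may want to be slightly cleverer about it, though: taking a SHA sum is computationally expensive because it reads the whole file, and it's only worth doing for files that have the same size as at least one other file. Two files of different sizes cannot have identical content, so a cheap size check rules most candidates out before any hashing.

If you don't require a pure-Perl solution, bear in mind that all this has been done for you in the "fdupes" program, already in Debian or at http://netdial.caribe.net/~adrian2/programs/.

Roger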
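
For the pure-Perl route, the size-then-hash idea above might look roughly like the sketch below. It is only an illustration, not the code from the original post: the name find_duplicates is made up, and it assumes the core modules Digest::SHA and File::Find are available. Files are first grouped by size, and only groups with two or more members are hashed, in binary mode, so the digest covers the file content and not the file name.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Find;

# Hypothetical helper: returns a list of array refs, each holding the
# paths of files whose contents produce the same SHA-1 digest.
sub find_duplicates {
    my @dirs = @_;
    return unless @dirs;

    # Pass 1: bucket regular files by size (a stat only, no file reads).
    my %by_size;
    find({
        no_chdir => 1,              # keep $_ as the full path
        wanted   => sub {
            return unless -f $_;
            push @{ $by_size{ -s _ } }, $_;
        },
    }, @dirs);

    # Pass 2: hash only files that share a size with another file.
    my @groups;
    for my $size (sort { $a <=> $b } keys %by_size) {
        my @same_size = @{ $by_size{$size} };
        next if @same_size < 2;     # a unique size cannot have a duplicate

        my %by_digest;
        for my $path (@same_size) {
            # addfile with mode 'b' hashes the file's bytes, not its name.
            my $digest = eval {
                Digest::SHA->new(1)->addfile($path, 'b')->hexdigest;
            };
            next unless defined $digest;    # skip unreadable files
            push @{ $by_digest{$digest} }, $path;
        }
        push @groups, grep { @$_ > 1 } values %by_digest;
    }
    return @groups;
}

# Usage: perl dupes.pl DIR [DIR ...]
for my $group (find_duplicates(@ARGV)) {
    print join("\n", sort @$group), "\n\n";
}
```

Each blank-line-separated block of output is one set of files with matching digests. If an accidental SHA-1 collision is a concern, a byte-for-byte comparison of the files within a group (for instance with File::Compare) would settle it.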