On Fri, Dec 11, 2009 at 8:31 PM, Alexander Bochmann <[email protected]> wrote:

> find . -type f -print0 | xargs -0 -r -n 100 md5 -r > md5sums
>
> You could now just sort the md5sums file to find
> all entries with the same md5... Or sort by filename
> (will need some more logic if files are distributed
> over several subdirectories) to weed out those with
> the same name and different checksums.
>
> Alex.
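The sort-the-md5sums-file idea in the quote can also be done in a few lines of Python. A minimal sketch (written for modern Python), assuming lines in the `md5 -r` style of "<digest> <path>"; `find_dupes` is a hypothetical helper name, not something from Alex's mail:

```python
def find_dupes(lines):
    """Group "<digest> <path>" lines by checksum; keep only duplicates."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # split on the first run of whitespace: digest, then the path
        digest, path = line.split(None, 1)
        groups.setdefault(digest, []).append(path)
    # only checksums that appear for more than one path are duplicates
    return {d: ps for d, ps in groups.items() if len(ps) > 1}
```

Feed it the contents of the md5sums file (`find_dupes(open("md5sums"))`) and it returns a dict mapping each duplicated checksum to the list of paths sharing it.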

I do something similar, but more elaborate, using Python to back up
redundant pics scattered in various folders into one folder... it
would need to be modified to handle name clashes:

import hashlib
import os
import shutil

already = []          # md5 digests of files copied so far
dst = os.getcwd()

paths = ["/usr/local", "/home", "/storage"]

for p in paths:

    for root, dirs, files in os.walk(p):
        for f in files:

            src = os.path.join(root, f)

            # Get file extension
            ext = os.path.splitext(src)[1]

            try:

                # Copy JPG files not already seen
                if ext.lower() == ".jpg":
                    fp = open(src, 'rb')
                    data = fp.read()
                    fp.close()
                    m = hashlib.md5()
                    m.update(data)
                    if m.hexdigest() not in already:
                        already.append(m.hexdigest())
                        print "Copying", src
                        shutil.copyfile(src, os.path.join(dst, f))
                    else:
                        print "Already Copied!!!"
...
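For the name clashes mentioned above, one option is to derive a unique destination filename before the copy. A minimal sketch (modern Python); `unique_name` is a hypothetical helper, not part of the original script:

```python
import os

def unique_name(name, taken):
    """Return name unchanged, or name_1.jpg, name_2.jpg, ... if taken.

    `taken` is the set of filenames already present in the destination.
    """
    if name not in taken:
        return name
    stem, ext = os.path.splitext(name)
    n = 1
    # bump the numeric suffix until we find a free name
    while "%s_%d%s" % (stem, n, ext) in taken:
        n += 1
    return "%s_%d%s" % (stem, n, ext)
```

In the loop above, the copy would then become something like `shutil.copyfile(src, os.path.join(dst, unique_name(f, set(os.listdir(dst)))))` (or keep a running set of used names to avoid re-listing the directory each time).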
