On Fri, Dec 11, 2009 at 8:31 PM, Alexander Bochmann <[email protected]> wrote:
> find . -type f -print0 | xargs -0 -r -n 100 md5 -r > md5sums
>
> You could now just sort the md5sums file to find
> all entries with the same md5... Or sort by filename
> (will need some more logic if files are distributed
> over several subdirectories) to weed out those with
> the same name and different checksums.
>
> Alex.
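For the sort approach, something like this would report the clashes (rough sketch; it assumes md5 -r's "digest filename" output, one entry per line):

from collections import defaultdict

groups = defaultdict(list)
for line in open("md5sums"):
    digest, name = line.rstrip("\n").split(" ", 1)
    groups[digest].append(name)

# Print every checksum that appears more than once, with its filenames
for digest, names in groups.iteritems():
    if len(names) > 1:
        print digest, "->", ", ".join(names)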
I do something similar, but more elaborate, using Python to back up
redundant pics scattered across various folders into a single folder.
It would need to be modified to handle name clashes (one possible
tweak is sketched after the script):
import os
import hashlib
import shutil

already = set()                   # md5 digests of pics copied so far
dst = os.getcwd()                 # copy everything into the current dir
paths = ["/usr/local", "/home", "/storage"]

for p in paths:
    for root, dirs, files in os.walk(p):
        for f in files:
            m = hashlib.md5()
            # Get file extension
            ext = os.path.splitext(f)[1]
            try:
                # Copy JPG files
                if ext.lower() == ".jpg":
                    fp = open(os.path.join(root, f), 'rb')
                    data = fp.read()
                    fp.close()
                    m.update(data)
                    if m.hexdigest() not in already:
                        already.add(m.hexdigest())
                        print "Copying", os.path.join(root, f)
                        shutil.copyfile(os.path.join(root, f),
                                        os.path.join(dst, f))
                    else:
                        print "Already Copied!!!"
            except IOError:
                pass              # skip files we can't read
            # ... (rest of the script elided)
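For the name clashes, one possible tweak (untested sketch): fall back
to a checksum-prefixed destination name when the plain name is already
taken, i.e. replace the shutil.copyfile() call above with something like:

dest = os.path.join(dst, f)
if os.path.exists(dest):
    # Disambiguate with the first few hex digits of the checksum
    dest = os.path.join(dst, m.hexdigest()[:8] + "_" + f)
shutil.copyfile(os.path.join(root, f), dest)

Since the copy only happens for digests we haven't seen yet, two files
reaching this point with the same name are guaranteed to have different
checksums, so the prefix is enough to keep them apart.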