Ben Rf wrote:
I'm new to programming and I'd like to write a program that will parse a list produced by md5summer and give me a report in a text file showing which md5 sums appear more than once and where they are located.
This should do the trick:
"""
import fileinput

md5s = {}
for line in fileinput.input():
    md5, filename = line.rstrip().split()
    md5s.setdefault(md5, []).append(filename)

for md5, filenames in md5s.iteritems():
    if len(filenames) > 1:
        print "\t".join(filenames)
"""
Put this in md5dups.py and you can then use md5dups.py [FILE]... to find duplicates in any of the files you specify. They'll then be printed out as a tab-delimited list.
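If you are on Python 3, where print is a function and dict.iteritems() no longer exists, a rough equivalent might look like the sketch below. It is only a sketch under a few assumptions that are not in the script above: that the list has one "checksum filename" pair per line, that lines starting with "#" are comments, and that a leading "*" on the filename is a binary-mode marker rather than part of the name. The names find_duplicates and report_lines are made up for illustration. It also splits only on the first run of whitespace, so filenames containing spaces stay in one piece.

```python
import fileinput
import sys


def find_duplicates(lines):
    """Group filenames by MD5 sum and keep only the sums seen more than once."""
    md5s = {}
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no checksum.
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace only, so a filename
        # that contains spaces is not broken apart.
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        md5, filename = parts
        # Assumption: a leading '*' marks binary mode and is not part of the name.
        if filename.startswith("*"):
            filename = filename[1:]
        md5s.setdefault(md5.lower(), []).append(filename)
    return {md5: names for md5, names in md5s.items() if len(names) > 1}


def report_lines(duplicates):
    """Return one tab-delimited line per duplicated sum: the sum, then its files."""
    return [md5 + "\t" + "\t".join(names)
            for md5, names in sorted(duplicates.items())]


# Only runs when at least one file is named on the command line.
if __name__ == "__main__" and sys.argv[1:]:
    for row in report_lines(find_duplicates(fileinput.input(sys.argv[1:]))):
        print(row)
```

Since the goal is a report in a text file, the simplest route is probably to redirect the output from the shell, for example "python md5dups3.py sums.md5 > report.txt" (both file names here are placeholders).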
Key things you might want to look up to understand this:
* the dict datatype
* dict.setdefault()
* dict.iteritems()
* the fileinput module
-- Michael Hoffman
-- http://mail.python.org/mailman/listinfo/python-list