Ben Rf wrote:

I'm new to programming and I'd like to write a program that will parse
a list produced by md5summer and give me a report in a text file on
which md5 sums appear more than once and where they are located.

This should do the trick:

"""
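import fileinput

# Map each md5 sum to the list of files that have it.
md5s = {}
for line in fileinput.input():
    line = line.strip()
    # Skip blank lines and "#" comment lines, if the list has any.
    if not line or line.startswith("#"):
        continue
    # Split on the first run of whitespace only, so that file names
    # containing spaces stay intact.
    md5, filename = line.split(None, 1)
    md5s.setdefault(md5, []).append(filename)

# Print each group of files sharing an md5 sum on one tab-separated line.
for md5, filenames in md5s.iteritems():
    if len(filenames) > 1:
        print "\t".join(filenames)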
"""

Put this in md5dups.py and you can then run
md5dups.py [FILE]... to find duplicates in any of the files you
specify (with no arguments it reads standard input). Each group of
files sharing an md5 sum is printed on one line as a tab-delimited
list.
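
Note that the script above is Python 2: the print statement and
dict.iteritems() do not exist in Python 3. As a rough sketch only, the
same grouping might look like this in Python 3. The function name
find_duplicates and the sample lines are made up for illustration, and
the input format (md5 sum, whitespace, file name) is an assumption
about your list:

```python
def find_duplicates(lines):
    """Return {md5: [filenames]} for md5 sums that appear more than once."""
    md5s = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed format: "<md5> <filename>"; split once so that spaces
        # in the file name are kept.
        md5, filename = line.split(None, 1)
        md5s.setdefault(md5, []).append(filename)
    # Keep only the sums that were seen with more than one file.
    return {md5: names for md5, names in md5s.items() if len(names) > 1}


# Hypothetical sample data, not real md5summer output.
sample = [
    "aaa111 first.txt",
    "bbb222 second.txt",
    "aaa111 copy of first.txt",
]
for md5, names in find_duplicates(sample).items():
    # Print the sum first, then every file that has it.
    print(md5 + "\t" + "\t".join(names))
```

Printing the md5 sum at the start of each line also shows which sum
each group of files shares.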

Key things you might want to look up to understand this:

* the dict datatype
* dict.setdefault()
* dict.iteritems()
* the fileinput module
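
To see what dict.setdefault() is doing in the loop, here is a small
sketch (the keys and values are made up) that compares it with the
longhand way of building the same dict of lists:

```python
pairs = [("a", 1), ("b", 2), ("a", 3)]  # made-up (key, value) data

# Longhand: create the empty list the first time a key is seen.
longhand = {}
for key, value in pairs:
    if key not in longhand:
        longhand[key] = []
    longhand[key].append(value)

# setdefault() returns the value already stored under the key; if the
# key is missing, it stores the default ([]) first and returns that.
short = {}
for key, value in pairs:
    short.setdefault(key, []).append(value)

print(short == longhand)
```

Both loops end up with the same dict; setdefault() just folds the
"is the key there yet?" check into one call.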
--
Michael Hoffman
--
http://mail.python.org/mailman/listinfo/python-list
