Thanks, all. Yes, Levenshtein seems to be the magic word I was looking for. (It's blazingly fast, too.)
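For anyone finding this thread later: the Levenshtein (edit) distance between two strings is the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. The code below is a minimal sketch of the matching approach discussed in this thread; it is not Steve's actual code. It uses python-Levenshtein's `Levenshtein.distance` when that package is installed and otherwise falls back to a plain dynamic-programming version. The function names (`normalize`, `closest_matches`), the (itemnumber, description) record layout, and the choice to also drop whitespace and lowercase are assumptions made for illustration.

```python
import string

# Prefer python-Levenshtein's C implementation; fall back to pure Python
# (same results, much slower) if the package is not installed.
try:
    from Levenshtein import distance as levenshtein
except ImportError:
    def levenshtein(a: str, b: str) -> int:
        """Minimum number of single-character insertions, deletions and
        substitutions needed to turn a into b (two-row dynamic programming)."""
        if len(a) < len(b):
            a, b = b, a  # keep the row being built as short as possible
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            current = [i]
            for j, cb in enumerate(b, start=1):
                current.append(min(
                    previous[j] + 1,               # delete ca
                    current[j - 1] + 1,            # insert cb
                    previous[j - 1] + (ca != cb),  # substitute (free if equal)
                ))
            previous = current
        return previous[-1]

# Assumption: "punctuation, etc." means punctuation plus whitespace.
_STRIP = str.maketrans("", "", string.punctuation + string.whitespace)

def normalize(itemnumber: str, description: str) -> str:
    """Concatenate the two columns, drop punctuation/whitespace, lowercase."""
    return (itemnumber + description).translate(_STRIP).lower()

def closest_matches(left, right):
    """For each (itemnumber, description) record in left, find the record in
    right whose normalized key has the smallest edit distance.
    Returns a list of (left_record, best_right_record, distance);
    on ties the first candidate in right wins."""
    keyed_right = [(normalize(*rec), rec) for rec in right]
    results = []
    for rec in left:
        key = normalize(*rec)
        best_rec, best_dist = None, None
        for other_key, other_rec in keyed_right:
            d = levenshtein(key, other_key)
            if best_dist is None or d < best_dist:
                best_rec, best_dist = other_rec, d
        results.append((rec, best_rec, best_dist))
    return results
```

For example, `closest_matches([("AB-123", "Widget, large")], [("ab123", "widget large"), ("cd457", "gadget blue")])` pairs the record with `("ab123", "widget large")` at distance 0. Comparing every record against every other is 8000 × 2600 = 20.8 million distance calls, which is why the C implementation matters here; the returned distance gives the human reviewer a rough confidence score for each pairing.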
I suspect that if I strip all the punctuation and the like from both the itemnumber and description columns, as suggested, concatenate them, and pair each record with its closest match in the other file, the results ought to be pretty accurate. Obviously, the final decision will be up to a human being, but this should help them quite a bit.

BTW, excluding all the items that match exactly, I have only 8000 items in one file to compare against 2600 in the other. As fast as python-levenshtein seems to be, this should finish in well under a minute.

Thanks again.
-Steve

-- 
http://mail.python.org/mailman/listinfo/python-list