Thanks, all. Yes, Levenshtein seems to be the magic word I was looking
for.  (It's blazingly fast, too.)

I suspect that if I strip all the punctuation, etc. from both the
itemnumber and description columns, as suggested, concatenate them,
and then pair each record with its closest match in the other file,
the results ought to be pretty accurate.  Obviously, the final decision
will be up to a human being, but this should help them quite a bit.

BTW, excluding all the items that match exactly, I only have 8000 items
in one file to compare against 2600 in the other, or about 20.8 million
pairwise comparisons.  As fast as python-Levenshtein seems to be, this
should finish in well under a minute.
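The arithmetic behind that estimate: 8000 × 2600 = 20,800,000 distance
calls.  One way to sanity-check "well under a minute" before the full
run is to time a random sample of pairs and extrapolate.  A sketch,
assuming `distance` is any two-argument function such as
Levenshtein.distance from python-Levenshtein; estimate_seconds is a
hypothetical helper, not part of any library.

```python
import random
import time


def estimate_seconds(left, right, distance, sample_size=10_000, seed=0):
    """Estimate the wall-clock time for comparing every item in `left`
    with every item in `right`, by timing `sample_size` randomly chosen
    pairs and scaling up to len(left) * len(right) calls."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    total_pairs = len(left) * len(right)
    if total_pairs == 0:
        return 0.0
    rng = random.Random(seed)
    # Draw the sample first so that only the distance calls are timed.
    sample = [(rng.choice(left), rng.choice(right)) for _ in range(sample_size)]
    start = time.perf_counter()
    for a, b in sample:
        distance(a, b)
    elapsed = time.perf_counter() - start
    # Average cost per call, scaled to the full all-pairs comparison.
    return elapsed / sample_size * total_pairs
```

The measured time includes Python's loop overhead, which the real
all-pairs loop pays too.  As a worked number, a hypothetical cost of
1 microsecond per call would put 20.8 million calls at about 21 seconds.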

Thanks again.

-Steve

-- 
http://mail.python.org/mailman/listinfo/python-list