Joel Goldstick <joel.goldst...@columbuswebmakers.com> writes: > pyt...@bdurham.com wrote: >> I'm parsing a simple, domain specific scripting language that has >> commands like the following: *page, *title, *text, *footer, etc. >> There are about 100 of these '*' commands. When we see a command >> that we don't recognize, I would like to find the closest match >> possible (from a list of all legal commands) and include this >> suggestion in our diagnostic output. >> >> I'm not sure what you would call the type of algorithm I'm >> looking for: closest matching string or auto-correct? >> >> Any suggestions on algorithms or python libraries that would help >> me do what I'm looking for? >> >> Here's my short-list of ideas based on my research so far: >> - Soundex >> - Lawrence Philips' Metaphone Algorithm (from aspell?) >> - Edit Distance Algorithm >> - Levenstein Word Distance >> >> Any suggestions, comments on the above techniques, or ideas on a >> simpler algorithm for finding close string matches based on a >> list, dict, or set of possible strings? >> >> Thank you, >> Malcolm >> >> > Have you looked at difflib?
You only have a few words to compare with, so why not try this naive solution? >>> commands = ["page", "text", "footer", "header", "title"] >>> def closest(u): ... return min(commands, key=lambda v: len(set(u) ^ set(v))) ... >>> closest("paige") 'page' >>> closest("heeder") 'header' >>> closest("tilte") 'title' -- Arnaud PS: oddly, I didn't see the OP (reading newsgroup). -- http://mail.python.org/mailman/listinfo/python-list