I have a side project that needs to "intelligently" know if two strings are contextually similar. Think about how CDDB information is collected and sorted. It isn't perfect, but there should be enough information to be usable.
Think about this:

"pink floyd - dark side of the moon - money"
"dark side of the moon - pink floyd - money"
"money - dark side of the moon - pink floyd"
etc.

To a human, these strings are almost identical. Similarly:

"dark floyd of money moon pink side the"

is a puzzle to be solved by 13-year-old children before the movie starts.

My post has three questions:

(1) Does anyone know of an efficient and numerically quantified method of detecting these sorts of things? I currently have a fairly inefficient and numerically bogus solution that may be the only non-impossible solution for the problem.

(2) Does anyone see a need for this feature in PostgreSQL? If so, what kind of interface would be best accepted as a patch? I am currently returning a match likelihood between 0 and 100.

(3) Is there also a desire for a Levenshtein distance function for text and varchar values? I experimented with it, and was forced to write the function in item #1.
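
To make questions (1) and (3) concrete, here is a rough Python sketch of two scoring approaches. It is not the solution mentioned in (1), only an illustration. The function names `tokens`, `token_similarity` and `levenshtein` are hypothetical, and the choice of a Jaccard overlap on word sets is an assumption made for the example. The first function scores two strings from 0 to 100 while ignoring word order; the second is the standard dynamic-programming edit distance.

```python
import re


def tokens(s: str) -> set[str]:
    """Lowercase the string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def token_similarity(a: str, b: str) -> int:
    """Order-insensitive score in 0..100 (assumed metric: Jaccard on word sets)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0  # neither string contains a word; nothing to compare
    return round(100 * len(ta & tb) / len(union))


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


if __name__ == "__main__":
    x = "pink floyd - dark side of the moon - money"
    y = "money - dark side of the moon - pink floyd"
    print(token_similarity(x, y), levenshtein(x, y))
```

A word-set score like this treats all three reorderings as identical, but it also gives the scrambled "dark floyd of money moon pink side the" a perfect score, so it cannot tell a sensible reordering from a word jumble. Plain Levenshtein distance has the opposite weakness: it is sensitive to character order, so whole fields swapping places look like many edits.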