But if you just wanna find candidates for spell errors with one char
(miss, add or misspell one char), you

may store several version of the word in the database.

Here is one possible implementation:

For "world", we first add a special char, say '$' to the beginning of
the world, and get "$world".
Then we get $world, d$worl, ld$wor, rld$wo, orld$w, five versions by
rotating the string.
Do the same thing for all words, and store them in sorted array or
trie.

For any incorrect spelling, say "wrld", we do conversions as before
and get four versions, say, "$wrld", "d$wrl" "ld$wr" "rld$w".
Search all of them in the database for longest prefix matching.
For example, when searching for "rld$w", we get the longest matching
"rld$wo".
As you can find out, spelling errors within one char ends up with a
longest matching which only the last char differs.
The complexity is O(L^2 logM) for sorted array or O(L^2) for trie. M
is the size of th word database.



On 9月25日, 下午7时49分, mrkzea <mrk...@gmail.com> wrote:
> Hey. I wonder what method would you use to find value in the large
> database assuming that if the word in the database differs from our
> pattern only in one character we treat that as a match.
> Lets say that someone wants to find word "independent" in the database
> but he put "independext" instead. We find "independent" in the
> database and return this word to the user. I wonder how does Google
> solve this problem in its algorithm. If you search for independent,
> google gives you a hint.
>
> Regards
> Mark.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---

Reply via email to