I think the way google deals with spell error is quite complex.

But if you just wanna find candidates for spell errors with one char
(miss, add or misspell one char), you may store several version of the
word in the database.

Here is one possible implementation:

For "world", we first add a special char, say '$' to the beginning of
the world, and get "$world".
Then we get $world, d$worl, ld$wor, rld$wo, orld$w, five versions by
rotating the string.
Do the same thing for all words, and store them in sorted array or
trie.
For any incorrect spelling, say "wrld", do the same conversion and we
get four versions of the word "$wrld" "d$wrl" "ld$wr" "rld$w".
Search all of them for max prefix matching in the database.
For example, when you search for "rld$w", you can find the longest
matching "rld$wo".
As you may find out, if spell error is within one char, we will end up
with a longest macthing only the last char differs.

The complexity is O(L^2 * log M) for sorted array and O(L^2) for trie.


On 9月25日, 下午7时49分, mrkzea <mrk...@gmail.com> wrote:
> Hey. I wonder what method would you use to find value in the large
> database assuming that if the word in the database differs from our
> pattern only in one character we treat that as a match.
> Lets say that someone wants to find word "independent" in the database
> but he put "independext" instead. We find "independent" in the
> database and return this word to the user. I wonder how does Google
> solve this problem in its algorithm. If you search for independent,
> google gives you a hint.
>
> Regards
> Mark.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---

Reply via email to