On Fri, 26 Sep 2003 17:08:02 +0200 (CEST) Robert Vazan <[EMAIL PROTECTED]> wrote:
> Would it be possible to reduce address book searches down to logarithmic > algorithm range? Yes, probably. > I know that exact match can be done with hash Yes. > and right-hand extension can be done with binary search, You mean searching for "prefix*"? Then yes. > but is it possible to do fast substring search with little memory > overhead? "little", I don't know. [Disclaimer: I just invented the following and I have no idea about how this is actually done by, say, a search engine. I guess this (indexing algorithms) is where to look for some more information: if Google can answer a complex query in less than one second, how is it that it takes us the same time to do it a just slightly little data set :)] But one idea could be to build an index of all the words in the searchable parts of the address book. You'd end up with a binary tree in which the data for nodes and leaves are lists of relevant entries. Suppose I have the following list of entries (I use only the address here): 1. [EMAIL PROTECTED] 2. [EMAIL PROTECTED] 3. [EMAIL PROTECTED] We would then decompose it in a list of words foo free fr edu com bar but we would then have to break this list further so that one can look starting from the middle of a word foo oo o free ree re e edu du u com om m bar ar r Ordering it gives ar bar com du e edu foo free m o om oo r re ree u The tree could be built as | a(3) -- r(3) | b(3) -- a(3) -- r(3) | c(2,3) -- o(2,3) -- m(2,3) | d(2,3) -- u(2,3) | e(1,2,3) -- d(2,3) -- u(2,3) | f(1,2) -- o(1,2) -- o(1,2) | | | |- r(1) -- e(1) -- e(1) | ... Suppose you want to do a search on "fo", you would start at the 'f', but see that the next letter in the pattern is also in the tree and go to the o(1,2) node below, which tells you that entries 1 and 2 have 'fo' as a substring. This is actually a logarithmic search, but on a huge tree... Does this qualify as "little"? With respect to the updating of the tree when data change, this may not be overly complicated, but I leave this as an exercise for the reader... -- Xavier Nodet "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin, 1759.
pgp00000.pgp
Description: PGP signature
