On Fri, 26 Sep 2003 17:08:02 +0200 (CEST) Robert Vazan <[EMAIL PROTECTED]> wrote:

> Would it be possible to reduce address book searches down to logarithmic
> algorithm range? 

Yes, probably.

> I know that exact match can be done with hash 

Yes.

> and right-hand extension can be done with binary search, 

You mean searching for "prefix*"? Then yes.

> but is it possible to do fast substring search with little memory
> overhead?

"little", I don't know.

[Disclaimer: I just invented the following and I have no idea about how
this is actually done by, say, a search engine. I guess this (indexing
algorithms) is where to look for some more information: if Google can
answer a complex query in less than one second, how is it that it takes
us the same time to do it a just slightly little data set :)]

But one idea could be to build an index of all the words in the
searchable parts of the address book. You'd end up with a binary tree in
which the data for nodes and leaves are lists of relevant entries.

Suppose I have the following list of entries (I use only the address
here):

1. [EMAIL PROTECTED]
2. [EMAIL PROTECTED]
3. [EMAIL PROTECTED]

We would then decompose it in a list of words

  foo free fr edu com bar

but we would then have to break this list further so that one can look
starting from the middle of a word

  foo oo o free ree re e edu du u com om m bar ar r

Ordering it gives

  ar bar com du e edu foo free m o om oo r re ree u

The tree could be built as

   |
   a(3) -- r(3)
   |
   b(3) -- a(3) -- r(3)
   |
   c(2,3) -- o(2,3) -- m(2,3)
   |
   d(2,3) -- u(2,3)
   |
   e(1,2,3) -- d(2,3) -- u(2,3)
   |
   f(1,2) -- o(1,2) -- o(1,2)
   |      |
   |      |- r(1) -- e(1) -- e(1)
   |
   ...


Suppose you want to do a search on "fo", you would start at the 'f', but
see that the next letter in the pattern is also in the tree and go to
the o(1,2) node below, which tells you that entries 1 and 2 have 'fo' as
a substring.

This is actually a logarithmic search, but on a huge tree... Does this
qualify as "little"? With respect to the updating of the tree when data
change, this may not be overly complicated, but I leave this as an
exercise for the reader...

-- 
Xavier Nodet
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety." - Benjamin Franklin, 1759.

Attachment: pgp00000.pgp
Description: PGP signature

Reply via email to