Hi,

  I am planning a very large search engine. I've spent some time reading
the archive and I found some suggestions on how to do this. The word
indexing method is a very interesting alternative to slow "...where like
'%foo%';" queries.

  There is from 100k to 500k documents to index, each are about 10
KBytes large. Plain text.

  The search engine will allow complex boolean queries (AND, OR, NEAR,
NOT).

  I have 2 plans in mind, I'd like to have opinions on what's would be
the most efficient.

First Way:
We populate 2 tables:
one containing the documents (text and ID)
one containing all the words and the documents ID containing each word

The idea is to first filter the documents and then to perform a query in
the documents that contains at least one of the words, so we're supposed
to get a decent speed.

I know this is used by many people and I know it gives good results,
even when searching through 100k+ documents. Although, I am wondering if
there is not a way to do it without any use of LIKE statements. I really
don't know if what I have in mind is a good idea, it may be completely
stupid and inefficient, I have very little DB experiences.

Anyways, what if in the table containing the words and the matching
document ID's, we also specify where in each documents the word is
located.

ex:
WORD|   DOCS     |   LOCATIONS
sun | 32;45;1302 | 3 ; 554,1022 ; 76,675,3445

So word sun is the third word of doc 32, the 554th and 1022th word doc
45, etc..

Then the search script will do all the job without sending any other
queries. May get quite complicated but it should work, it may also be
easier to process sun NEAR star (maybe this is easy to do with LIKE too,
I don't know, but I saw nothing in the docs.)


Thank you,

Cedric Veilleux


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to