At 04:33 PM 11/18/2001 +0100, you wrote:
>Hello,
>
>I am building my database for the spider to fill but I have a problem.
><SNIP>
>At first I thought about indexing only the words that seem relevant but
>this way I can only make simple searches (ie : "rabbit"). Then I thought
>about indexing with the word, the previous one and the next one. This way I
>should be able to make complex searches even on more than 3 words since each
>new word can find the next one or previous one and so on. eg : the -> red ->
>rabbit -> with -> a -> big -> tail

Typically, this is handled by storing record number / word number pairs
in the index. To find a phrase, you simply look for words that share a
record number and whose word numbers are sequential.

>It seems quite a good way to do it but since I would like to avoid indexing
>"noise words" such as "the" or "a" it is not really satisfying.

If you index multiple words as you have suggested above, you are going to
have a _huge_ index size. (You probably already will anyway.) However,
if you store the words as record num / word num pairs, then you have a bit
more flexibility to play around with.

Hope this helps,
-Art

--
Art Pollard
http://www.lextek.com/
Suppliers of High Performance Text Retrieval Engines.

--
This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server commands, send "help" in the body of a message to "[EMAIL PROTECTED]".
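
For illustration, the record number / word number scheme described in the reply could be sketched roughly as below. This is only a minimal sketch, not any particular engine's implementation: the names build_index, phrase_search and NOISE_WORDS are hypothetical, the noise-word list is made up, and splitting on whitespace is an assumption standing in for a real tokenizer. Noise words are left out of the index but still consume a word number, so the remaining words keep their original positions.

```python
from collections import defaultdict

# Hypothetical noise-word list, for illustration only.
NOISE_WORDS = {"the", "a", "with"}


def build_index(records):
    """Map each non-noise word to a set of (record_num, word_num) pairs."""
    index = defaultdict(set)
    for record_num, text in enumerate(records):
        for word_num, word in enumerate(text.lower().split()):
            # Noise words are not stored, but they still consume a word
            # number, so the positions of the other words are preserved.
            if word not in NOISE_WORDS:
                index[word].add((record_num, word_num))
    return index


def phrase_search(index, phrase):
    """Return sorted record numbers where the phrase's indexed words
    occur at the same relative word numbers as in the phrase."""
    words = phrase.lower().split()
    # Remember each query word's offset in the phrase; drop noise words.
    terms = [(offset, w) for offset, w in enumerate(words)
             if w not in NOISE_WORDS]
    if not terms:
        return []
    first_offset, first_word = terms[0]
    matches = set()
    for record_num, word_num in index.get(first_word, ()):
        start = word_num - first_offset
        # Every remaining word must sit at its expected word number
        # within the same record.
        if all((record_num, start + offset) in index.get(w, ())
               for offset, w in terms[1:]):
            matches.add(record_num)
    return sorted(matches)


records = [
    "the red rabbit with a big tail",
    "a big red rabbit",
    "rabbit red",
]
index = build_index(records)
print(phrase_search(index, "red rabbit"))              # [0, 1]
print(phrase_search(index, "rabbit with a big tail"))  # [0]
```

One consequence of this sketch is that a noise word in the query only fixes the spacing between its neighbours: "rabbit with a big tail" would also match a record reading "rabbit of the big tail". Whether that is acceptable depends on the application.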