Hello!

We are considering implementing this in the 3.2 branch. The idea
is to use the dictionary you describe and take its auto-incremented
word IDs instead of the word's CRC32. CRC32 is fairly resistant to
collisions, but not completely (200 colliding pairs among 3.5 million
unique words). This scheme would also allow substring search even in
"cache mode".
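
To make the idea concrete, here is a rough sketch of a dictionary
table with auto-incremented word IDs, plus a substring lookup through
it. It is only an illustration: SQLite (via Python's sqlite3 module)
stands in for the real backend, the table and column names follow the
layout quoted below, and build_index() and substring_search() are
hypothetical helpers, not functions of the search engine.

```python
import sqlite3

# Assumed schema, modelled on the words / joiner / urls layout quoted below.
SCHEMA = """
CREATE TABLE IF NOT EXISTS words (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    word TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS urls (
    id  INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS joiner (
    word_id INTEGER NOT NULL,
    url_id  INTEGER NOT NULL,
    PRIMARY KEY (word_id, url_id)
);
"""


def build_index(conn, docs):
    """Index docs, a mapping of url -> list of words (hypothetical helper)."""
    conn.executescript(SCHEMA)
    for url, words in docs.items():
        conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
        url_id = conn.execute(
            "SELECT id FROM urls WHERE url = ?", (url,)
        ).fetchone()[0]
        for word in words:
            w = word.lower()
            # Each distinct word gets exactly one dictionary row and one ID.
            conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (w,))
            word_id = conn.execute(
                "SELECT id FROM words WHERE word = ?", (w,)
            ).fetchone()[0]
            conn.execute(
                "INSERT OR IGNORE INTO joiner (word_id, url_id) VALUES (?, ?)",
                (word_id, url_id),
            )
    conn.commit()


def substring_search(conn, fragment):
    """Return sorted URLs containing a word that includes the fragment.

    The LIKE '%...%' pattern forces a scan of the words table only.
    Wildcard characters inside the fragment are not escaped in this sketch.
    """
    pattern = "%" + fragment.lower() + "%"
    rows = conn.execute(
        """
        SELECT DISTINCT u.url
        FROM words w
        JOIN joiner j ON j.word_id = w.id
        JOIN urls u ON u.id = j.url_id
        WHERE w.word LIKE ?
        ORDER BY u.url
        """,
        (pattern,),
    )
    return [row[0] for row in rows]
```

With an index built from a page containing the word 'PIC16F84A',
substring_search(conn, '16F84') would return that page's URL, since
the match is done against the dictionary and then resolved through
the joiner table.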

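For anyone who wants to check the CRC32 collision rate on their own
word list, something along these lines should do. Again this is just
a sketch: crc32_collisions() and assign_ids() are hypothetical names,
and the hash_fn parameter exists only so that a different hash can be
swapped in for comparison.

```python
import zlib
from collections import defaultdict


def crc32_collisions(words, hash_fn=zlib.crc32):
    """Return groups of distinct words that share the same hash value."""
    buckets = defaultdict(list)
    for word in set(words):  # duplicates of the same word are not collisions
        buckets[hash_fn(word.encode("utf-8"))].append(word)
    return sorted(sorted(group) for group in buckets.values() if len(group) > 1)


def assign_ids(words):
    """Give each distinct word a sequential ID, as a dictionary table would."""
    ids = {}
    for word in words:
        if word not in ids:
            ids[word] = len(ids) + 1
    return ids
```

Sequential IDs cannot collide by construction, which is the point of
taking them from the dictionary instead of hashing the word.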

Regards!


"Vlad V. Borisov" wrote:
> 
> Hello,
> 
> I have a question about the way the database tables are organized.
> Why did you choose to store words and url_id in one table?
> Wouldn't it be better to have one more table, like this:
> 
>  words            joiner             urls
> _________      ____________      __________________________________
> id | word    word_id | url_id    id | url | status | crc32 | ...etc
>    |                 |              |     |        |       |
>    |                 |              |     |        |       |
> 
> This table structure would allow queries like this:
> 
> INSERT INTO tmp_table_#id
> SELECT id,word FROM words
> {       WHERE word LIKE 'exact1' OR word LIKE 'exact2'
>    |    WHERE word LIKE 'halfsubstring%'
>    |    WHERE word LIKE '%substring%'
> };
> 
> With the last WHERE clause a full scan of the words table would be
> performed, but there should be <= 87000 words, so it may be acceptable
> in a number of applications. You see, I'm trying to use your search
> engine for microchip technical specifications, where substring
> search capability is a must (for example, the query '16F84' would not
> return 'PIC16F84A', and even the query 'PIC16F' would not give the
> desired result).
> 
> So, my question is: is there an obvious reason for organizing the
> tables the way you have done it? Maybe I'm not taking something
> into account?
> 
> Sincerely,
> Vlad Borisov.
> Webmaster of http://www.microchip.ru