Hello!
We are considering implementing this in the 3.2 branch. The idea
is to use the dictionary you are talking about and take its
auto-incremented word IDs instead of the word CRC32. CRC32 is fairly
stable against duplicates, but not absolutely (200 colliding pairs
among 3.5 million unique words). This scheme will also allow
substring search even in "cache mode".
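
A rough sketch of the idea in Python, only to illustrate the difference
between the two kinds of word ID. The class and function names here
(WordDictionary, word_id, find_substring, crc32_id) are made up for this
example and are not taken from the actual code:

```python
import zlib


class WordDictionary:
    """Hypothetical word dictionary that hands out auto-incremented IDs."""

    def __init__(self):
        self._ids = {}    # word -> ID
        self._words = []  # position i holds the word whose ID is i + 1

    def word_id(self, word):
        """Return the ID of a word, assigning the next free ID if it is new."""
        wid = self._ids.get(word)
        if wid is None:
            self._words.append(word)
            wid = len(self._words)
            self._ids[word] = wid
        return wid

    def find_substring(self, fragment):
        """Scan the whole dictionary and return IDs of words containing fragment."""
        return [i + 1 for i, w in enumerate(self._words) if fragment in w]


def crc32_id(word):
    """Word ID derived from CRC32: two different words may share one value."""
    return zlib.crc32(word.encode("utf-8")) & 0xFFFFFFFF


if __name__ == "__main__":
    d = WordDictionary()
    for w in ("PIC16F84A", "PIC16F877", "PIC16F84A"):
        print(w, d.word_id(w), hex(crc32_id(w)))
    print(d.find_substring("16F84"))
```

Dictionary IDs are unique by construction, and because the words
themselves are kept, a substring can first be resolved to a list of
word IDs. With CRC32 alone the original word cannot be recovered from
the ID, so there is nothing to scan.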
Regards!
"Vlad V. Borisov" wrote:
>
> Hello,
>
> I have a question about the way the database tables are organized.
> Why did you choose to store words and url_id in one table?
> Wouldn't it be better to have one more table? Like this:
>
> words          joiner              urls
> _________      ________________    __________________________________
> id | word      word_id | url_id    id | url | status | crc32 | ...etc
>    |                   |              |     |        |       |
>    |                   |              |     |        |       |
>
> This table structure would allow queries like this:
>
> INSERT INTO tmp_table_#id
>   SELECT id, word FROM words
>   { WHERE word LIKE 'exact1' OR word LIKE 'exact2'
>   | WHERE word LIKE 'halfsubstring%'
>   | WHERE word LIKE '%substring%'
>   };
>
> In the last WHERE clause a full scan through the keywords table would
> be performed, but there should be <= 87000 words, so it may be
> acceptable in a number of applications. You see, I'm trying to use
> your search engine for microchip technical specifications, where
> substring search capability is a must (for example, the query '16F84'
> would not return 'PIC16F84A', and even the query 'PIC16F' would not
> give the desired result).
>
> So, my question is: is there an obvious reason for organizing the
> tables the way you have done it? Maybe I'm overlooking something?
>
> Sincerely,
> Vlad Borisov.
> Webmaster of http://www.microchip.ru
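
For illustration, the three-table layout proposed above can be tried
out with Python's sqlite3 module. This is only a sketch under
assumptions: the table and column names follow the diagram in the
quoted message, the status and crc32 columns of urls are left out,
the helper names (build_index, search) and the example URLs are
invented, and SQLite stands in for whatever SQL backend is really used:

```python
import sqlite3


def build_index(conn, pages):
    """Create the words/joiner/urls tables and load {url: [words]} into them."""
    conn.executescript("""
        CREATE TABLE words  (id INTEGER PRIMARY KEY AUTOINCREMENT,
                             word TEXT UNIQUE NOT NULL);
        CREATE TABLE urls   (id INTEGER PRIMARY KEY AUTOINCREMENT,
                             url TEXT UNIQUE NOT NULL);
        CREATE TABLE joiner (word_id INTEGER NOT NULL REFERENCES words(id),
                             url_id  INTEGER NOT NULL REFERENCES urls(id),
                             PRIMARY KEY (word_id, url_id));
    """)
    for url, words in pages.items():
        conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
        url_id = conn.execute(
            "SELECT id FROM urls WHERE url = ?", (url,)).fetchone()[0]
        for word in words:
            conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
            word_id = conn.execute(
                "SELECT id FROM words WHERE word = ?", (word,)).fetchone()[0]
            conn.execute(
                "INSERT OR IGNORE INTO joiner VALUES (?, ?)", (word_id, url_id))


def search(conn, pattern):
    """Return URLs that contain at least one word matching the LIKE pattern."""
    rows = conn.execute("""
        SELECT DISTINCT urls.url
        FROM words
        JOIN joiner ON joiner.word_id = words.id
        JOIN urls   ON urls.id = joiner.url_id
        WHERE words.word LIKE ?
        ORDER BY urls.url
    """, (pattern,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, {
        "http://example.com/pic16f84a": ["PIC16F84A", "datasheet"],  # made-up URL
        "http://example.com/pic16f877": ["PIC16F877", "datasheet"],  # made-up URL
    })
    print(search(conn, "%16F84%"))  # substring match, scans the words table
    print(search(conn, "PIC16F%"))  # prefix match
```

A pattern with a leading '%' forces a scan of the words table, but that
table holds each distinct word only once, which is the point made in
the quoted message. Note that LIKE in SQLite is case-insensitive for
ASCII letters by default; other backends may behave differently.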