Jorge del Conde wrote:
>
> Hi,
> I would like your comments and / or suggestions concerning the following:
> I will be building a Yellow Pages site, that has all the information stored
> in a database.
>
> It would be easy to search the db by
> select * where field like '%query%'
>
> but the db is very very big and that would make searching very slow!
> Can anyone suggest a better way of approaching this problem? The reason
> I sent an email to htdig was because I'm impressed with the searching speed
> of htdig, and if I were to use an algorithm like the one htdig uses, my site's
> search speed would be very impressive too!
>
> Thanks a lot!
>
> Jorge del Conde
The speed is a function of the database and indexing scheme used. ht://Dig 3
uses a hash-based database (either GDBM or Berkeley DB 2), which means that
a database lookup generally requires only 2 or 3 disk accesses, regardless of
the number of records. SQL databases almost never use a hash-based indexing
scheme because it is too restrictive (no duplicate keys allowed, no wildcards
in lookups, etc.).
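
To make the difference concrete, here is a minimal Python sketch of the idea,
not ht://Dig's actual code. It assumes the listings fit in memory and uses a
plain dict as a stand-in for an on-disk hash database such as GDBM; the sample
data and the names build_word_index, lookup and scan are made up for
illustration.

```python
from collections import defaultdict

# Hypothetical sample listings; a real site would read these from its database.
LISTINGS = {
    1: "Acme Plumbing and Heating",
    2: "Downtown Pizza Kitchen",
    3: "Acme Pizza Delivery",
}


def build_word_index(listings):
    """Map each lowercased word to the set of listing ids that contain it."""
    index = defaultdict(set)
    for listing_id, text in listings.items():
        for word in text.lower().split():
            index[word].add(listing_id)
    return index


def lookup(index, word):
    """Exact-word lookup: a single hash probe, however many listings exist."""
    return sorted(index.get(word.lower(), set()))


def scan(listings, fragment):
    """Rough equivalent of LIKE '%fragment%': every record is examined."""
    fragment = fragment.lower()
    return sorted(i for i, text in listings.items() if fragment in text.lower())


# The index is built once, ahead of time, rather than at query time.
WORD_INDEX = build_word_index(LISTINGS)
```

With this data, lookup(WORD_INDEX, "pizza") finds listings 2 and 3 with one
probe, while lookup(WORD_INDEX, "piz") finds nothing: a hash key has to match
exactly, which is the "no wildcards" restriction. scan(LISTINGS, "piz") does
find them, but only by reading every listing.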
Also, your example uses wildcards on both sides of the query. This is highly
inefficient in practically any database, because a leading wildcard usually
prevents an index from being used and forces a scan of every row. ht://Dig
uses numerous pluggable "fuzzy" search algorithms to speed up searches. Each
of these algorithms can have its own special database that is optimized for
that specific algorithm. Look at the ht://Dig documentation at
http://www.htdig.org/ for more information on all this.
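
As an illustration of a fuzzy algorithm with its own lookup table, here is a
small Python sketch that uses classic Soundex keys. It is an assumption-laden
example, not ht://Dig's implementation: the word list and the names
build_soundex_index and fuzzy_words are hypothetical, and a dict again stands
in for the algorithm's own hash database.

```python
from collections import defaultdict

# Classic Soundex digit groups; vowels and h, w, y carry no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex key of a word ('' if it has no letters)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    key = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (key + "000")[:4]


def build_soundex_index(words):
    """The algorithm's own table: Soundex key -> words that share that key."""
    index = defaultdict(set)
    for word in words:
        index[soundex(word)].add(word.lower())
    return index


def fuzzy_words(index, query):
    """Expand a query word into the indexed words that sound like it."""
    return sorted(index.get(soundex(query), set()))


# Hypothetical vocabulary taken from the indexed listings.
SOUNDEX_INDEX = build_soundex_index(["Smith", "Smyth", "Schmidt", "Jones"])
```

A query for "Smit" is turned into one key and resolved with a single hash
lookup, returning "schmidt", "smith" and "smyth". Each of those words can then
be looked up exactly in the main word index, so no wildcard scan is needed.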
--
Andrew Scherpbier <[EMAIL PROTECTED]>
Contigo Software <http://www.contigo.com/>