Re: [htdig3-dev] Htdig database backend

loic Tue, 14 Dec 1999 02:22:06 -0800
Geoff Hutchison writes:

 > (Remember that Marcel and Loic's fantastic database wizardry will go
 > to nothing if we switch to SQL...)

 I'm not sure of what you mean. Whatever happens, Berkeley DB *will*
be needed for the word database. It cannot be efficiently implemented
using an SQL database. Having an SQL database is needed for all the
persistent data structures of htdig, except for the inverted
index. Because the inverted index may be huge, it needs a carefully
crafted permanent storage and that's what htword/* + Berkeley DB +
transparent compression provides. An SQL database may contain 50
million records describing 50 million urls. It cannot contain 500
million records describing 500 million word occurrences using a
reasonable amount of space (1.5Gb with current htword/*
implementation). Besides, you can't do fetch(word), next, next with an
SQL database, which is an absolute must for the inverted index.
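
 To make the fetch(word), next, next pattern concrete, here is a minimal
sketch in Python. It is not the htword/* code and does not use the
Berkeley DB API: a sorted list stands in for the B-tree, and the names
WordIndex, insert and occurrences are made up for illustration. The
point is only that one seek followed by sequential steps returns every
occurrence of a word.

```python
import bisect


class WordIndex:
    """Toy ordered store keyed by (word, doc_id); a stand-in for a B-tree."""

    def __init__(self):
        self._keys = []  # kept sorted at all times

    def insert(self, word, doc_id):
        """Insert one word occurrence, keeping keys sorted and unique."""
        key = (word, doc_id)
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def occurrences(self, word):
        """fetch(word), then next, next ... until the word changes."""
        # Seek: (word,) sorts before every (word, doc_id) with the same word.
        i = bisect.bisect_left(self._keys, (word,))
        # Sequential scan: neighbouring keys are adjacent in storage.
        while i < len(self._keys) and self._keys[i][0] == word:
            yield self._keys[i][1]
            i += 1


index = WordIndex()
for word, doc_id in [("search", 3), ("index", 1), ("search", 1), ("sql", 2)]:
    index.insert(word, doc_id)

search_docs = list(index.occurrences("search"))
```

 Because the keys are ordered, all occurrences of a word sit next to
each other, so the scan never has to jump around the store.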

 > For what it's worth, Google, the only large-scale search engine to open up
 > details on its backend, does not use SQL.

 The question is really: what kind of software do we use to store
htdig permanent data structures? We want something that's easy to
use, fast, reliable. That's what SQL databases provide. Berkeley DB is
fast and reliable but not easy to use. You can't really design
data structures when adding a new field to a dataset requires
coding. Implementing something else is out of the question :-) While we
are at it you may want to check this:

http://www.postgresql.org/mhonarc/pgsql-general/1999-11/msg00227.html
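
 As an illustration of the "adding a new field" point, here is a small
sketch using Python's sqlite3 module. SQLite is only a convenient
stand-in for whichever SQL database would be chosen, and the table and
column names (url, location, last_modified) are hypothetical, not
htdig's schema. The new field is one declarative statement and the
existing rows stay readable.

```python
import sqlite3


def add_field_demo():
    """Create a table, add a column afterwards, and read a row back."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE url (id INTEGER PRIMARY KEY, location TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO url (location) VALUES (?)", ("http://www.senga.org/",)
    )

    # Adding a field: no record layout or serialization code to touch.
    conn.execute("ALTER TABLE url ADD COLUMN last_modified TEXT")
    conn.execute(
        "UPDATE url SET last_modified = ? WHERE id = 1", ("1999-12-14",)
    )

    row = conn.execute("SELECT location, last_modified FROM url").fetchone()
    conn.close()
    return row
```

 With a raw key/value store, the same change would mean editing the
code that packs and unpacks each record.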

        Cheers,

-- 
                Loic Dachary

                24 av Secretan
                75019 Paris
                Tel: 33 1 42 45 09 16
                e-mail: [EMAIL PROTECTED]
                URL: http://www.senga.org/


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this.