Geoff Hutchison writes:
> (Remember that Marcel and Loic's fantastic database wizardry will go
> to nothing if we switch to SQL...)
I'm not sure what you mean. Whatever happens, Berkeley DB *will*
be needed for the word database. It cannot be efficiently implemented
using an SQL database. An SQL database is needed for all the
persistent data structures of htdig, except for the inverted
index. Because the inverted index may be huge, it needs carefully
crafted permanent storage, and that's what htword/* + Berkeley DB +
transparent compression provide. An SQL database may contain 50
million records describing 50 million URLs. It cannot contain 500
million records describing 500 million word occurrences using a
reasonable amount of space (1.5Gb with the current htword/*
implementation). Besides, you can't do fetch(word), next, next with an
SQL database, which is an absolute must for the inverted index.
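
To make the fetch(word), next, next access pattern concrete, here is a
minimal sketch in Python. It does not use Berkeley DB or htword; the
SortedWordIndex class and its (word, doc_id, position) key layout are
hypothetical stand-ins for a B-tree keyed on word occurrences. The point
is only that one seek positions a cursor on the first entry for a word,
and every following entry is then read sequentially.

```python
import bisect


class SortedWordIndex:
    """Hypothetical in-memory stand-in for a B-tree of word occurrences.

    Keys are (word, doc_id, position) tuples kept in sorted order, which
    is the property a B-tree cursor relies on. This is not htword code.
    """

    def __init__(self, occurrences):
        self._keys = sorted(occurrences)

    def fetch(self, word):
        """Seek to the first key for `word`, then step forward (next, next...)."""
        # (word,) sorts before every (word, doc_id, position) tuple,
        # so bisect_left lands on the first occurrence of the word.
        i = bisect.bisect_left(self._keys, (word,))
        while i < len(self._keys) and self._keys[i][0] == word:
            yield self._keys[i]
            i += 1


index = SortedWordIndex([
    ("search", 2, 7),
    ("engine", 1, 3),
    ("search", 1, 0),
    ("index", 5, 1),
])
hits = list(index.fetch("search"))
```

With a real B-tree the seek costs one tree descent and each "next" is a
step to the adjacent entry, which is why the pattern suits an inverted
index.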
> For what it's worth, Google, the only large-scale search engine to open up
> details on its backend, does not use SQL.
The question is really: what kind of software do we use to store
htdig's permanent data structures? We want something that's easy to
use, fast, and reliable. That's what SQL databases provide. Berkeley DB
is fast and reliable but not easy to use. You can't really design
data structures when adding a new field to a dataset requires
coding. Implementing something else is out of the question :-) While we
are at it, you may want to check this:
http://www.postgresql.org/mhonarc/pgsql-general/1999-11/msg00227.html
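
As an illustration of the "adding a new field" point, here is a small
sketch using Python's built-in sqlite3 module. The url table and its
columns are made up for the example and are not htdig's schema; SQLite
is used only because it needs no server. Adding a field is one
statement, and existing rows stay readable.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE url (id INTEGER PRIMARY KEY, location TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO url (location) VALUES (?)", ("http://www.senga.org/",)
)

# Adding a new field is a schema change, not new storage code.
conn.execute("ALTER TABLE url ADD COLUMN last_modified TEXT")

# Column names after the change, and the pre-existing row.
columns = [row[1] for row in conn.execute("PRAGMA table_info(url)")]
rows = conn.execute(
    "SELECT id, location, last_modified FROM url"
).fetchall()
conn.close()
```

The row inserted before the change simply reports NULL for the new
column. With a hand-rolled record format on top of Berkeley DB, the
same change would mean touching the packing and unpacking code.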
Cheers,
--
Loic Dachary
24 av Secretan
75019 Paris
Tel: 33 1 42 45 09 16
e-mail: [EMAIL PROTECTED]
URL: http://www.senga.org/