Torsten Neuer writes:
 > 
 > It is not quite that easy ;-)
 > 
 > To have a good working SQL database backend, you should not adopt
 > the techniques used for DB2/DBM.  The main differences are in the
 > connection to the database backend (there is no database file, but
 > a database server) and in the database structure.

 I'd add that the whole logic is different. With DB, each index is a
separate DB file that the application has to keep up to date itself.
With SQL, an index is declared once (CREATE INDEX) and the server
updates it automatically. Simulating that automatic indexing with DB
files is possible, but it requires some non-trivial reworking of the
current DB class.
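 To make the difference concrete, here is a minimal sketch in Python. It
assumes SQLite (via the standard sqlite3 module) as a stand-in for MySQL,
and plain dicts as stand-ins for separate DB files. KeyValueUrlStore is a
hypothetical class, and the table and column names are borrowed from the
example query in this message. The SQL side declares the index once; the
key/value side has to update its secondary index on every write,
including removing stale entries.

```python
import sqlite3

# SQL side: the index is declared once and the database maintains it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE url_table (url TEXT PRIMARY KEY,"
    " return_code INTEGER, document_size INTEGER)"
)
conn.execute("CREATE INDEX idx_return_code ON url_table (return_code)")
conn.execute("INSERT INTO url_table VALUES (?, ?, ?)", ("http://a/", 404, 1500))
conn.commit()
# Nothing else to do: the INSERT above also updated idx_return_code.


class KeyValueUrlStore:
    """Hypothetical DB-file style store: each index is its own key/value map."""

    def __init__(self):
        self.by_url = {}   # primary "file": url -> (return_code, document_size)
        self.by_code = {}  # secondary "file": return_code -> set of urls

    def put(self, url, return_code, document_size):
        old = self.by_url.get(url)
        if old is not None:
            # The application must drop the stale index entry by hand.
            self.by_code[old[0]].discard(url)
        self.by_url[url] = (return_code, document_size)
        self.by_code.setdefault(return_code, set()).add(url)

    def urls_with_code(self, return_code):
        return sorted(self.by_code.get(return_code, set()))
```

 Calling put() twice for the same URL with different return codes only
stays consistent because put() removes the old by_code entry; that is
exactly the bookkeeping an SQL server does for you.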
 In addition, the query capabilities of an SQL database can hardly be
emulated with DB files. Can you imagine how to resolve 'select url
from url_table where return_code = 404 and document_size < 2000' with
the current DB files? Without an index on those fields you would have
to read every record and filter it yourself.
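 A minimal sketch of that query both ways, again assuming SQLite as a
stand-in for MySQL and a dict (url -> (return_code, document_size)) as a
stand-in for a single DB file keyed by URL. The sample rows and function
names are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (url, return_code, document_size)
ROWS = [
    ("http://example.com/a", 404, 1200),
    ("http://example.com/b", 404, 5000),
    ("http://example.com/c", 200, 800),
]


def query_sql(rows):
    """Let the database evaluate the WHERE clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE url_table (url TEXT PRIMARY KEY,"
        " return_code INTEGER, document_size INTEGER)"
    )
    conn.executemany("INSERT INTO url_table VALUES (?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT url FROM url_table WHERE return_code = 404 AND document_size < 2000"
    )
    result = sorted(row[0] for row in cur)
    conn.close()
    return result


def query_key_value(store):
    """DB-file style: no secondary index, so every record is read and tested."""
    return sorted(
        url for url, (code, size) in store.items() if code == 404 and size < 2000
    )
```

 Both return the same answer, but the key/value version is a full scan
every time, and adding an index for each new combination of fields
means yet another file to maintain by hand.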
 What I'd like to point out is that, IMHO, moving ht://Dig to SQL cannot
preserve compatibility with DB files unless you give up all the
advantages of using an SQL database.

 > From my point of view, a proper SQL backend needs a rewrite of the
 > non-SQL code (to get a portable interface to non-SQL databases),
 > i.e. a new high-level interface that matches the special needs and
 > features of both SQL and non-SQL databases. Many things currently
 > done by ht://Dig itself could then (whenever SQL is used as a backend)
 > be transferred to the SQL server.
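 One possible shape for such an interface, sketched in Python under the
same assumptions (SQLite standing in for MySQL, a dict standing in for a
DB file). UrlDatabase, find_urls and both backend classes are
hypothetical names, not existing ht://Dig classes. Callers ask one
question; each backend decides whether to push the filtering to the
server or do it itself.

```python
from __future__ import annotations

import sqlite3
from abc import ABC, abstractmethod


class UrlDatabase(ABC):
    """Hypothetical high-level interface shared by SQL and non-SQL backends."""

    @abstractmethod
    def add(self, url: str, return_code: int, document_size: int) -> None: ...

    @abstractmethod
    def find_urls(self, return_code: int, max_size: int) -> list[str]: ...


class SqlUrlDatabase(UrlDatabase):
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS url_table (url TEXT PRIMARY KEY,"
            " return_code INTEGER, document_size INTEGER)"
        )

    def add(self, url, return_code, document_size):
        self.conn.execute(
            "INSERT OR REPLACE INTO url_table VALUES (?, ?, ?)",
            (url, return_code, document_size),
        )

    def find_urls(self, return_code, max_size):
        # The filtering is delegated to the database server.
        cur = self.conn.execute(
            "SELECT url FROM url_table"
            " WHERE return_code = ? AND document_size < ? ORDER BY url",
            (return_code, max_size),
        )
        return [row[0] for row in cur]


class KeyValueUrlDatabase(UrlDatabase):
    def __init__(self):
        self.records = {}  # stands in for a DB file: url -> (code, size)

    def add(self, url, return_code, document_size):
        self.records[url] = (return_code, document_size)

    def find_urls(self, return_code, max_size):
        # Fallback for DB files: the application scans and filters itself.
        return sorted(
            url
            for url, (code, size) in self.records.items()
            if code == return_code and size < max_size
        )
```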

 If you want to see an example of a crawler using MySQL, check
http://www.senga.org/webbase/html/. This crawler is not everything it
should be, but it definitely works well and is able to handle millions
of URLs.

    Cheers,

-- 
                Loic Dachary

                ECILA
                100 av. du Gal Leclerc
                93500 Pantin - France
                Tel: 33 1 56 96 10 85
                e-mail: [EMAIL PROTECTED]
                URL: http://www.senga.org/


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
