> I would state this as "document records," because I assume that's what
> you mean--I can't come up with other "permanent data structures." This
> certainly limits the scope of changes, mostly to inside DocumentDB.cc
> and below.

I think URL state descriptions, robots.txt contents and cookies are all
candidates for storage on disk. One *very* interesting feature would be
a restartable crawler: run htdig, interrupt it with ^C, then run htdig
again and have it resume where it stopped. Once the state of your
crawler is stored in a database, you get that for free.
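A minimal sketch of the restartable-crawler idea, in C++ since that is what htdig is written in. Everything here is hypothetical: CrawlState, saveState, loadState, crawl and the fetch callback are not htdig code, and a plain text file stands in for the database. The crawler checkpoints its frontier (URLs still to fetch) and its visited set after each page. After an interruption, a restart reloads the file and carries on from where it stopped.

```cpp
#include <cassert>
#include <deque>
#include <fstream>
#include <set>
#include <string>
#include <vector>

// Hypothetical list of links returned by a fetch callback.
using LinkList = std::vector<std::string>;

// Hypothetical crawler state (not htdig's actual structures):
// URLs still to fetch, plus URLs already processed.
struct CrawlState {
    std::deque<std::string> frontier;
    std::set<std::string> visited;
};

// Write the whole state to a text file, a stand-in for a real database.
// Each line is "V <url>" for a visited URL or "F <url>" for a pending one.
bool saveState(const CrawlState& s, const std::string& path) {
    std::ofstream out(path, std::ios::trunc);
    if (!out) return false;
    for (const auto& url : s.visited) out << "V " << url << '\n';
    for (const auto& url : s.frontier) out << "F " << url << '\n';
    return static_cast<bool>(out);
}

// Reload a saved state. If the file does not exist, this returns an empty
// state, i.e. a fresh crawl.
CrawlState loadState(const std::string& path) {
    CrawlState s;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (line.size() < 3) continue;  // skip malformed lines
        std::string url = line.substr(2);
        if (line[0] == 'V') s.visited.insert(url);
        else if (line[0] == 'F') s.frontier.push_back(url);
    }
    return s;
}

// Process up to `budget` URLs, checkpointing after each page so that an
// interruption (e.g. ^C) loses at most the page in progress.
// `fetch` is a hypothetical callback returning the links found on a page.
template <typename Fetch>
void crawl(CrawlState& s, const std::string& path, int budget, Fetch fetch) {
    assert(budget >= 0);
    while (budget-- > 0 && !s.frontier.empty()) {
        std::string url = s.frontier.front();
        s.frontier.pop_front();
        if (!s.visited.insert(url).second) continue;  // already processed
        for (const auto& link : fetch(url))
            if (!s.visited.count(link)) s.frontier.push_back(link);
        saveState(s, path);  // checkpoint after every page
    }
}
```

Checkpointing after every page is the simplest choice. A real implementation would more likely batch writes or rely on a transactional database, so that a crash in the middle of a write cannot corrupt the checkpoint.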

-- 
                Loic Dachary

                24 av Secretan
                75019 Paris
                Tel: 33 1 42 45 09 16
                e-mail: [EMAIL PROTECTED]
                URL: http://www.senga.org/


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this. 
