I'm not exactly sure what you are looking for, but we had a similar
problem.  We used to do a full index of our entire site every night,
but as you say, that took up too many resources.  We now do a full
index once a week and an incremental index every night.  To do a full
index, run htdig with the -i option.  To do an incremental index, run
htdig without the -i option.
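
The weekly-full / nightly-incremental schedule could be sketched roughly
as below.  This is only an illustration, not our actual setup: the
function name htdig_command and the choice of Sunday for the full index
are assumptions, and the sketch only builds the htdig command line.  Any
other steps of your indexing run are left out.

```python
import datetime


def htdig_command(today, full_index_weekday=6):
    """Build the htdig command line for a nightly run on the given date.

    full_index_weekday follows datetime.date.weekday() numbering
    (Monday is 0, Sunday is 6). Sunday is an arbitrary, assumed default.
    """
    command = ["htdig"]
    if today.weekday() == full_index_weekday:
        # Full index: -i makes htdig start from scratch.
        command.append("-i")
    # Otherwise no -i, so htdig does an incremental index.
    return command
```

A nightly cron job could call htdig_command(datetime.date.today()) and
pass the resulting list to subprocess.run.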


Malki Cymbalista
Webmaster, Weizmann Institute of Science
Rehovot, Israel 76100
Internet: [EMAIL PROTECTED]
08-934-3036

>>> Manuel Lemos <[EMAIL PROTECTED]> 01/11/2004 19:29:26 >>>
Hello,


I have been using htdig for years to crawl a site that now has over
30,000 pages. Since many of the pages may change, I have been
reindexing the whole site on a daily basis.

However, this lazy indexing approach is taking too many resources.
Therefore I am looking into a better approach: keeping a list of only
the pages that have changed and reindexing just those pages on a much
shorter cycle than the one I use now.

My question is: how can I reindex just a few pages at once and merge
the crawled pages with a previously indexed site database? I mean,
index only the few pages that I list and follow links only to site
pages that have not yet been indexed.

-- 

Regards,
Manuel Lemos

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/ 

PHP Reviews - Reviews of PHP books and other products
http://www.phpclasses.org/reviews/ 

Metastorage - Data object relational mapping layer generator
http://www.meta-language.net/metastorage.html 


-------------------------------------------------------
This SF.Net email is sponsored by:
Sybase ASE Linux Express Edition - download now for FREE
LinuxWorld Reader's Choice Award Winner for best database on Linux.
http://ads.osdn.com/?ad_id=5588&alloc_id=12065&op=click 
_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html 
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general

