I've just installed htdig. In our situation we'll need to index multiple 
domains in such a way that htsearch can access a "combined" version, so 
that a keyword search will locate results from any of the domains.  

I was hoping that I could use a single database, run htdig against one (or 
relatively few) URLs at a time, and thus "stagger" the process of re-indexing 
the database.  

At least as I've been running it, however, htdig appears to be "re-checking" 
every URL that is already in the database, presumably to determine whether 
any of them have changed. I can see the rationale for this, but it will 
result in a substantial (and very possibly unacceptable) increase in workload.  

Is there any way to prevent this re-checking behavior?  

Whether or not there is, I have been unable to locate any clear documentation 
concerning file handling. Specifically:
A.  Which data-input files are mandatory, and which are optional, for each of 
the three components (htdig, htmerge, and htsearch)?
B.  Which data files do htdig and htmerge create and/or update? 

What I think I want to develop is an approach in which htdig is run against 
partial databases (each containing results from relatively few domains), and 
htmerge is then used to merge the results from the domains in each partial 
database into a single combined database.  

If there's an FAQ, or equivalent, that covers this, please let me know.  

Steven P Haver

------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this. 
