On Tue, 19 Dec 2000, David Gewirtz wrote:

> on something. I attempted to index a remote site, in this case Lotus.com. 
> Now, I have no idea how many pages that is. But I let the index process run 

If you have no idea how many pages will be on a server, I'd start with a
conservative max_hop_count or server_max_docs limit and go from there.
These attributes are meant to keep the dig from spiralling out of control
(or, in this case, beyond the limits of your server).

<http://www.htdig.org/attrs.html#max_hop_count>
<http://www.htdig.org/attrs.html#server_max_docs>
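For example, you might add something like the following to your htdig.conf.
The values here are only illustrative starting points, not recommendations;
tune them to your site:

    max_hop_count: 4
    server_max_docs: 1000

max_hop_count limits how many links away from your start_url the dig will
follow, and server_max_docs caps how many documents are retrieved from any
single server.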

> handle it. Right now, I'm thinking the process is too big. Can htdig and/or 
> htmerge running on a 256MB or 384MB machine handle indexing/merging sites 

This question is a bit hard to answer. From what you've said, the answer is
"no," but I can't give a better one without at least an estimate of the
number of URLs, as I mentioned earlier.

There are also simple "link checker" scripts that can give you a count of
the URLs on a site.
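
If you'd rather roll your own, here's a minimal sketch of a same-site URL
counter in Python. The start URL and the 500-page cap are just placeholders,
and a real link checker would also want to honor robots.txt and throttle
its requests:

    #!/usr/bin/env python3
    # Rough same-site URL counter -- a sketch, not an official htdig tool.
    # Uses only the Python standard library; the start URL and page cap
    # are placeholders you'd adjust for your own site.
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen

    class LinkParser(HTMLParser):
        """Collects href values from <a> tags."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def count_urls(start, limit=500):
        host = urlparse(start).netloc
        seen, queue = set(), [start]
        while queue and len(seen) < limit:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                page = urlopen(url).read().decode("utf-8", "replace")
            except Exception:
                continue  # skip pages that fail to fetch or parse
            parser = LinkParser()
            parser.feed(page)
            for href in parser.links:
                # Resolve relative links, drop fragments, stay on one host.
                full = urljoin(url, href).split("#")[0]
                if urlparse(full).netloc == host and full not in seen:
                    queue.append(full)
        return len(seen)

    if __name__ == "__main__":
        print(count_urls("http://www.lotus.com/"))

Running something like this first gives you a ballpark figure to plug into
your server_max_docs setting before you let the real dig loose.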

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

