>100,000 pages". However, I don't want to embark on this endeavor if it's
>really a poor use of a good tool. (Like, my first question would be can
>the config file handle 100,000 starting points?)

Sure. You can include the contents of a file in any config attribute 
by enclosing its path in backticks. So to include a file of URLs to 
start with:

start_url:   `/path/to/url/file`
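
The file itself is just a plain list of URLs. A made-up example 
(the hostnames and paths here are placeholders, not anything real):

http://www.example.com/
http://www.example.com/docs/
http://another.example.org/archive/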

You should also understand that if you include a lot of URLs up 
front, the digger has to build the entire list in memory, so expect 
it to grab a sizable chunk of RAM right when it starts up.

>Am I barking up the wrong tree? If so, what would y'all recommend? My
>goal is to run this on a Linux box (Redtel) and not spend much of
>anything on the software. Am I nuts?

Personally, my server handles about 75,000 URLs at the moment without 
breaking much of a sweat. But we have a dedicated drive with plenty 
of space, a large swap partition, and 128 MB of RAM. I also don't 
have databases large enough to hit the 2 GB file-size limit imposed 
by Linux.

As a developer, my biggest concern when people say they want to index 
a ton of URLs is that they don't know what they're in for. For one, 
you'll probably need to tinker with the config file. I'd probably add 
this any time I'm indexing someone else's servers:

server_wait_time: 30
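
That attribute is the number of seconds the digger waits between 
requests to the same server, so a value of 30 keeps the crawl from 
hammering anyone. Putting the pieces together, a bare-bones conf for 
this sort of run might look roughly like this (database_dir is a 
standard attribute, but the path is just a placeholder for wherever 
you want the databases built):

database_dir:      /var/lib/htdig/db
start_url:         `/path/to/url/file`
server_wait_time:  30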

Finally, indexing a ton of pages takes a ton of resources. There's no 
way to get around that. Not to say we don't work at it. ;-)

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
