Since I don't have a document referring to all the files
to be indexed, I'm thinking of generating a
start_url file "on the fly".

I have been doing this for a much smaller site:
I set up a little shell script that generates
a list of all available files and pipes it through
sed to convert the local paths into http://...-URLs.
ht://Dig is set up with start_url=allfiles.list
and a local_urls line to "undo" the above mapping
again.
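
Roughly, the setup looks like the sketch below. The paths, hostname,
and file names are only placeholders for illustration, not the real
ones:

    #!/bin/sh
    # Rebuild the start_url list: turn local file paths into URLs.
    # /data/docs and www.example.com are placeholders.
    find /data/docs -type f \
        | sed 's|^/data/docs|http://www.example.com/docs|' \
        > /opt/htdig/conf/allfiles.list

and in htdig.conf something along these lines, so htdig takes the list
as its starting points but reads the files from disk instead of going
through the HTTP server:

    start_url:   `/opt/htdig/conf/allfiles.list`
    local_urls:  http://www.example.com/docs/=/data/docs/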

Do you think this approach is still appropriate for a search
of this size, or do you have any other suggestions?

Marcel


On 23 May 00, at 17:09, Geoff Hutchison wrote:

> At 6:51 PM +0200 5/23/00, Marcel Hicking wrote:
> >I have at about 200,000 plain text files
> >spread over a few 100, maybe 1000, directories.
> >File size is between a few bytes and, sometimes,
> >above 1mb. All in all this ends up in 1.2gb
> >of data, growing daily. The files do not
> >contain HTML code and I need them to be
> >indexed at least daily (that is, nightly ;-)
> >Most of the files are static, only few of them
> >change, say, 100-200 a day.
> 
> Well I don't think you'll have much problem indexing them with 
> ht://Dig. As to performance, it depends a lot on your machine and the 
> data itself. It sounds like you might get some use out of local_urls, 
> though if they don't have extensions, you might see it hit the HTTP 
> server a lot as it tries to figure out the MIME type.
> 
> Also remember that ht://Dig currently doesn't have any sort of "index 
> this directory" feature.
> 
> --
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/


