On Sat, 19 Jun 2004, Matt Price wrote:

> I think this first one is simpler.  Now that I know how great htdig
> is, I'd like to use it to index the local files on my workstation.  Is
> it possible to get htdig to simply iterate recursively through a
> filesystem, rather than following hyperlinks?  Perhaps using the
> local_* variables somehow?

Even with the local_* variables, htdig still discovers documents by
following links.  If you need to index an arbitrary collection of
unlinked documents, your best bet is probably to write a script that
enumerates the files and then passes that list to htdig through the
start_url attribute.  This is discussed briefly in the following FAQ
entry:

http://www.htdig.org/dev/htdig-3.2/FAQ.html#q5.25
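
Just to make that concrete, here is a minimal sketch of the sort of
script I mean.  It assumes your documents live under /home/matt/docs
(substitute your own path, obviously) and that your htdig build accepts
file:// URLs in start_url -- I believe 3.2 does, but the FAQ entry above
is the authority on that:

    #!/usr/bin/env python
    # Walk a directory tree and print one file:// URL per file.
    # Save the output to a file and feed it to start_url.
    import os

    top = "/home/matt/docs"          # hypothetical document root

    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            print("file://" + os.path.join(dirpath, name))

If I remember correctly, the configuration parser will pull an
attribute's value in from a file if you surround the filename with
backquotes, so something like

    start_url:      `/tmp/local_urls.txt`

in htdig.conf should pick up the script's output -- but double-check
that against the attribute documentation before relying on it.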

> my course websites have a fair number of external links.  I would love
> for ht://dig to index the pages linked to, but NOT keep crawling
> further along the chain of links.  That is, when ht://dig sees a link
> to an external page, it would follow that link, index it, but NOT go
> any further.  Even better would be if those links that wget calls
> "page-requisite" -- links that need to be loaded in order to view the
> page properly -- are also indexed.

I do not know of any general way to do this with htdig alone.  Perhaps
you could make an initial pass with wget that collects all of the URLs
you are interested in, and then feed those to htdig with an
appropriately configured max_hop_count?  A rough sketch of that idea
follows.
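
For what it's worth, here is a rough sketch of the second half of that
idea.  It assumes you have already boiled the wget run down to a plain
list of URLs, one per line (the course pages plus everything they link
to, and, if you use wget's -p/--page-requisites option, the requisites
as well).  The script just turns that list into an htdig.conf fragment;
setting max_hop_count to 0 should keep htdig from crawling past the
documents it is handed, but again, check that against the attribute
documentation:

    #!/usr/bin/env python
    # Turn a file of URLs (one per line) into an htdig.conf fragment
    # that indexes exactly those pages and nothing else.
    import sys

    urls = [line.strip() for line in open(sys.argv[1]) if line.strip()]

    # start_url takes a whitespace-separated list; continuation lines
    # end with a backslash.
    print("start_url: " + " \\\n\t".join(urls))

    # Hop count 0 means "index only the start documents themselves".
    print("max_hop_count: 0")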

Jim
