Other people may have more elegant solutions, but what I do is carry out
two digs. The first one is pretty standard: the whole site, restricted to
the specified local domains. From this dig I generate a URL_List file,
which contains all URLs found, whether discarded or not.

Then I run a second dig, with almost the same parameters, but with the
URL_List file as the 'start_url'.
Any domain that is listed in this file is automatically 'allowed'
(limit_urls_to defaults to the value of start_url) and can therefore be
indexed to whatever depth is required. I have this dig limited to a few
hops, and a handful of docs, to prevent it getting out of hand.
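
To make the second dig a bit more concrete, here is a rough sketch of a
helper that writes out a config for it. It is only an illustration, not my
actual setup: the function name, paths and limits are made up, it assumes
the first dig was run with create_url_list enabled, and it assumes
server_max_docs is available in your version of htdig (check the attribute
documentation before relying on it).

```python
def second_pass_config(url_list_path, database_dir,
                       max_hops=2, max_docs_per_server=20):
    """Return the text of a hypothetical second-pass htdig config.

    All values are placeholders; adjust them to your own installation.
    """
    lines = [
        # Keep the second dig's databases apart from the first dig's.
        f"database_dir: {database_dir}",
        # A filename in backquotes is replaced by the contents of that file,
        # so every URL from the first dig becomes a starting point.
        f"start_url: `{url_list_path}`",
        # limit_urls_to is deliberately left unset so that it follows
        # start_url, which is what makes the listed domains 'allowed'.
        f"max_hop_count: {max_hops}",
        # Assumed attribute: caps the number of documents taken per server.
        f"server_max_docs: {max_docs_per_server}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(second_pass_config("/path/to/pass1/db.urls", "/path/to/pass2"),
          end="")
```

The rest of the second config (excludes, templates and so on) would be
copied from the first one.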

Finally, I merge the databases produced in the two steps.

Running the URL_List file through a filter between the two steps would
marginally improve performance, and could potentially allow a lot more
flexibility if needed.
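
As an example of the kind of filter I mean, something along these lines
would do. It is a minimal sketch and makes a few assumptions: that the
URL_List file has one URL per line, that the local hosts were already fully
covered by the first dig, and that the list of unwanted suffixes is just an
example. The function name and defaults are made up for illustration.

```python
import sys
from urllib.parse import urlsplit


def filter_url_list(urls, local_hosts,
                    blocked_suffixes=(".jpg", ".gif", ".png", ".zip")):
    """Return the URLs worth passing to the second dig, in original order."""
    local = {host.lower() for host in local_hosts}
    seen = set()
    kept = []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue  # skip blank lines and duplicates
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        if parts.scheme not in ("http", "https") or not host:
            continue  # skip mailto:, ftp: and anything malformed
        if host in local:
            continue  # already indexed by the first dig
        if parts.path.lower().endswith(tuple(blocked_suffixes)):
            continue  # skip file types we do not want indexed
        seen.add(url)
        kept.append(url)
    return kept


if __name__ == "__main__":
    # Usage: python filter_urls.py local.host.name < url_list > filtered_list
    for kept_url in filter_url_list(sys.stdin, sys.argv[1:]):
        print(kept_url)
```

The filtered file would then be used as the 'start_url' for the second dig
in place of the raw URL_List.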

Mike



> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf 
> Of [EMAIL PROTECTED]
> Sent: 29 July 2005 21:09
> To: [email protected]
> Subject: [htdig] how to htdig across other domains
> 
> 
> How can I get htdig to span referenced websites when it's not 
> in the domain? This would be useful to me. Thanks.
> 
> _______________________________________________
> ht://Dig general mailing list: <[email protected]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general
> 

