Hi folks,

I'm using ht://Dig 3.2.0b4-011302 and I'm having trouble specifying a 
regex that will limit the URLs I want to crawl.

Each URL must contain one of the following (the real list is longer, 
but I've trimmed it to simplify things):

  DO_TOPIC
  DO_ROOT
  DO_COMMUNITY

How can I use that on limit_urls_to?  I've been trying this:

limit_urls_to:  ${start_url}*DO_TOPIC|DO_ROOT|DO_COMMUNITY*
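
One guess at why that attempt may misbehave, sketched with Python's re 
module rather than ht://Dig itself (so the matching rules here are an 
assumption and may not carry over): in regex syntax "*" is a quantifier, 
not a wildcard, and an ungrouped "|" splits the whole expression into 
three separate alternatives. The start_url value and the off-site URL 
below are made up for illustration.

```python
import re

# Assumed value of ${start_url}; the real one is not shown in this post.
start_url = "http://example.org/"

# The pattern as written above, read as a plain regex. It splits into
# three alternatives: "<start_url>*DO_TOPIC", "DO_ROOT", "DO_COMMUNITY*".
attempt = re.compile(start_url + "*DO_TOPIC|DO_ROOT|DO_COMMUNITY*")

# Grouped alternation, with ".*" standing in for "anything in between".
grouped = re.compile(
    re.escape(start_url) + ".*(DO_TOPIC|DO_ROOT|DO_COMMUNITY)"
)

wanted = "http://example.org/index.html?ID=4&DO_TOPIC"
foreign = "http://other.example.net/?DO_ROOT"  # hypothetical off-site URL

for name, pattern in (("attempt", attempt), ("grouped", grouped)):
    print(name, bool(pattern.search(wanted)), bool(pattern.search(foreign)))
```

Under Python's rules the ungrouped form misses the URL I want and 
accepts the off-site one, while the grouped form does the opposite.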

There are additional restrictions, but once I have a starting point, I 
think it'll all fall into place.

A few examples of what we want to do:

  http://example.org/index.html OK
  http://example.org/index.html?ID=4  BAD 
  http://example.org/index.html?ID=4&DO_TOPIC OK
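
To pin down the rule those three examples imply, here is a minimal 
Python sketch of the filter, independent of ht://Dig. It assumes (my 
inference from the examples, not a stated requirement) that a URL with 
no query string is fine, and that a URL with a query string is fine only 
if the query contains one of the keywords. The url_allowed name and the 
start_url default are hypothetical.

```python
import re

# Keywords from the (trimmed) list above.
TOKENS = ("DO_TOPIC", "DO_ROOT", "DO_COMMUNITY")
TOKEN_RE = re.compile("|".join(re.escape(t) for t in TOKENS))


def url_allowed(url, start_url="http://example.org/"):
    """Return True if the URL should be crawled under the assumed rule."""
    if not url.startswith(start_url):
        return False          # outside the site being indexed
    _, sep, query = url.partition("?")
    if not sep:
        return True           # no query string at all
    # A query string is present: it must mention one of the keywords.
    return TOKEN_RE.search(query) is not None


for u in (
    "http://example.org/index.html",
    "http://example.org/index.html?ID=4",
    "http://example.org/index.html?ID=4&DO_TOPIC",
):
    print(u, "OK" if url_allowed(u) else "BAD")
```

This is only meant to state the desired behaviour precisely; what I 
still need is the equivalent limit_urls_to setting.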

Thanks.
-- 
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/



_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
