Hi folks,
I'm using ht://Dig 3.2.0b4-011302 and I'm having trouble specifying a
regex that will limit the URLs I want to crawl.
Each URL must contain one of the following (actually, there are more
values in this list, but they have been eliminated to simplify things):
DO_TOPIC
DO_ROOT
DO_COMMUNITY
How can I express that in limit_urls_to? I've been trying this:
limit_urls_to: ${start_url}*DO_TOPIC|DO_ROOT|DO_COMMUNITY*
There are additional restrictions, but once I get a starting point, I
think it'll all fall into place.
A few examples of what we want to do:
http://example.org/index.html OK
http://example.org/index.html?ID=4 BAD
http://example.org/index.html?ID=4&DO_TOPIC OK
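
To make the intent of those three examples concrete, here is a minimal
Python sketch of the rule I'm after. It uses Python's re syntax, not
ht://Dig's, and the name is_allowed is made up for illustration. It also
assumes that a URL with no query string at all (like the first example)
is acceptable, which is my reading of the examples rather than a stated
rule.

```python
import re

# Accept a URL if it has no query string at all, or if it contains one of
# the required tokens. Alternation binds loosely, so each branch stands
# alone; the first branch is anchored to cover the whole URL.
ALLOWED = re.compile(r"^[^?]*$|DO_TOPIC|DO_ROOT|DO_COMMUNITY")


def is_allowed(url):
    """Return True if the URL should be crawled under the rule above."""
    return ALLOWED.search(url) is not None


if __name__ == "__main__":
    examples = [
        "http://example.org/index.html",
        "http://example.org/index.html?ID=4",
        "http://example.org/index.html?ID=4&DO_TOPIC",
    ]
    for url in examples:
        print(url, "OK" if is_allowed(url) else "BAD")
```

Note that in a regex, * is a repetition operator rather than a shell-style
wildcard, which is one reason the pattern I tried above may not behave the
way it reads.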
Thanks.
--
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/