On Mon, 10 Jan 2005, Dan Langille wrote:

Each URL must contain one of the following (actually, there are more
values in this list, but they have been eliminated to simplify things):

 DO_TOPIC
 DO_ROOT
 DO_COMMUNITY

How can I use that on limit_urls_to?  I've been trying this:

limit_urls_to:  ${start_url}*DO_TOPIC|DO_ROOT|DO_COMMUNITY*

There are additional restrictions, but once I get a starting point, I
think it'll all fall into place.

A few examples of what we want to do:

 http://example.org/index.html OK
 http://example.org/index.html?ID=4  BAD
 http://example.org/index.html?ID=4&DO_TOPIC OK

I don't think that you are going to be able to do what you want with limit_urls_to. The attribute contains a list of patterns, one of which must be matched. Once you add a pattern that satisfies the first URL above, the other two are also satisfied since they contain the first.
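To make the containment argument concrete, here is a small Python sketch. It models limit_urls_to as a plain substring check (an assumption about the matching; passes_limit is a hypothetical helper, not htdig code). It then tries every substring of the first URL as a pattern, and none of them admits the first URL while rejecting the second.

```python
# Hypothetical model of limit_urls_to, assuming plain substring matching:
# a URL is kept if it contains at least one of the listed strings.
def passes_limit(url: str, patterns: list[str]) -> bool:
    return any(p in url for p in patterns)

ok_url = "http://example.org/index.html"
bad_url = "http://example.org/index.html?ID=4"

# Any pattern that matches ok_url is a substring of ok_url. Because ok_url
# is itself a prefix of bad_url, every such pattern also matches bad_url.
# Check this exhaustively over all substrings of ok_url.
candidates = {ok_url[i:j] for i in range(len(ok_url))
              for j in range(i + 1, len(ok_url) + 1)}
separating = [p for p in candidates
              if passes_limit(ok_url, [p]) and not passes_limit(bad_url, [p])]
print(len(separating))  # no pattern separates the two URLs
```

Adding more patterns to the list does not help, because the list only needs one pattern to match.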

I am not sure how you would completely solve this type of problem short of
somehow using the external parser/converter mechanism as a filter.
Depending on specifics, you might be able to handle some restrictions
through the bad_querystr attribute, but that would not be sufficient for
the example above. There are also restrict and exclude attributes, but
those are applied at search time. The only other thing I can think of is
perhaps using url_rewrite_rules to rewrite URLs that you don't want into
something that limit_normalized then drops (I have never tried this and
don't even know if it is actually feasible).
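For anyone trying the filter route, here is a minimal Python sketch of the acceptance rule itself. It assumes the rule implied by the three examples: a URL with no query string is accepted, and a URL with one must carry DO_TOPIC, DO_ROOT or DO_COMMUNITY as a parameter name. The function `wanted` and the set `REQUIRED_PARAMS` are hypothetical. This is not an ht://Dig API, and the sketch does not show how to hook the check into htdig.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed rule (inferred from the examples, not stated outright):
# URLs without a query string are fine; URLs with one must name at
# least one of these parameters.
REQUIRED_PARAMS = {"DO_TOPIC", "DO_ROOT", "DO_COMMUNITY"}

def wanted(url: str) -> bool:
    """Hypothetical filter predicate, not part of ht://Dig."""
    query = urlsplit(url).query
    if not query:
        return True
    # keep_blank_values=True so a bare flag like "DO_TOPIC" (no "=value")
    # is kept instead of silently dropped.
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    return bool(names & REQUIRED_PARAMS)

for url in ("http://example.org/index.html",
            "http://example.org/index.html?ID=4",
            "http://example.org/index.html?ID=4&DO_TOPIC"):
    print(url, "OK" if wanted(url) else "BAD")
```

Run against the three example URLs, it prints OK, BAD, OK, which matches the list in the question. Parsing the query string, rather than searching for a substring, prevents a value such as `ID=DO_ROOTS` from being mistaken for the flag.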

Jim

