I've been thinking a little more about this problem. Since it seems to
consist of two parts, I wonder whether it can be solved by splitting the dig
into two separate digs and then merging the databases.
If you use:
limit_urls_to: DO_TOPIC \
DO_ROOT \
DO_COMMUNITY
in one config, then as I understand your problem, the only 'GOOD' URL you
will exclude is http://example.org/index.html, because it contains none of
those three strings.
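To make that concrete, here is a minimal Python sketch that models
limit_urls_to as a plain substring test: a URL is kept if it contains at
least one of the listed strings. That model is an assumption about how
ht://Dig applies the attribute (check the attribute docs for your version,
since 3.2 also accepts regular expressions); the URLs are the examples from
your message, and the helper names are made up for illustration.

```python
# Assumed model (not ht://Dig code): limit_urls_to keeps a URL if it
# contains any of the listed strings.
CONFIG1_PATTERNS = ["DO_TOPIC", "DO_ROOT", "DO_COMMUNITY"]

# Example URLs from the quoted message, with the verdict you want.
EXAMPLES = {
    "http://example.org/index.html": "OK",
    "http://example.org/index.html?ID=4": "BAD",
    "http://example.org/index.html?ID=4&DO_TOPIC": "OK",
}


def allowed(url, patterns):
    """Keep a URL if it contains any of the patterns (substring match)."""
    return any(p in url for p in patterns)


def misclassified(patterns):
    """Return the URLs where the filter disagrees with the wanted verdict."""
    return [
        url
        for url, verdict in EXAMPLES.items()
        if allowed(url, patterns) != (verdict == "OK")
    ]


if __name__ == "__main__":
    for url in EXAMPLES:
        print(url, "kept" if allowed(url, CONFIG1_PATTERNS) else "excluded")
    print("wrongly handled:", misclassified(CONFIG1_PATTERNS))
```

Running it should report only the bare index.html as wrongly handled, which
is the single 'GOOD' URL the first config loses.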
If you then have:
limit_urls_to: ${start_url}
Max_docs: 1 (or something similar)
in a second config, then you should be able to get the missing document into
a second database and merge it into the first.
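The two-dig idea can be modelled the same way. This is a rough Python
sketch, not ht://Dig behaviour: it again assumes substring matching for
limit_urls_to, treats the "Max_docs: 1" idea as a simple cap on the number of
indexed documents, and treats merging the databases as a set union. The
candidate URLs are the examples from your message, listed in an assumed crawl
order with the start page first.

```python
# Assumed model (not ht://Dig code): limit_urls_to keeps a URL if it
# contains any listed string; merging two databases is a set union.
START_URL = "http://example.org/index.html"

CONFIG1_PATTERNS = ["DO_TOPIC", "DO_ROOT", "DO_COMMUNITY"]  # full dig
CONFIG2_PATTERNS = [START_URL]  # short dig: limit_urls_to: ${start_url}
CONFIG2_MAX_DOCS = 1  # stands in for the "Max_docs: 1" idea

# Candidate URLs in crawl order (start page first), from the quoted examples.
CANDIDATES = [
    "http://example.org/index.html",
    "http://example.org/index.html?ID=4",
    "http://example.org/index.html?ID=4&DO_TOPIC",
]
WANTED = {
    "http://example.org/index.html",
    "http://example.org/index.html?ID=4&DO_TOPIC",
}


def allowed(url, patterns):
    """Keep a URL if it contains any of the patterns (substring match)."""
    return any(p in url for p in patterns)


def dig(urls, patterns, max_docs=None):
    """Return the URLs one dig would index, optionally capped at max_docs."""
    kept = [u for u in urls if allowed(u, patterns)]
    if max_docs is not None:
        kept = kept[:max_docs]
    return set(kept)


full = dig(CANDIDATES, CONFIG1_PATTERNS)
short = dig(CANDIDATES, CONFIG2_PATTERNS, CONFIG2_MAX_DOCS)
merged = full | short

if __name__ == "__main__":
    print("full dig: ", sorted(full))
    print("short dig:", sorted(short))
    print("merged matches wanted set:", merged == WANTED)
```

Under this substring model the start URL is a prefix of every candidate, so
the second config only stays that narrow because of the one-document cap;
without it, the short dig would also pick up the BAD ?ID=4 page.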
The only problem I can see is that on many systems you may not get a good
index this way, because the obvious starting point (the start URL itself) is
excluded by the main dig's limit_urls_to, so the crawl has nowhere to begin.
This could be overcome by feeding a URL list generated by the 'short dig'
(config 2) into the 'full dig' (config 1).
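A toy Python crawl shows why seeding matters. Everything here is
hypothetical: the link graph (including the ?ID=5 page) is invented for
illustration, limit_urls_to is again modelled as a substring test, and the
"URL list from the short dig" is assumed to be the links found on the start
page. A page is only fetched, and its links followed, if the filter allows
it.

```python
from collections import deque

START_URL = "http://example.org/index.html"
TOPIC_4 = "http://example.org/index.html?ID=4&DO_TOPIC"
PLAIN_4 = "http://example.org/index.html?ID=4"
TOPIC_5 = "http://example.org/index.html?ID=5&DO_TOPIC"  # hypothetical page

CONFIG1_PATTERNS = ["DO_TOPIC", "DO_ROOT", "DO_COMMUNITY"]

# Hypothetical link graph: page -> links found on that page.
LINKS = {
    START_URL: [TOPIC_4, PLAIN_4],
    TOPIC_4: [TOPIC_5],
    PLAIN_4: [],
    TOPIC_5: [],
}


def allowed(url, patterns):
    """Assumed model of limit_urls_to: substring match on any pattern."""
    return any(p in url for p in patterns)


def crawl(seeds, patterns):
    """Breadth-first crawl that only fetches (and follows) allowed URLs."""
    seen = set()
    queue = deque(seeds)
    indexed = []
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        if not allowed(url, patterns):
            continue  # rejected: not indexed, and its links are never seen
        indexed.append(url)
        queue.extend(LINKS.get(url, []))
    return indexed


if __name__ == "__main__":
    # Seeding the full dig with the start URL: it is rejected immediately.
    print("seeded with start URL:", crawl([START_URL], CONFIG1_PATTERNS))
    # Seeding with the links the short dig found on the start page instead.
    print("seeded with short-dig links:", crawl(LINKS[START_URL], CONFIG1_PATTERNS))
```

The first crawl indexes nothing, while the second reaches both DO_TOPIC pages
and still skips the plain ?ID=4 page.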
Mike
> On Mon, 10 Jan 2005, Dan Langille wrote:
>
> > How can I use that on limit_urls_to? I've been trying this:
> >
> > limit_urls_to: ${start_url}*DO_TOPIC|DO_ROOT|DO_COMMUNITY*
> >
> > There are additional restrictions, but once I get a starting point, I
> > think it'll all fall into place.
> >
> > A few examples of what we want to do:
> >
> > http://example.org/index.html                  OK
> > http://example.org/index.html?ID=4             BAD
> > http://example.org/index.html?ID=4&DO_TOPIC    OK
>
_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general