On Wed, 29 Nov 2006, Dan Richardson wrote:

> In my htdig conf file I have:
> 
> start_url:              http://www.mydomain.co.uk/
> limit_urls_to:        http://www.mydomain.co.uk/
> exclude_urls:        /usr/local/htdig/conf/excludes

To pull in the contents of a file, enclose the path in backticks; without
them the path itself is treated as a pattern. For example:

  exclude_urls:        `/usr/local/htdig/conf/excludes`

> /usr/local/htdig/conf/excludes contains:
> 
> /cgi-bin/ .cgi action=vote&voteid bookmark.html email.html reddit.com
> del.icio.us www.google.com digg.com ma.gnolia.com www.newsvine.com

You might want to consider moving action=vote&voteid to the bad_querystr
attribute, which exists specifically for this purpose.
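
As a rough, untested sketch (assuming your version of htdig supports
bad_querystr, and that you remove action=vote&voteid from the excludes
file yourself), the two attributes would then sit side by side:

  exclude_urls:        `/usr/local/htdig/conf/excludes`
  bad_querystr:        action=vote&voteid

The idea is that bad_querystr handles patterns that belong to the query
string, while exclude_urls keeps the rest of the patterns.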

> As I understand it both limit_urls and exclude_urls are string patterns, but
> which one takes precedence?

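I believe exclude_urls is tested before limit_urls_to. Either way, a URL
has to pass both checks to be indexed, so a match on exclude_urls should
be enough to reject it.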

> I have links on my site such as:
> http://reddit.com/submit?url=http://www.mydomain.co.uk/about_us/for_your_site/email.html
> which contains both exclude_urls and limit_urls strings and htdig seems to
> be trying to index these links, any pointers on how I can definitely exclude
> them from an index?

You don't have more than one exclude_urls line in your config file, do
you? If the default definition is still lurking in the file, it might be
overriding your custom settings.
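
For illustration only, this is the kind of thing to look for. The second
definition here is a made-up stand-in for a leftover default, not
something taken from your file:

  exclude_urls:        `/usr/local/htdig/conf/excludes`
  # (other attributes in between)
  exclude_urls:        /cgi-bin/ .cgi

If two definitions like these are present, I would expect only one of
them to take effect, so the list in your excludes file may never be used.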

Jim

_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
