With htdig 3.2.0b4 it is possible to use exclude_urls as a block
directive. Quoting the documentation:


Many attributes may be restricted in scope, specifically those used by the
htdig indexer. These attributes can be specified on a
per-server or per-URL basis and thus can be applied to only one site or even
one particular portion of a site. For example: 

       <server: www.foo.com>
       server_wait_time: 5
       </server>

Here the portions inside the <server:> </server> block are normal attributes
as specified in the general
configuration documentation. However, rather than applying to all servers,
these attributes will apply only to the www.foo.com
server. 

It is also possible to have <url:> </url> blocks. With these, any URL
matching the pattern specified in the block will use the
attributes within, overriding any other configuration. 

Not all attributes apply within blocks. Those that do are listed with the
appropriate context in the attribute documentation. 





exclude_urls 
       type: 
              pattern list 
       used by: 
              htdig 
       default: 
              /cgi-bin/ .cgi 
       block: 
              URL 
       version: 
              all 
       description: 
              If a URL contains any of the space separated patterns, it
              will be rejected. This is used to exclude such common
              things such as an infinite virtual web-tree which start
              with cgi-bin.
       example: 
              exclude_urls: 
                           students.html cgi-bin 
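
As a rough sketch of how this could apply to the question below, the
fragment here wraps exclude_urls in a <url:> block so that PDF files are
skipped on one site only. The host www.foo.com is a placeholder borrowed
from the documentation example above, the exact form of the URL pattern is
an assumption, and this has not been tested against 3.2.0b4. The default
patterns are repeated on the assumption that the value inside the block
replaces the default rather than adding to it.

       <url: http://www.foo.com/>
       exclude_urls: /cgi-bin/ .cgi .pdf .PDF
       </url>

Both .pdf and .PDF are listed in case the pattern match is case sensitive.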





-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 06, 2003 5:15 PM
To: [EMAIL PROTECTED]
Subject: [htdig] wildcards in robot.txt?


Hi,

     I apologize if this is in the FAQ.  One of the web sites we index has
asked us not to index its PDF files.  I can't do that at the global level
because I want to keep indexing them on the other sites, and in general we
do not support individual requests.  It would be simpler if the webmaster
of this site could just use a robots.txt like this:

User-agent: htdig-udem
Disallow: *.pdf
Disallow: *.PDF

Is this possible with ht://Dig 3.1.6?

Thanks.




-------------------------------------------------------
This SF.NET email is sponsored by:
SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
http://www.vasoftware.com
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to
<[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

