On 16 Dec, Simon Blake wrote:
> Hi there
>
> I've just set up htdig to index a website that has a sitemap on every page,
> included as a drop down <select> menu. Because this is on every page, if
> you search a term that occurs in the drop down list, every page in the
> webspace is returned, which isn't wonderful on a website with several GB
> of static pages!
>
> Therefore, I'd like to prevent htdig from indexing material between
> <select name=url> and </select>. Is there a straightforward way to achieve
> this? Looking at the factor system, it struck me that a neat way to do
> this would be with a custom factor - you define the start and end tags,
> maybe with a regexp, and everything in between gets the relevant weight.
>
> I've had a good look through htdig.org, and I don't see anything -
> apologies if this is a FAQ...
>
> Cheers
> Si
>
How about this?
noindex_start, noindex_end
type: string
used by: htdig
default: <!--htdig_noindex--> <!--/htdig_noindex-->
description:
The text encompassing a section of an HTML file that should be
completely ignored when indexing. As in the defaults, this can be SGML
comment declarations that can be inserted anywhere in the documents to
exclude different sections from being indexed. However, existing tags
can also be used; this is especially useful to exclude some sections
from being indexed where the files to be indexed cannot be edited. The
example shows how SCRIPT sections in 'uneditable' documents can be
skipped; note how noindex_start does not contain an ending >: this
allows for all SCRIPT tags to be matched regardless of attributes
defined (different types or languages). Note that the match for this
string is case insensitive.
example:
noindex_start: <SCRIPT
noindex_end: </SCRIPT>
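For the drop-down in question, that would presumably come down to
something like noindex_start: <select name=url and noindex_end: </select>
in the config file (untested here; if the pages quote the attribute, as in
name="url", the start string would have to match that exactly). To show why
leaving the closing > off the start string catches every variant of the
tag, here is a rough Python sketch of case-insensitive start/end matching.
It is only an illustration: strip_noindex and the sample page are made up
for this example, and htdig's own parser may well behave differently, e.g.
with a section that is never closed.

```python
import re

def strip_noindex(html, start, end):
    """Remove every span from `start` through `end`, ignoring case.

    Hypothetical helper for illustration only; not htdig's actual code.
    A span with no closing `end` string is left untouched here.
    """
    pattern = re.compile(
        re.escape(start) + r".*?" + re.escape(end),
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub("", html)

# Made-up page: the tag is upper case and carries an extra attribute.
page = ('<p>Widgets</p>'
        '<SELECT NAME=url size=1><option>Gadgets</option></SELECT>'
        '<p>End</p>')

# The start string has no closing '>', so extra attributes do not matter.
print(strip_noindex(page, "<select name=url", "</select>"))
```

This prints <p>Widgets</p><p>End</p>, so in this toy model the word in the
menu never reaches the index.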
Cheers
--
David Robley
WEBMASTER | Phone +61 8 8374 0970
RESEARCH CENTRE FOR INJURY STUDIES | http://www.nisu.flinders.edu.au/
AusEinet | http://auseinet.flinders.edu.au/
Flinders University, ADELAIDE, SOUTH AUSTRALIA
Visit the PHP mirror at http://au.php.net:81/
<<<<<<<<<<<<< WARNING * END OF TEXT * STOP READING HERE >>>>>>>>>>>>>>