On Mon, 21 Aug 2000, Gilles Detillieux wrote:
> I don't know that a stack is easier to implement than a bitmask, but both
> approaches have their merits, and either would be better than what the
> code does now. Using a stack raises the question of how the code should
> deal with tags that are not properly nested. Should it pop off everything
> when faced with a closing tag, until it finds the matching opening tag
> on the stack?
I can come up with convoluted (but real) HTML to kill a bitmask too. For
example:
<noindex> .. <index> ... <noindex>...
What do you do when the bit is already flipped? In the case of the stack,
the common behavior in browsers is to pop off entries up to the matching
opening tag and then push the rest of the tags back on. Bad HTML, alas, is
a pain in the neck but an unfortunate fact of life.
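To make the stack behavior concrete, here is a rough sketch in Python
(not ht://Dig code). The names close_tag and indexing_enabled are made up
for illustration, and the rule that indexing stays off while any "noindex"
is still open is my assumption about how we'd want it to behave:

```python
def close_tag(stack, name):
    """Handle a closing tag against a stack of open tag names.

    Pops entries until the matching opening tag is found, drops it, and
    pushes the tags that sat above it back on in their original order.
    A closing tag with no matching opener is ignored.
    """
    if name not in stack:
        return stack  # stray closing tag: leave the stack untouched
    popped = []
    while stack[-1] != name:
        popped.append(stack.pop())
    stack.pop()  # discard the matching opening tag
    while popped:
        stack.append(popped.pop())  # restore the improperly nested tags
    return stack


def indexing_enabled(stack):
    # Assumed rule: indexing is off while any "noindex" is still open.
    return "noindex" not in stack
```

With two nested noindex entries on the stack, a single closing tag removes
only the innermost one, so indexing stays off. A single bit can't express
that.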
> What about <meta htdig-noindex> and <meta htdig-index>?
> These aren't strictly opening and closing tags, so can any nesting rules
> be imposed on them?
META tags are a bit different. The way I read the spec (i.e. META robots
information), the last one wins. So there's no need to keep any sort of
stack since it applies to the whole document once you get out of the
</head> tag.
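As a sketch of "last one wins" (again Python for illustration, not the
actual parser), a single flag is enough. The class and function names are
hypothetical, and only looking at META tags inside <head> is my reading of
the above rather than anything the code does today:

```python
from html.parser import HTMLParser


class MetaIndexFlag(HTMLParser):
    """Track one document-wide flag; each META tag overwrites the last."""

    def __init__(self):
        super().__init__()
        self.doindex = True   # index by default
        self.in_head = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            names = {name for name, _ in attrs}
            if "htdig-noindex" in names:
                self.doindex = False
            elif "htdig-index" in names:
                self.doindex = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def document_doindex(html):
    parser = MetaIndexFlag()
    parser.feed(html)
    parser.close()
    return parser.doindex
```

No stack is involved: whatever value the flag holds at </head> applies to
the whole document.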
> I see the whole noindex_start thing as a separate issue, though, because
> it's parsed at an earlier stage, and actually causes sections of the
> HTML to be stripped out, rather than just flipping flags. One of the
> advantages of doing it this way is that the start and end strings don't
> have to be complete tags, and we probably don't want to lose that
It's also probably a bit faster since it doesn't pay much attention to
what's in between. My assumption has generally been that the only reason
<script>, <style> and other tags set doindex and dofollow is that we
don't currently handle multiple noindex_start portions, and that these
were stopgap hacks until we do.
Is there a good reason we don't implement multiple noindex_start/end lists
and throw the code for these other tags into this?
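Roughly what I have in mind, sketched in Python rather than in the real
code: strip_noindex and the pairs list are hypothetical, the matching is
case-sensitive for simplicity, and dropping the rest of the document when
an end string is missing is an assumption on my part.

```python
def strip_noindex(text, pairs):
    """Remove every span delimited by any (start, end) string pair.

    The strings are matched literally, so they need not be complete tags.
    """
    out = []
    pos = 0
    while pos < len(text):
        # Find the earliest start string at or after pos.
        best = None
        for start, end in pairs:
            i = text.find(start, pos)
            if i != -1 and (best is None or i < best[0]):
                best = (i, start, end)
        if best is None:
            out.append(text[pos:])  # nothing left to strip
            break
        i, start, end = best
        out.append(text[pos:i])
        j = text.find(end, i + len(start))
        if j == -1:
            break  # unterminated section: drop the rest (assumption)
        pos = j + len(end)
    return "".join(out)


# Illustrative list only: partial tags cover <script> and <style>.
pairs = [
    ("<!--htdig_noindex-->", "<!--/htdig_noindex-->"),
    ("<script", "</script>"),
    ("<style", "</style>"),
]
```

Since "<script" is only a partial tag, it also matches openers that carry
attributes, which is the flexibility Gilles mentions.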
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/