At 3:36 PM -0600 12/3/99, Gilles Detillieux wrote:
>1) The handling of meta keywords and meta descriptions kept right on
>going even if doindex was 0, so a noindex tag had no effect on these.
Whoops! That's a pretty big bug. The best I can claim is that I used
the code from meta keywords for the meta descriptions. :-(
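
For reference, the fix for (1) amounts to checking the noindex flag before any meta content is recorded. The following is only a minimal sketch, not the actual ht://Dig code: ParserState, handleMeta() and the field names are hypothetical, and only the doindex flag comes from the discussion above. It also assumes the robots tag is seen before the keywords and description tags.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the parser state; names are illustrative only.
struct ParserState {
    bool doindex = true;             // cleared once a robots "noindex" is seen
    std::vector<std::string> words;  // keyword strings handed to the indexer
    std::string description;         // stored meta description
};

// Handle one <meta name="..." content="..."> tag.
// Matching is case-sensitive here purely to keep the sketch short.
void handleMeta(ParserState &st, const std::string &name,
                const std::string &content)
{
    if (name == "robots") {
        if (content.find("noindex") != std::string::npos)
            st.doindex = false;
        return;
    }

    // The guard that was missing: skip keywords and descriptions
    // entirely when indexing has been turned off.
    if (!st.doindex)
        return;

    if (name == "keywords")
        st.words.push_back(content);
    else if (name == "description")
        st.description = content;
}
```

With the guard in place, a page carrying a noindex tag contributes neither keywords nor a description.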
>2) The handling of meta keywords and meta descriptions didn't
>consider word offsets in the document - it used a relative offset of
>1 for everything. While not really a problem for the keywords tags,
>it seemed wrong to me that it did that for meta descriptions too.
>I changed the latter.
>
>3) The relative offset calculation was done by dividing by the total
>size of the document, before any stripping of comments, JavaScript, etc.
>That meant that the more stuff got stripped out, the lower the offset
>(and the higher the importance) of remaining words. I've changed it to
>use the size after stripping.
>
>It appears that (2) and (3) are no longer an issue in 3.2, but (1)
>still is. Anyway, I'd appreciate some feedback on these fixes, as
>well as the img alt handling and my HtWordtoken() function. It all
>seems to work, as far as I can tell, but I may have missed something.
Yes, (3) isn't an issue in 3.2 because it doesn't compute relative
offsets at all. It just keeps a running count of the number of words
and increments it. At the moment, it doesn't do any scoring based on
location in the document either. (Words used to be scaled from 1 to
1000 in the document, 1000 being at the end. They were also scaled
linearly based on this relative position.)
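
To make the effect in (3) concrete, here is a rough sketch of the old 1-1000 scaling. The function relativeLocation() and its arguments are hypothetical and not taken from the ht://Dig source; it assumes the word offset is measured in the stripped text, which is what the description of the bug suggests.

```cpp
#include <cassert>
#include <cstddef>

// Map a character offset to a relative location in 1..1000,
// where 1000 is the end of the document.
int relativeLocation(std::size_t offset, std::size_t docSize)
{
    if (docSize == 0)
        return 1;
    if (offset > docSize)
        offset = docSize;  // clamp so the result never exceeds 1000
    int loc = static_cast<int>(offset * 1000 / docSize);
    return loc < 1 ? 1 : loc;
}
```

Take a word at offset 400 in a document that is 2000 bytes raw but only 500 bytes after comments and JavaScript are stripped. Dividing by the raw size gives relativeLocation(400, 2000) == 200, which makes the word look like it sits near the top. Dividing by the stripped size gives relativeLocation(400, 500) == 800, which is where it actually is.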
I'm not sure if 3.2 keeps track of word offsets in descriptions or
keywords. If it doesn't, it should somehow. Maybe it should just
start the word count at 1 for the first indexable word it sees?
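
One way that could look, purely as a sketch: a single counter shared by the meta description and the body, so the first indexable word seen gets position 1 wherever it comes from. WordCounter and addText() are hypothetical names, and the whitespace split is a stand-in for real word tokenizing, not what 3.2 does.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical running word counter shared by all indexable text.
struct WordCounter {
    int next = 1;  // position assigned to the next indexable word
    std::vector<std::pair<std::string, int> > entries;  // (word, position)

    // Split on whitespace and number each word in the order seen.
    void addText(const std::string &text)
    {
        std::istringstream in(text);
        std::string word;
        while (in >> word)
            entries.push_back(std::make_pair(word, next++));
    }
};
```

Feeding it a two-word description followed by a four-word body numbers the description words 1 and 2 and the body words 3 through 6.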
The code looks fine to my eyes, though we should probably run it
through some tests. Hmm... more HTML tests? ;-)
-Geoff