> Lennart Almkvist wrote:
> >
> > Some more testing gave the following results:
> >
> > The German flower name "Stiefmütterchen" and the Icelandic
> > "þrenningarfjóla" are treated differently in meta content
> > than in the body or title part of an HTML document.
> >
> > When in the body or in the title, the "ü", "þ" and "ó"
> > are decoded to one-byte characters in the .wordlist and .words.db files.
> >
> > In meta content, however, these words end up as "stiefmuuml;t"
> > and "thorn;rennin" in the .wordlist and .words.db files. That is, the "&" is
> > removed and the rest is kept as letters ("&" is in valid_punctuation but
> > the ";" is not, by default).
> >
> > Shouldn't they be decoded the same way as in the title or body?
OK, we clearly do have a problem with SGML entities in 3.1.2, as well
as in 3.2. (3.2 has some more serious problems, which I was hoping to
tackle, but that's another story.) Right now, htdig only translates
&foo; entities that appear outside of any HTML tags. I think there are reasons
not to translate them in all tags, but where is it valid to do so?
Certainly in keywords text, alt text in img tags, and meta description
text. How about htdig-email-subject? Any others I've missed?
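
To make the question concrete, here is a rough sketch of the "translate only
in selected attributes" idea. It is Python rather than htdig's own C++, and
the DECODE_IN table and attribute_text() function are hypothetical names made
up for this illustration, not anything in the htdig source. It assumes the
standard library's html.unescape() is an acceptable stand-in for the entity
translation already done on body text.

```python
import html

# Hypothetical whitelist of (tag, attribute) pairs whose values should have
# entities translated before the text is split into words. The entries are
# only the candidates mentioned in this thread; the real list is undecided.
DECODE_IN = {
    ("meta", "content"),  # keywords and description text
    ("img", "alt"),       # alt text
}

def attribute_text(tag, attr, raw_value):
    """Return the attribute value as it would be handed to the word splitter."""
    if (tag.lower(), attr.lower()) in DECODE_IN:
        # Same translation that body and title text would get.
        return html.unescape(raw_value)
    # Anything not whitelisted is passed through untouched.
    return raw_value

if __name__ == "__main__":
    print(attribute_text("meta", "content", "Stiefm&uuml;tterchen"))
    print(attribute_text("META", "CONTENT", "&thorn;renningarfj&oacute;la"))
    print(attribute_text("a", "href", "search?a=1&amp;b=2"))
```

With a table like this, adding htdig-email-subject or any other case would be
a one-line change, and everything outside the table keeps today's behaviour.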
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930