According to Gilles Detillieux:
>> Lennart Almkvist wrote:
>> > 
>> > Some more testing gave the following results:
>> > 
>> > The German flower word "Stiefmütterchen" and the Icelandic
>> > "þrenningarfjóla" are treated differently in meta content
>> > than in the body or title of an HTML document.
>> > 
>> > When they appear in the body or the title, "ü", "þ" and "ó"
>> > are decoded to single-byte characters in the .wordlist and .words.db files.
>> > 
>> > In meta content, however, these words are stored as "stiefmuuml;t"
>> > and "thorn;rennin" in the .wordlist and .words.db files. That is, the "&" is
>> > removed and the rest is kept as letters ("&" is in valid_punctuation, but
>> > ";" is not, by default).
>> > 
>> > Shouldn't they be decoded the same way as in the title or body?
>
>OK, we do clearly have a problem with SGML entities in 3.1.2, as well
>as 3.2.  (3.2 has some more serious problems, which I was hoping to
>tackle, but that's another story.)  So, right now, it only translates
>&foo; entities outside of any HTML tags.  I think there are reasons
>not to translate them in all tags, but where is it valid to do so?
>Certainly in keywords text, alt text in img tags, and meta description
>text.  How about htdig-email-subject?  Any others I've missed?

- HTML 4.0 "title" attribute (not yet handled by ht://Dig, but would be
  nice to improve search results)

- Most Dublin Core META information content (it would be nice if ht://Dig
  could support this META standard directly).

- Alt text in client side image maps.
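A minimal sketch of what "translating entities inside these attributes" could look like, assuming the fix is simply to run attribute text (META content, ALT, TITLE) through the same kind of named-entity decoding the body text already gets, producing single ISO-8859-1 bytes before word splitting. The function name decodeEntities and its tiny entity table are hypothetical, not ht://Dig's actual code or API; a real table would cover all of HTML 4.0's Latin-1 entities.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not part of ht://Dig): replace known named SGML
// entities such as &uuml; with their single ISO-8859-1 byte, so attribute
// text ends up in the same form as decoded body text.
std::string decodeEntities(const std::string& in) {
    // Deliberately small table, assumed for illustration only.
    static const std::map<std::string, int> table = {
        {"amp", '&'},     {"lt", '<'},      {"gt", '>'},      {"quot", '"'},
        {"uuml", 0xFC},   {"oacute", 0xF3}, {"thorn", 0xFE},
    };
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '&') {
            std::size_t semi = in.find(';', i + 1);
            if (semi != std::string::npos) {
                auto it = table.find(in.substr(i + 1, semi - i - 1));
                if (it != table.end()) {
                    // Known entity: emit its byte and skip past the ';'.
                    out += static_cast<char>(it->second);
                    i = semi + 1;
                    continue;
                }
            }
        }
        // Plain character, or an unknown/unterminated entity: copy as-is.
        out += in[i];
        ++i;
    }
    return out;
}
```

With this, "Stiefm&uuml;tterchen" in a META content attribute would become the same byte string as "Stiefmütterchen" in the body, instead of leaving "uuml;" behind for the word splitter. Unknown entities such as "&foo;" are left untouched, and decoding is single-pass, so "&amp;uuml;" yields the literal text "&uuml;" rather than being decoded twice.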


cheers,
  Torsten

--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED]            Internet: http://www.inwise.de
