On Mon, 6 Jan 2003, Jerry Stratton wrote:

> What is the reason that ht://dig does any conversion of
> ampersand-entities at all for excerpt display? Why not leave
> &validcharacters; alone in the excerpts, and let the browsers choose
> how to display them?

Most importantly, many entities *must* be transformed in some fashion to
allow for indexing and searching accented text. Some files may have
non-escaped 8-bit accented characters and some may have proper
&eacute; etc. But imagine you didn't transform and tried to search for
résumé (don't know how that'll go through, but maybe that's
the point).

Or let's say you're indexing documentation about HTML, and you want to
search for "&", which was probably written in the HTML itself as &amp;.
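
To make those two cases concrete, here is a minimal sketch in Python. It is
not ht://dig's actual code; `normalize` is a hypothetical helper, and it
assumes entities are decoded with the standard library's `html.unescape`.

```python
import html

def normalize(text):
    """Hypothetical helper: decode HTML entities so that escaped and
    unescaped spellings of the same word compare equal."""
    return html.unescape(text)

escaped = "r&eacute;sum&eacute;"   # as one HTML file might spell it
raw = "r\u00e9sum\u00e9"           # same word with literal accented characters

# Without decoding, the two spellings do not match.
matches_raw = escaped == raw

# After decoding both, they match.
matches_normalized = normalize(escaped) == normalize(raw)

# A page documenting HTML writes a literal ampersand as &amp;
ampersand = normalize("&amp;")
```

If the index stored the undecoded text, a search for the literal word would
only find files that happened to use the same spelling.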

In theory, you're also wasting space in the excerpt: &eacute; is 8 bytes
versus one byte for the actual 8-bit character.
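
The size difference is easy to check. This is a small sketch in Python,
assuming the entity in question is the named form &eacute; and that the
"8-bit character" is stored as ISO-8859-1 (Latin-1); the UTF-8 figure is
included only for comparison.

```python
entity = "&eacute;"   # named entity, plain ASCII
char = "\u00e9"       # the accented character itself

entity_size = len(entity.encode("ascii"))    # bytes used by the entity
latin1_size = len(char.encode("latin-1"))    # bytes as an 8-bit character
utf8_size = len(char.encode("utf-8"))        # bytes if stored as UTF-8
```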

Suggestions *greatly* welcome.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



