According to Benjamin Smedberg:
> > Unfortunately we also need to translate URLs in an HTML context. It has
> > become a "standard" to include escapes such as &amp; and &copy; in the
> > URL text itself. This is not forbidden in the RFC on URIs, but for
> > obvious reasons it's not always supported by the webserver. Furthermore
> > we need to normalize URLs anyway.
> 
> The HTML4 standard specifies that HTML entities *must* be translated in
> every context except for SCRIPT tags. Therefore, the following link is
> "wrong":
> 
> <a href="mypage.cgi?ID=1&location=3">
> 
> And this is "right":
> 
> <a href="mypage.cgi?ID=1&amp;location=3">
> 
> When the client actually processes the link, however, the URI that it
> requests is
> 
> mypage.cgi?ID=1&location=3
> 
> However, most older HTML does not follow this standard (I know mine does
> not). We therefore have to check and see whether an & is followed by a valid
> HTML entity code, and only translate the entity if it is valid. I believe
> that this is how NS/IE currently function.

htdig handles bare &'s well too, so translating URIs won't be a problem.
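
For illustration only, here is a rough sketch of the check Benjamin
describes: decode an & only when it starts a valid entity, and leave bare
&'s alone.  This is not htdig's code (htdig is C++); the function name
decode_entities_lenient is made up for the example, and I am assuming
that an entity must end in a semicolon to count as valid, which may be
stricter than what NS/IE actually do.

```python
import re
from html.entities import name2codepoint

# Candidate entities: &name;  &#123;  &#x1F;  (trailing ';' required, by assumption)
_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities_lenient(text):
    """Decode only well-formed, known entities; leave every other '&' as-is."""

    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            digits = body[1:]
            code = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
            # Out-of-range numeric references are left untouched.
            return chr(code) if 0 < code <= 0x10FFFF else match.group(0)
        code = name2codepoint.get(body)
        # Unknown names such as &bogus; are left untouched.
        return chr(code) if code is not None else match.group(0)

    return _ENTITY.sub(replace, text)


if __name__ == "__main__":
    print(decode_entities_lenient("mypage.cgi?ID=1&amp;location=3"))
    print(decode_entities_lenient("mypage.cgi?ID=1&location=3"))
```

Both calls print mypage.cgi?ID=1&location=3, so the "right" and the
"wrong" form of the link end up as the same URI.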

This raises a question in my mind, though.  Since htdig doesn't care
about <SCRIPT> tags anyway, why not just translate all SGML entities in
one pass, instead of doing it in a context-dependent way?  It seems to me
that early on in the 3.2 development, this was done, but that change was
backed out and the code reverted to only translating text outside of tags.
Does anyone know why this was done?  Was it causing a problem somewhere?

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
