PowerMail Engineering wrote:

>and non US-ASCII characters in an URL are invalid, as there
>is no specification of the charset they would use.

Hasn't the URI spec recently been updated both to follow already common
practice and to define the encoding to be used? (I have only followed
this from a distance.)

See e.g.:
<http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#percent-encoding>
(last paragraph of Section 2.5; I'm aware that this is a general URI
definition document, not explicitly applying to existing protocols like
mailto: )

So, the recommendation is to encode any non-ASCII characters first as
UTF-8, then encode each octet (byte) of that sequence as "%" followed by
two hex digits. Effectively, the encoding has been fixed at UTF-8.

An 'ü' would therefore be encoded like this:

ü = Unicode: U+00FC
  = UTF-8: 0xC3 0xBC
  = URL: %C3%BC

(At least this is what I am doing with URLs containing non-ASCII
characters in my own product.)
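
For anyone who wants to try the two steps above, here is a minimal sketch
in Python. It is only an illustration, not the code from my product: the
function name percent_encode_utf8 is made up for this example, and it
assumes that everything outside the RFC 3986 "unreserved" set should be
escaped. The standard library's urllib.parse.quote does the same job and
is shown for comparison.

```python
from urllib.parse import quote, unquote

# Characters RFC 3986 calls "unreserved"; these are left as-is.
UNRESERVED = (
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def percent_encode_utf8(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then %-escape each octet."""
    out = []
    # Step 1: turn the characters into a UTF-8 octet sequence.
    for octet in text.encode("utf-8"):
        if octet in UNRESERVED:
            out.append(chr(octet))
        else:
            # Step 2: write the octet as "%" plus two uppercase hex digits.
            out.append("%{:02X}".format(octet))
    return "".join(out)


if __name__ == "__main__":
    print(percent_encode_utf8("ü"))   # %C3%BC
    print(quote("ü", safe=""))        # %C3%BC (quote defaults to UTF-8)
    print(unquote("%C3%BC"))          # ü
```

Note that this escapes reserved characters such as "/" and "?" as well,
so it is meant for a single URL component, not for a complete URL.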

Regards, Christian.




