Matt Kraai <[EMAIL PROTECTED]> writes:
> On Mon, Jul 23, 2001 at 12:54:31PM +0200, Kjetil Torgrim Homme wrote:
> > It's very early, yet. But a few things are reasonably clear. They'll
> > use Unicode, they just haven't decided on the encoding. That is, all
> > characters which aren't US-ASCII will probably be added to the list of
> > allowed characters.
>
> I don't think so. According to [1], Appendix F, there are quite a
> few prohibited non-ASCII code points.
>
> 1. http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-04.txt

You are right. (btw, it's now replaced by -05)

> There are also some normalization rules. I'm not sure if the
> normalizations should be performed in dbootstrap or in some lower
> layer, however.

Ugh. Do we really want to go there?

> > The limit of 63 octets per name component probably won't change,
> > but notice that the number of characters will be less, depending
> > on the encoding.
> >
> > One thing: It would be good to disallow the use of ASCII Compatible
> > Encoding-prefixes. They look like "xx--", where x is an arbitrary
> > letter.
>
> I've never seen these before, this being my first foray into
> internationalization. I'll keep this in mind, however.
>
> Will the input be encoded in UTF-8?

No, that will break too many protocols. That's the reason for ASCII
Compatible Encoding, which uses only the characters "a-z0-9-".
Look at some of the examples (/^Exampl) in
http://www.i-d-n.net/draft/draft-ietf-idn-amc-ace-w-00.txt
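
For what it's worth, the restrictions discussed above (63 octets per
name component, plain ASCII letters/digits/hyphen, no "xx--" prefixes)
could be checked roughly like this. It is only a sketch in Python, not
anything from the drafts or from dbootstrap; check_label and the
constants are names I made up, and treating every two-letter "xx--"
prefix as reserved is an assumption taken from the description above.

```python
import re

MAX_LABEL_OCTETS = 63  # per-component limit mentioned in the thread

# Assumption: any two ASCII letters followed by "--" count as an ACE prefix.
ACE_PREFIX = re.compile(r"[a-z]{2}--", re.IGNORECASE)
# Letters, digits and hyphen only.
LDH = re.compile(r"[a-z0-9-]+", re.IGNORECASE)


def check_label(label: str) -> str:
    """Return "ok" or a short reason why the label is rejected."""
    if not label:
        return "empty"
    if len(label.encode("utf-8")) > MAX_LABEL_OCTETS:
        return "too long"
    if ACE_PREFIX.match(label):  # match() anchors at the start of the label
        return "reserved ACE prefix"
    if not LDH.fullmatch(label):
        return "character outside a-z0-9-"
    return "ok"
```

With this, check_label("debian") gives "ok", while
check_label("bq--abc") is rejected because of the prefix.
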
Notice that UTF-8 is inefficient for Hangeul and other scripts, even
though it uses all 8 bits of each octet where an ACE only gets 5.
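
To put numbers on that, here is a small Python sketch of how many
characters fit in a 63-octet name component if it were stored as
UTF-8. The helper names are mine and purely illustrative; the only
facts used are the 63-octet limit and Python's own UTF-8 encoder.

```python
MAX_LABEL_OCTETS = 63  # per-component limit


def utf8_octets(text: str) -> int:
    """Number of octets the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def max_chars(sample_char: str, limit: int = MAX_LABEL_OCTETS) -> int:
    """How many copies of sample_char fit within the octet limit."""
    return limit // utf8_octets(sample_char)


print(utf8_octets("한글"))  # 6: each Hangeul syllable takes 3 octets
print(max_chars("한"))      # 21
print(max_chars("a"))       # 63
```

So a pure Hangeul component tops out at 21 characters, against 63 for
US-ASCII.
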

(Personally, I hope something like
http://www.i-d-n.net/draft/draft-ietf-idn-udns-02.txt
passes. I'm not too optimistic, though; this reminds me of all the
warts of MIME that we will probably never be rid of.)

Kjetil T.