Re: Unicode

Marcin 'Qrczak' Kowalczyk Tue, 16 May 2000 03:27:57 -0700
Tue, 16 May 2000 10:44:28 +0200, George Russell <[EMAIL PROTECTED]> writes:

> > As for the language standard: I hope that Char will be allowed or
> > required to have >=30 bits instead of current 16; but never more than
> > Int, to be able to use ord and chr safely.
> 
> Er does it have to?  The Java Virtual Machine implements Unicode with
> 16 bits.  (OK, so I suppose that means it can't cope with Korean or Chinese.)
> So requiring Char to be >=30 bits would stop anyone implementing a
> conformant Haskell on the JVM.

OK, "allowed", not "required"; currently it is not even allowed.
The minimum should probably be 16 bits, and the maximum the size of Int.

Oops, ord would then have to be allowed to return negative numbers when
Char is the same size as Int. Another solution is to require Char to be
at least one bit narrower than Int, possibly also capping it at 31 bits.
ISO 10646 defines a 31-bit code space and UTF-8 can encode up to 31 bits,
so with that cap a UTF-8 encoder would be portable without worrying about
Char values that don't fit, and a decoder could easily check whether a
character is representable in Char, because ord maxBound would be
guaranteed to be positive.
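
To make the encoder and the decoder-side check concrete, here is a rough
sketch. The names encodeUtf8 and fitsInChar are made up for illustration,
bytes are returned as plain Ints in the range 0..255, and the code assumes
Int is wide enough to hold a 31-bit value. It follows the original 1-to-6
byte UTF-8 layout, not any particular library's API.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)

-- Hypothetical helper: can a decoded code point be stored in a Char?
-- The upper-bound test is only meaningful because ord maxBound is
-- positive, i.e. Char is narrower than Int.
fitsInChar :: Int -> Bool
fitsInChar n = n >= 0 && n <= ord (maxBound :: Char)

-- Hypothetical helper: encode a code point of up to 31 bits using the
-- original 1-to-6 byte UTF-8 layout. Each result element is one byte.
encodeUtf8 :: Int -> [Int]
encodeUtf8 n
  | n < 0          = error "encodeUtf8: negative code point"
  | n < 0x80       = [n]
  | n < 0x800      = multi 0xC0 1
  | n < 0x10000    = multi 0xE0 2
  | n < 0x200000   = multi 0xF0 3
  | n < 0x4000000  = multi 0xF8 4
  | n <= 0x7FFFFFFF = multi 0xFC 5
  | otherwise      = error "encodeUtf8: more than 31 bits"
  where
    -- Lead byte carries the top bits; k continuation bytes carry
    -- 6 bits each, most significant group first.
    multi prefix k =
      (prefix .|. (n `shiftR` (6 * k)))
        : [ 0x80 .|. ((n `shiftR` (6 * i)) .&. 0x3F) | i <- [k - 1, k - 2 .. 0] ]
```

For example, encodeUtf8 (ord 'A') gives a single byte, while a decoder
would call fitsInChar on the reassembled number before applying chr.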

Choices I see (sizes in bits):
- 30 <= Int, 16 <= Char <= 31, Char <  Int
- 30 <= Int, 16 <= Char,       Char <  Int
- 30 <= Int, 16 <= Char,       Char <= Int

-- 
 __("<    Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/              GCS/M d- s+:-- a23 C+++$ UL++>++++$ P+++ L++>++++$ E-
  ^^                  W++ N+++ o? K? w(---) O? M- V? PS-- PE++ Y? PGP+ t
QRCZAK                  5? X- R tv-- b+>++ DI D- G+ e>++++ h! r--%>++ y-