On 29/03/2014 23:15, Boris Zbarsky wrote:
> On 3/29/14 6:56 PM, Simon Sapin wrote:
>> Or I guess we could use what I’ll call "evil UTF-8", which is UTF-8
>> without the artificial restriction of not encoding surrogates.
>
> http://en.wikipedia.org/wiki/CESU-8

CESU-8 is evil too, but it’s not what I had in mind. Its main
characteristic is encoding non-BMP characters as surrogate pairs, which
does not change the value space.
But http://www.unicode.org/reports/tr26/ is unclear on whether CESU-8
allows unpaired surrogates (which was the issue in the previous
message). I suppose it does not, since valid UTF-16 does not allow
them either.
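
To make the difference concrete, here is a rough Python sketch of the
two encodings. The function names are made up for illustration.
"Evil UTF-8" is modelled with Python's "surrogatepass" error handler,
and the CESU-8 encoder assumes (as I suppose above, TR26 does not say
so explicitly) that unpaired surrogates are rejected.

```python
def evil_utf8_encode(s: str) -> bytes:
    # Hypothetical helper: UTF-8 without the ban on surrogate code
    # points. Non-BMP characters still use the normal 4-byte form.
    return s.encode("utf-8", "surrogatepass")


def cesu8_encode(s: str) -> bytes:
    # Hypothetical helper: each non-BMP character is split into a
    # UTF-16 surrogate pair, and each half is encoded as 3 bytes.
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp >= 0x10000:
            v = cp - 0x10000
            for unit in (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            # Assumption: a lone surrogate is an error, so the strict
            # codec is used here and raises UnicodeEncodeError.
            out += ch.encode("utf-8")
    return bytes(out)


pile = "\U0001F4A9"                      # a non-BMP character
print(pile.encode("utf-8").hex())        # f09f92a9
print(cesu8_encode(pile).hex())          # eda0bdedb2a9
print(evil_utf8_encode(pile).hex())      # f09f92a9
print(evil_utf8_encode("\ud800").hex())  # eda080
```

So CESU-8 changes the bytes used for non-BMP characters but not the set
of representable strings, while "evil UTF-8" keeps the bytes of valid
UTF-8 unchanged and only adds lone surrogates to the value space.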
--
Simon Sapin
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo