On Sat, Aug 14, 2010 at 3:46 PM, Sean Leather <leat...@cs.uu.nl> wrote:

>
> So then, what is the standard?
>

There isn't one. There are many national standards:

   - China: GB-2312, GBK and GB18030
   - Taiwan: Big5
   - Japan: JIS and Shift-JIS (0208 and 0213 variants) and EUC-JP
   - Korea: KS X 1001, EUC-KR, and ISO-2022-KR

In general, Unicode uptake is increasing rapidly:
http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html

> Being not familiar with this area, I googled a bit, and I don't see a
> consensus. But I also noticeably don't see UTF-16. So, if this is the case,
> then a similar question still arises for CJK text: What format/library to
> use for it (assuming one doesn't want a performance penalty for translating
> between Data.Text's internal format and the target format)?
>

In my opinion, this "performance penalty" hand-wringing is mostly silly.
We're talking a pretty small factor of performance difference in most of
these cases. Even the biggest difference, between ByteString and String, is
usually much less than a factor of 100.

Your absolute first concern should be correctness, for which you should (a)
use the text package and (b) assume that any performance issues are being actively
worked on, especially if you report concrete problems and how to reproduce
them. In the unlikely event that you need to support non-Unicode encodings,
they are readily available via text-icu.
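
As a rough sketch of what that looks like: the code below assumes the
Data.Text.ICU.Convert module from text-icu, and an ICU installation that
recognises the converter name "Shift_JIS". The helper name decodeWith is
made up for this example.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.ICU.Convert (open, toUnicode)

-- Hypothetical helper: decode bytes in a named legacy encoding to Text.
-- The name is handed straight to ICU, so it must be a converter name the
-- installed ICU recognises (for example "Shift_JIS", "EUC-KR" or "Big5").
decodeWith :: String -> B.ByteString -> IO T.Text
decodeWith name bytes = do
  conv <- open name Nothing   -- Nothing: keep ICU's default fallback setting
  return (toUnicode conv bytes)

main :: IO ()
main = do
  -- 0x82 0xA0 is HIRAGANA LETTER A (U+3042) in Shift-JIS.
  t <- decodeWith "Shift_JIS" (B.pack [0x82, 0xA0])
  print (t == T.pack "\x3042")
```

Once the bytes have been decoded, everything downstream works on ordinary
Text values, so the legacy encoding only matters at the I/O boundary.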

The only significant change to the text API that lies ahead is an
introduction of locale support in a few critical places, so that we can do
the right thing for languages like Turkish.
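
To make the Turkish problem concrete, here is a small sketch using only
Data.Text. It shows the current locale-independent case mapping, not any
future API. The names upperDefault and turkishCapitalI are invented for
this example.

```haskell
import qualified Data.Text as T

-- Locale-independent upper-casing, as Data.Text provides it.
upperDefault :: T.Text -> T.Text
upperDefault = T.toUpper

-- What a Turkish reader expects as the upper case of "i":
-- LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130).
turkishCapitalI :: T.Text
turkishCapitalI = T.pack "\x0130"

main :: IO ()
main = do
  let got = upperDefault (T.pack "i")
  print (got == T.pack "I")        -- True: the default mapping gives plain "I"
  print (got == turkishCapitalI)   -- False: not what Turkish text needs
```

Without a locale argument the function has no way to know which of the two
answers the caller wants.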
