On Tue, Aug 17, 2010 at 13:40, Ketil Malde <ke...@malde.org> wrote:

> Michael Snoyman <mich...@snoyman.com> writes:
>
> > As far as space usage, you are correct that CJK data will take up more
> > memory in UTF-8 than UTF-16.
>
> With the danger of sounding ... alphabetist? as well as belaboring a
> point I agree is irrelevant (the storage format):
>
> I'd point out that it seems at least as unfair to optimize for CJK at
> the cost of Western languages.
>

The thing is that here you're only talking about size optimizations. For
somebody who has to handle a lot of international text (and I'm not
necessarily talking about Chinese or Japanese here), it is important that
the text is handled as efficiently as possible, because in the end you store
and retrieve it only once each, while possibly doing a lot of processing in
between. And the on-disk storage or over-the-wire format might very well be
different from the in-memory format. Each can be selected for what it's
best at.
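
To make the point about choosing the storage format separately a bit more
concrete, here is a minimal sketch. It assumes the text package's
Data.Text.Encoding module and the bytestring package; the helper name
encodedSizes and the sample strings are made up for illustration. It only
compares how many bytes the same Text value takes once encoded for storage,
and says nothing about how Text holds it in memory.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as E

-- Hypothetical helper: byte counts of one Text value when encoded
-- as UTF-8 and as UTF-16 (little endian, no byte order mark).
encodedSizes :: T.Text -> (Int, Int)
encodedSizes t =
  ( B.length (E.encodeUtf8 t)
  , B.length (E.encodeUtf16LE t)
  )

main :: IO ()
main = do
  let western = T.pack "text"               -- four ASCII characters
      cjk     = T.pack "\26085\26412\35486" -- three CJK characters
  print (encodedSizes western)  -- (4,8): UTF-8 is smaller here
  print (encodedSizes cjk)      -- (9,6): UTF-16 is smaller here
```

Either encoding can be picked at the I/O boundary with the matching encode
and decode functions, whatever representation is used for processing in
between.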

I'll repeat here that in my opinion a Text package should be good at
handling text, human text, from whatever country. If I need to handle large
streams of ASCII I'll use something else.

:)

Cheers,
 -Tako
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe