On Mon, Mar 26, 2012 at 06:08, Christian Siefkes <christ...@siefkes.net> wrote:

> On 03/26/2012 02:39 AM, Gabriel Dos Reis wrote:
> > True, but should the language definition default to a string type
> > that is one of the most unsuited for text processing in the 21st
> > century where global multilingualism abounds?  Even C has qualms
> > about that.
> ...
> > I have no doubt believing that if all texts my students have to
> > process are US ASCII, [Char] is more than sufficient.  So, I have
> > sympathy for your position.  However,  I doubt [Char] would be
> > adequate if I ask them to share texts from their diverse cultures.
>
> Uh, while a C char is (usually) just a byte (8 bits of information, like
> Word8 in Haskell), a Haskell Char is a Unicode character (21 bits of
> information). A single C char cannot contain an arbitrary Unicode character,
> while a Haskell Char can, and does. Hence [Char] is (efficiency issues
> aside) perfectly adequate for dealing with texts written in arbitrary
> languages.
>

...as long as you ignore combining characters and the like.  I claim
ignoring them in this way is just continuing the same "good enough for my
language" attitude that has plagued text handling ever since someone got
the notion that maybe text processing should consider more than just ISO
8859-1 and got roundly pooh-poohed by the community.
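
To make the combining-character point concrete, here is a minimal sketch using
only Data.Char from base. The names precomposed, decomposed and countNonMarks
are made up for illustration, and skipping marks is only a rough stand-in for
real grapheme segmentation (Unicode UAX #29), not a substitute for it.

```haskell
import Data.Char (isMark)

-- "e with acute" as a single precomposed code point, U+00E9.
precomposed :: String
precomposed = "\x00E9"

-- The same user-visible character written as 'e' followed by
-- U+0301 COMBINING ACUTE ACCENT: two Chars, one visible character.
decomposed :: String
decomposed = "e\x0301"

-- Hypothetical helper: count the code points that are not combining
-- marks. This is an approximation only; it is not full grapheme
-- cluster segmentation.
countNonMarks :: String -> Int
countNonMarks = length . filter (not . isMark)
```

With [Char], length decomposed is 2 although a reader sees one character,
precomposed == decomposed is False although both render the same, and
reverse decomposed moves the accent in front of the 'e', so it no longer
follows the letter it was meant to modify.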

-- 
brandon s allbery                                      allber...@gmail.com
wandering unix systems administrator (available)     (412) 475-9364 vm/sms
_______________________________________________
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime
