Re: [Haskell-cafe] UTF-8 in Haskell.

Magicloud Magiclouds Wed, 22 Dec 2010 22:17:17 -0800

On Thu, Dec 23, 2010 at 2:01 PM, Mark Lentczner <ma...@glyphic.com> wrote:
>
> On Dec 22, 2010, at 9:29 PM, Magicloud Magiclouds wrote:
>> Thus in all situations (ASCII, UTF-8, or even
>> UTF-32), my program always sends 4 bytes through the network. Is that
>> OK?
>
> Generally, no.
>
> Haskell strings are sequences of Unicode characters. Each character has an
> integral code point value, from 0 to 0x10ffff, but technically, the code
> point itself is just a number, not a pattern of bits to be exchanged. Turning
> it into bits is the job of an encoding.
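
To make the distinction concrete, here is a small sketch (base only, no extra packages). It shows that a Char's code point is just an Int, obtained with ord, and that the largest value a Char can hold is 0x10FFFF. No bytes or byte order are involved until an encoding is chosen. The names codePoints and maxCodePoint are made up for this illustration.

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- A Char is a Unicode code point; ord gives its integer value.
-- Nothing here says how the value is laid out as bytes.
codePoints :: String -> [Int]
codePoints = map ord

-- The largest code point a Haskell Char can hold.
maxCodePoint :: Int
maxCodePoint = ord maxBound

main :: IO ()
main = do
  print (codePoints "a\x00E9\x20AC")   -- 'a', e-acute, euro sign: [97,233,8364]
  putStrLn (showHex maxCodePoint "")   -- 10ffff
```
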
>
> In any protocol you need to know the encoding before you exchange characters
> as bytes or words. In some protocols it is implicit, in others it is explicit
> in header or metadata, and in yet others (IRC comes to mind) it is undefined
> (which causes problems for users).
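
In Haskell the same question comes up on every text Handle: a Handle encodes Chars with some TextEncoding, which by default is the locale's, and hSetEncoding changes it. A minimal sketch, assuming a temporary file stands in for the network connection and using the directory package (ships with GHC) for cleanup; bytesWritten is a hypothetical name. The same three-character string comes out as 5, 6 or 12 bytes depending on the encoding, so a receiver cannot decode it without knowing which one was used.

```haskell
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Write a String through a text Handle with an explicitly chosen encoding,
-- then report how many bytes actually ended up in the file.
bytesWritten :: TextEncoding -> String -> IO Integer
bytesWritten enc s = do
  dir <- getTemporaryDirectory
  (path, h) <- openTempFile dir "enc-demo.txt"
  hSetEncoding h enc          -- override the default (locale) encoding
  hPutStr h s
  hClose h
  size <- withBinaryFile path ReadMode hFileSize
  removeFile path
  return size

main :: IO ()
main = do
  let s = "ab\x20AC"                 -- 'a', 'b', euro sign
  bytesWritten utf8 s >>= print      -- 5  (1 + 1 + 3)
  bytesWritten utf16le s >>= print   -- 6  (three 2-byte code units, no BOM)
  bytesWritten utf32be s >>= print   -- 12 (4 bytes per character)
```
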
>
> UTF-8 uses a variable number of bytes (one to four) to represent each
> character, depending on the code point, not a fixed Word32 as you suggested.
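
For comparison with sending a fixed 4 bytes per Char, a hand-written UTF-8 encoder following the RFC 3629 byte layout shows where the one-to-four byte lengths come from. This is an illustrative sketch using base only; encodeCharUtf8 and utf8Encode are made-up names, real code should use a library such as the packages mentioned below, and this version does not reject surrogate code points (U+D800 to U+DFFF).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one code point as UTF-8: 1 byte up to U+007F, 2 up to U+07FF,
-- 3 up to U+FFFF, otherwise 4. Continuation bytes carry 6 bits each.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- high bits that go into the leading byte
    lead k = fromIntegral (n `shiftR` k)
    -- a continuation byte: 10xxxxxx holding 6 bits of the code point
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

utf8Encode :: String -> [Word8]
utf8Encode = concatMap encodeCharUtf8

main :: IO ()
main = do
  let s = "A\x00E9\x20AC\x1F600"             -- 'A', e-acute, euro, U+1F600
  print (map (length . encodeCharUtf8) s)    -- [1,2,3,4]
  print (encodeCharUtf8 '\x20AC')            -- [226,130,172]
  print (length (utf8Encode s), 4 * length s) -- (10,16): UTF-8 vs 4 bytes/char
```
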
>
> Converting from Haskell's String to various encodings can be done with
> either the "text" package or the "utf8-string" package.
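
A minimal sketch with the text package (plus bytestring, both shipped with GHC), assuming reasonably recent versions; toUtf8 and fromUtf8 are hypothetical helper names. Data.Text.Encoding.encodeUtf8 produces the bytes to put on the wire, and decodeUtf8' returns a Left on malformed input instead of throwing.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- String -> UTF-8 bytes, ready for a socket or a binary-mode Handle.
toUtf8 :: String -> BS.ByteString
toUtf8 = TE.encodeUtf8 . T.pack

-- UTF-8 bytes -> String, reporting malformed input as a Left.
fromUtf8 :: BS.ByteString -> Either String String
fromUtf8 bs = either (Left . show) (Right . T.unpack) (TE.decodeUtf8' bs)

main :: IO ()
main = do
  let s = "ab\x20AC"
  print (BS.unpack (toUtf8 s))             -- [97,98,226,130,172]
  print (BS.length (toUtf8 s), length s)   -- (5,3): bytes vs characters
  print (fromUtf8 (toUtf8 s) == Right s)   -- True
```
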
>
>                - Mark


I see. I just realized that, in this case (ssh), I could use CString to
avoid encoding problems altogether.
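
One caveat, hedged because the behaviour has changed between GHC versions: marshalling through a CString does not by itself avoid choosing an encoding. In current GHC, Foreign.C.String's newCString and withCString encode using the current locale, so the bytes can differ between machines. GHC.Foreign provides the same operations with an explicit TextEncoding argument. A minimal sketch that pins UTF-8 (cBytes is a hypothetical name):

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (utf8)

-- Marshal a String into a temporary C buffer using UTF-8 (not the locale)
-- and copy the resulting bytes back out as a list.
cBytes :: String -> IO [Word8]
cBytes s = GHC.withCStringLen utf8 s $ \(ptr, len) ->
  peekArray len (castPtr ptr)

main :: IO ()
main = cBytes "ab\x20AC" >>= print   -- [97,98,226,130,172]
```
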

-- 
Dense bamboo cannot stop the flowing water;
high mountains cannot block the drifting clouds.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
