On Thu, Dec 23, 2010 at 2:01 PM, Mark Lentczner <ma...@glyphic.com> wrote:
>
> On Dec 22, 2010, at 9:29 PM, Magicloud Magiclouds wrote:
>> Thus under all situations (ascii, UTF-8, or even
>> UTF-32), my program always sends 4 bytes through the network. Is that
>> OK?
>
> Generally, no.
>
> Haskell strings are sequences of Unicode characters. Each character has an
> integral code point value, from 0 to 0x10ffff, but technically, the code
> point itself is just a number, not a pattern of bits to be exchanged. The
> bit pattern is what an encoding defines.
>
> In any protocol you need to know the encoding before you exchange characters
> as bytes or words. In some protocols it is implicit, in others explicit in
> header or metadata, and in yet others (IRC comes to mind) it is undefined
> (which makes problems for the user).
>
> The UTF-8 encoding uses a variable number of bytes to represent each
> character, depending on the code point, not a fixed Word32 as you suggested.
>
> Converting from Haskell's String to various encodings can be done with either
> the "text" package or "utf8-string" package.
>
> - Mark
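To make the variable-width point concrete, here is a minimal sketch, assuming the "text" and "bytestring" packages are available (both ship with GHC). The helper names encodeUtf8String and utf8Width are made up for illustration; they are not part of either package.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: encode a Haskell String as UTF-8 bytes,
-- going through Data.Text from the "text" package.
encodeUtf8String :: String -> BS.ByteString
encodeUtf8String = TE.encodeUtf8 . T.pack

-- Hypothetical helper: number of UTF-8 bytes one character occupies.
utf8Width :: Char -> Int
utf8Width c = BS.length (encodeUtf8String [c])

main :: IO ()
main = do
  -- U+0061 'a', U+00E9 (e with acute), U+7AF9 (a CJK character),
  -- U+1F600 (a code point outside the Basic Multilingual Plane)
  mapM_ (print . utf8Width) ['a', '\xE9', '\x7AF9', '\x1F600']
  -- The raw bytes that would actually go over the wire for "a" ++ U+00E9
  print (BS.unpack (encodeUtf8String "a\xE9"))
```

The four characters take 1, 2, 3 and 4 bytes respectively, so sending a fixed 4-byte Word32 per character would not be UTF-8 on the wire; it would be closer to UTF-32, and the receiver would have to know that.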
I see. I just realized that in this case (ssh) I could use CString to
avoid all the encoding problems.

-- 
竹密岂妨流水过
山高哪阻野云飞