>> Let 'c2h' convert CStrings to Haskell Strings, and 'h2c' convert
>> Haskell Strings to CStrings.  (If I understand correctly, c2h . h2c
>> === id, but h2c . c2h is not the identity on all inputs;
>
> That is correct.  CStrings are 8-bits, and Haskell Strings are 32-bits.  
> Converting from Haskell to C loses information, unless you use a multi-byte 
> encoding on the C side (for instance, UTF8).

So actually I am incorrect, and h2c . c2h is the identity but c2h . h2c is not?

> I suggest you look at the utf8-string package, for instance 
> Codec.Binary.UTF8.String.{encode,decode}, which convert Haskell strings 
> to/from a list of Word8, which can then be transferred via the FFI to 
> wherever you like.

This package was very helpful;  I looked at the source to see how the
utf8 encoding was done.  It looks as if the functionality I want is
technically feasible but not implemented yet;  it shouldn't be too
much trouble to implement it myself, by imitating the existing
'decode' function but changing its behavior when it runs out of input
in the middle of a utf8-character.  Also key is the property
s1 ++ s2 == decode (encode s1)) ++ decode (encode s2))
holds.

Thanks,
Eric

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to