Re: [Haskell-cafe] Ready for testing: Unicode support for Handle I/O

Simon Marlow Wed, 04 Feb 2009 05:31:46 -0800

Duncan Coutts wrote:

On Tue, 2009-02-03 at 11:03 -0600, John Goerzen wrote:

Will there also be something to handle the UTF-16 BOM marker?  I'm not
sure what the best API for that is, since it may or may not be present,
but it should be considered -- and could perhaps help autodetect encoding.


I think someone else mentioned this already, but utf16 (as opposed to
utf16be/le) will use the BOM if it's present.

I'm not quite sure what happens when you switch encoding; presumably
it'll accept and consider a BOM at that point.

Yes; the utf16 and utf32 encodings accept a BOM (and generate a BOM in
write mode). This caused interesting bugs when doing re-decoding after
switching encodings, because the BOM constitutes state in the decoder,
which means that decoding is not necessarily repeatable unless you save
the state (which iconv doesn't provide a way to do).
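
As a rough illustration of the BOM behaviour described above, here is a
minimal sketch. It assumes the utf16 and utf16le encodings and
hSetEncoding as exported from System.IO in a current GHC; the helper
names encodeVia and decodeVia and the scratch file name are made up for
this example and are not part of any library.

```haskell
import Control.Exception (evaluate)
import Data.Char (chr, ord)
import System.IO

-- Hypothetical helper: encode a String through a text-mode Handle,
-- then read the resulting raw bytes back in binary mode.
encodeVia :: TextEncoding -> FilePath -> String -> IO [Int]
encodeVia enc path str = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc
    hPutStr h str
  withBinaryFile path ReadMode $ \h -> do
    raw <- hGetContents h
    _ <- evaluate (length raw)  -- force the lazy read before the handle closes
    return (map ord raw)

-- Hypothetical helper: write raw bytes in binary mode, then decode
-- them through a text-mode Handle using the given encoding.
decodeVia :: TextEncoding -> FilePath -> [Int] -> IO String
decodeVia enc path bytes = do
  withBinaryFile path WriteMode $ \h -> hPutStr h (map chr bytes)
  withFile path ReadMode $ \h -> do
    hSetEncoding h enc
    str <- hGetContents h
    _ <- evaluate (length str)  -- force the lazy read before the handle closes
    return str

main :: IO ()
main = do
  let tmp = "bom-demo.tmp"  -- hypothetical scratch file in the current directory
  encodeVia utf16   tmp "hi" >>= print   -- starts with a two-byte BOM
  encodeVia utf16le tmp "hi" >>= print   -- no BOM, little-endian code units
  -- The same text with either byte order; utf16 picks the order from the BOM.
  decodeVia utf16 tmp [0xFF, 0xFE, 0x68, 0x00] >>= putStrLn
  decodeVia utf16 tmp [0xFE, 0xFF, 0x00, 0x68] >>= putStrLn
```

With utf16 the written bytes should begin with a BOM, with utf16le they
should not, and decoding with utf16 consumes a BOM of either byte order
instead of returning it as a character. The byte order chosen after the
BOM was consumed is exactly the decoder state in question.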

Are there other encodings that have this kind of state? If so, then they
might be restricted to NoBuffering, at least when switching encodings.


Cheers,
        Simon