On 8 November 2011 11:43, Simon Marlow <marlo...@gmail.com> wrote:
> Don't you mean 1 is what we have?

Yes, sorry!

> Failing to roundtrip in some cases, and doing so silently, seems highly
> suboptimal to me. I'm sorry I didn't pick up on this at the time (Unicode
> is a swamp :).

I *can* change the implementation back to using lone surrogates. This
gives us guaranteed roundtripping but it means that the user might see
lone-surrogate Char values in Strings from the filesystem/command
line. IIRC this does break some software -- e.g. Brian's "text"
library explicitly checks for such characters and fails if it detects
them.

So whatever happens we are going to end up making some group of users unhappy!
 * No PEP383: Haskellers using non-ASCII get upset when their command
   line argument [String]s aren't in fact sequences of characters, but
   sequences of bytes in some arbitrary encoding
 * PEP383(surrogates): Unicoders get upset by lone surrogates (which
   can actually occur at the moment, independent of PEP383 -- e.g. as
   character literals or from FFI)
 * PEP383(private chars): Unixers get upset that we can't roundtrip
   byte sequences that look like the codepoint 0xEFXX encoded in the
   current locale. In practice, 0xEFXX is only decodable from a UTF
   encoding, so we fail to roundtrip byte sequences like the one Ian
   posted.

I'm happy to implement any behaviour, I would just like to know that
whatever it is is accepted as the correct tradeoff :-)

RE exposing a ByteString based interface to the IO library from
base/unix/whatever: AFAIK Python doesn't do this, and just tells
people to use the (x.encode(sys.getfilesystemencoding(),
"surrogateescape")) escape hatch, which is what I've been
recommending. I think this would be more satisfying to John if it were
actually guaranteed to work on arbitrary byte sequences, not just
*highly likely* to work :-)

Max

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
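
To make the tradeoff between the two PEP383 variants above concrete, here is a minimal sketch in Python, the language whose "surrogateescape" handler the message refers to. It assumes UTF-8 as the locale encoding. The helpers decode_private and encode_private are hypothetical names invented for this illustration: they mimic a private-use-character scheme by mapping each undecodable byte 0xNN to U+EFNN, and are not GHC's actual implementation.

```python
# Bytes that are not valid UTF-8: 0xFF and 0xFE never occur in UTF-8.
raw = b"caf\xc3\xa9-\xff\xfe.txt"

# PEP 383 decoding: each undecodable byte 0xNN becomes the lone
# surrogate U+DCNN instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtripped = name.encode("utf-8", "surrogateescape")


def has_lone_surrogates(s: str) -> bool:
    """The kind of check a strict Unicode library might perform."""
    return any(0xD800 <= ord(c) <= 0xDFFF for c in s)


# Hypothetical private-use variant (an assumption for illustration):
# escaped bytes are stored as U+EF80..U+EFFF rather than as surrogates.
def decode_private(data: bytes) -> str:
    s = data.decode("utf-8", "surrogateescape")
    return "".join(
        chr(ord(c) - 0xDC00 + 0xEF00) if 0xDC80 <= ord(c) <= 0xDCFF else c
        for c in s
    )


def encode_private(s: str) -> bytes:
    # Every U+EF80..U+EFFF character is assumed to be an escaped byte,
    # even if it was a genuine character in the input.
    t = "".join(
        chr(ord(c) - 0xEF00 + 0xDC00) if 0xEF80 <= ord(c) <= 0xEFFF else c
        for c in s
    )
    return t.encode("utf-8", "surrogateescape")


# A byte sequence that is the valid UTF-8 encoding of U+EFFF.
ambiguous = "\uefff".encode("utf-8")
private_result = encode_private(decode_private(ambiguous))
```

With surrogates, roundtripped equals raw, but name contains lone surrogates that a strict consumer may reject. With the private-use sketch, the string is free of surrogates, but ambiguous comes back as the single byte 0xFF, so the roundtrip silently changes the data.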