On Thu, May 1, 2014 at 1:06 PM, Malthe Borch <[email protected]> wrote:
> This is not the case in the proposed design.

You're wrong.

> All string operations would behave exactly as if there was only a
> single encoding. The only requirement is that the strings are properly
> declared with an encoding (which may be different).

Nope, that's not how it works in practice (see below). I speak as
someone who has spent blood, sweat, and tears debugging systems that
work exactly like what you're proposing.

> With Ruby and most other languages, a string is just a sequence of
> bytes. It does not know about an encoding

Wrong again, and that hasn't been the case for some 7 years. It was
true of Ruby <= 1.8, but Ruby 1.9 introduced a feature called "M17N"
which works almost exactly like what you describe: each string is
tagged with an encoding, stored in a bitfield alongside the string
object.

> Note that it may not always be possible to encode a string to a
> non-unicode encoding such as ASCII. But this is only a failure mode on
> the I/O barrier where you explicitly need to encode. When no I/O
> barrier and/or protocol is involved, there needs to be no awareness of
> string encodings.

No: when you combine strings with different encodings, you need to
transcode one of the strings. When this happens, the transcoding
process may encounter characters which are valid in one encoding but
not the other, in which case the transcoding will fail, and it will
fail at runtime. This can happen long after a string has crossed the
I/O boundary. The result is errors which pop up at runtime in odd
circumstances. This is nothing short of a fucking nightmare to debug.

--
Tony Arcieri
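[Editor's note: a minimal Ruby (>= 1.9) sketch, not from the original thread, of the two behaviors described above — per-string encoding tags, and mixed-encoding operations failing at runtime far from any I/O boundary. The string literals are illustrative assumptions.]

```ruby
# M17N: every string carries an encoding tag.
s = "héllo"        # source encoding: UTF-8 (the default since Ruby 2.0)
puts s.encoding    # prints "UTF-8"

# Combining strings with incompatible encodings raises at runtime,
# potentially long after both strings crossed the I/O boundary.
latin1 = "h\xE9llo".force_encoding("ISO-8859-1")
begin
  s + latin1
rescue Encoding::CompatibilityError => e
  puts e.message   # incompatible character encodings
end

# Transcoding can also fail when a character has no representation
# in the target encoding (e.g. "é" in US-ASCII):
begin
  s.encode("US-ASCII")
rescue Encoding::UndefinedConversionError => e
  puts e.message
end
```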
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
