Hi,

2014-05-02 3:52 GMT+03:00 Nathan Myers <[email protected]>:
>> There's a string type because it *enforces* the guarantee of containing
>> valid UTF-8, meaning it can always be converted to code points. This
>> also means all of the Unicode algorithms can assume that they're dealing
>> with a valid sequence of code points with no out-of-range values or
>> surrogates, per the specification.
>>
>
> A UTF-8 string type can certainly earn its keep. (Probably it should
> have "utf8" somewhere in its name.) Not all byte sequences a program
> encounters are, or can or should be converted to, valid UTF-8. Any
> that might not be must still be put in something that users probably
> want to call a string.

But is it the case for the standard string? What you describe is a
sequence of bytes with application-defined encoding. There might be a
custom type for it, with an API for conversion and validation.

There is, however, value in the core language string type that is
1) guaranteed to contain a valid Unicode character sequence;
2) readily interoperable with C functions expecting char* strings
   having either ASCII or UTF-8 encoding.

Best regards,
  Mikhail
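
P.S. A rough sketch of the two properties listed above, assuming the standard library API as it exists in stable Rust today (String::from_utf8, CString::new, CStr::from_ptr), which may differ from the API at the time of this thread. The names validate, c_side_len and len_via_c are hypothetical and exist only for illustration; c_side_len is a stand-in for a real C callee, not an actual C function.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: bytes in an application-defined encoding become a
/// string only if they are valid UTF-8; otherwise the caller gets the
/// untouched bytes back and can keep them in a byte-oriented type.
fn validate(bytes: Vec<u8>) -> Result<String, Vec<u8>> {
    String::from_utf8(bytes).map_err(|e| e.into_bytes())
}

/// Hypothetical stand-in for a C function that takes a NUL-terminated
/// char* and counts the bytes before the terminator, as strlen would.
unsafe fn c_side_len(p: *const c_char) -> usize {
    // The caller must pass a valid, NUL-terminated pointer.
    unsafe { CStr::from_ptr(p) }.to_bytes().len()
}

/// Hypothetical helper: hand an already-validated string to the C side.
/// Returns None when the string contains an interior NUL byte, which a
/// char* string cannot represent.
fn len_via_c(s: &str) -> Option<usize> {
    let c = CString::new(s).ok()?; // copies the bytes and appends the NUL
    Some(unsafe { c_side_len(c.as_ptr()) })
}
```

With these definitions, validate(vec![0xED, 0xA0, 0x80]) is rejected because those bytes encode a surrogate, while a string that passed validation needs only the NUL-terminator copy before a C function can read it as UTF-8.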
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
