Re: [rust-dev] UTF-8 strings versus "encoded ropes"

Mikhail Zabaluev Fri, 02 May 2014 03:32:19 -0700

Hi,

2014-05-02 3:52 GMT+03:00 Nathan Myers:


>> There's a string type because it *enforces* the guarantee of containing
>> valid UTF-8, meaning it can always be converted to code points. This
>> also means all of the Unicode algorithms can assume that they're dealing
>> with a valid sequence of code points with no out-of-range values or
>> surrogates, per the specification.
>
> A UTF-8 string type can certainly earn its keep.  (Probably it should
> have "utf8" somewhere in its name.)  Not all byte sequences a program
> encounters are, or can or should be converted to, valid UTF-8.  Any
> that might not be must still be put in something that users probably
> want to call a string.


But is that the case for the standard string? What you describe is a sequence
of bytes with an application-defined encoding. There might be a custom type
for it, with an API for conversion and validation. There is, however, value
in a core language string type that is 1) guaranteed to contain a valid
Unicode character sequence; and 2) readily interoperable with C functions
expecting char* strings in either ASCII or UTF-8 encoding.
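
To make the split concrete, here is a rough sketch of what I mean. It is
written against the standard library as it exists in later Rust releases
(std::str::from_utf8 and std::ffi::CString), not the 2014 API. ByteString
and to_c_string are hypothetical names invented for illustration only.

```rust
use std::ffi::{CString, NulError};
use std::str::{self, Utf8Error};

/// Hypothetical custom type: a byte sequence whose encoding is defined by
/// the application. It makes no promise about containing valid UTF-8.
pub struct ByteString(pub Vec<u8>);

impl ByteString {
    /// Validate the bytes as UTF-8. On success the result is an ordinary
    /// `&str`, so all Unicode-aware code can rely on the guarantee.
    pub fn as_utf8(&self) -> Result<&str, Utf8Error> {
        str::from_utf8(&self.0)
    }
}

/// Produce a NUL-terminated copy suitable for passing to a C function
/// that expects `char*` holding ASCII or UTF-8.
/// Note: a valid UTF-8 string may contain U+0000, which C cannot
/// represent inside a string, so this step can still fail.
pub fn to_c_string(s: &str) -> Result<CString, NulError> {
    CString::new(s)
}
```

With this, ByteString(vec![0x66, 0xff, 0x6f]).as_utf8() returns an error
because 0xff never occurs in UTF-8, while a validated &str needs only the
interior-NUL check before it is handed to C.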

Best regards,
  Mikhail

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
