Re: [rust-dev] UTF-8 strings versus "encoded ropes"

Malthe Borch Thu, 01 May 2014 13:07:10 -0700

On 1 May 2014 21:03, Tony Arcieri <[email protected]> wrote:
> Oh god no! Please no. This is what Ruby does and it's a complete nightmare.
> This creates an entire new class of bug when operations are performed on
> strings with incompatible encodings. It's an entire class of bug that simply
> doesn't exist if you just pick a standard encoding and stick to it.


This is not the case in the proposed design.

All string operations would behave exactly as if there were only a
single encoding. The only requirement is that each string is properly
declared with its encoding (which may differ from one string to the
next).
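
To make that concrete, here is a rough sketch (not part of any
existing library) of a string type that carries its encoding. The
names EncodedStr, chars and char_len are made up for illustration, and
only two encodings are modelled. Comparison works on code points, so
the stored encoding never leaks into the operation:

```rust
/// Hypothetical string type that records which encoding its bytes use.
#[derive(Debug, Clone)]
enum EncodedStr {
    /// Text stored as UTF-8.
    Utf8(String),
    /// Text stored as ISO 8859-1, one byte per character.
    Latin1(Vec<u8>),
}

impl EncodedStr {
    /// Iterate over Unicode scalar values, whatever the stored encoding is.
    fn chars(&self) -> Box<dyn Iterator<Item = char> + '_> {
        match self {
            EncodedStr::Utf8(s) => Box::new(s.chars()),
            // Every ISO 8859-1 byte maps to the code point with the same value.
            EncodedStr::Latin1(bytes) => Box::new(bytes.iter().map(|&b| b as char)),
        }
    }

    /// Length in characters, independent of the byte representation.
    fn char_len(&self) -> usize {
        self.chars().count()
    }
}

impl PartialEq for EncodedStr {
    /// Two strings are equal when they decode to the same code points.
    fn eq(&self, other: &Self) -> bool {
        self.chars().eq(other.chars())
    }
}
```

With this, EncodedStr::Utf8("caf\u{e9}".to_string()) compares equal to
EncodedStr::Latin1(vec![0x63, 0x61, 0x66, 0xE9]) even though the two
values hold different bytes.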

With Ruby and most other languages, a string is just a sequence of
bytes. It does not know its own encoding, and therefore the
application must always know which encoding was used. This is also
the case with Python 2.x.

Note that it may not always be possible to encode a string to a
non-Unicode encoding such as ASCII. But this is only a failure mode at
the I/O barrier, where you explicitly need to encode. When no I/O
barrier and/or protocol is involved, no awareness of string encodings
is needed.
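
As a sketch of what that failure could look like, assuming a
hypothetical encode_ascii function and Unencodable error type (neither
is an existing API), the error only shows up at the point where bytes
are actually requested:

```rust
/// Hypothetical error reported when a character has no ASCII representation.
#[derive(Debug, PartialEq)]
struct Unencodable {
    /// The character that could not be encoded.
    ch: char,
    /// Its position, counted in characters rather than bytes.
    index: usize,
}

/// Encode text as ASCII bytes, failing on the first non-ASCII character.
fn encode_ascii(text: &str) -> Result<Vec<u8>, Unencodable> {
    text.chars()
        .enumerate()
        .map(|(index, ch)| {
            if ch.is_ascii() {
                Ok(ch as u8)
            } else {
                Err(Unencodable { ch, index })
            }
        })
        // Collecting into a Result stops at the first Err.
        .collect()
}
```

Everything before the call to encode_ascii can treat the text as plain
characters; only the caller at the barrier has to handle the Err case.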

Also, note that you can't simply pick a standard encoding and stick
with it. To return to the original example of an HTTP request, the
header values are ISO 8859-1. If you insist on UTF-8, then you must
transcode every time such a value crosses that boundary.
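
For reference, the transcoding step in question is small but not free.
A minimal sketch, with the function names latin1_to_utf8 and
utf8_to_latin1 chosen here for illustration:

```rust
/// Decode ISO 8859-1 bytes into a UTF-8 `String`.
/// Each byte becomes the code point with the same numeric value,
/// so bytes above 0x7F grow to two bytes in the result.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Encode text back to ISO 8859-1.
/// Returns `None` if any character lies above U+00FF.
fn utf8_to_latin1(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            let code = c as u32;
            if code <= 0xFF {
                Some(code as u8)
            } else {
                None
            }
        })
        // Collecting into an Option yields None if any character failed.
        .collect()
}
```

Decoding always succeeds but allocates and copies; the reverse
direction can fail, which is the same I/O barrier failure described
earlier.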

\malthe
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
