Re: [rust-dev] UTF-8 strings versus "encoded ropes"

Matthieu Monrocq Wed, 14 May 2014 10:43:39 -0700

On Wed, May 14, 2014 at 2:25 PM, Armin Ronacher <armin.ronac...@active-4.com
> wrote:


> Hi,
>
> On 02/05/2014 00:03, John Downey wrote:
>
>> I have actually always been a fan of how .NET did this. The System.String
>> type
>> is opinionated in how it is stored internally and does not allow anyone to
>> change that (unlike Ruby). The conversion from String to byte[] is done
>> using
>> explicit conversion methods like:
>>
> Unfortunately the .NET string type does not support UCS4 and as such is a
> nightmare to deal with.  Also because the internal encoding is not UTF-8
> *any* interaction with the outside world (ignoring the win32 api) is going
> through an encode/decode step which can be unnecessary.
>
> For instance if you would do that on Linux you would decode from UTF-8 to
> your internal UCS4 encoding, then encode back to UTF-8 on the way back to
> the terminal.  (Aside from that, 32 bits for a code point is too large, as
> Unicode does not go beyond 21 bits or so.  Useless.)
>
>
Even keeping whole bytes, 3 bytes (24 bits) are sufficient for the whole of
Unicode: code points stop at U+10FFFF, which fits in 21 bits. If you don't
mind some arithmetic, you could thus use a backing array of bytes, three per
code point, and just recompose the value on output.
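
To make the arithmetic concrete, here is a minimal sketch of what I have in
mind. PackedString and its methods are hypothetical names made up for this
mail, not an existing or proposed API, and the big-endian byte order within
each 3-byte group is an arbitrary choice on my part:

```rust
/// Hypothetical container (not a real or proposed API): stores every
/// Unicode scalar value in exactly 3 bytes, most significant byte first.
pub struct PackedString {
    bytes: Vec<u8>, // length is always a multiple of 3
}

impl PackedString {
    /// Decode UTF-8 input once and pack each code point into 3 bytes.
    pub fn new(s: &str) -> PackedString {
        // s.len() counts UTF-8 bytes, so this may over-allocate; fine for a sketch.
        let mut bytes = Vec::with_capacity(s.len() * 3);
        for c in s.chars() {
            let v = c as u32; // at most 0x10FFFF, so the top byte of the u32 is zero
            bytes.push((v >> 16) as u8);
            bytes.push((v >> 8) as u8);
            bytes.push(v as u8);
        }
        PackedString { bytes }
    }

    /// Number of code points stored.
    pub fn len(&self) -> usize {
        self.bytes.len() / 3
    }

    /// Constant-time indexing: recompose the value from its 3 bytes.
    pub fn get(&self, i: usize) -> Option<char> {
        let start = i.checked_mul(3)?;
        let end = start.checked_add(3)?;
        let chunk = self.bytes.get(start..end)?;
        let v = (chunk[0] as u32) << 16 | (chunk[1] as u32) << 8 | (chunk[2] as u32);
        char::from_u32(v)
    }

    /// Re-encode as UTF-8 on the way out.
    pub fn to_utf8(&self) -> String {
        (0..self.len()).filter_map(|i| self.get(i)).collect()
    }
}
```

This gives O(1) access to the n-th code point at 3 bytes each instead of 4,
but you still pay the decode/encode step at the boundary that Armin objects
to, so it only addresses the size complaint.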



>
>
> Regards,
> Armin
>
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev
>

