Re: [rust-dev] How to find Unicode string length in rustlang

Kevin Ballard Wed, 28 May 2014 20:37:27 -0700

On May 28, 2014, at 6:00 PM, Benjamin Striegel <[email protected]> wrote:


> To reiterate, it simply doesn't make sense to ask what the length of a string 
> is. You may as well ask what color the string is, or where the string went to 
> high school, or how many times the string rode the roller coaster that one 
> year on the first day of summer vacation when the string's parents took the 
> string to that amusement park and the weather said that it was going to rain 
> so there were almost no crowds that day but then it didn't rain and all the 
> rides were open with absolutely no lines whatsoever.

As amusing as this imagery is, you're still arguing from a faulty premise,
namely that the concept of a "string" has not been well-defined. The nebulous
"string", as it applies to programming languages in general, does indeed not
have a well-defined length. But Rust's strings (both String and str) are very
explicitly defined as UTF-8-encoded sequences. And when dealing with a sequence
in a precise encoding, the natural unit to work with is the code unit (and
there is precedent for this in other languages, such as JavaScript, Obj-C, and
Go).
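To make "the code unit" concrete, here is a minimal sketch against today's
post-1.0 standard library (not the 2014 API this thread is about). The
`lengths` helper is a hypothetical name; `str::len`, `chars` and
`encode_utf16` are real std methods. It reports the three counts people usually
mean by "length": UTF-8 code units (what Rust's len() returns, and what Go's
len() returns for a string), Unicode scalar values, and UTF-16 code units (what
JavaScript's .length and NSString's length report).

```rust
/// Returns (UTF-8 bytes, Unicode scalar values, UTF-16 code units) for `s`.
fn lengths(s: &str) -> (usize, usize, usize) {
    let utf8_bytes = s.len(); // str::len counts UTF-8 code units, i.e. bytes
    let scalar_values = s.chars().count(); // chars() yields Unicode scalar values
    let utf16_units = s.encode_utf16().count(); // what JS and NSString report as length
    (utf8_bytes, scalar_values, utf16_units)
}

fn main() {
    // "hello", "héllo" (precomposed U+00E9), and the crab emoji U+1F980.
    for s in ["hello", "h\u{e9}llo", "\u{1F980}"] {
        let (bytes, chars, utf16) = lengths(s);
        println!("{s:?}: {bytes} bytes, {chars} chars, {utf16} UTF-16 units");
    }
}
```

The three counts agree for ASCII and diverge as soon as non-ASCII text shows
up: 6/5/5 for "héllo" and 4/1/2 for the emoji, which needs a surrogate pair in
UTF-16.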

---

My interpretation of your argument is that your real objection is this: you
think that calling it len() will mean people won't even consider that there's
a difference between byte length and character length, because they'll be too
used to working with ASCII data, and that they'll write code that breaks when
forced to confront the difference. This is true regardless of how len() is
defined (whether it counts bytes, UTF-16 code units, Unicode scalar values,
etc.).
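A small sketch of that failure mode, again in today's Rust rather than the
2014 API. Both functions are hypothetical helpers written for illustration.
`truncate_naive` assumes len() counts characters and slices by byte offset, so
it works on ASCII and panics on "héllo". `truncate_chars` counts Unicode scalar
values and only cuts on a char boundary, yet it still splits "e" plus a
combining accent, which suggests that switching the unit does not by itself
make ASCII-minded code correct.

```rust
/// Naive truncation that treats `max` as a character count but slices by byte
/// offset. Fine for ASCII; panics if `max` lands inside a multi-byte character.
fn truncate_naive(s: &str, max: usize) -> &str {
    if s.len() <= max { s } else { &s[..max] }
}

/// Keeps at most `max` Unicode scalar values, cutting only on a char boundary.
fn truncate_chars(s: &str, max: usize) -> &str {
    match s.char_indices().nth(max) {
        // Byte offset where the (max + 1)-th scalar value starts.
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than max + 1 scalar values: nothing to cut.
        None => s,
    }
}

fn main() {
    println!("{}", truncate_naive("hello", 2)); // ASCII: byte and char counts agree
    println!("{}", truncate_chars("h\u{e9}llo", 2)); // keeps "hé"
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: two scalar values, one
    // user-perceived character. Cutting after one scalar value drops the accent.
    println!("{}", truncate_chars("e\u{301}", 1));
    // truncate_naive("h\u{e9}llo", 2) would panic: byte 2 is inside 'é'.
}
```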

My assertion is that calling the method .byte_len() will not force anyone to
deal with non-ASCII data if they don't want to; it will only annoy everyone by
being overly verbose, even more so once you rename .slice() to .byte_slice(),
etc.

I also believe that renaming .slice() to .byte_slice() is unambiguously wrong,
as the name implies that it returns &[u8] when it doesn't. Similarly, renaming
just .len() to .byte_len() without also renaming .slice() to .byte_slice() is
wrong, because it leaves two byte-indexed methods named inconsistently. This
means you cannot rename .len() to .byte_len() without introducing unambiguously
wrong naming elsewhere.
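For reference, the distinction between "byte offsets in, &str out" and
"byte offsets in, &[u8] out" survived into post-1.0 Rust, where the old
.slice() method is gone and range indexing on str (`&s[a..b]`, or the
non-panicking `s.get(a..b)`) takes its place. A minimal sketch; the helper
names `slice_by_bytes` and `raw_bytes` are hypothetical, while `str::get` and
`str::as_bytes` are real std methods.

```rust
/// Byte offsets in, `&str` out: the shape of the `.slice()` under discussion.
/// Uses `str::get`, which returns `None` instead of panicking when an offset
/// is out of range or splits a character.
fn slice_by_bytes(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

/// Byte offsets in, `&[u8]` out: the type a name like `.byte_slice()` suggests.
/// Panics if the range is out of bounds, like ordinary slice indexing.
fn raw_bytes(s: &str, start: usize, end: usize) -> &[u8] {
    &s.as_bytes()[start..end]
}

fn main() {
    let s = "h\u{e9}llo"; // "héllo": 'é' occupies bytes 1..3
    println!("{:?}", slice_by_bytes(s, 0, 3)); // a &str covering "hé"
    println!("{:?}", slice_by_bytes(s, 0, 2)); // None: byte 2 splits 'é'
    println!("{:?}", raw_bytes(s, 0, 3)); // the raw UTF-8 bytes of "hé"
}
```

Both functions index by bytes, but only the second returns bytes, which is the
gap between what the method does and what a `byte_` prefix would suggest.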

---

Does this accurately represent your argument? And do you have any rebuttal to 
my argument that hasn't already been said? If the answers are "yes" and "no" 
respectively, then I agree, we will have to simply live with being in 
disagreement.

> Oh and while we're belligerently bikeshedding, we should rename `to_str` to 
> `to_string` once we rename `StrBuf` to `String`. :)

We've already renamed StrBuf to String, but I agree that .to_str() makes more 
sense as .to_string(). I was assuming that would eventually get renamed, 
although I just realized that it would then conflict with StrAllocating's 
.to_string() method, which is rather unfortunate.
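For later readers: post-1.0 Rust avoided that conflict with a single
`ToString` trait that has a blanket impl for every type implementing
`Display`, so `&str`, integers and user-defined types all share one
`.to_string()`, and `StrAllocating` no longer exists. A minimal sketch assuming
current std; `Point` and `stringify` are hypothetical names used only for
illustration.

```rust
use std::fmt;

// Hypothetical user type; implementing Display is enough to get to_string().
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

/// Hypothetical helper: one ToString bound covers str, integers and Point.
fn stringify<T: ToString + ?Sized>(v: &T) -> String {
    v.to_string()
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{} {} {}", stringify("abc"), stringify(&42), stringify(&p));
}
```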

-Kevin

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
