Re: [rust-dev] How to find Unicode string length in rustlang

Matthieu Monrocq Fri, 30 May 2014 09:44:37 -0700

Except that in C++ std::basic_string::size and std::basic_string::length are
synonymous (both return the number of CharTs, which in std::string is also
the number of bytes).


Thus I am unsure whether this would end up helping C++ developers. Might
help others though.
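
A minimal sketch of the "address space" reading of size(), assuming present-day Rust rather than the 2014 API discussed in this thread. The helper names byte_len and code_point_len are hypothetical, chosen only for illustration; they are not standard library methods. It shows that len() on a string slice reports UTF-8 bytes, which matches the memory the string data occupies but not the number of code points.

```rust
use std::mem::size_of_val;

/// Number of bytes in the UTF-8 encoding of `s` (what `len()` reports).
/// Hypothetical helper name, used only for this illustration.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values (code points) in `s`.
/// Hypothetical helper name, used only for this illustration.
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // U+00E9 (precomposed e-acute) takes two bytes in UTF-8.
    let s = "h\u{e9}llo";

    println!("bytes:       {}", byte_len(s)); // 6
    println!("size_of_val: {}", size_of_val(s)); // 6, the memory the str data occupies
    println!("code points: {}", code_point_len(s)); // 5
}
```

For a string slice, len() and size_of_val agree, so a method named size() would report the same number that len() does here.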


On Fri, May 30, 2014 at 2:12 PM, Nathan Myers <[email protected]> wrote:

> A good name would be size().  That would avoid any confusion over various
> length definitions, and just indicate how much address space it occupies.
>
> Nathan Myers
>
>
> On May 29, 2014 8:11:47 PM Palmer Cox <[email protected]> wrote:
>
>> Thinking about it more, units() is a bad name. I think a renaming could
>> make sense, but only if something better than len() can be found.
>>
>> -Palmer Cox
>>
>>
>> On Thu, May 29, 2014 at 10:55 PM, Palmer Cox <[email protected]> wrote:
>>
>> > What about renaming len() to units()?
>> >
>> > I don't see len() as a problem, but maybe as a potential source of
>> > confusion. I also strongly believe that no one reads documentation if
>> > they *think* they understand what the code is doing. Different people
>> > will see len(), assume that it does whatever they want to do at the
>> > moment, and for a significant portion of the strings they encounter it
>> > will seem like their interpretation, whatever it is, is correct. So, why
>> > not rename len() to something like units()? It's more explicit about the
>> > value that it's actually producing than len(), and it's not all that much
>> > longer to type. As stated, exactly what a string is varies greatly
>> > between languages, so I don't think that lacking a function named len()
>> > is bad. Granted, I would expect that many people expect a string to have
>> > a method named len() (or length()), and when they don't find one, they
>> > will go to the documentation and find units(). I think this is a good
>> > thing since the documentation can then explain exactly what it does.
>> >
>> > I much prefer len() to byte_len(), though. byte_len() seems like a bit
>> > much to type, and it seems like all the other methods on strings should
>> > then be renamed with the byte_ prefix, which seems unpleasant.
>> >
>> > -Palmer Cox
>> >
>> >
>> > On Thu, May 29, 2014 at 3:39 AM, Masklinn <[email protected]> wrote:
>> >
>> >>
>> >> On 2014-05-29, at 08:37 , Aravinda VK <[email protected]> wrote:
>> >>
>> >> > I think returning the length of a string in bytes is just fine. Not
>> >> > knowing about the availability of char_len in Rust caused this
>> >> > confusion.
>> >> >
>> >> > Python 2.7 returns the length of a string in bytes; Python 3 returns
>> >> > the number of codepoints.
>> >>
>> >> Nope, depends on the string type *and* on compilation options.
>> >>
>> >> * Python 2's `str` and Python 3's `bytes` are byte sequences; their
>> >>  len() returns their byte counts.
>> >> * Python 2's `unicode` and Python 3's `str` before 3.3 return a code
>> >>  unit count, which may be UCS2 or UCS4 depending on whether the
>> >>  interpreter was compiled with `--enable-unicode=ucs2` (the default)
>> >>  or `--enable-unicode=ucs4`. Only the latter case is a true code point
>> >>  count.
>> >> * Python 3.3's `str` switched to the Flexible String Representation;
>> >>  the build-time option disappeared and len() always returns the number
>> >>  of codepoints.
>> >>
>> >> Note that in no case do len() operations take normalisation or visual
>> >> composition into account.
>> >>
>> >> > JS returns number of codepoints.
>> >>
>> >> JS returns the number of UCS2 code units, which is twice the number of
>> >> code points for those in astral planes.
>> >> _______________________________________________
>> >> Rust-dev mailing list
>> >> [email protected]
>> >> https://mail.mozilla.org/listinfo/rust-dev
>> >>

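The different meanings of "length" quoted above (bytes, UTF-16 code units as in JS, and code points as in Python 3.3+) can be compared side by side. This is a rough sketch assuming present-day Rust, not the 2014 API; unit_counts is a hypothetical helper introduced only for this illustration.

```rust
/// Returns (UTF-8 bytes, UTF-16 code units, code points) for `s`.
/// Hypothetical helper, not part of the standard library.
fn unit_counts(s: &str) -> (usize, usize, usize) {
    (s.len(), s.encode_utf16().count(), s.chars().count())
}

fn main() {
    // U+1D11E MUSICAL SYMBOL G CLEF lies in an astral plane:
    // one code point, two UTF-16 code units, four UTF-8 bytes.
    let clef = "\u{1D11E}";
    println!("{:?}", unit_counts(clef)); // (4, 2, 1)

    // The same visible character, e-acute, in two normalisation forms.
    let precomposed = "\u{e9}"; // single code point U+00E9
    let decomposed = "e\u{301}"; // 'e' followed by COMBINING ACUTE ACCENT
    println!("{:?}", unit_counts(precomposed)); // (2, 1, 1)
    println!("{:?}", unit_counts(decomposed)); // (3, 2, 2)
}
```

None of the three counts matches what a reader would call the number of characters in every case, since none of them takes normalisation or visual composition into account.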
