Thinking about it more, units() is a bad name. I think a renaming could make sense, but only if something better than len() can be found.
-Palmer Cox

On Thu, May 29, 2014 at 10:55 PM, Palmer Cox <[email protected]> wrote:

> What about renaming len() to units()?
>
> I don't see len() as a problem, but maybe as a potential source of
> confusion. I also strongly believe that no one reads documentation if they
> *think* they understand what the code is doing. Different people will see
> len(), assume that it does whatever they want to do at the moment, and for
> a significant portion of strings that they encounter it will seem like
> their interpretation, whatever it is, is correct. So, why not rename len()
> to something like units()? It's more explicit about the value that it's
> actually producing than len() and it's not all that much longer to type. As
> stated, exactly what a string is varies greatly between languages, so I
> don't think that lacking a function named len() is bad. Granted, I would
> expect that many people expect that a string will have a method named len()
> (or length()) and when they don't find one, they will go to the
> documentation and find units(). I think this is a good thing since the
> documentation can then explain exactly what it does.
>
> I much prefer len() to byte_len(), though. byte_len() seems like a bit
> much to type and it seems like all the other methods on strings should then
> be renamed with the byte_ prefix, which seems unpleasant.
>
> -Palmer Cox
>
>
> On Thu, May 29, 2014 at 3:39 AM, Masklinn <[email protected]> wrote:
>
>>
>> On 2014-05-29, at 08:37 , Aravinda VK <[email protected]> wrote:
>>
>> > I think returning length of string in bytes is just fine. My not
>> > knowing about the availability of char_len in Rust caused this
>> > confusion.
>> >
>> > python 2.7 - Returns length of string in bytes, Python 3 returns number
>> > of codepoints.
>>
>> Nope, depends on the string type *and* on compilation options.
>>
>> * Python 2's `str` and Python 3's `bytes` are byte sequences, their
>>   len() returns their byte counts.
>> * Python 2's `unicode` and Python 3's `str` before 3.3 return a code
>>   unit count which may be UCS2 or UCS4 (depending on whether the
>>   interpreter was compiled with `--enable-unicode=ucs2`, the default,
>>   or `--enable-unicode=ucs4`). Only the latter case is a true code
>>   point count.
>> * Python 3.3's `str` switched to the Flexible String Representation,
>>   the build-time option disappeared and len() always returns the number
>>   of codepoints.
>>
>> Note that in no case do len() operations take normalisation or visual
>> composition into account.
>>
>> > JS returns number of codepoints.
>>
>> JS returns the number of UCS2 code units, which is twice the number of
>> code points for those in astral planes.
>> _______________________________________________
>> Rust-dev mailing list
>> [email protected]
>> https://mail.mozilla.org/listinfo/rust-dev
>>
>
>
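
To make the three notions of "length" discussed in the quoted thread concrete, here is a minimal sketch in Rust. It assumes current stable Rust and uses `str::len`, `str::encode_utf16` and `str::chars` from the standard library, not the 2014-era `char_len` mentioned above. The `counts` helper and the sample strings are hypothetical, chosen only for illustration.

```rust
// Hypothetical helper for illustration: returns (UTF-8 bytes, UTF-16 code
// units, code points) for a string.
fn counts(s: &str) -> (usize, usize, usize) {
    let bytes = s.len(); // what len() reports: UTF-8 bytes
    let utf16_units = s.encode_utf16().count(); // what a UTF-16 based length would report
    let code_points = s.chars().count(); // Unicode scalar values
    (bytes, utf16_units, code_points)
}

fn main() {
    // ASCII, a precomposed accent, a decomposed accent (e + combining acute),
    // and an astral-plane character.
    let samples = ["abc", "\u{e9}", "e\u{301}", "\u{1F600}"];
    for &s in samples.iter() {
        let (bytes, utf16_units, code_points) = counts(s);
        println!(
            "{:?}: bytes={} utf16_units={} code_points={}",
            s, bytes, utf16_units, code_points
        );
    }
}
```

The two accented samples render the same way but give different results for all three counts, which is the point made above about normalisation: none of these counts corresponds to what a reader sees as one character.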
