On 2014-05-29, at 08:37, Aravinda VK <[email protected]> wrote:
> I think returning length of string in bytes is just fine. Since I didn't know
> about the availability of char_len in rust caused this confusion.
>
> python 2.7 - Returns length of string in bytes, Python 3 returns number of
> codepoints.

Nope, it depends on the string type *and* on compilation options.

* Python 2's `str` and Python 3's `bytes` are byte sequences; their len()
  returns their byte count.
* Python 2's `unicode` and Python 3's `str` before 3.3 return a count of code
  units, which may be UCS2 or UCS4 depending on whether the interpreter was
  built narrow (the default; `--enable-unicode=ucs2` in Python 2) or wide
  (`--enable-unicode=ucs4` in Python 2, `--with-wide-unicode` in Python 3.0
  to 3.2). Only the wide case is a true code point count.
* Since Python 3.3, `str` uses the Flexible String Representation (PEP 393):
  the build-time option is gone and len() always returns the number of code
  points.

Note that in no case do len() operations take normalisation or visual
composition into account.

> JS returns number of codepoints.

JS returns the number of 16-bit (UTF-16/UCS2) code units, so every code point
in the astral planes counts twice.
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
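The distinctions in the post above can be checked in a Python 3.3+ interpreter. This is a minimal sketch: the sample strings and variable names are arbitrary choices for illustration, and since no JavaScript is run, the UTF-16 encoding length is only used to reproduce the count that a JS string's `length` would report.

```python
import unicodedata

# U+1F600 GRINNING FACE lies outside the Basic Multilingual Plane
# (an "astral" code point); "a" is plain ASCII.
s = "a\U0001F600"

# Python 3.3+ str: len() counts code points.
code_points = len(s)                                   # 2

# bytes: len() counts bytes (here, the UTF-8 encoding: 1 + 4).
utf8_bytes = len(s.encode("utf-8"))                    # 5

# UTF-16 code units, the unit JS string length counts: the astral
# code point becomes a surrogate pair, i.e. two units (1 + 2).
# "utf-16-le" is used so no byte-order mark is prepended.
utf16_units = len(s.encode("utf-16-le")) // 2          # 3

# Normalisation is ignored: "e" + COMBINING ACUTE ACCENT is two code
# points, while its NFC (precomposed) form is a single code point.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(code_points, utf8_bytes, utf16_units)            # 2 5 3
print(len(decomposed), len(composed))                  # 2 1
```

On a pre-3.3 narrow (UCS2) build, `len(s)` would give 3 here rather than 2, matching the JS count, because the astral character is stored as a surrogate pair.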
