On 2014-05-29, at 08:37 , Aravinda VK <[email protected]> wrote:

> I think returning length of string in bytes is just fine. Since I didn't know 
> about the availability of char_len in rust caused this confusion.
> 
> python 2.7 - Returns length of string in bytes, Python 3 returns number of 
> codepoints. 

Nope, depends on the string type *and* on compilation options.

* Python 2's `str` and Python 3's `bytes` are byte sequences; their
 len() returns the byte count.
* Python 2's `unicode` and Python 3's `str` before 3.3 return a count
 of code units, which may be UCS2 or UCS4 depending on whether the
 interpreter was compiled with `--enable-unicode=ucs2` (the default)
 or `--enable-unicode=ucs4`. Only the latter is a true code point
 count.
* Python 3.3's `str` switched to the Flexible String Representation:
 the build-time option disappeared and len() always returns the number
 of code points.
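
A minimal sketch of the difference, assuming Python 3.3 or later (so
`str` lengths are code point counts). The sample string and variable
names are mine, purely for illustration:

```python
# Assumes Python 3.3+, where len() on str counts code points.
text = "h\u00e9llo \U0001F600"   # precomposed 'é' plus one astral-plane emoji

encoded = text.encode("utf-8")   # a bytes object, i.e. a byte sequence

code_points = len(text)          # length of the str
byte_count = len(encoded)        # length of the bytes object

print(code_points, byte_count)
```

The two numbers differ because 'é' takes two bytes and the emoji takes
four in UTF-8, while each is a single code point.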

Note that in no case do len() operations take normalisation or visual
composition into account.
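
As a rough illustration (again assuming Python 3.3+; the variable names
are only for this example), the same visible character can have
different lengths depending on how it is composed:

```python
import unicodedata

composed = "\u00e9"       # 'é' as a single precomposed code point
decomposed = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT

# Both render as the same glyph, but len() counts code points as stored.
composed_len = len(composed)
decomposed_len = len(decomposed)

# Normalising explicitly (here to NFC) is what makes the two comparable.
renormalised = unicodedata.normalize("NFC", decomposed)

print(composed_len, decomposed_len, renormalised == composed)
```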

> JS returns number of codepoints.

JS returns the number of UCS2 code units, which counts each code point
in the astral planes twice.
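
To see the same count from Python (3.3+ assumed), one can encode to
UTF-16 and divide the byte length by two. `utf16_code_units` is a
hypothetical helper name, not anything from JS or the standard library:

```python
def utf16_code_units(s):
    """Count 16-bit code units, the quantity a JS string length reports."""
    # "utf-16-le" emits no byte order mark, so every 2 bytes is one code unit.
    return len(s.encode("utf-16-le")) // 2

bmp = "\u00e9"            # in the Basic Multilingual Plane: one code unit
astral = "\U0001F600"     # in an astral plane: a surrogate pair, two code units

print(len(bmp), utf16_code_units(bmp))
print(len(astral), utf16_code_units(astral))
```

The code point count and the code unit count agree for the first string
and differ by a factor of two for the second.
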
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
