A good name would be size(). That would avoid any confusion over various
length definitions, and just indicate how much address space it occupies.
Nathan Myers
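For comparison, in current Rust the byte count that such a size() would report is exactly what `str::len` returns, and `std::mem::size_of_val` on a `&str` gives the same number. This is a minimal sketch; the helper name `occupied_bytes` is hypothetical:

```rust
use std::mem::size_of_val;

/// Number of bytes the string's contents occupy (hypothetical helper).
/// For a `str`, `size_of_val` reports the UTF-8 byte length, i.e. `len()`.
fn occupied_bytes(s: &str) -> usize {
    size_of_val(s)
}
```

For "hï" both `occupied_bytes` and `len()` give 3, because ï takes two bytes in UTF-8.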
On May 29, 2014 8:11:47 PM Palmer Cox palmer...@gmail.com wrote:
Thinking about it more, units() is a bad name. I think a renaming
Except that in C++ std::basic_string::size and std::basic_string::length are
synonymous (both return the number of CharTs, which in std::string is also
the number of bytes).
Thus I am unsure whether this would end up helping C++ developers. Might
help others though.
On Fri, May 30, 2014 at 2:12
This is a very long bikeshed for something which there's no evidence is even a
problem. I propose that we terminate this thread now.
If you believe that .len() needs to be renamed, please go gather evidence
that's compelling enough to warrant breaking tradition with practically every
On 2014-05-29 07:47, Kevin Ballard wrote:
The JavaScript version is quite wrong. Isaac points out that NFC vs NFD can
change the result, although that's really an issue with grapheme clusters vs
codepoints. More interestingly, JavaScript's idea of string length is wrong for
anything outside
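To make the JavaScript comparison concrete: JavaScript's `.length` counts UTF-16 code units, so any code point above U+FFFF counts as 2. A rough Rust sketch of that count, using only the standard library (the function name `utf16_length` is hypothetical):

```rust
/// UTF-16 code-unit count, i.e. the number JavaScript's `.length` reports.
/// Code points above U+FFFF are encoded as a surrogate pair and count as 2.
fn utf16_length(s: &str) -> usize {
    s.encode_utf16().count()
}
```

For an emoji such as U+1F600 this gives 2, while `chars().count()` gives 1 and `len()` gives 4.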
On May 28, 2014, at 11:37 PM, Aravinda VK hallimanearav...@gmail.com wrote:
I wonder if chars() could be made available on String itself, so that we can
avoid calling as_slice().chars()
This is a temporary issue. Once DST lands we will likely implement Deref<str>
for String, which will make all str methods
On 29/05/2014 08:25, Kevin Ballard wrote:
This is a temporary issue. Once DST lands we will likely implement
Deref<str> for String, which will make all str methods work
transparently on String.
Until then, is there a reason not to have String implement the StrSlice
trait?
--
Simon Sapin
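For what it's worth, current Rust did end up with `impl Deref for String` with `Target = str`, so `s.chars()` works on a String directly. The mechanism can be sketched with a hypothetical wrapper type `MyString` (not a real library type):

```rust
use std::ops::Deref;

/// Hypothetical owned-string wrapper, used only to illustrate auto-deref.
struct MyString {
    buf: String,
}

impl Deref for MyString {
    type Target = str;

    // Expose the contents as a borrowed `str`.
    fn deref(&self) -> &str {
        &self.buf
    }
}

/// Method lookup auto-derefs `MyString` to `str`, so `chars()` is found
/// without any `as_slice()`-style conversion.
fn code_points(s: &MyString) -> usize {
    s.chars().count()
}
```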
On 2014-05-29, at 08:37 , Aravinda VK hallimanearav...@gmail.com wrote:
I think returning the length of a string in bytes is just fine. My confusion
was caused by not knowing that char_len is available in Rust.
Python 2.7 returns the length of a string in bytes; Python 3 returns the number of
What about renaming len() to units()?
I don't see len() as a problem, but maybe as a potential source of
confusion. I also strongly believe that no one reads documentation if they
*think* they understand what the code is doing. Different people will see
len(), assume that it does whatever they
Thinking about it more, units() is a bad name. I think a renaming could
make sense, but only if something better than len() can be found.
-Palmer Cox
On Thu, May 29, 2014 at 10:55 PM, Palmer Cox palmer...@gmail.com wrote:
What about renaming len() to units()?
I don't see len() as a problem,
Hi,
How to find number of characters in a string?
Following example returns byte count instead of number of characters.
use std::string::String;
fn main() {
    let unicode_str = String::from_str("ಅ");
    let ascii_str = String::from_str("a");
    println!("unicode str: {},
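The question above can be answered in current Rust roughly as follows; this is a sketch, not the exact code from the thread. `len()` reports UTF-8 bytes, and counting code points means iterating with `chars()`. The helper name `char_count` is hypothetical:

```rust
/// Count code points ("characters" in the sense used in this thread).
/// A `&String` coerces to `&str`, so no `as_slice()` call is needed.
/// This is an O(n) scan over the UTF-8 bytes.
fn char_count(s: &str) -> usize {
    s.chars().count()
}
```

For ಅ (U+0C85) this gives 1 code point while `len()` gives 3 bytes; for a both give 1.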
On 2014-05-28, at 11:10 , Aravinda VK hallimanearav...@gmail.com wrote:
Hi,
How to find number of characters in a string?
Problem 1: define character. Do you mean a glyph? A grapheme cluster? A
code point? Composed or decomposed?
Problem 2: what use is knowing the length of a string?
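Problem 1 is easy to demonstrate: the same visible letter é can be stored precomposed (NFC) or decomposed (NFD), and the byte and code-point counts differ even though both forms are a single grapheme cluster. The sketch uses only std; counting grapheme clusters would need Unicode segmentation rules that the standard library does not provide (a crate such as unicode-segmentation is one option, not used here). The function name is hypothetical:

```rust
/// (bytes, code points) for the letter "é" in two normalization forms.
fn nfc_vs_nfd() -> [(usize, usize); 2] {
    let composed = "\u{E9}"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE (NFC)
    let decomposed = "e\u{301}"; // 'e' + U+0301 COMBINING ACUTE ACCENT (NFD)
    [
        (composed.len(), composed.chars().count()),
        (decomposed.len(), decomposed.chars().count()),
    ]
}
```

The composed form is 2 bytes and 1 code point; the decomposed form is 3 bytes and 2 code points.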
Thanks. I didn't know about char_len. `unicode_str.as_slice().char_len()`
gives the number of code points.
Sorry for the confusion, I was referring to code points as characters in my mail.
char_len gives the correct output for my requirement. I have written a
JavaScript script to convert from string
I think that the naming of `len` here is dangerously misleading. Naive
ASCII-users will be free to assume that this is counting codepoints rather
than bytes. I'd prefer the name `byte_len` in order to make the behavior
here explicit.
On Wed, May 28, 2014 at 5:55 AM, Simon Sapin
On 28/05/14 10:07 AM, Benjamin Striegel wrote:
I think that the naming of `len` here is dangerously misleading. Naive
ASCII-users will be free to assume that this is counting codepoints
rather than bytes. I'd prefer the name `byte_len` in order to make the
behavior here explicit.
It doesn't
On 28/05/2014 15:13, Daniel Micay wrote:
On 28/05/14 10:07 AM, Benjamin Striegel wrote:
I think that the naming of `len` here is dangerously misleading. Naive
ASCII-users will be free to assume that this is counting codepoints
rather than bytes. I'd prefer the name `byte_len` in order to make
It's .len() because slicing and other related functions work on byte indexes.
We've had this discussion before in the past. People expect there to be a
.len(), and the only sensible .len() is byte length (because char length is not
O(1) and not appropriate for use with most string-manipulation
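A sketch of why byte length pairs naturally with slicing, assuming current Rust: `&s[a..b]` takes byte offsets and panics if either offset falls inside a code point, while `str::get` returns `None` instead. `len()` reads a stored length, whereas `chars().count()` has to walk the string. The helper `first_char` is hypothetical:

```rust
/// Return the first code point of `s` as a string slice, or None if empty.
fn first_char(s: &str) -> Option<&str> {
    let c = s.chars().next()?;
    // `len_utf8()` is the byte length of that code point, so the
    // byte offset `c.len_utf8()` is guaranteed to be a char boundary.
    Some(&s[..c.len_utf8()])
}
```

On "hï", `s.get(1..2)` returns `None` because byte 2 lies inside the two-byte ï, and `&s[1..2]` would panic.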
People expect there to be a .len()
This is the assumption that I object to. People expect there to be a .len()
because strings have been fundamentally broken since time immemorial. Make
people type .byte_len() and be explicit about their desire to index via
code units.
On Wed, May 28, 2014 at
Breaking with established convention is a dangerous thing to do. Being too
opinionated (regarding opinions that deviate from the norm) tends to put people
off the language unless there's a clear benefit to forcing the alternative
behavior.
In this case, there's no compelling benefit to naming
Being too opinionated (regarding opinions that deviate from the norm)
tends to put people off the language unless there's a clear benefit to
forcing the alternative behavior.
We have already chosen to be opinionated by enforcing UTF-8 in our strings.
This is an extension of that break with
Benjamin seems to say that folks won't read the docs and we need to make
the syntax more helpful.
Kevin seems to say that we need to keep the syntax simple and just teach
folks to read the docs.
I think I would agree with both of them overall for a language design goal
that Rust wants to
On May 28, 2014, at 11:55 AM, Benjamin Striegel ben.strie...@gmail.com wrote:
Being too opinionated (regarding opinions that deviate from the norm) tends
to put people off the language unless there's a clear benefit to forcing
the alternative behavior.
We have already chosen to be
There's no clear tradition regarding strings.
Excellent, then surely nobody has any right to expect a method named .len()
:)
Unicode is not a simple concept. UTF-8 on the other hand is a pretty
simple concept.
I don't think we can fully divorce these two ideas. Understanding UTF-8
still
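To illustrate the claim that UTF-8 itself is simple: a code point takes 1 to 4 bytes depending on its range, the lead byte signals the length, and each continuation byte carries 6 payload bits. A hand-written sketch, checked against the standard library's `char::encode_utf8` (the function name `encode_utf8_manual` is hypothetical):

```rust
/// Encode one Unicode scalar value as UTF-8 by hand.
fn encode_utf8_manual(c: char) -> Vec<u8> {
    let cp = c as u32;
    match cp {
        // 1 byte: 0xxxxxxx
        0x0000..=0x007F => vec![cp as u8],
        // 2 bytes: 110xxxxx 10xxxxxx
        0x0080..=0x07FF => vec![0xC0 | (cp >> 6) as u8, 0x80 | (cp & 0x3F) as u8],
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        0x0800..=0xFFFF => vec![
            0xE0 | (cp >> 12) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ],
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        _ => vec![
            0xF0 | (cp >> 18) as u8,
            0x80 | ((cp >> 12) & 0x3F) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ],
    }
}
```

Because `char` cannot hold a surrogate, the 3-byte arm never has to reject U+D800..U+DFFF.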
On May 28, 2014, at 1:26 PM, Benjamin Striegel ben.strie...@gmail.com wrote:
Unicode is not a simple concept. UTF-8 on the other hand is a pretty simple
concept.
I don't think we can fully divorce these two ideas. Understanding UTF-8 still
implies understanding the difference between
Do you honestly believe
Yes. Anyone who comes to Rust expecting there to be a .len() method on
strings has demonstrated that they fundamentally misunderstand what strings
are. Correcting them will be a learning experience, to their benefit.
more verbose, annoying, unconventional names
I
On 29/05/14 06:38, Kevin Ballard wrote:
On May 28, 2014, at 1:26 PM, Benjamin Striegel ben.strie...@gmail.com wrote:
Unicode is not a simple concept. UTF-8 on the other hand is a
pretty simple concept.
I don't think we can fully divorce these two ideas.
On May 28, 2014, at 3:24 PM, Huon Wilson dbau...@gmail.com wrote:
Changing the names of methods on strings seems very similar to how Path does not
implement Show (except with even stronger motivation, because strings have at
least 3 sensible interpretations of what the length could be).
I
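For reference, current Rust still follows the Path approach: `Path` does not implement `Display`, so formatting it with `{}` requires choosing a rendering explicitly, such as the possibly lossy `path.display()` or the fallible `path.to_str()`. A small sketch; the function name `render` is hypothetical:

```rust
use std::path::Path;

/// Two explicit ways to turn a Path into text.
fn render(p: &Path) -> (String, Option<&str>) {
    (
        p.display().to_string(), // may be lossy on some platforms
        p.to_str(),              // None if the path is not valid Unicode
    )
}
```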
Oh and while we're belligerently bikeshedding, we should rename `to_str` to
`to_string` once we rename `StrBuf` to `String`. :)
On Wed, May 28, 2014 at 9:00 PM, Benjamin Striegel ben.strie...@gmail.com wrote:
but people will still end up calling the
*exact same method*
...Except when they
On May 28, 2014, at 6:00 PM, Benjamin Striegel ben.strie...@gmail.com wrote:
To reiterate, it simply doesn't make sense to ask what the length of a string
is. You may as well ask what color the string is, or where the string went to
high school, or how many times the string rode the roller
On 2014-05-29 05:36, Kevin Ballard wrote:
[--snip--]
And when dealing with a sequence in a precise encoding, the natural unit to
work with is the code unit (and this has precedent in other languages, such as
JavaScript, Obj-C, and Go).
JavaScript:
$ node
var s = "hï"; // Note the
Hi all, I wouldn't hold up JavaScript as a great example for Rust.
It uses UTF-16, but was created back when Unicode fit in 16 bits (UCS-2), so
two-code-unit code points are poorly supported in JavaScript (e.g. you
can't use them in regex character classes).
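A code point above U+FFFF is stored in UTF-16 as a surrogate pair: subtract 0x10000, then put the top 10 bits of the result into a high surrogate (0xD800 + ...) and the bottom 10 bits into a low surrogate (0xDC00 + ...). Code that treats a string as a sequence of 16-bit units sees the two halves separately, which is presumably why the regex case breaks. A Rust sketch, checked against `char::encode_utf16`; the function name `surrogate_pair` is hypothetical:

```rust
/// Split a code point above U+FFFF into a UTF-16 surrogate pair by hand.
/// Returns None for code points that fit in a single UTF-16 code unit.
fn surrogate_pair(c: char) -> Option<(u16, u16)> {
    let cp = c as u32;
    if cp < 0x1_0000 {
        return None;
    }
    let v = cp - 0x1_0000; // 20-bit value
    let high = 0xD800 + (v >> 10) as u16; // top 10 bits
    let low = 0xDC00 + (v & 0x3FF) as u16; // bottom 10 bits
    Some((high, low))
}
```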
On 05/29/2014 12:16 AM, Bardur Arantsson
On May 28, 2014, at 9:16 PM, Bardur Arantsson s...@scientician.net wrote:
Rust:
$ cat
fn main() {
    let l = "hï".len(); // Note the accent
    println!("{:u}", l);
}
$ rustc hello.rs
$ ./hello
3
No matter how defective the notion of length may be, personally I
think that