On 30/12/2011 20:55, Timon Gehr wrote:
On 12/30/2011 08:33 PM, Joshua Reusch wrote:
On 29.12.2011 19:36, Andrei Alexandrescu wrote:
On 12/29/11 12:28 PM, Don wrote:
On 28.12.2011 20:00, Andrei Alexandrescu wrote:
Oh, one more thing - one good thing that could come out of this thread
is abolition (through however slow a deprecation path) of s.length and
s[i] for narrow strings. Requiring s.rep.length instead of s.length and
s.rep[i] instead of s[i] would improve the quality of narrow strings
tremendously. Also, s.rep[i] should return ubyte/ushort, not char/wchar.
Then, people would access the decoding routines on the needed occasions,
or would consciously use the representation.

Yum.


If I understand this correctly, most others don't. Effectively, .rep
just means, "I know what I'm doing", and there's no change to existing
semantics, purely a syntax change.

Exactly!

If you change s[i] into s.rep[i], it does the same thing as now. There's
no loss of functionality -- it just stops you from accidentally doing
the wrong thing. Like .ptr for getting the address of an array.
Typically all the ".rep" everywhere would get annoying, so you would
write:
ubyte [] u = s.rep;
and use u from then on.
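
A minimal sketch of that pattern, assuming today's Phobos rather than the proposed property: std.string.representation stands in for the hypothetical .rep/.raw, and for an immutable string it yields immutable(ubyte)[] rather than ubyte[]. The sample string and the helper name countByte are made up for illustration.

```d
import std.string : representation;
import std.utf : count;

// Hypothetical helper: count occurrences of one byte in the raw UTF-8 data.
size_t countByte(string s, ubyte b)
{
    // Take the representation once and index it freely from then on.
    immutable(ubyte)[] u = s.representation;
    size_t n = 0;
    foreach (i; 0 .. u.length)
        if (u[i] == b)
            ++n;
    return n;
}

void main()
{
    string s = "héllo";                   // 'é' takes two UTF-8 code units
    assert(s.representation.length == 6); // code units, same as s.length
    assert(s.count == 5);                 // code points, via std.utf.count
    assert(countByte(s, cast(ubyte) 'l') == 2);
}
```

Indexing u is explicit about working on code units, which is the "I know what I'm doing" signal described above.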

I don't like the name 'rep'. Maybe 'raw' or 'utf'?
Apart from that, I think this would be perfect.

Yes, I mean "rep" as short for "representation" but upon first sight
the connection is tenuous. "raw" sounds great.

Now I'm twice sorry this will not happen...


Maybe it could happen if we
1. make dstring the default string type --

Inefficient.

code units and characters would be the same

Wrong.

or 2. forward string.length to std.utf.count and opIndex to
std.utf.toUTFindex

Inconsistent and inefficient (it blows up the algorithmic complexity).
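
To make the complexity point concrete, here is a sketch under the assumption that s[i] were forwarded to std.utf.toUTFindex as proposed. The function names are hypothetical; only count, toUTFindex and decode come from std.utf. Because toUTFindex has to scan from the start of the string on every call, an ordinary index loop becomes quadratic, while decoding in a single pass stays linear.

```d
import std.utf : count, decode, toUTFindex;

// Hypothetical: what s[n] would do if it meant "n-th code point".
dchar nthCodePoint(string s, size_t n)
{
    size_t i = s.toUTFindex(n); // O(n): walks the string from the start
    return decode(s, i);        // decodes the code point starting at i
}

// Index loop over code points: O(n) per lookup, O(n^2) overall.
size_t countIndexed(string s, dchar c)
{
    size_t hits = 0;
    immutable len = s.count;    // itself a full pass over the string
    foreach (n; 0 .. len)
        if (nthCodePoint(s, n) == c)
            ++hits;
    return hits;
}

// Single decoding pass: O(n) overall, same result.
size_t countDecoded(string s, dchar c)
{
    size_t hits = 0;
    foreach (dchar d; s)        // foreach with dchar decodes as it goes
        if (d == c)
            ++hits;
    return hits;
}

void main()
{
    string s = "né à l'été";
    assert(countIndexed(s, 'é') == 3);
    assert(countDecoded(s, 'é') == 3);
}
```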


so programmers could use the slices/indexing/length (no laziness
problems), and if they really want code units use .raw/.rep (or better
.utf8/16/32 with std.string.representation(std.utf.toUTF8/16/32))


Anyone who intends to write efficient string processing code needs this.
Anyone who does not want to write string processing code will not need
to index into a string -- standard library functions will suffice.

But generally I liked the idea of just having an alias for strings...

Me too. I think the way we have it now is optimal. The only reason we
are discussing this is because of fear that uneducated users will write
code that does not take into account Unicode characters above code point
0x80. But what is the worst thing that can happen?


Atos Origin was hacked because of poor Unicode handling in strings in some of their software.

The consequences can be more serious than you may think.

Additionally, you are making an assumption that is really wrong: that an educated programmer will not make mistakes. C programmers will tell you exactly the same thing if the discussion comes to pointers. But the fact is, we all make mistakes. Many of them! We should resort to unsafe behaviour, which relies on the programmer's abilities alone, only when needed.

I do understand pointers. I do make mistakes with them, and it does have crazy consequences sometimes. And I do not trust anyone who tells me he/she doesn't.

The #1 quality of a programmer is to act as if he/she is a moron. Because sometimes we all are morons.
