Re: Go and generic programming on reddit, also touches on D

2011-09-22 Thread Robert Jacques

On Wed, 21 Sep 2011 16:02:53 -0400, Andrei Alexandrescu 
seewebsiteforem...@erdani.org wrote:

On 9/21/11 1:49 PM, Don wrote:

On 19.09.2011 18:12, Andrei Alexandrescu wrote:

On 9/19/11 10:46 AM, Robert Jacques wrote:

So, on balance, I'd say the two pointers representation is categorically
worse than the fat pointer representation.


Benchmark. A few of your assumptions don't hold.

Andrei


Note that high-performance libraries that use slices, like GMP and the
many BLAS libraries, use the pointer+length representation, not
pointer+pointer. They've done a lot of benchmarking on a huge range of
architectures, with a large range of compilers.

The underlying reason for this is that almost all CISC instruction sets
have built-in support for pointer+length. AFAIK nothing has built-in
support for ptr+ptr.

On x86, you have this wonderful [EAX+8*EBX] addressing mode, that can be
used on almost every instruction, so that the calculation [addr +
sz*index] takes ZERO clock cycles when sz is a power of 2.
Generally, when you supply two pointers, the optimizer will try to
convert it into ptr + offset (where offset isn't bytes, it corresponds
to D's length).


To all who replied and tested - color me convinced we should keep the
current state of affairs. Thanks!

Andrei



No problem. Also, TDPL uses ptr+ptr for its representation. Having gone back 
and looked at the chapter on arrays, I think it makes for great figures 
and aids the comprehension of ideas. On the other hand, a lot of programming 
books, e.g. Numerical Recipes in C, have done a lot of harm over the years 
through people copying their sub-optimal code samples/snippets. So you may want 
to add a sentence regarding D's actual implementation to the clause under 
figure 4.1 on page 98.


Re: Go and generic programming on reddit, also touches on D

2011-09-21 Thread Don

On 19.09.2011 18:12, Andrei Alexandrescu wrote:

On 9/19/11 10:46 AM, Robert Jacques wrote:

So, on balance, I'd say the two pointers representation is categorically
worse than the fat pointer representation.


Benchmark. A few of your assumptions don't hold.

Andrei


Note that high-performance libraries that use slices, like GMP and the 
many BLAS libraries, use the pointer+length representation, not 
pointer+pointer. They've done a lot of benchmarking on a huge range of 
architectures, with a large range of compilers.


The underlying reason for this is that almost all CISC instruction sets 
have built-in support for pointer+length. AFAIK nothing has built-in 
support for ptr+ptr.


On x86, you have this wonderful [EAX+8*EBX] addressing mode, that can be 
used on almost every instruction, so that the calculation [addr + 
sz*index] takes ZERO clock cycles when sz is a power of 2.
Generally, when you supply two pointers, the optimizer will try to 
convert it into ptr + offset (where offset isn't bytes, it corresponds 
to D's length).


Re: Go and generic programming on reddit, also touches on D

2011-09-21 Thread Andrei Alexandrescu

On 9/21/11 1:49 PM, Don wrote:

On 19.09.2011 18:12, Andrei Alexandrescu wrote:

On 9/19/11 10:46 AM, Robert Jacques wrote:

So, on balance, I'd say the two pointers representation is categorically
worse than the fat pointer representation.


Benchmark. A few of your assumptions don't hold.

Andrei


Note that high-performance libraries that use slices, like GMP and the
many BLAS libraries, use the pointer+length representation, not
pointer+pointer. They've done a lot of benchmarking on a huge range of
architectures, with a large range of compilers.

The underlying reason for this is that almost all CISC instruction sets
have built-in support for pointer+length. AFAIK nothing has built-in
support for ptr+ptr.

On x86, you have this wonderful [EAX+8*EBX] addressing mode, that can be
used on almost every instruction, so that the calculation [addr +
sz*index] takes ZERO clock cycles when sz is a power of 2.
Generally, when you supply two pointers, the optimizer will try to
convert it into ptr + offset (where offset isn't bytes, it corresponds
to D's length).


To all who replied and tested - color me convinced we should keep the 
current state of affairs. Thanks!


Andrei


Re: Go and generic programming on reddit, also touches on D

2011-09-20 Thread Robert Jacques

On Mon, 19 Sep 2011 12:00:17 -0400, Steven Schveighoffer schvei...@yahoo.com 
wrote:


On Mon, 19 Sep 2011 11:46:44 -0400, Robert Jacques sandf...@jhu.edu
wrote:


On Mon, 19 Sep 2011 10:08:32 -0400, Andrei Alexandrescu
seewebsiteforem...@erdani.org wrote:

On 9/19/11 6:25 AM, Steven Schveighoffer wrote:

On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr timon.g...@gmx.ch
wrote:


On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:

That would allow us to e.g. switch from the
pointer+length representation to the arguably better pointer+pointer
representation with ease.


In what way is that representation better?


I agree, I don't see why the representation is inherently better. Some
operations become higher performance (e.g. popFront), and some become
worse (e.g. length). Most of the others are a wash.


That's where frequency of use comes into play. I'm thinking popFront
would be used most often, and it touches two words.

Andrei



The elephant in the room, of course, is that length now requires a
division and that popFront is actually implemented using slicing:

a = a[i .. $];

which translates into:

auto begin = i;
auto end   = length;
if(end - begin >= 0 && length - end >= 0) {
    ptr    = ptr + T.sizeof * begin;
    length = end - begin;
}

vs

auto length = (ptrBack - ptrFront) / T.sizeof;
auto begin  = ptrFront + T.sizeof * i;
auto end    = ptrFront + T.sizeof * length;
if(end - begin >= 0 && ptrBack - end >= 0) {
    ptrFront = begin;
    ptrBack  = end;
}


I would hope something like this would be optimized by the compiler:

auto begin = ptrFront + T.sizeof * i;
if(ptrBack - begin >= 0)
    ptrFront = begin;

If not, popFront could optimize it.

Certainly, to say popFront is going to perform *worse* using a
dual-pointer representation is false.  Only one calculation is needed.

-Steve



Unfortunately, the compiler isn't going to auto-magically optimize the length 
computation away. (Remember that end is not necessarily equal to ptrBack for 
an arbitrary pair of ptrFront and ptrBack; it is the high-level invariants 
associated with arrays that make it true.) The compiler could optimize the 
slice operator, but it has to do so at a very high level, i.e. by recognizing 
and special-casing statements like a[1..$]. (And such optimizations are 
notoriously brittle.) And yes, popFront could optimize the slicing operator. 
But having to do so is a pretty good indication that something's wrong with 
the array design.

Also, I didn't say that popFront, by itself, is going to perform *worse*. I 
said that popFront + empty requires one extra subtraction and one fewer memory 
write. On today's out-of-order desktop processors that probably amounts to a 
wash. (And the benchmarks support this; see my other post.) And given that the 
change makes computations like empty, length and slicing much worse, I stated 
that dual pointers would, on the whole, be worse than ptr+length.


Re: Go and generic programming on reddit, also touches on D

2011-09-20 Thread Robert Jacques

On Mon, 19 Sep 2011 12:12:04 -0400, Andrei Alexandrescu 
seewebsiteforem...@erdani.org wrote:


On 9/19/11 10:46 AM, Robert Jacques wrote:

So, on balance, I'd say the two pointers representation is categorically
worse than the fat pointer representation.


Benchmark. A few of your assumptions don't hold.

Andrei



So I ran Timon Gehr's benchmark on my system 10 times, and although the Old 
method performed the best, the times showed a scheduling jitter of about 0.2 s 
which is an order of magnitude larger than the differences between the runs. 
I've had some experience with removing jitter from benchmarks, so I redid it. 
For those who don't want to read through all the results and code snippets 
below, here's the executive summary: there's no measurable difference between 
the optimum popFront + empty implementations as laid out by Timon Gehr and 
myself. However, when you consider the current slice based implementation of 
popFront (i.e. a[1..$]), things change drastically. For an int[], ptr+ptr was 
3x slower and for an int[3][] it was 8x slower.


Timon Gehr's benchmark:

Old New Diff
4.64242 4.58722 0.0551916
4.60659 4.5482  0.058386
4.53478 4.51753 0.0172519
4.54561 4.51867 0.026935
4.46662 4.4733  -0.00668139
4.47204 4.46893 0.00310762
4.52798 4.54021 -0.0122326
4.5685  4.57667 -0.00816801
4.50138 4.49186 0.00951325
4.49764 4.49422 0.00342912

--

import std.datetime, std.stdio, std.algorithm;

int[1_000_000] a;

void f1(){
    for(auto ptr=a.ptr, len=a.length+1; len; ++ptr, --len) {}
}
void f2(){
    for(auto b=a.ptr, e=a.ptr+a.length; b<e; b++){}
}

void main(){
    writeln("Always remember to run benchmarks on a single core. Pause");
    readln();
    double min_f1 = int.max;
    double min_f2 = int.max;
    foreach(i; 0..10_000) {
        auto b = benchmark!(f1,f2)(3);
        min_f1 = min(min_f1, b[0].to!("seconds",double));
        min_f2 = min(min_f2, b[1].to!("seconds",double));
    }
    writeln("Old\tNew\tDiff");
    writeln(min_f1,"\t",min_f2,"\t",(min_f2-min_f1)/min_f1*100.0,"%");
}

which gives

Old New Diff
0.00127244  0.00127244  0%

on my system, which is pretty much as expected. Now onto the more critical test.

import std.datetime, std.stdio, std.algorithm, std.range, std.array;

alias int[3] T;
T[1_000_000] a;

void f1(){
    auto a2 = a[];
    while(!a2.empty) { a2.popFront(); }
}
void f2(){
    auto ptrFront = a.ptr;
    auto ptrBack  = a.ptr + a.length;
    auto i = 1;
    while(!(ptrFront >= ptrBack)) {
        auto length = (ptrBack - ptrFront) / T.sizeof;
        auto begin  = ptrFront + T.sizeof * i;
        auto end    = ptrFront + T.sizeof * length;
        if(end - begin >= 0 && ptrBack - end >= 0) {
            ptrFront = begin;
            ptrBack  = end;
        }
    }
}

void main(){
    writeln("Always remember to run benchmarks on a single core. Pause");
    readln();
    double min_f1 = int.max;
    double min_f2 = int.max;
    foreach(i; 0..10_000) {
        auto b = benchmark!(f1,f2)(3);
        min_f1 = min(min_f1, b[0].to!("seconds",double));
        min_f2 = min(min_f2, b[1].to!("seconds",double));
    }
    writeln("Old\tNew\tDiff");
    writeln(min_f1,"\t",min_f2,"\t",(min_f2-min_f1)/min_f1*100.0,"%");
}

Old New Diff
0.00869062  0.0118318   36.145%

Which makes sense given the division. Switching T to an int gives

Old New Diff
0.00868968  0.00502586  -42.1629%

Ah.. Perhaps I should also inline f1:

void f1(){
    //auto a2 = a[];
    //while(!a2.empty) { a2.popFront(); }

    auto i = 1;
    auto length = a.length;
    auto ptr    = a.ptr;
    while(!(length == 0)) {
        auto end   = length;
        auto begin = i;
        if(end - begin >= 0 && length - end >= 0) {
            ptr    = ptr + T.sizeof * begin;
            length = end - begin;
        }
    }
}

which results in:

Old New Diff
0.00127244  0.00502912  295.233%

And switching T back to int[3] gives:

Old New Diff
0.00127244  0.01192 836.78%





Re: Go and generic programming on reddit, also touches on D

2011-09-20 Thread Robert Jacques

On Tue, 20 Sep 2011 02:18:12 -0400, Robert Jacques sandf...@jhu.edu wrote:


On Mon, 19 Sep 2011 12:12:04 -0400, Andrei Alexandrescu 
seewebsiteforem...@erdani.org wrote:


On 9/19/11 10:46 AM, Robert Jacques wrote:

So, on balance, I'd say the two pointers representation is categorically
worse than the fat pointer representation.


Benchmark. A few of your assumptions don't hold.

Andrei



So I ran Timon Gehr's benchmark on my system 10 times...


P.S. All results are compiled using -O -release -inline and run on a single 
core of a Core 2 Duo T7500 2.2 GHz


Re: Go and generic programming on reddit, also touches on D

2011-09-20 Thread Marco Leise
On 19.09.2011 19:08, Steven Schveighoffer  
schvei...@yahoo.com wrote:



On Mon, 19 Sep 2011 12:00:46 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/19/2011 05:52 PM, Steven Schveighoffer wrote:




$ should denote the end point of the aggregate, but it does not have to
be equivalent to length, or even an integer/uint. It should just mean
end.


Point taken. What is the solution for infinite ranges? Should any  
arithmetic on $ just be disallowed?


I think $ should not pose any type restrictions, and whatever type you  
return can have whatever semantics you want.  The only semantic  
restriction is it should mean the end of the container.


In other words, generic code that expects to be able to do:

x[0..$/2]

should only expect this to work when $ is a numeric value, or defines  
division by numbers.


Technically, half of infinity is still infinity, so the type of $ could  
be defined so that any arithmetic operations on it result in itself ;)


-Steve


The documentation of such generic code would make it obvious that it  
cannot work with infinite ranges.

sort() on a list of all primes, for example, is obviously not going to work.
Btw, I don't mind [^..$]. I think it is clear for everyone who ever wrote  
a regex and easy for anyone else.
For naturally 0-based ranges, you'd still use 0. ^ would just be  
obfuscating there.


Re: Go and generic programming on reddit, also touches on D

2011-09-20 Thread Timon Gehr

On 09/20/2011 08:18 AM, Robert Jacques wrote:

[snip]

Thank you for making this more meaningful! I assumed the standard 
library benchmark function would take care of those things. Should it?







Re: Go and generic programming on reddit, also touches on D

2011-09-20 Thread Robert Jacques

On Tue, 20 Sep 2011 14:01:05 -0400, Timon Gehr timon.g...@gmx.ch wrote:
[snip]


Thank you for making this more meaningful! I assumed the standard
library benchmark function would take care of those things. Should it?


Yes and no. benchmark() provides a good way to make a single measurement of a 
function, and for really short functions you do have to loop many times to be 
able to get a reliable reading. However, actual benchmarking requires a) tuning 
the benchmark() call time to about 10-20 ms and b) running benchmark() many 
times and taking the minimum. The idea is that on any given run you could hit a 
context switch, etc., so if you make multiple runs, one will get lucky and not 
be interrupted. Worse, if a core switch happens between StopWatch start and end, 
the number of ticks returned is random. Hence the comment to manually limit 
execution to a single core. So, it might be nice if benchmark() also took a 
second parameter denoting the number of times to benchmark the function, and 
had some better documentation on how to do good benchmarking.


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:

Quite interesting.

http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/




2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of 
containers. The huge problem with slices is dogfood-related - they are  
"magic" because the language features proposed to programmers were not 
enough for expressing a simple abstraction. Reserving special features 
for the language is a terrible way to go about programming language design.


Don't D arrays do a similar thing? They are not templates, yet work with 
generic element types.


Afaics, improving the language to the point where dynamic array-like 
structures could be implemented in the library without resulting in a 
bloated executable would be quite involved.


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread bearophile
Andrei Alexandrescu:

In that thread you have said:
http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/c2krft0

 I'm interested in furthering D's grok of dependent types.

This blog post shows some basic nice examples of dependent types in ATS 
language:
http://www.bluishcoder.co.nz/2010/09/01/dependent-types-in-ats.html

Some other notes:
http://leonardo-m.livejournal.com/98077.html

But those are toys. For practical programming this alternative version is close 
to what you can do in D, and keeps both efficiency and sanity:
http://rosettacode.org/wiki/Matrix_multiplication#Alternative_version

Bye,
bearophile


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/18/2011 07:16 PM, Peter Alexander wrote:

On 18/09/11 5:08 PM, Timon Gehr wrote:

On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:

Quite interesting.

http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/






2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are 
"magic" because the language features proposed to programmers were not
enough for expressing a simple abstraction. Reserving special features
for the language is a terrible way to go about programming language
design.

Don't D arrays do a similar thing? They are not templates, yet work with
generic element types.


Yes. As I understand, Andrei prefers things in libraries and Walter
prefers things built in to the compiler (obviously an
oversimplification, but I believe that's the general way they 'lean').

There's advantages to both. Being implementable in a library means that
they can easily be swapped out or modified to work with other code, but
being built-in (magic, as Andrei puts it) means that the compiler has
greater awareness of them and can do better optimizations, give better
errors etc.

Of course, there are ways of extending the language to provide better
errors and allow better optimizations (e.g. 'pure' in D), but as a
purely practical matter, it's easier if they are just built-in.



Well, I think arrays should be built-in. What I was implying was that D, 
not entirely unlike Go, also has some magic. You can get almost the same 
with templates, so it is way better in that aspect than Go, but it is 
still there.





Afaics, improving the language to the point where dynamic array-like
structures could be implemented in the library without resulting in a
bloated executable would be quite involved.


I don't think you'd get much bloat in D by implementing dynamic arrays
with templates.  Remember, the built-in arrays *are* mostly implemented
in D: https://github.com/D-Programming-Language/druntime/tree/master/src/rt


Those work like generics, not like templates. Making them templates 
would duplicate all the runtime functions that process arrays for every 
element type they are instantiated with. And I am sure that would add some 
bloat. D has no generics, just templates. But built-in arrays work like 
generics.





Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Peter Alexander

On 18/09/11 5:08 PM, Timon Gehr wrote:

On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:

Quite interesting.

http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/





2 hours ago, Andrei Alexandrescu wrote:
  The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are 
"magic" because the language features proposed to programmers were not
enough for expressing a simple abstraction. Reserving special features
for the language is a terrible way to go about programming language design.

Don't D arrays do a similar thing? They are not templates, yet work with
generic element types.


Yes. As I understand, Andrei prefers things in libraries and Walter 
prefers things built in to the compiler (obviously an 
oversimplification, but I believe that's the general way they 'lean').


There's advantages to both. Being implementable in a library means that 
they can easily be swapped out or modified to work with other code, but 
being built-in (magic, as Andrei puts it) means that the compiler has 
greater awareness of them and can do better optimizations, give better 
errors etc.


Of course, there are ways of extending the language to provide better 
errors and allow better optimizations (e.g. 'pure' in D), but as a 
purely practical matter, it's easier if they are just built-in.




Afaics, improving the language to the point where dynamic array-like
structures could be implemented in the library without resulting in a
bloated executable would be quite involved.


I don't think you'd get much bloat in D by implementing dynamic arrays 
with templates. Remember, the built-in arrays *are* mostly implemented 
in D: https://github.com/D-Programming-Language/druntime/tree/master/src/rt


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Peter Alexander

On 18/09/11 6:55 PM, Timon Gehr wrote:

On 09/18/2011 07:16 PM, Peter Alexander wrote:

On 18/09/11 5:08 PM, Timon Gehr wrote:

On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:

Quite interesting.

http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/







2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are 
"magic" because the language features proposed to programmers were not
enough for expressing a simple abstraction. Reserving special features
for the language is a terrible way to go about programming language
design.

Don't D arrays do a similar thing? They are not templates, yet work with
generic element types.


Yes. As I understand, Andrei prefers things in libraries and Walter
prefers things built in to the compiler (obviously an
oversimplification, but I believe that's the general way they 'lean').

There's advantages to both. Being implementable in a library means that
they can easily be swapped out or modified to work with other code, but
being built-in (magic, as Andrei puts it) means that the compiler has
greater awareness of them and can do better optimizations, give better
errors etc.

Of course, there are ways of extending the language to provide better
errors and allow better optimizations (e.g. 'pure' in D), but as a
purely practical matter, it's easier if they are just built-in.



Well, I think arrays should be built-in. What I was implying was that D,
not entirely unlike Go, also has some magic. You can get almost the same
with templates, so it is way better in that aspect than Go, but it is
still there.


Yes, you are right. D does have some magic. Every language has to draw 
a line between library code and built-in code. The only way a language 
can be purely library is if the language is just the source code for 
the compiler :-)




Afaics, improving the language to the point where dynamic array-like
structures could be implemented in the library without resulting in a
bloated executable would be quite involved.


I don't think you'd get much bloat in D by implementing dynamic arrays
with templates. Remember, the built-in arrays *are* mostly implemented
in D:
https://github.com/D-Programming-Language/druntime/tree/master/src/rt


Those work like generics, not like templates. Making them templates
would duplicate all the runtime functions that process arrays for every
element type they are instantiated with. And I am sure that would add some
bloat. D has no generics, just templates. But built-in arrays work like
generics.


Yeah, they use runtime polymorphism like generics instead of compile 
time polymorphism.


I don't believe there's any way round this -- you can't solve it with a 
better language design. If you want to handle different types with a 
single interface then either you need to generate code for each type 
(templates) or use runtime polymorphism (generics). Of course, you can 
improve both to minimize the overhead, but there's a fundamental memory 
vs. performance trade-off.


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Andrei Alexandrescu

On 9/18/11 11:08 AM, Timon Gehr wrote:

On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:

Quite interesting.

http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/





2 hours ago, Andrei Alexandrescu wrote:
  The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are 
"magic" because the language features proposed to programmers were not
enough for expressing a simple abstraction. Reserving special features
for the language is a terrible way to go about programming language design.

Don't D arrays do a similar thing? They are not templates, yet work with
generic element types.

Afaics, improving the language to the point where dynamic array-like
structures could be implemented in the library without resulting in a
bloated executable would be quite involved.


That's an incorrect view of my statement. Yes, D's slices are built-in 
but the language offers facilities for defining any other parameterized 
types that are just as powerful. The only advantages slices have left 
are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax, 
i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray 
language bugs such as '$'.


Walter and I have long discussed that T[] should use an actual type 
object.Slice!T as back-end. That would allow us to e.g. switch from the 
pointer+length representation to the arguably better pointer+pointer 
representation with ease. The main difficulty with object.Slice is CTFE 
- the compiler can manipulate a T[] during compilation because it 
understands its invariant. The same invariant would be difficult to 
infer from a user defined type.



Andrei


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:

On 9/18/11 11:08 AM, Timon Gehr wrote:

On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:

Quite interesting.

http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/






2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are 
"magic" because the language features proposed to programmers were not
enough for expressing a simple abstraction. Reserving special features
for the language is a terrible way to go about programming language
design.

Don't D arrays do a similar thing? They are not templates, yet work with
generic element types.

Afaics, improving the language to the point where dynamic array-like
structures could be implemented in the library without resulting in a
bloated executable would be quite involved.


That's an incorrect view of my statement.


I don't think so. The special feature we are talking about is 
generics. I am certainly not saying that your statement implies in any 
way that arrays should be a library feature.



Yes, D's slices are built-in
but the language offers facilities for defining any other parameterized
types that are just as powerful.


Even way more powerful. But with great power comes great responsibility, 
e.g. some binary file bloat, along with the undecidability of the 
well-formedness of a particular template.


I am perfectly fine with arrays being built in. I am also fine with some 
magic because, as you say, templates are good enough to replace the 
magic. But still I _do_ think that there is a tiny bit of magic.



The only advantages slices have left
are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
language bugs such as '$'.


I am thankful for $, as it is a great feature, and it really should be 
made accessible to user-defined types, either through opDollar or the 
rewrite a[foo($)] => a[foo(a.length)]. What makes it qualify as a stray 
language bug to you?




Walter and I have long discussed that T[] should use an actual type
object.Slice!T as back-end.


That would make the magic go away indeed, but then you'll get the bloat.


That would allow us to e.g. switch from the
pointer+length representation to the arguably better pointer+pointer
representation with ease.


In what way is that representation better?

Pointer+Length:
b=a[i] => b:=*(a.ptr+i); (if lhs typechecks)
b=a[i..j] => b.ptr:=a.ptr+i, b.length:=j-i;
b=a.length => b:=a.length
bounds check a[i]: cmp i,$; jae label;


Pointer+Pointer:
b=a[i] => b:=*(a.begin+i) (if lhs typechecks)
b=a[i..j] => b.begin:=a.begin+i; b.end:=a.begin+j
b=a.length => b:=a.end-a.begin
bounds check a[i]: cmp i,begin; jb label; cmp i,end; jae label;
(not the only way to do it; you could also increase register pressure by 
first computing the length and then doing the other thing, but if the 
array ptr and length are in fact in registers, that loses)



Therefore, in my eyes Pointer+Pointer loses badly, because getting the 
length/bounds checking requires additional machine instructions.
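The trade-off above can be made concrete with two toy slice types. This is an editorial sketch with made-up names (PtrLen, PtrPtr), not druntime's actual layout: length is free in one representation and popFront is cheaper in the other.

```d
// Toy versions of the two candidate slice representations (hypothetical).
struct PtrLen(T) {
    T* ptr;
    size_t length;                       // stored directly: length is free
    T opIndex(size_t i) { assert(i < length); return ptr[i]; }
    void popFront() { ++ptr; --length; } // touches two words
}

struct PtrPtr(T) {
    T* begin, end;
    @property size_t length() { return end - begin; } // needs a subtraction
    T opIndex(size_t i) { assert(i < length); return begin[i]; }
    void popFront() { ++begin; }         // touches one word
}

unittest {
    int[3] a = [1, 2, 3];
    auto s1 = PtrLen!int(a.ptr, a.length);
    auto s2 = PtrPtr!int(a.ptr, a.ptr + a.length);
    assert(s1[1] == 2 && s2[1] == 2);
    s1.popFront(); s2.popFront();
    assert(s1.length == 2 && s2.length == 2);
}
```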



The main difficulty with object.Slice is CTFE
- the compiler can manipulate a T[] during compilation because it
understands its invariant.  The same invariant would be difficult to
infer from a user defined type.



Basically, all that would have to be done is to make (at least parts 
of) core.memory.GC CTFE-able. Or to treat the template as special all 
along, because basic efficiency concerns dictate that some amount of 
built-in-ness is desirable for such a basic data structure. Imho they 
are not built into DMD deeply enough right now.



As a side note, as soon as you use std.array.appender, the CTFE-ability 
goes away anyway. And most of the array-manipulating code in Phobos 
does just that. Are there any plans to make std.array.appender 
CTFE-able? I guess now that we have if(__ctfe){} that would be trivial, 
and the benefit would be extremely large, because many Phobos functions 
would become CTFE-able at once.
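The if(__ctfe) idea can be sketched with a hypothetical helper (this is editorial illustration, not actual Phobos code): the compile-time path uses plain concatenation, the run-time path uses the appender.

```d
import std.array : appender;

// Hypothetical helper: concatenates three strings, usable in CTFE.
string join3(string a, string b, string c) {
    if (__ctfe) {
        return a ~ b ~ c;              // simple, CTFE-friendly path
    } else {
        auto app = appender!string();  // fast run-time path
        app.put(a);
        app.put(b);
        app.put(c);
        return app.data;
    }
}

static assert(join3("a", "b", "c") == "abc"); // forces CTFE evaluation

unittest {
    assert(join3("a", "b", "c") == "abc");    // exercises the run-time path
}
```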




Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/18/2011 08:17 PM, Peter Alexander wrote:

On 18/09/11 6:55 PM, Timon Gehr wrote:

On 09/18/2011 07:16 PM, Peter Alexander wrote:

On 18/09/11 5:08 PM, Timon Gehr wrote:

On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:

Quite interesting.

http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/








2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are
"magic" because the language features proposed to programmers were not
enough for expressing a simple abstraction. Reserving special features
for the language is a terrible way to go about programming language
design.

Don't D arrays do a similar thing? They are not templates, yet work
with
generic element types.


Yes. As I understand, Andrei prefers things in libraries and Walter
prefers things built in to the compiler (obviously an
oversimplification, but I believe that's the general way they 'lean').

There's advantages to both. Being implementable in a library means that
they can easily be swapped out or modified to work with other code, but
being built-in (magic, as Andrei puts it) means that the compiler has
greater awareness of them and can do better optimizations, give better
errors etc.

Of course, there are ways of extending the language to provide better
errors and allow better optimizations (e.g. 'pure' in D), but as a
purely practical matter, it's easier if they are just built-in.



Well, I think arrays should be built-in. What I was implying was that D,
not entirely unlike Go, also has some magic. You can get almost the same
with templates, so it is way better in that aspect than Go, but it is
still there.


Yes, you are right. D does have some magic. Every language has to draw
a line between library code and built-in code. The only way a language
can be purely library is if the language is just the source code for
the compiler :-)



It is not at all about syntax sugar, or whether or not the functionality 
comes in a library or as a built-in. I am merely feature-oriented here, 
and D does not have generics, but arrays work like generics. I am not 
even saying that is a bad thing. :)





Afaics, improving the language to the point where dynamic array-like
structures could be implemented in the library without resulting in a
bloated executable would be quite involved.


I don't think you'd get much bloat in D by implementing dynamic arrays
with templates. Remember, the built-in arrays *are* mostly implemented
in D:
https://github.com/D-Programming-Language/druntime/tree/master/src/rt


Those work like generics, not like templates. Making them templates
would duplicate all the runtime functions that process arrays for every
element type it is instantiated with. And I am sure that would add some
bloat. D has no generics, just templates. But built-in arrays work like
generics.


Yeah, they use runtime polymorphism like generics instead of compile
time polymorphism.

I don't believe there's any way round this -- you can't solve it with a
better language design. If you want to handle different types with a
single interface then either you need to generate code for each type
(templates) or use runtime polymorphism (generics). Of course, you can
improve both to minimize the overhead, but there's a fundamental memory
v.s. performance compromise.


Consider eg this:

T foo(T : Object)(T x) {
    return x;
}

Generics clearly win on that one, because it does not require code 
duplication or runtime polymorphism (as is often true in pure OO 
programs, for instance!). The same is the case for arrays. They don't 
need runtime polymorphism (except for stray language bugs such as 
array.sort).
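To make the point concrete, here is an editorial sketch (class names A and B are made up) showing that each class type instantiates the template anew, even though a single type-erased generics instantiation would suffice:

```d
// A template constrained to class types: the body is identical for every
// instantiation, yet each one gets its own copy of the code.
T foo(T : Object)(T x) { return x; }

class A {}
class B {}

unittest {
    auto a = foo(new A); // instantiates foo!A
    auto b = foo(new B); // instantiates foo!B: a second copy of identical
                         // code, which type-erased generics could share
    assert(a !is null && b !is null);
}
```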








Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Nick Sabalausky
Timon Gehr timon.g...@gmx.ch wrote in message 
news:j55h4f$1ia5$1...@digitalmars.com...

 The only advantages slices have left
 are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
 i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
 language bugs such as '$'.

 I am thankful for $, as it is a great feature, and it really should be 
 made accessible to user defined types. Either through opDollar or the 
 rewrite a[foo($)] = a[foo(a.length)]. What makes it qualify as a stray 
 language bug to you?


He's saying that one of the few advantages slices have left over 
user-defined types is that, for slices, $ actually works. The bug is that it 
doesn't work for user-defined types.

FWIW, I like the rewrite idea far better than opDollar.




Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Andrei Alexandrescu

On 9/18/11 2:34 PM, Timon Gehr wrote:

On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:

On 9/18/11 11:08 AM, Timon Gehr wrote:

On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:

Quite interesting.

http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/


2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are
"magic" because the language features proposed to programmers were not
enough for expressing a simple abstraction. Reserving special features
for the language is a terrible way to go about programming language
design.

Don't D arrays do a similar thing? They are not templates, yet work with
generic element types.

Afaics, improving the language to the point where dynamic array-like
structures could be implemented in the library without resulting in a
bloated executable would be quite involved.


That's an incorrect view of my statement.


I don't think so.


You're free to think anything, but I know what I said and your view of 
it is incorrect.


Andrei


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/18/2011 09:46 PM, Nick Sabalausky wrote:

Timon Gehrtimon.g...@gmx.ch  wrote in message
news:j55h4f$1ia5$1...@digitalmars.com...



The only advantages slices have left
are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
language bugs such as '$'.


I am thankful for $, as it is a great feature, and it really should be
made accessible to user defined types. Either through opDollar or the
rewrite a[foo($)] =  a[foo(a.length)]. What makes it qualify as a stray
language bug to you?



He's saying that one of the few advantages slices have left over
user-defined types is that, for slices, $ actually works. The bug is that it
doesn't work for user-defined types.


OK, thanks for clarifying.
(I would have called that an implementation bug though)



FWIW, I like the rewrite idea far better than opDollar.



Me too. It fits better in the picture of D operator overloading.



Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Andrei Alexandrescu

On 9/18/11 2:46 PM, Nick Sabalausky wrote:

Timon Gehrtimon.g...@gmx.ch  wrote in message
news:j55h4f$1ia5$1...@digitalmars.com...



The only advantages slices have left
are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
language bugs such as '$'.


I am thankful for $, as it is a great feature, and it really should be
made accessible to user defined types. Either through opDollar or the
rewrite a[foo($)] =  a[foo(a.length)]. What makes it qualify as a stray
language bug to you?



He's saying that one of the few advantages slices have left over
user-defined types is that, for slices, $ actually works. The bug is that it
doesn't work for user-defined types.

FWIW, I like the rewrite idea far better than opDollar.


opDollar is more powerful because it can be made to work with infinite 
ranges.


Andrei


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/18/2011 10:09 PM, Andrei Alexandrescu wrote:

On 9/18/11 2:46 PM, Nick Sabalausky wrote:

Timon Gehrtimon.g...@gmx.ch wrote in message
news:j55h4f$1ia5$1...@digitalmars.com...



The only advantages slices have left
are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
language bugs such as '$'.


I am thankful for $, as it is a great feature, and it really should be
made accessible to user defined types. Either through opDollar or the
rewrite a[foo($)] = a[foo(a.length)]. What makes it qualify as a stray
language bug to you?



He's saying that one of the few advantages slices have left over
user-defined types is that, for slices, $ actually works. The bug is
that it
doesn't work for user-defined types.

FWIW, I like the rewrite idea far better than opDollar.


opDollar is more powerful because it can be made to work with infinite
ranges.

Andrei


What would it return?






Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread dsimcha

On 9/18/2011 4:09 PM, Andrei Alexandrescu wrote:

opDollar is more powerful because it can be made to work with infinite
ranges.

Andrei


Yes, this is important.  IMHO, though, the default behavior of the $ 
operator should be to call range.length if it exists and opDollar isn't 
explicitly overloaded.  This would save a lot of boilerplate.


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Andrei Alexandrescu

On 9/18/11 3:19 PM, dsimcha wrote:

On 9/18/2011 4:09 PM, Andrei Alexandrescu wrote:

opDollar is more powerful because it can be made to work with infinite
ranges.

Andrei


Yes, this is important. IMHO, though, the default behavior of the $
operator should be to call range.length if it exists and opDollar isn't
explicitly overloaded. This would save a lot of boilerplate.


struct MyRange
{
  ...
  alias length opDollar;
}

I do agree that most of the time this is what you want anyway, so that 
line would occur a lot of times...



Andrei


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread dsimcha

On 9/18/2011 4:16 PM, Timon Gehr wrote:

What would it return?


A dummy type, e.g.:

struct Repeat(T) {
    T val;
    T front() @property { return val; }
    void popFront() {}
    enum empty = false;

    static struct Dollar {}
    Dollar opDollar() {
        return Dollar.init;
    }

    auto opSlice(size_t lower, Dollar dollar) { return this; }
}

void main() {
    auto r = Repeat!int(1);
    auto r2 = r[666..$];
}


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/18/2011 10:06 PM, Andrei Alexandrescu wrote:

On 9/18/11 2:34 PM, Timon Gehr wrote:

On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:

On 9/18/11 11:08 AM, Timon Gehr wrote:

On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:

Quite interesting.

http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/



2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they
are "magic" because the language features proposed to programmers were not
enough for expressing a simple abstraction. Reserving special
features
for the language is a terrible way to go about programming language
design.

Don't D arrays do a similar thing? They are not templates, yet work
with
generic element types.

Afaics, improving the language to the point where dynamic array-like
structures could be implemented in the library without resulting in a
bloated executable would be quite involved.


That's an incorrect view of my statement.


I don't think so.


You're free to think anything, but I know what I said and your view of
it is incorrect.

Andrei


Well, I'd say there was a misunderstanding. Understanding someone who is 
not explaining himself is particularly difficult, so it is quite 
possible I am the one who misunderstood. But if you lack the time or 
motivation to carry out this discussion, that is fine of course.





Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/18/2011 10:21 PM, dsimcha wrote:

On 9/18/2011 4:16 PM, Timon Gehr wrote:

What would it return?


A dummy type, e.g.:

struct Repeat(T) {
T val;
T front() @property { return val; }
void popFront() {}
enum empty = false;

static struct Dollar {}
Dollar opDollar() {
return Dollar.init;
}


auto opSlice(size_t lower, Dollar dollar) { return this; }
}

void main() {
auto r = Repeat!int(1);
auto r2 = r[666..$];
}


Ok, but what about

void main(){
auto r = Repeat!int(1);
auto r2 = r[666..$-1]; // ok, return entire range
auto r3 = r[666..$/2]; // ditto
auto r4 = r[666..$<100?$:100]; // ???
// those could be made illegal:
auto r5 = r[666..$*0]; // ???
auto r6 = r[666..$-$]; // ???
auto r7 = r[666..2*$-$]; // ???
auto r8 = r[666..($*$)%4==3?667:668]; // ???
}









Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Simen Kjaeraas
On Sun, 18 Sep 2011 22:09:19 +0200, Andrei Alexandrescu  
seewebsiteforem...@erdani.org wrote:



On 9/18/11 2:46 PM, Nick Sabalausky wrote:

Timon Gehrtimon.g...@gmx.ch  wrote in message
news:j55h4f$1ia5$1...@digitalmars.com...



The only advantages slices have left
are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
language bugs such as '$'.


I am thankful for $, as it is a great feature, and it really should be
made accessible to user defined types. Either through opDollar or the
rewrite a[foo($)] =  a[foo(a.length)]. What makes it qualify as a  
stray

language bug to you?



He's saying that one of the few advantages slices have left over
user-defined types is that, for slices, $ actually works. The bug is  
that it

doesn't work for user-defined types.

FWIW, I like the rewrite idea far better than opDollar.


opDollar is more powerful because it can be made to work with infinite  
ranges.


Also, multi-dimensional structures:

RectangularArray!int a = [[1,2,3], [4,5,6,7]];
int b = a[$-1, $-2];

Those are obviously different $s.

--
  Simen


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Simen Kjaeraas

On Sun, 18 Sep 2011 22:37:03 +0200, Timon Gehr timon.g...@gmx.ch wrote:


On 09/18/2011 10:21 PM, dsimcha wrote:

On 9/18/2011 4:16 PM, Timon Gehr wrote:

What would it return?


A dummy type, e.g.:

struct Repeat(T) {
T val;
T front() @property { return val; }
void popFront() {}
enum empty = false;

static struct Dollar {}
Dollar opDollar() {
return Dollar.init;
}


auto opSlice(size_t lower, Dollar dollar) { return this; }
}

void main() {
auto r = Repeat!int(1);
auto r2 = r[666..$];
}


Ok, but what about

void main(){
 auto r = Repeat!int(1);
 auto r2 = r[666..$-1]; // ok, return entire range
 auto r3 = r[666..$/2]; // ditto
 auto r4 = r[666..$<100?$:100]; // ???


Compile-time error. struct Dollar does not have an overloaded <
operator. Or you make one that always returns false, in which case
things are a bit more complex (Dollar and 100 do not have a common
type).



 // those could be made illegal:
 auto r5 = r[666..$*0]; // ???
 auto r6 = r[666..$-$]; // ???
 auto r7 = r[666..2*$-$]; // ???
 auto r8 = r[666..($*$)%4==3?667:668]; // ???
}


They already are. The supplied Dollar struct does not have overloaded
operators for any of those operations. If you implement them, you can
have them do whatever you feel is right.

--
  Simen
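One way to flesh out Simen's suggestion is an editorial sketch extending dsimcha's Repeat example (not library code): give the Dollar token overloaded operators that absorb arithmetic, so that $-1 and $/2 still denote "the end" of an infinite range.

```d
struct Repeat(T) {
    T val;
    enum empty = false;
    @property T front() { return val; }
    void popFront() {}

    static struct Dollar {
        // $-1, $/2, ... all still denote "the end":
        Dollar opBinary(string op)(long rhs) { return this; }
        Dollar opBinaryRight(string op)(long lhs) { return this; } // 2*$, ...
    }
    Dollar opDollar() { return Dollar.init; }
    auto opSlice(size_t lower, Dollar) { return this; }
}

unittest {
    auto r = Repeat!int(1);
    auto r2 = r[666 .. $ - 1]; // Dollar absorbs the arithmetic
    auto r3 = r[666 .. $ / 2];
    assert(r2.front == 1 && r3.front == 1);
}
```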


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread dsimcha

On 9/18/2011 4:32 PM, Andrei Alexandrescu wrote:

struct MyRange
{
...
alias length opDollar;
}

I do agree that most of the time this is what you want anyway, so that
line would occur a lot of times...


Andrei


The problem with this is that everything in std.algorithm and std.range 
would have to be manually changed.  I don't feel like doing this myself 
or waiting for someone else to get around to it.  It just makes a heck 
of a lot more sense to specify it in one place, in the compiler.


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Simen Kjaeraas
On Sun, 18 Sep 2011 22:32:44 +0200, Andrei Alexandrescu  
seewebsiteforem...@erdani.org wrote:



On 9/18/11 3:19 PM, dsimcha wrote:

On 9/18/2011 4:09 PM, Andrei Alexandrescu wrote:

opDollar is more powerful because it can be made to work with infinite
ranges.

Andrei


Yes, this is important. IMHO, though, the default behavior of the $
operator should be to call range.length if it exists and opDollar isn't
explicitly overloaded. This would save a lot of boilerplate.


struct MyRange
{
   ...
   alias length opDollar;
}

I do agree that most of the time this is what you want anyway, so that  
line would occur a lot of times...


This works if opDollar is expected to be a niladic (zero-argument)
function. For multi-dimensional structures, it would have to be monadic,
taking the dimension as an argument.

--
  Simen
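A per-dimension opDollar can be sketched on a hypothetical 2-D type (editorial example; this is essentially the templated opDollar!(dim) form the language later adopted):

```d
struct Grid {
    size_t rows, cols;
    // $ appearing in index position dim lowers to opDollar!dim:
    size_t opDollar(size_t dim)() { return dim == 0 ? rows : cols; }
    size_t opIndex(size_t i, size_t j) { return i * cols + j; } // dummy access
}

unittest {
    auto g = Grid(3, 4);
    auto x = g[$ - 1, $ - 2]; // g[g.opDollar!0 - 1, g.opDollar!1 - 2]
    assert(x == 10);          // i.e. g[2, 2] == 2*4 + 2
}
```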


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Steven Schveighoffer

On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:

That would allow us to e.g. switch from the
pointer+length representation to the arguably better pointer+pointer
representation with ease.


In what way is that representation better?


I agree, I don't see why the representation is inherently better.  Some  
operations become higher performance (i.e. popFront), and some become  
worse (i.e. length).  Most of the others are a wash.


FWIW, you can avoid bloat by converting to runtime calls when templating  
is not necessary.  For example, append could just be a template shell:


auto opOpAssign(string op : "~")(T other) // ~= lowers to opOpAssign!"~"
{
    return _d_appendTi(...); // don't remember the parameter types/order
}

In any case, before this could happen, we'd need to implement UFCS for  
custom types, and we'd need a solution on how to specify const(T)[] using  
a template (that implicitly casts from T[]).


-Steve


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/19/2011 01:25 PM, Steven Schveighoffer wrote:

On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:

That would allow us to e.g. switch from the
pointer+length representation to the arguably better pointer+pointer
representation with ease.


In what way is that representation better?


I agree, I don't see why the representation is inherently better. Some
operations become higher performance (i.e. popFront), and some become
worse (i.e. length). Most of the others are a wash.

FWIW, you can avoid bloat by converting to runtime calls when templating
is not necessary. For example, append could just be a template shell:

auto opOpAssign(string op : "~")(T other) // ~= lowers to opOpAssign!"~"
{
    return _d_appendTi(...); // don't remember the parameter types/order
}



Ok, but I'd like the opBinary to not even be put into the object file. I 
believe an @inline annotation that guarantees inlining or compilation 
failure if it is impossible would be of great use for this and other 
applications.



In any case, before this could happen, we'd need to implement UFCS for
custom types,


First of all, the design of UFCS for custom types has to be settled. 
Should it be implicit, like the current ad-hoc way for arrays, or 
explicit (e.g. via a 'this' storage class)? I would be in favor of an 
explicit solution.



and we'd need a solution on how to specify const(T)[]
using a template (that implicitly casts from T[]).



Even more than that, templates would need to be able to specify stuff like

// the fact that this currently compiles is a quite severe bug that 
compromises type/memory safety, it would have to be disallowed without 
an explicit cast:


class C {}
class D : C {}
class E : C {}

void main() {
    D[] d = [new D];
    C[] c = d;
    c[0] = new E;
    assert(typeid(d[0]) == typeid(E)); // a stray E in a D[]!
}

// this on the other hand is perfectly fine:
void main() {
    D[] d = [new D];
    const(C)[] c = d;
    // no possibility to screw up d (and no possibility to change the
    // individual elements through method calls either)
}

As I pointed out in my initial post, I think the language changes to 
make something that works like a dynamic array implementable in a 
library would be quite involved, because there _is_ some nontrivial 
magic going on for them. Similar claims hold for pointers.


Probably having something ad-hoc, like an opImplicitCast, would work 
out, but it would share some functionality with alias this...


Still, imho the best solution is to keep dynamic arrays built in, 
whether or not their special features are made available to the programmer.





Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Steven Schveighoffer

On Sun, 18 Sep 2011 16:16:23 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/18/2011 10:09 PM, Andrei Alexandrescu wrote:

On 9/18/11 2:46 PM, Nick Sabalausky wrote:

Timon Gehrtimon.g...@gmx.ch wrote in message
news:j55h4f$1ia5$1...@digitalmars.com...



The only advantages slices have left
are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
language bugs such as '$'.


I am thankful for $, as it is a great feature, and it really should be
made accessible to user defined types. Either through opDollar or the
rewrite a[foo($)] = a[foo(a.length)]. What makes it qualify as a  
stray

language bug to you?



He's saying that one of the few advantages slices have left over
user-defined types is that, for slices, $ actually works. The bug is
that it
doesn't work for user-defined types.

FWIW, I like the rewrite idea far better than opDollar.


opDollar is more powerful because it can be made to work with infinite
ranges.

Andrei


What would it return?


Not all types that have an end also support .length, or use sequential  
integers for indexes.


-Steve


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/19/2011 01:15 PM, Steven Schveighoffer wrote:

On Sun, 18 Sep 2011 16:16:23 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/18/2011 10:09 PM, Andrei Alexandrescu wrote:

On 9/18/11 2:46 PM, Nick Sabalausky wrote:

Timon Gehrtimon.g...@gmx.ch wrote in message
news:j55h4f$1ia5$1...@digitalmars.com...



The only advantages slices have left
are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
language bugs such as '$'.


I am thankful for $, as it is a great feature, and it really should be
made accessible to user defined types. Either through opDollar or the
rewrite a[foo($)] = a[foo(a.length)]. What makes it qualify as a
stray
language bug to you?



He's saying that one of the few advantages slices have left over
user-defined types is that, for slices, $ actually works. The bug is
that it
doesn't work for user-defined types.

FWIW, I like the rewrite idea far better than opDollar.


opDollar is more powerful because it can be made to work with infinite
ranges.

Andrei


What would it return?


Not all types that have an end also support .length, or use sequential
integers for indexes.

-Steve


Yes, but D has already gone the 'being inflexible' path for the 
comparison/negation/logical/ternary operators. Isn't this a similar 
thing? I don't think infinite ranges work well with restrictive operator 
overloading, e.g. a[0..100<$?100:$]. They'd need some additional language 
support or they'll possibly blow up randomly in generic code.


Btw, do you know of an example of a data structure that can be indexed 
continuously and has the notion of an end, but no length?






Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Steven Schveighoffer

On Mon, 19 Sep 2011 09:49:14 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/19/2011 01:15 PM, Steven Schveighoffer wrote:
On Sun, 18 Sep 2011 16:16:23 -0400, Timon Gehr timon.g...@gmx.ch  
wrote:



On 09/18/2011 10:09 PM, Andrei Alexandrescu wrote:

On 9/18/11 2:46 PM, Nick Sabalausky wrote:

Timon Gehrtimon.g...@gmx.ch wrote in message
news:j55h4f$1ia5$1...@digitalmars.com...



The only advantages slices have left
are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal  
syntax,
i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of  
stray

language bugs such as '$'.


I am thankful for $, as it is a great feature, and it really should  
be
made accessible to user defined types. Either through opDollar or  
the

rewrite a[foo($)] = a[foo(a.length)]. What makes it qualify as a
stray
language bug to you?



He's saying that one of the few advantages slices have left over
user-defined types is that, for slices, $ actually works. The bug is
that it
doesn't work for user-defined types.

FWIW, I like the rewrite idea far better than opDollar.


opDollar is more powerful because it can be made to work with infinite
ranges.

Andrei


What would it return?


Not all types that have an end also support .length, or use sequential
integers for indexes.

-Steve


Yes, but D has already gone the 'being inflexible' path for the  
comparison/negation/logical/ternary operators. Isn't this a similar  
thing? I don't think infinite ranges work well with restrictive operator  
overloading, e.g. a[0..100<$?100:$]. They'd need some additional language  
support or they'll possibly blow up randomly in generic code.


Btw, do you know of an example of a data structure that can be indexed  
continuously and has the notion of an end, but no length?


I was specifically thinking of a red-black tree, which has a distinct end,  
but its index is not necessarily length (or any value for that matter).


If you look at dcollections, you can slice a TreeMap using indexes or  
cursors.  However, to index through the last element in the tree, you use  
the special cursor .end:


auto range = mytree["hello" .. mytree.end]; // get all the elements in the  
tree which are >= "hello"


Here, mytree["hello" .. mytree.length] would simply not compile.

In addition, a tree with a uint index would silently do the *wrong* thing:

auto set = new TreeSet!uint(1, 3, 5, 7, 9);
assert(set.length == 5);
auto range = set[1..set.length];

assert(walkLength(range) == 2); // probably not what you expected

So I think it's not only limiting to require x.length to be $, it's very  
wrong in some cases.


Also, think of a string.  It has no length (well technically, it does, but  
it's not the number of elements), but it has a distinct end point.  A  
properly written string type would fail to compile if $ was s.length.


-Steve


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:

On Mon, 19 Sep 2011 09:49:14 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/19/2011 01:15 PM, Steven Schveighoffer wrote:

On Sun, 18 Sep 2011 16:16:23 -0400, Timon Gehr timon.g...@gmx.ch
wrote:


On 09/18/2011 10:09 PM, Andrei Alexandrescu wrote:

On 9/18/11 2:46 PM, Nick Sabalausky wrote:

Timon Gehrtimon.g...@gmx.ch wrote in message
news:j55h4f$1ia5$1...@digitalmars.com...



The only advantages slices have left
are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal
syntax,
i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of
stray
language bugs such as '$'.


I am thankful for $, as it is a great feature, and it really
should be
made accessible to user defined types. Either through opDollar or
the
rewrite a[foo($)] = a[foo(a.length)]. What makes it qualify as a
stray
language bug to you?



He's saying that one of the few advantages slices have left over
user-defined types is that, for slices, $ actually works. The bug is
that it
doesn't work for user-defined types.

FWIW, I like the rewrite idea far better than opDollar.


opDollar is more powerful because it can be made to work with infinite
ranges.

Andrei


What would it return?


Not all types that have an end also support .length, or use sequential
integers for indexes.

-Steve


Yes, but D has already gone the 'being inflexible' path for the
comparison/negation/logical/ternary operators. Isn't this a similar
thing? I don't think infinite ranges work well with restrictive
operator overloading, e.g. a[0..100<$?100:$]. They'd need some
additional language support or they'll possibly blow up randomly in
generic code.

Btw, do you know of an example of a data structure that can be indexed
continuously and has the notion of an end, but no length?


I was specifically thinking of a red-black tree, which has a distinct end,
but its index is not necessarily length (or any value for that matter).

If you look at dcollections, you can slice a TreeMap using indexes or
cursors. However, to index through the last element in the tree, you use
the special cursor .end:

auto range = mytree["hello" .. mytree.end]; // get all the elements in
the tree which are >= "hello"

Here, mytree["hello" .. mytree.length] would simply not compile.

In addition, a tree with a uint index would silently do the *wrong* thing:

auto set = new TreeSet!uint(1, 3, 5, 7, 9);
assert(set.length == 5);
auto range = set[1..set.length];

assert(walkLength(range) == 2); // probably not what you expected


Ok, makes sense. Thanks.



So I think it's not only limiting to require x.length to be $, it's very
wrong in some cases.

Also, think of a string. It has no length (well technically, it does,
but it's not the number of elements), but it has a distinct end point. A
properly written string type would fail to compile if $ was s.length.



But you'd have to compute the length anyways in the general case:

str[0..$/2];

Or am I misunderstanding something?




Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Steven Schveighoffer

On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:


So I think it's not only limiting to require x.length to be $, it's very
wrong in some cases.

Also, think of a string. It has no length (well technically, it does,
but it's not the number of elements), but it has a distinct end point. A
properly written string type would fail to compile if $ was s.length.



But you'd have to compute the length anyways in the general case:

str[0..$/2];

Or am I misunderstanding something?



That's half the string in code units, not code points.

If string was properly implemented, this would fail to compile.  $ is not  
the length of the string range (meaning the number of code points).  The  
given slice operation might actually create an invalid string.


It's tricky, because you want fast slicing, but only certain slices are  
valid.  I once created a string type that used a char[] as its backing,  
but actually implemented the limitations that std.range tries to enforce  
(but cannot).  It's somewhat of a compromise.  If $ was mapped to  
s.length, it would fail to compile, but I'm not sure what I *would* use  
for $.  It actually might be the code units, which would not make the  
above line invalid.


-Steve


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/19/2011 04:08 PM, Andrei Alexandrescu wrote:

On 9/19/11 6:25 AM, Steven Schveighoffer wrote:

On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:

That would allow us to e.g. switch from the
pointer+length representation to the arguably better pointer+pointer
representation with ease.


In what way is that representation better?


I agree, I don't see why the representation is inherently better. Some
operations become higher performance (i.e. popFront), and some become
worse (i.e. length). Most of the others are a wash.


That's where frequency of use comes into play. I'm thinking popFront
would be used most often, and it touches two words.

Andrei


Normally, each popFront comes with an accompanying empty, and the 
comparison against 0 is faster after a decrement than the comparison of 
two arbitrary pointer values.


Now a benchmark:

import std.datetime, std.stdio;

int[10] a;

void f1(){
for(auto ptr=a.ptr, len=a.length+1; len; ++ptr, --len){}
}
void f2(){
for(auto b=a.ptr, e=a.ptr+a.length; b!=e; b++){}
}

void main(){
auto b=benchmark!(f1,f2)(10);
writeln(b[0].to!("seconds",double), " ", b[1].to!("seconds",double));
}

On my machine: (-O -release -inline)
4.00256 4.00099

The difference is barely perceptible on my oldish Core 2 Duo processor. Yet 
there is a reproducible difference that indicates that you were right 
about the pointer+pointer representation being slightly faster for that 
use case.


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Andrei Alexandrescu

On 9/19/11 6:25 AM, Steven Schveighoffer wrote:

On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:

That would allow us to e.g. switch from the
pointer+length representation to the arguably better pointer+pointer
representation with ease.


In what way is that representation better?


I agree, I don't see why the representation is inherently better. Some
operations become higher performance (i.e. popFront), and some become
worse (i.e. length). Most of the others are a wash.


That's where frequency of use comes into play. I'm thinking popFront 
would be used most often, and it touches two words.


Andrei


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Steven Schveighoffer
On Mon, 19 Sep 2011 10:08:32 -0400, Andrei Alexandrescu  
seewebsiteforem...@erdani.org wrote:



On 9/19/11 6:25 AM, Steven Schveighoffer wrote:
On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr timon.g...@gmx.ch  
wrote:



On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:

That would allow us to e.g. switch from the
pointer+length representation to the arguably better pointer+pointer
representation with ease.


In what way is that representation better?


I agree, I don't see why the representation is inherently better. Some
operations become higher performance (i.e. popFront), and some become
worse (i.e. length). Most of the others are a wash.


That's where frequency of use comes into play. I'm thinking popFront  
would be used most often, and it touches two words.


I'm not so sure.  It's very difficult to prove this is the case (either  
way).


Already foreach does not use the range primitives for arrays, which is  
probably the most frequent user of popFront (for ranges other than arrays  
of course).


However, length is used frequently in slicing, or passing to underlying C  
or OS functions which require ptr + length.  Maybe the compiler can  
optimize out the calculations of length when they are just getting  
translated right back to pointers.  For example a = a[1..$] shouldn't have  
to calculate a.length, it should just be a.ptr += 5.  I also think it  
would be nice to have direct access to the end pointer.  Likewise,  
a.length -= 5 shouldn't have to calculate the original length, then  
subtract 5, then pass that to the setlength function.


I'm not sure if the runtime code would be better or worse if we used 2  
pointers instead of ptr/length.  for sure that is where the bulk of the  
changes would need to be made.


The other thing that I don't like is it's much easier to construct an  
invalid slice with 2 pointers than it is with a ptr/length.


I suspect my opinion doesn't matter much for what we all think is used  
more frequently, but there are bigger issues to solve before we could have  
a library-based array type.  It might be best to leave the array as a  
magic type, but just change the internal representation in the  
compiler/runtime.


I know that printf was abused at some point for writing slices by relying  
on the fact that the order of length/ptr happened to be the same when you  
specified the length of a string as a parameter to printf, this would  
definitely break any code that relies on that.  Not that we need to care,  
but it's something to think about.


And actually, false pointers would be cut down (both pointers always point  
to the referenced block), which might be worth all the trouble anyways :)


-Steve


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Steven Schveighoffer
On Mon, 19 Sep 2011 10:38:21 -0400, Steven Schveighoffer  
schvei...@yahoo.com wrote:


For example a = a[1..$] shouldn't have to calculate a.length, it should  
just be a.ptr += 5.


oops, victim of half-editing there :)

Meant to say a = a[5..$]

-Steve


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/19/2011 04:43 PM, Steven Schveighoffer wrote:

On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:


So I think it's not only limiting to require x.length to be $, it's very
wrong in some cases.

Also, think of a string. It has no length (well technically, it does,
but it's not the number of elements), but it has a distinct end point. A
properly written string type would fail to compile if $ was s.length.



But you'd have to compute the length anyways in the general case:

str[0..$/2];

Or am I misunderstanding something?



That's half the string in code units, not code points.

If string was properly implemented, this would fail to compile. $ is not
the length of the string range (meaning the number of code points). The
given slice operation might actually create an invalid string.


Programmers have to be aware of that if they want efficient code that 
deals with unicode. I think having random access to the code units and 
being able to iterate per code point is fine, because it gives you the 
best of both worlds. Manually decoding a string and slicing it at 
positions that were remembered to be safe has been good enough for me, 
at least it is efficient.




It's tricky, because you want fast slicing, but only certain slices are
valid. I once created a string type that used a char[] as its backing,
but actually implemented the limitations that std.range tries to enforce
(but cannot). It's somewhat of a compromise. If $ was mapped to
s.length, it would fail to compile, but I'm not sure what I *would* use
for $. It actually might be the code units, which would not make the
above line invalid.

-Steve


Well it would have to be consistent for a string type that does it 
"right". Either the string is indexed with units or it is indexed with 
code points, and the other option should be provided. Dollar should just 
be the length of what is used for indexing/slicing here, and having that 
be different from length makes for a somewhat awkward interface imho.


Btw, D double-quoted string literals let you define invalid byte 
sequences with e.g. octal literals:

string s = "\377";

What would be use cases for that? Shouldn't \377 map to the extended 
ASCII charset instead and yield the same code point that would be given 
in C double-quoted strings?







Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Kagamin
Steven Schveighoffer Wrote:

 auto set = new TreeSet!uint(1, 3, 5, 7, 9);
 assert(set.length == 5);
 auto range = set[1..set.length];
 
 assert(walkLength(range) == 2); // probably not what you expected

Where you got that 1?
How to slice it from begin to 7?

Looks like an operator overload abuse. Slicing is an array's feature. If a 
collection doesn't support array interface, it's questionable whether it should 
support slicing as it's defined in D. By stretching slicing to non-array 
collections you break its semantics. Good for voodoo, bad for engineering.


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Steven Schveighoffer

On Mon, 19 Sep 2011 11:03:15 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/19/2011 04:43 PM, Steven Schveighoffer wrote:
On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr timon.g...@gmx.ch  
wrote:



On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:


So I think it's not only limiting to require x.length to be $, it's very
wrong in some cases.

Also, think of a string. It has no length (well technically, it does,
but it's not the number of elements), but it has a distinct end point. A
properly written string type would fail to compile if $ was s.length.



But you'd have to compute the length anyways in the general case:

str[0..$/2];

Or am I misunderstanding something?



That's half the string in code units, not code points.

If string was properly implemented, this would fail to compile. $ is not
the length of the string range (meaning the number of code points). The
given slice operation might actually create an invalid string.


Programmers have to be aware of that if they want efficient code that  
deals with unicode. I think having random access to the code units and  
being able to iterate per code point is fine, because it gives you the  
best of both worlds. Manually decoding a string and slicing it at  
positions that were remembered to be safe has been good enough for me,  
at least it is efficient.


I find the same.  I don't think I've ever dealt with arbitrary math  
operations to do slices of strings like the above.  I only slice a string  
when I know the bounds are sane.


Like I said, it's a compromise.  The right thing to do is probably not  
even allow code-unit access via index (some have even argued that  
code-point slicing is too dangerous, because you can split a grapheme,  
leaving a valid, but incorrect slice of the original).



It's tricky, because you want fast slicing, but only certain slices are
valid. I once created a string type that used a char[] as its backing,
but actually implemented the limitations that std.range tries to enforce
(but cannot). It's somewhat of a compromise. If $ was mapped to
s.length, it would fail to compile, but I'm not sure what I *would* use
for $. It actually might be the code units, which would not make the
above line invalid.

-Steve


Well it would have to be consistent for a string type that does it 
"right". Either the string is indexed with units or it is indexed with 
code points, and the other option should be provided. Dollar should just 
be the length of what is used for indexing/slicing here, and having that 
be different from length makes for a somewhat awkward interface imho.


Except we are defining a string as a *range* and a range's length is  
defined as the number of elements.


Note that hasLength!string evaluates to false in std.range.

$ should denote the end point of the aggregate, but it does not have to be  
equivalent to length, or even an integer/uint.  It should just mean end.


I also proposed a while back to have ^ denote the beginning (similar to  
regex) of an aggregate for aggregates that don't use 0 as the beginning,  
but people didn't like it :)


-Steve


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Andrei Alexandrescu

On 9/19/11 10:46 AM, Robert Jacques wrote:

So, on balance, I'd say the two pointers representation is categorically
worse than the fat pointer representation.


Benchmark. A few of your assumptions don't hold.

Andrei


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Timon Gehr

On 09/19/2011 05:52 PM, Steven Schveighoffer wrote:

On Mon, 19 Sep 2011 11:03:15 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/19/2011 04:43 PM, Steven Schveighoffer wrote:

On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr timon.g...@gmx.ch
wrote:


On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:


So I think it's not only limiting to require x.length to be $, it's
very
wrong in some cases.

Also, think of a string. It has no length (well technically, it does,
but it's not the number of elements), but it has a distinct end
point. A
properly written string type would fail to compile if $ was s.length.



But you'd have to compute the length anyways in the general case:

str[0..$/2];

Or am I misunderstanding something?



That's half the string in code units, not code points.

If string was properly implemented, this would fail to compile. $ is not
the length of the string range (meaning the number of code points). The
given slice operation might actually create an invalid string.


Programmers have to be aware of that if they want efficient code that
deals with unicode. I think having random access to the code units and
being able to iterate per code point is fine, because it gives you the
best of both worlds. Manually decoding a string and slicing it at
positions that were remembered to be safe has been good enough for me,
at least it is efficient.


I find the same. I don't think I've ever dealt with arbitrary math
operations to do slices of strings like the above. I only slice a string
when I know the bounds are sane.

Like I said, it's a compromise. The right thing to do is probably not
even allow code-unit access via index (some have even argued that
code-point slicing is too dangerous, because you can split a grapheme,
leaving a valid, but incorrect slice of the original).


It's tricky, because you want fast slicing, but only certain slices are
valid. I once created a string type that used a char[] as its backing,
but actually implemented the limitations that std.range tries to enforce
(but cannot). It's somewhat of a compromise. If $ was mapped to
s.length, it would fail to compile, but I'm not sure what I *would* use
for $. It actually might be the code units, which would not make the
above line invalid.

-Steve


Well it would have to be consistent for a string type that does it
"right". Either the string is indexed with units or it is indexed with
code points, and the other option should be provided. Dollar should
just be the length of what is used for indexing/slicing here, and
having that be different from length makes for a somewhat awkward
interface imho.


Except we are defining a string as a *range* and a range's length is
defined as the number of elements.

Note that hasLength!string evaluates to false in std.range.


Ok. I feel the way narrow strings are handled in Phobos is a reasonable 
trade-off.




$ should denote the end point of the aggregate, but it does not have to
be equivalent to length, or even an integer/uint. It should just mean
end.


Point taken. What is the solution for infinite ranges? Should any 
arithmetic on $ just be disallowed?




I also proposed a while back to have ^ denote the beginning (similar to
regex) of an aggregate for aggregates that don't use 0 as the beginning,
but people didn't like it :)

-Steve


=D, well, it is grammatically unambiguous!


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Robert Jacques

On Mon, 19 Sep 2011 10:08:32 -0400, Andrei Alexandrescu 
seewebsiteforem...@erdani.org wrote:

On 9/19/11 6:25 AM, Steven Schveighoffer wrote:

On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:

That would allow us to e.g. switch from the
pointer+length representation to the arguably better pointer+pointer
representation with ease.


In what way is that representation better?


I agree, I don't see why the representation is inherently better. Some
operations become higher performance (i.e. popFront), and some become
worse (i.e. length). Most of the others are a wash.


That's where frequency of use comes into play. I'm thinking popFront
would be used most often, and it touches two words.

Andrei



Well, looking at the two implementations:

popFront()

if(length) {
    ptr += T.sizeof;
    length -= 1;
}

vs

if(ptrBack - ptrFront > 0) {
    ptrFront += T.sizeof;
}

And don't forget that every popFront call will be matched by a call to empty

empty()

return length;

vs

return ptrBack - ptrFront > 0;

I see that the 'new' popFront still needs to read 2 words and results in an 
extra subtraction. Without the guard statements, the instruction counts 
are identical, but note that the current popFront implementation uses slices, 
which are 'guarded', and I see every reason not to change this behavior. And 
given how memory works, I doubt there'd be any advantage to 1 vs 2 writes one 
after another to the same cache line.

By the way, for those wondering why I didn't use 'ptrBack > ptrFront', that's 
because comparisons are eventually reduced to a 'Jump if (Not) Zero', and I wanted 
to make the amount and types of hardware instructions clear.

The elephant in the room, of course, is that length now requires a division and 
that popFront is actually implemented using slicing:

a = a[i .. $];

which translates into:

auto begin = i;
auto end   = length;
if(end - begin >= 0 && length - end >= 0) {
    ptr    = ptr + T.sizeof * begin;
    length = end - begin;
}

vs

auto length = (ptrBack - ptrFront) / T.sizeof;
auto begin  = ptrFront + T.sizeof * i;
auto end    = ptrFront + T.sizeof * length;
if(end - begin >= 0 && ptrBack - end >= 0) {
    ptrFront = begin;
    ptrBack  = end;
}

And both integer multiplication and division by non-power-of-2 sizes are really, 
really, really slow compared to +/-. Now, the division is only needed whenever 
'length' is called, but the extra multiplication is required on every slice.

So, on balance, I'd say the two pointers representation is categorically worse 
than the fat pointer representation.


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Steven Schveighoffer
On Mon, 19 Sep 2011 11:46:44 -0400, Robert Jacques sandf...@jhu.edu  
wrote:


On Mon, 19 Sep 2011 10:08:32 -0400, Andrei Alexandrescu  
seewebsiteforem...@erdani.org wrote:

On 9/19/11 6:25 AM, Steven Schveighoffer wrote:
On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr timon.g...@gmx.ch  
wrote:



On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:

That would allow us to e.g. switch from the
pointer+length representation to the arguably better pointer+pointer
representation with ease.


In what way is that representation better?


I agree, I don't see why the representation is inherently better. Some
operations become higher performance (i.e. popFront), and some become
worse (i.e. length). Most of the others are a wash.


That's where frequency of use comes into play. I'm thinking popFront
would be used most often, and it touches two words.

Andrei



The elephant in the room, of course, is that length now requires a  
division and that popFront is actually implemented using slicing:


a = a[i .. $];

which translates into:

auto begin = i;
auto end   = length;
if(end - begin >= 0 && length - end >= 0) {
    ptr    = ptr + T.sizeof * begin;
    length = end - begin;
}

vs

auto length = (ptrBack - ptrFront) / T.sizeof;
auto begin  = ptrFront + T.sizeof * i;
auto end= ptrFront + T.sizeof * length;
if(end - begin >= 0 && ptrBack - end >= 0) {
    ptrFront = begin;
    ptrBack  = end;
}


I would hope something like this would be optimized by the compiler:

auto begin = ptrFront + T.sizeof * i;
if(ptrBack - begin >= 0)
    ptrFront = begin;

If not, popFront could optimize it.

Certainly, to say popFront is going to perform *worse* using a  
dual-pointer representation is false.  Only one calculation is needed.


-Steve


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Steven Schveighoffer

On Mon, 19 Sep 2011 11:17:01 -0400, Kagamin s...@here.lot wrote:


Steven Schveighoffer Wrote:


auto set = new TreeSet!uint(1, 3, 5, 7, 9);
assert(set.length == 5);
auto range = set[1..set.length];

assert(walkLength(range) == 2); // probably not what you expected


Where you got that 1?


1 is the first element in the set.


How to slice it from begin to 7?


set[set.begin .. 7]

This does not include the element at position 7.  To get that element too,  
you have to do something funky:


auto cur = set.elemAt(7);
cur.popFront();

set[set.begin .. cur];

I haven't thought of an easy way to signify find me the element After X.

Looks like an operator overload abuse. Slicing is an array's feature. If  
a collection doesn't support array interface, it's questionable whether  
it should support slicing as it's defined in D. By stretching slicing to  
non-array collections you break its semantics. Good for voodoo, bad for  
engineering.


I emphatically disagree.  Slicing is the method of extracting a range from  
a collection given two reference points.  For arrays, that happens to be  
indexes.  For node-based containers, it would be references to those nodes  
(in dcollections, those are called cursors).  I don't see why slicing  
should be restricted to ints, otherwise, why have opSlice compile with  
anything other than ints?


-Steve


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Steven Schveighoffer

On Mon, 19 Sep 2011 12:00:46 -0400, Timon Gehr timon.g...@gmx.ch wrote:


On 09/19/2011 05:52 PM, Steven Schveighoffer wrote:




$ should denote the end point of the aggregate, but it does not have to
be equivalent to length, or even an integer/uint. It should just mean
end.


Point taken. What is the solution for infinite ranges? Should any  
arithmetic on $ just be disallowed?


I think $ should not pose any type restrictions, and whatever type you  
return can have whatever semantics you want.  The only semantic  
restriction is it should mean the end of the container.


In other words, generic code that expects to be able to do:

x[0..$/2]

should only expect this to work when $ is a numeric value, or defines  
division by numbers.


Technically, half of infinity is still infinity, so the type of $ could be  
defined so that any arithmetic operations on it result in itself ;)


-Steve


Re: Go and generic programming on reddit, also touches on D

2011-09-19 Thread Andrei Alexandrescu

On 9/19/11 11:12 AM, Andrei Alexandrescu wrote:

On 9/19/11 10:46 AM, Robert Jacques wrote:

So, on balance, I'd say the two pointers representation is categorically
worse than the fat pointer representation.


Benchmark. A few of your assumptions don't hold.

Andrei


s/don't/may not/

:o)

Andrei