Re: C++ / Why Iterators Got It All Wrong

2017-08-29 Thread bitwise via Digitalmars-d

On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
Maybe of interest: 
https://www.think-cell.com/en/career/talks/iterators/#1


I haven't read everything, so not sure if it worth to take a 
look.


"superseded" is the wrong word, as ranges cannot do everything 
iterators can.


Re: C++ / Why Iterators Got It All Wrong

2017-08-29 Thread Steven Schveighoffer via Digitalmars-d

On 8/29/17 8:50 AM, Robert M. Münch wrote:

Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1


I haven't read everything, so not sure if it worth to take a look.




Interesting. It reminds me a bit of cursors for dcollections. In there, 
a cursor is a 0 or 1 element range. The one element range points at an 
element, the 0 element range points at a border. You can use cursors to 
compose ranges.


https://github.com/schveiguy/dcollections

Not sure how useful it is to have them be separate types. It can be nice 
I suppose. But one thing I love about iterators/cursors is that 
something like find doesn't assume what data you are interested in.


In Phobos, find gives you a range where the first element is the one you 
searched for, and the last element is the end of the original range. But 
what if you wanted all the data *up to* the element instead? What if you 
just wanted to look at that specific element? So we need several 
functions that do this, and it's not always clear how to do it 
correctly, and it's difficult to compose one function from the other. 
With iterators, it's simple.


-Steve


Re: C++ / Why Iterators Got It All Wrong

2017-08-29 Thread bitwise via Digitalmars-d
On Tuesday, 29 August 2017 at 13:23:50 UTC, Steven Schveighoffer 
wrote:


In Phobos, find gives you a range where the first element is 
the one you searched for, and the last element is the end of 
the original range. But what if you wanted all the data *up to* 
the element instead? What if you just wanted to look at that 
specific element? So we need several functions that do this, 
and it's not always clear how to do it correctly, and it's 
difficult to compose one function from the other. With 
iterators, it's simple.


-Steve


I was about to whine about exactly this, but thought it would be 
off topic ;)


I see "trim_left" in the slides which I'm guessing refers to what 
D calls "find". IMO, "trim_left" is a much better name. "find" 
should not return elements you didn't ask for, it should return a 
range of length 1 or 0.




Re: C++ / Why Iterators Got It All Wrong

2017-08-29 Thread jmh530 via Digitalmars-d
On Tuesday, 29 August 2017 at 13:23:50 UTC, Steven Schveighoffer 
wrote:


Interesting. It reminds me a bit of cursors for dcollections. 
In there, a cursor is a 0 or 1 element range. The one element 
range points at an element, the 0 element range points at a 
border. You can use cursors to compose ranges.


https://github.com/schveiguy/dcollections



I was thinking this exact same thing when I got to the Element 
part of it.


The mach library also makes use of cursors.
https://github.com/pineapplemachine/mach.d

I'm not entirely sure how if I grok how the Border thing they are 
talking about works.


Re: C++ / Why Iterators Got It All Wrong

2017-08-29 Thread Steven Schveighoffer via Digitalmars-d

On 8/29/17 9:34 AM, jmh530 wrote:

On Tuesday, 29 August 2017 at 13:23:50 UTC, Steven Schveighoffer wrote:


Interesting. It reminds me a bit of cursors for dcollections. In 
there, a cursor is a 0 or 1 element range. The one element range 
points at an element, the 0 element range points at a border. You can 
use cursors to compose ranges.


https://github.com/schveiguy/dcollections



I was thinking this exact same thing when I got to the Element part of it.

The mach library also makes use of cursors.
https://github.com/pineapplemachine/mach.d

I'm not entirely sure how if I grok how the Border thing they are 
talking about works.


A border points between elements. It's like the end element in a 
standard STL container, it's not really pointing at an element, but one 
past the last element.


It can be handy when specifying ranges. I.e. exclusive or inclusive ranges.

My gut feeling is that the splitting of types between elements and 
borders is too much machinery, but I haven't seen it in practice to make 
a fair judgment.


-Steve


Re: C++ / Why Iterators Got It All Wrong

2017-08-30 Thread Robert M. Münch via Digitalmars-d

On 2017-08-29 13:23:50 +, Steven Schveighoffer said:


...
In Phobos, find gives you a range where the first element is the one 
you searched for, and the last element is the end of the original 
range. But what if you wanted all the data *up to* the element instead? 
What if you just wanted to look at that specific element? So we need 
several functions that do this, and it's not always clear how to do it 
correctly, and it's difficult to compose one function from the other.


I'm a big fan of Rebol (yes, it's far from mainstream and a dead-end 
these days but it has a lot of very nice ideas) and it uses the idea of 
a series datatype. Fruther information here: 
http://www.rebol.com/docs/core23/rebolcore-6.html


There you do something like:

1. Return the first hit and the rest of the series


my-series: "abcdefg"

== "abcdefg"

find my-series "b"

== "bcdefg"


2. Return only the hit


first find my-series "b"

== #"b"


3. Return everything before the first hit


copy/part my-series find my-series "d"

== "abc"


If you ever got used to such a thinking, writing & using more 
non-functional approaches, really hurts.


--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster



Re: C++ / Why Iterators Got It All Wrong

2017-08-31 Thread drug via Digitalmars-d

31.08.2017 09:50, Robert M. Münch пишет:

On 2017-08-29 13:23:50 +, Steven Schveighoffer said:


...
In Phobos, find gives you a range where the first element is the one
you searched for, and the last element is the end of the original
range. But what if you wanted all the data *up to* the element
instead? What if you just wanted to look at that specific element? So
we need several functions that do this, and it's not always clear how
to do it correctly, and it's difficult to compose one function from
the other.


I'm a big fan of Rebol (yes, it's far from mainstream and a dead-end
these days but it has a lot of very nice ideas) and it uses the idea of
a series datatype. Fruther information here:
http://www.rebol.com/docs/core23/rebolcore-6.html

There you do something like:

1. Return the first hit and the rest of the series


my-series: "abcdefg"

== "abcdefg"

find my-series "b"

== "bcdefg"


2. Return only the hit


first find my-series "b"

== #"b"


3. Return everything before the first hit


copy/part my-series find my-series "d"

== "abc"


If you ever got used to such a thinking, writing & using more
non-functional approaches, really hurts.


Interesting. How is it comparable with iterators and ranges ideas?


Re: C++ / Why Iterators Got It All Wrong

2017-09-01 Thread Mark via Digitalmars-d

On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
Maybe of interest: 
https://www.think-cell.com/en/career/talks/iterators/#1


I haven't read everything, so not sure if it worth to take a 
look.


Iterators have many problems. Andrei's talk some years ago, 
titled "Iterators Must Go", points out many of them in a clear 
fashion. I think most of the problems stem from the fact that 
they are inherently a leaky abstraction. Iterators basically 
treat all data structures as a singly/doubly linked list or as 
arrays (if they allow random access). There is no natural or 
intuitive way of describing, say, a tree or an arbitrary graph as 
a list/array. Ranges and cursors try to solve some of the issues 
but they still have this fundamental problem.


When I think of a good iteration interface, the only thing that 
comes to mind is SQL queries (= relational algebra, more or 
less). You have a few basic operators (Select, From, etc.) for 
querying that are intuitive, simple and safe. Furthermore, they 
compose elegantly, allowing you to express fairly complicated 
queries using these few basic operators. It's a true abstraction 
- I don't know and I don't care if the operators are implemented 
underneath using iterators, ranges, cursors or whatever (of 
course, sometimes I do care, when performance is a priority...).


It may be unrealisic to expect such a nice abstraction of 
iteration for other data structures, not to mention a universal 
one which will work for all interesting data structures. Still, 
if and when I can dispense with iteration altogether, I'm happy 
to do so.


Re: C++ / Why Iterators Got It All Wrong

2017-09-02 Thread Robert M. Münch via Digitalmars-d

On 2017-08-31 07:13:55 +, drug said:


Interesting. How is it comparable with iterators and ranges ideas?


Well, see: http://www.rebol.com/docs/core23/rebolcore-6.html#section-6

Just download the interpreter and play around with it to get a feeling.

Overall it's way more functional. Like getting the last element: copy 
back tail my-series


What I don't like is that indexing starts at 1, because this makes 
working with +/- offsets from a specific positon cumbersome.


--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster

Re: C++ / Why Iterators Got It All Wrong

2017-09-02 Thread Robert M. Münch via Digitalmars-d

On 2017-09-01 20:34:32 +, Mark said:


On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:

Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1

I haven't read everything, so not sure if it worth to take a look.


Iterators have many problems. Andrei's talk some years ago, titled 
"Iterators Must Go", points out many of them in a clear fashion. I 
think most of the problems stem from the fact that they are inherently 
a leaky abstraction. Iterators basically treat all data structures as a 
singly/doubly linked list or as arrays (if they allow random access). 
There is no natural or intuitive way of describing, say, a tree or an 
arbitrary graph as a list/array. Ranges and cursors try to solve some 
of the issues but they still have this fundamental problem.


Iterators are not the silver bullet. But IIRC you can specify if you 
want to iterate over a graph BF or DF. If you just need to "iterate" 
over the elements things work pretty good IMO. If you want to select a 
sub-set and then iterate things get much complicater anyway.


It may be unrealisic to expect such a nice abstraction of iteration for 
other data structures, not to mention a universal one which will work 
for all interesting data structures.


That's true, hence I didn't ever expected this at all. Some pretty 
simple & basic building blocks and the rest is done on-demand fitting 
the use-case.


--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster



Re: C++ / Why Iterators Got It All Wrong

2017-09-02 Thread Moritz Maxeiner via Digitalmars-d
On Saturday, 2 September 2017 at 20:17:40 UTC, Robert M. Münch 
wrote:

On 2017-08-31 07:13:55 +, drug said:

Interesting. How is it comparable with iterators and ranges 
ideas?


Well, see: 
http://www.rebol.com/docs/core23/rebolcore-6.html#section-6


Thanks for your post about Rebol, I didn't know it before.
After reading through the series chaper, though, AFAICT Rebol 
series *are* iterators (begin+end), just with really nice, 
functional (read: LISP) syntax?


Re: C++ / Why Iterators Got It All Wrong

2017-09-02 Thread Ilya Yaroshenko via Digitalmars-d

On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
Maybe of interest: 
https://www.think-cell.com/en/career/talks/iterators/#1


I haven't read everything, so not sure if it worth to take a 
look.


Iterators has no proper alternative when one need to implement 
generic tensor library like Mir Algorithm [1].


User level API is ndslice [4] (D n-dimensional random access 
range) , can be combined with Phobos. In the same time library 
uses low level unbounded random access iterator [2] 
representation to implement 90% of topology logic. This approach 
allows to maintain few times smaller, simpler and less 
error-prone code comparing with std.range. The best example it 
`bitwise` functions implementation in Phobos and Mir Algorithm. 
Iterators are very good for type composition when one need its 
own D range that can not be constructed using existing 
mir.ndslice.topology / std.range API.


[1] https://github.com/libmir/mir-algorithm
[2] 
http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html
[3] 
http://docs.algorithm.dlang.io/latest/mir_ndslice_iterator.html
[4] 
http://docs.algorithm.dlang.io/latest/mir_ndslice_slice.html#Slice


Best Regards,
Ilya



Re: C++ / Why Iterators Got It All Wrong

2017-09-02 Thread Moritz Maxeiner via Digitalmars-d
On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko 
wrote:
On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch 
wrote:
Maybe of interest: 
https://www.think-cell.com/en/career/talks/iterators/#1


I haven't read everything, so not sure if it worth to take a 
look.


Iterators has no proper alternative when one need to implement 
generic tensor library like Mir Algorithm [1].


Out of curiosity: Could you elaborate on what the issues are with 
using a range based API internally (if it's performance, the 
"why")?


Re: C++ / Why Iterators Got It All Wrong

2017-09-03 Thread Robert M. Münch via Digitalmars-d

On 2017-09-02 21:27:58 +, Moritz Maxeiner said:


Thanks for your post about Rebol, I didn't know it before.


As said, the official Rebol-2 version is a dead-end. Even our main 
product is still based on it :-) 15 years old technology, but still 
working and we know all problemes. So very few surprises. And, it's 
very productive.


There exists a half-done official Rebol-3 version as open-source. It 
was picked up by some and continued. And than there is a thing called 
Red, which uses a lot of ideas but compiles. Worth a look too. It's 
really cool, because it compiles native apps for Android without any 
SDK for example.


Overall, it's worth to spend some time with Rebol. I'm sure you won't 
your time back and can learn a lot. Things to look at: VIEW (GUI), 
PARSE (for parsing) and after this using PARSE to create DSL with 
Rebol. Very cool feature.



After reading through the series chaper, though, AFAICT Rebol series 
*are* iterators (begin+end), just with really nice, functional (read: 
LISP) syntax?


There is no difference between code & data in Rebol. And it has a very 
rich set of datatpye, IIRC about 35 native ones. And many of them are 
series, which can be operated in the same way.


From my experience, traversing datastructures with a functional syntax 
and concept is really natural to work with. It's mostly what you would 
tell someone to do using english.


--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster



Re: C++ / Why Iterators Got It All Wrong

2017-09-03 Thread Ilya Yaroshenko via Digitalmars-d
On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner 
wrote:
On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko 
wrote:
On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch 
wrote:
Maybe of interest: 
https://www.think-cell.com/en/career/talks/iterators/#1


I haven't read everything, so not sure if it worth to take a 
look.


Iterators has no proper alternative when one need to implement 
generic tensor library like Mir Algorithm [1].


Out of curiosity: Could you elaborate on what the issues are 
with using a range based API internally (if it's performance, 
the "why")?


There are three general kinds of dynamic dense tensors. Mir 
implements all of them:


1. Contiguous tensors. Their data is located contiguously in 
memory. Single dense memory chunk. All strides between 
subs-tensors can be computed from lengths.


2. Canonical tensors. Only data for one dimension is dense, other 
dimensions has strides that can not be computed from lengths. 
BLAS matrixes are canonical tensors: they have two lengths and 
one stride.


3. Universal tensors. Each dimension has a stride. Numpy ndarrays 
are universal tensors.


Finite random access range Issues:

1. Main API issue is that full featured random access range (RAR) 
API (with slicing, and all opIndex* operators like `[]*=` ) is 
much larger comparing with unbounded random access iterator (the 
same API as pointer).


2. Random access range holds its length. Yes, Phobos has infinity 
ranges (mir hasn't), but the practice is that almost any RAR 
constructed using Phobos has length and you can not strip when 
type is constructed. Anyway, infinity RAR is just a 
pointer/iterator that can move forward but can not move backward. 
Tensors hold their own lengths for each dimensions, so range's 
length is just a useless payload.


3. Ranges can not move backward (auto b = a[-4 .. $].). This 
means one can not implement dynamic `reversed`(flip) [1, 2] 
operation for dimensions that have strides (universal or 
canonical tensors). _Dynamic_ means that instead of construct a 
new type with reversed order for a dimension, we just negate the 
corresponding stride and move the cursor.


[1] 
http://docs.algorithm.dlang.io/latest/mir_ndslice_dynamic.html#.reversed
[2]  
https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.flip.html


Best regards,
Ilya


Re: C++ / Why Iterators Got It All Wrong

2017-09-03 Thread Moritz Maxeiner via Digitalmars-d
On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko 
wrote:
On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner 
wrote:
On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko 
wrote:
On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch 
wrote:
Maybe of interest: 
https://www.think-cell.com/en/career/talks/iterators/#1


I haven't read everything, so not sure if it worth to take a 
look.


Iterators has no proper alternative when one need to 
implement generic tensor library like Mir Algorithm [1].


Out of curiosity: Could you elaborate on what the issues are 
with using a range based API internally (if it's performance, 
the "why")?


There are three general kinds of dynamic dense tensors. Mir 
implements all of them:


1. Contiguous tensors. Their data is located contiguously in 
memory. Single dense memory chunk. All strides between 
subs-tensors can be computed from lengths.


2. Canonical tensors. Only data for one dimension is dense, 
other dimensions has strides that can not be computed from 
lengths. BLAS matrixes are canonical tensors: they have two 
lengths and one stride.


3. Universal tensors. Each dimension has a stride. Numpy 
ndarrays are universal tensors.


Finite random access range Issues:

1. Main API issue is that full featured random access range 
(RAR) API (with slicing, and all opIndex* operators like `[]*=` 
) is much larger comparing with unbounded random access 
iterator (the same API as pointer).


Won't you have to implement opIndex* operators either way in 
order to use the `a[]` syntax? The main difference I can see 
should be that you'd also have to implement the InputRange 
(front,popFront,empty), ForwardRange (save), and 
BidirectionalRange (back,popBack) members, which if you don't 
need them, is indeed understandably too much?




2. Random access range holds its length. Yes, Phobos has 
infinity ranges (mir hasn't), but the practice is that almost 
any RAR constructed using Phobos has length and you can not 
strip when type is constructed. Anyway, infinity RAR is just a 
pointer/iterator that can move forward but can not move 
backward. Tensors hold their own lengths for each dimensions, 
so range's length is just a useless payload.


I'm not sure what you mean here. Is this still about accessing 
the elements of one tensor? If so, what do Phobos' ranges have to 
do with your tensor type's API?




3. Ranges can not move backward (auto b = a[-4 .. $].). This 
means one can not implement dynamic `reversed`(flip) [1, 2] 
operation for dimensions that have strides (universal or 
canonical tensors). _Dynamic_ means that instead of construct a 
new type with reversed order for a dimension, we just negate 
the corresponding stride and move the cursor.


Right, that is indeed a limitation of the range abstraction 
(though a type could expose both functionality to move backwards 
_and_ the range API) and if you need that, it makes sense not to 
use ranges.


Thanks for the summary of issues :)



Re: C++ / Why Iterators Got It All Wrong

2017-09-03 Thread Moritz Maxeiner via Digitalmars-d
On Sunday, 3 September 2017 at 08:37:36 UTC, Robert M. Münch 
wrote:

On 2017-09-02 21:27:58 +, Moritz Maxeiner said:


Thanks for your post about Rebol, I didn't know it before.


As said, the official Rebol-2 version is a dead-end. Even our 
main product is still based on it :-) 15 years old technology, 
but still working and we know all problemes. So very few 
surprises. And, it's very productive.


There exists a half-done official Rebol-3 version as 
open-source. It was picked up by some and continued. And than 
there is a thing called Red, which uses a lot of ideas but 
compiles. Worth a look too. It's really cool, because it 
compiles native apps for Android without any SDK for example.


Overall, it's worth to spend some time with Rebol. I'm sure you 
won't your time back and can learn a lot. Things to look at: 
VIEW (GUI), PARSE (for parsing) and after this using PARSE to 
create DSL with Rebol. Very cool feature.


I'll put in on my ever growing list of things to check out in 
depth, thanks :p





After reading through the series chaper, though, AFAICT Rebol 
series *are* iterators (begin+end), just with really nice, 
functional (read: LISP) syntax?


There is no difference between code & data in Rebol. And it has 
a very rich set of datatpye, IIRC about 35 native ones. And 
many of them are series, which can be operated in the same way.


Sounds like LISP :)



From my experience, traversing datastructures with a functional 
syntax and concept is really natural to work with. It's mostly 
what you would tell someone to do using english.


I agree, though I was talking about what the abstract data type 
of a "series" is, i.e. what operations is exposes. From my 
observation:

A D input range exposes via empty/front/popFront.
A classic iterator exposes via begin/end.
A Rebol series seems to be a safer form of iterator, as it 
doesn't expose begin/end directly, but exposes restricted 
operations that are defined as manipulating begin/end.


Re: C++ / Why Iterators Got It All Wrong

2017-09-03 Thread Ilya Yaroshenko via Digitalmars-d
On Sunday, 3 September 2017 at 12:35:16 UTC, Moritz Maxeiner 
wrote:
On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko 
wrote:
On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner 
wrote:
On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko 
wrote:
On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch 
wrote:
Maybe of interest: 
https://www.think-cell.com/en/career/talks/iterators/#1


I haven't read everything, so not sure if it worth to take 
a look.


Iterators has no proper alternative when one need to 
implement generic tensor library like Mir Algorithm [1].


Out of curiosity: Could you elaborate on what the issues are 
with using a range based API internally (if it's performance, 
the "why")?


There are three general kinds of dynamic dense tensors. Mir 
implements all of them:


1. Contiguous tensors. Their data is located contiguously in 
memory. Single dense memory chunk. All strides between 
subs-tensors can be computed from lengths.


2. Canonical tensors. Only data for one dimension is dense, 
other dimensions has strides that can not be computed from 
lengths. BLAS matrixes are canonical tensors: they have two 
lengths and one stride.


3. Universal tensors. Each dimension has a stride. Numpy 
ndarrays are universal tensors.


Finite random access range Issues:

1. Main API issue is that full featured random access range 
(RAR) API (with slicing, and all opIndex* operators like 
`[]*=` ) is much larger comparing with unbounded random access 
iterator (the same API as pointer).


Won't you have to implement opIndex* operators either way in 
order to use the `a[]` syntax? The main difference I can see 
should be that you'd also have to implement the InputRange 
(front,popFront,empty), ForwardRange (save), and 
BidirectionalRange (back,popBack) members, which if you don't 
need them, is indeed understandably too much?


front,
popFront,
empty,
save,
back,
popBack,
opIndexOpAssign,
2 x opIndex(.opSlice) for rhs scalars and ranges
2 x opIndexAssign(.opSlice) for rhs scalars and ranges
2 x opIndexOpAssign(.opSlice) for rhs scalars and ranges

Plus, because of range topology slicing may be slower good to add 
this ones:


popFrontN
popFrontExactly,
popBackN
popBackExactly,

and maybe i forgot something ...



2. Random access range holds its length. Yes, Phobos has 
infinity ranges (mir hasn't), but the practice is that almost 
any RAR constructed using Phobos has length and you can not 
strip when type is constructed. Anyway, infinity RAR is just a 
pointer/iterator that can move forward but can not move 
backward. Tensors hold their own lengths for each dimensions, 
so range's length is just a useless payload.


I'm not sure what you mean here. Is this still about accessing 
the elements of one tensor? If so, what do Phobos' ranges have 
to do with your tensor type's API?


ndslice is:
1. Slice structure with a huge amount of features from mir 
algorithm
2. multidimensional RAR: .front!0, front!1, .front!(N - 1) and 
all other generalized RAR primitves including multidimensional 
index [1, 2, 3, ...] and slicing [1 .. 4, 3 .. $].
3. Common one dimensional RAR on top of outer most dimension. So 
one can use Phobos funcitons for Slice structure. For example, a 
matrix will be iterated row-by-row.


But this paragraph was about another issue:

struct BLASMatrixTypeBasedOnRanges
{
   double[] _data; // arrays is the simplest RAR
   size_t rows, cols;
   size_t stride;
}

sizeof(YourBLASMatrixTypeBasedOnRanges) == size_t.sizeof * 5;

in other hand:

sizeof(Slice!(Canonical, [2], double*)) == size_t.sizeof * 4;

Ranges requires for 25% more space for canonical matrixes then 
iterators.




3. Ranges can not move backward (auto b = a[-4 .. $].). This 
means one can not implement dynamic `reversed`(flip) [1, 2] 
operation for dimensions that have strides (universal or 
canonical tensors). _Dynamic_ means that instead of construct 
a new type with reversed order for a dimension, we just negate 
the corresponding stride and move the cursor.


Right, that is indeed a limitation of the range abstraction 
(though a type could expose both functionality to move 
backwards _and_ the range API) and if you need that, it makes 
sense not to use ranges.


Thanks for the summary of issues :)





Re: C++ / Why Iterators Got It All Wrong

2017-09-03 Thread Moritz Maxeiner via Digitalmars-d
On Sunday, 3 September 2017 at 14:19:19 UTC, Ilya Yaroshenko 
wrote:
On Sunday, 3 September 2017 at 12:35:16 UTC, Moritz Maxeiner 
wrote:
On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko 
wrote:
On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner 
wrote:
On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko 
wrote:
On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch 
wrote:
Maybe of interest: 
https://www.think-cell.com/en/career/talks/iterators/#1


I haven't read everything, so not sure if it worth to take 
a look.


Iterators has no proper alternative when one need to 
implement generic tensor library like Mir Algorithm [1].


Out of curiosity: Could you elaborate on what the issues are 
with using a range based API internally (if it's 
performance, the "why")?


There are three general kinds of dynamic dense tensors. Mir 
implements all of them:


1. Contiguous tensors. Their data is located contiguously in 
memory. Single dense memory chunk. All strides between 
subs-tensors can be computed from lengths.


2. Canonical tensors. Only data for one dimension is dense, 
other dimensions has strides that can not be computed from 
lengths. BLAS matrixes are canonical tensors: they have two 
lengths and one stride.


3. Universal tensors. Each dimension has a stride. Numpy 
ndarrays are universal tensors.


Finite random access range Issues:

1. Main API issue is that full featured random access range 
(RAR) API (with slicing, and all opIndex* operators like 
`[]*=` ) is much larger comparing with unbounded random 
access iterator (the same API as pointer).


Won't you have to implement opIndex* operators either way in 
order to use the `a[]` syntax? The main difference I can see 
should be that you'd also have to implement the InputRange 
(front,popFront,empty), ForwardRange (save), and 
BidirectionalRange (back,popBack) members, which if you don't 
need them, is indeed understandably too much?


front,
popFront,
empty,
save,
back,
popBack,
opIndexOpAssign,
2 x opIndex(.opSlice) for rhs scalars and ranges
2 x opIndexAssign(.opSlice) for rhs scalars and ranges
2 x opIndexOpAssign(.opSlice) for rhs scalars and ranges

Plus, because of range topology slicing may be slower good to 
add this ones:


popFrontN
popFrontExactly,
popBackN
popBackExactly,

and maybe i forgot something ...


Didn't know about the *N *Exactly ones, but yeah, it's a lot. 
Though unless you aren't interesting in the `a[]` syntax, you 
can't avoid opIndex*.






2. Random access range holds its length. Yes, Phobos has 
infinity ranges (mir hasn't), but the practice is that almost 
any RAR constructed using Phobos has length and you can not 
strip when type is constructed. Anyway, infinity RAR is just 
a pointer/iterator that can move forward but can not move 
backward. Tensors hold their own lengths for each dimensions, 
so range's length is just a useless payload.


I'm not sure what you mean here. Is this still about accessing 
the elements of one tensor? If so, what do Phobos' ranges have 
to do with your tensor type's API?


ndslice is:
1. Slice structure with a huge amount of features from mir 
algorithm
2. multidimensional RAR: .front!0, front!1, .front!(N - 1) and 
all other generalized RAR primitves including multidimensional 
index [1, 2, 3, ...] and slicing [1 .. 4, 3 .. $].
3. Common one dimensional RAR on top of outer most dimension. 
So one can use Phobos funcitons for Slice structure. For 
example, a matrix will be iterated row-by-row.


But this paragraph was about another issue:

struct BLASMatrixTypeBasedOnRanges
{
   double[] _data; // arrays is the simplest RAR
   size_t rows, cols;
   size_t stride;
}


Now I get what you mean, you were talking about not using the 
dynamic arrays built into the language (which indeed carry their 
length for safety purposes), not about exposing range API here; 
you're right, you don't need to use a dynamic array here, as you 
already have the length encoded another way (it would be 
wasteful), but strictly speaking D's builtin dynamic arrays are 
not ranges (as they are neither aggregate types nor do they have 
the range functions baked into the language). They only become 
ranges when you import the appropriate free functions from Phobos 
(or define some yourself).




sizeof(YourBLASMatrixTypeBasedOnRanges) == size_t.sizeof * 5;

in other hand:

sizeof(Slice!(Canonical, [2], double*)) == size_t.sizeof * 4;

Ranges requires for 25% more space for canonical matrixes then 
iterators.


We do have to differentiate between a container and the API with 
which to iterate over its elements.
D's dynamic arrays (a pointer+length) require more space then 
using only a raw pointer, of course. If that's more than an 
iterator depends on the type of iterator you use. Most common 
iterator implementations I know consist of a begin and an end 
pointer, essentially requiring the same space as D's dynamic 
arrays.
In the case you describe, though, we aren't talking 

Re: C++ / Why Iterators Got It All Wrong

2017-09-03 Thread Ilya via Digitalmars-d
On Sunday, 3 September 2017 at 16:33:04 UTC, Moritz Maxeiner 
wrote:
On Sunday, 3 September 2017 at 14:19:19 UTC, Ilya Yaroshenko 
wrote:
Ranges requires for 25% more space for canonical matrixes then 
iterators.


We do have to differentiate between a container and the API 
with which to iterate over its elements.
D's dynamic arrays (a pointer+length) require more space then 
using only a raw pointer, of course. If that's more than an 
iterator depends on the type of iterator you use. Most common 
iterator implementations I know consist of a begin and an end 
pointer, essentially requiring the same space as D's dynamic 
arrays.
In the case you describe, though, we aren't talking about the 
iteration API, we are talking about what's used internally for 
storage.


Maybe I should call it cursors or generic pointers instead of 
iterators.


Re: C++ / Why Iterators Got It All Wrong

2017-09-04 Thread jmh530 via Digitalmars-d

On Monday, 4 September 2017 at 04:29:36 UTC, Ilya wrote:


Maybe I should call it cursors or generic pointers instead of 
iterators.


dcollections uses a cursor approach that seems different from 
yours

https://github.com/schveiguy/dcollections/blob/master/concepts.txt

Maybe generic pointers makes more sense? Unless I'm mistaken, mir 
iterators are more like pointers with specific operations defined 
for how they are incremented, assigned, etc.


I had been a little confused by the name iterators because I 
don't see any functions in mir that use begin/end. But I don't 
know if that's more about C++'s implementation of iterators or 
more core to the concept.





Re: C++ / Why Iterators Got It All Wrong

2017-09-04 Thread Robert M. Münch via Digitalmars-d

On 2017-09-03 12:46:05 +, Moritz Maxeiner said:


I'll put in on my ever growing list of things to check out in depth, thanks :p


You're welcome. Don't wait to long ;-)

There is no difference between code & data in Rebol. And it has a very 
rich set of datatpye, IIRC about 35 native ones. And many of them are 
series, which can be operated in the same way.


Sounds like LISP :)


Yes, some ideas are the same. But the syntax is way more beautyful and useable.

I agree, though I was talking about what the abstract data type of a 
"series" is, i.e. what operations is exposes. From my observation:

A D input range exposes via empty/front/popFront.
A classic iterator exposes via begin/end.
A Rebol series seems to be a safer form of iterator, as it doesn't 
expose begin/end directly, but exposes restricted operations that are 
defined as manipulating begin/end.


The series has elements and a "current pointer" which can be 
manipulated. Begin & end are just that, begin and end of the series. 
You don't manipulate them directly. Of couse you can change a series by 
adding/removing, cutting etc. which then changes begin/end accordingly.



a: [a b c d e f g]

== [a b c d e f g]

a/5

== e

skip a 5

== [f g]

a

== [a b c d e f g]

b: a/5

== e

type? b

== word!

type? a

== block!

b: skip a 5

== [f g]

type? b

== block!

index? b

== 6

head b

== [a b c d e f g]

You can even treat such a block as fixed-size record and operate on 
this. Very handy.


--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster



Re: C++ / Why Iterators Got It All Wrong

2017-09-04 Thread Mark via Digitalmars-d
On Sunday, 3 September 2017 at 12:46:05 UTC, Moritz Maxeiner 
wrote:
I agree, though I was talking about what the abstract data type 
of a "series" is, i.e. what operations is exposes. From my 
observation:

A D input range exposes via empty/front/popFront.
A classic iterator exposes via begin/end.
A Rebol series seems to be a safer form of iterator, as it 
doesn't expose begin/end directly, but exposes restricted 
operations that are defined as manipulating begin/end.


They are all pretty low-level abstractions. C++ has posited that 
iterators should be the bridge connecting generic data structures 
and generic algorithms. But they are pretty awful at that. They 
make it incredibly easy to destroy one of your structure's 
invariants, or to use it in a way that ignores some of its 
invariants (leading to inferior performance and sometimes 
unnecessarily bulky code). Ironically iterators are frequently 
used in OO code even though they clearly violate the "Tell, don't 
ask" principle, as do D ranges (and presumably also Rebol series, 
though I only skimmed over the documentation).


Re: C++ / Why Iterators Got It All Wrong

2017-09-05 Thread Mark via Digitalmars-d
On Saturday, 2 September 2017 at 20:22:44 UTC, Robert M. Münch 
wrote:
Iterators are not the silver bullet. But IIRC you can specify 
if you want to iterate over a graph BF or DF. If you just need 
to "iterate" over the elements things work pretty good IMO. If 
you want to select a sub-set and then iterate things get much 
complicater anyway.


There isn't really some finite (or small, anyway) choice here. I 
may want to iterate over a binary tree in 
preorder/inorder/postorder DF order. I may want to iterate over 
it in BF order. I may want to handle the left son before the 
right one, or vice versa. This is already 8 iterators. In C++ 
land I also need to provide const and non-const versions of each 
iterator, which brings it up to 16 iterators (I'm not sure what's 
the situation in D - can you have const ranges and still be able 
to call popFront on them?)


From the implementor's side, this is a headache. However, it's 
not that bad in a language in D, since "static if" saves you a 
lot of code in cases like this.


The bigger problem is on the user's side: Which iterator should 
she use? For most applications more than one type of iterator 
will be adequate, but the "right" choice will often be 
implementation-dependent. For instance, if the tree in question 
is a complete binary tree, then it is often implemented as an 
array that contains the tree's elements in BF order. Therefore, 
if some operation on the tree can be done by traversing it BF 
then the user should probably use a BF iterator, for performance 
reasons. But the user doesn't know how the tree is implemented, 
so she can't make that choice...


If the implementor informs the user about the implementation in 
the documentation, then his data structure is a cost-free 
abstraction (assuming that the user always chooses wisely...) but 
also a leaky one. If he doesn't, then the abstraction doesn't 
leak but it is no longer cost-free. He can't have both.


A simpler example is a class interface for a (two-dimensional) 
matrix. You would expect the interface to provide a row-major 
order iterator and a column-major order one. Again, from the 
user's POV, the right choice (performance wise) will be dependent 
on the implementation. Asymptotic complexity is sometimes 
considered an essential detail of a "true" abstraction but this 
doesn't help here since the difference in performance is only 
observed in practice.


This is why I find the presentation in the OP - "Why iterators 
got it all wrong" - a bit iffy. Iterators did get a lot of things 
wrong, but the presentation focuses on relatively mundane issues 
- ugly syntax and somewhat annoying semantics.


Re: C++ / Why Iterators Got It All Wrong

2017-09-06 Thread Ola Fosheim Grøstad via Digitalmars-d

On Monday, 4 September 2017 at 04:29:36 UTC, Ilya wrote:
Maybe I should call it cursors or generic pointers instead of 
iterators.


Well, «C++ iterators» are table pointer abstractions, so you need 
a pair.


An «iterator» would be a possibly heavy object that is used for 
traversing a possibly complex and heterogenous data-structure 
without exposing any notion of size or end a-priori (e.g. before 
it arrives at the end, if there is an end).





Re: C++ / Why Iterators Got It All Wrong

2017-09-06 Thread jmh530 via Digitalmars-d
On Wednesday, 6 September 2017 at 06:57:25 UTC, Ola Fosheim 
Grøstad wrote:


Well, «C++ iterators» are table pointer abstractions, so you 
need a pair.


An «iterator» would be a possibly heavy object that is used for 
traversing a possibly complex and heterogenous data-structure 
without exposing any notion of size or end a-priori (e.g. 
before it arrives at the end, if there is an end).


mir-ndslices are like a generalization of D's slices. Instead of 
pointer and a length, they are an iterator and lengths/strides. 
So the size is exposed.


Re: C++ / Why Iterators Got It All Wrong

2017-09-06 Thread Enamex via Digitalmars-d
On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko 
wrote:
1. Contiguous tensors. Their data is located contiguously in 
memory. Single dense memory chunk. All strides between 
subs-tensors can be computed from lengths.


2. Canonical tensors. Only data for one dimension is dense, 
other dimensions has strides that can not be computed from 
lengths. BLAS matrixes are canonical tensors: they have two 
lengths and one stride.


3. Universal tensors. Each dimension has a stride. Numpy 
ndarrays are universal tensors.


Can you elaborate?


Re: C++ / Why Iterators Got It All Wrong

2017-09-06 Thread jmh530 via Digitalmars-d

On Wednesday, 6 September 2017 at 20:24:05 UTC, Enamex wrote:
On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko 
wrote:
1. Contiguous tensors. Their data is located contiguously in 
memory. Single dense memory chunk. All strides between 
subs-tensors can be computed from lengths.


2. Canonical tensors. Only data for one dimension is dense, 
other dimensions has strides that can not be computed from 
lengths. BLAS matrixes are canonical tensors: they have two 
lengths and one stride.


3. Universal tensors. Each dimension has a stride. Numpy 
ndarrays are universal tensors.


Can you elaborate?


IMO, it's something that still needs to get explained better in 
the documentation. I haven't tried to because I'm not 100% on it.


Below is as best as I have figured things out:

All Slices in mir can have an iterator, lengths, and strides.

The lengths are always N-dimensional and contain information on 
the shape of the Slice. So for instance, if the lengths are [3, 
4], then N=2 and it is a 2-dimensional slice, with 3 rows and 4 
columns.


I left out packs...which are an added complication. Packs can be 
used to make slices of slices. For the above Slice, the default 
would simply be that the packs are [1], which means that there is 
no slice of slicing going on. If the packs were now [1, 1] (the 
sum of packs must equal N), then that is like saying you now have 
a slice of slices. In this case, slice[0] would be a row instead 
of a scalar. This is what allows you to iterate by row or by 
column.


So when you're thinking about contiguous, canonical, and 
universal. The way that lengths and packs work is the same for 
all of them. The difference is in the strides. Contiguous slices 
don't have a strides vector. Canonical slices have a strides 
vector with a length of N-1. Universal slices have a strides 
vector of length N.


So how are the strides used and why do they matter? I'm not sure 
I grok this part 100%, so I'll do my best. Strides tell you how 
much difference there is between the units of each array. So for 
instance, if my slice (call it a) has lengths [2, 3, 4] with 
strides [12, 4, 1], then a[0] is a [3, 4] matrix, which is where 
the 12 comes from. To move the pointer to the start of the next 
[3, 4] matrix (a[1]), requires moving 12 of whatever the type is. 
This would be a universal slice because it has N=3 lengths and 
strides. So if you call a._strides, then it would give you [12, 
4, 1]. If a were canonical, calling _strides would give you [12, 
4] because _strides for canonical slices have length N-1. Now if 
a were contiguous instead of universal and you call _strides on 
it, then it would give you [], because contiguous slices have no 
strides.


The memory footprint of the slice appears different for these 
with a and a[0] of universal being larger than canonical and 
contiguous. This largely reflects the storage of the strides data.


Similarly, a[0] has _strides [4, 1] for universal, [4] for 
canonical, and [] for contiguous. Mir is written in such a way 
that a[0] the same regardless of the SliceKind. For the most 
part, this means that it isn't really obvious that there is a 
difference between them. It matters in some underlying functions, 
but I haven't needed to do much other than sometimes convert a 
contiguous slice to universal (though it's not always clear to me 
why, I just do it).


Re: C++ / Why Iterators Got It All Wrong

2017-09-07 Thread Ilya Yaroshenko via Digitalmars-d

On Wednesday, 6 September 2017 at 21:44:01 UTC, jmh530 wrote:

On Wednesday, 6 September 2017 at 20:24:05 UTC, Enamex wrote:
Similarly, a[0] has _strides [4, 1] for universal, [4] for 
canonical, and [] for contiguous. Mir is written in such a way 
that a[0] the same regardless of the SliceKind. For the most 
part, this means that it isn't really obvious that there is a 
difference between them. It matters in some underlying 
functions, but I haven't needed to do much other than sometimes 
convert a contiguous slice to universal (though it's not always 
clear to me why, I just do it).


For example, lets takes `transposed` function. It does not 
transpose the date. Instead, it swap dimensions.
Assume you have a canonical matrix with _lengths = [3, 4]. So its 
strides are [4]. Now we want to swap dimensions, but to do it we 
need to swap both lengths and strides. So first we need to 
convert a slice to universal, so it will have both strides we 
want to swap: [4, 1]. Transposed slice will have _lengths = [4, 
3] and _strides = [1, 4].


Best Regards,
Ilya



Re: C++ / Why Iterators Got It All Wrong

2017-09-07 Thread jmh530 via Digitalmars-d
On Thursday, 7 September 2017 at 09:40:40 UTC, Ilya Yaroshenko 
wrote:


For example, lets takes `transposed` function. It does not 
transpose the date. Instead, it swap dimensions.
Assume you have a canonical matrix with _lengths = [3, 4]. So 
its strides are [4]. Now we want to swap dimensions, but to do 
it we need to swap both lengths and strides. So first we need 
to convert a slice to universal, so it will have both strides 
we want to swap: [4, 1]. Transposed slice will have _lengths = 
[4, 3] and _strides = [1, 4].


Best Regards,
Ilya


I think what's missing from the documentation is a clear 
explanation of how the strides determine how the iterator moves. 
Even something like below (assuming I have the math right) would 
be an improvement, though I'm sure there is a clearer way to 
express the concept.


auto x = iota(3, 4).universal;
assert(x.strides == [4, 1]);
assert(x[2, 3] == 6); //(2 - 1) * 4 + (3 - 1) * 1

auto y = x.transposed;
assert(y.strides == [1, 4]);
assert(y[2, 3] == 9); //(2 - 1) * 1 + (3 - 1) * 4


Re: C++ / Why Iterators Got It All Wrong

2017-09-07 Thread jmh530 via Digitalmars-d

On Thursday, 7 September 2017 at 12:04:12 UTC, jmh530 wrote:


I think what's missing from the documentation is a clear 
explanation of how the strides determine how the iterator 
moves. Even something like below (assuming I have the math 
right) would be an improvement, though I'm sure there is a 
clearer way to express the concept.


auto x = iota(3, 4).universal;
assert(x.strides == [4, 1]);
assert(x[2, 3] == 6); //(2 - 1) * 4 + (3 - 1) * 1

auto y = x.transposed;
assert(y.strides == [1, 4]);
assert(y[2, 3] == 9); //(2 - 1) * 1 + (3 - 1) * 4


Hmm, maybe I can express this better as something like

auto data = iota(12);

auto x = data.sliced(2, 3).universal;
assert(x.strides == [4, 1]);
assert(x[2, 3] == data[(2 - 1) * 4 + (3 - 1) * 1]);

auto y = x.transposed;
assert(y.strides == [1, 4]);
assert(y[2, 3] == data[(2 - 1) * 1 + (3 - 1) * 4]);


Re: C++ / Why Iterators Got It All Wrong

2017-09-07 Thread jmh530 via Digitalmars-d

On Thursday, 7 September 2017 at 12:27:19 UTC, jmh530 wrote:


auto x = data.sliced(2, 3).universal;


Err, (3, 4) not (2, 3)


Re: C++ / Why Iterators Got It All Wrong

2017-09-07 Thread jmh530 via Digitalmars-d

On Thursday, 7 September 2017 at 12:28:00 UTC, jmh530 wrote:

On Thursday, 7 September 2017 at 12:27:19 UTC, jmh530 wrote:


auto x = data.sliced(2, 3).universal;


Err, (3, 4) not (2, 3)


All kinds of screwed up. This is what I get for not testing 
things before I post them.


unittest {
auto data = iota(12);

auto x = data.sliced(3, 4).universal;
assert(x.strides == [4, 1]);
assert(x[1, 2] == data[1 * 4 + 2 * 1]);

auto y = x.transposed;
assert(y.strides == [1, 4]);
assert(y[1, 2] == data[1 * 1 + 2 * 4]);
}


Re: C++ / Why Iterators Got It All Wrong

2017-09-10 Thread Ilya Yaroshenko via Digitalmars-d

On Thursday, 7 September 2017 at 20:46:22 UTC, jmh530 wrote:

On Thursday, 7 September 2017 at 12:28:00 UTC, jmh530 wrote:

On Thursday, 7 September 2017 at 12:27:19 UTC, jmh530 wrote:


auto x = data.sliced(2, 3).universal;


Err, (3, 4) not (2, 3)


All kinds of screwed up. This is what I get for not testing 
things before I post them.


unittest {
auto data = iota(12);

auto x = data.sliced(3, 4).universal;
assert(x.strides == [4, 1]);
assert(x[1, 2] == data[1 * 4 + 2 * 1]);

auto y = x.transposed;
assert(y.strides == [1, 4]);
assert(y[1, 2] == data[1 * 1 + 2 * 4]);
}


Another small difference is slicing:
For example, for contiguous matrix m:
1. m[a .. b] is contiguous
2. m[i]  is contiguous
3. m[a .. b, i]  is universal (because there are no 1D 
canonical slices)

4. m[a .. b, c .. d] is canonical

BTW, could you please update the docs or may be write a small 
article for Mir blog?

I will be happy to answer your questions if any.

Best Regards,
Ilya


Re: C++ / Why Iterators Got It All Wrong

2017-09-11 Thread jmh530 via Digitalmars-d
On Monday, 11 September 2017 at 05:41:37 UTC, Ilya Yaroshenko 
wrote:


Another small difference is slicing:
For example, for contiguous matrix m:
1. m[a .. b] is contiguous
2. m[i]  is contiguous
3. m[a .. b, i]  is universal (because there are no 1D 
canonical slices)

4. m[a .. b, c .. d] is canonical



Was not aware of this.

BTW, could you please update the docs or may be write a small 
article for Mir blog?


I'm happy to do some additional work on the docs, but I was 
waiting until I had a better understanding of things. I've also 
been working on some other things and there are only so many 
hours in the day. I want to add cholesky to mir-lapack/lubeck.



I will be happy to answer your questions if any.


I think I had asked a question on gitter, but I don't think I had 
heard back. Not sure how often you check that.