Re: Ranges

2011-03-12 Thread Jonathan M Davis
On Saturday 12 March 2011 14:02:00 Jonas Drewsen wrote:
> Hi,
> 
> I'm working a bit with ranges atm. but there are definitely some
> things that are not clear to me yet. Can anyone tell me why the char
> arrays cannot be copied but the int arrays can?
> 
> import std.stdio;
> import std.algorithm;
> 
> void main(string[] args) {
> 
>     // This works
>     int[] a1 = [1,2,3,4];
>     int[] a2 = [5,6,7,8];
>     copy(a1, a2);
> 
>     // This does not!
>     char[] a3 = ['1','2','3','4'];
>     char[] a4 = ['5','6','7','8'];
>     copy(a3, a4);
> 
> }
> 
> Error message:
> 
> test2.d(13): Error: template std.algorithm.copy(Range1,Range2) if
> (isInputRange!(Range1) && isOutputRange!(Range2,ElementType!(Range1)))
> does not match any function template declaration
> 
> test2.d(13): Error: template std.algorithm.copy(Range1,Range2) if
> (isInputRange!(Range1) && isOutputRange!(Range2,ElementType!(Range1)))
> cannot deduce template function from argument types !()(char[],char[])

Character arrays / strings are not exactly normal. And there's a very good
reason for it: Unicode.

In Unicode, a character is generally a single code point (there are also
graphemes, which involve combining code points to add accents and superscripts
and whatnot to create a single character, but we'll ignore that in this
discussion - it's complicated enough as it is). Depending on the encoding, that
code point may be made up of one - or more - code units. UTF-8 uses 8-bit code
units. UTF-16 uses 16-bit code units. And UTF-32 uses 32-bit code units. char
is a UTF-8 code unit. wchar is a UTF-16 code unit. dchar is a UTF-32 code unit.
UTF-32 is the _only_ one of those three which _always_ has one code unit per
code point.
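
To make the code unit / code point distinction concrete, here is a minimal sketch, assuming a current D compiler. The sample characters are arbitrary picks for illustration. Note that .length on a D string counts code units, not code points.

```d
import std.stdio : writeln;

void main()
{
    // U+00E9 (e with acute accent): one code unit in UTF-16 and UTF-32,
    // but two code units in UTF-8.
    string  s8  = "\u00E9";
    wstring s16 = "\u00E9"w;
    dstring s32 = "\u00E9"d;
    assert(s8.length == 2 && s16.length == 1 && s32.length == 1);

    // U+1D11E (musical G clef) lies outside the Basic Multilingual Plane:
    // 4 code units in UTF-8, 2 in UTF-16 (a surrogate pair), 1 in UTF-32.
    assert("\U0001D11E".length == 4);
    assert("\U0001D11E"w.length == 2);
    assert("\U0001D11E"d.length == 1);

    writeln("UTF-8: ", s8.length, ", UTF-16: ", s16.length,
            ", UTF-32: ", s32.length);
}
```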

With an array of integers you can index it and slice it and be sure that
everything that you're doing is valid. If you look at a single element, you
know that it's a valid int. If you slice it, you know that every int in there
is valid. If you're dealing with a dstring or dchar[], then the same still
holds.

A dstring or dchar[] is an array of UTF-32 code units. Every code point is a
single code unit, so every element in the array is a valid code point. You can
take an arbitrary element in that array and know that it's a valid code point.
You can slice it wherever you want and you still have a valid dstring or
dchar[]. The same does _not_ hold for char[] and wchar[].

char[] and wchar[] are arrays of UTF-8 and UTF-16 code units respectively. In
both of those encodings, multiple code units can be required to create a single
code point. So, for instance, a code point in UTF-8 could take 4 code units.
That means that _4_ elements of that char[] make up a _single_ code point. You'd
need _all_ 4 of those elements to create a single, valid character. So, you
_can't_ just take an arbitrary element in a char[] or wchar[] and expect it to
be valid. You _can't_ just slice it anywhere. The resulting array stands a good
chance of being invalid. You have to slice on code point boundaries - otherwise
you could slice characters in half and end up with an invalid string. So, unlike
other arrays, it just doesn't work to treat char[] and wchar[] as random access
ranges of their element type. What the programmer cares about is characters -
dchars - not chars or wchars.
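
The slicing hazard can be reproduced with a short sketch like the following. It assumes current Phobos, where std.utf.validate throws a UTFException on malformed input and std.utf.stride reports how many code units the code point at a given index occupies. The sample string is made up.

```d
import std.exception : assertThrown;
import std.utf : UTFException, stride, validate;

void main()
{
    // 'a', U+00E9 (two UTF-8 code units), 'b': four code units in total.
    string s = "a\u00E9b";
    assert(s.length == 4);

    // Slicing between the two code units of U+00E9 leaves an incomplete
    // sequence, so the slice is not valid UTF-8.
    assertThrown!UTFException(validate(s[0 .. 2]));

    // stride tells us where the next code point boundary is.
    assert(stride(s, 0) == 1); // 'a'
    assert(stride(s, 1) == 2); // U+00E9

    // Slicing on a code point boundary gives a valid string.
    auto ok = s[0 .. 1 + stride(s, 1)];
    validate(ok);
    assert(ok == "a\u00E9");
}
```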

So, the way this is handled is that char[], wchar[], and dchar[] are all
treated as ranges of dchar. In the case of dchar[], this is nothing special.
You can index it and slice it as normal. So, it is a random access range.
However, in the case of char[] and wchar[], that means that when you're
iterating over them, you're not dealing with a single element of the array at a
time. front returns a dchar, and popFront() pops off however many elements made
up front. It's like with foreach. If you iterate a char[] with auto or char,
then each individual element is given:

foreach(c; myStr) {}

But if you iterate over with dchar, then each code point is given as a dchar:

foreach(dchar c; myStr) {}

If you were to try and iterate over a char[] by char, then you would be looking
at code units rather than code points, which is _rarely_ what you want. If
you're dealing with anything other than pure ASCII, you _will_ have bugs if you
do that. You're supposed to use dchar with foreach and character arrays. That
way, each value you process is a valid character. Ranges do the same, only you
don't give them an iteration type, so they're _always_ iterating over dchar.
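
A small sketch of the difference between the two foreach forms, assuming a current compiler. The helper names countCodeUnits and countCodePoints are made up for this example. std.range.walkLength goes through the range interface, so it counts the same way the dchar loop does.

```d
import std.range : walkLength;

// Hypothetical helper: iterates by char, i.e. by UTF-8 code unit.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

// Hypothetical helper: iterates by dchar, i.e. by decoded code point.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units

    assert(countCodeUnits(s) == 6);
    assert(countCodePoints(s) == 5);

    // The range primitives decode as well, so walkLength agrees with the
    // dchar loop rather than with s.length.
    assert(s.walkLength == 5);
    assert(s.length == 6);
}
```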

So, when you're using a range of char[] or wchar[], you're really using a range
of dchar. These ranges are bi-directional. They can't be sliced, and they can't
be indexed (since doing so would likely be invalid). This generally works very
well. It's exactly what you want in most cases. The problem is that that means
that the range that you're iterating over is effectively of a different type
than the original char[] or wchar[].
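
These properties can be checked at compile time with the traits in std.range. The sketch below assumes a Phobos version that still auto-decodes narrow strings, which is the behaviour described in this thread.

```d
import std.range;

// As ranges, char[] and wchar[] have dchar elements.
static assert(is(ElementType!(char[]) == dchar));
static assert(is(ElementType!(wchar[]) == dchar));
static assert(is(ElementType!(int[]) == int));

// They are bidirectional, but not random access, and as ranges they have
// neither length nor slicing.
static assert(isBidirectionalRange!(char[]));
static assert(!isRandomAccessRange!(char[]));
static assert(!hasLength!(char[]));
static assert(!hasSlicing!(char[]));

// dchar[] behaves like any other array.
static assert(isRandomAccessRange!(dchar[]));

void main()
{
    char[] s = "\u00E9x".dup;   // 2 code units for U+00E9, 1 for 'x'
    assert(s.length == 3);      // the array still has 3 elements

    assert(s.front == '\u00E9'); // front decodes a whole code point
    s.popFront();                // and popFront drops both of its code units
    assert(s.length == 1);
    assert(s.front == 'x');
}
```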

You can't just take two ranges of dchar of the same

Re: Ranges

2011-03-12 Thread Bekenn

On 3/12/2011 2:02 PM, Jonas Drewsen wrote:

Error message:

test2.d(13): Error: template std.algorithm.copy(Range1,Range2) if
(isInputRange!(Range1) && isOutputRange!(Range2,ElementType!(Range1)))
does not match any function template declaration

test2.d(13): Error: template std.algorithm.copy(Range1,Range2) if
(isInputRange!(Range1) && isOutputRange!(Range2,ElementType!(Range1)))
cannot deduce template function from argument types !()(char[],char[])


I haven't checked (could be completely off here), but I don't think that 
char[] counts as an input range; you would normally want to use dchar 
instead.


Re: Ranges

2011-03-12 Thread Bekenn

Or, better yet, just read Jonathan's post.


Re: Ranges

2011-03-12 Thread Jonathan M Davis
On Saturday 12 March 2011 16:05:37 Jonathan M Davis wrote:
> You could open an
> enhancement request for copy to treat char[] and wchar[] as arrays if
> _both_ of the arguments are of the same type.

Actually, on reflection, I'd have to say that there's not much point to that.
If you really want to copy one array to another (rather than a range), just use
the array copy syntax:

void main()
{
    auto i = [1, 2, 3, 4];
    auto j = [3, 4, 5, 6];
    assert(i == [1, 2, 3, 4]);
    assert(j == [3, 4, 5, 6]);

    i[] = j[];

    assert(i == [3, 4, 5, 6]);
    assert(j == [3, 4, 5, 6]);
}

copy is of benefit because it works on generic ranges, not because it copies
arrays (arrays already allow you to do that quite nicely), so if all you're
looking at copying is arrays, then just use the array copy syntax.
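
Applied to the char[] arrays from the original question, that looks roughly like the sketch below. The second half is an alternative that is an assumption on my part rather than something suggested in the thread: std.string.representation (present in current Phobos) views a char[] as ubyte[], which copy treats as an ordinary array.

```d
import std.algorithm : copy;
import std.string : representation;

void main()
{
    char[] a3 = ['1', '2', '3', '4'];
    char[] a4 = ['5', '6', '7', '8'];

    // Array copy syntax: element-wise copy of code units.
    // Both slices must have the same length.
    a4[] = a3[];
    assert(a4 == "1234");

    // Alternative: go through the raw code units. representation gives a
    // ubyte[] view of the same memory, so copy sees plain arrays.
    char[] a5 = ['5', '6', '7', '8'];
    copy(a3.representation, a5.representation);
    assert(a5 == "1234");
}
```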

- Jonathan M Davis


Re: Ranges

2011-03-12 Thread Jonathan M Davis
On Saturday 12 March 2011 16:11:20 Bekenn wrote:
> On 3/12/2011 2:02 PM, Jonas Drewsen wrote:
> > Error message:
> > 
> > test2.d(13): Error: template std.algorithm.copy(Range1,Range2) if
> > (isInputRange!(Range1) && isOutputRange!(Range2,ElementType!(Range1)))
> > does not match any function template declaration
> > 
> > test2.d(13): Error: template std.algorithm.copy(Range1,Range2) if
> > (isInputRange!(Range1) && isOutputRange!(Range2,ElementType!(Range1)))
> > cannot deduce template function from argument types !()(char[],char[])
> 
> I haven't checked (could be completely off here), but I don't think that
> char[] counts as an input range; you would normally want to use dchar
> instead.

char[] _does_ count as an input range (of dchar). It just doesn't count as an
_output_ range (since it doesn't really hold dchar).
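
The input side of this can be checked directly, and a dchar[] can serve as the destination instead. A minimal sketch, assuming current Phobos with auto-decoding; the sample text is arbitrary.

```d
import std.array : array;
import std.range : ElementType, isInputRange, isOutputRange, put;

// char[] is an input range whose elements are dchar.
static assert(isInputRange!(char[]));
static assert(is(ElementType!(char[]) == dchar));

// dchar[] really does hold dchar, so it works as an output range of dchar.
static assert(isOutputRange!(dchar[], dchar));

void main()
{
    char[] src = "h\u00E9".dup;     // 3 code units, 2 code points

    // array() consumes src as a range, so the result is a dchar[].
    auto decoded = array(src);
    static assert(is(typeof(decoded) == dchar[]));
    assert(decoded.length == 2);

    // Writing the decoded code points into a dchar[] with put.
    auto dst = new dchar[](2);
    auto rest = dst;                // put advances the slice it writes to
    foreach (dchar c; src)
        put(rest, c);
    assert(dst == "h\u00E9"d);
    assert(rest.length == 0);
}
```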

- Jonathan M Davis


Re: Ranges

2011-03-12 Thread Andrej Mitrovic
What Jonathan said really needs to be put up on the D website, maybe
under the articles section. Heck, I'd just put a link to that recent
UTF thread on the website, it's really informative (the one on UTF and
meaning of glyphs, etc). And UTF will only get more important, just
like multicore.

Speaking of which, a description on ranges should be put up there as
well. There's that article Andrei once wrote, but we should put it on
the D site and discuss D's implementation of ranges in more detail.
And by 'we' I mean someone who's well versed in ranges. :p


Re: Ranges

2011-03-12 Thread Jonas Drewsen

Hi Jonathan,

   Thank you very much for your in-depth answer!

   It should indeed go into a FAQ somewhere, I think. I did know about the
code point/unit stuff but had no idea that ranges of char are handled
using dchar internally. This makes sense but is an easy pitfall for
newcomers trying to use std.{algorithm,array,range} for char[].


Thanks
Jonas


Re: Ranges

2011-03-13 Thread spir

On 03/13/2011 01:05 AM, Jonathan M Davis wrote:

If you were to try and iterate over a char[] by char, then you would be looking
at code units rather than code points which is _rarely_ what you want. If you're
dealing with anything other than pure ASCII, you _will_ have bugs if you do
that. You're supposed to use dchar with foreach and character arrays. That way,
each value you process is a valid character. Ranges do the same, only you don't
give them an iteration type, so they're _always_ iterating over dchar.


Side-note: you can be sure the source is pure ASCII if, and only if, it is
mechanically produced. (As soon as an end-user touches it, it may hold
anything, since OSes and apps offer users means to introduce characters which
are not on their keyboards.)
This can also easily be checked in UTF-8 (which has been designed for that):
all ASCII chars are coded using the same code as in ASCII, thus all codes
should be < 128.
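
The check described above fits in a few lines. This is only a sketch; isPureAscii is a name made up for the example, not a Phobos function.

```d
// Hypothetical helper: true if every UTF-8 code unit is below 128,
// i.e. the text contains only ASCII characters.
bool isPureAscii(const(char)[] s)
{
    foreach (char c; s)      // iterate by code unit on purpose
    {
        if (c >= 0x80)
            return false;
    }
    return true;
}

void main()
{
    assert(isPureAscii("hello"));
    assert(isPureAscii(""));
    assert(!isPureAscii("h\u00E9llo")); // U+00E9 is encoded as two bytes >= 0x80
}
```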


Denis
--
_
vita es estrany
spir.wikidot.com



Re: Ranges

2011-03-18 Thread Peter Alexander

On 13/03/11 12:05 AM, Jonathan M Davis wrote:

So, when you're using a range of char[] or wchar[], you're really using a range
of dchar. These ranges are bi-directional. They can't be sliced, and they can't
be indexed (since doing so would likely be invalid). This generally works very
well. It's exactly what you want in most cases. The problem is that that means
that the range that you're iterating over is effectively of a different type 
than
the original char[] or wchar[].


This has to be the worst language design decision /ever/.

You can't just mess around with fundamental principles like "the first 
element in an array of T has type T" for the sake of a minor 
convenience. How are we supposed to do generic programming if common 
sense reasoning about types doesn't hold?


This is just std::vector<bool> from C++ all over again. Can we not learn
from mistakes of the past?


Re: Ranges

2011-03-18 Thread Jonathan M Davis
On Friday 18 March 2011 02:29:51 Peter Alexander wrote:
> On 13/03/11 12:05 AM, Jonathan M Davis wrote:
> > So, when you're using a range of char[] or wchar[], you're really using a
> > range of dchar. These ranges are bi-directional. They can't be sliced,
> > and they can't be indexed (since doing so would likely be invalid). This
> > generally works very well. It's exactly what you want in most cases. The
> > problem is that that means that the range that you're iterating over is
> > effectively of a different type than the original char[] or wchar[].
> 
> This has to be the worst language design decision /ever/.
> 
> You can't just mess around with fundamental principles like "the first
> element in an array of T has type T" for the sake of a minor
> convenience. How are we supposed to do generic programming if common
> sense reasoning about types doesn't hold?
> 
> This is just std::vector<bool> from C++ all over again. Can we not learn
> from mistakes of the past?

It really isn't a problem for the most part. You just need to understand that 
when using range-based functions, char[] and wchar[] are effectively _not_ 
arrays. They are ranges of dchar. And given the fact that it really wouldn't 
make sense to treat them as arrays in this case anyway (due to the fact that a 
single element is a code unit but _not_ a code point), the current solution 
makes a lot of sense. Generally, you just can't treat char[] and wchar[] as 
arrays when you're dealing with characters/code points rather than code units. 
So, yes, it's a bit weird, but it makes a lot of sense given how Unicode is
designed. And it works.

If you really don't want to deal with it, then just use dchar[] and dstring 
everywhere.
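
As a rough illustration of that option (a sketch, with an arbitrary sample string): std.conv.to converts a string to a dstring, after which indexing and slicing work per code point.

```d
import std.conv : to;
import std.range : isRandomAccessRange;

// dchar[] and dstring are ordinary random access ranges.
static assert(isRandomAccessRange!(dchar[]));
static assert(isRandomAccessRange!dstring);

void main()
{
    // Decode the UTF-8 string once, up front.
    dstring d = to!dstring("h\u00E9llo");

    assert(d.length == 5);            // one element per code point
    assert(d[1] == '\u00E9');         // indexing yields a whole code point
    assert(d[1 .. 3] == "\u00E9l"d);  // any slice is still valid UTF-32
}
```

The cost is memory: every code point takes four bytes, even for pure ASCII text.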

- Jonathan M Davis


Re: Ranges

2011-03-18 Thread spir

On 03/18/2011 10:29 AM, Peter Alexander wrote:

On 13/03/11 12:05 AM, Jonathan M Davis wrote:

So, when you're using a range of char[] or wchar[], you're really using a range
of dchar. These ranges are bi-directional. They can't be sliced, and they can't
be indexed (since doing so would likely be invalid). This generally works very
well. It's exactly what you want in most cases. The problem is that that means
that the range that you're iterating over is effectively of a different type
than
the original char[] or wchar[].


This has to be the worst language design decision /ever/.

You can't just mess around with fundamental principles like "the first element
in an array of T has type T" for the sake of a minor convenience. How are we
supposed to do generic programming if common sense reasoning about types
doesn't hold?

This is just std::vector<bool> from C++ all over again. Can we not learn from
mistakes of the past?


I partially agree, but. Compare with a simple ASCII text: you could iterate
over its chars (=codes=bytes), words, lines... Or according to specific schemes
for your app (e.g. reverse order, every number in it, every word at start of
line...). A piece of text is not only a stream of codes.


The problem is there is no good decision, in the case of char[] or wchar[]. We
would have to choose a kind of "natural" sense of what it means to iterate
over a text, but there is no such thing. What does it *mean*? What is the
natural unit of a text?
Bytes or words are code units, which mean nothing. Code points (<-> dchars) are
not guaranteed to mean anything either (as shown by past discussion: one code
point may be the base 'a', the following one the combining '^', both in "â").
Code points do not represent "characters" in the common sense. So, it is very
clear that implicitly iterating over dchars is a wrong choice. But what else?
I would rather get rid of wchar and dchar and deal with a plain stream of bytes
supposed to represent UTF-8. Until we get a good solution to operate at the
level of "human" characters.


Denis
--
_
vita es estrany
spir.wikidot.com



Re: Ranges

2011-03-18 Thread Jonathan M Davis
On Friday, March 18, 2011 03:32:35 spir wrote:
> On 03/18/2011 10:29 AM, Peter Alexander wrote:
> > On 13/03/11 12:05 AM, Jonathan M Davis wrote:
> >> So, when you're using a range of char[] or wchar[], you're really using
> >> a range of dchar. These ranges are bi-directional. They can't be
> >> sliced, and they can't be indexed (since doing so would likely be
> >> invalid). This generally works very well. It's exactly what you want in
> >> most cases. The problem is that that means that the range that you're
> >> iterating over is effectively of a different type than
> >> the original char[] or wchar[].
> > 
> > This has to be the worst language design decision /ever/.
> > 
> > You can't just mess around with fundamental principles like "the first
> > element in an array of T has type T" for the sake of a minor
> > convenience. How are we supposed to do generic programming if common
> > sense reasoning about types doesn't hold?
> > 
> > This is just std::vector<bool> from C++ all over again. Can we not learn
> > from mistakes of the past?
> 
> I partially agree, but. Compare with a simple ascii text: you could iterate
> over it chars (=codes=bytes), words, lines... Or according to specific
> schemes for your app (eg reverse order, every number in it, every word at
> start of line...). A piece of is not only a stream of codes.
> 
> The problem is there is no good decision, in the case of char[] or wchar[].
> We should have to choose a kind of "natural" sense of what it means to
> iterate over a text, but there no such thing. What does it *mean*? What is
> the natural unit of a text?
> Bytes or words are code units which mean nothing. Code units (<-> dchars)
> are not guaranteed to mean anything neither (as shown by past discussion:
> a code unit may be the base 'a', the following one be the composite '^',
> both in "â"). Code unit do not represent "characters" in the common sense.
> So, it is very clear that implicitely iterating over dchars is a wrong
> choice. But what else? I would rather get rid of wchar and dchar and deal
> with plain stream of bytes supposed to represent utf8. Until we get a good
> solution to operate at the level of "human" characters.

Iterating over dchars works in _most_ cases. Iterating over chars only works
for pure ASCII. The additional overhead for dealing with graphemes instead of
code points is almost certainly prohibitive, it _usually_ isn't necessary, and
we don't have an actual grapheme solution yet. So, treating char[] and wchar[]
as if their elements were valid on their own is _not_ going to work. Treating
them along with dchar[] as ranges of dchar _mostly_ works. We definitely should
have a way to handle them as ranges of graphemes for those who need to, but the
code point vs grapheme issue is nowhere near as critical as the code unit vs
code point issue.

I don't really want to get into the whole Unicode discussion again. It has been
discussed quite a bit on the D list already. There is no perfect solution. The
current solution _mostly_ works, and, for the most part IMHO, is the correct
solution. We _do_ need a full-on grapheme handling solution, but a lot of stuff
doesn't need that and the overhead for dealing with it would be prohibitive.
The main problem with using code points rather than graphemes is the lack of
normalization, and a _lot_ of string code can get by just fine without that.

So, we have a really good 90% solution and we still need a 100% solution, but 
using the 100% all of the time would almost certainly not be acceptable due to 
performance issues, and doing stuff by code unit instead of code point would be 
_really_ bad. So, what we have is good and will likely stay as is. We just need 
a proper grapheme solution for those who need it.
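
For reference, the code point vs grapheme gap and the normalization problem can be shown with the 'a' plus combining '^' example from earlier in the thread. The sketch below relies on std.uni.byGrapheme and std.uni.normalize from current Phobos, which go beyond what this message describes as available at the time.

```d
import std.range : walkLength;
import std.uni : NFC, byGrapheme, normalize;

void main()
{
    // 'a' followed by U+0302 COMBINING CIRCUMFLEX ACCENT, versus the
    // precomposed U+00E2. Both display as the same character.
    string decomposed  = "a\u0302";
    string precomposed = "\u00E2";

    // By code point, the decomposed form has two elements...
    assert(decomposed.walkLength == 2);
    // ...but it is a single grapheme.
    assert(decomposed.byGrapheme.walkLength == 1);
    assert(precomposed.walkLength == 1);

    // Without normalization the two spellings do not compare equal.
    assert(decomposed != precomposed);
    // After normalizing to NFC they do.
    assert(normalize!NFC(decomposed) == precomposed);
}
```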

- Jonathan M Davis


P.S. Unicode is just plain ugly :(


Re: Ranges

2011-03-18 Thread Peter Alexander


I must be missing something, because the solution seems obvious to me:

char[], wchar[], and dchar[] should be simple arrays like int[] with no 
unicode semantics.


string, wstring, and dstring should not be aliases to arrays, but 
instead should be separate types that behave the way char[], wchar[], 
and dchar[] do currently.


Is there any problem with this approach?
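
For comparison, current Phobos lets you opt in to the "plain array of code units" view on a per-call basis, without new types. This is not the proposal above, just a sketch of an existing facility: std.utf.byCodeUnit wraps a string in a random access range of its code units.

```d
import std.range : ElementType, hasLength, isRandomAccessRange;
import std.utf : byCodeUnit;

// By default, a string is a range of dchar.
static assert(is(ElementType!string == dchar));

void main()
{
    string s = "h\u00E9";            // 3 code units, 2 code points
    auto units = s.byCodeUnit;       // no decoding: elements are code units

    static assert(isRandomAccessRange!(typeof(units)));
    static assert(hasLength!(typeof(units)));
    static assert(is(ElementType!(typeof(units)) : char));

    assert(units.length == 3);
    assert(units[0] == 'h');
}
```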


Re: Ranges

2011-03-18 Thread Jonathan M Davis
On Friday, March 18, 2011 14:08:48 Peter Alexander wrote:
> On 18/03/11 5:53 PM, Jonathan M Davis wrote:
> > On Friday, March 18, 2011 03:32:35 spir wrote:
> >> On 03/18/2011 10:29 AM, Peter Alexander wrote:
> >>> On 13/03/11 12:05 AM, Jonathan M Davis wrote:
> >>>> So, when you're using a range of char[] or wchar[], you're really
> >>>> using a range of dchar. These ranges are bi-directional. They can't
> >>>> be sliced, and they can't be indexed (since doing so would likely be
> >>>> invalid). This generally works very well. It's exactly what you want
> >>>> in most cases. The problem is that that means that the range that
> >>>> you're iterating over is effectively of a different type than
> >>>> the original char[] or wchar[].
> >>> 
> >>> This has to be the worst language design decision /ever/.
> >>> 
> >>> You can't just mess around with fundamental principles like "the first
> >>> element in an array of T has type T" for the sake of a minor
> >>> convenience. How are we supposed to do generic programming if common
> >>> sense reasoning about types doesn't hold?
> >>> 
> >>> This is just std::vector<bool> from C++ all over again. Can we not
> >>> learn from mistakes of the past?
> >> 
> >> I partially agree, but. Compare with a simple ascii text: you could
> >> iterate over it chars (=codes=bytes), words, lines... Or according to
> >> specific schemes for your app (eg reverse order, every number in it,
> >> every word at start of line...). A piece of is not only a stream of
> >> codes.
> >> 
> >> The problem is there is no good decision, in the case of char[] or
> >> wchar[]. We should have to choose a kind of "natural" sense of what it
> >> means to iterate over a text, but there no such thing. What does it
> >> *mean*? What is the natural unit of a text?
> >> Bytes or words are code units which mean nothing. Code units (<-> 
> >> dchars) are not guaranteed to mean anything neither (as shown by past
> >> discussion: a code unit may be the base 'a', the following one be the
> >> composite '^', both in "â"). Code unit do not represent "characters" in
> >> the common sense. So, it is very clear that implicitely iterating over
> >> dchars is a wrong choice. But what else? I would rather get rid of
> >> wchar and dchar and deal with plain stream of bytes supposed to
> >> represent utf8. Until we get a good solution to operate at the level of
> >> "human" characters.
> > 
> > Iterating over dchars works in _most_ cases. Iterating over chars only
> > works for pure ASCII. The additional overhead for dealing with graphemes
> > instead of code points is almost certainly prohibitive, it _usually_
> > isn't necessary, and we don't have an actualy grapheme solution yet. So,
> > treating char[] and wchar[] as if their elements were valid on their own
> > is _not_ going to work. Treating them along with dchar[] as ranges of
> > dchar _mostly_ works. We definitely should have a way to handle them as
> > ranges of graphemes for those who need to, but the code point vs
> > grapheme issue is nowhere near as critical as the code unit vs code
> > point issue.
> > 
> > I don't really want to get into the whole unicode discussion again. It
> > has been discussed quite a bit on the D list already. There is no
> > perfect solution. The current solution _mostly_ works, and, for the most
> > part IMHO, is the correct solution. We _do_ need a full-on grapheme
> > handling solution, but a lot of stuff doesn't need that and the overhead
> > for dealing with it would be prohibitive. The main problem with using
> > code points rather than graphemes is the lack of normalization, and a
> > _lot_ of string code can get by just fine without that.
> > 
> > So, we have a really good 90% solution and we still need a 100% solution,
> > but using the 100% all of the time would almost certainly not be
> > acceptable due to performance issues, and doing stuff by code unit
> > instead of code point would be _really_ bad. So, what we have is good
> > and will likely stay as is. We just need a proper grapheme solution for
> > those who need it.
> > 
> > - Jonathan M Davis
> > 
> > 
> > P.S. Unicode is just plain ugly :(
> 
> I must be missing something, because the solution seems obvious to me:
> 
> char[], wchar[], and dchar[] should be simple arrays like int[] with no
> unicode semantics.
> 
> string, wstring, and dstring should not be aliases to arrays, but
> instead should be separate types that behave the way char[], wchar[],
> and dchar[] do currently.
> 
> Is there any problem with this approach?

There has been a fair bit of debate about it in the past. No one has been able 
to come up with an alternate solution which is generally considered better than 
what we have.

char is defined to be a UTF-8 code unit. wchar is defined to be a UTF-16 code 
unit. dchar is defined to be a UTF-32 code unit (which is also guaranteed to be 
a code point). So, manipulating char[] and wchar[] as arrays of characters 
doesn't generally make any sense.
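
To make the code unit vs. code point distinction concrete, here is a minimal sketch. The helper name `countBoth` and the sample string are only illustrative assumptions, not something from Phobos; the sketch relies on Phobos treating narrow strings as ranges of dchar.

```d
import std.range : ElementType, walkLength;
import std.typecons : Tuple, tuple;

// Hypothetical helper: returns (code units, code points) for a UTF-8 string.
Tuple!(size_t, size_t) countBoth(string s)
{
    // .length counts char code units; walkLength pops decoded code points.
    return tuple(s.length, s.walkLength);
}

// As ranges, narrow strings yield dchar, while int[] yields int.
static assert(is(ElementType!(char[]) == dchar));
static assert(is(ElementType!(int[]) == int));

void main()
{
    // "h\u00E9llo": the e-acute needs two UTF-8 code units.
    auto counts = countBoth("h\u00E9llo");
    assert(counts[0] == 6); // code units
    assert(counts[1] == 5); // code points
}
```

The mismatch between the two numbers is why a char[] cannot be treated as a random-access range of characters.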

Re: ranges

2010-04-29 Thread Ellery Newcomer


Oh I get it. hasLength is broken.

http://d.puremagic.com/issues/show_bug.cgi?id=3508

Kyle Foley's patch brings my program to a more respectable 1.6 times 
slower or thereabouts.


On 04/29/2010 01:52 PM, Ellery Newcomer wrote:

Hello.

I'm muddling over the following code, which compares an array/take
composition with the analogous imperative code. For medium-large values
of n, I'm seeing a fivefold degradation in performance, which blows up
to 30 times worse at n=5000.

Any ideas on why this is or better ways to accomplish the same?

import std.range;
import std.date;
import std.random;
import std.array;
import std.stdio;

void main(){
for(int n = 500; n <= 5; n *= 10){
writeln(n);
auto r = rndGen();
auto tz = getUTCtime();
auto a = new int[n];
foreach(ref aa; a){
aa = r.front();
r.popFront();
}
auto tz2 = getUTCtime();
auto a2 = array(take(r,n));
auto tz3 = getUTCtime();
writeln("\tarr: ",tz2-tz);
writeln("\trange: ",tz3-tz2);
}
}




Re: ranges

2010-04-29 Thread bearophile
Ellery Newcomer:

> I'm muddling over the following code, which compares an array/take 
> composition with the analogous imperative code. For medium-large values 
> of n, I'm seeing a fivefold degradation in performance, which blows up 
> to 30 times worse at n=5000.

Some of the code written by Andrei can be bad. I have modified your code like 
this:

import std.range: front, popFront, array, take;
import std.date: getUTCtime;
import std.random: rndGen;
import std.stdio: writeln;

void main() {
    for (int n = 500; n <= 500_000_000; n *= 10) {
        writeln(n, ":");
        auto rnd = rndGen();

        auto t0 = getUTCtime();
        auto rnd_arr1 = new int[n];
        foreach (ref el; rnd_arr1) {
            el = rnd.front();
            rnd.popFront();
        }

        auto t1 = getUTCtime();
        auto rnd_arr2 = new int[n];
        int i;
        foreach (r; take(rnd, n))
            rnd_arr2[i++] = r;

        auto t2 = getUTCtime();
        auto rnd_arr3 = array(take(rnd, n));

        auto t3 = getUTCtime();

        writeln("  arr:   ", t1 - t0);
        writeln("  take:  ", t2 - t1);
        writeln("  range: ", t3 - t2);
    }
}


It shows that most of the time is spent by array().
After some experiments I have found that Appender() is very slow. But the 
direct cause of this problem, besides the slowness of Appender(), is that 
hasLength is buggy and doesn't find the length of the take, so array() falls 
back to using the Appender.

If you replace hasLength with the equivalent from my dlibs1:

template hasLength(R) {
enum bool hasLength = is(typeof(R.length)) || is(typeof(R.init.length))
  && !isNarrowString!R;
}

The problem essentially vanishes. The root of the problem is that Phobos does 
not have enough unittests and is not battle-tested yet.
I will probably write a bug report.

Bye,
bearophile
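
For readers on a current Phobos, a small sketch of what a working hasLength makes possible. The ranges used here (`repeat`, `filter`) are stand-ins chosen for illustration, not the original benchmark, and the remark about allocation reflects my understanding of array() rather than anything measured here.

```d
import std.algorithm.iteration : filter;
import std.array : array;
import std.range : hasLength, repeat, take;

void main()
{
    // take() over an infinite range knows exactly how many elements it yields,
    // so array() has a length available and need not grow the result step by step.
    auto bounded = repeat(7).take(5);
    static assert(hasLength!(typeof(bounded)));
    assert(bounded.length == 5);
    assert(array(bounded) == [7, 7, 7, 7, 7]);

    // A filtered range cannot know its length up front.
    auto evens = [1, 2, 3, 4].filter!(a => a % 2 == 0);
    static assert(!hasLength!(typeof(evens)));
    assert(array(evens) == [2, 4]);
}
```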


Re: ranges

2010-04-29 Thread bearophile
Ellery Newcomer:
> http://d.puremagic.com/issues/show_bug.cgi?id=3508
> Kyle Foley's patch brings my program to a more respectable 1.6 times 
> slower or thereabouts.

No need to write another bug report then.

Bye,
bearophile


Re: Ranges

2011-02-07 Thread Lars T. Kyllingstad
On Tue, 08 Feb 2011 05:03:34 +, %u wrote:

> I've learned that an InputRange needs three methods to enumerate a
> collection:
> 
>   void popFront()
>   @property T front()
>   @property bool empty()
> 
> but is that really necessary? Why not just have:
> 
>   bool next(out T value);
> 
> ?
> Wouldn't this be much cleaner? Even the .NET implementation of
> IEnumerator only has two methods used for enumeration (MoveNext() and
> Current), instead of three.

Related discussion:

http://www.digitalmars.com/d/archives/digitalmars/D/getNext_113217.html

-Lars


Re: Ranges

2011-02-08 Thread %u
> Related discussion:
> http://www.digitalmars.com/d/archives/digitalmars/D/getNext_113217.html

Oh sorry; thank you for the link!


Re: Ranges

2022-08-04 Thread Salih Dincer via Digitalmars-d-learn

On Thursday, 4 August 2022 at 13:08:21 UTC, pascal111 wrote:

```D
import std.stdio;
import std.string;

struct Student {
   string name;
   int number;

   string toString() const {
      return format("%s(%s)", name, number);
   }
}

struct School {
   Student[] students;
}

struct StudentRange {
   Student[] students;

   this(School school) {
      this.students = school.students;
   }
   @property bool empty() const {
      return students.length == 0;
   }
   @property ref Student front() {
      return students[0];
   }
   void popFront() {
      students = students[1 .. $];
   }
}

void main() {
   auto school = School([ Student("Raj", 1),
      Student("John", 2), Student("Ram", 3)]);

   auto range = StudentRange(school);
   writeln(range);

   writeln(school.students.length);

   writeln(range.front);

   range.popFront;

   writeln(range.empty);
   writeln(range);
}
```


😀


Re: Ranges

2022-08-04 Thread frame via Digitalmars-d-learn

On Thursday, 4 August 2022 at 13:08:21 UTC, pascal111 wrote:

1) Why the programmer needs to program "empty()", "front()", 
and "popFront()" functions for ranges while they exist in the 
language library? it seems there's no need to exert efforts for 
that. "https://dlang.org/phobos/std_range_primitives.html"


- These functions are wrappers to use something as a range
- Ranges need to implement the functions to keep their data 
private; also, there are complex types that need to handle data 
differently
- Ranges must implement the functions so other functions can 
recognize them as such (e.g. `isInputRange`) - there is no common 
interface, it's determined at compile time


2) "front()", and "popFront()" are using fixed constants to 
move forward the range, while they should use variables.


`front()` is always using the first element BUT `popFront()` 
copies all elements except the first one into the variable (and 
overwrites it), so it moves the data forward.
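
The point about recognition at compile time can be sketched as follows. `Countdown` and `NotARange` are hypothetical types made up for this illustration; only `isInputRange` comes from Phobos.

```d
import std.range.primitives : isInputRange;

// Hypothetical example type: counts down from n to 1.
struct Countdown
{
    int n;
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}

// Same data, but popFront is missing, so it is not a range.
struct NotARange
{
    int n;
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
}

// No shared interface or base class: the check is purely structural.
static assert(isInputRange!Countdown);
static assert(!isInputRange!NotARange);

void main()
{
    int[] seen;
    foreach (x; Countdown(3))
        seen ~= x;
    assert(seen == [3, 2, 1]);
}
```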


Re: Ranges

2022-08-04 Thread Ali Çehreli via Digitalmars-d-learn

On 8/4/22 06:08, pascal111 wrote:
> In next code from
> "https://www.tutorialspoint.com/d_programming/d_programming_ranges.htm",

That page seems to be adapted from this original:

  http://ddili.org/ders/d.en/ranges.html

> we have two issues:
>
> 1) Why the programmer needs to program "empty()", "front()", and
> "popFront()" functions for ranges

The programmer almost never needs to implement those functions. Existing 
data structures and algorithms are almost always sufficient. (I did need 
to implement them but really rarely.)


I tried to explain what those functions do. I don't like my Students 
example much because wrapping a D slice does not make much sense. Again, 
I just try to explain them.


> while they exist in the language
> library?

The existing front, popFront, etc. are only for arrays (slices).

> it seems there's no need to exert efforts for that.

Exactly.

> "https://dlang.org/phobos/std_range_primitives.html"
>
> 2) "front()", and "popFront()" are using fixed constants to move forward
> the range, while they should use variables.

Well, 0 is always the first element and 1..$ are always the rest. 
Variables would not add any value there.


However, the example could use the existing library function that you 
mention:


> @property bool empty() const {
>return students.length == 0;

Better:

 import std.array : empty;
 return students.empty;

> }
> @property ref Student front() {
>return students[0];

Better:

 import std.array : front;
 return students.front;

> }
> void popFront() {
>students = students[1 .. $];

Better:

 import std.array : popFront;
 students.popFront();

But I think those implementations might confuse the reader.

Ali
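
Put together, the three "Better" variants might look like the sketch below. It assumes a module-level import from std.range.primitives instead of the function-local std.array imports shown above, and it trims Student down to two fields.

```d
import std.range.primitives : empty, front, popFront, isInputRange;

struct Student
{
    string name;
    int number;
}

// Same wrapper as in the tutorial, delegating to the slice primitives.
struct StudentRange
{
    Student[] students;

    @property bool empty() const { return students.empty; }
    @property ref Student front() { return students.front; }
    void popFront() { students.popFront(); }
}

static assert(isInputRange!StudentRange);

void main()
{
    auto r = StudentRange([Student("Raj", 1), Student("John", 2)]);
    assert(r.front.name == "Raj");
    r.popFront();
    assert(r.front.name == "John");
    r.popFront();
    assert(r.empty);
}
```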



Re: Ranges

2022-08-04 Thread Ali Çehreli via Digitalmars-d-learn

On 8/4/22 11:05, frame wrote:

> `popFront()`

The function was this:

   void popFront() {
  students = students[1 .. $];
   }

> copies all
> elements except the first one into the variable (and overwrites it), so
> it moves the data forward.

That would be very slow. :) What actually happens is, just the two 
variables that define a slice are adjusted.


Slices consist of two members:

struct __an_int_D_array_behind_the_scenes {
  size_t length;
  int * ptr;
}

So,

  students = students[1..$];

is the same as doing the following:

  students.length = (students.length - 1);
  students.ptr = students.ptr + 1;

(ptr's value would change by 4 bytes because 'int'.)

No element is copied or moved. :)

Ali
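
The claim can be checked directly. A small sketch, where `dropFirst` is a hypothetical helper that does what the popFront above does:

```d
// Hypothetical helper: the slice operation used by popFront above.
int[] dropFirst(int[] s)
{
    return s[1 .. $];
}

void main()
{
    int[] all = [10, 20, 30];
    auto rest = dropFirst(all);

    // Only the pointer and the length changed.
    assert(rest.ptr == all.ptr + 1);
    assert(rest.length == all.length - 1);

    // Both slices still view the same memory: nothing was copied.
    rest[0] = 99;
    assert(all == [10, 99, 30]);
}
```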



Re: Ranges

2022-08-04 Thread pascal111 via Digitalmars-d-learn

On Thursday, 4 August 2022 at 22:14:26 UTC, Ali Çehreli wrote:

On 8/4/22 11:05, frame wrote:

> `popFront()`

The function was this:

   void popFront() {
  students = students[1 .. $];
   }

> copies all
> elements except the first one into the variable (and
overwrites it), so
> it moves the data forward.

That would be very slow. :) What actually happens is, just the 
two variables that define a slice is adjusted.


Slices consist of two members:

struct __an_int_D_array_behind_the_scenes {
  size_t length;
  int * ptr;
}

So,

  students = students[1..$];

is the same as doing the following:

  students.length = (students.length - 1);
  students.ptr = students.ptr + 1;

(ptr's value would change by 4 bytes because 'int'.)

No element is copied or moved. :)

Ali


I didn't notice that all what we needs to pop a range forward is 
just a slice, yes, we don't need variable here.


Re: Ranges

2022-08-04 Thread Salih Dincer via Digitalmars-d-learn

On Thursday, 4 August 2022 at 22:54:42 UTC, pascal111 wrote:


I didn't notice that all what we needs to pop a range forward 
is just a slice, yes, we don't need variable here.


Ranges and Slices are not the same thing. Slicing an array is 
easy. This is a language possibility. For example, you need an 
incrementing variable for the Fibonacci Series.


SDB@79
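
A sketch of the Fibonacci case mentioned above, to show a range that carries its own state instead of slicing an array. The struct and its field names are illustrative assumptions.

```d
import std.algorithm.comparison : equal;
import std.range : take;
import std.range.primitives : isInputRange;

// Hypothetical generator range: no array behind it, just two counters.
struct Fibonacci
{
    ulong current = 0;
    ulong next = 1;

    enum bool empty = false; // never runs out

    @property ulong front() const { return current; }

    void popFront()
    {
        const sum = current + next;
        current = next;
        next = sum;
    }
}

static assert(isInputRange!Fibonacci);

void main()
{
    assert(Fibonacci().take(8).equal([0, 1, 1, 2, 3, 5, 8, 13]));
}
```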



Re: Ranges

2022-08-05 Thread frame via Digitalmars-d-learn

On Thursday, 4 August 2022 at 22:14:26 UTC, Ali Çehreli wrote:


No element is copied or moved. :)

Ali


I know that :) I just found that this user has problems to 
understand basics in D, so I tried not to go in detail and keep 
at its kind of logical layer. It seems the better way to help 
until the user asks specific questions.


Re: Ranges

2022-08-05 Thread Ali Çehreli via Digitalmars-d-learn

On 8/5/22 01:59, frame wrote:
> On Thursday, 4 August 2022 at 22:14:26 UTC, Ali Çehreli wrote:
>
>> No element is copied or moved. :)
>>
>> Ali
>
> I know that :)

And I know that. :) We don't know who else is reading these threads, so 
I didn't want to give wrong impression.


Copying would happen if we added slicing on the left-hand side. However, 
I realized that the following fails with a RangeError:


void main() {
  auto arr = [1, 2, 3];
  arr[0..$-1] = arr[1..$];// <-- Runtime error
}

I suspect the length of the array is stamped too soon. (?)

Should that operation be supported?

Ali



Re: Ranges

2022-08-05 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Aug 05, 2022 at 08:06:00AM -0700, Ali Çehreli via Digitalmars-d-learn 
wrote:
> [...] I realized that the following fails with a RangeError:
> 
> void main() {
>   auto arr = [1, 2, 3];
>   arr[0..$-1] = arr[1..$];// <-- Runtime error
> }
> 
> I suspect the length of the array is stamped too soon. (?)
> 
> Should that operation be supported?
[...]

This is why in C there's a difference between memcpy and memmove.  I
don't know how to express the equivalent in D, though. In general, you
can't tell until runtime whether two slices overlap (`arr` could be
aliased by another slice, for example, so you can't just tell by whether
you're copying an overlapping range from the same variable).

But if you know beforehand the ranges being copied are overlapping, you
could use std.algorithm.bringToFront which would do the Right Thing(tm)
in this case.


T

-- 
Why are you blatanly misspelling "blatant"? -- Branden Robinson
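
A sketch of two ways to handle the overlapping case discussed above. `shiftLeft` is a hypothetical helper built on C's memmove, which D exposes through core.stdc.string; the second half uses bringToFront, which rotates rather than shifts.

```d
import core.stdc.string : memmove;
import std.algorithm.mutation : bringToFront;

// Hypothetical helper: move every element one place to the left.
// The last slot keeps its old value, as memmove would leave it in C.
void shiftLeft(int[] arr)
{
    if (arr.length < 2)
        return;
    memmove(arr.ptr, arr.ptr + 1, (arr.length - 1) * int.sizeof);
}

void main()
{
    auto a = [1, 2, 3];
    shiftLeft(a);
    assert(a == [2, 3, 3]);

    // bringToFront swaps the two adjacent parts, i.e. it rotates the array.
    auto b = [1, 2, 3];
    bringToFront(b[0 .. 1], b[1 .. $]);
    assert(b == [2, 3, 1]);
}
```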


Re: Ranges

2022-08-06 Thread pascal111 via Digitalmars-d-learn

On Friday, 5 August 2022 at 04:05:08 UTC, Salih Dincer wrote:

On Thursday, 4 August 2022 at 22:54:42 UTC, pascal111 wrote:


I didn't notice that all what we needs to pop a range forward 
is just a slice, yes, we don't need variable here.


Ranges and Slices are not the same thing. Slicing an array is 
easy. This is a language possibility. For example, you need an 
incrementing variable for the Fibonacci Series.


SDB@79


What!!! so where's ranges?! I thought slices of any array are 
ranges, and understood it like that, and also there's no data 
type called ranges, it's like if you are talking about Ghostly 
data type!


Re: Ranges

2022-08-06 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Aug 06, 2022 at 03:37:32PM +, pascal111 via Digitalmars-d-learn 
wrote:
> On Friday, 5 August 2022 at 04:05:08 UTC, Salih Dincer wrote:
> > On Thursday, 4 August 2022 at 22:54:42 UTC, pascal111 wrote:
> > > 
> > > I didn't notice that all what we needs to pop a range forward is
> > > just a slice, yes, we don't need variable here.
> > 
> > Ranges and Slices are not the same thing. Slicing an array is easy.
> > This is a language possibility. For example, you need an
> > incrementing variable for the Fibonacci Series.
> > 
> > SDB@79
> 
> What!!! so where's ranges?! I thought slices of any array are ranges,
> and understood it like that, and also there's no data type called
> ranges, it's like if you are talking about Ghostly data type!

A range is any type that supports the Range API defined in std.range
(i.e., .empty, .front, .popFront). For more explanations, read:

http://www.informit.com/articles/printerfriendly.aspx?p=1407357&rll=1
http://ddili.org/ders/d.en/ranges.html
http://dconf.org/2015/talks/davis.html
http://tour.dlang.org/tour/en/basics/ranges
http://wiki.dlang.org/Component_programming_with_ranges


T

-- 
No! I'm not in denial!


Re: Ranges

2022-08-06 Thread Salih Dincer via Digitalmars-d-learn

On Saturday, 6 August 2022 at 15:37:32 UTC, pascal111 wrote:

On Friday, 5 August 2022 at 04:05:08 UTC, Salih Dincer wrote:

On Thursday, 4 August 2022 at 22:54:42 UTC, pascal111 wrote:


I didn't notice that all what we needs to pop a range forward 
is just a slice, yes, we don't need variable here.


Ranges and Slices are not the same thing. Slicing an array is 
easy. This is a language possibility. For example, you need an 
incrementing variable for the Fibonacci Series.


SDB@79


What!!! so where's ranges?! I thought slices of any array are 
ranges, and understood it like that, and also there's no data 
type called ranges, it's like if you are talking about Ghostly 
data type!


Well, that's very normal. Because as you work with ranges, you 
will understand better. Indeed, the slices (we can call it a 
dynamic array) feel like slice, don't they?


SDB@79


Re: Ranges

2022-08-06 Thread Salih Dincer via Digitalmars-d-learn

On Saturday, 6 August 2022 at 16:30:55 UTC, Salih Dincer wrote:
Indeed, the slices (we can call it a dynamic array) feel like 
slice, don't they?


Edit: Indeed, the slices feel like ranges, don't they?

Sorry...

SDB@79




Re: Ranges

2022-08-06 Thread Ali Çehreli via Digitalmars-d-learn

On 8/6/22 09:33, Salih Dincer wrote:

> the slices feel like ranges, don't they?

Yes because they are ranges. :) (Maybe you meant they don't have range 
member functions, which is true.)


D's slices happen to be the most capable range kind: RandomAccessRange. 
All of the following operations are supported on them as long as one 
imports std.array (or std.range, which publicly does so):


- empty
- front
- popFront
- save
- back
- popBack
- indexed element access

Slices have the optional length property as well (i.e. hasLength).

Those operations are not supported by member functions but by 
free-standing functions.


Ali
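
The list above can be exercised in a few lines. A minimal sketch; the char[] checks are added here only to tie back to the earlier Unicode thread.

```d
import std.range.primitives;

static assert(isRandomAccessRange!(int[]));
static assert(hasLength!(int[]));

// Narrow strings are the exception: bidirectional, but not random access.
static assert(isBidirectionalRange!(char[]));
static assert(!isRandomAccessRange!(char[]));

void main()
{
    int[] s = [1, 2, 3, 4];
    assert(!s.empty);
    assert(s.front == 1 && s.back == 4);

    auto saved = s.save; // independent position over the same elements
    s.popFront();
    s.popBack();

    assert(s == [2, 3]);
    assert(s[1] == 3);   // indexed element access
    assert(saved == [1, 2, 3, 4]);
}
```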



Re: Ranges

2022-08-06 Thread pascal111 via Digitalmars-d-learn

On Saturday, 6 August 2022 at 15:54:57 UTC, H. S. Teoh wrote:
On Sat, Aug 06, 2022 at 03:37:32PM +, pascal111 via 
Digitalmars-d-learn wrote:

On Friday, 5 August 2022 at 04:05:08 UTC, Salih Dincer wrote:
> On Thursday, 4 August 2022 at 22:54:42 UTC, pascal111 wrote:
> > [...]
> 
> Ranges and Slices are not the same thing. Slicing an array 
> is easy. This is a language possibility. For example, you 
> need an incrementing variable for the Fibonacci Series.
> 
> SDB@79


What!!! so where's ranges?! I thought slices of any array are 
ranges, and understood it like that, and also there's no data 
type called ranges, it's like if you are talking about Ghostly 
data type!


A range is any type that supports the Range API defined in 
std.range (i.e., .empty, .front, .popFront). For more 
explanations, read:


http://www.informit.com/articles/printerfriendly.aspx?p=1407357&rll=1
http://ddili.org/ders/d.en/ranges.html
http://dconf.org/2015/talks/davis.html
http://tour.dlang.org/tour/en/basics/ranges
http://wiki.dlang.org/Component_programming_with_ranges


T


You know, the problem is that ranges in D lack the usage of 
pointers as an essential tool to make all of ranges functions 
they need. If ranges exist in C, they would use pointers, and 
this is a powerful point in the account of C.


Re: Ranges

2022-08-06 Thread Ali Çehreli via Digitalmars-d-learn

On 8/6/22 14:10, pascal111 wrote:

> the problem is that ranges in D lack the usage of pointers as
> an essential tool to make all of ranges functions they need. If ranges
> exist in C, they would use pointers, and this is

There are a few cases where pointers provide functionality that ranges 
cannot:


1) Some algorithms don't make much sense with ranges. For example, most 
of the time find() can return just the element that we seek. In D, 
find() returns a range so that we can chain it with other algorithms.


2) Some algorithms like partition() better use three pointers.

Other than that, ranges are superior to pointers in every aspect. (I 
resent the fact that some C++ "experts" used those two points to decide 
ranges are inferior and helped deprive the C++ community of ranges for a 
very long time. The same "experts" did the same with 'static if'.)


> a powerful point in the account of C.

I missed how you made that connection.

Ali
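
A short sketch of the first point: find() hands back the rest of the range starting at the match, which is what allows chaining. The sample data is made up.

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : find;
import std.range : take;

void main()
{
    auto rest = [1, 2, 3, 4, 5].find(3);

    // Not just the element: everything from the match onwards.
    assert(rest == [3, 4, 5]);
    assert(rest[0] == 3);

    // No match gives an empty range.
    assert([1, 2, 3].find(9).length == 0);

    // Because the result is a range, another algorithm can follow directly.
    assert([1, 2, 3, 4, 5].find(3).take(2).equal([3, 4]));
}
```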



Re: Ranges

2022-08-06 Thread Salih Dincer via Digitalmars-d-learn

On Saturday, 6 August 2022 at 17:29:30 UTC, Ali Çehreli wrote:

On 8/6/22 09:33, Salih Dincer wrote:

> the slices feel like ranges, don't they?

Yes because they are ranges. :) (Maybe you meant they don't 
have range member functions, which is true.)


Slices use pointers.  Do I need to tell you what the pointers do! 
 Each of them points to a data.  Ranges are not like that, all 
they do is generate.  Ok, you use a slice just as if it were a 
range.  But they are not ranges.


SDB@79


Re: Ranges

2022-08-07 Thread pascal111 via Digitalmars-d-learn

On Sunday, 7 August 2022 at 05:12:38 UTC, Ali Çehreli wrote:

On 8/6/22 14:10, pascal111 wrote:




> a powerful point in the account of C.

I missed how you made that connection.

Everyone knows that slices are not pointers that pointers are 
real work, but slices are like a simple un-deep technique that is 
appropriate for beginners, but after that in advanced level in 
programming, we should use pointers to do same tasks we were 
doing with slices (the easy way of beginners).





Re: Ranges

2022-08-07 Thread Salih Dincer via Digitalmars-d-learn

On Sunday, 7 August 2022 at 15:34:19 UTC, pascal111 wrote:
Everyone knows that slices are not pointers that pointers are 
real work, but slices are like a simple un-deep technique that 
is appropriate for beginners, but after that in advanced level 
in programming, we should use pointers to do same tasks we were 
doing with slices (the easy way of beginners).


The following information about slices may be helpful:

Slices are objects from type T[] for any given type T. Slices 
provide a view on a subset of an array of T values - or just 
point to the whole array. Slices and dynamic arrays are the 
same.


A slice consists of two members - a pointer to the starting 
element and the length of the slice:

```d
T* ptr;
size_t length; // unsigned 32 bit on 32bit, unsigned 64 bit on 64bit

```
[...]

**Source:** https://tour.dlang.org/tour/en/basics/slices

SDB@79


Re: Ranges

2022-08-07 Thread ag0aep6g via Digitalmars-d-learn

On Sunday, 7 August 2022 at 15:34:19 UTC, pascal111 wrote:
Everyone knows that slices are not pointers that pointers are 
real work, but slices are like a simple un-deep technique that 
is appropriate for beginners, but after that in advanced level 
in programming, we should use pointers to do same tasks we were 
doing with slices (the easy way of beginners).


I can't tell if this is a joke or not.


Re: Ranges

2022-08-07 Thread Emanuele Torre via Digitalmars-d-learn

On Saturday, 6 August 2022 at 15:37:32 UTC, pascal111 wrote:

On Friday, 5 August 2022 at 04:05:08 UTC, Salih Dincer wrote:

On Thursday, 4 August 2022 at 22:54:42 UTC, pascal111 wrote:


I didn't notice that all what we needs to pop a range forward 
is just a slice, yes, we don't need variable here.


Ranges and Slices are not the same thing. Slicing an array is 
easy. This is a language possibility. For example, you need an 
incrementing variable for the Fibonacci Series.


SDB@79


What!!! so where's ranges?! I thought slices of any array are 
ranges, and understood it like that, and also there's no data 
type called ranges, it's like if you are talking about Ghostly 
data type!


A range is like an iterator in any other language (Java, C++, 
python3, javascript, etc), it is how D implements (lazy) 
generators https://en.wikipedia.org/wiki/Lazy_evaluation .


Ranges/Iterators don't necessarily have to be backed by memory, 
they just have to implement the interface. In D, a `empty` bool 
function that tells you whether you are at the end of the range 
or not; a `front` function to get the current value if the range 
is not `empty`; and a void function named `popFront` to advance 
to the next value if the range is not `empty`.


Once you have implemented this interface, you can use your 
"range" object with any function that accept a range; with 
`foreach`; etc.


Example of a range that is not backed by memory is a range with 
all the integer numbers.


```D
struct Integers {
  private int z = 0;

  /* or make it a bool attribute that starts as false, and you set it to
   * true when popFront is called while z is equal to int.min */
  public bool empty() { return false; }

  public int front() { return this.z; }

  public void popFront()
  {
    /* if (this.z == int.min) { this.empty = true; return; } */
    this.z *= -1;
    if (this.z <= 0)
      --this.z;
  }
}

void main()
{
  import std.stdio : writeln;

  /* foreach is syntax sugar for
   *   for (auto r = Integers(); !r.empty(); r.popFront()) {
   *     auto z = r.front(); /+ or  const z = r.front();  or ... +/
   *     ...
   *   }
   * that is why it only works with ranges.
   */
  foreach (const z; Integers()) {
    writeln(z);
    if (z == 5)
      break;
  }
}
```

output:
```
0
-1
1
-2
2
-3
3
-4
4
-5
5
```

This will iterate all the integers, and the integers are of course not 
all in memory, and don't remain in memory after they are used, since 
that would require infinite memory. (In the case of a range of integers, 
not infinite, because they are constrained by being int.sizeof bytes, 
but you could use a bignum implementation that is not constrained by 
that and they would actually be infinite.)

---

The equivalent in Java is the Iterable/Iterator interface.

```java
import java.util.Iterator;

public class Integers
  implements Iterable<Integer>
{
  public class IntegersIterator
    implements Iterator<Integer>
  {
    private int z = 0;
    private boolean first = true;

    public IntegersIterator(Integer z)
    {
      this.z = z;
    }

    @Override
    public boolean hasNext() { return true; }

    @Override
    public Integer next()
    {
      if (this.first) {
        this.first = false;
        return this.z;
      }

      this.z *= -1;
      if (this.z <= 0)
        --this.z;
      return this.z;
    }
  }

  @Override
  public IntegersIterator iterator() { return new IntegersIterator(0); }

  public static void main(String[] args)
  {
    /* syntax sugar for
     *   {
     *     final var it = new Integers().iterator();
     *     while (it.hasNext()) {
     *       final int z = it.next();
     *       ...
     *     }
     *   }
     */
    for (final int z : new Integers()) {
      System.out.println(z);
      if (z == 5)
        break;
    }
  }
}
```

The equivalent in python is a generator function:
```python
def integers():
  z = 0
  yield z
  while True:
    z *= -1
    if z <= 0:
      z -= 1
    yield z

for z in integers():
  print(z)
  if z == 5:
    break

etc



Re: Ranges

2022-08-07 Thread pascal111 via Digitalmars-d-learn

On Sunday, 7 August 2022 at 19:53:06 UTC, ag0aep6g wrote:

On Sunday, 7 August 2022 at 15:34:19 UTC, pascal111 wrote:
Everyone knows that slices are not pointers that pointers are 
real work, but slices are like a simple un-deep technique that 
is appropriate for beginners, but after that in advanced level 
in programming, we should use pointers to do same tasks we 
were doing with slices (the easy way of beginners).


I can't tell if this is a joke or not.


It's just an opinion.


Re: Ranges

2022-08-07 Thread Ali Çehreli via Digitalmars-d-learn

On 8/7/22 08:34, pascal111 wrote:

> Everyone knows that slices are not pointers

D's slices are "fat pointers": In D's case, that translates to a pointer 
plus length.


> that pointers are real work,

Agreed. Pointers are fundamental features of CPUs.

> but slices are like a simple un-deep technique that is appropriate for
> beginners,

That is not correct. Slices are designed by a C expert to prevent 
horrible bugs caused by C experts. Most C experts use slices very happily.


> but after that in advanced level in programming, we should
> use pointers to do same tasks we were doing with slices (the easy way of
> beginners).

That is an old thought. Today, we see that no matter how experienced, 
every person needs and appreciates help to prevent bugs. There are many 
cases of bugs killing people, jeopardizing expensive projects, loss of 
personal information, etc.


Ali



Re: Ranges

2022-08-07 Thread Ali Çehreli via Digitalmars-d-learn

On 8/6/22 22:58, Salih Dincer wrote:

> Ranges are not like that, all they do is
> generate.

You may be right. I've never seen it that way.

I've been under the following impression:

- C++'s iterators are based on an existing concept: pointers. Pointers 
are iterators.


- D's ranges are based on an existing concept: slices. Slices are ranges.

However, I can't find where I read that.

Ali



Re: Ranges

2022-08-07 Thread pascal111 via Digitalmars-d-learn

On Sunday, 7 August 2022 at 21:57:50 UTC, Ali Çehreli wrote:

On 8/7/22 08:34, pascal111 wrote:

> but after that in advanced level in programming, we should
> use pointers to do same tasks we were doing with slices (the
easy way of
> beginners).

That is an old thought. Today, we see that no matter how 
experienced, every person needs and appreciates help to prevent 
bugs. There are many cases of bugs killing people, jeopardizing 
expensive projects, loss of personal information, etc.


Ali


I think you are right that this is an old thought, I didn't 
notice that, maybe it's because I didn't study C++ and know only 
about C, so I applied C features on D.


Re: Ranges tutorial

2011-07-16 Thread Johann MacDonagh

On 7/16/2011 3:25 PM, Willy Martinez wrote:

I was wondering if there's a tutorial available on how to write simple ranges.
Something like what I'm trying to do: Skip whitespace from text read from a 
file.

It's difficult to search for "d ranges tutorial" in Google.

Thanks


I found the best place to learn about ranges was simply the std.range 
documentation http://www.d-programming-language.org/phobos/std_range.html


For example, you can see what an input range is by looking at the 
isInputRange template.


However, for what you want, you can use std.algorithm.filter
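
A minimal sketch of that suggestion, assuming "skip whitespace" means dropping every whitespace code point. `skipWhite` is a hypothetical helper name, and the text is a literal rather than file contents.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.uni : isWhite;

// Hypothetical helper: a lazy view of `text` without whitespace.
auto skipWhite(string text)
{
    return text.filter!(c => !isWhite(c));
}

void main()
{
    // Nothing is copied; characters are skipped as the range is walked.
    assert(skipWhite("a b\tc\n d").equal("abcd"));
}
```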


Re: Ranges tutorial

2011-07-16 Thread Willy Martinez
== Quote from Johann MacDonagh (johann.macdonagh...@spam.gmail.com)'s article
> However, for what you want, you can use std.algorithm.filter

OK. Followed your advice and this is what I've got so far:

import std.algorithm;
import std.file;
import std.stdio;

void main(string[] args) {
    auto needle = boyerMooreFinder(args[1]);
    foreach (string name; dirEntries(".", SpanMode.shallow)) {
        if (name[$-3 .. $] == "txt") {
            writeln(name);
            string text = readText(name);
            auto haystack = filter!("a >= '0' && a <= '9'")(text);
            auto result = find(haystack, needle);
            writeln(result);
        }
    }
}

Passing the haystack filter to find() produces the following error:

..\..\src\phobos\std\algorithm.d(2912): Error: function std.
algorithm.BoyerMooreFinder!(result,string).BoyerMooreFinder.beFound (string
haystack) is not callable using argument types (Filter!(result,string))
..\..\src\phobos\std\algorithm.d(2912): Error: cannot implicitly convert
expression (haystack) of type Filter!(result,string) to string
..\..\src\phobos\std\algorithm.d(2912): Error: cannot implicitly convert
expression (needle.beFound((__error))) of type string to Filter!(result,string)
search_seq.d(12): Error: template instance
std.algorithm.find!(Filter!(result,string),result,string) error instantiating

What could be the problem?

Thanks


Re: Ranges tutorial

2011-07-16 Thread Johann MacDonagh

On 7/16/2011 4:13 PM, Willy Martinez wrote:

== Quote from Johann MacDonagh (johann.macdonagh...@spam.gmail.com)'s article

However, for what you want, you can use std.algorithm.filter


OK. Followed your advice and this is what I've got so far:

import std.algorithm;
import std.file;
import std.stdio;

void main(string[] args) {
auto needle = boyerMooreFinder(args[1]);
foreach (string name; dirEntries(".", SpanMode.shallow)) {
if (name[$-3 .. $] == "txt") {
writeln(name);
string text = readText(name);
auto haystack = filter!("a>= '0'&&  a<= '9'")(text);
auto result = find(haystack, needle);
writeln(result);
}
}
}

Passing the haystack filter to find() produces the following error:

..\..\src\phobos\std\algorithm.d(2912): Error: function std.
algorithm.BoyerMooreFinder!(result,string).BoyerMooreFinder.beFound (string
haystack) is not callable using argument types (Filter!(result,string))
..\..\src\phobos\std\algorithm.d(2912): Error: cannot implicitly convert
expression (haystack) of type Filter!(result,string) to string
..\..\src\phobos\std\algorithm.d(2912): Error: cannot implicitly convert
expression (needle.beFound((__error))) of type string to Filter!(result,string)
search_seq.d(12): Error: template instance
std.algorithm.find!(Filter!(result,string),result,string) error instantiating

What could be the problem?

Thanks


Oh, I didn't see you wanted to do a Boyer-Moore search in your original 
post. Dmitry basically explained what's wrong.


filter is lazy. When you do auto x = filter!... it doesn't filter right 
away. It filters as you iterate over the range (this makes it much more 
efficient). This means you can't find the xth element of a filtered 
range (well you could, but it's not in O(1) time, which is why a 
filtered range doesn't expose indexing operators).


You'll want to use array() over the filtered range that filter returns. 
This will iterate over the entire filtered range and pull out each 
element into an array. An array can be randomly accessed, which 
Boyer-Moore needs. See Dmitry's post in your other topic.
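
A sketch of the array() step. To stay on safe ground it uses a plain find() on the materialized array instead of boyerMooreFinder, and `digitsOf` is a hypothetical helper; the input is a literal, not a file.

```d
import std.algorithm.iteration : filter;
import std.algorithm.searching : find;
import std.array : array;
import std.range.primitives : isRandomAccessRange;

// Hypothetical helper: keep only the ASCII digits, as a real array.
dchar[] digitsOf(string text)
{
    auto lazyDigits = text.filter!(c => c >= '0' && c <= '9');

    // The lazy result cannot be indexed...
    static assert(!isRandomAccessRange!(typeof(lazyDigits)));

    // ...so walk it once and store the elements.
    return lazyDigits.array;
}

void main()
{
    auto haystack = digitsOf("a1b2c3d4e5");
    assert(haystack == "12345"d);

    // An array supports random access, so searching for a sub-sequence works.
    assert(haystack.find("34"d) == "345"d);
}
```

Note that the element type becomes dchar here, because the filter walks the string by code point.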


Re: Ranges help

2011-10-12 Thread Dmitry Olshansky

On 12.10.2011 22:23, Xinok wrote:

This is in relation to my sorting algorithm. This is what I need to
accomplish with ranges in the most efficient way possible:

1. Merge sort - This involves copying elements to a temporary buffer,
which can simply be an array, then merging the two lists together. The
important thing is that it may merge left to right, or right to left,
which requires a bidirectional range.

c[] = a[0..$/2];
foreach(a; arr) if(!b.empty && !c.empty) if(b.front <= c.front){
a = b.front; b.popFront();
} else{
a = c.front; c.popFront();
}

How about:

if(b.empty)
    copy(c, a);
else if(c.empty)
    copy(b, a);
foreach(a; arr)
    if(b.front <= c.front){
        a = b.front;
        b.popFront();
        if(b.empty){
            copy(c, a);
            break;
        }
    }
    else{
        a = c.front; c.popFront();
        if(c.empty){
            copy(b, a);
            break;
        }
    }

no need to check c if it hasn't changed from the last time, same about b.
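
For reference, here is a self-contained sketch of the same idea on plain int arrays. The function name mergeInto and the explicit destination buffer are assumptions made for the example, not part of the code above; the point is only that the emptiness test happens once per step and the leftover tail is copied in bulk:

```d
import std.range.primitives : empty, front, popFront;

// Hypothetical helper: merge two sorted arrays b and c into dest.
// dest must not overlap b or c and must hold b.length + c.length items.
int[] mergeInto(int[] dest, int[] b, int[] c)
{
    assert(dest.length == b.length + c.length);
    size_t i = 0;

    // Take the smaller front until one of the inputs runs out.
    while (!b.empty && !c.empty)
    {
        if (b.front <= c.front) { dest[i++] = b.front; b.popFront(); }
        else                    { dest[i++] = c.front; c.popFront(); }
    }

    // At most one of these two copies moves any elements.
    dest[i .. i + b.length] = b[];
    i += b.length;
    dest[i .. i + c.length] = c[];
    return dest;
}

void main()
{
    auto merged = mergeInto(new int[6], [1, 4, 9], [2, 3, 10]);
    assert(merged == [1, 2, 3, 4, 9, 10]);
}
```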


2. Range swap - First, I need to do a binary search, which requires a
random access range. Then I need to swap two ranges of elements.

while(!a.empty && !b.empty){
swap(a.front, b.front);
a.popFront(); b.popFront();
}



If your ranges have equal lengths (or you assume it)
you can skip one of !a.empty or !b.empty in while clause.

Otherwise :
for(;;){
swap(a.front, b.front);
a.popFront();
if(a.empty)
break;
b.popFront();
if(b.empty)
break;
}
might save you a couple of ops in case a is shorter than b, and with 
sorting every bit counts, doesn't it?




That's the best I can come up with. I'm wondering if there's a more
efficient way to accomplish what I have above.

I also need to figure out the template constraints. Would this be
correct? Or would this be too much?

isRandomAccessRange && !isFiniteRange && isBidirectionalRange && hasSlicing

isRandomAccessRange should be enough. Also, why !isFinite? How would one 
sort an infinite range? hasSlicing is needed, though. So my take on this 
would be:

isRandomAccessRange && hasSlicing

--
Dmitry Olshansky


Re: Ranges help

2011-10-12 Thread Xinok

On 10/12/2011 4:04 PM, Dmitry Olshansky wrote:

On 12.10.2011 22:23, Xinok wrote:

I also need to figure out the template constraints. Would this be
correct? Or would this be too much?

isRandomAccessRange && !isFiniteRange && isBidirectionalRange &&
hasSlicing

isRandomAccessRange should be enough. Also why !isFinite how would one
sort infinite range? hasSlicing is needed though. So my take on this
would be:
isRandomAccessRange && hasSlicing



Sorry, typo. That should be !isInfiniteRange. But I can drop 
!isInfiniteRange anyway, so:


isRandomAccessRange && isBidirectionalRange && hasSlicing

A random-access range can be either a bidirectional range or an infinite 
forward range, so isBidirectionalRange is still required.
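
A small compile-time check of that claim, as a sketch. The Naturals struct is a hypothetical range written only to exercise the traits; the traits themselves come from std.range.primitives (where the "infinite" test is spelled isInfinite):

```d
import std.range.primitives : hasSlicing, isBidirectionalRange,
    isInfinite, isRandomAccessRange;

// Hypothetical infinite forward range with indexing.
struct Naturals
{
    size_t n;
    enum bool empty = false;                  // never empty, so infinite
    @property size_t front() { return n; }
    void popFront() { ++n; }
    @property Naturals save() { return this; }
    size_t opIndex(size_t i) { return n + i; }
}

// A plain array satisfies all three constraints.
static assert(isRandomAccessRange!(int[]));
static assert(isBidirectionalRange!(int[]));
static assert(hasSlicing!(int[]));

// Random access, yet not bidirectional, because it is infinite.
static assert(isInfinite!Naturals);
static assert(isRandomAccessRange!Naturals);
static assert(!isBidirectionalRange!Naturals);

void main() {}
```

So a constraint of isRandomAccessRange alone would let a type like Naturals through, while adding isBidirectionalRange (or hasLength) rejects it.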


Re: Ranges help

2011-10-14 Thread Christophe
Xinok, in message (digitalmars.D.learn:30054), wrote:
> This is in relation to my sorting algorithm. This is what I need to 
> accomplish with ranges in the most efficient way possible:
> 
> 1. Merge sort - This involves copying elements to a temporary buffer, 
> which can simply be an array, then merging the two lists together. The 
> important thing is that it may merge left to right, or right to left, 
> which requires a bidirectional range.
> 
> c[] = a[0..$/2];
> foreach(a; arr) if(!b.empty && !c.empty) if(b.front <= c.front){
>   a = b.front; b.popFront();
> } else{
>   a = c.front; c.popFront();
> }
> 
> 2. Range swap - First, I need to do a binary search, which requires a 
> random access range. Then I need to swap two ranges of elements.
> 
> while(!a.empty && !b.empty){
>   swap(a.front, b.front);
>   a.popFront(); b.popFront();
> }
> 
> 
> That's the best I can come up with. I'm wondering if there's a more 
> efficient way to accomplish what I have above.
> 
> I also need to figure out the template constraints. Would this be 
> correct? Or would this be too much?
> 
> isRandomAccessRange && !isFiniteRange && isBidirectionalRange && hasSlicing


You should look at:
std.algorithm.SetUnion
std.algorithm.swapRanges
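
A short sketch of swapRanges on two arrays of different length (the sample values are made up). It swaps element by element until the shorter range is exhausted and returns the unswapped remainders as a tuple:

```d
import std.algorithm : swapRanges;

void main()
{
    int[] a = [1, 2, 3, 4, 5];
    int[] b = [10, 20, 30];

    // Swaps a[0 .. 3] with b[0 .. 3]; stops when b runs out.
    auto rest = swapRanges(a, b);

    assert(a == [10, 20, 30, 4, 5]);
    assert(b == [1, 2, 3]);
    assert(rest[0] == [4, 5]);      // the part of a that was not touched
    assert(rest[1].length == 0);    // b was consumed completely
}
```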

-- 
Christophe


Re: Ranges help

2011-10-14 Thread Xinok
Thanks. I'll run a benchmark with swapRanges, see how it compares to my own 
code. But it would be better if I coded the merge function myself, since I can 
do it in-place using very little memory.


Re: Ranges suck!

2017-09-14 Thread Brad Anderson via Digitalmars-d-learn

On Thursday, 14 September 2017 at 23:53:20 UTC, Your name wrote:
Every time I go to use something like strip it bitches and 
gives me errors. Why can't I simply do somestring.strip("\n")???


import std.string would be the likely strip yet it takes a 
range and somestring, for some retarded reason, isn't a range. 
strip isn't the only function that does this. Who ever 
implemented ranges the way they did needs to get their head 
checked!


It's not really a range issue. It's that there are two strips: 
one in std.string and one in std.algorithm. The latter, which lets 
you define what to strip rather than just whitespace, is what you 
are looking for and works as you've written. The former is there 
for legacy reasons and we can hopefully get rid of it in the 
future to avoid this confusion.


I'd also say that you don't seem to be grasping a pretty 
fundamental D concept yet. std.string.strip doesn't take two 
arguments, it takes one argument. The first set of parentheses is 
the template argument which is inferred from the regular argument 
using IFTI.



[snip]


Ok, so I know what your saying "Oh, but strip("\n") should be 
strip()! Your a moron RTFM!" But why then force me to strip for 
nothing? Why not pay me? e.g., let me strip for something like 
strip("x")?


strip()! isn't valid syntax. If you want to strip all whitespace 
you can use std.string.strip (e.g., somestring.strip()). If you 
want to strip "x" you can use std.algorithm.strip (e.g., 
somestring.strip("x")). Pretty much like any other language minus 
the double function mess.
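
A sketch of the two functions side by side. The renamed imports (stripWhitespace, stripElement) are only there to keep the two apart in one file; they are not names from Phobos. Note that std.algorithm's strip takes a single element, so the argument is a character literal, not a string:

```d
import std.algorithm.mutation : stripElement = strip;
import std.string : stripWhitespace = strip;

void main()
{
    // std.string.strip: removes leading and trailing whitespace.
    assert(stripWhitespace("  hello \n") == "hello");

    // std.algorithm strip: removes a given element from both ends.
    assert(stripElement("\n\nhello\n", '\n') == "hello");
    assert(stripElement("xxhixx", 'x') == "hi");
}
```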


Oh, chomp? Is that the function I'm suppose to use? Seriously? 
Was the D lib written by someone with a pacman fetish?


chomp comes to D by way of Perl. I don't know whether 
Larry Wall is into pacman.


Re: Ranges suck!

2017-09-14 Thread Ali Çehreli via Digitalmars-d-learn

On 09/14/2017 04:53 PM, Your name wrote:

> Why can't I simply do somestring.strip("\n")???

Actually, you can but that's a different one: std.algorithm.strip and it 
takes the element type, so you should provide a char:


somestring = somestring.strip('\n');

(Note: I lied about element type because it's actually dchar but let's 
not go there :) )


> import std.string would be the likely strip yet it takes a range and
> somestring, for some retarded reason, isn't a range.

It is a range but the error messages cannot be clearer unless the 
compiler is modified.


> strip isn't the
> only function that does this. Who ever implemented ranges the way they
> did needs to get their head checked!

I think this issue is more about templates. You see this more with 
ranges because they are all templates.


> How bout you make these functions work on strings automatically so it
> works without moronic errors. Every sane programming(not including D)
> does not try to toy with the programmer and make them figure out
> retarded rules that make no sense logically.

I'm sure there are strange usability issues in other programming 
languages as well. :)


> strip is meant to be used
> on strings in most cases so having direct and easy access to it should
> override trying to force ranges down my throat. I'm not a porn star. No
> other language plays these guys, why does D do it?

I think some of the reason is historical. Any organically grown thing 
necessarily has inconsistencies and compromises that are developed over 
time as new features are added, removed, or moved around. For example, 
many functions from the std.string module were moved to std.algorithm as 
those made sense for other ranges as well.


> Ok, so I know what your saying "Oh, but strip("\n") should be strip()!
> Your a moron RTFM!" But why then force me to strip for nothing? Why not
> pay me? e.g., let me strip for something like strip("x")?

Now you're teasing. ;)

> Oh, chomp? Is that the function I'm suppose to use? Seriously? Was the D
> lib written by someone with a pacman fetish?

Agreed overall but everything has reasons that made sense at some point 
to some people. Only languages like Go that had decades of time 
borrowing ideas from other languages and learning from their mistakes 
have a chance of avoiding this but even they have issues and limitations.


Ali



Re: Ranges suck!

2017-09-15 Thread bitwise via Digitalmars-d-learn

On Thursday, 14 September 2017 at 23:53:20 UTC, Your name wrote:

[...]



I understand your frustration. The fact that "inout" is actually 
a keyword makes it hard not to think that some very strange 
fetishes were at play during the creation of this language.


As a whole though, the language is very usable, and has many 
great features not present in similar languages.


Just this morning, I was able to replace a fairly large and ugly 
pile of code with this:


import std.utf;
foreach(c; myString.byUTF!dchar) {
// ...
}



Re: Ranges require GC?

2013-12-10 Thread Adam D. Ruppe

On Tuesday, 10 December 2013 at 18:54:54 UTC, Frustrated wrote:

I assume that ranges require the GC, is this true?


No, in fact, most ranges don't allocate at all.


Re: Ranges require GC?

2013-12-10 Thread Marco Leise
On Tue, 10 Dec 2013 19:55:20 +0100,
"Adam D. Ruppe" wrote:

> On Tuesday, 10 December 2013 at 18:54:54 UTC, Frustrated wrote:
> > I assume that ranges require the GC, is this true?
> 
> No, in fact, most ranges don't allocate at all.

"range" is just a concept and not a concrete type.
Functions that work on ranges identify them by specific calls
that can be made on them. (e.g. front, popFront(), length,
etc.). This is commonly known as duck-typing, and it allows pretty
much anything to be a range: classes, structs, built-in arrays.

D objects and dynamic arrays are typically GC managed. Most
ranges returned from Phobos are implemented as structs though
and don't need a GC. If you write something like:

  [1,2,3].map!(a => a+1)()

then you are using a dynamic array ([1,2,3]) that will live in
the GC heap. "map" will then just return a range (implemented
as a struct). One has to understand that no memory will be
allocated to hold the result of the map process. In fact where
possible, Phobos returns lazily evaluated ranges that only
calculate the next item when you ask for it (using .front
and .popFront() on it).
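
To make the laziness visible, here is a small sketch. The counter and the function names (calls, addOne, observeLaziness) are made up for the example; a named function is used instead of a lambda so that the counter can live at module level:

```d
import std.algorithm : map;

int calls;                          // counts how often addOne has run

int addOne(int a) { ++calls; return a + 1; }

int[2] observeLaziness()
{
    calls = 0;
    auto r = [1, 2, 3].map!addOne;  // builds a small struct; addOne has not run
    immutable before = calls;
    immutable first = r.front;      // the first element is computed here
    return [before, first];
}

void main()
{
    assert(observeLaziness() == [0, 2]);
    assert(calls == 1);
}
```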

-- 
Marco



Re: Ranges require GC?

2013-12-10 Thread Frustrated

On Tuesday, 10 December 2013 at 21:20:59 UTC, Marco Leise wrote:

On Tue, 10 Dec 2013 19:55:20 +0100,
"Adam D. Ruppe" wrote:


On Tuesday, 10 December 2013 at 18:54:54 UTC, Frustrated wrote:
> I assume that ranges require the GC, is this true?

No, in fact, most ranges don't allocate at all.


"range" is just a concept and not a concrete type.
Functions that work on ranges identify them by specific calls
that can be made on them. (e.g. front, popFront(), length,
etc.). This is commonly known as duck-typing. And allows much
anything to be a range: Classes, structs, built-in arrays.

D objects and dynamic arrays are typically GC managed. Most
ranges returned from Phobos are implemented as structs though
and don't need a GC. If you write something like:

  [1,2,3].map!(a => a+1)()

then you are using a dynamic array ([1,2,3]) that will live in
the GC heap. "map" will then just return a range (implemented
as a struct). One has to understand that no memory will be
allocated to hold the result of the map process. In fact where
possible, Phobos returns lazily evaluated ranges that only
calculate the next item when you ask for it (using .front
and .popFront() on it).


But surely memory gets allocated in some way?

In Programming in D:

"For example filter(), which chooses elements that are greater than 10 in the 
following code, actually returns a range object, not an array:"

But if filter is a range and returns an object then how is that 
object allocated?





Re: Ranges require GC?

2013-12-10 Thread Jonathan M Davis
On Wednesday, December 11, 2013 03:09:52 Frustrated wrote:
> But surely memory gets allocated in some way?
> 
> In Programming in D:
> 
> "For example filter(), which
> chooses elements that are greater than 10 in the following code,
> actually returns a range
> object, not an array:"
> 
> But if filter is a range and returns an object then how is that
> object allocated?

It's on the stack. filter returns a struct whose only member is the range that 
was passed to it. The lambda that you pass to filter might be allocated on the 
heap under some circumstances, but that's the only risk of heap allocation 
with filter, and if you use a function rather than a lambda literal or a 
delegate, then you definitely don't get any heap allocation (and I don't think 
that it's even the case that you necessarily get a heap allocation with a 
lambda).

Most ranges in Phobos are very lightweight, because they just wrap another 
range and maybe add a member variable or two to handle what they do (and many 
don't even need that). And if the optimizer does a decent job, then most of 
the wrappers even get optimized away, causing very little overhead at all 
(though I'm sure that dmd can be improved in that regard).

Whether any particular range allocates anything on the heap depends on the 
range, but it's likely to be very rare that they will - particularly in 
Phobos.
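
A sketch of what that means in code, assuming a compiler that supports the @nogc attribute (it did not exist yet when this thread started). The names greaterThan10 and firstAbove10 are made up; the predicate is a plain function, so no closure is involved:

```d
import std.algorithm : filter;

bool greaterThan10(int x) @nogc nothrow pure @safe { return x > 10; }

// Returns the first element above 10, or -1 if there is none.
int firstAbove10(const(int)[] data) @nogc
{
    auto r = data.filter!greaterThan10;

    // The range returned by filter is a struct held in a local variable,
    // not a class instance allocated on the heap.
    static assert(is(typeof(r) == struct));

    return r.empty ? -1 : r.front;
}

void main()
{
    assert(firstAbove10([3, 12, 7, 40]) == 12);
    assert(firstAbove10([1, 2]) == -1);
}
```

Because firstAbove10 is marked @nogc, the compiler rejects it if anything inside would allocate through the GC, which is a convenient way to check a particular range pipeline.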

- Jonathan M Davis


Re: Ranges require GC?

2013-12-10 Thread Frustrated
On Wednesday, 11 December 2013 at 02:37:32 UTC, Jonathan M Davis 
wrote:

On Wednesday, December 11, 2013 03:09:52 Frustrated wrote:

But surely memory gets allocated in some way?

In Programming in D:

"For example filter(), which chooses elements that are greater than 10 in the 
following code, actually returns a range object, not an array:"

But if filter is a range and returns an object then how is that
object allocated?


It's on the stack. filter returns a struct whose only member is the range that 
was passed to it. The lambda that you pass to filter might be allocated on the 
heap under some circumstances, but that's the only risk of heap allocation 
with filter, and if you use a function rather than a lambda literal or a 
delegate, then you definitely don't get any heap allocation (and I don't think 
that it's even the case that you necessarily get a heap allocation with a 
lambda).

Most ranges in Phobos are very lightweight, because they just wrap another 
range and maybe add a member variable or two to handle what they do (and many 
don't even need that). And if the optimizer does a decent job, then most of 
the wrappers even get optimized away, causing very little overhead at all 
(though I'm sure that dmd can be improved in that regard).

Whether any particular range allocates anything on the heap depends on the 
range, but it's likely to be very rare that they will - particularly in 
Phobos.

- Jonathan M Davis


I'm trying to avoid the GC so knowing whether or not ranges will 
use the heap is extremely important. I guess that is where the 
@nogc attribute that everyone talks about would come in very 
handy.


Re: Ranges require GC?

2013-12-11 Thread Marco Leise
On Wed, 11 Dec 2013 08:20:19 +0100,
"Frustrated" wrote:

> On Wednesday, 11 December 2013 at 02:37:32 UTC, Jonathan M Davis 
> wrote:
> > On Wednesday, December 11, 2013 03:09:52 Frustrated wrote:
> >> But surely memory gets allocated in some way?
> >> 
> >> In Programming in D:
> >> 
> >> "For example filter(), which
> >> chooses elements that are greater than 10 in the following 
> >> code,
> >> actually returns a range
> >> object, not an array:"
> >> 
> >> But if filter is a range and returns an object then how is that
> >> object allocated?

That's just a wording issue. It means object in the wider
sense, not instance of a class.

> I'm trying to avoid the GC so knowing whether or not ranges will 
> use the heap is extremely important. I guess that is where the 
> @nogc attribute that everyone talks about would come in very 
> handy.

Correct.

-- 
Marco



Re: Ranges, constantly frustrating

2014-02-11 Thread Tobias Pankrath

On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:

Things like this should "just work"..

File input ...

auto range = input.byLine();
while(!range.empty)
{
  range.popFront();
  foreach (i, line; range.take(4))  //Error: cannot infer argument types
  {
..etc..
  }
  range.popFront();
}

Tried adding 'int' and 'char[]' or 'auto' .. no dice.

Can someone explain why this fails, and if this is a permanent 
or temporary limitation of D/MD.


R

Is foreach(i, val; aggregate) even defined if aggregate is not an 
array or associative array? It is not in the docs: 
http://dlang.org/statement#ForeachStatement


Re: Ranges, constantly frustrating

2014-02-11 Thread Jakob Ovrum

On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:

Things like this should "just work"..

File input ...

auto range = input.byLine();
while(!range.empty)
{
  range.popFront();
  foreach (i, line; range.take(4))  //Error: cannot infer argument types
  {
..etc..
  }
  range.popFront();
}

Tried adding 'int' and 'char[]' or 'auto' .. no dice.

Can someone explain why this fails, and if this is a permanent 
or temporary limitation of D/MD.


R


See this pull request[1] and the linked enhancement report.

Also note that calling `r.popFront()` without checking `r.empty` 
is a program error (so it's recommended to at least put in an 
assert).


[1] https://github.com/D-Programming-Language/phobos/pull/1866


Re: Ranges, constantly frustrating

2014-02-11 Thread Regan Heath
On Tue, 11 Feb 2014 10:10:27 -, Regan Heath   
wrote:



Things like this should "just work"..

File input ...

auto range = input.byLine();
while(!range.empty)
{
   range.popFront();
   foreach (i, line; range.take(4))  //Error: cannot infer argument types
   {
 ..etc..
   }
   range.popFront();
}

Tried adding 'int' and 'char[]' or 'auto' .. no dice.

Can someone explain why this fails, and if this is a permanent or  
temporary limitation of D/MD.


Further, the naive solution of adding .array gets you in all sorts of  
trouble :p  (The whole byLine buffer re-use issue).


This should be simple and easy, dare I say it trivial.. or am I just being  
dense here.


R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/


Re: Ranges, constantly frustrating

2014-02-11 Thread Tobias Pankrath
Further, the naive solution of adding .array gets you in all 
sorts of trouble :p  (The whole byLine buffer re-use issue).


This should be simple and easy, dare I say it trivial.. or am I 
just being dense here.


R


The second naive solution would be to use readText and splitLines.


Re: Ranges, constantly frustrating

2014-02-11 Thread Regan Heath
On Tue, 11 Feb 2014 10:52:39 -, Tobias Pankrath   
wrote:


Further, the naive solution of adding .array gets you in all sorts of  
trouble :p  (The whole byLine buffer re-use issue).


This should be simple and easy, dare I say it trivial.. or am I just  
being dense here.


R


The second naive solution would be to use readText and splitLines.


The file is huge in my case :)

R



Re: Ranges, constantly frustrating

2014-02-11 Thread Regan Heath
On Tue, 11 Feb 2014 10:58:17 -, Tobias Pankrath   
wrote:



On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:

Things like this should "just work"..

File input ...

auto range = input.byLine();
while(!range.empty)
{
  range.popFront();
  foreach (i, line; range.take(4))  //Error: cannot infer argument types
  {
..etc..
  }
  range.popFront();
}

Tried adding 'int' and 'char[]' or 'auto' .. no dice.

Can someone explain why this fails, and if this is a permanent or  
temporary limitation of D/MD.


R
Is foreach(i, val; aggregate) even defined if aggr is not an array or  
associated array? It is not in the docs:  
http://dlang.org/statement#ForeachStatement


import std.stdio;

struct S1 {
    private int[] elements = [9,8,7];
    int opApply (int delegate (ref uint, ref int) block) {
        foreach (uint i, int n ; this.elements) {
            // Pass a non-zero result back so that break/return in the
            // loop body stop the iteration.
            if (auto result = block(i, n))
                return result;
        }
        return 0;
    }
}

void main()
{
    S1 range;
    foreach(uint i, int x; range)
    {
        writefln("%d is %d", i, x);
    }
}

R



Re: Ranges, constantly frustrating

2014-02-11 Thread Tobias Pankrath

On Tuesday, 11 February 2014 at 13:00:19 UTC, Regan Heath wrote:

import std.stdio;

struct S1 {
   private int[] elements = [9,8,7];
   int opApply (int delegate (ref uint, ref int) block) {
   foreach (uint i, int n ; this.elements)
   block(i, n);
   return 0;
   }
}

void main()
{
S1 range;   
foreach(uint i, int x; range)
{
  writefln("%d is %d", i, x);
}
}

R


byLine does not use opApply
https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1389


Re: Ranges, constantly frustrating

2014-02-11 Thread Rene Zwanenburg

On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:

Things like this should "just work"..

File input ...

auto range = input.byLine();
while(!range.empty)
{
  range.popFront();
  foreach (i, line; range.take(4))  //Error: cannot infer argument types
  {
..etc..
  }
  range.popFront();
}

Tried adding 'int' and 'char[]' or 'auto' .. no dice.

Can someone explain why this fails, and if this is a permanent 
or temporary limitation of D/MD.


R


foreach (i, line; iota(size_t.max).zip(range.take(4)))
{

}


Re: Ranges, constantly frustrating

2014-02-11 Thread Ali Çehreli

On 02/11/2014 06:25 AM, Rene Zwanenburg wrote:

On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:



  foreach (i, line; range.take(4))  //Error: cannot infer argument types
  {
..etc..
  }



foreach (i, line; iota(size_t.max).zip(range.take(4)))
{

}


There is also the following, relying on tuples' automatic expansion in 
foreach:


foreach (i, element; zip(sequence!"n", range.take(4))) {
// ...
}
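
Spelled out as a complete program, as a sketch: the function name labelLines and the use of a string array in place of byLine are assumptions made to keep the example self-contained.

```d
import std.format : format;
import std.range : sequence, take, zip;

// Hypothetical helper: pair the first four lines with a running counter.
string[] labelLines(string[] lines)
{
    string[] result;
    // sequence!"n" yields 0, 1, 2, ...; zip stops at the shorter range and
    // foreach expands each (counter, line) tuple into i and element.
    foreach (i, element; zip(sequence!"n"(), lines.take(4)))
        result ~= format("%s:%s", i, element);
    return result;
}

void main()
{
    assert(labelLines(["a", "b", "c", "d", "e"]) == ["0:a", "1:b", "2:c", "3:d"]);
}
```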

Ali



Re: Ranges, constantly frustrating

2014-02-11 Thread Steve Teale
On Tuesday, 11 February 2014 at 10:52:40 UTC, Tobias Pankrath 
wrote:




The second naive solution would be to use readText and 
splitLines.


That's the sort of thing I always do because then I understand 
what's going on, and when there's a bug I can find it easily!


But then I'm not writing libraries.

Steve



Re: Ranges, constantly frustrating

2014-02-11 Thread Jesse Phillips

On Tuesday, 11 February 2014 at 13:00:19 UTC, Regan Heath wrote:

import std.stdio;

struct S1 {
   private int[] elements = [9,8,7];
   int opApply (int delegate (ref uint, ref int) block) {
   foreach (uint i, int n ; this.elements)
   block(i, n);
   return 0;
   }
}

void main()
{
S1 range;


S1 is not a range. But this is a correct response to "Is 
foreach(i, val; aggregate) even defined if aggr is not an array 
or associated array?"


Re: Ranges, constantly frustrating

2014-02-11 Thread Steven Schveighoffer
On Tue, 11 Feb 2014 05:10:27 -0500, Regan Heath   
wrote:



Things like this should "just work"..

File input ...

auto range = input.byLine();
while(!range.empty)
{
   range.popFront();
   foreach (i, line; range.take(4))  //Error: cannot infer argument types
   {
 ..etc..
   }
   range.popFront();
}

Tried adding 'int' and 'char[]' or 'auto' .. no dice.

Can someone explain why this fails, and if this is a permanent or  
temporary limitation of D/MD.


This is only available using opApply style iteration. Using range  
iteration does not give you this ability.


It's not a permanent limitation per se, but there is no plan at the moment  
to add multiple parameters to range iteration.


One thing that IS a limitation though: we cannot overload on return  
values. So the obvious idea of overloading front to return tuples of  
various types, would not be feasible. opApply can do that because the  
delegate is a parameter.


-Steve


Re: Ranges, constantly frustrating

2014-02-11 Thread Jesse Phillips

On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:

Things like this should "just work"..

File input ...

auto range = input.byLine();
while(!range.empty)
{
  range.popFront();
  foreach (i, line; range.take(4))  //Error: cannot infer argument types
  {
..etc..
  }
  range.popFront();
}

Tried adding 'int' and 'char[]' or 'auto' .. no dice.

Can someone explain why this fails, and if this is a permanent 
or temporary limitation of D/MD.


R


In case the other replies weren't clear enough. A range does not 
have an index.


What do you expect 'i' to be? Is it the line number? Is it the 
index within the line where 'take' begins? Where 'take' stops?


There is a feature of foreach and tuple() which results in the 
tuple getting expanded automatically.
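
A sketch of that expansion using std.range.enumerate from current Phobos. Whether it was available when this thread was written is not something I can confirm here, and the helper name numberLines is made up:

```d
import std.format : format;
import std.range : enumerate, take;

// Hypothetical helper: number the first two lines, counting from 1.
string[] numberLines(string[] lines)
{
    string[] result;
    // enumerate yields (index, value) tuples; foreach expands each tuple
    // into the two loop variables.
    foreach (i, line; lines.take(2).enumerate(1))
        result ~= format("%s: %s", i, line);
    return result;
}

void main()
{
    assert(numberLines(["a", "b", "c"]) == ["1: a", "2: b"]);
}
```

The counter here is just an iteration number for this particular loop; it is not an index into the underlying range.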


byLine has its own issues with reuse of the buffer; that isn't 
inherent to ranges. I haven't really used it (I needed it from 
std.process). When I wanted to read a large file I went with 
wrapping std.mmfile:


https://github.com/JesseKPhillips/libosm/blob/master/source/util/filerange.d


Re: Ranges, constantly frustrating

2014-02-11 Thread thedeemon
On Tuesday, 11 February 2014 at 19:48:41 UTC, Jesse Phillips 
wrote:


In case the other replies weren't clear enough. A range does 
not have an index.


What do you expect 'i' to be?


In the case of foreach(i, x; range) I would expect it to be the 
iteration number of this particular foreach. I miss it sometimes and 
have to create another variable and increment it. I didn't know about 
automatic tuple expansion though; that looks better.


Re: Ranges, constantly frustrating

2014-02-12 Thread Regan Heath

On Tue, 11 Feb 2014 17:11:46 -, Ali Çehreli  wrote:


On 02/11/2014 06:25 AM, Rene Zwanenburg wrote:

On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:


  foreach (i, line; range.take(4))  //Error: cannot infer argument types
  {
..etc..
  }



foreach (i, line; iota(size_t.max).zip(range.take(4)))
{

}


There is also the following, relying on tuples' automatic expansion in  
foreach:


 foreach (i, element; zip(sequence!"n", range.take(4))) {
 // ...
 }


Thanks for the workarounds.  :)  Both seem needlessly opaque, but I  
realise you're not suggesting these are better than the original, just  
that they actually work today.


R



Re: Ranges, constantly frustrating

2014-02-12 Thread Regan Heath
On Tue, 11 Feb 2014 13:11:54 -, Tobias Pankrath   
wrote:



On Tuesday, 11 February 2014 at 13:00:19 UTC, Regan Heath wrote:

import std.stdio;

struct S1 {
   private int[] elements = [9,8,7];
   int opApply (int delegate (ref uint, ref int) block) {
   foreach (uint i, int n ; this.elements)
   block(i, n);
   return 0;
   }
}

void main()
{
S1 range;   
foreach(uint i, int x; range)
{
  writefln("%d is %d", i, x);
}
}

R


byLine does not use opApply
https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1389


Ahh.. so this is a limitation of the range interface.  Any plans to "fix"  
this?


R



Re: Ranges, constantly frustrating

2014-02-12 Thread Regan Heath
On Tue, 11 Feb 2014 19:16:31 -, Steven Schveighoffer  
 wrote:


On Tue, 11 Feb 2014 05:10:27 -0500, Regan Heath   
wrote:



Things like this should "just work"..

File input ...

auto range = input.byLine();
while(!range.empty)
{
   range.popFront();
   foreach (i, line; range.take(4))  //Error: cannot infer argument types
   {
 ..etc..
   }
   range.popFront();
}

Tried adding 'int' and 'char[]' or 'auto' .. no dice.

Can someone explain why this fails, and if this is a permanent or  
temporary limitation of D/MD.


This is only available using opApply style iteration. Using range  
iteration does not give you this ability.


It's not a permanent limitation per se, but there is no plan at the  
moment to add multiple parameters to range iteration.


One thing that IS a limitation though: we cannot overload on return  
values. So the obvious idea of overloading front to return tuples of  
various types, would not be feasible. opApply can do that because the  
delegate is a parameter.


Thanks for the concise/complete response.  I had managed to piece this  
together from other replies but it's clearer now.


R



Re: Ranges, constantly frustrating

2014-02-12 Thread Regan Heath
On Tue, 11 Feb 2014 19:08:18 -, Jesse Phillips  
 wrote:



On Tuesday, 11 February 2014 at 13:00:19 UTC, Regan Heath wrote:

import std.stdio;

struct S1 {
   private int[] elements = [9,8,7];
   int opApply (int delegate (ref uint, ref int) block) {
   foreach (uint i, int n ; this.elements)
   block(i, n);
   return 0;
   }
}

void main()
{
S1 range;


S1 is not a range. But this is a correct response to "Is foreach(i, val;  
aggregate) even defined if aggr is not an array or associated array?"


True, but then I had missed the fact that there are two distinct  
mechanisms (opApply/range) in play here.


R



Re: Ranges, constantly frustrating

2014-02-12 Thread Regan Heath
On Tue, 11 Feb 2014 19:48:40 -, Jesse Phillips  
 wrote:



On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:

Things like this should "just work"..

File input ...

auto range = input.byLine();
while(!range.empty)
{
  range.popFront();
  foreach (i, line; range.take(4))  //Error: cannot infer argument types
  {
..etc..
  }
  range.popFront();
}

Tried adding 'int' and 'char[]' or 'auto' .. no dice.

Can someone explain why this fails, and if this is a permanent or  
temporary limitation of D/MD.


R


In case the other replies weren't clear enough. A range does not have an  
index.


It isn't *required* to (input/forward), but it could (random access).  I  
think we even have a template to test if it's indexable as we can optimise  
some algorithms based on this.


What do you expect 'i' to be? Is it the line number? Is it the index  
within the line where 'take' begins? Where 'take' stops?


If I say take(5) I expect 0,1,2,3,4.  The index into the take range itself.

The reason I wanted it was that I was parsing blocks of data over 6 lines - I  
wanted to ignore the first and last and process the middle 4.  In fact I  
wanted to skip the 2nd of those 4 as well, but there was no single  
function (that I could find) which would do all that, so I coded the while above.
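
One way that could look with range functions from current Phobos, as a sketch only: it works on an in-memory string array rather than byLine (chunks over a one-pass input range may behave differently, and byLine's buffer reuse would need a copy of each line), and the name pickLines is made up.

```d
import std.range : chunks, drop, enumerate, take;

// Hypothetical helper: from each block of 6 lines, skip the first and
// last, then keep the middle 4 except the 2nd of them.
string[] pickLines(string[] lines)
{
    string[] picked;
    foreach (block; lines.chunks(6))
        foreach (i, line; block.drop(1).take(4).enumerate)
            if (i != 1)             // i counts 0, 1, 2, 3 within the take
                picked ~= line;
    return picked;
}

void main()
{
    auto lines = ["h0", "a", "b", "c", "d", "t0",
                  "h1", "e", "f", "g", "h", "t1"];
    assert(pickLines(lines) == ["a", "c", "d", "e", "g", "h"]);
}
```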


There is a feature of foreach and tuple() which results in the tuple  
getting expanded automatically.


And also the opApply overload taking a delegate with both parameters.

byLine has its own issues with reuse of the buffer, it isn't inherent to  
ranges. I haven't really used it (needed it from std.process), when I  
wanted to read a large file I went with wrapping std.mmap:


https://github.com/JesseKPhillips/libosm/blob/master/source/util/filerange.d


Cool, thanks.

R



Re: Ranges, constantly frustrating

2014-02-12 Thread Jakob Ovrum

On Wednesday, 12 February 2014 at 10:44:57 UTC, Regan Heath wrote:
Ahh.. so this is a limitation of the range interface.  Any 
plans to "fix" this?


R


Did my original reply not arrive? It is the first reply in the 
thread...


Reproduced:


See this pull request[1] and the linked enhancement report.

Also note that calling `r.popFront()` without checking 
`r.empty` is a program error (so it's recommended to at least 
put in an assert).


[1] https://github.com/D-Programming-Language/phobos/pull/1866


Re: Ranges, constantly frustrating

2014-02-12 Thread Jesse Phillips

On Wednesday, 12 February 2014 at 10:52:13 UTC, Regan Heath wrote:
On Tue, 11 Feb 2014 19:48:40 -, Jesse Phillips 
 wrote:


On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath 
wrote:

Things like this should "just work"..

File input ...

auto range = input.byLine();
while(!range.empty)
{
 range.popFront();
 foreach (i, line; range.take(4))  //Error: cannot infer argument types
 {


It isn't *required* to (input/forward), but it could (random 
access).  I think we even have a template to test if it's 
indexable as we can optimise some algorithms based on this.


What do you expect 'i' to be? Is it the line number? Is it the 
index within the line where 'take' begins? Where 'take' stops?


If I say take(5) I expect 0,1,2,3,4.  The index into the take 
range itself.


I don't see how these two replies can coexist. 'range.take(5)' is 
a different range from 'range'. 'range' may not traverse in index 
order (personally, I haven't seen such a range). But more 
importantly, you're not dealing with random access ranges. The 
index you're receiving from take(5) can't be used on the range.


Don't get me wrong, counting the elements as you iterate over 
them is useful, but it isn't the index into the range you're 
likely after. Maybe the number is needed to correspond to a line 
number.


There is a feature of foreach and tuple() which results in the 
tuple getting expanded automatically.


And also the opApply overload taking a delegate with both 
parameters.


I'm trying to stick with ranges and not iteration in general.



Re: Ranges, constantly frustrating

2014-02-13 Thread Regan Heath
On Wed, 12 Feb 2014 11:08:57 -, Jakob Ovrum   
wrote:



On Wednesday, 12 February 2014 at 10:44:57 UTC, Regan Heath wrote:
Ahh.. so this is a limitation of the range interface.  Any plans to  
"fix" this?


R


Did my original reply not arrive? It is the first reply in the thread...


It did, thanks.  It would be better if this was part of the language and  
"just worked" as expected, but this is just about as good.


R



Re: Ranges, constantly frustrating

2014-02-13 Thread Regan Heath
On Wed, 12 Feb 2014 21:01:58 -, Jesse Phillips  
 wrote:



On Wednesday, 12 February 2014 at 10:52:13 UTC, Regan Heath wrote:
On Tue, 11 Feb 2014 19:48:40 -, Jesse Phillips  
 wrote:



On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:

Things like this should "just work"..

File input ...

auto range = input.byLine();
while(!range.empty)
{
 range.popFront();
 foreach (i, line; range.take(4))  //Error: cannot infer argument types
 {


It isn't *required* to (input/forward), but it could (random access).   
I think we even have a template to test if it's indexable as we can  
optimise some algorithms based on this.


You chopped off your own comment prompting this response, in which I am  
responding to a minor side-point, which I think has confused the actual  
issue.  All I was saying above was that a range might well have an index,  
and we can test for that, but it's not relevant to the foreach issue below.


What do you expect 'i' to be? Is it the line number? Is it the index  
within the line where 'take' begins? Where 'take' stops?


If I say take(5) I expect 0,1,2,3,4.  The index into the take range  
itself.


I don't see how these two replies can coexist. 'range.take(5)' is a  
different range from 'range'.


Yes, exactly, meaning that it can trivially "count" the items it returns,  
starting from 0, and give those to me as 'i'.  *That's all I want*


'range' may not traverse in index order (personally, I haven't seen such a  
range). But more importantly, you're not dealing with random access  
ranges. The index you're receiving from take(5) can't be used on the  
range.


A forward range can do what I am describing above, it's trivial.

Don't get me wrong, counting the elements as you iterate over them is  
useful, but it isn't the index into the range you're likely after.


Nope, not what I am after.  If I was, I'd iterate over the original range  
instead or keep a line count manually.



Maybe the number is needed to correspond to a line number.


Nope.  The file contains records of 5 lines plus a blank line.  I want 0,  
1, 2, 3, 4, 5 so I can skip lines 0, 2, and 5 *of each record*.
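
For what it's worth, the record use case can be sketched with enumerate and a modulo. This is only a sketch under assumptions: it presumes a Phobos release that ships std.range.enumerate (the pull request mentioned earlier in the thread), and keepWantedLines is a made-up helper name, not anything from Phobos. Any input range of lines should do in place of the array.

```d
import std.range : ElementType, enumerate, isInputRange;

/// Hypothetical helper: keeps only the lines whose position inside a
/// 6-line record (5 data lines plus one blank line) is not 0, 2 or 5.
ElementType!R[] keepWantedLines(R)(R lines) if (isInputRange!R)
{
    enum recordLength = 6;
    ElementType!R[] kept;

    // enumerate yields (index, value) tuples; foreach expands them
    // into the two loop variables.
    foreach (i, line; lines.enumerate)
    {
        immutable pos = i % recordLength; // position within the current record
        if (pos == 0 || pos == 2 || pos == 5)
            continue;
        kept ~= line;
    }
    return kept;
}
```

Note that with byLine the buffer is reused between lines, so storing the lines like this would need a copy of each line first.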


R



Re: Ranges, constantly frustrating

2014-02-13 Thread Jesse Phillips

On Thursday, 13 February 2014 at 14:30:41 UTC, Regan Heath wrote:
Don't get me wrong, counting the elements as you iterate over 
them is useful, but it isn't the index into the range you're 
likely after.


Nope, not what I am after.  If I was, I'd iterate over the 
original range instead or keep a line count manually.


Maybe a better way to phrase this is: while counting may be what 
your implementation needs, it is not immediately obvious what 
'i' should be. Someone who desires an index into the original 
array will expect 'i' to be that, even though it can be explained 
that .take() is not the same range as the original.


Thus it is better to be explicit with the .enumerate function.


Re: Ranges, constantly frustrating

2014-02-14 Thread Regan Heath
On Fri, 14 Feb 2014 02:48:51 -, Jesse Phillips  
 wrote:



On Thursday, 13 February 2014 at 14:30:41 UTC, Regan Heath wrote:
Don't get me wrong, counting the elements as you iterate over them is  
useful, but it isn't the index into the range you're likely after.


Nope, not what I am after.  If I was, I'd iterate over the original  
range instead or keep a line count manually.


Maybe a better way to phrase this is: while counting may be what your  
implementation needs, it is not immediately obvious what 'i' should be.  
Someone who desires an index into the original array will expect 'i' to  
be that, even though it can be explained that .take() is not the same  
range as the original.


Thus it is better to be explicit with the .enumerate function.


FWIW I disagree.  I think it's immediately and intuitively obvious what  
'i' should be when you're foreaching over X items taken from another  
range, even if you do not know take returns another range.  Compare it to  
calling a function on a range and foreaching on the result, you would  
intuitively and immediately expect 'i' to relate to the result, not the  
input.


R



Re: Ranges, constantly frustrating

2014-02-14 Thread Jakob Ovrum

On Friday, 14 February 2014 at 12:10:51 UTC, Regan Heath wrote:
FWIW I disagree.  I think it's immediately and intuitively 
obvious what 'i' should be when you're foreaching over X items 
taken from another range, even if you do not know take returns 
another range.  Compare it to calling a function on a range and 
foreaching on the result, you would intuitively and immediately 
expect 'i' to relate to the result, not the input.


R


How should it behave on ranges without length, such as infinite 
ranges?


Also, `enumerate` has the advantage of the `start` parameter, 
whose usefulness is demonstrated in `enumerate`'s example as well 
as in an additional example in the bug report.
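
To illustrate both points, here is a small sketch, assuming a Phobos release that includes std.range.enumerate. numberedDots is a made-up name for the example. The source range is infinite and has no length, and the counter starts wherever `start` says:

```d
import std.array : array;
import std.range : enumerate, repeat, take;

/// Hypothetical helper: numbers the first `n` elements of an infinite
/// range, counting from `start`.
auto numberedDots(size_t n, size_t start)
{
    return "."
        .repeat            // infinite range: ".", ".", ".", ...
        .enumerate(start)  // no length needed, the counter just keeps going
        .take(n)           // cut it down to something finite
        .array;            // array of (index, value) tuples
}
```

Each element of the result has an `index` and a `value` field, so `numberedDots(3, 1)` carries the indices 1, 2 and 3.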


I'm not yet sure whether I think it should be implemented at the 
language or library level, but I think the library approach has 
some advantages.


Re: Ranges, constantly frustrating

2014-02-14 Thread bearophile

Regan Heath:

FWIW I disagree.  I think it's immediately and intuitively 
obvious what 'i' should be when you're foreaching over X items 
taken from another range, even if you do not know take returns 
another range.  Compare it to calling a function on a range and 
foreaching on the result, you would intuitively and immediately 
expect 'i' to relate to the result, not the input.


Using enumerate has several advantages. It gives a bit longer 
code, but it keeps as much complexity as possible out of the 
language. So the language gets simpler to implement and its 
compiler is smaller and simpler to debug.


Also, using enumerate is more explicit, if you have an 
associative array you can iterate it in many ways:


foreach (v; AA) {}
foreach (k, v; AA) {}
foreach (k; AA.byKeys) {}
foreach (i, k; AA.byKeys.enumerate) {}
foreach (i, v; AA.byValues.enumerate) {}
foreach (k, v; AA.byPairs) {}
foreach (i, k, v; AA.byPairs.enumerate) {}

If you want all those schemes built in a language (and to use 
them without adding .enumerate) you risk making a mess. In this 
case "explicit is better than implicit".
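
For reference, a sketch of how some of these look with names that do exist: the built-in AA properties are byKey, byValue and (in newer releases) byKeyValue, rather than the illustrative byKeys/byValues/byPairs above. numberedKeys and sumValues are made-up helper names, and std.range.enumerate is assumed to be available. The keys are sorted first only because AA iteration order is unspecified.

```d
import std.algorithm : sort;
import std.array : array;
import std.conv : text;
import std.range : enumerate;

/// Hypothetical helper: numbers the keys of an associative array.
string[] numberedKeys(int[string] aa)
{
    string[] lines;
    // byKey is a lazy range over the keys; sort a copy for a stable order,
    // then let enumerate attach the counter explicitly.
    foreach (i, k; aa.byKey.array.sort().enumerate)
        lines ~= text(i, ": ", k);
    return lines;
}

/// Hypothetical helper: walks key/value pairs without any counter.
int sumValues(int[string] aa)
{
    int total;
    foreach (pair; aa.byKeyValue) // each pair exposes .key and .value
        total += pair.value;
    return total;
}
```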


Python does the same with its enumerate function and keeps the 
for loop simple:


for k in my_dict: pass
for i, v in enumerate(my_dict.itervalues()): pass
etc.

In D we have a mess because tuples are not built-in. Instead of 
having a built-in functionality similar to what enumerate does, 
it's WAY better to have built-in tuples. Finding what's important 
and what is not important to have as built-ins in a language is 
an essential and subtle design problem.


Bye,
bearophile


Re: Ranges, constantly frustrating

2014-02-14 Thread Regan Heath
On Fri, 14 Feb 2014 13:14:51 -, bearophile   
wrote:



Regan Heath:

FWIW I disagree.  I think it's immediately and intuitively obvious what  
'i' should be when you're foreaching over X items taken from another  
range, even if you do not know take returns another range.  Compare it  
to calling a function on a range and foreaching on the result, you  
would intuitively and immediately expect 'i' to relate to the result,  
not the input.


Using enumerate has several advantages.


In my case I didn't need any of these.  Simple things should be simple and  
intuitive to write.  Yes, we want enumerate *as well* especially for the  
more complex cases but we also want the basics to be simple, intuitive and  
easy.


That's all I'm saying here.  This seems to me to be very low hanging fruit.

R



Re: Ranges, constantly frustrating

2014-02-14 Thread Regan Heath
On Fri, 14 Feb 2014 12:29:49 -, Jakob Ovrum   
wrote:



On Friday, 14 February 2014 at 12:10:51 UTC, Regan Heath wrote:
FWIW I disagree.  I think it's immediately and intuitively obvious what  
'i' should be when you're foreaching over X items taken from another  
range, even if you do not know take returns another range.  Compare it  
to calling a function on a range and foreaching on the result, you  
would intuitively and immediately expect 'i' to relate to the result,  
not the input.


R


How should it behave on ranges without length, such as infinite ranges?


In exactly the same way.  It just counts up until you break out of the  
foreach, or the 'i' value wraps around.  In fact the behaviour I want is  
so trivial I think it could be provided by foreach itself, for iterations  
of anything.  In which case whether 'i' was conceptually an "index" or  
simply a "count" would depend on whether the range passed to foreach  
(after all skip, take, etc) was itself indexable.


Also, `enumerate` has the advantage of the `start` parameter, whose  
usefulness is demonstrated in `enumerate`'s example as well as in an  
additional example in the bug report.


Sure, if you need more functionality reach for enumerate.  We can have  
both;  sensible default behaviour AND enumerate for more complicated  
cases.  In my case, enumerate w/ start wouldn't have helped (my file was  
blocks of 6 lines, where I wanted to skip lines 1, 3, and 6 *of each  
block*)


I'm not yet sure whether I think it should be implemented at the  
language or library level, but I think the library approach has some  
advantages.


Certainly, for the more complex usage.  But I reckon we want both  
enumerate and a simple language solution which would do what I've been  
trying to describe.


R



Re: Ranges, constantly frustrating

2014-02-14 Thread bearophile

Regan Heath:


In my case I didn't need any of these.


I don't understand.

Bye,
bearophile


Re: Ranges, constantly frustrating

2014-02-14 Thread bearophile
Isn't this discussion about adding an index to a range? If it is, 
then I have shown why adding it in the language is a bad idea.


Bye,
bearophile


Re: Ranges, constantly frustrating

2014-02-14 Thread Marc Schütz

On Friday, 14 February 2014 at 17:42:53 UTC, bearophile wrote:
Isn't this discussion about adding an index to a range? If it 
is, then I have shown why adding it in the language is a bad 
idea.


As far as I understand it, it's about adding an index to 
_foreach_, as is already supported for arrays:


foreach(v; [1,2,3,4])
writeln(v);
foreach(i, v; [1,2,3,4])
writeln(i, " => ", v);

But for ranges, the second form is not possible:

foreach(v; iota(4))   // ok
writeln(v);
foreach(i, v; iota(4))// Error: cannot infer argument types
writeln(i, " => ", v);
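
With the library approach, the second form can be written by asking for the counter explicitly. A minimal sketch, assuming std.range.enumerate is available in the Phobos release at hand; indexedIota is a made-up name and returns the lines instead of printing them, so the result is easy to inspect:

```d
import std.conv : text;
import std.range : enumerate, iota;

/// Hypothetical helper: formats each element of iota(10, 10 + n)
/// as "index => value".
string[] indexedIota(int n)
{
    string[] lines;
    // enumerate wraps the range so that each element is an
    // (index, value) tuple, which foreach expands into i and v.
    foreach (i, v; iota(10, 10 + n).enumerate)
        lines ~= text(i, " => ", v);
    return lines;
}
```

Here `i` counts iterations from 0 while `v` is the element itself (10, 11, ...), so the two are easy to tell apart.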


Re: Ranges, constantly frustrating

2014-02-14 Thread bearophile

Marc Schütz:

As far as I understand it, it's about adding an index to 
_foreach_, as is already supported for arrays:


foreach(v; [1,2,3,4])
writeln(v);
foreach(i, v; [1,2,3,4])
writeln(i, " => ", v);

But for ranges, the second form is not possible:

foreach(v; iota(4))   // ok
writeln(v);
foreach(i, v; iota(4))// Error: cannot infer argument types
writeln(i, " => ", v);


I see. In my post I have explained why this is a bad idea (it's 
not explicit so it gives confusion, and it complicates the 
language/compiler).


A better design is to remove the auto-indexing feature for arrays 
too, and use .enumerate in all cases, as in Python.


Bye,
bearophile


Re: Ranges, constantly frustrating

2014-02-17 Thread Regan Heath
This turned into a bit of a full spec so I would understand if you TL;DR  
but it would be nice to get some feedback if you have the time..


On Fri, 14 Feb 2014 17:34:46 -, bearophile   
wrote:

Regan Heath:


In my case I didn't need any of these.


I don't understand.


What I meant here is that I don't need the "advantages" provided by  
enumerate like the starting index.


One thing I am unclear about from your response is what you mean by  
implicit in this context?  Do you mean the process of inferring things  
(like the types in foreach)?


(taken from subsequent reply)

Isn't this discussion about adding an index to a range?


No, it's not.  The counter I want would only be an index if the range was  
indexable, otherwise it's a count of foreach iterations (starting from  
0).  This counter is (if you like) an "index into the result set" which is  
not necessarily also an index into the source range (which may not be  
indexable).


What we currently have with foreach is an index and only for indexable  
things.  I want to instead generalise this to be a counter which is an  
index when the thing being enumerated is indexable, otherwise it is a  
count or "index into the result set".


Let's call this change scheme #0.  It solves my issue, and interestingly  
also would have meant we didn't need to add byKey or byValue to AA's,  
instead we could have simply made keys/values indexable ranges and not  
broken any existing code.


Further details of scheme #0 below.

(taken from subsequent reply)
If you want all those schemes built in a language (and to use them  
without adding .enumerate) you risk making a mess. In this case  
"explicit is better than implicit".


Have a read of what I have below and let me know if you think it's a  
"mess".  Scheme #2 has more rules, and might be called a "mess" perhaps.   
But, scheme #1 is fairly clean and simple and I think better overall.  The  
one downside is that without some additional syntax it cannot put tuple  
components nicely in context with descriptive variable names, so there is  
that.


To be fair to all 3 schemes below, they mostly "just work" for simple  
cases and/or cases where different types are used for key/values in AA's  
and tuples.  The more complicated rules only kick in to deal with the  
cases where there is ambiguity (AA's with the same type for key and value  
and tuples with multiple components of the same type).


Anyway, on to the details..

***

Scheme 0) So, what I want is for foreach to simply increment a counter  
after each call to the body of the foreach, giving me a counter from 0 to  
N (or infinity/wrap).  It would do this when prompted to do so by a  
variable being supplied in the foreach statement in the usual way (for  
arrays/opApply)


This counter would not be defined/understood to be an "index" into the  
object being enumerated necessarily (as it currently is), instead if the  
object is indexable then it would indeed be an index, otherwise it's a  
count (index into the result set).


I had not been considering associative arrays until now, given current  
support (without built in tuples) they do not seem to be a special case to  
me.  Foreach over byKey() should look/function identically to foreach over  
keys, likewise for byValue().  The only difference is that in the  
byKey()/byValue() case the counter is not necessarily an index into  
anything, though it would be if the underlying byKey() range was indexable.


The syntax for this, is the same as we have for arrays/classes with  
opApply today.  In other words, "it just works" and my example would  
compile and run as one might expect.
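
As a rough illustration of what scheme #0 asks foreach to do, here is the same loop written by hand. This is only a sketch of the idea, not how the compiler lowers foreach; withCounter is a made-up name. The counter is bumped once per iteration and is never used to index into the range.

```d
import std.range : ElementType, empty, front, isInputRange, popFront;
import std.typecons : Tuple, tuple;

/// Hypothetical helper: pairs every element of an input range with the
/// number of iterations that came before it.
Tuple!(size_t, ElementType!R)[] withCounter(R)(R range) if (isInputRange!R)
{
    typeof(return) result;
    // `i` is a plain count of iterations. It happens to be a valid
    // index only if `range` is itself indexable.
    for (size_t i = 0; !range.empty; range.popFront(), ++i)
        result ~= tuple(i, range.front);
    return result;
}
```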


This seems to me to be intuitive, useful and easy to implement.  Further,  
I believe it leaves the door open to having built in tuples (or using  
library extensions like enumerate()), with similarly clean syntax and no  
"mess".


***

So, what if we had built in tuples?  Well, seems to me we could do foreach  
over AAs/tuples in one of 2 ways or even a combination of both:


Scheme 1) for AA's/tuples the value given to the foreach body is a  
voldemort (unnamed) type with a public property member for each component  
of the AA/tuple.  In the case of AA's this would then be "key" and  
"value", for tuples it might be a, b, .., z, aa, bb, .. and so on.


foreach(x; AA) {}// Use x.key and x.value
foreach(i, x; AA) {} // Use i, x.key and x.value
foreach(int i, x; AA) {} // Use i, x.key and x.value

Extra/better: For non-AA tuples we could allow the members to be named  
using some sort of syntax, i.e.


foreach(i, (x.bob, x.fred); AA) {} // Use i, x.bob and x.fred
or
foreach(i, x { int bob; string fred }; AA) {} // Use i, x.bob and x.fred
or
foreach(i, new x { int bob; string fred }; AA) {} // Use i, x.bob and x.fred



Let's look at your examples re-written for scheme #1


foreach (v; AA) {}

foreach (x; AA) { .. use x.value .. } // better? worse?


foreach (k, v; AA) {}

foreach (x; AA

Re: ranges reading garbage

2015-02-15 Thread bearophile via Digitalmars-d-learn

John Colvin:

prints things like [0, 4, 5, 1, 1, 1459971595, 1459971596, 2, 
2, 1459971596, 1459971597, 3, 4, 8, 9, 5, 5, 4441427819, 
4441427820, 6, 6, 4441427820, 4441427821, 7] but the output 
isn't consistent, the big numbers change on each run.


Try to replace the only() with:

[y, y+ys.length, y+ys.length+1, y+1]

Like this:


import std.range, std.algorithm, std.stdio;

void foo(in float[] data, in float[] xs, in float[] ys) @safe {
    iota(0, data.length, ys.length)
        .map!(xBase => iota(xBase, xBase + ys.length - 1)
                       .map!(y => [y, y+ys.length, y+ys.length+1, y+1])
                       .joiner)
        .joiner
        .writeln;
}

void main() {
    foo([1,2,3,4,5,6,7,8], [0.1,0.2], [10,20,30,40]);
}



In Rust the compiler enforces that all stack-allocated data 
doesn't come from dead stack frames. In D you have to be careful 
to avoid doing it. In the future this kind of bug will hopefully 
be avoided by better tracking of the memory.


I am not sure if http://wiki.dlang.org/DIP69 is able to avoid 
this bug; if it can't, then DIP69 needs to be improved.


Bye,
bearophile


Re: ranges reading garbage

2015-02-15 Thread FG via Digitalmars-d-learn

On 2015-02-15 at 19:43, bearophile wrote:

void foo(in float[] data, in float[] xs, in float[] ys) @safe {
 iota(0, data.length, ys.length)
 .map!(xBase => iota(xBase, xBase + ys.length - 1)
.map!(y => [y, y+ys.length, y+ys.length+1, y+1])
.joiner)
 .joiner
 .writeln;
}

void main() {
 foo([1,2,3,4,5,6,7,8], [0.1,0.2], [10,20,30,40]);
}


Odd... Still something is wrong. It prints:
[0, 4, 5, 1, 1, 5, 6, 2, 2, 6, 7, 3, 4, 8, 9, 5, 5, 5, 6, 6, 6, 6, 7, 7]

instead of this:
[0, 4, 5, 1, 1, 5, 6, 2, 2, 6, 7, 3, 4, 8, 9, 5, 5, 9, 10, 6, 6, 10, 11, 7]
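
One way to double-check the expected sequence is to compute it eagerly with plain loops, with no lazy ranges or closures involved. A sketch only; quadIndices is a made-up name, and its two parameters stand for data.length and ys.length from the code above.

```d
/// Hypothetical reference version: eagerly builds the indices that the
/// lazy iota/map/joiner pipeline is meant to produce.
size_t[] quadIndices(size_t dataLength, size_t ysLength)
{
    size_t[] result;
    // Outer loop mirrors iota(0, data.length, ys.length).
    for (size_t xBase = 0; xBase < dataLength; xBase += ysLength)
    {
        // Inner loop mirrors iota(xBase, xBase + ys.length - 1).
        foreach (y; xBase .. xBase + ysLength - 1)
            result ~= [y, y + ysLength, y + ysLength + 1, y + 1];
    }
    return result;
}
```

For the inputs used in the thread, `quadIndices(8, 4)` gives the second of the two sequences above.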


Re: ranges reading garbage

2015-02-15 Thread John Colvin via Digitalmars-d-learn

On Sunday, 15 February 2015 at 18:43:35 UTC, bearophile wrote:

John Colvin:

prints things like [0, 4, 5, 1, 1, 1459971595, 1459971596, 2, 
2, 1459971596, 1459971597, 3, 4, 8, 9, 5, 5, 4441427819, 
4441427820, 6, 6, 4441427820, 4441427821, 7] but the output 
isn't consistent, the big numbers change on each run.


Try to replace the only() with:

[y, y+ys.length, y+ys.length+1, y+1]

Like this:


import std.range, std.algorithm, std.stdio;

void foo(in float[] data, in float[] xs, in float[] ys) @safe {
    iota(0, data.length, ys.length)
        .map!(xBase => iota(xBase, xBase + ys.length - 1)
                       .map!(y => [y, y+ys.length, y+ys.length+1, y+1])
                       .joiner)
        .joiner
        .writeln;
}

void main() {
    foo([1,2,3,4,5,6,7,8], [0.1,0.2], [10,20,30,40]);
}



In Rust the compiler enforces that all stack-allocated data 
doesn't come from dead stack frames. In D you have to be 
careful to avoid doing it. In the future this kind of bug will 
hopefully be avoided by better tracking of the memory.


I am not sure if http://wiki.dlang.org/DIP69 is able to avoid 
this bug; if it can't, then DIP69 needs to be improved.


Bye,
bearophile


But std.range.OnlyResult!(size_t, 4) is a value type, I don't see 
where the stack reference is being leaked.


Re: ranges reading garbage

2015-02-15 Thread bearophile via Digitalmars-d-learn

FG:


Odd... Still something is wrong. It prints:
[0, 4, 5, 1, 1, 5, 6, 2, 2, 6, 7, 3, 4, 8, 9, 5, 5, 5, 6, 6, 6, 
6, 7, 7]


instead of this:
[0, 4, 5, 1, 1, 5, 6, 2, 2, 6, 7, 3, 4, 8, 9, 5, 5, 9, 10, 6, 
6, 10, 11, 7]


This is less lazy and gives another result:

import std.range, std.algorithm, std.stdio;

void foo(in float[] data, in float[] xs, in float[] ys) @safe {
    iota(0, data.length, ys.length)
        .map!(xBase => iota(xBase, xBase + ys.length - 1)
                       .map!(y => [y, y+ys.length, y+ys.length+1, y+1])
                       .join)
        .join
        .writeln;
}

void main() {
    foo([1,2,3,4,5,6,7,8], [0.1,0.2], [10,20,30,40]);
}


What a fun program :-)

Bye,
bearophile


Re: ranges reading garbage

2015-02-15 Thread anonymous via Digitalmars-d-learn

On Sunday, 15 February 2015 at 18:13:44 UTC, John Colvin wrote:

Simplified from something bigger:

import std.range, std.algorithm, std.stdio;

void foo(float[] data, float[] xs, float[] ys)
{
    auto indices = iota(0, data.length, ys.length)
        .map!(xBase =>
            iota(xBase, xBase + ys.length - 1)
            .map!(y =>
                only(y, y+ys.length, y+ys.length+1, y+1))
            .joiner())
        .joiner();
    writeln(indices);
}

void main()
{
    foo([1,2,3,4,5,6,7,8],
        [0.1,0.2], [10,20,30,40]);
}

prints things like [0, 4, 5, 1, 1, 1459971595, 1459971596, 2, 
2, 1459971596, 1459971597, 3, 4, 8, 9, 5, 5, 4441427819, 
4441427820, 6, 6, 4441427820, 4441427821, 7] but the output 
isn't consistent, the big numbers change on each run.


Reduced some more:

import std.algorithm, std.stdio;
void main()
{
    int ys_length = 4;
    auto indices = [0]
        .map!(xBase => [0].map!(y => ys_length))
        .joiner();
    writeln(indices);
}


Re: ranges reading garbage

2015-02-15 Thread anonymous via Digitalmars-d-learn

On Sunday, 15 February 2015 at 19:54:45 UTC, anonymous wrote:

Reduced some more:

import std.algorithm, std.stdio;
void main()
{
    int ys_length = 4;
    auto indices = [0]
        .map!(xBase => [0].map!(y => ys_length))
        .joiner();
    writeln(indices);
}


And more:

import std.stdio;
struct MapResult(alias fun)
{
    @property int front() {return fun();}
    @property auto save() {return typeof(this)();}
}
void main()
{
    int ys_length = 4;
    auto dg = {return MapResult!({return ys_length;})();};
    writeln(dg().front); /* 4, correct */
    writeln(dg().save.front); /* garbage */
}

