Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-15 Thread H. S. Teoh
In an effort to locate the suspected Phobos bug that I ran into in my
new Cartesian product implementation, I looked over the code for
std.algorithm joiner more carefully, and noticed the following:

Given a range of ranges ror, it assigns ror.front to a struct member and
then calls ror.popFront() immediately. Then as the user calls the joined
range's popFront, it successively pops its local copy of ror.front until
it's empty, whereupon it assigns the new ror.front to the local copy
again, and so on.

While this works for array of arrays, it *doesn't* work for ranges that
overwrite ror.front when ror.popFront() is called.  For example, it
wouldn't work for stdin.byLine(), because the underlying buffer it
reused. Here's the proof:

Code:
// Filename: test2.d
import std.algorithm;
import std.stdio;
void main() {
auto lines = stdin.byLine();
writeln(lines.joiner);
}

Compile & test:

# Compile
dmd test2.d

# Echo three lines, "abc", "def", and "ghi", and feed it to
# the program.
(echo abc; echo def; echo ghi) | ./test2

# This is the output:
defghighi

As you can see, the output is mangled, because byLine() has reused the
line buffer before joiner.Result got to it.

Long story short: saving a local copy of range.front and then calling
range.popFront() may _invalidate_ the copy. So basically, either you
need a forward range and use .save to keep track of old values of
range.front, or you have to refrain from calling popFront() until you're
well and truly done with the current value of .front. While not
respecting this will work with many common ranges, it will also fail in
subtle ways when given other ranges.

The scary thing is, I see similar code like this all over Phobos. Does
this mean that most of std.algorithm may need to be revised to address
this issue? At the very least, it would seem that a code audit is in
order to weed out this particular issue.

:-/


T

-- 
A linguistics professor was lecturing to his class one day. "In
English," he said, "A double negative forms a positive. In some
languages, though, such as Russian, a double negative is still a
negative. However, there is no language wherein a double positive can
form a negative."
A voice from the back of the room piped up, "Yeah, yeah."


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-15 Thread Jonathan M Davis
On Monday, October 15, 2012 22:09:05 H. S. Teoh wrote:
> The scary thing is, I see similar code like this all over Phobos. Does
> this mean that most of std.algorithm may need to be revised to address
> this issue? At the very least, it would seem that a code audit is in
> order to weed out this particular issue.

Probably related to http://d.puremagic.com/issues/show_bug.cgi?id=6495

This was discussed in http://d.puremagic.com/issues/show_bug.cgi?id=8084

The thing is that realistically, it's going to be a big problem to not be able 
to rely on front returning the same value on every call or not being able to 
rely on the result of front still being around after the call to popFront. 
Ranges such as ByLine are incredibly abnormal, and it's arguably reasonable 
that they must be treated specially by the programmer using them. But that 
could pose a usability problem as well.

So, I don't really know what the right answer is, but I _really_ don't like 
the idea of having to worry about the result of front changing after a call to 
popFront in every single function that ever uses front. In the general case, I 
just don't see how that's tenable. I'd _much_ rather that it be up to the 
programmer using abnormal ranges such as ByLine to use them correctly.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-15 Thread Jonathan M Davis
On Monday, October 15, 2012 22:48:15 Jonathan M Davis wrote:
> So, I don't really know what the right answer is, but I _really_ don't like
> the idea of having to worry about the result of front changing after a call
> to popFront in every single function that ever uses front. In the general
> case, I just don't see how that's tenable. I'd _much_ rather that it be up
> to the programmer using abnormal ranges such as ByLine to use them
> correctly.

And actually, it seems to me that issues like this make it look like it was a 
mistake to make ranges like ByLine ranges in the first place. They should have 
just defined opApply so that they'd work nicely in foreach but not with range-
based functions. They're clearly not going to work with a _lot_ of range-based 
functions.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Mehrdad
On Tuesday, 16 October 2012 at 06:03:56 UTC, Jonathan M Davis 
wrote:
And actually, it seems to me that issues like this make it look 
like it was a
mistake to make ranges like ByLine ranges in the first place. 
They should have
just defined opApply so that they'd work nicely in foreach but 
not with range-
based functions. They're clearly not going to work with a _lot_ 
of range-based

functions.

- Jonathan M Davis


+1


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread H. S. Teoh
On Mon, Oct 15, 2012 at 10:59:26PM -0700, Jonathan M Davis wrote:
> On Monday, October 15, 2012 22:48:15 Jonathan M Davis wrote:
> > So, I don't really know what the right answer is, but I _really_
> > don't like the idea of having to worry about the result of front
> > changing after a call to popFront in every single function that ever
> > uses front. In the general case, I just don't see how that's
> > tenable. I'd _much_ rather that it be up to the programmer using
> > abnormal ranges such as ByLine to use them correctly.
> 
> And actually, it seems to me that issues like this make it look like
> it was a mistake to make ranges like ByLine ranges in the first place.
> They should have just defined opApply so that they'd work nicely in
> foreach but not with range- based functions. They're clearly not going
> to work with a _lot_ of range-based functions.
[...]

But nothing about the definition of a range, as currently defined in
std.range, guarantees that whatever value was returned by .front is
still valid after popFront() is called.

I'm not saying that this should be the "correct" behaviour, but the
current definition does not prohibit a range from, say, reusing an array
to compute its next element. For example, one may have a range that
returns the permutations of a given array, in which popFront() permutes
the elements in-place. In this case, .front will become invalid once
popFront() is called. Many of the current range-based functions will not
work correctly in this case.

Of course, it's arguable whether such ranges should be admissible, but
as far as the current definition goes, I don't see anything that states
otherwise. If we don't make such things clear, then we're bound to run
into pathological cases where bugs will creep in because of unstated
assumptions in the code.


T

-- 
Fact is stranger than fiction.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread bearophile

Jonathan M Davis:

And actually, it seems to me that issues like this make it look 
like it was a
mistake to make ranges like ByLine ranges in the first place. 
They should have
just defined opApply so that they'd work nicely in foreach but 
not with range-
based functions. They're clearly not going to work with a _lot_ 
of range-based

functions.


I like byLine() to be a range, so it's compatible with 
std.algorithm stuff. But a short time after the creation of 
byLine() I suggested to make it copy (dup) lines on default and 
not copy them on request, this means:


auto lineBuff = "foo.txt".File().byLine().array();
==> good result


auto lineBuff = "foo.txt".File().byLine!false().array();
==> doesn't copy, garbage result.

I design my ranges like that. It's safe because on default (or if 
you don't know what you are doing) it copies, and it's a bit 
slower. When you know what you are doing and you want more speed, 
you disable the copy with a compile-time argument.


Bye,
bearophile


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread bearophile

H. S. Teoh:


If we don't make such things clear, then we're bound to run
into pathological cases where bugs will creep in because of 
unstated assumptions in the code.


A good language/std library has to protect the ignorant, where 
possible. And this is a place where it's easy to do this.


Bye,
bearophile


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Joseph Rushton Wakeling

On Tuesday, 16 October 2012 at 14:23:28 UTC, bearophile wrote:
I design my ranges like that. It's safe because on default (or 
if you don't know what you are doing) it copies, and it's a bit 
slower. When you know what you are doing and you want more 
speed, you disable the copy with a compile-time argument.


Have to say that in my (admittedly not so extensive) experience 
of byLine, it's slow enough anyway that I can't imagine the extra 
performance hit you describe would be that onerous.  Certainly 
worth it for making it "safe" as a range.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Tommi
On Tuesday, 16 October 2012 at 05:49:11 UTC, Jonathan M Davis 
wrote:
So, I don't really know what the right answer is, but I 
_really_ don't like the idea of having to worry about

the result of front changing after a call to popFront
in every single function that ever uses front.


Isn't the deeper underlying problem here the fact that there's no 
generic way to make a deep copy in this language (does any 
language?) Making a deep copy of front would be a clean solution. 
I don't know how to implement this generic deep copying 
functionality though. For example: what would be the semantics of 
deep-copying a range?


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread H. S. Teoh
On Tue, Oct 16, 2012 at 05:01:57PM +0200, Tommi wrote:
> On Tuesday, 16 October 2012 at 05:49:11 UTC, Jonathan M Davis wrote:
> >So, I don't really know what the right answer is, but I _really_
> >don't like the idea of having to worry about the result of front
> >changing after a call to popFront in every single function that ever
> >uses front.
> 
> Isn't the deeper underlying problem here the fact that there's no
> generic way to make a deep copy in this language (does any language?)
> Making a deep copy of front would be a clean solution. I don't know
> how to implement this generic deep copying functionality though. For
> example: what would be the semantics of deep-copying a range?

I don't think the problem here is whether we have a deep copy or not,
but that the semantics of ranges are not defined clearly enough. If we
clearly state, for example, that whatever is returned by .front must
persist beyond popFront(), then it would be up to the range to implement
the deep copy, e.g., return the .dup or .idup'd array instead of a
reference to an internal buffer.

OTOH, if we clearly state that .front may not persist past the next
popFront(), then it would be up to the caller to .dup the return value,
or otherwise make a safe copy of it (or use the value before calling
popFront()).

The current ambiguous situation leads to one range doing one thing, and
another range doing something else, and either way, either this code
will break or that code will break.


T

-- 
"Holy war is an oxymoron." -- Lazarus Long


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread deadalnix

Le 16/10/2012 15:30, H. S. Teoh a écrit :

On Mon, Oct 15, 2012 at 10:59:26PM -0700, Jonathan M Davis wrote:

On Monday, October 15, 2012 22:48:15 Jonathan M Davis wrote:

So, I don't really know what the right answer is, but I _really_
don't like the idea of having to worry about the result of front
changing after a call to popFront in every single function that ever
uses front. In the general case, I just don't see how that's
tenable. I'd _much_ rather that it be up to the programmer using
abnormal ranges such as ByLine to use them correctly.


And actually, it seems to me that issues like this make it look like
it was a mistake to make ranges like ByLine ranges in the first place.
They should have just defined opApply so that they'd work nicely in
foreach but not with range- based functions. They're clearly not going
to work with a _lot_ of range-based functions.

[...]

But nothing about the definition of a range, as currently defined in
std.range, guarantees that whatever value was returned by .front is
still valid after popFront() is called.

I'm not saying that this should be the "correct" behaviour, but the
current definition does not prohibit a range from, say, reusing an array
to compute its next element. For example, one may have a range that
returns the permutations of a given array, in which popFront() permutes
the elements in-place. In this case, .front will become invalid once
popFront() is called. Many of the current range-based functions will not
work correctly in this case.

Of course, it's arguable whether such ranges should be admissible, but
as far as the current definition goes, I don't see anything that states
otherwise. If we don't make such things clear, then we're bound to run
into pathological cases where bugs will creep in because of unstated
assumptions in the code.


T



The default semantic should be the safe one.

Providing backdoor for experienced developer is useful, but it does arm 
as default behavior.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread deadalnix

Le 16/10/2012 17:17, H. S. Teoh a écrit :

On Tue, Oct 16, 2012 at 05:01:57PM +0200, Tommi wrote:

On Tuesday, 16 October 2012 at 05:49:11 UTC, Jonathan M Davis wrote:

So, I don't really know what the right answer is, but I _really_
don't like the idea of having to worry about the result of front
changing after a call to popFront in every single function that ever
uses front.


Isn't the deeper underlying problem here the fact that there's no
generic way to make a deep copy in this language (does any language?)
Making a deep copy of front would be a clean solution. I don't know
how to implement this generic deep copying functionality though. For
example: what would be the semantics of deep-copying a range?


I don't think the problem here is whether we have a deep copy or not,
but that the semantics of ranges are not defined clearly enough. If we
clearly state, for example, that whatever is returned by .front must
persist beyond popFront(), then it would be up to the range to implement
the deep copy, e.g., return the .dup or .idup'd array instead of a
reference to an internal buffer.



It wouldn't be enough. Does several call to front recompute things ? See 
http://d.puremagic.com/issues/show_bug.cgi?id=8803


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Peter Alexander

On Tuesday, 16 October 2012 at 15:47:49 UTC, deadalnix wrote:
It wouldn't be enough. Does several call to front recompute 
things ? See http://d.puremagic.com/issues/show_bug.cgi?id=8803


That's up to the range. You shouldn't rely on it caching anything 
(map does not, but it used to).


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Jonathan M Davis
On Tuesday, October 16, 2012 06:30:53 H. S. Teoh wrote:
> On Mon, Oct 15, 2012 at 10:59:26PM -0700, Jonathan M Davis wrote:
> > On Monday, October 15, 2012 22:48:15 Jonathan M Davis wrote:
> > > So, I don't really know what the right answer is, but I _really_
> > > don't like the idea of having to worry about the result of front
> > > changing after a call to popFront in every single function that ever
> > > uses front. In the general case, I just don't see how that's
> > > tenable. I'd _much_ rather that it be up to the programmer using
> > > abnormal ranges such as ByLine to use them correctly.
> > 
> > And actually, it seems to me that issues like this make it look like
> > it was a mistake to make ranges like ByLine ranges in the first place.
> > They should have just defined opApply so that they'd work nicely in
> > foreach but not with range- based functions. They're clearly not going
> > to work with a _lot_ of range-based functions.
> 
> [...]
> 
> But nothing about the definition of a range, as currently defined in
> std.range, guarantees that whatever value was returned by .front is
> still valid after popFront() is called.
> 
> I'm not saying that this should be the "correct" behaviour, but the
> current definition does not prohibit a range from, say, reusing an array
> to compute its next element. For example, one may have a range that
> returns the permutations of a given array, in which popFront() permutes
> the elements in-place. In this case, .front will become invalid once
> popFront() is called. Many of the current range-based functions will not
> work correctly in this case.
> 
> Of course, it's arguable whether such ranges should be admissible, but
> as far as the current definition goes, I don't see anything that states
> otherwise. If we don't make such things clear, then we're bound to run
> into pathological cases where bugs will creep in because of unstated
> assumptions in the code.

There's only so much that the compiler can check for you. Sure, we could say 
in the docs that front must still be valid after a call to popFront (and 
perhaps we should), but there's no way to validate that.

There's also no way to validate that front always returns the same value, or 
that popFront actually pops a value, or that it pops only one value, etc. 
Pretty much _all_ that we can verify with template constraints is function 
signatures. So, we can _never_ fully restrict range types to exactly what 
would be considered correct.

So, if we have expectations about how ranges should/must work, then that 
should be in the docs somewhere, but the definition of a range from the 
compiler's perspective can only ever be defined by eponymous templates which 
can't possibly verify correct behavior, meaning that something like ByLine 
will always be permitted even if it doesn't work, because it breaks the 
semantic design of ranges.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread H. S. Teoh
On Tue, Oct 16, 2012 at 07:02:58PM +0200, Jonathan M Davis wrote:
> On Tuesday, October 16, 2012 06:30:53 H. S. Teoh wrote:
[...]
> > I'm not saying that this should be the "correct" behaviour, but the
> > current definition does not prohibit a range from, say, reusing an
> > array to compute its next element. For example, one may have a range
> > that returns the permutations of a given array, in which popFront()
> > permutes the elements in-place. In this case, .front will become
> > invalid once popFront() is called. Many of the current range-based
> > functions will not work correctly in this case.
> > 
> > Of course, it's arguable whether such ranges should be admissible,
> > but as far as the current definition goes, I don't see anything that
> > states otherwise. If we don't make such things clear, then we're
> > bound to run into pathological cases where bugs will creep in
> > because of unstated assumptions in the code.
> 
> There's only so much that the compiler can check for you. Sure, we
> could say in the docs that front must still be valid after a call to
> popFront (and perhaps we should), but there's no way to validate that.

I wasn't talking about compiler validation. I was talking about clearly
defining, in the docs or otherwise, what exactly a range is, and what is
expected of it. Right now, it's rather nebulous what exactly constitutes
a range. I thought byLine() was a perfectly valid range, but apparently
you think otherwise. The two aren't compatible, since they lead to wrong
code that has buggy behaviour when passed something it doesn't expect.


[...]
> So, if we have expectations about how ranges should/must work, then
> that should be in the docs somewhere, but the definition of a range
> from the compiler's perspective can only ever be defined by eponymous
> templates which can't possibly verify correct behavior, meaning that
> something like ByLine will always be permitted even if it doesn't
> work, because it breaks the semantic design of ranges.
[...]

So what is (or should be) the semantic design of ranges? Let's work out
a precise definition so that we have something to build on.


T

-- 
Three out of two people have difficulties with fractions. -- Dirk Eddelbuettel


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Jonathan M Davis
On Tuesday, October 16, 2012 08:17:40 H. S. Teoh wrote:
> On Tue, Oct 16, 2012 at 05:01:57PM +0200, Tommi wrote:
> > On Tuesday, 16 October 2012 at 05:49:11 UTC, Jonathan M Davis wrote:
> > >So, I don't really know what the right answer is, but I _really_
> > >don't like the idea of having to worry about the result of front
> > >changing after a call to popFront in every single function that ever
> > >uses front.
> > 
> > Isn't the deeper underlying problem here the fact that there's no
> > generic way to make a deep copy in this language (does any language?)
> > Making a deep copy of front would be a clean solution. I don't know
> > how to implement this generic deep copying functionality though. For
> > example: what would be the semantics of deep-copying a range?
> 
> I don't think the problem here is whether we have a deep copy or not,
> but that the semantics of ranges are not defined clearly enough. If we
> clearly state, for example, that whatever is returned by .front must
> persist beyond popFront(), then it would be up to the range to implement
> the deep copy, e.g., return the .dup or .idup'd array instead of a
> reference to an internal buffer.

1. There is no generic way to deep copy stuff. So, requiring a deep copy of 
front just plain doesn't make sense. It would be completely untenable. By 
doing so, you basically make it impossible to use the result of front beyond a 
call to popFront in the general case.

2. Often, a deep copy isn't needed in the least, even with a messed up range 
like ByLine. If it were an array of class objects rather than char, doing a 
deep copy would needlessly copy all of those class objects as well. It could 
quickly become a performance nightmare.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Jonathan M Davis
On Tuesday, October 16, 2012 10:13:57 H. S. Teoh wrote:
> I wasn't talking about compiler validation. I was talking about clearly
> defining, in the docs or otherwise, what exactly a range is, and what is
> expected of it. Right now, it's rather nebulous what exactly constitutes
> a range. I thought byLine() was a perfectly valid range, but apparently
> you think otherwise. The two aren't compatible, since they lead to wrong
> code that has buggy behaviour when passed something it doesn't expect.

ByLine is perfectly valid range insofar as you realize that it's likely to go 
completely south if you use it in any way that could involve keeping front 
around after popFront has been called, which means that anything which relies 
on keeping front around isn't going to work. So, it's a range, but it's 
essentially an unsafe one (though I'm not sure that it's an un-@safe one).

So, it's fine that ByLine is a range as long as we're willing to put up with it 
not working with a lot of range-based functions because of its abnormal 
behavior. But I don't think that it's at all reasonable for range-based 
functions in general to not be able to rely on front returning the same type 
every time or on its value disappearing or becoming invalid in some way after 
a call to popFront. That's completely untenable IMHO.

Ranges _can_ define semantics which violate that, but they have to make it 
clear that they do so that programmers using them realize that they may not 
work right with a lot of range-based functions (which potentially makes it so 
that it they really shouldn't have been ranges in the first place).

> So what is (or should be) the semantic design of ranges? Let's work out
> a precise definition so that we have something to build on.

As far as front (or back) goes, range-based functions should be able to rely 
on

1. front returning the exact same value on every call until popFront has been 
called (though there's no guarantee that front won't have to be recalculated 
on each call, so assigning the result of front to a local variable is 
advisable for efficiency if front must be used multiple times before a call to 
popFront).

2. the result of front continuing to be valid and unchanged after popFront has 
been called if it was assigned to a variable.

Any range is free to violate this, but because range-based functions are free 
to rely on it, such ranges risk not working correctly with many range-based 
functions and must be used with caution.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Tommi
On Tuesday, 16 October 2012 at 17:19:26 UTC, Jonathan M Davis 
wrote:

1. There is no generic way to deep copy stuff.


Could you elaborate on that a bit more?

How about a deep-assign operator? The compiler could implicitly 
define it at least for arrays and for structs that don't have 
mutable indirection:


struct MyStruct
{
int _value;

// implicit:
ref MyStruct opDeepAssign(ref const MyStruct ms)
{
this = ms;
return this;
}
}

int[] array1 = [1, 2, 3];
int[] array2;
array2.opDeepAssign(array1); // array2 = array1.dup;

And there would be hasDeepAssign!T traits and whatnot.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread David Gileadi

On 10/16/12 10:28 AM, Jonathan M Davis wrote:

Any range is free to violate this, but because range-based functions are free
to rely on it, such ranges risk not working correctly with many range-based
functions and must be used with caution.


As a D dabbler, it seems wrong to me that a built-in range like byLine, 
which is often used in example code, should have weird side effects with 
the built-in methods in std.algorithm.  IMO this needs to be fixed one 
way or another, and a mere documentation change is likely not enough for 
casual D users.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread H. S. Teoh
On Tue, Oct 16, 2012 at 07:28:38PM +0200, Jonathan M Davis wrote:
> On Tuesday, October 16, 2012 10:13:57 H. S. Teoh wrote:
[...]
> ByLine is perfectly valid range insofar as you realize that it's
> likely to go completely south if you use it in any way that could
> involve keeping front around after popFront has been called, which
> means that anything which relies on keeping front around isn't going
> to work. So, it's a range, but it's essentially an unsafe one (though
> I'm not sure that it's an un-@safe one).
> 
> So, it's fine that ByLine is a range as long as we're willing to put
> up with it not working with a lot of range-based functions because of
> its abnormal behavior. But I don't think that it's at all reasonable
> for range-based functions in general to not be able to rely on front
> returning the same type every time or on its value disappearing or
> becoming invalid in some way after a call to popFront. That's
> completely untenable IMHO.

I think a better solution is to somehow differentiate between ranges
whose .front value is permanent (arrays), and ranges whose .front values
are transient (e.g. byLine, but we shouldn't single it out because there
are other occasions where you may *want* to have a range that doesn't
call .dup every time -- say a range over all permutations of an array
that modifies the array in-place).

Perhaps mark ranges with an .isTransient property (which can be an enum
so it doesn't waste any space), and range-based functions that require
non-transient ranges will refuse to work with a transient range. Or
alternatively, switch to a different implementation that takes
transience into account (which may be slower, etc., but the important
thing is to do it right -- after all, D's motto is safe by default,
unsafe if you ask for it).


> Ranges _can_ define semantics which violate that, but they have to
> make it clear that they do so that programmers using them realize that
> they may not work right with a lot of range-based functions (which
> potentially makes it so that it they really shouldn't have been ranges
> in the first place).

I think this approach of what amounts to guessing which ranges are
safe/unsafe with which functions is what's untenable. We all know that
people don't read documentation, or at least, not thoroughly. Something
like this is too easy to miss, and bugs will slip into code unnoticed
for a long time, and then explode in your face. It's unsafe by default,
which goes against D's philosophy.

I think the problem is better addressed by an .isTransient property that
can be selected by templates at compile-time.  I agree that requiring
*every* range function to expect .front to mutate upon calling
.popFront() is unreasonable, but *some* functions *can* be written in a
way that they will still work. Marking ranges with transient .front
values will allow functions that *can* take transient ranges do so,
while leaving other functions accepting only non-transient ranges. That
way, if somebody tries to pass a transient range to a template that only
works with a non-transient range, they will know at compile-time instead
of getting a nasty surprise later on.


> > So what is (or should be) the semantic design of ranges? Let's work
> > out a precise definition so that we have something to build on.
> 
> As far as front (or back) goes, range-based functions should be able
> to rely on
> 
> 1. front returning the exact same value on every call until popFront
> has been called (though there's no guarantee that front won't have to
> be recalculated on each call, so assigning the result of front to a
> local variable is advisable for efficiency if front must be used
> multiple times before a call to popFront).
> 
> 2. the result of front continuing to be valid and unchanged after
> popFront has been called if it was assigned to a variable.
> 
> Any range is free to violate this, but because range-based functions
> are free to rely on it, such ranges risk not working correctly with
> many range-based functions and must be used with caution.
[...]

I don't like the idea that some ranges _may_ or _may not_ work with some
functions. That's just too unreliable, and it's too easy to make
mistakes. Let's put in an isTransient property on the unsafe ranges and
let the templates' signature constraints enforce passing the right kind
of range.


On Tue, Oct 16, 2012 at 10:58:23AM -0700, David Gileadi wrote:
[...]
> As a D dabbler, it seems wrong to me that a built-in range like
> byLine, which is often used in example code, should have weird side
> effects with the built-in methods in std.algorithm.  IMO this needs to
> be fixed one way or another, and a mere documentation change is likely
> not enough for casual D users.

I agree. We should add an isTransient property to at least the built-in
ranges in Phobos, so that at least those ranges will always work
properly (or complain loudly at compile-time if passed to the wrong
function).

User-defined ranges need to be mar

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Jonathan M Davis
On Tuesday, October 16, 2012 10:58:23 David Gileadi wrote:
> On 10/16/12 10:28 AM, Jonathan M Davis wrote:
> > Any range is free to violate this, but because range-based functions are
> > free to rely on it, such ranges risk not working correctly with many
> > range-based functions and must be used with caution.
> 
> As a D dabbler, it seems wrong to me that a built-in range like byLine,
> which is often used in example code, should have weird side effects with
> the built-in methods in std.algorithm. IMO this needs to be fixed one
> way or another, and a mere documentation change is likely not enough for
> casual D users.

ByLine needs to do what it does for performance reasons, so its implementation 
makes a lot of sense, and it really wouldn't make sense for it to do what 
would be necessary to make it work as a normal range (since that would mean 
allocating a new buffer for every line), so if the fact that it's likely to be 
used and then misused by newbies is a big problem, then that's a strong 
argument for making it just use opApply rather than being a proper range. I 
certainly don't think that it makes sense to change how ranges work in general 
just to better support ByLine's weirdness. it would be far too restrictive to 
do so.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Jonathan M Davis
On Tuesday, October 16, 2012 19:51:09 Tommi wrote:
> On Tuesday, 16 October 2012 at 17:19:26 UTC, Jonathan M Davis
> 
> wrote:
> > 1. There is no generic way to deep copy stuff.

There is no standard function for copying anything. dup and idup are array-
specific. It's a commonly used function name for a clone function on objects, 
but that's just a convention and not even necessarily a consistently used one. 
Some types are value types and so simply making a copy of them does a deep 
copy, whereas others are reference types and making a copy only does a shallow 
copy (hence why save was created for ranges). There's no way to look at a type 
and know how to correctly copy it. The best that could be done would be to 
standardize on dup, in which case it would be defined it for all built-in stuff 
via UFCS, and any user-defined stuff that wanted to be generically copied would 
define it. But we haven't done anything like that.

But really, not even that would be enough, because dup doesn't do a deep copy 
on arrays. It does something in between a shallow copy and a deep copy. It 
copies the array itself rather than the pointer, meaning that you get a 
totally separate array, but it doesn't do a deep copy of the elements, so if 
you had an array of reference types, anything done to the duped array's 
elements (other than assigning them new references) will affect the originals.

> How about a deep-assign operator? The compiler could implicitly
> define it at least for arrays and for structs that don't have
> mutable indirection:
> 
> struct MyStruct
> {
> int _value;
> 
> // implicit:
> ref MyStruct opDeepAssign(ref const MyStruct ms)
> {
> this = ms;
> return this;
> }
> }
> 
> int[] array1 = [1, 2, 3];
> int[] array2;
> array2.opDeepAssign(array1); // array2 = array1.dup;
> 
> And there would be hasDeepAssign!T traits and whatnot.

Something like that could probably be done, but that's not really all that 
different from just defining dup for stuff and having a hasDup trait, which can 
be done without any changes to the language like opDeepAssign would require. 
And again, it doesn't quite do the job due to how dup on arrays work. We'd 
probably need to add a function to the standard library (e.g. deepCopy) which 
would be defined for all built-in types and which user-defined types could 
define 
(with something hasDeepCopy being there to check for it), and then code could 
use that for doing deep copies. So again, it's possible, but it doesn't 
currently exist.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread H. S. Teoh
On Tue, Oct 16, 2012 at 09:21:18PM +0200, Jonathan M Davis wrote:
[...]
> We'd probably need to add a function to the standard library (e.g.
> deepCopy) which would be defined for all built-in types and which
> user-defined types could define (with something hasDeepCopy being
> there to check for it), and then code could use that for doing deep
> copies. So again, it's possible, but it doesn't currently exist.
[...]

I think with D's compile-time introspection capabilities, it _should_ be
possible to implement a generic deepCopy template that works for any
type.

Of course, then one has to address tricky issues such as complex data
structures that have interlinking parts; a blind recursive deep-copy may
not have the desired effect (e.g., if we're deep-copying a graph and
there are multiple paths (via references) to the same node, then we
shouldn't end up with multiple copies of that node). Some care will also
need to be taken to deal with cyclic structures, etc.. And some
optimizations can probably be done to avoid copying immutable objects,
since that would be a waste of time & memory.

Probably some kind of graph traversal algorithm can be used to address
these issues, I think, perhaps with the help of an AA or two to recreate
the original linking structure in the copy.


T

-- 
He who sacrifices functionality for ease of use, loses both and deserves 
neither. -- Slashdotter


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Jonathan M Davis
On Tuesday, October 16, 2012 12:07:01 H. S. Teoh wrote:
> Perhaps mark ranges with an .isTransient property (which can be an enum
> so it doesn't waste any space), and range-based functions that require
> non-transient ranges will refuse to work with a transient range. Or
> alternatively, switch to a different implementation that takes
> transience into account (which may be slower, etc., but the important
> thing is to do it right -- after all, D's motto is safe by default,
> unsafe if you ask for it).

Not a bad idea, though it's still arguably a bit disgusting to potentially 
have to check for that all over the place. Inevitably, most functions won't 
check, and ByLine _still_ won't work. Yes, Phobos would presumably end up 
checking in most cases, but I suspect that little else will. We arguably have 
way to many things to check about ranges as it is. So, I'd be far more tempted 
to just change ByLine to use opApply rather than adding the extra complication 
of isTransient to the standard library just for this one use case.

> I think this approach of what amounts to guessing which ranges are
> safe/unsafe with which functions is what's untenable. We all know that
> people don't read documentation, or at least, not thoroughly. Something
> like this is too easy to miss, and bugs will slip into code unnoticed
> for a long time, and then explode in your face. It's unsafe by default,
> which goes against D's philosophy.

The problem is that what ByLine is doing is incredibly abnormal. So, we're 
talking about affecting how all ranges do things just to satisfy one, 
incredibly abnormal use case. It's just that this one range is heavily used, 
making the fact that it doesn't work normally a problem. But given that even 
using opApply with ByLine would require the programmer to understand how to 
use it correctly (since simply keeping the buffer around after an iteration 
would still break in that case), the programmer has to understand the 
weirdness of ByLine _regardless_, making it so that I don't know that it's 
ultimately all that big a deal that many range-based functions will choke on 
it. The main problem I see with that is similar to the problem with relying on 
CTFE - a simple change to a function could make it so that what _did_ work 
correctly suddenly stops working correctly and will only be caught if that 
particular use case has unit tests (highly unlikely with ByLine).

So, really, I'm in favor of just making ByLine use opApply and make it so that 
it's not a range. That doesn't completely fix the problem, but it reduces its 
scope and doesn't complicate the rest of Phobos in the process.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread H. S. Teoh
On Tue, Oct 16, 2012 at 09:21:11PM +0200, Jonathan M Davis wrote:
> On Tuesday, October 16, 2012 12:07:01 H. S. Teoh wrote:
> > Perhaps mark ranges with an .isTransient property (which can be an
> > enum so it doesn't waste any space), and range-based functions that
> > require non-transient ranges will refuse to work with a transient
> > range. Or alternatively, switch to a different implementation that
> > takes transience into account (which may be slower, etc., but the
> > important thing is to do it right -- after all, D's motto is safe by
> > default, unsafe if you ask for it).
> 
> Not a bad idea, though it's still arguably a bit disgusting to
> potentially have to check for that all over the place. Inevitably,
> most functions won't check, and ByLine _still_ won't work. Yes, Phobos
> would presumably end up checking in most cases, but I suspect that
> little else will. We arguably have way to many things to check about
> ranges as it is. So, I'd be far more tempted to just change ByLine to
> use opApply rather than adding the extra complication of isTransient
> to the standard library just for this one use case.

But like I said, we shouldn't single out ByLine just because it's what I
happened to run into this time. This is part of a larger issue, which is
that currently, we *assume* that ranges will not invalidate the value of
.front when popFront() is called. But many kinds of ranges don't fulfill
this (currently unstated) requirement. ByLine is only one of the more
common examples.

Another example that I've mentioned is a range which returns the
permutations of an array by swapping array elements in-place.

Yet another example, that actually came up in my own code, is an
expression tree structure that represents potentially multiple values
(it contains operators like ±). There is a method that returns a range
of all possible values the tree can take on. Individual values are in
fact vectors, which means .front returns real[]. It is much more
efficient to implement, for example, ± just by flipping the signs of
each component of the vector in-place, than to .dup every single time.
If one is searching for a specific vector, for example, it would be
wasteful to .dup every real[] that gets generated, only to immediately
discard it.

So this is a non-trivial example of a range that modifies the array
returned by .front when you call popFront(), but is otherwise a
perfectly valid range.

And I argue that these "transient ranges" as I call them do have their
place. One can iterate over them, call find() or canFind() on them,
etc., all the normal things one does with ranges, which are what makes
the concept of ranges so powerful. The only thing is that one has to be
careful about saving the value of .front and then calling popFront().

Marking these ranges with some kind of marker, like isTransient, will
help solve this problem. It doesn't have to be done everywhere: just for
the few ranges that exhibit this property. And it only needs to be
checked by functions that require .front to not mutate under them when
they call popFront(). Most functions in Phobos, I suspect, don't even
need to bother with it. So it should be a relatively small change. Seems
to be a win-win situation to me.


> > I think this approach of what amounts to guessing which ranges are
> > safe/unsafe with which functions is what's untenable. We all know
> > that people don't read documentation, or at least, not thoroughly.
> > Something like this is too easy to miss, and bugs will slip into
> > code unnoticed for a long time, and then explode in your face. It's
> > unsafe by default, which goes against D's philosophy.
> 
> The problem is that what ByLine is doing is incredibly abnormal. So,
> we're talking about affecting how all ranges do things just to satisfy
> one, incredibly abnormal use case. It's just that this one range is
> heavily used, making the fact that it doesn't work normally a problem.

I don't agree that what ByLine does is abnormal. If this were indeed the
case, it would greatly limit the scope of ranges and range-based
functions. You wouldn't be able to implement ranges that reuse buffers
(as I described above), and you may end up being forced to use
inefficient implementations just to satisfy the few functions that
assume that .front doesn't mutate when you call popFront(). I don't see
that as an advantageous position to take. Many range-based functions
*do* work with ByLine and other such ranges, so why impose arbitrary
restrictions on them?

Better to just treat the special cases specially (transient ranges +
functions that assume .front can be safely copied), and let the general
case work as it is already working.


[...]
> So, really, I'm in favor of just making ByLine use opApply and make it
> so that it's not a range. That doesn't completely fix the problem, but
> it reduces its scope and doesn't complicate the rest of Phobos in the
> process.
[...]

This is the simplest way out, I suppose, but it does lim

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Philippe Sigaud
On Tue, Oct 16, 2012 at 9:41 PM, H. S. Teoh  wrote:
> On Tue, Oct 16, 2012 at 09:21:18PM +0200, Jonathan M Davis wrote:
> [...]
>> We'd probably need to add a function to the standard library (e.g.
>> deepCopy) which would be defined for all built-in types and which
>> user-defined types could define (with something hasDeepCopy being
>> there to check for it), and then code could use that for doing deep
>> copies. So again, it's possible, but it doesn't currently exist.
> [...]
>
> I think with D's compile-time introspection capabilities, it _should_ be
> possible to implement a generic deepCopy template that works for any
> type.

I agree. Heck, IIRC I posted a putative deep copy code here some years
ago. Today's Phobos and __traits and the recently improved is()
expression would make that even easier.
After all, any (aggregate) value in D can be decomposed into smaller
parts, until the algorithm finds a leaf/terminal: either a value type
or an array of value types.


> Of course, then one has to address tricky issues such as complex data
> structures that have interlinking parts; a blind recursive deep-copy may
> not have the desired effect (e.g., if we're deep-copying a graph and
> there are multiple paths (via references) to the same node, then we
> shouldn't end up with multiple copies of that node). Some care will also
> need to be taken to deal with cyclic structures, etc.. And some
> optimizations can probably be done to avoid copying immutable objects,
> since that would be a waste of time & memory.

Good idea.Detecting immutable seems easy. Doing a graph traversal is a
bit trickier.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread H. S. Teoh
On Tue, Oct 16, 2012 at 11:00:18PM +0200, Philippe Sigaud wrote:
> On Tue, Oct 16, 2012 at 9:41 PM, H. S. Teoh  wrote:
[...]
> > I think with D's compile-time introspection capabilities, it
> > _should_ be possible to implement a generic deepCopy template that
> > works for any type.
> 
> I agree. Heck, IIRC I posted a putative deep copy code here some years
> ago. Today's Phobos and __traits and the recently improved is()
> expression would make that even easier.
> After all, any (aggregate) value in D can be decomposed into smaller
> parts, until the algorithm finds a leaf/terminal: either a value type
> or an array of value types.

I just thought of a potential problem. Right now I don't think there's a
way to tell whether a reference is a "hard" reference or an "external"
reference. For example, if you have a tree in which nodes reference some
global table of objects, you want to deep-copy only the tree nodes, not
the objects they refer to, otherwise you may end up duplicating the
entire global table every time you duplicate a tree.


> > Of course, then one has to address tricky issues such as complex
> > data structures that have interlinking parts; a blind recursive
> > deep-copy may not have the desired effect (e.g., if we're
> > deep-copying a graph and there are multiple paths (via references)
> > to the same node, then we shouldn't end up with multiple copies of
> > that node). Some care will also need to be taken to deal with cyclic
> > structures, etc.. And some optimizations can probably be done to
> > avoid copying immutable objects, since that would be a waste of time
> > & memory.
> 
> Good idea. Detecting immutable seems easy. Doing a graph traversal is
> a bit trickier.

I was thinking probably a bool[void*] should work as a way of keeping
track of what has already been copied, so a simple depth-first traversal
should work. And maybe another T[void*] for looking up cross-edges in
the DFS tree.


T

-- 
One disk to rule them all, One disk to find them. One disk to bring them
all and in the darkness grind them. In the Land of Redmond where the
shadows lie. -- The Silicon Valley Tarot


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread deadalnix

Le 16/10/2012 21:21, Jonathan M Davis a écrit :

On Tuesday, October 16, 2012 12:07:01 H. S. Teoh wrote:

Perhaps mark ranges with an .isTransient property (which can be an enum
so it doesn't waste any space), and range-based functions that require
non-transient ranges will refuse to work with a transient range. Or
alternatively, switch to a different implementation that takes
transience into account (which may be slower, etc., but the important
thing is to do it right -- after all, D's motto is safe by default,
unsafe if you ask for it).


Not a bad idea, though it's still arguably a bit disgusting to potentially
have to check for that all over the place. Inevitably, most functions won't
check, and ByLine _still_ won't work. Yes, Phobos would presumably end up
checking in most cases, but I suspect that little else will. We arguably have
way to many things to check about ranges as it is. So, I'd be far more tempted
to just change ByLine to use opApply rather than adding the extra complication
of isTransient to the standard library just for this one use case.



I have to agree here. Even if I see several uses outside byLine, this is 
known that most code will not take that into account. Adding a property 
isn't going to help.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Era Scarecrow
On Tuesday, 16 October 2012 at 17:06:06 UTC, Jonathan M Davis 
wrote:
There's also no way to validate that front always returns the 
same value, or that popFront actually pops a value, or that it 
pops only one value, etc. Pretty much _all_ that we can verify 
with template constraints is function signatures. So, we can 
_never_ fully restrict range types to exactly what would be 
considered correct.


 An option that may be decided on, is if an enum is used to 
specify certain behavior decisions made by the programmer, then 
only a few new templates to check for those and return true when 
those are specifically true. Perhaps an example to use?



 enum RangeTypes { normal, mutating, popDisgarded, ... }


 template isRangeMutating(T)
 if (hasMember!(T, "rangeType")) {
   enum isRangeMutating = T.rangeType == RangeType.mutating;
 }


//function that cannot accept input range that mutates any 
members due to

//popFront/popBack
void something(I)(I input)
if (isInputRange!(I) && (!isRangeMutating!(I)))
{ /*...*/ }


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Jonathan M Davis
On Wednesday, October 17, 2012 00:27:08 Era Scarecrow wrote:
> On Tuesday, 16 October 2012 at 17:06:06 UTC, Jonathan M Davis
> 
> wrote:
> > There's also no way to validate that front always returns the
> > same value, or that popFront actually pops a value, or that it
> > pops only one value, etc. Pretty much _all_ that we can verify
> > with template constraints is function signatures. So, we can
> > _never_ fully restrict range types to exactly what would be
> > considered correct.
> 
> An option that may be decided on, is if an enum is used to
> specify certain behavior decisions made by the programmer, then
> only a few new templates to check for those and return true when
> those are specifically true. Perhaps an example to use?
> 
> 
> enum RangeTypes { normal, mutating, popDisgarded, ... }
> 
> 
> template isRangeMutating(T)
> if (hasMember!(T, "rangeType")) {
> enum isRangeMutating = T.rangeType == RangeType.mutating;
> }
> 
> 
> //function that cannot accept input range that mutates any
> members due to
> //popFront/popBack
> void something(I)(I input)
> if (isInputRange!(I) && (!isRangeMutating!(I)))
> { /*...*/ }

We arguably have to many traits to check for ranges already. This is just 
overcomplicating things IMHO. Ranges should not be this complicated. They're 
pushing it in that regard as it is.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Jens Mueller
Jonathan M Davis wrote:
> On Tuesday, October 16, 2012 10:58:23 David Gileadi wrote:
> > On 10/16/12 10:28 AM, Jonathan M Davis wrote:
> > > Any range is free to violate this, but because range-based functions are
> > > free to rely on it, such ranges risk not working correctly with many
> > > range-based functions and must be used with caution.
> > 
> > As a D dabbler, it seems wrong to me that a built-in range like byLine,
> > which is often used in example code, should have weird side effects with
> > the built-in methods in std.algorithm. IMO this needs to be fixed one
> > way or another, and a mere documentation change is likely not enough for
> > casual D users.
> 
> ByLine needs to do what it does for performance reasons, so its 
> implementation 
> makes a lot of sense, and it really wouldn't make sense for it to do what 
> would be necessary to make it work as a normal range (since that would mean 
> allocating a new buffer for every line), so if the fact that it's likely to 
> be 
> used and then misused by newbies is a big problem, then that's a strong 
> argument for making it just use opApply rather than being a proper range. I 
> certainly don't think that it makes sense to change how ranges work in 
> general 
> just to better support ByLine's weirdness. it would be far too restrictive to 
> do so.

I've not really looked at byLine. Why do you have to do an allocation
for each line?
byChunk seems to not suffer from this problem (see
http://d.puremagic.com/issues/show_bug.cgi?id=8085). Does this mean
byChunk is less efficient?

I want to add that if byLine only implements opApply, then there should
be some adapter that turns anything which has an opApply into a range.
Then you have to do: byLine().adapter().joiner().

And I also think that the documentation needs to clarify whether
popFront may invalidate t, if t was obtained before via auto t =
r.front.

Jens


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-16 Thread Jonathan M Davis
On Wednesday, October 17, 2012 00:56:30 Jens Mueller wrote:
> I've not really looked at byLine. Why do you have to do an allocation
> for each line?
> byChunk seems to not suffer from this problem (see
> http://d.puremagic.com/issues/show_bug.cgi?id=8085). Does this mean
> byChunk is less efficient?
> 
> I want to add that if byLine only implements opApply, then there should
> be some adapter that turns anything which has an opApply into a range.
> Then you have to do: byLine().adapter().joiner().
> 
> And I also think that the documentation needs to clarify whether
> popFront may invalidate t, if t was obtained before via auto t =
> r.front.

ByLine and ByChunk are in the same boat. If they want to function sanely as 
ranges, then they need to allocate a new buffer for every line/chunk. Neither 
of them do that, because that's inefficient in many cases (particularly when 
you 
don't need to keep a particular line/chunk around for the next iteration). 
Rather, they reuse their buffers. That means that when popFront is called, the 
previous result of front gets overwritten, which screws over a lot of range-
based algorithms. It's normally perfectly safe to assume that the result of 
front stays valid after popFront is called, but that's not the case with 
ByLine or ByChunk, which is what makes them so horrible as ranges.

If they use opApply, they still have the same problem insomuch as anyone using 
them has to understand that they reuse their buffers and that they must 
therefore dup the buffers if they want to keep anything around after an 
iteration, but then at least they won't be being passed to range-based 
functions where they often won't work and will end up with weird, confusing 
results (e.g. try using std.array.array on ByLine). Providing a means of 
getting a range from them is fine, but it needs to allocate a new buffer for 
each element in the range, or there _will_ be problems.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread monarch_dodra
On Tuesday, 16 October 2012 at 23:14:01 UTC, Jonathan M Davis 
wrote:


- Jonathan M Davis


Crazy idea:

Can't we just have a "byLineDeep" or "byLineSafe", but that 
returns actual deep copied immutable strings?


This way, "byLineDeep" is 100% safe, no tricks involved. Ranges 
are not made any more complicated then they already are.


Then, mark "byLine" as "potentially unsafe, but faster. May fail 
if used in an algorithm. Meant for manual iteratation. use at 
your own risk". Then, anybody who *knows* what he is doing can go 
for it.


Of course, I'd have preferred the naming to be the other way 
around (safe by default, unsafe on demand), but things are as 
they are (maybe).




I know I may not be used to using "byLine" yet, but the sole fact 
that it returns a "char[]" as opposed to a "string" as been an 
infinite source of frustration for me.


I'd *gladly* take a 4x penalty hit on something I won't even see 
anyway, if it means I can forget about having to worry about god 
damned aliased buffers.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread bearophile

monarch_dodra:

Can't we just have a "byLineDeep" or "byLineSafe", but that 
returns actual deep copied immutable strings?


See this old enhancement of mine they have closed:
http://d.puremagic.com/issues/show_bug.cgi?id=4474

Bye,
bearophile


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/16/12 1:09 AM, H. S. Teoh wrote:

The scary thing is, I see similar code like this all over Phobos. Does
this mean that most of std.algorithm may need to be revised to address
this issue? At the very least, it would seem that a code audit is in
order to weed out this particular issue.


Such issues do happen, are relatively rare, and are virtually untested 
because there's been no unittests with a deliberately "bad" input range. 
Although of course we do need to add the appropriate fixes and 
unittests, I'm not worried about it at all systemically.


Andrei


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/16/12 1:59 AM, Jonathan M Davis wrote:

On Monday, October 15, 2012 22:48:15 Jonathan M Davis wrote:

So, I don't really know what the right answer is, but I _really_ don't like
the idea of having to worry about the result of front changing after a call
to popFront in every single function that ever uses front. In the general
case, I just don't see how that's tenable. I'd _much_ rather that it be up
to the programmer using abnormal ranges such as ByLine to use them
correctly.


And actually, it seems to me that issues like this make it look like it was a
mistake to make ranges like ByLine ranges in the first place. They should have
just defined opApply so that they'd work nicely in foreach but not with range-
based functions. They're clearly not going to work with a _lot_ of range-based
functions.


I think integration of pure streams within ranges is important and 
beneficial.


Andrei




Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread H. S. Teoh
On Wed, Oct 17, 2012 at 11:17:08AM -0400, Andrei Alexandrescu wrote:
> On 10/16/12 1:59 AM, Jonathan M Davis wrote:
> >On Monday, October 15, 2012 22:48:15 Jonathan M Davis wrote:
> >>So, I don't really know what the right answer is, but I _really_
> >>don't like the idea of having to worry about the result of front
> >>changing after a call to popFront in every single function that ever
> >>uses front. In the general case, I just don't see how that's
> >>tenable. I'd _much_ rather that it be up to the programmer using
> >>abnormal ranges such as ByLine to use them correctly.
> >
> >And actually, it seems to me that issues like this make it look like
> >it was a mistake to make ranges like ByLine ranges in the first
> >place. They should have just defined opApply so that they'd work
> >nicely in foreach but not with range- based functions. They're
> >clearly not going to work with a _lot_ of range-based functions.
> 
> I think integration of pure streams within ranges is important and
> beneficial.
[...]

I agree, and I think that's what makes ranges such a powerful concept.
I'd hate to lose it just over implementation details like popFront()
pulling the carpet from under a few functions that assume the
persistence of what .front returns. I'd much rather live with an
isTransient property (which is only needed in a very few cases, and only
needs to be checked in a handful of places where the dependence matters,
and so is not a big inconvenience at all), than to devolve into Java's
wrapper upon adapter upon wrapper design.


T

-- 
Give a man a fish, and he eats once. Teach a man to fish, and he will sit 
forever.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Jonathan M Davis
On Wednesday, October 17, 2012 11:17:08 Andrei Alexandrescu wrote:
> I think integration of pure streams within ranges is important and
> beneficial.

The problem isn't that it's a stream. The problem is that it reuses the buffer 
that it returns, which is great for efficiency but horrible for algorithms 
which 
actually try and use it as a range. Having to worry about front being 
invalidated after a call to popFront would be a big problem - especially when 
that's not at all normal behavior. And in many cases, you _can't_ code 
algorithms in a way that supports that (e.g. if it needs to compare the result 
of two subsequent calls to front). Having to worry about such ranges is just 
going to complicate things even more. We already have enough issues with 
special cases as it is - like save not being called because reference type 
ranges are so rare, and least that's actually part of the range API, whereas 
the idea that front could be invalidated really isn't.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/16/12 11:17 AM, H. S. Teoh wrote:

OTOH, if we clearly state that .front may not persist past the next
popFront(), then it would be up to the caller to .dup the return value,
or otherwise make a safe copy of it (or use the value before calling
popFront()).

The current ambiguous situation leads to one range doing one thing, and
another range doing something else, and either way, either this code
will break or that code will break.


Input ranges don't guarantee preservation of .front upon calling 
.popFront. They do allow several to .front (without an intervening call 
to .popFront) that should return the same result (i.e. a one-element 
buffer).


Andrei


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread H. S. Teoh
On Wed, Oct 17, 2012 at 12:22:55PM -0400, Andrei Alexandrescu wrote:
> On 10/16/12 11:17 AM, H. S. Teoh wrote:
> >OTOH, if we clearly state that .front may not persist past the next
> >popFront(), then it would be up to the caller to .dup the return
> >value, or otherwise make a safe copy of it (or use the value before
> >calling popFront()).
> >
> >The current ambiguous situation leads to one range doing one thing,
> >and another range doing something else, and either way, either this
> >code will break or that code will break.
> 
> Input ranges don't guarantee preservation of .front upon calling
> .popFront. They do allow several to .front (without an intervening
> call to .popFront) that should return the same result (i.e. a
> one-element buffer).
[...]

This is contrary to what Jonathan has been saying. So which is it:
should .front return a persistent value, or a transient value that may
get invalidated by popFront? I have been assuming it's the latter, but
others have been saying it's the former.

As Jonathan pointed out, some algorithms won't work with transient
.front values. I can think of one: findAdjacent, which requires that the
previous value of .front be preserved (otherwise you couldn't compare it
with the following element -- the template has no reliable way of saving
the previous value since we don't have a reliable way to deep-copy a
possibly complex data structure that might be getting overwritten by
popFront).

Other algorithms, like joiner, currently don't work with transient
.front values, but can be made to work by tweaking the implementation.
People will hate me for saying this, but I think Phobos algorithms
*should* be written to work with minimal expectations, so I don't
consider it unreasonable to expect std.algorithm to work with transient
.front values where possible. (User code is a different matter, of
course). There *are* cases for which it can't work, which is why I
proposed the isTransient property, but people didn't seem to like that
idea.


T

-- 
They pretend to pay us, and we pretend to work. -- Russian saying


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/16/12 1:28 PM, Jonathan M Davis wrote:

So, it's fine that ByLine is a range as long as we're willing to put up with it
not working with a lot of range-based functions because of its abnormal
behavior. But I don't think that it's at all reasonable for range-based
functions in general to not be able to rely on front returning the same type
every time or on its value disappearing or becoming invalid in some way after
a call to popFront. That's completely untenable IMHO.


Then what is to you the difference between an input range and a forward 
range?


Andrei



Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread H. S. Teoh
On Wed, Oct 17, 2012 at 12:39:13PM -0400, Andrei Alexandrescu wrote:
> On 10/16/12 1:28 PM, Jonathan M Davis wrote:
> >So, it's fine that ByLine is a range as long as we're willing to put
> >up with it not working with a lot of range-based functions because of
> >its abnormal behavior. But I don't think that it's at all reasonable
> >for range-based functions in general to not be able to rely on front
> >returning the same type every time or on its value disappearing or
> >becoming invalid in some way after a call to popFront. That's
> >completely untenable IMHO.
> 
> Then what is to you the difference between an input range and a
> forward range?
[...]

So what you're saying, is that we cannot rely on the value of .front
after popFront() is called, and that the only way to ensure the validity
of .front is to use a forward range's .save, and access the saved copy's
.front?

It makes sense to me, but then we'd need to revise quite a lot of code
in Phobos, as I've seen quite a number of places where .front is cached
in some local variable or struct field, which would be invalid by your
definition.


T

-- 
This is not a sentence.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread David Nadlinger
On Wednesday, 17 October 2012 at 15:14:44 UTC, Andrei 
Alexandrescu wrote:
Such issues do happen, are relatively rare, and are virtually 
untested because there's been no unittests with a deliberately 
"bad" input range. Although of course we do need to add the 
appropriate fixes and unittests, I'm not worried about it at 
all systemically.


If an abstraction encourages use in a way which leads to 
hard-to-detect logic bugs that do not occur with the most common 
test cases, this is _exactly_ the thing we should be worried 
about!


David


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/16/12 3:21 PM, Jonathan M Davis wrote:

Not a bad idea, though it's still arguably a bit disgusting to potentially
have to check for that all over the place. Inevitably, most functions won't
check, and ByLine _still_ won't work. Yes, Phobos would presumably end up
checking in most cases, but I suspect that little else will.


Then "little else" should claim working with input ranges and rely on 
forward ranges semantics.



The problem is that what ByLine is doing is incredibly abnormal.


Works exactly as intended, and is eerily normal.

Andrei



Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/16/12 3:07 PM, H. S. Teoh wrote:

Perhaps mark ranges with an .isTransient property


isTransient!R is exactly the same thing as isInputRange!R && 
!isForwardRange!R.


Andrei


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/16/12 3:41 PM, H. S. Teoh wrote:

I think with D's compile-time introspection capabilities, it _should_ be
possible to implement a generic deepCopy template that works for any
type.

Of course, then one has to address tricky issues such as complex data
structures that have interlinking parts; a blind recursive deep-copy may
not have the desired effect (e.g., if we're deep-copying a graph and
there are multiple paths (via references) to the same node, then we
shouldn't end up with multiple copies of that node). Some care will also
need to be taken to deal with cyclic structures, etc.. And some
optimizations can probably be done to avoid copying immutable objects,
since that would be a waste of time&  memory.

Probably some kind of graph traversal algorithm can be used to address
these issues, I think, perhaps with the help of an AA or two to recreate
the original linking structure in the copy.


Yes, deepDup should be implementable as a library and use a temporary 
dictionary of already-duplicated items to avoid infinite recursion.


We should add that to Phobos - could you please add a task to trello.


Andrei



Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread monarch_dodra

On Wednesday, 17 October 2012 at 17:15:14 UTC, Andrei
Alexandrescu wrote:

On 10/16/12 3:07 PM, H. S. Teoh wrote:

Perhaps mark ranges with an .isTransient property


isTransient!R is exactly the same thing as isInputRange!R && 
!isForwardRange!R.


Andrei


Technically, (isTransient!R) is just a subset of (isInputRange!R
&& !isForwardRange!R)

"byDChar" (if it existed), would be a perfectly valid
non-transient input only range.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/16/12 5:38 PM, H. S. Teoh wrote:

I was thinking probably a bool[void*] should work as a way of keeping
track of what has already been copied, so a simple depth-first traversal
should work.


Should be void*[void*] to also get the actual new location.

Andrei


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/16/12 5:51 PM, deadalnix wrote:

Le 16/10/2012 21:21, Jonathan M Davis a écrit :

On Tuesday, October 16, 2012 12:07:01 H. S. Teoh wrote:

Perhaps mark ranges with an .isTransient property (which can be an enum
so it doesn't waste any space), and range-based functions that require
non-transient ranges will refuse to work with a transient range. Or
alternatively, switch to a different implementation that takes
transience into account (which may be slower, etc., but the important
thing is to do it right -- after all, D's motto is safe by default,
unsafe if you ask for it).


Not a bad idea, though it's still arguably a bit disgusting to
potentially
have to check for that all over the place. Inevitably, most functions
won't
check, and ByLine _still_ won't work. Yes, Phobos would presumably end up
checking in most cases, but I suspect that little else will. We
arguably have
way to many things to check about ranges as it is. So, I'd be far more
tempted
to just change ByLine to use opApply rather than adding the extra
complication
of isTransient to the standard library just for this one use case.



I have to agree here. Even if I see several uses outside byLine, this is
known that most code will not take that into account. Adding a property
isn't going to help.


The property is already there - .save. The question boils down to 
whether input ranges should be acceptable as ranges or not.


Andrei


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/17/12 5:59 AM, bearophile wrote:

monarch_dodra:


Can't we just have a "byLineDeep" or "byLineSafe", but that returns
actual deep copied immutable strings?


See this old enhancement of mine they have closed:
http://d.puremagic.com/issues/show_bug.cgi?id=4474


You closed it.

Also that bug has nothing to do with this topic because you can't 
compile the buggy code and have it silently do the wrong thing.



Andrei



Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/17/12 12:39 PM, H. S. Teoh wrote:

On Wed, Oct 17, 2012 at 12:22:55PM -0400, Andrei Alexandrescu wrote:

On 10/16/12 11:17 AM, H. S. Teoh wrote:

OTOH, if we clearly state that .front may not persist past the next
popFront(), then it would be up to the caller to .dup the return
value, or otherwise make a safe copy of it (or use the value before
calling popFront()).

The current ambiguous situation leads to one range doing one thing,
and another range doing something else, and either way, either this
code will break or that code will break.


Input ranges don't guarantee preservation of .front upon calling
.popFront. They do allow several to .front (without an intervening
call to .popFront) that should return the same result (i.e. a
one-element buffer).

[...]

This is contrary to what Jonathan has been saying. So which is it:
should .front return a persistent value, or a transient value that may
get invalidated by popFront? I have been assuming it's the latter, but
others have been saying it's the former.


The latter for input ranges, the former for anything stronger.


As Jonathan pointed out, some algorithms won't work with transient
.front values. I can think of one: findAdjacent, which requires that the
previous value of .front be preserved (otherwise you couldn't compare it
with the following element -- the template has no reliable way of saving
the previous value since we don't have a reliable way to deep-copy a
possibly complex data structure that might be getting overwritten by
popFront).


findAdjacent correctly restricts itself to work only with forward ranges.


Other algorithms, like joiner, currently don't work with transient
.front values, but can be made to work by tweaking the implementation.
People will hate me for saying this, but I think Phobos algorithms
*should* be written to work with minimal expectations, so I don't
consider it unreasonable to expect std.algorithm to work with transient
.front values where possible. (User code is a different matter, of
course). There *are* cases for which it can't work, which is why I
proposed the isTransient property, but people didn't seem to like that
idea.


No isTransient is needed.


Andrei


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/17/12 12:53 PM, H. S. Teoh wrote:

On Wed, Oct 17, 2012 at 12:39:13PM -0400, Andrei Alexandrescu wrote:

On 10/16/12 1:28 PM, Jonathan M Davis wrote:

So, it's fine that ByLine is a range as long as we're willing to put
up with it not working with a lot of range-based functions because of
its abnormal behavior. But I don't think that it's at all reasonable
for range-based functions in general to not be able to rely on front
returning the same type every time or on its value disappearing or
becoming invalid in some way after a call to popFront. That's
completely untenable IMHO.


Then what is to you the difference between an input range and a
forward range?

[...]

So what you're saying, is that we cannot rely on the value of .front
after popFront() is called, and that the only way to ensure the validity
of .front is to use a forward range's .save, and access the saved copy's
.front?


Affirmative.


It makes sense to me, but then we'd need to revise quite a lot of code
in Phobos, as I've seen quite a number of places where .front is cached
in some local variable or struct field, which would be invalid by your
definition.


Yes. Only code that ostensibly works with input ranges needs to be reviewed.


Andrei


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/17/12 12:57 PM, David Nadlinger wrote:

On Wednesday, 17 October 2012 at 15:14:44 UTC, Andrei Alexandrescu wrote:

Such issues do happen, are relatively rare, and are virtually untested
because there's been no unittests with a deliberately "bad" input
range. Although of course we do need to add the appropriate fixes and
unittests, I'm not worried about it at all systemically.


If an abstraction encourages use in a way which leads to hard-to-detect
logic bugs that do not occur with the most common test cases, this is
_exactly_ the thing we should be worried about!


When I designed input ranges vs. forward ranges there were long 
discussion on how to distinguish them. If you have a better design it 
would probably not influence the state of the affair, but it would be a 
good discussion to have. FWIW I can already think of a couple but all 
rely on UFCS.


Andrei




Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Andrei Alexandrescu

On 10/17/12 1:29 PM, monarch_dodra wrote:

On Wednesday, 17 October 2012 at 17:15:14 UTC, Andrei
Alexandrescu wrote:

On 10/16/12 3:07 PM, H. S. Teoh wrote:

Perhaps mark ranges with an .isTransient property


isTransient!R is exactly the same thing as isInputRange!R &&
!isForwardRange!R.

Andrei


Technically, (isTransient!R) is just a subset of (isInputRange!R
&& !isForwardRange!R)

"byDChar" (if it existed), would be a perfectly valid
non-transient input only range.


Depends on implementation - popFront may actually wipe the character.

Andrei


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread monarch_dodra
On Wednesday, 17 October 2012 at 18:13:42 UTC, Andrei 
Alexandrescu wrote:

On 10/17/12 1:29 PM, monarch_dodra wrote:

On Wednesday, 17 October 2012 at 17:15:14 UTC, Andrei
Alexandrescu wrote:

On 10/16/12 3:07 PM, H. S. Teoh wrote:

Perhaps mark ranges with an .isTransient property


isTransient!R is exactly the same thing as isInputRange!R &&
!isForwardRange!R.

Andrei


Technically, (isTransient!R) is just a subset of 
(isInputRange!R

&& !isForwardRange!R)

"byDChar" (if it existed), would be a perfectly valid
non-transient input only range.


Depends on implementation - popFront may actually wipe the 
character.


Andrei


"byDChar" (if it existed), *could* be a perfectly valid 
non-transient input only range, if the implementation returns the 
character by value. :)


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Jonathan M Davis
On Wednesday, October 17, 2012 12:39:13 Andrei Alexandrescu wrote:
> On 10/16/12 1:28 PM, Jonathan M Davis wrote:
> > So, it's fine that ByLine is a range as long as we're willing to put up
> > with it not working with a lot of range-based functions because of its
> > abnormal behavior. But I don't think that it's at all reasonable for
> > range-based functions in general to not be able to rely on front
> > returning the same type every time or on its value disappearing or
> > becoming invalid in some way after a call to popFront. That's completely
> > untenable IMHO.
> 
> Then what is to you the difference between an input range and a forward
> range?

Whether you can get a copy of the range. If you call save, you can save its 
current state and have two copies of the range to operate on separately, 
That's completely different from whether front can be kept around or not. It's 
perfectly possible to have an input range whose previous front does not get 
invalidated by a call to popFront. ByLine would be that way if it allocated a 
new buffer instead of reusing the old one. It just doesn't do that because it's 
less efficient to do so.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread bearophile

Andrei Alexandrescu:


You closed it.

Also that bug has nothing to do with this topic because you 
can't compile the buggy code and have it silently do the wrong 
thing.



Andrei


Sorry.

A solution is to introduce a "byLineFast" that is similar to the 
current "byLine", and make "byLine" copy.


Another solution is to keep only "byLine" and give it a 
compile-time boolean argument to disable the copy, that is active 
on default.


Bye,
bearophile


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Jonathan M Davis
On Wednesday, October 17, 2012 13:15:12 Andrei Alexandrescu wrote:
> On 10/16/12 3:07 PM, H. S. Teoh wrote:
> > Perhaps mark ranges with an .isTransient property
> 
> isTransient!R is exactly the same thing as isInputRange!R &&
> !isForwardRange!R.

In what way? There is nothing anywhere that states that an input range's front 
could be invalidated upon a call to popFront. There's nothing inherent in the 
fact that range can't be saved that makes it so that front must be invalidated 
upon a call to popFront. You could easily define an input range of bytes over a 
file where front is perfectly valid after a call to popFront, because it's a 
value type. You could also define ByLine so that its front is perfectly valid 
after a call to popFront. It's just that it requires that a new buffer be 
allocated instead of reusing the buffer. That has nothing to do with the fact 
that its incapable of copying its state such that you could have multiple 
copies of it being iterated over separately. I don't see anything in the 
concept or definition of input ranges which implies that front would be 
invalidated by a call to popFront.

I'm increasingly convinced that input ranges which are not forward ranges are 
useless for pretty much anything other than foreach. Far too much requires 
that you be able to save the current state - and most stuff _inherently_ 
requires it such that it's not simply a question of implementing the function 
differently. And adding even further restrictions on input ranges just makes it 
worse. It actually wouldn't hurt my feelings one whit if we got rid of the 
idea of input ranges entirely. It's perfectly possible to implement ranges 
like ByLine as forward ranges. It just requires a bit more work. But they'd be 
_way_ more useful if they were. In my experience the only time that you don't 
need to dup what ByLine or ByChunk gives you is when all you need is foreach, 
and if that's really the case, then opApply can be used for foreach, and 
ByLine and ByChunk can be defined in ways that actually allow them to not only 
not invalidate the result of previous front calls but to actually be full-on 
forward ranges, making them actually useful for things beyond foreach.

Regardless, there's nothing in how input ranges are currently defined which 
indicates that front would ever be invalidated for _any_ type of range, and 
ByLine and ByChunk are pretty much the only ranges I've ever seen which 
invalidate previous calls to front. So, I don't see how you could think that 
they're anything but abnormal. And if you really want to argue that whether 
front can be invalidated or not is somehow part of the difference between an 
input range and a forward range, then the documentation on that needs to make 
that _very_ clear, and it's going to be that much worse to deal with input 
ranges which aren't forward ranges.

- Jonathan M Davis


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread monarch_dodra
On Wednesday, 17 October 2012 at 19:14:12 UTC, Jonathan M Davis 
wrote:
On Wednesday, October 17, 2012 12:39:13 Andrei Alexandrescu 
wrote:

On 10/16/12 1:28 PM, Jonathan M Davis wrote:
> So, it's fine that ByLine is a range as long as we're 
> willing to put up
> with it not working with a lot of range-based functions 
> because of its
> abnormal behavior. But I don't think that it's at all 
> reasonable for
> range-based functions in general to not be able to rely on 
> front
> returning the same type every time or on its value 
> disappearing or
> becoming invalid in some way after a call to popFront. 
> That's completely

> untenable IMHO.

Then what is to you the difference between an input range and 
a forward

range?


Whether you can get a copy of the range. If you call save, you 
can save its
current state and have two copies of the range to operate on 
separately,
That's completely different from whether front can be kept 
around or not. It's
perfectly possible to have an input range whose previous front 
does not get
invalidated by a call to popFront. ByLine would be that way if 
it allocated a
new buffer instead of reusing the old one. It just doesn't do 
that because it's

less efficient to do so.

- Jonathan M Davis


This.

When you think about it, byLine *could* be a ForwardRange (it 
could save its pget position in the file), allowing it to 
backtrack, but that *still* wouldn't prevent it from overwriting 
its own internal buffer on calls to front: That is just a detail 
of its implementation.


A range's "Input-ness" or "Forward-ness" is completely orthogonal 
from the returned front's transitiveness.




Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread monarch_dodra

On Wednesday, 17 October 2012 at 19:42:39 UTC, bearophile wrote:

Andrei Alexandrescu:


You closed it.

Also that bug has nothing to do with this topic because you 
can't compile the buggy code and have it silently do the wrong 
thing.



Andrei


Sorry.

A solution is to introduce a "byLineFast" that is similar to 
the current "byLine", and make "byLine" copy.


Another solution is to keep only "byLine" and give it a 
compile-time boolean argument to disable the copy, that is 
active on default.


Bye,
bearophile


Given that "byLine" already exists, I'm not sure we can change it 
now. But I wouldn't be against adding a "byLineSlow" or something.


However, if we could start again, I'd *definitely* favor a deep 
copying "byLine" by default, and have a faster, but harder to use 
"byLineFast".


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread Timon Gehr

On 10/17/2012 07:20 PM, Andrei Alexandrescu wrote:

On 10/16/12 3:41 PM, H. S. Teoh wrote:

I think with D's compile-time introspection capabilities, it _should_ be
possible to implement a generic deepCopy template that works for any
type.

Of course, then one has to address tricky issues such as complex data
structures that have interlinking parts; a blind recursive deep-copy may
not have the desired effect (e.g., if we're deep-copying a graph and
there are multiple paths (via references) to the same node, then we
shouldn't end up with multiple copies of that node). Some care will also
need to be taken to deal with cyclic structures, etc.. And some
optimizations can probably be done to avoid copying immutable objects,
since that would be a waste of time&  memory.

Probably some kind of graph traversal algorithm can be used to address
these issues, I think, perhaps with the help of an AA or two to recreate
the original linking structure in the copy.


Yes, deepDup should be implementable as a library and use a temporary
dictionary of already-duplicated items to avoid infinite recursion.

We should add that to Phobos - could you please add a task to trello.


Andrei



A deepDup is missing information about what is representation and what
is just a reference kept in order to access external data, so it is
quite useless.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-17 Thread H. S. Teoh
On Wed, Oct 17, 2012 at 12:55:56PM -0700, Jonathan M Davis wrote:
[...]
> I'm increasingly convinced that input ranges which are not forward
> ranges are useless for pretty much anything other than foreach. Far
> too much requires that you be able to save the current state - and
> most stuff _inherently_ requires it such that it's not simply a
> question of implementing the function differently.

It's perfectly possible to implement joiner, chain, find, count, cmp,
equal, until, filter, map, reduce, without assuming that the value
returned by .front is persistent. Just to name a few. In fact, it's even
possible to implement cartesianProduct in which one of the ranges is an
input range.  I'd hardly call that useless.


> And adding even further restrictions on input ranges just makes it
> worse. It actually wouldn't hurt my feelings one whit if we got rid of
> the idea of input ranges entirely.

The motivating example for input ranges, at least according to TDPL, is
find(). There's nothing about find() that precludes non-forward input
ranges. A lot would be missing from the usefulness of ranges if we were
forced to only use forward ranges.


[...]
> Regardless, there's nothing in how input ranges are currently defined
> which indicates that front would ever be invalidated for _any_ type of
> range, and ByLine and ByChunk are pretty much the only ranges I've
> ever seen which invalidate previous calls to front. So, I don't see
> how you could think that they're anything but abnormal.

I can think of quite a few situations in which it's useful to not assume
that the return value of .front is persistent, which I've already
mentioned before: in-place array permutation, reused buffers for complex
computations, etc..


> And if you really want to argue that whether front can be invalidated
> or not is somehow part of the difference between an input range and a
> forward range, then the documentation on that needs to make that
> _very_ clear, and it's going to be that much worse to deal with input
> ranges which aren't forward ranges.
[...]

I think I'm not so sure about Andrei's lumping input ranges with
persistent return values from .front together with forward ranges. Some
algorithms, like findAdjacent, do not need a forward range, but they do
need a persistent .front. I do not like the idea of artificially
limiting the scope of findAdjacent just because you can't assume input
ranges' .front returns a persistent value. Like somebody else mentioned,
whether .front is transient or not is orthogonal to whether the range is
an input range or a forward range. There can be ranges whose .front is
persistent, but they can't be forward ranges for practical reasons.


T

-- 
Having a smoking section in a restaurant is like having a peeing section
in a swimming pool. -- Edward Burr 


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-18 Thread Don Clugston

On 17/10/12 23:41, H. S. Teoh wrote:

On Wed, Oct 17, 2012 at 12:55:56PM -0700, Jonathan M Davis wrote:
[...]

I'm increasingly convinced that input ranges which are not forward
ranges are useless for pretty much anything other than foreach. Far
too much requires that you be able to save the current state - and
most stuff _inherently_ requires it such that it's not simply a
question of implementing the function differently.


It's perfectly possible to implement joiner, chain, find, count, cmp,
equal, until, filter, map, reduce, without assuming that the value
returned by .front is persistent. Just to name a few. In fact, it's even
possible to implement cartesianProduct in which one of the ranges is an
input range.  I'd hardly call that useless.



And adding even further restrictions on input ranges just makes it
worse. It actually wouldn't hurt my feelings one whit if we got rid of
the idea of input ranges entirely.


The motivating example for input ranges, at least according to TDPL, is
find(). There's nothing about find() that precludes non-forward input
ranges. A lot would be missing from the usefulness of ranges if we were
forced to only use forward ranges.


[...]

Regardless, there's nothing in how input ranges are currently defined
which indicates that front would ever be invalidated for _any_ type of
range, and ByLine and ByChunk are pretty much the only ranges I've
ever seen which invalidate previous calls to front. So, I don't see
how you could think that they're anything but abnormal.


I can think of quite a few situations in which it's useful to not assume
that the return value of .front is persistent, which I've already
mentioned before: in-place array permutation, reused buffers for complex
computations, etc..



And if you really want to argue that whether front can be invalidated
or not is somehow part of the difference between an input range and a
forward range, then the documentation on that needs to make that
_very_ clear, and it's going to be that much worse to deal with input
ranges which aren't forward ranges.

[...]

I think I'm not so sure about Andrei's lumping input ranges with
persistent return values from .front together with forward ranges. Some
algorithms, like findAdjacent, do not need a forward range, but they do
need a persistent .front. I do not like the idea of artificially
limiting the scope of findAdjacent just because you can't assume input
ranges' .front returns a persistent value. Like somebody else mentioned,
whether .front is transient or not is orthogonal to whether the range is
an input range or a forward range. There can be ranges whose .front is
persistent, but they can't be forward ranges for practical reasons.


Is it actually orthogonal? Is it possible for a forward range to be 
transient?


Or is it an intermediate concept?
TransientInputRange -> NonTransientInputRange -> ForwardRange





Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-18 Thread monarch_dodra

On Thursday, 18 October 2012 at 07:09:04 UTC, Don Clugston wrote:

On 17/10/12 23:41, H. S. Teoh wrote:

Is it actually orthogonal? Is it possible for a forward range 
to be transient?


Or is it an intermediate concept?
TransientInputRange -> NonTransientInputRange -> ForwardRange


Save just means the range can save its position. If it is 
returning via a buffer, Forward of not, it is going to be 
transient.


Take this forward range, that returns the strings "A", "B" and 
"C" ad infinitum:


//
enum _ABC = "ABC";

struct ABC
{
char[1] buf = _ABC[0];
size_t i;

enum empty = false;
@property char[] front(){return buf;}
void popFront()
{
++i;
buf[0] = _ABC[i%3];
}
@property ABC save()
{
return this;
}
}
//

This is a perfectly valid range, which you can save, but the 
returned string is transient:


//
void main()
{
  ABC abc;

  writeln("Printing 10 elements: ");
  abc.take(10).writeln('\n');

  writeln("Duplicating range");
  auto abc2 = abc.save;
  abc.popFront;
  foreach(v; zip(abc, abc2).take(5))
write("[", v[0], ", ", v[1], "]");
  writeln('\n');

  writeln("Prnting two consecutive elements:");
  auto first = abc.front;
  abc.popFront();
  auto second = abc.front;
  writeln("[", first, ", ", second, "]");
}
//

Produces:

//
Printing 10 elements:
["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"]

Duplicating range
[B, A][C, B][A, C][B, A][C, B]

Prnting two consecutive elements:
[C, C]
//

As you can see, you can perfectly iterate.
You can perfectly save the range. The saved range can be used to 
backtrack.
But if you attempt to store two consecutive fronts, things don't 
go well.


The same holds true for a Random Access range BTW.

Iteration and transient-ness of returned value are orthogonal 
concepts


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-18 Thread monarch_dodra
On Wednesday, 17 October 2012 at 19:56:08 UTC, Jonathan M Davis 
wrote:

[SNIP]
I'm increasingly convinced that input ranges which are not 
forward ranges are

useless for pretty much anything other than foreach.
[SNIP]
- Jonathan M Davis


That's already a pretty big usage :) The thing is you just can't 
"do" anything with them, but that *is* the design.


Just read them once to place them into another container. The 
fact they interface with, say "array", or "appender", or copy, 
makes the interface convenient.


The fact that byLine will choke on a call to "array", IMO, has 
nothing to do with it being a forward range.


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-18 Thread H. S. Teoh
On Thu, Oct 18, 2012 at 09:09:03AM +0200, Don Clugston wrote:
> On 17/10/12 23:41, H. S. Teoh wrote:
[...]
> >I think I'm not so sure about Andrei's lumping input ranges with
> >persistent return values from .front together with forward ranges.
> >Some algorithms, like findAdjacent, do not need a forward range, but
> >they do need a persistent .front. I do not like the idea of
> >artificially limiting the scope of findAdjacent just because you
> >can't assume input ranges' .front returns a persistent value. Like
> >somebody else mentioned, whether .front is transient or not is
> >orthogonal to whether the range is an input range or a forward range.
> >There can be ranges whose .front is persistent, but they can't be
> >forward ranges for practical reasons.
> 
> Is it actually orthogonal? Is it possible for a forward range to be
> transient?
[...]

What about a range over all permutations of an array, that modifies the
array in-place? It can be a forward range by having .save copy the
current state of the array, but .front is transient nonetheless.


T

-- 
Life is too short to run proprietary software. -- Bdale Garbee


Re: Tricky semantics of ranges & potentially numerous Phobos bugs

2012-10-18 Thread Marco Leise
Am Wed, 17 Oct 2012 22:09:08 +0200
schrieb "monarch_dodra" :

> Given that "byLine" already exists, I'm not sure we can change it 
> now. But I wouldn't be against adding a "byLineSlow" or something.
> 
> However, if we could start again, I'd *definitely* favor a deep 
> copying "byLine" by default, and have a faster, but harder to use 
> "byLineFast".

I agree. And simple demo programs can just use

byLine => string

and if we talk about a fast "word count" demo, then it
probably doesn't hurt when the reader sees, that the library
provides ranges for both use cases.

byLineOverwrite => char[]

After all a line is expected to be a string, and D to be safe.
But the real issue are the differing views on how .front
should work. Unlike other problems, this one has solutions
that wont break code, if that is a requirement. So I'll let
the Phobos crew argue and see what happens. :)

-- 
Marco