Re: Can't recreate a range?

2020-04-30 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/30/20 6:39 PM, Paul Backus wrote:

On Thursday, 30 April 2020 at 18:30:14 UTC, H. S. Teoh wrote:
Also, for ranges based on generator functions, if .front is lazy then 
you need to keep extra baggage around your range to indicate whether 
or not the generator has been invoked yet; it's easier to just always 
compute the next element eagerly and cache it, and .front just returns 
the cached data.


std.range.generate is actually a perfect example of the problem with 
doing work in popFront. Because it has to call popFront both on 
construction *and* after every element, consuming n elements will call 
the generator function n+1 times. The final call is completely wasted.


generate used to do everything on front. But that meant it wasn't a true 
range as multiple calls to front generated different data (popFront was 
a noop). I fixed it a while ago.


It should be the status quo to do all work in popFront, and then you 
should wrap if you need different behavior.


I'm ok with something like this (I'm kind of surprised something like 
this doesn't exist already):


struct LazyRange(R)
{
   R src;
   bool frontEvaluated;
   void sync() {
   if(!frontEvaluated) {
  src.popFront;
  frontEvaluated = true;
   }
   }
   auto front() {
   sync();
   return src.front;
   }
   bool empty() {
   sync();
   return src.empty;
   }
   void popFront() {
   sync();
   frontEvaluated = false;
   }
}

-Steve


Re: Can't recreate a range?

2020-04-30 Thread Paul Backus via Digitalmars-d-learn

On Thursday, 30 April 2020 at 18:30:14 UTC, H. S. Teoh wrote:
On Thu, Apr 30, 2020 at 06:05:55PM +, Paul Backus via 
Digitalmars-d-learn wrote: [...]
Doing work in popFront instead of front is usually an 
anti-pattern, since it forces eager evaluation of the next 
element even when that element is never used. You should only 
do this if there's no reasonable way to avoid it.


Really?? IME, usually you *need* to do work in popFront instead 
of front, because otherwise .empty wouldn't be able to tell 
whether there is another element or not.  E.g., in filtering a 
range based on some criterion on each element, you can't defer 
computing the next element until .front because you can't 
predict whether there will be another element that won't be 
dropped by popFront.


There are certainly cases where it can't be avoided. Filter is 
one; file I/O (as Steven Schveighoffer pointed out) is another 
one. Obviously, you gotta do what you gotta do. Still, I think 
that as long as work *can* be deferred to .front, it should be. 
That's the essence of lazy evaluation: only do your computation 
once you're absolutely sure it's necessary.


Also, for ranges based on generator functions, if .front is 
lazy then you need to keep extra baggage around your range to 
indicate whether or not the generator has been invoked yet; 
it's easier to just always compute the next element eagerly and 
cache it, and .front just returns the cached data.


std.range.generate is actually a perfect example of the problem 
with doing work in popFront. Because it has to call popFront both 
on construction *and* after every element, consuming n elements 
will call the generator function n+1 times. The final call is 
completely wasted.


Re: Can't recreate a range?

2020-04-30 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/30/20 2:30 PM, H. S. Teoh wrote:

On Thu, Apr 30, 2020 at 06:05:55PM +, Paul Backus via Digitalmars-d-learn 
wrote:
[...]

Doing work in popFront instead of front is usually an anti-pattern,
since it forces eager evaluation of the next element even when that
element is never used. You should only do this if there's no
reasonable way to avoid it.


Really?? IME, usually you *need* to do work in popFront instead of
front, because otherwise .empty wouldn't be able to tell whether there
is another element or not.  E.g., in filtering a range based on some
criterion on each element, you can't defer computing the next element
until .front because you can't predict whether there will be another
element that won't be dropped by popFront.

Also, for ranges based on generator functions, if .front is lazy then
you need to keep extra baggage around your range to indicate whether or
not the generator has been invoked yet; it's easier to just always
compute the next element eagerly and cache it, and .front just returns
the cached data.

Even when the range involves some expensive per-element computation, I
find that it's simpler to just compute and cache in .popFront instead of
adding extra baggage to .front to know when the computation has already
been performed.

I'm hard-pressed to come up with an example where deferring computation
to .front is a good idea!


There could be expensive operations done to construct the element that 
aren't necessary to iterate the range, but generally you achieve this by 
using something like map and cache.


But really the only valid use case for this is if you create a range but 
never use it. Some may need this, but I think a wrapper would be a 
better solution than enforcing a certain usage pattern.


-Steve


Re: Can't recreate a range?

2020-04-30 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/30/20 2:05 PM, Paul Backus wrote:

On Thursday, 30 April 2020 at 16:21:05 UTC, Steven Schveighoffer wrote:
I would say part of the issue is that you are doing all your work in 
front and not popFront.



[...]


I'd say:

1. move your work to the popFront function (you then need to call 
popFront once before returning the range in your factory method).

2. change empty to check if the buffer is empty instead of the input.


Doing work in popFront instead of front is usually an anti-pattern, 
since it forces eager evaluation of the next element even when that 
element is never used. You should only do this if there's no reasonable 
way to avoid it.


In the general scheme of things, this only makes sense for the first 
front access or empty access. From then on, popFront should do the work. 
In most cases, you can't tell a range is empty unless you've done the 
work to determine the next element. After you call popFront, you need to 
call empty before calling front, or else you may be using an invalid 
range. So all the work is going to be done anyway.


A good example, byLine. How do you know that there is no more data until 
you attempt to read the next line from the stream? At which point, you 
have to do all the work.


-Steve

P.S. I see H.S. Teoh said much of the same thing, but I had already 
typed this out before reading it ;)


Re: Can't recreate a range?

2020-04-30 Thread Casey via Digitalmars-d-learn

On Thursday, 30 April 2020 at 18:30:14 UTC, H. S. Teoh wrote:
On Thu, Apr 30, 2020 at 06:05:55PM +, Paul Backus via 
Digitalmars-d-learn wrote: [...]
Doing work in popFront instead of front is usually an 
anti-pattern, since it forces eager evaluation of the next 
element even when that element is never used. You should only 
do this if there's no reasonable way to avoid it.


Really?? IME, usually you *need* to do work in popFront instead 
of front, because otherwise .empty wouldn't be able to tell 
whether there is another element or not.  E.g., in filtering a 
range based on some criterion on each element, you can't defer 
computing the next element until .front because you can't 
predict whether there will be another element that won't be 
dropped by popFront.


Also, for ranges based on generator functions, if .front is 
lazy then you need to keep extra baggage around your range to 
indicate whether or not the generator has been invoked yet; 
it's easier to just always compute the next element eagerly and 
cache it, and .front just returns the cached data.


Even when the range involves some expensive per-element 
computation, I find that it's simpler to just compute and cache 
in .popFront instead of adding extra baggage to .front to know 
when the computation has already been performed.


I'm hard-pressed to come up with an example where deferring 
computation to .front is a good idea!



T


Well, I just remembered that I had a tab open to this: 
http://ddili.org/ders/d.en/ranges.html


Reading, you see the following:

* empty: specifies whether the range is empty; it must return 
true when the range is considered to be empty, and false otherwise
* front: provides access to the element at the beginning of the 
range
* popFront(): shortens the range from the beginning by removing 
the first element


Looking at that, if popFront shortens the range, then to me it 
sounds like the work should be done in popFront.


Re: Can't recreate a range?

2020-04-30 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Apr 30, 2020 at 06:05:55PM +, Paul Backus via Digitalmars-d-learn 
wrote:
[...]
> Doing work in popFront instead of front is usually an anti-pattern,
> since it forces eager evaluation of the next element even when that
> element is never used. You should only do this if there's no
> reasonable way to avoid it.

Really?? IME, usually you *need* to do work in popFront instead of
front, because otherwise .empty wouldn't be able to tell whether there
is another element or not.  E.g., in filtering a range based on some
criterion on each element, you can't defer computing the next element
until .front because you can't predict whether there will be another
element that won't be dropped by popFront.

Also, for ranges based on generator functions, if .front is lazy then
you need to keep extra baggage around your range to indicate whether or
not the generator has been invoked yet; it's easier to just always
compute the next element eagerly and cache it, and .front just returns
the cached data.

Even when the range involves some expensive per-element computation, I
find that it's simpler to just compute and cache in .popFront instead of
adding extra baggage to .front to know when the computation has already
been performed.

I'm hard-pressed to come up with an example where deferring computation
to .front is a good idea!


T

-- 
A mathematician is a device for turning coffee into theorems. -- P. Erdos


Re: Can't recreate a range?

2020-04-30 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/30/20 2:03 PM, Casey wrote:
Interesting.  I'll take this into account.  I was putting the work into 
front because I didn't want to do the work until it was requested.  
Putting the work in popFront makes more sense in some ways, but the fact 
you have to call it before getting any records seems like it would break 
normal range algorithms.  (Please correct me if I'm wrong)  I'm 
wondering if putting the work into it's own method and calling it one 
from the constructor and from popFront the rest of the way.


You don't necessarily have to do it in a separate function, because 
until you return it from your factory method (the only way you can 
create such a thing), it technically doesn't need to be "sane".


So what I mean is at the bottom of your function you have:

auto readStream(Range)(auto ref Range r) if (isInputRange!(Unqual!Range))
{
   ... // define your struct

   auto result = StreamRange(r);
   if(!r.empty) result.popFront(); // prime range
   return result;
}

There is a valid argument to be made about not doing any work until 
requested. You could do this inside front, but in that case, I would say 
you have an additional boolean that checks if it has ever been run, and 
run popFront once (setting that bool to false). But I don't necessarily 
like that, because that puts an extra burden on front. front is supposed 
to be non-mutating, called as much as you need (ideally it should be 
inout). Practically, you can make it do work, but really the bulk of the 
work should be done in popFront.


Note that you will have to check for priming in popFront as well, 
because if you did for instance:


auto r = readStream(...);
r.popFront(); // skip first element?

This would not do what is expected. In fact, thinking about it some 
more, you should be able to call popFront as many times as you want and 
have it skip that many elements without calling front once. Your range 
will not do that.


-Steve


Re: Can't recreate a range?

2020-04-30 Thread Paul Backus via Digitalmars-d-learn
On Thursday, 30 April 2020 at 16:21:05 UTC, Steven Schveighoffer 
wrote:
I would say part of the issue is that you are doing all your 
work in front and not popFront.



[...]


I'd say:

1. move your work to the popFront function (you then need to 
call popFront once before returning the range in your factory 
method).
2. change empty to check if the buffer is empty instead of the 
input.


Doing work in popFront instead of front is usually an 
anti-pattern, since it forces eager evaluation of the next 
element even when that element is never used. You should only do 
this if there's no reasonable way to avoid it.


Re: Can't recreate a range?

2020-04-30 Thread Casey via Digitalmars-d-learn
On Thursday, 30 April 2020 at 16:21:05 UTC, Steven Schveighoffer 
wrote:
I would say part of the issue is that you are doing all your 
work in front and not popFront.


What happens is that Appender is a pointer-to-implementation 
struct, and the compiler will allocate the first one shared 
amongst all initial StreamRange instances.


On your first call to front, it's going to utilize that shared 
one. Then on the call to popFront, it will reset it to another 
one.


For the second unittest, in your first call to front, it 
notices that it's already been filled, so it doesn't do any 
work (and returns the existing buffer).


Interesting.  I'll take this into account.  I was putting the 
work into front because I didn't want to do the work until it was 
requested.  Putting the work in popFront makes more sense in some 
ways, but the fact you have to call it before getting any records 
seems like it would break normal range algorithms.  (Please 
correct me if I'm wrong)  I'm wondering if putting the work into 
it's own method and calling it one from the constructor and from 
popFront the rest of the way.


another problem, your empty condition is based on the input, 
which violates range expectations. indeed, on the first call to 
front, the range all of a sudden becomes empty.


I was about to argue the point, but after thinking about the use 
of popFront to execute the work, that would work.


3. You don't need a further Range parameter for the nested 
struct, it can use the template parameter from the containing 
function.
4. Make it a static struct, or else it will retain a pointer to 
the function stack frame needlessly.


I'll definitely try that out.  Regarding 3, I think that was in 
the example code I used, so I just went with it.




Re: Idomatic way to guarantee to run destructor?

2020-04-30 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/30/20 12:55 PM, Robert M. Münch wrote:
For ressource management I mostly use this pattern, to ensure the 
destructor is run:


void myfunc(){
  MyClass X = new MyClass(); scope(exit) X.destroy;
}

I somewhere read, this would work too:

void myfunc(){
  auto MyClass X = new MyClass();
}

What does this "auto" does here? Wouldn't

void myfunc(){
  auto X = new MyClass();
}

be sufficient? And would this construct guarantee that the destructor is 
run? And if, why does "auto" has this effect, while just using "new" 
doesn't guarantee to run the destructor?


No, auto is declaring that there's about to be a variable here. In 
actuality, auto does nothing in the first case, it just means local 
variable. But without the type name, the type is inferred (i.e. your 
second example). This does not do any automatic destruction of your 
class, it's still left to the GC.


You can use scope instead of auto, and it will then allocate the class 
on the stack, and destroy it as Ben Jones said. There is danger there, 
however, as it's very easy to store a class reference elsewhere, and 
then you have a dangling pointer.


A safer thing to do is:

auto X = new MyClass();
scope(exit) destroy(X);

This runs the destructor and makes the class instance unusable, but does 
not free the memory (so any remaining references, if used, will not 
corrupt memory).


If your concern is guaranteeing destructors are run, that's what I would 
pick. If in addition you want guaranteed memory cleanup, then use scope 
(and be careful).


-Steve


Re: Idomatic way to guarantee to run destructor?

2020-04-30 Thread Ben Jones via Digitalmars-d-learn

On Thursday, 30 April 2020 at 16:55:36 UTC, Robert M. Münch wrote:
For ressource management I mostly use this pattern, to ensure 
the destructor is run:


void myfunc(){
 MyClass X = new MyClass(); scope(exit) X.destroy;
}

I somewhere read, this would work too:

void myfunc(){
 auto MyClass X = new MyClass();
}

What does this "auto" does here? Wouldn't

void myfunc(){
 auto X = new MyClass();
}

be sufficient? And would this construct guarantee that the 
destructor is run? And if, why does "auto" has this effect, 
while just using "new" doesn't guarantee to run the destructor?



I think you want to use scope rather than auto which will put the 
class on the stack and call its destructor: 
https://dlang.org/spec/attribute.html#scope


Idomatic way to guarantee to run destructor?

2020-04-30 Thread Robert M. Münch via Digitalmars-d-learn
For ressource management I mostly use this pattern, to ensure the 
destructor is run:


void myfunc(){
 MyClass X = new MyClass(); scope(exit) X.destroy;
}

I somewhere read, this would work too:

void myfunc(){
 auto MyClass X = new MyClass();
}

What does this "auto" does here? Wouldn't

void myfunc(){
 auto X = new MyClass();
}

be sufficient? And would this construct guarantee that the destructor 
is run? And if, why does "auto" has this effect, while just using "new" 
doesn't guarantee to run the destructor?


--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster



Re: Can't recreate a range?

2020-04-30 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/30/20 11:42 AM, Casey wrote:

On Thursday, 30 April 2020 at 13:23:25 UTC, Paul Backus wrote:
Using a default value like this means that it will be shared among all 
instances of the struct. Instead, you should set `buff = 
appender!string` in the constructor, so that each struct will have its 
own appender.


I'll give it a try when I get back to it (fixing lint issues), but are 
you sure that's the issue?  In popFront, I recreate the appender.  So, 
the appender should be clear before the empty check after it processes 
the last of the data from _input.  Still a good thing to do as a best 
practice; I'm just wondering if something else is causing an issue and 
even if initializing the appender in the constructor solves the problem, 
I'd be suspect of future issues cropping up.


I would say part of the issue is that you are doing all your work in 
front and not popFront.


What happens is that Appender is a pointer-to-implementation struct, and 
the compiler will allocate the first one shared amongst all initial 
StreamRange instances.


On your first call to front, it's going to utilize that shared one. Then 
on the call to popFront, it will reset it to another one.


For the second unittest, in your first call to front, it notices that 
it's already been filled, so it doesn't do any work (and returns the 
existing buffer).


another problem, your empty condition is based on the input, which 
violates range expectations. indeed, on the first call to front, the 
range all of a sudden becomes empty.


I'd say:

1. move your work to the popFront function (you then need to call 
popFront once before returning the range in your factory method).

2. change empty to check if the buffer is empty instead of the input.

Some other observations:

3. You don't need a further Range parameter for the nested struct, it 
can use the template parameter from the containing function.
4. Make it a static struct, or else it will retain a pointer to the 
function stack frame needlessly.


-Steve


Re: Can't recreate a range?

2020-04-30 Thread Casey via Digitalmars-d-learn

On Thursday, 30 April 2020 at 15:42:03 UTC, Casey wrote:
I'll give it a try when I get back to it (fixing lint issues), 
but are you sure that's the issue?  In popFront, I recreate the 
appender.  So, the appender should be clear before the empty 
check after it processes the last of the data from _input.  
Still a good thing to do as a best practice; I'm just wondering 
if something else is causing an issue and even if initializing 
the appender in the constructor solves the problem, I'd be 
suspect of future issues cropping up.


O.K.  It's working now, but I suspect there may still be 
something else going on.  The more I think of it, the more I 
suspect there's a newline at the end of the file that isn't 
accounted for until after the last tag is read from the file, so 
the input is not empty, so my code's empty check fails.


Regardless, thanks for the help!


Re: Can't recreate a range?

2020-04-30 Thread Casey via Digitalmars-d-learn

On Thursday, 30 April 2020 at 13:23:25 UTC, Paul Backus wrote:
Using a default value like this means that it will be shared 
among all instances of the struct. Instead, you should set 
`buff = appender!string` in the constructor, so that each 
struct will have its own appender.


I'll give it a try when I get back to it (fixing lint issues), 
but are you sure that's the issue?  In popFront, I recreate the 
appender.  So, the appender should be clear before the empty 
check after it processes the last of the data from _input.  Still 
a good thing to do as a best practice; I'm just wondering if 
something else is causing an issue and even if initializing the 
appender in the constructor solves the problem, I'd be suspect of 
future issues cropping up.


Re: in vs inout

2020-04-30 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/30/20 10:52 AM, Adam D. Ruppe wrote:

On Thursday, 30 April 2020 at 14:00:40 UTC, Arredondo wrote:
I had been using inout for some time now for "purely input function 
parameters".


`inout` is more specifically for things you take in, look at, then pass 
back out. So it forms part of your return value.


We have discovered actually that inout has some nice properties even if 
you aren't returning it.


For example:

void rearrange(inout(int)*[] arr);

This can actually accept const(int)*[], int*[], and immutable(int)*[] 
with one function. The function can edit the array without editing any 
of the values pointed at.


But in terms of 'in' vs. 'inout', I'd say 'in' unless you have a 
specific reason to use inout (the above included).


-Steve


Re: in vs inout

2020-04-30 Thread Adam D. Ruppe via Digitalmars-d-learn

On Thursday, 30 April 2020 at 14:00:40 UTC, Arredondo wrote:
I had been using inout for some time now for "purely input 
function parameters".


`inout` is more specifically for things you take in, look at, 
then pass back out. So it forms part of your return value.


`const` is for when you are just looking without returning any of 
it.



scope inout may be more appropriate if you are going to return it 
at all...


in vs inout

2020-04-30 Thread Arredondo via Digitalmars-d-learn
The recent change log for v2.092.0 Beta says that with the new 
implementation for the `in` storage class:


`in` should be the storage class of choice for purely input 
function parameters.


I had been using inout for some time now for "purely input 
function parameters". Is there a case to be made in favor of 
const? why would be scope const preferred over scope inout?


Cheers!
Arredondo.


Re: Can't recreate a range?

2020-04-30 Thread Simen Kjærås via Digitalmars-d-learn

On Thursday, 30 April 2020 at 13:23:25 UTC, Paul Backus wrote:

On Thursday, 30 April 2020 at 13:04:47 UTC, Casey wrote:

Here's a minimal code example that duplicates the issue:

import std.array, std.range, std.stdio, std.traits, std.string;

auto readStream(Range)(auto ref Range r) if 
(isInputRange!(Unqual!Range))

{
struct StreamRange(Range)
{
alias R = Unqual!Range;
R _input;

auto buff = appender!string;


Using a default value like this means that it will be shared 
among all instances of the struct. Instead, you should set 
`buff = appender!string` in the constructor, so that each 
struct will have its own appender.


Yup, that's the one. No need to assign it at all, in fact - the 
line can be replaced with


Appender!string buff;

And things just work.

--
  Simen


Re: Can't recreate a range?

2020-04-30 Thread Paul Backus via Digitalmars-d-learn

On Thursday, 30 April 2020 at 13:04:47 UTC, Casey wrote:

Here's a minimal code example that duplicates the issue:

import std.array, std.range, std.stdio, std.traits, std.string;

auto readStream(Range)(auto ref Range r) if 
(isInputRange!(Unqual!Range))

{
struct StreamRange(Range)
{
alias R = Unqual!Range;
R _input;

auto buff = appender!string;


Using a default value like this means that it will be shared 
among all instances of the struct. Instead, you should set `buff 
= appender!string` in the constructor, so that each struct will 
have its own appender.


Re: Can't recreate a range?

2020-04-30 Thread Casey via Digitalmars-d-learn

On Wednesday, 29 April 2020 at 22:32:00 UTC, Simen Kjærås wrote:
I mean, it might be you messed up in posting this, but having 
an empty popFront and expecting it to do something is a tad 
optimistic.


I was just trying to get the example to a very minimal state.  I 
just added more descriptive code that replicates the bug.




What's the value of count when the code asserts?


The count ends up being the count of the previous test + the 
count of the current test.  In the code I uploaded, the first 
test will get 1, but the second test will get 2.  I'm printing 
out the output of the range to force it to actually perform the 
loop, so you can see that the contents of the first file are 
printed twice.


Re: Can't recreate a range?

2020-04-30 Thread Casey via Digitalmars-d-learn

Here's a minimal code example that duplicates the issue:

import std.array, std.range, std.stdio, std.traits, std.string;

auto readStream(Range)(auto ref Range r) if 
(isInputRange!(Unqual!Range))

{
struct StreamRange(Range)
{
alias R = Unqual!Range;
R _input;

auto buff = appender!string;

this(R input)
{
this._input = input;
}

bool empty()
{
return this._input.empty;
}

string front()
{
if (buff.capacity == 0)
{
bool iterate = true;
bool doCapture = false;
buff.reserve(1000);

while (iterate)
{
if (this._input.empty)
break;

auto value = this._input.front;
if (value.strip == "")
{
doCapture = true;
buff.put(value.strip);
buff.put("\n");
}
else if (value.strip == "")
{
buff.put(value.strip);
doCapture = false;
iterate = false;
}
else if (doCapture)
{
buff.put(value.strip);
buff.put("\n");
}
this._input.popFront();
}
}
return buff[];
}

void popFront()
{
buff = appender!string;
}
}

return StreamRange!(Range)(r);
}

unittest
{
auto range = readStream(File("test1.xml").byLine);
int count = 0;
while (!range.empty)
{
writeln(range.front);
count++;
writeln("Current count: ", count);
range.popFront();
}
assert(count == 1);
}

unittest
{
auto range = readStream(File("test2.xml").byLine);
int count = 0;
while (!range.empty)
{
writeln(range.front);
count++;
writeln("Current count: ", count);
range.popFront();
}
assert(count == 1);
}


Here are the two XML files (very simple):



Here is some text for the first file.




Here is some text for the second file.


I've even tried putting wrapping the code that is being tested in 
a function and looping over it with different parameters 
(filename and expected count) without any luck either.  I've also 
tried declaring the file handle separate and ensuring it was 
closed.  No luck.