Lazy caching map()?

2018-03-09 Thread H. S. Teoh via Digitalmars-d
Today I found myself needing a lazy, caching version of map() on an
array.  More precisely, given an array `T[] src` of source data and a
function func(T) that's pretty expensive to compute, return an object
`result` such that:

- result[i] == func(src[i]), for 0 ≤ i < src.length.

- If result[j] is never actually used, func(src[j]) is never invoked
  (lazy).

- If result[j] is referenced multiple times, a cached value is returned
  after the first time, i.e., the result of func(src[j]) is cached, and
  func(src[j]) is never invoked more than once.

I couldn't figure out how to build this using Phobos primitives, so I
wrote my own implementation.  It's pretty simple, really: just a wrapper
struct that lazily allocates an array of Nullable!T, populates slot j
with func(src[j]) the first time opIndex(j) is invoked, and returns the
cached value on subsequent calls.
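
For concreteness, here's a minimal sketch along those lines (illustrative
code only, not the actual implementation):

import std.typecons : Nullable;

struct LazyCachingMap(alias func, T)
{
    T[] src;
    Nullable!(typeof(func(T.init)))[] cache;

    auto opIndex(size_t j)
    {
        if (cache.length == 0)
            cache.length = src.length;  // allocate the cache lazily
        if (cache[j].isNull)
            cache[j] = func(src[j]);    // compute and store on first use
        return cache[j].get;
    }

    size_t length() const { return src.length; }
}

auto lazyCachingMap(alias func, T)(T[] src)
{
    return LazyCachingMap!(func, T)(src);
}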

Can this be done using current Phobos primitives?


T

-- 
Don't throw out the baby with the bathwater. Use your hands...


Re: Lazy caching map()?

2018-03-09 Thread Jonathan M Davis via Digitalmars-d
On Friday, March 09, 2018 11:41:46 H. S. Teoh via Digitalmars-d wrote:
> Today I found myself needing a lazy, caching version of map() on an
> array.  More precisely, given an array `T[] src` of source data and a
> function func(T) that's pretty expensive to compute, return an object
> `result` such that:
>
> - result[i] == func(src[i]), for 0 ≤ i < src.length.
>
> - If result[j] is never actually used, func(src[j]) is never invoked
>   (lazy).
>
> - If result[j] is referenced multiple times, a cached value is returned
>   after the first time, i.e., the result of func(src[j]) is cached, and
>   func(src[j]) is never invoked more than once.
>
> I couldn't figure out how to build this using Phobos primitives, so I
> wrote my own implementation of it.  Pretty simple, really, it's just a
> wrapper struct that lazily initializes an array of Nullable!T and lazily
> populates it with func(j) when opIndex(j) is invoked, and just returns
> the cached value if opIndex(j) has been invoked before.
>
> Can this be done using current Phobos primitives?

Wasn't that what std.algorithm.iteration.cache was supposed to solve? I've
never used it, so I don't know how well it fits what you need, but IIRC,
using it with map was basically the use case it was invented for.

- Jonathan M Davis




Re: Lazy caching map()?

2018-03-09 Thread H. S. Teoh via Digitalmars-d
On Fri, Mar 09, 2018 at 12:47:14PM -0700, Jonathan M Davis via Digitalmars-d 
wrote:
> On Friday, March 09, 2018 11:41:46 H. S. Teoh via Digitalmars-d wrote:
> > [...]  More precisely, given an array `T[] src` of source data and a
> > function func(T) that's pretty expensive to compute, return an
> > object `result` such that:
> >
> > - result[i] == func(src[i]), for 0 ≤ i < src.length.
> >
> > - If result[j] is never actually used, func(src[j]) is never invoked
> >   (lazy).
> >
> > - If result[j] is referenced multiple times, a cached value is returned
> >   after the first time, i.e., the result of func(src[j]) is cached, and
> >   func(src[j]) is never invoked more than once.
[...]
> > Can this be done using current Phobos primitives?
> 
> Wasn't that what std.algorithm.iteration.cache was supposed to solve?
> I've never used it, so I don't know how well it fits what you need,
> but IIRC, using it with map was basically the use that it was invented
> for.
[...]

The problem with cache() is that it does not return a random-access
range, and for my use case random-access is mandatory, because I cannot
predict ahead of time which indices will be used.  Worse yet, cache()
eagerly evaluates .front and .back, which is a no-no in my case (it
should not compute elements that are not asked for).
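
For instance, the eagerness shows up immediately with a side-effecting
mapping function (a small sketch relying on the documented behaviour of
cache()):

import std.algorithm.iteration : cache, map;
import std.stdio : writeln;

void main()
{
    auto r = [1, 2, 3]
        .map!((a) { writeln("computing ", a); return a * 2; })
        .cache;
    // "computing 1" has already been printed at this point,
    // even though .front was never called.
}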

These two limitations defeat the entire purpose: no random access means
I have to iterate to find the elements I want, and eager evaluation
means func() gets computed as I iterate, even if I never call .front.
So basically the entire result array would be computed.

It's also not trivial to extend cache() to handle random access, because
that means it has to allocate in order to store cached elements, whereas
the current implementation statically allocates space in the returned
range to store cached .front and .back elements.


T

-- 
An imaginary friend squared is a real enemy.


Re: Lazy caching map()?

2018-03-10 Thread aliak via Digitalmars-d

On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote:
Today I found myself needing a lazy, caching version of map() 
on an array.  More precisely, given an array `T[] src` of 
source data and a function func(T) that's pretty expensive to 
compute, return an object `result` such that:


- result[i] == func(src[i]), for 0 ≤ i < src.length.

- If result[j] is never actually used, func(src[j]) is never invoked
  (lazy).

- If result[j] is referenced multiple times, a cached value is returned
  after the first time, i.e., the result of func(src[j]) is cached, and
  func(src[j]) is never invoked more than once.

I couldn't figure out how to build this using Phobos 
primitives, so I wrote my own implementation of it.  Pretty 
simple, really, it's just a wrapper struct that lazily 
initializes an array of Nullable!T and lazily populates it with 
func(j) when opIndex(j) is invoked, and just returns the cached 
value if opIndex(j) has been invoked before.


Can this be done using current Phobos primitives?


T


Unless I'm misunderstanding your requirements, are you looking for
std.functional.memoize?


import std.algorithm.iteration : map;
import std.functional : memoize;
import std.stdio : writeln;

// prints "here" every time it is actually computed
auto fun(int a) {
    writeln("here");
    return a + 1;
}

void main() {
    auto a = [1, 2, 3];
    // memoize!fun caches the result per argument; map stays lazy
    auto b = a.map!(memoize!fun);

    writeln(b[1]);
    writeln(b[1]);
    writeln(b[1]);
}

will print "here" just once.


Re: Lazy caching map()?

2018-03-13 Thread H. S. Teoh via Digitalmars-d
On Sat, Mar 10, 2018 at 08:54:16PM +, aliak via Digitalmars-d wrote:
> On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote:
> > [...]  More precisely, given an array `T[] src` of source data and a
> > function func(T) that's pretty expensive to compute, return an
> > object `result` such that:
> > 
> > - result[i] == func(src[i]), for 0 ≤ i < src.length.
> > 
> > - If result[j] is never actually used, func(src[j]) is never invoked
> >   (lazy).
> > 
> > - If result[j] is referenced multiple times, a cached value is
> >   returned after the first time, i.e., the result of func(src[j]) is
> >   cached, and func(src[j]) is never invoked more than once.
[...]
> > Can this be done using current Phobos primitives?
[...]
> Unless I'm misunderstanding your requirements, are you looking for
> std.functional.memoize?
> 
> auto fun(int a) {
> writeln("here");
> return a + 1;
> }
> 
> void main() {
> auto a = [1, 2, 3];
> auto b = a.map!(memoize!fun);
> 
> writeln(b[1]);
> writeln(b[1]);
> writeln(b[1]);
> }
> 
> will print "here" just once.

Very nice. Using memoize did occur to me, but I needed an array
interface to it.  Didn't think of using memoize with map, for some
reason.  Thanks for the idea!

However, looking at the implementation of memoize, it seems that it
caches results globally, with very little control over the cache except
its size. I wonder if there's a way to control it better, i.e., free
the cache once there are no more references to it, and have the cache
size depend on the data size.  Basically, in my use case, once I map an
array it's not predictable in advance how many elements will be
accessed (i.e., it's hard to decide on an optimal cache size for
memoize), but the accesses will happen pretty soon afterwards, and then
the array will be discarded (i.e., there's no point holding on to old
entries in the cache).  I would prefer that the cache be cleared once
the mapped array has been GC'd, but memoize() seems to hold on to the
cached results indefinitely.
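
One way to get that behaviour would be to keep the cache inside the
wrapper itself, so that it becomes collectable together with the mapped
range and only grows with the entries actually computed. A rough sketch
(illustrative only, not Phobos code):

struct MemoizedMap(alias func, T)
{
    T[] src;
    typeof(func(T.init))[size_t] cache;  // per-instance cache, GC-managed

    auto opIndex(size_t j)
    {
        if (auto p = j in cache)
            return *p;              // already computed: return cached value
        auto v = func(src[j]);      // first access: compute...
        cache[j] = v;               // ...and remember
        return v;
    }

    size_t length() const { return src.length; }
}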


T

-- 
Almost all proofs have bugs, but almost all theorems are true. -- Paul Pedersen


Re: Lazy caching map()?

2018-03-13 Thread aliak via Digitalmars-d

On Tuesday, 13 March 2018 at 17:24:35 UTC, H. S. Teoh wrote:
Very nice. Using memoize did occur to me, but I needed an array 
interface to it.  Didn't think of using memoize with map, for 
some reason.  Thanks for the idea!


However, looking at the implementation of memoize, it seems 
that it's caching the result globally, with very little control 
over the cache except the size. I wonder if there's a way to 
control it better, i.e., free the cache once there are no more 
references to it, and have the cache size depend on the data 
size.  Basically, in my use case, once I map an array it's not 
predictable in advance how many elements will be accessed 
(i.e., it's hard to decide on an optimal cache size for 
memoize), but this access will happen pretty soon afterwards, 
and then the array will be discarded (i.e., no point holding on 
old entries in the cache).  I would prefer that the cache will 
be cleared once the mapped array has been GC'd, but memoize() 
seems to hold on to the cached results indefinitely.



T


No worries! And ugh, can't think off the top of my head how to 
improve it other than to make it a type and give it scope so that 
it can die at some point. But that would make it uglier to use as 
well.


Btw, I just saw someone posted a link to an old forum post of 
yours: 
https://forum.dlang.org/post/mailman.2562.1403196857.2907.digitalmar...@puremagic.com


Mind if I add that (or a version of it) to a library I'm writing? 
(it's an optional type on dub)


Cheers
- Ali


Re: Lazy caching map()?

2018-03-13 Thread H. S. Teoh via Digitalmars-d
On Wed, Mar 14, 2018 at 12:12:44AM +, aliak via Digitalmars-d wrote:
[...]
> Btw, I just saw someone posted a link to an old forum post of yours: 
> https://forum.dlang.org/post/mailman.2562.1403196857.2907.digitalmar...@puremagic.com
> 
> Mind if I add that (or a version of it) to a library I'm writing?
> (it's an optional type on dub)
[...]

Sure, go right ahead!  Glad you found it useful!


T

-- 
"The number you have dialed is imaginary. Please rotate your phone 90 degrees 
and try again."


Re: Lazy caching map()?

2018-03-19 Thread Graham Fawcett via Digitalmars-d

On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote:
Today I found myself needing a lazy, caching version of map() 
on an array.  More precisely, given an array `T[] src` of 
source data and a function func(T) that's pretty expensive to 
compute, return an object `result` such that:


[...]


Map the sources to std.parallelism.Task instances, and 
yieldForce() the instances when you need the results?


Graham


Re: Lazy caching map()?

2018-03-19 Thread Graham Fawcett via Digitalmars-d

On Monday, 19 March 2018 at 16:47:16 UTC, Graham Fawcett wrote:

On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote:
Today I found myself needing a lazy, caching version of map() 
on an array.  More precisely, given an array `T[] src` of 
source data and a function func(T) that's pretty expensive to 
compute, return an object `result` such that:


[...]


Map the sources to std.parallelism.Task instances, and 
yieldForce() the instances when you need the results?


On second thought, that won't work. AFAIK there's no way to take
an unscheduled task and force it to execute on the current thread 
immediately and yield the result. But with such an operation, you 
could have a thread-local future/promise type that would meet 
your needs.


Graham



Re: Lazy caching map | Mir version

2018-03-18 Thread 9il via Digitalmars-d

On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote:
Today I found myself needing a lazy, caching version of map() 
on an array.  More precisely, given an array `T[] src` of 
source data and a function func(T) that's pretty expensive to 
compute, return an object `result` such that:


- result[i] == func(src[i]), for 0 ≤ i < src.length.

- If result[j] is never actually used, func(src[j]) is never invoked
  (lazy).

- If result[j] is referenced multiple times, a cached value is returned
  after the first time, i.e., the result of func(src[j]) is cached, and
  func(src[j]) is never invoked more than once.

Can this be done using current Phobos primitives?


T


A Phobos implementation based on memoize would use a hash map
internally, which is slow for this use case.


mir-algorithm v0.9.5 (the DUB package will be updated soon) has gained
`cached` and `cachedGC` topologies.


Example:

// Mir's map is required (not Phobos's)
import mir.ndslice: cachedGC, map, sliced;

// `fun` is the expensive function being mapped (defined elsewhere)
auto ar = new int[5];
// init ar ...
auto cachedLazyMap = ar.map!fun.cachedGC;

`cachedGC` [1] internally allocates a vector of caches and a 
packed bitarray.


`cached` [2] allows already-allocated buffers to be reused.

The result supports all random-access range primitives, slicing, etc.
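
For example (hypothetical usage, assuming `fun` from the snippet above):

auto x = cachedLazyMap[2];  // fun(ar[2]) is computed and cached here
auto y = cachedLazyMap[2];  // the cached value is reused; fun is not
                            // invoked again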


The ndslice architecture makes it very easy to add new random-access
topologies.


[1] 
http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html#.cachedGC
[2] 
http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html#.cached


Best regards,
Ilya Yaroshenko