dsimcha:

> Andrei:
> > * I think it does make sense to evaluate a parallel map lazily by using
> > a finite buffer. Generally map looks the most promising so it may be
> > worth investing some more work in it to make it "smart lazy".
> 
> Can you elaborate on this?  I'm not sure what you're suggesting.

I think Andrei is talking about vectorized laziness; I have explained the idea 
here a couple of times in the past. This isn't a replacement for the fully eager 
parallel map. Instead of computing the whole resulting array in parallel, you 
compute only a chunk of the result, in parallel, and store it in a buffer. When 
the code that consumes the data lazily has exhausted that chunk, the lazy 
parallel map computes the next chunk and stores it, and so on.

Each chunk is large enough that computing it in parallel is worthwhile, but 
small enough not to require a lot of memory.
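Here is a minimal sketch of the idea in D. This is not the std.parallelism API, 
just a hypothetical lazyParallelMap built on top of taskPool.amap: it fills a 
fixed-size buffer one chunk at a time, in parallel, and refills it only when the 
consumer has exhausted it.

import std.parallelism : taskPool;
import std.range : ElementType, isInputRange;

struct LazyParallelMap(alias fun, R) if (isInputRange!R) {
    private R source;
    private size_t chunkSize;
    private typeof(fun(ElementType!R.init))[] buffer;  // results of the current chunk
    private size_t pos;

    this(R source, size_t chunkSize) {
        this.source = source;
        this.chunkSize = chunkSize;
        refill();
    }

    // Collect the next chunk of inputs and map it eagerly, in parallel.
    private void refill() {
        ElementType!R[] chunk;
        while (chunk.length < chunkSize && !source.empty) {
            chunk ~= source.front;
            source.popFront();
        }
        buffer = chunk.length ? taskPool.amap!fun(chunk) : null;
        pos = 0;
    }

    @property bool empty() { return pos >= buffer.length; }
    @property auto front() { return buffer[pos]; }

    void popFront() {
        ++pos;
        if (pos >= buffer.length && !source.empty)
            refill();  // the consumer exhausted this chunk: compute the next one
    }
}

auto lazyParallelMap(alias fun, R)(R r, size_t chunkSize = 1024) {
    return LazyParallelMap!(fun, R)(r, chunkSize);
}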

An optional refinement is self-tuning: let the library find the chunk size by 
itself, according to how much time each item computation (each call of the 
mapping function) requires.
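For example (just an assumption about how such tuning could look, not an 
existing library feature): time each chunk and grow or shrink the next one 
toward a target duration.

import core.time : Duration, msecs;

enum targetChunkTime = 20.msecs;  // hypothetical target time per parallel batch

// Called after each refill(), with the measured time of the last chunk
// (e.g. taken with std.datetime.stopwatch.StopWatch around the amap call).
size_t tuneChunkSize(size_t current, Duration lastChunkTime) {
    if (lastChunkTime < targetChunkTime / 2)
        return current * 2;      // items are cheap: bigger chunks, less overhead
    if (lastChunkTime > targetChunkTime * 2)
        return current / 2 + 1;  // items are costly: smaller chunks, less memory and latency
    return current;
}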

If you have a read-only memory mapped file that is readable from several 
threads in parallel, the map may perform some operation on the lines/records of 
the file. If the file is very large or huge, and you want to collect/summarize 
(reduce) the results of the mapping function in some way, then a lazy parallel 
map is useful :-) This looks like a special case, but a lot of heavy file 
processing (1 - 5000 gigabytes of data) can be done with this scheme 
(map-reduce).
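A hedged usage sketch for that case, reusing the hypothetical lazyParallelMap 
above (parseRecord is a placeholder for whatever per-record work the mapping 
step does, and I read the lines sequentially with byLineCopy instead of a 
memory mapped file, to keep the sketch short). Only one chunk of mapped results 
is alive at a time:

import std.algorithm : sum;
import std.stdio : File, writeln;

double parseRecord(string line) {
    import std.conv : to;
    return line.to!double;  // placeholder for an expensive per-record computation
}

void main() {
    auto total = File("huge_data.txt")
        .byLineCopy()                        // read lines lazily, one at a time
        .lazyParallelMap!parseRecord(8192)   // map each 8192-line chunk in parallel
        .sum();                              // reduce the mapped results
    writeln(total);
}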

Bye,
bearophile
