Lee Spector <lspec...@hampshire.edu> writes:

Hi Lee,

>> So indeed, one starts with the number of available processors + 2,
>> and one single longer running task will wipe out any parallelism. :-(
>> 
>> IMO, that's not what 99% of the users would expect nor want when
>> calling (pmap foo coll).  I'd vote for making pmap eager in the sense
>> that it should always try to keep n threads working as long as more
>> tasks are available.  Clearly, the current implementation is useful
>> when your tasks are equally expensive, i.e., the costs don't depend
>> on the argument.
>
> I can imagine cases in which one *would* want the current behavior of
> pmap, especially with infinite lazy sequences, so I wouldn't go quite
> as far as you are going here.

You are right, I didn't think about infinite seqs.  So pmap has to be
lazy in order to terminate at all and to be a drop-in replacement for
map.  But isn't there a way to always keep n submissions to the thread
pool ahead of the actual realization?  Of course, that would mean that
if you don't realize all elements, you may have computed up to n
elements too many.
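
Something along these lines is what I have in mind, just as a sketch
(window-pmap is a made-up name, and I haven't thought hard about edge
cases):

  ;; Keep n futures in flight for every element the consumer realizes.
  (defn window-pmap [n f coll]
    (letfn [(step [in-flight todo]
              (lazy-seq
               (cond
                 ;; ramp up until n tasks are in flight
                 (and (seq todo) (< (count in-flight) n))
                 (step (conj in-flight (future (f (first todo))))
                       (rest todo))

                 ;; steady state: hand out the oldest result and
                 ;; submit one replacement task
                 (seq todo)
                 (cons @(peek in-flight)
                       (step (conj (pop in-flight)
                                   (future (f (first todo))))
                             (rest todo)))

                 ;; input exhausted: drain what's left, in order
                 :else
                 (map deref in-flight))))]
      (step clojure.lang.PersistentQueue/EMPTY coll)))

If I'm not mistaken, this still hands out results in order just like
pmap does, so one slow element still delays everything behind it; the
only difference is that the window size is an explicit parameter.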

> But I agree that many users will often want an eager version that
> maximizes CPU utilization, and that they may expect pmap to do that.
> In fact that was what I wanted (and will usually want), and it's what
> I assumed pmap would do

Agreed.  Now you've convinced me that pmap shouldn't be eager, but there
should be an eager version.

> (even though, now that I look afresh at the doc string, it pretty
> clearly says that it *doesn't* do this).

Sort of:

  Semi-lazy in that the parallel computation stays ahead of the
  consumption, but doesn't realize the entire result unless required.

If I understand the code correctly (see my last mail), then the part
about the "parallel computation stays ahead of the consumption" is not
quite true.  It starts out parallel but converges to sequential
evaluation as soon as one task runs much longer than the others.
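
A rough way to observe that on one's own machine (timings are
machine-dependent, and `work' is just a stand-in for a real task; the
first element is the one expensive task):

  (defn work [ms]
    (Thread/sleep ms)
    ms)

  ;; sequential baseline
  (time (doall (map work (cons 2000 (repeat 32 100)))))
  ;; parallel version; compare the wall-clock times
  (time (doall (pmap work (cons 2000 (repeat 32 100)))))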

> So my hope wouldn't be that we change pmap but rather that we provide
> something else that is just as simple to use but provides eager, "use
> all available cores to get the job done as fast as possible" behavior
> with a simple pmap-like interface.

Yes.
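
For finite seqs, even something as simple as this sketch might do
(eager-pmap is a made-up name):

  ;; Submit a future per element right away, then deref in order.
  ;; Only useful for finite colls, and AFAIK `future' runs on an
  ;; unbounded pool, so a huge coll may spawn a lot of threads.
  (defn eager-pmap [f coll]
    (map deref (doall (map #(future (f %)) coll))))

The inner doall kicks off all the work immediately; the outer map can
stay lazy because everything is already running.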

BTW: Am I the only one who sometimes would also like an eager
sequential map variant?  Over the weekend I wrote a proof-of-concept
evaluator that takes the syntax graph of a query language and a graph
on which the query should be evaluated.  The evaluation model is plain
syntax-driven recursive evaluation, i.e., to calculate the result of a
node in the syntax graph, one evaluates its children and combines the
results.  There are also nodes that declare variables with value
ranges, where the child subgraph is evaluated once for each possible
variable binding.  That felt very natural to implement using a
^:dynamic hash-map *vars* holding the current bindings in scope and
`binding' to establish a new one.  Something along these lines:

  (for [b bindings]
    (binding [*vars* (merge *vars* b)]
       ;; evaluate the children
       ))

However, one has to force realization of any lazy seq to be sure the
calculation is performed within the dynamic extent of the surrounding
variable binding.  So in the sketch above, there's a `doall' around
the `for'.
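
To make that concrete, here's a stripped-down version of the pattern
(*vars*, eval-children, and eval-var-node are just placeholders from
my toy evaluator):

  (def ^:dynamic *vars* {})

  ;; made-up stand-in for the real recursive evaluation of the
  ;; children; forces its result so nothing lazy escapes the binding
  (defn eval-children [children]
    (doall (map #(get *vars* % ::unbound) children)))

  ;; a node that declares variables: evaluate the children once per
  ;; possible binding
  (defn eval-var-node [bindings children]
    ;; the outer doall forces each iteration while its binding is
    ;; still in effect; without it the body would run only when the
    ;; result is consumed, i.e. with *vars* back at the old value
    (doall
     (for [b bindings]
       (binding [*vars* (merge *vars* b)]
         (eval-children children)))))

  ;; (eval-var-node [{:x 1} {:x 2}] [:x])  ;=> ((1) (2))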

Bye,
Tassilo
