I've looked at it only briefly (it's the end of the semester and grading
is due soon).

>     I would *love* to be proven wrong on this, but I think it's rare to
>     be able to get decent parallelization in practice using futures. You
>     may have better results using places, but it will depend on how the
>     amount of processing for a unit compares to the overhead of
>     communicating with the places i.e. you may get better results with 2
>     places than with 8 due to place communication overhead. In your
>     case, if it's easy for the places to input their own sets of
>     parameters, then the place overhead may be small since I think each
>     place would simply need to communicate its best value.
> 

This is not even remotely true: I am using futures to get 100%
utilization on all available cores. The current situation is that it
takes quite some effort to leverage futures to get there.

A few generic remarks first. Arbitrary partitioning does not work well
with futures. I always partition the work based on the processor-count
with something like:

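;; future, touch and processor-count come from racket/future; the fx
;; operations used here are the safe ones from racket/fixnum.
(require racket/future racket/fixnum (for-syntax racket/base))

;; Split depth: ceil(log2(processor-count)), i.e. about one leaf per core.
(define futures-depth
  (make-parameter (inexact->exact (ceiling (log (processor-count) 2)))))

(define-syntax (define-futurized stx)
  (syntax-case stx ()
    ((_ (proc start end) body ...)
     #'(begin
         (define max-depth (futures-depth))
         (define (proc start end (depth 0))
           (cond ((fx< depth max-depth)
                  ;; Halve the range: one half in a future, the other here.
                  (define mid (fx+ start (fxrshift (fx- end start) 1)))
                  (let ((f (future
                            (λ ()
                              (proc start mid (fx+ depth 1))))))
                    (proc mid end (fx+ depth 1))
                    (touch f)))
                 (else
                  ;; Leaf: run the body on [start, end).
                  body ...)))))))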

Of course, all those fx+, fx-, fx< (and fxrshift) must be the unsafe
versions from racket/unsafe/ops, i.e. unsafe-fx+, unsafe-fx-, unsafe-fx<
and unsafe-fxrshift.
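
To show how the leaf ranges end up being used, here is a minimal sketch
of the same depth-limited splitting written as a plain function instead
of a macro. The names run-partitioned! and fill-squares! are made up for
illustration, and the only-in require is just one possible way of
swapping in the unsafe operations; it assumes every argument really is a
fixnum.

```racket
#lang racket/base
(require racket/future
         racket/flonum
         ;; Unsafe fixnum ops bound to their usual names. This is only
         ;; valid as long as every argument really is a fixnum.
         (only-in racket/unsafe/ops
                  [unsafe-fx+ fx+]
                  [unsafe-fx- fx-]
                  [unsafe-fx< fx<]
                  [unsafe-fxrshift fxrshift]))

;; Split depth: ceil(log2(processor-count)), as in the macro above.
(define max-depth
  (inexact->exact (ceiling (log (processor-count) 2))))

;; Hypothetical helper: halve [start, end) until max-depth is reached,
;; then call (leaf start end) on each piece.
(define (run-partitioned! leaf start end)
  (let loop ([start start] [end end] [depth 0])
    (cond
      [(fx< depth max-depth)
       (define mid (fx+ start (fxrshift (fx- end start) 1)))
       (define f (future (λ () (loop start mid (fx+ depth 1)))))
       (loop mid end (fx+ depth 1))
       (touch f)
       (void)]
      [else (leaf start end)])))

;; Hypothetical workload: v[i] = i*i. Each leaf writes a disjoint index
;; range, so the futures never touch the same cell.
(define (fill-squares! v)
  (run-partitioned!
   (λ (start end)
     (for ([i (in-range start end)])
       (define x (->fl i))
       (flvector-set! v i (fl* x x))))
   0 (flvector-length v)))

(define v (make-flvector 1000))
(fill-squares! v)
```

Whether the leaves actually run in parallel depends on what the body
does; the result is the same either way, because a future that cannot
proceed is simply finished by the thread that touches it.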

The second problem is the allocation of flonums. It looks like the inner
part of the loop triggers the allocator quite often, even with flonum
inlining. With Racket CS, just forget about this until the inlined
flonums are merged. In the meantime, you can drop the for/fold and use
an flvector to store and accumulate whatever you need. Using the futures
visualizer (the future-visualizer library) is a good start.
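
A rough sketch of what I mean by accumulating in an flvector, assuming a
made-up workload (a sum of squares) and a made-up function name
sum-squares. Each future gets its own slot and updates it in place, so
no flonum is threaded through a for/fold accumulator. Whether this
really keeps the allocator quiet depends on the Racket version and
backend, so treat it as a starting point, not a guarantee.

```racket
#lang racket/base
(require racket/future racket/flonum)

;; Hypothetical workload: sum of x*x for x = 0 .. n-1.
(define (sum-squares n)
  (define workers (processor-count))
  ;; One accumulator slot per worker, so no two futures write the same cell.
  (define partials (make-flvector workers 0.0))
  ;; Chunk size rounded up so that the chunks cover all of [0, n).
  (define chunk (quotient (+ n workers -1) workers))
  (define fs
    (for/list ([k (in-range workers)])
      (define start (min n (* k chunk)))
      (define end (min n (+ start chunk)))
      (future
       (λ ()
         ;; Accumulate in place instead of carrying a flonum through for/fold.
         (let loop ([i start])
           (when (< i end)
             (define x (->fl i))
             (flvector-set! partials k
                            (fl+ (flvector-ref partials k) (fl* x x)))
             (loop (+ i 1))))))))
  (for-each touch fs)
  ;; Combine the per-worker partial sums once, outside the hot loop.
  (for/fold ([total 0.0]) ([p (in-flvector partials)])
    (fl+ total p)))
```

To check where the futures block, the call can be wrapped as
(visualize-futures (sum-squares 10000000)) after (require
future-visualizer); this opens a GUI window, so it is left out of the
sketch itself.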

I'll look into it later this week. But generally you need to stick to
unsafe ops and avoid the allocator.

> While this may be true, it is also the case that the design of futures
> is such that incremental work on the primitives turns into incremental
> ability to parallelize programs. So while it is likely to be more work
> today, it may also be the case that people putting effort in to help
> their own programs will help us turn the corner here. Perhaps this is a
> place where an interested contributor can help us out a lot!

It is on the list :)


Dominik
