I was trying to create a much more elaborate example when Matthew sent
his tiny one which is enough to show the problem.

I started a 64-core machine on AWS to show the issue.

I see a massive degradation as the number of places increases.

I use this slightly modified code:
#lang racket

(define (go n)
  (place/context p
         (let ([v (vector 0.0)])
           (let loop ([i 3000000000])
             (unless (zero? i)
               (vector-set! v 0 (+ (vector-ref v 0) 1.0))
               (loop (sub1 i)))))
         (printf "Place ~a done~n" n)
         n))

(module+ main
  (define cores
    (command-line
     #:args (cores)
     (string->number cores)))

  (time
   (map place-wait
        (for/list ([i (in-range cores)])
          (printf "Starting core ~a~n" i)
          (go i)))))

Here are the results in a video (it might take a few minutes until it is live):
https://youtu.be/cDe_KF6nmJM

The guide says about places:
"The place form creates a place, which is effectively a new Racket
instance that can run in parallel to other places, including the initial
place."

I think this is misleading at the moment. If this behaviour can be
fixed, great; if not, I will have to redesign my system to use
'subprocess' to start separate Racket processes, and a footnote should
be added to the places documentation to alert users to this behaviour.
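
For reference, this is roughly what I mean by the 'subprocess' fallback.
It is only a minimal sketch, not what my system does today: the
"worker.rkt" script, the helper names (worker-args, start-worker,
run-workers) and the assumption that a "racket" executable is on the
PATH are all hypothetical. Each worker runs in its own OS process, so
nothing is shared between workers at the runtime level.

```racket
#lang racket/base
;; Sketch: one OS process per worker instead of one place per worker.
;; The worker script name and all helper names here are hypothetical.
(require racket/port)

;; Command-line arguments for worker n: the script path and the worker id.
(define (worker-args worker-file n)
  (list worker-file (number->string n)))

;; Start worker n in its own racket process.
;; Returns a pair: the subprocess handle and the thread draining its output.
(define (start-worker racket-exe worker-file n)
  (define-values (proc out in err)
    ;; #f asks for a pipe; 'stdout merges the worker's stderr into its stdout,
    ;; in which case the fourth result (err) is #f.
    (apply subprocess #f #f 'stdout racket-exe (worker-args worker-file n)))
  (close-output-port in) ; the worker is not sent any input
  ;; Drain the pipe in the background so a chatty worker cannot block on it.
  (define drain
    (thread (lambda ()
              (copy-port out (current-output-port))
              (close-input-port out))))
  (cons proc drain))

;; Start `count` workers, wait for all of them, and return their exit codes.
(define (run-workers worker-file count)
  ;; Assumes a "racket" executable can be found on the PATH.
  (define racket-exe (find-executable-path "racket"))
  (unless racket-exe
    (error 'run-workers "could not find a racket executable on the PATH"))
  (define workers
    (for/list ([i (in-range count)])
      (start-worker racket-exe worker-file i)))
  (for/list ([w (in-list workers)])
    (subprocess-wait (car w))
    (thread-wait (cdr w))
    (subprocess-status (car w))))
```

Something like (run-workers "worker.rkt" 64) would then stand in for the
(map place-wait ...) expression above, at the cost of passing data
through command-line arguments and pipes instead of place channels.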

Matthew, Sam, do you understand why this is happening?

On 05/10/2018 16:51, Sam Tobin-Hochstadt wrote:
> I tried this same program on my desktop, which also has 4 (i7-4770)
> cores with hyperthreading. Here's what I see:
> 
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 1
> N: 1, cpu: 5808/5808.0, real: 5804
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 2
> N: 2, cpu: 12057/6028.5, real: 6063
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 3
> N: 3, cpu: 23377/7792.333333333333, real: 7914
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 4
> N: 4, cpu: 41155/10288.75, real: 10357
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 6
> N: 6, cpu: 89932/14988.666666666666, real: 15687
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 8
> N: 8, cpu: 165152/20644.0, real: 21104
> 
> Real time goes up about 80% from 1-4 places, and then doubles again
> from 4 to 8. System time for 8 places is also about 10x what it is for
> 2 places, but only gets up to 2 seconds.
> 
> On Fri, Oct 5, 2018 at 10:32 AM Matthew Flatt <mfl...@cs.utah.edu> wrote:
>>
>> At Fri, 5 Oct 2018 15:36:04 +0200, Paulo Matos wrote:
>>> Again, I am really surprised that you mention that places are not
>>> separate processes. Documentation does say they are separate racket
>>> virtual machines, how is this accomplished if not by using separate
>>> processes?
>>
>> Each place is an OS thread within the Racket process. The virtual
>> machine is essentially instantiated once in each thread, where things
>> that look like global variables at the C level are actually
>> thread-local variables to make them place-specific. Still, there is
>> some sharing among the threads.
>>
>>> My workers are really doing Z3 style work - number crunching and lots of
>>> searching. No IO (writing to disk) or communication so I would expect
>>> them to really max out all CPUs.
>>
>> My best guess is that it's memory-allocation bottlenecks, probably at
>> the point of using mmap() and mprotect(). Maybe things don't scale well
>> beyond the 4-core machines that I use.
>>
>> On my machines, the enclosed program can max out CPU use with system
>> time being a small fraction. It scales ok from 1 to 4 places (i.e.,
>> real time increased only some). The machine's cores are hyperthreaded,
>> and the example maxes out CPU utilization at 8 --- but it takes twice
>> as long in real time, so the hardware threads don't help much in this
>> case. Running two processes with 4 places takes about the same real
>> time as running one process with 8 places, as does 2 processes with 2
>> places.
>>
>> Do you see similar effects, or does this little example stop scaling
>> before the number of processes matches the number of cores?
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "Racket Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to racket-users+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
> 

-- 
Paulo Matos

