I tried this same program on my desktop, which also has 4 cores with
hyperthreading (i7-4770). Here's what I see:

[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 1
N: 1, cpu: 5808/5808.0, real: 5804
[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 2
N: 2, cpu: 12057/6028.5, real: 6063
[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 3
N: 3, cpu: 23377/7792.333333333333, real: 7914
[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 4
N: 4, cpu: 41155/10288.75, real: 10357
[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 6
N: 6, cpu: 89932/14988.666666666666, real: 15687
[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 8
N: 8, cpu: 165152/20644.0, real: 21104

Real time goes up about 80% from 1 to 4 places, and then doubles again
from 4 to 8. System time for 8 places is also about 10x what it is for
2 places, but only gets up to 2 seconds.
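
For anyone who wants to check the ratios, here is a small Python sketch that recomputes them from the numbers pasted above. It assumes each place does the same fixed amount of work, so that ideal scaling would keep real time flat as N grows; the `slowdown` and `throughput` helpers are names made up for this illustration, not part of p.rkt.

```python
# Timings copied from the runs above: N places -> (total cpu ms, real ms).
timings = {
    1: (5808, 5804),
    2: (12057, 6063),
    3: (23377, 7914),
    4: (41155, 10357),
    6: (89932, 15687),
    8: (165152, 21104),
}


def slowdown(n, base=1):
    """Real-time ratio of the N-place run to the base run."""
    return timings[n][1] / timings[base][1]


def throughput(n):
    """Work finished per unit of real time, relative to one place.

    Assumes every place does the same amount of work, so the ideal
    value for N places is N.
    """
    return n / slowdown(n)


for n in sorted(timings):
    cpu, real = timings[n]
    print(
        f"N={n}: real x{slowdown(n):.2f}, "
        f"cpu/place {cpu / n:.0f} ms, "
        f"relative throughput {throughput(n):.2f}"
    )
```

Under that assumption, relative throughput is about the same at 8 places as at 4, which is consistent with the hyperthreads not helping here.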

On Fri, Oct 5, 2018 at 10:32 AM Matthew Flatt <mfl...@cs.utah.edu> wrote:
>
> At Fri, 5 Oct 2018 15:36:04 +0200, Paulo Matos wrote:
> > Again, I am really surprised that you mention that places are not
> > separate processes. Documentation does say they are separate racket
> > virtual machines, how is this accomplished if not by using separate
> > processes?
>
> Each place is an OS thread within the Racket process. The virtual
> machine is essentially instantiated once in each thread, where things
> that look like global variables at the C level are actually
> thread-local variables to make them place-specific. Still, there is
> some sharing among the threads.
>
> > My workers are really doing Z3 style work - number crunching and lots of
> > searching. No IO (writing to disk) or communication so I would expect
> > them to really max out all CPUs.
>
> My best guess is that it's memory-allocation bottlenecks, probably at
> the point of using mmap() and mprotect(). Maybe things don't scale well
> beyond the 4-core machines that I use.
>
> On my machines, the enclosed program can max out CPU use with system
> time being a small fraction. It scales ok from 1 to 4 places (i.e.,
> real time increased only some). The machine's cores are hyperthreaded,
> and the example maxes out CPU utilization at 8 --- but it takes twice
> as long in real time, so the hardware threads don't help much in this
> case. Running two processes with 4 places takes about the same real
> time as running one process with 8 places, as does 2 processes with 2
> places.
>
> Do you see similar effects, or does this little example stop scaling
> before the number of processes matches the number of cores?
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
