Interesting! If that is true of Java (I don't know Java at all), then your
argument seems plausible. Cache-to-main-memory writes still take many more
CPU cycles (an order of magnitude more, last I knew) than
processor-to-cache writes. AFAIK it's not so much a bandwidth issue as a
latency issue. Thanks for thinking about this more, so long after the
fact. We still see the issue.
On Sep 26, 2013 11:43 PM, "Andy Fingerhut" <andy.finger...@gmail.com> wrote:

> Adding to this thread from almost a year ago.  I don't have conclusive
> proof from experiments to show right now, but I do have some experiments
> that have led me to what I think is a plausible cause of not just Clojure
> programs running more slowly when multi-threaded than when single-threaded,
> but of any program on the JVM doing so, because of the JVM's memory model.
> Qualification: my explanation would be true only for multi-threaded
> programs running on the JVM that store significant amounts of data in
> memory, even if the data written by each thread is only read by that
> thread and there is no locking or other inter-thread communication,
> i.e. for "embarrassingly parallel" problems.
>
> Outline of the argument:
>
> When you start a thread, and then wait for the thread to complete, the JVM
> memory model requires all loads and stores to satisfy certain
> restrictions.  One of these is that any store done before the thread is
> created should 'happen before' the thread start, and thus the updated
> stored values must be visible to the new thread.  'Visible' here means that
> the thread doing the store must cause the CPU it is running on to update
> main memory from whatever locally modified values it has written into its
> local cache.  That rule isn't so relevant to my argument.
>
> The one that is relevant is that any store performed by the thread is
> considered to 'happen before' a join operation on the thread.  Thus any
> store done by a thread must be written back to main memory, *even if the
> store is to a JVM object that later becomes garbage*.
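
A minimal Java sketch of the two happens-before edges described above (JLS 17.4.5: a call to start() happens-before any action in the started thread, and all actions in a thread happen-before another thread successfully returns from join() on it). The class and field names are made up for illustration. Note that the JLS states these as visibility and ordering guarantees; it does not say how a particular JVM and CPU implement them, which is exactly what the hypothesis here is about.

```java
// Sketch only: illustrates the start()/join() happens-before edges (JLS 17.4.5).
// Names are hypothetical, chosen for illustration.
final class StartJoinVisibility {
    // Deliberately plain (non-volatile) fields: visibility here comes only
    // from the start()/join() happens-before edges, not from volatile semantics.
    private static int input;
    private static int output;

    static int runOnce(int value) {
        input = value;                 // store done before start()
        Thread worker = new Thread(() -> {
            // start() happens-before every action in the worker, so this read
            // is guaranteed to see the store to 'input' above.
            output = input + 1;        // store done inside the worker
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        // All actions in the worker happen-before join() returns, so this read
        // is guaranteed to see the worker's store to 'output'.
        return output;
    }

    public static void main(String[] args) {
        System.out.println(runOnce(41)); // prints 42
    }
}
```
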
>
> So imagine a single-threaded program that creates X bytes of garbage while
> it runs.  Those X bytes will definitely be written to the CPU's local
> cache, but they will only be written to main memory if the cache space runs
> out before the garbage collector does its work and allows that memory to be
> reused for allocations.  The CPU-to-local-cache bandwidth in many modern
> systems is significantly higher than local-cache-to-main-memory bandwidth.
>
> Now take that same program and spread its work across N threads (N >= 2),
> with a join at the end of each one.  For the sake of example, say that each
> thread will write X/N bytes of data while it runs.  Even if the only data
> needed later in the rest of the program is a single Long object, for
> example, all of those X/N bytes of data will be copied from the local cache
> to main memory (if that did not already happen before the thread
> terminated).
>
> If the number of threads is large enough, the amount of data written from
> all local caches to main memory can be higher in the multi-threaded case
> than in the single-threaded case.
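
A rough Java sketch of the "embarrassingly parallel" shape described above, assuming a hypothetical allocation-heavy workload: each of N threads allocates many short-lived boxed Long values, only one long per thread survives, and the main thread joins every worker before combining the results. It only sets up the single- vs multi-threaded comparison; by itself it cannot show whether extra cache write-back is the cause of any slowdown.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: 'work' is a hypothetical stand-in for garbage-heavy code.
final class GarbageHeavySplit {
    // Sums [from, to) via boxed Longs; values outside the small Long.valueOf
    // cache are fresh objects that become garbage when 'scratch' is cleared.
    static long work(long from, long to) {
        long sum = 0;
        List<Long> scratch = new ArrayList<>();
        for (long i = from; i < to; i++) {
            scratch.add(i);                      // autoboxing allocation
            if (scratch.size() == 1024) {
                for (Long x : scratch) sum += x;
                scratch.clear();                 // boxed values are now garbage
            }
        }
        for (Long x : scratch) sum += x;
        return sum;
    }

    // Splits [0, total) across nThreads threads and joins each one.
    static long run(int nThreads, long total) {
        long[] partial = new long[nThreads];
        Thread[] threads = new Thread[nThreads];
        long chunk = total / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            final long from = id * chunk;
            final long to = (id == nThreads - 1) ? total : from + chunk;
            threads[t] = new Thread(() -> partial[id] = work(from, to));
            threads[t].start();
        }
        long sum = 0;
        for (int t = 0; t < nThreads; t++) {
            try {
                threads[t].join();               // makes partial[t] visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
            sum += partial[t];
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 50_000_000L;
        for (int n : new int[] {1, 2, 4, 8}) {
            long t0 = System.nanoTime();
            long s = run(n, total);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println(n + " threads: sum=" + s + " in " + ms + " ms");
        }
    }
}
```

Timings from main() will vary with JVM, GC settings, and hardware; a single run like this is not a rigorous benchmark, and a harness such as JMH would be more appropriate for real measurements.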
>
> Anyway, that is my hypothesis about what could be happening here.  It
> isn't Clojure-specific, but it can be exacerbated by the common behavior of
> a lot of Clojure code to allocate significant amounts of memory that
> becomes garbage.
>
> Andy
>
>
>
> On Wed, Jan 30, 2013 at 6:20 PM, Lee Spector <lspec...@hampshire.edu> wrote:
>
>>
>> FYI we had a bit of a discussion about this at a meetup in Amherst MA
>> yesterday, and while I'm not sufficiently on top of the JVM or system
>> issues to have briefed everyone on all of the details, there has been a
>> little follow-up since the discussion, including results of some
>> different experiments by Chas Emerick, at:
>> http://www.meetup.com/Functional-Programming-Connoisseurs/messages/boards/thread/30946382
>>
>>  -Lee
>>
>> On Jan 30, 2013, at 8:39 PM, Marshall Bockrath-Vandegrift wrote:
>> >
>> > Apologies for my very slow reply here.  I keep thinking that I’ll have
>> > more time to look into this issue, and keep having other things
>> > requiring my attention.  And on top of that, I’ve temporarily lost the
>> > many-way AMD system I was using as a test-bed.
>> >
>> > I very much want to see if I can get my hands on an Intel system to
>> > compare to.  My AMD system is in theory 32-way – two physical CPUs, each
>> > with 16 cores.  However, Linux reports (via /proc/cpuinfo) the cores in
>> > groups of 8 (“cpu cores : 8” etc).  And something very strange happens
>> > when extending parallelism beyond 8-way...  I ran several experiments
>> > using a version of your whole-application benchmark I modified to
>> > control the level of parallelism.  At parallelism 9+, the real time it
>> > takes to complete the benchmark hardly budges, but the user/CPU time
>> > increases linearly with the level of parallelism!  As far as I can tell,
>> > multi-processor AMD *is* a NUMA architecture, which might
>> > explain things.  But enabling the JVM NUMA options doesn’t seem to
>> > affect the benchmark.
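
A sketch in pure Java of the real-time vs CPU-time measurement described above, assuming a fixed total amount of a hypothetical allocation-heavy workload split evenly across the threads (any remainder is dropped). It sums java.lang.management.ThreadMXBean.getCurrentThreadCpuTime() from each worker; that counts only the worker threads, not GC or JIT compiler threads, so it will not exactly match the user time reported by `time`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: 'churn' is a hypothetical stand-in workload.
final class ParallelismTiming {
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    // Allocates a small array per iteration; storing it in a ring buffer makes
    // the array previously held in that slot garbage. Returns sum of 0..n-1.
    static long churn(long iterations) {
        Object[] ring = new Object[256];
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            long[] tmp = new long[8];
            tmp[0] = i;
            ring[(int) (i & 255)] = tmp;
            acc += tmp[0];
        }
        return acc;
    }

    // Returns {wallNanos, summedWorkerCpuNanos (-1 if unavailable), result}.
    static long[] measure(int parallelism, long totalIterations) {
        boolean cpuOn = MX.isCurrentThreadCpuTimeSupported() && MX.isThreadCpuTimeEnabled();
        AtomicLong cpuNanos = new AtomicLong();
        AtomicLong sink = new AtomicLong();      // keeps results live
        long per = totalIterations / parallelism;
        Thread[] threads = new Thread[parallelism];
        long start = System.nanoTime();
        for (int t = 0; t < parallelism; t++) {
            threads[t] = new Thread(() -> {
                sink.addAndGet(churn(per));
                if (cpuOn) {
                    // Total CPU time used by this worker thread so far.
                    long c = MX.getCurrentThreadCpuTime();
                    if (c >= 0) cpuNanos.addAndGet(c);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        long wall = System.nanoTime() - start;
        return new long[] {wall, cpuOn ? cpuNanos.get() : -1, sink.get()};
    }

    public static void main(String[] args) {
        long total = 100_000_000L;
        int maxP = Runtime.getRuntime().availableProcessors();
        for (int p = 1; p <= maxP; p++) {
            long[] r = measure(p, total);
            System.out.printf("parallelism %2d: real %6d ms, worker CPU %6d ms%n",
                    p, r[0] / 1_000_000, r[1] < 0 ? -1 : r[1] / 1_000_000);
        }
    }
}
```

If real time stops falling while summed CPU time keeps rising as parallelism grows, that would match the pattern described above; the sketch does not explain why.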
>> >
>> > I think next steps are two-fold: (1) examine parallelism vs real & CPU
>> > time on an Intel system, and (2) attempt to reproduce the observed
>> > behavior in pure Java.  I’m keeping my fingers crossed that I’ll have
>> > some time to look at this more soon, but I’m honestly not very hopeful.
>> >
>> > In the meantime, I hope you’ve managed to exploit multi-process
>> > parallelism to run more efficiently?
>> >
>> > -Marshall
>>
>> --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to clojure+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
>>
>
