The Disruptor project from LMAX has wrestled with these sorts of issues at
length and achieved astounding levels of performance on the JVM.

Martin Thompson, the original author of the Disruptor, is a leading light
in the JVM performance space. His Mechanical Sympathy blog is a goldmine of
information and a must-read for anyone wanting to understand the JVM /
hardware interface.

Find more info at:

http://mechanical-sympathy.blogspot.co.uk/

http://lmax-exchange.github.io/disruptor/


*Neale Swinnerton*
{t: @sw1nn <https://twitter.com/#!/sw1nn>, w: sw1nn.com }


On 27 September 2013 12:29, Wm. Josiah Erikson <wmjos...@gmail.com> wrote:

> Interesting! If that is true of Java (I don't know Java at all), then your
> argument seems plausible. Cache-to-main-memory writes still take many more
> CPU cycles (an order of magnitude more, last I knew) than
> processor-to-cache. I don't think it's so much a bandwidth issue as
> latency, AFAIK. Thanks for thinking about this more, so long after the
> fact. We still see the issue.
>  On Sep 26, 2013 11:43 PM, "Andy Fingerhut" <andy.finger...@gmail.com>
> wrote:
>
>> Adding to this thread from almost a year ago.  I don't have conclusive
>> proof with experiments to show right now, but I do have some experiments
>> that have led me to what I think is a plausible cause of not just Clojure
>> programs running more slowly when multi-threaded than when single-threaded,
>> but any programs running under the JVM's memory model doing so.
>> Qualification: My explanation would be true only for multi-threaded
>> programs running on the JVM that store significant amounts of data in
>> memory, even if the data written by each thread is only read by that
>> thread, and there is no locking or other inter-thread communication, i.e.
>> for "embarrassingly parallel" problems.
>>
>> Outline of the argument:
>>
>> When you start a thread, and then wait for the thread to complete, the
>> JVM memory model requires all loads and stores to satisfy certain
>> restrictions.  One of these is that any store done before the thread is
>> created should 'happen before' the thread start, and thus the updated
>> stored values must be visible to the new thread.  'Visible' here means that
>> the thread doing the store must cause the CPU it is running on to update
>> main memory from whatever locally modified values it has written into its
>> local cache.  That rule isn't so relevant to my argument.
>>
>> The one that is relevant is that any store performed by the thread is
>> considered to 'happen before' a join operation on the thread.  Thus any
>> store done by a thread must be written back to main memory, *even if the
>> store is to a JVM object that later becomes garbage*.
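
To make the two happens-before edges described above concrete, here is a
minimal Java sketch. The class and method names are made up for illustration.
It relies only on Thread.start() and Thread.join() for visibility; the fields
are deliberately plain, with no volatile and no locks. It shows the guarantee
itself, not the write-back cost being hypothesised, since the memory model
specifies what must be visible rather than how the hardware delivers it.

```java
class JoinVisibility {
    // Plain (non-volatile) fields: visibility comes only from start()/join().
    static int input;
    static long result;

    static long compute() {
        input = 42; // store before start(): happens-before the worker's actions
        Thread worker = new Thread(() -> {
            long sum = 0;
            for (int i = 1; i <= input; i++) {
                sum += i;
            }
            result = sum; // store in the worker: happens-before join() returning
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result; // guaranteed to observe the worker's store
    }

    public static void main(String[] args) {
        System.out.println(compute()); // prints 903 (the sum 1..42)
    }
}
```
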
>>
>> So imagine a single-threaded program that creates X bytes of garbage
>> while it runs.  Those X bytes will definitely be written to the CPU's local
>> cache, but they will only be written to main memory if the cache space runs
>> out before the garbage collector does its work and allows that memory to be
>> reused for allocations.  The CPU-to-local-cache bandwidth in many modern
>> systems is significantly higher than local-cache-to-main-memory bandwidth.
>>
>> Now take that same program and spread its work across 2 or more threads,
>> with a join at the end of each one.  For the sake of example, say that each
>> thread will write X/N bytes of data while it runs.  Even if the only data
>> needed later in the rest of the program is a single Long object, for
>> example, all of those X/N bytes of data will be copied from the local cache
>> to main memory (if that did not already happen before the thread
>> terminated).
>>
>> If the number of threads is large enough, the amount of data written from
>> all local caches to main memory can be higher in the multi-threaded case
>> than in the single-threaded case.
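
One way to poke at this hypothesis would be a sketch along the following
lines. Everything in it is an assumption for illustration (the GarbageSplit
class, the churn workload, the array size, the iteration counts); it is not
the benchmark discussed in this thread. Each worker allocates short-lived
arrays, keeps a single long, and is joined at the end. The timing is naive
(no JIT warm-up, no GC control), so a proper harness such as JMH would be
needed before trusting any numbers.

```java
class GarbageSplit {
    // Hypothetical workload: allocates many short-lived arrays and keeps
    // only one long. The JIT may optimise some of the allocation away.
    static long churn(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            int[] garbage = new int[64]; // unreachable after this iteration
            garbage[i % 64] = i;
            acc += garbage[i % 64];
        }
        return acc;
    }

    // Splits totalIterations across `threads` workers and joins each one.
    static long runSplit(int totalIterations, int threads) {
        long[] partial = new long[threads];
        Thread[] workers = new Thread[threads];
        int perThread = totalIterations / threads;
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> partial[id] = churn(perThread));
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < threads; t++) {
            try {
                workers[t].join(); // the worker's stores are visible after this
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) {
        int total = 5_000_000;
        for (int threads : new int[] {1, 2, 4, 8}) {
            long start = System.nanoTime();
            long checksum = runSplit(total, threads);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): " + ms
                    + " ms, checksum " + checksum);
        }
    }
}
```

The checksum differs between thread counts because each worker counts from
zero; it is only there to keep the work from being discarded.
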
>>
>> Anyway, that is my hypothesis about what could be happening here.  It
>> isn't Clojure-specific, but it can be exacerbated by the common behavior of
>> a lot of Clojure code to allocate significant amounts of memory that
>> becomes garbage.
>>
>> Andy
>>
>>
>>
>> On Wed, Jan 30, 2013 at 6:20 PM, Lee Spector <lspec...@hampshire.edu> wrote:
>>
>>>
>>> FYI we had a bit of a discussion about this at a meetup in Amherst MA
>>> yesterday. I'm not sufficiently on top of the JVM or system issues to
>>> have briefed everyone on all of the details, but there has been a little
>>> follow-up since the discussion, including results of some different
>>> experiments by Chas Emerick, at:
>>> http://www.meetup.com/Functional-Programming-Connoisseurs/messages/boards/thread/30946382
>>>
>>>  -Lee
>>>
>>> On Jan 30, 2013, at 8:39 PM, Marshall Bockrath-Vandegrift wrote:
>>> >
>>> > Apologies for my very-slow reply here.  I keep thinking that I’ll have
>>> > more time to look into this issue, and keep having other things
>>> > requiring my attention.  And on top of that, I’ve temporarily lost the
>>> > many-way AMD system I was using as a test-bed.
>>> >
>>> > I very much want to see if I can get my hands on an Intel system to
>>> > compare to.  My AMD system is in theory 32-way – two physical CPUs, each
>>> > with 16 cores.  However, Linux reports (via /proc/cpuinfo) the cores in
>>> > groups of 8 (“cpu cores : 8” etc).  And something very strange happens
>>> > when extending parallelism beyond 8-way...  I ran several experiments
>>> > using a version of your whole-application benchmark I modified to
>>> > control the level of parallelism.  At parallelism 9+, the real time it
>>> > takes to complete the benchmark hardly budges, but the user/CPU time
>>> > increases linearly with the level of parallelism!  As far as I can tell,
>>> > multi-processor AMD *is* a NUMA architecture, which might potentially
>>> > explain things.  But enabling the JVM NUMA options doesn’t seem to
>>> > affect the benchmark.
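
For anyone wanting to reproduce the real-time versus CPU-time comparison
described above in pure Java, a rough sketch follows. The class name, the
stand-in work loop and the parameters are all hypothetical; only
System.nanoTime() and the standard ThreadMXBean CPU-time calls are real APIs.
It divides a fixed amount of work across the requested number of threads and
reports wall-clock time alongside the summed per-thread CPU time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.AtomicLong;

class ParallelismTimes {
    // Stand-in CPU-bound task; replace with the real benchmark body.
    static long work(int n) {
        long x = 0;
        for (int i = 0; i < n; i++) {
            x += (i ^ (x >>> 3));
        }
        return x;
    }

    // Returns {wallNanos, summedThreadCpuNanos} for the given parallelism.
    static long[] measure(int parallelism, int totalWork) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        AtomicLong cpuNanos = new AtomicLong();
        AtomicLong sink = new AtomicLong(); // keeps the work from being discarded
        int perThread = totalWork / parallelism;
        Thread[] workers = new Thread[parallelism];
        long start = System.nanoTime();
        for (int t = 0; t < parallelism; t++) {
            workers[t] = new Thread(() -> {
                sink.addAndGet(work(perThread));
                if (mx.isCurrentThreadCpuTimeSupported()) {
                    // Returns -1 when CPU timing is disabled, hence the max().
                    cpuNanos.addAndGet(Math.max(0L, mx.getCurrentThreadCpuTime()));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        long wall = System.nanoTime() - start;
        return new long[] {wall, cpuNanos.get()};
    }

    public static void main(String[] args) {
        for (int p = 1; p <= 16; p *= 2) {
            long[] r = measure(p, 400_000_000);
            System.out.println("parallelism " + p + ": wall " + r[0] / 1_000_000
                    + " ms, cpu " + r[1] / 1_000_000 + " ms");
        }
    }
}
```

With ideal scaling the wall time would fall as parallelism rises while the
summed CPU time stays roughly flat; flat wall time with rising CPU time would
match the behaviour reported above.
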
>>> >
>>> > I think next steps are two-fold: (1) examine parallelism vs real & CPU
>>> > time on an Intel system, and (2) attempt to reproduce the observed
>>> > behavior in pure Java.  I’m keeping my fingers crossed that I’ll have
>>> > some time to look at this more soon, but I’m honestly not very hopeful.
>>> >
>>> > In the meantime, I hope you’ve managed to exploit multi-process
>>> > parallelism to run more efficiently?
>>> >
>>> > -Marshall
>>>
>>> --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clojure@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+unsubscr...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to clojure+unsubscr...@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>
>>>
