Interesting! If that is true of Java (I don't know Java at all), then your argument seems plausible. Cache-to-main-memory writes still take many more CPU cycles (an order of magnitude more, last I knew) than processor-to-cache writes. AFAIK it's not so much a bandwidth issue as a latency one.

Thanks for thinking about this more, so long after the fact. We still see the issue.

On Sep 26, 2013 11:43 PM, "Andy Fingerhut" <andy.finger...@gmail.com> wrote:
> Adding to this thread from almost a year ago. I don't have conclusive
> proof with experiments to show right now, but I do have some experiments
> that have led me to what I think is a plausible cause of not just Clojure
> programs running more slowly when multi-threaded than when single-threaded,
> but any programs running on JVM's memory model doing so. Qualification: My
> explanation would be true only for multi-threaded programs running on the
> JVM that store significant amounts of data in memory, even if the data
> written by each thread is only read by that thread, and there is no locking
> or other inter-thread communication, i.e. for "embarrassingly parallel"
> problems.
>
> Outline of the argument:
>
> When you start a thread, and then wait for the thread to complete, the JVM
> memory model requires all loads and stores to satisfy certain
> restrictions. One of these is that any store done before the thread is
> created should 'happen before' the thread start, and thus the updated
> stored values must be visible to the new thread. 'Visible' here means that
> the thread doing the store must cause the CPU it is running on to update
> main memory from whatever locally modified values it has written into its
> local cache. That rule isn't so relevant to my argument.
>
> The one that is relevant is that any store performed by the thread is
> considered to 'happen before' a join operation on the thread. Thus any
> store done by a thread must be written back to main memory, *even if the
> store is to a JVM object that later becomes garbage*.
>
> So imagine a single-threaded program that creates X bytes of garbage while
> it runs. Those X bytes will definitely be written to the CPU's local
> cache, but they will only be written to main memory if the cache space runs
> out before the garbage collector does its work and allows that memory to be
> reused for allocations.
> The CPU-to-local-cache bandwidth in many modern
> systems is significantly faster than local-cache-to-main-memory bandwidth.
>
> Now take that same program and spread its work across 2 or more threads,
> with a join at the end of each one. For the sake of example, say that each
> thread will write X/N bytes of data while it runs. Even if the only data
> needed later in the rest of the program is a single Long object, for
> example, all of those X/N bytes of data will be copied from the local cache
> to main memory (if that did not already happen before the thread
> terminated).
>
> If the number of threads is large enough, the amount of data written from
> all local caches to main memory can be higher in the multi-threaded case
> than in the single-threaded case.
>
> Anyway, that is my hypothesis about what could be happening here. It
> isn't Clojure-specific, but it can be exacerbated by the common behavior of
> a lot of Clojure code to allocate significant amounts of memory that
> becomes garbage.
>
> Andy
>
> On Wed, Jan 30, 2013 at 6:20 PM, Lee Spector <lspec...@hampshire.edu> wrote:
>
>> FYI we had a bit of a discussion about this at a meetup in Amherst MA
>> yesterday, and while I'm not sufficiently on top of the JVM or system
>> issues to have briefed everyone on all of the details there has been a
>> little followup since the discussion, including results of some
>> different experiments by Chas Emerick, at:
>> http://www.meetup.com/Functional-Programming-Connoisseurs/messages/boards/thread/30946382
>>
>> -Lee
>>
>> On Jan 30, 2013, at 8:39 PM, Marshall Bockrath-Vandegrift wrote:
>> >
>> > Apologies for my very-slow reply here. I keep thinking that I’ll have
>> > more time to look into this issue, and keep having other things
>> > requiring my attention. And on top of that, I’ve temporarily lost the
>> > many-way AMD system I was using as a test-bed.
>> >
>> > I very much want to see if I can get my hands on an Intel system to
>> > compare to.
>> > My AMD system is in theory 32-way – two physical CPUs, each
>> > with 16 cores. However, Linux reports (via /proc/cpuinfo) the cores in
>> > groups of 8 (“cpu cores : 8” etc). And something very strange happens
>> > when extending parallelism beyond 8-way... I ran several experiments
>> > using a version of your whole-application benchmark I modified to
>> > control the level of parallelism. At parallelism 9+, the real time it
>> > takes to complete the benchmark hardly budges, but the user/CPU time
>> > increases linearly with the level of parallelism! As far as I can tell,
>> > multi-processor AMD *is* a NUMA architecture, which might potentially
>> > explain things. But enabling the JVM NUMA options doesn’t seem to
>> > affect the benchmark.
>> >
>> > I think next steps are two-fold: (1) examine parallelism vs real & CPU
>> > time on an Intel system, and (2) attempt to reproduce the observed
>> > behavior in pure Java. I’m keeping my fingers crossed that I’ll have
>> > some time to look at this more soon, but I’m honestly not very hopeful.
>> >
>> > In the mean time, I hope you’ve managed to exploit multi-process
>> > parallelism to run more efficiently?
>> >
>> > -Marshall
>>
>> --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to clojure+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
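For anyone who wants to poke at the hypothesis quoted above in pure Java (the second "next step" Marshall lists), here is a minimal sketch of the scenario: N threads that each allocate a lot of short-lived garbage, share nothing, and are joined at the end, with only one long per thread needed afterwards. The class and method names (GarbageJoinSketch, churn, runAndJoin) are made up for illustration, and the iteration count is arbitrary. It only sets up the start/allocate/join pattern and prints wall-clock times; it cannot by itself show whether cache write-back is the cause of any slowdown, and the JIT may optimize parts of it differently than a real workload.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and sizes are illustrative, not from the thread.
class GarbageJoinSketch {

    // Allocates one short-lived array per iteration and returns only a checksum.
    static long churn(long iterations) {
        // Small ring of references, intended to make it less likely that the
        // JIT removes the allocations entirely (an assumption, not a guarantee).
        Object[] ring = new Object[64];
        long sum = 0;
        for (long i = 0; i < iterations; i++) {
            long[] garbage = new long[16]; // unreachable again within 64 iterations
            garbage[0] = i;
            ring[(int) (i & 63)] = garbage;
            sum += garbage[0];
        }
        return sum;
    }

    // Splits the work across nThreads, joins them all, and sums the per-thread results.
    static long runAndJoin(int nThreads, long totalIterations) {
        long perThread = totalIterations / nThreads;
        long[] results = new long[nThreads];
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> { results[id] = churn(perThread); });
            threads.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : threads) {
                // Everything the worker stored 'happens before' join() returns,
                // which is why reading results[] below is safe without locking.
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        long total = 0;
        for (long r : results) {
            total += r;
        }
        return total;
    }

    public static void main(String[] args) {
        long totalIterations = 40_000_000L; // arbitrary; tune for your machine
        for (int n : new int[] {1, 2, 4, 8}) {
            long start = System.nanoTime();
            long checksum = runAndJoin(n, totalIterations);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(n + " thread(s): " + ms + " ms, checksum " + checksum);
        }
    }
}
```

Comparing the wall-clock numbers against user/CPU time (e.g. with the shell's time command) would mirror the real-vs-CPU-time comparison described in the quoted message. A serious measurement would want a benchmarking harness with warm-up runs rather than a single pass like this.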