Here is the overall speedup so far since 0.7 https://docs.google.com/spreadsheet/ccc?key=0AhewTD_ZgznddGFQbWJCQTZXSnFULUYzdURfWDRJQlE#gid=3
I back-ported the current benchmark code (disabling SerializationBenchmark, which doesn't seem to work with 0.7) and ran it against 0.7. There is one regression, but the rest have been pretty positive.

On Sun, Apr 21, 2013 at 12:37 PM, Ted Dunning &lt;[email protected]&gt; wrote:

> On Sun, Apr 21, 2013 at 10:27 AM, Dan Filimon
> <[email protected]> wrote:
>
> > > But multi-threaded assign would be very dangerous. Even if you assign
> > > different parts of the vector to different threads, you have to worry
> > > about cache line alignment, which is generally not visible to Java
> > > without very special effort.
> >
> > I'm terrible at explaining what I mean.
> > So, rather than have the threads assign chunks of a vector (which would
> > only really work if the underlying Vector was an array of doubles), each
> > thread would return an OrderedIntDoubleMapping, and they would be merged
> > into a Vector by a single thread at the end.
> >
> > I wonder, even talking about cache alignment worries in Java makes me
> > wonder whether we'd be trying to outwit the JVM. It feels kind of wrong,
> > as I'm certain that the people writing Hotspot are better at optimizing
> > the code than me. :)
>
> Yeah... the single-thread updater is pretty much what I meant when I talked
> about a threaded map-reduce. Everybody produces new data in parallel and
> then a single thread per vector makes it all coherent.
>
> This is actually kind of similar to the way that very large HPC matrix
> libraries work. Message passing can be more efficient as an idiom for
> communicating data like this than shared memory even when efficient shared
> memory is available.
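For what it's worth, here is a minimal sketch of the pattern being discussed: worker threads each produce an ordered index-to-value mapping for their chunk, and a single thread merges the results into the final vector, so no two threads ever write to shared memory. This uses a plain `TreeMap<Integer, Double>` and `double[]` as stand-ins for Mahout's OrderedIntDoubleMapping and Vector; the names `computeChunk` and `parallelAssign` are made up for illustration, not actual Mahout API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelAssignSketch {

    // Each worker computes its slice of the result as a sorted
    // index -> value mapping (a stand-in for OrderedIntDoubleMapping).
    static TreeMap<Integer, Double> computeChunk(double[] input, int from, int to) {
        TreeMap<Integer, Double> updates = new TreeMap<>();
        for (int i = from; i < to; i++) {
            updates.put(i, input[i] * 2.0); // example per-element function
        }
        return updates;
    }

    static double[] parallelAssign(double[] input, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<Future<TreeMap<Integer, Double>>> futures = new ArrayList<>();
        int chunk = (input.length + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int from = Math.min(input.length, t * chunk);
            final int to = Math.min(input.length, from + chunk);
            futures.add(pool.submit(() -> computeChunk(input, from, to)));
        }
        // "Map-reduce" step: a single thread merges every worker's mapping
        // into one coherent vector, so cache-line contention never arises.
        double[] result = new double[input.length];
        for (Future<TreeMap<Integer, Double>> f : futures) {
            for (Map.Entry<Integer, Double> e : f.get().entrySet()) {
                result[e.getKey()] = e.getValue();
            }
        }
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        double[] v = {1, 2, 3, 4, 5};
        System.out.println(java.util.Arrays.toString(parallelAssign(v, 2)));
        // prints [2.0, 4.0, 6.0, 8.0, 10.0]
    }
}
```

The merge loop is sequential, which is exactly the trade-off Ted describes: you pay one single-threaded pass in exchange for never sharing a writable array between threads.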
