A caching layer would trivially give us a lot of benefit, especially for repeat calls. I think that's very low-hanging fruit.
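To make the idea concrete, here is a minimal sketch of such a memoization layer: repeat calls with the same argument hit the cache instead of recomputing. This is an illustration only; `expensiveCompute` and `CacheSketch` are hypothetical names, not anything in the Mahout benchmark code.

```java
import java.util.concurrent.ConcurrentHashMap;

public class CacheSketch {
    // Cache keyed by input; ConcurrentHashMap makes lookups thread-safe.
    private static final ConcurrentHashMap<Integer, Double> cache =
            new ConcurrentHashMap<>();
    static int computeCalls = 0; // counts actual (non-cached) computations

    // Stand-in for a pure, expensive computation.
    static double expensiveCompute(int x) {
        return x * x + 0.5;
    }

    // Compute only on a cache miss; repeat calls return the cached value.
    static double cached(int x) {
        return cache.computeIfAbsent(x, k -> {
            computeCalls++;
            return expensiveCompute(k);
        });
    }

    public static void main(String[] args) {
        cached(3);
        cached(3); // repeat call: served from the cache
        cached(4);
        cached(3); // repeat call again
        System.out.println(computeCalls); // prints 2: one compute per distinct input
    }
}
```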
Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.

On Sat, May 4, 2013 at 10:31 PM, Robin Anil <[email protected]> wrote:

> Here is the overall speedup so far since 0.7
>
> https://docs.google.com/spreadsheet/ccc?key=0AhewTD_ZgznddGFQbWJCQTZXSnFULUYzdURfWDRJQlE#gid=3
>
> I back-ported the current benchmark code (disabled SerializationBenchmark,
> which doesn't seem to work with 0.7) and ran it against 0.7.
> There is one regression, but the rest have been pretty positive.
>
> On Sun, Apr 21, 2013 at 12:37 PM, Ted Dunning <[email protected]> wrote:
>
>> On Sun, Apr 21, 2013 at 10:27 AM, Dan Filimon <[email protected]> wrote:
>>
>> > > But multi-threaded assign would be very dangerous. Even if you assign
>> > > different parts of the vector to different threads, you have to worry
>> > > about cache line alignment, which is generally not visible to Java
>> > > without very special effort.
>> >
>> > I'm terrible at explaining what I mean.
>> > So, rather than have the threads assign chunks of a vector (which would
>> > only really work if the underlying Vector was an array of doubles), each
>> > thread would return an OrderedIntDoubleMapping, and they would be merged
>> > into a Vector by a single thread at the end.
>> >
>> > I wonder, even talking about cache alignment worries in Java makes me
>> > wonder whether we'd be trying to outwit the JVM. It feels kind of wrong,
>> > as I'm certain that the people writing Hotspot are better at optimizing
>> > the code than me. :)
>>
>> Yeah... the single-thread updater is pretty much what I meant when I
>> talked about a threaded map-reduce. Everybody produces new data in
>> parallel, and then a single thread per vector makes it all coherent.
>>
>> This is actually kind of similar to the way that very large HPC matrix
>> libraries work. Message passing can be more efficient as an idiom for
>> communicating data like this than shared memory, even when efficient
>> shared memory is available.
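For anyone following along, the "threaded map-reduce" pattern above can be sketched as follows: worker threads each produce an independent sparse partial result (a plain `TreeMap` stands in here for `OrderedIntDoubleMapping`, and a `double[]` for the Vector), and a single thread merges them at the end. Because workers never touch shared mutable state, the false-sharing / cache-line concerns from the thread don't arise during the parallel phase. All class and method names are illustrative, not Mahout API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MergeSketch {
    // Each worker computes updates for its own slice of indices and returns
    // them as an ordered index -> value map (no shared state touched).
    static Map<Integer, Double> partialUpdate(int start, int end) {
        Map<Integer, Double> result = new TreeMap<>();
        for (int i = start; i < end; i++) {
            result.put(i, i * 2.0); // placeholder computation
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        int size = 8, chunks = 4, step = size / chunks;
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        List<Future<Map<Integer, Double>>> futures = new ArrayList<>();
        for (int c = 0; c < chunks; c++) {
            final int start = c * step, end = start + step;
            futures.add(pool.submit(() -> partialUpdate(start, end)));
        }
        // Single-threaded merge phase: one thread makes the result coherent.
        double[] merged = new double[size];
        for (Future<Map<Integer, Double>> f : futures) {
            for (Map.Entry<Integer, Double> e : f.get().entrySet()) {
                merged[e.getKey()] = e.getValue();
            }
        }
        pool.shutdown();
        System.out.println(Arrays.toString(merged));
        // prints [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
    }
}
```

This is essentially message passing: each worker hands its result to the merger rather than writing into shared memory, matching the HPC analogy in the thread.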
