Obviously we need to benchmark with caching disabled, but I will do that as
part of the Caliper integration. Some benchmarks may have improved only
because of the caching. A clean layer for caching these derived attributes
can help a lot with higher-level algorithms.
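
For illustration only, a minimal sketch of what such a caching layer might
look like: a decorator that memoizes one derived attribute (here the squared
L2 norm) and invalidates it whenever the vector is mutated. CachingVector and
its methods are hypothetical names, not existing Mahout API; only
Vector.dot() and Vector.set() from org.apache.mahout.math.Vector are assumed.

    // Illustrative sketch, not actual Mahout API: a hypothetical decorator
    // that caches a derived attribute and invalidates it on mutation.
    import org.apache.mahout.math.Vector;

    public final class CachingVector {
      private final Vector delegate;
      private double lengthSquared = -1.0;  // negative means "not yet computed"

      public CachingVector(Vector delegate) {
        this.delegate = delegate;
      }

      /** Returns the squared L2 norm, recomputing it at most once between mutations. */
      public double getLengthSquared() {
        if (lengthSquared < 0.0) {
          lengthSquared = delegate.dot(delegate);
        }
        return lengthSquared;
      }

      /** Mutating calls must go through the wrapper so the cache stays coherent. */
      public void set(int index, double value) {
        delegate.set(index, value);
        lengthSquared = -1.0;  // invalidate the cached derived attribute
      }
    }

Benchmarking with caching disabled would then simply mean going through the
raw Vector instead of the wrapper.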

On Sat, May 4, 2013 at 10:32 PM, Robin Anil <[email protected]> wrote:

> A caching layer would trivially give us a lot of benefit, especially for
> repeat calls. I think that's very low-hanging fruit.
>
>
>
> On Sat, May 4, 2013 at 10:31 PM, Robin Anil <[email protected]> wrote:
>
>> Here is the overall speedup so far since 0.7
>>
>> https://docs.google.com/spreadsheet/ccc?key=0AhewTD_ZgznddGFQbWJCQTZXSnFULUYzdURfWDRJQlE#gid=3
>>
>> I back-ported the current benchmark code (disabling SerializationBenchmark,
>> which doesn't seem to work with 0.7) and ran it against 0.7.
>> There is one regression, but the rest have been pretty positive.
>>
>>
>> On Sun, Apr 21, 2013 at 12:37 PM, Ted Dunning <[email protected]> wrote:
>>
>>> On Sun, Apr 21, 2013 at 10:27 AM, Dan Filimon <[email protected]> wrote:
>>>
>>> > > But multi-threaded assign would be very dangerous.  Even if you
>>> > > assign different parts of the vector to different threads, you have
>>> > > to worry about cache line alignment which is generally not visible
>>> > > to Java without very special effort.
>>> > >
>>> >
>>> > I'm terrible at explaining what I mean. So, rather than have the
>>> > threads assign chunks of a vector (which would only really work if the
>>> > underlying Vector were an array of doubles), each thread would return
>>> > an OrderedIntDoubleMapping, and they would be merged into a Vector by
>>> > a single thread at the end.
>>> >
>>> > Also, even talking about cache alignment worries in Java makes me
>>> > wonder whether we'd be trying to outwit the JVM. It feels kind of
>>> > wrong, as I'm certain that the people writing Hotspot are better at
>>> > optimizing the code than I am. :)
>>> >
>>>
>>> Yeah... the single-thread updater is pretty much what I meant when I
>>> talked about a threaded map-reduce.  Everybody produces new data in
>>> parallel and then a single thread per vector makes it all coherent.
>>>
>>> This is actually kind of similar to the way that very large HPC matrix
>>> libraries work.  Message passing can be more efficient as an idiom for
>>> communicating data like this than shared memory, even when efficient
>>> shared memory is available.
>>>
>>
>>
>
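
For illustration, a minimal sketch of the single-writer merge pattern Dan and
Ted describe above, assuming only standard java.util.concurrent primitives
and the org.apache.mahout.math.Vector interface: workers compute sparse
partial updates in parallel (Dan suggests returning an OrderedIntDoubleMapping;
a plain map is used here to keep the sketch self-contained), and a single
thread folds them into the result Vector. ParallelAssign and computePartial
are hypothetical names, and the per-element computation is a placeholder.

    // Illustrative sketch of the "threaded map-reduce" idea: workers produce
    // sparse partial updates in parallel; only one thread ever writes to the
    // result Vector, so no synchronization on the Vector itself is needed.
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.mahout.math.Vector;

    public final class ParallelAssign {

      /** Hypothetical per-thread work: compute updates for one slice of indices. */
      static Map<Integer, Double> computePartial(Vector input, int from, int to) {
        Map<Integer, Double> updates = new HashMap<Integer, Double>();
        for (int i = from; i < to; i++) {
          updates.put(i, Math.sqrt(input.getQuick(i)));  // placeholder computation
        }
        return updates;
      }

      public static void assignInParallel(final Vector input, Vector result, int threads)
          throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
          int n = input.size();
          int chunk = (n + threads - 1) / threads;
          List<Future<Map<Integer, Double>>> partials =
              new ArrayList<Future<Map<Integer, Double>>>();
          for (int t = 0; t < threads; t++) {
            final int from = t * chunk;
            final int to = Math.min(n, from + chunk);
            partials.add(pool.submit(new Callable<Map<Integer, Double>>() {
              @Override
              public Map<Integer, Double> call() {
                return computePartial(input, from, to);
              }
            }));
          }
          // Single-threaded "reduce": only this thread mutates the result Vector.
          for (Future<Map<Integer, Double>> partial : partials) {
            for (Map.Entry<Integer, Double> e : partial.get().entrySet()) {
              result.setQuick(e.getKey(), e.getValue());
            }
          }
        } finally {
          pool.shutdown();
        }
      }
    }

Keeping all writes to the result on a single thread sidesteps both the
cache-line-alignment concern and any need to lock the Vector.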
