On Thu, Apr 18, 2013 at 11:41 PM, Robin Anil <[email protected]> wrote:
> Next obvious speedups ideas I can think of are:
>
> 1) Batch insert into OpenIntDoubleHashMap(OIDHM) and
> OrderedIntDoubleMapping(OIDM). This way mutable operations like plus() or
> minus() can iterate on the Intersection elements and add the difference in
> one go. Can anyone think of a smart way to rehash based on new input
> elements ?
>
> 2) Speed up aggregate and assign methods(Dan is doing that with)
>

Regarding this, I'm testing the code to see if anything breaks, and then I want to see what the performance is like.

I'm experimenting with making every operation a variant of aggregate() or assign(). This is useful because there's just one code path to look at, and we can focus on high-level optimizations that apply to a larger class of functions.

Here is a preliminary version:
https://reviews.apache.org/r/10669/diff/#index_header

Regarding the parallelization, the results would be valid as long as the aggregating function is both commutative and associative (which we can now check), but adding the parallelization here might be too much work.

> 3) Generalize caching framework of derived properties like
> getLengthSquared() and extend it into other things, like commons norms (L1,
> L2), numNonZeros(),
>
> 4) Parallelize operations: Use a consistent sharding function to trivially
> parallelize certain iterative operations across multiple threads.
>
> 6) Replace current DenseVector and/or encapsulate JBlas inside it.
>
> 7) Improve exception handling.
>
> All these can be independent projects. I know I wont get time to get to
> this, I am more than happy to review
>
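To illustrate the "one code path" idea and the parallelization condition, here is a rough sketch. It does not reflect the code in the review above. `SimpleVector` and `parallelAggregate` are hypothetical names, not Mahout's Vector API, and the sketch uses the JDK's DoubleBinaryOperator/DoubleUnaryOperator in place of Mahout's own function interfaces. plus(), norm1(), lengthSquared() and maxValue() are all written in terms of assign() or aggregate(). The sharded version folds each shard independently and then combines the partial results, so it only agrees with the sequential fold when the combiner is associative, and also commutative if partials are combined in completion order. Note that double addition is only approximately associative, so a parallel sum of non-integer values can differ from the sequential one in the last bits.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;
import java.util.stream.IntStream;

public class AggregateSketch {

  /** Hypothetical minimal dense vector for illustration; this is NOT Mahout's Vector API. */
  static final class SimpleVector {
    final double[] values;

    SimpleVector(double... values) {
      this.values = values.clone();
    }

    /** The single mutating path: values[i] = f(values[i], other[i]) for every index. */
    SimpleVector assign(SimpleVector other, DoubleBinaryOperator f) {
      if (other.values.length != values.length) {
        throw new IllegalArgumentException("size mismatch");
      }
      for (int i = 0; i < values.length; i++) {
        values[i] = f.applyAsDouble(values[i], other.values[i]);
      }
      return this;
    }

    /** The single reducing path: fold map(values[i]) with combiner; 0.0 if empty (a convention chosen here). */
    double aggregate(DoubleBinaryOperator combiner, DoubleUnaryOperator map) {
      if (values.length == 0) {
        return 0.0;
      }
      double acc = map.applyAsDouble(values[0]);
      for (int i = 1; i < values.length; i++) {
        acc = combiner.applyAsDouble(acc, map.applyAsDouble(values[i]));
      }
      return acc;
    }

    // Every other operation is expressed through the two primitives above.
    SimpleVector plus(SimpleVector other) { return new SimpleVector(values).assign(other, Double::sum); }
    double norm1() { return aggregate(Double::sum, Math::abs); }
    double lengthSquared() { return aggregate(Double::sum, x -> x * x); }
    double maxValue() { return aggregate(Math::max, x -> x); }
  }

  /** Sharded aggregate: folds each contiguous shard, then combines the partial results. */
  static double parallelAggregate(double[] values, int shards,
      DoubleBinaryOperator combiner, DoubleUnaryOperator map) {
    int n = values.length;
    int chunk = Math.max(1, (n + shards - 1) / shards);
    return IntStream.range(0, (n + chunk - 1) / chunk)
        .parallel()
        .mapToDouble(s -> {
          int from = s * chunk;
          int to = Math.min(n, from + chunk);
          double acc = map.applyAsDouble(values[from]);
          for (int i = from + 1; i < to; i++) {
            acc = combiner.applyAsDouble(acc, map.applyAsDouble(values[i]));
          }
          return acc;
        })
        .reduce(combiner)
        .orElse(0.0);
  }

  public static void main(String[] args) {
    SimpleVector a = new SimpleVector(1, -2, 3);
    SimpleVector b = new SimpleVector(4, 5, 6);
    System.out.println(Arrays.toString(a.plus(b).values)); // [5.0, 3.0, 9.0]
    System.out.println(a.norm1() + " " + a.lengthSquared() + " " + a.maxValue()); // 6.0 14.0 3.0
  }
}
```

Sum and max are safe to shard this way. A non-associative combiner such as subtraction is not, because the grouping of the partial folds changes the result.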

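For the batch-insert idea in item 1, here is one possible approach for the ordered case. It assumes a sorted parallel-array layout (strictly increasing indices with matching values), as the name OrderedIntDoubleMapping suggests. `SparseEntries` and `mergePlus` are hypothetical names, not Mahout classes. Instead of inserting the other vector's entries one at a time, where each insert into a sorted array shifts the tail, a single linear merge handles the intersection (values are added) and the difference (values are copied) in one pass. This does not address the OpenIntDoubleHashMap rehashing question. The record syntax needs Java 16+.

```java
import java.util.Arrays;

public class SortedMergeSketch {

  /** Hypothetical sparse container: parallel arrays, indices strictly increasing. */
  record SparseEntries(int[] indices, double[] values) {}

  /**
   * Returns a + b in one linear pass over both sorted index arrays.
   * Entries that cancel to exactly 0.0 are dropped, keeping the result sparse.
   */
  static SparseEntries mergePlus(SparseEntries a, SparseEntries b) {
    int[] ai = a.indices(), bi = b.indices();
    double[] av = a.values(), bv = b.values();
    int[] outIdx = new int[ai.length + bi.length]; // upper bound on result size
    double[] outVal = new double[outIdx.length];
    int i = 0, j = 0, k = 0;
    while (i < ai.length && j < bi.length) {
      if (ai[i] == bi[j]) {
        // Intersection: both sides have this index, so add the values.
        double sum = av[i] + bv[j];
        if (sum != 0.0) {
          outIdx[k] = ai[i];
          outVal[k++] = sum;
        }
        i++;
        j++;
      } else if (ai[i] < bi[j]) {
        outIdx[k] = ai[i];
        outVal[k++] = av[i++];
      } else {
        outIdx[k] = bi[j];
        outVal[k++] = bv[j++];
      }
    }
    // Copy whichever tail is left (at most one of these loops runs).
    while (i < ai.length) {
      outIdx[k] = ai[i];
      outVal[k++] = av[i++];
    }
    while (j < bi.length) {
      outIdx[k] = bi[j];
      outVal[k++] = bv[j++];
    }
    return new SparseEntries(Arrays.copyOf(outIdx, k), Arrays.copyOf(outVal, k));
  }

  public static void main(String[] args) {
    SparseEntries a = new SparseEntries(new int[] {1, 4, 7}, new double[] {1.0, 2.0, 3.0});
    SparseEntries b = new SparseEntries(new int[] {4, 5, 7}, new double[] {10.0, 20.0, -3.0});
    SparseEntries c = mergePlus(a, b);
    // [1, 4, 5] [1.0, 12.0, 20.0]
    System.out.println(Arrays.toString(c.indices()) + " " + Arrays.toString(c.values()));
  }
}
```

The merge costs O(n + m) for inputs with n and m non-zeros. Inserting m entries one at a time into a sorted array can cost O(m * n) in the worst case because of the element shifting.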