I am sure that our SparseVector implementation is at least twice as slow as
it should be.

My preliminary experiments indicate that we should be able to get within
about a factor of 2 of the speed of a DenseVector (ish).  I am pretty sure
our current SparseVector is a long way from there.  Shashi's experience
indicates that replacing SparseVector can give us a near doubling of overall
performance in some cases.

Remember how much difference the distance optimization made?  By using
sparsity well, we can probably get 100x speedup in k-means for some datasets
because the distance computation uses cached values for the squared
magnitude of the centroids and only iterates over the non-zero data values.
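
To make that concrete, here is roughly the shape of what I mean (just a
sketch, not our actual code, and all the names are invented): expand
||x - c||^2 as ||x||^2 - 2 x.c + ||c||^2, cache ||c||^2 for each centroid
once per iteration, and the only loop left is a dot product over the
point's non-zeros.

  // Sketch only, not our actual code: all of these names are invented.
  // Assumes the point is available as parallel index/value arrays and that
  // the centroid is dense with its squared magnitude precomputed once.
  final class SparseDistanceSketch {
    // ||x - c||^2 = ||x||^2 - 2 * (x . c) + ||c||^2
    // The only loop is the dot product, and it touches only the non-zeros of x.
    static double squaredDistance(int[] indices, double[] values,
                                  double pointSquaredNorm,
                                  double[] centroid, double centroidSquaredNorm) {
      double dot = 0.0;
      for (int i = 0; i < indices.length; i++) {
        dot += values[i] * centroid[indices[i]];
      }
      return pointSquaredNorm - 2.0 * dot + centroidSquaredNorm;
    }
  }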

I also expect that some of our code *has* to be consing (allocating) way too
much.

And I am sure you remember how much difference it makes to have better data
storage in Taste.  For many applications, we can go further and have binary
sparse matrices.  These don't even need to store the data because if we know
a value is non-zero, we also know its value.  With integer compression and
no data storage cost, we might be able to store 10x more data in memory and
get an astronomical speedup for large data sets.
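
Roughly what I am picturing (again only a sketch with made-up names, not
anything we have today): a row stores just its sorted column indices,
delta-encoded so the gaps are small integers, and every stored value is
implicitly 1, so a dot product never reads a value array at all.

  // Hypothetical sketch of a binary sparse row: values are implicitly 1, so
  // only the sorted column indices are stored, delta-encoded as a crude form
  // of integer compression.  None of these names exist in our code today.
  final class BinarySparseRow {
    private final int[] deltas;  // gaps between consecutive non-zero column indices

    BinarySparseRow(int[] sortedColumns) {
      deltas = new int[sortedColumns.length];
      int prev = 0;
      for (int i = 0; i < sortedColumns.length; i++) {
        deltas[i] = sortedColumns[i] - prev;
        prev = sortedColumns[i];
      }
    }

    // Dot product against a dense vector: just sum the entries at the
    // non-zero columns, since every stored value is 1.
    double dot(double[] dense) {
      double sum = 0.0;
      int column = 0;
      for (int delta : deltas) {
        column += delta;
        sum += dense[column];
      }
      return sum;
    }
  }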

And then there are on-line algorithms like vowpal.  Using these often
results in many orders of magnitude speedup (5 orders is not impossible for
some problems).  Even if the number of operations for an on-line algorithm
is twice as large, the effect of passing through the data completely
sequentially can make four orders of magnitude difference in speed.
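
To be clear about what I mean by an on-line pass, here is a toy
least-squares SGD sketch (names invented for illustration, and obviously
not the vowpal code): each sparse example is read exactly once, in order,
and the weights are updated in place.

  // Illustrative sketch only: a single sequential pass of least-squares SGD
  // over streamed sparse examples.  SparseExample and every other name here
  // is an assumption made for the sketch, not a real vowpal or Mahout API.
  final class OnlinePassSketch {
    static final class SparseExample {
      final int[] indices;    // non-zero feature indices
      final double[] values;  // corresponding feature values
      final double label;
      SparseExample(int[] indices, double[] values, double label) {
        this.indices = indices;
        this.values = values;
        this.label = label;
      }
    }

    // One pass: read each example once, in order, and update the weights in place.
    static void onlinePass(Iterable<SparseExample> stream, double[] weights,
                           double learningRate) {
      for (SparseExample ex : stream) {
        // prediction touches only the non-zeros of the example
        double prediction = 0.0;
        for (int i = 0; i < ex.indices.length; i++) {
          prediction += weights[ex.indices[i]] * ex.values[i];
        }
        double error = ex.label - prediction;
        // the update also touches only the non-zeros
        for (int i = 0; i < ex.indices.length; i++) {
          weights[ex.indices[i]] += learningRate * error * ex.values[i];
        }
      }
    }
  }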

So, my answer to this is that I can't imagine that we don't have 2-10x
speedups in just the low-hanging fruit.

On Fri, Sep 4, 2009 at 3:54 PM, Sean Owen <sro...@gmail.com> wrote:

>
> (Do you really think some implementations are twice as slow as they
> need be? can't tell if that's good or bad news. )

-- 
Ted Dunning, CTO
DeepDyve
