I'm sorry, I wasn't keeping up with the changes; I see the new
implementations now. Apologies for the confusion: I didn't notice the
new matrix/data-structure infrastructure because it is unrelated to
Colt's (Mahout's Matrix vs. Colt's *Matrix2D* classes).

> - delete
> - add tests, restructure to use tested API, rename to new style
> - completely re-implement in a maintainable style

Fair enough. I don't think Colt's API was all that great either. We
are coupled to it pretty closely in the Lingo algorithm (the open
source one), but a cleaner API may definitely be worth the switch.

> My initial preference was for the second fate as much as possible.  That
> preference has changed a bit to prefer the first option with the third as a
> backup.  Partly this is because we are scraping down to the less-used, lower
> quality parts of Colt.

+1 for reimplementing Colt's functionality from scratch, not caring
much about backwards compatibility.

We don't use much of Colt's math, to be honest. Basically matrix
decompositions (SVD), some sorting routines from the Sorting class
(removed now, but that can be replaced) and a lot of multiplication
and other basic operations on vectors and matrices. One thing we DO
use heavily is the representation of a two-dimensional matrix as a
single double[] array, because this lets us plug in BLAS to work
directly on Java data, without copying or other manipulations. On the
other hand, we don't have a newer native BLAS build, and it's been a
pain to compile and link it with Java. We care mostly about native
LAPACK's gesdd (SVD) and BLAS's gemm (general matrix multiplication);
these do provide significant speedups when clustering larger data sets
using Lingo. But I can imagine hardware-accelerated implementations
will eventually surface inside mahout-math anyway, so we could switch
to those instead of doing all the trickery we currently do with Colt.
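For readers unfamiliar with the flat double[] trick, here is a minimal
hypothetical sketch; it is not Colt's or mahout-math's API. It assumes
column-major order (element (i, j) at index i + j * rows), which is the
layout Fortran BLAS/LAPACK routines such as gemm and gesdd expect; the
layout Colt itself uses may differ. The class name ColumnMajorMatrix and
the gemm method are illustrative only, and gemm here is a naive
pure-Java stand-in for native BLAS, not a binding to it.

```java
import java.util.Arrays;

/** Hypothetical sketch: dense matrix backed by one flat, column-major double[]. */
public final class ColumnMajorMatrix {
    public final int rows;
    public final int cols;
    public final double[] data; // element (i, j) lives at data[i + j * rows]

    public ColumnMajorMatrix(int rows, int cols) {
        this(rows, cols, new double[rows * cols]);
    }

    public ColumnMajorMatrix(int rows, int cols, double[] data) {
        if (data.length != rows * cols) {
            throw new IllegalArgumentException("data.length must equal rows * cols");
        }
        this.rows = rows;
        this.cols = cols;
        this.data = data; // wrapped, not copied: a native routine could read this array as-is
    }

    public double get(int i, int j) { return data[i + j * rows]; }

    public void set(int i, int j, double v) { data[i + j * rows] = v; }

    /** Naive GEMM: C = alpha * A * B + beta * C, updating c in place. */
    public static void gemm(double alpha, ColumnMajorMatrix a, ColumnMajorMatrix b,
                            double beta, ColumnMajorMatrix c) {
        if (a.cols != b.rows || c.rows != a.rows || c.cols != b.cols) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        final int m = a.rows, n = b.cols, k = a.cols;
        for (int j = 0; j < n; j++) {
            // Scale column j of C by beta; like BLAS, beta == 0 ignores old contents of C.
            for (int i = 0; i < m; i++) {
                int idx = i + j * m;
                c.data[idx] = (beta == 0.0) ? 0.0 : c.data[idx] * beta;
            }
            // Accumulate alpha * A[:, p] * B[p, j]; the innermost loop walks down
            // a column, which is contiguous memory in column-major layout.
            for (int p = 0; p < k; p++) {
                double bpj = alpha * b.data[p + j * k];
                for (int i = 0; i < m; i++) {
                    c.data[i + j * m] += a.data[i + p * m] * bpj;
                }
            }
        }
    }

    public static void main(String[] args) {
        // A = [1 2; 3 4] and B = [5 6; 7 8], each stored column by column.
        ColumnMajorMatrix a = new ColumnMajorMatrix(2, 2, new double[] {1, 3, 2, 4});
        ColumnMajorMatrix b = new ColumnMajorMatrix(2, 2, new double[] {5, 7, 6, 8});
        ColumnMajorMatrix c = new ColumnMajorMatrix(2, 2);
        gemm(1.0, a, b, 0.0, c);
        System.out.println(Arrays.toString(c.data)); // prints [19.0, 43.0, 22.0, 50.0]
    }
}
```

Because the matrix wraps its array instead of copying it, the same
data array could in principle be passed straight to a native routine,
which is the whole point of the layout.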

So, to summarize: don't worry about us much, really. For now we will
stick to the mahout-math release that we know works for us. I will try
to switch to the trunk of mahout-math as a proof of concept (without
native matrix computation support) and will let you know if I run into
any problems. This is a much larger refactoring than I initially
thought, though.

Dawid

