Hi Dmitriy

I'd be interested in helping with this (time permitting).

I've recently been working on a port of Mahout's ALS implementation to
Spark. I spent a bit of time thinking about how much of mahout-math to use.

For now I found that with the Breeze linear algebra library I could get
what I needed, i.e. DenseVector, SparseVector, DenseMatrix, all with
in-memory multiply and solve that is backed by JBLAS (so very quick if you
have the native libraries active). It comes with very nice "Matlab-like"
syntax in Scala. So it ended up being a bit of a rewrite rather than a port
of the Mahout code.

The sparse matrix support is however a bit... well, sparse :) There is a
CSC matrix and some operations, but sparse SVD is not there, and I don't
think the in-core solvers are there just yet.

But of course the linear algebra objects are not easily usable from Java
due to the syntax and the heavy use of implicits. So for a fully functional
Java API version that can use the vectors/matrices directly, the options
would be to create a Java bridge to the Breeze vectors/matrices, or to
instead look at using mahout-math to drive the linear algebra. In that case
the Scala syntax would not be as nice, but some sugar can be added again
using implicits for common operations (I've tested this a bit and it can
work and probably be made reasonably efficient if copies are avoided in the
implicit conversion).
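
To make the "sugar via implicits" idea concrete, here is a minimal sketch. Note that PlainVector is a made-up stand-in for a Java-style vector class, not the actual mahout-math API, and the operator names are purely illustrative. The wrapper is an implicit value class, so the conversion only wraps a reference to the existing vector and copies no data:

```scala
// Hypothetical stand-in for a Java-style vector class (NOT the mahout-math API).
final class PlainVector(val values: Array[Double]) {
  def size: Int = values.length
  def get(i: Int): Double = values(i)
}

object VectorSugar {
  // Implicit value class: wraps a reference to the same vector, so the
  // implicit conversion itself does not copy the underlying data.
  implicit class RichVector(val v: PlainVector) extends AnyVal {
    // Element-wise sum; allocates only the result vector.
    def +(other: PlainVector): PlainVector = {
      require(v.size == other.size, "dimension mismatch")
      val out = new Array[Double](v.size)
      var i = 0
      while (i < out.length) { out(i) = v.get(i) + other.get(i); i += 1 }
      new PlainVector(out)
    }

    // Scalar multiplication.
    def *(k: Double): PlainVector = new PlainVector(v.values.map(_ * k))

    // Inner product.
    def dot(other: PlainVector): Double = {
      require(v.size == other.size, "dimension mismatch")
      var sum = 0.0
      var i = 0
      while (i < v.size) { sum += v.get(i) * other.get(i); i += 1 }
      sum
    }
  }
}

object Demo extends App {
  import VectorSugar._
  val a = new PlainVector(Array(1.0, 2.0, 3.0))
  val b = new PlainVector(Array(4.0, 5.0, 6.0))
  val c = a + b * 2.0                // operator syntax on the plain class
  println(c.values.mkString(", "))   // 9.0, 12.0, 15.0
  println(a dot b)                   // 32.0
}
```

The same pattern should carry over to a real vector type by swapping PlainVector for it, though the accessor names would of course differ.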

Anyway, I'd be happy to offer assistance.

Nick


On Wed, Jun 19, 2013 at 8:09 AM, Sebastian Schelter <s...@apache.org> wrote:

> Let us know how it went, I'm pretty interested to see how well our stuff
> integrates with Spark, especially since Spark is in the process of
> joining Apache.
>
> -sebastian
>
> On 19.06.2013 03:14, Dmitriy Lyubimov wrote:
> > Hello,
> >
> > so I finally got around to actually doing it.
> >
> > I want to get Mahout sparse vectors and matrices (DRMs) and rebuild some
> > solvers using Spark and Bagel/Scala.
> >
> > I also want to use in-core solvers that run directly on Mahout.
> >
> > Question #1: which Mahout artifacts should I import if I don't want to
> > pull in the Hadoop dependencies? Is there even such a separation of
> > code? I know mahout-math seems to try to avoid being Hadoop-specific,
> > but I'm not sure if that is followed strictly.
> >
> > Question #2: which in-core solvers are available for Mahout matrices? I
> > know there's SSVD, probably Cholesky, is there something else? In
> > particular, I need to solve linear systems; I guess Cholesky should be
> > sufficient for just that?
> >
> > Question #3: why did we try to import Colt solvers rather than actually
> > depend on Colt in the first place? Why did we not accept Colt's sparse
> > matrices, and create native ones instead?
> >
> > Colt seems to have a notion of sparse in-core matrices too and seems like
> > a well-rounded solution. However, it doesn't seem to be actively
> > supported, whereas I know Mahout has seen continued enhancements to the
> > in-core matrix support.
> >
> > Thanks in advance
> > -Dmitriy
> >
>
>
