Re: Mahout vectors/matrices/solvers on spark

Dmitriy Lyubimov Wed, 19 Jun 2013 10:46:19 -0700

Thank you, Sebastian.

Actually, ALS flavours are indeed one of my first pragmatic goals -- I have
also done a few customizations for my employer -- so I will probably pursue
those customizations first. In particular, I use Koren-Volinsky confidence
weighting, but assume we still work with sparse observations, so the sparse
algebra of ALS-WR still applies. I provide a fold-in routine for users with
fewer than N observations and for brand-new users, which adds an incremental
approach to learning. I also spend a lot of time on adaptive validation of
weights and regularization (which is why my R prototypes are no longer
sufficient here; in fact, my prototype can no longer handle the load of a
midsize customer).
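The customizations themselves are not shown in this thread, so what follows is only a minimal sketch, under stated assumptions, of how a fold-in step with confidence weighting over sparse observations could look. It assumes the confidence form c_ui = 1 + alpha * r_ui from Hu, Koren and Volinsky, a preference of 1 for every observed item, and ALS-WR's weighted-lambda regularization (lambda times the user's observation count). Unlike the original implicit-feedback formulation, it sums over observed items only, matching the "sparse observations" assumption above. With the item factors y_i held fixed, the user's vector x_u solves (sum_i c_ui y_i y_i^T + lambda n_u I) x_u = sum_i c_ui y_i. That system is symmetric positive definite, which is also why Cholesky (Question #2 further down) suffices for it. The names FoldIn, foldInUser and choleskySolve are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: fold in one user against fixed item factors by solving
 * confidence-weighted, ALS-WR-regularized normal equations over the user's
 * observed items only. Assumed model, not the thread author's code.
 */
final class FoldIn {

    /**
     * @param itemFactors   k-dimensional factor vector per item (held fixed)
     * @param observedItems indices of the items this user interacted with
     * @param observations  raw observation strength r_ui per observed item
     * @param alpha         confidence scaling, c_ui = 1 + alpha * r_ui (assumed)
     * @param lambda        regularization, scaled by observation count (ALS-WR)
     */
    static double[] foldInUser(double[][] itemFactors, int[] observedItems,
                               double[] observations, double alpha, double lambda) {
        int k = itemFactors[0].length;
        double[][] a = new double[k][k];
        double[] b = new double[k];
        for (int n = 0; n < observedItems.length; n++) {
            double[] y = itemFactors[observedItems[n]];
            double c = 1.0 + alpha * observations[n]; // confidence weight
            for (int i = 0; i < k; i++) {
                b[i] += c * y[i]; // preference p_ui = 1 for observed items
                for (int j = 0; j < k; j++) {
                    a[i][j] += c * y[i] * y[j];
                }
            }
        }
        // Weighted-lambda regularization; with zero observations A stays
        // singular and choleskySolve rejects it.
        double reg = lambda * observedItems.length;
        for (int i = 0; i < k; i++) {
            a[i][i] += reg;
        }
        return choleskySolve(a, b);
    }

    /** Solves A x = b for symmetric positive-definite A via A = L L^T. */
    static double[] choleskySolve(double[][] a, double[] b) {
        int k = b.length;
        double[][] l = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = a[i][j];
                for (int m = 0; m < j; m++) {
                    sum -= l[i][m] * l[j][m];
                }
                if (i == j) {
                    if (sum <= 0.0) {
                        throw new IllegalArgumentException("matrix is not positive definite");
                    }
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        double[] z = new double[k]; // forward substitution: L z = b
        for (int i = 0; i < k; i++) {
            double sum = b[i];
            for (int m = 0; m < i; m++) {
                sum -= l[i][m] * z[m];
            }
            z[i] = sum / l[i][i];
        }
        double[] x = new double[k]; // back substitution: L^T x = z
        for (int i = k - 1; i >= 0; i--) {
            double sum = z[i];
            for (int m = i + 1; m < k; m++) {
                sum -= l[m][i] * x[m];
            }
            x[i] = sum / l[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] itemFactors = {{1.0, 0.0}, {0.0, 1.0}, {0.5, 0.5}};
        double[] x = foldInUser(itemFactors, new int[] {0, 2},
                new double[] {3.0, 1.0}, 1.0, 0.1);
        System.out.println(Arrays.toString(x));
    }
}
```

For the toy data in main (items 0 and 2 observed, alpha = 1, lambda = 0.1), the system is [[4.7, 0.5], [0.5, 0.7]] x = (5, 1), so x is approximately (0.9868, 0.7237). Only a k-by-k system is formed per user, so folding in a new user does not touch the item factors or any other user.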



On Wed, Jun 19, 2013 at 3:54 AM, Sebastian Schelter <[email protected]> wrote:

> I have a JBlas version of our ALS solving code lying around [1]; feel
> free to use it. I would also be interested to see the Spark port.
>
> -sebastian
>
>
> [1]
>
> https://github.com/sscdotopen/mahout-als/blob/jblas/math/src/main/java/org/apache/mahout/math/als/JBlasAlternatingLeastSquaresSolver.java
>
> On 19.06.2013 12:50, Nick Pentreath wrote:
> > Hi Dmitriy
> >
> > I'd potentially be interested in helping with this (time permitting).
> >
> > I've recently been working on a port of Mahout's ALS implementation to
> > Spark. I spent a bit of time thinking about how much of mahout-math to
> > use.
> >
> > For now, I found that using the Breeze linear algebra library I could
> > get what I needed, i.e. DenseVector, SparseVector and DenseMatrix, all
> > with in-memory multiply and solve backed by JBLAS (so very quick if you
> > have the native libraries active). It comes with very nice "Matlab-like"
> > syntax in Scala. So it ended up being a bit of a rewrite rather than a
> > port of the Mahout code.
> >
> > The sparse matrix support is, however, a bit... well, sparse :) There is
> > a CSC matrix and some operations, but sparse SVD is not there, and I
> > think the in-core solvers are not there just yet.
> >
> > But of course the linear algebra objects are not easily usable from Java
> > due to the syntax and the heavy use of implicits. So for a fully
> > functional Java API version that can use the vectors/matrices directly,
> > the options would be to create a Java bridge to the Breeze
> > vectors/matrices, or to instead use mahout-math to drive the linear
> > algebra. In that case the Scala syntax would not be as nice, but some
> > sugar can be added back using implicits for common operations (I've
> > tested this a bit; it can work and can probably be made reasonably
> > efficient if copies are avoided in the implicit conversion).
> >
> > Anyway, I'd be happy to offer assistance.
> >
> > Nick
> >
> >
> > On Wed, Jun 19, 2013 at 8:09 AM, Sebastian Schelter <[email protected]> wrote:
> >
> >> Let us know how it goes; I'm pretty interested to see how well our
> >> stuff integrates with Spark, especially since Spark is in the process
> >> of joining Apache.
> >>
> >> -sebastian
> >>
> >> On 19.06.2013 03:14, Dmitriy Lyubimov wrote:
> >>> Hello,
> >>>
> >>> So I finally got around to actually doing it.
> >>>
> >>> I want to take Mahout sparse vectors and matrices (DRMs) and rebuild
> >>> some solvers using Spark and Bagel/Scala.
> >>>
> >>> I also want to use in-core solvers that run directly on Mahout.
> >>>
> >>> Question #1: which Mahout artifacts should I import if I don't want to
> >>> pick up the Hadoop dependencies? Is there even such a separation of
> >>> code? I know mahout-math seems to try to avoid being Hadoop-specific,
> >>> but I'm not sure whether that is followed strictly.
> >>>
> >>> Question #2: which in-core solvers are available for Mahout matrices? I
> >>> know there's SSVD, probably Cholesky; is there something else? In
> >>> particular, I need to solve linear systems; I guess Cholesky should be
> >>> equipped enough to do just that?
> >>>
> >>> Question #3: why did we try to import Colt solvers rather than actually
> >>> depend on Colt in the first place? Why did we not accept Colt's sparse
> >>> matrices, and create native ones instead?
> >>>
> >>> Colt seems to have a notion of sparse in-core matrices too and seems
> >>> like a well-rounded solution. However, it doesn't seem to be actively
> >>> supported, whereas I know Mahout has seen continued enhancements to its
> >>> in-core matrix support.
> >>>
> >>> Thanks in advance
> >>> -Dmitriy
> >>>
> >>
> >>
> >
>
>
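Nick notes above that Breeze offers a CSC (compressed sparse column) matrix. For readers unfamiliar with that layout, here is a generic plain-Java sketch of it with matrix-vector and transposed matrix-vector products. It is not Breeze's or Mahout's API; the class CscMatrix and its methods are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical minimal CSC matrix. The non-zeros of column j are
 * values[colPtr[j]] .. values[colPtr[j + 1] - 1], and rowIdx holds the row
 * index of each stored value.
 */
final class CscMatrix {
    final int rows;
    final int cols;
    final int[] colPtr;    // length cols + 1; colPtr[cols] == number of non-zeros
    final int[] rowIdx;    // length nnz
    final double[] values; // length nnz

    CscMatrix(int rows, int cols, int[] colPtr, int[] rowIdx, double[] values) {
        if (colPtr.length != cols + 1 || rowIdx.length != values.length
                || colPtr[cols] != values.length) {
            throw new IllegalArgumentException("inconsistent CSC arrays");
        }
        this.rows = rows;
        this.cols = cols;
        this.colPtr = colPtr;
        this.rowIdx = rowIdx;
        this.values = values;
    }

    /** Computes A x by scattering each column's non-zeros into the result. */
    double[] times(double[] x) {
        if (x.length != cols) {
            throw new IllegalArgumentException("x must have length " + cols);
        }
        double[] y = new double[rows];
        for (int j = 0; j < cols; j++) {
            for (int p = colPtr[j]; p < colPtr[j + 1]; p++) {
                y[rowIdx[p]] += values[p] * x[j];
            }
        }
        return y;
    }

    /** Computes A^T x: entry j is the dot product of stored column j with x. */
    double[] transposeTimes(double[] x) {
        if (x.length != rows) {
            throw new IllegalArgumentException("x must have length " + rows);
        }
        double[] y = new double[cols];
        for (int j = 0; j < cols; j++) {
            double sum = 0.0;
            for (int p = colPtr[j]; p < colPtr[j + 1]; p++) {
                sum += values[p] * x[rowIdx[p]];
            }
            y[j] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        // A = [[1, 0, 2],
        //      [0, 3, 0]] stored column by column.
        CscMatrix a = new CscMatrix(2, 3, new int[] {0, 1, 2, 3},
                new int[] {0, 1, 0}, new double[] {1.0, 3.0, 2.0});
        System.out.println(Arrays.toString(a.times(new double[] {1.0, 1.0, 1.0})));
        System.out.println(Arrays.toString(a.transposeTimes(new double[] {1.0, 2.0})));
    }
}
```

Running main prints [3.0, 3.0] and then [1.0, 6.0, 2.0]. In this layout A^T x is a sequence of contiguous per-column dot products, while A x scatters into the output; a row-oriented layout such as CSR has the opposite trade-off.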

Re: Mahout vectors/matrices/solvers on spark
