Thank you, Sebastian. ALS flavours are indeed one of my first pragmatic goals. I have also done a few customizations for my employer, so I will probably pursue those first. In particular, I do use Koren-Volinsky confidence weighting, but I assume we still work with sparse observations, so the sparse algebra of ALS-WR still applies. I provide a fold-in routine for users with fewer than N observations and for brand-new users, which adds an incremental approach to learning. I also spend a lot of time on adaptive validation of weights and regularization (which is why my R prototypes are no longer sufficient here; in fact, my prototype can no longer take the load of a midsize customer).
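To make the fold-in idea concrete, here is a minimal sketch in plain Python. It is not the code discussed in this thread. It assumes the usual implicit-feedback formulation with confidence c = 1 + alpha * r, a plain ridge term lam * I, and item factors Y held fixed. The names gram, cholesky_solve, fold_in_user, alpha and lam are all hypothetical. The point it illustrates is that Y^T Y is computed once, and each user only touches the rows of Y for the items actually observed.

```python
from math import sqrt


def gram(Y):
    """Return Y^T Y for a list of k-dimensional item-factor rows."""
    k = len(Y[0])
    G = [[0.0] * k for _ in range(k)]
    for row in Y:
        for a in range(k):
            for b in range(k):
                G[a][b] += row[a] * row[b]
    return G


def cholesky_solve(A, b):
    """Solve A x = b for a symmetric positive-definite A via A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]
    # Forward substitution: L z = b
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i]
    # Back substitution: L^T x = z
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


def fold_in_user(Y, YtY, observations, alpha=40.0, lam=0.1):
    """Compute factors for one user with the item factors Y held fixed.

    observations maps item index -> raw strength r (sparse).
    Assumed model: confidence c = 1 + alpha * r, preference 1 for observed
    items, so we solve (Y^T Y + Y^T (C - I) Y + lam * I) x = Y^T C p.
    """
    k = len(YtY)
    A = [row[:] for row in YtY]          # start from the shared Y^T Y
    b = [0.0] * k
    for item, r in observations.items():  # only observed items are visited
        c = 1.0 + alpha * r
        y = Y[item]
        for i in range(k):
            b[i] += c * y[i]
            for j in range(k):
                A[i][j] += (c - 1.0) * y[i] * y[j]
    for i in range(k):
        A[i][i] += lam                    # plain ridge term (an assumption)
    return cholesky_solve(A, b)


Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy item factors, k = 2
YtY = gram(Y)                              # computed once, reused per user
new_user = fold_in_user(Y, YtY, {0: 1.0}, alpha=1.0, lam=1.0)
```

An ALS-WR style variant would scale the ridge term by the user's number of observations instead of using a fixed lam; that is a one-line change in the last loop of fold_in_user.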

On Wed, Jun 19, 2013 at 3:54 AM, Sebastian Schelter <s...@apache.org> wrote:

> I have a JBlas version of our ALS solving code lying around [1], feel
> free to use it. Would also be interested to see the Spark port.
>
> -sebastian
>
> [1] https://github.com/sscdotopen/mahout-als/blob/jblas/math/src/main/java/org/apache/mahout/math/als/JBlasAlternatingLeastSquaresSolver.java
>
> On 19.06.2013 12:50, Nick Pentreath wrote:
> > Hi Dmitriy
> >
> > I'd be interested to look at helping with this potentially (time
> > permitting).
> >
> > I've recently been working on a port of Mahout's ALS implementation to
> > Spark. I spent a bit of time thinking about how much of mahout-math to
> > use.
> >
> > For now I found that using the Breeze linear algebra library I could get
> > what I needed, ie DenseVector, SparseVector, DenseMatrix, all with
> > in-memory multiply and solve that is backed by JBLAS (so very quick if
> > you have the native libraries active). It comes with very nice
> > "Matlab-like" syntax in Scala. So it ended up being a bit of a rewrite
> > rather than a port of the Mahout code.
> >
> > The sparse matrix support is however a bit... well, sparse :) There is a
> > CSC matrix and some operations but Sparse SVD is not there, and the
> > solvers I think are not there just yet (in-core).
> >
> > But of course the linear algebra objects are not easily usable from Java
> > due to the syntax and the heavy use of implicits. So for a fully
> > functional Java API version that can use the vectors/matrices directly,
> > the options would be to create a Java bridge to the Breeze
> > vectors/matrices, or to instead look at using mahout-math to drive the
> > linear algebra. In that case the Scala syntax would not be as nice, but
> > some sugar can be added again using implicits for common operations
> > (I've tested this a bit and it can work and probably be made reasonably
> > efficient if copies are avoided in the implicit conversion).
> >
> > Anyway, I'd be happy to offer assistance.
> >
> > Nick
> >
> > On Wed, Jun 19, 2013 at 8:09 AM, Sebastian Schelter <s...@apache.org> wrote:
> >
> >> Let us know how it went, I'm pretty interested to see how well our stuff
> >> integrates with Spark, especially since Spark is in the process of
> >> joining Apache.
> >>
> >> -sebastian
> >>
> >> On 19.06.2013 03:14, Dmitriy Lyubimov wrote:
> >>> Hello,
> >>>
> >>> so I finally got around to actually doing it.
> >>>
> >>> I want to get Mahout sparse vectors and matrices (DRMs) and rebuild
> >>> some solvers using Spark and Bagel/Scala.
> >>>
> >>> I also want to use in-core solvers that run directly on Mahout.
> >>>
> >>> Question #1: which Mahout artifacts should be imported if I don't
> >>> want to pull in the Hadoop dependencies? Is there even such a
> >>> separation of code? I know mahout-math seems to try to avoid being
> >>> Hadoop-specific, but I am not sure if that is followed strictly.
> >>>
> >>> Question #2: which in-core solvers are available for Mahout matrices?
> >>> I know there's SSVD, probably Cholesky, is there something else? In
> >>> particular, I need to be solving linear systems; I guess Cholesky
> >>> should be equipped enough to do just that?
> >>>
> >>> Question #3: why did we try to import Colt solvers rather than
> >>> actually depend on Colt in the first place? Why did we not accept
> >>> Colt's sparse matrices and create native ones instead?
> >>>
> >>> Colt seems to have a notion of sparse in-core matrices too and seems
> >>> like a well-rounded solution. However, it doesn't seem to be actively
> >>> supported, whereas I know Mahout experienced continued enhancements
> >>> to the in-core matrix support.
> >>>
> >>> Thanks in advance
> >>> -Dmitriy