On Mon, Jun 24, 2013 at 1:24 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> Yeah, I'm totally on board with a pretty scala DSL on top of some of our
> stuff.
>
> In particular, I've been experimenting with wrapping the DistributedRowMatrix
> in a scalding wrapper, so we can do things like
>
>   val matrixAsTypedPipe =
>     DistributedRowMatrixPipe(new DistributedRowMatrix(numRows, numCols, path, conf))
>
>   // e.g. L1 normalize:
>   matrixAsTypedPipe.map((idx, v) : (Int, Vector) => (idx, v.normalize(1)))
>                    .write(new DistributedRowMatrixPipe(outputPath, conf))
>
>   // and anything else you would want to do with a scalding TypedPipe[Int, Vector]
>
> Currently I've been doing this with a package structure directly in Mahout, in:
>
>   mahout/contrib/scalding
>
> What do people think about having this be something real, after 0.8 goes out?
> Are we ready for contrib modules which fold in diverse external projects in
> new ways? Integrating directly with pig and scalding is a bit too wide of a
> tent for Mahout core, but putting these integrations in entirely new projects
> is maybe a bit too far away.

+1, I've been putting this into the module mahout-math-scala for the past
couple of days on Mahout itself, and I keep merging with trunk (here:
https://github.com/dlyubimov/mahout-commits/tree/dev-0.8.x-scala/math-scala),
in case anyone wants to look. Not much to look at at the moment, though, I
guess.

Since it is seamlessly compiled by Maven and all the Scala stuff is readily
available in the Maven repo, I don't see any operational reason not to include
it in the post-0.8 tree. However, this probably is not terribly useful on its
own until I get a Spark-based distributed solver collection rolled into
another module that depends on it (well, it may turn out to be an ugly battle
with our bosses to contribute it). But I will probably have some
straightforward Hu-Koren-Volinsky Spark-based stuff on top of it in a couple
of days.
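[Aside: the row-wise transform in the quoted snippet can be sketched in plain Scala. Scalding and Mahout are not on the classpath here, so a `Seq` of (row index, row) pairs stands in for the `TypedPipe[(Int, Vector)]`, and a row is just an `Array[Double]`; `DistributedRowMatrixPipe` and `Vector` are the names assumed from the thread.]

```scala
// L1-normalize a row: divide each entry by the sum of absolute values,
// mimicking Mahout's v.normalize(1).
def normalizeL1(v: Array[Double]): Array[Double] = {
  val norm = v.map(math.abs).sum
  if (norm == 0.0) v else v.map(_ / norm)
}

// The "pipe" is a Seq of (row index, row vector) pairs; map over it the same
// way the quoted example maps over the scalding TypedPipe[(Int, Vector)].
val matrixAsPairs: Seq[(Int, Array[Double])] =
  Seq(0 -> Array(1.0, 2.0, 3.0), 1 -> Array(3.0, 4.0, 5.0))

val l1Normalized = matrixAsPairs.map { case (idx, v) => (idx, normalizeL1(v)) }
```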
> > On Mon, Jun 24, 2013 at 11:30 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> > Dmitriy,
> >
> > This is very pretty.
> >
> > On Mon, Jun 24, 2013 at 6:48 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >
> > > Ok, so I was fairly easily able to build some DSL for our matrix
> > > manipulation (similar to Breeze) in scala.
> > >
> > > Inline matrix or vector:
> > >
> > >   val a = dense((1, 2, 3), (3, 4, 5))
> > >
> > >   val b: Vector = (1, 2, 3)
> > >
> > > Block views and assignments (element/row/vector/block/block of row or
> > > vector):
> > >
> > >   a(::, 0)
> > >   a(1, ::)
> > >   a(0 to 1, 1 to 2)
> > >
> > > Assignments:
> > >
> > >   a(0, ::) := (3, 5, 7)
> > >   a(0, 0 to 1) := (3, 5)
> > >   a(0 to 1, 0 to 1) := dense((1, 1), (2, 2.5))
> > >
> > > Operators:
> > >
> > >   // hadamard
> > >   val c = a * b
> > >   a *= b
> > >
> > >   // matrix mul
> > >   val m = a %*% b
> > >
> > > and a bunch of other little things like sum, mean, colMeans, etc. That
> > > much is easy. Also stuff like the ones found in Breeze, along the lines of
> > >
> > >   val (u, v, s) = svd(a)
> > >
> > >   diag((1, 2, 3))
> > >
> > > and Cholesky in similar ways.
> > >
> > > I don't have "inline" initialization for sparse things (yet), simply
> > > because I don't need them, but of course all regular Java constructors
> > > and methods are retained; all of this is just syntactic sugar in the
> > > spirit of DSLs, in the hope of making things a bit more readable.
> > >
> > > My (very little, and very insignificantly opinionated, really) criticism
> > > of Breeze in this context is its inconsistency between dense and sparse
> > > representations, namely, the lack of consistent overarching trait(s), so
> > > that building structure-agnostic solvers like Mahout's Cholesky solver is
> > > impossible, as is cross-type matrix use (say, the way I understand it, it
> > > is pretty much impossible to multiply a sparse matrix by a dense matrix).
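[Aside: the slicing/assignment sugar quoted above can be approximated in plain Scala to show the mechanics. The real mahout-math-scala DSL wraps Mahout's Matrix; the `Dense` class and `dense` factory below are illustrative stand-ins backed by a mutable `Array[Array[Double]]`, not Mahout code.]

```scala
// A toy dense matrix with range-based block views, row assignment, and a
// Hadamard product, mimicking a(0 to 1, 1 to 2), a(0, ::) := (...), and a * b.
class Dense(val data: Array[Array[Double]]) {
  // read a single element
  def apply(r: Int, c: Int): Double = data(r)(c)
  // copy out a block: rows and cols given as inclusive ranges
  def apply(rows: Range, cols: Range): Dense =
    new Dense(rows.map(r => cols.map(c => data(r)(c)).toArray).toArray)
  // assign a whole row, mimicking a(0, ::) := (3, 5, 7)
  def setRow(r: Int, vals: Double*): Unit =
    vals.zipWithIndex.foreach { case (v, c) => data(r)(c) = v }
  // Hadamard (element-wise) product, mimicking a * b
  def *(other: Dense): Dense =
    new Dense(data.zip(other.data).map { case (x, y) =>
      x.zip(y).map { case (p, q) => p * q } })
}

def dense(rows: Seq[Double]*): Dense = new Dense(rows.map(_.toArray).toArray)

val a = dense(Seq(1.0, 2.0, 3.0), Seq(3.0, 4.0, 5.0))
a.setRow(0, 3, 5, 7)           // stands in for a(0, ::) := (3, 5, 7)
val block = a(0 to 1, 1 to 2)  // stands in for the block view a(0 to 1, 1 to 2)
```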
> > > I suspect these problems stem from the fact that the authors for
> > > whatever reason decided to hardwire dense things to JBlas solvers,
> > > whereas I don't believe matrix storage structures must be. But these
> > > problems do appear to be serious enough for me to ignore Breeze for now.
> > > If I decide to plug in jblas dense solvers, I guess I will just have
> > > them as yet another top-level routine interface taking any Matrix, e.g.
> > >
> > >   val (u, v, s) = svd(m, jblas = true)
> > >
> > > On Sun, Jun 23, 2013 at 7:08 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> > >
> > > > Thank you.
> > > > On Jun 23, 2013 6:16 PM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
> > > >
> > > >> I think that this contract has migrated a bit from the first starting
> > > >> point.
> > > >>
> > > >> My feeling is that there is a de facto contract now that the matrix
> > > >> slice is a single row.
> > > >>
> > > >> Sent from my iPhone
> > > >>
> > > >> On Jun 23, 2013, at 16:32, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> > > >>
> > > >> > What does Matrix.iterateAll() contractually do? Practically it seems
> > > >> > to be row-wise iteration for some implementations, but it doesn't
> > > >> > seem to state so contractually in the javadoc. What is a MatrixSlice
> > > >> > if it is neither a row nor a column? How can I tell what exactly it
> > > >> > is I am iterating over?
> > > >> > On Jun 19, 2013 12:21 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
> > > >> >
> > > >> >> On Wed, Jun 19, 2013 at 5:29 AM, Jake Mannix <jake.man...@gmail.com>
> > > >> >> wrote:
> > > >> >>
> > > >> >>>> Question #2: which in-core solvers are available for Mahout
> > > >> >>>> matrices? I know there's SSVD, probably Cholesky; is there
> > > >> >>>> something else?
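[Aside: the "consistent overarching trait" Dmitriy asks Breeze for can be sketched in a few lines of plain Scala: if dense and sparse matrices share one read interface, a single structure-agnostic multiply covers every dense/sparse combination. The names `Mat`, `DenseMat`, and `SparseMat` here are illustrative, not Mahout's or Breeze's.]

```scala
// One trait abstracting over storage; solvers written against it never need
// to know whether the backing store is dense or sparse.
trait Mat {
  def rows: Int
  def cols: Int
  def apply(r: Int, c: Int): Double
}

final class DenseMat(val data: Array[Array[Double]]) extends Mat {
  def rows: Int = data.length
  def cols: Int = data(0).length
  def apply(r: Int, c: Int): Double = data(r)(c)
}

// Sparse storage: only the nonzeros, keyed by (row, col).
final class SparseMat(val rows: Int, val cols: Int,
                      nz: Map[(Int, Int), Double]) extends Mat {
  def apply(r: Int, c: Int): Double = nz.getOrElse((r, c), 0.0)
}

// A single structure-agnostic multiply for any Mat x Mat combination,
// including the sparse-times-dense case the criticism above mentions.
def mmul(a: Mat, b: Mat): DenseMat = {
  require(a.cols == b.rows, "dimension mismatch")
  new DenseMat(Array.tabulate(a.rows, b.cols) { (i, j) =>
    (0 until a.cols).map(k => a(i, k) * b(k, j)).sum
  })
}

val d = new DenseMat(Array(Array(1.0, 2.0), Array(3.0, 4.0)))
val s = new SparseMat(2, 2, Map((0, 0) -> 1.0, (1, 1) -> 1.0)) // 2x2 identity
val p = mmul(d, s)  // dense x sparse works through the shared trait
```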
> In > > > >> >>>> paticular, i need to be solving linear systems, I guess > Cholesky > > > >> should > > > >> >>> be > > > >> >>>> equipped enough to do just that? > > > >> >>>> > > > >> >>>> Question #3: why did we try to import Colt solvers rather than > > > >> actually > > > >> >>>> depend on Colt in the first place? Why did we not accept Colt's > > > >> sparse > > > >> >>>> matrices and created native ones instead? > > > >> >>>> > > > >> >>>> Colt seems to have a notion of parse in-core matrices too and > > seems > > > >> >> like > > > >> >>> a > > > >> >>>> well-rounded solution. However, it doesn't seem like being > > actively > > > >> >>>> supported, whereas I know Mahout experienced continued > > enhancements > > > >> to > > > >> >>> the > > > >> >>>> in-core matrix support. > > > >> >>>> > > > >> >>> > > > >> >>> Colt was totally abandoned, and I talked to the original author > > and > > > he > > > >> >>> blessed it's adoption. When we pulled it in, we found it was > > > woefully > > > >> >>> undertested, > > > >> >>> and tried our best to hook it in with proper tests and use APIs > > that > > > >> fit > > > >> >>> with > > > >> >>> the use cases we had. Plus, we already had the start of some > > linear > > > >> apis > > > >> >>> (i.e. > > > >> >>> the Vector interface) and dropping the API completely seemed not > > > >> terribly > > > >> >>> worth it at the time. > > > >> >>> > > > >> >> > > > >> >> There was even more to it than that. > > > >> >> > > > >> >> Colt was under-tested and there have been warts that had to be > > pulled > > > >> out > > > >> >> in much of the code. > > > >> >> > > > >> >> But, worse than that, Colt's matrix and vector structure was a > real > > > >> bugger > > > >> >> to extend or change. It also had all kinds of cruft where it > > > >> pretended to > > > >> >> support matrices of things, but in fact only supported matrices > of > > > >> doubles > > > >> >> and floats. 
> > > >> >> So using Colt as it was (and is, since it is largely abandoned)
> > > >> >> was a non-starter.
> > > >> >>
> > > >> >> As far as in-memory solvers, we have:
> > > >> >>
> > > >> >> 1) LR decomposition (tested and kinda fast)
> > > >> >>
> > > >> >> 2) Cholesky decomposition (tested)
> > > >> >>
> > > >> >> 3) SVD (tested)

--
-jake
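[Aside: since Cholesky keeps coming up in the thread as the in-core route for linear systems, here is a self-contained sketch of that use: decompose a symmetric positive-definite A into L·Lᵀ, then solve A·x = b by forward and back substitution. Mahout ships its own CholeskyDecomposition class; the plain-Scala code below is an illustration of the technique, not Mahout's implementation.]

```scala
// Cholesky factorization of a symmetric positive-definite matrix: returns the
// lower-triangular L such that A = L * L^T.
def cholesky(a: Array[Array[Double]]): Array[Array[Double]] = {
  val n = a.length
  val l = Array.ofDim[Double](n, n)
  for (i <- 0 until n; j <- 0 to i) {
    val s = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
    l(i)(j) =
      if (i == j) math.sqrt(a(i)(i) - s)
      else (a(i)(j) - s) / l(j)(j)
  }
  l
}

// Solve A x = b given L: first L y = b (forward substitution), then
// L^T x = y (back substitution).
def choleskySolve(l: Array[Array[Double]], b: Array[Double]): Array[Double] = {
  val n = b.length
  val y = new Array[Double](n)
  for (i <- 0 until n)
    y(i) = (b(i) - (0 until i).map(k => l(i)(k) * y(k)).sum) / l(i)(i)
  val x = new Array[Double](n)
  for (i <- n - 1 to 0 by -1)
    x(i) = (y(i) - (i + 1 until n).map(k => l(k)(i) * x(k)).sum) / l(i)(i)
  x
}

val aMat = Array(Array(4.0, 2.0), Array(2.0, 3.0)) // symmetric positive-definite
val x = choleskySolve(cholesky(aMat), Array(8.0, 8.0)) // expect x = (1, 2)
```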