I think that contrib modules would be very interesting. Specifically, a good Scala DSL, Pig integration and so on.
On Mon, Jun 24, 2013 at 9:55 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> On Mon, Jun 24, 2013 at 1:46 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>
> > That looks great Dmitriy!
> >
> > The thing about Breeze that drives the complexity in it is partly specialization for Float, Double and Int matrices, and partly getting the syntax to "just work" for all combinations of matrix types and operands etc. Mostly it does "just work", but occasionally not.
>
> Yes, I noticed that, but since I am wrapping Mahout matrices, there's only a choice of double-filled matrices and vectors. Actually, I would argue that's the way it is supposed to be, in the interest of the KISS principle. I am not sure I see a value in "int" matrices for any problem I ever worked on, and skipping on precision to save space is an even more far-fetched notion, as in real life numbers don't take as much space as their pre-vectorized features and annotations. In fact, model training parts and linear algebra are not where the memory bottleneck seems to fatten up at all in my experience. There's often exponentially growing CPU-bound behavior, yes, but not RAM.
>
> > I am surprised that dense * sparse matrix doesn't work, but I guess as I previously mentioned the sparse matrix support is a bit shaky.
>
> This is solely based on eye-balling the trait architecture. I did not actually attempt it. But there's no single unifying trait, for sure.
>
> > David Hall is pretty happy to both look into enhancements and help out for contributions (e.g. I'm hoping to find time to look into a proper Diagonal matrix implementation and he was very helpful with pointers etc.), so please do drop things into the Google group mailing list. Hopefully wider adoption, especially by this type of community, will drive Breeze development.
> >
> > On another note, I also really like Scalding's matrix API, so Scala-ish wrappers for Mahout would be cool - another pet project of mine is a port of that API to Spark too :)
> >
> > N
> >
> > On Mon, Jun 24, 2013 at 10:25 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> >
> > > Yeah, I'm totally on board with a pretty Scala DSL on top of some of our stuff.
> > >
> > > In particular, I've been experimenting with wrapping the DistributedRowMatrix in a Scalding wrapper, so we can do things like
> > >
> > >   val matrixAsTypedPipe =
> > >     DistributedRowMatrixPipe(new DistributedRowMatrix(numRows, numCols, path, conf))
> > >
> > >   // e.g. L1 normalize:
> > >   matrixAsTypedPipe.map { case (idx, v) => (idx, v.normalize(1)) }
> > >     .write(new DistributedRowMatrixPipe(outputPath, conf))
> > >
> > >   // and anything else you would want to do with a Scalding TypedPipe[(Int, Vector)]
> > >
> > > Currently I've been doing this with a package structure directly in Mahout, in:
> > >
> > >   mahout/contrib/scalding
> > >
> > > What do people think about having this be something real, after 0.8 goes out? Are we ready for contrib modules which fold in diverse external projects in new ways? Integrating directly with Pig and Scalding is a bit too wide of a tent for Mahout core, but putting these integrations in entirely new projects is maybe a bit too far away.
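For what it's worth, here is a rough sketch of the shape such a wrapper could take. DistributedRowMatrixPipe is the hypothetical name from Jake's example above, not an existing Mahout or Scalding class, and the sequence-file I/O is deliberately left as stubs:

// Rough sketch only: DistributedRowMatrixPipe is hypothetical, and the HDFS
// sequence-file I/O behind fromDrm/write is intentionally left unimplemented here.
import com.twitter.scalding.typed.TypedPipe
import org.apache.mahout.math.Vector
import org.apache.mahout.math.hadoop.DistributedRowMatrix

case class DistributedRowMatrixPipe(rows: TypedPipe[(Int, Vector)]) {

  // keep the (rowIndex, rowVector) shape while delegating to the usual typed-pipe ops
  def mapRows(f: (Int, Vector) => (Int, Vector)): DistributedRowMatrixPipe =
    DistributedRowMatrixPipe(rows.map { case (idx, v) => f(idx, v) })

  // writing back out would go through a SequenceFile of (IntWritable, VectorWritable)
  def write(outputPath: String): Unit = ???
}

object DistributedRowMatrixPipe {
  // reading the rows of an existing DistributedRowMatrix into a TypedPipe
  def fromDrm(drm: DistributedRowMatrix): DistributedRowMatrixPipe = ???
}

// usage in the spirit of the example above, e.g. L1-normalizing every row:
//   DistributedRowMatrixPipe.fromDrm(drm)
//     .mapRows((idx, v) => (idx, v.normalize(1)))
//     .write(outputPath)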
> > > On Mon, Jun 24, 2013 at 11:30 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > >
> > > > Dmitriy,
> > > >
> > > > This is very pretty.
> > > >
> > > > On Mon, Jun 24, 2013 at 6:48 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> > > >
> > > > > OK, so I was fairly easily able to build some DSL for our matrix manipulation (similar to Breeze) in Scala.
> > > > >
> > > > > Inline matrix or vector:
> > > > >
> > > > >   val a = dense((1, 2, 3), (3, 4, 5))
> > > > >
> > > > >   val b: Vector = (1, 2, 3)
> > > > >
> > > > > Block views and assignments (element/row/vector/block/block of row or vector):
> > > > >
> > > > >   a(::, 0)
> > > > >   a(1, ::)
> > > > >   a(0 to 1, 1 to 2)
> > > > >
> > > > > Assignments:
> > > > >
> > > > >   a(0, ::) := (3, 5, 7)
> > > > >   a(0, 0 to 1) := (3, 5)
> > > > >   a(0 to 1, 0 to 1) := dense((1, 1), (2, 2.5))
> > > > >
> > > > > Operators:
> > > > >
> > > > >   // hadamard
> > > > >   val c = a * b
> > > > >   a *= b
> > > > >
> > > > >   // matrix mul
> > > > >   val m = a %*% b
> > > > >
> > > > > and a bunch of other little things like sum, mean, colMeans etc. That much is easy.
> > > > >
> > > > > Also stuff like the ones found in Breeze, along the lines of
> > > > >
> > > > >   val (u, v, s) = svd(a)
> > > > >
> > > > >   diag((1, 2, 3))
> > > > >
> > > > > and Cholesky in similar ways.
> > > > >
> > > > > I don't have "inline" initialization for sparse things (yet), simply because I don't need them, but of course all regular Java constructors and methods are retained; all that is just syntactic sugar in the spirit of DSLs, in the hope of making things a bit more readable.
> > > > >
> > > > > My (very little, and very insignificantly opinionated, really) criticism of Breeze in this context is its inconsistency between dense and sparse representations, namely the lack of consistent overarching trait(s), so that building structure-agnostic solvers like Mahout's Cholesky solver is impossible, as is cross-type matrix use (say, the way I understand it, it is pretty much impossible to multiply a sparse matrix by a dense matrix).
> > > > >
> > > > > I suspect these problems stem from the fact that the authors for whatever reason decided to hardwire dense things to JBlas solvers, whereas I don't believe matrix storage structures must be. But these problems do appear to be serious enough for me to ignore Breeze for now. If I decide to plug in JBlas dense solvers, I guess I will just have them as yet another top-level routine interface taking any Matrix, e.g.
> > > > >
> > > > >   val (u, v, s) = svd(m, jblas = true)
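For contrast, the cross-type case that is awkward in Breeze is already unremarkable with the stock Mahout Matrix interface. A minimal sketch, calling the plain mahout-math Java API from Scala (no DSL involved):

// Minimal sketch: dense and sparse matrices share the org.apache.mahout.math.Matrix
// interface, so times() works across representations.
import org.apache.mahout.math.{DenseMatrix, Matrix, SparseRowMatrix}

object DenseTimesSparse extends App {
  // 2 x 3 dense matrix
  val dense: Matrix = new DenseMatrix(Array(
    Array(1.0, 2.0, 3.0),
    Array(3.0, 4.0, 5.0)))

  // 3 x 2 sparse matrix with a couple of non-zeros
  val sparse: Matrix = new SparseRowMatrix(3, 2)
  sparse.setQuick(0, 0, 1.0)
  sparse.setQuick(2, 1, 2.0)

  // cross-representation product: a 2 x 2 result, here rows (1, 6) and (3, 10)
  val product = dense.times(sparse)
  println(product)
}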
> > > > > On Sun, Jun 23, 2013 at 7:08 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> > > > >
> > > > > > Thank you.
> > > > > >
> > > > > > On Jun 23, 2013 6:16 PM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
> > > > > >
> > > > > > > I think that this contract has migrated a bit from the first starting point.
> > > > > > >
> > > > > > > My feeling is that there is a de facto contract now that the matrix slice is a single row.
> > > > > > >
> > > > > > > On Jun 23, 2013, at 16:32, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> > > > > > >
> > > > > > > > What does Matrix.iterateAll() contractually do? Practically it seems to be row-wise iteration for some implementations, but that doesn't seem to be contractually stated in the javadoc. What is MatrixSlice if it is neither a row nor a column? How can I tell what exactly it is I am iterating over?
> > > > > > > >
> > > > > > > > On Jun 19, 2013 12:21 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > On Wed, Jun 19, 2013 at 5:29 AM, Jake Mannix <jake.man...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > > Question #2: which in-core solvers are available for Mahout matrices? I know there's SSVD, probably Cholesky, is there something else? In particular, I need to be solving linear systems; I guess Cholesky should be equipped enough to do just that?
> > > > > > > > > > >
> > > > > > > > > > > Question #3: why did we try to import Colt solvers rather than actually depend on Colt in the first place? Why did we not accept Colt's sparse matrices and create native ones instead?
> > > > > > > > > > >
> > > > > > > > > > > Colt seems to have a notion of sparse in-core matrices too, and seems like a well-rounded solution. However, it doesn't seem to be actively supported, whereas I know Mahout has seen continued enhancements to the in-core matrix support.
> > > > > > > > > >
> > > > > > > > > > Colt was totally abandoned, and I talked to the original author and he blessed its adoption. When we pulled it in, we found it was woefully under-tested, and tried our best to hook it in with proper tests and use APIs that fit with the use cases we had. Plus, we already had the start of some linear APIs (i.e. the Vector interface), and dropping the API completely seemed not terribly worth it at the time.
> > > > > > > > >
> > > > > > > > > There was even more to it than that.
> > > > > > > > >
> > > > > > > > > Colt was under-tested and there have been warts that had to be pulled out in much of the code.
> > > > > > > > >
> > > > > > > > > But, worse than that, Colt's matrix and vector structure was a real bugger to extend or change. It also had all kinds of cruft where it pretended to support matrices of things, but in fact only supported matrices of doubles and floats.
> > > > > > > > >
> > > > > > > > > So using Colt as it was (and is, since it is largely abandoned) was a non-starter.
> > > > > > > > >
> > > > > > > > > As far as in-memory solvers, we have:
> > > > > > > > >
> > > > > > > > > 1) LR decomposition (tested and kinda fast)
> > > > > > > > > 2) Cholesky decomposition (tested)
> > > > > > > > > 3) SVD (tested)
> > >
> > > --
> > > -jake
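On the linear-systems part of Question #2, a minimal sketch of what that looks like against the stock mahout-math classes from Scala, assuming the JAMA-style QRDecomposition (CholeskyDecomposition would be the analogue for an SPD matrix) and the de facto "a MatrixSlice is a row" contract discussed above; exact method names are worth double-checking against the javadoc:

// Minimal sketch: solve A x = b with QRDecomposition and iterate rows via iterateAll().
import org.apache.mahout.math.{DenseMatrix, DenseVector, QRDecomposition}
import scala.collection.JavaConversions._

object SolveAndIterate extends App {
  // a small 3 x 3 system A x = b
  val a = new DenseMatrix(Array(
    Array(4.0, 1.0, 0.0),
    Array(1.0, 3.0, 1.0),
    Array(0.0, 1.0, 2.0)))

  val b = new DenseMatrix(3, 1)
  b.assignColumn(0, new DenseVector(Array(1.0, 2.0, 3.0)))

  // exact / least-squares solve via QR; Cholesky would also do for an SPD matrix like this
  val x = new QRDecomposition(a).solve(b)
  println(x.viewColumn(0))

  // iterateAll(): each MatrixSlice is (de facto) a row index plus its row vector
  for (slice <- a.iterateAll())
    println(s"row ${slice.index()} has L1 norm ${slice.vector().norm(1)}")
}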