I think that contrib modules would be very interesting.  Specifically, a good
Scala DSL, Pig integration, and so on.


On Mon, Jun 24, 2013 at 9:55 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> On Mon, Jun 24, 2013 at 1:46 PM, Nick Pentreath <nick.pentre...@gmail.com
> >wrote:
>
> > That looks great, Dmitriy!
> >
> >
> > The thing about Breeze that drives the complexity in it is partly
> > specialization for Float, Double and Int matrices, and partly getting the
> > syntax to "just work" for all combinations of matrix types and operands
> > etc. mostly it does "just work" but occasionally not.
>
> yes, i noticed that, but since i am wrapping Mahout matrices, there's only a
> choice of double-filled matrices and vectors. Actually, i would argue that's
> the way it is supposed to be, in the interest of the KISS principle. I am
> not sure i see value in "int" matrices for any problem i ever worked on, and
> skimping on precision to save space is an even more far-fetched notion,
> since in real life the numbers don't take as much space as their
> pre-vectorized features and annotations. In fact, model training and linear
> algebra are not where the memory bottleneck shows up at all in my
> experience. There's often exponentially growing cpu-bound behavior, yes, but
> not RAM.
>
>
>
> >
> >
> > I am surprised that dense * sparse matrix doesn't work, but I guess, as I
> > previously mentioned, the sparse matrix support is a bit shaky.
> >
> This is solely based on eye-balling the trait architecture. I did not
> actually attempt it. But there's no single unifying trait for sure.
>
> >
> >
> > David Hall is pretty happy both to look into enhancements and to help out
> > with contributions (e.g. I'm hoping to find time to look into a proper
> > Diagonal matrix implementation, and he was very helpful with pointers
> > etc), so please do drop things into the Google group mailing list.
> > Hopefully wider adoption, especially by this type of community, will drive
> > Breeze development.
> >
> >
> > On another note, I also really like Scalding's matrix API, so Scala-ish
> > wrappers for Mahout would be cool - another pet project of mine is a port
> > of that API to Spark too :)
> >
> >
> > N
> >
> >
> >
> > —
> > Sent from Mailbox for iPhone
> >
> > On Mon, Jun 24, 2013 at 10:25 PM, Jake Mannix <jake.man...@gmail.com>
> > wrote:
> >
> > > Yeah, I'm totally on board with a pretty Scala DSL on top of some of
> > > our stuff.
> > > In particular, I've been experimenting with wrapping the
> > > DistributedRowMatrix in a Scalding wrapper, so we can do things like
> > > val matrixAsTypedPipe =
> > >   DistributedRowMatrixPipe(new DistributedRowMatrix(numRows, numCols,
> > >     path, conf))
> > >
> > > // e.g. L1 normalize:
> > > matrixAsTypedPipe.map { (idx, v): (Int, Vector) => (idx, v.normalize(1)) }
> > >   .write(new DistributedRowMatrixPipe(outputPath, conf))
> > >
> > > // and anything else you would want to do with a scalding
> > > // TypedPipe[Int, Vector]
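The pipe idea above can be illustrated with a tiny, self-contained analogue: plain Scala collections stand in for a Scalding TypedPipe, and a `Seq[Double]` stands in for Mahout's Vector (note that `DistributedRowMatrixPipe` above is Jake's proposed wrapper, not an existing class):

```scala
// Toy analogue of the (Int, Vector) pipe: each row is an indexed vector,
// and L1 normalization divides every element by the row's L1 norm.
object RowPipeSketch {
  type Row = (Int, Seq[Double])

  def l1Normalize(rows: Seq[Row]): Seq[Row] =
    rows.map { case (idx, v) =>
      val norm = v.map(math.abs).sum
      (idx, if (norm == 0.0) v else v.map(_ / norm))
    }

  def main(args: Array[String]): Unit = {
    val normalized = l1Normalize(Seq(0 -> Seq(1.0, 3.0), 1 -> Seq(2.0, 2.0)))
    println(normalized) // each row now sums to 1 in absolute value
  }
}
```

In the real thing, `map` and `write` would run over a distributed TypedPipe rather than an in-memory Seq, but the per-row transformation is the same shape.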
> > > Currently I've been doing this with a package structure directly in
> > > Mahout, in:
> > >    mahout/contrib/scalding
> > > What do people think about having this be something real, after 0.8
> > > goes out?  Are we ready for contrib modules which fold in diverse
> > > external projects in new ways? Integrating directly with Pig and
> > > Scalding is a bit too wide of a tent for Mahout core, but putting these
> > > integrations in entirely new projects is maybe a bit too far away.
> > > On Mon, Jun 24, 2013 at 11:30 AM, Ted Dunning <ted.dunn...@gmail.com>
> > wrote:
> > >> Dmitriy,
> > >>
> > >> This is very pretty.
> > >>
> > >>
> > >>
> > >>
> > >> On Mon, Jun 24, 2013 at 6:48 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> > >> wrote:
> > >>
> > >> > Ok, so i was fairly easily able to build some DSL for our matrix
> > >> > manipulation (similar to breeze) in scala:
> > >> >
> > >> > inline matrix or vector:
> > >> >
> > >> > val  a = dense((1, 2, 3), (3, 4, 5))
> > >> >
> > >> > val b:Vector = (1,2,3)
> > >> >
> > >> > block views and assignments (element/row/vector/block/block of row
> > >> > or vector)
> > >> >
> > >> >
> > >> > a(::, 0)
> > >> > a(1, ::)
> > >> > a(0 to 1, 1 to 2)
> > >> >
> > >> > assignments
> > >> >
> > >> > a(0, ::) :=(3, 5, 7)
> > >> > a(0, 0 to 1) :=(3, 5)
> > >> > a(0 to 1, 0 to 1) := dense((1, 1), (2, 2.5))
> > >> >
> > >> > operators
> > >> >
> > >> > // hadamard
> > >> > val c = a * b
> > >> >  a *= b
> > >> >
> > >> > // matrix mul
> > >> >  val m = a %*% b
> > >> >
> > >> > and a bunch of other little things like sum, mean, colMeans etc.
> > >> > That much is easy.
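A minimal, self-contained sketch of how this kind of operator sugar can be layered over a double-backed matrix in Scala (toy classes, not Mahout's actual Matrix/Vector API):

```scala
// Toy matrix with the two operators discussed above: `*` as the Hadamard
// (element-wise) product and `%*%` as true matrix multiplication.
object MatSketch {
  case class Mat(rows: Vector[Vector[Double]]) {
    def nrow: Int = rows.length
    def ncol: Int = rows.head.length

    // Hadamard product: same-shape matrices, element by element
    def *(that: Mat): Mat =
      Mat(rows.zip(that.rows).map { case (a, b) =>
        a.zip(b).map { case (x, y) => x * y }
      })

    // Matrix multiplication: (nrow x ncol) %*% (ncol x p)
    def %*%(that: Mat): Mat =
      Mat(Vector.tabulate(nrow, that.ncol) { (i, j) =>
        (0 until ncol).map(k => rows(i)(k) * that.rows(k)(j)).sum
      })
  }

  // row-wise construction in the spirit of dense(...); the tuple-literal
  // form dense((1, 2, 3), ...) would additionally need implicit
  // conversions from tuples, omitted here for brevity
  def dense(rs: Seq[Double]*): Mat = Mat(rs.map(_.toVector).toVector)
}
```

Usage: with `val a = MatSketch.dense(Seq(1.0, 2.0), Seq(3.0, 4.0))`, `a * a` is element-wise while `a %*% a` is the matrix product, mirroring the DSL's distinction between `*` and `%*%`.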
> > >> >
> > >> > Also stuff like the ones found in Breeze, along the lines of
> > >> >
> > >> > val (u,v,s) = svd(a)
> > >> >
> > >> > diag ((1,2,3))
> > >> >
> > >> > and Cholesky in similar ways.
> > >> >
> > >> > I don't have "inline" initialization for sparse things (yet), simply
> > >> > because i don't need them, but of course all the regular java
> > >> > constructors and methods are retained; all of that is just syntactic
> > >> > sugar in the spirit of DSLs, in the hope of making things a bit more
> > >> > readable.
> > >> >
> > >> > my (very little, and very insignificantly opinionated, really)
> > >> > criticism of Breeze in this context is its inconsistency between
> > >> > dense and sparse representations, namely, the lack of consistent
> > >> > overarching trait(s), so that building structure-agnostic solvers
> > >> > like Mahout's Cholesky solver is impossible, as is cross-type matrix
> > >> > use (say, the way i understand it, it is pretty much impossible to
> > >> > multiply a sparse matrix by a dense matrix).
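The point about an overarching trait can be made concrete with a small self-contained sketch (toy types, not Breeze's or Mahout's actual classes): once dense and sparse implementations share one trait, a structure-agnostic routine handles any combination, sparse-times-dense included.

```scala
object TraitSketch {
  // One unifying trait: shape plus element access.
  trait Mat {
    def nrow: Int
    def ncol: Int
    def apply(i: Int, j: Int): Double
  }

  class Dense(data: Vector[Vector[Double]]) extends Mat {
    def nrow = data.length
    def ncol = data.head.length
    def apply(i: Int, j: Int) = data(i)(j)
  }

  class Sparse(val nrow: Int, val ncol: Int,
               entries: Map[(Int, Int), Double]) extends Mat {
    def apply(i: Int, j: Int) = entries.getOrElse((i, j), 0.0)
  }

  // Structure-agnostic multiply: works for any Mat x Mat combination,
  // because it only relies on the shared trait.
  def mmul(a: Mat, b: Mat): Vector[Vector[Double]] =
    Vector.tabulate(a.nrow, b.ncol) { (i, j) =>
      (0 until a.ncol).map(k => a(i, k) * b(k, j)).sum
    }
}
```

A real implementation would of course dispatch to structure-aware kernels for speed; the trait only has to guarantee that a generic fallback like this one is always possible.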
> > >> >
> > >> > I suspect these problems stem from the fact that the authors for
> > >> > whatever reason decided to hardwire dense things to JBlas solvers,
> > >> > whereas i don't believe matrix storage structures must be. But these
> > >> > problems do appear to be serious enough for me to ignore Breeze for
> > >> > now. If i decide to plug in jblas dense solvers, i guess i will just
> > >> > have them as yet another top-level routine interface taking any
> > >> > Matrix, e.g.
> > >> >
> > >> > val (u,v,s) = svd(m, jblas=true)
> > >> >
> > >> >
> > >> >
> > >> > On Sun, Jun 23, 2013 at 7:08 PM, Dmitriy Lyubimov <
> dlie...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Thank you.
> > >> > > On Jun 23, 2013 6:16 PM, "Ted Dunning" <ted.dunn...@gmail.com>
> > wrote:
> > >> > >
> > >> > >> I think that this contract has migrated a bit from the first
> > starting
> > >> > >> point.
> > >> > >>
> > >> > >> My feeling is that there is a de facto contract now that the
> matrix
> > >> > slice
> > >> > >> is a single row.
> > >> > >>
> > >> > >> Sent from my iPhone
> > >> > >>
> > >> > >> On Jun 23, 2013, at 16:32, Dmitriy Lyubimov <dlie...@gmail.com>
> > >> wrote:
> > >> > >>
> > >> > >> > What does Matrix.iterateAll() contractually do? Practically it
> > >> > >> > seems to be row-wise iteration for some implementations, but the
> > >> > >> > javadoc doesn't contractually state so. What is a MatrixSlice if
> > >> > >> > it is neither a row nor a column? How can i tell what exactly it
> > >> > >> > is i am iterating over?
> > >> > >> > On Jun 19, 2013 12:21 AM, "Ted Dunning" <ted.dunn...@gmail.com
> >
> > >> > wrote:
> > >> > >> >
> > >> > >> >> On Wed, Jun 19, 2013 at 5:29 AM, Jake Mannix <
> > >> jake.man...@gmail.com>
> > >> > >> >> wrote:
> > >> > >> >>
> > >> > >> >>>> Question #2: which in-core solvers are available for Mahout
> > >> > >> >>>> matrices? I know there's SSVD, probably Cholesky; is there
> > >> > >> >>>> something else? In particular, i need to be solving linear
> > >> > >> >>>> systems; I guess Cholesky should be equipped enough to do
> > >> > >> >>>> just that?
> > >> > >> >>>>
> > >> > >> >>>> Question #3: why did we try to import Colt solvers rather
> > >> > >> >>>> than actually depend on Colt in the first place? Why did we
> > >> > >> >>>> not accept Colt's sparse matrices, and instead create native
> > >> > >> >>>> ones?
> > >> > >> >>>>
> > >> > >> >>>> Colt seems to have a notion of sparse in-core matrices too,
> > >> > >> >>>> and seems like a well-rounded solution. However, it doesn't
> > >> > >> >>>> seem to be actively supported, whereas I know Mahout's
> > >> > >> >>>> in-core matrix support has seen continued enhancements.
> > >> > >> >>>>
> > >> > >> >>>
> > >> > >> >>> Colt was totally abandoned, and I talked to the original
> > >> > >> >>> author and he blessed its adoption.  When we pulled it in, we
> > >> > >> >>> found it was woefully undertested, and tried our best to hook
> > >> > >> >>> it in with proper tests and use APIs that fit with the use
> > >> > >> >>> cases we had.  Plus, we already had the start of some linear
> > >> > >> >>> apis (i.e. the Vector interface), and dropping the API
> > >> > >> >>> completely seemed not terribly worth it at the time.
> > >> > >> >>>
> > >> > >> >>
> > >> > >> >> There was even more to it than that.
> > >> > >> >>
> > >> > >> >> Colt was under-tested, and there were warts that had to be
> > >> > >> >> pulled out in much of the code.
> > >> > >> >>
> > >> > >> >> But, worse than that, Colt's matrix and vector structure was a
> > >> > >> >> real bugger to extend or change.  It also had all kinds of
> > >> > >> >> cruft where it pretended to support matrices of arbitrary
> > >> > >> >> things, but in fact only supported matrices of doubles and
> > >> > >> >> floats.
> > >> > >> >>
> > >> > >> >> So using Colt as it was (and is, since it is largely abandoned)
> > >> > >> >> was a non-starter.
> > >> > >> >>
> > >> > >> >> As far as in-memory solvers, we have:
> > >> > >> >>
> > >> > >> >> 1) LR decomposition (tested and kinda fast)
> > >> > >> >>
> > >> > >> >> 2) Cholesky decomposition (tested)
> > >> > >> >>
> > >> > >> >> 3) SVD (tested)
> > >> > >> >>
> > >> > >>
> > >> > >
> > >> >
> > >>
> > > --
> > >   -jake
> >
>
