On Mon, Jun 24, 2013 at 1:24 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> Yeah, I'm totally on board with a pretty scala DSL on top of some of our
> stuff.
>
> In particular, I've been experimenting with wrapping the DistributedRowMatrix
> in a scalding wrapper, so we can do things like
>
>   val matrixAsTypedPipe =
>     DistributedRowMatrixPipe(new DistributedRowMatrix(numRows, numCols, path, conf))
>
>   // e.g. L1 normalize:
>   matrixAsTypedPipe.map((idx, v) : (Int, Vector) => (idx, v.normalize(1)))
>                    .write(new DistributedRowMatrixPipe(outputPath, conf))
>
>   // and anything else you would want to do with a scalding TypedPipe[Int, Vector]
>
> Currently I've been doing this with a package structure directly in Mahout, in:
>
>   mahout/contrib/scalding
>
> What do people think about having this be something real, after 0.8 goes out?
> Are we ready for contrib modules which fold in diverse external projects in
> new ways? Integrating directly with pig and scalding is a bit too wide of a
> tent for Mahout core, but putting these integrations in entirely new projects
> is maybe a bit too far away.

+1, I've been putting this into the module mahout-math-scala for the past
couple of days on Mahout itself, and I keep merging with trunk (here:
https://github.com/dlyubimov/mahout-commits/tree/dev-0.8.x-scala/math-scala),
in case anyone wants to look. Not much to look at at the moment, though, I
guess.

Since it is seamlessly compiled by Maven and all the Scala stuff is readily
available in the Maven repo, I don't see any operational reason not to include
it in the post-0.8 tree. However, this probably is not terribly useful on its
own until I get a Spark-based distributed solver collection rolled into
another module that depends on it (well, it may turn out to be an ugly battle
with our bosses to contribute it). But I will probably have some
straightforward Hu-Koren-Volinsky Spark-based stuff on top of it in a couple
of days.
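[Aside: the row-wise transform in the quoted snippet can be sketched in plain Scala. Scalding and Mahout are not on the classpath here, so a `Seq` of (row index, row) pairs stands in for the `TypedPipe[(Int, Vector)]`, and a row is just an `Array[Double]`; `DistributedRowMatrixPipe` and `Vector` are the names assumed from the thread.]

```scala
// L1-normalize a row: divide each entry by the sum of absolute values,
// mimicking Mahout's v.normalize(1).
def normalizeL1(v: Array[Double]): Array[Double] = {
  val norm = v.map(math.abs).sum
  if (norm == 0.0) v else v.map(_ / norm)
}

// The "pipe" is a Seq of (row index, row vector) pairs; map over it the same
// way the quoted example maps over the scalding TypedPipe[(Int, Vector)].
val matrixAsPairs: Seq[(Int, Array[Double])] =
  Seq(0 -> Array(1.0, 2.0, 3.0), 1 -> Array(3.0, 4.0, 5.0))

val l1Normalized = matrixAsPairs.map { case (idx, v) => (idx, normalizeL1(v)) }
```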
> > On Mon, Jun 24, 2013 at 11:30 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> > Dmitriy,
> >
> > This is very pretty.
> >
> > On Mon, Jun 24, 2013 at 6:48 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >
> > > Ok, so I was fairly easily able to build some DSL for our matrix
> > > manipulation (similar to Breeze) in scala.
> > >
> > > Inline matrix or vector:
> > >
> > >   val a = dense((1, 2, 3), (3, 4, 5))
> > >
> > >   val b: Vector = (1, 2, 3)
> > >
> > > Block views and assignments (element/row/vector/block/block of row or
> > > vector):
> > >
> > >   a(::, 0)
> > >   a(1, ::)
> > >   a(0 to 1, 1 to 2)
> > >
> > > Assignments:
> > >
> > >   a(0, ::) := (3, 5, 7)
> > >   a(0, 0 to 1) := (3, 5)
> > >   a(0 to 1, 0 to 1) := dense((1, 1), (2, 2.5))
> > >
> > > Operators:
> > >
> > >   // hadamard
> > >   val c = a * b
> > >   a *= b
> > >
> > >   // matrix mul
> > >   val m = a %*% b
> > >
> > > and a bunch of other little things like sum, mean, colMeans, etc. That
> > > much is easy. Also stuff like the ones found in Breeze, along the lines of
> > >
> > >   val (u, v, s) = svd(a)
> > >
> > >   diag((1, 2, 3))
> > >
> > > and Cholesky in similar ways.
> > >
> > > I don't have "inline" initialization for sparse things (yet), simply
> > > because I don't need them, but of course all regular Java constructors
> > > and methods are retained; all of this is just syntactic sugar in the
> > > spirit of DSLs, in the hope of making things a bit more readable.
> > >
> > > My (very little, and very insignificantly opinionated, really) criticism
> > > of Breeze in this context is its inconsistency between dense and sparse
> > > representations, namely, the lack of consistent overarching trait(s), so
> > > that building structure-agnostic solvers like Mahout's Cholesky solver is
> > > impossible, as is cross-type matrix use (say, the way I understand it, it
> > > is pretty much impossible to multiply a sparse matrix by a dense matrix).
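[Aside: the slicing/assignment sugar quoted above can be approximated in plain Scala to show the mechanics. The real mahout-math-scala DSL wraps Mahout's Matrix; the `Dense` class and `dense` factory below are illustrative stand-ins backed by a mutable `Array[Array[Double]]`, not Mahout code.]

```scala
// A toy dense matrix with range-based block views, row assignment, and a
// Hadamard product, mimicking a(0 to 1, 1 to 2), a(0, ::) := (...), and a * b.
class Dense(val data: Array[Array[Double]]) {
  // read a single element
  def apply(r: Int, c: Int): Double = data(r)(c)
  // copy out a block: rows and cols given as inclusive ranges
  def apply(rows: Range, cols: Range): Dense =
    new Dense(rows.map(r => cols.map(c => data(r)(c)).toArray).toArray)
  // assign a whole row, mimicking a(0, ::) := (3, 5, 7)
  def setRow(r: Int, vals: Double*): Unit =
    vals.zipWithIndex.foreach { case (v, c) => data(r)(c) = v }
  // Hadamard (element-wise) product, mimicking a * b
  def *(other: Dense): Dense =
    new Dense(data.zip(other.data).map { case (x, y) =>
      x.zip(y).map { case (p, q) => p * q } })
}

def dense(rows: Seq[Double]*): Dense = new Dense(rows.map(_.toArray).toArray)

val a = dense(Seq(1.0, 2.0, 3.0), Seq(3.0, 4.0, 5.0))
a.setRow(0, 3, 5, 7)           // stands in for a(0, ::) := (3, 5, 7)
val block = a(0 to 1, 1 to 2)  // stands in for the block view a(0 to 1, 1 to 2)
```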
> > > I suspect these problems stem from the fact that the authors for
> > > whatever reason decided to hardwire dense things to JBlas solvers,
> > > whereas I don't believe matrix storage structures must be. But these
> > > problems do appear to be serious enough for me to ignore Breeze for now.
> > > If I decide to plug in jblas dense solvers, I guess I will just have
> > > them as yet another top-level routine interface taking any Matrix, e.g.
> > >
> > >   val (u, v, s) = svd(m, jblas = true)
> > >
> > > On Sun, Jun 23, 2013 at 7:08 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> > >
> > > > Thank you.
> > > > On Jun 23, 2013 6:16 PM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
> > > >
> > > >> I think that this contract has migrated a bit from the first starting
> > > >> point.
> > > >>
> > > >> My feeling is that there is a de facto contract now that the matrix
> > > >> slice is a single row.
> > > >>
> > > >> Sent from my iPhone
> > > >>
> > > >> On Jun 23, 2013, at 16:32, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> > > >>
> > > >> > What does Matrix.iterateAll() contractually do? Practically it seems
> > > >> > to be row-wise iteration for some implementations, but it doesn't
> > > >> > seem to state so contractually in the javadoc. What is a MatrixSlice
> > > >> > if it is neither a row nor a column? How can I tell what exactly it
> > > >> > is I am iterating over?
> > > >> > On Jun 19, 2013 12:21 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
> > > >> >
> > > >> >> On Wed, Jun 19, 2013 at 5:29 AM, Jake Mannix <jake.man...@gmail.com>
> > > >> >> wrote:
> > > >> >>
> > > >> >>>> Question #2: which in-core solvers are available for Mahout
> > > >> >>>> matrices? I know there's SSVD, probably Cholesky; is there
> > > >> >>>> something else?
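[Aside: the "consistent overarching trait" Dmitriy asks Breeze for can be sketched in a few lines of plain Scala: if dense and sparse matrices share one read interface, a single structure-agnostic multiply covers every dense/sparse combination. The names `Mat`, `DenseMat`, and `SparseMat` here are illustrative, not Mahout's or Breeze's.]

```scala
// One trait abstracting over storage; solvers written against it never need
// to know whether the backing store is dense or sparse.
trait Mat {
  def rows: Int
  def cols: Int
  def apply(r: Int, c: Int): Double
}

final class DenseMat(val data: Array[Array[Double]]) extends Mat {
  def rows: Int = data.length
  def cols: Int = data(0).length
  def apply(r: Int, c: Int): Double = data(r)(c)
}

// Sparse storage: only the nonzeros, keyed by (row, col).
final class SparseMat(val rows: Int, val cols: Int,
                      nz: Map[(Int, Int), Double]) extends Mat {
  def apply(r: Int, c: Int): Double = nz.getOrElse((r, c), 0.0)
}

// A single structure-agnostic multiply for any Mat x Mat combination,
// including the sparse-times-dense case the criticism above mentions.
def mmul(a: Mat, b: Mat): DenseMat = {
  require(a.cols == b.rows, "dimension mismatch")
  new DenseMat(Array.tabulate(a.rows, b.cols) { (i, j) =>
    (0 until a.cols).map(k => a(i, k) * b(k, j)).sum
  })
}

val d = new DenseMat(Array(Array(1.0, 2.0), Array(3.0, 4.0)))
val s = new SparseMat(2, 2, Map((0, 0) -> 1.0, (1, 1) -> 1.0)) // 2x2 identity
val p = mmul(d, s)  // dense x sparse works through the shared trait
```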
> In > > > >> >>>> paticular, i need to be solving linear systems, I guess > Cholesky > > > >> should > > > >> >>> be > > > >> >>>> equipped enough to do just that? > > > >> >>>> > > > >> >>>> Question #3: why did we try to import Colt solvers rather than > > > >> actually > > > >> >>>> depend on Colt in the first place? Why did we not accept Colt's > > > >> sparse > > > >> >>>> matrices and created native ones instead? > > > >> >>>> > > > >> >>>> Colt seems to have a notion of parse in-core matrices too and > > seems > > > >> >> like > > > >> >>> a > > > >> >>>> well-rounded solution. However, it doesn't seem like being > > actively > > > >> >>>> supported, whereas I know Mahout experienced continued > > enhancements > > > >> to > > > >> >>> the > > > >> >>>> in-core matrix support. > > > >> >>>> > > > >> >>> > > > >> >>> Colt was totally abandoned, and I talked to the original author > > and > > > he > > > >> >>> blessed it's adoption. When we pulled it in, we found it was > > > woefully > > > >> >>> undertested, > > > >> >>> and tried our best to hook it in with proper tests and use APIs > > that > > > >> fit > > > >> >>> with > > > >> >>> the use cases we had. Plus, we already had the start of some > > linear > > > >> apis > > > >> >>> (i.e. > > > >> >>> the Vector interface) and dropping the API completely seemed not > > > >> terribly > > > >> >>> worth it at the time. > > > >> >>> > > > >> >> > > > >> >> There was even more to it than that. > > > >> >> > > > >> >> Colt was under-tested and there have been warts that had to be > > pulled > > > >> out > > > >> >> in much of the code. > > > >> >> > > > >> >> But, worse than that, Colt's matrix and vector structure was a > real > > > >> bugger > > > >> >> to extend or change. It also had all kinds of cruft where it > > > >> pretended to > > > >> >> support matrices of things, but in fact only supported matrices > of > > > >> doubles > > > >> >> and floats. 
> > > >> >> So using Colt as it was (and is, since it is largely abandoned)
> > > >> >> was a non-starter.
> > > >> >>
> > > >> >> As far as in-memory solvers, we have:
> > > >> >>
> > > >> >> 1) LR decomposition (tested and kinda fast)
> > > >> >>
> > > >> >> 2) Cholesky decomposition (tested)
> > > >> >>
> > > >> >> 3) SVD (tested)

--
-jake
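[Aside: since Cholesky keeps coming up in the thread as the in-core route for linear systems, here is a self-contained sketch of that use: decompose a symmetric positive-definite A into L·Lᵀ, then solve A·x = b by forward and back substitution. Mahout ships its own CholeskyDecomposition class; the plain-Scala code below is an illustration of the technique, not Mahout's implementation.]

```scala
// Cholesky factorization of a symmetric positive-definite matrix: returns the
// lower-triangular L such that A = L * L^T.
def cholesky(a: Array[Array[Double]]): Array[Array[Double]] = {
  val n = a.length
  val l = Array.ofDim[Double](n, n)
  for (i <- 0 until n; j <- 0 to i) {
    val s = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
    l(i)(j) =
      if (i == j) math.sqrt(a(i)(i) - s)
      else (a(i)(j) - s) / l(j)(j)
  }
  l
}

// Solve A x = b given L: first L y = b (forward substitution), then
// L^T x = y (back substitution).
def choleskySolve(l: Array[Array[Double]], b: Array[Double]): Array[Double] = {
  val n = b.length
  val y = new Array[Double](n)
  for (i <- 0 until n)
    y(i) = (b(i) - (0 until i).map(k => l(i)(k) * y(k)).sum) / l(i)(i)
  val x = new Array[Double](n)
  for (i <- n - 1 to 0 by -1)
    x(i) = (y(i) - (i + 1 until n).map(k => l(k)(i) * x(k)).sum) / l(i)(i)
  x
}

val aMat = Array(Array(4.0, 2.0), Array(2.0, 3.0)) // symmetric positive-definite
val x = choleskySolve(cholesky(aMat), Array(8.0, 8.0)) // expect x = (1, 2)
```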