Ok, so I was fairly easily able to build a DSL for our matrix
manipulation (similar to Breeze) in Scala:

inline matrix or vector:

val a = dense((1, 2, 3), (3, 4, 5))

val b: Vector = (1, 2, 3)

block views and assignments (element, row, vector, block, or a block of a row
or vector):


a(::, 0)
a(1, ::)
a(0 to 1, 1 to 2)

assignments

a(0, ::) := (3, 5, 7)
a(0, 0 to 1) := (3, 5)
a(0 to 1, 0 to 1) := dense((1, 1), (2, 2.5))

operators

// hadamard (element-wise) product
val c = a * b
a *= b

// matrix multiplication
val m = a %*% b

and a bunch of other little things like sum, mean, colMeans, etc. That much is
easy.
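
For illustration, the statistics bits look roughly like this (the exact names
are still just a sketch, not a settled API):

// illustrative sketch only -- routine names may change
val s = sum(a)        // sum over all elements
val mu = mean(a)      // mean over all elements
val cm = colMeans(a)  // vector of per-column means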

There's also stuff like what's found in Breeze, along the lines of

val (u, v, s) = svd(a)

diag((1, 2, 3))

and Cholesky in similar ways.
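
For Cholesky, I mean something roughly like this (the routine and method names
here are illustrative only):

// illustrative sketch: a top-level Cholesky routine in the same spirit as svd()
val ch = chol(a)     // factor a symmetric positive-definite matrix
// which could then back a linear-system solve, e.g. for a * x = b
val x = ch.solve(b)  // hypothetical solve() helper, not a settled API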

I don't have "inline" initialization for sparse things (yet), simply because
I don't need it yet, but of course all the regular Java constructors and
methods are retained; all of this is just syntactic sugar in the spirit of
DSLs, in the hope of making things a bit more readable.
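
So for now, sparse things are constructed through the existing Mahout classes
directly, e.g. (these are the stock Mahout constructors, nothing DSL-specific):

import org.apache.mahout.math.{RandomAccessSparseVector, SparseRowMatrix}

// plain Mahout constructors work alongside the DSL
val sm = new SparseRowMatrix(1000, 1000)    // row-wise sparse matrix
sm.setQuick(3, 5, 1.0)                      // set an individual element

val sv = new RandomAccessSparseVector(1000)
sv.setQuick(7, 2.5)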

My (very minor, and really not very opinionated) criticism of Breeze in this
context is its inconsistency between dense and sparse representations, namely
the lack of consistent overarching trait(s). As a result, building
structure-agnostic solvers like Mahout's Cholesky solver is impossible, as is
cross-type matrix use (the way I understand it, it is pretty much impossible
to multiply a sparse matrix by a dense matrix).
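
To illustrate what I mean by structure-agnostic: with a common Matrix
interface, the same code path covers dense, sparse, and views, and cross-type
operations just work. A rough sketch:

import org.apache.mahout.math.{DenseMatrix, Matrix, SparseRowMatrix}

// cross-type multiplication needs no special-casing
val d: Matrix = new DenseMatrix(3, 3)
val sp: Matrix = new SparseRowMatrix(3, 3)
val prod = sp.times(d)   // sparse times dense, via the common interface

// and a solver can be written once against the interface
def mySolver(a: Matrix, b: Matrix): Matrix = {
  // only Matrix methods used here, so any backing structure qualifies
  ???
}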

I suspect these problems stem from the fact that the authors, for whatever
reason, decided to hardwire the dense types to the jblas solvers, whereas I
don't believe matrix storage structures need to be tied to a particular solver
backend. But these problems do appear to be serious enough for me to ignore
Breeze for now. If I decide to plug in the jblas dense solvers, I guess I will
just expose them as yet another top-level routine interface taking any Matrix,
e.g.

val (u, v, s) = svd(m, jblas = true)
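
If I do that, the routine could look roughly like this (sketch only, nothing
implemented yet):

import org.apache.mahout.math.{Matrix, Vector}

// sketch: the jblas flag would just select a different backend behind the
// same structure-agnostic signature
def svd(m: Matrix, jblas: Boolean = false): (Matrix, Matrix, Vector) =
  if (jblas) ???   // convert to a jblas DoubleMatrix, run its SVD, convert back
  else ???         // default Mahout in-core SVD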



On Sun, Jun 23, 2013 at 7:08 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Thank you.
> On Jun 23, 2013 6:16 PM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>
>> I think that this contract has migrated a bit from the first starting
>> point.
>>
>> My feeling is that there is a de facto contract now that the matrix slice
>> is a single row.
>>
>> Sent from my iPhone
>>
>> On Jun 23, 2013, at 16:32, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>
>> > What does Matrix.iterateAll() contractually do? Practically it seems
>> to be
>> > row-wise iteration for some implementations, but it doesn't seem to
>> > state so contractually in the javadoc. What is MatrixSlice if it is
>> neither
>> > a row nor a column? How can I tell what exactly it is I am iterating
>> over?
>> > On Jun 19, 2013 12:21 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>> >
>> >> On Wed, Jun 19, 2013 at 5:29 AM, Jake Mannix <jake.man...@gmail.com>
>> >> wrote:
>> >>
>> >>>> Question #2: which in-core solvers are available for Mahout
>> matrices? I
>> >>>> know there's SSVD, probably Cholesky, is there something else? In
>> >>>> particular, I need to be solving linear systems. I guess Cholesky
>> should
>> >>> be
>> >>>> equipped enough to do just that?
>> >>>>
>> >>>> Question #3: why did we try to import Colt solvers rather than
>> actually
>> >>>> depend on Colt in the first place? Why did we not accept Colt's
>> sparse
>> >>>> matrices and create native ones instead?
>> >>>>
>> >>>> Colt seems to have a notion of sparse in-core matrices too and seems
>> >> like
>> >>> a
>> >>>> well-rounded solution. However, it doesn't seem to be actively
>> >>>> supported, whereas I know Mahout experienced continued enhancements
>> to
>> >>> the
>> >>>> in-core matrix support.
>> >>>>
>> >>>
>> >>> Colt was totally abandoned, and I talked to the original author and he
>> >>> blessed its adoption.  When we pulled it in, we found it was woefully
>> >>> undertested,
>> >>> and tried our best to hook it in with proper tests and use APIs that
>> fit
>> >>> with
>> >>> the use cases we had.  Plus, we already had the start of some linear
>> apis
>> >>> (i.e.
>> >>> the Vector interface) and dropping the API completely seemed not
>> terribly
>> >>> worth it at the time.
>> >>>
>> >>
>> >> There was even more to it than that.
>> >>
>> >> Colt was under-tested and there have been warts that had to be pulled
>> out
>> >> in much of the code.
>> >>
>> >> But, worse than that, Colt's matrix and vector structure was a real
>> bugger
>> >> to extend or change.  It also had all kinds of cruft where it
>> pretended to
>> >> support matrices of things, but in fact only supported matrices of
>> doubles
>> >> and floats.
>> >>
>> >> So using Colt as it was (and is since it is largely abandoned) was a
>> >> non-starter.
>> >>
>> >> As far as in-memory solvers, we have:
>> >>
>> >> 1) LR decomposition (tested and kinda fast)
>> >>
>> >> 2) Cholesky decomposition (tested)
>> >>
>> >> 3) SVD (tested)
>> >>
>>
>
