This is an astute observation.
There are a few things that make it difficult to accept, however:
(1) The Occam's razor assumption that f will have the simplest possible
implementation, namely that it will be deterministic and stateless.
Note that something like

new SparseMatrix() := { if (rnd.next > 0.5) 0 else 1 }

will render our sparsity-assignment test non-deterministic as well, which
is not good.
Now, solely for the sake of self-irony, we can play Sheldon Cooper for a
second and say that this implementation is pointless, that in general
any non-deterministic implementation is pointless, and that it is
therefore implausible.
But implausible is not the same as impossible, so we have a classic
conflict between performance and correctness here. The problem is
exacerbated by the fact that implausible problems are the hardest to find,
and by the fact that this is now user-facing code, not just a @mahout-dev
facing problem that we could tackle simply by documenting conventions.
Now, the big downside is of course that @public will likely write a lot of
dense elementwise things as aMx := f instead of aMx ::= f, but this will
only cause performance errors, not correctness errors.
And i generally side with correctness and explicit overrides in cases like
this. (And a proper tutorial.)
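To make the non-determinism hazard concrete, here is a minimal plain-Scala sketch (a toy stand-in, not the actual Mahout DSL; `canSkipZeros` is a hypothetical probe): if the sparsity decision is made by testing f(0) == 0, a non-deterministic f makes the dense-vs-sparse dispatch itself flip from call to call.

```scala
import scala.util.Random

// Hypothetical probe: classify f as zero-preserving by testing f(0) == 0.
def canSkipZeros(f: Double => Double): Boolean = f(0.0) == 0.0

val rnd = new Random(42)
// A non-deterministic f, analogous to the SparseMatrix example above.
val flaky: Double => Double = x => if (rnd.nextDouble() > 0.5) 0.0 else 1.0

// Repeated probes of the same function disagree, so any dense-vs-sparse
// dispatch (or test) built on the probe is itself non-deterministic.
val verdicts = (1 to 100).map(_ => canSkipZeros(flaky)).toSet
println(verdicts) // both true and false show up
```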
(2) The call f(0) may be illegal due to function domain issues and cause
an interruption. Again, not a very plausible thing, but the same arguments
apply (non-fool-proof things are hard to find).
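For illustration (hypothetical code, not from the patch): a user function whose domain excludes 0 makes the f(0) probe itself throw, so the sparsity test can abort the whole assignment.

```scala
// A user function with a restricted domain: this strict log enforces
// x > 0 with an exception, so it cannot even be evaluated at 0.
val strictLog: Double => Double = x => {
  require(x > 0, s"log is undefined for $x")
  math.log(x)
}

// Probing f(0) to classify the function aborts instead of answering.
val probe: Either[String, Boolean] =
  try Right(strictLog(0.0) == 0.0)
  catch { case e: IllegalArgumentException => Left(e.getMessage) }
// probe is a Left(...), i.e. the sparsity test itself was interrupted
```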
(3) The `matrix argument is sparse` condition is problematic because it
assumes there is nothing to gain by skipping zeros in a dense matrix. This
relies heavily on the assumption that the cost of computing f is about the
same as the cost of traversal (and in cases like f = exp(x) it probably
is). But when the cost of computing f is much greater than the cost of a
dense traversal, we may find that
denseMatrix ::= f is actually non-trivially better performing than
denseMatrix := f.
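A toy illustration of the cost argument, with plain Scala arrays standing in for a dense matrix: when f dominates the traversal cost, skipping zeros pays off even though every cell is still visited.

```scala
// Dense storage with mostly zero values.
val dense = Array.tabulate(1000)(i => if (i % 100 == 0) 2.0 else 0.0)

var fCalls = 0
// Stand-in for an expensive f with f(0) == 0.
val costlyF: Double => Double = x => { fCalls += 1; x * x * x }

// denseMatrix ::= f semantics: traverse all cells, evaluate f on non-zeros only.
val result = dense.map(x => if (x != 0.0) costlyF(x) else 0.0)
println(fCalls) // 10, not 1000: 990 evaluations of the costly f avoided
```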
(4) Finally, in the current code there's only one version of assign,
mx := f(row, col, x)
and the test code is just adding the version
mx := f(x)
but the former form of functional assignment is still very much in
demand. That former form is the particular problem, since we cannot
efficiently test f(0) == 0 over the entire domain of { (row, col) }.
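A hypothetical (row, col, x) function shows why: zero-preservation can hold at every probed position and still fail somewhere else in the index domain.

```scala
// Zero-preserving everywhere except on the diagonal: no finite set of
// probes f(r, c, 0.0) can prove it is safe to skip zeros.
val f: (Int, Int, Double) => Double =
  (r, c, x) => if (r == c) x + 1.0 else x

val offDiagonal = f(0, 1, 0.0) // 0.0: this probe says "skippable"
val onDiagonal  = f(3, 3, 0.0) // 1.0: skipping would drop this write
```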
Finally, unfortunately it looks like there will always be some performance
tricks like that, e.g. for in-memory stuff. As it stands, one needs to
realize that something like
2 *=: (A *= A)
(i.e. a simple 2*A^2 computation)
is going to do two traversals of the same matrix structure, whereas
A ::= { (x) => 2 * x * x }
is going to do only one, and is therefore likely to be almost twice as
efficient (not to mention sparsity-assignment optimality issues). So
efficiency will require some black magic for the foreseeable future in
cases beyond just prototyping, especially for in-memory stuff.
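The traversal-count claim can be sketched with a plain-Scala stand-in that counts element visits (toy code, not the actual Mahout operators):

```scala
var visits = 0
// In-place elementwise apply over a flat array, counting every element visit.
def applyInPlace(a: Array[Double])(f: Double => Double): Unit =
  a.indices.foreach { i => visits += 1; a(i) = f(a(i)) }

// 2 *=: (A *= A): two separate passes over the same structure.
val a = Array(1.0, 2.0, 3.0)
applyInPlace(a)(x => x * x)
applyInPlace(a)(x => 2 * x)
val twoPassVisits = visits // 6

// A ::= { x => 2 * x * x }: the fused functional assignment, one pass.
visits = 0
val b = Array(1.0, 2.0, 3.0)
applyInPlace(b)(x => 2 * x * x)
val onePassVisits = visits // 3

println(a.sameElements(b)) // true: same result, half the traversals
```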
On Mon, Nov 17, 2014 at 2:16 AM, Ted Dunning <[email protected]> wrote:
> Isn't it true that sparse iteration should always be used for m := f iff
>
> 1) the matrix argument is sparse
>
> AND
>
> 2) f(0) == 0
>
> ?
>
> Why the need for syntactic notation at all? This property is much easier
> to test than commutativity.
>
>
> On Sun, Nov 16, 2014 at 7:42 AM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > another thing is that optimizer isn't capable of figuring out all
> > elementwise fusions in an elementwise expression, e.g. it is not seeing
> > commutativity rewrites such as
> >
> > A * B * A
> >
> > should optimally be computed as sqr(A) * B (it will do it as two pairwise
> > operators (A*B)*A).
> >
> > Bummer.
> >
> > To do it truly right, it needs to fuse entire elementwise expressions
> first
> > and then optimize them separately.
> >
> > Ok that's probably too much for now. I am quite ok with writing something
> > like -0.5 * (a * a ) for now.
> >
> > On Sat, Nov 15, 2014 at 10:14 AM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > PS
> > >
> > > actually applying an exponent function in place will require an
> > > additional underscore, it looks. It doesn't want to treat a function
> > > name as a function type in this context for some reason (although it
> > > does not require partial application syntax when used in arguments
> > > inside parentheses):
> > >
> > > m := exp _
> > >
> > > Scala is quirky this way i guess
> > >
> > >
> > > On Sat, Nov 15, 2014 at 10:02 AM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > >> So i did quick experimentation with elementwise operator improvements:
> > >>
> > >> (1) stuff like 1 + exp (M):
> > >>
> > >> (1a): this requires a generalization in the optimizer for
> > >> elementwise unary operators. I've added things like the notion of
> > >> whether operators require non-zero-only iteration or not.
> > >>
> > >> (1b): added fusion of elementwise operators, i.e.
> > >> ew(1+, ew(exp, A)) is rewritten as ew(1+exp, A) for performance
> > >> reasons. It still uses an application of a fold over a functional
> > >> monoid, but i think it should be a fairly ok performance/DSL
> > >> trade-off here. To get it even better, we may add functional
> > >> assignment syntax to distributed operands, similar to the in-memory
> > >> types described further down.
> > >>
> > >> (1c): the notion that self-elementwise things such as expr1 * expr1
> > >> (which is a surprisingly frequent occurrence, e.g. in Torgerson MDS)
> > >> are rewritten as ew(A, square) etc.
> > >>
> > >> So that much works. (Note that this also obsoletes the dedicated
> > >> scalar/matrix elementwise operators that currently exist.) Good.
> > >>
> > >>
> > >> The problem here is that (of course!) the semantics of the Scala
> > >> language have a problem importing something like exp(Double):Double
> > >> alongside exp(DRM):DRM, apparently because it doesn't adhere to the
> > >> overloading rules (different result types), so in practice, even
> > >> though it is allowed, one import overshadows the other.
> > >>
> > >> Which means that, for the sake of the DSL, we can't have
> > >> exp(matrix); we have to name it something else. Unless you see a
> > >> better solution.
> > >>
> > >> So ... elementwise naming options:
> > >>
> > >> Matrix: mexp(m), msqrt(m), msignum(m)
> > >> Vector: vexp(v), vsqrt(v)...
> > >> DRM: dexp(drm), dsqrt(drm) .... ?
> > >>
> > >> Let me know what you think.
> > >>
> > >> (2) Another problem is that actually doing something like 1 + exp(m)
> > >> on Matrix or Vector types is pretty impractical since, unlike in R
> > >> (which can count the number of variables bound to an object), the
> > >> semantics require creating a clone of m for something like exp(m) to
> > >> guarantee no side effects on m itself.
> > >>
> > >> That is, the expression 1 + exp(m) for Matrix or Vector types causes
> > >> 2 clone-copies of the original argument.
> > >>
> > >> That's actually why i use in-place syntax for in-memory types quite
> > >> often, something like
> > >>
> > >> 1 +=: (x *= x) instead of the more natural-looking 1 + x * x.
> > >>
> > >>
> > >> But unlike with the simple elementwise operators (+=), there's no
> > >> in-place modification syntax for a function. We could add an
> > >> additional parameter, something like
> > >>
> > >> mexp(m, inPlace = true)
> > >>
> > >> but i don't like it too much.
> > >>
> > >> What i like much more is functional assignment (we already have
> > >> assignment to a function (row, col, x) => Double, but we can add
> > >> elementwise function assignment) so that it really looks like
> > >>
> > >> m := exp
> > >>
> > >> That is pretty cool.
> > >> Except there's a problem of optimality of assignment.
> > >>
> > >> There are functions here (e.g. abs, sqrt) that don't require a full
> > >> iteration but rather a non-zero-only iteration. By default, the
> > >> notation
> > >>
> > >> m := func
> > >>
> > >> implies dense iteration.
> > >>
> > >> So what i suggest here is to add new syntax for sparse-iteration
> > >> functional assignments:
> > >>
> > >> m ::= abs
> > >>
> > >> I actually like it (a lot) because it is short and because it allows
> > >> more complex formulas in the same traversal, e.g. the proverbial R
> > >> exp(m) + 1 in-place will look like
> > >>
> > >> m := (1 + exp(_))
> > >>
> > >> So not terrible.
> > >>
> > >> What it lacks, though, is automatic determination of whether a
> > >> composite function needs to apply to all elements vs. non-zeros only
> > >> for in-memory types (for distributed types the optimizer tracks this
> > >> automatically).
> > >>
> > >> i.e.
> > >>
> > >> m := abs
> > >>
> > >> is not optimal (because abs doesn't affect 0s) and
> > >>
> > >> m ::= (abs(_) + 1)
> > >>
> > >> is probably also not what one wants (when we compose a
> > >> zero-preserving function with a non-zero-preserving one, the result
> > >> must be applied densely).
> > >>
> > >> So here thoughts are also welcome. (I am inclined to go with the ::=
> > >> and := operators because functional assignment of complex
> > >> expressions is the only well-performing strategy i see here for
> > >> in-memory types.)