My only concern is adding loss-minimization tools that people can use to
write machine learning algorithms.
mapBlock as you suggested can work equally well, but I happened to have
implemented the aggregate op while thinking it through. Apart from this SGD
implementation, I believe a blockify-a-matrix-and-run-an-operation-in-parallel-on-blocks
primitive is certainly required, since block-level parallelization is really
common in matrix computations. Plus, if we are to add, say, a descriptive
statistics package, that would require similar functionality, too.

If mapBlock were more flexible for passing custom operators, I'd be more
than happy, but I understand the reasoning behind its requirement that the
mapping be block-to-block with the same row size.

Could you give a little more detail on the "common distributed strategy"
idea?

Aside: do we have elementwise math functions in the matrix DSL? That is,
how can I do this?

1 + exp(drmA)

Gokhan

On Wed, Nov 12, 2014 at 7:55 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Yes, I usually follow #2 too.
>
> The thing is, pretty often an algorithm can define its own set of
> strategies that the backend needs to support (like this distributedEngine
> strategy) while still keeping a lot of logic common across all strategies.
> But if an all-reduce aggregate operation is incredibly common among such
> algorithm-specific strategies, then it stands to reason to implement it
> only once.
>
> I have an idea.
>
> Maybe we need a common distributed strategy which is different from the
> algebraic optimizer. That way we don't have to mess with algebraic
> rewrites. How about that?
>
> On Wed, Nov 12, 2014 at 9:12 AM, Pat Ferrel <[email protected]> wrote:
>
> > So you are following #2, which is good. #1 seems a bit like a hack. For
> > a long time to come we will have to add things to the DSL if it is to
> > be kept engine independent. Yours looks pretty general and simple.
> >
> > Are you familiar with the existing Mahout aggregate methods? They show
> > up in SGDHelper.java and other places in legacy code.
> > I don't know much about them, but they seem to be a pre-functional-programming
> > attempt at this kind of thing. It looks like you are proposing a
> > replacement for those based on rdd.aggregate; if so, that would be very
> > useful. For one thing, it looks like the old aggregate was not
> > parallel, while rdd.aggregate is.
> >
> > On Nov 11, 2014, at 1:18 PM, Gokhan Capan <[email protected]> wrote:
> >
> > So the alternatives are:
> >
> > 1- mapBlock to a matrix in which all rows but the first are empty, then
> >    aggregate
> > 2- depend on a backend
> >
> > 1 is obviously OK.
> >
> > I don't like the idea of depending on a backend, since SGD is a generic
> > loss minimizer on which other algorithms will possibly depend.
> >
> > In this context, client-side aggregation is not an overhead; but even
> > if it were, it doesn't have to be a client-side aggregate at all.
> >
> > As an alternative to 1, I am thinking of at least having an aggregation
> > operation, which will return an accumulated value anyway and shouldn't
> > affect algebra optimizations.
> >
> > I quickly implemented a naive one (supporting only Spark -- I know I
> > said I don't like depending on a backend, but at least the
> > backends-wide interface is consistent, and as a client I still don't
> > have to deal with Spark primitives directly).
> >
> > Is this nice enough? Is it too bad to have in the DSL?
> > https://github.com/gcapan/mahout/compare/accumulateblocks
> >
> > Best
> >
> > Gokhan
> >
> > On Tue, Nov 11, 2014 at 10:45 PM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > Oh, the algorithm actually collects the vectors and runs another
> > > cycle on the client!
> > >
> > > Still, technically, you can collect almost-empty blocks to the client
> > > (since they are mostly empty, it won't cause THAT huge an overhead
> > > compared to collecting single vectors; after all, how many partitions
> > > are we talking about? 1000?).
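[Aside: Gokhan's alternative 1 above (mapBlock to a matrix whose rows are
all empty except the first, then aggregate) can be sketched
engine-agnostically. The following is a minimal plain-Python illustration,
not Mahout DSL or Spark code; the names `partition_result_block` and
`col_sums` are made up for the example. Each partition emits a block whose
first row holds its partial result and whose remaining rows are zero, so a
plain column-sum over the stacked blocks recovers the all-reduce.]

```python
# Illustrative sketch only (plain Python, not the Mahout DSL / Spark API).
# Each partition produces a "block" in which only row 0 carries its partial
# result; column-wise aggregation over all blocks is then equivalent to an
# all-reduce over the per-partition vectors.

def partition_result_block(partial_vector, n_rows):
    """Block with the partial result in row 0 and zero rows elsewhere."""
    zero = [0.0] * len(partial_vector)
    return [list(partial_vector)] + [list(zero) for _ in range(n_rows - 1)]

def col_sums(blocks):
    """colSums() over the vertical stacking of all blocks."""
    width = len(blocks[0][0])
    sums = [0.0] * width
    for block in blocks:
        for row in block:
            for j, v in enumerate(row):
                sums[j] += v
    return sums

# Three partitions, each contributing a 2-dimensional partial vector.
partials = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
blocks = [partition_result_block(p, n_rows=4) for p in partials]
aggregated = col_sums(blocks)                    # column sums -> [9.0, 12.0]
mean = [s / len(partials) for s in aggregated]   # column means -> [3.0, 4.0]
```

The zero rows add nothing to the sums, which is why collecting such
almost-empty blocks costs little more than collecting the vectors
themselves.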
> > > On Tue, Nov 11, 2014 at 12:41 PM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > >> On Sat, Nov 8, 2014 at 12:42 PM, Gokhan Capan <[email protected]>
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Based on Zinkevich et al.'s Parallelized Stochastic Gradient paper
> > >>> (http://martin.zinkevich.org/publications/nips2010.pdf), I tried to
> > >>> implement SGD, and a regularized least-squares solution for linear
> > >>> regression (which can easily be extended to other GLMs, too).
> > >>>
> > >>> The algorithm works as follows:
> > >>> 1. Split the data into partitions of T examples
> > >>> 2. In parallel, for each partition:
> > >>>    2.0. shuffle the partition
> > >>>    2.1. initialize the parameter vector
> > >>>    2.2. for each example in the shuffled partition:
> > >>>         2.2.1. update the parameter vector
> > >>> 3. Aggregate all the parameter vectors and return
> > >>
> > >> I guess technically it is possible (transform each block to a
> > >> SparseRowMatrix or SparseMatrix with only the first row valid) and
> > >> then invoke colSums() or colMeans() (whatever "aggregate" means
> > >> here).
> > >>
> > >> However, I am not sure it is worth the ugliness. Isn't it easier to
> > >> declare these things quasi-algebraic and just do direct Spark calls
> > >> on the matrix RDD (map, aggregate)?
> > >>
> > >> The real danger is that introducing non-algebra things into the
> > >> algebra will stop the rest of the algebra from optimizing.
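[Aside: the numbered steps in Gokhan's message above can be sketched as
follows. This is a toy plain-Python illustration of the Zinkevich-style
split/shuffle/local-SGD/average scheme for regularized least squares, not
the actual Mahout patch; the names `sgd_partition` and `parallel_sgd`, the
round-robin split, and the learning-rate choice are all assumptions made
for the example.]

```python
import random

def sgd_partition(examples, dim, lr=0.01, reg=0.0, seed=0):
    """Step 2: shuffle one partition and run one plain SGD pass over it."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)                       # 2.0 shuffle the partition
    w = [0.0] * dim                             # 2.1 init parameter vector
    for x, y in shuffled:                       # 2.2 one pass over examples
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        # 2.2.1 regularized least-squares update
        w = [wi - lr * (err * xi + reg * wi) for wi, xi in zip(w, x)]
    return w

def parallel_sgd(data, dim, num_partitions, **kw):
    # 1. split the data into partitions (here: a simple round-robin split)
    parts = [data[i::num_partitions] for i in range(num_partitions)]
    # 2. run SGD independently on each partition (parallel in practice)
    ws = [sgd_partition(p, dim, seed=i, **kw) for i, p in enumerate(parts)]
    # 3. aggregate: average the per-partition parameter vectors
    return [sum(col) / len(ws) for col in zip(*ws)]

def mse(w, data):
    """Mean squared error of w over the data, for a quick sanity check."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in data) / len(data)

# Toy data from y = 2*x with no noise; features are [x, 1] for a bias term.
data = [([i / 100.0, 1.0], 2 * i / 100.0) for i in range(100)]
w = parallel_sgd(data, dim=2, num_partitions=4, lr=0.1)
```

Step 3 here is exactly the colMeans()-style aggregation discussed in the
thread: the averaged vector should fit the data better than the zero
initialization, e.g. `mse(w, data) < mse([0.0, 0.0], data)`.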
