My only concern is adding loss-minimization tools that people can use to
write machine learning algorithms.
mapBlock as you suggested can work equally well, but I happened to have
implemented the aggregate op while thinking it through. Apart from this SGD
implementation, I believe a blockify-a-matrix-and-run-an-operation-in-parallel-on-blocks
primitive is certainly required, since block-level parallelization is really
common in matrix computations. Plus, if we are to add, say, a descriptive
statistics package, that would require similar functionality, too.

If mapBlock were more flexible for passing custom operators, I'd be more
than happy, but I understand the reasoning behind its requirement that the
mapping be block-to-block with the same row size.

Could you give a little more detail on the "common distributed strategy"
idea?

Aside: do we have elementwise math functions in the matrix DSL? That is,
how can I do this?

1 + exp(drmA)

Gokhan

On Wed, Nov 12, 2014 at 7:55 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Yes, I usually follow #2 too.
>
> The thing is, pretty often an algorithm can define its own set of
> strategies that the backend needs to support (like this distributedEngine
> strategy) while still keeping a lot of logic common across all strategies.
> But if an all-reduce aggregate operation is incredibly common among such
> algorithm-specific strategies, then it stands to reason to implement it
> only once.
>
> I have an idea.
>
> Maybe we need a common distributed strategy which is different from the
> algebraic optimizer. That way we don't have to mess with algebraic
> rewrites. How about that?
>
> On Wed, Nov 12, 2014 at 9:12 AM, Pat Ferrel <[email protected]> wrote:
>
> > So you are following #2, which is good. #1 seems a bit like a hack. For
> > a long time to come we will have to add things to the DSL if it is to
> > be kept engine independent. Yours looks pretty general and simple.
> >
> > Are you familiar with the existing Mahout aggregate methods? They show
> > up in SGDHelper.java and other places in legacy code.
> > I don't know much about them, but they seem to be a pre-functional-programming
> > attempt at this kind of thing. It looks like you are proposing a
> > replacement for those based on rdd.aggregate; if so, that would be very
> > useful. For one thing, it looks like the old aggregate was not
> > parallel, while rdd.aggregate is.
> >
> > On Nov 11, 2014, at 1:18 PM, Gokhan Capan <[email protected]> wrote:
> >
> > So the alternatives are:
> >
> > 1- mapBlock to a matrix in which all rows but the first are empty, then
> >    aggregate
> > 2- depend on a backend
> >
> > 1 is obviously OK.
> >
> > I don't like the idea of depending on a backend, since SGD is a generic
> > loss minimizer on which other algorithms will possibly depend.
> >
> > In this context, client-side aggregation is not an overhead; but even
> > if it were, it doesn't have to be a client-side aggregate at all.
> >
> > As an alternative to 1, I am thinking of at least having an aggregation
> > operation, which will return an accumulated value anyway and shouldn't
> > affect algebra optimizations.
> >
> > I quickly implemented a naive one (supporting only Spark -- I know I
> > said I don't like depending on a backend, but at least the
> > backends-wide interface is consistent, and as a client I still don't
> > have to deal with Spark primitives directly).
> >
> > Is this nice enough? Is it too bad to have in the DSL?
> > https://github.com/gcapan/mahout/compare/accumulateblocks
> >
> > Best
> >
> > Gokhan
> >
> > On Tue, Nov 11, 2014 at 10:45 PM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > Oh, the algorithm actually collects the vectors and runs another
> > > cycle on the client!
> > >
> > > Still, technically, you can collect almost-empty blocks to the client
> > > (since they are mostly empty, it won't cause THAT huge an overhead
> > > compared to collecting single vectors; after all, how many partitions
> > > are we talking about? 1000?).
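[Aside: Gokhan's alternative 1 above (mapBlock to a matrix whose rows are
all empty except the first, then aggregate) can be sketched
engine-agnostically. The following is a minimal plain-Python illustration,
not Mahout DSL or Spark code; the names `partition_result_block` and
`col_sums` are made up for the example. Each partition emits a block whose
first row holds its partial result and whose remaining rows are zero, so a
plain column-sum over the stacked blocks recovers the all-reduce.]

```python
# Illustrative sketch only (plain Python, not the Mahout DSL / Spark API).
# Each partition produces a "block" in which only row 0 carries its partial
# result; column-wise aggregation over all blocks is then equivalent to an
# all-reduce over the per-partition vectors.

def partition_result_block(partial_vector, n_rows):
    """Block with the partial result in row 0 and zero rows elsewhere."""
    zero = [0.0] * len(partial_vector)
    return [list(partial_vector)] + [list(zero) for _ in range(n_rows - 1)]

def col_sums(blocks):
    """colSums() over the vertical stacking of all blocks."""
    width = len(blocks[0][0])
    sums = [0.0] * width
    for block in blocks:
        for row in block:
            for j, v in enumerate(row):
                sums[j] += v
    return sums

# Three partitions, each contributing a 2-dimensional partial vector.
partials = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
blocks = [partition_result_block(p, n_rows=4) for p in partials]
aggregated = col_sums(blocks)                    # column sums -> [9.0, 12.0]
mean = [s / len(partials) for s in aggregated]   # column means -> [3.0, 4.0]
```

The zero rows add nothing to the sums, which is why collecting such
almost-empty blocks costs little more than collecting the vectors
themselves.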
> > > On Tue, Nov 11, 2014 at 12:41 PM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > >> On Sat, Nov 8, 2014 at 12:42 PM, Gokhan Capan <[email protected]>
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Based on Zinkevich et al.'s Parallelized Stochastic Gradient paper
> > >>> (http://martin.zinkevich.org/publications/nips2010.pdf), I tried to
> > >>> implement SGD, and a regularized least-squares solution for linear
> > >>> regression (which can easily be extended to other GLMs, too).
> > >>>
> > >>> The algorithm works as follows:
> > >>> 1. Split the data into partitions of T examples
> > >>> 2. In parallel, for each partition:
> > >>>    2.0. shuffle the partition
> > >>>    2.1. initialize the parameter vector
> > >>>    2.2. for each example in the shuffled partition:
> > >>>         2.2.1. update the parameter vector
> > >>> 3. Aggregate all the parameter vectors and return
> > >>
> > >> I guess technically it is possible (transform each block to a
> > >> SparseRowMatrix or SparseMatrix with only the first row valid) and
> > >> then invoke colSums() or colMeans() (whatever "aggregate" means
> > >> here).
> > >>
> > >> However, I am not sure it is worth the ugliness. Isn't it easier to
> > >> declare these things quasi-algebraic and just do direct Spark calls
> > >> on the matrix RDD (map, aggregate)?
> > >>
> > >> The real danger is that introducing non-algebra things into the
> > >> algebra will stop the rest of the algebra from optimizing.
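[Aside: the numbered steps in Gokhan's message above can be sketched as
follows. This is a toy plain-Python illustration of the Zinkevich-style
split/shuffle/local-SGD/average scheme for regularized least squares, not
the actual Mahout patch; the names `sgd_partition` and `parallel_sgd`, the
round-robin split, and the learning-rate choice are all assumptions made
for the example.]

```python
import random

def sgd_partition(examples, dim, lr=0.01, reg=0.0, seed=0):
    """Step 2: shuffle one partition and run one plain SGD pass over it."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)                       # 2.0 shuffle the partition
    w = [0.0] * dim                             # 2.1 init parameter vector
    for x, y in shuffled:                       # 2.2 one pass over examples
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        # 2.2.1 regularized least-squares update
        w = [wi - lr * (err * xi + reg * wi) for wi, xi in zip(w, x)]
    return w

def parallel_sgd(data, dim, num_partitions, **kw):
    # 1. split the data into partitions (here: a simple round-robin split)
    parts = [data[i::num_partitions] for i in range(num_partitions)]
    # 2. run SGD independently on each partition (parallel in practice)
    ws = [sgd_partition(p, dim, seed=i, **kw) for i, p in enumerate(parts)]
    # 3. aggregate: average the per-partition parameter vectors
    return [sum(col) / len(ws) for col in zip(*ws)]

def mse(w, data):
    """Mean squared error of w over the data, for a quick sanity check."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in data) / len(data)

# Toy data from y = 2*x with no noise; features are [x, 1] for a bias term.
data = [([i / 100.0, 1.0], 2 * i / 100.0) for i in range(100)]
w = parallel_sgd(data, dim=2, num_partitions=4, lr=0.1)
```

Step 3 here is exactly the colMeans()-style aggregation discussed in the
thread: the averaged vector should fit the data better than the zero
initialization, e.g. `mse(w, data) < mse([0.0, 0.0], data)`.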
