On Wed, Nov 12, 2014 at 1:27 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
> My only concern is to add certain loss minimization tools for people to
> write machine learning algorithms.
>
> mapBlock as you suggested can work equally well, but I happened to have
> implemented the aggregate op while thinking.
>
> Apart from this SGD implementation,
> blockify-a-matrix-and-run-an-operation-in-parallel-on-blocks is, I
> believe, certainly required, since block-level parallelization is really
> common in matrix computations. Plus, if we are to add, say, a descriptive
> statistics package, that would require similar functionality, too.
>
> If mapBlock for passing custom operators were more flexible, I'd be more
> than happy, but I understand the idea behind its requirement that the
> mapping be block-to-block with the same row size.
>
> Could you give a little more detail on the 'common distributed strategy'
> idea?
>
> Aside: Do we have certain elementwise math functions in the matrix DSL?
> That is, how can I do this?
>
> 1 + exp(drmA)

If I understand it correctly, you mean you want to compute
a_ij <- 1 + exp(a_ij)?

You can do simple things like 1 + drmA; that works. To apply more
interesting functions elementwise, there is no concise DSL notation, but of
course you can always use mapBlock, such as:

    drmA.mapBlock() {
      case (keys, block) =>
        block := ((r, c, x) => 1 + exp(x))
        keys -> block
    }

I guess you could indeed figure out a DSL shortcut for that, as in R, by
defining a package-level exp(DrmLike[K]) that expands to this fragment. But
then you'd need to take on the entire set of synonyms for the scala.math
functions. Not an entirely unreasonable idea IMO, although a single
mapBlock will probably be faster, because it will not be a chain of
composed closures.

> Gokhan
>
> On Wed, Nov 12, 2014 at 7:55 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
> > Yes, I usually follow #2 too.
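[Editor's aside: the elementwise mapBlock pattern above can be sketched outside of Mahout as well. Below is a minimal, hypothetical Python stand-in in which a "block" is just a list of rows; none of the names are Mahout API.]

```python
import math

def map_block_elementwise(block, f):
    """Apply a scalar function f elementwise, producing a block with the
    same row count -- the block-to-block contract mapBlock requires."""
    return [[f(x) for x in row] for row in block]

# Two "partitions" of a blockified one-row-per-block matrix.
blocks = [[[0.0, 1.0]], [[2.0]]]

# The analogue of 1 + exp(drmA): apply 1 + exp(x) to every entry per block.
result = [map_block_elementwise(b, lambda x: 1 + math.exp(x)) for b in blocks]
```

As in the DSL discussion, applying the whole composed function in one block-level pass avoids chaining several elementwise closures.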
> >
> > The thing is, pretty often an algorithm can define its own set of
> > strategies the backends need to support (like this distributedEngine
> > strategy) and still keep a lot of logic common across all strategies.
> > But then, if an all-reduce aggregate operation is incredibly common
> > among such algorithm-specific strategies, it stands to reason to
> > implement it only once.
> >
> > I have an idea.
> >
> > Maybe we need a common distributed strategy which is different from the
> > algebraic optimizer. That way we don't have to mess with algebraic
> > rewrites. How about that?
> >
> > On Wed, Nov 12, 2014 at 9:12 AM, Pat Ferrel <p...@occamsmachete.com>
> > wrote:
> >
> > > So you are following #2, which is good. #1 seems a bit like a hack.
> > > For a long time to come we will have to add things to the DSL if it
> > > is to be kept engine-independent. Yours looks pretty general and
> > > simple.
> > >
> > > Are you familiar with the existing Mahout aggregate methods? They
> > > show up in SGDHelper.java and other places in legacy code. I don't
> > > know much about them, but they seem to be a pre-functional-programming
> > > attempt at this kind of thing. It looks like you are proposing a
> > > replacement for those based on rdd.aggregate; if so, that would be
> > > very useful. For one thing, it looks like the old aggregate was not
> > > parallel, while rdd.aggregate is.
> > >
> > > On Nov 11, 2014, at 1:18 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
> > >
> > > So the alternatives are:
> > >
> > > 1- mapBlock to a matrix whose rows, all but the first, are empty,
> > >    then aggregate
> > > 2- depend on a backend
> > >
> > > 1 is obviously OK.
> > >
> > > I don't like the idea of depending on a backend, since SGD is a
> > > generic loss minimization routine on which other algorithms will
> > > possibly depend.
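[Editor's aside: to make the all-reduce aggregate idea above concrete, here is a minimal Python sketch shaped like Spark's rdd.aggregate(zeroValue, seqOp, combOp) contract. The function and variable names are illustrative only, not Mahout's or Spark's actual API.]

```python
from functools import reduce

def aggregate_blocks(blocks, zero, seq_op, comb_op):
    # seq_op folds rows into a per-block accumulator -- this is the part
    # each backend would run in parallel, one accumulator per block.
    per_block = [reduce(seq_op, block, zero) for block in blocks]
    # comb_op then all-reduces the per-block accumulators into one value.
    return reduce(comb_op, per_block, zero)

# Example: column sums over a blockified 2-column matrix.
blocks = [[[1.0, 2.0], [3.0, 4.0]],
          [[5.0, 6.0]]]
add = lambda acc, row: [a + b for a, b in zip(acc, row)]
col_sums = aggregate_blocks(blocks, [0.0, 0.0], add, add)  # [9.0, 12.0]
```

The point of the shape is that the operation returns an accumulated value rather than a matrix, so it sits beside the algebra instead of inside the optimizer's rewrite rules.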
> > >
> > > In this context, client-side aggregation is not an overhead; but even
> > > if it happens to be so, it doesn't have to be a client-side aggregate
> > > at all.
> > >
> > > As an alternative to 1, I am thinking of at least having an
> > > aggregation operation, which will return an accumulated value anyway
> > > and shouldn't affect algebra optimizations.
> > >
> > > I quickly implemented a naive one (supporting only Spark -- I know I
> > > said that I don't like depending on a backend, but at least the
> > > backends-wide interface is consistent, and as a client, I still don't
> > > have to deal with Spark primitives directly).
> > >
> > > Is this nice enough? Is it too bad to have in the DSL?
> > > https://github.com/gcapan/mahout/compare/accumulateblocks
> > >
> > > Best
> > >
> > > Gokhan
> > >
> > > On Tue, Nov 11, 2014 at 10:45 PM, Dmitriy Lyubimov
> > > <dlie...@gmail.com> wrote:
> > >
> > > > Oh. The algorithm actually collects the vectors and runs another
> > > > cycle in the client!
> > > >
> > > > Still, technically, you can collect almost-empty blocks to the
> > > > client (since they are mostly empty, it won't cause THAT huge an
> > > > overhead compared to collecting single vectors; after all, how many
> > > > partitions are we talking about? 1000?).
> > > >
> > > > On Tue, Nov 11, 2014 at 12:41 PM, Dmitriy Lyubimov
> > > > <dlie...@gmail.com> wrote:
> > > >
> > > >> On Sat, Nov 8, 2014 at 12:42 PM, Gokhan Capan <gkhn...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> Based on Zinkevich et al.'s Parallelized Stochastic Gradient
> > > >>> paper (http://martin.zinkevich.org/publications/nips2010.pdf),
> > > >>> I tried to implement SGD, and a regularized least squares
> > > >>> solution for linear regression (which can easily be extended to
> > > >>> other GLMs, too).
> > > >>>
> > > >>> How the algorithm works is as follows:
> > > >>> 1. Split the data into partitions of T examples
> > > >>> 2.
> > > >>>    In parallel, for each partition:
> > > >>>    2.0. shuffle the partition
> > > >>>    2.1. initialize the parameter vector
> > > >>>    2.2. for each example in the shuffled partition:
> > > >>>         2.2.1. update the parameter vector
> > > >>> 3. Aggregate all the parameter vectors and return
> > > >>
> > > >> I guess technically it is possible (transform each block to a
> > > >> SparseRowMatrix or SparseMatrix with only the first row valid) and
> > > >> then invoke colSums() or colMeans() (whatever "aggregate" means).
> > > >>
> > > >> However, I am not sure it is worth the ugliness. Isn't it easier
> > > >> to declare these things quasi-algebraic and just make direct Spark
> > > >> calls on the matrix RDD (map, aggregate)?
> > > >>
> > > >> The real danger is introducing non-algebra things into algebra, so
> > > >> that the rest of the algebra no longer optimizes.
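[Editor's aside: the numbered steps above can be sketched in a few lines. This is a minimal single-process Python illustration of the Zinkevich-style scheme for 1-D least squares (y ~ w * x) on noiseless data; "in parallel" is simulated by a sequential loop, and the learning rate and epoch count are demo choices (the paper itself makes a single pass per partition).]

```python
import random

def sgd_partition(examples, lr=0.05, epochs=20, seed=0):
    examples = list(examples)
    random.Random(seed).shuffle(examples)   # 2.0 shuffle the partition
    w = 0.0                                 # 2.1 initialize parameter
    for _ in range(epochs):
        for x, y in examples:               # 2.2 pass over examples
            w -= lr * (w * x - y) * x       # 2.2.1 squared-loss gradient step
    return w

def parallel_sgd(data, num_partitions=4):
    n = len(data) // num_partitions
    parts = [data[i * n:(i + 1) * n] for i in range(num_partitions)]  # 1. split
    ws = [sgd_partition(p, seed=i) for i, p in enumerate(parts)]      # 2. per partition
    return sum(ws) / len(ws)                                          # 3. aggregate (average)

data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0] * 10]  # y = 2x exactly
w = parallel_sgd(data)  # recovers w close to 2.0
```

Step 3 is exactly the all-reduce discussed in this thread: each partition contributes one parameter vector, and only the small averaged result ever reaches the client.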