Well, in that specific case, I will accumulate on the client side; the
collection of intermediate parameter vectors is not that big (numBlocks x
X.ncol). What I need is just a way to map (keys, block) to a vector (currently,
mapBlock has to map a block to a new block).
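
To make this concrete, here is a minimal sketch in plain Python (not the
Mahout DSL) of the block-to-vector mapping followed by client-side averaging.
The names sgd_on_block and parallel_sgd, the step size, and the regularization
parameter lam are hypothetical and only for illustration; the list
comprehension over blocks stands in for the mapBlock-like operator that does
not exist yet.

```python
import random


def sgd_on_block(block, n, step=0.01, lam=0.0, seed=0):
    """Map one block of (x, y) examples to a parameter vector of size n.

    Hypothetical helper: one SGD pass for L2-regularized least squares.
    """
    rng = random.Random(seed)
    examples = list(block)
    rng.shuffle(examples)          # 2.0 shuffle the partition
    w = [0.0] * n                  # 2.1 initialize the parameter vector
    for x, y in examples:          # 2.2 one update per example
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        # Gradient of 0.5 * err^2 + 0.5 * lam * ||w||^2 with respect to w
        w = [wi - step * (err * xi + lam * wi) for wi, xi in zip(w, x)]
    return w


def parallel_sgd(blocks, n, **kwargs):
    """Run SGD on each block, then average the vectors on the client (step 3).

    The blocks are processed sequentially here; in a distributed setting
    each call to sgd_on_block would run next to its block.
    """
    vectors = [sgd_on_block(b, n, seed=i, **kwargs) for i, b in enumerate(blocks)]
    return [sum(col) / len(vectors) for col in zip(*vectors)]
```

Only numBlocks vectors of size n ever reach the client, which is why
accumulating there is cheap in this case.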

From a general perspective, you are right, this is an accumulation.

Gokhan

On Mon, Nov 10, 2014 at 8:26 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Do you need a reduce or could you use an accumulator? Neither is really
> supported in the DSL, but clearly these are required for certain algos.
> Broadcast vals are supported but are read-only.
>
> On Nov 8, 2014, at 12:42 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>
> Hi,
>
> Based on Zinkevich et al.'s Parallelized Stochastic Gradient paper (
> http://martin.zinkevich.org/publications/nips2010.pdf), I tried to
> implement SGD, and a regularized least squares solution for linear
> regression (can easily be extended to other GLMs, too).
>
> How the algorithm works is as follows:
> 1. Split data into partitions of T examples
> 2. in parallel, for each partition:
>   2.0. shuffle partition
>   2.1. initialize parameter vector
>   2.2. for each example in the shuffled partition
>       2.2.1 update the parameter vector
> 3. Aggregate all the parameter vectors and return
>
> Here is an initial implementation to illustrate where I am stuck:
> https://github.com/gcapan/mahout/compare/optimization
>
> (See TODO in SGD.minimizeWithSgd[K])
>
> I was thinking that, using a blockified matrix of training instances, step 2
> of the algorithm can run on blocks, and the results can be aggregated on the
> client side. However, the only operator that I know of in the DSL is mapBlock,
> and it requires the BlockMapFunction to map a block to another block with the
> same number of rows. In this context, I want to map a block (numRows x n) to
> the parameter vector of size n.
>
> The question is:
> 1- Is it possible to easily implement the above algorithm using DSL's
> current functionality? Could you tell me what I'm missing?
> 2- If there is not an easy way other than using the currently-non-existing
> mapBlock-like method, shall we add such an operator?
>
> Best,
>
> Gokhan
>
>
