K-means needs a Matrix-oriented reducer.

K-sparse encoding needs a Matrix as well.

Micro-batch SGD needs a Vector-oriented reducer.

Streaming k-means might be able to make do with a Matrix, but a structure with
a scalar and a Matrix would be handier.

Also, in all the algorithms I have looked at, the reduce immediately follows a
mapBlock call.  I'm not sure whether that makes fusion worthwhile on its own;
it probably just means that mapBlock-then-reduce could be an important
optimizer idiom if fusion turns out to be helpful.
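To make that idiom concrete, here is a minimal plain-Scala sketch of the mapBlock-then-reduce pattern. The block and vector types are stand-ins (plain Seq and Array, not Mahout's DRM or Vector types), and the names are illustrative assumptions; the only thing it really shows is that per-block partial statistics combine under an element-wise (Vector, Vector) => Vector reduce.

```scala
// Hypothetical sketch of the mapBlock-then-reduce idiom in plain Scala.
// Blocks stand in for DRM row blocks; vectors are plain Arrays of doubles.
object MapBlockReduceSketch {
  type Vec = Array[Double]

  // Element-wise sum, matching the shape of the proposed
  // ReduceFunc = (Vector, Vector) => Vector.
  def vecSum(a: Vec, b: Vec): Vec = a.zip(b).map { case (x, y) => x + y }

  // mapBlock analogue: each block of rows is mapped to one partial-statistics vector.
  def mapBlocks(blocks: Seq[Seq[Vec]])(f: Seq[Vec] => Vec): Seq[Vec] = blocks.map(f)

  def main(args: Array[String]): Unit = {
    val blocks = Seq(
      Seq(Array(1.0, 2.0), Array(3.0, 4.0)),
      Seq(Array(5.0, 6.0))
    )
    // Per-block partial column sums (what a mapBlock pass would emit)...
    val partials = mapBlocks(blocks)(block => block.reduce(vecSum))
    // ...then the reduce collapses the partials into one vector.
    val total = partials.reduce(vecSum)
    println(total.mkString(","))   // 9.0,12.0
  }
}
```

Fusing the two steps would just mean running the per-block map and the pairwise combine in one pass instead of materializing `partials`.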

On Sun, Jul 13, 2014 at 10:02 PM, Anand Avati <[email protected]> wrote:

> How about a new drm API:
>
>
>   type ReduceFunc = (Vector, Vector) => Vector
>
>   def reduce(rf: ReduceFunc): Vector = { ... }
>
> The row keys in this case are ignored/erased, but I'm not sure if they are
> useful (or even meaningful) for reduction. Such an API should be sufficient
> for kmeans (in combination with mapBlock). But does this feel generic
> enough? Maybe a good start? Feedback welcome.
>
>
>
> On Sun, Jul 13, 2014 at 6:34 PM, Ted Dunning <[email protected]>
> wrote:
>
> >
> > Yeah.  Collect was where I had gotten, and was rather sulky about the
> > results.
> >
> > It does seem like a reduce is going to be necessary.
> >
> > Anybody else have thoughts on this?
> >
> > Sent from my iPhone
> >
> > > On Jul 13, 2014, at 17:58, Anand Avati <[email protected]> wrote:
> > >
> > > collect(), hoping the result fits in memory, and do the reduction
> > in-core.
> > > I think some kind of a reduce operator needs to be introduced for doing
> > > even simple things like scalable kmeans. Haven't thought of how it
> would
> > > look yet.
> >
>
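For what it's worth, here is a minimal plain-Scala sketch of how a k-means centroid update could ride on the proposed (Vector, Vector) => Vector reduce: each block emits a flat vector of per-centroid coordinate sums with a count appended per centroid, so the "scalar and a matrix" structure packs into one vector that an element-wise sum can combine. The object and method names and the flattening scheme are illustrative assumptions, not Mahout API.

```scala
// Illustrative sketch (not actual Mahout API): k-means partial statistics
// packed into a single flat vector so an element-wise
// (Vector, Vector) => Vector reduce suffices to combine blocks.
object KMeansReduceSketch {
  type Vec = Array[Double]

  // Per-centroid layout: [sum_0, ..., sum_{d-1}, count], giving k*(d+1) entries.
  def partialStats(points: Seq[Vec], centroids: Seq[Vec]): Vec = {
    val d = centroids.head.length
    val acc = Array.fill(centroids.length * (d + 1))(0.0)
    for (p <- points) {
      // Assign p to its nearest centroid by squared Euclidean distance.
      val c = centroids.indices.minBy { i =>
        centroids(i).zip(p).map { case (a, b) => (a - b) * (a - b) }.sum
      }
      val off = c * (d + 1)
      for (j <- 0 until d) acc(off + j) += p(j)
      acc(off + d) += 1.0   // the "scalar" (count) rides along inside the vector
    }
    acc
  }

  // The proposed ReduceFunc shape: element-wise sum.
  def vecSum(a: Vec, b: Vec): Vec = a.zip(b).map { case (x, y) => x + y }
}
```

After the reduce, new centroids fall out as each sum slice divided by its count, so the whole update is one mapBlock (partialStats per block) followed by one reduce (vecSum).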
