On Sun, Jul 13, 2014 at 4:22 PM, Ted Dunning <[email protected]> wrote:
> I have a program that I am trying to build that has this pattern: > > broadcast state to all blocks > block map to do a bit of computation, create local state > merge all of the local states back to the global state > repeat > > What is the suggestion for merging the local state back to the global > state? > I don't think there is an operator designed to do this. One way you could do this, is it run a "thin" (ncol=1) mapBlock() output followed by a collect(), hoping the result fits in memory, and do the reduction in-core. I think some kind of a reduce operator needs to be introduced for doing even simple things like scalable kmeans. Haven't thought of how it would look yet.
