Jake, I think you have a good idea here.

I wonder if this effect could be achieved by having each mapper simply block
until it gets the word from some coordination layer that it is time for it
to read the current state of the SGD from a shared store somewhere.
Something like ZooKeeper (ZK) would make a fine coordination layer.
Handing the state around efficiently will take some thought, though.
Putting it in HDFS is probably a bit slow.  Sending it directly to the
next worker seems fragile.  If the state is relatively small then putting
it in ZK itself might even work.
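
Here is a rough sketch of the blocking scheme, just to make the idea
concrete.  Python threads stand in for mappers, a condition variable stands
in for the coordination layer, and a plain dict stands in for the shared
store.  None of these names (TurnCoordinator, sgd_pass, run) come from
Hadoop or ZooKeeper; they are made up for illustration, and the 1-D
least-squares update is only a placeholder for the real model.

```python
import threading


class TurnCoordinator:
    """In-process stand-in for the coordination layer.

    Hypothetical helper: this is NOT a ZooKeeper API, only an illustration
    of "block until it is your turn".
    """

    def __init__(self):
        self.turn = 0
        self.cond = threading.Condition()

    def wait_for_turn(self, worker_id):
        # Block until the coordinator says this worker may proceed.
        with self.cond:
            self.cond.wait_for(lambda: self.turn == worker_id)

    def pass_turn(self):
        # Hand the turn to the next worker and wake everyone waiting.
        with self.cond:
            self.turn += 1
            self.cond.notify_all()


def sgd_pass(w, shard, lr):
    """One sequential SGD pass over a shard for the toy model y ~ w * x."""
    for x, y in shard:
        w -= lr * (w * x - y) * x
    return w


def run(shards, lr=0.05):
    store = {"w": 0.0}  # stand-in for the shared state store
    order = []          # records which worker updated the state, in order
    coord = TurnCoordinator()

    def mapper(worker_id, shard):
        coord.wait_for_turn(worker_id)       # block until signalled
        w = store["w"]                       # read current SGD state
        store["w"] = sgd_pass(w, shard, lr)  # train on the local shard
        order.append(worker_id)
        coord.pass_turn()                    # let the next worker go

    threads = [
        threading.Thread(target=mapper, args=(i, shard))
        for i, shard in enumerate(shards)
    ]
    # Start in reverse to show that ordering comes from the coordinator,
    # not from thread start order.
    for t in reversed(threads):
        t.start()
    for t in threads:
        t.join()
    return store["w"], order
```

Because only the worker holding the turn touches the store, the result is
the same as a single sequential pass over all shards in order.  In a real
job the dict would be replaced by whatever shared store turns out to be
fast enough, which is the open question above.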


On Thu, Jan 28, 2010 at 3:17 PM, Jake Mannix <[email protected]> wrote:

> Hadoop isn't doing real parallelism via this approach, but is sending
> your process to where your data is, which is a lot better than opening
> up a hook into one big HDFS stream and slurping down the entire set
> locally, I'd imagine, given that he says that network latency is the
> bottleneck when he streams data.
>



-- 
Ted Dunning, CTO
DeepDyve
