Thanks for the helpful replies on this. The data I'm dealing with is such that I may be unable (or unwilling) to load the entire set of counts for <A, *> into memory for some values of A (the curse of Zipfian distributions), so the final "join" step of the process is the tricky part.
Right now I'm still having trouble working out how to force the first element of the set iterated over by a single reducer to be the marginal rather than some individual count. Does anyone know whether Hadoop guarantees (or can be made to guarantee) that the relative order of equal keys is preserved? If so, this would be a fairly easy solution. Thank you!

Chris
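P.S. In case it helps clarify what I'm after, here's a rough, untested sketch (class names made up) of the composite-key setup I've been imagining: emit keys of the form "A\tB", with B = "*" for the marginal, then partition and group on A alone while letting the full composite key drive the sort. That would sidestep the stability question entirely, since the marginal arrives first by construction rather than by luck of the shuffle, and the reducer only needs a single running marginal in memory, never the whole set of counts for <A, *>.

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;

// Partition on the left word A only, so every <A, *> and <A, B>
// pair lands on the same reducer regardless of B.
// (Wired in via job.setPartitionerClass(LeftWordPartitioner.class).)
class LeftWordPartitioner extends Partitioner<Text, LongWritable> {
    @Override
    public int getPartition(Text key, LongWritable value, int numPartitions) {
        String left = key.toString().split("\t", 2)[0];
        return (left.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Group on the left word A only, so a single reduce() call sees the
// marginal <A, *> followed by every <A, B>; the sort still runs on the
// whole composite key, which is what puts "*" first within the group.
// (Wired in via job.setGroupingComparatorClass(...).)
class LeftWordGroupingComparator extends WritableComparator {
    protected LeftWordGroupingComparator() {
        super(Text.class, true);
    }
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        String leftA = a.toString().split("\t", 2)[0];
        String leftB = b.toString().split("\t", 2)[0];
        return leftA.compareTo(leftB);
    }
}

// The reducer accumulates the marginal from the leading <A, *> values,
// then emits relative frequencies for each <A, B>.
class RelativeFrequencyReducer
        extends Reducer<Text, LongWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long marginal = 0;
        for (LongWritable count : values) {
            // NB: within a group, Hadoop updates `key` in place as the
            // value iterator advances, so it reflects the current pair.
            String right = key.toString().split("\t", 2)[1];
            if ("*".equals(right)) {
                // All partial marginal sums sort ahead of the real pairs.
                marginal += count.get();
            } else {
                // Assumes per-pair counts were already summed by a combiner.
                context.write(key, new DoubleWritable((double) count.get() / marginal));
            }
        }
    }
}

The one ordering assumption here is that "*" sorts before the real right-hand tokens under Text's default byte comparison (true for typical alphanumeric vocabularies); if it doesn't for your data, a custom sort comparator on the composite key would fix that as well.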