> Left unchanged would be fine but is probably very hard to enforce
> because of the many map tasks and some uncertainty about which maps
> finished first. Similarly useful would be the ability to require a
> particular sort ordering on reduce values.

Yes, there is often ambiguity, but often there is not. At the end of the
day, as long as some consistent (if arbitrary) decision is made about the
relative initial order of any two rows, it seems this would work. Is there
any reason why a consistent decision about the order couldn't be made? For
the use case I'm considering, I would have all the marginals in one file
that would be prepended to the file(s) containing the individual counts, so
there would be no ambiguity, since they would come from two different
map/reduce jobs.
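For what it's worth, the ordering I'm after can be arranged today without any framework change by folding the desired order into the sort key itself, e.g. a composite (word, tag) key where marginals get tag 0 and individual counts tag 1, with a partitioner and grouping comparator that look at the word alone so both record types reach the same reduce call. A minimal sketch of the comparator logic in plain Java (no Hadoop classes; all names here are invented for illustration):

```java
import java.util.*;

// Illustrative sketch only -- plain Java, no Hadoop dependencies. In a
// real job the same ordering would be expressed as a custom sort
// comparator on a composite key, plus a partitioner/grouping comparator
// keyed on the word alone.
public class CompositeKeyDemo {
    // Hypothetical composite key: tag 0 = marginal, tag 1 = individual count.
    public record TaggedKey(String word, int tag, String payload) {}

    // Sort by (word, tag): each word's marginal sorts ahead of its counts.
    public static List<TaggedKey> sortByWordThenTag(List<TaggedKey> recs) {
        List<TaggedKey> out = new ArrayList<>(recs);
        out.sort(Comparator.comparing(TaggedKey::word)
                           .thenComparingInt(TaggedKey::tag));
        return out;
    }

    public static void main(String[] args) {
        List<TaggedKey> shuffled = List.of(
            new TaggedKey("the", 1, "count=3"),
            new TaggedKey("a",   1, "count=2"),
            new TaggedKey("the", 0, "marginal=10"),
            new TaggedKey("a",   0, "marginal=5"));
        for (TaggedKey k : sortByWordThenTag(shuffled))
            System.out.println(k.word() + "\t" + k.payload());
    }
}
```

With grouping done on the word alone, the reducer would then see the marginal as the first value for each word, which is exactly the property I need.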
Another option would be the ability to rewind the value iterator to the
start of the key whose values were being iterated over. Looking at the
existing iterator implementation makes it clear that this would be a
non-trivial change, but I can't think of any real reason why it couldn't
be done. However, I'm not sure whether this sort of operation is generally
useful enough to warrant its inclusion.

So, has anyone found a way to make this work?

Chris

On 10/1/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> This is a common requirement.
>
> Left unchanged would be fine but is probably very hard to enforce
> because of the many map tasks and some uncertainty about which maps
> finished first. Similarly useful would be the ability to require a
> particular sort ordering on reduce values.
>
>
> On 10/1/07 6:05 PM, "Chris Dyer" <[EMAIL PROTECTED]> wrote:
>
> > Does anyone know if Hadoop guarantees (can be made to guarantee) that
> > the relative order of keys that are equal will be left unchanged?
> >