> Left unchanged would be fine but is probably very hard to enforce
> because of the many map tasks and some uncertainty about which maps
> finished first. Similarly useful would be the ability to require a
> particular sort ordering on reduce values.

Yes, there is often ambiguity, but often there is not. At the end of the
day, as long as some consistent (if arbitrary) decision is made about the
relative initial order of any two rows, it seems this would work. Is there
any reason why a consistent decision about the order couldn't be made? For
the use case I'm considering, I would have all the marginals in one file
that would be prepended to the file(s) containing the individual counts, so
there would be no ambiguity, since they would come from two different
map/reduce jobs.
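For what it's worth, the ordering I'm after can be arranged today without any framework change by folding the desired order into the sort key itself, e.g. a composite (word, tag) key where marginals get tag 0 and individual counts tag 1, with a partitioner and grouping comparator that look at the word alone so both record types reach the same reduce call. A minimal sketch of the comparator logic in plain Java (no Hadoop classes; all names here are invented for illustration):

```java
import java.util.*;

// Illustrative sketch only -- plain Java, no Hadoop dependencies. In a
// real job the same ordering would be expressed as a custom sort
// comparator on a composite key, plus a partitioner/grouping comparator
// keyed on the word alone.
public class CompositeKeyDemo {
    // Hypothetical composite key: tag 0 = marginal, tag 1 = individual count.
    public record TaggedKey(String word, int tag, String payload) {}

    // Sort by (word, tag): each word's marginal sorts ahead of its counts.
    public static List<TaggedKey> sortByWordThenTag(List<TaggedKey> recs) {
        List<TaggedKey> out = new ArrayList<>(recs);
        out.sort(Comparator.comparing(TaggedKey::word)
                           .thenComparingInt(TaggedKey::tag));
        return out;
    }

    public static void main(String[] args) {
        List<TaggedKey> shuffled = List.of(
            new TaggedKey("the", 1, "count=3"),
            new TaggedKey("a",   1, "count=2"),
            new TaggedKey("the", 0, "marginal=10"),
            new TaggedKey("a",   0, "marginal=5"));
        for (TaggedKey k : sortByWordThenTag(shuffled))
            System.out.println(k.word() + "\t" + k.payload());
    }
}
```

With grouping done on the word alone, the reducer would then see the marginal as the first value for each word, which is exactly the property I need.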
Another option would be the ability to rewind the value iterator to the
start of the key whose values were being iterated over. Looking at the
existing iterator implementation makes it clear that this would be a
non-trivial change, but I can't think of any real reason why it couldn't
be done. However, I'm not sure whether this sort of operation is generally
useful enough to warrant its inclusion.

So, has anyone found a way to make this work?

Chris

On 10/1/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> This is a common requirement.
>
> Left unchanged would be fine but is probably very hard to enforce
> because of the many map tasks and some uncertainty about which maps
> finished first. Similarly useful would be the ability to require a
> particular sort ordering on reduce values.
>
>
> On 10/1/07 6:05 PM, "Chris Dyer" <[EMAIL PROTECTED]> wrote:
>
> > Does anyone know if Hadoop guarantees (can be made to guarantee) that
> > the relative order of keys that are equal will be left unchanged?
> >