Hi, I suspect it's something to do with your custom Writable. Do you have a clear() method on your container? If so, it should be called at the start of readFields() every time, before the fields are populated. Otherwise the container retains values from the previous record, because the same object is reused during the ser-de process.
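Here is a rough sketch of what I mean. I haven't seen your class, so RowContainer, its names field and the ReuseDemo helpers are all made up for illustration. To keep it runnable without Hadoop on the classpath it does not implement org.apache.hadoop.io.Writable, but write() and readFields() have the same signatures as that interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical container. In a real job it would also declare
// "implements org.apache.hadoop.io.Writable".
class RowContainer {
    final List<String> names = new ArrayList<String>();

    void clear() {
        names.clear();
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(names.size());
        for (String name : names) {
            out.writeUTF(name);
        }
    }

    public void readFields(DataInput in) throws IOException {
        clear(); // discard whatever the previous record left behind
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            names.add(in.readUTF());
        }
    }
}

// Hypothetical helpers that mimic records being read into one reused instance.
class ReuseDemo {
    static byte[] toBytes(RowContainer c) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            c.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void readInto(RowContainer target, byte[] data) {
        try {
            target.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If you serialize a container holding "Cream" and another holding "Led Zeppelin" and call ReuseDemo.readInto() on the same instance for both, the instance ends up holding only "Led Zeppelin". Remove the clear() call and it holds both, which looks like the symptom you describe.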
Thanks
Sudhan S

On Mon, Sep 5, 2011 at 6:11 AM, Rick Ross <r...@semanticresearch.com> wrote:
> Hi all,
>
> I have ensured that my mapper produces a unique key for every value it
> writes and further more that each map() call only writes one value. I
> note here that the value is a custom for which I handle the Writable
> interface methods.
>
> I realize that it isn't very real world to have (well, want) no combining
> done prior to reducing, but I'm still getting my feet wet.
>
> When the reducer runs, I expected to see one reduce() call for every map()
> call, and I do. However, the value I get is the composite of all the
> reduce() calls that came before it.
>
> So, for example, the mapper gets data like this :
>
> ID, Name, Type, Other stuff...
> A000, Cream, Group, ...
> B231, Led Zeppelin, Group, ...
> A044, Liberace, Individual, ...
>
> ID is the external key from the source data and is guaranteed to be unique.
>
> When I map it, I create a container for the row data and output that
> container with all the data from that row only and use the ID field as a
> key.
>
> Since the key is always unique I expected the sort/shuffle step to never
> coalesce any two values. So I expected my reduce() method to be called
> once per mapped input row, and it is.
>
> The problem is, as each row is processed, the reducer sees a set of
> cumulative value data instead of a container with a row of data in it. So
> the 'value' parameter to reduce always has the information from previous
> reduce steps.
>
> For example, given the data above :
>
> 1st Reducer Call :
>   Key = A000
>   Value =
>     Container :
>       (object 1) : Name = Cream, Type = Group, MBID = A000, ...
>
> 2nd Reducer Call :
>   Key = B231
>   Value =
>     Container :
>       (object 1) : Name = Led Zeppelin, Type = Group, MBID = B231, ...
>       (object 2) : Name = Cream, Type = Group, MBID = A000, ...
>
> So the second reduce call has data in it from the first reduce call. Very
> strange! At a guess I would say the reducer is re-using the object when it
> reads the objects back from the mapping step. I dunno..
>
> If anyone has any ideas, I'm open to suggestions.
>
> 0.20.2-cdh3u0
>
> Thanks!
>
> R