Re: I keep getting multiple values for unique reduce keys

Sudharsan Sampath Sun, 04 Sep 2011 21:37:59 -0700

Hi,

I suspect it has something to do with your custom Writable. Hadoop reuses
the same value object while deserializing records, so anything that
readFields() does not overwrite carries over from the previous record. Does
your container have a clear method? If so, call it at the start of
readFields(), before the fields are populated, so that earlier values are
not retained.
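
A minimal sketch of what I mean, assuming your container holds a list of
rows. RowContainer, its fields and the copyInto helper are made-up names for
illustration, not your actual code. To keep the sketch runnable without
Hadoop on the classpath it only uses java.io; in a real job the class would
declare "implements org.apache.hadoop.io.Writable", which requires the same
two methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical container class; in a real job it would also declare
// "implements org.apache.hadoop.io.Writable".
class RowContainer {
    private final List<String> rows = new ArrayList<String>();

    void add(String row) { rows.add(row); }
    List<String> rows() { return rows; }
    void clear() { rows.clear(); }

    public void write(DataOutput out) throws IOException {
        out.writeInt(rows.size());
        for (String row : rows) {
            out.writeUTF(row);
        }
    }

    public void readFields(DataInput in) throws IOException {
        // The framework may hand us an instance that still holds the
        // previous record, so reset the state before reading new data.
        clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            rows.add(in.readUTF());
        }
    }

    // Illustration-only helper: serializes src and deserializes it into an
    // already-used instance, imitating the object reuse described above.
    static RowContainer copyInto(RowContainer src, RowContainer reused) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            src.write(new DataOutputStream(buffer));
            reused.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return reused;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the clear() call in readFields(), reading a second record into the
same instance would append to the rows left over from the first one, which
matches the accumulation you are seeing.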


Thanks
Sudhan S



On Mon, Sep 5, 2011 at 6:11 AM, Rick Ross <r...@semanticresearch.com> wrote:

> Hi all,
>
> I have ensured that my mapper produces a unique key for every value it
> writes, and furthermore that each map() call writes only one value. Note
> that the value is a custom class for which I implement the Writable
> interface methods.
>
> I realize that it isn't very real world to have (well, want) no combining
> done prior to reducing, but I'm still getting my feet wet.
>
> When the reducer runs, I expected to see one reduce() call for every map()
> call, and I do. However, the value I get is the composite of the values
> from all the reduce() calls that came before it.
>
> So, for example, the mapper gets data like this :
>
>   ID,     Name,          Type,          Other stuff...
>   A000,   Cream,         Group,         ...
>   B231,   Led Zeppelin,  Group,         ...
>   A044,   Liberace,      Individual,    ...
>
>
> ID is the external key from the source data and is guaranteed to be unique.
>
> When I map it, I create a container for the row data and output that
> container with all the data from that row only and use the ID field as a
> key.
>
> Since the key is always unique I expected the sort/shuffle step to never
> coalesce any two values.    So I expected my reduce() method to be called
> once per mapped input row, and it is.
>
> The problem is, as each row is processed, the reducer sees a set of
> cumulative value data instead of a container with a row of data in it.  So
> the 'value' parameter to reduce always has the information from previous
> reduce steps.
>
> For example, given the data above :
>
> 1st Reducer Call :
>   Key = A000
>   Value =
>       Container :
>          (object 1) : Name = Cream, Type = Group, MBID = A000, ...
>
> 2nd Reducer Call :
>   Key = B231
>   Value =
>       Container :
>          (object 1) : Name = Led Zeppelin, Type = Group, MBID = B231, ...
>          (object 2) : Name = Cream, Type = Group, MBID = A000, ...
>
> So the second reduce call has data in it from the first reduce call.   Very
> strange!   At a guess I would say the reducer is re-using the object when it
> reads the objects back from the mapping step.  I dunno..
>
> If anyone has any ideas, I'm open to suggestions. I'm running Hadoop
> 0.20.2-cdh3u0.
>
> Thanks!
>
> R
>
>
>
>
