Re: I keep getting multiple values for unique reduce keys

Sudharsan Sampath Mon, 05 Sep 2011 22:25:50 -0700

Hi Rick,

If possible, can you share the custom Writable that's configured as the
value type for the reducer?
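
In case it helps while you dig that out, here is a minimal sketch of the
reuse problem in plain Java, with no Hadoop on the classpath. RowContainer
and its fields are made-up names, not your actual class, and it only mirrors
the write()/readFields() pair from the Writable interface rather than
implementing it. The part to look at is the pair of clear() calls at the top
of readFields(): as far as I can tell the reducer side reuses one value
instance across records, so without them each read appends to whatever the
previous record left behind.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the custom value type. A real one would
// implement org.apache.hadoop.io.Writable, which declares write/readFields.
class RowContainer {
    final List<String> names = new ArrayList<String>();
    final List<String> types = new ArrayList<String>();

    void add(String name, String type) {
        names.add(name);
        types.add(type);
    }

    public void write(DataOutput out) throws IOException {
        // Length prefix, then the name/type pairs.
        out.writeInt(names.size());
        for (int i = 0; i < names.size(); i++) {
            out.writeUTF(names.get(i));
            out.writeUTF(types.get(i));
        }
    }

    public void readFields(DataInput in) throws IOException {
        // Reset state first: this method may be called many times on the
        // same instance, once per record being deserialized.
        names.clear();
        types.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            names.add(in.readUTF());
            types.add(in.readUTF());
        }
    }

    // Helper (assumption, for the demo only): serialize to a byte array.
    static byte[] toBytes(RowContainer c) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            c.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Helper (assumption, for the demo only): deserialize INTO an existing
    // instance, which imitates the framework reusing one value object.
    static void fromBytes(RowContainer target, byte[] bytes) {
        try {
            target.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        RowContainer first = new RowContainer();
        first.add("Cream", "Group");
        RowContainer second = new RowContainer();
        second.add("Led Zeppelin", "Group");

        RowContainer reused = new RowContainer();
        fromBytes(reused, toBytes(first));
        fromBytes(reused, toBytes(second));
        System.out.println(reused.names); // prints [Led Zeppelin]
    }
}
```

If you comment out the two clear() calls, the same main() leaves both Cream
and Led Zeppelin in the reused container, which looks a lot like what you
are seeing in your second reduce() call.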


Thanks
Sudhan S

On Tue, Sep 6, 2011 at 10:41 AM, Rick Ross <r...@semanticresearch.com> wrote:

> I'm still poking around on this and I was wondering if there is a way to
> see the intermediate files that the mapper writes and the ones that the
> reducer reads.    I might get some clues in there.
>
> Thanks
>
> R
>
> On Sep 4, 2011, at 10:14 PM, Rick Ross wrote:
>
> Thanks, but unless I misread you, that didn't do it.     Naturally the
> object that I am creating just has a couple of ArrayLists to gather up Name
> and Type objects.
>
> I suspect I need to extend ArrayWritable instead.   I'll try that next.
>
> Cheers.
>
> R
>
> On Sep 4, 2011, at 9:37 PM, Sudharsan Sampath wrote:
>
> Hi,
>
> I suspect it's something to do with your custom Writable. Do you have a
> clear method on your container? If so, it should be called every time
> before the object is repopulated, so that it does not retain previous
> values when the object is reused during the ser-de process.
>
> Thanks
> Sudhan S
>
>
>
> On Mon, Sep 5, 2011 at 6:11 AM, Rick Ross <r...@semanticresearch.com> wrote:
>
>> Hi all,
>>
>> I have ensured that my mapper produces a unique key for every value it
>> writes and furthermore that each map() call only writes one value.    I
>> note here that the value is a custom class for which I handle the Writable
>> interface methods.
>>
>> I realize that it isn't very real world to have (well, want) no combining
>> done prior to reducing, but I'm still getting my feet wet.
>>
>> When the reducer runs, I expected to see one reduce() call for every map()
>> call, and I do.    However, the value I get is the composite of the values
>> from all the reduce() calls that came before it.
>>
>> So, for example, the mapper gets data like this :
>>
>>   ID,     Name,          Type,          Other stuff...
>>   A000,   Cream,         Group,         ...
>>   B231,   Led Zeppelin,  Group,         ...
>>   A044,   Liberace,      Individual,    ...
>>
>>
>> ID is the external key from the source data and is guaranteed to be
>> unique.
>>
>> When I map it, I create a container for the row data and output that
>> container with all the data from that row only and use the ID field as a
>> key.
>>
>> Since the key is always unique I expected the sort/shuffle step to never
>> coalesce any two values.    So I expected my reduce() method to be called
>> once per mapped input row, and it is.
>>
>> The problem is, as each row is processed, the reducer sees a set of
>> cumulative value data instead of a container with a row of data in it.  So
>> the 'value' parameter to reduce always has the information from previous
>> reduce steps.
>>
>> For example, given the data above :
>>
>> 1st Reducer Call :
>>   Key = A000
>>   Value =
>>       Container :
>>          (object 1) : Name = Cream, Type = Group, MBID = A000, ...
>>
>> 2nd Reducer Call :
>>   Key = B231
>>   Value =
>>       Container :
>>          (object 1) : Name = Led Zeppelin, Type = Group, MBID = B231, ...
>>          (object 2) : Name = Cream, Type = Group, MBID = A000, ...
>>
>> So the second reduce call has data in it from the first reduce call.
>> Very strange!   At a guess I would say the reducer is re-using the object
>> when it reads the objects back from the mapping step.  I dunno..
>>
>> If anyone has any ideas, I'm open to suggestions.    I'm running
>> 0.20.2-cdh3u0.
>>
>> Thanks!
>>
>> R
>>
>>
>>
>>
>
>
>
