Re: Hadoop in Action Partitioner Example

2011-08-23 Thread Mehmet Tepedelenlioglu
Thanks, that is very useful to know.

On Aug 23, 2011, at 4:40 PM, Chris White wrote:

> Job.setGroupingComparatorClass lets you define a RawComparator class in
> which you compare only the K1 component of K. The reduce-side sort will
> still order all keys K using K's compareTo method, but the grouping
> comparator is used when deciding which values are passed together to a
> single reduce call.
> 
> On Tue, Aug 23, 2011 at 7:25 PM, Mehmet Tepedelenlioglu
>  wrote:
>> For those of you who have the book: on page 49 there is a custom partitioner
>> example. It describes a situation where the map emits <K,V> pairs, but
>> the key is a compound key like (K1,K2), and we want to reduce over K1s and
>> not the whole of K. This is used as an example of a situation where a
>> custom partitioner should be written to hash over K1, so that all keys
>> sharing a K1 go to the same reducer. But as far as I know, although this
>> would partition the keys correctly (send them to the correct reducers),
>> the reduce function would still be called with (grouped under) the original
>> keys K, not yielding the desired results. The only way of doing this that
>> I know of is to create a new WritableComparable that carries all of K but
>> uses only K1 in its hash/equals/compare methods, in which case you would
>> not need to write your own partitioner anyway. Am I misinterpreting what
>> the author meant, or is there something going on that I don't know about?
>> It would have been sweet if I could accomplish all that with just the
>> partitioner. Either I am misunderstanding something fundamental, I am
>> misunderstanding the example's intention, or there is something wrong
>> with the example.
>> 
>> Thanks,
>> 
>> Mehmet
>> 
>> 



Re: Hadoop in Action Partitioner Example

2011-08-23 Thread Chris White
Job.setGroupingComparatorClass lets you define a RawComparator class in
which you compare only the K1 component of K. The reduce-side sort will
still order all keys K using K's compareTo method, but the grouping
comparator is used when deciding which values are passed together to a
single reduce call.
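In code, the arrangement described above looks roughly like this (class and field names are illustrative, not from the book or the Hadoop source): the compound key's natural compareTo orders on (K1, K2), a separate comparator used for grouping compares K1 only, and the partitioner hashes K1 only. A plain-Java sketch, with the Hadoop Writable/RawComparator plumbing omitted so it runs standalone:

```java
import java.util.Comparator;

// Hypothetical compound key (K1, K2). In a real job this would implement
// WritableComparable, and the grouping comparator would extend
// WritableComparator; that serialization plumbing is omitted here.
class CompoundKey implements Comparable<CompoundKey> {
    final String k1;
    final int k2;

    CompoundKey(String k1, int k2) { this.k1 = k1; this.k2 = k2; }

    // Full sort order: K1 first, then K2. This is what the reduce-side
    // sort uses, so values within a K1 group arrive ordered by K2.
    public int compareTo(CompoundKey o) {
        int c = k1.compareTo(o.k1);
        return c != 0 ? c : Integer.compare(k2, o.k2);
    }

    // Grouping comparator: two keys belong to the same reduce call
    // whenever their K1 components match, regardless of K2.
    static final Comparator<CompoundKey> GROUP_ON_K1 =
        (a, b) -> a.k1.compareTo(b.k1);

    // Partitioner logic: hash only K1 so all keys sharing a K1
    // land on the same reducer.
    static int partition(CompoundKey key, int numReducers) {
        return (key.k1.hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}
```

With this split, each reduce call sees one K1 group whose values are sorted by K2; in the real API the pieces would be registered via job.setPartitionerClass(...) and job.setGroupingComparatorClass(...).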

On Tue, Aug 23, 2011 at 7:25 PM, Mehmet Tepedelenlioglu
 wrote:
> For those of you who have the book: on page 49 there is a custom partitioner
> example. It describes a situation where the map emits <K,V> pairs, but
> the key is a compound key like (K1,K2), and we want to reduce over K1s and
> not the whole of K. This is used as an example of a situation where a
> custom partitioner should be written to hash over K1, so that all keys
> sharing a K1 go to the same reducer. But as far as I know, although this
> would partition the keys correctly (send them to the correct reducers),
> the reduce function would still be called with (grouped under) the original
> keys K, not yielding the desired results. The only way of doing this that
> I know of is to create a new WritableComparable that carries all of K but
> uses only K1 in its hash/equals/compare methods, in which case you would
> not need to write your own partitioner anyway. Am I misinterpreting what
> the author meant, or is there something going on that I don't know about?
> It would have been sweet if I could accomplish all that with just the
> partitioner. Either I am misunderstanding something fundamental, I am
> misunderstanding the example's intention, or there is something wrong
> with the example.
>
> Thanks,
>
> Mehmet
>
>


Hadoop in Action Partitioner Example

2011-08-23 Thread Mehmet Tepedelenlioglu
For those of you who have the book: on page 49 there is a custom partitioner
example. It describes a situation where the map emits <K,V> pairs, but
the key is a compound key like (K1,K2), and we want to reduce over K1s and
not the whole of K. This is used as an example of a situation where a
custom partitioner should be written to hash over K1, so that all keys
sharing a K1 go to the same reducer. But as far as I know, although this
would partition the keys correctly (send them to the correct reducers),
the reduce function would still be called with (grouped under) the original
keys K, not yielding the desired results. The only way of doing this that
I know of is to create a new WritableComparable that carries all of K but
uses only K1 in its hash/equals/compare methods, in which case you would
not need to write your own partitioner anyway. Am I misinterpreting what
the author meant, or is there something going on that I don't know about?
It would have been sweet if I could accomplish all that with just the
partitioner. Either I am misunderstanding something fundamental, I am
misunderstanding the example's intention, or there is something wrong
with the example.

Thanks,

Mehmet
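For comparison, the workaround described in the question (names illustrative, Writable serialization omitted) would be a key that carries both components but defines hashCode/equals/compareTo over K1 only, so the default hash partitioner and the default grouping both ignore K2:

```java
import java.util.Objects;

// Hypothetical key that carries (K1, K2) but is "about" K1 only:
// hashCode/equals/compareTo all ignore K2, so the default hash
// partitioner and the default grouping treat keys with the same
// K1 as identical. (In a real job this would be a WritableComparable.)
class K1OnlyKey implements Comparable<K1OnlyKey> {
    final String k1;
    final int k2;   // carried along, but invisible to hashing/comparison

    K1OnlyKey(String k1, int k2) { this.k1 = k1; this.k2 = k2; }

    @Override public int hashCode() { return Objects.hash(k1); }

    @Override public boolean equals(Object o) {
        return o instanceof K1OnlyKey && k1.equals(((K1OnlyKey) o).k1);
    }

    // Sort order ignores K2 entirely.
    @Override public int compareTo(K1OnlyKey o) { return k1.compareTo(o.k1); }
}
```

The trade-off versus the grouping-comparator approach is that here the K2 order within a reduce group is undefined, because the sort never sees K2; the grouping-comparator approach keeps values sorted by K2 inside each K1 group.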