Re: data partitioning question

Shirley Cohen Mon, 04 Aug 2008 19:49:26 -0700

Thanks, Qin. It sounds like you're saying that this type of partitioning needs its own map-reduce set.


I was hoping it could be done in the InputFormat class :))


Shirley

On Aug 4, 2008, at 2:49 PM, Qin Gao wrote:

For the first question, I think it is better to do it at the reduce stage,
because the partitioner only considers the size of blocks in bytes. Instead,
you can output the intermediate key/value pairs like this:

key: 1 if C = 1, 3, 5, or 7; 0 otherwise
value: the tuple.
Then, in the reduce stage, a single reducer can deal with all the tuples with C = 1, 3, 5, 7.
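
As a rough illustration of the keying scheme above (not Hadoop code), the sketch below simulates the map and shuffle steps in plain Python on the seven tuples from the question. The names map_tuple, shuffle, and SELECTED_C are made up for this example; in a real job the map function would emit the same 1/0 key and the framework would do the grouping.

```python
from collections import defaultdict

# Tuples (A, B, C) taken from the original question.
TUPLES = [
    (1, 3, 1), (1, 2, 2), (1, 2, 3), (12, 3, 4),
    (12, 2, 5), (12, 8, 6), (12, 2, 7),
]

# Hypothetical name: the C values that should end up together.
SELECTED_C = {1, 3, 5, 7}


def map_tuple(record):
    """Emit (key, value): key is 1 if C is in the selected set, else 0."""
    _, _, c = record
    return (1 if c in SELECTED_C else 0, record)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


groups = shuffle(map_tuple(t) for t in TUPLES)
for key in sorted(groups):
    # Each key's value list is what one reduce call would receive.
    print(key, groups[key])
```

With this keying, every tuple with C = 1, 3, 5, or 7 arrives under key 1, so one reduce call sees all of them, regardless of which mapper read them.
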
On Mon, Aug 4, 2008 at 3:29 PM, Shirley Cohen <[EMAIL PROTECTED]> wrote:
Hi,
I want to implement some data partitioning logic where a mapper is assigned
a specific range of values. Here is a concrete example of what I have in
mind:

Suppose I have attributes A, B, C and the following tuples:

(A, B, C)
(1, 3, 1)
(1, 2, 2)
(1, 2, 3)
(12, 3, 4)
(12, 2, 5)
(12, 8, 6)
(12, 2, 7)
What I want to do is assign mapper x all the tuples where the C attribute =
1, 3, 5, and 7.
1-Is it possible to write a smart InputFormat class that can assign a set
of records to a specific mapper? If so, how?
2-How will this type of partitioning logic interact with HDFS data
locality?


Thanks,

Shirley
