Thanks, Qin. It sounds like you're saying that this type of partitioning needs its own map-reduce set.

I was hoping it could be done in the InputFormat class :))

Shirley

On Aug 4, 2008, at 2:49 PM, Qin Gao wrote:

For the first question, I think it is better to do it at the reduce stage,
because the partitioner only considers the size of blocks in bytes. Instead,
you can output the intermediate key/value pairs like this:

key: 1 if C is 1, 3, 5, or 7; 0 otherwise
value: the tuple.

In the reduce stage, a single reducer can then handle all the tuples with
C = 1, 3, 5, 7, since they all share key 1.
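
A minimal sketch of the keying scheme described above, written in plain Python rather than against the Hadoop API. The names map_tuple, shuffle, and run are hypothetical, and the shuffle step only imitates the grouping that the framework performs between map and reduce. The input tuples are the ones from the original question.

```python
from collections import defaultdict

# Tuples (A, B, C) taken from the example in the original question.
TUPLES = [
    (1, 3, 1),
    (1, 2, 2),
    (1, 2, 3),
    (12, 3, 4),
    (12, 2, 5),
    (12, 8, 6),
    (12, 2, 7),
]

# Values of C that should all end up at the same reducer.
TARGET_C = {1, 3, 5, 7}


def map_tuple(record):
    """Emit (key, value): key is 1 if C is in TARGET_C, otherwise 0."""
    _, _, c = record
    return (1 if c in TARGET_C else 0, record)


def shuffle(pairs):
    """Group values by key, imitating the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def run(records):
    """Map every record, then group the emitted pairs by key."""
    return shuffle(map_tuple(r) for r in records)


if __name__ == "__main__":
    for key, values in sorted(run(TUPLES).items()):
        print(key, values)
```

With this keying, every tuple whose C is 1, 3, 5, or 7 lands in the group for key 1, so one reduce call sees all of them together. The remaining tuples are grouped under key 0.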

On Mon, Aug 4, 2008 at 3:29 PM, Shirley Cohen <[EMAIL PROTECTED]> wrote:

Hi,

I want to implement some data partitioning logic where a mapper is assigned
a specific range of values. Here is a concrete example of what I have in
mind:

Suppose I have attributes A, B, C and the following tuples:

(A, B, C)
(1, 3, 1)
(1, 2, 2)
(1, 2, 3)
(12, 3, 4)
(12, 2, 5)
(12, 8, 6)
(12, 2, 7)

What I want to do is assign to mapper x all the tuples where the C attribute
is 1, 3, 5, or 7.

1. Is it possible to write a smart InputFormat class that can assign a set
   of records to a specific mapper? If so, how?
2. How will this type of partitioning logic interact with HDFS data
   locality?


Thanks,

Shirley


