Hi,

I want to implement some data partitioning logic where a mapper is assigned a specific range of values. Here is a concrete example of what I have in mind:

Suppose I have attributes A, B, C and the following tuples:

(A, B, C)
(1, 3, 1)
(1, 2, 2)
(1, 2, 3)
(12, 3, 4)
(12, 2, 5)
(12, 8, 6)
(12, 2, 7)

What I want to do is assign to mapper x all the tuples whose C attribute is 1, 3, 5, or 7.
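
To make the intended selection concrete, here is a minimal plain-Java sketch of the rule applied to the sample tuples. It deliberately uses no Hadoop APIs; the class name `PartitionSketch` and the methods `tuplesForMapperX` and `sampleTuples` are hypothetical names made up for illustration, and it only shows which records should end up at mapper x, not how an InputFormat would deliver them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionSketch {
    // Values of attribute C that should be routed to mapper x (taken from the example).
    static final Set<Integer> MAPPER_X_VALUES = new HashSet<>(Arrays.asList(1, 3, 5, 7));

    // The sample (A, B, C) tuples from the example.
    static List<int[]> sampleTuples() {
        List<int[]> tuples = new ArrayList<>();
        tuples.add(new int[] {1, 3, 1});
        tuples.add(new int[] {1, 2, 2});
        tuples.add(new int[] {1, 2, 3});
        tuples.add(new int[] {12, 3, 4});
        tuples.add(new int[] {12, 2, 5});
        tuples.add(new int[] {12, 8, 6});
        tuples.add(new int[] {12, 2, 7});
        return tuples;
    }

    // Returns, in input order, the tuples whose C attribute (index 2) is in the wanted set.
    static List<int[]> tuplesForMapperX(List<int[]> tuples) {
        List<int[]> selected = new ArrayList<>();
        for (int[] t : tuples) {
            if (MAPPER_X_VALUES.contains(t[2])) {
                selected.add(t);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Print the tuples that mapper x should receive.
        for (int[] t : tuplesForMapperX(sampleTuples())) {
            System.out.println(Arrays.toString(t));
        }
    }
}
```

For the sample data this selects (1, 3, 1), (1, 2, 3), (12, 2, 5), and (12, 2, 7).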

1. Is it possible to write a smart InputFormat class that can assign a set of records to a specific mapper? If so, how?
2. How will this type of partitioning logic interact with HDFS data locality?


Thanks,

Shirley
