Hi,
I want to implement some data partitioning logic where a mapper is
assigned a specific range of values. Here is a concrete example of
what I have in mind:
Suppose I have attributes A, B, C and the following tuples:
(A, B, C)
(1, 3, 1)
(1, 2, 2)
(1, 2, 3)
(12, 3, 4)
(12, 2, 5)
(12, 8, 6)
(12, 2, 7)
What I want to do is assign to mapper x all the tuples whose C
attribute is 1, 3, 5, or 7.
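To make the intended assignment concrete, here is a rough sketch in plain Python (not Hadoop code). It only illustrates which tuples from the example above should end up at mapper x. The names assign_mapper and partition, and the catch-all mapper "y", are placeholders made up for this illustration.

```python
# Illustration only: plain Python, not Hadoop code.
# Tuples are (A, B, C), copied from the example above.
tuples = [
    (1, 3, 1),
    (1, 2, 2),
    (1, 2, 3),
    (12, 3, 4),
    (12, 2, 5),
    (12, 8, 6),
    (12, 2, 7),
]

# Values of C that should go to mapper x (taken from the example).
MAPPER_X_C_VALUES = {1, 3, 5, 7}


def assign_mapper(record):
    """Return the (hypothetical) mapper name for one (A, B, C) tuple."""
    _, _, c = record
    # "y" is a placeholder for whichever mapper handles the remaining tuples.
    return "x" if c in MAPPER_X_C_VALUES else "y"


def partition(records):
    """Group records by the mapper they would be assigned to."""
    groups = {}
    for record in records:
        groups.setdefault(assign_mapper(record), []).append(record)
    return groups


if __name__ == "__main__":
    for mapper, records in partition(tuples).items():
        print(mapper, records)
```

In this sketch the assignment depends only on the value of C, not on where a record is stored, which is why I am asking about data locality below.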
1. Is it possible to write a smart InputFormat class that can assign a
specific set of records to a specific mapper? If so, how?
2. How would this type of partitioning logic interact with HDFS data
locality?
Thanks,
Shirley