Thanks, Qin. It sounds like you're saying that this type of
partitioning needs its own map-reduce set.
I was hoping it could be done in the InputFormat class :))
Shirley
On Aug 4, 2008, at 2:49 PM, Qin Gao wrote:
For the first question, I think it is better to do it at the reduce stage,
because input splitting only considers the size of blocks in bytes.
Instead, you can output the intermediate key/value pairs like this:
key: 1 if C is 1, 3, 5, or 7; 0 otherwise
value: the tuple.
Then you can have one reducer deal with all the tuples whose C is
1, 3, 5, or 7.
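A minimal sketch of the keying scheme above, written in plain Python rather than against the Hadoop API, so it only simulates the map, shuffle, and reduce steps in memory. The sample tuples are the ones from the original question; the function names (map_tuple, shuffle, reduce_group, run) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Sample tuples (A, B, C) from the original question.
TUPLES = [
    (1, 3, 1), (1, 2, 2), (1, 2, 3), (12, 3, 4),
    (12, 2, 5), (12, 8, 6), (12, 2, 7),
]

TARGET_C = {1, 3, 5, 7}  # C values that should end up in the same group


def map_tuple(record):
    """Emit (key, value): key is 1 if C is in TARGET_C, else 0."""
    _a, _b, c = record
    return (1 if c in TARGET_C else 0, record)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_group(key, values):
    """Placeholder reducer (an assumption): returns the tuples it received."""
    return list(values)


def run(records):
    """Run the simulated map, shuffle, and reduce steps over the records."""
    grouped = shuffle(map_tuple(r) for r in records)
    return {key: reduce_group(key, values) for key, values in grouped.items()}


result = run(TUPLES)
```

Here result[1] holds every tuple whose C is 1, 3, 5, or 7, and result[0] holds the rest. In a real job the grouping would be done by the framework, and the reducer body would carry whatever per-group logic is actually needed.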
On Mon, Aug 4, 2008 at 3:29 PM, Shirley Cohen
<[EMAIL PROTECTED]> wrote:
Hi,
I want to implement some data partitioning logic where a mapper is
assigned a specific range of values. Here is a concrete example of what
I have in mind:
Suppose I have attributes A, B, C and the following tuples:
(A, B, C)
(1, 3, 1)
(1, 2, 2)
(1, 2, 3)
(12, 3, 4)
(12, 2, 5)
(12, 8, 6)
(12, 2, 7)
What I want to do is assign mapper x all the tuples where the C
attribute is 1, 3, 5, or 7.
1. Is it possible to write a smart InputFormat class that can assign a
set of records to a specific mapper? If so, how?
2. How will this type of partitioning logic interact with HDFS data
locality?
Thanks,
Shirley