[ This question is probably more appropriate for hadoop-user. ]
Anis Ahmed wrote:
1. Do the processing as part of REDUCE. I will ensure that I use the same
intermediate key for a batch of 50 entries inside MAP (keep a counter and
change the intermediate key after every 50 records, and so on) so that
REDUCE will get an iterator over 50 entries.
This is probably the simplest approach. Has it proven too slow?
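For reference, a rough sketch of that approach against the old mapred API
(untested; the class name, batch size, and key format are only illustrative,
and I've mixed the task id into the key so batches from different map tasks
don't collapse into one oversized reduce group):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class BatchKeyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private static final int BATCH_SIZE = 50;  // entries per reduce group
  private String taskId;
  private int count = 0;  // records seen so far by this map task
  private int batch = 0;  // current batch number

  public void configure(JobConf job) {
    // Mix the map task id into the key so two map tasks never emit
    // the same intermediate key for different batches.
    taskId = job.get("mapred.task.id", "task");
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // All records in one batch share one intermediate key, so the
    // reducer sees them together as a single iterator of ~50 values.
    output.collect(new Text(taskId + "-" + batch), value);
    if (++count % BATCH_SIZE == 0) {
      batch++;  // roll over to a new key after every 50 records
    }
  }
}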
2. The option above has a lot of I/O, sorting, etc. So instead...
Inside MAP, create an in-memory pool (initialized in configure()) and,
when 50 entries are reached, run the business logic and clear the pool.
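A sketch of what that pooled mapper might look like (again untested;
processBatch() is a hypothetical stand-in for the business logic, and
close() flushes whatever partial batch remains at the end of the split):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class PooledMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private static final int POOL_SIZE = 50;
  private final List<String> pool = new ArrayList<String>();
  private OutputCollector<Text, Text> out;  // saved for use in close()

  public void configure(JobConf job) {
    pool.clear();  // pool is (re)initialized here, per the suggestion
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    out = output;
    pool.add(value.toString());
    if (pool.size() >= POOL_SIZE) {
      processBatch(pool, output);
      pool.clear();
    }
  }

  public void close() throws IOException {
    // Don't drop the final partial batch.
    if (!pool.isEmpty() && out != null) {
      processBatch(pool, out);
      pool.clear();
    }
  }

  private void processBatch(List<String> batch,
      OutputCollector<Text, Text> output) throws IOException {
    // Placeholder for the actual business logic over one batch.
    output.collect(new Text("batch"), new Text(String.valueOf(batch.size())));
  }
}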
Alternatively, you could define an InputFormat that reads 50 lines at a
time instead of a single line.
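E.g., something along these lines, wrapping the stock LineRecordReader
(untested sketch; MultiLineInputFormat is a made-up name, and joining the
lines with newlines into one Text value is just one possible choice):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class MultiLineInputFormat extends FileInputFormat<LongWritable, Text> {

  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new MultiLineRecordReader(job, (FileSplit) split, 50);
  }

  static class MultiLineRecordReader
      implements RecordReader<LongWritable, Text> {
    private final LineRecordReader lines;
    private final int linesPerRecord;

    MultiLineRecordReader(JobConf job, FileSplit split, int linesPerRecord)
        throws IOException {
      this.lines = new LineRecordReader(job, split);
      this.linesPerRecord = linesPerRecord;
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      // Accumulate up to linesPerRecord lines into one value,
      // separated by newlines; stop early at the end of the split.
      Text line = new Text();
      StringBuilder buf = new StringBuilder();
      int n = 0;
      while (n < linesPerRecord && lines.next(key, line)) {
        if (n > 0) buf.append('\n');
        buf.append(line.toString());
        n++;
      }
      if (n == 0) return false;
      value.set(buf.toString());
      return true;
    }

    public LongWritable createKey() { return lines.createKey(); }
    public Text createValue() { return new Text(); }
    public long getPos() throws IOException { return lines.getPos(); }
    public float getProgress() throws IOException { return lines.getProgress(); }
    public void close() throws IOException { lines.close(); }
  }
}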
Doug