[ This question is probably more appropriate for hadoop-user. ]
Anis Ahmed wrote:
1. Do the processing as part of REDUCE. I will ensure that I use the same
intermediate key for a batch of 50 entries inside MAP (keep a counter and
change the intermediate key after every 50 records, and so on) so that
REDUCE will get an iterator over 50 entries.
This is probably the simplest approach. Has it proven too slow?
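For reference, a rough sketch of that approach against the old mapred API
(untested; the class name, batch size, and key format are only illustrative,
and I've mixed the task id into the key so batches from different map tasks
don't collapse into one oversized reduce group):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class BatchKeyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private static final int BATCH_SIZE = 50;  // entries per reduce group
  private String taskId;
  private int count = 0;  // records seen so far by this map task
  private int batch = 0;  // current batch number

  public void configure(JobConf job) {
    // Mix the map task id into the key so two map tasks never emit
    // the same intermediate key for different batches.
    taskId = job.get("mapred.task.id", "task");
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // All records in one batch share one intermediate key, so the
    // reducer sees them together as a single iterator of ~50 values.
    output.collect(new Text(taskId + "-" + batch), value);
    if (++count % BATCH_SIZE == 0) {
      batch++;  // roll over to a new key after every 50 records
    }
  }
}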
2. The option above has a lot of I/O, sorting, etc. So instead...
Inside MAP, create an in-memory pool (initialized in configure()) and,
when 50 entries are reached, run the business logic and clear the pool.
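A sketch of what that pooled mapper might look like (again untested;
processBatch() is a hypothetical stand-in for the business logic, and
close() flushes whatever partial batch remains at the end of the split):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class PooledMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private static final int POOL_SIZE = 50;
  private final List<String> pool = new ArrayList<String>();
  private OutputCollector<Text, Text> out;  // saved for use in close()

  public void configure(JobConf job) {
    pool.clear();  // pool is (re)initialized here, per the suggestion
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    out = output;
    pool.add(value.toString());
    if (pool.size() >= POOL_SIZE) {
      processBatch(pool, output);
      pool.clear();
    }
  }

  public void close() throws IOException {
    // Don't drop the final partial batch.
    if (!pool.isEmpty() && out != null) {
      processBatch(pool, out);
      pool.clear();
    }
  }

  private void processBatch(List<String> batch,
      OutputCollector<Text, Text> output) throws IOException {
    // Placeholder for the actual business logic over one batch.
    output.collect(new Text("batch"), new Text(String.valueOf(batch.size())));
  }
}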
Alternatively, you could define an InputFormat that reads 50 lines at a
time instead of a single line.
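E.g., something along these lines, wrapping the stock LineRecordReader
(untested sketch; MultiLineInputFormat is a made-up name, and joining the
lines with newlines into one Text value is just one possible choice):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class MultiLineInputFormat extends FileInputFormat<LongWritable, Text> {

  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new MultiLineRecordReader(job, (FileSplit) split, 50);
  }

  static class MultiLineRecordReader
      implements RecordReader<LongWritable, Text> {
    private final LineRecordReader lines;
    private final int linesPerRecord;

    MultiLineRecordReader(JobConf job, FileSplit split, int linesPerRecord)
        throws IOException {
      this.lines = new LineRecordReader(job, split);
      this.linesPerRecord = linesPerRecord;
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      // Accumulate up to linesPerRecord lines into one value,
      // separated by newlines; stop early at the end of the split.
      Text line = new Text();
      StringBuilder buf = new StringBuilder();
      int n = 0;
      while (n < linesPerRecord && lines.next(key, line)) {
        if (n > 0) buf.append('\n');
        buf.append(line.toString());
        n++;
      }
      if (n == 0) return false;
      value.set(buf.toString());
      return true;
    }

    public LongWritable createKey() { return lines.createKey(); }
    public Text createValue() { return new Text(); }
    public long getPos() throws IOException { return lines.getPos(); }
    public float getProgress() throws IOException { return lines.getProgress(); }
    public void close() throws IOException { lines.close(); }
  }
}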
Doug