Sure there is something wrong with requiring extra map-reduce passes. Without significant development effort it can be very expensive (shuffling, sorting, and rewriting your whole output set is a significant burden). Pointlessly so, since the extension is clear, safe, and easier to explain than the restriction.
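To make the cost concrete, here is a rough sketch of what the "extra pass" workaround looks like with the classic org.apache.hadoop.mapred API. The job names, paths, and the use of IdentityMapper/IdentityReducer as stand-ins for real user code are all placeholders of mine, not anything from the proposal; the point is only that the second job re-reads, re-shuffles, re-sorts, and rewrites every record of the first job's output just to re-key it.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class TwoPassExample {

  // Builds a Text/Text pass-through job; in the real case, pass 1 would use
  // the user's own Mapper and Reducer instead of the identity classes.
  private static JobConf passThroughJob(String name, Path in, Path out) {
    JobConf job = new JobConf(TwoPassExample.class);
    job.setJobName(name);
    job.setMapperClass(IdentityMapper.class);
    job.setReducerClass(IdentityReducer.class);
    job.setInputFormat(KeyValueTextInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);
    return job;
  }

  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]);
    Path output = new Path(args[2]);

    // Pass 1: the job that does the real work.
    JobClient.runJob(passThroughJob("pass-1", input, intermediate));

    // Pass 2: exists only to re-key pass 1's output. Every record is read
    // back from the filesystem, shuffled, sorted, and written out again.
    JobClient.runJob(passThroughJob("pass-2", intermediate, output));
  }
}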

I think we can all agree that a project goal is to keep the design as simple and focused as possible. I'd find an argument against an extension based on those goals pretty compelling, but the lack of a feature in a paper from Google doesn't seem like a compelling reason to reject something. The Hadoop approach to many decisions varies from Google's, and this is not a bad thing.

I cannot think of a case where this proposed extension complicates code or reduces comprehensibility. Since it is backwards compatible with your desired API, purists can simply ignore the option.

On Apr 1, 2006, at 9:29 AM, Andrew McNabb wrote:

On Sat, Apr 01, 2006 at 06:19:27PM +0100, Teppo Kurki (JIRA) wrote:

My original post about the issue gives a simple case that would benefit from this: http://www.mail-archive.com/hadoop-user%40lucene.apache.org/msg00073.html


This should be done in two map-reduce phases.  There's nothing wrong
with running two phases (or 10,000).

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868
