Sure there is something wrong with requiring extra map-reduce passes.
Without significant development effort an extra pass can be very
expensive: shuffling, sorting and rewriting your whole output set is a
significant burden. Pointlessly so, since the extension is clear, safe
and easier to explain than the restriction.
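To be concrete about what the extra pass costs, here is a minimal
sketch using the org.apache.hadoop.mapred API (the "pass1-output" and
"pass2-output" paths are placeholders). The second job computes
nothing new; it only regroups what the first job already produced, yet
it still reads, shuffles, sorts and rewrites the entire data set:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SecondPass {
  public static void main(String[] args) throws IOException {
    // A second job that adds no new computation: it only regroups the
    // first job's text output under its keys, yet the whole data set
    // is read, shuffled, sorted and written back out again.
    JobConf pass2 = new JobConf(SecondPass.class);
    pass2.setJobName("regroup-only");
    pass2.setInputFormat(KeyValueTextInputFormat.class); // key<TAB>value lines
    pass2.setOutputKeyClass(Text.class);
    pass2.setOutputValueClass(Text.class);
    pass2.setMapperClass(IdentityMapper.class);   // re-emits each record unchanged
    pass2.setReducerClass(IdentityReducer.class); // copies each record back out
    FileInputFormat.setInputPaths(pass2, new Path("pass1-output"));
    FileOutputFormat.setOutputPath(pass2, new Path("pass2-output"));
    JobClient.runJob(pass2); // pays for a full shuffle + sort + rewrite
  }
}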
I think we can all agree that a project goal is to keep the design as
simple and focused as possible. I'd find an argument against the
extension based on those goals pretty compelling, but the absence of a
feature from a Google paper doesn't seem like a good reason to reject
it. The Hadoop approach to many decisions differs from Google's; this
is not a bad thing.
I cannot think of a case where this proposed extension complicates
code or reduces comprehensibility. Since it is backwards compatible
with your desired API, purists can simply ignore the option.
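To make the "ignore it" point concrete, a backwards-compatible, opt-in
switch would look roughly like this (the property name below is
invented purely for illustration and is not a real Hadoop
configuration key):

import org.apache.hadoop.mapred.JobConf;

public class OptionalExtension {
  public static JobConf configure(boolean wantExtension) {
    JobConf conf = new JobConf(OptionalExtension.class);
    // Existing jobs change nothing: no new calls, identical behaviour.
    if (wantExtension) {
      // Hypothetical key, for illustration only (not a real Hadoop setting).
      conf.set("mapred.reduce.hypothetical.extension", "true");
    }
    return conf;
  }
}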
On Apr 1, 2006, at 9:29 AM, Andrew McNabb wrote:
> On Sat, Apr 01, 2006 at 06:19:27PM +0100, Teppo Kurki (JIRA) wrote:
> > My original post about the issue gives a simple case that would
> > benefit from this:
> > http://www.mail-archive.com/hadoop-user%40lucene.apache.org/msg00073.html
>
> This should be done in two map-reduce phases. There's nothing wrong
> with running two phases (or 10,000).
>
> --
> Andrew McNabb
> http://www.mcnabbs.org/andrew/
> PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868