Sure there is something wrong with requiring extra map-reduce passes. Without significant development effort it can be very expensive (shuffling, sorting, and rewriting your whole output set is a significant burden). Pointlessly so, since the extension is clear, safe, and easier to explain than the restriction.
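To make the cost concrete, here is a rough sketch of what the "extra pass" workaround looks like with the classic org.apache.hadoop.mapred API. The job names, paths, and the use of IdentityMapper/IdentityReducer as stand-ins for real user code are all placeholders of mine, not anything from the proposal; the point is only that the second job re-reads, re-shuffles, re-sorts, and rewrites every record of the first job's output just to re-key it.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class TwoPassExample {

  // Builds a Text/Text pass-through job; in the real case, pass 1 would use
  // the user's own Mapper and Reducer instead of the identity classes.
  private static JobConf passThroughJob(String name, Path in, Path out) {
    JobConf job = new JobConf(TwoPassExample.class);
    job.setJobName(name);
    job.setMapperClass(IdentityMapper.class);
    job.setReducerClass(IdentityReducer.class);
    job.setInputFormat(KeyValueTextInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);
    return job;
  }

  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]);
    Path output = new Path(args[2]);

    // Pass 1: the job that does the real work.
    JobClient.runJob(passThroughJob("pass-1", input, intermediate));

    // Pass 2: exists only to re-key pass 1's output. Every record is read
    // back from the filesystem, shuffled, sorted, and written out again.
    JobClient.runJob(passThroughJob("pass-2", intermediate, output));
  }
}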

I think we can all agree that a project goal is to keep the design as simple and focused as possible. I'd find an argument against an extension based on those goals pretty compelling, but the lack of a feature in a paper from Google doesn't seem like a compelling reason to reject something. The Hadoop approach to many decisions varies from Google's, and this is not a bad thing.

I cannot think of a case where this proposed extension complicates code or reduces comprehensibility. Since it is backwards compatible with your desired API, purists can simply ignore the option.

On Apr 1, 2006, at 9:29 AM, Andrew McNabb wrote:

On Sat, Apr 01, 2006 at 06:19:27PM +0100, Teppo Kurki (JIRA) wrote:

My original post about the issue gives a simple case that would benefit from this: http://www.mail-archive.com/hadoop-user%40lucene.apache.org/msg00073.html


This should be done in two map-reduce phases.  There's nothing wrong
with running two phases (or 10,000).

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868
