Reducer.reduce method's OutputCollector is too strict; it shouldn't need the key
to be WritableComparable
--------------------------------------------------------------------------------------------------------
Key: HADOOP-1827
URL: https://issues.apache.org/jira/browse/HADOOP-1827
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.14.0
Reporter: Arun C Murthy
The output of the {{Reducer}}'s reduce method is *not* sorted, hence the
{{OutputCollector}} passed to it shouldn't require the *key* to be
{{WritableComparable}}; passing a {{Writable}} should suffice.
Thus
{code:title=Reducer.java}
public interface Reducer<K2 extends WritableComparable, V2 extends Writable,
                         K3 extends WritableComparable, V3 extends Writable>
    extends JobConfigurable, Closeable {
  void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output,
              Reporter reporter)
    throws IOException;
}
{code}
should, technically, be:
{code:title=Reducer.java}
public interface Reducer<K2 extends WritableComparable, V2 extends Writable,
                         K3 extends Writable, V3 extends Writable>
    extends JobConfigurable, Closeable {
  void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output,
              Reporter reporter)
    throws IOException;
}
{code}
Pros:
It removes an artificial limitation that forces applications to emit a
<{{WritableComparable}}, {{Writable}}> pair rather than a <{{Writable}},
{{Writable}}> pair, thereby simplifying some applications (I ran into a few
recently... admittedly trivial ones).
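To make the benefit concrete, here is a minimal sketch of a reducer whose output key is only a {{Writable}}. It does not use the real Hadoop classes: the nested {{Writable}}, {{WritableComparable}}, {{OutputCollector}} and {{Reducer}} types are simplified stand-ins (no {{Reporter}}, no {{IOException}}), and {{Word}}, {{Count}}, {{Label}} and {{SumReducer}} are hypothetical names invented for illustration. With the current bound on K3, the {{SumReducer}} declaration would not compile, because {{Label}} is not a {{WritableComparable}}.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class RelaxedReducerSketch {
    // Hypothetical stand-ins for the Hadoop types, so this compiles without Hadoop.
    interface Writable {}
    interface WritableComparable extends Writable, Comparable<Object> {}

    interface OutputCollector<K extends Writable, V extends Writable> {
        void collect(K key, V value);
    }

    // Proposed shape: K3 is bounded by Writable only. Reporter, IOException,
    // JobConfigurable and Closeable are left out to keep the sketch short.
    interface Reducer<K2 extends WritableComparable, V2 extends Writable,
                      K3 extends Writable, V3 extends Writable> {
        void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output);
    }

    // Sortable input key (hypothetical).
    static final class Word implements WritableComparable {
        final String text;
        Word(String text) { this.text = text; }
        @Override public int compareTo(Object other) {
            return text.compareTo(((Word) other).text);
        }
    }

    // Value type (hypothetical).
    static final class Count implements Writable {
        final int n;
        Count(int n) { this.n = n; }
    }

    // Output key with no ordering: a plain Writable (hypothetical).
    static final class Label implements Writable {
        final String text;
        Label(String text) { this.text = text; }
    }

    // Sums the values for a key and emits the total under a non-comparable key.
    static final class SumReducer implements Reducer<Word, Count, Label, Count> {
        @Override
        public void reduce(Word key, Iterator<Count> values,
                           OutputCollector<Label, Count> output) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().n;
            }
            output.collect(new Label("total:" + key.text), new Count(sum));
        }
    }

    // Runs the reducer once and returns what it emitted as "key=value" strings.
    static List<String> run() {
        List<String> emitted = new ArrayList<>();
        OutputCollector<Label, Count> collector =
            (key, value) -> emitted.add(key.text + "=" + value.n);
        new SumReducer().reduce(new Word("hadoop"),
            List.of(new Count(1), new Count(2)).iterator(), collector);
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [total:hadoop=3]
    }
}
```

Only the input key K2 keeps the {{WritableComparable}} bound here, since that is the side that gets sorted before the reduce.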
Cons:
1. We now need a separate {{Combiner}} interface, since the combiner's
{{OutputCollector}} *needs* to be able to sort keys, hence requires a
{{WritableComparable}} - same as the {{Mapper}}.
2. We need a separate {{SortableOutputCollector}} (for {{Mapper}}/{{Combiner}})
and a {{NonSortableOutputCollector}} (for {{Reducer}}).
3. Alas! As a consequence of (1) & (2), we can no longer use the same class as
both a {{Reducer}} and a {{Combiner}}, a serious compatibility issue.
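A rough sketch of how (1) and (2) lead to (3), again with simplified stand-in types rather than the real Hadoop classes. The two collector names come from the list above; the shape of the {{Combiner}} interface, including its {{reduce}} method name, is purely an assumption, as are {{Word}}, {{Count}} and {{SumReducer}}. The point is only that a class written against {{Reducer}} alone would no longer satisfy a framework that asks for a {{Combiner}}.

```java
import java.util.Iterator;

class SplitCollectorSketch {
    // Hypothetical stand-ins for the Hadoop types, so this compiles without Hadoop.
    interface Writable {}
    interface WritableComparable extends Writable, Comparable<Object> {}

    // Con (2): one collector per role, named as in the list above.
    interface SortableOutputCollector<K extends WritableComparable, V extends Writable> {
        void collect(K key, V value);
    }
    interface NonSortableOutputCollector<K extends Writable, V extends Writable> {
        void collect(K key, V value);
    }

    // Con (1): a separate Combiner whose output key must stay sortable.
    // The method name "reduce" is an assumption, not part of the proposal.
    interface Combiner<K2 extends WritableComparable, V2 extends Writable,
                       K3 extends WritableComparable, V3 extends Writable> {
        void reduce(K2 key, Iterator<V2> values, SortableOutputCollector<K3, V3> output);
    }

    // Reducer with the relaxed bound on K3.
    interface Reducer<K2 extends WritableComparable, V2 extends Writable,
                      K3 extends Writable, V3 extends Writable> {
        void reduce(K2 key, Iterator<V2> values, NonSortableOutputCollector<K3, V3> output);
    }

    static final class Word implements WritableComparable {
        final String text;
        Word(String text) { this.text = text; }
        @Override public int compareTo(Object other) {
            return text.compareTo(((Word) other).text);
        }
    }

    static final class Count implements Writable {
        final int n;
        Count(int n) { this.n = n; }
    }

    // Written once against Reducer, the way a job reuses one class today.
    static final class SumReducer implements Reducer<Word, Count, Word, Count> {
        @Override
        public void reduce(Word key, Iterator<Count> values,
                           NonSortableOutputCollector<Word, Count> output) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().n;
            }
            output.collect(key, new Count(sum));
        }
    }

    // Stand-ins for the checks a framework would make when wiring up a job.
    static boolean usableAsReducer(Object candidate) {
        return candidate instanceof Reducer<?, ?, ?, ?>;
    }
    static boolean usableAsCombiner(Object candidate) {
        return candidate instanceof Combiner<?, ?, ?, ?>;
    }

    public static void main(String[] args) {
        Object job = new SumReducer();
        System.out.println(usableAsReducer(job));  // prints true
        System.out.println(usableAsCombiner(job)); // prints false
    }
}
```

Under these assumptions, every existing class that doubles as a combiner would have to be touched, which is the compatibility cost described in (3).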
The purpose of this issue is two-fold:
1. Spark a discussion among folks, both hadoop-dev & hadoop-users, to figure out
whether this really is a problem, i.e. do folks really care about this anomaly in
the existing {{Reducer}} interface? Also, is it worth the pain (@see 'Cons') to go
fix it?
2. Even if we decide to live with it, this issue could record for posterity why
we love hadoop, warts and all. *smile*
Let's discuss...