Konstantin Shvachko wrote:
Here is another example that I dealt with.
I wanted to use different value types (long, float, or string) for both
map and reduce tasks, depending on the actual key values. The solution
was to encode the value type into the key itself.
I used keys of the form
l:<name> - the value is expected to be a long
f:<name> - the value is expected to be a float
s:<name> - the value is expected to be a string
The example is under HADOOP-95.
Thought somebody might find it useful.
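
A minimal sketch of that convention, assuming Text keys and Text values
and a reducer that dispatches on the one-character prefix. The
aggregation logic (summing longs and floats, passing strings through) is
an illustrative assumption, not the code attached to HADOOP-95:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Hypothetical reducer that dispatches on the type prefix encoded
    // in the key ("l:", "f:", "s:").
    public class TypedKeyReducer extends Reducer<Text, Text, Text, Text> {

      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        char type = key.toString().charAt(0);   // 'l', 'f' or 's'

        if (type == 'l') {                      // long values: sum them
          long sum = 0;
          for (Text v : values) {
            sum += Long.parseLong(v.toString());
          }
          context.write(key, new Text(Long.toString(sum)));
        } else if (type == 'f') {               // float values: sum them
          float sum = 0f;
          for (Text v : values) {
            sum += Float.parseFloat(v.toString());
          }
          context.write(key, new Text(Float.toString(sum)));
        } else {                                // string values: emit as-is
          for (Text v : values) {
            context.write(key, v);
          }
        }
      }
    }
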
On a related note, ObjectWritable can be used as an input or output type,
and can wrap any Writable class, thus permitting polymorphic inputs and
outputs. Nutch uses this to, e.g., combine a URL's incoming anchor
texts and its content when indexing. The input type is ObjectWritable,
and the indexer's InputFormat wraps values from a variety of files. The
indexing reducer can then use the 'instanceof' operator to determine how
to process each input value. To be more object-oriented, one could have
all of these classes implement some Indexable interface whose methods
are invoked when reducing.
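
A hypothetical sketch of that pattern, written against the newer
org.apache.hadoop.mapreduce API. Anchor and Content below are placeholder
Writable classes standing in for the Nutch record types, and the
concatenation logic is illustrative rather than the actual indexer:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.ObjectWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Reducer;

    public class PolymorphicIndexReducer
        extends Reducer<Text, ObjectWritable, Text, Text> {

      // Placeholder record types; each is just a Text payload with a
      // distinct class, which is what instanceof dispatches on.
      public static class Anchor implements Writable {
        public final Text text = new Text();
        public void write(DataOutput out) throws IOException { text.write(out); }
        public void readFields(DataInput in) throws IOException { text.readFields(in); }
      }

      public static class Content implements Writable {
        public final Text body = new Text();
        public void write(DataOutput out) throws IOException { body.write(out); }
        public void readFields(DataInput in) throws IOException { body.readFields(in); }
      }

      @Override
      protected void reduce(Text url, Iterable<ObjectWritable> values,
          Context context) throws IOException, InterruptedException {
        StringBuilder doc = new StringBuilder();
        for (ObjectWritable wrapper : values) {
          Object value = wrapper.get();           // unwrap the actual object
          if (value instanceof Anchor) {          // incoming anchor text
            doc.append(((Anchor) value).text).append(' ');
          } else if (value instanceof Content) {  // the page content itself
            doc.append(((Content) value).body).append(' ');
          }                                       // unknown types are skipped
        }
        context.write(url, new Text(doc.toString()));
      }
    }

With an Indexable interface, the instanceof chain above would collapse to
a single method call on each unwrapped value.
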
Doug