It is also possible to extend the input format so that it handles some files one way and other files another. The key can be any common supertype of the keys from the inputs (at worst Writable).

On Sun, May 22, 2011 at 12:35 PM, Sean Owen <[email protected]> wrote:
> One solution is to create an "XOrYWritable" which holds either an X or
> a Y. Then the jobs that output an X or a Y both output one same value
> type, XOrYWritable. See VectorOrPrefWritable for instance.
>
> The Reducer can then check each value to pick out an X or a Y and get both.
>
> In some cases you have to know the ordering, whether you'll get an X
> or Y first. In this case you need some cleverness with the key.
> Instead of a VarLongWritable for a key, you need something like
> "EntityJoinKey" which contains a long value (the ID) but also a
> boolean or integer that indicates an ordering. Maybe it adds a boolean
> called "before".
>
> It needs to implement WritableComparable and order by the ID value,
> but then by the before/after flag.
> It also needs to specify a Partitioner which maps keys to the same
> reducer if they have the same ID, regardless of before/after flag.
>
> This is fairly convenient because you have a clearer picture of which
> values are coming in on "before" keys and then which are coming after.
>
> It's definitely more complex, but it's doable.
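
For anyone following along, here is a rough sketch of the key ordering and partitioning described in the quoted message. It is plain Java with no Hadoop dependency so that it runs on its own: Comparable stands in for WritableComparable, and the static partitionFor method stands in for a Partitioner. The class name EntityJoinSketch, the field layout, and the helper sortedDemoKeys are all made up for illustration, and the choice to sort "before" keys first is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class EntityJoinSketch {

    /** Hypothetical composite key: an entity ID plus a flag that fixes the ordering. */
    static final class EntityJoinKey implements Comparable<EntityJoinKey> {
        final long id;
        final boolean before;

        EntityJoinKey(long id, boolean before) {
            this.id = id;
            this.before = before;
        }

        // Order by ID first; within one ID, before=true sorts ahead of before=false.
        @Override
        public int compareTo(EntityJoinKey other) {
            int byId = Long.compare(id, other.id);
            if (byId != 0) {
                return byId;
            }
            return Boolean.compare(other.before, before);
        }

        @Override
        public String toString() {
            return id + (before ? ":before" : ":after");
        }
    }

    // Partition on the ID alone, so both flags for one ID land in the same partition.
    static int partitionFor(EntityJoinKey key, int numPartitions) {
        return (Long.hashCode(key.id) & Integer.MAX_VALUE) % numPartitions;
    }

    // Small demo input, deliberately out of order, returned sorted.
    static List<EntityJoinKey> sortedDemoKeys() {
        List<EntityJoinKey> keys = new ArrayList<>();
        keys.add(new EntityJoinKey(7L, false));
        keys.add(new EntityJoinKey(3L, false));
        keys.add(new EntityJoinKey(7L, true));
        keys.add(new EntityJoinKey(3L, true));
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        for (EntityJoinKey key : sortedDemoKeys()) {
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

In a real job the key would also need the Writable serialization methods plus consistent equals and hashCode, which are left out here to keep the sketch short.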
