You can extend or customize MultipleOutputs and pass the schema-related 
settings via configuration properties prefixed with the named output's name, 
just as MultipleOutputs already does for the output format classes.
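
To make the prefix idea concrete, here is a minimal sketch of keeping one 
schema per named output under a prefixed property. It assumes 
java.util.Properties as a stand-in for the job configuration; the class name 
NamedOutputSchemas and the key layout are hypothetical and not part of Hadoop 
or Avro.

```java
import java.util.Properties;

// Sketch only: Properties stands in for the Hadoop job configuration, and the
// key layout below is a hypothetical convention, not a documented Hadoop key.
final class NamedOutputSchemas {
    private static final String PREFIX = "mo.namedOutput.";      // hypothetical prefix
    private static final String SCHEMA_SUFFIX = ".avro.schema";  // hypothetical suffix

    private NamedOutputSchemas() {}

    // Builds the property key that holds the schema of one named output.
    static String schemaKey(String namedOutput) {
        return PREFIX + namedOutput + SCHEMA_SUFFIX;
    }

    // Called at job setup time: stores the schema JSON for a named output.
    static void setSchema(Properties conf, String namedOutput, String schemaJson) {
        conf.setProperty(schemaKey(namedOutput), schemaJson);
    }

    // Called when a writer for the named output is created: reads the schema back.
    static String getSchema(Properties conf, String namedOutput) {
        String json = conf.getProperty(schemaKey(namedOutput));
        if (json == null) {
            throw new IllegalArgumentException(
                "No schema registered for named output: " + namedOutput);
        }
        return json;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setSchema(conf, "users", "\"string\"");
        setSchema(conf, "clicks", "\"long\"");
        System.out.println(schemaKey("users") + " = " + getSchema(conf, "users"));
        System.out.println(schemaKey("clicks") + " = " + getSchema(conf, "clicks"));
    }
}
```

A customized MultipleOutputs would do the equivalent lookup when it builds the 
record writer for a given named output, so each output gets its own schema.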

Also, to send a dummy key or value, why not just use NullWritable? It is 
efficient because it serializes to zero bytes.
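
To illustrate why such a placeholder costs nothing, here is a small 
self-contained sketch. NullPlaceholder is a hypothetical stand-in that mimics 
the relevant behaviour of NullWritable (a singleton whose write method emits 
nothing); it is not the Hadoop class itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a NullWritable-style placeholder: one shared
// instance, and serialization that writes no bytes at all.
final class NullPlaceholder {
    private static final NullPlaceholder INSTANCE = new NullPlaceholder();

    private NullPlaceholder() {}

    static NullPlaceholder get() {
        return INSTANCE;
    }

    // Intentionally empty: the placeholder carries no data.
    void write(DataOutput out) throws IOException {
    }

    // Serializes the placeholder `records` times and returns the bytes produced.
    static int serializedSize(int records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < records; i++) {
                get().write(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("bytes for 1000 placeholders: " + serializedSize(1000));
    }
}
```

With the real class you would pass NullWritable.get() as the key or value you 
do not need.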

On Jul 26, 2011, at 5:46 AM, Vyacheslav Zholudev 
<vyacheslav.zholu...@gmail.com> wrote:

> Hi,
> 
> I'm using the avro format both for input and output, for a mapper and a 
> reducer. I would like to output multiple avro items with different schemata. 
> For sequence files I would use the MultipleOutputs class from the mapreduce 
> package.
> 
> I looked into the same class but from the old package "mapred" and realized 
> that I can pass an AvroOutputFormat.class parameter when adding another 
> output. However, I didn't manage to figure out how to provide an avro schema 
> for each output. Moreover, when writing to an output, I need to provide a key 
> and a value, but in the case of avro we usually just pass a specific avro 
> object. All of the above makes me think that the old MultipleOutputs API 
> wouldn't work with avro files. Am I right?
> 
> Any pointers on how to output multiple avro records in the same reducer are 
> appreciated. 
> 
> P.S. Another thought was to create an avro schema of type union that will 
> contain all possible output schemata, but I would like to avoid that.
> 
> Thanks in advance!!!
> 
> -- 
> Best,
> Vyacheslav
