You can extend/customize MultipleOutputs and pass schema-related settings via properties prefixed with the named output's name, just as is done with the output format classes there.
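A minimal sketch of the "properties prefixed with the named output's name" idea follows. It is an illustration under stated assumptions, not the MultipleOutputs API. `java.util.Properties` stands in for the Hadoop `JobConf`. The `mo.namedOutput.` prefix and `.avro.schema` suffix are hypothetical key names, as are the helpers `schemaKey`, `setOutputSchema` and `getOutputSchema`. The driver records one schema JSON string per named output. A customized MultipleOutputs or output format would then read that string back when it opens the writer for that output, and hand it to Avro's schema parser.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: per-named-output Avro schema settings stored under
 * property keys prefixed with the named output's name. Properties stands in
 * for the Hadoop JobConf; the key layout is an assumption, not a real
 * Hadoop or Avro property name.
 */
public class PerOutputSchemaConfig {
    // Assumed key layout: mo.namedOutput.<name>.avro.schema
    static final String PREFIX = "mo.namedOutput.";
    static final String SCHEMA_SUFFIX = ".avro.schema";

    /** Builds the property key that holds the schema for one named output. */
    static String schemaKey(String namedOutput) {
        return PREFIX + namedOutput + SCHEMA_SUFFIX;
    }

    /** Driver side: record the schema JSON for a named output. */
    static void setOutputSchema(Properties conf, String namedOutput, String schemaJson) {
        conf.setProperty(schemaKey(namedOutput), schemaJson);
    }

    /** Task side: look the schema up again when opening that output's writer. */
    static String getOutputSchema(Properties conf, String namedOutput) {
        String schema = conf.getProperty(schemaKey(namedOutput));
        if (schema == null) {
            throw new IllegalArgumentException(
                "No schema configured for named output: " + namedOutput);
        }
        return schema;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Two named outputs, each with its own record schema.
        setOutputSchema(conf, "users",
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");
        setOutputSchema(conf, "events",
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[{\"name\":\"ts\",\"type\":\"long\"}]}");
        System.out.println(getOutputSchema(conf, "users"));
        System.out.println(getOutputSchema(conf, "events"));
    }
}
```

Keying every setting by the named output's name means each output gets its own schema. This avoids the single union schema that the original question wanted to steer clear of.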
Also, to send a dummy key or value, why not just use NullWritable? It is efficient because it serializes to zero bytes.

On Jul 26, 2011, at 5:46 AM, Vyacheslav Zholudev <vyacheslav.zholu...@gmail.com> wrote:

> Hi,
>
> I'm using the avro format both for input and output, for a mapper and a
> reducer. I would like to output multiple avro items with different schemata.
> For sequence files I would use the MultipleOutputs class from the mapreduce
> package.
>
> I looked into the same class but from the old package "mapred" and realized
> that I can pass an AvroOutputFormat.class parameter when adding another
> output. However, I didn't manage to figure out how to provide an avro schema
> for each output. Moreover, when writing to output, I need to provide a key
> and a value, but in case of avro we usually just pass a specific avro object.
> All above makes me think that the old MultipleOutputs API wouldn't work with
> avro files. Am I right?
>
> Any pointers of how to output multiple avro records in the same reducer are
> appreciated.
>
> P.S. Another thought was to create an avro schema of type union that will
> contain all possible output schemata, but I would like to avoid that.
>
> Thanks in advance!!!
>
> --
> Best,
> Vyacheslav