Re: How to get started with avro?

Doug Cutting Fri, 18 Sep 2009 14:45:20 -0700

Stuart White wrote:

So I guess I'm (1) looking for "hello world" in avro, and (2)
attempting to determine the level of integration between avro and
Hadoop.  Do avro InputFormat/OutputFormat classes exist?

This is not yet a mature area. I wish integration with Hadoop was further along.

In Hadoop 0.21 (the next release) it should be possible to use SequenceFile{Input,Output}Format with Avro specific and reflect data.


This is due to the changes in:

https://issues.apache.org/jira/browse/HADOOP-6120

and

https://issues.apache.org/jira/browse/HADOOP-6165

(Note, however, that patch did not add tests for end-to-end MapReduce, so there may still be some issues.)

For Avro generic data, perhaps the most useful with MapReduce, you'd need to somehow get the schema to the Serializer and Deserializer that are used by the shuffle, since I think it still uses the deprecated SerializationFactory#getSerialization(Class). This could be done by having the application or InputFormat add the schema to the job's Configuration, then having (a subclass of) AvroGenericDeserializer look for it there. (The Deserializer is Configurable, so it should have a copy of the Configuration available to it.) You'd use the class name passed in (metadata.get(CLASS_KEY)) as the key to help look up the schema in the config. Does that make any sense?
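
To make the idea concrete, here is a rough sketch of the lookup only, not working Hadoop code. A plain Map stands in for the job's Configuration so the example runs on its own, and the key prefix "avro.generic.schema.", the class name "example.LogRecord", and the helper names setSchema/findSchema are all made up for illustration. None of them are existing Hadoop or Avro names.

```java
import java.util.HashMap;
import java.util.Map;

class SchemaLookupSketch {
    // Hypothetical key prefix; not a real Hadoop or Avro property name.
    static final String SCHEMA_KEY_PREFIX = "avro.generic.schema.";

    // Job-setup side: the application or InputFormat stores the schema JSON
    // under a key derived from the class name.
    static void setSchema(Map<String, String> conf, String className, String schemaJson) {
        conf.put(SCHEMA_KEY_PREFIX + className, schemaJson);
    }

    // Deserializer side: use the class name from the serialization metadata
    // to find the schema JSON that was stored at job-setup time.
    static String findSchema(Map<String, String> conf, String className) {
        String schemaJson = conf.get(SCHEMA_KEY_PREFIX + className);
        if (schemaJson == null) {
            throw new IllegalStateException("No schema configured for " + className);
        }
        return schemaJson;
    }

    public static void main(String[] args) {
        // Stand-in for the job's Configuration (assumption: simple string key/value store).
        Map<String, String> conf = new HashMap<>();
        String schemaJson =
            "{\"type\":\"record\",\"name\":\"LogRecord\",\"fields\":[{\"name\":\"msg\",\"type\":\"string\"}]}";
        setSchema(conf, "example.LogRecord", schemaJson);
        System.out.println(findSchema(conf, "example.LogRecord"));
    }
}
```

In a real job the Map would be the Configuration handed to the Configurable Deserializer, and the JSON string would be parsed into an Avro Schema before building the datum reader.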

There's also an open issue to define an InputFormat/OutputFormat for Avro's container file format:


https://issues.apache.org/jira/browse/MAPREDUCE-815

If you're interested in helping push this forward I'll help too.

Doug
