I am using Hadoop to process large data files whose formats are
dynamic.  In order to support dynamic data without embedding the
metadata with the data on every record (such as with MapWritable), I
decided to write my values as BytesWritable "blobs", with an external
schema file that describes the name, type, and ordering of the fields
written into the BytesWritable.  I wrote two methods,
"BytesWritableToMap" and "MapToBytesWritable", that use the schema to
convert the data from the BytesWritable "blob" to a Map<String,
Writable> and vice versa.  The data is stored as BytesWritable,
converted to Map<String, Writable> when I'm dealing with it, and
converted back to BytesWritable to output.  The schema file is a
separate text file that looks something like this:

Field1 : org.apache.hadoop.io.Text;
Field2 : org.apache.hadoop.io.Text;

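To make the idea concrete, here is a simplified, self-contained sketch of that pattern: an ordered schema drives packing a record into a byte "blob" and unpacking it again. (This is my illustration, not the actual code: plain Strings and DataOutputStream stand in for Writable/BytesWritable so it runs without Hadoop on the classpath; the real methods would call each field's Writable.write()/readFields() instead of writeUTF()/readUTF(). The class and method names are hypothetical.)

```java
import java.io.*;
import java.util.*;

public class BlobSchemaSketch {

    // Parse schema lines like "Field1 : org.apache.hadoop.io.Text;"
    // into field names, preserving declaration order.
    static List<String> parseSchema(String schemaText) {
        List<String> fields = new ArrayList<>();
        for (String line : schemaText.split("\n")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            fields.add(line.split(":")[0].trim());
        }
        return fields;
    }

    // "MapToBytesWritable" analogue: write each field's value in schema order.
    static byte[] mapToBlob(List<String> schema, Map<String, String> record)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String field : schema) {
            out.writeUTF(record.get(field));  // real code: writable.write(out)
        }
        return bytes.toByteArray();
    }

    // "BytesWritableToMap" analogue: read the fields back in the same order.
    static Map<String, String> blobToMap(List<String> schema, byte[] blob)
            throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
        Map<String, String> record = new LinkedHashMap<>();
        for (String field : schema) {
            record.put(field, in.readUTF());  // real code: writable.readFields(in)
        }
        return record;
    }

    public static void main(String[] args) throws IOException {
        List<String> schema = parseSchema(
            "Field1 : org.apache.hadoop.io.Text;\n"
          + "Field2 : org.apache.hadoop.io.Text;");
        Map<String, String> record = new LinkedHashMap<>();
        record.put("Field1", "hello");
        record.put("Field2", "world");
        byte[] blob = mapToBlob(schema, record);          // pack
        Map<String, String> roundTrip = blobToMap(schema, blob);  // unpack
        System.out.println(roundTrip);
    }
}
```

The key property is that the blob itself carries no field names or types; only the external schema's ordering makes it decodable, which is why per-record metadata (as with MapWritable) is avoided.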
When I read the description of Avro, it sounded exactly like what I
had done, except with a much broader scope.  If it turns out to be a
replacement for what I've written, then it only makes sense for me to
adopt it.

So I guess I'm (1) looking for a "hello world" in Avro, and (2)
attempting to determine the level of integration between Avro and
Hadoop.  Do Avro InputFormat/OutputFormat classes exist?  (I'm not
even sure that question makes sense yet... I don't know enough about
Avro yet.)

I'll take a look at the junit tests.  Thanks!


On Fri, Sep 18, 2009 at 3:16 PM, Doug Cutting <[email protected]> wrote:
> Unfortunately Avro does not yet have good introductory documentation or
> examples.  The closest thing to examples are the unit tests.
>
> Can you tell more about what you want to do?
>
> Doug
>
> Stuart White wrote:
>>
>> I *think* avro is applicable to a problem I'm working on, but I'm
>> having difficulty getting started with it.  Is there a "getting
>> started" guide available?  Or an example "hello world" that I can look
>> at?  Can someone point me in the direction for how to start using
>> avro?
>>
>> Thanks!
>