Hi Doug,

Thanks for your response. Yes, I had worked it out. However, I felt there was a need for a SeekableByteArrayInput, so I filed a JIRA (http://issues.apache.org/jira/browse/AVRO-126) and submitted a patch. That was really useful when storing things in Voldemort: in the case of a K/V store, it may be overkill to always store the schema along...
Thanks,
Florian

On Mon, Sep 28, 2009 at 12:14 PM, Doug Cutting wrote:
> Florian Leibert wrote:
>
>> I just figured out that I can just use the GenericDatumWriter instead of
>> the DataFileWriter - the former doesn't store the schema in the file while
>> the latter does.
>
> Florian,
>
> It sounds like you worked this one out for yourself. Different DatumWriter
> implementations encode equivalent data identically. They differ in how the
> data is represented in Java, not when serialized.
>
> The best practice with Avro is to store the schema with serialized data, so
> that later, even if the schema in your application has changed, you can
> still read that data. Avro's data file stores the schema once per file.
> Avro RPC clients pass the MD5 hash of their schema with each request, and,
> when a server has not seen that version of the schema, the client must
> resubmit the request with the full schema. If you're, e.g., potentially
> storing different versions of a record in a database, then you might
> consider annotating each entry with the hash of its schema and separately
> maintaining a table mapping hashes to schemas, so that applications can
> always find the schema that was used to write the data when processing it.
>
> I hope this helps!
>
> Cheers,
>
> Doug
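
For a K/V store, the hash-to-schema table Doug describes could look roughly like the sketch below. It is only an illustration in Python using the standard library, not Avro's or Voldemort's API: the names SchemaRegistry, register, lookup, wrap, and unwrap are hypothetical, the payload is treated as opaque bytes rather than being produced by a DatumWriter, and the table lives in memory where a real setup would presumably keep it in its own store. Hashing the schema text as-is is also a simplification, since two schemas that differ only in whitespace would get different hashes.

```python
import hashlib


class SchemaRegistry:
    """Hypothetical in-memory table mapping schema hashes to schema text."""

    HASH_LEN = 16  # an MD5 digest is 16 bytes long

    def __init__(self):
        self._schemas = {}

    def register(self, schema_json: str) -> bytes:
        """Store the schema under its MD5 hash and return that hash."""
        digest = hashlib.md5(schema_json.encode("utf-8")).digest()
        self._schemas[digest] = schema_json
        return digest

    def lookup(self, digest: bytes) -> str:
        """Return the schema text for a hash; raises KeyError if unknown."""
        return self._schemas[digest]

    def wrap(self, schema_json: str, payload: bytes) -> bytes:
        """Prefix the serialized payload with the hash of its writer schema."""
        return self.register(schema_json) + payload

    def unwrap(self, value: bytes):
        """Split a stored value into (writer schema text, payload bytes)."""
        digest = value[:self.HASH_LEN]
        payload = value[self.HASH_LEN:]
        return self.lookup(digest), payload


registry = SchemaRegistry()
schema_v1 = '{"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}'

# b"\x02" is a stand-in for bytes written without an embedded schema.
stored_value = registry.wrap(schema_v1, b"\x02")

# A reader recovers the writer's schema from the 16-byte prefix.
writer_schema, payload = registry.unwrap(stored_value)
```

Each stored value then carries a fixed 16 bytes of overhead instead of the full schema, and a reader can still find the schema the data was written with, even after the application's own schema has changed.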
