Hi Lewis,

This is for sure a very interesting topic and something that GORA should deal with. It is funny that only now I found out that GORA actually means "Generic Object Representation using Avro". Does this mean that we will always have to use Avro for everything? Never mind, we can all discuss this when the time comes.

From the little reading I did about data evolution:

- Schema along with the data -> This could be done in a similar way to how we are approaching the union fields, i.e. append an extra field to the data holding its schema, deserialize the schema, and then check whether the data can actually satisfy the query or not. Of course this would be part of 0.5 :)
- Hash of the Schema along with the data, Schema versioning, Schema fingerprinting -> These all need some way of looking up saved schemas (by version, hash, or schema fingerprint).
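
To make the second option a bit more concrete, here is a rough sketch of the lookup idea in plain Python. It is not Gora or Avro code: `SchemaRegistry`, `write_record`, `read_record` and the `schema_fp` key are hypothetical names made up for illustration, and the SHA-256 over sorted JSON is only a stand-in for a proper Avro schema fingerprint. It assumes schemas are record schemas given as dicts and that newly added fields carry a `default`.

```python
import hashlib
import json


def fingerprint(schema: dict) -> str:
    """Hash a canonical JSON rendering of the schema.

    Assumption: sorted-key JSON is used as a stand-in for a real
    Avro canonical form; it is not the Avro fingerprint algorithm.
    """
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SchemaRegistry:
    """Hypothetical in-memory store mapping fingerprint -> schema."""

    def __init__(self):
        self._by_fingerprint = {}

    def register(self, schema: dict) -> str:
        fp = fingerprint(schema)
        self._by_fingerprint[fp] = schema
        return fp

    def lookup(self, fp: str) -> dict:
        return self._by_fingerprint[fp]


def write_record(registry: SchemaRegistry, schema: dict, record: dict) -> dict:
    """Store only the schema fingerprint next to the data."""
    return {"schema_fp": registry.register(schema), "data": record}


def read_record(registry: SchemaRegistry, stored: dict, reader_schema: dict) -> dict:
    """Resolve a stored record against the schema the reader expects."""
    writer_schema = registry.lookup(stored["schema_fp"])
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field was present when the record was written.
            result[name] = stored["data"][name]
        elif "default" in field:
            # Field was added later: fall back to the reader's default.
            result[name] = field["default"]
        else:
            raise ValueError("field %r missing and has no default" % name)
    return result


# Old and new versions of a WebPage-like schema (illustrative only).
v1 = {"type": "record", "name": "WebPage",
      "fields": [{"name": "url", "type": "string"}]}
v2 = {"type": "record", "name": "WebPage",
      "fields": [{"name": "url", "type": "string"},
                 {"name": "batchId", "type": ["null", "string"], "default": None}]}

registry = SchemaRegistry()
stored = write_record(registry, v1, {"url": "http://example.org/"})
print(read_record(registry, stored, v2))
# {'url': 'http://example.org/', 'batchId': None}
```

A record written before 'batchId' existed can then still be read with the newer schema, because the reader sees which schema the data was written with and fills the missing field from its default instead of failing. The registry would of course have to live somewhere persistent; the dict here only shows the lookup.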

Renato M.

2014-04-01 16:47 GMT+02:00 Lewis John Mcgibbney <[email protected]>:

> Hi Folks,
> I've ended up in a conversation [0] over on user@avro regarding Schema
> evolution.
> Right now our workflow is as follows
>
> * write .avsc schema and use GoraCompiler to generate Persistent data
> beans.
> * use the Persistent class whenever we wish to read to or write from the
> data.
>
> AFAICT, as explained in [0], this presents us with a problem. Namely that
> we have very sketchy support to Schema evolution over time.
>
> We narrowly avoided minor situation over in Nutch when we added a 'batchId'
> Field to our WebPage Schema as some Tools when attempting to read Field's
> which were simply not present for some records.
>
> So this thread is opened to discussion surrounding what we can/must do to
> improve this.
> Should we store the Schema along with the data?
> Should we store a Hash of the Schema along with the data?
> Should we support Schema versioning?
> Should we support Schema fingerprinting?
>
> Of course this is something for the 0.5-SNAPSHOT development drive but it
> is something which we need to sort out as time goes on.
>
> Ta
> Lewis
>
> [0] http://www.mail-archive.com/user%40avro.apache.org/msg02748.html
>
> --
> *Lewis*
>

