On Sep 22, 2010, at 8:54 AM, Thiruvalluvan M. G. wrote:

>
>> I'm worried that the semantics of reader and writer schemas are already
>> complicated enough; adding in sets of schemas makes it even trickier.
>
> I understand your concern. I agree the union schema could become really
> complicated over time, say after 5 revisions. We'll have to carry all the
> five revisions even if we know that nobody needs all the five revisions at any
> given time.
>

I'd also worry about the size. My schemas are 1k to 8k in JSON size. Keeping a copy of each revision in union branches is something I'd avoid with any larger schema.

I'm fine with storing the schema as metadata with my data for most uses. There are a few tricky ones where something like your proposal would be helpful, but I'm not sure overloading Unions for it is the right thing.

For example, what if you want to serialize data into a browser cookie using Avro? There is no 'store the schema with the data' option here, period. You have to be able to identify what schema was used via a version identifier. The application can manage that, or Avro can. At minimum we should strive for documentation and advice on the issue.

> Given this, let me work with the "external" schema-id idea and gain some
> experience and then come back with a proposal.
>
> For now, let me withdraw my proposal.
>
> Thank you and Doug for the valuable feedback.
>
> Thiru
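
To make the cookie case above a bit more concrete, here is a rough sketch of the application-managed variant: the cookie carries only a small version identifier in front of the encoded bytes, and the reader looks the schema up by that identifier. Everything named here (`SCHEMAS`, `pack_cookie`, `unpack_cookie`, the 2-byte prefix) is made up for illustration and is not part of Avro; the payload is treated as opaque bytes that some Avro encoder would have produced elsewhere.

```python
import base64
import struct
from typing import Tuple

# Hypothetical application-side registry: version id -> writer schema (JSON text).
SCHEMAS = {
    1: '{"type": "record", "name": "Session", "fields": ['
       '{"name": "user", "type": "string"}]}',
    2: '{"type": "record", "name": "Session", "fields": ['
       '{"name": "user", "type": "string"}, '
       '{"name": "theme", "type": "string", "default": "light"}]}',
}


def pack_cookie(version: int, payload: bytes) -> str:
    """Prefix the encoded payload with a 2-byte big-endian schema version."""
    if version not in SCHEMAS:
        raise KeyError(f"unknown schema version {version}")
    framed = struct.pack(">H", version) + payload
    # Base64 so the result is plain ASCII text that can go in a cookie value.
    return base64.urlsafe_b64encode(framed).decode("ascii")


def unpack_cookie(value: str) -> Tuple[str, bytes]:
    """Return (writer schema JSON, payload bytes) for a cookie value."""
    framed = base64.urlsafe_b64decode(value.encode("ascii"))
    if len(framed) < 2:
        raise ValueError("cookie too short to contain a schema version")
    (version,) = struct.unpack(">H", framed[:2])
    schema = SCHEMAS.get(version)
    if schema is None:
        raise ValueError(f"unknown schema version {version}")
    return schema, framed[2:]


# Example: the payload stands in for Avro binary-encoded data.
cookie = pack_cookie(2, b"\x06bob")
writer_schema, payload = unpack_cookie(cookie)
```

The reader would then decode `payload` using `writer_schema` as the writer schema and its own current schema as the reader schema. The cost is two bytes per cookie instead of 1k to 8k of schema JSON, at the price of keeping the registry available to every reader.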
