[ https://issues.apache.org/jira/browse/AVRO-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790298#action_12790298 ]
Philip Zeyliger commented on AVRO-251:
--------------------------------------

Does the serialization and deserialization of schemas to and from binary belong in Schema.java, or does it belong in a nearby class? I think the use case for it (I know you have one in mind, and we're hinting at it in this JIRA) ought to be spelled out in the JavaDoc for the appropriate methods.

bq. Note that this currently does not preserve every nuance, e.g., user properties. So my vote is to remove default values as well.

If you're not preserving user properties, I'm +1 for killing the defaults. This leaves us in a place where we have representations of schemas that, without other representations, we can't read data with. (The way I think of it, we always need two schemas: the schema the data was written with, and the schema the data is being read with. We can use the binary version for the former, but not the latter. Is that right? Do we have names for these two schemas?)

If you were inclined towards keeping the defaults, I would keep pushing for storing them as Avro-encoded binary bytes.

bq. It's nice to see how little code is required to incorporate full JSON data into Avro.

Yes, it is reassuring that JSON itself has a small schema. I'm +1 for taking this out of this patch, but separately producing a tool to represent "binary JSON" in Avro.

Just to be sure we've thought of it, one alternative is to ditch the whole binary representation and store the original schema in Avro-encoded binary JSON. I actually prefer schemas to be typed.

bq. This "event based" programming style requires only a bit more coding than wrapper classes, but saves a level of redirection and/or copies.

I appreciate that with ValidatingEncoder we get a sense of security. But I have a hard time buying the performance argument here. I think you would agree that using either the specific (my preference) API or the generic API would be clearer from a code perspective.
If the performance of the specific API is crap, then we need to measure it and fix it: after all, that is the API Avro recommends people use. Considering that the set of schemas in a program should have small cardinality, and the binary representation could be cached, speed doesn't seem paramount here.

I agree that event-based models are very useful for things that, say, don't fit into memory readily. Schemas pretty much have to fit into memory readily, so I don't think the case applies here.

> add schema for schemas
> ----------------------
>
> Key: AVRO-251
> URL: https://issues.apache.org/jira/browse/AVRO-251
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 1.3.0
>
> Attachments: AVRO-251.patch, AVRO-251.patch
>
>
> A schema for schemas would permit schemas to be written in binary.
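
As a footnote to the two-schema point in the comment above, here is a minimal sketch of why a schema stripped of defaults can plausibly serve as the "written with" schema but not as the "read with" schema. It deliberately does not use the Avro API: the class `TwoSchemaSketch`, the method `resolve`, and the map-based stand-in for a schema (field name mapped to a default value, with `null` meaning "no default") are all hypothetical, chosen only to illustrate the reasoning.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, NOT the Avro API: a "reader schema" is a map from
// field name to default value, where null means the field has no default.
class TwoSchemaSketch {

    // Build the record the reader sees: take each reader field from the
    // written data if present, otherwise fall back to the reader's default.
    // Fields that were written but are unknown to the reader are ignored.
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            String name = field.getKey();
            if (written.containsKey(name)) {
                result.put(name, written.get(name));
            } else if (field.getValue() != null) {
                result.put(name, field.getValue());
            } else {
                // A reader field that is absent from the data and has no
                // default cannot be filled in.
                throw new IllegalStateException("no default for missing field: " + name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Data written with an older schema that only had "id".
        Map<String, Object> written = new LinkedHashMap<>();
        written.put("id", 7);

        // Reader schema that kept its defaults.
        Map<String, Object> withDefaults = new LinkedHashMap<>();
        withDefaults.put("id", null);
        withDefaults.put("name", "unknown");
        System.out.println(resolve(written, withDefaults));

        // The same reader schema after its defaults were dropped.
        Map<String, Object> withoutDefaults = new LinkedHashMap<>();
        withoutDefaults.put("id", null);
        withoutDefaults.put("name", null);
        try {
            resolve(written, withoutDefaults);
        } catch (IllegalStateException e) {
            System.out.println("cannot read: " + e.getMessage());
        }
    }
}
```

In this model the first call fills in `name` from the default, while the second fails. That is the sense in which a defaults-free representation would be enough to describe how data was written, but not enough to read older data with a newer schema.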