[jira] Commented: (AVRO-251) add schema for schemas

Philip Zeyliger (JIRA) Mon, 14 Dec 2009 11:15:44 -0800

    [ 
https://issues.apache.org/jira/browse/AVRO-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790298#action_12790298
 ]


Philip Zeyliger commented on AVRO-251:
--------------------------------------

Does the code that serializes schemas to binary and deserializes them belong 
in Schema.java, or does it belong in a nearby class?  I think the use case for 
it (I know you have one in mind, and we're hinting at it in this JIRA) ought to 
be spelled out in the JavaDoc for the appropriate methods.

bq. Note that this currently does not preserve every nuance, e.g., user 
properties. So my vote is to remove default values as well.

If you're not preserving user properties, I'm +1 for killing the defaults.  
That leaves us with a binary representation of a schema that, on its own, is 
not enough to read data with.  (The way I think of it, we always need two 
schemas: the schema the data was written with, and the schema the data is being 
read with.  We can use the binary version for the former, but not the latter.  
Is that right?  Do we have names for these two schemas?)
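
To make the two-schema point concrete, here is a toy sketch in plain Java.  It 
is not Avro's API: ToyResolution, Field and resolve are hypothetical names, and 
the only assumption is that a field missing from the written data gets filled 
in from a default carried by the schema used for reading.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these types are hypothetical and are NOT Avro classes.
final class ToyResolution {

    // A field of the schema used for reading. In this toy, a null
    // defaultValue means "no default" (real Avro also allows null defaults).
    static final class Field {
        final String name;
        final Object defaultValue;

        Field(String name, Object defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    // Builds the reader's view of a record from the values that were written.
    // Written fields the reader does not declare are simply ignored.
    static Map<String, Object> resolve(Map<String, Object> written,
                                       List<Field> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field f : readerFields) {
            if (written.containsKey(f.name)) {
                result.put(f.name, written.get(f.name));
            } else if (f.defaultValue != null) {
                // Only the reading schema can supply this value.
                result.put(f.name, f.defaultValue);
            } else {
                throw new IllegalStateException(
                    "no written value and no default for field: " + f.name);
            }
        }
        return result;
    }
}
```

With a reading schema that declares a default for a field the writer never 
wrote, resolve fills the value in.  With the same schema stripped of its 
defaults, resolve throws, which is why a default-less binary schema looks 
usable only as the written-with schema.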

If you were inclined towards keeping the defaults, I would keep pushing for 
storing them as avro-encoded binary bytes.

bq. It's nice to see how little code is required to incorporate full JSON data 
into Avro.

Yes, it is reassuring that JSON itself has a small schema.  I'm +1 for taking 
this out of this patch, but separately producing a tool to represent "binary 
JSON" in Avro.

Just to be sure we've thought of it, one alternative is to ditch the whole 
binary representation and store the original schema as Avro-encoded binary 
JSON.  I actually prefer schemas to be typed, though.

bq. This "event based" programming style requires only a bit more coding than 
wrapper classes, but saves a level of redirection and/or copies.

I appreciate that with ValidatingEncoder we get a sense of security.  But I 
have a hard time buying the performance argument here.  I think you would agree 
that using either the specific API (my preference) or the generic API would be 
clearer from a code perspective.  If the performance of the specific API is 
crap, then we need to measure it and fix it: after all, that is the API Avro 
recommends people use.  Considering that the set of schemas in a program should 
have small cardinality, and the binary representation could be cached, speed 
doesn't seem paramount here.
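
The caching point can be sketched in a few lines of plain Java.  This is a 
minimal sketch, not code from the patch: SchemaBytesCache and bytesFor are 
hypothetical names, the schema type is left generic, and the encoder function 
stands in for whatever produces the binary form.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper, not part of Avro: memoizes a schema-to-bytes encoder.
final class SchemaBytesCache<S> {
    private final Map<S, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<S, byte[]> encoder;

    // The encoder must not return null; it stands in for the real serializer.
    SchemaBytesCache(Function<S, byte[]> encoder) {
        this.encoder = encoder;
    }

    // Runs the encoder at most once per distinct schema (by equals/hashCode).
    // The returned array is shared with the cache, so callers must not modify it.
    byte[] bytesFor(S schema) {
        return cache.computeIfAbsent(schema, encoder);
    }
}
```

With only a handful of distinct schemas per program, the encoder runs a handful 
of times in total, however slow the API behind it is.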

I agree that event-based models are very useful for things that, say, don't fit 
into memory readily.  Schemas pretty much have to fit into memory readily, so I 
don't think the case applies here.
  

> add schema for schemas
> ----------------------
>
>                 Key: AVRO-251
>                 URL: https://issues.apache.org/jira/browse/AVRO-251
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.3.0
>
>         Attachments: AVRO-251.patch, AVRO-251.patch
>
>
> A schema for schemas would permit schemas to be written in binary.

