[jira] Commented: (AVRO-249) in reflection, represent Java short as fixed data

Scott Carey (JIRA) Fri, 04 Dec 2009 14:05:45 -0800

    [ 
https://issues.apache.org/jira/browse/AVRO-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786194#action_12786194
 ]


Scott Carey commented on AVRO-249:
----------------------------------

{quote}It also seems pretty arbitrary that integers, longs, and floats are 
represented with zigzag varint encoding, but shorts are always two bytes.{quote}
Floats aren't encoded as varints, are they?  I can't see the advantage there; the 
high bits will be set too frequently.

{quote}At this point I am not concerned about performance here.{quote}
That's fair.

{quote}It wouldn't surprise me if Avro evolved to have int8, int16, int32, 
int64 and fixed8, fixed16, fixed32, fixed64 "types"{quote}
Maybe, but if this happens, it sounds like Avro 2.0, not Avro 1.x.
Also, there would be no benefit to a one-byte varint, and for the int16 case 
there may be little or no benefit.  It's easy to speculate that an int32 is 
very often less than 2^20 in magnitude (3 varint bytes instead of 4).  It's hard 
to speculate that shorts are mostly less than 2^6 and not frequently more than 
2^13, which is what a zigzag varint needs to beat 2 fixed bytes: one byte below 
2^6, two below 2^13, three above.
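To make the size argument above concrete, here is a small Java sketch (the class and method names are made up for illustration) that counts how many bytes a zigzag varint takes, which is how Avro encodes int and long, and compares that with fixed widths.  Avro's spec writes a float as 4 fixed bytes; the sketch shows that a typical float bit pattern such as 1.0f would cost 5 bytes as a varint, and that a short only saves space in the range -64..63.

```java
/** Sketch: compare zigzag-varint sizes (as Avro uses for int/long) with fixed widths. */
class VarintSizes {

    /** Zigzag-maps a signed int so values of small magnitude become small unsigned values. */
    static int zigzag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    /** Bytes needed by a base-128 varint for the given bits, read as an unsigned value. */
    static int varintLength(int bits) {
        int len = 1;
        while ((bits & ~0x7F) != 0) { // more than 7 significant bits remain
            len++;
            bits >>>= 7;
        }
        return len;
    }

    static int zigzagVarintLength(int n) {
        return varintLength(zigzag(n));
    }

    public static void main(String[] args) {
        // Float bit patterns usually have high exponent bits set.
        int oneBits = Float.floatToIntBits(1.0f); // 0x3F800000
        System.out.println("1.0f: varint " + zigzagVarintLength(oneBits) + " bytes vs fixed 4");

        // Shorts: 1 byte only for -64..63, 2 bytes up to -8192..8191, 3 bytes beyond.
        short[] samples = {0, 63, 64, -64, -65, 8191, 8192, -8193, Short.MAX_VALUE, Short.MIN_VALUE};
        for (short s : samples) {
            System.out.println(s + ": varint " + zigzagVarintLength(s) + " bytes vs fixed 2");
        }

        // An int32 below 2^20 in magnitude needs 3 varint bytes instead of 4.
        int below = (1 << 20) - 1;
        System.out.println(below + ": varint " + zigzagVarintLength(below) + " bytes vs fixed 4");
    }
}
```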


{quote}Not sure what you mean by mix-ins, but, yes, you could annotate the 
field in the class whose schema is being induced.{quote}
Basically, if you don't want to (or can't) change class A, you can write a MixIn 
class B whose annotations "target" the methods and members of class A.  See:
http://wiki.fasterxml.com/JacksonMixInAnnotations
The goal is to allow annotating a class whose source code you can't change.
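A minimal sketch of the mix-in idea using plain Java reflection, assuming a hypothetical @AvroName annotation (this is not how Jackson implements mix-ins, only the lookup rule): an annotation on class A wins, otherwise the same-named member of mix-in class B is consulted, otherwise the plain Java name is used.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

class MixInResolver {
    /**
     * Avro field name for one field of the target class: an annotation on the
     * target itself wins, then one on the mix-in's same-named field, then the Java name.
     */
    static String avroName(Field targetField, Class<?> mixIn) {
        AvroName direct = targetField.getAnnotation(AvroName.class);
        if (direct != null) {
            return direct.value();
        }
        if (mixIn != null) {
            try {
                Field mixed = mixIn.getDeclaredField(targetField.getName());
                AvroName fromMixIn = mixed.getAnnotation(AvroName.class);
                if (fromMixIn != null) {
                    return fromMixIn.value();
                }
            } catch (NoSuchFieldException e) {
                // The mix-in does not mention this field; fall back to the default.
            }
        }
        return targetField.getName();
    }

    /** Convenience overload that looks the target field up by name. */
    static String avroName(Class<?> target, String fieldName, Class<?> mixIn) {
        try {
            return avroName(target.getDeclaredField(fieldName), mixIn);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(target.getName() + " has no field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        for (Field f : Point.class.getDeclaredFields()) {
            System.out.println(f.getName() + " -> " + avroName(f, PointMixIn.class));
        }
    }
}

/** Hypothetical annotation naming the Avro field that a Java field maps to. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface AvroName {
    String value();
}

/** "Class A": code we cannot (or do not want to) modify; it carries no annotations. */
class Point {
    int x;
    int y;
}

/** "Class B": a mix-in whose same-named fields carry the annotations for Point. */
abstract class PointMixIn {
    @AvroName("xCoord") int x;
    @AvroName("yCoord") int y;
}
```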

OK, if we're talking about the long-term Reflect API, I will add this:

I have started digging into Avro myself, and thinking about schema 
evolution.  I don't particularly like the Specific API and its code generation; 
I'd generally rather direct a schema at my own classes for most use cases.  I 
don't want to use Reflect either, with its restrictions and performance 
pitfalls (my requirements differ significantly from those Doug is working on 
for Hadoop RPC).

I think that these two APIs can be combined into one annotation-based API.  
Sure, we can still have code generation from Avro schemas with basic defaults 
to create classes, but that step can be optional, even for inter-language use 
cases.

Imagine something like this.
You have a pre-existing class, Stuff, and you want to define how it is 
serialized.  You make an Avro schema for it, to share with other 
languages/tools.  Now, you want to map the two together.  Using Specific, you 
have to write wrapper code to read the Avro-generated class into your current 
class (which has a little bit of logic in it: maybe a custom hashCode() and 
equals(), a few other constructors for test cases, and some setters and 
getters that aren't just "return foo" and "this.foo = foo").  If this class is 
already long-lived with lots of unit tests, there aren't a lot of nice 
ways to do this without refactoring more than just the class.  More 
importantly, if you have 40 or so such classes... :(
Reflect can somewhat get around this, but then if you want to share the data 
with other languages and tools, you've just exposed your Java implementation of 
your object to the world.  I'd rather not have a schema change just because I 
changed some internal data structure already encapsulated with getters/setters.

Ideally, I would like to just annotate the class with something that says "this 
is serializable with Avro as an Avro type named org.something.X".
Then map the getters/setters or the fields to Avro fields, and build any custom 
logic there if needed to deal with versions.  Being able to map to a 
constructor would be cool too (like Jackson), but less important at the start.
We could even set it up to map projected schemas: "this class can be 
serialized as org.something.X, or as the projection org.something.minimalX if 
method 'isMinimal()' returns true".
This same mapping can be done with an annotation MixIn if the class can't be 
modified at this time.
Now, when decoding anything where an Avro type of X is encountered, it just 
builds the object as instructed by the annotations.  Of course, this can all be 
optimized at class-loading time with something like ASM, rather than with 
runtime reflection.
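Roughly what such an annotation-driven mapping might look like, as a sketch only: @AvroType and @AvroField are hypothetical annotations invented here, and the "encoder" just collects the type name and getter values instead of writing Avro binary.  Note that the internal int[] inside Stuff never reaches the schema; only getTotal() does, so the internal structure can change without a schema change.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class AnnotationMapping {
    /** An Avro type name plus field values, standing in for a real encoder's input. */
    static final class Encoded {
        final String typeName;
        final Map<String, Object> fields = new TreeMap<>(); // sorted for stable output

        Encoded(String typeName) {
            this.typeName = typeName;
        }

        @Override
        public String toString() {
            return typeName + " " + fields;
        }
    }

    /** Reads the annotations on obj's class and collects the values of its annotated getters. */
    static Encoded toRecord(Object obj) {
        Class<?> c = obj.getClass();
        AvroType type = c.getAnnotation(AvroType.class);
        if (type == null) {
            throw new IllegalArgumentException(c.getName() + " has no @AvroType");
        }
        try {
            // Use the projected type when the named predicate method returns true.
            boolean minimal = !type.projectionWhen().isEmpty()
                    && (Boolean) c.getMethod(type.projectionWhen()).invoke(obj);
            Encoded out = new Encoded(minimal ? type.projection() : type.value());
            for (Method m : c.getMethods()) {
                AvroField field = m.getAnnotation(AvroField.class);
                if (field == null || (minimal && !field.inProjection())) {
                    continue; // not mapped, or dropped by the projection
                }
                out.fields.put(field.value(), m.invoke(obj));
            }
            return out;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toRecord(new Stuff("a", 1, 2, 3))); // org.something.X {id=a, total=6}
        System.out.println(toRecord(new Stuff("b")));          // org.something.minimalX {id=b}
    }
}

// Hypothetical annotations, invented for this sketch.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface AvroType {
    String value();                     // full Avro type name
    String projection() default "";     // Avro type name of the projected schema
    String projectionWhen() default ""; // no-arg boolean method selecting the projection
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface AvroField {
    String value();                       // Avro field name
    boolean inProjection() default false; // also written in the projected schema?
}

@AvroType(value = "org.something.X", projection = "org.something.minimalX", projectionWhen = "isMinimal")
class Stuff {
    private final String id;
    private final int[] counts; // internal detail; the schema only sees getTotal()

    Stuff(String id, int... counts) {
        this.id = id;
        this.counts = counts.clone();
    }

    @AvroField(value = "id", inProjection = true)
    public String getId() {
        return id;
    }

    @AvroField("total")
    public int getTotal() {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    public boolean isMinimal() {
        return counts.length == 0;
    }
}
```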

It may even be possible to just 'borrow' Jackson's annotations entirely, and be 
nearly or completely compatible with those.  

The reason I say that a 'complete' annotation-style API can replace both 
Reflect and Specific is that the Specific rules can be one set of defaults 
(what to construct when a type does not map to a known class), and the Reflect 
default rules the other (how to serialize when a class is not annotated).  The 
need to generate classes at compile time might go away (a class can be defined 
with ASM when its type is first encountered).  The default behavior for both 
cases can be defined as some sort of MixIn default: when reflecting, if you 
find a short, serialize it as an Avro fixed 2-byte quantity; when generating 
an object from a type that is not declared, create an 
org.apache.commons.lang.mutable.MutableInt for Avro ints.
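The "MixIn default" idea could look something like this hypothetical sketch: explicit rules (from annotations or a mix-in) are consulted first, then a table of reflect defaults in which a Java short becomes an Avro fixed of size 2.  The JSON schema strings, the fixed name "short", and the big-endian byte order are assumptions for illustration, not what the AVRO-249 patch necessarily does.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

class DefaultRules {
    // Hypothetical "default mix-in": the Avro schema (as JSON) used for a Java type
    // that carries no explicit annotation. The fixed name "short" is an assumption.
    static final Map<Class<?>, String> REFLECT_DEFAULTS = new HashMap<>();

    static {
        REFLECT_DEFAULTS.put(boolean.class, "\"boolean\"");
        REFLECT_DEFAULTS.put(int.class, "\"int\"");
        REFLECT_DEFAULTS.put(long.class, "\"long\"");
        REFLECT_DEFAULTS.put(float.class, "\"float\"");
        REFLECT_DEFAULTS.put(double.class, "\"double\"");
        REFLECT_DEFAULTS.put(String.class, "\"string\"");
        REFLECT_DEFAULTS.put(short.class, "{\"type\": \"fixed\", \"name\": \"short\", \"size\": 2}");
    }

    /** Explicit rules (annotations or a mix-in) win; otherwise fall back to the defaults. */
    static String schemaFor(Class<?> type, Map<Class<?>, String> explicitRules) {
        String schema = explicitRules.get(type);
        if (schema == null) {
            schema = REFLECT_DEFAULTS.get(type);
        }
        if (schema == null) {
            throw new IllegalArgumentException("no rule for " + type.getName());
        }
        return schema;
    }

    /** Writes a short as a 2-byte fixed value; big-endian order is an assumption here. */
    static byte[] encodeShortAsFixed(short value) {
        return ByteBuffer.allocate(2).putShort(value).array(); // ByteBuffer defaults to big-endian
    }

    public static void main(String[] args) {
        Map<Class<?>, String> noRules = new HashMap<>();
        System.out.println(schemaFor(short.class, noRules));
        byte[] bytes = encodeShortAsFixed((short) -2);
        System.out.printf("%02x %02x%n", bytes[0] & 0xff, bytes[1] & 0xff); // ff fe
    }
}
```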

I had intended to create a ticket for something like the above after learning 
more and exploring; I should have more time over the next couple of months to do 
more than observe and comment.

> in reflection, represent Java short as fixed data
> -------------------------------------------------
>
>                 Key: AVRO-249
>                 URL: https://issues.apache.org/jira/browse/AVRO-249
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.3.0
>
>         Attachments: AVRO-249.patch, AVRO-249.patch, AVRO-249.patch
>
>
> Currently the Java reflect API treats shorts as ints.  This fails however to 
> naturally handle shorts in arrays and other containers.  It would be better 
> if we used a distinct type for reflected Java short values.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
