[ 
https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794273#action_12794273
 ] 

Todd Lipcon commented on AVRO-248:
----------------------------------

I am strongly pro-naming. AVRO-266 (object reuse for deserializing unions) is 
another reason that having names for unions makes sense.

As for nullability, I agree that we definitely don't want to force type names 
on all nullable fields. Anonymous unions are one solution, but special-casing 
nullability in schemas doesn't seem entirely wrong to me either...

As for naming other types, is a typedef construct useful? This would solve the 
union-of-arrays issue as well as some others. To give a concrete example, 
imagine a MapReduce job where we want to aggregate over both users and products. 
Users and products are both represented by their database IDs. I'd want to 
write:

{"type": "union", "branches": [
  {"name": "user_id", "type": "int"},
  {"name": "product_id", "type": "int"}
]}

or with typedefs:
{"type": "typedef", "name": "UserId", "is_type": "int"},
{"type": "typedef", "name": "ProductId", "is_type": "int"}
and then use ["UserId", "ProductId"] with some way to distinguish between the 
two.
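
To make the motivation concrete, here is a rough Java sketch of what a named-branch union over two int-typed IDs might look like to application code. This is not Avro API: the names NamedUnionSketch, Branch, EntityId and countByKey are all made up for illustration, and the aggregation is just an in-memory stand-in for the reduce step.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch only: none of these names come from Avro.
class NamedUnionSketch {
    /** Branch names, mirroring the "user_id" and "product_id" branches. */
    enum Branch { USER_ID, PRODUCT_ID }

    /** A union value: the branch name plus the int payload both branches share. */
    static final class EntityId {
        final Branch branch;
        final int id;

        EntityId(Branch branch, int id) {
            this.branch = branch;
            this.id = id;
        }

        // Two values are equal only if both the branch and the id match,
        // so user 7 and product 7 stay distinct keys.
        @Override public boolean equals(Object o) {
            if (!(o instanceof EntityId)) return false;
            EntityId other = (EntityId) o;
            return branch == other.branch && id == other.id;
        }

        @Override public int hashCode() {
            return Objects.hash(branch, id);
        }
    }

    /** Counts occurrences per (branch, id) key, standing in for an aggregation. */
    static Map<EntityId, Integer> countByKey(EntityId[] keys) {
        Map<EntityId, Integer> counts = new HashMap<>();
        for (EntityId key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        EntityId[] keys = {
            new EntityId(Branch.USER_ID, 7),
            new EntityId(Branch.PRODUCT_ID, 7),
            new EntityId(Branch.USER_ID, 7),
        };
        System.out.println("distinct keys: " + countByKey(keys).size());
    }
}
```

Without the branch name, the two int branches would be indistinguishable once deserialized, which is the problem either the named branches or the typedefs would need to solve.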

> make unions a named type
> ------------------------
>
>                 Key: AVRO-248
>                 URL: https://issues.apache.org/jira/browse/AVRO-248
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Doug Cutting
>             Fix For: 1.3.0
>
>
> Unions are currently anonymous.  However it might be convenient if they were 
> named.  In particular:
>  - when code is generated for a union, a class could be generated that 
> includes an enum indicating which branch of the union is taken, e.g., a union 
> of string and int named Foo might cause a Java class like {code}
> public class Foo {
>   public enum Type {STRING, INT}
>   private Type type;
>   private Object datum;
>   public Type getType() { return type; }
>   public String getString() {
>     if (type == Type.STRING) return (String) datum;
>     else throw ...
>   }
>   public void setString(String s) { type = Type.STRING; datum = s; }
>   ....
> }
> {code} Then Java applications can easily use a switch statement to process 
> union values rather than using instanceof.
>  - when using reflection, an abstract class with a set of concrete 
> implementations can be represented as a union (AVRO-241).  However, if one 
> wishes to create an array one must know the name of the base class, which is 
> not represented in the Avro schema.  One approach would be to add an 
> annotation to the reflected array schema (AVRO-242) noting the base class.  
> But if the union itself were named, that could name the base class.  This 
> would also make reflected protocol interfaces more concise, since the base 
> class name could be used in parameters, return types, and fields.
>  - Generalizing the above: Avro lacks class inheritance, unions are a way to 
> model inheritance, and this model is more useful if the union is named.
> This would be an incompatible change to schemas.  If we go this way, we 
> should probably rename 1.3 to 2.0.  Note that AVRO-160 proposes an 
> incompatible change to data file formats, which may also force a major 
> release.
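
For reference, the switch-based usage that the issue description has in mind might look roughly like the sketch below. It assumes a generated class along the lines of the Foo in the description; getInt, setInt, the exception types, and the FooSwitchDemo helper are guesses added here to make the example self-contained, not anything the issue specifies.

```java
// Hypothetical stand-in for the generated class from the issue description.
// getInt/setInt and the exception types are assumptions, not part of the proposal.
class Foo {
    enum Type { STRING, INT }

    private Type type;
    private Object datum;

    Type getType() { return type; }

    String getString() {
        if (type != Type.STRING) throw new IllegalStateException("not a string");
        return (String) datum;
    }

    void setString(String s) { type = Type.STRING; datum = s; }

    int getInt() {
        if (type != Type.INT) throw new IllegalStateException("not an int");
        return (Integer) datum;
    }

    void setInt(int i) { type = Type.INT; datum = i; }
}

class FooSwitchDemo {
    /** Dispatches on the union branch with a switch instead of instanceof. */
    static String describe(Foo foo) {
        Foo.Type t = foo.getType();
        if (t == null) throw new IllegalStateException("union value not set");
        switch (t) {
            case STRING: return "string:" + foo.getString();
            case INT:    return "int:" + foo.getInt();
            default:     throw new IllegalStateException("unknown branch " + t);
        }
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        foo.setString("abc");
        System.out.println(describe(foo));
    }
}
```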

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
