[ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794273#action_12794273 ]
Todd Lipcon commented on AVRO-248:
----------------------------------

I am strongly pro-naming. AVRO-266 (object reuse for deserializing unions) is another reason that having names for unions makes sense.

As for nullability, I agree that we definitely don't want to force type names on all nullable fields. Anonymous unions are one solution, but special-casing nullability in schemas doesn't seem entirely wrong to me either...

As for naming other types, is a typedef construct useful? This would solve the union-of-arrays issue as well as some others. To give a concrete example, imagine an MR job where we want to aggregate over both users and products. Users and products are both represented by their database IDs. I'd want to write:

  {"type": "union", "branches": [{"name": "user_id", "type": "int"}, {"name": "product_id", "type": "int"}]}

or with typedefs:

  {"type": "typedef", "name": "UserId", "is_type": "int"}, {"type": "typedef", "name": "ProductId", "is_type": "int"}

and then use ["UserId", "ProductId"] with some way to distinguish between the two.

> make unions a named type
> ------------------------
>
>                 Key: AVRO-248
>                 URL: https://issues.apache.org/jira/browse/AVRO-248
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Doug Cutting
>             Fix For: 1.3.0
>
>
> Unions are currently anonymous. However it might be convenient if they were
> named. In particular:
> - when code is generated for a union, a class could be generated that
> includes an enum indicating which branch of the union is taken, e.g., a union
> of string and int named Foo might cause a Java class like {code}
> public class Foo {
>   public static enum Type {STRING, INT};
>   private Type type;
>   private Object datum;
>   public Type getType();
>   public String getString() { if (type==STRING) return (String)datum; else throw ... }
>   public void setString(String s) { type = STRING; datum = s; }
>   ....
> }
> {code}
> Then Java applications can easily use a switch statement to process
> union values rather than using instanceof.
> - when using reflection, an abstract class with a set of concrete
> implementations can be represented as a union (AVRO-241). However, if one
> wishes to create an array one must know the name of the base class, which is
> not represented in the Avro schema. One approach would be to add an
> annotation to the reflected array schema (AVRO-242) noting the base class.
> But if the union itself were named, that could name the base class. This
> would also make reflected protocol interfaces more concise, since the base
> class name could be used in parameters, return types, and fields.
> - Generalizing the above: Avro lacks class inheritance; unions are a way to
> model inheritance, and this model is more useful if the union is named.
> This would be an incompatible change to schemas. If we go this way, we
> should probably rename 1.3 to 2.0. Note that AVRO-160 proposes an
> incompatible change to data file formats, which may also force a major
> release.
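The first bullet of the proposal, applied to the user_id/product_id example from the comment, might look roughly like the hand-written Java sketch below. It is an illustration under stated assumptions, not the output of any Avro code generator. The class name Key, the enum constants, the accessor names, the describe() helper, and the choice of IllegalStateException for a wrong-branch read are all hypothetical. The sketch shows why a branch tag helps. Both branches hold an Integer, so instanceof cannot tell a user ID from a product ID, but a switch on getType() can. Under the current spec, an anonymous union cannot hold two int branches in the first place, because only named types with distinct names may repeat within a union.

```java
// Hypothetical hand-written stand-in for a class that code generation might
// emit for a named union "Key" with two int branches, user_id and product_id.
// Names and behavior are illustrative assumptions, not a real Avro API.
final class Key {
    // One constant per union branch, so callers can switch on the branch taken.
    enum Type { USER_ID, PRODUCT_ID }

    private Type type;      // null until one of the setters is called
    private Object datum;   // the branch value (a boxed Integer for both branches)

    Type getType() { return type; }

    // Setters overwrite the tag and value in place, mirroring setString() in the sketch.
    void setUserId(int id) { type = Type.USER_ID; datum = id; }

    void setProductId(int id) { type = Type.PRODUCT_ID; datum = id; }

    // Reading the wrong branch is an error (assumed: IllegalStateException).
    int getUserId() {
        if (type != Type.USER_ID) {
            throw new IllegalStateException("branch is " + type + ", not USER_ID");
        }
        return (Integer) datum;
    }

    int getProductId() {
        if (type != Type.PRODUCT_ID) {
            throw new IllegalStateException("branch is " + type + ", not PRODUCT_ID");
        }
        return (Integer) datum;
    }

    // Dispatch with a switch on the branch tag instead of instanceof, which
    // could not distinguish the branches here since both hold an Integer.
    static String describe(Key key) {
        switch (key.getType()) {
            case USER_ID:
                return "user " + key.getUserId();
            case PRODUCT_ID:
                return "product " + key.getProductId();
            default:
                throw new AssertionError("unexpected branch " + key.getType());
        }
    }
}
```

The setters change the tag and value in place, so a single Key instance could be reused across records. That fits the object-reuse concern (AVRO-266) raised in the comment.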