[jira] Commented: (AVRO-248) make unions a named type
[ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909605#action_12909605 ]

Erik Frey commented on AVRO-248:

Just to chime in with a concrete use case: it would be very useful if Avro allowed me to pass around arbitrary-depth arrays of arrays. In Python I use these to represent n-ary trees. Right now I'm forced to do something like:

{ "type": "record", "name": "NestedArray", "fields": [ {"name": "value", "type": {"type": "array", "items": ["string", "NestedArray"]}} ] }

But the following would be simpler, easier to conceptualize, and easier to integrate into my current stack:

{ "type": "array", "name": "NestedArray", "items": ["string", "NestedArray"] }

That would allow x[0][1] instead of x['value'][0]['value'][1].

> make unions a named type
>
> Key: AVRO-248
> URL: https://issues.apache.org/jira/browse/AVRO-248
> Project: Avro
> Issue Type: New Feature
> Components: spec
> Reporter: Doug Cutting
>
> Unions are currently anonymous. However, it might be convenient if they were named. In particular:
> - when code is generated for a union, a class could be generated that includes an enum indicating which branch of the union is taken, e.g., a union of string and int named Foo might cause a Java class like
> {code}
> public class Foo {
>   public static enum Type {STRING, INT};
>   private Type type;
>   private Object datum;
>   public Type getType();
>   public String getString() { if (type==STRING) return (String)datum; else throw ... }
>   public void setString(String s) { type = STRING; datum = s; }
>
> }
> {code}
> Then Java applications can easily use a switch statement to process union values rather than using instanceof.
> - when using reflection, an abstract class with a set of concrete implementations can be represented as a union (AVRO-241). However, if one wishes to create an array, one must know the name of the base class, which is not represented in the Avro schema. One approach would be to add an annotation to the reflected array schema (AVRO-242) noting the base class. But if the union itself were named, that could name the base class. This would also make reflected protocol interfaces more concise, since the base class name could be used in parameters, return types and fields.
> - Generalizing the above: Avro lacks class inheritance; unions are a way to model inheritance, and this model is more useful if the union is named.
>
> This would be an incompatible change to schemas. If we go this way, we should probably rename 1.3 to 2.0. Note that AVRO-160 proposes an incompatible change to data file formats, which may also force a major release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
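The quoted Java class is only a sketch: getType() has no body and the throw is elided. A minimal runnable version is below. It is a hypothetical illustration, not an Avro API, and it assumes package-private classes, adds getInt/setInt accessors for the int branch, and uses an illustrative FooDemo.describe helper to show the switch-based dispatch the proposal describes:

```java
// Hypothetical sketch of the class code generation might emit for a union of
// "string" and "int" named Foo; not an actual Avro API.
final class Foo {
    // One constant per union branch.
    enum Type { STRING, INT }

    private Type type;     // which branch is currently set
    private Object datum;  // the value held by that branch

    Type getType() { return type; }

    String getString() {
        if (type == Type.STRING) return (String) datum;
        throw new IllegalStateException("Foo holds branch " + type + ", not STRING");
    }

    void setString(String s) { type = Type.STRING; datum = s; }

    int getInt() {
        if (type == Type.INT) return (Integer) datum;
        throw new IllegalStateException("Foo holds branch " + type + ", not INT");
    }

    void setInt(int i) { type = Type.INT; datum = i; }
}

final class FooDemo {
    // Branch dispatch via a switch on the enum tag rather than instanceof tests.
    static String describe(Foo foo) {
        switch (foo.getType()) {
            case STRING:
                return "string: " + foo.getString();
            case INT:
                return "int: " + foo.getInt();
            default:
                throw new AssertionError("unknown branch " + foo.getType());
        }
    }
}
```

Keeping a single Object datum mirrors the quoted proposal. A generator could just as well keep one typed field per branch; either way, callers branch on getType() instead of chaining instanceof tests.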
[jira] Commented: (AVRO-248) make unions a named type
[ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909847#action_12909847 ]

Erik Frey commented on AVRO-248:

That would certainly work. The branches would be string and array of the NestedArray union. Then I'd have to wrap the union in another array so that the top level cannot resolve to a string alone.

Much to the chagrin of my team-mates, I ended up approximating this feature in the schema with a fixed nesting depth, like this:

{ "type": "array", "items": ["string", { "type": "array", "items": ["string", { "type": "array", "items": ["string", { "type": "array", "items": ["string", { "type": "array", "items": ["string", { "type": "array", "items": ["string", { "type": "array", "items": ["string", { "type": "array", "items": ["string", { "type": "array", "items": ["string"]}]}]}]}]}]}]}]}]}
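The hand-written schema in Erik's comment nests one fragment nine times: an array whose items are a string or the next level down. That is why it supports only a fixed depth. A hypothetical helper like the one below can emit that shape for any depth. UnrolledSchema.nestedArray is an illustrative name and is not part of Avro. The helper only automates the workaround and does not lift the depth limit:

```java
// Hypothetical helper (not part of Avro) that emits the fixed-depth schema
// shape written by hand above: each level is an array whose items are either
// a string or the next level down; the innermost level allows strings only.
final class UnrolledSchema {
    static String nestedArray(int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be at least 1");
        }
        // Innermost level: an array of strings only.
        String schema = "{\"type\": \"array\", \"items\": [\"string\"]}";
        // Wrap one more level per iteration, working from the inside out.
        for (int level = 2; level <= depth; level++) {
            schema = "{\"type\": \"array\", \"items\": [\"string\", " + schema + "]}";
        }
        return schema;
    }
}
```

For example, UnrolledSchema.nestedArray(9) produces a nine-level schema of the same shape as the one above, on a single line.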
[jira] Commented: (AVRO-248) make unions a named type
[ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909969#action_12909969 ]

Jeff Hammerbacher commented on AVRO-248:

bq. Erik, your example schema above has arrays as a named type. If unions were named, this might look like: {"type":"union", "name":"NestedArray", "branches":["string", "NestedArray"]}

That's my fault. I pointed Erik to this ticket, as it was the only place I could recall discussing making all types named (see Todd's comment above). In any case, it's good to have an explicit use case for this feature.
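Taking Erik's reading of the branches (a string, or an array of NestedArray), a code generator for named unions could emit a recursive class in the same style as the Foo proposal. The sketch below is hypothetical throughout. NestedArray, ofString, ofArray, get and leafCount are illustrative names, not Avro APIs. It shows index-style access comparable to x[0][1] and a recursive walk that switches on the branch tag:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical generated class for a recursive named union NestedArray whose
// branches are "string" and an array of NestedArray; not an actual Avro API.
final class NestedArray {
    enum Type { STRING, ARRAY }

    private final Type type;     // which branch this value holds
    private final Object datum;  // a String or a List<NestedArray>

    private NestedArray(Type type, Object datum) {
        this.type = type;
        this.datum = datum;
    }

    static NestedArray ofString(String s) {
        return new NestedArray(Type.STRING, s);
    }

    static NestedArray ofArray(NestedArray... items) {
        return new NestedArray(Type.ARRAY, new ArrayList<>(Arrays.asList(items)));
    }

    Type getType() { return type; }

    String getString() {
        if (type != Type.STRING) throw new IllegalStateException("branch is " + type);
        return (String) datum;
    }

    @SuppressWarnings("unchecked")
    List<NestedArray> getArray() {
        if (type != Type.ARRAY) throw new IllegalStateException("branch is " + type);
        return (List<NestedArray>) datum;
    }

    // x.get(0).get(1) plays the role of x[0][1] in Erik's Python code.
    NestedArray get(int i) { return getArray().get(i); }

    // Number of string leaves in the tree, found by switching on the branch tag.
    int leafCount() {
        switch (type) {
            case STRING:
                return 1;
            case ARRAY:
                int n = 0;
                for (NestedArray child : getArray()) n += child.leafCount();
                return n;
            default:
                throw new AssertionError("unknown branch " + type);
        }
    }
}
```

As Erik points out, the root of such a value could still be a bare string. A caller that requires an array at the top level would have to check getType() or wrap the union in an array in the schema.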
[jira] [Commented] (AVRO-248) make unions a named type
[ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089535#comment-17089535 ]

Elliot West commented on AVRO-248:

I see this issue is quite old, but I am wondering whether there would be any interest in adding this to the specification and implementing it. Specifically, I'm thinking about this kind of construct, as described previously by [~cutting]:
{code:java}
{
  "type": "union",
  "name": "Foo",
  "branches": [ "string", "Bar", ... ]
}{code}
The reason I ask is that I believe there are new use cases that could greatly benefit from this feature, specifically those that currently require [multi-typed streams in Kafka|https://www.confluent.io/blog/put-several-event-types-kafka-topic/] or indeed any streaming platform. There is already [an alternative implementation|https://github.com/confluentinc/schema-registry/pull/680#issuecomment-511796090] of this functionality, but it sits outside of Avro and is, in my opinion, a sub-optimal workaround with [a number of significant issues|https://github.com/confluentinc/schema-registry/pull/680#issuecomment-511796090]. I would suggest that by implementing this feature in Avro, we can fully satisfy multi-typed stream use cases in a clean, simple, and elegant manner, without needing to build external implementations that work around this missing Avro feature.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (AVRO-248) make unions a named type
[ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089555#comment-17089555 ]

Andy Le commented on AVRO-248:

[~teabot] I think it's a good time to have RFCs for Avro 2.x.