[ 
https://issues.apache.org/jira/browse/AVRO-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Skraba updated AVRO-2647:
------------------------------
    Description: 
Currently the specification does not make the semantics clear for union types 
within complex types. In particular, the spec talks about union fields, 
but leaves the semantics for unions in other contexts unspecified.

Here's an example which is undefined according to the current specification:

{code:json}
{
    "type": "record",
    "name": "R",
    "fields": [
        {
            "name": "F",
            "type": {
                "type": "array",
                "items": [
                    {
                        "type": "enum",
                        "name": "E1",
                        "symbols": ["A", "B"]
                    },
                    {
                        "type": "enum",
                        "name": "E2",
                        "symbols": ["B", "A", "C"]
                    }
                ]
            },
            "default": ["A", "B", "C"]
        }
    ]
}
{code}

By experiment, most implementations seem to have chosen the semantics that are 
documented in this PR.

In Java, the schema above is parsed without error, but an attempt to use 
the default value fails with a NullPointerException (trying to find the 
symbol C in E1). (Thanks to Ryan Skraba for this.)

[gogen-avro|https://github.com/actgardner/gogen-avro] generates invalid 
code because it assumes E1 but generates the symbol for "C" anyway.
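
To make the failure concrete, here is a minimal Python sketch of one plausible 
reading: it assumes that the first-branch rule the spec gives for union fields 
also applies to a union nested inside an array. The helper check_default is 
hypothetical and not part of any Avro library; it only handles the enum, array 
and union cases that appear in the example above.

```python
# Sketch only: check_default is a hypothetical helper, not an Avro library API.
# Assumption: a union nested anywhere in a type takes its default from the
# FIRST branch, as the spec states for union-typed record fields.

def check_default(schema, value):
    """Return a list of problems found when `value` is a default for `schema`."""
    # A JSON list used as a schema denotes a union: only branch 0 is consulted.
    if isinstance(schema, list):
        return check_default(schema[0], value)
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "enum":
            if value in schema["symbols"]:
                return []
            return [f"{value!r} is not a symbol of enum {schema['name']}"]
        if kind == "array":
            if not isinstance(value, list):
                return [f"expected a JSON array, got {type(value).__name__}"]
            problems = []
            for item in value:
                problems.extend(check_default(schema["items"], item))
            return problems
    # Primitive and other complex types are out of scope for this sketch.
    return []


# The type of field F from the example schema above.
F_TYPE = {
    "type": "array",
    "items": [
        {"type": "enum", "name": "E1", "symbols": ["A", "B"]},
        {"type": "enum", "name": "E2", "symbols": ["B", "A", "C"]},
    ],
}

print(check_default(F_TYPE, ["A", "B", "C"]))
```

Under that assumption the check reports that "C" is not a symbol of E1, which 
matches the lookup that fails in Java. A parser could run this kind of check 
at schema-parse time instead of failing when the default is first used.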

FWIW at some point in the future, I believe that it would be nice to align the 
default value specification with the JSON encoding for Avro so there aren't two 
subtly different JSON encodings of an Avro value.


  was:
Currently the specification does not make the semantics clear for union types 
within complex types. In particular, the spec talks about union fields, 
but leaves the semantics for unions in other contexts unspecified.

Here's an example which is undefined according to the current specification:

```
{
    "type": "record",
    "name": "R",
    "fields": [
        {
            "name": "F",
            "type": {
                "type": "array",
                "items": [
                    {
                        "type": "enum",
                        "name": "E1",
                        "symbols": ["A", "B"]
                    },
                    {
                        "type": "enum",
                        "name": "E2",
                        "symbols": ["B", "A", "C"]
                    }
                ]
            },
            "default": ["A", "B", "C"]
        }
    ]
}
```

By experiment, most implementations seem to have chosen the semantics that are 
documented in this PR.

In Java, the schema above is parsed without error, but an attempt to use 
the default value fails with a NullPointerException (trying to find the 
symbol C in E1). (Thanks to Ryan Skraba for this.)

[gogen-avro](https://github.com/actgardner/gogen-avro) generates invalid 
code because it assumes E1 but generates the symbol for "C" anyway.

FWIW at some point in the future, I believe that it would be nice to align the 
default value specification with the JSON encoding for Avro so there aren't two 
subtly different JSON encodings of an Avro value.



> specification does not fully specify semantics for unions in default types
> --------------------------------------------------------------------------
>
>                 Key: AVRO-2647
>                 URL: https://issues.apache.org/jira/browse/AVRO-2647
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: doc
>            Reporter: Roger
>            Priority: Minor
>
> Currently the specification does not make the semantics clear for union types 
> within complex types. In particular, the spec talks about union fields, 
> but leaves the semantics for unions in other contexts unspecified.
> Here's an example which is undefined according to the current specification:
> {code:json}
> {
>     "type": "record",
>     "name": "R",
>     "fields": [
>         {
>             "name": "F",
>             "type": {
>                 "type": "array",
>                 "items": [
>                     {
>                         "type": "enum",
>                         "name": "E1",
>                         "symbols": ["A", "B"]
>                     },
>                     {
>                         "type": "enum",
>                         "name": "E2",
>                         "symbols": ["B", "A", "C"]
>                     }
>                 ]
>             },
>             "default": ["A", "B", "C"]
>         }
>     ]
> }
> {code}
> By experiment, most implementations seem to have chosen the semantics that 
> are documented in this PR.
> In Java, the schema above is parsed without error, but an attempt to use 
> the default value fails with a NullPointerException (trying to find the 
> symbol C in E1). (Thanks to Ryan Skraba for this.)
> [gogen-avro|https://github.com/actgardner/gogen-avro] generates invalid 
> code because it assumes E1 but generates the symbol for "C" anyway.
> FWIW at some point in the future, I believe that it would be nice to align 
> the default value specification with the JSON encoding for Avro so there 
> aren't two subtly different JSON encodings of an Avro value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
