[jira] [Updated] (AVRO-3512) aliases to the null namespace do not work as expected

Radai Rosenblatt (Jira) Sat, 07 May 2022 07:10:00 -0700


     [ 
https://issues.apache.org/jira/browse/AVRO-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Radai Rosenblatt updated AVRO-3512:
-----------------------------------
    Description: 
the avro spec allows for the "null namespace" (when no namespace is specified 
anywhere). it also has [the 
following|https://avro.apache.org/docs/current/spec.html#Aliases] to say about 
aliases:
{quote}if a type named "a.b" has aliases of "c" and "x.y", then the fully 
qualified names of its aliases are "a.c" and "x.y"
{quote}
which means a "simple" alias ("c" above) inherits any namespace defined on the 
declaring type.
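
to make the quoted rule concrete, here is a minimal sketch of it in plain java. it only mirrors the wording of the spec; it is not avro's actual implementation, and the class and method names (AliasNameSketch, resolveAlias) are made up for illustration.

```java
// Sketch only: follows the wording of the spec's alias rule, not Avro's real code.
final class AliasNameSketch {

    /**
     * Returns the fully qualified name of an alias.
     *
     * @param declaringNamespace namespace of the type that declares the alias
     *                           (null or "" when the type has no namespace)
     * @param alias              the alias exactly as written in the schema
     */
    static String resolveAlias(String declaringNamespace, String alias) {
        if (alias.indexOf('.') >= 0) {
            // An alias that contains a dot is taken as already fully qualified.
            return alias;
        }
        if (declaringNamespace == null || declaringNamespace.isEmpty()) {
            // Nothing to inherit: the simple alias is the full name.
            return alias;
        }
        // A simple alias inherits the namespace of the declaring type.
        return declaringNamespace + "." + alias;
    }

    public static void main(String[] args) {
        // The example from the spec: type "a.b" with aliases "c" and "x.y".
        System.out.println(resolveAlias("a", "c"));   // prints a.c
        System.out.println(resolveAlias("a", "x.y")); // prints x.y
    }
}
```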

 

now suppose i want to use aliases on a namespaced reader schema so that it can 
read data written with a schema that is in the null namespace (has no namespace).

here is my writer schema:

{quote}

{
  "type": "record",
  "name": "AncientSchema",
  "fields": [
    {
      "name": "enumField",
      "type": {
        "type": "enum",
        "name": "AncientEnum",
        "symbols": [ "THE", "SPEC", "IS", "A", "LIE" ]
      }
    }
  ]
}

{quote}

and reader schema:

{quote}

{
  "type": "record",
  "namespace": "much.namespace",
  "name": "ModernRecord",
  "fields": [
    {
      "name": "enumField",
      "type": {
        "type": "enum",
        "name": "ModernEnum",
        "symbols": [ "THE", "SPEC", "IS", "A", "LIE" ],
        "aliases": [
          ".AncientEnum"
        ]
      }
    }
  ],
  "aliases": [
    ".AncientSchema"
  ]
}

{quote}

notice the leading dots in the aliases. as far as i understand the spec, this 
should be the only legal way to make an alias refer to a name in the null 
namespace. and it does indeed work ... up to a point.

 

when testing this i found multiple issues with avro's handling of such aliases, 
dating back to late avro 1.7.*

 
 # without these aliases, decoding does fail, but it fails on the nested 
enum, whereas it should have failed "immediately" on the fullname mismatch 
between the top-level record schemas. in fact, on further testing i think avro 
(at least in java) doesn't compare the fullnames of the top-level writer and 
reader schemas at all?
 # while the schema with the aliases parse()es fine, Schema.toString() strips 
the leading dots from the aliases, thereby creating a "monsanto terminator 
schema": once printed and parsed again, the aliases become "simple aliases" 
(which inherit the reader's namespace) and stop working.
 # the spec doesn't explicitly say how to use aliases to "target" the null 
namespace. if the leading-dot form is intentional, i think the spec should be 
expanded a little to cover it?
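
the second item can be illustrated without avro at all. the sketch below assumes the reading described above (a leading dot means "null namespace"), and printWithoutLeadingDot is a hypothetical stand-in for a printer that drops that dot; neither method is avro's real code.

```java
// Sketch only: shows why losing the leading dot changes what an alias refers to.
final class NullNamespaceAliasSketch {

    /**
     * Full name an alias refers to, ASSUMING a leading dot targets the
     * null namespace (the reading of the spec described in this issue).
     */
    static String fullName(String declaringNamespace, String alias) {
        if (alias.startsWith(".")) {
            // ".Name" -> "Name" in the null namespace.
            return alias.substring(1);
        }
        if (alias.indexOf('.') >= 0
                || declaringNamespace == null
                || declaringNamespace.isEmpty()) {
            // Already qualified, or nothing to inherit.
            return alias;
        }
        // A simple alias inherits the namespace of the declaring type.
        return declaringNamespace + "." + alias;
    }

    /** Hypothetical printer that drops the leading dot when writing an alias out. */
    static String printWithoutLeadingDot(String alias) {
        return alias.startsWith(".") ? alias.substring(1) : alias;
    }

    public static void main(String[] args) {
        String readerNamespace = "much.namespace";
        String original = ".AncientEnum";
        String reparsed = printWithoutLeadingDot(original);

        // Before the round trip the alias targets the writer's enum.
        System.out.println(fullName(readerNamespace, original)); // prints AncientEnum
        // After the round trip it is a simple alias and inherits the namespace.
        System.out.println(fullName(readerNamespace, reparsed)); // prints much.namespace.AncientEnum
    }
}
```

after the round trip the alias no longer equals the writer's fullname ("AncientEnum"), which is why the reader stops matching.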

 

i have code to reproduce all these issues in 
[https://github.com/radai-rosenblatt/avro/blob/aliasing-to-null-namespace/lang/java/avro/src/test/java/org/apache/avro/TestAliasToNullNamespace.java]
 (coded against master)

 

i also have code to reproduce all the above against multiple older avro 
versions in 
[https://github.com/linkedin/avro-util/blob/master/helper/tests/helper-tests-allavro/src/test/java/com/linkedin/avroutil1/compatibility/AvroTypeAliasesTest.java]


> aliases to the null namespace do not work as expected
> -----------------------------------------------------
>
>                 Key: AVRO-3512
>                 URL: https://issues.apache.org/jira/browse/AVRO-3512
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java, spec
>    Affects Versions: 1.11.0
>            Reporter: Radai Rosenblatt
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
