[ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203230#comment-13203230 ]
Scott Carey commented on AVRO-1006:
-----------------------------------
{quote}
The parser assumes that names are defined before they are used, although
this is not required by the specification. We recommend that the spec be
changed to agree with the impl (i.e., require that names are defined before
they are used).
{quote}
I think we can fix the parser. I have been thinking about how to implement
Schema/Field/Protocol as immutable data structures and a requirement for that
is to prevent cyclic references in the Schema objects, which requires storing
name references and a name based schema registry -- the same tools needed for
such a parser.
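As a rough illustration of the idea (a sketch only, not Avro's actual parser; `NameRegistrySketch` and `RecordDef` are hypothetical names), definitions can hold references to other types by fullname and be resolved against a registry after parsing, so a name may be used before it is defined and no object cycles are needed. For simplicity the sketch assumes every field type is a named type and that a fullname may be defined only once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NameRegistrySketch {
    /** Hypothetical immutable definition: fields refer to other types by fullname only. */
    static final class RecordDef {
        final String fullName;
        final List<String> fieldTypeNames;

        RecordDef(String fullName, List<String> fieldTypeNames) {
            this.fullName = fullName;
            // Defensive copy so the definition cannot change after construction.
            this.fieldTypeNames = Collections.unmodifiableList(new ArrayList<>(fieldTypeNames));
        }
    }

    private final Map<String, RecordDef> defs = new LinkedHashMap<>();

    /** Registers a definition; a second definition of the same fullname is rejected (assumption). */
    void define(RecordDef def) {
        if (defs.putIfAbsent(def.fullName, def) != null) {
            throw new IllegalArgumentException("Duplicate definition: " + def.fullName);
        }
    }

    /** Looks a name up at use time instead of holding a direct object reference. */
    RecordDef resolve(String fullName) {
        return defs.get(fullName);
    }

    /** Names that are referenced but never defined; meant to be checked once parsing is done. */
    List<String> unresolved() {
        List<String> missing = new ArrayList<>();
        for (RecordDef def : defs.values()) {
            for (String ref : def.fieldTypeNames) {
                if (!defs.containsKey(ref) && !missing.contains(ref)) {
                    missing.add(ref);
                }
            }
        }
        return missing;
    }
}
```

Because a self-referencing record only stores its own fullname as a string, the definition objects themselves stay acyclic and can be immutable.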
{quote}
When a schema name is redefined, the parser throws an exception, even if
the two definitions of the name are the same. This is contrary to the spec,
which says "A schema may only contain multiple definitions of a fullname if the
definitions are equivalent." We recommend that the spec be changed to agree
with the implementation (i.e., disallow multiple definitions of the same name,
even if the def's are the same).
{quote}
I think this is a decent spec change, especially since "if the definitions are
equivalent" is insufficiently defined currently -- equivalent at what level?
Including all metadata, even 'doc'? It is probably best if a single schema
definition does not re-define any named schema elements.
{quote}
The parser calls validateName on the symbols of an enumeration, restricting
the syntax of enumeration symbols. The spec does not call for such a
restriction. We recommend that the spec be changed to conform to the
implementation (i.e., restrict symbols the same way we restrict names). This
helps in canonicalization (don't have to worry about Unicode normalization).
{quote}
I wonder how the various language implementations deal with this currently. It
would not surprise me if implementations other than Java already impose an
implicit restriction beyond the spec. We should change the spec to apply the
same restriction to symbols as to field names.
{quote}
Schema.validateName uses Character.isLetter and Character.isLetterOrDigit
to test characters. These accept all Unicode characters (except supplemental
ones). The Avro spec says that names should be restricted to ASCII letters. We
think this is an implementation bug and should be fixed. (Again, nice to avoid
Unicode normalization.)
{quote}
I agree.
From the spec:
{quote}
h3. Names
Record, enums and fixed are named types. Each has a fullname that is composed
of two parts; a name and a namespace. Equality of names is defined on the
fullname.
The name portion of a fullname, record field names, and enum symbols must:
* start with [A-Za-z_]
* subsequently contain only [A-Za-z0-9_]
A namespace is a dot-separated sequence of such names.
{quote}
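The rule in the spec text quoted above can be sketched as a plain ASCII pattern check. This is only an illustration, not the actual `Schema.validateName` code; the class and method names are made up, and the sketch treats an empty namespace as invalid, which is an assumption.

```java
import java.util.regex.Pattern;

final class AsciiNameCheck {
    // Spec rule: start with [A-Za-z_], then only [A-Za-z0-9_].
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** True if the whole string is a valid name under the ASCII-only rule. */
    static boolean isValidName(String name) {
        return name != null && NAME.matcher(name).matches();
    }

    /** A namespace is a dot-separated sequence of such names. */
    static boolean isValidNamespace(String namespace) {
        if (namespace == null) {
            return false;
        }
        // Limit -1 keeps empty segments, so "a..b" and "a." are rejected.
        for (String part : namespace.split("\\.", -1)) {
            if (!isValidName(part)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("user_id"));
        System.out.println(isValidName("9lives"));
        // U+00E9 passes Character.isLetter but is outside the ASCII range.
        System.out.println(isValidName("caf\u00e9"));
        System.out.println(isValidNamespace("org.apache.avro"));
    }
}
```

Unlike `Character.isLetter`, the explicit character classes never accept non-ASCII letters, so no Unicode normalization question arises.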
{quote}
When the parser descends into a named schema, the default namespace in the
names variable is stored into the local variable savedSpace, which is restored
on exit. However, if the routine exits abruptly (with an exception), this
restoration does not occur. This is probably a bug, and restoration should be
in a finally. (In Parser.parse, the flag validateNames is restored in a finally
clause.)
{quote}
Sounds like a bug. A patch with a failing test case that reproduces it, plus a
fix, filed under a separate ticket would be great!
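For reference, the shape of such a fix might look like the sketch below. It is not the real parser code: `Names` here is a hypothetical stand-in that only tracks the default namespace, and `parseNamed` is an invented helper. Only the `savedSpace` variable name comes from the description quoted above.

```java
final class NamespaceScope {
    /** Hypothetical stand-in for the parser's name table; only the default namespace is modeled. */
    static final class Names {
        private String space;

        Names(String space) {
            this.space = space;
        }

        String space() {
            return space;
        }

        void space(String space) {
            this.space = space;
        }
    }

    /** Runs body with the given default namespace, restoring the old one even if body throws. */
    static void parseNamed(Names names, String namespace, Runnable body) {
        String savedSpace = names.space();
        names.space(namespace);
        try {
            body.run();
        } finally {
            // Restoration happens on both normal and abrupt exit.
            names.space(savedSpace);
        }
    }

    public static void main(String[] args) {
        Names names = new Names("outer");
        try {
            parseNamed(names, "inner", () -> {
                throw new IllegalStateException("simulated parse failure");
            });
        } catch (IllegalStateException expected) {
            // The failure propagates, but the namespace has already been restored.
        }
        System.out.println(names.space());
    }
}
```

A regression test for the bug could follow the same pattern as `main`: force a failure inside a nested named schema, then check that the default namespace is unchanged.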
> Fingerprints for Avro Schemas
> -----------------------------
>
> Key: AVRO-1006
> URL: https://issues.apache.org/jira/browse/AVRO-1006
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Labels: features
> Attachments: schema-fingerprinting.html, schema-fingerprinting.html,
> schema-fingerprinting.html
>
>
> Add function that returns a standardized, 64-bit fingerprint for schemas.
> Fingerprints are designed such that the chance of a collision is very, very
> low.