[ 
https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203230#comment-13203230
 ] 

Scott Carey commented on AVRO-1006:
-----------------------------------

{quote}
    The parser assumes that names are defined before they are used, although 
this is not required by the specification. We recommend that the spec be 
changed to agree with the impl (i.e., require that names are defined before 
they are used).
{quote} 
I think we can fix the parser.  I have been thinking about how to implement 
Schema/Field/Protocol as immutable data structures.  A requirement for that 
is to prevent cyclic references in the Schema objects, which requires storing 
name references and a name-based schema registry -- the same tools needed for 
such a parser.
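
A minimal sketch of what I mean by name references plus a name-based registry. 
All class and method names here (NameRegistry, NameRef, define, ref, resolve) 
are hypothetical and are not part of Avro, and a definition is reduced to an 
opaque string to keep the example short. Rejecting a second definition of the 
same name is also an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical types for illustration only; none of these are Avro classes.
final class NameRegistry {
    private final Map<String, String> definitions = new HashMap<>();

    // Registers a named definition. The definition is an opaque string here;
    // a real parser would store an immutable schema object instead.
    void define(String fullname, String definition) {
        if (definitions.putIfAbsent(fullname, definition) != null) {
            throw new IllegalArgumentException("Name already defined: " + fullname);
        }
    }

    // Hands out a reference by name; the name need not be defined yet.
    NameRef ref(String fullname) {
        return new NameRef(this, fullname);
    }

    String lookup(String fullname) {
        String definition = definitions.get(fullname);
        if (definition == null) {
            throw new IllegalStateException("Undefined name: " + fullname);
        }
        return definition;
    }
}

// Holds only a name, so a self-referential schema creates no object cycle.
final class NameRef {
    private final NameRegistry registry;
    private final String fullname;

    NameRef(NameRegistry registry, String fullname) {
        this.registry = registry;
        this.fullname = fullname;
    }

    String fullname() {
        return fullname;
    }

    // Resolved on demand, after parsing has seen every definition.
    String resolve() {
        return registry.lookup(fullname);
    }
}
```

Because a NameRef is resolved only when asked, a name can be used before it is 
defined, and a record that refers to itself never points at itself directly.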

{quote}
    When a schema name is redefined, the parser throws an exception, even if 
the two definitions of the name are the same. This is contrary to the spec, 
which says "A schema may only contain multiple definitions of a fullname if the 
definitions are equivalent." We recommend that the spec be changed to agree 
with the implementation (i.e., disallow multiple definitions of the same name, 
even if the def's are the same).
{quote}

I think this is a decent spec change, especially since "if the definitions are 
equivalent" is insufficiently defined currently -- equivalent at what level?  
Including all metadata, even 'doc'?  It is probably best if a single schema 
definition does not re-define any named schema elements.

{quote}
    The parser calls validateName on the symbols of an enumeration, restricting 
the syntax of enumeration symbols. The spec does not call for such a 
restriction. We recommend that the spec be changed to conform to the 
implementation (i.e., restrict symbols the same way we restrict names). This 
helps in canonicalization (don't have to worry about Unicode normalization).
{quote}

I wonder how the various language implementations deal with this currently.  
It would not surprise me if implementations other than Java already impose an 
implicit restriction beyond the spec.  We should change the spec to restrict 
symbols in the same way as field names.

{quote}
    Schema.validateName uses Character.isLetter and Character.isLetterOrDigit 
to test characters. These accept all Unicode characters (except supplemental 
ones). The Avro spec says that names should be restricted to ASCII letters. We 
think this is an implementation bug and should be fixed. (Again, nice to avoid 
Unicode normalization.)
{quote}

I agree.

From the spec:
{quote}
h3. Names
Record, enums and fixed are named types. Each has a fullname that is composed 
of two parts; a name and a namespace. Equality of names is defined on the 
fullname.

The name portion of a fullname, record field names, and enum symbols must:
* start with [A-Za-z_]
* subsequently contain only [A-Za-z0-9_]

A namespace is a dot-separated sequence of such names.
{quote}
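
The rule quoted above can be sketched as an ASCII-only check. This is an 
illustration, not the current Schema.validateName code: the class AsciiNames 
and its methods are hypothetical, and treating an empty namespace as invalid 
is an assumption made for the sketch.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Avro's API.
final class AsciiNames {
    // Start with [A-Za-z_], then only [A-Za-z0-9_], as in the quoted spec text.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private AsciiNames() {}

    static boolean isValidName(String s) {
        return s != null && NAME.matcher(s).matches();
    }

    // A namespace is a dot-separated sequence of such names.
    // Assumption: an empty namespace is rejected by this sketch.
    static boolean isValidNamespace(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        // The -1 limit keeps trailing empty parts, so "a.b." is rejected.
        for (String part : s.split("\\.", -1)) {
            if (!isValidName(part)) {
                return false;
            }
        }
        return true;
    }
}
```

Unlike Character.isLetter, which returns true for a letter such as U+00E9, the 
pattern accepts only the ASCII ranges listed in the spec, so no Unicode 
normalization question arises.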

{quote}
    When the parser descends into a named schema, the default namespace in the 
names variable is stored into the local variable savedSpace, which is restored 
on exit. However, if the routine exits abruptly (with an exception), this 
restoration does not occur. This is probably a bug, and restoration should be 
in a finally. (In Parser.parse, the flag validateNames is restored in a finally 
clause.)
{quote}

Sounds like a bug.  A patch in a separate ticket, containing a test case that 
reproduces the failure along with a fix, would be great!
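
The shape of the fix would be roughly the following. This is a sketch only: 
NameScope and within are hypothetical names standing in for the parser's names 
variable and its handling of savedSpace, not the actual Avro code.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the parser's name table; not Avro's actual class.
final class NameScope {
    private String space; // current default namespace

    NameScope(String initial) {
        this.space = initial;
    }

    String space() {
        return space;
    }

    // Runs body with the default namespace set to newSpace and restores
    // the previous value even if body throws.
    <T> T within(String newSpace, Supplier<T> body) {
        String savedSpace = space;
        space = newSpace;
        try {
            return body.get();
        } finally {
            space = savedSpace;
        }
    }
}
```

A failing test for the ticket could then parse a named schema that throws 
part-way through and assert that the default namespace is unchanged afterwards.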

                
> Fingerprints for Avro Schemas
> -----------------------------
>
>                 Key: AVRO-1006
>                 URL: https://issues.apache.org/jira/browse/AVRO-1006
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>              Labels: features
>         Attachments: schema-fingerprinting.html, schema-fingerprinting.html, 
> schema-fingerprinting.html
>
>
> Add function that returns a standardized, 64-bit fingerprint for schemas.  
> Fingerprints are designed such that the chance of a collision is very, very 
> low.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
