[ https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206546#comment-13206546 ]
Scott Carey commented on AVRO-1022:
-----------------------------------
I see the wisdom in restricting names to be a simple set of ASCII characters.
Until just a few minutes ago the arguments above were convincing me that the
[A-Za-z_][A-Za-z0-9_]*
name format was a very useful simplification.
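For reference, here is a minimal sketch of the strict ASCII-only rule that the patch moves toward. The class name StrictNames is hypothetical and is not Avro's actual Schema.validateName. It also shows why a check based on Character.isLetter() was too permissive: that method accepts non-ASCII letters such as 'é'.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the strict, ASCII-only Avro name rule. */
final class StrictNames {
    // Start with [A-Za-z_], then only [A-Za-z0-9_]; single-character names are allowed.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("user_id"));       // valid
        System.out.println(isValid("1st"));           // invalid: leading digit
        System.out.println(isValid("caf\u00e9"));     // invalid: non-ASCII letter
        // A Unicode-aware check accepts 'é', which is the behavior the issue reports.
        System.out.println(Character.isLetter('\u00e9'));
    }
}
```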
But now I think names should be almost entirely open. Defining "isLetter() or
isDigit()" is problematic, as pointed out above, so don't even bother with
that. How about defining the rule only with respect to ASCII? The naming rule
in the spec would apply to ASCII characters only, and all other code points
would be allowed. Unlike some notion of isLetter(), this does not imply that C
or C++ implementations need a big library like ICU. All implementations must
already support UTF-8 in order to support JSON. Languages can define
internally how they map messy names to variables, types, or enum symbols.
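A rough sketch of the relaxed rule proposed here, assuming "ASCII only" means the existing rule constrains ASCII code points and every code point above 127 is accepted unchecked. The class name RelaxedNames is hypothetical and does not come from any Avro release.

```java
/** Hypothetical sketch: the ASCII naming rule applies only to ASCII code points. */
final class RelaxedNames {
    static boolean isValid(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int i = 0;
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            if (cp < 128) {
                boolean letterOrUnderscore =
                    (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z') || cp == '_';
                boolean digit = cp >= '0' && cp <= '9';
                // ASCII digits are allowed anywhere except the first position.
                if (!(letterOrUnderscore || (digit && i > 0))) {
                    return false;
                }
            }
            // Code points above 127 are accepted as-is, so no Unicode tables are needed.
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

Under this rule "caf\u00e9" and CJK names pass, while "1st" and "first-name" still fail. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a C or C++ implementation could apply the same check byte by byte to raw UTF-8.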
If Avro restricts valid names, then it won't be able to convert schemas from
other systems into Avro schemas.
For example, how does this relate to PIG-1339
(https://issues.apache.org/jira/browse/PIG-1339)?
If names are restricted, then consuming schemas from other systems will be
difficult. Fewer restrictions in Avro make it more compatible and capable.
If the spec has stringent naming rules, it would be wise to also standardize,
in the spec, how names from external sources are mangled into Avro names.
So I see two options that make sense:
* Enforce the restriction in the current spec, add flexibility for reading
schemas that do not comply (which may already have been persisted to permanent
storage), and add standardized name mangling to the spec for translating
schemas from other systems to Avro and back.
* Make the spec's naming rules significantly more flexible. At minimum, allow
all code points above 127, and consider allowing more ASCII characters in
names as well.
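To make the first option concrete, here is a sketch of what a standardized, reversible mangling could look like. The scheme itself (an underscore followed by four hex digits of the UTF-16 code unit) and the class name NameMangling are hypothetical and not part of any Avro specification. Because '_' serves as the escape marker, it is escaped too, which is what makes decoding unambiguous.

```java
/** Hypothetical reversible mapping between external names and strict Avro names. */
final class NameMangling {
    static String encode(String external) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < external.length(); i++) {
            char c = external.charAt(i);
            boolean asciiLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            boolean asciiDigit = c >= '0' && c <= '9';
            if (asciiLetter || (asciiDigit && i > 0)) {
                out.append(c);
            } else {
                // Escape everything else, including '_' and a leading digit.
                out.append('_').append(String.format("%04x", (int) c));
            }
        }
        return out.toString();
    }

    static String decode(String avroName) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < avroName.length()) {
            char c = avroName.charAt(i);
            if (c == '_') {
                // Every '_' starts a 4-hex-digit escape produced by encode().
                out.append((char) Integer.parseInt(avroName.substring(i + 1, i + 5), 16));
                i += 5;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

With this scheme, encode("1st-name") yields "_0031st_002dname", and decode() restores the original. The trade-off is that names which are already valid, such as "user_id", also change (to "user_005fid"). An empty external name also still maps to an empty, invalid name. Both are examples of how stricter Avro rules make this translation clumsier.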
There are two kinds of mangling to consider.
* "External system" to and from Avro. For example, a valid name in an external
system might start with a digit. If such a name is translated into Avro and
Avro does not allow this, it would be very useful if every language could look
at the resulting name and convert it back when required. This should be
standardized across Avro. The fewer restrictions Avro has, the easier this
translation process is.
* Avro to and from language identifiers in an implementation. This is a
different, language-local issue. Because it is language local and up to each
Avro implementation, it concerns me less than translation from external schema
sources. Most languages don't allow a newline in an identifier, but should
Avro disallow it? Language implementations need to be prepared to mangle
disallowed characters and strings regardless of what Avro specifies.
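As an illustration of the language-local kind of mangling, here is a sketch of how a Java implementation might map any Avro name to a legal Java identifier. The class name JavaIdentifiers and the replacement choices ('_' for illegal characters, a trailing '$' for keywords) are hypothetical. The mapping is lossy, which is tolerable here because the schema keeps the original name. Resolving collisions between generated identifiers is not handled.

```java
import javax.lang.model.SourceVersion;

/** Hypothetical language-local mapping from an Avro name to a Java identifier. */
final class JavaIdentifiers {
    static String toJavaIdentifier(String avroName) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < avroName.length()) {
            int cp = avroName.codePointAt(i);
            boolean ok = (i == 0)
                ? Character.isJavaIdentifierStart(cp)
                : Character.isJavaIdentifierPart(cp);
            // Replace characters Java rejects (e.g. a newline or a leading digit) with '_'.
            if (ok) {
                out.appendCodePoint(cp);
            } else {
                out.append('_');
            }
            i += Character.charCount(cp);
        }
        String id = out.toString();
        // Keywords and literals such as "class" or "null" are not legal identifiers.
        if (id.isEmpty() || SourceVersion.isKeyword(id)) {
            id = id + "$";
        }
        return id;
    }
}
```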
> Error in validate name
> ----------------------
>
> Key: AVRO-1022
> URL: https://issues.apache.org/jira/browse/AVRO-1022
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Raymie Stata
> Priority: Minor
> Attachments: AVRO-1022.patch, AVRO-1022.patch
>
>
> Fix schema.validateName to allow only ASCII letters, not Unicode letters.