Hello!  I have a specific opinion about the "Robustness Principle",
especially in this case!

"Accepting liberally, generating strictly" (a paraphrase of
Postel's idea) has its place, and might be a good principle for
binary encoding and decoding.  It's not so great for "accepting and
generating schemas".  In this case, it's led directly to this debate:
accepting "invalid" names has become a de facto standard for a
_certain_ category of users, who are now blocked from interoperating
with other language SDKs (and potentially future versions of their own
SDK, if "fixed").

> If we can use a "non rigorous validation" and
> it can run without bugs, why switch to a rigorous validation mode that would
> follow current specification and not change the specification to "accept
> schemas as liberally as possible" (meaning, while it doesn't generate bugs).

Here's where I think the logic is faulty: even if we don't count
interoperability failures as a bug, I'm not convinced that using names
outside the specification runs without bugs!  There are several things to
think about: internal aspects like Unicode normalisation, internal
features like schema evolution (which might actually be OK), but
especially external ones like downstream projects and tools.
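
To make the normalisation point concrete, here is a minimal Python
sketch.  It is not taken from any Avro SDK, and the helper name
same_name is made up for illustration.  It shows two names that render
identically but are different strings unless both sides normalise them
the same way:

```python
import unicodedata

composed = "caf\u00e9"       # 'é' as a single code point (NFC form)
decomposed = "cafe\u0301"    # 'e' followed by a combining acute accent (NFD form)


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)            # False: different code point sequences
print(len(composed), len(decomposed))    # 4 5
print(same_name(composed, decomposed))   # True once both are normalised
```

An SDK that accepts such names without agreeing on a normal form could
fail to match a writer field to a reader field even though both look
the same in an editor.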

As it is, if you follow the specification, Java and Python are
interoperable and there's a certain guarantee that existing libraries
and projects can count on.

The configuration approach is one that would allow upstream projects
to continue working with out-of-spec names, while alerting them that
these could cause interoperability problems outside of their current
use cases!  One thing is certain: the specification should allow invalid
names for "aliases" so that users can migrate away from these issues.
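
As a rough sketch of what I mean by the configuration approach (in
Python, with a hypothetical check_name helper and strict flag that do
not exist in any SDK today), strict mode rejects an out-of-spec name
and lenient mode accepts it but warns.  The pattern is the name rule
from the specification: start with [A-Za-z_], then only [A-Za-z0-9_].

```python
import re
import warnings

# Name rule from the Avro specification: start with [A-Za-z_],
# then contain only [A-Za-z0-9_].
SPEC_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_name(name: str, strict: bool = True) -> str:
    """Hypothetical helper: reject out-of-spec names, or warn when lenient."""
    if SPEC_NAME.fullmatch(name):
        return name
    message = f"name {name!r} is outside the Avro specification"
    if strict:
        raise ValueError(message)
    # Lenient mode: keep working, but alert the user to the interop risk.
    warnings.warn(message + "; other SDKs may reject this schema")
    return name


check_name("user_id")                 # accepted in either mode
check_name("prénom", strict=False)    # accepted, with a warning
```

Calling check_name("prénom") with the default strict=True raises
ValueError instead, which is the behaviour I'd prefer SDKs to default
to.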

A slightly related resource: https://langsec.org/occupy/

All my best, Ryan


On Tue, Nov 15, 2022 at 4:41 PM Christophe Le Saëc <[email protected]> wrote:
>
> AVRO-2659 <https://issues.apache.org/jira/browse/AVRO-2659> indeed shows
> that a name should not contain a space (a problem when generating Java
> code) nor a dot (a problem when separating names in a path).
>
> AVRO-1022 <https://issues.apache.org/jira/browse/AVRO-1022>: the last comment
> reinforces the rules for dots and contains a nice principle: "accept schemas as
> liberally as possible"
>
>
> *allowing two language SDKs to implement the spec differently will make
> users unhappy about cross-platform, cross-language compatibility.*
> -> Indeed, that's the case with the current version, where Java and C# accept
> accents while C and Rust strictly follow the spec.
>
> Other possibilities:
> - *putting human-readable or internationalised names in other metadata
> properties*: Yes, this can already be done on record fields, for example, as
> a field is a JsonProperties class (and we already use it in some cases; it
> helps).
> - *using configuration / environment / system properties to turn rigorous
> spec validation on and off* : If we can use a "non rigorous validation" and
> it can run without bugs, why switch to a rigorous validation mode that would
> follow current specification and not change the specification to "accept
> schemas as liberally as possible" (meaning, while it doesn't generate bugs).
>
>
>
> *My preference would be to *tighten* the SDKs to match the existing Avro
> spec, and provide language-specific ways to easily disable validating names
> if desired*
> Personally, I like the idea of having mandatory name checks that you can't
> deactivate, to ensure names won't generate bugs (mainly for Java code
> generation, and to be able to separate name and namespace), but the checks
> in the specification should be limited to banning names that would generate
> a bug (and not a rule that seems to have no real reason, until it is
> explained in the docs).
>
> Best regards,
> Christophe.
>
>
>
> On Thu, Nov 10, 2022 at 7:15 PM Ryan Skraba <[email protected]> wrote:
>
> > Hello!  Here are a couple of related JIRA issues from the past that we
> > can use to inform our discussion:
> >
> > * AVRO-2659 demonstrates a pretty disastrous schema name that the Java
> > SDK accepts.
> > * AVRO-1022 (10 years ago!) has a somewhat tepid discussion about
> > accepting UTF-8 that didn't quite get enough follow-up to make it into
> > the spec!
> >
> > We've been in the current (unsatisfactory) state for so long because:
> >
> > * making a change to an SDK changing its behaviour (even to "fix it")
> > will make users unhappy about backwards/forwards version
> > compatibility, and
> > * allowing two language SDKs to implement the spec differently will
> > make users unhappy about cross-platform, cross-language compatibility.
> >
> > In my opinion, with modern streaming and event processing, we have to
> > take the latter into account!
> >
> > There were a couple of other options than the two you propose in the
> > original discussion thread (such as putting human-readable or
> > internationalised names in other metadata properties, or using
> > configuration / environment / system properties to turn rigorous spec
> > validation on and off).  Have you given them any consideration for
> > your use case?
> >
> > My preference would be to *tighten* the SDKs to match the existing
> > Avro spec, and provide language-specific ways to easily disable
> > validating names if desired.  There's some precedent for this in the
> > Schema.Parser#validate method.
> >
> > There's a bit more going on here that's worth doing right for the future!
> >
> > All my best, Ryan
> >
> >
> >
> > On Thu, Nov 10, 2022 at 4:53 PM Oscar Westra van Holthe - Kind
> > <[email protected]> wrote:
> > >
> > > > On Thu, 10 Nov 2022 at 08:55, Christophe Le Saëc <[email protected]> wrote:
> > >
> > > > > So, the discussion is to choose between
> > > > >
> > > > >    1. "change the documentation" (and adapt modules as proposed in
> > > > >    this PR for Rust <https://github.com/apache/avro/pull/1787> and
> > > > >    this other for C <https://github.com/apache/avro/pull/1798>)
> > > > >    2. change the code (in Java and C# at least) to conform to the
> > > > >    documentation.
> > > >
> > >
> > > For compatibility, I like option 1. If we're to change naming rules, I'd
> > > vote for logging warnings before tightening the rules.
> > >
> > > Kind regards,
> > > Oscar
> > >
> > > --
> > >
> > > ✉️ Oscar Westra van Holthe - Kind <[email protected]>
> >
