Another point: could we mimic the Java naming rules
<https://docs.oracle.com/javase/tutorial/java/nutsandbolts/variables.html>?
(excluding 'dot' and 'space' from names)
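
To make the idea concrete, here is a rough sketch that compares the name rule in the current Avro specification (`[A-Za-z_][A-Za-z0-9_]*`) with a Java-identifier-style check. The class `NameCheck` and both method names are hypothetical and not part of the Avro API; the only library calls used are the JDK's `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart`.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the Avro API.
final class NameCheck {
    // Name rule from the current Avro specification.
    private static final Pattern AVRO_SPEC_NAME =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isAvroSpecName(String name) {
        return name != null && AVRO_SPEC_NAME.matcher(name).matches();
    }

    // Java-identifier-style rule: Unicode letters are accepted,
    // while dot and space are rejected.
    static boolean isJavaStyleName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so characters outside the BMP are handled.
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // "pr\u00e9nom" is "prenom" with an acute accent on the e.
        String[] names = {"pr\u00e9nom", "first name", "a.b", "field_1", "1st"};
        for (String n : names) {
            System.out.println(n + " spec=" + isAvroSpecName(n)
                + " javaStyle=" + isJavaStyleName(n));
        }
    }
}
```

Two caveats on this sketch: the JDK methods also accept `$` and other currency symbols, and they do not exclude Java keywords or apply any Unicode normalisation, so a real rule for Avro would probably need to be narrower than this.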

On Fri, 25 Nov 2022 at 17:28, Christophe Le Saëc <[email protected]>
wrote:

> Thanks for sharing the url https://langsec.org/occupy/, very interesting
> site.
>
> *internal aspects like unicode normalisation* => Yes, this may be the main
> argument for keeping the specification as it is and fixing the validation
> code. Does anyone know how the Java compiler validates names (*for methods,
> variables ...*), given that it accepts Unicode names? Could it be a solution
> for Avro?
>
> Wouldn't these two arguments also apply to the properties (names, and values
> when they are Strings) contained in the JsonProperties class, which is the
> parent of the Schema & Field classes?
>
> So why leave one without any control (*property names and values*) and the
> other under restrictive control (*field names*)?
>
> Best regards,
> Christophe.
>
> On Wed, 23 Nov 2022 at 20:20, Ryan Skraba <[email protected]> wrote:
>
>> Hello!  I have a specific opinion about the "Robustness Principle",
>> especially in this case!
>>
>> "Accepting liberally, generating strictly" (the paraphrasing of
>> Postel's idea) has its place, and might be a good principle for
>> binary encoding and decoding.  It's not so great for "accepting and
>> generating schemas".  In this case, it's led directly to this debate:
>> accepting "invalid" names has become a de facto standard for a
>> _certain_ category of users, who are now blocked from interoperating
>> with other language SDKs (and potentially future versions of their own
>> SDK, if "fixed").
>>
>> > If we can use a "non rigorous validation" and
>> > it can run without bugs, why switch to a rigorous validation mode that would
>> > follow current specification and not change the specification to "accept
>> > schemas as liberally as possible" (meaning, while it doesn't generate bugs).
>>
>> Here's where I think the logic is faulty: even if we don't count
>> interoperability failures as a bug, I'm not convinced that using names
>> outside the specification runs without bugs!  There are several things to
>> think about: internal aspects like unicode normalisation, internal
>> features like schema evolution (which might actually be OK), but
>> especially external ones like downstream projects and tools.
>>
>> As it is, if you follow the specification, Java and Python are
>> interoperable and there's a certain guarantee that existing libraries
>> and projects can count on.
>>
>> The configuration approach is one that would allow upstream projects
>> to continue working with out-of-spec names, while alerting them that
>> these could cause interoperability problems outside of their current
>> cases!  One thing is for certain: the specification should allow invalid
>> names for "aliases" so that users can migrate away from these issues.
>>
>> A slightly related resource: https://langsec.org/occupy/
>>
>> All my best, Ryan
>>
>>
>> On Tue, Nov 15, 2022 at 4:41 PM Christophe Le Saëc <[email protected]>
>> wrote:
>> >
>> > AVRO-2659 <https://issues.apache.org/jira/browse/AVRO-2659> indeed shows
>> > that a name should contain neither a space (a problem when generating Java
>> > code) nor a dot (a problem when separating names in a path).
>> >
>> > AVRO-1022 <https://issues.apache.org/jira/browse/AVRO-1022>: the last comment
>> > reinforces the rule for dots and contains a nice principle: "accept schemas
>> > as liberally as possible"
>> >
>> >
>> > * *allowing two language SDKs to implement the spec differently will make
>> > users unhappy about cross-platform, cross-language compatibility.*
>> > -> Indeed, that's the case with the current version, where Java and C# accept
>> > accents while C and Rust strictly follow the spec.
>> >
>> > Other possibilities:
>> > - *putting human-readable or internationalised names in other metadata
>> > properties*: Yes, this can already be done on record fields, for example,
>> > as Field is a JsonProperties class (we already use it in some cases, and
>> > it helps).
>> > - *using configuration / environment / system properties to turn rigorous
>> > spec validation on and off*: If we can use a "non rigorous validation" and
>> > it can run without bugs, why switch to a rigorous validation mode that would
>> > follow the current specification, and not change the specification to
>> > "accept schemas as liberally as possible" (meaning, as long as it doesn't
>> > generate bugs)?
>> >
>> >
>> >
>> > *My preference would be to *tighten* the SDKs to match the existing Avro
>> > spec, and provide language-specific ways to easily disable validating names
>> > if desired*
>> > Personally, I like the idea of a mandatory name check that you can't
>> > deactivate, to ensure names won't generate bugs (mainly for Java code
>> > generation, and to be able to separate name and namespace), but the
>> > specification should be limited to banning names that would generate a bug
>> > (and not impose a rule that seems to have no real reason, at least until
>> > it is explained in the doc).
>> >
>> > Best regards,
>> > Christophe.
>> >
>> >
>> >
>> > On Thu, 10 Nov 2022 at 19:15, Ryan Skraba <[email protected]> wrote:
>> >
>> > > Hello!  Here are a couple of related JIRA issues from the past that we can use
>> > > to inform our discussion:
>> > >
>> > > * AVRO-2659 demonstrates a pretty disastrous schema name that the Java
>> > > SDK accepts.
>> > > * AVRO-1022 (10 years ago!) has a somewhat tepid discussion about
>> > > accepting UTF-8 that didn't quite get enough follow-up to make it into
>> > > the spec!
>> > >
>> > > We've been in the current (unsatisfactory) state for so long because:
>> > >
>> > > * making a change to an SDK changing its behaviour (even to "fix it")
>> > > will make users unhappy about backwards/forwards version
>> > > compatibility, and
>> > > * allowing two language SDKs to implement the spec differently will
>> > > make users unhappy about cross-platform, cross-language compatibility.
>> > >
>> > > In my opinion, with modern streaming and event processing, we have to
>> > > take the latter into account!
>> > >
>> > > There were a couple of other options than the two you propose in the
>> > > original discussion thread (such as putting human-readable or
>> > > internationalised names in other metadata properties, or using
>> > > configuration / environment / system properties to turn rigorous spec
>> > > validation on and off).  Have you given them any consideration for
>> > > your use case?
>> > >
>> > > My preference would be to *tighten* the SDKs to match the existing
>> > > Avro spec, and provide language-specific ways to easily disable
>> > > validating names if desired.  There's some precedent for this in the
>> > > Schema.Parser#validate method.
>> > >
>> > > There's a bit more going on here that's worth doing right for the future!
>> > >
>> > > All my best, Ryan
>> > >
>> > >
>> > >
>> > > On Thu, Nov 10, 2022 at 4:53 PM Oscar Westra van Holthe - Kind
>> > > <[email protected]> wrote:
>> > > >
>> > > > On Thu, 10 Nov 2022 at 08:55, Christophe Le Saëc <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > So, the discussion is to choose between
>> > > > >
>> > > > >    1. "change the documentation" (and adapt modules as proposed in this
>> > > > >    PR for Rust <https://github.com/apache/avro/pull/1787> and this other
>> > > > >    one for C <https://github.com/apache/avro/pull/1798>)
>> > > > >    2. change the code (in Java and C# at least) to conform to the
>> > > > >    documentation.
>> > > > >
>> > > >
>> > > > For compatibility, I like option 1. If we're to change naming rules, I'd
>> > > > vote for logging warnings before tightening the rules.
>> > > >
>> > > > Kind regards,
>> > > > Oscar
>> > > >
>> > > > --
>> > > >
>> > > > ✉️ Oscar Westra van Holthe - Kind <[email protected]>
>> > >
>>
>
