Thanks for sharing the URL https://langsec.org/occupy/, it is a very interesting site.

*internal aspects like unicode normalisation* => Yes, this may be the main argument for keeping the specification as it is and fixing the validation code. Does anyone know how the Java compiler validates names (*for methods, variables ...*), given that it accepts Unicode names? Could that be a solution for Avro?

These two arguments would also apply to the properties (names, and values when they are Strings) held in the JsonProperties class, which is the parent of the Schema and Field classes. So why leave one without any control (*property names and values*) and put the other under restrictive control (*field names*)?

Best regards,
Christophe.

On Wed, Nov 23, 2022 at 20:20, Ryan Skraba <[email protected]> wrote:

> Hello! I have a specific opinion about the "Robustness Principle",
> especially in this case!
>
> "Accepting liberally, generating strictly" (the paraphrasing of
> Postel's idea) has its place, and might be a good principle for
> binary encoding and decoding. It's not so great for "accepting and
> generating schemas". In this case, it's led directly to this debate:
> accepting "invalid" names has become a de facto standard for a
> _certain_ category of users, who are now blocked from interoperating
> with other language SDKs (and potentially future versions of their own
> SDK, if "fixed").
>
> > If we can use a "non rigorous validation" and it can run without
> > bugs, why switch to a rigorous validation mode that would follow
> > current specification and not change the specification to "accept
> > schemas as liberally as possible" (meaning, while it doesn't generate
> > bugs).
>
> Here's where I think the logic is faulty: even if we don't count
> interoperability failures as a bug, I'm not convinced that using names
> outside the specification runs without bugs! There are several things to
> think about: internal aspects like unicode normalisation, internal
> features like schema evolution (which might actually be OK), but
> especially external ones like downstream projects and tools.
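
To make the Java question and the normalisation point concrete, here is a minimal sketch (it is not how Avro or javac is implemented). It assumes the Avro spec rule for names is the regular expression [A-Za-z_][A-Za-z0-9_]*, and it uses Character.isJavaIdentifierStart / Character.isJavaIdentifierPart, which are the methods the Java Language Specification refers to when it defines identifiers. The class name NameChecks and its method names are hypothetical, chosen for illustration only.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class NameChecks {
    // Assumed Avro spec rule for names: ASCII letters, digits and underscore only.
    private static final Pattern AVRO_SPEC_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isAvroSpecName(String name) {
        return AVRO_SPEC_NAME.matcher(name).matches();
    }

    // Unicode-aware check in the style of Java identifiers.
    // Note: this does not reject Java keywords such as "class".
    static boolean isJavaIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    // Hypothetical canonical form: NFC, so that visually identical names compare equal.
    static String canonical(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "pr\u00e9nom";     // "é" as a single code point
        String decomposed = "pre\u0301nom";  // "e" followed by a combining acute accent

        System.out.println("Avro spec accepts composed:   " + isAvroSpecName(composed));
        System.out.println("Java-style accepts composed:   " + isJavaIdentifier(composed));
        System.out.println("Java-style accepts decomposed: " + isJavaIdentifier(decomposed));
        System.out.println("Raw strings equal:             " + composed.equals(decomposed));
        System.out.println("Equal after NFC:               "
                + canonical(composed).equals(canonical(decomposed)));
    }
}
```

With a Unicode-aware rule, both spellings of "prénom" pass the check but remain different strings, so two SDKs could disagree on whether they name the same field unless names are normalised (NFC in this sketch) before they are compared.
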
>
> As it is, if you follow the specification, Java and Python are
> interoperable and there's a certain guarantee that existing libraries
> and projects can count on.
>
> The configuration approach is one that would allow upstream projects
> to continue working with out-of-spec names, while alerting them that
> these could cause interoperability problems outside of their current
> cases! One thing for certain, the specification should allow invalid
> names for "aliases" so that users can migrate away from these issues.
>
> A slightly related resource: https://langsec.org/occupy/
>
> All my best, Ryan
>
>
> On Tue, Nov 15, 2022 at 4:41 PM Christophe Le Saëc <[email protected]>
> wrote:
> >
> > AVRO-2659 <https://issues.apache.org/jira/browse/AVRO-2659> indeed shows
> > that a name should contain neither a space (a problem when generating Java
> > code) nor a dot (a problem when separating names in a path).
> >
> > AVRO-1022 <https://issues.apache.org/jira/browse/AVRO-1022>: the last
> > comment reinforces the rules for dots and contains a nice principle:
> > "accept schemas as liberally as possible"
> >
> >
> > *allowing two language SDKs to implement the spec differently will make
> > users unhappy about cross-platform, cross-language compatibility.*
> > -> Indeed, that's the case with the current version, where Java and C#
> > accept accents while C and Rust strictly follow the spec.
> >
> > Other possibilities:
> > - *putting human-readable or internationalised names in other metadata
> > properties*: Yes, this can already be done on record fields, for example,
> > as a field is a JsonProperties class (and we already use it in some
> > cases, which helps).
> > - *using configuration / environment / system properties to turn rigorous
> > spec validation on and off*: If we can use a "non rigorous validation" and
> > it can run without bugs, why switch to a rigorous validation mode that
> > would follow current specification and not change the specification to
> > "accept schemas as liberally as possible" (meaning, while it doesn't
> > generate bugs).
> >
> >
> > *My preference would be to *tighten* the SDKs to match the existing Avro
> > spec, and provide language-specific ways to easily disable validating
> > names if desired*
> > Personally, I like the idea of a mandatory name check that you can't
> > deactivate, to ensure that names won't generate bugs (mainly for Java code
> > generation and to be able to separate name and namespace), but the check
> > specification should be limited to banning names that would generate a bug
> > (and should not be a rule that seems to have no real reason, unless it is
> > explained in the doc).
> >
> > Best regards,
> > Christophe.
> >
> >
> > On Thu, Nov 10, 2022 at 19:15, Ryan Skraba <[email protected]> wrote:
> >
> > > Hello! Here's a couple of related JIRA from the past that we can use
> > > to inform our discussion:
> > >
> > > * AVRO-2659 demonstrates a pretty disastrous schema name that the Java
> > > SDK accepts.
> > > * AVRO-1022 (10 years ago!) has a somewhat tepid discussion about
> > > accepting UTF-8 that didn't quite get enough follow-up to make it into
> > > the spec!
> > >
> > > We've been in the current (unsatisfactory) state for so long because:
> > >
> > > * making a change to an SDK changing its behaviour (even to "fix it")
> > > will make users unhappy about backwards/forwards version
> > > compatibility, and
> > > * allowing two language SDKs to implement the spec differently will
> > > make users unhappy about cross-platform, cross-language compatibility.
> > >
> > > In my opinion, with modern streaming and event processing, we have to
> > > take the latter into account!
> > >
> > > There were a couple of other options than the two you propose in the
> > > original discussion thread (such as putting human-readable or
> > > internationalised names in other metadata properties, or using
> > > configuration / environment / system properties to turn rigorous spec
> > > validation on and off). Have you given them any consideration for
> > > your use case?
> > >
> > > My preference would be to *tighten* the SDKs to match the existing
> > > Avro spec, and provide language-specific ways to easily disable
> > > validating names if desired. There's some precedent for this in the
> > > Schema.Parser#validate method.
> > >
> > > There's a bit more going on here that's worth doing right for the
> > > future!
> > >
> > > All my best, Ryan
> > >
> > >
> > > On Thu, Nov 10, 2022 at 4:53 PM Oscar Westra van Holthe - Kind
> > > <[email protected]> wrote:
> > > >
> > > > On Thu, 10 Nov 2022 at 08:55, Christophe Le Saëc <[email protected]>
> > > > wrote:
> > > >
> > > > > So, the discussion is to choose between
> > > > >
> > > > > 1. "change the documentation" (and adapt the modules as proposed in
> > > > > this PR for Rust <https://github.com/apache/avro/pull/1787> and this
> > > > > other one for C <https://github.com/apache/avro/pull/1798>)
> > > > > 2. change the code (in Java and C# at least) to conform to the
> > > > > documentation.
> > > > >
> > > >
> > > > For compatibility, I like option 1. If we're to change naming rules,
> > > > I'd vote for logging warnings before tightening the rules.
> > > >
> > > > Kind regards,
> > > > Oscar
> > > >
> > > > --
> > > >
> > > > ✉️ Oscar Westra van Holthe - Kind <[email protected]>
> > > >
