Another point: could we mimic these Java rules <https://docs.oracle.com/javase/tutorial/java/nutsandbolts/variables.html>? (excluding 'dot' and 'space' from names)
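To make the idea concrete, here is a rough sketch comparing the current spec rule (names start with [A-Za-z_] and continue with [A-Za-z0-9_]) with a Java-identifier-style rule. The class and method names (NameRules, isSpecName, isJavaLikeName) are hypothetical and not part of the Avro API. The Java-style check relies on Character.isJavaIdentifierStart and Character.isJavaIdentifierPart, which is how the Java language defines identifier characters. This is only an illustration of the proposal, not how any Avro SDK validates names today.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not an Avro class).
final class NameRules {
    // Current Avro spec rule: [A-Za-z_] followed by [A-Za-z0-9_]*.
    private static final Pattern SPEC_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Strict check against the spec rule above.
    static boolean isSpecName(String name) {
        return name != null && SPEC_NAME.matcher(name).matches();
    }

    // Assumed "Java-like" rule: every code point must be acceptable in a
    // Java identifier. This rejects spaces and dots but accepts accented letters.
    static boolean isJavaLikeName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so characters outside the BMP are handled.
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

Two caveats with this sketch: the Java character classes also accept '$' and other currency symbols, and the check does not reject Java keywords such as "class", so a real rule would probably need to restrict it further.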
On Fri, Nov 25, 2022 at 17:28, Christophe Le Saëc <[email protected]> wrote:

> Thanks for sharing the URL https://langsec.org/occupy/, very interesting
> site.
>
> *internal aspects like unicode normalisation* => Yes, this may be the main
> argument for keeping the specification as it is and fixing the validation
> code. Does anyone know how the Java compiler validates names (*for methods,
> variables ...*), given that it accepts Unicode names? Could that be a
> solution for Avro?
>
> Would these two arguments not also be valid for the properties (names, and
> values when they are Strings) contained in the JsonProperties class, which
> is the parent of the Schema & Field classes?
>
> So, why leave one without control (*property names and values*) and the
> other under restrictive control (*field names*)?
>
> Best regards,
> Christophe.
>
> On Wed, Nov 23, 2022 at 20:20, Ryan Skraba <[email protected]> wrote:
>
>> Hello! I have a specific opinion about the "Robustness Principle",
>> especially in this case!
>>
>> "Accepting liberally, generating strictly" (the paraphrasing of
>> Postel's idea) has its place, and might be a good principle for
>> binary encoding and decoding. It's not so great for "accepting and
>> generating schemas". In this case, it's led directly to this debate:
>> accepting "invalid" names has become a de facto standard for a
>> _certain_ category of users, who are now blocked from interoperating
>> with other language SDKs (and potentially future versions of their own
>> SDK, if "fixed").
>>
>> > If we can use a "non rigorous validation" and it can run without
>> > bugs, why switch to a rigorous validation mode that would follow
>> > current specification and not change the specification to "accept
>> > schemas as liberally as possible" (meaning, while it doesn't generate
>> > bugs)?
>>
>> Here's where I think the logic is faulty: even if we don't count
>> interoperability failures as a bug, I'm not convinced that using names
>> outside the specification runs without bugs! There are several things to
>> think about: internal aspects like unicode normalisation, internal
>> features like schema evolution (which might actually be OK), but
>> especially external ones like downstream projects and tools.
>>
>> As it is, if you follow the specification, Java and Python are
>> interoperable and there's a certain guarantee that existing libraries
>> and projects can count on.
>>
>> The configuration approach is one that would allow upstream projects
>> to continue working with out-of-spec names, while alerting them that
>> these could cause interoperability problems outside of their current
>> cases! One thing for certain, the specification should allow invalid
>> names for "aliases" so that users can migrate away from these issues.
>>
>> A slightly related resource: https://langsec.org/occupy/
>>
>> All my best, Ryan
>>
>>
>> On Tue, Nov 15, 2022 at 4:41 PM Christophe Le Saëc <[email protected]>
>> wrote:
>> >
>> > AVRO-2659 <https://issues.apache.org/jira/browse/AVRO-2659> indeed
>> > shows that a name should contain neither a space (problem when
>> > generating Java code) nor a dot (problem when separating names in a
>> > path).
>> >
>> > AVRO-1022 <https://issues.apache.org/jira/browse/AVRO-1022>: the last
>> > comment reinforces the rules for dot and contains a nice principle:
>> > "accept schemas as liberally as possible"
>> >
>> > *allowing two language SDKs to implement the spec differently will
>> > make users unhappy about cross-platform, cross-language compatibility.*
>> > -> Indeed, that's the case with the current version, where Java and C#
>> > accept accents while C and Rust strictly follow the spec.
>> >
>> > Other possibilities:
>> > - *putting human-readable or internationalised names in other metadata
>> > properties*: Yes, this can already be done on record fields, for
>> > example, as a field is a JsonProperties class (and we already use it
>> > in some cases; it helps).
>> > - *using configuration / environment / system properties to turn
>> > rigorous spec validation on and off*: If we can use a "non rigorous
>> > validation" and it can run without bugs, why switch to a rigorous
>> > validation mode that would follow current specification and not
>> > change the specification to "accept schemas as liberally as possible"
>> > (meaning, while it doesn't generate bugs)?
>> >
>> > *My preference would be to *tighten* the SDKs to match the existing
>> > Avro spec, and provide language-specific ways to easily disable
>> > validating names if desired*
>> > Personally, I like the idea of having a mandatory name control you
>> > can't deactivate, to ensure it won't generate bugs (mainly for Java
>> > code generation and to be able to separate name and namespace), but
>> > the control specification should be limited to banning names that
>> > would generate a bug (and not include a rule that seems to have no
>> > real reason, unless it is explained in the docs).
>> >
>> > Best regards,
>> > Christophe.
>> >
>> >
>> > On Thu, Nov 10, 2022 at 19:15, Ryan Skraba <[email protected]> wrote:
>> >
>> > > Hello! Here's a couple of related JIRA from the past that we can use
>> > > to inform our discussion:
>> > >
>> > > * AVRO-2659 demonstrates a pretty disastrous schema name that the Java
>> > > SDK accepts.
>> > > * AVRO-1022 (10 years ago!) has a somewhat tepid discussion about
>> > > accepting UTF-8 that didn't quite get enough follow-up to make it into
>> > > the spec!
>> > >
>> > > We've been in the current (unsatisfactory) state for so long because:
>> > >
>> > > * making a change to an SDK changing its behaviour (even to "fix it")
>> > > will make users unhappy about backwards/forwards version
>> > > compatibility, and
>> > > * allowing two language SDKs to implement the spec differently will
>> > > make users unhappy about cross-platform, cross-language compatibility.
>> > >
>> > > In my opinion, with modern streaming and event processing, we have to
>> > > take the latter into account!
>> > >
>> > > There were a couple of other options than the two you propose in the
>> > > original discussion thread (such as putting human-readable or
>> > > internationalised names in other metadata properties, or using
>> > > configuration / environment / system properties to turn rigorous spec
>> > > validation on and off). Have you given them any consideration for
>> > > your use case?
>> > >
>> > > My preference would be to *tighten* the SDKs to match the existing
>> > > Avro spec, and provide language-specific ways to easily disable
>> > > validating names if desired. There's some precedent for this in the
>> > > Schema.Parser#validate method.
>> > >
>> > > There's a bit more going on here that's worth doing right for the
>> > > future!
>> > >
>> > > All my best, Ryan
>> > >
>> > >
>> > > On Thu, Nov 10, 2022 at 4:53 PM Oscar Westra van Holthe - Kind
>> > > <[email protected]> wrote:
>> > > >
>> > > > On Thu, 10 Nov 2022 at 08:55, Christophe Le Saëc
>> > > > <[email protected]> wrote:
>> > > >
>> > > > > So, the discussion is to choose between
>> > > > >
>> > > > > 1. "change the documentation" (and adapt modules as proposed in
>> > > > > this PR for RUST <https://github.com/apache/avro/pull/1787>
>> > > > > and this other for C
>> > > > > <https://github.com/apache/avro/pull/1798>)
>> > > > > 2. change the code (in Java and C# at least) to conform to the
>> > > > > documentation.
>> > > > >
>> > > >
>> > > > For compatibility, I like option 1. If we're to change naming
>> > > > rules, I'd vote for logging warnings before tightening the rules.
>> > > >
>> > > > Kind regards,
>> > > > Oscar
>> > > >
>> > > > --
>> > > >
>> > > > ✉️ Oscar Westra van Holthe - Kind <[email protected]>
