On Thu, Sep 1, 2022 at 11:08 PM Brennan Vincent <[email protected]> wrote:
>
>
> On 2022-08-31 17:18, Martin Grigorov wrote:
> > On Wed, Aug 31, 2022 at 9:59 PM Brennan Vincent <[email protected]>
> > wrote:
> >
> >>
> >>
> >> On 2022-08-31 13:38, Ryan Skraba wrote:
> >>> Hello! I've been trying out some POC code with Java to see what would
> >>> be the impact on that SDK -- in the past, a lot of the development has
> >>> been pretty Java-centric, but this is definitely not a requirement!
> >>>
> >>> Currently, the worst scenario I found is something like:
> >>>
> >>> { "type" : "record",
> >>>   "name" : "A",
> >>>   "fields" : [ { "name" : "a1",
> >>>     "type" : {
> >>>       "type" : "record",
> >>>       "name" : "B",
> >>>       "fields" : [ { "name" : "b1", "type" : [ "null", "A" ],
> >>>         "default" : null } ] } } ] }
> >>>
> >>> This is a recursive definition that would look like a linked list
> >>> alternating A records containing B records containing A records, etc.
> >>>
> >>> If you were to only change the name of B to test.B (A fully qualified
> >>> namespace), Java can still parse the schema but the generated code
> >>> unsurprisingly no longer compiles. It correctly finds the outer
> >>> schema (and doesn't try to look for test.A) but it's impossible to
> >>> import into the generated Java code.
> >>>
> >>> If you were to only change the name of A to test.A, this is fine.
> >>>
> >>> I was playing around a bit with "auto-mangling" the packages to put A
> >>> in root$.A for this case, but I think it's a hopeless case for Java --
> >>> there are too many ways for the default package to "sneak" into the
> >>> system from other previously compiled classes, or from IDL, etc.
> >>>
> >>> I think it's still possible to try and accept the .Foo syntax but we'd
> >>> have to note that (for Java) mixing namespaced schemas and
> >>> null-namespaced schemas is either not supported, or we supply a
> >>> mechanism in Java to put ALL unnamespaced generated classes in a
> >>> folder like root$.
> >>>
> >>> Thanks for pointing out part 4, I'm also taking a look at the impact
> >>> there! Given that these mixed namespace schemas are likely to already
> >>> be broken, I don't know if it's too big of an impact! Especially if
> >>> we say that the dot is only added when strictly necessary to prevent
> >>> namespace inheritance.
> >>
> >> There is still a question for non-mixed schemas.
> >>
> >> Consider the following schema:
> >>
> >> {
> >>   "type": "fixed",
> >>   "name": "Foo",
> >>   "size": 10
> >> }
> >>
> >> Now, if we clarify the spec to say that leading dots are valid in
> >> default-namespace fullnames, then when this is normalized, the
> >> current language of the description of PCF implies that its
> >>
> >
> > Please copy/paste the text from the spec that implies that the name should
> > be ".Foo".
> > Otherwise we will have to guess which sentence you mean exactly.
>
> [FULLNAMES] Replace short names with fullnames, using applicable namespaces
> to do so. Then eliminate namespace attributes, which are now redundant.
>

I totally agree that using namespaces everywhere is a best practice! But
eliminating the namespace attribute is not really an option due to backward
compatibility.

>
> >
> > I don't see any pluses or minuses in using the leading dot in the PCF for
> > top-level names. IMO there is no difference with both representations.
> > For inner names the leading dot should be preserved in the PCF. Otherwise
> > it will start using the enclosing namespace after parsing.
> >
> >
> >> name should be rewritten to ".Foo". However, this is contrary to current
> >> behavior.
> >>
> >> So, if it's okay to change the behavior on existing valid schemas, then
> >> we should do so. If it's not okay, then we should clarify the spec to
> >> say that names are normalized to fullnames for PCF, _except_
> >> in the special case of the non-default namespace.
> >>
> >>>
> >>> I'll keep digging on the Java side.
> >>> Anybody else from the other SDKs
> >>> want to weigh in? What would happen with C# generated code?
> >>>
> >>> All my best, Ryan
> >>>
> >>>
> >>>
> >>> On Fri, Aug 26, 2022 at 4:10 PM Brennan Vincent <[email protected]>
> >>> wrote:
> >>>>
> >>>> I’m in favor of allowing .Foo as a fullname for the following reasons:
> >>>>
> >>>> 1. I believe the *intent* of the initial change to the spec was to only
> >>>>    refer to namespaces;
> >>>> 2. Even if it is not possible in Java to generate code that refers to a
> >>>>    non-namespaced context from a namespaced one, it may be possible in
> >>>>    other languages;
> >>>> 3. We do not lose anything by supporting it.
> >>>> 4. Other parts of the spec assume that all names can be converted to a
> >>>>    fullname, specifically the parsing canonical form algorithm.
> >>>>
> >>>> Point 4. brings me to another issue. Currently, non-namespaced names
> >>>> are left as bare names in PCF, at least by the Python SDK - they are not
> >>>> converted to fullnames like .Foo (which makes sense, since that is out of
> >>>> spec). However, it contradicts the spec:
> >>>>
> >>>> [FULLNAMES] Replace short names with fullnames, using applicable
> >>>> namespaces to do so.
> >>>>
> >>>> The spec doesn’t say “only if the non-empty namespace is used”. It says
> >>>> to always do this. So if we enable the ability to write fullnames like
> >>>> .Foo, we need to decide whether to change the PCF behavior (this will
> >>>> change the fingerprints of existing schemas) to match the spec, or change
> >>>> the spec to match the current behavior.
> >>>>
> >>>>> On Aug 26, 2022, at 03:57, Ryan Skraba <[email protected]> wrote:
> >>>>>
> >>>>> Hello! We can just discuss the impact here in the mailing list and
> >>>>> make a decision by consensus. Sometimes for major changes, we do a
> >>>>> more formal VOTE thread -- this might be one of those cases.
> >>>>>
> >>>>> What would happen if we were to say that ".MyRecord" was valid in the
> >>>>> next major version of Avro?
> >>>>>
> >>>>> Some SDKs used to accept this in the past and were made more strict,
> >>>>> causing working examples to break? That is really unfortunate.
> >>>>>
> >>>>> On the other hand, if we generate Java code today and map packages 1:1
> >>>>> to namespaces... we still won't be able to mix namespaced (in a
> >>>>> package) and unnamespaced (unpackaged) generated code. Would we just
> >>>>> mangle the default namespace to "default$" or ... ? A configuration
> >>>>> option for the SpecificCompiler in Java?
> >>>>>
> >>>>> Either way, it would be great if we didn't leave this point vague in
> >>>>> the spec! There's always the possibility to allow language SDKs to
> >>>>> deviate from the spec -- if e.g. Python or Java has a
> >>>>> "setValidateUnqualifiedNamespace(boolean)" method, we can leave it up
> >>>>> to the user whether or not to follow the strict spec. We already do
> >>>>> this with validating defaults in Java, for example.
> >>>>>
> >>>>> It might take a bit of thought, but if we can find some elegant way to
> >>>>> make this work I don't see why we wouldn't make specification changes!
> >>>>>
> >>>>> Ryan
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Thu, Aug 25, 2022 at 7:31 PM Brennan Vincent <[email protected]>
> >>>>>> wrote:
> >>>>>>
> >>>>>> That is a fair point also.
> >>>>>>
> >>>>>> Anyway, since I'm not an Apache project member, I'm not quite sure what
> >>>>>> is the best way to move forward here. Is there a formal process for
> >>>>>> proposing changes to the spec and reaching a consensus?
> >>>>>>
> >>>>>> Thanks
> >>>>>> Brennan
> >>>>>>
> >>>>>>> On 2022-08-25 01:36, Oscar Westra van Holthe - Kind wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> Allowing references to the null namespace from within another namespace
> >>>>>>> gives schema authors more options.
> >>>>>>>
> >>>>>>> But if you're using namespaces at all, there must be a reason for it.
> >>>>>>> As a schema author, you've made the decision to group your schemata.
> >>>>>>>
> >>>>>>> To make this decision from schema authors more visible, I'd opt to
> >>>>>>> choose the Java route and in that case force all schemata to belong
> >>>>>>> to a group. I.e., explicitly disallow identifiers to start with a dot
> >>>>>>> (and disallow references to the null namespace from within another
> >>>>>>> namespace).
> >>>>>>>
> >>>>>>>
> >>>>>>> Kind regards,
> >>>>>>> Oscar
> >>>>>>>
> >>>>>>> --
> >>>>>>> Oscar Westra van Holthe - Kind <[email protected]>
> >>>>>>>
> >>>>>>> On Wed, Aug 24, 2022 at 14:42, Ryan Skraba <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hello! There is definitely an ambiguity here caused by inheriting
> >>>>>>>> namespaces.
> >>>>>>>>
> >>>>>>>> The obvious takeaway is to use a namespace with all of your named
> >>>>>>>> schemas. As a best practice, that avoids the problem of mixing
> >>>>>>>> schemas with and without namespaces, and it's probably this techniq
> >>>>>>>>
> >>>>>>>> This same problem occurs in Java classes, where you can have a class
> >>>>>>>> in the default package (without a package name), but it's an error to
> >>>>>>>> import it into other packages.
> >>>>>>>>
> >>>>>>>> The ".MyRecord" notation might be the right way to clarify this, but
> >>>>>>>> we can also go the Java route (i.e. you can't mix namespaced schemas
> >>>>>>>> and non-namespaced schemas). What do you think?
> >>>>>>>>
> >>>>>>>> Best regards, Ryan
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Aug 22, 2022 at 10:49 PM Brennan Vincent <[email protected]>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> On 2022/08/22 20:05:22 Martin Grigorov wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I might be wrong but I think your sample schema should be valid! Does
> >>>>>>>>>> it fail with any of the SDKs?
> >>>>>>>>>
> >>>>>>>>> Yes. It fails with the Python avro package.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> This part of the spec talks about the namespace, not the type. I.e.
> >>>>>>>>>> "namespace": ".ns" would be an error.
> >>>>>>>>>
> >>>>>>>>> The linked thread
> >>>>>>>>> (https://lists.apache.org/thread/q0o58fxgvstvdlgpoyv2pcz53borp587)
> >>>>>>>>> is a bit vague -- it's not totally clear whether the restriction is
> >>>>>>>>> meant to apply to namespaces only, or to fullnames also.
> >>>>>>>>>
> >>>>>>>>> "The null namespace may not be used in a dot-separated sequence of
> >>>>>>>>> names."
> >>>>>>>>>
> >>>>>>>>> certainly makes it sound like it applies to _any_ sequence of names,
> >>>>>>>>> though, not just in a namespace field.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Aug 22, 2022 at 10:40 PM Brennan Vincent <[email protected]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hello,
> >>>>>>>>>>>
> >>>>>>>>>>> https://github.com/apache/avro/pull/917 introduced the following
> >>>>>>>>>>> language to the spec:
> >>>>>>>>>>>
> >>>>>>>>>>>> The null namespace may not be used in a dot-separated sequence of
> >>>>>>>>>>>> names.
> >>>>>>>>>>>
> >>>>>>>>>>> Thus ruling out fullnames like ".foo".
> >>>>>>>>>>>
> >>>>>>>>>>> However, this seems to rule out referring to names in the default
> >>>>>>>>>>> namespace from another namespace.
> >>>>>>>>>>>
> >>>>>>>>>>> For example, this schema was previously allowed by the spec:
> >>>>>>>>>>>
> >>>>>>>>>>> {
> >>>>>>>>>>>   "type": "record",
> >>>>>>>>>>>   "name": "r",
> >>>>>>>>>>>   "fields": [
> >>>>>>>>>>>     {
> >>>>>>>>>>>       "name": "f",
> >>>>>>>>>>>       "type": {
> >>>>>>>>>>>         "type": "record",
> >>>>>>>>>>>         "name": "r2",
> >>>>>>>>>>>         "namespace": "ns",
> >>>>>>>>>>>         "fields": [
> >>>>>>>>>>>           {
> >>>>>>>>>>>             "name": "f2",
> >>>>>>>>>>>             "type": ["null", ".r"]
> >>>>>>>>>>>           }
> >>>>>>>>>>>         ]
> >>>>>>>>>>>       }
> >>>>>>>>>>>     }
> >>>>>>>>>>>   ]
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>> Note ".r" in the type of "f2".
> >>>>>>>>>>> This can't be changed to "r", because that would be interpreted as
> >>>>>>>>>>> "ns.r" due to "ns" being the nearest enclosing namespace.
> >>>>>>>>>>>
> >>>>>>>>>>> Thus it seems that the new spec has restricted the set of valid
> >>>>>>>>>>> schemas and there is no longer any way to accomplish this.
> >>>>>>>>>>>
> >>>>>>>>>>> Am I misinterpreting the spec? Does the empty namespace being
> >>>>>>>>>>> disallowed in dotted sequences of names only apply to initial name
> >>>>>>>>>>> definitions, but not to later name references? Or is there some
> >>>>>>>>>>> other way to express this?
> >>>>>>>>>>>
> >>>>>>>>>>> Here is the initial discussion of this change, where the issue I'm
> >>>>>>>>>>> raising here doesn't appear to have come up:
> >>>>>>>>>>> https://lists.apache.org/thread/q0o58fxgvstvdlgpoyv2pcz53borp587
> >>>>>>>>>>>
>
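
For reference, the name-resolution rule debated in this thread can be
sketched in a few lines of Python. This is only an illustration of the
proposal (a leading dot pins a name to the null namespace, a bare name
inherits the nearest enclosing namespace); resolve_name is a hypothetical
helper and not the API of any Avro SDK.

```python
def resolve_name(name, enclosing_namespace=None):
    """Return (namespace, short_name) for a schema name or reference.

    Hypothetical helper illustrating the rule proposed in this thread;
    a namespace of None stands for the null (default) namespace.
    """
    if "." in name:
        # Dotted name: everything before the last dot is the namespace.
        namespace, _, short = name.rpartition(".")
        if namespace.startswith("."):
            # Assumption: only a single leading dot (".r") is meaningful.
            raise ValueError(f"empty namespace component in {name!r}")
        # ".r" yields an empty namespace, i.e. the null namespace.
        return (namespace or None, short)
    # Bare name: inherit the nearest enclosing namespace, if any.
    return (enclosing_namespace or None, name)


# References as seen from inside record "ns.r2" in the schema quoted above:
print(resolve_name("r", "ns"))   # ('ns', 'r')  -> refers to ns.r
print(resolve_name(".r", "ns"))  # (None, 'r')  -> refers to top-level r
print(resolve_name("r"))         # (None, 'r')  -> top level, no namespace
```

Under this reading, ".r" is the only spelling that reaches the top-level
record from inside "ns", which is why disallowing the leading dot removes
that option.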
