Hmmmm -- maybe there's a solution that doesn't change the PCF behaviour.

In the PCF specification, instead of:

[FULLNAMES] Replace short names with fullnames, using applicable
namespaces to do so. Then eliminate namespace attributes, which are
now redundant.

we change to:

[FULLNAMES] Replace short names with fullnames, using applicable
namespaces to do so. The only namespace attributes that can remain are
namespace="" when a named schema is in the default namespace AND
shouldn't inherit a namespace from a parent schema.  All other
namespace attributes are now redundant, and are eliminated.
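To make the proposed wording concrete, here's a rough Python sketch of just the [FULLNAMES] step as rewritten above. It runs on a schema already loaded with json.loads. It's only an illustration under a few assumptions. The names `resolve` and `fullnames_step` are made up. References are resolved with the current (inheriting) rules. Aliases and the other PCF steps (PRIMITIVES, STRIP, ORDER, etc.) are ignored. It also doesn't try to solve the UNION/reference case in the next paragraph.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "error", "enum", "fixed"}


def resolve(name, namespace, enclosing):
    """Resolve a (possibly short) name to a fullname under the current rules."""
    if "." in name:  # already a fullname; any namespace attribute is ignored
        return name
    if namespace is None:  # no attribute: inherit the enclosing namespace
        namespace = enclosing
    return f"{namespace}.{name}" if namespace else name


def fullnames_step(schema, enclosing=""):
    """Sketch of the *proposed* [FULLNAMES] step (not the current spec)."""
    if isinstance(schema, str):  # primitive or reference to a named type
        if schema in PRIMITIVES:
            return schema
        return resolve(schema, None, enclosing)
    if isinstance(schema, list):  # union: transform each branch
        return [fullnames_step(branch, enclosing) for branch in schema]
    out = dict(schema)  # shallow copy so the input is left untouched
    kind = out.get("type")
    if kind in NAMED:
        full = resolve(out["name"], out.pop("namespace", None), enclosing)
        out["name"] = full
        if "." not in full and enclosing:
            # Default-namespace schema nested in a namespaced parent: the one
            # namespace attribute the proposal keeps, to stop inheritance.
            out["namespace"] = ""
        if kind in ("record", "error"):
            inner = full.rpartition(".")[0]  # namespace seen by the fields
            out["fields"] = [
                {**field, "type": fullnames_step(field["type"], inner)}
                for field in out["fields"]
            ]
    elif kind == "array":
        out["items"] = fullnames_step(out["items"], enclosing)
    elif kind == "map":
        out["values"] = fullnames_step(out["values"], enclosing)
    elif isinstance(kind, str):
        out["type"] = fullnames_step(kind, enclosing)
    return out


# Hypothetical example: fixed B opts out of the enclosing "ns" namespace.
schema = {"type": "record", "name": "A", "namespace": "ns", "fields": [
    {"name": "b", "type": {"type": "fixed", "name": "B", "namespace": "", "size": 4}},
    {"name": "next", "type": ["null", "A"]}]}
print(json.dumps(fullnames_step(schema)))
```

Take a schema that never sets namespace="" under a namespaced parent. For it, this gives the same names as the current rule. For example, a top-level fixed named Foo stays Foo rather than becoming .Foo.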

This still leaves the problem of needing fullnames in a UNION, in
aliases, or in a named reference to a previously defined schema.  This
would really only pose a problem when there's an ambiguity between
Foo and ns.Foo.  Is there a clever way of disambiguating between
these that would leave existing fingerprints unchanged?

Another alternative might be to have a schema parsing mode that
disables namespace inheritance entirely, and to consider that PCF
schemas are only appropriately parsed in that mode.
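Here's a rough sketch of what that mode could mean, again in Python and with made-up names. `resolve` takes a hypothetical `inherit` flag. `unresolved_refs` checks a PCF-style schema, where every name is already a fullname and there are no namespace attributes. In it, a record in namespace ns points back at a default-namespace Foo. With inheritance on, the bare reference Foo is read as ns.Foo and can't be found. With inheritance off, it resolves to the outer record.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "error", "enum", "fixed"}


def resolve(name, namespace, enclosing, inherit=True):
    """Resolve a name to a fullname.

    inherit=True follows the current rules: a short name with no namespace
    attribute picks up the enclosing namespace.  inherit=False is the
    hypothetical PCF-only mode: such a name is in the default namespace.
    """
    if "." in name:  # already a fullname
        return name
    if namespace is None:
        namespace = enclosing if inherit else ""
    return f"{namespace}.{name}" if namespace else name


def unresolved_refs(schema, inherit=True, enclosing="", defined=None):
    """List references that don't match an already-defined fullname."""
    defined = set() if defined is None else defined
    missing = []
    if isinstance(schema, str):
        if schema not in PRIMITIVES:
            full = resolve(schema, None, enclosing, inherit)
            if full not in defined:
                missing.append(full)
    elif isinstance(schema, list):  # union
        for branch in schema:
            missing += unresolved_refs(branch, inherit, enclosing, defined)
    elif schema.get("type") in NAMED:
        full = resolve(schema["name"], schema.get("namespace"), enclosing, inherit)
        defined.add(full)
        inner = full.rpartition(".")[0]  # namespace seen by nested schemas
        for field in schema.get("fields", []):
            missing += unresolved_refs(field["type"], inherit, inner, defined)
    elif schema.get("type") == "array":
        missing += unresolved_refs(schema["items"], inherit, enclosing, defined)
    elif schema.get("type") == "map":
        missing += unresolved_refs(schema["values"], inherit, enclosing, defined)
    else:
        missing += unresolved_refs(schema["type"], inherit, enclosing, defined)
    return missing


# A PCF-style schema: record ns.Bar refers back to the default-namespace Foo.
pcf = {"type": "record", "name": "Foo", "fields": [
    {"name": "bar", "type": {"type": "record", "name": "ns.Bar", "fields": [
        {"name": "back", "type": ["null", "Foo"]}]}}]}

print(unresolved_refs(pcf, inherit=True))   # ['ns.Foo']
print(unresolved_refs(pcf, inherit=False))  # []
```

The flip side is that such a flag would probably only be safe for PCF input. A hand-written schema that relies on inheritance (a bare B inside namespace ns meaning ns.B) would resolve differently in that mode.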

There might be a couple of things we can do here to close this
loophole without breaking PCF and fingerprints!

All my best, Ryan


On Fri, Sep 2, 2022 at 5:39 PM Brennan Vincent <[email protected]> wrote:
>
>
>
> On 2022-09-02 01:34, Martin Grigorov wrote:
> > On Fri, Sep 2, 2022, 02:53 Brennan Vincent <[email protected]> wrote:
> >
> >> I don’t understand what you mean. I am talking about what to do with names
> >> that have no namespace. Obviously, in such a case there are no namespace
> >> attributes to remove.
> >>
> >
> > It seems I misunderstood your previous message then.
> >
> My point was that currently, there is no fullname
> corresponding to a name with no namespace. In the future, if
> we allow ".Foo", there will be one. Thus, following the
> description of PCF which mandates replacing all names by fullnames,
> we would replace "Foo" in a non-namespaced context by ".Foo", which
> differs from the current behavior of PCF.
> >
> >
> >>> On Sep 1, 2022, at 16:34, Martin Grigorov <[email protected]> wrote:
> >>>
> >>>
> >>>
> >>>> On Thu, Sep 1, 2022 at 11:08 PM Brennan Vincent <[email protected]> wrote:
> >>>>
> >>>>
> >>>> On 2022-08-31 17:18, Martin Grigorov wrote:
> >>>>> On Wed, Aug 31, 2022 at 9:59 PM Brennan Vincent <[email protected]> wrote:
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2022-08-31 13:38, Ryan Skraba wrote:
> >>>>>>> Hello!  I've been trying out some POC code with Java to see what would
> >>>>>>> be the impact on that SDK -- in the past, a lot of the development has
> >>>>>>> been pretty Java-centric, but this is definitely not a requirement!
> >>>>>>>
> >>>>>>> Currently, the worst scenario I found is something like:
> >>>>>>>
> >>>>>>> { "type" : "record",
> >>>>>>>   "name" : "A",
> >>>>>>>   "fields" : [ { "name" : "a1",
> >>>>>>>     "type" : {
> >>>>>>>       "type" : "record",
> >>>>>>>       "name" : "B",
> >>>>>>>       "fields" : [ { "name" : "b1",  "type" : [ "null", "A" ],
> >>>>>>> "default" : null  } ] } } ] }
> >>>>>>>
> >>>>>>> This is a recursive definition that would look like a linked list
> >>>>>>> alternating A records containing B records containing A records, etc.
> >>>>>>>
> >>>>>>> If you were to only change the name of B to test.B (a fully qualified
> >>>>>>> namespace), Java can still parse the schema but the generated code
> >>>>>>> unsurprisingly no longer compiles.  It correctly finds the outer
> >>>>>>> schema (and doesn't try to look for test.A), but it's impossible to
> >>>>>>> import A into the generated Java code.
> >>>>>>>
> >>>>>>> If you were to only change the name of A to test.A, this is fine.
> >>>>>>>
> >>>>>>> I was playing around a bit with "auto-mangling" the packages to put A
> >>>>>>> in root$.A for this case, but I think it's a hopeless case for Java --
> >>>>>>> there are too many ways for the default package to "sneak" into the
> >>>>>>> system from other previously compiled classes, or from IDL, etc.
> >>>>>>>
> >>>>>>> I think it's still possible to try and accept the .Foo syntax, but we'd
> >>>>>>> have to note that (for Java) mixing namespaced schemas and
> >>>>>>> null-namespaced schemas is either not supported, or we supply a
> >>>>>>> mechanism in Java to put ALL unnamespaced generated classes in a
> >>>>>>> folder like root$.
> >>>>>>>
> >>>>>>> Thanks for pointing out part 4, I'm also taking a look at the impact
> >>>>>>> there!  Given that these mixed namespace schemas are likely to already
> >>>>>>> be broken, I don't know if it's too big of an impact!  Especially if
> >>>>>>> we say that the dot is only added when strictly necessary to prevent
> >>>>>>> namespace inheritance.
> >>>>>>
> >>>>>> There is still a question for non-mixed schemas.
> >>>>>>
> >>>>>> Consider the following schema:
> >>>>>>
> >>>>>> {
> >>>>>>     "type": "fixed",
> >>>>>>     "name": "Foo",
> >>>>>>     "size": 10
> >>>>>> }
> >>>>>>
> >>>>>> Now, if we clarify the spec to say that leading dots are valid in
> >>>>>> default-namespace fullnames, then when this is normalized, the
> >>>>>> current language of the description of PCF implies that its
> >>>>>>
> >>>>>
> >>>>> Please copy/paste the text from the spec that implies that the name should
> >>>>> be ".Foo".
> >>>>> Otherwise we will have to guess which sentence you mean exactly.
> >>>>
> >>>> [FULLNAMES] Replace short names with fullnames, using applicable namespaces
> >>>> to do so. Then eliminate namespace attributes, which are now redundant.
> >>>
> >>> I totally agree that using namespaces everywhere is a best practice!
> >>> But eliminating the namespace attribute is not really an option due to backward compatibility.
> >>>
> >>>
> >>>>
> >>>>>
> >>>>> I don't see any pluses or minuses in using the leading dot in the PCF for
> >>>>> top-level names. IMO there is no difference between the two representations.
> >>>>> For inner names the leading dot should be preserved in the PCF. Otherwise
> >>>>> it will start using the enclosing namespace after parsing.
> >>>>>
> >>>>>
> >>>>>> name should be rewritten to ".Foo". However, this is contrary to current
> >>>>>> behavior.
> >>>>>>
> >>>>>> So, if it's okay to change the behavior on existing valid schemas, then
> >>>>>> we should do so. If it's not okay, then we should clarify the spec to
> >>>>>> say that names are normalized to fullnames for PCF, _except_
> >>>>>> in the special case of the non-default namespace.
> >>>>>>
> >>>>>>>
> >>>>>>> I'll keep digging on the Java side.  Anybody else from the other SDKs
> >>>>>>> want to weigh in?  What would happen with C# generated code?
> >>>>>>>
> >>>>>>> All my best, Ryan
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Aug 26, 2022 at 4:10 PM Brennan Vincent <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> I’m in favor of allowing .Foo as a fullname for the following
> >> reasons:
> >>>>>>>>
> >>>>>>>> 1. I believe the *intent* of the initial change to the spec was to
> >> only
> >>>>>> refer to namespaces;
> >>>>>>>> 2. Even if it is not possible in Java to generate code that refers
> >> to a
> >>>>>> non-namespaced context from a namespaced one, it may be possible in
> >> other
> >>>>>> languages;
> >>>>>>>> 3. We do not lose anything by supporting it.
> >>>>>>>> 4. Other parts of the spec assume that all names can be converted
> >> to a
> >>>>>> fullname, specifically the parsing canonical form algorithm.
> >>>>>>>>
> >>>>>>>> Point 4. brings me to another issue. Currently, non-namespaced
> >> names
> >>>>>> are left as bare names in PCF, at least by the Python SDK - they are
> >> not
> >>>>>> converted to fullnames like .Foo (which makes sense, since that is
> >> out of
> >>>>>> spec). However, it contradicts the spec:
> >>>>>>>>
> >>>>>>>> [FULLNAMES] Replace short names with fullnames, using applicable
> >>>>>> namespaces to do so.
> >>>>>>>>
> >>>>>>>> The spec doesn’t say “only if the non-empty namespace is used”. It
> >> says
> >>>>>> to always do this. So if we enable the ability to write fullnames
> >> like
> >>>>>> .Foo, we need to decide whether to change the PCF behavior (this will
> >>>>>> change the fingerprints of existing schemas) to match the spec, or
> >> change
> >>>>>> the spec to match the current behavior.
> >>>>>>>>
> >>>>>>>>> On Aug 26, 2022, at 03:57, Ryan Skraba <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hello!  We can just discuss the impact here in the mailing list
> >> and
> >>>>>>>>> make a decision by consensus.  Sometimes for major changes, we do
> >> a
> >>>>>>>>> more formal VOTE thread -- this might be one of those cases.
> >>>>>>>>>
> >>>>>>>>> What would happen if we were to say that ".MyRecord" was valid in
> >> the
> >>>>>>>>> next major version of Avro?
> >>>>>>>>>
> >>>>>>>>> Some SDKs used to accept this in the past and were made more
> >> strict,
> >>>>>>>>> causing working examples to break?  That is really unfortunate.
> >>>>>>>>>
> >>>>>>>>> On the other hand, if we generate Java code today and map
> >> packages 1:1
> >>>>>>>>> to namespaces... we still won't be able to mix namespaced (in a
> >>>>>>>>> package) and unnamespaced (unpackaged) generated code.  Would we
> >> just
> >>>>>>>>> mangle the default namespace to "default$" or ... ?  A
> >> configuration
> >>>>>>>>> option for the SpecificCompiler in Java?
> >>>>>>>>>
> >>>>>>>>> Either way, it would be great if we didn't leave this point vague
> >> in
> >>>>>>>>> the spec!   There's always the possibility to allow language SDKs
> >> to
> >>>>>>>>> deviate from the spec -- if e.g. python or Java has a
> >>>>>>>>> "setValidateUnqualifiedNamespace(boolean)" method, we can leave
> >> it up
> >>>>>>>>> to the user whether or not to follow the strict spec.  We already
> >> do
> >>>>>>>>> this with validating defaults in Java, for example.
> >>>>>>>>>
> >>>>>>>>> It might take a bit of thought, but if we can find some elegant
> >> way to
> >>>>>>>>> make this work I don't see why we wouldn't make specification
> >> changes!
> >>>>>>>>>
> >>>>>>>>> Ryan
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Thu, Aug 25, 2022 at 7:31 PM Brennan Vincent <
> >>>>>> [email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> That is a fair point also.
> >>>>>>>>>>
> >>>>>>>>>> Anyway, since I'm not an Apache project member, I'm not quite
> >> sure
> >>>>>> what
> >>>>>>>>>> is the best way to move forward here. Is there a formal process
> >> for
> >>>>>> proposing
> >>>>>>>>>> changes to the spec and reaching a consensus?
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>> Brennan
> >>>>>>>>>>
> >>>>>>>>>>> On 2022-08-25 01:36, Oscar Westra van Holthe - Kind wrote:
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> Allowing references to the null namespace from within another
> >>>>>> namespace
> >>>>>>>>>>> gives schema authors more options.
> >>>>>>>>>>>
> >>>>>>>>>>> But if you're using namespaces at all, there must be a reason
> >> for
> >>>>>> it. As a
> >>>>>>>>>>> schema author, you've made the decision to group your schemata.
> >>>>>>>>>>>
> >>>>>>>>>>> To make this decision from schema authors more visible, I'd opt
> >> to
> >>>>>> choose
> >>>>>>>>>>> the Java route and in that case force all schemata to belong to
> >> a
> >>>>>> group.
> >>>>>>>>>>> I.e., explicitly disallow identifiers to start with a dot (and
> >>>>>> disallow
> >>>>>>>>>>> references to the null namespace from within another namespace).
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Kind regards,
> >>>>>>>>>>> Oscar
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Oscar Westra van Holthe - Kind <[email protected]>
> >>>>>>>>>>>
> >>>>>>>>>>> Op wo 24 aug. 2022 14:42 schreef Ryan Skraba <[email protected]>:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hello!  There is definitely an ambiguity here caused by
> >> inheriting
> >>>>>>>>>>>> namespaces.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The obvious takeaway is to use a namespace with all of your
> >> named
> >>>>>>>>>>>> schemas.  As a best practice, that avoids the problem of mixing
> >>>>>>>>>>>> schemas with and without namespaces, and it's probably this
> >> techniq
> >>>>>>>>>>>>
> >>>>>>>>>>>> This same problem occurs in Java classes, where you can have a
> >> class
> >>>>>>>>>>>> in the default package (without a package name), but it's an
> >> error
> >>>>>> to
> >>>>>>>>>>>> import it into other packages.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The ".MyRecord" notation might be the right way to clarify
> >> this, but
> >>>>>>>>>>>> we can also go the Java route (i.e. you can't mix namespaced
> >> schema
> >>>>>>>>>>>> and non-namespaced schemas).  What do you think?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best regards, Ryan
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Aug 22, 2022 at 10:49 PM Brennan Vincent <
> >>>>>> [email protected]>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 2022/08/22 20:05:22 Martin Grigorov wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I might be wrong but I think your sample schema should be
> >> valid!
> >>>>>> Does
> >>>>>>>>>>>> it
> >>>>>>>>>>>>>> fail with any of the SDKs ?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Yes. It fails with the Python avro package.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This part of the spec talks about the namespace, not the
> >> type.
> >>>>>> I.e.
> >>>>>>>>>>>>>> "namespace": ".ns" would be an error.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The linked thread (
> >>>>>>>>>>>>
> >> https://lists.apache.org/thread/q0o58fxgvstvdlgpoyv2pcz53borp587 )
> >>>>>>>>>>>>> is a bit vague -- it's not totally clear whether the
> >> restriction is
> >>>>>>>>>>>> meant to apply to
> >>>>>>>>>>>>> namespaces only, or to fullnames also.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> "The null namespace may not be used in a dot-separated
> >> sequence of
> >>>>>>>>>>>> names."
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> certainly makes it sound like it applies to _any_ sequence of
> >>>>>> names,
> >>>>>>>>>>>> though,
> >>>>>>>>>>>>> not just in a namespace field.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Aug 22, 2022 at 10:40 PM Brennan Vincent <
> >>>>>> [email protected]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> https://github.com/apache/avro/pull/917 introduced the
> >> following
> >>>>>>>>>>>> language
> >>>>>>>>>>>>>>> to the spec:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The null namespace may not be used in a dot-separated
> >> sequence
> >>>>>> of
> >>>>>>>>>>>> names.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thus ruling out fullnames like ".foo".
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> However, this seems to rule out referring to names in the
> >> default
> >>>>>>>>>>>>>>> namespace from another namespace.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> For example, this schema was previously allowed by the spec:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>    "type": "record",
> >>>>>>>>>>>>>>>    "name": "r",
> >>>>>>>>>>>>>>>    "fields": [
> >>>>>>>>>>>>>>>        {
> >>>>>>>>>>>>>>>            "name": "f",
> >>>>>>>>>>>>>>>            "type": {
> >>>>>>>>>>>>>>>                "type": "record",
> >>>>>>>>>>>>>>>                "name": "r2",
> >>>>>>>>>>>>>>>                "namespace": "ns",
> >>>>>>>>>>>>>>>                "fields": [
> >>>>>>>>>>>>>>>                    {
> >>>>>>>>>>>>>>>                        "name": "f2",
> >>>>>>>>>>>>>>>                        "type": ["null", ".r"]
> >>>>>>>>>>>>>>>                    }
> >>>>>>>>>>>>>>>                ]
> >>>>>>>>>>>>>>>            }
> >>>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>>    ]
> >>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Note ".r" in the type of "f2". This can't be changed to "r",
> >>>>>>>>>>>>>>> because that would be interpreted as "ns.r" due to "ns"
> >> being the
> >>>>>>>>>>>> nearest
> >>>>>>>>>>>>>>> enclosing namespace.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thus it seems that the new spec has restricted the set of
> >> valid
> >>>>>>>>>>>> schemas
> >>>>>>>>>>>>>>> and there is no longer
> >>>>>>>>>>>>>>> any way to accomplish this.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Am I misinterpreting the spec? Does the empty namespace
> >> being
> >>>>>>>>>>>> disallowed
> >>>>>>>>>>>>>>> in dotted sequences
> >>>>>>>>>>>>>>> of names only apply to initial name definitions, but not to
> >> later
> >>>>>>>>>>>> name
> >>>>>>>>>>>>>>> references? Or is there
> >>>>>>>>>>>>>>> some other way to express this?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Here is the initial discussion of this change, where the
> >> issue
> >>>>>> I'm
> >>>>>>>>>>>> raising
> >>>>>>>>>>>>>>> here doesn't
> >>>>>>>>>>>>>>> appear to have come up:
> >>>>>>>>>>>>>>>
> >> https://lists.apache.org/thread/q0o58fxgvstvdlgpoyv2pcz53borp587
> >>>>>>>>>>>>>>>
> >>
