Re: Formal spec for Avro Schema
Hi everyone,

A bit late, but I thought I’d add a few thoughts on this as well.

For one, I love the idea of improving our documentation. Separating the schema specification from the encoding makes perfect sense to me, and also allows us to clearly state which encoding to use. Currently, most problems I see in online questions arise from using raw / internal encodings, which I think is an easy problem to prevent.

As to the specification, I think it’s a good start. Some things I really like:
- the introduction of an Avro Schema Document, limiting top types to a (union of) named type(s)
- an explicit "no imports" rule to ensure a schema document is self-contained

There are some things I think we can improve, such as explicitly mentioning all attributes (the ‘type’ attribute is not introduced), fixing a few errors in the document, etc. I’ve taken the liberty of doing so. One notable addition is a de facto requirement: that names and aliases must be unique within their context.

I’ve put my changes in a fork of Clemens’ gist:
https://gist.github.com/opwvhk/38481bf19a175a86c703d8ba0ab08866

As a follow-up to this schema specification, we can make specifications for the binary encoding (with a warning to never use it directly), Avro files, the Single-Object encoding, protocols, the protocol wire format, and the IDL schema and protocol definitions.

Kind regards,
Oscar

--
Oscar Westra van Holthe - Kind

> On 15 May 2024, at 11:17, Clemens Vasters via user wrote:
>
> [...]
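The uniqueness requirement Oscar mentions (names and aliases must be unique within their context) could be checked roughly as sketched below. This is only one possible reading of "context", not the rule as worded in either gist: named types and their aliases are assumed to share one scope per schema document, and field names and aliases one scope per record. The helper names `full_name` and `find_duplicates` and the example schema are hypothetical.

```python
# Assumption: record, error, enum and fixed are the types that carry a name.
NAMED_TYPES = {"record", "error", "enum", "fixed"}


def full_name(name, namespace):
    """Qualify a simple name with its namespace; dotted names are kept as-is."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def find_duplicates(schema, namespace=None, seen=None):
    """Return the names and aliases that are defined more than once."""
    seen = set() if seen is None else seen
    duplicates = []
    if isinstance(schema, list):  # union: check every branch
        for branch in schema:
            duplicates += find_duplicates(branch, namespace, seen)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind in NAMED_TYPES:
            name = schema["name"]
            if "." in name:  # a dotted name carries its own namespace
                namespace = name.rsplit(".", 1)[0]
            else:  # otherwise use the explicit or inherited namespace
                namespace = schema.get("namespace", namespace)
            for n in [name, *schema.get("aliases", [])]:
                qualified = full_name(n, namespace)
                if qualified in seen:
                    duplicates.append(qualified)
                seen.add(qualified)
        if kind in ("record", "error"):
            field_names = set()  # field names are scoped to their record
            for field in schema.get("fields", []):
                for n in [field["name"], *field.get("aliases", [])]:
                    if n in field_names:
                        owner = full_name(schema["name"], namespace)
                        duplicates.append(f"{owner}.{n}")
                    field_names.add(n)
                duplicates += find_duplicates(field["type"], namespace, seen)
        elif kind == "array":
            duplicates += find_duplicates(schema["items"], namespace, seen)
        elif kind == "map":
            duplicates += find_duplicates(schema["values"], namespace, seen)
        elif isinstance(kind, (dict, list)):  # nested schema in "type"
            duplicates += find_duplicates(kind, namespace, seen)
    return duplicates


# Hypothetical schema: the enum's name collides with an alias of the record.
example = {
    "type": "record",
    "name": "Order",
    "namespace": "shop",
    "aliases": ["Purchase"],
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "status",
         "type": {"type": "enum", "name": "Purchase", "symbols": ["NEW", "PAID"]}},
    ],
}

print(find_duplicates(example))  # → ['shop.Purchase']
```

References to already defined types (plain strings such as "shop.Order") are deliberately ignored here, since only definitions can clash.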
RE: Formal spec for Avro Schema
Hi Martin,

I am saying that the specification of the schema is currently entangled with the specification of the serialization framework. Avro Schema is useful and usable even if you never touch the Avro binaries (the framework, an implementation using the spec).

I am indeed proposing to separate the schema spec from the specs of the Avro binary encoding and the Avro JSON encoding, which also avoids strange entanglements like the JSON encoding pointing to the schema description’s default values section, which is in itself rather lacking in precision, i.e. the encoding rule for binary or fixed is “defined” with a rather terse example: "\u00ff"

Microsoft would like to propose Avro and Avro Schema in several standardization efforts, but we need a spec that works in those contexts and that can stand on its own. I would also like to see “application/avro” as a formal media type, but the route towards that only goes via formal standardization of both schema and encodings.

I believe the Avro project’s reach and importance is such that schema and encodings should have formal specs that can stand on their own, as JSON and CBOR and AMQP and XML and OPC/Binary and other serialization schemas/formats do. I don’t think the existence of a formal spec gets in the way of progress, and Avro is so mature that the spec captures a fairly stable state.

Best Regards
Clemens

From: Martin Grigorov
Sent: Wednesday, May 15, 2024 10:54 AM
To: d...@avro.apache.org
Cc: user@avro.apache.org
Subject: Re: Formal spec for Avro Schema

[...]
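For context on the terse "\u00ff" example Clemens refers to: as far as I know, the current Avro specification describes default values for bytes and fixed as JSON strings in which Unicode code points 0-255 stand for byte values 0-255. A minimal sketch of that mapping follows; the function names are hypothetical and this is not taken from any Avro library.

```python
import json


def bytes_to_json_string(data: bytes) -> str:
    """Map each byte 0-255 to the Unicode code point with the same number."""
    return data.decode("latin-1")


def json_string_to_bytes(text: str) -> bytes:
    """Inverse mapping; raises UnicodeEncodeError for code points above 255."""
    return text.encode("latin-1")


default = bytes_to_json_string(b"\xff")
print(json.dumps(default))  # → "\u00ff"
assert json_string_to_bytes(json.loads('"\\u00ff"')) == b"\xff"
```

ISO-8859-1 (latin-1) is used only because its code points coincide with byte values; a formal spec would presumably state the mapping directly rather than by example.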
Re: Formal spec for Avro Schema
Hi Elliot,

I am not sure which document you are referring to - the new proposal by Clemens or the official specification.
Please start a new email thread or file a Jira ticket if you think something needs to be improved in the specification!

On Wed, May 15, 2024 at 10:56 AM Elliot West wrote:

> I note that the enum type appears to be missing the specification of the
> default attribute.
Re: Formal spec for Avro Schema
Hi Clemens,

On Wed, May 15, 2024 at 11:18 AM Clemens Vasters wrote:

> Hi Martin,
>
> we find Avro Schema to be a great fit for describing application data
> structures in general and even independent of wire-serialization scenarios.
> Therefore, I would like to have a spec that focuses specifically on the
> schema format, is grounded in the IETF RFC specs, and which follows the
> conventions set by IETF, so that folks who need a sane schema format to
> describe data structures independent of implementation can use that.

Are you saying that the specification document is implementation dependent?
If this is the case then maybe we should work on improving it instead of duplicating it.

> The benefit for the Avro serialization framework of having such a formal
> spec that is untangled from the wire-serialization specs is that all
> schemas defined by that schema model are compatible with the framework.

What do you mean by "framework" here?

> The differences are organization, scope, and language style (including
> keywords etc.). The expressed ruleset is the same.

I don't think it is a good idea to add a second document that is very similar to the specification but uses a different language style.
To me this looks like a duplication.
IMO it would be better to suggest (many) (smaller) improvements for the existing document.
RE: Formal spec for Avro Schema
Hi Martin,

we find Avro Schema to be a great fit for describing application data structures in general and even independent of wire-serialization scenarios. Therefore, I would like to have a spec that focuses specifically on the schema format, is grounded in the IETF RFC specs, and which follows the conventions set by IETF, so that folks who need a sane schema format to describe data structures independent of implementation can use that.

The benefit for the Avro serialization framework of having such a formal spec that is untangled from the wire-serialization specs is that all schemas defined by that schema model are compatible with the framework.

The differences are organization, scope, and language style (including keywords etc.). The expressed ruleset is the same.

Best Regards
Clemens

-----Original Message-----
From: Martin Grigorov
Sent: Wednesday, May 15, 2024 9:26 AM
To: d...@avro.apache.org
Cc: user@avro.apache.org
Subject: Re: Formal spec for Avro Schema

[...]
Re: Formal spec for Avro Schema
I note that the enum type appears to be missing the specification of the default attribute.

On Wed, 15 May 2024 at 08:26, Martin Grigorov wrote:

> [...]
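For readers unfamiliar with the attribute Elliot mentions: as I understand the Apache Avro specification, an enum may carry an optional `default` symbol, which a reader falls back to when the writer used a symbol the reader's schema does not define. A rough sketch of that resolution step, with a hypothetical schema and helper name:

```python
# Hypothetical reader schema with a default symbol.
reader_schema = {
    "type": "enum",
    "name": "Suit",
    "symbols": ["SPADES", "HEARTS", "UNKNOWN"],
    "default": "UNKNOWN",
}


def resolve_symbol(writer_symbol, schema):
    """Return the symbol the reader sees for a symbol written by the writer."""
    symbols = schema["symbols"]
    if writer_symbol in symbols:
        return writer_symbol
    default = schema.get("default")
    if default is None:
        raise ValueError(f"unknown symbol {writer_symbol!r} and no default")
    if default not in symbols:  # the default must itself be a listed symbol
        raise ValueError(f"default {default!r} is not one of the symbols")
    return default


print(resolve_symbol("HEARTS", reader_schema))  # → HEARTS
print(resolve_symbol("CLUBS", reader_schema))   # → UNKNOWN
```

Note that this enum-level default is distinct from the default value of a record field whose type is an enum.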
Re: Formal spec for Avro Schema
Hi Clemens,

What is the difference between your document and the specification [1]?
I haven't read it completely but it looks very similar to the specification to me.

1. https://avro.apache.org/docs/1.11.1/specification/
2. https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification - sources of the specification

On Wed, May 15, 2024 at 9:28 AM Clemens Vasters wrote:

> I wrote a formal spec for the Avro Schema format.
>
> https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
>
> Where would that go in the repo?
Formal spec for Avro Schema
I wrote a formal spec for the Avro Schema format.

https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333

Where would that go in the repo?

Clemens Vasters
Messaging Platform Architect
Microsoft Azure