Re: Formal spec for Avro Schema

2024-05-22 Thread Oscar Westra van Holthe - Kind
Hi everyone,

A bit late, but I thought I’d add a few thoughts on this as well.

For one, I love the idea of improving our documentation. Separating the schema 
specification from the encoding makes perfect sense to me, and also allows us 
to clearly state which encoding to use. Currently, most problems I see in 
online questions arise from using raw / internal encodings, which I think is an 
easy problem to prevent.

As to the specification, I think it’s a good start. Some things I really like:

- the introduction of an Avro Schema Document, limiting top types to a (union 
  of) named type(s) (see the sketch below)
- an explicit "no imports" rule to ensure a schema document is self-contained

There are some things I think we can improve, such as explicitly mentioning all 
attributes (the ’type’ attribute is not introduced), fixing a few errors in the 
document, etc. I’ve taken the liberty of doing so.

One notable addition is a de facto requirement: that names and aliases must be 
unique within their context.
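
To sketch the kind of collision such a rule forbids (hypothetical example), the 
alias below clashes with the name of a sibling field:

    {
      "type": "record",
      "name": "Person",
      "fields": [
        { "name": "fullName", "type": "string" },
        { "name": "surname", "type": "string", "aliases": ["fullName"] }
      ]
    }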

I’ve put my changes in a fork of Clemens’ gist: 
https://gist.github.com/opwvhk/38481bf19a175a86c703d8ba0ab08866


As a follow-up to this schema specification, we can write specifications for 
the binary encoding (with a warning never to use it directly), Avro files, the 
Single-Object encoding, protocols, the protocol wire format, and the IDL schema 
and protocol definitions.


Kind regards,
Oscar


-- 
Oscar Westra van Holthe - Kind 

> On 15 May 2024, at 11:17, Clemens Vasters via user  
> wrote:
> 
> Hi Martin,
>  
> I am saying that the specification of the schema is currently entangled with 
> the specification of the serialization framework. Avro Schema is useful and 
> usable even if you never touch the Avro binaries (the framework, an 
> implementation using the spec). 
>  
> I am indeed proposing to separate the schema spec from the specs of the Avro 
> binary encoding and the Avro JSON encoding, which also avoids strange 
> entanglements like the JSON encoding pointing to the schema description’s 
> default values section, which is in itself rather lacking in precision, i.e. 
> the encoding rule for binary or fixed is “defined” with a rather terse 
> example: "\u00ff"
>  
> Microsoft would like to propose Avro and Avro Schema in several 
> standardization efforts, but we need a spec that works in those contexts and 
> that can stand on its own. I would also like to see “application/avro” as a 
> formal media type, but the route towards that only goes via formal 
> standardization of both schema and encodings.
>  
> I believe the Avro project’s reach and importance is such that schema and 
> encodings should have formal specs that can stand on their own as JSON and 
> CBOR and AMQP and XML and OPC/Binary and other serialization schemas/formats 
> do. I don’t think existence of a formal spec gets in the way of progress and 
> Avro is so mature that the spec captures a fairly stable state.
>  
> Best Regards
> Clemens
>  
> From: Martin Grigorov 
> Sent: Wednesday, May 15, 2024 10:54 AM
> To: d...@avro.apache.org
> Cc: user@avro.apache.org
> Subject: Re: Formal spec for Avro Schema
>  
> Hi Clemens,
>  
> On Wed, May 15, 2024 at 11:18 AM Clemens Vasters 
> <cleme...@microsoft.com.invalid> wrote:
> Hi Martin,
> 
> we find Avro Schema to be a great fit for describing application data 
> structures in general and even independent of wire-serialization scenarios.
> 
> Therefore, I would like to have a spec that focuses specifically on the 
> schema format, is grounded in the IETF RFC specs, and which follows the 
> conventions set by IETF, so that folks who need a sane schema format to 
> describe data structures independent of implementation can use that.
>  
> Are you saying that the specification document is implementation-dependent?
> If this is the case then maybe we should work on improving it instead of 
> duplicating it.
>  
> 
> The benefit for the Avro serialization framework of having such a formal spec 
> that is untangled from the wire-serialization specs is that all schemas 
> defined by that schema model are compatible with the framework.
>  
> What do you mean by "framework" here?
>  
> 
> The differences are organization, scope, and language style (including 
> keywords etc.). The expressed ruleset is the same.
>  
> I don't think it is a good idea to add a second document that is very similar 
> to the specification but uses a different language style.
> To me this looks like a duplication.
> IMO it would be better to suggest (many) (smaller) improvements for the 
> existing document. 
>  
>  
> 
> Best Regards
> Clemens
> 
> -Original Message-
> From: Martin Grigorov <mgrigo...@apache.org>
> Sent: Wednesday, May 15, 2024 9:26 AM
> To: d...@avro.apache.org 
> Cc: user@avro.apache.org 
> Subject: Re: Formal spec for Avro Schema
> 

Re: Schema version in namespace? good practice

2024-05-16 Thread Oscar Westra van Holthe - Kind
Hi,

As far as best practices are concerned, I have never seen any that change
the names and/or namespace of a schema. But that is also because code is
generated from the Avro schema.

What I usually see is adding an extra property to the root schema, named
"version" or similar, combined with a version name in the .avsc file name.

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

On Thu, 16 May 2024 at 07:48, Vignesh Kumar Kathiresan via user <
user@avro.apache.org> wrote:

> Hi,
>
> I am new to Avro and started working on it recently. I am in the process
> of designing a schema evolution process. We use Java applications and make
> use of the Maven plugin to auto-generate the classes from .avsc schema files.
>
> I am thinking of adding the version in the namespace during each evolution,
> as compatibility is based on the unqualified name only according to the
> specification. This way I can have a central library which keeps track of
> all the versions, and all the client applications can just import the
> library and use different versions of the schemas (Java classes), instead
> of every client importing the required schema files and auto-generating at
> their end every time they upgrade to a newer version of the schema.
> Is this a good practice to include version_id in the namespace?
>
> Also we use a schema registry with full compatibility checks on.
>
>  Thanks,
> Vignesh
>


Schema version in namespace? good practice

2024-05-15 Thread Vignesh Kumar Kathiresan via user
Hi,

I am new to Avro and started working on it recently. I am in the process of
designing a schema evolution process. We use Java applications and make use
of the Maven plugin to auto-generate the classes from .avsc schema files.

I am thinking of adding the version in the namespace during each evolution,
as compatibility is based on the unqualified name only according to the
specification. This way I can have a central library which keeps track of all
the versions, and all the client applications can just import the library and
use different versions of the schemas (Java classes), instead of every client
importing the required schema files and auto-generating at their end every
time they upgrade to a newer version of the schema.
Is this a good practice to include version_id in the namespace?
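
Concretely (hypothetical names), each evolution would get its own namespace, so 
the generated Java classes can coexist on one classpath:

    { "type": "record", "name": "Order", "namespace": "com.example.orders.v1",
      "fields": [ { "name": "id", "type": "long" } ] }

    { "type": "record", "name": "Order", "namespace": "com.example.orders.v2",
      "fields": [ { "name": "id", "type": "long" },
                  { "name": "note", "type": ["null", "string"], "default": null } ] }

yielding com.example.orders.v1.Order and com.example.orders.v2.Order.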

Also we use a schema registry with full compatibility checks on.

 Thanks,
Vignesh


RE: Formal spec for Avro Schema

2024-05-15 Thread Clemens Vasters via user
Hi Martin,

I am saying that the specification of the schema is currently entangled with 
the specification of the serialization framework. Avro Schema is useful and 
usable even if you never touch the Avro binaries (the framework, an 
implementation using the spec).

I am indeed proposing to separate the schema spec from the specs of the Avro 
binary encoding and the Avro JSON encoding, which also avoids strange 
entanglements like the JSON encoding pointing to the schema description’s 
default values section, which is in itself rather lacking in precision, i.e. 
the encoding rule for binary or fixed is “defined” with a rather terse example: 
"\u00ff"

Microsoft would like to propose Avro and Avro Schema in several standardization 
efforts, but we need a spec that works in those contexts and that can stand on 
its own. I would also like to see “application/avro” as a formal media type, 
but the route towards that only goes via formal standardization of both schema 
and encodings.

I believe the Avro project’s reach and importance is such that schema and 
encodings should have formal specs that can stand on their own as JSON and CBOR 
and AMQP and XML and OPC/Binary and other serialization schemas/formats do. I 
don’t think existence of a formal spec gets in the way of progress and Avro is 
so mature that the spec captures a fairly stable state.

Best Regards
Clemens

From: Martin Grigorov 
Sent: Wednesday, May 15, 2024 10:54 AM
To: d...@avro.apache.org
Cc: user@avro.apache.org
Subject: Re: Formal spec for Avro Schema

Hi Clemens,

On Wed, May 15, 2024 at 11:18 AM Clemens Vasters 
<cleme...@microsoft.com.invalid> wrote:
Hi Martin,

we find Avro Schema to be a great fit for describing application data 
structures in general and even independent of wire-serialization scenarios.

Therefore, I would like to have a spec that focuses specifically on the schema 
format, is grounded in the IETF RFC specs, and which follows the conventions 
set by IETF, so that folks who need a sane schema format to describe data 
structures independent of implementation can use that.

Are you saying that the specification document is implementation-dependent?
If this is the case then maybe we should work on improving it instead of 
duplicating it.


The benefit for the Avro serialization framework of having such a formal spec 
that is untangled from the wire-serialization specs is that all schemas defined 
by that schema model are compatible with the framework.

What do you mean by "framework" here?


The differences are organization, scope, and language style (including keywords 
etc.). The expressed ruleset is the same.

I don't think it is a good idea to add a second document that is very similar 
to the specification but uses a different language style.
To me this looks like a duplication.
IMO it would be better to suggest (many) (smaller) improvements for the 
existing document.



Best Regards
Clemens

-Original Message-
From: Martin Grigorov <mgrigo...@apache.org>
Sent: Wednesday, May 15, 2024 9:26 AM
To: d...@avro.apache.org
Cc: user@avro.apache.org
Subject: Re: Formal spec for Avro Schema


Hi Clemens,

What is the difference between your document and the specification [1]?
I haven't read it completely but it looks very similar to the specification to 
me.

1. https://avro.apache.org/docs/1.11.1/specification/
2.
https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
- sources of the specification

On Wed, May 15, 2024 at 9:28 AM Clemens Vasters 
<cleme...@microsoft.com.invalid> wrote:

> I wrote a formal spec for the Avro Schema format.
>
>
>
> https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
>
>
>
> Where would that go in the repo?
>
>
>
>
>
>
>
> *Clemens Vasters*
>
> Messaging Platform Architect
>
> Microsoft Azure
>
> +49 151 44063557
>

Re: Formal spec for Avro Schema

2024-05-15 Thread Martin Grigorov
Hi Elliot,

I am not sure which document you are referring to - the new proposal by
Clemens or the official specification.
Please start a new email thread or file a Jira ticket if you think
something needs to be improved in the specification!

On Wed, May 15, 2024 at 10:56 AM Elliot West  wrote:

> I note that the enum type appears to be missing the specification of the
> default attribute.
>
> On Wed, 15 May 2024 at 08:26, Martin Grigorov 
> wrote:
>
>> Hi Clemens,
>>
>> What is the difference between your document and the specification [1]?
>> I haven't read it completely but it looks very similar to the
>> specification to me.
>>
>> 1. https://avro.apache.org/docs/1.11.1/specification/
>> 2.
>> https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
>> - sources of the specification
>>
>> On Wed, May 15, 2024 at 9:28 AM Clemens Vasters
>>  wrote:
>>
>>> I wrote a formal spec for the Avro Schema format.
>>>
>>>
>>>
>>> https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
>>>
>>>
>>>
>>> Where would that go in the repo?
>>>
>>>
>>>
>>>
>>>
>>>
>>> 
>>>
>>> *Clemens Vasters*
>>>
>>> Messaging Platform Architect
>>>
>>> Microsoft Azure
>>>
>>> +49 151 44063557
>>>
>>> cleme...@microsoft.com
>>> European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 |
>>> 80539 Munich| Germany
>>> 
>>> Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
>>> Amtsgericht Aachen, HRB 12066
>>>
>>>
>>>
>>>
>>>
>>


Re: Formal spec for Avro Schema

2024-05-15 Thread Martin Grigorov
Hi Clemens,

On Wed, May 15, 2024 at 11:18 AM Clemens Vasters
 wrote:

> Hi Martin,
>
> we find Avro Schema to be a great fit for describing application data
> structures in general and even independent of wire-serialization scenarios.


> Therefore, I would like to have a spec that focuses specifically on the
> schema format, is grounded in the IETF RFC specs, and which follows the
> conventions set by IETF, so that folks who need a sane schema format to
> describe data structures independent of implementation can use that.
>

Are you saying that the specification document is implementation-dependent?
If this is the case then maybe we should work on improving it instead of
duplicating it.


>
> The benefit for the Avro serialization framework of having such a formal
> spec that is untangled from the wire-serialization specs is that all
> schemas defined by that schema model are compatible with the framework.
>

What do you mean by "framework" here?


>
> The differences are organization, scope, and language style (including
> keywords etc.). The expressed ruleset is the same.
>

I don't think it is a good idea to add a second document that is very
similar to the specification but uses a different language style.
To me this looks like a duplication.
IMO it would be better to suggest (many) (smaller) improvements for the
existing document.



>
> Best Regards
> Clemens
>
> -Original Message-
> From: Martin Grigorov 
> Sent: Wednesday, May 15, 2024 9:26 AM
> To: d...@avro.apache.org
> Cc: user@avro.apache.org
> Subject: Re: Formal spec for Avro Schema
>
>
> Hi Clemens,
>
> What is the difference between your document and the specification [1]?
> I haven't read it completely but it looks very similar to the
> specification to me.
>
> 1. https://avro.apache.org/docs/1.11.1/specification/
> 2.
>
> https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
> - sources of the specification
>
> On Wed, May 15, 2024 at 9:28 AM Clemens Vasters 
> 
> wrote:
>
> > I wrote a formal spec for the Avro Schema format.
> >
> >
> >
> > https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
> >
> >
> >
> > Where would that go in the repo?
> >
> >
> >
> >
> >
> >
> >
> > *Clemens Vasters*
> >
> > Messaging Platform Architect
> >
> > Microsoft Azure
> >
> > +49 151 44063557
> >
> > cleme...@microsoft.com
> > European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 |
> > 80539
> > Munich| Germany
> > Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
> > Amtsgericht Aachen, HRB 12066
> >
> >
> >
> >
> >
>


RE: Formal spec for Avro Schema

2024-05-15 Thread Clemens Vasters via user
Hi Martin,

we find Avro Schema to be a great fit for describing application data 
structures in general and even independent of wire-serialization scenarios.

Therefore, I would like to have a spec that focuses specifically on the schema 
format, is grounded in the IETF RFC specs, and which follows the conventions 
set by IETF, so that folks who need a sane schema format to describe data 
structures independent of implementation can use that.

The benefit for the Avro serialization framework of having such a formal spec 
that is untangled from the wire-serialization specs is that all schemas defined 
by that schema model are compatible with the framework.

The differences are organization, scope, and language style (including keywords 
etc.). The expressed ruleset is the same.

Best Regards
Clemens

-Original Message-
From: Martin Grigorov 
Sent: Wednesday, May 15, 2024 9:26 AM
To: d...@avro.apache.org
Cc: user@avro.apache.org
Subject: Re: Formal spec for Avro Schema


Hi Clemens,

What is the difference between your document and the specification [1]?
I haven't read it completely but it looks very similar to the specification to 
me.

1. https://avro.apache.org/docs/1.11.1/specification/
2.
https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
- sources of the specification

On Wed, May 15, 2024 at 9:28 AM Clemens Vasters 
 wrote:

> I wrote a formal spec for the Avro Schema format.
>
>
>
> https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
>
>
>
> Where would that go in the repo?
>
>
>
>
>
>
>
> *Clemens Vasters*
>
> Messaging Platform Architect
>
> Microsoft Azure
>
> +49 151 44063557
>
> cleme...@microsoft.com
> European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 |
> 80539
> Munich| Germany
> Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
> Amtsgericht Aachen, HRB 12066
>
>
>
>
>


Re: Formal spec for Avro Schema

2024-05-15 Thread Elliot West
I note that the enum type appears to be missing the specification of the
default attribute.

On Wed, 15 May 2024 at 08:26, Martin Grigorov  wrote:

> Hi Clemens,
>
> What is the difference between your document and the specification [1]?
> I haven't read it completely but it looks very similar to the
> specification to me.
>
> 1. https://avro.apache.org/docs/1.11.1/specification/
> 2.
> https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
> - sources of the specification
>
> On Wed, May 15, 2024 at 9:28 AM Clemens Vasters
>  wrote:
>
>> I wrote a formal spec for the Avro Schema format.
>>
>>
>>
>> https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
>>
>>
>>
>> Where would that go in the repo?
>>
>>
>>
>>
>>
>>
>> 
>>
>> *Clemens Vasters*
>>
>> Messaging Platform Architect
>>
>> Microsoft Azure
>>
>> +49 151 44063557
>>
>> cleme...@microsoft.com
>> European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 | 80539
>> Munich| Germany
>> 
>> Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
>> Amtsgericht Aachen, HRB 12066
>>
>>
>>
>>
>>
>


Re: Formal spec for Avro Schema

2024-05-15 Thread Martin Grigorov
Hi Clemens,

What is the difference between your document and the specification [1]?
I haven't read it completely but it looks very similar to the specification
to me.

1. https://avro.apache.org/docs/1.11.1/specification/
2.
https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
- sources of the specification

On Wed, May 15, 2024 at 9:28 AM Clemens Vasters
 wrote:

> I wrote a formal spec for the Avro Schema format.
>
>
>
> https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
>
>
>
> Where would that go in the repo?
>
>
>
>
>
>
> 
>
> *Clemens Vasters*
>
> Messaging Platform Architect
>
> Microsoft Azure
>
> +49 151 44063557
>
> cleme...@microsoft.com
> European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 | 80539
> Munich| Germany
> Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
> Amtsgericht Aachen, HRB 12066
>
>
>
>
>


Formal spec for Avro Schema

2024-05-15 Thread Clemens Vasters via user
I wrote a formal spec for the Avro Schema format.

https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333

Where would that go in the repo?


Clemens Vasters
Messaging Platform Architect
Microsoft Azure
+49 151 44063557
cleme...@microsoft.com
European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 | 80539 
Munich| Germany
Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
Amtsgericht Aachen, HRB 12066




Re: Avro JSON Encoding

2024-04-25 Thread Jean-Baptiste Onofré
Hi Clemens

Yeah it makes sense. I will be back from the US next week so I will have
time to work with you on that and also move forward on the releases.

Thanks !
Regards
JB

On Wed, 24 Apr 2024 at 07:03, Clemens Vasters 
wrote:

> Hi JB,
>
> since there seems to be interest in the group even if not full consensus
> on the scope, I propose that I open an umbrella issue on this with more
> specific focus on the "what"/"how" more than the "why" as I did in the
> opening email, which can then be broken down into individual feature
> issues. I can work on that early next week.
>
> Best Regards
> Clemens
>
> --
> *From:* Jean-Baptiste Onofré 
> *Sent:* Thursday, April 18, 2024 10:58 AM
> *To:* Clemens Vasters 
> *Cc:* Jean-Baptiste Onofré ; user@avro.apache.org <
> user@avro.apache.org>
> *Subject:* Re: Avro JSON Encoding
>
> Hi Clemens,
>
> I propose to wait a bit to give a chance to the community to review
> your email and points.
>
> Then, we will create the Jira accordingly.
>
> Regards
> JB
>
> On Thu, Apr 18, 2024 at 9:20 AM Clemens Vasters 
> wrote:
> >
> > Hi JB,
> >
> >
> >
> > I have not done that yet. I’m happy to break that up into items once I
> get the sense that this is a direction we can get to a consensus on.
> >
> >
> >
> > Shall I file the whole email as a “New Feature” issue first?
> >
> >
> >
> > Thanks
> >
> > Clemens
> >
> >
> >
> > From: Jean-Baptiste Onofré 
> > Sent: Thursday, April 18, 2024 10:17 AM
> > To: Clemens Vasters ; user@avro.apache.org
> > Subject: Re: Avro JSON Encoding
> >
> >
> >
> > Hi Clemens
> >
> >
> >
> > Thanks for the detailed email.
> >
> >
> >
> > Quick question: did you already create Jira tickets for each
> improvement/issue?
> >
> >
> >
> > I will take the time to read asap.
> >
> >
> >
> > Thanks
> >
> > Regards
> >
> > JB
> >
> >
> >
> On Thu, 18 Apr 2024 at 09:12, Clemens Vasters via user <
> user@avro.apache.org> wrote:
> >
> > Hi everyone,
> >
> >
> >
> > the current JSON Encoding approach severely limits interoperability with
> other JSON serialization frameworks. In my view, the JSON Encoding is only
> really useful if it acts as a bridge into and from JSON-centric
> applications and it currently gets in its own way.
> >
> >
> >
> > The current encoding being what it is, there should be an alternate mode
> that emphasizes interoperability with JSON “as-is” and allows Avro Schema
> to describe existing JSON document instances such that I can take someone’s
> existing JSON document in on one side of a piece of software and emit Avro
> binary on the other side while acting on the same schema.
> >
> >
> >
> > There are four specific issues:
> >
> >
> >
> > Binary Values
> > Unions with Primitive Type Values and Enum Values
> > Unions with Record Values
> > DateTime
> >
> >
> >
> > One by one:
> >
> >
> >
> > 1. Binary values:
> >
> > -
> >
> >
> >
> > Binary values (fixed and bytes) are encoded as escaped unicode
> literals. While I appreciate the creative trick, it costs 6 bytes for each
> encoded byte. I have a hard time finding any JSON libraries that provide a
> conversion of such strings from/to byte arrays, so this approach appears to
> be idiosyncratic for Avro’s JSON Encoding.
> >
> >
> >
> > The common way to encode binary in JSON is to use base64 encoding and
> that is widely and well supported in libraries. Base64 is 33% larger than
> plain bytes, the encoding chosen here is 500% (!) larger than plain bytes.
> >
> >
> >
> > The Avro decoder is schema-informed and it knows that a field is
> expected to hold bytes, so it’s easy to mandate base64 for the field
> content in the alternate mode.
> >
> >
> >
> > 2. Unions with Primitive Type Values and Enum Values
> >
> > -
> >
> >
> >
> > It’s common to express optionality in Avro Schema by creating a union
> with the “null” type, e.g. [“string”, “null”]. The Avro JSON Encoding opts
> to encode such unions, like any union, as { “{type}”: {value} } when the
> value is non-null.
> >
> >
> >
> > This choice ignores common practice and the fact that JSON’s values are
> dynamically typed (RFC8259 Section-3) and inherently accommodate unions.
> The conformant way to encode a value choice of null or “string” into a JSON
> value is plainly null and “string”.
> >
> >
> >
> > “foo” : null
> >
> > “foo”: “value”
> >
> >
> >
> > The “field default values” table in the Avro spec maps Avro types to the
> JSON types null, boolean, integer, number, string, object, and array, all
> of which can be encoded into and, more importantly, unambiguously decoded
> from a JSON value. The only semi-ambiguous case is integer vs. number,
> which is a convention in JSON rather than a distinct type, but any Avro
> serializer is guided by type information and can easily make that
> distinction.
> >
> >
> >
> > 3. Unions with Record Values
> >
> > -
> >
> >
> >
> > The JSON Encoding pattern of unions also covers “record” typed 

Re: Avro JSON Encoding

2024-04-24 Thread Clemens Vasters via user
Hi JB,

since there seems to be interest in the group even if not full consensus on the 
scope, I propose that I open an umbrella issue on this with more specific focus 
on the "what"/"how" more than the "why" as I did in the opening email, which 
can then be broken down into individual feature issues. I can work on that 
early next week.

Best Regards
Clemens


From: Jean-Baptiste Onofré 
Sent: Thursday, April 18, 2024 10:58 AM
To: Clemens Vasters 
Cc: Jean-Baptiste Onofré ; user@avro.apache.org 

Subject: Re: Avro JSON Encoding

Hi Clemens,

I propose to wait a bit to give a chance to the community to review
your email and points.

Then, we will create the Jira accordingly.

Regards
JB

On Thu, Apr 18, 2024 at 9:20 AM Clemens Vasters  wrote:
>
> Hi JB,
>
>
>
> I have not done that yet. I’m happy to break that up into items once I get 
> the sense that this is a direction we can get to a consensus on.
>
>
>
> Shall I file the whole email as a “New Feature” issue first?
>
>
>
> Thanks
>
> Clemens
>
>
>
> From: Jean-Baptiste Onofré 
> Sent: Thursday, April 18, 2024 10:17 AM
> To: Clemens Vasters ; user@avro.apache.org
> Subject: Re: Avro JSON Encoding
>
>
>
> Hi Clemens
>
>
>
> Thanks for the detailed email.
>
>
>
> Quick question: did you already create Jira tickets for each improvement/issue?
>
>
>
> I will take the time to read asap.
>
>
>
> Thanks
>
> Regards
>
> JB
>
>
>
> On Thu, 18 Apr 2024 at 09:12, Clemens Vasters via user 
> wrote:
>
> Hi everyone,
>
>
>
> the current JSON Encoding approach severely limits interoperability with 
> other JSON serialization frameworks. In my view, the JSON Encoding is only 
> really useful if it acts as a bridge into and from JSON-centric applications 
> and it currently gets in its own way.
>
>
>
> The current encoding being what it is, there should be an alternate mode that 
> emphasizes interoperability with JSON “as-is” and allows Avro Schema to 
> describe existing JSON document instances such that I can take someone’s 
> existing JSON document in on one side of a piece of software and emit Avro 
> binary on the other side while acting on the same schema.
>
>
>
> There are four specific issues:
>
>
>
> Binary Values
> Unions with Primitive Type Values and Enum Values
> Unions with Record Values
> DateTime
>
>
>
> One by one:
>
>
>
> 1. Binary values:
>
> -
>
>
>
> Binary values (fixed and bytes) are encoded as escaped unicode literals. 
> While I appreciate the creative trick, it costs 6 bytes for each encoded 
> byte. I have a hard time finding any JSON libraries that provide a conversion 
> of such strings from/to byte arrays, so this approach appears to be 
> idiosyncratic for Avro’s JSON Encoding.
>
>
>
> The common way to encode binary in JSON is to use base64 encoding and that is 
> widely and well supported in libraries. Base64 is 33% larger than plain 
> bytes, the encoding chosen here is 500% (!) larger than plain bytes.
>
>
>
> The Avro decoder is schema-informed and it knows that a field is expected to 
> hold bytes, so it’s easy to mandate base64 for the field content in the 
> alternate mode.
>
>
>
> 2. Unions with Primitive Type Values and Enum Values
>
> -
>
>
>
> It’s common to express optionality in Avro Schema by creating a union with 
> the “null” type, e.g. [“string”, “null”]. The Avro JSON Encoding opts to 
> encode such unions, like any union, as { “{type}”: {value} } when the value 
> is non-null.
>
>
>
> This choice ignores common practice and the fact that JSON’s values are 
> dynamically typed (RFC8259 Section-3) and inherently accommodate unions. The 
> conformant way to encode a value choice of null or “string” into a JSON value 
> is plainly null and “string”.
>
>
>
> “foo” : null
>
> “foo”: “value”
>
>
>
> The “field default values” table in the Avro spec maps Avro types to the JSON 
> types null, boolean, integer, number, string, object, and array, all of which 
> can be encoded into and, more importantly, unambiguously decoded from a JSON 
> value. The only semi-ambiguous case is integer vs. number, which is a 
> convention in JSON rather than a distinct type, but any Avro serializer is 
> guided by type information and can easily make that distinction.
>
>
>
> 3. Unions with Record Values
>
> -
>
>
>
> The JSON Encoding pattern of unions also covers “record” typed values, of 
> course, and this is indeed a tricky scenario during deserialization since 
> JSON does not have any built-in notion of type hints for “object” typed 
> values.
>
>
>
> The problem of having to disambiguate instances of different types in a field 
> value is a common one also for users of JSON Schema when using the “oneOf” 
> construct, which is equivalent to Avro unions. There are two common 
> strategies:
>
>
>
> - “Duck Typing”:  Every conformant JSON Schema Validator determines the 
> validity of a JSON node against a “oneOf" rule by testing the 

Re: Avro JSON Encoding

2024-04-23 Thread Oscar Westra van Holthe - Kind
Hi,

Using a JSON encoding to bridge Avro to/from JSON is indeed a good idea.

But the systems still need to talk the same “language” (data structure). I 
rarely encounter systems that allow fully free-form objects (and never in 
production); there’s always some data structure behind it. This data structure 
(dict/struct/record/…) limits what can be transferred, and in at least 19 out 
of 20 cases covers records (with required fields), arrays and 
primitives.

In the cases where I do see more complex data structures, the ones that use the 
more advanced XML features (like mixing namespaces) or free-form JSON options 
(like mixing fixed properties with patternProperties/additionalProperties), the 
data is generally so tied to that format that bridging to Avro makes little to 
no sense.

That doesn’t mean that there is no use case, but it does mean that you’re better 
served by a dedicated parser that emits Avro records. Kind of like I’ve 
dabbled a bit with: https://github.com/opwvhk/avro-conversions


Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

> On 23 Apr 2024, at 15:40, Clemens Vasters via user  
> wrote:
> 
> I don't think you get around either maps or unions for a model where Avro 
> Schema can describe a JSON document originating from an existing producer that isn't 
> aware of Avro Schema being used by the consumer. That is the test I would 
> apply for whether the encoder (or decoder in this case) is practically 
> useful. Avro Binary sufficiently covers the scenario where both parties are 
> known to be implemented with Avro. JSON is primarily useful as a bridge 
> to/from producers and consumers which do not use Avro bits and thus likely 
> not Avro Schema.
> From: Oscar Westra van Holthe - Kind 
> Sent: Tuesday, April 23, 2024 1:10:16 PM
> To: user@avro.apache.org 
> Subject: Re: Avro JSON Encoding
>  
> Hi everyone,
> 
> Having looked a bit more into what I usually see when using JSON to transfer 
> data, I think we can limit cross-format support (what this essentially is) to 
> a common denominator as we can see between Python objects / dicts, Rust 
> structs, Java POJOs/records, and Parquet MessageTypes, just to name a few.
> 
> This essentially boils down to all Avro constructs except for maps and unions 
> other than a single type plus null (i.e., the recent ‘?’ addition to the IDL 
> syntax). It also means we can omit support for most of the esoteric JSON 
> schema constructs, like additionalProperties, patternProperties, 
> if/then/else, etc.
> 
> 
> However, as Ryan noted, it still makes sense to find a way to promote the 
> Avro binary format. Especially the single-message encoding, I’d add: most 
> questions to use JSON, for example, are for single records. Currently 
> however, only the Rust and Java SDKs mention the byte marker for the 
> single-message encoding at all. It’s very much lacking from Python.
> 
> In fact, if we want to promote the use of Avro (and especially its binary 
> format), we must have a better documentation and implementation of the 
> single-message encoding.
> 
> 
> Kind regards,
> Oscar
> 
> -- 
> Oscar Westra van Holthe - Kind 
> 
>> On 19 Apr 2024, at 23:45, Andrew Otto  wrote:
>> 
>> > There's probably a nice balance between a rigorous and interoperable (but 
>> > less customizable) JSON encoding, and trying to accommodate arbitrary JSON 
>> > in the Avro project.
>> 
>> For my own purposes, I'd only need a very limited set of JSON support. For 
>> event streaming, we limit JSONSchema usages to those that can be easily and 
>> explicitly mapped to SQL (Hive, Spark, Flink) type systems, e.g. no 
>> undefined additionalProperties, no union types, etc.
>> 
>> 
>> 
>> 
>> On Fri, Apr 19, 2024 at 11:58 AM Ryan Skraba wrote:
>> Hello!
>> 
>> A bit tongue in cheek: the one advantage of the current Avro JSON
>> encoding is that it drives users rapidly to prefer the binary
>> encoding!  In its current state, Avro isn't really a satisfactory
>> toolkit for JSON interoperability, while it shines for binary
>> interoperability. Using JSON with Avro schemas is pretty unwieldy and
>> a JSON data designer will almost never be entirely satisfied with the
>> JSON "shape" they can get... today it's useful for testing and
>> debugging.
>> 
>> That being said, it's hard to argue with improving this experience
>> where it can help developers that really want to use Avro JSON for
>> data transfer, especially for things accepting JSON where the
>> intention is clearly unambiguous or allowing optional attributes to be
>> missing.  I'd be enthusiastic to see some of these 

Re: Avro JSON Encoding

2024-04-23 Thread Clemens Vasters via user
I don't think you get around either maps or unions for a model where Avro 
Schema can describe a JSON document originating from an existing producer that isn't 
aware of Avro Schema being used by the consumer. That is the test I would apply 
for whether the encoder (or decoder in this case) is practically useful. Avro 
Binary sufficiently covers the scenario where both parties are known to be 
implemented with Avro. JSON is primarily useful as a bridge to/from producers 
and consumers which do not use Avro bits and thus likely not Avro Schema.
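
To make the gap concrete with a hypothetical field of type ["null", "string"]: 
for a non-null value, the current Avro JSON encoding and the JSON a non-Avro 
producer emits differ as follows:

    "name": { "string": "Alice" }    (current Avro JSON encoding)
    "name": "Alice"                  (what a JSON-native producer writes)

An alternate decoding mode would have to accept the second form and use the 
schema to resolve the union.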

From: Oscar Westra van Holthe - Kind 
Sent: Tuesday, April 23, 2024 1:10:16 PM
To: user@avro.apache.org 
Subject: Re: Avro JSON Encoding

Hi everyone,

Having looked a bit more into what I usually see when using JSON to transfer 
data, I think we can limit cross-format support (what this essentially is) to a 
common denominator as we can see between Python objects / dicts, Rust structs, 
Java POJOs/records, and Parquet MessageTypes, just to name a few.

This essentially boils down to all Avro constructs except for maps and unions 
other than a single type plus null (i.e., the recent ‘?’ addition to the IDL 
syntax). It also means we can omit support for most of the esoteric JSON schema 
constructs, like additionalProperties, patternProperties, if/then/else, etc.


However, as Ryan noted, it still makes sense to find a way to promote the Avro 
binary format. Especially the single-message encoding, I’d add: most questions 
about using JSON, for example, concern single records. Currently, however, only the 
Rust and Java SDKs mention the byte marker for the single-message encoding at 
all. It’s very much lacking from Python.

In fact, if we want to promote the use of Avro (and especially its binary 
format), we must have better documentation and implementation of the 
single-message encoding.


Kind regards,
Oscar

--
Oscar Westra van Holthe - Kind 

On 19 Apr 2024, at 23:45, Andrew Otto  wrote:

> There's probably a nice balance between a rigorous and interoperable (but 
> less customizable) JSON encoding, and trying to accommodate arbitrary JSON in 
> the Avro project.

For my own purposes, I'd only need a very limited set of JSON support. For 
event streaming, we limit JSONSchema usages to those that can be easily and 
explicitly mapped to SQL (Hive, Spark, Flink) type systems, e.g. no undefined
additionalProperties, no union types, etc.




On Fri, Apr 19, 2024 at 11:58 AM Ryan Skraba <r...@skraba.com> wrote:
Hello!

A bit tongue in cheek: the one advantage of the current Avro JSON
encoding is that it drives users rapidly to prefer the binary
encoding!  In its current state, Avro isn't really a satisfactory
toolkit for JSON interoperability, while it shines for binary
interoperability. Using JSON with Avro schemas is pretty unwieldy and
a JSON data designer will almost never be entirely satisfied with the
JSON "shape" they can get... today it's useful for testing and
debugging.

That being said, it's hard to argue with improving this experience
where it can help developers that really want to use Avro JSON for
data transfer, especially for things accepting JSON where the
intention is clearly unambiguous or allowing optional attributes to be
missing.  I'd be enthusiastic to see some of these improvements,
especially if we keep the possibility of generating strict Avro JSON
for forwards and backwards compatibility.

My preference would be to avoid adding JSON-specific attributes to the
spec where possible.  Maybe we could consider implementing Avro JSON
"variants" by implementing encoder options, or alternative encorders
for an SDK. There's probably a nice balance between a rigorous and
interoperable (but less customizable) JSON encoding, and trying to
accommodate arbitrary JSON in the Avro project.

All my best and thanks for this analysis -- I'm excited to see where
this leads!  Ryan









On Thu, Apr 18, 2024 at 8:01 PM Oscar Westra van Holthe - Kind
<os...@westravanholthe.nl> wrote:
>
> Thank you Clemens,
>
> This is a very detailed set of proposals, and it looks like it would work.
>
> I do, however, feel we'd need to define a way to encode unions with records. Your 
> proposal lists various options, of which the discriminator option seems most 
> portable to me.
>
> You mention the "displayName" proposal. I don't like that, as it mixes data 
> with UI elements. The discriminator option can specify a fixed or 
> configurable field to hold the type of the record.
>
> Kind regards,
> Oscar
>
>
> --
> Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>
>
> Op do 18 

Re: Avro JSON Encoding

2024-04-23 Thread Oscar Westra van Holthe - Kind
Hi everyone,

Having looked a bit more into what I usually see when using JSON to transfer 
data, I think we can limit cross-format support (what this essentially is) to a 
common denominator as we can see between Python objects / dicts, Rust structs, 
Java POJOs/records, and Parquet MessageTypes, just to name a few.

This essentially boils down to all Avro constructs except for maps and unions 
other than a single type plus null (i.e., the recent ‘?’ addition to the IDL 
syntax). It also means we can omit support for most of the esoteric JSON schema 
constructs, like additionalProperties, patternProperties, if/then/else, etc.
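
In schema terms, that common denominator for optionality is a union of null and 
a single type, e.g. a hypothetical field like

    { "name": "middleName", "type": ["null", "string"], "default": null }

which, if I read the recent IDL addition correctly, is what the ‘?’ shorthand 
(string? middleName;) expands to.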


However, as Ryan noted, it still makes sense to find a way to promote the Avro 
binary format. Especially the single-message encoding, I’d add: most questions 
about using JSON, for example, concern single records. Currently, however, only the 
Rust and Java SDKs mention the byte marker for the single-message encoding at 
all. It’s very much lacking from Python.

In fact, if we want to promote the use of Avro (and especially its binary 
format), we must have better documentation and implementation of the 
single-message encoding.
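
For reference, a sketch of the single-object framing as the specification 
describes it (not normative here):

    C3 01                   2-byte marker
    <8-byte fingerprint>    CRC-64-AVRO fingerprint of the writer schema,
                            little-endian
    <message body>          standard Avro binary encoding of the record

A reader recognizes single-object messages by the marker and looks up the 
writer schema by its fingerprint.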


Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

> On 19 Apr 2024, at 23:45, Andrew Otto  wrote:
> 
> > There's probably a nice balance between a rigorous and interoperable (but 
> > less customizable) JSON encoding, and trying to accommodate arbitrary JSON 
> > in the Avro project.
> 
> For my own purposes, I'd only need a very limited set of JSON support. For 
> event streaming, we limit JSONSchema usages to those that can be easily and 
> explicitly mapped to SQL (Hive, Spark, Flink) type systems, e.g. no undefined 
> additionalProperties, no union types, etc.
> 
> 
> 
> 
> On Fri, Apr 19, 2024 at 11:58 AM Ryan Skraba wrote:
>> Hello!
>> 
>> A bit tongue in cheek: the one advantage of the current Avro JSON
>> encoding is that it drives users rapidly to prefer the binary
>> encoding!  In its current state, Avro isn't really a satisfactory
>> toolkit for JSON interoperability, while it shines for binary
>> interoperability. Using JSON with Avro schemas is pretty unwieldy and
>> a JSON data designer will almost never be entirely satisfied with the
>> JSON "shape" they can get... today it's useful for testing and
>> debugging.
>> 
>> That being said, it's hard to argue with improving this experience
>> where it can help developers that really want to use Avro JSON for
>> data transfer, especially for things accepting JSON where the
>> intention is clearly unambiguous or allowing optional attributes to be
>> missing.  I'd be enthusiastic to see some of these improvements,
>> especially if we keep the possibility of generating strict Avro JSON
>> for forwards and backwards compatibility.
>> 
>> My preference would be to avoid adding JSON-specific attributes to the
>> spec where possible.  Maybe we could consider implementing Avro JSON
>> "variants" by implementing encoder options, or alternative encorders
>> for an SDK. There's probably a nice balance between a rigorous and
>> interoperable (but less customizable) JSON encoding, and trying to
>> accommodate arbitrary JSON in the Avro project.
>> 
>> All my best and thanks for this analysis -- I'm excited to see where
>> this leads!  Ryan
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Thu, Apr 18, 2024 at 8:01 PM Oscar Westra van Holthe - Kind
>> mailto:os...@westravanholthe.nl>> wrote:
>> >
>> > Thank you Clemens,
>> >
>> > This is a very detailed set of proposals, and it looks like it would work.
>> >
>> > I do, however, feel we'd need to define a way to encode unions with records. Your 
>> > proposal lists various options, of which the discriminator option seems 
>> > most portable to me.
>> >
>> > You mention the "displayName" proposal. I don't like that, as it mixes 
>> > data with UI elements. The discriminator option can specify a fixed or 
>> > configurable field to hold the type of the record.
>> >
>> > Kind regards,
>> > Oscar
>> >
>> >
>> > --
>> > Oscar Westra van Holthe - Kind > > >
>> >
>> > On Thu, 18 Apr 2024 at 10:12, Clemens Vasters via user 
>> > <user@avro.apache.org> wrote:
>> >>
>> >> Hi everyone,
>> >>
>> >>
>> >>
>> >> the current JSON Encoding approach severely limits interoperability with 
>> >> other JSON serialization frameworks. In my view, the JSON Encoding is 
>> >> only really useful if it acts as a bridge into and from JSON-centric 
>> >> applications and it currently gets in its own way.
>> >>
>> >>
>> >>
>> >> The current encoding being what it is, there should be an alternate mode 
>> >> that emphasizes interoperability with JSON “as-is” and allows Avro Schema 
>> >> to describe existing JSON 

Re: Avro JSON Encoding

2024-04-19 Thread Andrew Otto
> There's probably a nice balance between a rigorous and interoperable (but
less customizable) JSON encoding, and trying to accommodate arbitrary JSON
in the Avro project.

For my own purposes, I'd only need a very limited set of JSON support. For
event streaming, we limit JSONSchema usages to those that can be easily and
explicitly mapped to SQL (Hive, Spark, Flink) type systems, e.g. no undefined
additionalProperties, no union types, etc.




On Fri, Apr 19, 2024 at 11:58 AM Ryan Skraba  wrote:

> Hello!
>
> A bit tongue in cheek: the one advantage of the current Avro JSON
> encoding is that it drives users rapidly to prefer the binary
> encoding!  In its current state, Avro isn't really a satisfactory
> toolkit for JSON interoperability, while it shines for binary
> interoperability. Using JSON with Avro schemas is pretty unwieldy and
> a JSON data designer will almost never be entirely satisfied with the
> JSON "shape" they can get... today it's useful for testing and
> debugging.
>
> That being said, it's hard to argue with improving this experience
> where it can help developers that really want to use Avro JSON for
> data transfer, especially for things accepting JSON where the
> intention is clearly unambiguous or allowing optional attributes to be
> missing.  I'd be enthusiastic to see some of these improvements,
> especially if we keep the possibility of generating strict Avro JSON
> for forwards and backwards compatibility.
>
> My preference would be to avoid adding JSON-specific attributes to the
> spec where possible.  Maybe we could consider implementing Avro JSON
> "variants" by implementing encoder options, or alternative encorders
> for an SDK. There's probably a nice balance between a rigorous and
> interoperable (but less customizable) JSON encoding, and trying to
> accommodate arbitrary JSON in the Avro project.
>
> All my best and thanks for this analysis -- I'm excited to see where
> this leads!  Ryan
>
>
>
>
>
>
>
>
>
> On Thu, Apr 18, 2024 at 8:01 PM Oscar Westra van Holthe - Kind
>  wrote:
> >
> > Thank you Clemens,
> >
> > This is a very detailed set of proposals, and it looks like it would
> work.
> >
> > I do, however, feel we'd need to define a way to encode unions with records.
> Your proposal lists various options, of which the discriminator option
> seems most portable to me.
> >
> > You mention the "displayName" proposal. I don't like that, as it mixes
> data with UI elements. The discriminator option can specify a fixed or
> configurable field to hold the type of the record.
> >
> > Kind regards,
> > Oscar
> >
> >
> > --
> > Oscar Westra van Holthe - Kind 
> >
> > On Thu, 18 Apr 2024 at 10:12, Clemens Vasters via user <
> user@avro.apache.org> wrote:
> >>
> >> Hi everyone,
> >>
> >>
> >>
> >> the current JSON Encoding approach severely limits interoperability
> with other JSON serialization frameworks. In my view, the JSON Encoding is
> only really useful if it acts as a bridge into and from JSON-centric
> applications and it currently gets in its own way.
> >>
> >>
> >>
> >> The current encoding being what it is, there should be an alternate
> mode that emphasizes interoperability with JSON “as-is” and allows Avro
> Schema to describe existing JSON document instances such that I can take
> someone’s existing JSON document in on one side of a piece of software and
> emit Avro binary on the other side while acting on the same schema.
> >>
> >>
> >>
> >> There are four specific issues:
> >>
> >>
> >>
> >> Binary Values
> >> Unions with Primitive Type Values and Enum Values
> >> Unions with Record Values
> >> DateTime
> >>
> >>
> >>
> >> One by one:
> >>
> >>
> >>
> >> 1. Binary values:
> >>
> >> -
> >>
> >>
> >>
> >> Binary values (fixed and bytes) are encoded as escaped unicode
> literals. While I appreciate the creative trick, it costs 6 bytes for each
> encoded byte. I have a hard time finding any JSON libraries that provide a
> conversion of such strings from/to byte arrays, so this approach appears to
> be idiosyncratic for Avro’s JSON Encoding.
> >>
> >>
> >>
> >> The common way to encode binary in JSON is to use base64 encoding and
> that is widely and well supported in libraries. Base64 is 33% larger than
> plain bytes, the encoding chosen here is 500% (!) larger than plain bytes.
> >>
> >>
> >>
> >> The Avro decoder is schema-informed and it knows that a field is
> expected to hold bytes, so it’s easy to mandate base64 for the field
> content in the alternate mode.
> >>
> >>
> >>
> >> 2. Unions with Primitive Type Values and Enum Values
> >>
> >> -
> >>
> >>
> >>
> >> It’s common to express optionality in Avro Schema by creating a union
> with the “null” type, e.g. [“string”, “null”]. The Avro JSON Encoding opts
> to 

Re: Avro JSON Encoding

2024-04-19 Thread Clemens Vasters via user
Thank you, Ryan. I am specifically trying to avoid JSON specific attributes 
that would not be otherwise useful (hence "const" and "displayname") and I do 
indeed imagine the alternate encoding to be selected by a new switch on the 
encoders.

Sent from Outlook for iOS

From: Ryan Skraba 
Sent: Friday, April 19, 2024 5:57:37 PM
To: user@avro.apache.org 
Cc: Clemens Vasters 
Subject: Re: Avro JSON Encoding


Hello!

A bit tongue in cheek: the one advantage of the current Avro JSON
encoding is that it drives users rapidly to prefer the binary
encoding!  In its current state, Avro isn't really a satisfactory
toolkit for JSON interoperability, while it shines for binary
interoperability. Using JSON with Avro schemas is pretty unwieldy and
a JSON data designer will almost never be entirely satisfied with the
JSON "shape" they can get... today it's useful for testing and
debugging.

That being said, it's hard to argue with improving this experience
where it can help developers that really want to use Avro JSON for
data transfer, especially for things accepting JSON where the
intention is clearly unambiguous or allowing optional attributes to be
missing.  I'd be enthusiastic to see some of these improvements,
especially if we keep the possibility of generating strict Avro JSON
for forwards and backwards compatibility.

My preference would be to avoid adding JSON-specific attributes to the
spec where possible.  Maybe we could consider implementing Avro JSON
"variants" by implementing encoder options, or alternative encorders
for an SDK. There's probably a nice balance between a rigorous and
interoperable (but less customizable) JSON encoding, and trying to
accommodate arbitrary JSON in the Avro project.

All my best and thanks for this analysis -- I'm excited to see where
this leads!  Ryan









On Thu, Apr 18, 2024 at 8:01 PM Oscar Westra van Holthe - Kind
 wrote:
>
> Thank you Clemens,
>
> This is a very detailed set of proposals, and it looks like it would work.
>
> I do, however, feel we'd need to define a way to encode unions with records. Your 
> proposal lists various options, of which the discriminator option seems most 
> portable to me.
>
> You mention the "displayName" proposal. I don't like that, as it mixes data 
> with UI elements. The discriminator option can specify a fixed or 
> configurable field to hold the type of the record.
>
> Kind regards,
> Oscar
>
>
> --
> Oscar Westra van Holthe - Kind 
>
> On Thu, 18 Apr 2024 at 10:12, Clemens Vasters via user 
> wrote:
>>
>> Hi everyone,
>>
>>
>>
>> the current JSON Encoding approach severely limits interoperability with 
>> other JSON serialization frameworks. In my view, the JSON Encoding is only 
>> really useful if it acts as a bridge into and from JSON-centric applications 
>> and it currently gets in its own way.
>>
>>
>>
>> The current encoding being what it is, there should be an alternate mode 
>> that emphasizes interoperability with JSON “as-is” and allows Avro Schema to 
>> describe existing JSON document instances such that I can take someone’s 
>> existing JSON document in on one side of a piece of software and emit Avro 
>> binary on the other side while acting on the same schema.
>>
>>
>>
>> There are four specific issues:
>>
>>
>>
>> Binary Values
>> Unions with Primitive Type Values and Enum Values
>> Unions with Record Values
>> DateTime
>>
>>
>>
>> One by one:
>>
>>
>>
>> 1. Binary values:
>>
>> -
>>
>>
>>
>> Binary values (fixed and bytes) are encoded as escaped unicode literals. 
>> While I appreciate the creative trick, it costs 6 bytes for each encoded 
>> byte. I have a hard time finding any JSON libraries that provide a 
>> conversion of such strings from/to byte arrays, so this approach appears to 
>> be idiosyncratic for Avro’s JSON Encoding.
>>
>>
>>
>> The common way to encode binary in JSON is to use base64 encoding and that 
>> is widely and well supported in libraries. Base64 is 33% larger than plain 
>> bytes, the encoding chosen here is 500% (!) larger than plain bytes.
>>
>>
>>
>> The Avro decoder is schema-informed and it knows that a field is expected to 
>> hold bytes, so it’s easy to mandate base64 for the field content in the 
>> alternate mode.
>>
>>
>>
>> 2. Unions with Primitive Type Values and Enum Values
>>
>> -
>>
>>
>>
>> It’s common to express optionality in Avro Schema by creating a union with 
>> the “null” type, e.g. [“string”, “null”]. The Avro JSON Encoding opts to 
>> encode such unions, like any union, as { “{type}”: {value} } when the value 
>> is non-null.
>>
>>
>>
>> This choice ignores common practice and the fact that JSON’s values are 
>> dynamically typed (RFC8259 Section-3) and inherently accommodate unions. The 
>> conformant way to encode a value choice of null or “string” into a JSON 
>> value is plainly null and “string”.
>>
>>
>>
>> “foo” : null
>>
>> “foo”: “value”
>>
>>
>>
>> The “field default values” table in the Avro spec maps Avro types to the 
>> JSON types null, boolean, integer, number, string, object, and array, all of 
>> which can be encoded into and, more importantly, unambiguously decoded from 
>> a JSON value. The only semi-ambiguous case is integer vs. number, which is a 
>> convention in JSON rather than a distinct type, but any Avro serializer is 
>> guided by type information and can easily make that distinction.
>>
>>
>>
>> 3. 

Re: Avro JSON Encoding

2024-04-18 Thread Clemens Vasters via user
The discriminator is "const".

I added "displayname" because I also have other scenarios for it and it appears 
like a good workaround for alias names that do not fit the "name", e.g. 
"$type". I am not passionate about "displayname", but "alias" is taken and it's 
going to be a user-supplied name in other scenarios. Two attributes with 
similar functions would be a bit much.

To illustrate how I imagine "const" working:

[
  {
    "type": "record",
    "fields": [
      {
        "name": "typename",
        "type": "string",
        "const": "cat"
      },
      ... cat things ...
    ]
  },
  {
    "type": "record",
    "fields": [
      {
        "name": "typename",
        "type": "string",
        "const": "dog"
      },
      ... dog things ...
    ]
  }
]

(Sorry about formatting being bad, did that on the phone)

To handle anyone's JSON the decoder will still have to support the duck typing 
that JSON Schema needs to do for oneOf, but the "const" declaration provides a 
cheap first option to test for before having to probe the whole structure. So 
even though the model is technically duck typing, the const declaration 
shortcuts it completely in an efficient implementation that looks there first.

You would use "const" on the fields that other frameworks designate as the 
discriminator and the value would be whatever is set by the publisher to 
identify the type they write.

With "displayname", assuming the publisher uses "$type" as the discriminator:

[
  {
    "type": "record",
    "fields": [
      {
        "name": "typename",
        "displayname": "$type",
        "type": "string",
        "const": "cat"
      },
      ... cat things ...
    ]
  },
  {
    "type": "record",
    "fields": [
      {
        "name": "typename",
        "displayname": "$type",
        "type": "string",
        "const": "dog"
      },
      ... dog things ...
    ]
  }
]
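
For illustration, instances matching the two variants above would then look
like this on the wire (the non-discriminator fields are invented here):

{ "typename": "cat", "lives": 9 }      <- first variant, declared field name
{ "$type": "dog", "barks": true }      <- second variant, via "displayname"

A decoder that knows the union can look at the discriminator value first and
only fall back to full duck typing when no "const" matches.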



From: Oscar Westra van Holthe - Kind 
Sent: Thursday, April 18, 2024 8:00 PM
To: user@avro.apache.org ; Clemens Vasters 

Subject: Re: Avro JSON Encoding

Thank you Clemens,

This is a very detailed set of proposals, and it looks like it would work.

I do, however, feel we'd need to define a way to handle unions with records. Your 
proposal lists various options, of which the discriminator option seems most 
portable to me.

You mention the "displayName" proposal. I don't like that, as it mixes data 
with UI elements. The discriminator option can specify a fixed or configurable 
field to hold the type of the record.

Kind regards,
Oscar


--
Oscar Westra van Holthe - Kind 
<os...@westravanholthe.nl>

On Thu, 18 Apr 2024 at 10:12, Clemens Vasters via user 
<user@avro.apache.org> wrote:
Hi everyone,

the current JSON Encoding approach severely limits interoperability with other 
JSON serialization frameworks. In my view, the JSON Encoding is only really 
useful if it acts as a bridge into and from JSON-centric applications and it 
currently gets in its own way.

The current encoding being what it is, there should be an alternate mode that 
emphasizes interoperability with JSON “as-is” and allows Avro Schema to 
describe existing JSON document instances such that I can take someone’s 
existing JSON document in on one side of a piece of software and emit Avro 
binary on the other side while acting on the same schema.

There are four specific issues:


  1.  Binary Values
  2.  Unions with Primitive Type Values and Enum Values
  3.  Unions with Record Values
  4.  DateTime

One by one:

1. Binary values:
-

Binary values (fixed and bytes) are encoded as escaped unicode literals. 
While I appreciate the creative trick, it costs 6 bytes for each encoded byte. 
I have a hard time finding any JSON libraries that provide a conversion of such 
strings from/to byte arrays, so this approach appears to be idiosyncratic for 
Avro’s JSON Encoding.

The common way to encode binary in JSON is to use base64 encoding and that is 
widely and well supported in libraries. Base64 is 33% larger than plain bytes, 
the encoding chosen here is 500% (!) larger than plain bytes.

The Avro decoder is schema-informed and it knows that a field is expected to 
hold bytes, so it’s easy to mandate base64 for the field content in the 
alternate mode.

2. Unions with Primitive Type Values and Enum Values
-

It’s common to express optionality in Avro Schema by creating a union with the 
“null” type, e.g. [“string”, “null”]. The Avro JSON Encoding opts to encode 
such unions, like any union, as { “{type}”: {value} } when the value is 
non-null.

This choice ignores common practice and the fact that JSON’s 

Re: Avro JSON Encoding

2024-04-18 Thread Oscar Westra van Holthe - Kind
Thank you Clemens,

This is a very detailed set of proposals, and it looks like it would work.

I do, however, feel we'd need to define a way to handle unions with records. Your
proposal lists various options, of which the discriminator option seems
most portable to me.

You mention the "displayName" proposal. I don't like that, as it mixes data
with UI elements. The discriminator option can specify a fixed or
configurable field to hold the type of the record.

Kind regards,
Oscar


-- 
Oscar Westra van Holthe - Kind 

On Thu, 18 Apr 2024 at 10:12, Clemens Vasters via user <
user@avro.apache.org> wrote:

> Hi everyone,
>
>
>
> the current JSON Encoding approach severely limits interoperability with
> other JSON serialization frameworks. In my view, the JSON Encoding is only
> really useful if it acts as a bridge into and from JSON-centric
> applications and it currently gets in its own way.
>
>
>
> The current encoding being what it is, there should be an alternate mode
> that emphasizes interoperability with JSON “as-is” and allows Avro Schema
> to describe existing JSON document instances such that I can take someone’s
> existing JSON document in on one side of a piece of software and emit Avro
> binary on the other side while acting on the same schema.
>
>
>
> There are four specific issues:
>
>
>
>1. Binary Values
>2. Unions with Primitive Type Values and Enum Values
>3. Unions with Record Values
>4. DateTime
>
>
>
> One by one:
>
>
>
> 1. Binary values:
>
> -
>
>
>
> Binary values (fixed and bytes) are encoded as escaped unicode
> literals. While I appreciate the creative trick, it costs 6 bytes for each
> encoded byte. I have a hard time finding any JSON libraries that provide a
> conversion of such strings from/to byte arrays, so this approach appears to
> be idiosyncratic for Avro’s JSON Encoding.
>
>
>
> The common way to encode binary in JSON is to use base64 encoding and that
> is widely and well supported in libraries. Base64 is 33% larger than plain
> bytes, the encoding chosen here is 500% (!) larger than plain bytes.
>
>
>
> The Avro decoder is schema-informed and it knows that a field is expected
> to hold bytes, so it’s easy to mandate base64 for the field content in the
> alternate mode.
>
>
>
> 2. Unions with Primitive Type Values and Enum Values
>
> -
>
>
>
> It’s common to express optionality in Avro Schema by creating a union with
> the “null” type, e.g. [“string”, “null”]. The Avro JSON Encoding opts to
> encode such unions, like any union, as { “{type}”: {value} } when the value
> is non-null.
>
>
>
> This choice ignores common practice and the fact that JSON’s values are
> dynamically typed (RFC8259 Section-3) and inherently
> accommodate unions. The conformant way to encode a value choice of null or
> “string” into a JSON value is plainly null and “string”.
>
>
>
> “foo” : null
>
> “foo”: “value”
>
>
>
> The “field default values” table in the Avro spec maps Avro types to the
> JSON types null, boolean, integer, number, string, object, and array, all
> of which can be encoded into and, more importantly, unambiguously decoded
> from a JSON value. The only semi-ambiguous case is integer vs. number,
> which is a convention in JSON rather than a distinct type, but any Avro
> serializer is guided by type information and can easily make that
> distinction.
>
>
>
> 3. Unions with Record Values
>
> -
>
>
>
> The JSON Encoding pattern of unions also covers “record” typed values, of
> course, and this is indeed a tricky scenario during deserialization since
> JSON does not have any built-in notion of type hints for “object” typed
> values.
>
>
>
> The problem of having to disambiguate instances of different types in a
> field value is a common one also for users of JSON Schema when using the
> “oneOf” construct, which is equivalent to Avro unions. There are two common
> strategies:
>
>
>
> - “Duck Typing”:  Every conformant JSON Schema Validator determines the
> validity of a JSON node against a “oneOf" rule by testing the instance
> against all available alternative schema definitions. Validation fails if
> there is not exactly one valid match.
>
> - Discriminators: OpenAPI, for instance, mandates a “discriminator” field
> (see https://spec.openapis.org/oas/latest.html#discriminator-object) for
> disambiguating “oneOf” constructs, whereby the discriminator property is
> part of each instance. That approach informs numerous JSON serialization
> frameworks, which implement discriminators under that assumption.
>
>
>
> The Java Jackson library indeed supports the Avro JSON Encoding’s style of
> putting the discriminator into a wrapper field name (JsonTypeInfo
> annotation, JsonTypeInfo.As.WRAPPER_OBJECT). Many other frameworks only
> support the property approach, though, including the two dominant ones for
> .NET, Pydantic for Python, and others. There’s tooling like 

RE: Avro JSON Encoding

2024-04-18 Thread Clemens Vasters via user
I literally do the “FWIW” here: 
https://github.com/clemensv/avrotize?tab=readme-ov-file#convert-json-schema-to-avro-schema

From: Andrew Otto 
Sent: Thursday, April 18, 2024 2:24 PM
To: user@avro.apache.org
Cc: Clemens Vasters 
Subject: Re: Avro JSON Encoding

This is a great proposal.  At the Wikimedia Foundation, we've explicitly chosen 
to use JSON as our streaming serialization format. We considered using Avro 
JSON, but the need to use an Avro specific 
serialization for nullable types was the main reason we chose not to do so.  
We'd love to be able to more automatically convert between JSON and Avro 
Binary, and a proposal like this should allow us to do so!

> The conformant way to encode a value choice of null or “string” into a JSON 
> value is plainly null and “string”.
This is true, but we decided to do this in a different way.  In JSONSchema, 
'optional' fields are marked as such by not including them in the list of 
required fields.  So, instead of explicitly encoding an optional field value as 
'null', producers omit the field entirely.  When converting to different type 
systems (Flink, Spark, etc.) our converters explicitly always use the 
JSONSchema,
 so we know if a field should be present and nulled, even if it is omitted in 
the incoming record data.

FWIW, I believe this proposal could make JSONSchema and Avro Schemas equivalent 
(enough) to automatically generate one from the other, and use Avro libs to 
serialize/deserialize JSON directly.  Very cool!

-Andrew Otto
 Wikimedia Foundation



On Thu, Apr 18, 2024 at 4:17 AM Jean-Baptiste Onofré 
<j...@nanthrax.net> wrote:
Hi Clemens

Thanks for the detailed email.

Quick question: did you already create a Jira for each improvement/issue?

I will take the time to read asap.

Thanks
Regards
JB

On Thu, 18 Apr 2024 at 09:12, Clemens Vasters via user 
<user@avro.apache.org> wrote:
Hi everyone,

the current JSON Encoding approach severely limits interoperability with other 
JSON serialization frameworks. In my view, the JSON Encoding is only really 
useful if it acts as a bridge into and from JSON-centric applications and it 
currently gets in its own way.

The current encoding being what it is, there should be an alternate mode that 
emphasizes interoperability with JSON “as-is” and allows Avro Schema to 
describe existing JSON document instances such that I can take someone’s 
existing JSON document in on one side of a piece of software and emit Avro 
binary on the other side while acting on the same schema.

There are four specific issues:


  1.  Binary Values
  2.  Unions with Primitive Type Values and Enum Values
  3.  Unions with Record Values
  4.  DateTime

One by one:

1. Binary values:
-

Binary values (fixed and bytes) are encoded as escaped unicode literals. 
While I appreciate the creative trick, it costs 6 bytes for each encoded byte. 
I have a hard time finding any JSON libraries that provide a conversion of such 
strings from/to byte arrays, so this approach appears to be idiosyncratic for 
Avro’s JSON Encoding.

The common way to encode binary in JSON is to use base64 encoding and that is 
widely and well supported in libraries. Base64 is 33% larger than plain bytes, 
the encoding chosen here is 500% (!) larger than plain bytes.

The Avro decoder is schema-informed and it knows that a field is expected to 
hold bytes, so it’s easy to mandate base64 for the field content in the 
alternate mode.

2. Unions with Primitive Type Values and Enum Values
-

It’s common to express optionality in Avro Schema by creating a union with the 
“null” type, e.g. [“string”, “null”]. The Avro JSON Encoding opts to encode 
such unions, like any union, as { “{type}”: {value} } when the value is 
non-null.

This choice ignores common practice and the fact that JSON’s values are 
dynamically typed (RFC8259 
Section-3) and inherently 
accommodate unions. The conformant way to encode a value choice of null or 
“string” into a JSON value is plainly null and “string”.

“foo” : null
“foo”: “value”

The “field default values” table in the Avro spec maps Avro types to the JSON 
types null, boolean, integer, number, string, object, and array, all of which 
can be encoded into and, more importantly, unambiguously decoded from a JSON 
value. The only semi-ambiguous case is integer vs. number, which is a 
convention in JSON rather than a distinct type, but any Avro serializer is 
guided by 

Re: Avro JSON Encoding

2024-04-18 Thread Andrew Otto
This is a great proposal.  At the Wikimedia Foundation, we've explicitly
chosen to use JSON as our streaming serialization format. We considered using 
Avro JSON, but the need to use an Avro specific
serialization for nullable types was the main reason we chose not to do
so.  We'd love to be able to more automatically convert between JSON and
Avro Binary, and a proposal like this should allow us to do so!

> The conformant way to encode a value choice of null or “string” into a
JSON value is plainly null and “string”.
This is true, but we decided to do this in a different way.  In JSONSchema,
'optional' fields are marked as such by not including them in the list of
required fields.  So, instead of explicitly encoding an optional field
value as 'null', producers omit the field entirely.  When converting to
different type systems (Flink, Spark, etc.) our converters explicitly
always use the JSONSchema, so we know if a field should be present and 
nulled, even if it is omitted
in the incoming record data.
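
For reference, a minimal JSON Schema sketch of that convention (the field
names are invented here): "name" is optional simply because it is absent from
"required", so a producer may omit it entirely.

{
  "type": "object",
  "required": ["id"],
  "properties": {
    "id": { "type": "integer" },
    "name": { "type": "string" }
  }
}

Both { "id": 1 } and { "id": 1, "name": "x" } validate against this schema.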

FWIW, I believe this proposal could make JSONSchema and Avro Schemas
equivalent (enough) to automatically generate one from the other, and use
Avro libs to serialize/deserialize JSON directly.  Very cool!

-Andrew Otto
 Wikimedia Foundation



On Thu, Apr 18, 2024 at 4:17 AM Jean-Baptiste Onofré 
wrote:

> Hi Clemens
>
> Thanks for the detailed email.
>
> Quick question: did you already create a Jira for each
> improvement/issue?
>
> I will take the time to read asap.
>
> Thanks
> Regards
> JB
>
> On Thu, 18 Apr 2024 at 09:12, Clemens Vasters via user <
> user@avro.apache.org> wrote:
>
>> Hi everyone,
>>
>>
>>
>> the current JSON Encoding approach severely limits interoperability with
>> other JSON serialization frameworks. In my view, the JSON Encoding is only
>> really useful if it acts as a bridge into and from JSON-centric
>> applications and it currently gets in its own way.
>>
>>
>>
>> The current encoding being what it is, there should be an alternate mode
>> that emphasizes interoperability with JSON “as-is” and allows Avro Schema
>> to describe existing JSON document instances such that I can take someone’s
>> existing JSON document in on one side of a piece of software and emit Avro
>> binary on the other side while acting on the same schema.
>>
>>
>>
>> There are four specific issues:
>>
>>
>>
>>1. Binary Values
>>2. Unions with Primitive Type Values and Enum Values
>>3. Unions with Record Values
>>4. DateTime
>>
>>
>>
>> One by one:
>>
>>
>>
>> 1. Binary values:
>>
>> -
>>
>>
>>
>> Binary values (fixed and bytes) are encoded as escaped unicode
>> literals. While I appreciate the creative trick, it costs 6 bytes for each
>> encoded byte. I have a hard time finding any JSON libraries that provide a
>> conversion of such strings from/to byte arrays, so this approach appears to
>> be idiosyncratic for Avro’s JSON Encoding.
>>
>>
>>
>> The common way to encode binary in JSON is to use base64 encoding and
>> that is widely and well supported in libraries. Base64 is 33% larger than
>> plain bytes, the encoding chosen here is 500% (!) larger than plain bytes.
>>
>>
>>
>> The Avro decoder is schema-informed and it knows that a field is expected
>> to hold bytes, so it’s easy to mandate base64 for the field content in the
>> alternate mode.
>>
>>
>>
>> 2. Unions with Primitive Type Values and Enum Values
>>
>> -
>>
>>
>>
>> It’s common to express optionality in Avro Schema by creating a union
>> with the “null” type, e.g. [“string”, “null”]. The Avro JSON Encoding opts
>> to encode such unions, like any union, as { “{type}”: {value} } when the
>> value is non-null.
>>
>>
>>
>> This choice ignores common practice and the fact that JSON’s values are
>> dynamically typed (RFC8259 Section-3) and inherently
>> accommodate unions. The conformant way to encode a value choice of null or
>> “string” into a JSON value is plainly null and “string”.
>>
>>
>>
>> “foo” : null
>>
>> “foo”: “value”
>>
>>
>>
>> The “field default values” table in the Avro spec maps Avro types to the
>> JSON types null, boolean, integer, number, string, object, and array, all
>> of which can be encoded into and, more importantly, unambiguously decoded
>> from a JSON value. The only semi-ambiguous case is integer vs. number,
>> which is a convention in JSON rather than a distinct type, but any Avro
>> serializer is guided by type information and can easily make that
>> distinction.
>>
>>
>>
>> 3. Unions with Record Values
>>
>> -
>>
>>
>>
>> The JSON Encoding pattern of unions also covers “record” typed values, of
>> 

"Avrotize" tool

2024-04-18 Thread Clemens Vasters via user
Hi everyone,

I'm interested in feedback on the "Avrotize" tool:

Git: https://github.com/clemensv/avrotize  
PyPI: https://pypi.org/project/avrotize/

Avrotize is a command-line tool for converting data structure definitions 
between different schema formats, using Apache Avro Schema as the integration 
schema model.

You can use the tool to convert between Avro Schema and other schema formats 
like JSON Schema, XML Schema (XSD), Protocol Buffers (Protobuf), ASN.1, and 
database schema formats like Apache Parquet files, Kusto Data Table Definition 
(KQL) and T-SQL Table Definition (MSSQL Server). I'm aiming to support more 
schemas, especially for databases. 

With this, you can also convert from JSON Schema to Protobuf going via Avro 
Schema.
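
For example, that pivot is a two-step invocation on the command line (a sketch
with invented file names; see the README for the exact subcommands and flags):

avrotize j2a account.jsonschema > account.avsc    # JSON Schema -> Avro Schema
avrotize a2p account.avsc > account.proto         # Avro Schema -> Protobuf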

You can also generate C#, Java, TypeScript, JavaScript, and Python code from 
the Avro Schema documents. The difference from the native Avro tools is that 
Avrotize can emit data classes without Avro library dependencies and, 
optionally, with annotations for JSON serialization libraries like Jackson or 
System.Text.Json. The C# code generator is furthest along in terms of 
serialization helper capabilities, but I'll bring the Java version up to that 
level next week. 

The JSON Schema to Avro Schema conversion has its own page: 
https://github.com/clemensv/avrotize/blob/master/jsonschema.md

Best Regards
Clemens


Clemens Vasters
Messaging Platform Architect
Microsoft Azure
cleme...@microsoft.com   
European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 | 80539 
Munich| Germany 
Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff 
Amtsgericht Aachen, HRB 12066




Re: Avro JSON Encoding

2024-04-18 Thread Jean-Baptiste Onofré
Hi Clemens,

I propose to wait a bit to give a chance to the community to review
your email and points.

Then, we will create the Jira accordingly.

Regards
JB

On Thu, Apr 18, 2024 at 9:20 AM Clemens Vasters  wrote:
>
> Hi JB,
>
>
>
> I have not done that yet. I’m happy to break that up into items once I get 
> the sense that this is a direction we can get to a consensus on.
>
>
>
> Shall I file the whole email as a “New Feature” issue first?
>
>
>
> Thanks
>
> Clemens
>
>
>
> From: Jean-Baptiste Onofré 
> Sent: Thursday, April 18, 2024 10:17 AM
> To: Clemens Vasters ; user@avro.apache.org
> Subject: Re: Avro JSON Encoding
>
>
>
> Hi Clemens
>
>
>
> Thanks for the detailed email.
>
>
>
> Quick question: did you already create a Jira for each improvement/issue?
>
>
>
> I will take the time to read asap.
>
>
>
> Thanks
>
> Regards
>
> JB
>
>
>
> On Thu, 18 Apr 2024 at 09:12, Clemens Vasters via user
> wrote:
>
> Hi everyone,
>
>
>
> the current JSON Encoding approach severely limits interoperability with 
> other JSON serialization frameworks. In my view, the JSON Encoding is only 
> really useful if it acts as a bridge into and from JSON-centric applications 
> and it currently gets in its own way.
>
>
>
> The current encoding being what it is, there should be an alternate mode that 
> emphasizes interoperability with JSON “as-is” and allows Avro Schema to 
> describe existing JSON document instances such that I can take someone’s 
> existing JSON document in on one side of a piece of software and emit Avro 
> binary on the other side while acting on the same schema.
>
>
>
> There are four specific issues:
>
>
>
> Binary Values
> Unions with Primitive Type Values and Enum Values
> Unions with Record Values
> DateTime
>
>
>
> One by one:
>
>
>
> 1. Binary values:
>
> -
>
>
>
> Binary values (fixed and bytes) are encoded as escaped unicode literals. 
> While I appreciate the creative trick, it costs 6 bytes for each encoded 
> byte. I have a hard time finding any JSON libraries that provide a conversion 
> of such strings from/to byte arrays, so this approach appears to be 
> idiosyncratic for Avro’s JSON Encoding.
>
>
>
> The common way to encode binary in JSON is to use base64 encoding and that is 
> widely and well supported in libraries. Base64 is 33% larger than plain 
> bytes, the encoding chosen here is 500% (!) larger than plain bytes.
>
>
>
> The Avro decoder is schema-informed and it knows that a field is expected to 
> hold bytes, so it’s easy to mandate base64 for the field content in the 
> alternate mode.
>
>
>
> 2. Unions with Primitive Type Values and Enum Values
>
> -
>
>
>
> It’s common to express optionality in Avro Schema by creating a union with 
> the “null” type, e.g. [“string”, “null”]. The Avro JSON Encoding opts to 
> encode such unions, like any union, as { “{type}”: {value} } when the value 
> is non-null.
>
>
>
> This choice ignores common practice and the fact that JSON’s values are 
> dynamically typed (RFC8259 Section-3) and inherently accommodate unions. The 
> conformant way to encode a value choice of null or “string” into a JSON value 
> is plainly null and “string”.
>
>
>
> “foo” : null
>
> “foo”: “value”
>
>
>
> The “field default values” table in the Avro spec maps Avro types to the JSON 
> types null, boolean, integer, number, string, object, and array, all of which 
> can be encoded into and, more importantly, unambiguously decoded from a JSON 
> value. The only semi-ambiguous case is integer vs. number, which is a 
> convention in JSON rather than a distinct type, but any Avro serializer is 
> guided by type information and can easily make that distinction.
>
>
>
> 3. Unions with Record Values
>
> -
>
>
>
> The JSON Encoding pattern of unions also covers “record” typed values, of 
> course, and this is indeed a tricky scenario during deserialization since 
> JSON does not have any built-in notion of type hints for “object” typed 
> values.
>
>
>
> The problem of having to disambiguate instances of different types in a field 
> value is a common one also for users of JSON Schema when using the “oneOf” 
> construct, which is equivalent to Avro unions. There are two common 
> strategies:
>
>
>
> - “Duck Typing”:  Every conformant JSON Schema Validator determines the 
> validity of a JSON node against a “oneOf" rule by testing the instance 
> against all available alternative schema definitions. Validation fails if 
> there is not exactly one valid match.
>
> - Discriminators: OpenAPI, for instance, mandates a “discriminator” field 
> (see https://spec.openapis.org/oas/latest.html#discriminator-object) for 
> disambiguating “oneOf” constructs, whereby the discriminator property is part 
> of each instance. That approach informs numerous JSON serialization 
> frameworks, which implement discriminators under that assumption.
>
>
>
> The Java Jackson library indeed supports the Avro JSON Encoding’s 

RE: Avro JSON Encoding

2024-04-18 Thread Clemens Vasters via user
Hi JB,

I have not done that yet. I'm happy to break that up into items once I get the 
sense that this is a direction we can get to a consensus on.

Shall I file the whole email as a "New Feature" issue first?

Thanks
Clemens

From: Jean-Baptiste Onofré 
Sent: Thursday, April 18, 2024 10:17 AM
To: Clemens Vasters ; user@avro.apache.org
Subject: Re: Avro JSON Encoding

Hi Clemens

Thanks for the detailed email.

Quick question: did you already create a Jira for each improvement/issue?

I will take the time to read asap.

Thanks
Regards
JB

On Thu, 18 Apr 2024 at 09:12, Clemens Vasters via user 
<user@avro.apache.org> wrote:
Hi everyone,

the current JSON Encoding approach severely limits interoperability with other 
JSON serialization frameworks. In my view, the JSON Encoding is only really 
useful if it acts as a bridge into and from JSON-centric applications and it 
currently gets in its own way.

The current encoding being what it is, there should be an alternate mode that 
emphasizes interoperability with JSON "as-is" and allows Avro Schema to 
describe existing JSON document instances such that I can take someone's 
existing JSON document in on one side of a piece of software and emit Avro 
binary on the other side while acting on the same schema.

There are four specific issues:


  1.  Binary Values
  2.  Unions with Primitive Type Values and Enum Values
  3.  Unions with Record Values
  4.  DateTime

One by one:

1. Binary values:
-

Binary values (fixed and bytes) are encoded as escaped unicode literals. 
While I appreciate the creative trick, it costs 6 bytes for each encoded byte. 
I have a hard time finding any JSON libraries that provide a conversion of such 
strings from/to byte arrays, so this approach appears to be idiosyncratic for 
Avro's JSON Encoding.

The common way to encode binary in JSON is to use base64 encoding and that is 
widely and well supported in libraries. Base64 is 33% larger than plain bytes, 
the encoding chosen here is 500% (!) larger than plain bytes.

The Avro decoder is schema-informed and it knows that a field is expected to 
hold bytes, so it's easy to mandate base64 for the field content in the 
alternate mode.

2. Unions with Primitive Type Values and Enum Values
-

It's common to express optionality in Avro Schema by creating a union with the 
"null" type, e.g. ["string", "null"]. The Avro JSON Encoding opts to encode 
such unions, like any union, as { "{type}": {value} } when the value is 
non-null.

This choice ignores common practice and the fact that JSON's values are 
dynamically typed (RFC8259 
Section-3) and inherently 
accommodate unions. The conformant way to encode a value choice of null or 
"string" into a JSON value is plainly null and "string".

"foo" : null
"foo": "value"

The "field default values" table in the Avro spec maps Avro types to the JSON 
types null, boolean, integer, number, string, object, and array, all of which 
can be encoded into and, more importantly, unambiguously decoded from a JSON 
value. The only semi-ambiguous case is integer vs. number, which is a 
convention in JSON rather than a distinct type, but any Avro serializer is 
guided by type information and can easily make that distinction.

3. Unions with Record Values
-

The JSON Encoding pattern of unions also covers "record" typed values, of 
course, and this is indeed a tricky scenario during deserialization since JSON 
does not have any built-in notion of type hints for "object" typed values.

The problem of having to disambiguate instances of different types in a field 
value is a common one also for users of JSON Schema when using the "oneOf" 
construct, which is equivalent to Avro unions. There are two common strategies:

- "Duck Typing":  Every conformant JSON Schema Validator determines the 
validity of a JSON node against a "oneOf" rule by testing the instance against 
all available alternative schema definitions. Validation fails if there is not 
exactly one valid match.
- Discriminators: OpenAPI, for instance, mandates a "discriminator" field (see 
https://spec.openapis.org/oas/latest.html#discriminator-object) for 
disambiguating "oneOf" constructs, whereby the discriminator property is part 
of each instance. That approach informs numerous JSON serialization frameworks, 
which implement discriminators under that assumption.

The Java Jackson library indeed supports the Avro JSON Encoding's style of 
putting the discriminator into a wrapper field name (JsonTypeInfo annotation, 
JsonTypeInfo.As.WRAPPER_OBJECT). Many other frameworks only support the 
property approach, though, including the two dominant ones for .NET, Pydantic 
for Python, and others. There's tooling like Redocly that flags that approach as 
a "mistake" (see 
https://redocly.com/docs/resources/discriminator/#property-outside-of-the-object).

What that means is that most 

Re: Avro JSON Encoding

2024-04-18 Thread Jean-Baptiste Onofré
Hi Clemens

Thanks for the detailed email.

Quick question: did you already create a Jira for each improvement/issue?

I will take the time to read asap.

Thanks
Regards
JB

On Thu, 18 Apr 2024 at 09:12, Clemens Vasters via user 
wrote:

> Hi everyone,
>
>
>
> the current JSON Encoding approach severely limits interoperability with
> other JSON serialization frameworks. In my view, the JSON Encoding is only
> really useful if it acts as a bridge into and from JSON-centric
> applications and it currently gets in its own way.
>
>
>
> The current encoding being what it is, there should be an alternate mode
> that emphasizes interoperability with JSON “as-is” and allows Avro Schema
> to describe existing JSON document instances such that I can take someone’s
> existing JSON document in on one side of a piece of software and emit Avro
> binary on the other side while acting on the same schema.
>
>
>
> There are four specific issues:
>
>
>
>1. Binary Values
>2. Unions with Primitive Type Values and Enum Values
>3. Unions with Record Values
>4. DateTime
>
>
>
> One by one:
>
>
>
> 1. Binary values:
>
> -
>
>
>
> Binary values (fixed and bytes) are encoded as escaped unicode
> literals. While I appreciate the creative trick, it costs 6 bytes for each
> encoded byte. I have a hard time finding any JSON libraries that provide a
> conversion of such strings from/to byte arrays, so this approach appears to
> be idiosyncratic for Avro’s JSON Encoding.
>
>
>
> The common way to encode binary in JSON is to use base64 encoding and that
> is widely and well supported in libraries. Base64 is 33% larger than plain
> bytes, the encoding chosen here is 500% (!) larger than plain bytes.
>
>
>
> The Avro decoder is schema-informed and it knows that a field is expected
> to hold bytes, so it’s easy to mandate base64 for the field content in the
> alternate mode.
>
>
>
> 2. Unions with Primitive Type Values and Enum Values
>
> -
>
>
>
> It’s common to express optionality in Avro Schema by creating a union with
> the “null” type, e.g. [“string”, “null”]. The Avro JSON Encoding opts to
> encode such unions, like any union, as { “{type}”: {value} } when the value
> is non-null.
>
>
>
> This choice ignores common practice and the fact that JSON’s values are
> dynamically typed (RFC8259 Section-3) and inherently
> accommodate unions. The conformant way to encode a value choice of null or
> “string” into a JSON value is plainly null and “string”.
>
>
>
> “foo” : null
>
> “foo”: “value”
>
>
>
> The “field default values” table in the Avro spec maps Avro types to the
> JSON types null, boolean, integer, number, string, object, and array, all
> of which can be encoded into and, more importantly, unambiguously decoded
> from a JSON value. The only semi-ambiguous case is integer vs. number,
> which is a convention in JSON rather than a distinct type, but any Avro
> serializer is guided by type information and can easily make that
> distinction.
>
>
>
> 3. Unions with Record Values
>
> -
>
>
>
> The JSON Encoding pattern of unions also covers “record” typed values, of
> course, and this is indeed a tricky scenario during deserialization since
> JSON does not have any built-in notion of type hints for “object” typed
> values.
>
>
>
> The problem of having to disambiguate instances of different types in a
> field value is a common one also for users of JSON Schema when using the
> “oneOf” construct, which is equivalent to Avro unions. There are two common
> strategies:
>
>
>
> - “Duck Typing”:  Every conformant JSON Schema Validator determines the
> validity of a JSON node against a “oneOf" rule by testing the instance
> against all available alternative schema definitions. Validation fails if
> there is not exactly one valid match.
>
> - Discriminators: OpenAPI, for instance, mandates a “discriminator” field
> (see https://spec.openapis.org/oas/latest.html#discriminator-object) for
> disambiguating “oneOf” constructs, whereby the discriminator property is
> part of each instance. That approach informs numerous JSON serialization
> frameworks, which implement discriminators under that assumption.
>
>
>
> The Java Jackson library indeed supports the Avro JSON Encoding’s style of
> putting the discriminator into a wrapper field name (JsonTypeInfo
> annotation, JsonTypeInfo.As.WRAPPER_OBJECT). Many other frameworks only
> support the property approach, though, including the two dominant ones for
> .NET, Pydantic for Python, and others. There’s tooling like Redocly that
> flags that approach as a “mistake” (see
> https://redocly.com/docs/resources/discriminator/#property-outside-of-the-object
> ).
>
>
>
> What that means is that most existing JSON instances with ambiguous types
> will either use property discriminators or the implementation will rely on
> duck typing as JSON Schema does for validation. The Avro 

Avro JSON Encoding

2024-04-18 Thread Clemens Vasters via user
Hi everyone,

the current JSON Encoding approach severely limits interoperability with other 
JSON serialization frameworks. In my view, the JSON Encoding is only really 
useful if it acts as a bridge into and from JSON-centric applications and it 
currently gets in its own way.

The current encoding being what it is, there should be an alternate mode that 
emphasizes interoperability with JSON "as-is" and allows Avro Schema to 
describe existing JSON document instances such that I can take someone's 
existing JSON document in on one side of a piece of software and emit Avro 
binary on the other side while acting on the same schema.

There are four specific issues:


  1.  Binary Values
  2.  Unions with Primitive Type Values and Enum Values
  3.  Unions with Record Values
  4.  DateTime

One by one:

1. Binary values:
-

Binary values (fixed and bytes) are encoded as escaped unicode literals. 
While I appreciate the creative trick, it costs 6 bytes for each encoded byte. 
I have a hard time finding any JSON libraries that provide a conversion of such 
strings from/to byte arrays, so this approach appears to be idiosyncratic for 
Avro's JSON Encoding.

The common way to encode binary in JSON is to use base64 encoding and that is 
widely and well supported in libraries. Base64 is 33% larger than plain bytes, 
the encoding chosen here is 500% (!) larger than plain bytes.

The Avro decoder is schema-informed and it knows that a field is expected to 
hold bytes, so it's easy to mandate base64 for the field content in the 
alternate mode.
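
To make the overhead concrete, a sketch for a field declared as
{ "name": "payload", "type": "bytes" } (field name invented) holding the
single byte 0xFF:

Current JSON Encoding:  { "payload": "\u00ff" }
Base64 alternative:     { "payload": "/w==" }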

2. Unions with Primitive Type Values and Enum Values
-

It's common to express optionality in Avro Schema by creating a union with the 
"null" type, e.g. ["string", "null"]. The Avro JSON Encoding opts to encode 
such unions, like any union, as { "{type}": {value} } when the value is 
non-null.

This choice ignores common practice and the fact that JSON's values are 
dynamically typed (RFC8259 
Section-3) and inherently 
accommodate unions. The conformant way to encode a value choice of null or 
"string" into a JSON value is plainly null and "string".

"foo" : null
"foo": "value"

The "field default values" table in the Avro spec maps Avro types to the JSON 
types null, boolean, integer, number, string, object, and array, all of which 
can be encoded into and, more importantly, unambiguously decoded from a JSON 
value. The only semi-ambiguous case is integer vs. number, which is a 
convention in JSON rather than a distinct type, but any Avro serializer is 
guided by type information and can easily make that distinction.
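
Side by side, assuming a field "foo" declared as ["string", "null"], the two
encodings of the same data would be:

Current JSON Encoding:   { "foo": { "string": "value" } }  or  { "foo": null }
Plain JSON encoding:     { "foo": "value" }                or  { "foo": null }

(The current encoding already writes the null branch as plain null; only the
non-null branch is wrapped.)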

3. Unions with Record Values
-

The JSON Encoding pattern of unions also covers "record" typed values, of 
course, and this is indeed a tricky scenario during deserialization since JSON 
does not have any built-in notion of type hints for "object" typed values.

The problem of having to disambiguate instances of different types in a field 
value is a common one also for users of JSON Schema when using the "oneOf" 
construct, which is equivalent to Avro unions. There are two common strategies:

- "Duck Typing":  Every conformant JSON Schema Validator determines the 
validity of a JSON node against a "oneOf" rule by testing the instance against 
all available alternative schema definitions. Validation fails if there is not 
exactly one valid match.
- Discriminators: OpenAPI, for instance, mandates a "discriminator" field (see 
https://spec.openapis.org/oas/latest.html#discriminator-object) for 
disambiguating "oneOf" constructs, whereby the discriminator property is part 
of each instance. That approach informs numerous JSON serialization frameworks, 
which implement discriminators under that assumption.

The Java Jackson library indeed supports the Avro JSON Encoding's style of 
putting the discriminator into a wrapper field name (JsonTypeInfo annotation, 
JsonTypeInfo.As.WRAPPER_OBJECT). Many other frameworks only support the 
property approach, though, including the two dominant ones for .NET, Pydantic 
for Python, and others. There's tooling like Redocly that flags that approach as 
a "mistake" (see 
https://redocly.com/docs/resources/discriminator/#property-outside-of-the-object).

What that means is that most existing JSON instances with ambiguous types will 
either use property discriminators or the implementation will rely on duck 
typing as JSON Schema does for validation. The Avro JSON Encoding approach is 
rare and is also counterintuitive for anyone comparing the declared object 
structure and the JSON structure who is not familiar with Avro's encoding 
rules. It has confused a lot of people in our house, for sure.
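
For comparison, assuming a union of two records Cat and Dog in namespace
"example" (all names invented), the two styles put the type hint in different
places:

Avro JSON Encoding (wrapper object):
{ "pet": { "example.Cat": { "name": "Whiskers" } } }

Property discriminator (OpenAPI style):
{ "pet": { "petType": "cat", "name": "Whiskers" } }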

Proposed is the following approach:

a) add a new, optional "const" attribute that can be applied to any record 
field declaration that is of a primitive type. When present, the attribute 
causes the field to always have this 

Issue while generating POJO using Avro Compiler

2024-04-04 Thread Kirti Dhar Upadhyay K via user
Hello,

I am using Avro Compiler (1.11.1) to generate Java Objects (PoJo) from Avro 
Schema (avsc) files.
I found a scenario, where if the field name starts with undersore (e.g. 
_employeeName), getter/setter names are generated using $ symbol 
(getEmployeeName$1()).
Debugging the scenario a bit, found that this is happening due to a check to 
detect conflicted fields as below:


SpecificCompiler -> generateMethodName()

// Check for the special case in which the schema defines two fields whose
// names are identical except for the case of the first character:
char firstChar = field.name().charAt(0);
String conflictingFieldName = (Character.isLowerCase(firstChar)
        ? Character.toUpperCase(firstChar)
        : Character.toLowerCase(firstChar))
    + (field.name().length() > 1 ? field.name().substring(1) : "");
boolean fieldNameConflict = schema.getField(conflictingFieldName) != null;


This check doesn't behave properly when the field name starts with an underscore 
(which is the same in both upper and lower case), so the field is reported as 
conflicting with itself.
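
Tracing the check with _employeeName makes the self-conflict visible:

// firstChar = '_'
// Character.isLowerCase('_') == false, so the check "flips" it with
// Character.toLowerCase('_'), which is still '_'
// conflictingFieldName = "_employeeName"    -> identical to the field itself
// schema.getField("_employeeName") != null  -> the field conflicts with itself
// and the compiler emits mangled accessors such as getEmployeeName$1()
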
Has someone encountered this issue and is there any workaround?

Regards,
Kirti Dhar



Participate in the ASF 25th Anniversary Campaign

2024-04-03 Thread Brian Proffitt
Hi everyone,

As part of The ASF’s 25th anniversary campaign[1], we will be celebrating
projects and communities in multiple ways.

We invite all projects and contributors to participate in the following
ways:

* Individuals - submit your first contribution:
https://news.apache.org/foundation/entry/the-asf-launches-firstasfcontribution-campaign
* Projects - share your public good story:
https://docs.google.com/forms/d/1vuN-tUnBwpTgOE5xj3Z5AG1hsOoDNLBmGIqQHwQT6k8/viewform?edit_requested=true
* Projects - submit a project spotlight for the blog:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278466116
* Projects - contact the Voice of Apache podcast (formerly Feathercast) to
be featured: https://feathercast.apache.org/help/
*  Projects - use the 25th anniversary template and the #ASF25Years hashtag
on social media:
https://docs.google.com/presentation/d/1oDbMol3F_XQuCmttPYxBIOIjRuRBksUjDApjd8Ve3L8/edit#slide=id.g26b0919956e_0_13

If you have questions, email the Marketing & Publicity team at
mark...@apache.org.

Peace,
BKP

[1] https://apache.org/asf25years/

[NOTE: You are receiving this message because you are a contributor to an
Apache Software Foundation project. The ASF will very occasionally send out
messages relating to the Foundation to contributors and members, such as
this one.]

Brian Proffitt
VP, Marketing & Publicity
VP, Conferences


Community Over Code NA 2024 Travel Assistance Applications now open!

2024-03-27 Thread Gavin McDonald
Hello to all users, contributors and Committers!

[ You are receiving this email as a subscriber to one or more ASF project
dev or user
  mailing lists and is not being sent to you directly. It is important that
we reach all of our
  users and contributors/committers so that they may get a chance to
benefit from this.
  We apologise in advance if this doesn't interest you but it is on topic
for the mailing
  lists of the Apache Software Foundation; and it is important please that
you do not
  mark this as spam in your email client. Thank You! ]

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code NA 2024 are now
open!

We will be supporting Community over Code NA, Denver Colorado in
October 7th to the 10th 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this years applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Monday 6th May, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request), this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those that will need a Visa to enter the Country - we advise you apply
now so that you have enough time in case of interview delays. So do not
wait until you know if you have been accepted or not.

We look forward to greeting many of you in Denver, Colorado , October 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Getter/Setter name generation using Apache Compiler

2024-03-14 Thread Kirti Dhar Upadhyay K via user

Hello,

I am using Avro Compiler (1.11.1) to generate Java Objects (PoJo) from Avro 
Schema (avsc) files.
I found a scenario where, if the field name (e.g. _employeeName) starts with an 
underscore (_), getter/setter names are generated with a $ suffix 
(getEmployeeName$1()).
Is there any specific reason for this? Can we override this behaviour?

Regards,
Kirti Dhar




Community Over Code Asia 2024 Travel Assistance Applications now open!

2024-02-20 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code Asia 2024 are now
open!

We will be supporting Community over Code Asia, Hangzhou, China
July 26th - 28th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, May 10th, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request), this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those that will need a Visa to enter the Country - we advise you to
apply
now so that you have enough time in case of interview delays. So do not
wait until you know if you have been accepted or not.

We look forward to greeting many of you in Hangzhou, China in July, 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Re: avro_schema_from_json return code 22

2024-02-09 Thread Alfredo Cardigliano
Hi

I have been digging a bit and I found that json-c (used by my application)
cannot coexist with jansson (used by avro), as they define the same symbols.
No compilation or linking error of course, just a weird issue. 
Thank you for the hint.

Alfredo

> On 9 Feb 2024, at 16:28, Martin Grigorov  wrote:
> 
> Hi,
> 
> Error 22 in Linux is EINVAL - Invalid argument
> 
> Martin
> 
> On Fri, Feb 9, 2024 at 5:21 PM Alfredo Cardigliano  > wrote:
>> Hi
>> I am integrating Avro in an application, using avro_schema_from_json to 
>> parse the schema,
>> while this function is working fine when compiling a C sample application, 
>> this fails returning
>> error code 22 when compiling in a C++ application. I am using the libavro 
>> shipped with latest
>> Ubuntu 22. 
>> 
>> Any hint about this return value (22)? Thank you.
>> 
>> Alfredo



Re: avro_schema_from_json return code 22

2024-02-09 Thread Martin Grigorov
Hi,

Error 22 in Linux is EINVAL - Invalid argument
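
For anyone hitting this, a minimal sketch that surfaces libavro's own error
text alongside the numeric code (the schema literal is only illustrative):

#include <stdio.h>
#include <avro.h>

int main(void) {
    const char *json =
        "{\"type\": \"record\", \"name\": \"r\","
        " \"fields\": [{\"name\": \"f\", \"type\": \"int\"}]}";
    avro_schema_t schema;
    avro_schema_error_t error;  /* unused by recent libavro, required by the API */
    int rc = avro_schema_from_json(json, 0, &schema, &error);
    if (rc != 0) {
        /* rc is errno-style, so 22 is EINVAL ("Invalid argument") */
        fprintf(stderr, "schema parse failed (%d): %s\n", rc, avro_strerror());
        return 1;
    }
    avro_schema_decref(schema);
    return 0;
}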

Martin

On Fri, Feb 9, 2024 at 5:21 PM Alfredo Cardigliano 
wrote:

> Hi
> I am integrating Avro in an application, using avro_schema_from_json to
> parse the schema,
> while this function is working fine when compiling a C sample application,
> this fails returning
> error code 22 when compiling in a C++ application. I am using the libavro
> shipped with latest
> Ubuntu 22.
>
> Any hint about this return value (22)? Thank you.
>
> Alfredo


avro_schema_from_json return code 22

2024-02-09 Thread Alfredo Cardigliano
Hi
I am integrating Avro in an application, using avro_schema_from_json to parse 
the schema,
while this function is working fine when compiling a C sample application, this 
fails returning
error code 22 when compiling in a C++ application. I am using the libavro 
shipped with latest
Ubuntu 22. 

Any hint about this return value (22)? Thank you.

Alfredo

Re: Avro query on Testing if Beta feature of Generating faster code is enabled

2024-02-06 Thread Martin Grigorov
On Tue, Feb 6, 2024 at 3:03 PM Chirag Nahata 
wrote:

> hey Martin,
>
> Thanks, this worked for me and I can confirm that I have enabled the flag
> "-Dorg.apache.avro.specific.use_custom_coders=true".
>

> Can you suggest to me how I can observe the performance improvements
> before/after?
>

You can run your favorite profiler with your application.




>
> sincerely
> Chirag Nahata
>
> On Tue, Feb 6, 2024 at 5:03 PM Martin Grigorov 
> wrote:
>
>>
>>
>> On Tue, Feb 6, 2024 at 1:15 PM Siddharth Baranidharan
>>  wrote:
>>
>>> Hey Oscar,
>>> Thank you for your response.
>>> My application is an enterprise server(java based)-agent(python based)
>>> model which uses avro-protocol for communication between server & agents.
>>> To the best of my knowledge I have added the flag
>>> “-Dorg.apache.avro.specific.use_custom_coders=true” as a java option to the
>>> server side.
>>>
>>
>> You can put a breakpoint at
>> https://github.com/apache/avro/blob/d143d6262f7d6c688e695d1656f8609605835604/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L230
>> and see what is the value of "useCustomCoderFlag"
>>
>>
>>
>>>
>>> My question:
>>> How can I test whether this feature is successfully enabled & to answer
>>> your 2nd question, yes I want to test its effects & I am willing to trust
>>> the tests that you have for the flag. Can you detail the steps I need to
>>> follow for the same?
>>>
>>> Thank you,
>>> Siddharth B
>>>
>>> On Tue, Feb 6, 2024 at 1:18 PM Oscar Westra van Holthe - Kind <
>>> os...@westravanholthe.nl> wrote:
>>>
>>> > Hi,
>>> >
>>> > What do you mean by testing if the flag is successfully turned on? Do
>>> you
>>> > need to test its effects? Are you willing to trust the tests we have
>>> on the
>>> > flag?
>>> >
>>> > As for testing the performance difference, we do have a performance
>>> test
>>> > module. Perhaps you can use the same technique?
>>> >
>>> > Kind regards,
>>> > Oscar
>>> >
>>> > --
>>> > Oscar Westra van Holthe - Kind 
>>> >
>>> > On Tue, 6 Feb 2024 at 07:15, chirag wrote:
>>> >
>>> >> Hi Team,
>>> >>
>>> >> On the Avro Docs it is mentioned that, to turn on the new approach to
>>> >> generating code that speeds up decoding and encoding, you set the
>>> >> feature flag/system flag org.apache.avro.specific.use_custom_coders to
>>> >> true at runtime (here
>>> >> <
>>> >>
>>> https://avro.apache.org/docs/1.11.1/getting-started-java/#beta-feature-generating-faster-code
>>> >> >
>>> >> ).
>>> >>
>>> >> Enquiring if:
>>> >>
>>> >>1. There is a way to see if this flag is successfully turned on
>>> during
>>> >>runtime?
>>> >>2. There is a way to measure the performance improvement in doing
>>> so?
>>> >>
>>> >> I have added this system flag to my distributed enterprise
>>> application but
>>> >> I am not sure if it is enabled and if there is a performance
>>> improvement
>>> >> on
>>> >> doing so.
>>> >>
>>> >> Sincerely
>>> >> Chirag Nahata
>>> >>
>>> >
>>>
>>


Re: Avro query on Testing if Beta feature of Generating faster code is enabled

2024-02-06 Thread Martin Grigorov
On Tue, Feb 6, 2024 at 1:15 PM Siddharth Baranidharan
 wrote:

> Hey Oscar,
> Thank you for your response.
> My application is an enterprise server (Java-based) / agent (Python-based)
> model which uses the Avro protocol for communication between server & agents.
> To the best of my knowledge I have added the flag
> "-Dorg.apache.avro.specific.use_custom_coders=true" as a Java option on the
> server side.
>

You can put a breakpoint at
https://github.com/apache/avro/blob/d143d6262f7d6c688e695d1656f8609605835604/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L230
and see what is the value of "useCustomCoderFlag"
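
Alternatively, without attaching a debugger, you can print the system
property at startup. A minimal sketch (only the property name comes from the
Avro docs quoted below; the class name is illustrative):

```java
// Print whether the custom-coders flag was passed to the JVM, e.g. via
// -Dorg.apache.avro.specific.use_custom_coders=true
public class CustomCodersCheck {
    public static void main(String[] args) {
        String flag = System.getProperty("org.apache.avro.specific.use_custom_coders");
        System.out.println("use_custom_coders = " + flag); // "true" when enabled
    }
}
```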



>
> My question:
> How can I test whether this feature is successfully enabled & to answer
> your 2nd question, yes I want to test its effects & I am willing to trust
> the tests that you have for the flag. Can you detail the steps I need to
> follow for the same?
>
> Thank you,
> Siddharth B
>
> On Tue, Feb 6, 2024 at 1:18 PM Oscar Westra van Holthe - Kind <
> os...@westravanholthe.nl> wrote:
>
> > Hi,
> >
> > What do you mean by testing if the flag is successfully turned on? Do you
> > need to test its effects? Are you willing to trust the tests we have on
> the
> > flag?
> >
> > As for testing the performance difference, we do have a performance test
> > module. Perhaps you can use the same technique?
> >
> > Kind regards,
> > Oscar
> >
> > --
> > Oscar Westra van Holthe - Kind 
> >
> > On Tue, 6 Feb 2024 at 07:15, chirag wrote:
> >
> >> Hi Team,
> >>
> >> On the Avro Docs it is mentioned that, to turn on the new approach to
> >> generating code that speeds up decoding and encoding, you set the feature
> >> flag/system flag org.apache.avro.specific.use_custom_coders to true at
> >> runtime (here
> >> <
> >>
> https://avro.apache.org/docs/1.11.1/getting-started-java/#beta-feature-generating-faster-code
> >> >
> >> ).
> >>
> >> Enquiring if:
> >>
> >>1. There is a way to see if this flag is successfully turned on
> during
> >>runtime?
> >>2. There is a way to measure the performance improvement in doing so?
> >>
> >> I have added this system flag to my distributed enterprise application
> but
> >> I am not sure if it is enabled and if there is a performance improvement
> >> on
> >> doing so.
> >>
> >> Sincerely
> >> Chirag Nahata
> >>
> >
>


Re: [Java] Avro does not respect logicalType with map type

2024-02-06 Thread Martin Grigorov
Hi,



On Fri, Feb 2, 2024 at 11:37 AM Dupa Trianko  wrote:

> I have a lib class Dimension3D which contains 3 fields and I want to convert
> it into one Avro field.
>
> My definition of logical type connected classes:
> public class Dim3AvroLogicalType {
>
>   public static final String LOGICAL_TYPE_NAME = "dim3";
>
>   public static class Dim3LogicalType extends LogicalType {
>
> Dim3LogicalType() {
>   super(LOGICAL_TYPE_NAME);
> }
>
> @Override
> public void validate(Schema schema) {
>   super.validate(schema);
>   if (schema.getType() != Schema.Type.MAP) {
> throw new IllegalArgumentException(
> String.format("Logical type '%s' must be backed by Map",
> LOGICAL_TYPE_NAME));
>   }
> }
>   }
>
>   public static class Dim3Factory implements
> LogicalTypes.LogicalTypeFactory {
>
> @Override
> public LogicalType fromSchema(Schema schema) {
>   return new Dim3LogicalType();
> }
>
> @Override
> public String getTypeName() {
>   return LOGICAL_TYPE_NAME;
> }
>   }
>
>   public static class Dim3Conversion extends Conversion<Dimension3D> {
>
> @Override
> public Class<Dimension3D> getConvertedType() {
>   return Dimension3D.class;
> }
>
> @Override
> public String getLogicalTypeName() {
>   return LOGICAL_TYPE_NAME;
> }
>
> @Override
> public Dimension3D fromMap(Map<?, ?> value, Schema schema, LogicalType
> type) {
>   return Dimension3D.of(
>   Length.of(new BigDecimal(value.get("length").toString())),
>   Length.of(new BigDecimal(value.get("width").toString())),
>   Length.of(new BigDecimal(value.get("height").toString()))
>   );
> }
>
> @Override
> public Map<?, ?> toMap(Dimension3D dimension3D, Schema schema,
> LogicalType type) {
>   return Map.of(
>   "length", dimension3D.getLength().toDecimal().toString(),
>   "width", dimension3D.getWidth().toDecimal().toString(),
>   "height", dimension3D.getHeight().toDecimal().toString()
>   );
> }
>   }
> }
>
> the interesting part of the Avro definition:
>   {
> "name": "dimension",
> "type": {
>   "type": "map",
>   "logicalType": "dim3",
>   "values": "string"
> }
>   }
>
> after executing the Maven task I got:
>
> public java.util.Map getDimension().
> Moreover there is no proper converter added to MODEL$ while other
> converters are added correctly.
>
> I'm using:
>   <plugin>
> <groupId>org.apache.avro</groupId>
> <artifactId>avro-maven-plugin</artifactId>
> <version>1.11.3</version>
>

Did you register your custom factory and conversion?
See
https://github.com/apache/avro/blob/d143d6262f7d6c688e695d1656f8609605835604/lang/java/integration-test/codegen-test/pom.xml#L60-L69
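
For a runtime (non-codegen) setup, the registration would look roughly like
this. A sketch only; it assumes the Dim3AvroLogicalType nested classes from
your message:

```java
import org.apache.avro.LogicalTypes;
import org.apache.avro.generic.GenericData;

public class Dim3Registration {
    public static void register() {
        // Make the "dim3" logical type known to the schema parser...
        LogicalTypes.register(Dim3AvroLogicalType.LOGICAL_TYPE_NAME,
            new Dim3AvroLogicalType.Dim3Factory());
        // ...and attach the conversion so Dimension3D is used at (de)serialization time.
        GenericData.get().addLogicalTypeConversion(new Dim3AvroLogicalType.Dim3Conversion());
    }
}
```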



>
> the proper customLogicalTypeFactory & customConversion are added to the
> configuration.
>
> How do I properly use a logical type with a map backing type?
>


Re: Avro query on Testing if Beta feature of Generating faster code is enabled

2024-02-05 Thread Oscar Westra van Holthe - Kind
Hi,

What do you mean by testing if the flag is successfully turned on? Do you
need to test its effects? Are you willing to trust the tests we have on the
flag?

As for testing the performance difference, we do have a performance test
module. Perhaps you can use the same technique?

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

On Tue, 6 Feb 2024 at 07:15, chirag wrote:

> Hi Team,
>
> On the Avro Docs it is mentioned that, to turn on the new approach to
> generating code that speeds up decoding and encoding, you set the feature
> flag/system flag org.apache.avro.specific.use_custom_coders to true at
> runtime (here
> <
> https://avro.apache.org/docs/1.11.1/getting-started-java/#beta-feature-generating-faster-code
> >
> ).
>
> Enquiring if:
>
>1. There is a way to see if this flag is successfully turned on during
>runtime?
>2. There is a way to measure the performance improvement in doing so?
>
> I have added this system flag to my distributed enterprise application but
> I am not sure if it is enabled and if there is a performance improvement on
> doing so.
>
> Sincerely
> Chirag Nahata
>


Community over Code EU 2024 Travel Assistance Applications now open!

2024-02-03 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code EU 2024 are now
open!

We will be supporting Community over Code EU, Bratislava, Slovakia,
June 3rd - 5th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, March 1st, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request), this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those that will need a Visa to enter the Country - we advise you apply
now so that you have enough time in case of interview delays. So do not
wait until you know if you have been accepted or not.

We look forward to greeting many of you in Bratislava, Slovakia in June,
2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


[Java] Avro does not respect logicalType with map type

2024-02-02 Thread Dupa Trianko
I have a lib class Dimension3D which contains 3 fields and I want to convert
it into one Avro field.

My definition of logical type connected classes:
public class Dim3AvroLogicalType {

  public static final String LOGICAL_TYPE_NAME = "dim3";

  public static class Dim3LogicalType extends LogicalType {

Dim3LogicalType() {
  super(LOGICAL_TYPE_NAME);
}

@Override
public void validate(Schema schema) {
  super.validate(schema);
  if (schema.getType() != Schema.Type.MAP) {
throw new IllegalArgumentException(
String.format("Logical type '%s' must be backed by Map",
LOGICAL_TYPE_NAME));
  }
}
  }

  public static class Dim3Factory implements
LogicalTypes.LogicalTypeFactory {

@Override
public LogicalType fromSchema(Schema schema) {
  return new Dim3LogicalType();
}

@Override
public String getTypeName() {
  return LOGICAL_TYPE_NAME;
}
  }

  public static class Dim3Conversion extends Conversion<Dimension3D> {

@Override
public Class<Dimension3D> getConvertedType() {
  return Dimension3D.class;
}

@Override
public String getLogicalTypeName() {
  return LOGICAL_TYPE_NAME;
}

@Override
public Dimension3D fromMap(Map<?, ?> value, Schema schema, LogicalType
type) {
  return Dimension3D.of(
  Length.of(new BigDecimal(value.get("length").toString())),
  Length.of(new BigDecimal(value.get("width").toString())),
  Length.of(new BigDecimal(value.get("height").toString()))
  );
}

@Override
public Map<?, ?> toMap(Dimension3D dimension3D, Schema schema,
LogicalType type) {
  return Map.of(
  "length", dimension3D.getLength().toDecimal().toString(),
  "width", dimension3D.getWidth().toDecimal().toString(),
  "height", dimension3D.getHeight().toDecimal().toString()
  );
}
  }
}

the interesting part of the Avro definition:
  {
"name": "dimension",
"type": {
  "type": "map",
  "logicalType": "dim3",
  "values": "string"
}
  }

after executing the Maven task I got:

public java.util.Map getDimension().
Moreover there is no proper converter added to MODEL$ while other
converters are added correctly.

I'm using:
  <plugin>
<groupId>org.apache.avro</groupId>
<artifactId>avro-maven-plugin</artifactId>
<version>1.11.3</version>

the proper customLogicalTypeFactory & customConversion are added to the
configuration.

How do I properly use a logical type with a map backing type?


Re: Avro schema evolution support in AVRO CPP

2024-01-12 Thread John McClean
fwiw, I'm using it and it works fine, at least for my use cases.

J

On Fri, Jan 12, 2024 at 1:55 AM Martin Grigorov 
wrote:

> Hi Vivek,
>
> I am not sure there is anyone to give you an exact answer. The C++ SDK has
> not been actively developed in the last few years.
> The best is to try it for your use cases and see if it works or not. The
> next step is to contribute Pull Requests for the missing functionalities!
>
> Martin
>
> On Thu, Jan 11, 2024 at 8:59 AM Vivek Kumar <
> vivek.ku...@eclipsetrading.com.invalid> wrote:
>
>> +dev
>>
>>
>> Regards,
>> Vivek Kumar
>>
>> 
>> From: Vivek Kumar 
>> Sent: Thursday, January 11, 2024 11:07 AM
>> To: user@avro.apache.org 
>> Subject: Avro schema evolution support in AVRO CPP
>>
>> Hi Avro team,
>>
>> I am writing this email to check the support of Avro schema evolution in
>> CPP - i.e. provide both the producer and consumer schema when decoding the
>> data.
>>
>> I can see that there's a resolvingDecoder function in AVRO CPP that takes
>> two schemas. See
>>
>> https://avro.apache.org/docs/1.10.2/api/cpp/html/index.html#ReadingDifferentSchema
>>
>> But there's a FIXME comment in this function. See
>> https://issues.apache.org/jira/browse/AVRO-3720 and
>> https://github.com/apache/avro/blob/main/lang/c%2B%2B/api/Decoder.hh#L218.
>> Does this mean resolvingDecoder does not work properly? Could you please
>> explain what scenarios are not covered by resolvingDecoder and how can we
>> use it to support "Avro Schema Evolution" in c++?
>>
>> Thanks
>>
>>
>> Regards,
>> Vivek Kumar
>>
>>
>


Re: Avro schema evolution support in AVRO CPP

2024-01-12 Thread Andrew Marlow
"In practice, it is very rare for schema evolution to change the order of
the fields." - I'll say. Since we are talking about a protocol that is
deliberately not self-describing we cannot just pluck out what we want -
how would such code get to it? This is why the standard advice in these
situations is to never reorder, remove or rename fields and to always add
new stuff to the end.

On Fri, 12 Jan 2024 at 13:19, Thiruvalluvan MG 
wrote:

>  Out-of-order fields are not handled transparently in C++ if you are
> manually using the resolving decoder. (It's the same situation in Java as
> well).
> But, in C++ and in Java, if you generate code for the given Avro schema,
> the generated code takes care of the field ordering issue. Similarly, in
> both bindings, Generic data structures work properly with the field-order.
> If you are using the resolving decoder in your code directly, care must be
> exercises, If the reader schema and writer schema are both records and they
> have fields in different order (it is okay to insert or remove fields), the
> protocol is to first get the field-order array (which is essentially a
> permutation of the reader field ids 0 to n-1) from the resolving decoder
> and then read the fields of the reader schema in the order specified in the
> field-order array. This is done in order to avoid buffering by the decoder.
> Buffering can take a large number of allocations if the out-of-order field
> is an array or map.
> In practice, it is very rare for schema evolution to change the order of
> the fields.
> Thanks
> Thiru
>
> On Friday, 12 January, 2024 at 03:24:11 pm IST, Martin Grigorov <
> mgrigo...@apache.org> wrote:
>
>  Hi Vivek,
>
> I am not sure there is anyone to give you an exact answer. The C++ SDK has
> not been actively developed in the last few years.
> The best is to try it for your use cases and see if it works or not. The
> next step is to contribute Pull Requests for the missing functionalities!
>
> Martin
>
> On Thu, Jan 11, 2024 at 8:59 AM Vivek Kumar <
> vivek.ku...@eclipsetrading.com.invalid> wrote:
>
> > +dev
> >
> >
> > Regards,
> > Vivek Kumar
> >
> > 
> > From: Vivek Kumar 
> > Sent: Thursday, January 11, 2024 11:07 AM
> > To: user@avro.apache.org 
> > Subject: Avro schema evolution support in AVRO CPP
> >
> > Hi Avro team,
> >
> > I am writing this email to check the support of Avro schema evolution in
> > CPP - i.e. provide both the producer and consumer schema when decoding
> the
> > data.
> >
> > I can see that there's a resolvingDecoder function in AVRO CPP that takes
> > two schemas. See
> >
> >
> https://avro.apache.org/docs/1.10.2/api/cpp/html/index.html#ReadingDifferentSchema
> >
> > But there's a FIXME comment in this function. See
> > https://issues.apache.org/jira/browse/AVRO-3720 and
> >
> https://github.com/apache/avro/blob/main/lang/c%2B%2B/api/Decoder.hh#L218.
> > Does this mean resolvingDecoder does not work properly? Could you please
> > explain what scenarios are not covered by resolvingDecoder and how can we
> > use it to support "Avro Schema Evolution" in c++?
> >
> > Thanks
> >
> >
> > Regards,
> > Vivek Kumar
> >
> >
>



-- 
Regards,

Andrew Marlow
http://www.andrewpetermarlow.co.uk


Re: Avro schema evolution support in AVRO CPP

2024-01-12 Thread Thiruvalluvan MG via user
 Out-of-order fields are not handled transparently in C++ if you are manually 
using the resolving decoder. (It's the same situation in Java as well).
But, in C++ and in Java, if you generate code for the given Avro schema, the 
generated code takes care of the field ordering issue. Similarly, in both 
bindings, Generic data structures work properly with the field-order.
If you are using the resolving decoder in your code directly, care must be
exercised. If the reader schema and writer schema are both records and they
have fields in different order (it is okay to insert or remove fields), the 
protocol is to first get the field-order array (which is essentially a 
permutation of the reader field ids 0 to n-1) from the resolving decoder and 
then read the fields of the reader schema in the order specified in the 
field-order array. This is done in order to avoid buffering by the decoder. 
Buffering can take a large number of allocations if the out-of-order field is
an array or map.
In practice, it is very rare for schema evolution to change the order of the 
fields.
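
Roughly, in Java, the protocol described above looks like this. This is only
a sketch; the two-field layout (a long at reader position 0, a string at
position 1) is purely illustrative:

```java
import java.io.IOException;
import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.ResolvingDecoder;

public class FieldOrderRead {
    static void read(Schema writer, Schema reader, InputStream in) throws IOException {
        ResolvingDecoder rd = DecoderFactory.get().resolvingDecoder(
            writer, reader, DecoderFactory.get().binaryDecoder(in, null));
        // readFieldOrder() yields the reader's fields in the order they must be read.
        for (Schema.Field field : rd.readFieldOrder()) {
            switch (field.pos()) { // position in the *reader* schema
                case 0: long id = rd.readLong(); break;
                case 1: String name = rd.readString(); break;
                default: throw new IllegalStateException("unexpected field");
            }
        }
    }
}
```
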
Thanks
Thiru

On Friday, 12 January, 2024 at 03:24:11 pm IST, Martin Grigorov wrote:
 
 Hi Vivek,

I am not sure there is anyone to give you an exact answer. The C++ SDK has
not been actively developed in the last few years.
The best is to try it for your use cases and see if it works or not. The
next step is to contribute Pull Requests for the missing functionalities!

Martin

On Thu, Jan 11, 2024 at 8:59 AM Vivek Kumar <
vivek.ku...@eclipsetrading.com.invalid> wrote:

> +dev
>
>
> Regards,
> Vivek Kumar
>
> 
> From: Vivek Kumar 
> Sent: Thursday, January 11, 2024 11:07 AM
> To: user@avro.apache.org 
> Subject: Avro schema evolution support in AVRO CPP
>
> Hi Avro team,
>
> I am writing this email to check the support of Avro schema evolution in
> CPP - i.e. provide both the producer and consumer schema when decoding the
> data.
>
> I can see that there's a resolvingDecoder function in AVRO CPP that takes
> two schemas. See
>
> https://avro.apache.org/docs/1.10.2/api/cpp/html/index.html#ReadingDifferentSchema
>
> But there's a FIXME comment in this function. See
> https://issues.apache.org/jira/browse/AVRO-3720 and
> https://github.com/apache/avro/blob/main/lang/c%2B%2B/api/Decoder.hh#L218.
> Does this mean resolvingDecoder does not work properly? Could you please
> explain what scenarios are not covered by resolvingDecoder and how can we
> use it to support "Avro Schema Evolution" in c++?
>
> Thanks
>
>
> Regards,
> Vivek Kumar
>
>
  

Re: Avro schema evolution support in AVRO CPP

2024-01-12 Thread Martin Grigorov
Hi Vivek,

I am not sure there is anyone to give you an exact answer. The C++ SDK has
not been actively developed in the last few years.
The best is to try it for your use cases and see if it works or not. The
next step is to contribute Pull Requests for the missing functionalities!

Martin

On Thu, Jan 11, 2024 at 8:59 AM Vivek Kumar <
vivek.ku...@eclipsetrading.com.invalid> wrote:

> +dev
>
>
> Regards,
> Vivek Kumar
>
> 
> From: Vivek Kumar 
> Sent: Thursday, January 11, 2024 11:07 AM
> To: user@avro.apache.org 
> Subject: Avro schema evolution support in AVRO CPP
>
> Hi Avro team,
>
> I am writing this email to check the support of Avro schema evolution in
> CPP - i.e. provide both the producer and consumer schema when decoding the
> data.
>
> I can see that there's a resolvingDecoder function in AVRO CPP that takes
> two schemas. See
>
> https://avro.apache.org/docs/1.10.2/api/cpp/html/index.html#ReadingDifferentSchema
>
> But there's a FIXME comment in this function. See
> https://issues.apache.org/jira/browse/AVRO-3720 and
> https://github.com/apache/avro/blob/main/lang/c%2B%2B/api/Decoder.hh#L218.
> Does this mean resolvingDecoder does not work properly? Could you please
> explain what scenarios are not covered by resolvingDecoder and how can we
> use it to support "Avro Schema Evolution" in c++?
>
> Thanks
>
>
> Regards,
> Vivek Kumar
>
>


Re: Metadata / Annotations support

2023-10-11 Thread Oscar Westra van Holthe - Kind
On Wed, 11 Oct 2023 at 17:24, Gustavo Monarin wrote:

> Sometimes it is useful to give contextual information to a field.
>
> Other protocols like protobuf support annotations through what they call
> Option 
> which allows the following customization (uuid field annotation):
>
>
> ```
> message FarmerRegisteredEvent {
>
>   string uuid = 1[(pi2schema.subject_identifier) = true];
>
>   oneof personalData {
> ContactInfo contactInfo = 2;
> pi2schema.EncryptedPersonalData encryptedPersonalData = 6;
>   }
>
>   google.protobuf.Timestamp registeredAt = 4;
>   string referrer = 5;
>
> }
> ```
>
> Would it be possible to add such metadata information in an avro schema?
>
Yes. On schemata, fields and messages, you can add any property other than
the standard ones (like 'name', 'doc', etc.) with any JSON value as value.

As far as Avro is concerned these are ignored, but you can use them in your
code.
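
For example, a sketch in Java (the property name mirrors the protobuf option
above and is purely illustrative):

```java
import org.apache.avro.Schema;

public class CustomPropDemo {
    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"FarmerRegisteredEvent\",\"fields\":["
          + "{\"name\":\"uuid\",\"type\":\"string\","
          + "\"pi2schema.subject_identifier\":true}]}");
        // Avro keeps the extra attribute as a field property; your code can read it back.
        Object marker = schema.getField("uuid").getObjectProp("pi2schema.subject_identifier");
        System.out.println(marker); // true
    }
}
```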

Sometimes you want more though, and for that there are logical types and
(for Java code generation) the 'javaAnnotation' property.

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 


Release of Avro Java with updated jackson-databind library

2023-10-11 Thread Richard Ney via user
I was wondering if there's a timeline to release a version of the Java
library that uses the newer version of
com.fasterxml.jackson.core:jackson-databind? Most of my other dependencies
have migrated to 2.15.x but since my project uses Avro I'm unable to
upgrade due to the incompatibility between 2.14.2 and 2.15.x.

I see that the POM for the Avro Java library on master was updated but the
most recent release of Avro is still using 2.14.2


Metadata / Annotations support

2023-10-11 Thread Gustavo Monarin
Sometimes it is useful to give contextual information to a field.

Other protocols like protobuf support annotations through what they call
Option 
which allows the following customization (uuid field annotation):


```
message FarmerRegisteredEvent {

  string uuid = 1[(pi2schema.subject_identifier) = true];

  oneof personalData {
ContactInfo contactInfo = 2;
pi2schema.EncryptedPersonalData encryptedPersonalData = 6;
  }

  google.protobuf.Timestamp registeredAt = 4;
  string referrer = 5;

}
```


Would it be possible to add such metadata information in an avro schema?

Gustavo Monarin de Sousa


CVE-2023-39410: Apache Avro Java SDK: Memory when deserializing untrusted data in Avro Java SDK

2023-09-29 Thread Ryan Skraba
Severity: low

Affected versions:

- Apache Avro Java SDK before 1.11.3

Description:

When deserializing untrusted or corrupted data, it is possible for a reader to 
consume memory beyond the allowed constraints and thus lead to out of memory on 
the system.

This issue affects Java applications using Apache Avro Java SDK up to and 
including 1.11.2.  Users should update to apache-avro version 1.11.3 which 
addresses this issue.

This issue is being tracked as AVRO-3819 

Credit:

Adam Korczynski at ADA Logics Ltd (finder)

References:

https://avro.apache.org/
https://www.cve.org/CVERecord?id=CVE-2023-39410
https://issues.apache.org/jira/browse/AVRO-3819



[ANNOUNCE] Apache Avro 1.11.3 released

2023-09-26 Thread Ryan Skraba
The Apache Avro community is pleased to announce the release of Avro 1.11.3!

All signed release artifacts, signatures and verification instructions can
be found here: https://avro.apache.org/releases.html

This is a minor release, specifically addressing known issues with the
1.11.2 release, but also contains version bumps and doc fixes. The
link to all fixed JIRA issues and a brief summary can be found at:
https://github.com/apache/avro/releases/tag/release-1.11.3

In addition, language-specific release artifacts are available:

* C#: https://www.nuget.org/packages/Apache.Avro/1.11.3
* Java: https://repo1.maven.org/maven2/org/apache/avro/avro/1.11.3/
* Javascript: https://www.npmjs.com/package/avro-js/v/1.11.3
* Perl: https://metacpan.org/release/Avro
* Python 3: https://pypi.org/project/avro/1.11.3
* Ruby: https://rubygems.org/gems/avro/versions/1.11.3
* Rust: https://crates.io/crates/apache-avro/0.16.0

Thanks to everyone for contributing!

Ryan


Registration open for Community Over Code North America

2023-08-28 Thread Rich Bowen
Hello! Registration is still open for the upcoming Community Over Code
NA event in Halifax, NS! We invite you to register for the event:
https://communityovercode.org/registration/

Apache Committers, note that you have a special discounted rate for the
conference at US$250. To take advantage of this rate, use the special
code sent to the committers@ list by Brian Proffitt earlier this month.

If you are in need of an invitation letter, please consult the
information at https://communityovercode.org/visa-letter/

Please see https://communityovercode.org/ for more information about
the event, including how to make reservations for discounted hotel
rooms in Halifax. Discounted rates will only be available until Sept.
5, so reserve soon!

--Rich, for the event planning team


Re: EOS/EOL Date

2023-07-17 Thread Ryan Skraba
Hello!  While Avro doesn't have an official "end-of-life" statement or
policy, there is no active development on the 1.9 or 1.10 branch.

Our current policy is to add major features to the next major release
(1.12.0) while bug fixes, CVEs and minor features will be backported
to the next minor release (1.11.3).

I think we *should* have a policy in place, so that projects that depend
on Avro have better visibility.  I will bring this up on the
d...@avro.apache.org mailing list -- please feel free to join the
discussion!

All my best, Ryan


On Mon, Jul 17, 2023 at 11:19 AM Pranav Kumar (EXT) via user
 wrote:
>
> Hi,
>
>
>
> Could you please share the end-of-life/end-of-support details or any EoS
> criteria that are followed for the following:
>
>
>
> Apache Avro version 1.9.2
>
>
>
> Regards,
>
> Pranav


EOS/EOL Date

2023-07-17 Thread Pranav Kumar (EXT) via user
Hi,

Could you please share the end-of-life/end-of-support details or any EoS
criteria that are followed for the following:


  *   Apache Avro version 1.9.2

Regards,
Pranav


[ANNOUNCE] Apache Avro 1.11.2 released

2023-07-11 Thread Ryan Skraba
The Apache Avro community is pleased to announce the release of Avro 1.11.2!

All signed release artifacts, signatures and verification instructions can
be found here: https://avro.apache.org/releases.html

This release addresses ~89 Avro JIRA, including some interesting highlights:

C#
- AVRO-3434: Support logical schemas in reflect reader and writer
- AVRO-3670: Add NET 7.0 support
- AVRO-3724: Fix C# JsonEncoder for nested array of records
- AVRO-3756: Add a method to return types instead of writing them to disk

C++
- AVRO-3601: C++ API header contains breaking include
- AVRO-3705: C++17 support

Java
- AVRO-2943: Add new GenericData String/Utf8 ARRAY comparison test
- AVRO-2943: improve GenericRecord MAP type comparison
- AVRO-3473: Use ServiceLoader to discover Conversion
- AVRO-3536: Inherit conversions for Union type
- AVRO-3597: Allow custom readers to override string creation
- AVRO-3560: Throw SchemaParseException on dangling content beyond end of schema
- AVRO-3602: Support Map(with non-String keys) and Set in ReflectDatumReader
- AVRO-3676: Produce valid toString() for UUID JSON
- AVRO-3698: SpecificData.getClassName must replace reserved words
- AVRO-3700: Publish Java SBOM artifacts with CycloneDX
- AVRO-3783: Read LONG length for bytes, only allow INT sizes
- AVRO-3706: accept space in folder name

Python
- AVRO-3761: Fix broken validation of nullable UUID field
- AVRO-3229: Raise on invalid enum default only if validation enabled
- AVRO-3622: Fix compatibility check for schemas having or missing namespace
- AVRO-3669: Add py.typed marker file (PEP561 compliance)
- AVRO-3672: Add CI testing for Python 3.11
- AVRO-3680: allow to disable name validation

Ruby
- AVRO-3775: Fix decoded default value of logical type
- AVRO-3697: Test against Ruby 3.2
- AVRO-3722: Eagerly initialize instance variables for better inline cache hits

Rust
- Many, many bug fixes and implementation progress in this experimental SDK.
- Rust CI builds and lints are passing, and has been released to
crates.io as version 0.15.0

In addition:

- Upgrade dependencies to latest versions, including CVE fixes.
- Testing and build improvements.
- Performance fixes, other bug fixes, better documentation and more...

The link to all fixed JIRA issues and a brief summary can be found at:
https://github.com/apache/avro/releases/tag/release-1.11.2

In addition, language-specific release artifacts are available:

* C#: https://www.nuget.org/packages/Apache.Avro/1.11.2
* Java: https://repo1.maven.org/maven2/org/apache/avro/avro/1.11.2/
* Javascript: https://www.npmjs.com/package/avro-js/v/1.11.2
* Perl: https://metacpan.org/release/Avro
* Python 3: https://pypi.org/project/avro/1.11.2
* Ruby: https://rubygems.org/gems/avro/versions/1.11.2
* Rust: https://crates.io/crates/apache-avro/0.15.0

**Important**: a known issue has been discovered after the release that may
affect the Java SDK when using the MAP type.

- AVRO-3789 [Java]: Problem when comparing empty MAP types.

Thanks to everyone for contributing!


CFP for the 2nd Performance Engineering track at Community over Code NA 2023

2023-07-03 Thread Brebner, Paul
Hi Apache Avro people - There are only 10 days left to submit a talk proposal 
(title and abstract only) for Community over Code NA 2023! The 2nd Performance 
Engineering track is on this year so any Apache project-related performance and 
scalability talks are welcome, here's the track CFP for more ideas and links 
including the official Apache submission page: 
https://www.linkedin.com/pulse/call-papers-2nd-performance-engineering-track-over-code-brebner/
  - Paul Brebner and Roger Abelenda



TAC Applications for Community Over Code North America and Asia now open

2023-06-16 Thread Gavin McDonald
Hi All,

(This email goes out to all our user and dev project mailing lists, so you
may receive this
email more than once.)

The Travel Assistance Committee has opened up applications to help get
people to the following events:


*Community Over Code Asia 2023 - *
*August 18th to August 20th in Beijing , China*

Applications for this event close on the 6th of July, so time is short; please
apply as soon as possible. TAC is prioritising applications from the Asia
and Oceania regions.

More details on this event can be found at:
https://apachecon.com/acasia2023/

More information on how to apply please read: https://tac.apache.org/


*Community Over Code North America - *
*October 7th to October 10th in Halifax, Canada*

Applications for this event close on the 22nd of July. We expect many
applications so please do apply as soon as you can. TAC is prioritising
applications from the North and South America regions.

More details on this event can be found at: https://communityovercode.org/

More information on how to apply please read: https://tac.apache.org/


*Have you applied to be a Speaker?*

If you have applied or intend to apply as a Speaker at either of these
events, and think you
may require assistance for Travel and/or Accommodation - TAC advises that
you do not
wait until you have been notified of your speaker status and to apply
early. Should you
not be accepted as a speaker and still wish to attend, you can amend your
application to
include Conference fees, or, you may withdraw your application.

The call for presentations for Halifax is here:
https://communityovercode.org/call-for-presentations/
and you have until the 13th of July to apply.

The call for presentations for Beijing is here:
https://apachecon.com/acasia2023/cfp.html
and you have until the 18th June to apply.

*IMPORTANT Note on Visas:*

It is important that you apply for a Visa as soon as possible - do not wait
until you know if you have been accepted for Travel Assistance or not, as
due to current wait times for Interviews in some Countries, waiting that
long may be too late, so please do apply for a Visa right away. Contact
tac-ap...@tac.apache.org if you need any more information or assistance in
this area.

*Spread the Word!!*

TAC encourages you to spread the word about Travel Assistance to get to
these events, so feel free to repost as you see fit on Social Media, at
work, schools, universities etc etc...

Thank You and hope to see you all soon

Gavin McDonald on behalf of the ASF Travel Assistance Committee.


configure Avro to use a different hashing algorithm

2023-06-08 Thread Abhishek Dasgupta
Hi Avro experts,
We are using "avro-1.10.2" on Python 3.
So it seems that the MD5 hash is baked into the Avro protocol standard.
The problem is that the Avro protocol library we use to communicate
between  server and agent is using MD5 hash, which causes a security
violation exception when run on python3+Redhat+FIPS.

Please suggest whether there is a way to use a different algorithm, by
passing some parameters or hooks to Avro.

Currently, we are trying to modify Avro's Python code, which is not good and
is causing more problems.

Thanks,


Re: enum schema evolution

2023-05-05 Thread Brennan Vincent
https://umanwizard.com/avro.rst

Read the section called "Is it always possible to tell, given two schemas,
whether they are compatible?". The example in that section uses unions, but your
example is similar. The schemas are "compatible" in the sense that it is
possible to resolve one against the other; however, _at runtime_ an error might
be signaled if the writer actually happens to write the enum value that is
missing in the reader.

This is not strictly a "bug", but I think Avro is missing some nuance in how
schema compatibility is described. Rather than just "compatible" vs. "not
compatible", it should actually distinguish "definitely not compatible",
"compatible depending on what data is observed at runtime", and "definitely
compatible".

On 2023-05-05 12:54, KV 59 wrote:
> It works both ways in compatibility validation. I have used the Avro library
> for the compatibility check:
> SchemaValidator validator = builder.mutualReadStrategy().validateAll();
>
> The SchemaValidator says it is valid with the first schema as reader and the
> second schema as writer, and vice versa.
>
> Obviously this should not be the case, since symbol "S7" is not present in
> schema 1 and also has no default value defined.
> 
> 
> 
> On Fri, May 5, 2023 at 7:37 AM Brennan Vincent  > wrote:
> 
> Which one is the writer and which is the reader?
> 
> Sent from my iPhone
> 
>> On May 4, 2023, at 22:47, KV 59 > > wrote:
>>
>> 
>> Hi,
>>
>> I see that the Java Avro compatibility check doesn't work as per the
>> specification for enum schema evolution. I have the following schema:
>>
>> {
>>   "type" : "record",
>>   "name" : "TestEnumRec",
>>   "namespace" : "com.five9.avro.enum.test",
>>   "fields" : [ {
>>     "name" : "enumType",
>>     "type" : {
>>       "type" : "enum",
>>       "name" : "EnumType",
>>       "symbols" : [ "S1", "S2", "S3", "S4", "S5", "S6" ]
>>     }
>>   } ]
>> }
>>
>>
>> And another version of the same schema 
>>
>> {
>>   "type" : "record",
>>   "name" : "TestEnumRec",
>>   "namespace" : "com.five9.avro.enum.test",
>>   "fields" : [ {
>>     "name" : "enumType",
>>     "type" : {
>>       "type" : "enum",
>>       "name" : "EnumType",
>>       "symbols" : [ "S1", "S2", "S3", "S4", "S5", "S6", "S7" ]
>>     }
>>   } ]
>> }
>>
>>
>>  These schemas show as compatible for mutual read
>>
>> This is not in line with what the specification says
>>
>>  *
>>
>> if both are enums: if the writer’s symbol is not present in the
>> reader’s enum and the reader has a default value, then that value
>> is used, otherwise an error is signalled.
>>
>>
>> I have tried this in Avro 1.9.1 and 1.11.1. Is this a bug?  If not, what
>> am I doing wrong?
>>
>> Appreciate responses
>>
>> Regards,
>> Kishore
> 



Re: enum schema evolution

2023-05-05 Thread KV 59
It works both ways in compatibility validation. I have used the Avro
library for the compatibility check:
SchemaValidator validator = builder.mutualReadStrategy().validateAll();

The SchemaValidator says it is valid with the first schema as reader and the
second schema as writer, and vice versa.

Obviously this should not be the case, since symbol "S7" is not present in
schema 1 and also has no default value defined.



On Fri, May 5, 2023 at 7:37 AM Brennan Vincent 
wrote:

> Which one is the writer and which is the reader?
>
> Sent from my iPhone
>
> On May 4, 2023, at 22:47, KV 59  wrote:
>
> 
> Hi,
>
> I see that the Java Avro compatibility check doesn't work as per the
> specification for enum schema evolution. I have the following schema:
>
> {
>>   "type" : "record",
>>   "name" : "TestEnumRec",
>>   "namespace" : "com.five9.avro.enum.test",
>>   "fields" : [ {
>> "name" : "enumType",
>> "type" : {
>>   "type" : "enum",
>>   "name" : "EnumType",
>>   "symbols" : [ "S1", "S2", "S3", "S4", "S5", "S6" ]
>> }
>>   } ]
>> }
>
>
> And another version of the same schema
>
> {
>>   "type" : "record",
>>   "name" : "TestEnumRec",
>>   "namespace" : "com.five9.avro.enum.test",
>>   "fields" : [ {
>> "name" : "enumType",
>> "type" : {
>>   "type" : "enum",
>>   "name" : "EnumType",
>>   "symbols" : [ "S1", "S2", "S3", "S4", "S5", "S6", "S7" ]
>> }
>>   } ]
>> }
>
>
>  These schemas show as compatible for mutual read
>
> This is not in line with what the specification says
>
>>
>>-
>>
>>if both are enums: if the writer’s symbol is not present in the
>>reader’s enum and the reader has a default value, then that value is used,
>>otherwise an error is signalled.
>>
>>
> I have tried this in Avro 1.9.1 and 1.11.1. Is this a bug?  If not, what
> am I doing wrong?
>
> Appreciate responses
>
> Regards,
> Kishore
>
>


Re: enum schema evolution

2023-05-05 Thread Brennan Vincent
Which one is the writer and which is the reader?

Sent from my iPhone

> On May 4, 2023, at 22:47, KV 59  wrote:
> 
> 
> Hi,
> 
> I see that the Java Avro compatibility check doesn't work as per the
> specification for enum schema evolution. I have the following schema:
>> {
>>   "type" : "record",
>>   "name" : "TestEnumRec",
>>   "namespace" : "com.five9.avro.enum.test",
>>   "fields" : [ {
>> "name" : "enumType",
>> "type" : {
>>   "type" : "enum",
>>   "name" : "EnumType",
>>   "symbols" : [ "S1", "S2", "S3", "S4", "S5", "S6" ]
>> }
>>   } ]
>> }
> 
> And another version of the same schema 
> 
>> {
>>   "type" : "record",
>>   "name" : "TestEnumRec",
>>   "namespace" : "com.five9.avro.enum.test",
>>   "fields" : [ {
>> "name" : "enumType",
>> "type" : {
>>   "type" : "enum",
>>   "name" : "EnumType",
>>   "symbols" : [ "S1", "S2", "S3", "S4", "S5", "S6", "S7" ]
>> }
>>   } ]
>> }
> 
>  These schemas show as compatible for mutual read
> 
> This is not in line with what the specification says
>> if both are enums: if the writer’s symbol is not present in the reader’s 
>> enum and the reader has a default value, then that value is used, otherwise 
>> an error is signalled.
>> 
> 
> I have tried this in Avro 1.9.1 and 1.11.1. Is this a bug?  If not, what am I 
> doing wrong?
> 
> Appreciate responses
> 
> Regards,
> Kishore


enum schema evolution

2023-05-04 Thread KV 59
Hi,

I see that the Java Avro compatibility check doesn't work as per the
specification for enum schema evolution. I have the following schema:

{
>   "type" : "record",
>   "name" : "TestEnumRec",
>   "namespace" : "com.five9.avro.enum.test",
>   "fields" : [ {
> "name" : "enumType",
> "type" : {
>   "type" : "enum",
>   "name" : "EnumType",
>   "symbols" : [ "S1", "S2", "S3", "S4", "S5", "S6" ]
> }
>   } ]
> }


And another version of the same schema

{
>   "type" : "record",
>   "name" : "TestEnumRec",
>   "namespace" : "com.five9.avro.enum.test",
>   "fields" : [ {
> "name" : "enumType",
> "type" : {
>   "type" : "enum",
>   "name" : "EnumType",
>   "symbols" : [ "S1", "S2", "S3", "S4", "S5", "S6", "S7" ]
> }
>   } ]
> }


 These schemas show as compatible for mutual read

This is not in line with what the specification says

>
>-
>
>if both are enums: if the writer’s symbol is not present in the
>reader’s enum and the reader has a default value, then that value is used,
>otherwise an error is signalled.
>
>
I have tried this in Avro 1.9.1 and 1.11.1. Is this a bug?  If not, what am
I doing wrong?

Appreciate responses

Regards,
Kishore


Re: Subscribe to Avro users mailing list

2023-03-10 Thread Abhishek Dasgupta
Oops :) I thought I had unsubscribed from this mailing list. That's why I
wrote such a random email. I thought a bot would respond.
Seems like I was wrong.

On Fri, Mar 10, 2023 at 3:23 PM Martin Grigorov 
wrote:

> Hi,
>
> I didn't notice an email to be moderated, so I guess you are already
> subscribed.
> Feel free to start new topics/threads and ask any questions!
>
> On Fri, Mar 10, 2023 at 11:31 AM Abhishek Dasgupta <
> abhishekdasgupta...@gmail.com> wrote:
>
>> Hi Team,
>> I would like to ask questions.
>>
>


Re: Subscribe to Avro users mailing list

2023-03-10 Thread Martin Grigorov
Hi,

I didn't notice an email to be moderated, so I guess you are already
subscribed.
Feel free to start new topics/threads and ask any questions!

On Fri, Mar 10, 2023 at 11:31 AM Abhishek Dasgupta <
abhishekdasgupta...@gmail.com> wrote:

> Hi Team,
> I would like to ask questions.
>


Subscribe to Avro users mailing list

2023-03-10 Thread Abhishek Dasgupta
Hi Team,
I would like to ask questions.


Re: Ints & Bools Issue

2023-01-23 Thread Martin Grigorov
Hello David,

Please file a JIRA ticket for this problem, and attach or link to a
minimal application demonstrating it.
I assume the bug is in the C++ code, since the Java SDK is more widely used.
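
One way to narrow it down is to dump the exact bytes each side produces for
known values: Avro's binary encoding uses zigzag varints for int/long and a
single byte for boolean, so the two SDKs must agree byte-for-byte. A minimal
Java sketch:

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class DumpEncoding {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        enc.writeInt(1);        // zigzag varint -> 0x02
        enc.writeBoolean(true); // single byte   -> 0x01
        enc.flush();
        for (byte b : out.toByteArray()) {
            System.out.printf("%02x ", b); // compare with the C++ producer's bytes
        }
    }
}
```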

On Mon, Jan 23, 2023 at 5:28 AM David Funnell 
wrote:

> Hello,
>
> My coworker and I are having an issue with what seems like the decoder not
> respecting byte lengths of ints and bools.
>
> I am working in c++ and we use kafka to consume and produce.  I am the
> producer and am encoding to binary and sending a data set to a server.
>
> My coworker is working in java, and is consuming and decoding, and we are
> using the same schema.  We have had an issue where any values sent as type
> int or boolean, appear to cause any following fields to start reading from
> the wrong section in memory when decoded.
>
> We have a simple set up and have followed documentation closely, the only
> error we get is an occasional exception that is very vague, but seems to be
> something to do with the int binary decoder.  We are unsure what the issue
> is, or if there's a configuration setting with avro we have missed.  We
> have tried statically casting values before the encoding process just in
> case, but this hasn't made a difference
>
> We have overcome this by using floats for everything int or bool however
> some help in not having to do this would be appreciated.
>
> Thanks
>


Ints & Bools Issue

2023-01-22 Thread David Funnell
Hello,

My coworker and I are having an issue with what seems like the decoder not
respecting byte lengths of ints and bools.

I am working in c++ and we use kafka to consume and produce.  I am the
producer and am encoding to binary and sending a data set to a server.

My coworker is working in java, and is consuming and decoding, and we are
using the same schema.  We have had an issue where any values sent as type
int or boolean, appear to cause any following fields to start reading from
the wrong section in memory when decoded.

We have a simple set up and have followed documentation closely, the only
error we get is an occasional exception that is very vague, but seems to be
something to do with the int binary decoder.  We are unsure what the issue
is, or if there's a configuration setting with avro we have missed.  We
have tried statically casting values before the encoding process just in
case, but this hasn't made a difference.

We have overcome this by using floats for everything int or bool however
some help in not having to do this would be appreciated.

Thanks



Re: Skip Namespace while Decoding Union Types from JSON

2022-12-05 Thread Oscar Westra van Holthe - Kind
Hi Chirag,

Please note that JSON-encoded Avro is NOT the same as plain JSON. It's a
special JSON dialect that is generally not compatible with regular JSON
data.

Sad news: there exists no plain-JSON parser that yields Avro records. JSON
lacks the context information that Avro union parsing needs.
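
One workaround, if you control the pipeline, is to pre-process the plain JSON
into the Avro JSON dialect by qualifying the union branch key before
decoding. A sketch using Jackson (the names come from the schemas below):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class QualifyUnionBranch {
    public static String qualify(String json) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        ObjectNode root = (ObjectNode) mapper.readTree(json);
        // Rename the bare branch key to the fully qualified record name.
        if (root.has("Request")) {
            root.set("com.sample.Request", root.remove("Request"));
        }
        return mapper.writeValueAsString(root);
    }
}
```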

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

Op ma 5 dec. 2022 12:47 schreef Chirag Dewan :

> Hello,
>
> My system is receiving JSON data serialized from JSON schemas. I am trying
> to represent the minProperties and maxProperties constraints from JSON
> Schema in Avro using union types. But the problem is, the Avro JSON decoder
> expects the union branch to be present in the data, and it searches for the
> union type by fully qualified name, i.e. namespace and type name.
>
> Unfortunately, this is not how my data is encoded.
>
> My input JSON is encoded like this:
>
> {"Request": {
> 
>   }
> }
>
> And I represent it as:
>
> {
> "name": "Schemas",
> "namespace": "com.sample",
> "type": ["null", {
> "type": "record"
> "name": "Request"
> ...
>
> }
> }
>
> So AVRO JSON decoder expects the following:
>
> {"com.sample.Request": {
> 
>   }
> }
>
> One way I thought I could solve this is by using blank namespaces, but that
> messes up my Java class generation in the default package.
>
>
> Any way around this? Any help is appreciated.
>
> Thank you.
>
>
>
>


Skip Namespace while Decoding Union Types from JSON

2022-12-05 Thread Chirag Dewan
Hello,

My system is receiving JSON data serialized from JSON schemas. I am trying
to represent the minProperties and maxProperties constraints from JSON
Schema in Avro using union types. But the problem is, the Avro JSON decoder
expects the union branch to be present in the data, and it searches for the
union type by fully qualified name, i.e. namespace and type name.

Unfortunately, this is not how my data is encoded.

My input JSON is encoded like this:

{"Request": {

  }
}

And I represent it as:

{
"name": "Schemas",
"namespace": "com.sample",
"type": ["null", {
"type": "record"
"name": "Request"
...

}
}

So AVRO JSON decoder expects the following:

{"com.sample.Request": {

  }
}

One way I thought I could solve this is by using blank namespaces, but
that messes up my Java class generation in the default package.


Any way around this? Any help is appreciated.

Thank you.


Re: Modifying a field's schema property in Java

2022-11-23 Thread Ryan Skraba
Thanks Oscar!

Julien (or anyone else) -- do you think it would be useful to have a
category of "Schema" objects that are mutable for the Java SDK?

Something like:

MutableSchema ms = originalSchema.unlock();
ms.getField("quantity").setProperty("precision", 5);
ms.getField("dept").setFieldName("department_id");
ms.getField("department_id").setType(Schema.Type.LONG);
Schema modifiedSchema = ms.lock();

This would be a major change to the Java SDK, but in the past, we've
used a lot of "ad hoc" or dynamic, transient schemas and making
changes has always been a pain point!

All my best, Ryan

On Sun, Nov 13, 2022 at 8:19 AM Oscar Westra van Holthe - Kind
 wrote:
>
> On Sun, 13 Nov 2022 at 05:34, Julien Phalip wrote:
>>
>> I've got a schema with multiple levels of depths (sub-records) that I would 
>> need to change slightly. [...]
>>
>> Is there a way to make this type of modification on an existing schema, or 
>> do you have to recreate the entire schema from scratch?
>
>
> After creation, Avro schemata are immutable. To make such modifications you 
> can use a visitor. There already is some code available to help you along: 
> you can find an example in the module avro-compiler, that replaces references 
> to named schemata with the actual schema.
>
> IIRC, you're looking for the Schemas class. The interface you need to 
> implement has the word 'visitor' in the name.
>
> Kind regards,
> Oscar
>
> --
> Oscar Westra van Holthe - Kind 


Re: Modifying a field's schema property in Java

2022-11-12 Thread Oscar Westra van Holthe - Kind
On Sun, 13 Nov 2022 at 05:34, Julien Phalip wrote:

> I've got a schema with multiple levels of depths (sub-records) that I
> would need to change slightly. [...]
>
> Is there a way to make this type of modification on an existing schema, or
> do you have to recreate the entire schema from scratch?
>

After creation, Avro schemata are immutable. To make such modifications you
can use a visitor. There already is some code available to help you along:
you can find an example in the module avro-compiler, that replaces
references to named schemata with the actual schema.

IIRC, you're looking for the Schemas class. The interface you need to
implement has the word 'visitor' in the name.
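
If the visitor feels heavyweight, the rebuild can also be done by hand, since
schemas are cheap to re-create. A sketch for a flat record (for a decimal
field, precision/scale live on the field's type schema, so the whole type is
replaced; nested records need the same treatment applied recursively):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

public class RebuildSchema {
    static Schema withNewDecimal(Schema record, String fieldName, int precision, int scale) {
        Schema newType = LogicalTypes.decimal(precision, scale)
            .addToSchema(Schema.create(Schema.Type.BYTES));
        List<Schema.Field> fields = new ArrayList<>();
        for (Schema.Field f : record.getFields()) {
            Schema type = f.name().equals(fieldName) ? newType : f.schema();
            // Field objects are single-use and must be re-created for the new schema.
            fields.add(new Schema.Field(f.name(), type, f.doc(), f.defaultVal()));
        }
        return Schema.createRecord(record.getName(), record.getDoc(),
            record.getNamespace(), record.isError(), fields);
    }
}
```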

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

>


Modifying a field's schema property in Java

2022-11-12 Thread Julien Phalip
Hi,

I've got a schema with multiple levels of depths (sub-records) that I would
need to change slightly. In particular, I need to change the "precision"
and "scale" properties on a decimal field.

My problem is that apparently in Java a field's properties are immutable.
If you call the "addProp()" method for a property that already exists, you
get an exception.

Is there a way to make this type of modification on an existing schema, or
do you have to recreate the entire schema from scratch?

Thank you,

Julien


Re: Rust Serde Deserialize

2022-08-30 Thread Martin Grigorov
https://github.com/lerouxrgd/rsgen-avro/pull/36

On Tue, Aug 30, 2022 at 9:07 AM Martin Grigorov 
wrote:

> Hi Rajiv,
>
> Now, when you said it I remembered about this prerequisite for serializing
> bytes.
> I will open a PR to rsgen-avro to add #[serde(with = "serde_bytes")] when
> "bytes" schema is used!
>
> Regards,
> Martin
>
> On Mon, Aug 29, 2022 at 5:10 PM Rajiv M Ranganath <
> rajiv.rangan...@gmail.com> wrote:
>
>> Hi Martin,
>>
>> It looks like there are no changes needed on the Avro side. I was not
>> using Serde correctly.
>>
>> On the Rust side, we need to use `serde_bytes` [1] crate, and define our
>> Rust struct as follows.
>>
>> ```
>> #[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize,
>> serde::Serialize)]
>> #[serde(default)]
>> pub struct Abcd {
>> #[serde(with = "serde_bytes")]
> pub b: Option<Vec<u8>>,
>> }
>> ```
>>
>> That seems to make things work.
>>
>> Thanks again for the reply.
>>
>> Best,
>> Rajiv
>>
>> [1] https://docs.rs/serde_bytes/latest/serde_bytes/
>>
>> On Mon, Aug 29, 2022 at 5:18 PM Rajiv M Ranganath
>>  wrote:
>> >
>> > On Mon, Aug 29, 2022 at 3:18 PM Martin Grigorov 
>> wrote:
>> >
>> > [...]
>> >
>> > > Do you want to contribute the code as a failing unit test in a Pull
>> > > Request ?
>> > > With a fix would be awesome!
>> >
>> > Okay. Please give me a couple of days. I'll investigate and open a PR.
>> >
>> > Best,
>> > Rajiv
>>
>


Re: Rust Serde Deserialize

2022-08-30 Thread Martin Grigorov
Hi Rajiv,

Now that you mention it, I remembered this prerequisite for serializing
bytes.
I will open a PR to rsgen-avro to add #[serde(with = "serde_bytes")] when
"bytes" schema is used!

Regards,
Martin

On Mon, Aug 29, 2022 at 5:10 PM Rajiv M Ranganath 
wrote:

> Hi Martin,
>
> It looks like there are no changes needed on the Avro side. I was not
> using Serde correctly.
>
> On the Rust side, we need to use `serde_bytes` [1] crate, and define our
> Rust struct as follows.
>
> ```
> #[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize,
> serde::Serialize)]
> #[serde(default)]
> pub struct Abcd {
> #[serde(with = "serde_bytes")]
> pub b: Option<Vec<u8>>,
> }
> ```
>
> That seems to make things work.
>
> Thanks again for the reply.
>
> Best,
> Rajiv
>
> [1] https://docs.rs/serde_bytes/latest/serde_bytes/
>
> On Mon, Aug 29, 2022 at 5:18 PM Rajiv M Ranganath
>  wrote:
> >
> > On Mon, Aug 29, 2022 at 3:18 PM Martin Grigorov 
> wrote:
> >
> > [...]
> >
> > > Do you want to contribute the code as a failing unit test in a Pull
> > > Request ?
> > > With a fix would be awesome!
> >
> > Okay. Please give me a couple of days. I'll investigate and open a PR.
> >
> > Best,
> > Rajiv
>


Re: Rust Serde Deserialize

2022-08-29 Thread Rajiv M Ranganath
Hi Martin,

It looks like there are no changes needed on the Avro side. I was not
using Serde correctly.

On the Rust side, we need to use `serde_bytes` [1] crate, and define our
Rust struct as follows.

```
#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
#[serde(default)]
pub struct Abcd {
#[serde(with = "serde_bytes")]
pub b: Option<Vec<u8>>,
}
```

That seems to make things work.

Thanks again for the reply.

Best,
Rajiv

[1] https://docs.rs/serde_bytes/latest/serde_bytes/

On Mon, Aug 29, 2022 at 5:18 PM Rajiv M Ranganath
 wrote:
>
> On Mon, Aug 29, 2022 at 3:18 PM Martin Grigorov  wrote:
>
> [...]
>
> > Do you want to contribute the code as a failing unit test in a Pull
> > Request ?
> > With a fix would be awesome!
>
> Okay. Please give me a couple of days. I'll investigate and open a PR.
>
> Best,
> Rajiv


Re: Rust Serde Deserialize

2022-08-29 Thread Rajiv M Ranganath
On Mon, Aug 29, 2022 at 3:18 PM Martin Grigorov  wrote:

[...]

> Do you want to contribute the code as a failing unit test in a Pull
> Request ?
> With a fix would be awesome!

Okay. Please give me a couple of days. I'll investigate and open a PR.

Best,
Rajiv


Re: Rust Serde Deserialize

2022-08-29 Thread Martin Grigorov
Hi Rajiv,

There are 0 known issues for the Rust SDK! :-)

Do you want to contribute the code as a failing unit test in a Pull Request
?
With a fix would be awesome!

Thanks!

Martin

On Mon, Aug 29, 2022 at 12:11 PM Rajiv M Ranganath <
rajiv.rangan...@gmail.com> wrote:

> Hi Martin,
>
> On Mon, Aug 29, 2022 at 1:43 PM Martin Grigorov 
> wrote:
>
> [...]
>
> > I'd recommend you this nice tool for generating Rust structs from Avro
> > schema: https://github.com/lerouxrgd/rsgen-avro
>
> Thanks for the reply and the pointer to `rsgen-avro`. :-)
>
> It looks like there is an issue in the way Serde is serializing
> the struct.
>
> Here is the example:
>
> ```main.rs
> use apache_avro::types::{Record, Value};
> use apache_avro::Schema;
>
> #[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize,
> serde::Serialize)]
> #[serde(default)]
> pub struct Abcd {
> pub b: Option<Vec<u8>>,
> }
>
> #[inline(always)]
> fn default_abcd_b() -> Option<Vec<u8>> {
> None
> }
>
> impl Default for Abcd {
> fn default() -> Abcd {
> Abcd {
> b: default_abcd_b(),
> }
> }
> }
>
> fn main() {
> let writers_schema = Schema::parse_str(
> r#"
> {
>   "type": "record",
>   "name": "Abcd",
>   "fields": [
> {"name": "b", "type": ["null", "bytes"], "default": null}
>   ]
> }
> "#,
> )
> .unwrap();
>
> let mut abcd_manual = Record::new(&writers_schema).unwrap();
> abcd_manual.put(
> "b",
> Value::Union(1,
> Box::new(Value::Bytes("hello_world".as_bytes().to_vec()))),
> );
>
> println!("{:?}", abcd_manual);
>
> let abcd_manual_bytes =
> apache_avro::to_avro_datum(&writers_schema, abcd_manual).unwrap();
> println!("{:?}", abcd_manual_bytes);
>
> let abcd_serde_value = apache_avro::to_value(Abcd {
> b: Some("hello_world".as_bytes().to_vec()),
> }).unwrap();
>
> println!("{:?}", abcd_serde_value);
>
> let abcd_serde_bytes = apache_avro::to_avro_datum(&writers_schema,
> abcd_serde_value);
>
> println!("{:?}", abcd_serde_bytes);
> }
> ```
>
> Rather than creating an Avro Value of the form,
>
> ```
> Record { fields: [("b", Union(1, Bytes([104, 101, 108, 108, 111, 95,
> 119, 111, 114, 108, 100])))], schema_lookup: {"b": 0} }
> ```
>
> Serde seems to be generating an Avro Value,
>
> ```
> Record([("b", Union(1, Array([Int(104), Int(101), Int(108), Int(108),
> Int(111), Int(95), Int(119), Int(111), Int(114), Int(108),
> Int(100)])))])
> ```
>
> which is causing the subsequent conversion to bytes to fail.
>
> I was wondering if this is a known issue?
>
> Best,
> Rajiv
>


Re: Rust Serde Deserialize

2022-08-29 Thread Rajiv M Ranganath
Hi Martin,

On Mon, Aug 29, 2022 at 1:43 PM Martin Grigorov  wrote:

[...]

> I'd recommend you this nice tool for generating Rust structs from Avro
> schema: https://github.com/lerouxrgd/rsgen-avro

Thanks for the reply and the pointer to `rsgen-avro`. :-)

It looks like there is an issue in the way Serde is serializing
the struct.

Here is the example:

```main.rs
use apache_avro::types::{Record, Value};
use apache_avro::Schema;

#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
#[serde(default)]
pub struct Abcd {
pub b: Option<Vec<u8>>,
}

#[inline(always)]
fn default_abcd_b() -> Option<Vec<u8>> {
None
}

impl Default for Abcd {
fn default() -> Abcd {
Abcd {
b: default_abcd_b(),
}
}
}

fn main() {
let writers_schema = Schema::parse_str(
r#"
{
  "type": "record",
  "name": "Abcd",
  "fields": [
{"name": "b", "type": ["null", "bytes"], "default": null}
  ]
}
"#,
)
.unwrap();

let mut abcd_manual = Record::new(&writers_schema).unwrap();
abcd_manual.put(
"b",
Value::Union(1,
Box::new(Value::Bytes("hello_world".as_bytes().to_vec()))),
);

println!("{:?}", abcd_manual);

let abcd_manual_bytes =
apache_avro::to_avro_datum(&writers_schema, abcd_manual).unwrap();
println!("{:?}", abcd_manual_bytes);

let abcd_serde_value = apache_avro::to_value(Abcd {
b: Some("hello_world".as_bytes().to_vec()),
}).unwrap();

println!("{:?}", abcd_serde_value);

let abcd_serde_bytes = apache_avro::to_avro_datum(&writers_schema,
abcd_serde_value);

println!("{:?}", abcd_serde_bytes);
}
```

Rather than creating an Avro Value of the form,

```
Record { fields: [("b", Union(1, Bytes([104, 101, 108, 108, 111, 95,
119, 111, 114, 108, 100])))], schema_lookup: {"b": 0} }
```

Serde seems to be generating an Avro Value,

```
Record([("b", Union(1, Array([Int(104), Int(101), Int(108), Int(108),
Int(111), Int(95), Int(119), Int(111), Int(114), Int(108),
Int(100)])))])
```

which is causing the subsequent conversion to bytes to fail.

I was wondering if this is a known issue?

Best,
Rajiv
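
Putting Rajiv's fix and test program together, a minimal sketch of the corrected struct and round trip, assuming the apache-avro 0.14 API used above and the serde_bytes crate as a dependency:

```rust
use apache_avro::Schema;

#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
#[serde(default)]
pub struct Abcd {
    // serde_bytes makes serde treat Vec<u8> as bytes instead of a sequence
    // of integers, so apache_avro produces Value::Bytes, not Value::Array.
    #[serde(with = "serde_bytes")]
    pub b: Option<Vec<u8>>,
}

impl Default for Abcd {
    fn default() -> Abcd {
        Abcd { b: None }
    }
}

fn main() {
    let writers_schema = Schema::parse_str(
        r#"
        {
          "type": "record",
          "name": "Abcd",
          "fields": [
            {"name": "b", "type": ["null", "bytes"], "default": null}
          ]
        }
        "#,
    )
    .unwrap();

    let value = apache_avro::to_value(Abcd {
        b: Some("hello_world".as_bytes().to_vec()),
    })
    .unwrap();

    // With serde_bytes in place, this conversion now succeeds.
    let encoded = apache_avro::to_avro_datum(&writers_schema, value).unwrap();
    println!("{:?}", encoded);
}
```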


Re: Rust Serde Deserialize

2022-08-29 Thread Martin Grigorov
Hi Rajiv,

I'd recommend you this nice tool for generating Rust structs from Avro
schema: https://github.com/lerouxrgd/rsgen-avro

$ cat q.avsc
{
  "type": "record",
  "name": "Abcd",
  "fields": [
    {"name": "b", "type": ["null", "bytes"], "default": null}
  ]
}

generates:

cat q.rs
#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
#[serde(default)]
pub struct Abcd {
    pub b: Option<Vec<u8>>,
}

#[inline(always)]
fn default_abcd_b() -> Option<Vec<u8>> { None }

impl Default for Abcd {
    fn default() -> Abcd {
        Abcd {
            b: default_abcd_b(),
        }
    }
}


On Mon, Aug 29, 2022 at 10:05 AM Rajiv M Ranganath <
rajiv.rangan...@gmail.com> wrote:

> Hi,
>
> I am new to Avro. When I have an Avro schema of the form,
>
> ```
> record Abcd {
> union { null, bytes } efgh = null;
> }
> ```
>
> What would be the corresponding Rust struct?
>
> I tried
>
> ```
> #[derive(Debug, Deserialize, PartialEq)]
> struct Abcd {
> efgh: Option<Vec<u8>>,
> }
> ```
>
> But for some reason, `apache_avro::from_value` is giving an
> `Err(DeserializeValue("not an array"))` error.
>
> Best,
> Rajiv
>


Rust Serde Deserialize

2022-08-29 Thread Rajiv M Ranganath
Hi,

I am new to Avro. When I have an Avro schema of the form,

```
record Abcd {
union { null, bytes } efgh = null;
}
```

What would be the corresponding Rust struct?

I tried

```
#[derive(Debug, Deserialize, PartialEq)]
struct Abcd {
efgh: Option<Vec<u8>>,
}
```

But for some reason, `apache_avro::from_value` is giving an
`Err(DeserializeValue("not an array"))` error.

Best,
Rajiv


[ANNOUNCE] Apache Avro 1.11.1 released

2022-08-08 Thread Ryan Skraba
The Apache Avro community is pleased to announce the release of Avro 1.11.1!

All signed release artifacts, signatures and verification instructions can
be found here: https://avro.apache.org/releases.html

This release includes ~250 Jira issues. Some interesting highlights:

Avro specification
- [AVRO-3436] Clarify which names are allowed to be qualified with namespaces
- [AVRO-3370] Inconsistent behaviour on types as invalid names
- [AVRO-3275] Clarify how fullnames are created, with example
- [AVRO-3257] IDL: add syntax to create optional fields
- [AVRO-2019] Improve docs for logical type annotation

C++
- [AVRO-2722] Use of boost::mt19937 is not thread safe

C#
- [AVRO-3383] Many completed subtasks for modernizing C# coding style
- [AVRO-3481] Input and output variable type mismatch
- [AVRO-3475] Enforce time-millis and time-micros specification
- [AVRO-3469] Build and test using .NET SDK 7.0
- [AVRO-3468] Default values for logical types not supported
- [AVRO-3467] Use oracle-actions to test with Early Access JDKs
- [AVRO-3453] Avrogen Add Generated Code Attribute
- [AVRO-3432] Add command line option to skip creation of directories
based on namespace path
- [AVRO-3411] Add Visual Studio Code Devcontainer support
- [AVRO-3388] Implement extra codecs for C# as separate nuget packages
- [AVRO-3265] avrogen generates uncompilable code when namespace ends
with ".Avro"
- [AVRO-3219] Support nullable enum type fields

Java
- [AVRO-3531] GenericDatumReader in multithread lead to infinite loop
- [AVRO-3482] Reuse MAGIC in DataFileReader
- [AVRO-3586] Make Avro Build Reproducible
- [AVRO-3441] Automatically register LogicalTypeFactory classes
- [AVRO-3375] Add union branch, array index and map key "path"
information to serialization errors
- [AVRO-3374] Fully qualified type reference "ns.int" loses namespace
- [AVRO-3294] IDL parsing allows doc comments in strange places
- [AVRO-3273] avro-maven-plugin breaks on old versions of Maven
- [AVRO-3266] Output stream incompatible with MagicS3GuardCommitter
- [AVRO-3243] Lock conflicts when using computeIfAbsent
- [AVRO-3120] Support Next Java LTS (Java 17)
- [AVRO-2498] UUID generation is not working

Javascript
- [AVRO-3489] Replace istanbul with nyc for code coverage
- [AVRO-3322] Buffer is not defined in browser environment
- [AVRO-3084] Fix JavaScript interop test to read files generated by
other languages on CI

Perl
- [AVRO-3263] Schema validation warning on invalid schema with a long field

Python
- [AVRO-3542] Scale assignment optimization
- [AVRO-3521] "Scale" property from decimal object
- [AVRO-3380] Byte reading in avro.io does not assert read bytes to
requested nbytes
- [AVRO-3229] validate the default value of an enum field
- [AVRO-3218] Pass LogicalType to BytesDecimalSchema

Ruby
- [AVRO-3277] Test against Ruby 3.1

Rust
- [AVRO-3558] Add a demo crate that shows usage as WebAssembly
- [AVRO-3526] Improve resolving Bytes and Fixed from string
- [AVRO-3506] Implement Single Object Writer
- [AVRO-3507] Implement Single Object Reader
- [AVRO-3405] Add API for user-provided metadata to file
- [AVRO-3339] Rename crate from avro-rs to apache-avro
- [AVRO-3479] Derive Avro Schema macro

Website
- [AVRO-2175] Website refactor
- [AVRO-3450] Document IDL support in IDEs

This is the first release that provides the Rust apache-avro crate at crates.io!

And of course upgraded dependencies to latest versions, CVE fixes and more
https://issues.apache.org/jira/issues/?jql=project%20%3D%20AVRO%20AND%20fixVersion%20%3D%201.11.1

The link to all fixed JIRA issues and a brief summary can be found at:
https://github.com/apache/avro/releases/tag/release-1.11.1

In addition, language-specific release artifacts are available:

* C#: https://www.nuget.org/packages/Apache.Avro/1.11.1
* Java: from Maven Central,
* Javascript: https://www.npmjs.com/package/avro-js/v/1.11.1
* Perl: https://metacpan.org/release/Avro
* Python 3: https://pypi.org/project/avro/1.11.1/
* Ruby: https://rubygems.org/gems/avro/versions/1.11.1
* Rust: https://crates.io/crates/apache-avro/0.14.0

Thanks to everyone for contributing!


Re: GenericDatumReader writer's schema question

2022-08-05 Thread Ivan Tsyba
Hello Oscar,
Yes, I've looked inside DataFileReader and now it's clear for me
Thank you

On Fri, 22 Jul 2022 at 12:46, Oscar Westra van Holthe - Kind <
os...@westravanholthe.nl> wrote:

> Hi Ivan,
>
> You're correct about the GenericDatumReader javadoc, but the writer
> schema can be adjusted after creation. This is what the DataFileReader
> does.
>
> So after the DataFileReader is initialised, the underlying
> GenericDatumReader uses the schema in the file as the writer schema (to
> understand the data), and the schema you provided as the reader schema (to give
> data to you via dataFileReader.next(user)).
>
> Does that clarify things for you?
>
>
> Kind regards,
> Oscar
>
>
> On Wed, 20 Jul 2022 at 10:37, Ivan Tsyba  wrote:
>
>> Hello
>>
>> As stated in Avro Getting Started
>>  
>> about
>> deserialization without code generation: "The data will be read using the
>> writer's schema included in the file, and the reader's schema provided to
>> the GenericDatumReader". Here is how GenericDatumReader is created in the
>> example
>>
>> DatumReader<GenericRecord> datumReader = new
>> GenericDatumReader<GenericRecord>(schema);
>>
>> But when you look at this GenericDatumReader constructor Javadoc it
>> states "Construct where the writer's and reader's schemas are the same."
>> (and actual code corresponds to this).
>>
>> So the writer's schema isn’t taken from a serialized file but from a
>> constructor parameter?
>>
>
>
> --
>
> ✉️ Oscar Westra van Holthe - Kind 
>
>


Re: GenericDatumReader writer's schema question

2022-07-22 Thread Oscar Westra van Holthe - Kind
Hi Ivan,

You're correct about the GenericDatumReader javadoc, but the writer schema
can be adjusted after creation. This is what the DataFileReader does.

So after the DataFileReader is initialised, the underlying
GenericDatumReader uses the schema in the file as the writer schema (to
understand the data), and the schema you provided as the reader schema (to give
data to you via dataFileReader.next(user)).

Does that clarify things for you?


Kind regards,
Oscar


On Wed, 20 Jul 2022 at 10:37, Ivan Tsyba  wrote:

> Hello
>
> As stated in Avro Getting Started
>  
> about
> deserialization without code generation: "The data will be read using the
> writer's schema included in the file, and the reader's schema provided to
> the GenericDatumReader". Here is how GenericDatumReader is created in the
> example
>
> DatumReader<GenericRecord> datumReader = new
> GenericDatumReader<GenericRecord>(schema);
>
> But when you look at this GenericDatumReader constructor Javadoc it states
> "Construct where the writer's and reader's schemas are the same." (and
> actual code corresponds to this).
>
> So the writer's schema isn’t taken from a serialized file but from a
> constructor parameter?
>


-- 

✉️ Oscar Westra van Holthe - Kind 
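
In code, the interaction looks roughly like this (a minimal sketch against the 1.11 Java API; the file names are assumptions):

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

public class ReadUsers {
    public static void main(String[] args) throws IOException {
        Schema readerSchema = new Schema.Parser().parse(new File("user.avsc"));

        // The schema passed here becomes the reader schema; DataFileReader
        // replaces the writer schema with the one stored in the file header.
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(readerSchema);
        try (DataFileReader<GenericRecord> fileReader =
                new DataFileReader<>(new File("users.avro"), datumReader)) {
            GenericRecord user = null;
            while (fileReader.hasNext()) {
                user = fileReader.next(user); // reuses the record instance
                System.out.println(user);
            }
        }
    }
}
```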


GenericDatumReader writer's schema question

2022-07-20 Thread Ivan Tsyba
Hello

As stated in Avro Getting Started

about
deserialization without code generation: "The data will be read using the
writer's schema included in the file, and the reader's schema provided to
the GenericDatumReader". Here is how GenericDatumReader is created in the
example

DatumReader<GenericRecord> datumReader = new
GenericDatumReader<GenericRecord>(schema);

But when you look at this GenericDatumReader constructor Javadoc it states
"Construct where the writer's and reader's schemas are the same." (and
actual code corresponds to this).

So the writer's schema isn’t taken from a serialized file but from a
constructor parameter?


Re: What exception to specify in Avro RPC message so that it generates to AvroRemoteException?

2022-07-01 Thread Oscar Westra van Holthe - Kind
Hi Abhishek,

Avro 1.9.0 introduced a change in the generated protocols that removes the
throws clause for AvroRemoteException.
Since then, any undeclared exception is wrapped in an AvroRuntimeException.

Kind regards,
Oscar

On Fri, 24 Jun 2022 at 15:01, Abhishek Dasgupta <
abhishekdasgupta...@gmail.com> wrote:

> Hi Oscar,
>   Thanks for the reply. Actually I'm trying to migrate from Avro
> 1.7.6-cdh5.12.0 to 1.11.0 in my codebase.  We had these Avro IDL files that
> generated the POJOs having methods with "throws  AvroRemoteException"
> clause even though there is no "throws" clause specified in their
> corresponding IDL file's RPC message.
>
>
> Currently in avro idl files with avro version 1.7.6-cdh5.12.0:
>
> void foo(int arg); // no throws clause present
>
> so the generated Avro POJO has:
>
> void foo(int arg) throws AvroRemoteException; // AvroRemoteException 
> generated even though no throws clause was present in IDL file
>
>
> When I upgraded the avro version to 1.11.0, the generated POJOs don't
> contain the throws clause in these methods.
>
>
> After avro version upgrade to 1.11.0,  :
>
> void foo(int arg); // no throws clause present
>
> so the generated Avro POJO has:
>
> void foo(int arg); // no throws clause present after upgrade
>
>
>
>  I am unable to understand how these "throws" clauses were generated
> earlier?
>
> What should I specify in the Avro IDL file's RPC messages so that they
> throw exactly AvroRemoteException in their generated POJOs? A lot of error
> handling in my codebase is based on this exception, which results in
> compilation issues.
>
> On Thu, Jun 16, 2022 at 12:54 PM Oscar Westra van Holthe - Kind <
> os...@westravanholthe.nl> wrote:
>
>> Hi Abhishek,
>>
>> The throws something in your protocol will be compiled into a throws
>> something in Java.
>> The definition of something must be an error (not a record), which will
>> be compiled into a subclass of AvroRemoteException.
>>
>> So while your exact requirement cannot be satisfied, you will get
>> something similar.
>>
>>
>> Kind regards,
>> Oscar
>>
>>
>> On Wed, 15 Jun 2022 at 14:57, Abhishek Dasgupta <
>> abhishekdasgupta...@gmail.com> wrote:
>>
>>> I want the Avro-generated Java class methods to have throws
>>> AvroRemoteException. How do I code this in my Avro IDL file?
>>>
>>> Suppose I have this RPC message in my Avro protocol:
>>>
>>> void foo(int arg) throws something;
>>>
>>> so the generated Avro POJO has:
>>>
>>> void foo(int arg) throws AvroRemoteException
>>>
>>> What should I put instead of something?
>>>
>>> Using Avro 1.11.0
>>>
>>
>>
>> --
>>
>> ✉️ Oscar Westra van Holthe - Kind 
>>
>>

-- 

✉️ Oscar Westra van Holthe - Kind 
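
For reference, a minimal IDL sketch (protocol and error names are illustrative) that declares an error and throws it from a message; per the explanation above, the error compiles to a subclass of AvroRemoteException and the generated method declares it:

```
protocol Example {
  error SomethingWentWrong {
    string message;
  }

  void foo(int arg) throws SomethingWentWrong;
}
```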


Re: Manageable avro schema evolution in Java

2022-06-27 Thread Juan Cruz Viotti
Hi Niels,

Thanks a lot for sharing. Very interesting talk! Adding thumbs up :)

One comment around JSON Schema: in the talk you mention that JSON Schema
is still in beta given it is a draft.

While JSON Schema is a "draft" from the point of view of IETF, it is
considered production-ready and the industry-standard for describing
JSON documents already. We hope to start publishing the JSON Schema
standard outside of IETF at some point to be able to work around this
common perception problem!

We are starting to document use cases of JSON Schema in production on
YouTube:
https://www.youtube.com/playlist?list=PLHVhS4Tj1YZOrrvl7_a9LaBAtst7BWH8a.

-- 
Juan Cruz Viotti
Technical Lead @ Postman.com
https://www.jviotti.com


Manageable avro schema evolution in Java

2022-06-27 Thread Niels Basjes
Hi,

Back in 2019 I spoke at the DataWorks Summit conference about using Avro for
schema evolution in streaming scenarios.  https://youtu.be/QOdhaEHbSZM

Recently a few people asked me how to actually do this in a practical way.

To facilitate this I have created an "as clean as possible" demonstrator
project that shows how I think this can be done. Note that this is only
intended to show a _possible_ way of doing this.

https://github.com/nielsbasjes/avro-schema-example

Note that the commit history is also part of the demonstration !

I would love to hear your feedback, comments, improvement suggestions, etc.

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


[FINAL CALL] - Travel Assistance to ApacheCon New Orleans 2022

2022-06-27 Thread Gavin McDonald
 To all committers and non-committers.

This is a final call to apply for travel/hotel assistance to get to and
stay in New Orleans
for ApacheCon 2022.

Applications have been extended by one week and so the application deadline
is now the 8th July 2022.

The rest of this email is a copy of what has been sent out previously.

We will be supporting ApacheCon North America in New Orleans, Louisiana,
on October 3rd through 6th, 2022.

TAC exists to help those that would like to attend ApacheCon events, but
are unable to do so for financial reasons. This year, We are supporting
both committers and non-committers involved with projects at the
Apache Software Foundation, or open source projects in general.

For more info on this year's applications and qualifying criteria, please
visit the TAC website at http://www.apache.org/travel/
Applications have been extended until the 8th of July 2022.

Important: Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as required
to efficiently and accurately process their request), this will enable TAC
to announce successful awards shortly afterwards.

As usual, TAC expects to deal with a range of applications from a diverse
range of backgrounds. We therefore encourage (as always) anyone thinking
about sending in an application to do so ASAP.

Why should you attend as a TAC recipient? We encourage you to read stories
from past recipients at https://apache.org/travel/stories/ . Also note that
previous TAC recipients have gone on to become Committers, PMC Members, ASF
Members, Directors of the ASF Board and Infrastructure Staff members.
Others have gone from Committer to full time Open Source Developers!

How far can you go! - Let TAC help get you there.


===

Gavin McDonald on behalf of the Travel Assistance Committee.


Re: What exception to specify in Avro RPC message so that it generates to AvroRemoteException?

2022-06-24 Thread Abhishek Dasgupta
Hi Oscar,
  Thanks for the reply. Actually I'm trying to migrate from Avro
1.7.6-cdh5.12.0 to 1.11.0 in my codebase.  We had these Avro IDL files that
generated the POJOs having methods with "throws  AvroRemoteException"
clause even though there is no "throws" clause specified in their
corresponding IDL file's RPC message.


Currently in avro idl files with avro version 1.7.6-cdh5.12.0:

void foo(int arg); // no throws clause present

so the generated Avro POJO has:

void foo(int arg) throws AvroRemoteException; //
AvroRemoteException generated even though no throws clause was present
in IDL file


When I upgraded the avro version to 1.11.0, the generated POJOs don't
contain the throws clause in these methods.


After avro version upgrade to 1.11.0,  :

void foo(int arg); // no throws clause present

so the generated Avro POJO has:

void foo(int arg); // no throws clause present after upgrade



 I am unable to understand how these "throws" clauses were generated
earlier?

What should I specify in the Avro IDL file's RPC messages so that they
throw exactly AvroRemoteException in their generated POJOs? A lot of error
handling in my codebase is based on this exception, which results in
compilation issues.

On Thu, Jun 16, 2022 at 12:54 PM Oscar Westra van Holthe - Kind <
os...@westravanholthe.nl> wrote:

> Hi Abhishek,
>
> The throws something in your protocol will be compiled into a throws
> something in Java.
> The definition of something must be an error (not a record), which will
> be compiled into a subclass of AvroRemoteException.
>
> So while your exact requirement cannot be satisfied, you will get
> something similar.
>
>
> Kind regards,
> Oscar
>
>
> On Wed, 15 Jun 2022 at 14:57, Abhishek Dasgupta <
> abhishekdasgupta...@gmail.com> wrote:
>
>> I want the Avro-generated Java class methods to have throws
>> AvroRemoteException. How do I code this in my Avro IDL file?
>>
>> Suppose I have this RPC message in my Avro protocol:
>>
>> void foo(int arg) throws something;
>>
>> so the generated Avro POJO has:
>>
>> void foo(int arg) throws AvroRemoteException
>>
>> What should I put instead of something?
>>
>> Using Avro 1.11.0
>>
>
>
> --
>
> ✉️ Oscar Westra van Holthe - Kind 
>
>


Re: What exception to specify in Avro RPC message so that it generates to AvroRemoteException?

2022-06-16 Thread Oscar Westra van Holthe - Kind
Hi Abhishek,

The throws something in your protocol will be compiled into a throws
something in Java.
The definition of something must be an error (not a record), which will be
compiled into a subclass of AvroRemoteException.

So while your exact requirement cannot be satisfied, you will get something
similar.


Kind regards,
Oscar


On Wed, 15 Jun 2022 at 14:57, Abhishek Dasgupta <
abhishekdasgupta...@gmail.com> wrote:

> I want the Avro-generated Java class methods to have throws
> AvroRemoteException. How do I code this in my Avro IDL file?
>
> Suppose I have this RPC message in my Avro protocol:
>
> void foo(int arg) throws something;
>
> so the generated Avro POJO has:
>
> void foo(int arg) throws AvroRemoteException
>
> What should I put instead of something?
>
> Using Avro 1.11.0
>


-- 

✉️ Oscar Westra van Holthe - Kind 


What exception to specify in Avro RPC message so that it generates to AvroRemoteException?

2022-06-15 Thread Abhishek Dasgupta
I want the Avro-generated Java class methods to have throws
AvroRemoteException. How do I code this in my Avro IDL file?

Suppose I have this RPC message in my Avro protocol:

void foo(int arg) throws something;

so the generated Avro POJO has:

void foo(int arg) throws AvroRemoteException

What should I put instead of something?

Using Avro 1.11.0


Final reminder: ApacheCon North America call for presentations closing soon

2022-05-19 Thread Rich Bowen
[Note: You're receiving this because you are subscribed to one or more
Apache Software Foundation project mailing lists.]

This is your final reminder that the Call for Presentations for
ApacheCon North America 2022 will close at 00:01 GMT on Monday, May
23rd, 2022. Please don't wait! Get your talk proposals in now!

Details here: https://apachecon.com/acna2022/cfp.html

--Rich, for the ApacheCon Planners




Re: Unable to resolve NettyTransceiver/ NettyServer (Netty based classes) in avro-ipc 1.11.0

2022-05-19 Thread Martin Grigorov
On Wed, May 18, 2022 at 4:28 PM Abhishek Dasgupta <
abhishekdasgupta...@gmail.com> wrote:

> Hi Martin,
>  Thanks for the prompt reply but I am still unable to resolve the classes
> above. Any other suggestions?
>

https://github.com/apache/avro/blob/master/lang/java/ipc-netty/src/main/java/org/apache/avro/ipc/netty/NettyTransceiver.java
https://github.com/apache/avro/blob/master/lang/java/ipc-netty/src/main/java/org/apache/avro/ipc/netty/NettyServer.java

Both classes are in avro-ipc-netty. Maybe you didn't add the dependency
properly in your project?!



>
> On Wed, May 18, 2022 at 6:11 PM Martin Grigorov 
> wrote:
>
>> Hi,
>>
>> You need to use
>> https://search.maven.org/artifact/org.apache.avro/avro-ipc-netty/1.11.0/bundle
>> instead.
>>
>>
>> On Wed, May 18, 2022 at 2:41 PM Abhishek Dasgupta <
>> abhishekdasgupta...@gmail.com> wrote:
>>
>>>
>>>
>>>
>>> Hi,
>>>   I am solving this CVE-2021-43045 in my project. I am upgrading Avro
>>> and Avro-ipc from 1.7.6-cdh5.12.0 to 1.11.0. There are some
>>> NettyTransceiver, NettyServer calls in the project. After the upgrade, such
>>> imports become unresolvable, which results in compilation errors. From Avro
>>> 1.9.0 onwards these classes moved to the org.apache.avro.ipc.netty package. I
>>> added this to my code, but the 'netty' package is not resolved. In the External
>>> Dependencies section, avro-ipc (1.11.0) does not contain a netty folder.
>>> Kindly suggest what to do here, as it is a critical fix in my project. Here
>>> are my dependencies:
>>>
>>> <dependency>
>>>   <groupId>org.apache.avro</groupId>
>>>   <artifactId>avro</artifactId>
>>>   <version>1.11.0</version>
>>> </dependency>
>>> <dependency>
>>>   <groupId>org.apache.avro</groupId>
>>>   <artifactId>avro-ipc</artifactId>
>>>   <version>1.11.0</version>
>>> </dependency>
>>>
>>>
>>> Thanks,
>>> Abhishek
>>>
>>>
>>>
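
Concretely, the dependency from Martin's link would be declared along these lines (a sketch; the version matches the one discussed in the thread):

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-ipc-netty</artifactId>
  <version>1.11.0</version>
</dependency>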


Re: Unable to resolve NettyTransceiver/ NettyServer (Netty based classes) in avro-ipc 1.11.0

2022-05-18 Thread Abhishek Dasgupta
Hi Martin,
 Thanks for the prompt reply but I am still unable to resolve the classes
above. Any other suggestions?

On Wed, May 18, 2022 at 6:11 PM Martin Grigorov 
wrote:

> Hi,
>
> You need to use
> https://search.maven.org/artifact/org.apache.avro/avro-ipc-netty/1.11.0/bundle
> instead.
>
>
> On Wed, May 18, 2022 at 2:41 PM Abhishek Dasgupta <
> abhishekdasgupta...@gmail.com> wrote:
>
>>
>>
>>
>> Hi,
>>   I am solving this CVE-2021-43045 in my project. I am upgrading Avro
>> and Avro-ipc from 1.7.6-cdh5.12.0 to 1.11.0. There are some
>> NettyTransceiver, NettyServer calls in the project. After the upgrade, such
>> imports become unresolvable, which results in compilation errors. From Avro
>> 1.9.0 onwards these classes moved to the org.apache.avro.ipc.netty package. I
>> added this to my code, but the 'netty' package is not resolved. In the External
>> Dependencies section, avro-ipc (1.11.0) does not contain a netty folder.
>> Kindly suggest what to do here, as it is a critical fix in my project. Here
>> are my dependencies:
>>
>> <dependency>
>>   <groupId>org.apache.avro</groupId>
>>   <artifactId>avro</artifactId>
>>   <version>1.11.0</version>
>> </dependency>
>> <dependency>
>>   <groupId>org.apache.avro</groupId>
>>   <artifactId>avro-ipc</artifactId>
>>   <version>1.11.0</version>
>> </dependency>
>>
>>
>> Thanks,
>> Abhishek
>>
>>
>>


Re: Unable to resolve NettyTransceiver/ NettyServer (Netty based classes) in avro-ipc 1.11.0

2022-05-18 Thread Martin Grigorov
Hi,

You need to use
https://search.maven.org/artifact/org.apache.avro/avro-ipc-netty/1.11.0/bundle
instead.


On Wed, May 18, 2022 at 2:41 PM Abhishek Dasgupta <
abhishekdasgupta...@gmail.com> wrote:

>
>
>
> Hi,
>   I am solving this CVE-2021-43045 in my project. I am upgrading Avro and
> Avro-ipc from 1.7.6-cdh5.12.0 to 1.11.0. There are some NettyTransceiver,
> NettyServer calls in the project. After the upgrade, such imports become
> unresolvable, which results in compilation errors. From Avro 1.9.0 onwards
> these classes moved to the org.apache.avro.ipc.netty package. I added this
> to my code, but the 'netty' package is not resolved. In the External
> Dependencies section, avro-ipc (1.11.0) does not contain a netty folder.
> Kindly suggest what to do here, as it is a critical fix in my project. Here
> are my dependencies:
>
> <dependency>
>   <groupId>org.apache.avro</groupId>
>   <artifactId>avro</artifactId>
>   <version>1.11.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.avro</groupId>
>   <artifactId>avro-ipc</artifactId>
>   <version>1.11.0</version>
> </dependency>
>
>
> Thanks,
> Abhishek
>
>
>


Fwd: Unable to resolve NettyTransceiver/ NettyServer (Netty based classes) in avro-ipc 1.11.0

2022-05-18 Thread Abhishek Dasgupta


> 
> Hi,
>   I am solving this CVE-2021-43045 in my project. I am upgrading Avro and 
> Avro-ipc from 1.7.6-cdh5.12.0 to 1.11.0.  There are some NettyTransceiver, 
> NettyServer calls in the project. After the upgrade, such imports become 
> unresolvable, which results in compilation errors. From Avro 1.9.0 onwards 
> these classes moved to the org.apache.avro.ipc.netty package. I added this to 
> my code, but the 'netty' package is not resolved. In the External Dependencies 
> section, avro-ipc (1.11.0) does not contain a netty folder. Kindly suggest 
> what to do here, as it is a critical fix in my project. Here are my 
> dependencies:
> 
> <dependency>
>   <groupId>org.apache.avro</groupId>
>   <artifactId>avro</artifactId>
>   <version>1.11.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.avro</groupId>
>   <artifactId>avro-ipc</artifactId>
>   <version>1.11.0</version>
> </dependency>
> 
> 
> Thanks,
> Abhishek



Conversions.DecimalConversion() doesn't work as expected

2022-05-18 Thread Anton
Hello,

 

I'm trying to serialize JSON to Avro and back. Some of my fields have the
decimal logical type, so I'd like to be able to provide JSON with plain
numbers to the serializer and get the same JSON back from the deserializer.
I was hoping that Conversions.DecimalConversion() would take care of that,
but I still receive byte strings from the deserializer, and the serializer
doesn't accept numbers. What am I doing wrong?

 

The code is:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

import org.apache.avro.Conversions;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
    InputStream input = null;
    try {
        final GenericData genericData = new GenericData();
        genericData.addLogicalTypeConversion(new Conversions.DecimalConversion());

        Schema schema = new Schema.Parser().parse(schemaStr);
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema, schema, genericData);
        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema, genericData);

        input = new ByteArrayInputStream(json.getBytes());
        DataInputStream din = new DataInputStream(input);
        ByteArrayOutputStream output = new ByteArrayOutputStream();

        Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
        Encoder encoder = EncoderFactory.get().binaryEncoder(output, null);

        GenericRecord datum;
        while (true) {
            try {
                datum = reader.read(null, decoder);
            } catch (EOFException eofe) {
                break;
            }
            writer.write(datum, encoder);
        }
        encoder.flush();
        return output.toByteArray();
    } finally {
        if (input != null) {
            try {
                input.close();
            } catch (Exception e) {
            }
        }
    }
}

public static String avroToJson(byte[] avro, String schemaStr) throws IOException {
    boolean pretty = false;
    ByteArrayOutputStream output = null;
    try {
        final GenericData genericData = new GenericData();
        genericData.addLogicalTypeConversion(new Conversions.DecimalConversion());

        Schema schema = new Schema.Parser().parse(schemaStr);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema, schema, genericData);
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema, genericData);

        InputStream input = new ByteArrayInputStream(avro);
        output = new ByteArrayOutputStream();

        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
        Decoder decoder = DecoderFactory.get().binaryDecoder(input, null);

        GenericRecord datum;
        while (true) {
            try {
                datum = reader.read(null, decoder);
            } catch (EOFException eofe) {
                break;
            }
            writer.write(datum, encoder);
        }
        encoder.flush();
        output.flush();
        return new String(output.toByteArray());
    } finally {
        if (output != null) {
            try {
                output.close();
            } catch (Exception e) {
            }
        }
    }
}

 

Example data:

Schema:

{
  "type" : "record",
  "name" : "example",
  "namespace" : "myavro",
  "fields" : [ {
    "name" : "cost",
    "type" : {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 38,
      "scale" : 10
    }
  } ]
}

JSON:

{ "cost" : 36.47 }



C++ decimal logical type

2022-05-12 Thread Anton
Hi,

I'm using the C++ Avro library to serialize JSON data to Avro binary and
vice versa, using jsonDecoder+binaryEncoder and binaryDecoder+jsonEncoder.
Some of the data includes the decimal logical type, and at the moment my
serializer and deserializer can only accept and return hexadecimal byte
strings for such fields. Is there a way to automatically convert numbers
from the incoming JSON to the bytes representation during serialization,
and back from bytes to human-readable numbers when deserializing from
Avro to JSON?



REMINDER - Travel Assistance available for ApacheCon NA New Orleans 2022

2022-05-03 Thread Gavin McDonald
Hi All Contributors and Committers,

This is a first reminder email that travel
assistance applications for ApacheCon NA 2022 are now open!

We will be supporting ApacheCon North America in New Orleans, Louisiana,
on October 3rd through 6th, 2022.

TAC exists to help those that would like to attend ApacheCon events, but
are unable to do so for financial reasons. This year, We are supporting
both committers and non-committers involved with projects at the
Apache Software Foundation, or open source projects in general.

For more info on this year's applications and qualifying criteria, please
visit the TAC website at http://www.apache.org/travel/
Applications are open and will close on the 1st of July 2022.

Important: Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as required
to efficiently and accurately process their request), this will enable TAC
to announce successful awards shortly afterwards.

As usual, TAC expects to deal with a range of applications from a diverse
range of backgrounds. We therefore encourage (as always) anyone thinking
about sending in an application to do so ASAP.

Why should you attend as a TAC recipient? We encourage you to read stories
from past recipients at https://apache.org/travel/stories/ . Also note that
previous TAC recipients have gone on to become Committers, PMC Members, ASF
Members, Directors of the ASF Board and Infrastructure Staff members.
Others have gone from Committer to full time Open Source Developers!

How far can you go! - Let TAC help get you there.


Re: Converting an AVDL file into something that the avro python package can parse

2022-04-22 Thread Eric Gorr
Hello Oscar,

Worked perfectly. Thank you!

I had tried idl2schemata, but didn't realize I needed to pass an output
directory. I tried telling it to write the output to a file, but that
obviously didn't work.

Regards,
Eric


On Fri, Apr 22, 2022 at 10:00 AM Oscar Westra van Holthe - Kind <
os...@westravanholthe.nl> wrote:

> Hi Eric,
>
> You did everything right, except that you ended up with a protocol file.
>
> Please use the tool idl2schemata instead, to generate schema file(s):
>
> java -jar avro-tools.jar idl2schemata src/test/idl/input/namespaces.avdl
> /tmp/
>
> This will create a .avsc file per schema that you can use.
>
> Kind regards,
> Oscar
>
> --
> Oscar Westra van Holthe - Kind 
>
On Fri, 22 Apr 2022 at 14:27, Eric Gorr  wrote:
>
>> What I would like to be able to do is take an .avdl file and parse it
>> into python. I would like to make use of the information from within
>> python.
>>
>> According to the documentation, Apache's python package does not handle
>> .avdl files. I need to use their `avro-tools` to convert the .avdl file
>> into something it does know how to parse.
>>
>> According to the documentation at
>> https://avro.apache.org/docs/current/idl.html, I can convert a .avdl
>> file into a .avpr file with the following command:
>>
>> > java -jar avro-tools.jar idl src/test/idl/input/namespaces.avdl
>> /tmp/namespaces.avpr
>>
>> I ran my .avdl file through avro-tools, and it produced an .avpr
>> file.
>>
>> What is unclear is how I can use the python package to interpret this
>> data. I tried something simple...
>>
>> > schema = avro.schema.parse(open("my.avpr", "rb").read())
>>
>> but that generates the error:
>>
>> > SchemaParseException: No "type" property:
>>
>> I believe that `avro.schema.parse` is designed to parse .avsc files (?).
>> However, it is unclear how I can use `avro-tools` to convert my .avdl into
>> .avsc. Is that possible?
>>
>> I am guessing there are many pieces I am missing and do not quite
>> understand (yet) what the purpose of all of these files is.
>>
>> It does appear that an .avpr is a JSON file (?) so I can just read and
>> interpret it myself, but I was hoping that there would be a python package
>> that would assist me in navigating the data.
>>
>> Can anyone provide some insights into this? Thank you.
>>
>
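
To close the loop on the Python side, a minimal sketch, assuming idl2schemata wrote a schema file named /tmp/MyRecord.avsc (the actual file names depend on the records declared in the IDL):

```python
import avro.schema

# Each .avsc file produced by idl2schemata contains a single named schema.
with open("/tmp/MyRecord.avsc", "rb") as f:
    schema = avro.schema.parse(f.read())

print(schema.name)
```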

