Re: Formal spec for Avro Schema

2024-05-22 Thread Oscar Westra van Holthe - Kind
Hi everyone,

A bit late, but I thought I’d add a few thoughts on this as well.

For one, I love the idea of improving our documentation. Separating the schema 
specification from the encoding makes perfect sense to me, and also allows us 
to clearly state which encoding to use. Currently, most problems I see in 
online questions arise from using raw / internal encodings, which I think is an 
easy problem to prevent.

As to the specification, I think it’s a good start. Some things I really like:

- the introduction of an Avro Schema Document, limiting top types to a (union of) named type(s)
- an explicit "no imports" rule to ensure a schema document is self-contained

There are some things I think we can improve, such as explicitly mentioning all 
attributes (the ’type’ attribute is not introduced), fixing a few errors in the 
document, etc. I’ve taken the liberty of doing so.

One notable addition is a de facto requirement: that names and aliases must be 
unique within their context.
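
For illustration, a minimal sketch of such a self-contained schema document (a
single named top-level type, no imports, names unique within their context),
parsed with the stock Java Schema.Parser; the schema content is made up for the
example:

import org.apache.avro.Schema;

public class SchemaDocumentExample {
    public static void main(String[] args) {
        // One named top-level record; field names are unique within the record.
        String doc = "{\"type\":\"record\",\"name\":\"com.example.Order\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"string\"},"
                + "{\"name\":\"amount\",\"type\":\"double\"}]}";
        Schema schema = new Schema.Parser().parse(doc);
        System.out.println(schema.getFullName()); // prints com.example.Order
    }
}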

I’ve put my changes in a fork of Clemens’ gist: 
https://gist.github.com/opwvhk/38481bf19a175a86c703d8ba0ab08866


As a follow-up to this schema specification, we can write specifications for the 
binary encoding (with a warning never to use it directly), Avro files, the 
Single-Object encoding, protocols, the protocol wire format, and the IDL schema 
and protocol definitions.


Kind regards.
Oscar


-- 
Oscar Westra van Holthe - Kind 

> On 15 May 2024, at 11:17, Clemens Vasters via user  
> wrote:
> 
> Hi Martin,
>  
> I am saying that the specification of the schema is currently entangled with 
> the specification of the serialization framework. Avro Schema is useful and 
> usable even if you never touch the Avro binaries (the framework, an 
> implementation using the spec). 
>  
> I am indeed proposing to separate the schema spec from the specs of the Avro 
> binary encoding and the Avro JSON encoding, which also avoids strange 
> entanglements like the JSON encoding pointing to the schema description’s 
> default values section, which is in itself rather lacking in precision, i.e. 
> the encoding rule for binary or fixed is “defined” with a rather terse 
> example: "\u00ff"
>  
> Microsoft would like to propose Avro and Avro Schema in several 
> standardization efforts, but we need a spec that works in those contexts and 
> that can stand on its own. I would also like to see “application/avro” as a 
> formal media type, but the route towards that only goes via formal 
> standardization of both schema and encodings.
>  
> I believe the Avro project’s reach and importance is such that schema and 
> encodings should have formal specs that can stand on their own as JSON and 
> CBOR and AMQP and XML and OPC/Binary and other serialization schemas/formats 
> do. I don’t think existence of a formal spec gets in the way of progress and 
> Avro is so mature that the spec captures a fairly stable state.
>  
> Best Regards
> Clemens
>  
> From: Martin Grigorov 
> Sent: Wednesday, May 15, 2024 10:54 AM
> To: d...@avro.apache.org
> Cc: user@avro.apache.org
> Subject: Re: Formal spec for Avro Schema
>  
> Hi Clemens,
>  
> On Wed, May 15, 2024 at 11:18 AM Clemens Vasters 
> mailto:cleme...@microsoft.com.invalid>> 
> wrote:
> Hi Martin,
> 
> we find Avro Schema to be a great fit for describing application data 
> structures in general and even independent of wire-serialization scenarios.
> 
> Therefore, I would like to have a spec that focuses specifically on the 
> schema format, is grounded in the IETF RFC specs, and which follows the 
> conventions set by IETF, so that folks who need a sane schema format to 
> describe data structures independent of implementation can use that.
>  
> Do you say that the specification document is implementation dependent ?
> If this is the case then maybe we should work on improving it instead of 
> duplicating it.
>  
> 
> The benefit for the Avro serialization framework of having such a formal spec 
> that is untangled from the wire-serialization specs is that all schemas 
> defined by that schema model are compatible with the framework.
>  
> What do you mean by "framework" here ?
>  
> 
> The differences are organization, scope, and language style (including 
> keywords etc.). The expressed ruleset is the same.
>  
> I don't think it is a good idea to add a second document that is very similar 
> to the specification but uses a different language style.
> To me this looks like a duplication.
> IMO it would be better to suggest (many) (smaller) improvements for the 
> existing document. 
>  
>  
> 
> Best Regards
> Clemens
> 
> -Original Message-
> From: Martin Grigorov mailto:mgrigo...@apache.org>>
> Sent: Wednesd

RE: Formal spec for Avro Schema

2024-05-15 Thread Clemens Vasters via user
Hi Martin,

I am saying that the specification of the schema is currently entangled with 
the specification of the serialization framework. Avro Schema is useful and 
usable even if you never touch the Avro binaries (the framework, an 
implementation using the spec).

I am indeed proposing to separate the schema spec from the specs of the Avro 
binary encoding and the Avro JSON encoding, which also avoids strange 
entanglements like the JSON encoding pointing to the schema description’s 
default values section, which is in itself rather lacking in precision, i.e. 
the encoding rule for binary or fixed is “defined” with a rather terse example: 
"\u00ff"

Microsoft would like to propose Avro and Avro Schema in several standardization 
efforts, but we need a spec that works in those contexts and that can stand on 
its own. I would also like to see “application/avro” as a formal media type, 
but the route towards that only goes via formal standardization of both schema 
and encodings.

I believe the Avro project’s reach and importance is such that schema and 
encodings should have formal specs that can stand on their own as JSON and CBOR 
and AMQP and XML and OPC/Binary and other serialization schemas/formats do. I 
don’t think existence of a formal spec gets in the way of progress and Avro is 
so mature that the spec captures a fairly stable state.

Best Regards
Clemens

From: Martin Grigorov 
Sent: Wednesday, May 15, 2024 10:54 AM
To: d...@avro.apache.org
Cc: user@avro.apache.org
Subject: Re: Formal spec for Avro Schema

Hi Clemens,

On Wed, May 15, 2024 at 11:18 AM Clemens Vasters 
mailto:cleme...@microsoft.com.invalid>> wrote:
Hi Martin,

we find Avro Schema to be a great fit for describing application data 
structures in general and even independent of wire-serialization scenarios.

Therefore, I would like to have a spec that focuses specifically on the schema 
format, is grounded in the IETF RFC specs, and which follows the conventions 
set by IETF, so that folks who need a sane schema format to describe data 
structures independent of implementation can use that.

Do you say that the specification document is implementation dependent ?
If this is the case then maybe we should work on improving it instead of 
duplicating it.


The benefit for the Avro serialization framework of having such a formal spec 
that is untangled from the wire-serialization specs is that all schemas defined 
by that schema model are compatible with the framework.

What do you mean by "framework" here ?


The differences are organization, scope, and language style (including keywords 
etc.). The expressed ruleset is the same.

I don't think it is a good idea to add a second document that is very similar 
to the specification but uses a different language style.
To me this looks like a duplication.
IMO it would be better to suggest (many) (smaller) improvements for the 
existing document.



Best Regards
Clemens

-Original Message-
From: Martin Grigorov mailto:mgrigo...@apache.org>>
Sent: Wednesday, May 15, 2024 9:26 AM
To: d...@avro.apache.org<mailto:d...@avro.apache.org>
Cc: user@avro.apache.org<mailto:user@avro.apache.org>
Subject: Re: Formal spec for Avro Schema


Hi Clemens,

What is the difference between your document and the specification [1] ?
I haven't read it completely but it looks very similar to the specification to 
me.

1. https://avro.apache.org/docs/1.11.1/specification/
2.
https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
- sources of the specification

On Wed, May 15, 2024 at 9:28 AM Clemens Vasters 
mailto:cleme...@microsoft.com>.invalid> wrote:

> I wrote a formal spec for the Avro Schema format.
>
>
>
> https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
>
>
>
> Where would that go in the repo?
>
>
>
>
>
>

Re: Formal spec for Avro Schema

2024-05-15 Thread Martin Grigorov
Hi Elliot,

I am not sure which document you are referring to - the new proposal by
Clemens or the official specification.
Please start a new email thread or file a Jira ticket if you think
something needs to be improved in the specification!

On Wed, May 15, 2024 at 10:56 AM Elliot West  wrote:

> I note that the enum type appears to be missing the specification of the
> default attribute.
>
> On Wed, 15 May 2024 at 08:26, Martin Grigorov 
> wrote:
>
>> Hi Clemens,
>>
>> What is the difference between your document and the specification [1] ?
>> I haven't read it completely but it looks very similar to the
>> specification to me.
>>
>> 1. https://avro.apache.org/docs/1.11.1/specification/
>> 2.
>> https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
>> - sources of the specification
>>
>> On Wed, May 15, 2024 at 9:28 AM Clemens Vasters
>>  wrote:
>>
>>> I wrote a formal spec for the Avro Schema format.
>>>
>>>
>>>
>>> https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
>>>
>>>
>>>
>>> Where would that go in the repo?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *Clemens Vasters*
>>>
>>> Messaging Platform Architect
>>>
>>> Microsoft Azure
>>>
>>> +49 151 44063557
>>>
>>> cleme...@microsoft.com
>>> European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 |
>>> 80539 Munich| Germany
>>> Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
>>> Amtsgericht Aachen, HRB 12066
>>>
>>>
>>>
>>>
>>>
>>


Re: Formal spec for Avro Schema

2024-05-15 Thread Martin Grigorov
Hi Clemens,

On Wed, May 15, 2024 at 11:18 AM Clemens Vasters
 wrote:

> Hi Martin,
>
> we find Avro Schema to be a great fit for describing application data
> structures in general and even independent of wire-serialization scenarios.


> Therefore, I would like to have a spec that focuses specifically on the
> schema format, is grounded in the IETF RFC specs, and which follows the
> conventions set by IETF, so that folks who need a sane schema format to
> describe data structures independent of implementation can use that.
>

Are you saying that the specification document is implementation-dependent?
If that is the case, then maybe we should work on improving it instead of
duplicating it.


>
> The benefit for the Avro serialization framework of having such a formal
> spec that is untangled from the wire-serialization specs is that all
> schemas defined by that schema model are compatible with the framework.
>

What do you mean by "framework" here ?


>
> The differences are organization, scope, and language style (including
> keywords etc.). The expressed ruleset is the same.
>

I don't think it is a good idea to add a second document that is very
similar to the specification but uses a different language style.
To me this looks like a duplication.
IMO it would be better to suggest (many) (smaller) improvements for the
existing document.



>
> Best Regards
> Clemens
>
> -Original Message-
> From: Martin Grigorov 
> Sent: Wednesday, May 15, 2024 9:26 AM
> To: d...@avro.apache.org
> Cc: user@avro.apache.org
> Subject: Re: Formal spec for Avro Schema
>
>
> Hi Clemens,
>
> What is the difference between your document and the specification [1] ?
> I haven't read it completely but it looks very similar to the
> specification to me.
>
> 1. https://avro.apache.org/docs/1.11.1/specification/
> 2.
>
> https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
> - sources of the specification
>
> On Wed, May 15, 2024 at 9:28 AM Clemens Vasters 
> 
> wrote:
>
> > I wrote a formal spec for the Avro Schema format.
> >
> >
> >
> > https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
> >
> >
> >
> > Where would that go in the repo?
> >
> >
> >
> >
> >
> >
> >
> > *Clemens Vasters*
> >
> > Messaging Platform Architect
> >
> > Microsoft Azure
> >
> > +49 151 44063557
> >
> > cleme...@microsoft.com
> > European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 |
> > 80539
> > Munich| Germany
> > Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
> > Amtsgericht Aachen, HRB 12066
> >
> >
> >
> >
> >
>


RE: Formal spec for Avro Schema

2024-05-15 Thread Clemens Vasters via user
Hi Martin,

we find Avro Schema to be a great fit for describing application data 
structures in general and even independent of wire-serialization scenarios.

Therefore, I would like to have a spec that focuses specifically on the schema 
format, is grounded in the IETF RFC specs, and which follows the conventions 
set by IETF, so that folks who need a sane schema format to describe data 
structures independent of implementation can use that.

The benefit for the Avro serialization framework of having such a formal spec 
that is untangled from the wire-serialization specs is that all schemas defined 
by that schema model are compatible with the framework.

The differences are organization, scope, and language style (including keywords 
etc.). The expressed ruleset is the same.

Best Regards
Clemens

-Original Message-
From: Martin Grigorov 
Sent: Wednesday, May 15, 2024 9:26 AM
To: d...@avro.apache.org
Cc: user@avro.apache.org
Subject: Re: Formal spec for Avro Schema


Hi Clemens,

What is the difference between your document and the specification [1] ?
I haven't read it completely but it looks very similar to the specification to 
me.

1. https://avro.apache.org/docs/1.11.1/specification/
2.
https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
- sources of the specification

On Wed, May 15, 2024 at 9:28 AM Clemens Vasters 
 wrote:

> I wrote a formal spec for the Avro Schema format.
>
>
>
> https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
>
>
>
> Where would that go in the repo?
>
>
>
>
>
>
>
> *Clemens Vasters*
>
> Messaging Platform Architect
>
> Microsoft Azure
>
> +49 151 44063557
>
> cleme...@microsoft.com
> European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 |
> 80539
> Munich| Germany
> Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
> Amtsgericht Aachen, HRB 12066
>
>
>
>
>


Re: Formal spec for Avro Schema

2024-05-15 Thread Elliot West
I note that the enum type appears to be missing the specification of the
default attribute.
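
For context, the attribute in question is the enum-level default symbol, which a
reader falls back to during schema resolution when the writer uses a symbol the
reader does not know. A hedged Java sketch (assuming a 1.9+ Schema.Parser; the
enum is made up for the example):

import org.apache.avro.Schema;

public class EnumDefaultExample {
    public static void main(String[] args) {
        // "default" names the fallback symbol used when resolving an unknown writer symbol.
        Schema suit = new Schema.Parser().parse(
                "{\"type\":\"enum\",\"name\":\"Suit\","
                + "\"symbols\":[\"SPADES\",\"HEARTS\",\"DIAMONDS\",\"CLUBS\"],"
                + "\"default\":\"SPADES\"}");
        System.out.println(suit.getEnumDefault()); // prints SPADES
    }
}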

On Wed, 15 May 2024 at 08:26, Martin Grigorov  wrote:

> Hi Clemens,
>
> What is the difference between your document and the specification [1] ?
> I haven't read it completely but it looks very similar to the
> specification to me.
>
> 1. https://avro.apache.org/docs/1.11.1/specification/
> 2.
> https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
> - sources of the specification
>
> On Wed, May 15, 2024 at 9:28 AM Clemens Vasters
>  wrote:
>
>> I wrote a formal spec for the Avro Schema format.
>>
>>
>>
>> https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
>>
>>
>>
>> Where would that go in the repo?
>>
>>
>>
>>
>>
>>
>>
>> *Clemens Vasters*
>>
>> Messaging Platform Architect
>>
>> Microsoft Azure
>>
>> +49 151 44063557
>>
>> cleme...@microsoft.com
>> European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 | 80539
>> Munich| Germany
>> Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
>> Amtsgericht Aachen, HRB 12066
>>
>>
>>
>>
>>
>


Re: Formal spec for Avro Schema

2024-05-15 Thread Martin Grigorov
Hi Clemens,

What is the difference between your document and the specification [1] ?
I haven't read it completely but it looks very similar to the specification
to me.

1. https://avro.apache.org/docs/1.11.1/specification/
2.
https://github.com/apache/avro/tree/main/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification
- sources of the specification

On Wed, May 15, 2024 at 9:28 AM Clemens Vasters
 wrote:

> I wrote a formal spec for the Avro Schema format.
>
>
>
> https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333
>
>
>
> Where would that go in the repo?
>
>
>
>
>
>
>
> *Clemens Vasters*
>
> Messaging Platform Architect
>
> Microsoft Azure
>
> +49 151 44063557
>
> cleme...@microsoft.com
> European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 | 80539
> Munich| Germany
> Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
> Amtsgericht Aachen, HRB 12066
>
>
>
>
>


Formal spec for Avro Schema

2024-05-15 Thread Clemens Vasters via user
I wrote a formal spec for the Avro Schema format.

https://gist.github.com/clemensv/498c481965c425b218ee156b38b49333

Where would that go in the repo?


Clemens Vasters
Messaging Platform Architect
Microsoft Azure
+49 151 44063557
cleme...@microsoft.com<mailto:cleme...@microsoft.com>
European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 | 80539 
Munich| Germany
Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
Amtsgericht Aachen, HRB 12066




Re: Avro schema evolution support in AVRO CPP

2024-01-12 Thread John McClean
fwiw, I'm using it and it works fine, at least for my use cases.

J

On Fri, Jan 12, 2024 at 1:55 AM Martin Grigorov 
wrote:

> Hi Vivek,
>
> I am not sure there is anyone to give you an exact answer. The C++ SDK has
> not been actively developed in the last few years.
> The best is to try it for your use cases and see if it works or not. The
> next step is to contribute Pull Requests for the missing functionalities!
>
> Martin
>
> On Thu, Jan 11, 2024 at 8:59 AM Vivek Kumar <
> vivek.ku...@eclipsetrading.com.invalid> wrote:
>
>> +dev
>>
>>
>> Regards,
>> Vivek Kumar
>>
>> [http://www.eclipsetrading.com/logo.png]
>>
>> Senior Software Developer
>> 23/F One Hennessy
>> 1 Hennessy Road
>> Wan Chai
>> Hong Kong
>> www.eclipsetrading.com<http://www.eclipsetrading.com/>
>> +852 2108 7352
>>
>> Follow us today on our online platforms
>> [Facebook]<https://www.facebook.com/eclipsetrading/>[Linked-In]<
>> https://www.linkedin.com/company/eclipse-trading>[Instagram]<
>> https://www.instagram.com/eclipsetrading>
>> 
>> From: Vivek Kumar 
>> Sent: Thursday, January 11, 2024 11:07 AM
>> To: user@avro.apache.org 
>> Subject: Avro schema evolution support in AVRO CPP
>>
>> Hi Avro team,
>>
>> I am writing this email to check the support of Avro schema evolution in
>> CPP - i.e. provide both the producer and consumer schema when decoding the
>> data.
>>
>> I can see that there's a resolvingDecoder function in AVRO CPP that takes
>> two schemas. See
>>
>> https://avro.apache.org/docs/1.10.2/api/cpp/html/index.html#ReadingDifferentSchema
>>
>> But there's a FIXME comment in this function. See
>> https://issues.apache.org/jira/browse/AVRO-3720 and
>> https://github.com/apache/avro/blob/main/lang/c%2B%2B/api/Decoder.hh#L218.
>> Does this mean resolvingDecoder does not work properly? Could you please
>> explain what scenarios are not covered by resolvingDecoder and how can we
>> use it to support "Avro Schema Evolution" in c++?
>>
>> Thanks
>>
>>
>> Regards,
>> Vivek Kumar
>>
>> [http://www.eclipsetrading.com/logo.png]
>>
>> Senior Software Developer
>> 23/F One Hennessy
>> 1 Hennessy Road
>> Wan Chai
>> Hong Kong
>> www.eclipsetrading.com<http://www.eclipsetrading.com/>
>> +852 2108 7352
>>
>> Follow us today on our online platforms
>> [Facebook]<https://www.facebook.com/eclipsetrading/>[Linked-In]<
>> https://www.linkedin.com/company/eclipse-trading>[Instagram]<
>> https://www.instagram.com/eclipsetrading>
>>
>


Re: Avro schema evolution support in AVRO CPP

2024-01-12 Thread Andrew Marlow
"In practice, it is very rare for schema evolution to change the order of
the fields." - I'll say. Since we are talking about a protocol that is
deliberately not self-describing we cannot just pluck out what we want -
how would such code get to it? This is why the standard advice in these
situations is to never reorder, remove or rename fields and to always add
new stuff to the end.

On Fri, 12 Jan 2024 at 13:19, Thiruvalluvan MG 
wrote:

>  Out-of-order fields are not handled transparently in C++ if you are
> manually using the resolving decoder. (It's the same situation in Java as
> well).
> But, in C++ and in Java, if you generate code for the given Avro schema,
> the generated code takes care of the field ordering issue. Similarly, in
> both bindings, Generic data structures work properly with the field-order.
> If you are using the resolving decoder in your code directly, care must be
> exercised. If the reader schema and writer schema are both records and they
> have fields in different order (it is okay to insert or remove fields), the
> protocol is to first get the field-order array (which is essentially a
> permutation of the reader field ids 0 to n-1) from the resolving decoder
> and then read the fields of the reader schema in the order specified in the
> field-order array. This is done in order to avoid buffering by the decoder.
> Buffering can take a large number of allocations if the out-of-order fields
> is an array or map.
> In practice, it is very rare for schema evolution to change the order of
> the fields.
> Thanks
> Thiru
>
> On Friday, 12 January, 2024 at 03:24:11 pm IST, Martin Grigorov <
> mgrigo...@apache.org> wrote:
>
>  Hi Vivek,
>
> I am not sure there is anyone to give you an exact answer. The C++ SDK has
> not been actively developed in the last few years.
> The best is to try it for your use cases and see if it works or not. The
> next step is to contribute Pull Requests for the missing functionalities!
>
> Martin
>
> On Thu, Jan 11, 2024 at 8:59 AM Vivek Kumar <
> vivek.ku...@eclipsetrading.com.invalid> wrote:
>
> > +dev
> >
> >
> > Regards,
> > Vivek Kumar
> >
> > [http://www.eclipsetrading.com/logo.png]
> >
> > Senior Software Developer
> > 23/F One Hennessy
> > 1 Hennessy Road
> > Wan Chai
> > Hong Kong
> > www.eclipsetrading.com<http://www.eclipsetrading.com/>
> > +852 2108 7352
> >
> > Follow us today on our online platforms
> > [Facebook]<https://www.facebook.com/eclipsetrading/>[Linked-In]<
> > https://www.linkedin.com/company/eclipse-trading>[Instagram]<
> > https://www.instagram.com/eclipsetrading>
> > ____
> > From: Vivek Kumar 
> > Sent: Thursday, January 11, 2024 11:07 AM
> > To: user@avro.apache.org 
> > Subject: Avro schema evolution support in AVRO CPP
> >
> > Hi Avro team,
> >
> > I am writing this email to check the support of Avro schema evolution in
> > CPP - i.e. provide both the producer and consumer schema when decoding
> the
> > data.
> >
> > I can see that there's a resolvingDecoder function in AVRO CPP that takes
> > two schemas. See
> >
> >
> https://avro.apache.org/docs/1.10.2/api/cpp/html/index.html#ReadingDifferentSchema
> >
> > But there's a FIXME comment in this function. See
> > https://issues.apache.org/jira/browse/AVRO-3720 and
> >
> https://github.com/apache/avro/blob/main/lang/c%2B%2B/api/Decoder.hh#L218.
> > Does this mean resolvingDecoder does not work properly? Could you please
> > explain what scenarios are not covered by resolvingDecoder and how can we
> > use it to support "Avro Schema Evolution" in c++?
> >
> > Thanks
> >
> >
> > Regards,
> > Vivek Kumar
> >
> > [http://www.eclipsetrading.com/logo.png]
> >
> > Senior Software Developer
> > 23/F One Hennessy
> > 1 Hennessy Road
> > Wan Chai
> > Hong Kong
> > www.eclipsetrading.com<http://www.eclipsetrading.com/>
> > +852 2108 7352
> >
> > Follow us today on our online platforms
> > [Facebook]<https://www.facebook.com/eclipsetrading/>[Linked-In]<
> > https://www.linkedin.com/company/eclipse-trading>[Instagram]<
> > https://www.instagram.com/eclipsetrading>
> >
>



-- 
Regards,

Andrew Marlow
http://www.andrewpetermarlow.co.uk


Re: Avro schema evolution support in AVRO CPP

2024-01-12 Thread Thiruvalluvan MG via user
 Out-of-order fields are not handled transparently in C++ if you are manually 
using the resolving decoder. (It's the same situation in Java as well).
But, in C++ and in Java, if you generate code for the given Avro schema, the 
generated code takes care of the field ordering issue. Similarly, in both 
bindings, Generic data structures work properly with the field-order.
If you are using the resolving decoder in your code directly, care must be 
exercised. If the reader schema and writer schema are both records and they 
have fields in a different order (it is okay to insert or remove fields), the 
protocol is to first get the field-order array (which is essentially a 
permutation of the reader field ids 0 to n-1) from the resolving decoder and 
then read the fields of the reader schema in the order specified in the 
field-order array. This is done in order to avoid buffering by the decoder. 
Buffering can take a large number of allocations if the out-of-order fields is 
an array or map.
In practice, it is very rare for schema evolution to change the order of the 
fields.
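
A minimal sketch of that protocol in the Java binding (assuming both schemas are
records and using the org.apache.avro.io API; only two field types are handled to
keep the sketch short):

import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.ResolvingDecoder;

public class FieldOrderExample {
    public static void readRecord(Schema writer, Schema reader, Decoder in) throws IOException {
        ResolvingDecoder resolver = DecoderFactory.get().resolvingDecoder(writer, reader, in);
        // Ask the resolver in which order the reader's fields arrive on the wire,
        // then read them in exactly that order so the decoder never has to buffer.
        for (Schema.Field field : resolver.readFieldOrder()) {
            switch (field.schema().getType()) {
                case LONG:
                    System.out.println(field.name() + " = " + resolver.readLong());
                    break;
                case STRING:
                    System.out.println(field.name() + " = " + resolver.readString(null));
                    break;
                default:
                    throw new IOException("type not handled in this sketch: " + field.schema().getType());
            }
        }
    }
}
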
Thanks
Thiru

On Friday, 12 January, 2024 at 03:24:11 pm IST, Martin Grigorov wrote:
 
 Hi Vivek,

I am not sure there is anyone to give you an exact answer. The C++ SDK has
not been actively developed in the last few years.
The best is to try it for your use cases and see if it works or not. The
next step is to contribute Pull Requests for the missing functionalities!

Martin

On Thu, Jan 11, 2024 at 8:59 AM Vivek Kumar <
vivek.ku...@eclipsetrading.com.invalid> wrote:

> +dev
>
>
> Regards,
> Vivek Kumar
>
> [http://www.eclipsetrading.com/logo.png]
>
> Senior Software Developer
> 23/F One Hennessy
> 1 Hennessy Road
> Wan Chai
> Hong Kong
> www.eclipsetrading.com<http://www.eclipsetrading.com/>
> +852 2108 7352
>
> Follow us today on our online platforms
> [Facebook]<https://www.facebook.com/eclipsetrading/>[Linked-In]<
> https://www.linkedin.com/company/eclipse-trading>[Instagram]<
> https://www.instagram.com/eclipsetrading>
> 
> From: Vivek Kumar 
> Sent: Thursday, January 11, 2024 11:07 AM
> To: user@avro.apache.org 
> Subject: Avro schema evolution support in AVRO CPP
>
> Hi Avro team,
>
> I am writing this email to check the support of Avro schema evolution in
> CPP - i.e. provide both the producer and consumer schema when decoding the
> data.
>
> I can see that there's a resolvingDecoder function in AVRO CPP that takes
> two schemas. See
>
> https://avro.apache.org/docs/1.10.2/api/cpp/html/index.html#ReadingDifferentSchema
>
> But there's a FIXME comment in this function. See
> https://issues.apache.org/jira/browse/AVRO-3720 and
> https://github.com/apache/avro/blob/main/lang/c%2B%2B/api/Decoder.hh#L218.
> Does this mean resolvingDecoder does not work properly? Could you please
> explain what scenarios are not covered by resolvingDecoder and how can we
> use it to support "Avro Schema Evolution" in c++?
>
> Thanks
>
>
> Regards,
> Vivek Kumar
>
> [http://www.eclipsetrading.com/logo.png]
>
> Senior Software Developer
> 23/F One Hennessy
> 1 Hennessy Road
> Wan Chai
> Hong Kong
> www.eclipsetrading.com<http://www.eclipsetrading.com/>
> +852 2108 7352
>
> Follow us today on our online platforms
> [Facebook]<https://www.facebook.com/eclipsetrading/>[Linked-In]<
> https://www.linkedin.com/company/eclipse-trading>[Instagram]<
> https://www.instagram.com/eclipsetrading>
>
  

Re: Avro schema evolution support in AVRO CPP

2024-01-12 Thread Martin Grigorov
Hi Vivek,

I am not sure there is anyone to give you an exact answer. The C++ SDK has
not been actively developed in the last few years.
The best is to try it for your use cases and see if it works or not. The
next step is to contribute Pull Requests for the missing functionalities!

Martin

On Thu, Jan 11, 2024 at 8:59 AM Vivek Kumar <
vivek.ku...@eclipsetrading.com.invalid> wrote:

> +dev
>
>
> Regards,
> Vivek Kumar
>
> [http://www.eclipsetrading.com/logo.png]
>
> Senior Software Developer
> 23/F One Hennessy
> 1 Hennessy Road
> Wan Chai
> Hong Kong
> www.eclipsetrading.com<http://www.eclipsetrading.com/>
> +852 2108 7352
>
> Follow us today on our online platforms
> [Facebook]<https://www.facebook.com/eclipsetrading/>[Linked-In]<
> https://www.linkedin.com/company/eclipse-trading>[Instagram]<
> https://www.instagram.com/eclipsetrading>
> 
> From: Vivek Kumar 
> Sent: Thursday, January 11, 2024 11:07 AM
> To: user@avro.apache.org 
> Subject: Avro schema evolution support in AVRO CPP
>
> Hi Avro team,
>
> I am writing this email to check the support of Avro schema evolution in
> CPP - i.e. provide both the producer and consumer schema when decoding the
> data.
>
> I can see that there's a resolvingDecoder function in AVRO CPP that takes
> two schemas. See
>
> https://avro.apache.org/docs/1.10.2/api/cpp/html/index.html#ReadingDifferentSchema
>
> But there's a FIXME comment in this function. See
> https://issues.apache.org/jira/browse/AVRO-3720 and
> https://github.com/apache/avro/blob/main/lang/c%2B%2B/api/Decoder.hh#L218.
> Does this mean resolvingDecoder does not work properly? Could you please
> explain what scenarios are not covered by resolvingDecoder and how can we
> use it to support "Avro Schema Evolution" in c++?
>
> Thanks
>
>
> Regards,
> Vivek Kumar
>
> [http://www.eclipsetrading.com/logo.png]
>
> Senior Software Developer
> 23/F One Hennessy
> 1 Hennessy Road
> Wan Chai
> Hong Kong
> www.eclipsetrading.com<http://www.eclipsetrading.com/>
> +852 2108 7352
>
> Follow us today on our online platforms
> [Facebook]<https://www.facebook.com/eclipsetrading/>[Linked-In]<
> https://www.linkedin.com/company/eclipse-trading>[Instagram]<
> https://www.instagram.com/eclipsetrading>
>


Re: Manageable avro schema evolution in Java

2022-06-27 Thread Juan Cruz Viotti
Hi Niels,

Thanks a lot for sharing. Very interesting talk! Adding thumbs up :)

One comment around JSON Schema: in the talk you mention that JSON Schema
is still in beta, given that it is a draft.

While JSON Schema is a "draft" from the point of view of IETF, it is
considered production-ready and the industry-standard for describing
JSON documents already. We hope to start publishing the JSON Schema
standard outside of IETF at some point, to be able to work around this
common perception problem!

We are starting to document use cases of JSON Schema in production on
YouTube:
https://www.youtube.com/playlist?list=PLHVhS4Tj1YZOrrvl7_a9LaBAtst7BWH8a.

-- 
Juan Cruz Viotti
Technical Lead @ Postman.com
https://www.jviotti.com


Manageable avro schema evolution in Java

2022-06-27 Thread Niels Basjes
Hi,

Back in 2019 I spoke at the Datawork Summit conference about using Avro for
schema evolution in streaming scenarios.  https://youtu.be/QOdhaEHbSZM

Recently a few people asked me how to actually do this in a practical way.

To facilitate this I have created an "as clean as possible" demonstrator
project that shows how I think this can be done. Note that this is only
intended to show a _possible_ way of doing this.

https://github.com/nielsbasjes/avro-schema-example

Note that the commit history is also part of the demonstration !

I would love to hear your feedback, comments, improvement suggestions, etc.

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


Re: Unable to get Avro schema alias working for an array item

2022-01-15 Thread Spencer Lu
We're using Avro 1.7.4 with Apache Flume 1.9.0, and have written a
Flume interceptor in Java to handle the deserializing with the old
schema and the reserializing with the new schema.

In the interceptor, we have the following code to deserialize:

GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(oldSchema, newSchema);
BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(event.getBody(), null);
GenericRecord record = reader.read(null, decoder);

Then the following code to reserialize:

ByteArrayOutputStream outStream = new ByteArrayOutputStream();
GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(newSchema);
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outStream, null);
writer.write(record, encoder);
encoder.flush();
event.setBody(outStream.toByteArray());


On Sat, Jan 15, 2022 at 2:33 PM Spencer Nelson  wrote:
>
> This should work according to the spec. What language and Avro library are 
> you using, and with what version?
>
> Aliases are a bit tricky to use correctly. When deserializing, you may need 
> to indicate the writer’s schema as using oldFieldName1 and oldFieldName2, 
> while the reader schema uses newFieldName1 and newFieldName2. In other words, 
> you may need to provide both the old and new schemas to the deserializer. 
> This is just built in to how aliases work (
> https://avro.apache.org/docs/current/spec.html#Aliases). This may be a little 
> abstract and unclear; it’s easier to describe in the context of a particular 
> language.
>
>
> On Sat, Jan 15, 2022 at 8:37 AM Spencer Lu  wrote:
>>
>> Hi everyone,
>>
>> We have an application that receives Avro data, and it needs to rename
>> certain fields in the data before sending it downstream. The
>> application is using the following Avro schema to send the data
>> downstream (note that 2 of the fields have aliases defined):
>>
>> {
>> "name":"MyCompanyRecordAvroEncoder",
>> "aliases":["com.mycompany.avro.MyStats"],
>> "type":"record",
>> "fields":[
>> {"name":"newFieldName1","type":["null",
>> "int"],"default":null,"aliases":["oldFieldName1"]}
>> 
>> {"name":"statusRecords","type":{"type":"array","items":{"name":"StatusAvroRecord","type":"record","fields"
>> : [
>> {"name":"recordId","type":"long"},
>> {"name":"recordName","type":["null", "string"],"default":null},
>> {"name":"newFieldName2","type":["null",
>> "string"],"default":null,"aliases":["oldFieldName2"]}
>> ]}}, "default": []}
>> ]
>> }
>>
>> We see that our application receives the following Avro data:
>>
>> {
>> "oldFieldName1": 300,
>> "statusRecords": [
>> {
>> "recordId": 100,
>> "recordName": "Record1",
>> "oldFieldName2":
>> "{\"type\":\"XYZ\",\"properties\":{\"property1\":-1.2,\"property2\":\"Value\"}}"
>> }
>> ]
>> }
>>
>> Then the application sends the following Avro data downstream:
>>
>> {
>>  "newFieldName1": 300,
>>  "statusRecords": [
>>  {
>>  "recordId": 100,
>>  "recordName": "Record1",
>>  "newFieldName2": null
>>  }
>>  ]
>> }
>>
>> As you can see, newFieldName1 is aliased to oldFieldName1 and has the
>> value from oldFieldName1, so its alias is working.
>>
>> However, newFieldName2 is aliased to oldFieldName2, but it is null
>> instead of having the value from oldFieldName2, so its alias is not
>> working.
>>
>> The only difference I see between newFieldName1 and newFieldName2 is
>> that newFieldName2 is a field within an array item. Do aliases not
>> work for fields in array items? Or is there some other issue?
>>
>> Any idea how can I get the alias for newFieldName2 to work?
>>
>> Thanks,
>> Spencer


Re: Unable to get Avro schema alias working for an array item

2022-01-15 Thread Spencer Nelson
This should work according to the spec. What language and Avro library are
you using, and with what version?

Aliases are a bit tricky to use correctly. When deserializing, you may need
to indicate the writer’s schema as using oldFieldName1 and oldFieldName2,
while the reader schema uses newFieldName1 and newFieldName2. In other
words, you may need to provide both the old and new schemas to the
deserializer. This is just built in to how aliases work (
https://avro.apache.org/docs/current/spec.html#Aliases). This may be a
little abstract and unclear; it’s easier to describe in the context of a
particular language.


On Sat, Jan 15, 2022 at 8:37 AM Spencer Lu  wrote:

> Hi everyone,
>
> We have an application that receives Avro data, and it needs to rename
> certain fields in the data before sending it downstream. The
> application is using the following Avro schema to send the data
> downstream (note that 2 of the fields have aliases defined):
>
> {
> "name":"MyCompanyRecordAvroEncoder",
> "aliases":["com.mycompany.avro.MyStats"],
> "type":"record",
> "fields":[
> {"name":"newFieldName1","type":["null",
> "int"],"default":null,"aliases":["oldFieldName1"]}
>
> {"name":"statusRecords","type":{"type":"array","items":{"name":"StatusAvroRecord","type":"record","fields"
> : [
> {"name":"recordId","type":"long"},
> {"name":"recordName","type":["null", "string"],"default":null},
> {"name":"newFieldName2","type":["null",
> "string"],"default":null,"aliases":["oldFieldName2"]}
> ]}}, "default": []}
> ]
> }
>
> We see that our application receives the following Avro data:
>
> {
> "oldFieldName1": 300,
> "statusRecords": [
> {
> "recordId": 100,
> "recordName": "Record1",
> "oldFieldName2":
>
> "{\"type\":\"XYZ\",\"properties\":{\"property1\":-1.2,\"property2\":\"Value\"}}"
> }
> ]
> }
>
> Then the application sends the following Avro data downstream:
>
> {
>  "newFieldName1": 300,
>  "statusRecords": [
>  {
>  "recordId": 100,
>  "recordName": "Record1",
>  "newFieldName2": null
>  }
>  ]
> }
>
> As you can see, newFieldName1 is aliased to oldFieldName1 and has the
> value from oldFieldName1, so its alias is working.
>
> However, newFieldName2 is aliased to oldFieldName2, but it is null
> instead of having the value from oldFieldName2, so its alias is not
> working.
>
> The only difference I see between newFieldName1 and newFieldName2 is
> that newFieldName2 is a field within an array item. Do aliases not
> work for fields in array items? Or is there some other issue?
>
> Any idea how can I get the alias for newFieldName2 to work?
>
> Thanks,
> Spencer
>


Unable to get Avro schema alias working for an array item

2022-01-15 Thread Spencer Lu
Hi everyone,

We have an application that receives Avro data, and it needs to rename
certain fields in the data before sending it downstream. The
application is using the following Avro schema to send the data
downstream (note that 2 of the fields have aliases defined):

{
  "name": "MyCompanyRecordAvroEncoder",
  "aliases": ["com.mycompany.avro.MyStats"],
  "type": "record",
  "fields": [
    {"name": "newFieldName1", "type": ["null", "int"],
     "default": null, "aliases": ["oldFieldName1"]},
    {"name": "statusRecords", "type": {"type": "array", "items": {
      "name": "StatusAvroRecord", "type": "record", "fields": [
        {"name": "recordId", "type": "long"},
        {"name": "recordName", "type": ["null", "string"], "default": null},
        {"name": "newFieldName2", "type": ["null", "string"],
         "default": null, "aliases": ["oldFieldName2"]}
      ]}}, "default": []}
  ]
}

We see that our application receives the following Avro data:

{
"oldFieldName1": 300,
"statusRecords": [
{
"recordId": 100,
"recordName": "Record1",
"oldFieldName2":
"{\"type\":\"XYZ\",\"properties\":{\"property1\":-1.2,\"property2\":\"Value\"}}"
}
]
}

Then the application sends the following Avro data downstream:

{
 "newFieldName1": 300,
 "statusRecords": [
 {
 "recordId": 100,
 "recordName": "Record1",
 "newFieldName2": null
 }
 ]
}

As you can see, newFieldName1 is aliased to oldFieldName1 and has the
value from oldFieldName1, so its alias is working.

However, newFieldName2 is aliased to oldFieldName2, but it is null
instead of having the value from oldFieldName2, so its alias is not
working.

The only difference I see between newFieldName1 and newFieldName2 is
that newFieldName2 is a field within an array item. Do aliases not
work for fields in array items? Or is there some other issue?

Any idea how can I get the alias for newFieldName2 to work?

Thanks,
Spencer


Re: Getting ArrayIndexOutOfBoundException when decoding byte-array with new avro schema

2021-03-07 Thread Prateek Rajput
Please ignore, I have found the solution.

*Regards,*
*Prateek Rajput* 


On Sun, Mar 7, 2021 at 10:03 PM Prateek Rajput 
wrote:

> Hi Everyone,
> In our use-case we encode avro pojo to byteArray and then push it to
> Kafka, but when I am trying to decode this with slightly updated avro pojo
> (only one new Double field is added with default value 0), I am
> getting ArrayIndexOutOfBoundException.
> Now I know this might be expected as we are not transferring avro schema
> in any header, so the exception might be legit.
> But I do not want any down time in my system and want to deploy my changes
> with new pojo. Is there any way I can do that.
> For reference... Our Utility class for encoding/decoding
>
> import org.apache.avro.file.SeekableByteArrayInput;
> import org.apache.avro.io.BinaryDecoder;
> import org.apache.avro.io.BinaryEncoder;
> import org.apache.avro.io.DecoderFactory;
> import org.apache.avro.io.EncoderFactory;
> import org.apache.avro.specific.SpecificDatumReader;
> import org.apache.avro.specific.SpecificDatumWriter;
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
>
> public class AvroUtils {
>
> public static <C> byte[] encode(C value, Class<C> c) throws IOException {
>     ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
>     BinaryEncoder binaryEncoder = EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
>     SpecificDatumWriter<C> datumWriter = new SpecificDatumWriter<>(c);
>     datumWriter.write(value, binaryEncoder);
>     binaryEncoder.flush();
>     return byteArrayOutputStream.toByteArray();
> }
>
> public static <C> C decode(byte[] byteArray, Class<C> c) throws IOException {
>     SeekableByteArrayInput byteArrayInputStream = new SeekableByteArrayInput(byteArray);
>     BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(byteArrayInputStream, null);
>     SpecificDatumReader<C> datumReader = new SpecificDatumReader<>(c);
>     return datumReader.read(null, binaryDecoder);
> }
> }
>
> Avro version - 1.7.7
>
>
> *Regards,*
> *Prateek Rajput* 
>



Getting ArrayIndexOutOfBoundException when decoding byte-array with new avro schema

2021-03-07 Thread Prateek Rajput
Hi Everyone,
In our use-case we encode avro pojo to byteArray and then push it to Kafka,
but when I am trying to decode this with slightly updated avro pojo (only
one new Double field is added with default value 0), I am
getting ArrayIndexOutOfBoundException.
Now I know this might be expected as we are not transferring avro schema in
any header, so the exception might be legit.
But I do not want any down time in my system and want to deploy my changes
with new pojo. Is there any way I can do that.
For reference... Our Utility class for encoding/decoding

import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class AvroUtils {

    public static <C> byte[] encode(C value, Class<C> c) throws IOException {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        BinaryEncoder binaryEncoder = EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
        SpecificDatumWriter<C> datumWriter = new SpecificDatumWriter<>(c);
        datumWriter.write(value, binaryEncoder);
        binaryEncoder.flush();
        return byteArrayOutputStream.toByteArray();
    }

    public static <C> C decode(byte[] byteArray, Class<C> c) throws IOException {
        SeekableByteArrayInput byteArrayInputStream = new SeekableByteArrayInput(byteArray);
        BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(byteArrayInputStream, null);
        SpecificDatumReader<C> datumReader = new SpecificDatumReader<>(c);
        return datumReader.read(null, binaryDecoder);
    }
}

Avro version - 1.7.7
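
One common way to handle this without shipping the schema in a header is to
resolve with both schemas, i.e. give the reader the old (writer) schema alongside
the new (reader) schema so the added field is filled from its default. A minimal
sketch (assuming the old schema is still available, for example kept as the
previous .avsc, while the new one comes from the regenerated class, e.g. its
getClassSchema()):

import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

public class ResolvingAvroUtils {

    public static <C> C decode(byte[] byteArray, Schema writerSchema, Schema readerSchema)
            throws IOException {
        BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(byteArray, null);
        // The two-schema reader resolves data written with the old schema against the
        // new one; the newly added field is populated from its declared default.
        SpecificDatumReader<C> datumReader = new SpecificDatumReader<>(writerSchema, readerSchema);
        return datumReader.read(null, binaryDecoder);
    }
}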


*Regards,*
*Prateek Rajput* 



Re: Announcement: Avro Schema Viewer

2019-09-22 Thread Driesprong, Fokko
Awesome work Robin, thanks for sharing!

Cheers, Fokko

On Sat, 21 Sep 2019 at 19:57, Robin Trietsch  wrote:

> Hey Brian,
>
> Thanks! At the moment it only supports one top level schema with different
> versions. But feel free to implement it :)
>
> Regards,
> Robin
>
> On 21 Sep 2019, at 18:39, Brian Lachniet  wrote:
>
> This is really cool, thank you for sharing! I can't wait to try this out
> on our schemas at work.
>
> Does this support multiple, top-level schemas? Or does it support only
> multiple versions of one top-level schema?
>
> On Fri, Sep 20, 2019 at 4:42 AM Robin Trietsch 
> wrote:
>
>> Hi Avro users!
>>
>> We'd like to introduce you to a tool that we built at bol.com (largest
>> online retailer of the Netherlands), that can be used to visualize Avro
>> schemas (in *.avsc* format). Below is a screenshot of an example
>> deployment.
>>
>> 
>> Try it out yourself at: https://bolcom.github.io/avro-schema-viewer
>> Learn more and view the source at:
>> https://github.com/bolcom/avro-schema-viewer
>>
>> Regards,
>> Mike Junger
>> Robin Trietsch
>>
>
>
> --
> [image: 51b630b05e01a6d5134ccfd520f547c4.png]
> Brian Lachniet
> Software Engineer
> E: blachn...@gmail.com | blachniet.com <http://www.blachniet.com/>
> <https://twitter.com/blachniet> <http://www.linkedin.com/in/blachniet>
>
>
>


Re: Announcement: Avro Schema Viewer

2019-09-21 Thread Robin Trietsch
Hey Brian,

Thanks! At the moment it only supports one top level schema with different 
versions. But feel free to implement it :)

Regards,
Robin

> On 21 Sep 2019, at 18:39, Brian Lachniet  wrote:
> 
> This is really cool, thank you for sharing! I can't wait to try this out on 
> our schemas at work.
> 
> Does this support multiple, top-level schemas? Or does it support only 
> multiple versions of one top-level schema?
> 
> On Fri, Sep 20, 2019 at 4:42 AM Robin Trietsch  <mailto:robin.triet...@gmail.com>> wrote:
> Hi Avro users!
> 
> We'd like to introduce you to a tool that we built at bol.com 
> <http://bol.com/> (largest online retailer of the Netherlands), that can be 
> used to visualize Avro schemas (in .avsc format). Below is a screenshot of an 
> example deployment.
> 
> 
> Try it out yourself at: https://bolcom.github.io/avro-schema-viewer 
> <https://bolcom.github.io/avro-schema-viewer>
> Learn more and view the source at: 
> https://github.com/bolcom/avro-schema-viewer 
> <https://github.com/bolcom/avro-schema-viewer>
> 
> Regards,
> Mike Junger
> Robin Trietsch
> 
> 
> -- 
> 
> Brian Lachniet
> Software Engineer
> E: blachn...@gmail.com <mailto:blachn...@gmail.com> | blachniet.com 
> <http://www.blachniet.com/>
>  <https://twitter.com/blachniet>  <http://www.linkedin.com/in/blachniet>


Re: Avro schema having Map of Records

2019-08-07 Thread Edgar H
Seems like the right time to share some Parquet vs Avro knowledge haha :)

My god, exactly what you said! Untyped List within a POJO, problem solved.
BTW, it was using ReflectData.getSchema().

Thanks a lot Ryan! Really appreciated!

On Tue, 6 Aug 2019 at 17:35, Ryan Skraba () wrote:

> Funny, I'm familiar with Avro, but I'm currently looking closely at
> Parquet!
>
> Interestingly enough, I just ran across the conversion utilities in
> Spark that could have answered your original question[1].
>
> It looks like you're using ReflectData to get the schema.  Is the
> exception occurring during the ReflectData.getSchema() or .induce() ?
> Can you share the full stack trace or better yet, the POJO that
> reproduces the error?
>
> I _think_ I may have run across something similar when getting a
> schema via reflection, but the class had a raw collection field (a plain List
> instead of, say, List<String>). I can't clearly recall, but that might be
> a useful hint.
>
> [1]:
> https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L136
>
> On Tue, Aug 6, 2019 at 2:39 PM Edgar H  wrote:
> >
> > Thanks a lot for the quick reply Ryan! That was exactly what I was
> looking for :)
> >
> > Been trying including the changes within my code and currently it's
> throwing the following exception... Caused by:
> org.apache.avro.AvroRuntimeException: Can't find element type of Collection
> >
> > I'm thinking that it could be the POJO not containing the classes for
> the inner record fields (I just have a getter and setter for the one_level
> field but the rest are types of that one)? Or how should it be represented
> within the parent POJO?
> >
> > Sorry if the questions sound too simple, but I'm too used to work with
> Parquet that Avro seems like a shift from time to time :)
> >
> > On Tue, 6 Aug 2019 at 12:01, Ryan Skraba () wrote:
> >>
> >> Hello -- Avro supports a map type:
> >> https://avro.apache.org/docs/1.9.0/spec.html#Maps
> >>
> >> Generating an Avro schema from a JSON example can be ambiguous since a
> >> JSON object can either be converted to a record or a map.  You're
> >> probably looking for something like this:
> >>
> >> {
> >>   "type" : "record",
> >>   "name" : "MyClass",
> >>   "namespace" : "com.acme.avro",
> >>   "fields" : [ {
> >> "name" : "one_level",
> >> "type" : {
> >>   "type" : "record",
> >>   "name" : "one_level",
> >>   "fields" : [ {
> >> "name" : "inner_level",
> >> "type" : {
> >>   "type" : "map",
> >>   "values" : {
> >> "type" : "record",
> >> "name" : "sample",
> >> "fields" : [ {
> >>   "name" : "sample1",
> >>   "type" : "string"
> >> }, {
> >>   "name" : "sample2",
> >>   "type" : "string"
> >> } ]
> >>   }
> >> }
> >>   } ]
> >> }
> >>   } ]
> >> }
> >>
> >> On Tue, Aug 6, 2019 at 10:47 AM Edgar H  wrote:
> >> >
> >> > I'm trying to translate a schema that I have in Spark which is
> defined for Parquet, and I would like to use it within Avro too.
> >> >
> >> >   StructField("one_level", StructType(List(StructField(
> >> > "inner_level",
> >> > MapType(
> >> >   StringType,
> >> >   StructType(
> >> > List(
> >> >   StructField("field1", StringType),
> >> >   StructField("field2", ArrayType(StringType))
> >> > )
> >> >   )
> >> > )
> >> >   )
> >> > )), nullable = false)
> >> >
> >> > However, in Avro I haven't seen any examples of Maps containing
> Record type objects...
> >> >
> >> > Tried a sample input with an online Avro schema generator, taking
> this input.
> >> >
> >> > {
> >> > "one_level": {
>

Re: Avro schema having Map of Records

2019-08-06 Thread Ryan Skraba
Funny, I'm familiar with Avro, but I'm currently looking closely at Parquet!

Interestingly enough, I just ran across the conversion utilities in
Spark that could have answered your original question[1].

It looks like you're using ReflectData to get the schema.  Is the
exception occurring during the ReflectData.getSchema() or .induce() ?
Can you share the full stack trace or better yet, the POJO that
reproduces the error?

I _think_ I may have run across something similar when getting a
schema via reflection, but the class had a raw collection field (List
instead of List<T>).  I can't clearly recall, but that might be
a useful hint.

[1]: 
https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L136
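
A minimal sketch of that hint (the POJO and field names below are invented for
illustration, not taken from this thread): a parameterized collection lets
ReflectData discover the element type, while a raw List is the kind of field
that triggers "Can't find element type of Collection".

import java.util.List;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;

public class ReflectSchemaSketch {

    // Hypothetical POJOs, for illustration only.
    public static class Sample {
        public String field1;
        public List<String> field2;   // parameterized: element type is discoverable
        // public List field2;        // raw: ReflectData cannot find the element type
    }

    public static class OneLevel {
        public Map<String, Sample> innerLevel;   // a map of records
    }

    public static void main(String[] args) {
        Schema schema = ReflectData.get().getSchema(OneLevel.class);
        System.out.println(schema.toString(true));
    }
}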

On Tue, Aug 6, 2019 at 2:39 PM Edgar H  wrote:
>
> Thanks a lot for the quick reply Ryan! That was exactly what I was looking 
> for :)
>
> Been trying including the changes within my code and currently it's throwing 
> the following exception... Caused by: org.apache.avro.AvroRuntimeException: 
> Can't find element type of Collection
>
> I'm thinking that it could be the POJO not containing the classes for the 
> inner record fields (I just have a getter and setter for the one_level field 
> but the rest are types of that one)? Or how should it be represented within 
> the parent POJO?
>
> Sorry if the questions sound too simple, but I'm too used to work with 
> Parquet that Avro seems like a shift from time to time :)
>
> On Tue, Aug 6, 2019 at 12:01 PM, Ryan Skraba () wrote:
>>
>> Hello -- Avro supports a map type:
>> https://avro.apache.org/docs/1.9.0/spec.html#Maps
>>
>> Generating an Avro schema from a JSON example can be ambiguous since a
>> JSON object can either be converted to a record or a map.  You're
>> probably looking for something like this:
>>
>> {
>>   "type" : "record",
>>   "name" : "MyClass",
>>   "namespace" : "com.acme.avro",
>>   "fields" : [ {
>> "name" : "one_level",
>> "type" : {
>>   "type" : "record",
>>   "name" : "one_level",
>>   "fields" : [ {
>> "name" : "inner_level",
>> "type" : {
>>   "type" : "map",
>>   "values" : {
>> "type" : "record",
>> "name" : "sample",
>> "fields" : [ {
>>   "name" : "sample1",
>>   "type" : "string"
>> }, {
>>   "name" : "sample2",
>>   "type" : "string"
>> } ]
>>   }
>> }
>>   } ]
>> }
>>   } ]
>> }
>>
>> On Tue, Aug 6, 2019 at 10:47 AM Edgar H  wrote:
>> >
>> > I'm trying to translate a schema that I have in Spark which is defined for 
>> > Parquet, and I would like to use it within Avro too.
>> >
>> >   StructField("one_level", StructType(List(StructField(
>> > "inner_level",
>> > MapType(
>> >   StringType,
>> >   StructType(
>> > List(
>> >   StructField("field1", StringType),
>> >   StructField("field2", ArrayType(StringType))
>> > )
>> >   )
>> > )
>> >   )
>> > )), nullable = false)
>> >
>> > However, in Avro I haven't seen any examples of Maps containing Record 
>> > type objects...
>> >
>> > Tried a sample input with an online Avro schema generator, taking this 
>> > input.
>> >
>> > {
>> > "one_level": {
>> > "inner_level": {
>> > "sample1": {
>> > "field1": "sample",
>> > "field2": ["a", "b"],
>> > },
>> > "sample2": {
>> > "field1": "sample2",
>> > "field2": ["a", "b"]
>> > }
>> > }
>> > }
>> >
>> > }
>> >
>> > It prompts this output.
>> >
>> > {
>> >   "name": "MyClass",
>> >   "type": "record",
>> >  

Re: Avro schema having Map of Records

2019-08-06 Thread Edgar H
Thanks a lot for the quick reply Ryan! That was exactly what I was looking
for :)

I've been trying to include the changes in my code, and currently it's
throwing the following exception: Caused by:
org.apache.avro.AvroRuntimeException: Can't find element type of Collection

Could it be that the POJO does not contain the classes for the inner
record fields (I just have a getter and setter for the one_level field,
and the rest are types nested under it)? Or how should they be represented
within the parent POJO?

Sorry if the questions sound too simple, but I'm so used to working with
Parquet that Avro feels like a shift from time to time :)

On Tue, Aug 6, 2019 at 12:01 PM, Ryan Skraba () wrote:

> Hello -- Avro supports a map type:
> https://avro.apache.org/docs/1.9.0/spec.html#Maps
>
> Generating an Avro schema from a JSON example can be ambiguous since a
> JSON object can either be converted to a record or a map.  You're
> probably looking for something like this:
>
> {
>   "type" : "record",
>   "name" : "MyClass",
>   "namespace" : "com.acme.avro",
>   "fields" : [ {
> "name" : "one_level",
> "type" : {
>   "type" : "record",
>   "name" : "one_level",
>   "fields" : [ {
> "name" : "inner_level",
> "type" : {
>   "type" : "map",
>   "values" : {
> "type" : "record",
> "name" : "sample",
> "fields" : [ {
>   "name" : "sample1",
>   "type" : "string"
> }, {
>   "name" : "sample2",
>   "type" : "string"
> } ]
>   }
> }
>   } ]
> }
>   } ]
> }
>
> On Tue, Aug 6, 2019 at 10:47 AM Edgar H  wrote:
> >
> > I'm trying to translate a schema that I have in Spark which is defined
> for Parquet, and I would like to use it within Avro too.
> >
> >   StructField("one_level", StructType(List(StructField(
> > "inner_level",
> > MapType(
> >   StringType,
> >   StructType(
> > List(
> >   StructField("field1", StringType),
> >   StructField("field2", ArrayType(StringType))
> > )
> >   )
> > )
> >   )
> > )), nullable = false)
> >
> > However, in Avro I haven't seen any examples of Maps containing Record
> type objects...
> >
> > Tried a sample input with an online Avro schema generator, taking this
> input.
> >
> > {
> > "one_level": {
> > "inner_level": {
> > "sample1": {
> > "field1": "sample",
> > "field2": ["a", "b"],
> > },
> > "sample2": {
> > "field1": "sample2",
> > "field2": ["a", "b"]
> > }
> > }
> > }
> >
> > }
> >
> > It prompts this output.
> >
> > {
> >   "name": "MyClass",
> >   "type": "record",
> >   "namespace": "com.acme.avro",
> >   "fields": [
> > {
> >   "name": "one_level",
> >   "type": {
> > "name": "one_level",
> > "type": "record",
> > "fields": [
> >   {
> > "name": "inner_level",
> > "type": {
> >   "name": "inner_level",
> >   "type": "record",
> >   "fields": [
> > {
> >   "name": "sample1",
> >   "type": {
> > "name": "sample1",
> > "type": "record",
> > "fields": [
> >   {
> > "name": "field1",
> > "type": "string"
> >   },
> >   {
> > "name": "field2",
> > "type": {
> >   "type": "array",
> >   "items": "string"
> > }
> >   }
> > ]
> >   }
> > },
> > {
> >   "name": "sample2",
> >   "type": {
> > "name": "sample2",
> > "type": "record",
> > "fields": [
> >   {
> > "name": "field1",
> > "type": "string"
> >   },
> >   {
> > "name": "field2",
> > "type": {
> >   "type": "array",
> >   "items": "string"
> > }
> >   }
> > ]
> >   }
> > }
> >   ]
> > }
> >   }
> > ]
> >   }
> > }
> >   ]
> > }
> >
> > Which isn't absolutely what I'm looking for. Is it possible to define
> such schema in Avro?
>


Re: Avro schema having Map of Records

2019-08-06 Thread Ryan Skraba
Hello -- Avro supports a map type:
https://avro.apache.org/docs/1.9.0/spec.html#Maps

Generating an Avro schema from a JSON example can be ambiguous since a
JSON object can either be converted to a record or a map.  You're
probably looking for something like this:

{
  "type" : "record",
  "name" : "MyClass",
  "namespace" : "com.acme.avro",
  "fields" : [ {
"name" : "one_level",
"type" : {
  "type" : "record",
  "name" : "one_level",
  "fields" : [ {
"name" : "inner_level",
"type" : {
  "type" : "map",
  "values" : {
"type" : "record",
"name" : "sample",
"fields" : [ {
  "name" : "sample1",
  "type" : "string"
}, {
  "name" : "sample2",
  "type" : "string"
} ]
  }
}
  } ]
}
  } ]
}
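
For completeness, a minimal sketch of building a matching record with the
generic API (assuming the schema above is saved as MyClass.avsc; the map key
"someKey" is arbitrary):

import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class MapOfRecordsSketch {
    public static void main(String[] args) throws Exception {
        Schema myClass = new Schema.Parser().parse(new File("MyClass.avsc"));
        Schema oneLevel = myClass.getField("one_level").schema();
        Schema sample = oneLevel.getField("inner_level").schema().getValueType();

        // One "sample" record, stored in the map under an arbitrary key.
        GenericRecord sampleRecord = new GenericData.Record(sample);
        sampleRecord.put("sample1", "a");
        sampleRecord.put("sample2", "b");

        Map<String, GenericRecord> innerLevel = new HashMap<>();
        innerLevel.put("someKey", sampleRecord);

        GenericRecord oneLevelRecord = new GenericData.Record(oneLevel);
        oneLevelRecord.put("inner_level", innerLevel);

        GenericRecord root = new GenericData.Record(myClass);
        root.put("one_level", oneLevelRecord);

        System.out.println(root);
    }
}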

On Tue, Aug 6, 2019 at 10:47 AM Edgar H  wrote:
>
> I'm trying to translate a schema that I have in Spark which is defined for 
> Parquet, and I would like to use it within Avro too.
>
>   StructField("one_level", StructType(List(StructField(
> "inner_level",
> MapType(
>   StringType,
>   StructType(
> List(
>   StructField("field1", StringType),
>   StructField("field2", ArrayType(StringType))
> )
>   )
> )
>   )
> )), nullable = false)
>
> However, in Avro I haven't seen any examples of Maps containing Record type 
> objects...
>
> Tried a sample input with an online Avro schema generator, taking this input.
>
> {
> "one_level": {
> "inner_level": {
> "sample1": {
> "field1": "sample",
> "field2": ["a", "b"],
> },
> "sample2": {
> "field1": "sample2",
> "field2": ["a", "b"]
> }
> }
> }
>
> }
>
> It prompts this output.
>
> {
>   "name": "MyClass",
>   "type": "record",
>   "namespace": "com.acme.avro",
>   "fields": [
> {
>   "name": "one_level",
>   "type": {
> "name": "one_level",
> "type": "record",
> "fields": [
>   {
> "name": "inner_level",
> "type": {
>   "name": "inner_level",
>   "type": "record",
>   "fields": [
> {
>   "name": "sample1",
>   "type": {
> "name": "sample1",
> "type": "record",
> "fields": [
>   {
> "name": "field1",
> "type": "string"
>   },
>   {
> "name": "field2",
> "type": {
>   "type": "array",
>   "items": "string"
> }
>   }
> ]
>   }
> },
> {
>   "name": "sample2",
>   "type": {
> "name": "sample2",
> "type": "record",
> "fields": [
>   {
> "name": "field1",
> "type": "string"
>   },
>   {
> "name": "field2",
> "type": {
>   "type": "array",
>   "items": "string"
> }
>   }
> ]
>   }
> }
>   ]
> }
>   }
> ]
>   }
> }
>   ]
> }
>
> Which isn't absolutely what I'm looking for. Is it possible to define such 
> schema in Avro?


Avro schema having Map of Records

2019-08-06 Thread Edgar H
I'm trying to translate a schema that I have in Spark which is defined for
Parquet, and I would like to use it within Avro too.

  StructField("one_level", StructType(List(StructField(
"inner_level",
MapType(
  StringType,
  StructType(
List(
  StructField("field1", StringType),
  StructField("field2", ArrayType(StringType))
)
  )
)
  )
)), nullable = false)

However, in Avro I haven't seen any examples of Maps containing Record type
objects...

I tried a sample input with an online Avro schema generator, using this
input.

{
  "one_level": {
    "inner_level": {
      "sample1": {
        "field1": "sample",
        "field2": ["a", "b"]
      },
      "sample2": {
        "field1": "sample2",
        "field2": ["a", "b"]
      }
    }
  }
}

It produces this output.

{
  "name": "MyClass",
  "type": "record",
  "namespace": "com.acme.avro",
  "fields": [
{
  "name": "one_level",
  "type": {
"name": "one_level",
"type": "record",
"fields": [
  {
"name": "inner_level",
"type": {
  "name": "inner_level",
  "type": "record",
  "fields": [
{
  "name": "sample1",
  "type": {
"name": "sample1",
"type": "record",
"fields": [
  {
"name": "field1",
"type": "string"
  },
  {
"name": "field2",
"type": {
  "type": "array",
  "items": "string"
}
  }
]
  }
},
{
  "name": "sample2",
  "type": {
"name": "sample2",
"type": "record",
"fields": [
  {
"name": "field1",
"type": "string"
  },
  {
"name": "field2",
"type": {
  "type": "array",
  "items": "string"
}
  }
]
  }
}
  ]
}
  }
]
  }
}
  ]
}

This isn't what I'm looking for. Is it possible to define such a
schema in Avro?


Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-08-02 Thread Ryan Skraba
>>>>>>   byte[] v1AsBytes = serialize(new Simple(1, "name1"), true, false);
>>>>>>>>
>>>>>>>>   // Read as Simple v2, same as your method but with the writer and
>>>>>>>> reader schema.
>>>>>>>>   DatumReader datumReader =
>>>>>>>>   new SpecificDatumReader<>(Simple.getClassSchema(),
>>>>>>>> SimpleV2.getClassSchema());
>>>>>>>>   Decoder decoder = DecoderFactory.get().binaryDecoder(v1AsBytes, 
>>>>>>>> null);
>>>>>>>>   SimpleV2 v2 = datumReader.read(null, decoder);
>>>>>>>>
>>>>>>>>   assertThat(v2.getId(), is(1));
>>>>>>>>   assertThat(v2.getName(), is(new Utf8("name1")));
>>>>>>>>   assertThat(v2.getDescription(), nullValue());
>>>>>>>> }
>>>>>>>>
>>>>>>>> This demonstrates with two different schemas and SpecificRecords in
>>>>>>>> the same test, but the same principle applies if it's the same record
>>>>>>>> that has evolved -- you need to know the original schema that wrote
>>>>>>>> the data in order to apply the schema that you're now using for
>>>>>>>> reading.
>>>>>>>>
>>>>>>>> I hope this clarifies what you are looking for!
>>>>>>>>
>>>>>>>> All my best, Ryan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 30, 2019 at 3:30 PM Martin Mucha  
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > Thanks for answer.
>>>>>>>> >
>>>>>>>> > Actually I have exactly the same behavior with avro 1.9.0 and 
>>>>>>>> > following deserializer in our other app, which uses strictly avro 
>>>>>>>> > codebase, and failing with same exceptions. So lets leave "allegro" 
>>>>>>>> > library and lots of other tools out of it in our discussion.
>>>>>>>> > I can use whichever aproach. All I need is single way, where I can 
>>>>>>>> > deserialize byte[] into class generated by avro-maven-plugin, and 
>>>>>>>> > which will respect documentation regarding schema evolution. 
>>>>>>>> > Currently we're using following deserializer and serializer, and 
>>>>>>>> > these does not work when it comes to schema evolution. What is the 
>>>>>>>> > correct way to serialize and deserializer avro data?
>>>>>>>> >
>>>>>>>> > I probably don't understand your mention about GenericRecord or 
>>>>>>>> > GenericDatumReader. I tried to use GenericDatumReader in 
>>>>>>>> > deserializer below, but then it seems I got back just 
>>>>>>>> > GenericData$Record instance, which I can use then to access array of 
>>>>>>>> > instances, which is not what I'm looking for(IIUC), since in that 
>>>>>>>> > case I could have just use plain old JSON and deserialize it using 
>>>>>>>> > jackson having no schema evolution problems at all. If that's 
>>>>>>>> > correct, I'd rather stick to SpecificDatumReader, and somehow fix it 
>>>>>>>> > if possible.
>>>>>>>> >
>>>>>>>> > What can be done? Or how schema evolution is intended to be used? I 
>>>>>>>> > found a lots of question searching for this answer.
>>>>>>>> >
>>>>>>>> > thanks!
>>>>>>>> > Martin.
>>>>>>>> >
>>>>>>>> > deserializer:
>>>>>>>> >
>>>>>>>> > public static  T deserialize(Class 
>>>>>>>> > targetType,
>>>>>>>> >
>>>>>>>> > byte[] data,
>>>>>>>> >
>>>>>>>> > boolean useBinaryDecoder) {
>>>>>>>> > try {
>>

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-08-01 Thread Svante Karlsson
alizer
>>>>>>> below, but then it seems I got back just GenericData$Record instance, 
>>>>>>> which
>>>>>>> I can use then to access array of instances, which is not what I'm 
>>>>>>> looking
>>>>>>> for(IIUC), since in that case I could have just use plain old JSON and
>>>>>>> deserialize it using jackson having no schema evolution problems at 
>>>>>>> all. If
>>>>>>> that's correct, I'd rather stick to SpecificDatumReader, and somehow 
>>>>>>> fix it
>>>>>>> if possible.
>>>>>>> >
>>>>>>> > What can be done? Or how schema evolution is intended to be used?
>>>>>>> I found a lots of question searching for this answer.
>>>>>>> >
>>>>>>> > thanks!
>>>>>>> > Martin.
>>>>>>> >
>>>>>>> > deserializer:
>>>>>>> >
>>>>>>> > public static  T
>>>>>>> deserialize(Class targetType,
>>>>>>> >
>>>>>>> byte[] data,
>>>>>>> >
>>>>>>> boolean useBinaryDecoder) {
>>>>>>> > try {
>>>>>>> > if (data == null) {
>>>>>>> > return null;
>>>>>>> > }
>>>>>>> >
>>>>>>> > log.trace("data='{}'",
>>>>>>> DatatypeConverter.printHexBinary(data));
>>>>>>> >
>>>>>>> > Schema schema = targetType.newInstance().getSchema();
>>>>>>> > DatumReader datumReader = new
>>>>>>> SpecificDatumReader<>(schema);
>>>>>>> > Decoder decoder = useBinaryDecoder
>>>>>>> > ? DecoderFactory.get().binaryDecoder(data,
>>>>>>> null)
>>>>>>> > : DecoderFactory.get().jsonDecoder(schema, new
>>>>>>> String(data));
>>>>>>> >
>>>>>>> > T result = targetType.cast(datumReader.read(null,
>>>>>>> decoder));
>>>>>>> > log.trace("deserialized data='{}'", result);
>>>>>>> > return result;
>>>>>>> > } catch (Exception ex) {
>>>>>>> > throw new SerializationException("Error deserializing
>>>>>>> data", ex);
>>>>>>> > }
>>>>>>> > }
>>>>>>> >
>>>>>>> > serializer:
>>>>>>> > public static  byte[] serialize(T
>>>>>>> data, boolean useBinaryDecoder, boolean pretty) {
>>>>>>> > try {
>>>>>>> > if (data == null) {
>>>>>>> > return new byte[0];
>>>>>>> > }
>>>>>>> >
>>>>>>> > log.debug("data='{}'", data);
>>>>>>> > Schema schema = data.getSchema();
>>>>>>> > ByteArrayOutputStream byteArrayOutputStream = new
>>>>>>> ByteArrayOutputStream();
>>>>>>> > Encoder binaryEncoder = useBinaryDecoder
>>>>>>> > ?
>>>>>>> EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
>>>>>>> > : EncoderFactory.get().jsonEncoder(schema,
>>>>>>> byteArrayOutputStream, pretty);
>>>>>>> >
>>>>>>> > DatumWriter datumWriter = new
>>>>>>> GenericDatumWriter<>(schema);
>>>>>>> > datumWriter.write(data, binaryEncoder);
>>>>>>> >
>>>>>>> > binaryEncoder.flush();
>>>>>>> > byteArrayOutputStream.close();
>>>>>>> >
>>>>>>> > byte[] result = byteArrayOutputStream.toByteArray();
>>>>>>> > log.debug("serialized data='{}'",
>>>>

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-08-01 Thread Martin Mucha
>>>>>>   // Read as Simple v2, same as your method but with the writer and
>>>>>> reader schema.
>>>>>>   DatumReader datumReader =
>>>>>>   new SpecificDatumReader<>(Simple.getClassSchema(),
>>>>>> SimpleV2.getClassSchema());
>>>>>>   Decoder decoder = DecoderFactory.get().binaryDecoder(v1AsBytes,
>>>>>> null);
>>>>>>   SimpleV2 v2 = datumReader.read(null, decoder);
>>>>>>
>>>>>>   assertThat(v2.getId(), is(1));
>>>>>>   assertThat(v2.getName(), is(new Utf8("name1")));
>>>>>>   assertThat(v2.getDescription(), nullValue());
>>>>>> }
>>>>>>
>>>>>> This demonstrates with two different schemas and SpecificRecords in
>>>>>> the same test, but the same principle applies if it's the same record
>>>>>> that has evolved -- you need to know the original schema that wrote
>>>>>> the data in order to apply the schema that you're now using for
>>>>>> reading.
>>>>>>
>>>>>> I hope this clarifies what you are looking for!
>>>>>>
>>>>>> All my best, Ryan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 30, 2019 at 3:30 PM Martin Mucha 
>>>>>> wrote:
>>>>>> >
>>>>>> > Thanks for answer.
>>>>>> >
>>>>>> > Actually I have exactly the same behavior with avro 1.9.0 and
>>>>>> following deserializer in our other app, which uses strictly avro 
>>>>>> codebase,
>>>>>> and failing with same exceptions. So lets leave "allegro" library and 
>>>>>> lots
>>>>>> of other tools out of it in our discussion.
>>>>>> > I can use whichever aproach. All I need is single way, where I can
>>>>>> deserialize byte[] into class generated by avro-maven-plugin, and which
>>>>>> will respect documentation regarding schema evolution. Currently we're
>>>>>> using following deserializer and serializer, and these does not work when
>>>>>> it comes to schema evolution. What is the correct way to serialize and
>>>>>> deserializer avro data?
>>>>>> >
>>>>>> > I probably don't understand your mention about GenericRecord or
>>>>>> GenericDatumReader. I tried to use GenericDatumReader in deserializer
>>>>>> below, but then it seems I got back just GenericData$Record instance, 
>>>>>> which
>>>>>> I can use then to access array of instances, which is not what I'm 
>>>>>> looking
>>>>>> for(IIUC), since in that case I could have just use plain old JSON and
>>>>>> deserialize it using jackson having no schema evolution problems at all. 
>>>>>> If
>>>>>> that's correct, I'd rather stick to SpecificDatumReader, and somehow fix 
>>>>>> it
>>>>>> if possible.
>>>>>> >
>>>>>> > What can be done? Or how schema evolution is intended to be used? I
>>>>>> found a lots of question searching for this answer.
>>>>>> >
>>>>>> > thanks!
>>>>>> > Martin.
>>>>>> >
>>>>>> > deserializer:
>>>>>> >
>>>>>> > public static  T deserialize(Class
>>>>>> targetType,
>>>>>> >
>>>>>> byte[] data,
>>>>>> >
>>>>>> boolean useBinaryDecoder) {
>>>>>> > try {
>>>>>> > if (data == null) {
>>>>>> > return null;
>>>>>> > }
>>>>>> >
>>>>>> > log.trace("data='{}'",
>>>>>> DatatypeConverter.printHexBinary(data));
>>>>>> >
>>>>>> > Schema schema = targetType.newInstance().getSchema();
>>>>>> > DatumReader datumReader = new
>>>>>> SpecificDatumReader<>(schema);
>>>>>> > Decoder decoder = useBinaryDecoder
>>>>>> > ? DecoderFactory.get().binaryDecoder(data, null)
>>>>>> > 

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-08-01 Thread Svante Karlsson
t;>> if possible.
>>>>> >
>>>>> > What can be done? Or how schema evolution is intended to be used? I
>>>>> found a lots of question searching for this answer.
>>>>> >
>>>>> > thanks!
>>>>> > Martin.
>>>>> >
>>>>> > deserializer:
>>>>> >
>>>>> > public static  T deserialize(Class
>>>>> targetType,
>>>>> >
>>>>> byte[] data,
>>>>> >
>>>>> boolean useBinaryDecoder) {
>>>>> > try {
>>>>> > if (data == null) {
>>>>> > return null;
>>>>> > }
>>>>> >
>>>>> > log.trace("data='{}'",
>>>>> DatatypeConverter.printHexBinary(data));
>>>>> >
>>>>> > Schema schema = targetType.newInstance().getSchema();
>>>>> > DatumReader datumReader = new
>>>>> SpecificDatumReader<>(schema);
>>>>> > Decoder decoder = useBinaryDecoder
>>>>> > ? DecoderFactory.get().binaryDecoder(data, null)
>>>>> > : DecoderFactory.get().jsonDecoder(schema, new
>>>>> String(data));
>>>>> >
>>>>> > T result = targetType.cast(datumReader.read(null,
>>>>> decoder));
>>>>> >     log.trace("deserialized data='{}'", result);
>>>>> > return result;
>>>>> > } catch (Exception ex) {
>>>>> > throw new SerializationException("Error deserializing
>>>>> data", ex);
>>>>> > }
>>>>> > }
>>>>> >
>>>>> > serializer:
>>>>> > public static  byte[] serialize(T
>>>>> data, boolean useBinaryDecoder, boolean pretty) {
>>>>> > try {
>>>>> > if (data == null) {
>>>>> > return new byte[0];
>>>>> > }
>>>>> >
>>>>> > log.debug("data='{}'", data);
>>>>> > Schema schema = data.getSchema();
>>>>> > ByteArrayOutputStream byteArrayOutputStream = new
>>>>> ByteArrayOutputStream();
>>>>> > Encoder binaryEncoder = useBinaryDecoder
>>>>> > ?
>>>>> EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
>>>>> > : EncoderFactory.get().jsonEncoder(schema,
>>>>> byteArrayOutputStream, pretty);
>>>>> >
>>>>> > DatumWriter datumWriter = new
>>>>> GenericDatumWriter<>(schema);
>>>>> > datumWriter.write(data, binaryEncoder);
>>>>> >
>>>>> > binaryEncoder.flush();
>>>>> > byteArrayOutputStream.close();
>>>>> >
>>>>> > byte[] result = byteArrayOutputStream.toByteArray();
>>>>> > log.debug("serialized data='{}'",
>>>>> DatatypeConverter.printHexBinary(result));
>>>>> > return result;
>>>>> > } catch (IOException ex) {
>>>>> > throw new SerializationException(
>>>>> > "Can't serialize data='" + data, ex);
>>>>> > }
>>>>> > }
>>>>> >
>>>>> > On Tue, Jul 30, 2019 at 1:48 PM Ryan Skraba wrote:
>>>>> >>
>>>>> >> Hello!  Schema evolution relies on both the writer and reader
>>>>> schemas
>>>>> >> being available.
>>>>> >>
>>>>> >> It looks like the allegro tool you are using is using the
>>>>> >> GenericDatumReader that assumes the reader and writer schema are the
>>>>> >> same:
>>>>> >>
>>>>> >>
>>>>> https://github.com/allegro/json-avro-converter/blob/json-avro-converter-0.2.8/converter/src/main/java/tech/allegro/schema/json2avro/converter/JsonAvroConverter.java#L83
>>>>> >>
>>>>> >> I do 

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-08-01 Thread Martin Mucha
>>>> > } catch (Exception ex) {
>>>> > throw new SerializationException("Error deserializing
>>>> data", ex);
>>>> > }
>>>> > }
>>>> >
>>>> > serializer:
>>>> > public static  byte[] serialize(T data,
>>>> boolean useBinaryDecoder, boolean pretty) {
>>>> > try {
>>>> > if (data == null) {
>>>> > return new byte[0];
>>>> > }
>>>> >
>>>> > log.debug("data='{}'", data);
>>>> > Schema schema = data.getSchema();
>>>> > ByteArrayOutputStream byteArrayOutputStream = new
>>>> ByteArrayOutputStream();
>>>> > Encoder binaryEncoder = useBinaryDecoder
>>>> > ?
>>>> EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
>>>> > : EncoderFactory.get().jsonEncoder(schema,
>>>> byteArrayOutputStream, pretty);
>>>> >
>>>> > DatumWriter datumWriter = new
>>>> GenericDatumWriter<>(schema);
>>>> > datumWriter.write(data, binaryEncoder);
>>>> >
>>>> > binaryEncoder.flush();
>>>> > byteArrayOutputStream.close();
>>>> >
>>>> > byte[] result = byteArrayOutputStream.toByteArray();
>>>> > log.debug("serialized data='{}'",
>>>> DatatypeConverter.printHexBinary(result));
>>>> > return result;
>>>> > } catch (IOException ex) {
>>>> > throw new SerializationException(
>>>> > "Can't serialize data='" + data, ex);
>>>> > }
>>>> > }
>>>> >
>>>> > On Tue, Jul 30, 2019 at 1:48 PM Ryan Skraba wrote:
>>>> >>
>>>> >> Hello!  Schema evolution relies on both the writer and reader schemas
>>>> >> being available.
>>>> >>
>>>> >> It looks like the allegro tool you are using is using the
>>>> >> GenericDatumReader that assumes the reader and writer schema are the
>>>> >> same:
>>>> >>
>>>> >>
>>>> https://github.com/allegro/json-avro-converter/blob/json-avro-converter-0.2.8/converter/src/main/java/tech/allegro/schema/json2avro/converter/JsonAvroConverter.java#L83
>>>> >>
>>>> >> I do not believe that the "default" value is taken into account for
>>>> >> data that is strictly missing from the binary input, just when a
>>>> field
>>>> >> is known to be in the reader schema but missing from the original
>>>> >> writer.
>>>> >>
>>>> >> You may have more luck reading the GenericRecord with a
>>>> >> GenericDatumReader with both schemas, and using the
>>>> >> `convertToJson(record)`.
>>>> >>
>>>> >> I hope this is useful -- Ryan
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Tue, Jul 30, 2019 at 10:20 AM Martin Mucha 
>>>> wrote:
>>>> >> >
>>>> >> > Hi,
>>>> >> >
>>>> >> > I've got some issues/misunderstanding of AVRO schema evolution.
>>>> >> >
>>>> >> > When reading through avro documentation, for example [1], I
>>>> understood, that schema evolution is supported, and if I added column with
>>>> specified default, it should be backwards compatible (and even forward when
>>>> I remove it again). Sounds great, so I added column defined as:
>>>> >> >
>>>> >> > {
>>>> >> >   "name": "newColumn",
>>>> >> >   "type": ["null","string"],
>>>> >> >   "default": null,
>>>> >> >   "doc": "something wrong"
>>>> >> > }
>>>> >> >
>>>> >> > and try to consumer some topic having this schema from beginning,
>>>> it fails with message:
>>>>

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-08-01 Thread Svante Karlsson
2019 at 3:30 PM Martin Mucha  wrote:
>>> >
>>> > Thanks for answer.
>>> >
>>> > Actually I have exactly the same behavior with avro 1.9.0 and
>>> following deserializer in our other app, which uses strictly avro codebase,
>>> and failing with same exceptions. So lets leave "allegro" library and lots
>>> of other tools out of it in our discussion.
>>> > I can use whichever aproach. All I need is single way, where I can
>>> deserialize byte[] into class generated by avro-maven-plugin, and which
>>> will respect documentation regarding schema evolution. Currently we're
>>> using following deserializer and serializer, and these does not work when
>>> it comes to schema evolution. What is the correct way to serialize and
>>> deserializer avro data?
>>> >
>>> > I probably don't understand your mention about GenericRecord or
>>> GenericDatumReader. I tried to use GenericDatumReader in deserializer
>>> below, but then it seems I got back just GenericData$Record instance, which
>>> I can use then to access array of instances, which is not what I'm looking
>>> for(IIUC), since in that case I could have just use plain old JSON and
>>> deserialize it using jackson having no schema evolution problems at all. If
>>> that's correct, I'd rather stick to SpecificDatumReader, and somehow fix it
>>> if possible.
>>> >
>>> > What can be done? Or how schema evolution is intended to be used? I
>>> found a lots of question searching for this answer.
>>> >
>>> > thanks!
>>> > Martin.
>>> >
>>> > deserializer:
>>> >
>>> > public static  T deserialize(Class
>>> targetType,
>>> >byte[]
>>> data,
>>> >boolean
>>> useBinaryDecoder) {
>>> > try {
>>> > if (data == null) {
>>> > return null;
>>> > }
>>> >
>>> > log.trace("data='{}'",
>>> DatatypeConverter.printHexBinary(data));
>>> >
>>> > Schema schema = targetType.newInstance().getSchema();
>>> > DatumReader datumReader = new
>>> SpecificDatumReader<>(schema);
>>> > Decoder decoder = useBinaryDecoder
>>> > ? DecoderFactory.get().binaryDecoder(data, null)
>>> > : DecoderFactory.get().jsonDecoder(schema, new
>>> String(data));
>>> >
>>> > T result = targetType.cast(datumReader.read(null,
>>> decoder));
>>> > log.trace("deserialized data='{}'", result);
>>> > return result;
>>> > } catch (Exception ex) {
>>> > throw new SerializationException("Error deserializing
>>> data", ex);
>>> > }
>>> > }
>>> >
>>> > serializer:
>>> > public static  byte[] serialize(T data,
>>> boolean useBinaryDecoder, boolean pretty) {
>>> > try {
>>> > if (data == null) {
>>> > return new byte[0];
>>> > }
>>> >
>>> > log.debug("data='{}'", data);
>>> > Schema schema = data.getSchema();
>>> > ByteArrayOutputStream byteArrayOutputStream = new
>>> ByteArrayOutputStream();
>>> > Encoder binaryEncoder = useBinaryDecoder
>>> > ?
>>> EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
>>> > : EncoderFactory.get().jsonEncoder(schema,
>>> byteArrayOutputStream, pretty);
>>> >
>>> > DatumWriter datumWriter = new
>>> GenericDatumWriter<>(schema);
>>> > datumWriter.write(data, binaryEncoder);
>>> >
>>> > binaryEncoder.flush();
>>> > byteArrayOutputStream.close();
>>> >
>>> > byte[] result = byteArrayOutputStream.toByteArray();
>>> > log.debug("serialized data='{}'",
>>> DatatypeConverter.printHexBinary(result));
>>> > return result;
>>> > } catch (IOException ex) {
>>> >  

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-08-01 Thread Martin Mucha
>> >
>> > I probably don't understand your mention about GenericRecord or
>> GenericDatumReader. I tried to use GenericDatumReader in deserializer
>> below, but then it seems I got back just GenericData$Record instance, which
>> I can use then to access array of instances, which is not what I'm looking
>> for(IIUC), since in that case I could have just use plain old JSON and
>> deserialize it using jackson having no schema evolution problems at all. If
>> that's correct, I'd rather stick to SpecificDatumReader, and somehow fix it
>> if possible.
>> >
>> > What can be done? Or how schema evolution is intended to be used? I
>> found a lots of question searching for this answer.
>> >
>> > thanks!
>> > Martin.
>> >
>> > deserializer:
>> >
>> > public static  T deserialize(Class
>> targetType,
>> >byte[]
>> data,
>> >boolean
>> useBinaryDecoder) {
>> > try {
>> > if (data == null) {
>> > return null;
>> > }
>> >
>> > log.trace("data='{}'",
>> DatatypeConverter.printHexBinary(data));
>> >
>> > Schema schema = targetType.newInstance().getSchema();
>> > DatumReader datumReader = new
>> SpecificDatumReader<>(schema);
>> > Decoder decoder = useBinaryDecoder
>> > ? DecoderFactory.get().binaryDecoder(data, null)
>> > : DecoderFactory.get().jsonDecoder(schema, new
>> String(data));
>> >
>> > T result = targetType.cast(datumReader.read(null, decoder));
>> > log.trace("deserialized data='{}'", result);
>> > return result;
>> > } catch (Exception ex) {
>> > throw new SerializationException("Error deserializing
>> data", ex);
>> > }
>> > }
>> >
>> > serializer:
>> > public static  byte[] serialize(T data,
>> boolean useBinaryDecoder, boolean pretty) {
>> > try {
>> > if (data == null) {
>> > return new byte[0];
>> > }
>> >
>> > log.debug("data='{}'", data);
>> > Schema schema = data.getSchema();
>> > ByteArrayOutputStream byteArrayOutputStream = new
>> ByteArrayOutputStream();
>> > Encoder binaryEncoder = useBinaryDecoder
>> > ?
>> EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
>> > : EncoderFactory.get().jsonEncoder(schema,
>> byteArrayOutputStream, pretty);
>> >
>> > DatumWriter datumWriter = new
>> GenericDatumWriter<>(schema);
>> > datumWriter.write(data, binaryEncoder);
>> >
>> > binaryEncoder.flush();
>> >     byteArrayOutputStream.close();
>> >
>> > byte[] result = byteArrayOutputStream.toByteArray();
>> > log.debug("serialized data='{}'",
>> DatatypeConverter.printHexBinary(result));
>> > return result;
>> > } catch (IOException ex) {
>> > throw new SerializationException(
>> > "Can't serialize data='" + data, ex);
>> > }
>> > }
>> >
>> > On Tue, Jul 30, 2019 at 1:48 PM Ryan Skraba wrote:
>> >>
>> >> Hello!  Schema evolution relies on both the writer and reader schemas
>> >> being available.
>> >>
>> >> It looks like the allegro tool you are using is using the
>> >> GenericDatumReader that assumes the reader and writer schema are the
>> >> same:
>> >>
>> >>
>> https://github.com/allegro/json-avro-converter/blob/json-avro-converter-0.2.8/converter/src/main/java/tech/allegro/schema/json2avro/converter/JsonAvroConverter.java#L83
>> >>
>> >> I do not believe that the "default" value is taken into account for
>> >> data that is strictly missing from the binary input, just when a field
>> >> is known to be in the reader schema but missing from the original
>> >> writer.
>> >>
>> >> You may have more luck reading the GenericRecord with

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-07-30 Thread Martin Mucha
ntHexBinary(data));
> >
> > Schema schema = targetType.newInstance().getSchema();
> > DatumReader datumReader = new
> SpecificDatumReader<>(schema);
> > Decoder decoder = useBinaryDecoder
> > ? DecoderFactory.get().binaryDecoder(data, null)
> > : DecoderFactory.get().jsonDecoder(schema, new
> String(data));
> >
> > T result = targetType.cast(datumReader.read(null, decoder));
> > log.trace("deserialized data='{}'", result);
> > return result;
> > } catch (Exception ex) {
> > throw new SerializationException("Error deserializing data",
> ex);
> > }
> > }
> >
> > serializer:
> > public static  byte[] serialize(T data,
> boolean useBinaryDecoder, boolean pretty) {
> > try {
> > if (data == null) {
> > return new byte[0];
> > }
> >
> > log.debug("data='{}'", data);
> > Schema schema = data.getSchema();
> > ByteArrayOutputStream byteArrayOutputStream = new
> ByteArrayOutputStream();
> > Encoder binaryEncoder = useBinaryDecoder
> > ?
> EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
> > : EncoderFactory.get().jsonEncoder(schema,
> byteArrayOutputStream, pretty);
> >
> > DatumWriter datumWriter = new
> GenericDatumWriter<>(schema);
> > datumWriter.write(data, binaryEncoder);
> >
> > binaryEncoder.flush();
> > byteArrayOutputStream.close();
> >
> > byte[] result = byteArrayOutputStream.toByteArray();
> > log.debug("serialized data='{}'",
> DatatypeConverter.printHexBinary(result));
> > return result;
> > } catch (IOException ex) {
> > throw new SerializationException(
> > "Can't serialize data='" + data, ex);
> > }
> > }
> >
> > On Tue, Jul 30, 2019 at 1:48 PM Ryan Skraba wrote:
> >>
> >> Hello!  Schema evolution relies on both the writer and reader schemas
> >> being available.
> >>
> >> It looks like the allegro tool you are using is using the
> >> GenericDatumReader that assumes the reader and writer schema are the
> >> same:
> >>
> >>
> https://github.com/allegro/json-avro-converter/blob/json-avro-converter-0.2.8/converter/src/main/java/tech/allegro/schema/json2avro/converter/JsonAvroConverter.java#L83
> >>
> >> I do not believe that the "default" value is taken into account for
> >> data that is strictly missing from the binary input, just when a field
> >> is known to be in the reader schema but missing from the original
> >> writer.
> >>
> >> You may have more luck reading the GenericRecord with a
> >> GenericDatumReader with both schemas, and using the
> >> `convertToJson(record)`.
> >>
> >> I hope this is useful -- Ryan
> >>
> >>
> >>
> >> On Tue, Jul 30, 2019 at 10:20 AM Martin Mucha 
> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I've got some issues/misunderstanding of AVRO schema evolution.
> >> >
> >> > When reading through avro documentation, for example [1], I
> understood, that schema evolution is supported, and if I added column with
> specified default, it should be backwards compatible (and even forward when
> I remove it again). Sounds great, so I added column defined as:
> >> >
> >> > {
> >> >   "name": "newColumn",
> >> >   "type": ["null","string"],
> >> >   "default": null,
> >> >   "doc": "something wrong"
> >> > }
> >> >
> >> > and try to consumer some topic having this schema from beginning, it
> fails with message:
> >> >
> >> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
> >> > at
> org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:424)
> >> > at org.apache.avro.io
> .ResolvingDecoder.doAction(ResolvingDecoder.java:290)
> >> > at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> >> > at org.apache.avro.io
> .ResolvingDecoder.readIndex(Resolving

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-07-30 Thread Ryan Skraba
> throw new SerializationException("Error deserializing data", ex);
> }
> }
>
> serializer:
> public static  byte[] serialize(T data, boolean 
> useBinaryDecoder, boolean pretty) {
> try {
> if (data == null) {
> return new byte[0];
> }
>
> log.debug("data='{}'", data);
> Schema schema = data.getSchema();
> ByteArrayOutputStream byteArrayOutputStream = new 
> ByteArrayOutputStream();
> Encoder binaryEncoder = useBinaryDecoder
> ? 
> EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
> : EncoderFactory.get().jsonEncoder(schema, 
> byteArrayOutputStream, pretty);
>
> DatumWriter datumWriter = new 
> GenericDatumWriter<>(schema);
> datumWriter.write(data, binaryEncoder);
>
> binaryEncoder.flush();
> byteArrayOutputStream.close();
>
> byte[] result = byteArrayOutputStream.toByteArray();
> log.debug("serialized data='{}'", 
> DatatypeConverter.printHexBinary(result));
> return result;
> } catch (IOException ex) {
> throw new SerializationException(
> "Can't serialize data='" + data, ex);
> }
> }
>
> On Tue, Jul 30, 2019 at 1:48 PM Ryan Skraba wrote:
>>
>> Hello!  Schema evolution relies on both the writer and reader schemas
>> being available.
>>
>> It looks like the allegro tool you are using is using the
>> GenericDatumReader that assumes the reader and writer schema are the
>> same:
>>
>> https://github.com/allegro/json-avro-converter/blob/json-avro-converter-0.2.8/converter/src/main/java/tech/allegro/schema/json2avro/converter/JsonAvroConverter.java#L83
>>
>> I do not believe that the "default" value is taken into account for
>> data that is strictly missing from the binary input, just when a field
>> is known to be in the reader schema but missing from the original
>> writer.
>>
>> You may have more luck reading the GenericRecord with a
>> GenericDatumReader with both schemas, and using the
>> `convertToJson(record)`.
>>
>> I hope this is useful -- Ryan
>>
>>
>>
>> On Tue, Jul 30, 2019 at 10:20 AM Martin Mucha  wrote:
>> >
>> > Hi,
>> >
>> > I've got some issues/misunderstanding of AVRO schema evolution.
>> >
>> > When reading through avro documentation, for example [1], I understood, 
>> > that schema evolution is supported, and if I added column with specified 
>> > default, it should be backwards compatible (and even forward when I remove 
>> > it again). Sounds great, so I added column defined as:
>> >
>> > {
>> >   "name": "newColumn",
>> >   "type": ["null","string"],
>> >   "default": null,
>> >   "doc": "something wrong"
>> > }
>> >
>> > and try to consumer some topic having this schema from beginning, it fails 
>> > with message:
>> >
>> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
>> > at 
>> > org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:424)
>> > at 
>> > org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
>> > at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>> > at 
>> > org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
>> > at 
>> > org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>> > at 
>> > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>> > at 
>> > org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
>> > at 
>> > org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
>> > at 
>> > org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
>> > at 
>> > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>> > at 
>> > org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>> > at 
>> > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>> > at 
>> 

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-07-30 Thread Martin Mucha
Thanks for the answer.

Actually I see exactly the same behavior with Avro 1.9.0 and the following
deserializer in our other app, which uses strictly the Avro codebase and
fails with the same exceptions. So let's leave the "allegro" library and
other tools out of the discussion.
I can use whichever approach. All I need is a single way to deserialize
byte[] into a class generated by avro-maven-plugin that respects the
documentation regarding schema evolution. Currently we're using the
following deserializer and serializer, and they do not work when it comes
to schema evolution. What is the correct way to serialize and deserialize
Avro data?

I probably don't understand your point about GenericRecord or
GenericDatumReader. I tried to use GenericDatumReader in the deserializer
below, but then it seems I got back just a GenericData$Record instance,
which I can then use to access an array of instances; that is not what I'm
looking for (IIUC), since in that case I could have just used plain old
JSON and deserialized it with Jackson, with no schema evolution problems at
all. If that's correct, I'd rather stick to SpecificDatumReader and somehow
fix it if possible.

What can be done? Or how is schema evolution intended to be used? I found
a lot of questions while searching for this answer.

thanks!
Martin.

deserializer:

public static <T extends SpecificRecordBase> T deserialize(Class<T> targetType,
                                                           byte[] data,
                                                           boolean useBinaryDecoder) {
    try {
        if (data == null) {
            return null;
        }

        log.trace("data='{}'", DatatypeConverter.printHexBinary(data));

        Schema schema = targetType.newInstance().getSchema();
        DatumReader<T> datumReader = new SpecificDatumReader<>(schema);
        Decoder decoder = useBinaryDecoder
                ? DecoderFactory.get().binaryDecoder(data, null)
                : DecoderFactory.get().jsonDecoder(schema, new String(data));

        T result = targetType.cast(datumReader.read(null, decoder));
        log.trace("deserialized data='{}'", result);
        return result;
    } catch (Exception ex) {
        throw new SerializationException("Error deserializing data", ex);
    }
}

serializer:

public static <T extends SpecificRecordBase> byte[] serialize(T data, boolean useBinaryDecoder, boolean pretty) {
    try {
        if (data == null) {
            return new byte[0];
        }

        log.debug("data='{}'", data);
        Schema schema = data.getSchema();
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        Encoder binaryEncoder = useBinaryDecoder
                ? EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
                : EncoderFactory.get().jsonEncoder(schema, byteArrayOutputStream, pretty);

        DatumWriter<T> datumWriter = new GenericDatumWriter<>(schema);
        datumWriter.write(data, binaryEncoder);

        binaryEncoder.flush();
        byteArrayOutputStream.close();

        byte[] result = byteArrayOutputStream.toByteArray();
        log.debug("serialized data='{}'", DatatypeConverter.printHexBinary(result));
        return result;
    } catch (IOException ex) {
        throw new SerializationException(
                "Can't serialize data='" + data, ex);
    }
}
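
Based on the advice in this thread (this is an illustrative sketch, not code
from the original mail), a variant of the deserializer above that passes the
writer schema alongside the reader schema. The writerSchema parameter is an
addition for illustration; how it is obtained (a schema registry, a container
file header, etc.) is left open.

public static <T extends SpecificRecordBase> T deserialize(Class<T> targetType,
                                                           Schema writerSchema,
                                                           byte[] data) {
    try {
        if (data == null) {
            return null;
        }
        // The reader schema comes from the generated class; the writer schema
        // (the schema the bytes were actually written with) must be supplied.
        Schema readerSchema = targetType.newInstance().getSchema();
        DatumReader<T> datumReader = new SpecificDatumReader<>(writerSchema, readerSchema);
        Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        return targetType.cast(datumReader.read(null, decoder));
    } catch (Exception ex) {
        throw new SerializationException("Error deserializing data", ex);
    }
}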

On Tue, Jul 30, 2019 at 1:48 PM Ryan Skraba wrote:

> Hello!  Schema evolution relies on both the writer and reader schemas
> being available.
>
> It looks like the allegro tool you are using is using the
> GenericDatumReader that assumes the reader and writer schema are the
> same:
>
>
> https://github.com/allegro/json-avro-converter/blob/json-avro-converter-0.2.8/converter/src/main/java/tech/allegro/schema/json2avro/converter/JsonAvroConverter.java#L83
>
> I do not believe that the "default" value is taken into account for
> data that is strictly missing from the binary input, just when a field
> is known to be in the reader schema but missing from the original
> writer.
>
> You may have more luck reading the GenericRecord with a
> GenericDatumReader with both schemas, and using the
> `convertToJson(record)`.
>
> I hope this is useful -- Ryan
>
>
>
> On Tue, Jul 30, 2019 at 10:20 AM Martin Mucha  wrote:
> >
> > Hi,
> >
> > I've got some issues/misunderstanding of AVRO schema evolution.
> >
> > When reading through avro documentation, for example [1], I understood,
> that schema evolution is supported, and if I added column with specified
> default, it should be backwards compatible (and even forward when I remove
> it again). Sounds great, so I added column defined as:
>

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-07-30 Thread Ryan Skraba
Hello!  Schema evolution relies on both the writer and reader schemas
being available.

It looks like the allegro tool you are using is using the
GenericDatumReader that assumes the reader and writer schema are the
same:

https://github.com/allegro/json-avro-converter/blob/json-avro-converter-0.2.8/converter/src/main/java/tech/allegro/schema/json2avro/converter/JsonAvroConverter.java#L83

I do not believe that the "default" value is taken into account for
data that is strictly missing from the binary input, just when a field
is known to be in the reader schema but missing from the original
writer.

You may have more luck reading the GenericRecord with a
GenericDatumReader with both schemas, and using the
`convertToJson(record)`.
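
For illustration, a minimal sketch of that approach using only Avro classes
(the JsonEncoder stands in for the converter's convertToJson; class and
variable names here are invented):

import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

public class BinaryToJsonSketch {
    public static String toJson(byte[] bytes, Schema writerSchema, Schema readerSchema)
            throws Exception {
        GenericDatumReader<GenericRecord> reader =
                new GenericDatumReader<>(writerSchema, readerSchema);
        GenericRecord record = reader.read(null, DecoderFactory.get().binaryDecoder(bytes, null));

        // Re-encode the resolved record as JSON using the reader schema.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(readerSchema, out);
        new GenericDatumWriter<GenericRecord>(readerSchema).write(record, jsonEncoder);
        jsonEncoder.flush();
        return out.toString();
    }
}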

I hope this is useful -- Ryan



On Tue, Jul 30, 2019 at 10:20 AM Martin Mucha  wrote:
>
> Hi,
>
> I've got some issues/misunderstanding of AVRO schema evolution.
>
> When reading through avro documentation, for example [1], I understood, that 
> schema evolution is supported, and if I added column with specified default, 
> it should be backwards compatible (and even forward when I remove it again). 
> Sounds great, so I added column defined as:
>
> {
>   "name": "newColumn",
>   "type": ["null","string"],
>   "default": null,
>   "doc": "something wrong"
> }
>
> and try to consumer some topic having this schema from beginning, it fails 
> with message:
>
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
> at 
> org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:424)
> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at 
> org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
> at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> at 
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> at 
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
> at 
> tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToJson(JsonAvroConverter.java:83)
> to give a little bit more information. Avro schema defines one top level 
> type, having 2 fields. String describing type of message, and union of N 
> types. All N-1, non-modified types can be read, but one updated with 
> optional, default-having column cannot be read. I'm not sure if this design 
> is strictly speaking correct, but that's not the point (feel free to 
> criticise and recommend better approach!). I'm after schema evolution, which 
> seems not to be working.
>
>
> And if we alter type definition to:
>
> "type": "string",
> "default": ""
> it still does not work and generated error is:
>
> Caused by: org.apache.avro.AvroRuntimeException: Malformed data. Length is 
> negative: -1
> at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336)
> at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
> at 
> org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
> at 
> org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
> at 
> org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:414)
> at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> at 
> org.apache.avro.generic.GenericDatumReader.readField(Generic

Validate JSON data against an Avro schema

2018-10-03 Thread Lukas Michelbacher
I want to use Avro to validate data in JSON objects against a schema.

My expectation was that the schema validation process covers the
following scenarios with appropriate error messages:

1. Required field X is missing in the data. Error message something
like "field X not found"
2. Field X has the wrong type. Error message something like "field X
expected String, found Integer"
3. Field Y is in the data but it's not mentioned in the schema. Error
message something like "Unexpected field found: Y"

With the code below, I found that only scenario 1 works as I expected.
Scenario 2 gets a somewhat helpful error message and scenario 3 is not
a failure at all.

Is there anything wrong with my approach?

Lukas

// validation method
void validate(ObjectNode node) {
Schema schema = SchemaBuilder
.record("test")
.fields()
.requiredString("testField")
.endRecord();

String nodeAsString = node.toString();
DatumReader datumReader = new GenericDatumReader<>(schema);
datumReader.read(null, getDecoder(schema, nodeAsString));
}

// scenarios
JsonNodeFactory factory = JsonNodeFactory.instance;

// 1. Required field missing
ObjectNode node = factory.objectNode()
node.put("xyz", "foo");

validate(node) // Result: "Expected field name not found: testField"

// 2. Required field has wrong type
ObjectNode node = factory.objectNode()
node.put("testField", 1);
validate(node) // Result: "Expected string. Got VALUE_NUMBER_INT" (The
name of the field that has the wrong type is not part of the message
which is less helpful if there are multiple fields)

// 3. Extraneous field
ObjectNode node = factory.objectNode()
node.put("testField", "foo");
node.put("xyz", "foo");
validate(node) // There is no error even though the specified JSON
object contains data that the schema does not define
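
For reference, a self-contained version of the sketch above (assuming Jackson
for the JSON nodes; the jsonDecoder call stands in for the getDecoder helper,
which is not shown in the original mail):

import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import com.fasterxml.jackson.databind.node.ObjectNode;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

public class JsonValidationSketch {

    static void validate(ObjectNode node) throws Exception {
        Schema schema = SchemaBuilder.record("test")
                .fields()
                .requiredString("testField")
                .endRecord();

        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
        // Throws AvroTypeException when the JSON does not match the schema;
        // extra fields that the schema does not define pass through without error,
        // as observed in scenario 3 above.
        datumReader.read(null, DecoderFactory.get().jsonDecoder(schema, node.toString()));
    }

    public static void main(String[] args) {
        JsonNodeFactory factory = JsonNodeFactory.instance;

        ObjectNode missingField = factory.objectNode();
        missingField.put("xyz", "foo");

        ObjectNode wrongType = factory.objectNode();
        wrongType.put("testField", 1);

        ObjectNode extraField = factory.objectNode();
        extraField.put("testField", "foo");
        extraField.put("xyz", "foo");

        for (ObjectNode node : new ObjectNode[]{missingField, wrongType, extraField}) {
            try {
                validate(node);
                System.out.println(node + " -> accepted");
            } catch (Exception e) {
                System.out.println(node + " -> " + e.getMessage());
            }
        }
    }
}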


Re: Avro Schema with Large Number of fields AVRO-1642

2018-09-06 Thread chinchu chinchu
Hi All,
We have an Avro schema that has around 2k fields. When we do not use a
nested structure (nested record types), we run into JVM specification
issues related to method size and number of parameters. My preference is to
keep the Avro schema completely flattened without any nested types. However,
the avro-maven-plugin generates classes that violate the JVM spec. Is there
a way to overcome this so that truly flattened Avro schemas can be used?

Thanks,
Chinchu

On Tue, Sep 4, 2018 at 9:54 AM chinchu chinchu 
wrote:

> Hi All,
> We have an avro schema that has around 2k fields .When we do not use a
> nested structure(nested record types)  we run into  jvm specification
> issues related to method size and number of parameters .My preference is to
> keep the avro schema completely flattened without any nested types .How
> ever the avro maven plugin generates classes that violates the jvm spec.Is
> there a way to over come this scenario so truely flattened avro schemas can
> be used.
>
> Thanks,
> Chinchu
>


Avro Schema with Large Number of fields AVRO-1642

2018-09-04 Thread chinchu chinchu
Hi All,
We have an Avro schema that has around 2k fields. When we do not use a
nested structure (nested record types), we run into JVM specification
issues related to method size and number of parameters. My preference is to
keep the Avro schema completely flattened without any nested types. However,
the avro-maven-plugin generates classes that violate the JVM spec. Is there
a way to overcome this so that truly flattened Avro schemas can be used?

Thanks,
Chinchu
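
One workaround to consider, offered here only as a hedged sketch and not taken
from this thread: skip code generation for the very wide schema and work with
the generic API instead, which avoids the oversized generated constructors and
builders entirely. The file name below is a placeholder.

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class FlatSchemaSketch {
    public static void main(String[] args) throws Exception {
        // "flat_schema.avsc" is a placeholder for the ~2k-field schema.
        Schema schema = new Schema.Parser().parse(new File("flat_schema.avsc"));

        GenericRecord record = new GenericData.Record(schema);
        for (Schema.Field field : schema.getFields()) {
            // Fields are populated by name rather than through generated setters.
            record.put(field.name(), null); // replace null with the real value
        }
    }
}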


Re: Avro Schema Question

2018-05-29 Thread Motoko Kusanagi
Hi Elliot,

Thanks for that bit of info. It is helpful. Where do you draw the line between 
complex and simple unions? In other words, what criteria do you use 
to say a union is too complex?

Thanks,

Scott

From: Elliot West 
Sent: Saturday, May 26, 2018 1:58 AM
To: user@avro.apache.org
Subject: Re: Avro Schema Question

A word of caution on the union type. You may find support for unions very 
patchy if you are hoping to process records using well known data processing 
engines. We’ve been unable to usefully read union types in both Apache Spark 
and Hive for example. The simple null union construct is the exception: [null, 
typeA], as it is usually represented by a nullable columns of typeA. We’ve 
resorted to prohibiting schemas with complex unions so that our producers can’t 
create data that is not fully readable by our consumers.

Elliot.

On Fri, 25 May 2018 at 22:30, Motoko Kusanagi <major-motoko-kusan...@outlook.com> wrote:
Hi Michael,

Thanks!! Yes, it does.

Scott

From: Michael Smith <micha...@syapse.com>
Sent: Friday, May 25, 2018 2:21 PM
To: user@avro.apache.org
Subject: Re: Avro Schema Question

{"type": "int"}, {"type": "string"} is not valid json, so you definitely can't 
do that. But

[{"type": "int"}, {"type": "string"}] is a valid schema -- it can encode a 
single value that is either an int or a string. At the highest level, your 
schema can only be one type, but that type may be (and in fact probably will 
be) a complex type -- a union of records or a single record.

Does that answer your question?

On Fri, May 25, 2018 at 5:08 PM Motoko Kusanagi <major-motoko-kusan...@outlook.com> wrote:

Hi,


I read the specification multiple times. In the specification, it says "A 
Schema is represented in JSON<http://www.json.org/> by one of:" in the Schema 
Declaration section. The "one" confuses me as I am interpreting it as exactly 
one of the 3 that it listed.


In short, can I do this as a single schema?

{type : int},

{type : string},

{type : int},


Or do the following as a single schema?

{type : int},

{type : record },

{type : record }, // Not the same as the previous.

{type : string},


Or do I have to "embed" the above under a complex type like a record if I want 
complex schema? Or does "one of" mean I have to choose one and exactly one for 
the high top-most level of the schema?


Thanks!!



--


Michael A. Smith — Senior Systems Engineer



micha...@syapse.com<mailto:micha...@syapse.com>
syapse.com
<http://www.syapse.com/>100 Matsonford 
Road<https://maps.google.com/?q=100+Matsonford+Rd=gmail=g>
Five Radnor Corporate Center
Suite 444
Radnor, PA 19087
https://www.linkedin.com/in/michaelalexandersmith





Re: Avro Schema Question

2018-05-26 Thread Elliot West
A word of caution on the union type. You may find support for unions very
patchy if you are hoping to process records using well-known data
processing engines. We’ve been unable to usefully read union types in both
Apache Spark and Hive, for example. The simple null union construct is the
exception: [null, typeA], as it is usually represented by a nullable
column of typeA. We’ve resorted to prohibiting schemas with complex unions
so that our producers can’t create data that is not fully readable by our
consumers.

Elliot.
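
To make the distinction concrete, the simple construct referred to above is a field like
the following (a sketch; the field name is invented):

{"name": "middleName", "type": ["null", "string"], "default": null}

whereas a multi-branch union such as ["int", "string", "SomeRecord"] is the kind of
construct that, per the report above, engines like Spark and Hive struggle to map onto a
single column type.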

On Fri, 25 May 2018 at 22:30, Motoko Kusanagi <
major-motoko-kusan...@outlook.com> wrote:

> Hi Michael,
>
> Thanks!! Yes, it does.
>
> Scott
> --
> *From:* Michael Smith <micha...@syapse.com>
> *Sent:* Friday, May 25, 2018 2:21 PM
> *To:* user@avro.apache.org
> *Subject:* Re: Avro Schema Question
>
> {"type": "int"}, {"type": "string"} is not valid json, so you definitely
> can't do that. But
>
> [{"type": "int"}, {"type": "string"}] is a valid schema -- it can encode a
> single value that is either an int or a string. At the highest level, your
> schema can only be one type, but that type may be (and in fact probably
> will be) a complex type -- a union of records or a single record.
>
> Does that answer your question?
>
> On Fri, May 25, 2018 at 5:08 PM Motoko Kusanagi <
> major-motoko-kusan...@outlook.com> wrote:
>
> Hi,
>
>
> I read the specification multiple times. In the specification, it says "A
> Schema is represented in JSON <http://www.json.org/> by one of:" in the
> Schema Declaration section. The "one" confuses me as I am interpreting it
> as exactly one of the 3 that it listed.
>
>
> In short, can I do this as a single schema?
>
> {type : int},
>
> {type : string},
>
> {type : int},
>
>
> Or do the following as a single schema?
>
> {type : int},
>
> {type : record },
>
> {type : record }, // Not the same as the previous.
>
> {type : string},
>
>
> Or do I have to "embed" the above under a complex type like a record if I
> want complex schema? Or does "one of" mean I have to choose one and exactly
> one for the high top-most level of the schema?
>
>
> Thanks!!
>
>
>
> --
>
> Michael A. Smith — Senior Systems Engineer
> --
>
> micha...@syapse.com
> syapse.com
> <http://www.syapse.com/>100 Matsonford Road
> <https://maps.google.com/?q=100+Matsonford+Rd=gmail=g>
> Five Radnor Corporate Center
> Suite 444
> Radnor, PA 19087
> https://www.linkedin.com/in/michaelalexandersmith
>
>


Re: Avro Schema Question

2018-05-25 Thread Motoko Kusanagi
Hi Michael,

Thanks!! Yes, it does.

Scott

From: Michael Smith <micha...@syapse.com>
Sent: Friday, May 25, 2018 2:21 PM
To: user@avro.apache.org
Subject: Re: Avro Schema Question

{"type": "int"}, {"type": "string"} is not valid json, so you definitely can't 
do that. But

[{"type": "int"}, {"type": "string"}] is a valid schema -- it can encode a 
single value that is either an int or a string. At the highest level, your 
schema can only be one type, but that type may be (and in fact probably will 
be) a complex type -- a union of records or a single record.

Does that answer your question?

On Fri, May 25, 2018 at 5:08 PM Motoko Kusanagi 
<major-motoko-kusan...@outlook.com<mailto:major-motoko-kusan...@outlook.com>> 
wrote:

Hi,


I read the specification multiple times. In the specification, it says "A 
Schema is represented in JSON<http://www.json.org/> by one of:" in the Schema 
Declaration section. The "one" confuses me as I am interpreting it as exactly 
one of the 3 that it listed.


In short, can I do this as a single schema?

{type : int},

{type : string},

{type : int},


Or do the following as a single schema?

{type : int},

{type : record },

{type : record }, // Not the same as the previous.

{type : string},


Or do I have to "embed" the above under a complex type like a record if I want 
complex schema? Or does "one of" mean I have to choose one and exactly one for 
the high top-most level of the schema?


Thanks!!



--


Michael A. Smith — Senior Systems Engineer



micha...@syapse.com<mailto:micha...@syapse.com>
syapse.com
<http://www.syapse.com/>100 Matsonford 
Road<https://maps.google.com/?q=100+Matsonford+Rd=gmail=g>
Five Radnor Corporate Center
Suite 444
Radnor, PA 19087
https://www.linkedin.com/in/michaelalexandersmith





Re: Avro Schema Question

2018-05-25 Thread Michael Smith
{"type": "int"}, {"type": "string"} is not valid json, so you definitely
can't do that. But

[{"type": "int"}, {"type": "string"}] is a valid schema -- it can encode a
single value that is either an int or a string. At the highest level, your
schema can only be one type, but that type may be (and in fact probably
will be) a complex type -- a union of records or a single record.

Does that answer your question?
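
For illustration, a top-level union of named record types might look like this (a minimal
sketch; all names are invented):

[
  {"type": "record", "name": "A", "fields": [{"name": "id", "type": "long"}]},
  {"type": "record", "name": "B", "fields": [{"name": "label", "type": "string"}]}
]

Each datum encoded against such a schema is then exactly one of the union's branches.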

On Fri, May 25, 2018 at 5:08 PM Motoko Kusanagi <
major-motoko-kusan...@outlook.com> wrote:

> Hi,
>
>
> I read the specification multiple times. In the specification, it says "A
> Schema is represented in JSON  by one of:" in the
> Schema Declaration section. The "one" confuses me as I am interpreting it
> as exactly one of the 3 that it listed.
>
>
> In short, can I do this as a single schema?
>
> {type : int},
>
> {type : string},
>
> {type : int},
>
>
> Or do the following as a single schema?
>
> {type : int},
>
> {type : record },
>
> {type : record }, // Not the same as the previous.
>
> {type : string},
>
>
> Or do I have to "embed" the above under a complex type like a record if I
> want complex schema? Or does "one of" mean I have to choose one and exactly
> one for the high top-most level of the schema?
>
>
> Thanks!!
>
>
>
> --

Michael A. Smith — Senior Systems Engineer
--

micha...@syapse.com
syapse.com
100 Matsonford Road

Five Radnor Corporate Center
Suite 444
Radnor, PA 19087
https://www.linkedin.com/in/michaelalexandersmith


Avro Schema Question

2018-05-25 Thread Motoko Kusanagi
Hi,


I read the specification multiple times. In the specification, it says "A 
Schema is represented in JSON by one of:" in the Schema 
Declaration section. The "one" confuses me as I am interpreting it as exactly 
one of the 3 that it listed.


In short, can I do this as a single schema?

{type : int},

{type : string},

{type : int},


Or do the following as a single schema?

{type : int},

{type : record },

{type : record }, // Not the same as the previous.

{type : string},


Or do I have to "embed" the above under a complex type like a record if I want 
a complex schema? Or does "one of" mean I have to choose one and exactly one for 
the top-most level of the schema?


Thanks!!




Modify an existing Avro schema

2017-09-28 Thread FIXED-TERM Sonnentag Paul (CR/PJ-AI-S1)
I'm implementing an application which reads in structured data with an Avro 
schema, applies some dynamically configurable transformations, and outputs the 
data with Avro again. The problem I have is that for some transformations I 
need to modify the Avro schema. One transform could be, for example, that I read 
a value from a field, apply some function to the value, and write it back to a 
new field. In this scenario I need to add the new field to the output schema. I 
haven't found a really good way to do this with Avro. What I'm doing right now 
is reading all the fields from the old schema, creating a new schema, and copying all 
the fields over to this new schema:

Nicely formatted version: 
https://gist.github.com/paulsonnentag/201b8a13cba8ba91e384240cf26c63f1

// ...

// creating a new schema with the fields of the old schema added plus the new 
fields

val schema = // ... the schema of the input data

var newSchema = SchemaBuilder
  .builder(schema.getNamespace)
  .record(schema.getName)
  .fields()

// create new schema with existing fields from schemas and new fields which 
are created through transforms
val fields = schema.getFields ++ getNewFields(schema, transforms)

fields
  .foldLeft(newSchema)((newSchema, field: Schema.Field) => {
    newSchema
      .name(field.name)
      .`type`(field.schema())
      .noDefault()
    // TODO: find way to differentiate between explicitly set null
    // defaults and fields which have no default
    //.withDefault(field.defaultValue())
  })

newSchema.endRecord()
}



// ...



// create new fields like this

new Schema.Field(
  "addedField",
  Schema.createUnion(List(
    Schema.create(Schema.Type.STRING),
    Schema.create(Schema.Type.NULL)
  )),
  null,
  null
)

Any ideas how this could be done in a way that doesn't feel so hacky?

Thanks,
Paul
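
For comparison, here is a minimal sketch of the same field-copying idea using the plain
Java Schema API instead of SchemaBuilder. It assumes Avro 1.8+ (for Schema.Field#defaultVal());
the class and method names outside the Avro API are illustrative only:

import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;

public class SchemaExtender {
    // Returns a copy of the input record schema with one extra field appended.
    public static Schema addField(Schema schema, Schema.Field newField) {
        List<Schema.Field> fields = new ArrayList<>();
        for (Schema.Field f : schema.getFields()) {
            // Field objects cannot be reused across schemas, so copy each one,
            // carrying the doc string and default value over.
            fields.add(new Schema.Field(f.name(), f.schema(), f.doc(), f.defaultVal()));
        }
        fields.add(newField);

        Schema copy = Schema.createRecord(
            schema.getName(), schema.getDoc(), schema.getNamespace(), schema.isError());
        copy.setFields(fields);
        return copy;
    }
}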


Re: Avro schema properties contention on multithread read

2017-07-08 Thread Zoltan Farkas
The order of attributes in JSON might matter as far as I can remember, so 
LinkedHashMap might not be replaceable with a ConcurrentHashMap.
Plus, ConcurrentHashMap is not exactly without concurrency overhead…
I wrote a util that creates an immutable schema:
https://github.com/zolyfarkas/spf4j/blob/master/spf4j-avro/src/main/java/org/spf4j/avro/schema/Schemas.java#L26

But you would have to use it in conjunction with an unsynchronized Avro 
implementation (which I do in my fork, and you can do as well).

I wonder if there is interest in merging this into the avro lib someday.

—Z


> On Jul 6, 2017, at 12:20 PM, f...@legsem.com wrote:
> 
> On 05.07.2017 21:53, Zoltan Farkas wrote:
> 
>> The synchronization in JsonProperties is curently inconsistent (see 
>> getObjectProps()) which makes current implementation @NotThreadSafe
>>  
>> I think it would be probably best to remove synchronization from those 
>> methods... and add @NotThreadSafe to the class...
>> Utilities like Schemas.synchronizedSchema(...) and 
>> Schemas.unmodifiableSchema(...) could be added to help with various use 
>> cases...
>>  
>>  
>> —Z
>>  
> Thank you for your reply. I like your Schemas.unmodifiableSchema(...) a lot.
> 
> While what you are describing would be ideal, a simpler solution might be to 
> change the LinkedHashMap that backs jsonProperties into something like a 
> ConcurrentHashMap, avoiding the need for synchronization.
> 
> This being said ConcurrentHashMap itself does not preserve insertion order, 
> so its not a mere replacement to LinkedHashMap.
> 



Re: Avro schema properties contention on multithread read

2017-07-06 Thread fady
On 05.07.2017 21:53, Zoltan Farkas wrote:

> The synchronization in JsonProperties is curently inconsistent (see 
> getObjectProps()) which makes current implementation @NotThreadSafe 
> 
> I think it would be probably best to remove synchronization from those 
> methods... and add @NotThreadSafe to the class... 
> Utilities like Schemas.synchronizedSchema(...) and 
> Schemas.unmodifiableSchema(...) could be added to help with various use 
> cases... 
> 
> --Z

Thank you for your reply. I like your Schemas.unmodifiableSchema(...) a
lot. 

While what you are describing would be ideal, a simpler solution might
be to change the LinkedHashMap that backs jsonProperties into something
like a ConcurrentHashMap, avoiding the need for synchronization. 

This being said, ConcurrentHashMap itself does not preserve insertion
order, so it's not a drop-in replacement for LinkedHashMap.
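
For readers hitting the same contention, here is a minimal sketch of the ThreadLocal
workaround mentioned in the original report. The schema literal and class name are
invented; the point is simply that each thread parses its own Schema instance, so the
synchronized property accessors never contend across threads:

import org.apache.avro.Schema;

public final class SchemaPerThread {
    // Hypothetical schema source; in practice this would be the real .avsc content.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Example\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"}]}";

    // One Schema per thread, parsed lazily on first access.
    private static final ThreadLocal<Schema> SCHEMA =
        ThreadLocal.withInitial(() -> new Schema.Parser().parse(SCHEMA_JSON));

    private SchemaPerThread() {}

    public static Schema get() {
        return SCHEMA.get();
    }
}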

Re: Avro schema properties contention on multithread read

2017-07-05 Thread Zoltan Farkas
The synchronization in JsonProperties is currently inconsistent (see 
getObjectProps()), which makes the current implementation @NotThreadSafe.

I think it would probably be best to remove synchronization from those methods… 
and add @NotThreadSafe to the class…
Utilities like Schemas.synchronizedSchema(…) and Schemas.unmodifiableSchema(…) 
could be added to help with various use cases…


—Z


> On Jun 29, 2017, at 2:21 AM, f...@legsem.com wrote:
> 
> Hello,
> 
> We are using Avro Schema properties and while running concurrent tests, we 
> noticed a lot of contentions on org.apache.avro.JsonProperties#getJsonProp.
> 
> In the attached screen shot, we have 4 concurrent threads all sharing the 
> same avro schema and reading from it simultaneously.
> 
> On this screen shot each red period is a contention between threads. Most of 
> these contentions are on getJsonProp.
> 
> This is due to getJsonProp being a synchronized method.
> 
> We have tried avro 1.7.7, 1.8.1 and 1.8.2. All have this problem (getJsonProp 
> is deprecated in 1.8 but the replacement method is also synchronized).
> 
> We can work around this by not sharing the avro schemas between threads 
> (using ThreadLocal for instance) but this is ugly.
> 
> It seems that avro schemas are mostly immutable, which is great for 
> multithread read access, but it turns out Properties within these schemas are 
> mutable and, since they are stored in a LinkedHashMap, synchronization is 
> necessary.
> 
> Anyone having a similar issue?
> 
> Thank you



Re: Is this a valid Avro schema?

2016-09-02 Thread Sean Busbey
The schemas are fine, but the JSON snippet isn't a valid instance of
the second schema.

In the default JSON encoding for Avro, you have to include the name of
the record as an object field[1].

For example, given test_schema_0.avsc with your first schema and
test_schema_1.avsc as your second, here are random example instances:

$ java -jar avro-tools-1.9.0-SNAPSHOT.jar random --count 1
--schema-file test_schema_0.avsc schema_0_random.avro
log4j:WARN No appenders could be found for logger
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
for more info.
test.seed=1472871710806
$ java -jar avro-tools-1.9.0-SNAPSHOT.jar tojson --pretty schema_0_random.avro
log4j:WARN No appenders could be found for logger
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
for more info.
{
  "name" : "msbvsjefb",
  "id" : 5742171927645279316
}
$ java -jar avro-tools-1.9.0-SNAPSHOT.jar random --count 1
--schema-file test_schema_1.avsc schema_1_random.avro
log4j:WARN No appenders could be found for logger
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
for more info.
test.seed=1472871721099
$ java -jar avro-tools-1.9.0-SNAPSHOT.jar tojson --pretty schema_1_random.avro
log4j:WARN No appenders could be found for logger
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
for more info.
{
  "com.user.user_record" : {
"name" : "ljfijs",
"id" : -7695450471550075616
  }
}



[1]: http://avro.apache.org/docs/current/spec.html#json_encoding

On Fri, Sep 2, 2016 at 5:01 PM, Kamesh Kompella  wrote:
> Hi there,
>  First, please look at the following schema
>
> {"name": "user_record",
>   "namespace": "com.user",
>   "type": "record",
>   "fields" : [
> {"name": "name", "type": "string"},
> {"name": "id", "type": "long"}
>   ]}
>
> and the following JSON:
>
> {"name": “Foo", “id": 42}
>
>
> When I run avro-tools with the option fromjson, I get a .avro file. Stuff
> works.
>
> If I enclose the schema above into array as shown below (I bolded the array
> begin and end in red for clarity), avro-tools (version 1.8.1) throws the
> following exception and dies.
>
>
> [{"name": "user_record",
>   "namespace": "com.user",
>   "type": "record",
>   "fields" : [
> {"name": "name", "type": "string"},
> {"name": "id", "type": "long"}
>   ]}]
>
> I get the following exception:
>
> Exception in thread "main" org.apache.avro.AvroTypeException: Unknown union
> branch name at
> org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:445)
>
> Does it make sense to enclose a schema into array? Is this a bug in
> avro-tools or is this an invalid schema? The exception above seems to
> indicate that a schema file may not begin with a JSON array of schemas.
>
> The documentation seems to indicate schema may be defined as union of other
> other schemas.
>
> I cloned the code base and I could not locate a single instance of avsc file
> in it that defined its schema as a JSON array. Hence, the question.
>
> I appreciate your response.
>
> Regards
> Kamesh



-- 
busbey


Is this a valid Avro schema?

2016-09-02 Thread Kamesh Kompella
Hi there,
 First, please look at the following schema 

{"name": "user_record",
  "namespace": "com.user",
  "type": "record",
  "fields" : [ 
{"name": "name", "type": "string"},
{"name": "id", "type": "long"}
  ]}

and the following JSON:  

{"name": "Foo", "id": 42}

When I run avro-tools with the option fromjson, I get a .avro file. Stuff works.

If I enclose the schema above in an array as shown below (I bolded the array 
begin and end in red for clarity), avro-tools (version 1.8.1) throws the 
following exception and dies. 


[{"name": "user_record",
  "namespace": "com.user",
  "type": "record",
  "fields" : [ 
{"name": "name", "type": "string"},
{"name": "id", "type": "long"}
  ]}]

I get the following exception: 

Exception in thread "main" org.apache.avro.AvroTypeException: Unknown union 
branch name at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:445)

Does it make sense to enclose a schema in an array? Is this a bug in avro-tools 
or is this an invalid schema? The exception above seems to indicate that a 
schema file may not begin with a JSON array of schemas. 

The documentation seems to indicate a schema may be defined as a union of 
other schemas.

I cloned the code base and I could not locate a single instance of an avsc file in 
it that defined its schema as a JSON array. Hence, the question.

I appreciate your response.

Regards
Kamesh

Trouble Serializing Record from a multi-Record Avro Schema using C++

2016-06-17 Thread Balajee R.C
I am getting a runtime exception when serializing an Avro record generated
from JSON schema containing multiple records in a C++ program.

I was able to reproduce this problem with both avrocpp-1.7.7 as well as
avrocpp-1.8.1 (latest stable release).

Here is my schema:

[
> {
>   "type" : "record",
>   "name" : "Point",
>   "doc:" : "Struct holding x, y, z values",
>   "namespace": "Geometry",
>   "fields" : [
>   {
> "name": "x",
> "type": "double"
>   },
>   {
> "name": "y",
> "type": "double"
>   },
>   {
> "name": "z",
> "type": "double"
>   }
>   ]
> },
> {
>   "type" : "record",
>   "name" : "Polygon",
>   "doc:" : "A collection of points and edges",
>   "namespace": "Geometry",
>   "fields" : [
>   {
> "name": "points",
> "type": ["Point"],
> "doc": "Points within this polygon"
>   }
>   ]
> }
> ]


 Here is my cpp code:

#include 
> #include 
> #include 
> #include "schema.h"
> int main()
> {
> avro::ValidSchema schema;
> std::string error;
>
> std::ifstream inputJson("schema.json");
> try
> {
> avro::compileJsonSchema(inputJson, schema);
> }
> catch(std::exception& err)
> {
> std::cout << "Error while parsing schema: " << err.what() <<
> std::endl;
> exit(1);
> }
> std::cout << "Found valid JSON. Proceeding with encoding/decoding
> test" << std::endl;
> std::stringstream output;
> std::auto_ptr<avro::OutputStream> out =
> avro::ostreamOutputStream(output);
> avro::EncoderPtr e = avro::jsonEncoder(schema);
> e->init(*out);
> Geometry::Point pt;
> pt.x = 1.0;
> pt.y = 2.13;
> pt.z = 3.2;
> avro::encode(*e, pt);
> e->flush();
> std::cout << "Encoded: " << std::endl  << output.str() << std::endl;
> std::auto_ptr<avro::InputStream> in = avro::istreamInputStream(output);
> avro::DecoderPtr d = avro::jsonDecoder(schema);
> d->init(*in);
>
> Geometry::Point c2;
> avro::decode(*d, c2);
> std::cout << '(' << c2.x << ", " << c2.y << ')' << std::endl;
> std::cout << "Done." << std::endl;
> return 0;
> }


 This is how I am generating C++ header from schema:

avrogencpp -i schema.json -o schema.h -n Geometry


Here is the output I get in my console on running the program:

Found valid JSON. Proceeding with encoding/decoding test
> terminate called after throwing an instance of 'avro::Exception'
>   what():  Invalid operation. Expected: Double got Union
> Aborted (core dumped)


I get the same exception if I use a "nested" schema:

[
> {
>   "type" : "record",
>   "name" : "Polygon",
>   "doc:" : "A collection of points and edges",
>   "namespace": "Geometry",
>   "fields" : [
> {
>   "name": "points",
>   "type": ["null", {
> "type" : "record",
> "name" : "Point",
> "doc:" : "Struct holding x, y, z values",
> "namespace": "Geometry",
> "fields" : [
> {
>   "name": "x",
>   "type": "double"
> },
> {
>   "name": "y",
>   "type": "double"
> },
> {
>   "name": "z",
>   "type": "double"
> }
> ]
>   }],
>   "doc": "Points within this polygon"
> }
>   ]
> }
> ]


And yet, the program runs fine if I use a schema with ONLY the Point record
defined, and leaving out the Polygon record:

{
>   "type" : "record",
>   "name" : "Point",
>   "doc:" : "Struct holding x, y, z values",
>   "namespace": "Geometry",
>   "fields" : [
>   {
> "name": "x",
> "type": "double"
>   },
>   {
> "name": "y",
> "type": "double"
>   },
>   {
> "name": "z",
> "type": "double"
>   }
>   ]
> }


With the above schema containing only Point record, I get the correct
output:

Found valid JSON. Proceeding with encoding/decoding test
> Encoded:
> {"x":1,"y":2.13,"z":3.2}
> (1, 2.13)
> Done.


Can someone point out whether the above issue I am facing is because of
something I am doing wrong? If not, I can file a bug report for the same.

Other details:
OS: Linux (Ubuntu 14.04, x86_64)
Boost version that avro was built with: 1.56
g++ version: 4.8.4

Regards,
Balajee.R.C
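
One observation on the first schema above, offered as a sketch rather than a verified fix
for the reported exception: "type": ["Point"] declares a single-branch union of Point, not
an array. A field holding an array of points would normally be written as:

{
  "name": "points",
  "type": {"type": "array", "items": "Geometry.Point"},
  "doc": "Points within this polygon"
}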


How to define byte[] and LocalDateTime in avro schema?

2016-04-13 Thread Ratha v
Hi all;
I'm new to Avro schemas. I am trying to publish/consume my Java objects using
Kafka.

I have Java bean classes which contain fields of type LocalDateTime and
byte[]. How can I define both using Avro schema primitive types? What is the
best primitive type I can use for LocalDateTime?

private LocalDateTime timestamp;
private byte[] content;

I defined something like this:

{"name": "content", "type": "bytes"},

but am getting a class cast exception [1].

[1]
Caused by: java.lang.ClassCastException: [B cannot be cast to
java.nio.ByteBuffer at
org.apache.avro.generic.GenericDatumWriter.writeBytes(GenericDatumWriter.java:219)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:77)
at
org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
at
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
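
A common cause of this particular ClassCastException is handing the generic API a raw
byte[] where it expects java.nio.ByteBuffer for a "bytes" field. Below is a minimal sketch
of one way to populate such a record; the schema literal and class name are invented, and
the timestamp is simply modelled as epoch milliseconds in a long:

import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class BytesFieldExample {
    // Hypothetical schema: "timestamp" as epoch millis, "content" as bytes.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Message\",\"fields\":["
            + "{\"name\":\"timestamp\",\"type\":\"long\"},"
            + "{\"name\":\"content\",\"type\":\"bytes\"}]}");

    public static GenericRecord toRecord(long epochMillis, byte[] content) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("timestamp", epochMillis);
        // The generic writer expects java.nio.ByteBuffer for "bytes" fields,
        // so wrap the raw byte[] rather than passing it directly.
        record.put("content", ByteBuffer.wrap(content));
        return record;
    }
}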


--
-Ratha
http://vvratha.blogspot.com/


RE: Want to Add New Column in Avro Schema

2016-03-23 Thread Lunagariya, Dhaval
Thanks guys.

Just updated the .avsc and it’s done. No need to recreate the table.

Regards,
Dhaval
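
For reference, a compatible addition to such an .avsc typically appends a field with a
default to the record's field list, e.g. (a sketch; the field name is invented):

{"name": "new_col", "type": ["null", "string"], "default": null}

Readers of older data then see the default value for the new column.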

From: Maulik Gandhi [mailto:mmg...@gmail.com]
Sent: Wednesday, March 23, 2016 7:05 PM
To: user
Cc: er.dcpa...@gmail.com
Subject: Re: Want to Add New Column in Avro Schema

Create table DDL looks right to me.

How are you updating avro.schema.url ?

Thanks.
- Maulik

On Wed, Mar 23, 2016 at 8:29 AM, Lunagariya, Dhaval 
<dhaval.lunagar...@citi.com<mailto:dhaval.lunagar...@citi.com>> wrote:
Here is the DDL.

DROP TABLE IF EXISTS TEST;

CREATE EXTERNAL TABLE TEST
PARTITIONED BY (
COL1 STRING,
COL2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///data/hive/TEST'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/Test.avsc');

Thanks,
Dhaval

From: Aaron.Dossett 
[mailto:aaron.doss...@target.com<mailto:aaron.doss...@target.com>]
Sent: Wednesday, March 23, 2016 6:50 PM
To: user@avro.apache.org<mailto:user@avro.apache.org>
Cc: 'er.dcpa...@gmail.com<mailto:er.dcpa...@gmail.com>'
Subject: Re: Want to Add New Column in Avro Schema

You shouldn’t have to drop the table, just update the .avsc.  Can you share the 
DDL you use to create the table?

From: "Lunagariya, Dhaval" 
<dhaval.lunagar...@citi.com<mailto:dhaval.lunagar...@citi.com>>
Reply-To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Date: Wednesday, March 23, 2016 at 8:17 AM
To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Cc: "'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'" 
<er.dcpa...@gmail.com<mailto:er.dcpa...@gmail.com>>
Subject: RE: Want to Add New Column in Avro Schema

Yes. I made require changes in .avsc file and I drop the table and re-created 
using updated .avsc. But I am not getting existing data in that case.

Where am I wrong? Can you through some light

Thanks,
Dhaval

From: Aaron.Dossett [mailto:aaron.doss...@target.com]
Sent: Wednesday, March 23, 2016 6:36 PM
To: user@avro.apache.org<mailto:user@avro.apache.org>
Cc: 'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'
Subject: Re: Want to Add New Column in Avro Schema

If you create the external table by reference to the .avsc file (TBLPROPERTIES 
( 'avro.schema.url’=‘hdfs://foo.avsc')) the all you have to do is update that 
avsc file in a compatible way and Hive should reflect the new schema.  I’ve 
implemented this pattern in my production system for several months now.

-Aaron

From: "Lunagariya, Dhaval" 
<dhaval.lunagar...@citi.com<mailto:dhaval.lunagar...@citi.com>>
Reply-To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Date: Wednesday, March 23, 2016 at 6:32 AM
To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Cc: "'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'" 
<er.dcpa...@gmail.com<mailto:er.dcpa...@gmail.com>>
Subject: Want to Add New Column in Avro Schema

Hey folks,

I want to add new column in existing Hive Table. We created external hive table 
with the help of .avsc. Now I want to add new column in that table.

How can I do that without disturbing any data present in table?

Please Help.

Regards,
Dhaval




Re: Want to Add New Column in Avro Schema

2016-03-23 Thread Maulik Gandhi
Create table DDL looks right to me.

How are you updating *avro.schema.url* ?

Thanks.
- Maulik

On Wed, Mar 23, 2016 at 8:29 AM, Lunagariya, Dhaval <
dhaval.lunagar...@citi.com> wrote:

> Here is the DDL.
>
>
>
> DROP TABLE IF EXISTS TEST;
>
>
>
> CREATE EXTERNAL TABLE TEST
>
> PARTITIONED BY (
>
> COL1 STRING,
>
> COL2 STRING
>
> )
>
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>
> STORED AS
>
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>
> LOCATION 'hdfs:///data/hive/TEST'
>
> TBLPROPERTIES ('avro.schema.url'='hdfs:///user/Test.avsc');
>
>
>
> Thanks,
>
> Dhaval
>
>
>
> *From:* Aaron.Dossett [mailto:aaron.doss...@target.com]
> *Sent:* Wednesday, March 23, 2016 6:50 PM
> *To:* user@avro.apache.org
> *Cc:* 'er.dcpa...@gmail.com'
> *Subject:* Re: Want to Add New Column in Avro Schema
>
>
>
> You shouldn’t have to drop the table, just update the .avsc.  Can you
> share the DDL you use to create the table?
>
>
>
> *From: *"Lunagariya, Dhaval" <dhaval.lunagar...@citi.com>
> *Reply-To: *"user@avro.apache.org" <user@avro.apache.org>
> *Date: *Wednesday, March 23, 2016 at 8:17 AM
> *To: *"user@avro.apache.org" <user@avro.apache.org>
> *Cc: *"'er.dcpa...@gmail.com'" <er.dcpa...@gmail.com>
> *Subject: *RE: Want to Add New Column in Avro Schema
>
>
>
> Yes. I made require changes in .avsc file and I drop the table and
> re-created using updated .avsc. But I am not getting existing data in that
> case.
>
>
>
> Where am I wrong? Can you through some light
>
>
>
> Thanks,
>
> Dhaval
>
>
>
> *From:* Aaron.Dossett [mailto:aaron.doss...@target.com
> <aaron.doss...@target.com>]
> *Sent:* Wednesday, March 23, 2016 6:36 PM
> *To:* user@avro.apache.org
> *Cc:* 'er.dcpa...@gmail.com'
> *Subject:* Re: Want to Add New Column in Avro Schema
>
>
>
> If you create the external table by reference to the .avsc file
> (TBLPROPERTIES ( 'avro.schema.url’=‘hdfs://foo.avsc')) the all you have to
> do is update that avsc file in a compatible way and Hive should reflect the
> new schema.  I’ve implemented this pattern in my production system for
> several months now.
>
>
>
> -Aaron
>
>
>
> *From: *"Lunagariya, Dhaval" <dhaval.lunagar...@citi.com>
> *Reply-To: *"user@avro.apache.org" <user@avro.apache.org>
> *Date: *Wednesday, March 23, 2016 at 6:32 AM
> *To: *"user@avro.apache.org" <user@avro.apache.org>
> *Cc: *"'er.dcpa...@gmail.com'" <er.dcpa...@gmail.com>
> *Subject: *Want to Add New Column in Avro Schema
>
>
>
> Hey folks,
>
>
>
> I want to add new column in existing Hive Table. We created external hive
> table with the help of .avsc. Now I want to add new column in that table.
>
>
>
> How can I do that without disturbing any data present in table?
>
>
>
> Please Help.
>
>
>
> Regards,
>
> Dhaval
>
>
>


Re: Want to Add New Column in Avro Schema

2016-03-23 Thread Maulik Gandhi
You can try describe tableName; and see if the new added column appears in
Hive table.

Thanks.
- Maulik


On Wed, Mar 23, 2016 at 8:38 AM, Aaron.Dossett <aaron.doss...@target.com>
wrote:

> And what happens if you simply update the .avsc file on HDFS?  Does
> ‘describe test’ show the new columns?
>
> From: "Lunagariya, Dhaval" <dhaval.lunagar...@citi.com>
> Reply-To: "user@avro.apache.org" <user@avro.apache.org>
> Date: Wednesday, March 23, 2016 at 8:29 AM
>
> To: "user@avro.apache.org" <user@avro.apache.org>
> Cc: "'er.dcpa...@gmail.com'" <er.dcpa...@gmail.com>
> Subject: RE: Want to Add New Column in Avro Schema
>
> Here is the DDL.
>
>
>
> DROP TABLE IF EXISTS TEST;
>
>
>
> CREATE EXTERNAL TABLE TEST
>
> PARTITIONED BY (
>
> COL1 STRING,
>
> COL2 STRING
>
> )
>
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>
> STORED AS
>
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>
> LOCATION 'hdfs:///data/hive/TEST'
>
> TBLPROPERTIES ('avro.schema.url'='hdfs:///user/Test.avsc');
>
>
>
> Thanks,
>
> Dhaval
>
>
>
> *From:* Aaron.Dossett [mailto:aaron.doss...@target.com
> <aaron.doss...@target.com>]
> *Sent:* Wednesday, March 23, 2016 6:50 PM
> *To:* user@avro.apache.org
> *Cc:* 'er.dcpa...@gmail.com'
> *Subject:* Re: Want to Add New Column in Avro Schema
>
>
>
> You shouldn’t have to drop the table, just update the .avsc.  Can you
> share the DDL you use to create the table?
>
>
>
> *From: *"Lunagariya, Dhaval" <dhaval.lunagar...@citi.com>
> *Reply-To: *"user@avro.apache.org" <user@avro.apache.org>
> *Date: *Wednesday, March 23, 2016 at 8:17 AM
> *To: *"user@avro.apache.org" <user@avro.apache.org>
> *Cc: *"'er.dcpa...@gmail.com'" <er.dcpa...@gmail.com>
> *Subject: *RE: Want to Add New Column in Avro Schema
>
>
>
> Yes. I made require changes in .avsc file and I drop the table and
> re-created using updated .avsc. But I am not getting existing data in that
> case.
>
>
>
> Where am I wrong? Can you through some light
>
>
>
> Thanks,
>
> Dhaval
>
>
>
> *From:* Aaron.Dossett [mailto:aaron.doss...@target.com
> <aaron.doss...@target.com>]
> *Sent:* Wednesday, March 23, 2016 6:36 PM
> *To:* user@avro.apache.org
> *Cc:* 'er.dcpa...@gmail.com'
> *Subject:* Re: Want to Add New Column in Avro Schema
>
>
>
> If you create the external table by reference to the .avsc file
> (TBLPROPERTIES ( 'avro.schema.url’=‘hdfs://foo.avsc')) the all you have to
> do is update that avsc file in a compatible way and Hive should reflect the
> new schema.  I’ve implemented this pattern in my production system for
> several months now.
>
>
>
> -Aaron
>
>
>
> *From: *"Lunagariya, Dhaval" <dhaval.lunagar...@citi.com>
> *Reply-To: *"user@avro.apache.org" <user@avro.apache.org>
> *Date: *Wednesday, March 23, 2016 at 6:32 AM
> *To: *"user@avro.apache.org" <user@avro.apache.org>
> *Cc: *"'er.dcpa...@gmail.com'" <er.dcpa...@gmail.com>
> *Subject: *Want to Add New Column in Avro Schema
>
>
>
> Hey folks,
>
>
>
> I want to add new column in existing Hive Table. We created external hive
> table with the help of .avsc. Now I want to add new column in that table.
>
>
>
> How can I do that without disturbing any data present in table?
>
>
>
> Please Help.
>
>
>
> Regards,
>
> Dhaval
>
>
>


Re: Want to Add New Column in Avro Schema

2016-03-23 Thread Aaron . Dossett
And what happens if you simply update the .avsc file on HDFS?  Does 'describe 
test' show the new columns?

From: "Lunagariya, Dhaval" 
<dhaval.lunagar...@citi.com<mailto:dhaval.lunagar...@citi.com>>
Reply-To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Date: Wednesday, March 23, 2016 at 8:29 AM
To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Cc: "'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'" 
<er.dcpa...@gmail.com<mailto:er.dcpa...@gmail.com>>
Subject: RE: Want to Add New Column in Avro Schema

Here is the DDL.

DROP TABLE IF EXISTS TEST;

CREATE EXTERNAL TABLE TEST
PARTITIONED BY (
COL1 STRING,
COL2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///data/hive/TEST'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/Test.avsc');

Thanks,
Dhaval

From: Aaron.Dossett [mailto:aaron.doss...@target.com]
Sent: Wednesday, March 23, 2016 6:50 PM
To: user@avro.apache.org<mailto:user@avro.apache.org>
Cc: 'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'
Subject: Re: Want to Add New Column in Avro Schema

You shouldn't have to drop the table, just update the .avsc.  Can you share the 
DDL you use to create the table?

From: "Lunagariya, Dhaval" 
<dhaval.lunagar...@citi.com<mailto:dhaval.lunagar...@citi.com>>
Reply-To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Date: Wednesday, March 23, 2016 at 8:17 AM
To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Cc: "'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'" 
<er.dcpa...@gmail.com<mailto:er.dcpa...@gmail.com>>
Subject: RE: Want to Add New Column in Avro Schema

Yes. I made require changes in .avsc file and I drop the table and re-created 
using updated .avsc. But I am not getting existing data in that case.

Where am I wrong? Can you through some light

Thanks,
Dhaval

From: Aaron.Dossett [mailto:aaron.doss...@target.com]
Sent: Wednesday, March 23, 2016 6:36 PM
To: user@avro.apache.org<mailto:user@avro.apache.org>
Cc: 'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'
Subject: Re: Want to Add New Column in Avro Schema

If you create the external table by reference to the .avsc file (TBLPROPERTIES 
( 'avro.schema.url'='hdfs://foo.avsc')) the all you have to do is update that 
avsc file in a compatible way and Hive should reflect the new schema.  I've 
implemented this pattern in my production system for several months now.

-Aaron

From: "Lunagariya, Dhaval" 
<dhaval.lunagar...@citi.com<mailto:dhaval.lunagar...@citi.com>>
Reply-To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Date: Wednesday, March 23, 2016 at 6:32 AM
To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Cc: "'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'" 
<er.dcpa...@gmail.com<mailto:er.dcpa...@gmail.com>>
Subject: Want to Add New Column in Avro Schema

Hey folks,

I want to add new column in existing Hive Table. We created external hive table 
with the help of .avsc. Now I want to add new column in that table.

How can I do that without disturbing any data present in table?

Please Help.

Regards,
Dhaval



RE: Want to Add New Column in Avro Schema

2016-03-23 Thread Lunagariya, Dhaval
Here is the DDL.

DROP TABLE IF EXISTS TEST;

CREATE EXTERNAL TABLE TEST
PARTITIONED BY (
COL1 STRING,
COL2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///data/hive/TEST'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/Test.avsc');

Thanks,
Dhaval

From: Aaron.Dossett [mailto:aaron.doss...@target.com]
Sent: Wednesday, March 23, 2016 6:50 PM
To: user@avro.apache.org
Cc: 'er.dcpa...@gmail.com'
Subject: Re: Want to Add New Column in Avro Schema

You shouldn't have to drop the table, just update the .avsc.  Can you share the 
DDL you use to create the table?

From: "Lunagariya, Dhaval" 
<dhaval.lunagar...@citi.com<mailto:dhaval.lunagar...@citi.com>>
Reply-To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Date: Wednesday, March 23, 2016 at 8:17 AM
To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Cc: "'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'" 
<er.dcpa...@gmail.com<mailto:er.dcpa...@gmail.com>>
Subject: RE: Want to Add New Column in Avro Schema

Yes. I made require changes in .avsc file and I drop the table and re-created 
using updated .avsc. But I am not getting existing data in that case.

Where am I wrong? Can you through some light

Thanks,
Dhaval

From: Aaron.Dossett [mailto:aaron.doss...@target.com]
Sent: Wednesday, March 23, 2016 6:36 PM
To: user@avro.apache.org<mailto:user@avro.apache.org>
Cc: 'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'
Subject: Re: Want to Add New Column in Avro Schema

If you create the external table by reference to the .avsc file (TBLPROPERTIES 
( 'avro.schema.url'='hdfs://foo.avsc')) the all you have to do is update that 
avsc file in a compatible way and Hive should reflect the new schema.  I've 
implemented this pattern in my production system for several months now.

-Aaron

From: "Lunagariya, Dhaval" 
<dhaval.lunagar...@citi.com<mailto:dhaval.lunagar...@citi.com>>
Reply-To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Date: Wednesday, March 23, 2016 at 6:32 AM
To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Cc: "'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'" 
<er.dcpa...@gmail.com<mailto:er.dcpa...@gmail.com>>
Subject: Want to Add New Column in Avro Schema

Hey folks,

I want to add new column in existing Hive Table. We created external hive table 
with the help of .avsc. Now I want to add new column in that table.

How can I do that without disturbing any data present in table?

Please Help.

Regards,
Dhaval



RE: Want to Add New Column in Avro Schema

2016-03-23 Thread Lunagariya, Dhaval
Yes. I made the required changes in the .avsc file, dropped the table, and re-created 
it using the updated .avsc. But I am not getting the existing data in that case.

Where am I wrong? Can you throw some light on this?

Thanks,
Dhaval

From: Aaron.Dossett [mailto:aaron.doss...@target.com]
Sent: Wednesday, March 23, 2016 6:36 PM
To: user@avro.apache.org
Cc: 'er.dcpa...@gmail.com'
Subject: Re: Want to Add New Column in Avro Schema

If you create the external table by reference to the .avsc file (TBLPROPERTIES 
( 'avro.schema.url'='hdfs://foo.avsc')) the all you have to do is update that 
avsc file in a compatible way and Hive should reflect the new schema.  I've 
implemented this pattern in my production system for several months now.

-Aaron

From: "Lunagariya, Dhaval" 
<dhaval.lunagar...@citi.com<mailto:dhaval.lunagar...@citi.com>>
Reply-To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Date: Wednesday, March 23, 2016 at 6:32 AM
To: "user@avro.apache.org<mailto:user@avro.apache.org>" 
<user@avro.apache.org<mailto:user@avro.apache.org>>
Cc: "'er.dcpa...@gmail.com<mailto:'er.dcpa...@gmail.com>'" 
<er.dcpa...@gmail.com<mailto:er.dcpa...@gmail.com>>
Subject: Want to Add New Column in Avro Schema

Hey folks,

I want to add new column in existing Hive Table. We created external hive table 
with the help of .avsc. Now I want to add new column in that table.

How can I do that without disturbing any data present in table?

Please Help.

Regards,
Dhaval



Want to Add New Column in Avro Schema

2016-03-23 Thread Lunagariya, Dhaval
Hey folks,

I want to add a new column to an existing Hive table. We created the external Hive table 
with the help of an .avsc file. Now I want to add a new column to that table.

How can I do that without disturbing any data present in the table?

Please Help.

Regards,
Dhaval



Re: Avro schema doesn't honor backward compatibilty

2016-02-02 Thread Raghvendra Singh
Hi Ryan

Thanks for your answer. Here is what I am doing in my environment:

1. Write the data using the old schema

*SpecificDatumWriter datumWriter = new
SpecificDatumWriter<>(SCHEMA_V1)*

2. Now trying to read the data written by the old schema using the new
schema

*DatumReader payloadReader = new SpecificDatumReader<>(*
*SCHEMA_V2**)*

In this case *SCHEMA_V1 *is the old schema which doesn't have the field
while SCHEMA_V2 is the new one which has the extra field.

Your suggestion *"You should run setSchema on your SpecificDatumReader to
set the schema the data was written with"* is kind of a workaround where I
have to read the data with the schema it was written with, and hence this is
not exactly backward compatible. Note that if I do this then I have to
maintain all the schemas while reading and somehow know which version the
data was written with, which will make schema evolution pretty
painful.

Please let me know if I didn't understand your email correctly or there is
something I missed.

-raghu

On Tue, Feb 2, 2016 at 9:19 AM, Ryan Blue <b...@cloudera.com> wrote:

> Hi Raghvendra,
>
> It looks like the problem is that you're using the new schema in place of
> the schema that the data was written with.  You should run setSchema on
> your SpecificDatumReader to set the schema the data was written with.
>
> What's happening is that the schema you're using, the new one, has the new
> field so Avro assumes it is present and tries to read it. By setting the
> schema that the data was actually written with, the datum reader will know
> that it isn't present and will use your default instead. When you read data
> encoded with the new schema, you need to use it as the written schema
> instead so the datum reader knows that the field should be read.
>
> Does that make sense?
>
> rb
>
> On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
>
>> down votefavorite
>> <
>> http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty#
>> >
>>
>>
>> I have this avro schema
>>
>> {
>>   "namespace": "xx..x.x",
>>   "type": "record",
>>   "name": "MyPayLoad",
>>   "fields": [
>>   {"name": "filed1",  "type": "string"},
>>   {"name": "filed2", "type": "long"},
>>   {"name": "filed3",  "type": "boolean"},
>>   {
>>"name" : "metrics",
>>"type":
>>{
>>   "type" : "array",
>>   "items":
>>   {
>>   "name": "MyRecord",
>>   "type": "record",
>>   "fields" :
>>   [
>> {"name": "min", "type": "long"},
>> {"name": "max", "type": "long"},
>> {"name": "sum", "type": "long"},
>> {"name": "count", "type": "long"}
>>   ]
>>   }
>>}
>>   }
>>]}
>>
>> Here is the code which we use to parse the data
>>
>> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>>  DatumReader payloadReader = new
>> SpecificDatumReader<>(MyPayLoad.class);
>>  Decoder decoder = DecoderFactory.get().binaryDecoder(payload,
>> null);
>>  MyPayLoad myPayLoad = null;
>>  try {
>>  myPayLoad = payloadReader.read(null, decoder);
>>  } catch (IOException e) {
>>  logger.log(Level.SEVERE, e.getMessage(), e);
>>  }
>>
>>  return myPayLoad;
>>  }
>>
>> Now i want to add one more field int the schema so the schema looks like
>> below
>>
>>   {
>>   "namespace": "xx..x.x",
>>   "type": "record",
>>   "name": "MyPayLoad",
>>   "fields": [
>>   {"name": "filed1",  "type": "string"},
>>   {"name": "filed2", "type": "long"},
>>   {"name": "filed3",  "type": "bool

Re: Avro schema doesn't honor backward compatibilty

2016-02-02 Thread Raghvendra Singh
Great, Thank you very much guys, this works. Very much appreciated.

On Tue, Feb 2, 2016 at 12:46 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Raghvendra,
>
> You need to use
>
> *DatumReader payloadReader = new
> SpecificDatumReader<>(SCHEMA_V1, **SCHEMA_V2**)*
>
> So you provide both writer(SCHEMA_V1) and reader(SCHMEA_V2) to avro. In
> your current case avro is assuming both to be the same which is certainly
> not the case and hence it is failing. I think this is what Ryan was
> referring to as well.
>
> Hope that helps.
>
>
>
> On Tue, Feb 2, 2016 at 1:44 PM, Raghvendra Singh <rsi...@appdynamics.com>
> wrote:
>
>> Hi Ryan
>>
>> Thanks for your answer. Here is what i am doing in my environment
>>
>> 1. Write the data using the old schema
>>
>> *SpecificDatumWriter datumWriter = new
>> SpecificDatumWriter<>(SCHEMA_V1)*
>>
>> 2. Now trying to read the data written by the old schema using the new
>> schema
>>
>> *DatumReader payloadReader = new
>> SpecificDatumReader<>(**SCHEMA_V2**)*
>>
>> In this case *SCHEMA_V1 *is the old schema which doesn't have the field
>> while SCHEMA_V2 is the new one which has the extra field.
>>
>> Your suggestion *"You should run setSchema on your SpecificDatumReader
>> to set the schema the data was written with"*  is kind of work around
>> where i have to read the data with the schema it was written with and hence
>> this is not exactly backward compatible. Note that if i do this then i have
>> to maintain all the schemas while reading and somehow know which version
>> the data was written with and hence this will make schema evolution pretty
>> painful.
>>
>> Please let me know if i didn't understand your email correctly or their
>> is something i missed.
>>
>> -raghu
>>
>> On Tue, Feb 2, 2016 at 9:19 AM, Ryan Blue <b...@cloudera.com> wrote:
>>
>>> Hi Raghvendra,
>>>
>>> It looks like the problem is that you're using the new schema in place
>>> of the schema that the data was written with.  You should run setSchema on
>>> your SpecificDatumReader to set the schema the data was written with.
>>>
>>> What's happening is that the schema you're using, the new one, has the
>>> new field so Avro assumes it is present and tries to read it. By setting
>>> the schema that the data was actually written with, the datum reader will
>>> know that it isn't present and will use your default instead. When you read
>>> data encoded with the new schema, you need to use it as the written schema
>>> instead so the datum reader knows that the field should be read.
>>>
>>> Does that make sense?
>>>
>>> rb
>>>
>>> On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
>>>
>>>> down votefavorite
>>>> <
>>>> http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty#
>>>> >
>>>>
>>>>
>>>> I have this avro schema
>>>>
>>>> {
>>>>   "namespace": "xx..x.x",
>>>>   "type": "record",
>>>>   "name": "MyPayLoad",
>>>>   "fields": [
>>>>   {"name": "filed1",  "type": "string"},
>>>>   {"name": "filed2", "type": "long"},
>>>>   {"name": "filed3",  "type": "boolean"},
>>>>   {
>>>>"name" : "metrics",
>>>>"type":
>>>>{
>>>>   "type" : "array",
>>>>   "items":
>>>>   {
>>>>   "name": "MyRecord",
>>>>   "type": "record",
>>>>   "fields" :
>>>>   [
>>>> {"name": "min", "type": "long"},
>>>> {"name": "max", "type": "long"},
>>>> {"name": "sum", "type": "long"},
>>>> {"name": "count", "type": "long"}
>>>&g

Re: Avro schema doesn't honor backward compatibilty

2016-02-02 Thread Ryan Blue

Hi Raghvendra,

It looks like the problem is that you're using the new schema in place 
of the schema that the data was written with.  You should run setSchema 
on your SpecificDatumReader to set the schema the data was written with.


What's happening is that the schema you're using, the new one, has the 
new field so Avro assumes it is present and tries to read it. By setting 
the schema that the data was actually written with, the datum reader 
will know that it isn't present and will use your default instead. When 
you read data encoded with the new schema, you need to use it as the 
written schema instead so the datum reader knows that the field should 
be read.


Does that make sense?

rb
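
Below is a minimal sketch of reading with both the writer and the reader schema, as
discussed in this thread; everything outside the Avro API (the class name, MyPayLoad) is
taken from the thread or illustrative only:

import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

public class PayloadParser {
    public static MyPayLoad parse(byte[] payload, Schema writerSchema, Schema readerSchema)
            throws IOException {
        // Pass both the schema the data was written with and the schema we want to
        // read it as; Avro then resolves the missing field to its declared default.
        DatumReader<MyPayLoad> reader = new SpecificDatumReader<>(writerSchema, readerSchema);
        Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
        return reader.read(null, decoder);
    }
}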

On 02/01/2016 12:31 PM, Raghvendra Singh wrote:

down votefavorite
<http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty#>

I have this avro schema

{
  "namespace": "xx..x.x",
  "type": "record",
  "name": "MyPayLoad",
  "fields": [
  {"name": "filed1",  "type": "string"},
  {"name": "filed2", "type": "long"},
  {"name": "filed3",  "type": "boolean"},
  {
   "name" : "metrics",
   "type":
   {
  "type" : "array",
  "items":
  {
  "name": "MyRecord",
  "type": "record",
  "fields" :
  [
{"name": "min", "type": "long"},
{"name": "max", "type": "long"},
{"name": "sum", "type": "long"},
{"name": "count", "type": "long"}
  ]
  }
   }
  }
   ]}

Here is the code which we use to parse the data

public static final MyPayLoad parseBinaryPayload(byte[] payload) {
 DatumReader payloadReader = new
SpecificDatumReader<>(MyPayLoad.class);
 Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
 MyPayLoad myPayLoad = null;
 try {
 myPayLoad = payloadReader.read(null, decoder);
 } catch (IOException e) {
 logger.log(Level.SEVERE, e.getMessage(), e);
 }

 return myPayLoad;
 }

Now i want to add one more field int the schema so the schema looks like
below

  {
  "namespace": "xx..x.x",
  "type": "record",
  "name": "MyPayLoad",
  "fields": [
  {"name": "filed1",  "type": "string"},
  {"name": "filed2", "type": "long"},
  {"name": "filed3",  "type": "boolean"},
  {
   "name" : "metrics",
   "type":
   {
  "type" : "array",
  "items":
  {
  "name": "MyRecord",
  "type": "record",
  "fields" :
  [
{"name": "min", "type": "long"},
{"name": "max", "type": "long"},
{"name": "sum", "type": "long"},
{"name": "count", "type": "long"}
  ]
  }
   }
  }
  {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
   ]}

Note the filed added and also the default is defined. The problem is that
if we receive the data which was written using the older schema i get this
error

java.io.EOFException: null
 at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
~[avro-1.7.4.jar:1.7.4]
 at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
~[avro-1.7.4.jar:1.7.4]
 at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
~[avro-1.7.4.jar:1.7.4]
 at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
~[avro-1.7.4.jar:1.7.4]
 at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
~[avro-1.7.4.jar:1.7.4]
 at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
~[avro-1.7.4.jar:1.7.4]
 at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
~[avro-1.7.4.jar:1.7.4]
 at 
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
~[avro-1.7.4.jar:1.7.4]
 at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
~[avro-1.7.4.jar:1.7.4]
 at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
~[avro-1.7.4.jar:1.7.4]
 at 
com.appdynamics.blitz.shared.util.X.parseBinaryPayload(BlitzAvroSharedUtil.java:38)
~[blitz-shared.jar:na]

What i understood from this
<https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html>
document
that this should have been backward compatible but somehow that doesn't
seem to be the case. Any idea what i am doing wrong?




--
Ryan Blue
Software Engineer
Cloudera, Inc.


Avro schema doesn't honor backward compatibilty

2016-02-01 Thread Raghvendra Singh
<http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty>

I have this avro schema

{
 "namespace": "xx..x.x",
 "type": "record",
 "name": "MyPayLoad",
 "fields": [
 {"name": "filed1",  "type": "string"},
 {"name": "filed2", "type": "long"},
 {"name": "filed3",  "type": "boolean"},
 {
  "name" : "metrics",
  "type":
  {
 "type" : "array",
 "items":
 {
 "name": "MyRecord",
 "type": "record",
 "fields" :
 [
   {"name": "min", "type": "long"},
   {"name": "max", "type": "long"},
   {"name": "sum", "type": "long"},
   {"name": "count", "type": "long"}
 ]
 }
  }
 }
  ]}

Here is the code which we use to parse the data

public static final MyPayLoad parseBinaryPayload(byte[] payload) {
DatumReader payloadReader = new
SpecificDatumReader<>(MyPayLoad.class);
Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
MyPayLoad myPayLoad = null;
try {
myPayLoad = payloadReader.read(null, decoder);
} catch (IOException e) {
logger.log(Level.SEVERE, e.getMessage(), e);
}

return myPayLoad;
}

Now I want to add one more field in the schema, so the schema looks like
below:

 {
 "namespace": "xx..x.x",
 "type": "record",
 "name": "MyPayLoad",
 "fields": [
 {"name": "filed1",  "type": "string"},
 {"name": "filed2", "type": "long"},
 {"name": "filed3",  "type": "boolean"},
 {
  "name" : "metrics",
  "type":
  {
 "type" : "array",
 "items":
 {
 "name": "MyRecord",
 "type": "record",
 "fields" :
 [
   {"name": "min", "type": "long"},
   {"name": "max", "type": "long"},
   {"name": "sum", "type": "long"},
   {"name": "count", "type": "long"}
 ]
 }
  }
 }
 {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
  ]}

Note the field added and also that the default is defined. The problem is that
if we receive data which was written using the older schema, I get this
error:

java.io.EOFException: null
at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
~[avro-1.7.4.jar:1.7.4]
at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
~[avro-1.7.4.jar:1.7.4]
at 
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
~[avro-1.7.4.jar:1.7.4]
at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
~[avro-1.7.4.jar:1.7.4]
at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
~[avro-1.7.4.jar:1.7.4]
at 
com.appdynamics.blitz.shared.util.X.parseBinaryPayload(BlitzAvroSharedUtil.java:38)
~[blitz-shared.jar:na]

What I understood from this
<https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html>
document is that this should have been backward compatible, but somehow that
doesn't seem to be the case. Any idea what I am doing wrong?


Re: Avro schema doesn't honor backward compatibility

2016-02-01 Thread Raghvendra Singh
Thanks Prajwal

I tried what you suggested but I still get the same error.
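
A likely missing piece here, beyond the union default: with the raw binary
encoding there is no schema inside the payload, so the reader has to be given
the exact schema the bytes were written with and let Avro resolve it against
the new one. A minimal sketch, assuming the old schema is still available (for
example parsed from the old .avsc file) and that MyPayLoad is the class
generated from the new schema:

    import org.apache.avro.Schema;
    import org.apache.avro.io.DatumReader;
    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.specific.SpecificDatumReader;

    public class ResolvingReadSketch {
        // oldWriterSchema must be the schema the payload was actually written with.
        public static MyPayLoad parseOldBinaryPayload(byte[] payload, Schema oldWriterSchema)
                throws java.io.IOException {
            DatumReader<MyPayLoad> reader =
                    new SpecificDatumReader<>(oldWriterSchema, MyPayLoad.SCHEMA$);
            Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
            return reader.read(null, decoder);
        }
    }

Constructing the SpecificDatumReader with only the generated class (as in the
original parseBinaryPayload) makes Avro assume the data was written with the
new schema as well, which is what produces the EOFException on old payloads.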



On Mon, Feb 1, 2016 at 2:05 PM, Prajwal Tuladhar <p...@infynyxx.com> wrote:

> Hi,
>
> I think your usage of default for field "agentType" is invalid here.
>
> When generating code from invalid schema, it tends to fail:
>
> [INFO]
>> [INFO] --- avro-maven-plugin:1.7.6-cdh5.4.4:schema (default) @ test-app
>> ---
>> [WARNING] Avro: Invalid default for field agentType: "APP_AGENT" not a
>> ["null","string"]
>
>
> Try:
>
> {
>>  "namespace": "xx..x.x",
>>  "type": "record",
>>  "name": "MyPayLoad",
>>  "fields": [
>>  {"name": "filed1",  "type": "string"},
>>  {"name": "filed2", "type": "long"},
>>  {"name": "filed3",  "type": "boolean"},
>>  {
>>   "name" : "metrics",
>>   "type":
>>   {
>>  "type" : "array",
>>  "items":
>>  {
>>  "name": "MyRecord",
>>  "type": "record",
>>  "fields" :
>>  [
>>{"name": "min", "type": "long"},
>>        {"name": "max", "type": "long"},
>>{"name": "sum", "type": "long"},
>>{"name": "count", "type": "long"}
>>  ]
>>  }
>>   }
>>  },
>>  {"name": "agentType",  "type": ["null", "string"], "default": null}
>>   ]
>> }
>
>
>
>
>
> On Mon, Feb 1, 2016 at 8:31 PM, Raghvendra Singh <rsi...@appdynamics.com>
> wrote:
>
>>
>>
>>
>> I have this avro schema
>>
>> {
>>  "namespace": "xx..x.x",
>>  "type": "record",
>>  "name": "MyPayLoad",
>>  "fields": [
>>  {"name": "filed1",  "type": "string"},
>>  {"name": "filed2", "type": "long"},
>>  {"name": "filed3",  "type": "boolean"},
>>  {
>>   "name" : "metrics",
>>   "type":
>>   {
>>  "type" : "array",
>>  "items":
>>  {
>>  "name": "MyRecord",
>>  "type": "record",
>>  "fields" :
>>  [
>>{"name": "min", "type": "long"},
>>{"name": "max", "type": "long"},
>>{"name": "sum", "type": "long"},
>>{"name": "count", "type": "long"}
>>  ]
>>  }
>>   }
>>  }
>>   ]}
>>
>> Here is the code which we use to parse the data
>>
>> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>> DatumReader payloadReader = new 
>> SpecificDatumReader<>(MyPayLoad.class);
>> Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>> MyPayLoad myPayLoad = null;
>> try {
>> myPayLoad = payloadReader.read(null, decoder);
>> } catch (IOException e) {
>> logger.log(Level.SEVERE, e.getMessage(), e);
>> }
>>
>> return myPayLoad;
>> }
>>
>> Now i want to add one more field int the schema so the schema looks like
>> below
>>
>>  {
>>  "namespace": "xx..x.x",
>>  "type": "record",
>>  "name"

Re: Avro schema doesn't honor backward compatibility

2016-02-01 Thread Prajwal Tuladhar
Hi,

I think your usage of default for field "agentType" is invalid here.

When generating code from invalid schema, it tends to fail:

[INFO]
> [INFO] --- avro-maven-plugin:1.7.6-cdh5.4.4:schema (default) @ test-app ---
> [WARNING] Avro: Invalid default for field agentType: "APP_AGENT" not a
> ["null","string"]


Try:

{
>  "namespace": "xx..x.x",
>  "type": "record",
>  "name": "MyPayLoad",
>  "fields": [
>  {"name": "filed1",  "type": "string"},
>  {"name": "filed2", "type": "long"},
>  {"name": "filed3",  "type": "boolean"},
>  {
>   "name" : "metrics",
>   "type":
>   {
>  "type" : "array",
>  "items":
>  {
>  "name": "MyRecord",
>  "type": "record",
>  "fields" :
>  [
>{"name": "min", "type": "long"},
>{"name": "max", "type": "long"},
>        {"name": "sum", "type": "long"},
>{"name": "count", "type": "long"}
>  ]
>  }
>   }
>  },
>  {"name": "agentType",  "type": ["null", "string"], "default": null}
>   ]
> }
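
One more detail worth noting: per the Avro spec, the default value of a union
field must match the first branch of the union, so with ["null", "string"] the
only valid default is null, as in the corrected schema above. If "APP_AGENT" is
really meant to be the default, the union order can be flipped instead, e.g.
{"name": "agentType", "type": ["string", "null"], "default": "APP_AGENT"}.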





On Mon, Feb 1, 2016 at 8:31 PM, Raghvendra Singh <rsi...@appdynamics.com>
wrote:

>
>
>
> I have this avro schema
>
> {
>  "namespace": "xx..x.x",
>  "type": "record",
>  "name": "MyPayLoad",
>  "fields": [
>  {"name": "filed1",  "type": "string"},
>  {"name": "filed2", "type": "long"},
>  {"name": "filed3",  "type": "boolean"},
>  {
>   "name" : "metrics",
>   "type":
>   {
>  "type" : "array",
>  "items":
>  {
>  "name": "MyRecord",
>  "type": "record",
>  "fields" :
>  [
>{"name": "min", "type": "long"},
>{"name": "max", "type": "long"},
>{"name": "sum", "type": "long"},
>{"name": "count", "type": "long"}
>  ]
>  }
>   }
>  }
>   ]}
>
> Here is the code which we use to parse the data
>
> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
> DatumReader payloadReader = new 
> SpecificDatumReader<>(MyPayLoad.class);
> Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
> MyPayLoad myPayLoad = null;
> try {
> myPayLoad = payloadReader.read(null, decoder);
> } catch (IOException e) {
> logger.log(Level.SEVERE, e.getMessage(), e);
> }
>
> return myPayLoad;
> }
>
> Now i want to add one more field int the schema so the schema looks like
> below
>
>  {
>  "namespace": "xx..x.x",
>  "type": "record",
>  "name": "MyPayLoad",
>  "fields": [
>  {"name": "filed1",  "type": "string"},
>  {"name": "filed2", "type": "long"},
>  {"name": "filed3",  "type": "boolean"},
>  {
>   "name" : "metrics",
>   "type":
>   {
>  "type" : "array",
>  "items":
>  {
>  "name": "MyRecord",
>  

RE: add new attributes into avro schema

2015-12-17 Thread David Newberger
Hi,

I’m fairly new to working with Avro, so I could be wrong; however, this:

https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/SchemaBuilder.html

“Primitive Types
All Avro primitive types are trivial to configure. A primitive type in Avro 
JSON can be declared two ways, one that supports custom properties and one that 
does not:
{"type":"int"}
{"type":{"name":"int"}}
{"type":{"name":"int", "customProp":"val"}}
The analogous code form for the above three JSON lines are the below three 
lines:
  .intType()
  .intBuilder().endInt()
  .intBuilder().prop("customProp", "val").endInt()
Every primitive type has a shortcut to create the trivial type, and a builder 
when custom properties are required. The first line above is a shortcut for the 
second, analogous to the JSON case.”

makes it look like you can.

David Newberger

From: John Smith [mailto:lenov...@gmail.com]
Sent: Thursday, December 17, 2015 5:13 AM
To: user@avro.apache.org
Subject: add new attributes into avro schema

Hi,
is it possible to extend avro schema with custom attributes, for example

{
 "type":"record",
  "name":"X",
   "fields":[
  {"name":"b3","type":"int","doc":"blabla","newField1":"test", 
"newField2":"test2"}
]}');


Thank you!
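
More directly to the original question, a minimal sketch (assuming the Java
API; the attribute names are simply the ones from the question) showing that
extra attributes on a field survive parsing and can be read back as custom
properties:

    import org.apache.avro.Schema;

    public class CustomAttributeSketch {
        public static void main(String[] args) {
            String json = "{\"type\":\"record\",\"name\":\"X\",\"fields\":["
                    + "{\"name\":\"b3\",\"type\":\"int\",\"doc\":\"blabla\","
                    + "\"newField1\":\"test\",\"newField2\":\"test2\"}]}";
            Schema schema = new Schema.Parser().parse(json);
            // Unknown attributes on a field are kept as custom properties.
            System.out.println(schema.getField("b3").getProp("newField1")); // test
            System.out.println(schema.getField("b3").getProp("newField2")); // test2
        }
    }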



add new attributes into avro schema

2015-12-17 Thread John Smith
Hi,

Is it possible to extend an Avro schema with custom attributes? For example:

{
 "type":"record",
  "name":"X",
   "fields":[
  {"name":"b3","type":"int","doc":"blabla",*"newField1"*:"test",
*"newField2"*:"test2"}
]}');



Thank you!


Re: Json data to avro schema validation

2015-09-01 Thread Jeetendra G
any idea here?

On Tue, Sep 1, 2015 at 1:36 AM, Jeetendra G <jeetendr...@housing.com> wrote:

> Hi All
>
> I have  Json data in object and I want to validate that object against
> Avro schema .is there an API to validate this?
>
>
> Regards
> Jeetendra
>


Json data to avro schema validation

2015-08-31 Thread Jeetendra G
Hi All

I have JSON data in an object and I want to validate that object against an
Avro schema. Is there an API to validate this?


Regards
Jeetendra


RE: Not able to load avro schema fully with all its contents

2015-05-20 Thread Pierre de Frém
Hello,
Sam is right in his previous answer. More precisely, the field doc is read by
the Compiler, but not stored at the moment in the Node object. The reason might
be that the field doc is optional in the Avro specification (see:
https://avro.apache.org/docs/1.7.7/spec.html, Complex types).

If you want to store the field doc, you'll have to modify the source code
yourself to:
- create a new member doc in the Node API (Node.hh),
- store the doc field in Node as it is read by the Compiler (Compiler.cc),
- serialize the field doc in NodeImpl.cc

I did a patch for my own use where I store and read field docs for a
NodeRecord, and I serialize field docs for the root Node of a NodeRecord.
You can find it at:
the corresponding branch (created for the patch):
https://github.com/pidefrem/avro/tree/branch-1.7-specificrecord
the corresponding commit for the field doc:
https://github.com/pidefrem/avro/commit/795a0805b8ea8d3228bd92a483c9cbb405e11a62

Rem: if you want to serialize all field docs of a NodeRecord, just change line
195 of NodeImpl.cc from

if (depth == 1 && getDoc().size()) {

to

if (getDoc().size()) {

(Maybe my patch could be added in the trunk of the source code if it is useful?)
Hope this helps.
Pierre

Date: Tue, 19 May 2015 18:37:56 +
From: sgr...@yahoo-inc.com
To: user@avro.apache.org
Subject: Re: Not able to load avro schema fully with all its contents

Just a guess, but I would assume that the schema object only stores fields that 
it cares about. This would exclude your docs. If you want to know for sure, the 
source code is here: https://github.com/apache/avro/tree/trunk/lang/c%2B%2B  

Sam


 On Tuesday, May 19, 2015 1:13 PM, Check Peck comptechge...@gmail.com 
wrote:


 Can anyone help me with this?On Mon, May 18, 2015 at 2:04 PM, Check Peck 
comptechge...@gmail.com wrote:Does anyone have any idea on this why it is 
behaving like this?On Mon, May 18, 2015 at 1:03 PM, Check Peck 
comptechge...@gmail.com wrote:And this is my to_string method I forgot to 
provide.std::string DataSchema::to_string() const{ostringstream os;if 
(valid()){os  JSON data: ;m_schema.toJson(os);  }   
 return os.str();}On Mon, May 18, 2015 at 12:54 PM, Check Peck 
comptechge...@gmail.com wrote:I am working with Apache Avro in C++ and I am 
trying to load avro schema by using Avro C++ library. Everything works fine 
without any issues, only problem is - I have few doc in my Avro schema which 
is not getting shown at all in my AvroSchema when I try to load it and also 
print it out.DataSchema_ptr schema_data(new DataSchema());
schema_data-m_schema = load(avro_schema_file_name.c_str());const 
avro::NodePtr node_data_ptr = schema_data-m_schema.root();if 
(node_data_ptr  node_data_ptr-hasName()){// is there any problem 
with this node_data_ptr usage here?schema_data-m_name = 
node_data_ptr-name().fullname().c_str();   // this line prints out 
whole AVRO but it doesn't have doc which is there in my AVROcoutFile 
String :   schema_data-to_string()  endl;}   Here m_schema is 
avro::ValidSchema m_schema;   Can anyone help me with this. In general I 
don't see my doc which I have in Avro Schema getting shown when I print it out.




  

RE: Not able to load avro schema fully with all its contents

2015-05-20 Thread Pierre de Frém
Hello,
I posted the patch for the trunk branch of the git there (for it to be 
reviewed):https://issues.apache.org/jira/browse/AVRO-1256
Pierre

From: theped...@hotmail.com
To: user@avro.apache.org
Subject: RE: Not able to load avro schema fully with all its contents
Date: Wed, 20 May 2015 10:08:22 +




Hello,
Sam is right in his previous answer.More precisely, the field doc is read by 
the Compiler, but not stored at the moment in the Node object. The reason might 
be that the field doc is optional is the avro specification (see: 
https://avro.apache.org/docs/1.7.7/spec.html, Complex types).
If you want to store the field doc, you'll have to modify the source code 
yourself to:- create a new member doc in the Node API (Node.hh),- store the 
doc field in Node as it is read by the Compiler (Compiler.cc),- serialize the 
field doc in NodeImpl.cc
I did a patch for my own use were I store and read fields doc for a 
NodeRecord, and I serialize fields doc for the root Node of a NodeRecord.
You can find it at:the corresponding branch (created for the 
patch):https://github.com/pidefrem/avro/tree/branch-1.7-specificrecord
the corresponding commit for the field 
doc:https://github.com/pidefrem/avro/commit/795a0805b8ea8d3228bd92a483c9cbb405e11a62
Rem: if you want to serialize all fields doc of a NodeRecord, just change line 
195 of NodeImpl.cc fromif (depth == 1  getDoc().size()) {
to
if (getDoc().size()) {
(Maybe my patch could be added in the trunk of the source code if it is useful?)
Hope this helps.
Pierre

Date: Tue, 19 May 2015 18:37:56 +
From: sgr...@yahoo-inc.com
To: user@avro.apache.org
Subject: Re: Not able to load avro schema fully with all its contents

Just a guess, but I would assume that the schema object only stores fields that 
it cares about. This would exclude your docs. If you want to know for sure, the 
source code is here: https://github.com/apache/avro/tree/trunk/lang/c%2B%2B  

Sam


 On Tuesday, May 19, 2015 1:13 PM, Check Peck comptechge...@gmail.com 
wrote:


 Can anyone help me with this?On Mon, May 18, 2015 at 2:04 PM, Check Peck 
comptechge...@gmail.com wrote:Does anyone have any idea on this why it is 
behaving like this?On Mon, May 18, 2015 at 1:03 PM, Check Peck 
comptechge...@gmail.com wrote:And this is my to_string method I forgot to 
provide.std::string DataSchema::to_string() const{ostringstream os;if 
(valid()){os  JSON data: ;m_schema.toJson(os);  }   
 return os.str();}On Mon, May 18, 2015 at 12:54 PM, Check Peck 
comptechge...@gmail.com wrote:I am working with Apache Avro in C++ and I am 
trying to load avro schema by using Avro C++ library. Everything works fine 
without any issues, only problem is - I have few doc in my Avro schema which 
is not getting shown at all in my AvroSchema when I try to load it and also 
print it out.DataSchema_ptr schema_data(new DataSchema());
schema_data-m_schema = load(avro_schema_file_name.c_str());const 
avro::NodePtr node_data_ptr = schema_data-m_schema.root();if 
(node_data_ptr  node_data_ptr-hasName()){// is there any problem 
with this node_data_ptr usage here?schema_data-m_name = 
node_data_ptr-name().fullname().c_str();   // this line prints out 
whole AVRO but it doesn't have doc which is there in my AVROcoutFile 
String :   schema_data-to_string()  endl;}   Here m_schema is 
avro::ValidSchema m_schema;   Can anyone help me with this. In general I 
don't see my doc which I have in Avro Schema getting shown when I print it out.





  

Re: Not able to load avro schema fully with all its contents

2015-05-19 Thread Check Peck
Can anyone help me with this?

On Mon, May 18, 2015 at 2:04 PM, Check Peck comptechge...@gmail.com wrote:

 Does anyone have any idea on this why it is behaving like this?

 On Mon, May 18, 2015 at 1:03 PM, Check Peck comptechge...@gmail.com
 wrote:

 And this is my to_string method I forgot to provide.

 std::string DataSchema::to_string() const
 {
     ostringstream os;
     if (valid())
     {
         os << "JSON data: ";
         m_schema.toJson(os);
     }
     return os.str();
 }


 On Mon, May 18, 2015 at 12:54 PM, Check Peck comptechge...@gmail.com
 wrote:

 I am working with Apache Avro in C++ and I am trying to load avro schema
 by using Avro C++ library. Everything works fine without any issues, only
 problem is - I have few doc in my Avro schema which is not getting shown
 at all in my AvroSchema when I try to load it and also print it out.

  DataSchema_ptr schema_data(new DataSchema());
  schema_data->m_schema = load(avro_schema_file_name.c_str());
  const avro::NodePtr node_data_ptr = schema_data->m_schema.root();
  if (node_data_ptr && node_data_ptr->hasName())
  {
      // is there any problem with this node_data_ptr usage here?
      schema_data->m_name = node_data_ptr->name().fullname().c_str();

      // this line prints out the whole AVRO but it doesn't have the doc which
      // is there in my AVRO
      cout << "File String : " << schema_data->to_string() << endl;
  }

  Here m_schema is avro::ValidSchema m_schema;

 Can anyone help me with this. In general I don't see my doc which I have
 in Avro Schema getting shown when I print it out.






Not able to load avro schema fully with all its contents

2015-05-18 Thread Check Peck
I am working with Apache Avro in C++ and I am trying to load avro schema by
using Avro C++ library. Everything works fine without any issues, only
problem is - I have few doc in my Avro schema which is not getting shown
at all in my AvroSchema when I try to load it and also print it out.

DataSchema_ptr schema_data(new DataSchema());
schema_data->m_schema = load(avro_schema_file_name.c_str());
const avro::NodePtr node_data_ptr = schema_data->m_schema.root();
if (node_data_ptr && node_data_ptr->hasName())
{
    // is there any problem with this node_data_ptr usage here?
    schema_data->m_name = node_data_ptr->name().fullname().c_str();

    // this line prints out the whole AVRO but it doesn't have the doc which is
    // there in my AVRO
    cout << "File String : " << schema_data->to_string() << endl;
}

Here m_schema is avro::ValidSchema m_schema;

Can anyone help me with this. In general I don't see my doc which I have in
Avro Schema getting shown when I print it out.


Re: Not able to load avro schema fully with all its contents

2015-05-18 Thread Check Peck
And this is my to_string method I forgot to provide.

std::string DataSchema::to_string() const
{
    ostringstream os;
    if (valid())
    {
        os << "JSON data: ";
        m_schema.toJson(os);
    }
    return os.str();
}


On Mon, May 18, 2015 at 12:54 PM, Check Peck comptechge...@gmail.com
wrote:

 I am working with Apache Avro in C++ and I am trying to load avro schema
 by using Avro C++ library. Everything works fine without any issues, only
 problem is - I have few doc in my Avro schema which is not getting shown
 at all in my AvroSchema when I try to load it and also print it out.

 DataSchema_ptr schema_data(new DataSchema());
 schema_data-m_schema = load(avro_schema_file_name.c_str());
 const avro::NodePtr node_data_ptr = schema_data-m_schema.root();
 if (node_data_ptr  node_data_ptr-hasName())
 {
 // is there any problem with this node_data_ptr usage here?
 schema_data-m_name = node_data_ptr-name().fullname().c_str();

 // this line prints out whole AVRO but it doesn't have doc which
 is there in my AVRO
 coutFile String :   schema_data-to_string()  endl;
 }

 Here m_schema is avro::ValidSchema m_schema;

 Can anyone help me with this. In general I don't see my doc which I have
 in Avro Schema getting shown when I print it out.



Re: Not able to load avro schema fully with all its contents

2015-05-18 Thread Check Peck
Does anyone have any idea on this why it is behaving like this?

On Mon, May 18, 2015 at 1:03 PM, Check Peck comptechge...@gmail.com wrote:

 And this is my to_string method I forgot to provide.

 std::string DataSchema::to_string() const
 {
 ostringstream os;
 if (valid())
 {
 os  JSON data: ;
 m_schema.toJson(os);
 }
 return os.str();

 }


 On Mon, May 18, 2015 at 12:54 PM, Check Peck comptechge...@gmail.com
 wrote:

 I am working with Apache Avro in C++ and I am trying to load avro schema
 by using Avro C++ library. Everything works fine without any issues, only
 problem is - I have few doc in my Avro schema which is not getting shown
 at all in my AvroSchema when I try to load it and also print it out.

 DataSchema_ptr schema_data(new DataSchema());
 schema_data-m_schema = load(avro_schema_file_name.c_str());
 const avro::NodePtr node_data_ptr = schema_data-m_schema.root();
 if (node_data_ptr  node_data_ptr-hasName())
 {
 // is there any problem with this node_data_ptr usage here?
 schema_data-m_name = node_data_ptr-name().fullname().c_str();

 // this line prints out whole AVRO but it doesn't have doc which
 is there in my AVRO
 coutFile String :   schema_data-to_string()  endl;
 }

 Here m_schema is avro::ValidSchema m_schema;

 Can anyone help me with this. In general I don't see my doc which I have
 in Avro Schema getting shown when I print it out.





Issue with reading old data with a new Avro Schema

2015-04-08 Thread Nicolas Phung
Hello,

I'm trying to read old avro binary data with a new schema (I add a new
field).

This is the Avro Schema (OLD) I was using to write Avro binary data before:
{
namespace: com.hello.world,
type: record,
name: Toto,
fields:
{
name: a,
type: [
string,
null
]
},
{
name: b,
type: string
}
]
}

This is the Avro Schema (NEW) I'm using to read the Avro binary data :

{
namespace: com.hello.world,
type: record,
name: Toto,
fields:
{
name: a,
type: [
string,
null
]
},
{
name: b,
type: string
},
{
name: c,
type: string,
default: na
}
]
}

However, I can't read the old data with the new Schema. I've got the
following errors :

15/04/08 17:32:22 ERROR executor.Executor: Exception in task 0.0 in stage
3.0 (TID 3)
java.io.EOFException
at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:272)
at
org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:113)
at
org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:353)
at
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:157)
at
org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
at
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
at
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
at
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
at com.miguno.kafka.avro.AvroDecoder.fromBytes(AvroDecoder.scala:31)

From my understanding, I should be able to read the old data with the new
schema that contains a new field with a default value. But it doesn't seem
to work. Am I doing something wrong ?

I have posted a report https://issues.apache.org/jira/browse/AVRO-1661

Regards,
Nicolas PHUNG


Re: Issue with reading old data with a new Avro Schema

2015-04-08 Thread Lukas Steiblys
The schema is not valid JSON. Maybe you forgot the “[“ after “fields:”?

Lukas
From: Nicolas Phung 
Sent: Wednesday, April 8, 2015 9:45 AM
To: user@avro.apache.org 
Subject: Issue with reading old data with a new Avro Schema

Hello, 

I'm trying to read old avro binary data with a new schema (I add a new field).


This is the Avro Schema (OLD) I was using to write Avro binary data before:
{
namespace: com.hello.world,
type: record,
name: Toto,
fields: 
{
name: a,
type: [
string,
null
]
},
{
name: b,
type: string
}
]
}

This is the Avro Schema (NEW) I'm using to read the Avro binary data :

{
namespace: com.hello.world,
type: record,
name: Toto,
fields: 
{
name: a,
type: [
string,
null
]
},
{
name: b,
type: string
},
{
name: c,
type: string,
default: na
}
]
}

However, I can't read the old data with the new Schema. I've got the following 
errors :

15/04/08 17:32:22 ERROR executor.Executor: Exception in task 0.0 in stage 3.0 
(TID 3)
java.io.EOFException
at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:272)
at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:113)
at 
org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:353)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:157)
at 
org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
at 
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
at com.miguno.kafka.avro.AvroDecoder.fromBytes(AvroDecoder.scala:31)

From my understanding, I should be able to read the old data with the new 
schema that contains a new field with a default value. But it doesn't seem to 
work. Am I doing something wrong ?

I have posted a report https://issues.apache.org/jira/browse/AVRO-1661

Regards,
Nicolas PHUNG

Re: Issue with reading old data with a new Avro Schema

2015-04-08 Thread Nicolas Phung
OLD:
{
    "namespace": "com.hello.world",
    "type": "record",
    "name": "Toto",
    "fields": [
        {
            "name": "a",
            "type": [
                "string",
                "null"
            ]
        },
        {
            "name": "b",
            "type": "string"
        }
    ]
}

NEW:
{
    "namespace": "com.hello.world",
    "type": "record",
    "name": "Toto",
    "fields": [
        {
            "name": "a",
            "type": [
                "string",
                "null"
            ]
        },
        {
            "name": "b",
            "type": "string"
        },
        {
            "name": "c",
            "type": "string",
            "default": "na"
        }
    ]
}

Sorry bad copy paste. The Avro Schema should be fine because I'm using
sbt-avro to generate the class from it.

On Wed, Apr 8, 2015 at 6:57 PM, Lukas Steiblys lu...@doubledutch.me wrote:

   The schema is not valid JSON. Maybe you forgot the “[“ after “fields:”?

 Lukas
   *From:* Nicolas Phung nicolas.ph...@gmail.com
 *Sent:* Wednesday, April 8, 2015 9:45 AM
 *To:* user@avro.apache.org
 *Subject:* Issue with reading old data with a new Avro Schema

  Hello,

 I'm trying to read old avro binary data with a new schema (I add a new
 field).

  This is the Avro Schema (OLD) I was using to write Avro binary data
 before:
 {
 namespace: com.hello.world,
 type: record,
 name: Toto,
 fields:
 {
 name: a,
 type: [
 string,
 null
 ]
 },
 {
 name: b,
 type: string
 }
 ]
 }

 This is the Avro Schema (NEW) I'm using to read the Avro binary data :

 {
 namespace: com.hello.world,
 type: record,
 name: Toto,
 fields:
 {
 name: a,
 type: [
 string,
 null
 ]
 },
 {
 name: b,
 type: string
 },
 {
 name: c,
 type: string,
 default: na
 }
 ]
 }

 However, I can't read the old data with the new Schema. I've got the
 following errors :

 15/04/08 17:32:22 ERROR executor.Executor: Exception in task 0.0 in stage
 3.0 (TID 3)
 java.io.EOFException
 at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
 at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
 at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259)
 at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:272)
 at
 org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:113)
 at
 org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:353)
 at
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:157)
 at
 org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
 at
 org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
 at
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
 at
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
 at com.miguno.kafka.avro.AvroDecoder.fromBytes(AvroDecoder.scala:31)

 From my understanding, I should be able to read the old data with the new
 schema that contains a new field with a default value. But it doesn't seem
 to work. Am I doing something wrong ?

 I have posted a report https://issues.apache.org/jira/browse/AVRO-1661

 Regards,
 Nicolas PHUNG



Re: Adding new field with default value to an Avro schema

2015-02-03 Thread Sean Busbey
Schema evolution in Avro requires access to both the schema used when
writing the data and the desired Schema for reading the data.

Normally, Avro data is stored in some container format (i.e. the one in the
spec[1]) and the parsing library takes care of pulling the schema used when
writing out of said container.

If you are using Avro data in some other location, you must have the writer
schema as well. One common use case is a shared messaging system focused on
small messages (but that doesn't use Avro RPC). In such cases, Doug Cutting
has some guidance he's previously given (quoted with permission, albeit
very late):

 A best practice for things like this is to prefix each Avro record
 with a (small) numeric schema ID.  This is used as the key for a
 shared database of schemas.  The schema corresponding to a key never
 changes, so the database can be cached heavily.  It never gets very
 big either.  It could be as simple as a .java file, with the
 constraint that you'd need to upgrade things downstream before
 upstream, or as complicated as an enterprise-wide REST schema service
 (AVRO-1124).  A variation is to use schema fingerprints as keys.

 Potentially relevant stuff:

 https://issues.apache.org/jira/browse/AVRO-1124
 http://avro.apache.org/docs/current/spec.html#Schema+Fingerprints

If you take the integer schema ID approach, you can use Avro's built-in
utilities for zig-zag encoding, which will ensure that most of the time
your identifier only takes a small amount of space.

[1]: http://avro.apache.org/docs/current/spec.html#Object+Container+Files
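
As a rough illustration of the schema-ID prefix Doug describes, a minimal
sketch (the Integer-to-Schema map stands in for whatever registry, .java
constants file or fingerprint store is used; error handling is omitted):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.Map;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;

    public class SchemaIdFraming {

        // Prefix each message with the schema id, then the record body.
        public static byte[] encode(int schemaId, Schema writerSchema, GenericRecord record)
                throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
            enc.writeInt(schemaId); // zig-zag/varint encoded, so small ids stay small
            new GenericDatumWriter<GenericRecord>(writerSchema).write(record, enc);
            enc.flush();
            return out.toByteArray();
        }

        // Look up the writer schema by id, then resolve it against the reader schema.
        public static GenericRecord decode(byte[] bytes, Map<Integer, Schema> schemasById,
                Schema readerSchema) throws IOException {
            BinaryDecoder dec = DecoderFactory.get().binaryDecoder(bytes, null);
            Schema writerSchema = schemasById.get(dec.readInt());
            return new GenericDatumReader<GenericRecord>(writerSchema, readerSchema)
                    .read(null, dec);
        }
    }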


On Tue, Feb 3, 2015 at 5:57 AM, Burak Emre emrekaba...@gmail.com wrote:

 I added a field with a default value to an Avro schema which is previously
 used for writing data. Is it possible to read the previous data using *only
 new schema* which has that new field at the end?

 I tried this scenario but unfortunately it throws EOFException while
 reading third field. Even though it has a default value and the previous
 fields is read successfully, I'm not able to de-serialize the record back
 without providing the writer schema I used previously.

 Schema schema = Schema.createRecord(test, null, avro.test, false);
 schema.setFields(Lists.newArrayList(
 new Field(project, Schema.create(Type.STRING), null, null),
 new Field(city, 
 Schema.createUnion(Lists.newArrayList(Schema.create(Type.NULL), 
 Schema.create(Type.STRING))), null, NullNode.getInstance(;
 GenericData.Record record = new GenericRecordBuilder(schema)
 .set(project, ff).build();
 GenericDatumWriter w = new GenericDatumWriter(schema);ByteArrayOutputStream 
 outputStream = new ByteArrayOutputStream();BinaryEncoder encoder = 
 EncoderFactory.get().binaryEncoder(outputStream, null);

 w.write(record, encoder);
 encoder.flush();

 schema = Schema.createRecord(test, null, avro.test, false);
 schema.setFields(Lists.newArrayList(
 new Field(project, Schema.create(Type.STRING), null, null),
 new Field(city, 
 Schema.createUnion(Lists.newArrayList(Schema.create(Type.NULL), 
 Schema.create(Type.STRING))), null, NullNode.getInstance()),
 new Field(newField, 
 Schema.createUnion(Lists.newArrayList(Schema.create(Type.NULL), 
 Schema.create(Type.STRING))), null, NullNode.getInstance(;
 DatumReaderGenericRecord reader = new GenericDatumReader(schema);Decoder 
 decoder = DecoderFactory.get().binaryDecoder(outputStream.toByteArray(), 
 null);GenericRecord result = reader.read(null, decoder);





-- 
Sean


Re: Adding new field with default value to an Avro schema

2015-02-03 Thread Lukas Steiblys
On a related note, is there a tool that can check the backwards compatibility 
of schemas? I found some old messages talking about it, but no actual tool. I 
guess I could hack it together using some functions in the Avro library.

Lukas

From: Burak Emre 
Sent: Tuesday, February 3, 2015 9:01 AM
To: user@avro.apache.org 
Subject: Re: Adding new field with default value to an Avro schema

@Sean thanks for the explanation. 

I have multiple writers but only one reader, and the only schema migration
operation is adding a new field, so I thought I could use the same schema for
all datasets since the ordering will be the same in all of them, even though
some may contain extra fields which are also defined in the schema definition.

Actually, I wanted to avoid using an external database for sequential schema
ids since it would make the system more complex than it should be in my case,
but it seems this is the only option for now.

-- 
Burak Emre
Koc University

On Tuesday 3 February 2015 at 18:22, Sean Busbey wrote:

  Schema evolution in Avro requires access to both the schema used when writing 
the data and the desired Schema for reading the data. 

  Normally, Avro data is stored in some container format (i.e. the one in the 
spec[1]) and the parsing library takes care of pulling the schema used when 
writing out of said container.

  If you are using Avro data in some other location, you must have the writer 
schema as well. One common use case is a shared messaging system focused on 
small messages (but that doesn't use Avro RPC). In such cases, Doug Cutting has 
some guidance he's previously given (quoted with permission, albeit very late):

   A best practice for things like this is to prefix each Avro record
   with a (small) numeric schema ID.  This is used as the key for a
   shared database of schemas.  The schema corresponding to a key never
   changes, so the database can be cached heavily.  It never gets very
   big either.  It could be as simple as a .java file, with the
   constraint that you'd need to upgrade things downstream before
   upstream, or as complicated as an enterprise-wide REST schema service
   (AVRO-1124).  A variation is to use schema fingerprints as keys.
   
   Potentially relevant stuff:
   
   https://issues.apache.org/jira/browse/AVRO-1124
   http://avro.apache.org/docs/current/spec.html#Schema+Fingerprints


  If you take the integer schema ID approach, you can use Avro's built in 
utilities for zig-zap encoding, which will ensure that most of the time your 
identifier only takes a small amount of space.

  [1]: http://avro.apache.org/docs/current/spec.html#Object+Container+Files


  On Tue, Feb 3, 2015 at 5:57 AM, Burak Emre emrekaba...@gmail.com wrote:

  I added a field with a default value to an Avro schema which is 
previously used for writing data. Is it possible to read the previous data 
using only new schema which has that new field at the end?
  I tried this scenario but unfortunately it throws EOFException while 
reading third field. Even though it has a default value and the previous fields 
is read successfully, I'm not able to de-serialize the record back without 
providing the writer schema I used previously.

Schema schema = Schema.createRecord("test", null, "avro.test", false);
schema.setFields(Lists.newArrayList(
    new Field("project", Schema.create(Type.STRING), null, null),
    new Field("city",
        Schema.createUnion(Lists.newArrayList(Schema.create(Type.NULL),
            Schema.create(Type.STRING))), null, NullNode.getInstance())
));

GenericData.Record record = new GenericRecordBuilder(schema)
    .set("project", "ff").build();

GenericDatumWriter<GenericRecord> w = new GenericDatumWriter<>(schema);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);

w.write(record, encoder);
encoder.flush();

schema = Schema.createRecord("test", null, "avro.test", false);
schema.setFields(Lists.newArrayList(
    new Field("project", Schema.create(Type.STRING), null, null),
    new Field("city",
        Schema.createUnion(Lists.newArrayList(Schema.create(Type.NULL),
            Schema.create(Type.STRING))), null, NullNode.getInstance()),
    new Field("newField",
        Schema.createUnion(Lists.newArrayList(Schema.create(Type.NULL),
            Schema.create(Type.STRING))), null, NullNode.getInstance())
));

DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
Decoder decoder =
    DecoderFactory.get().binaryDecoder(outputStream.toByteArray(), null);
GenericRecord result = reader.read(null, decoder);




  -- 

  Sean


Re: Adding new field with default value to an Avro schema

2015-02-03 Thread Sean Busbey
On Tue, Feb 3, 2015 at 11:34 AM, Lukas Steiblys lu...@doubledutch.me
wrote:

   On a related note, is there a tool that can check the backwards
 compatibility of schemas? I found some old messages talking about it, but
 no actual tool. I guess I could hack it together using some functions in
 the Avro library.

 Lukas


I don't think so, but this would be a great addition to the avro-tools
utility. Would you mind filing a JIRA for it?

-- 
Sean


Re: Adding new field with default value to an Avro schema

2015-02-03 Thread Doug Cutting
On Tue, Feb 3, 2015 at 9:34 AM, Lukas Steiblys lu...@doubledutch.me wrote:
 On a related note, is there a tool that can check the backwards
 compatibility of schemas?

https://avro.apache.org/docs/current/api/java/org/apache/avro/SchemaCompatibility.html

Doug
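
A minimal sketch of using it (assuming Avro 1.7.2 or later; the .avsc file
names are illustrative):

    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaCompatibility;

    public class CompatCheck {
        public static void main(String[] args) throws Exception {
            Schema writer = new Schema.Parser().parse(new File("old.avsc"));
            Schema reader = new Schema.Parser().parse(new File("new.avsc"));
            // COMPATIBLE means data written with 'writer' can be read with 'reader'.
            System.out.println(SchemaCompatibility
                    .checkReaderWriterCompatibility(reader, writer)
                    .getType());
        }
    }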


Avro schema and data read with it.

2014-12-17 Thread ๏̯͡๏
I have data that is persisted in Avro format. Each record has a certain
schema and it contains 10 fields while it is persisted.

When I read the same record(s) from another process, I also specify a schema
with a subset of fields (5).

Will only 5 columns be read from disk?
or
Will all the columns be read but 5 are later discarded?
or
Are all the columns read but only five are accessible since the schema used
to read contains only five columns?

Please suggest.

Regards,
Deepak


Re: Avro schema and data read with it.

2014-12-17 Thread Doug Cutting
Avro skips over fields that were present in the writer's schema but
are no longer present in the reader's schema.  Skipping is
substantially faster than reading for most types.  For known-size
types like string, bytes, fixed, double and float the file pointer can
be incremented past skipped values.  For skipped structures like
records, maps and arrays, no memory is allocated and no stores are
made.  Avro data files are not in a columnar format however, so the
i/o and decompression of skipped fields is not generally avoided.

Doug

On Wed, Dec 17, 2014 at 7:53 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
 I have a data that is persisted in Avro format. Each record has a certain
 schema and it contains 10 fields while it is persisted.

 When I read the same record(s) from other process, i also specify a schema
 with a subset of fields (5).

 Will only 5 columns be read from disk?
 or
 Will all the columns be read but 5 are later discarded?
 or
 Are all the columns read but only five are accessible since the schema used
 to read contain only five columns?

 Please suggest.

 Regards,
 Deepak
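
To make the projection part concrete, a minimal sketch (file and schema names
are illustrative) where the reader schema lists only the five fields of
interest; the other fields are skipped during decoding as described above:

    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class ProjectionRead {
        public static void main(String[] args) throws Exception {
            // Reader schema containing only the wanted subset of fields.
            Schema readerSchema = new Schema.Parser().parse(new File("subset.avsc"));
            GenericDatumReader<GenericRecord> datumReader =
                    new GenericDatumReader<>(readerSchema);
            // The writer schema is taken from the file header automatically.
            try (DataFileReader<GenericRecord> fileReader =
                    new DataFileReader<>(new File("data.avro"), datumReader)) {
                for (GenericRecord rec : fileReader) {
                    System.out.println(rec);
                }
            }
        }
    }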



Is it legal avro schema to have a name tie to different type in different record

2014-09-14 Thread Patrick Nip
Hi All,

Is the following legal schema:

{

{

  metadata : {

schema : {

  family : search,

  version : v1,

  attrs : [ srch ]

}

  }

}{

  metadata : {

schema : {

  family : UDB,

  version : v1,

  attrs : [ login, reg ]

}

  }

}

}

Note that metadata appears twice in 2 different records with different
definitions.

Thanks
Patrick


Map avro schema example in php

2014-09-10 Thread Vadim Keylis
Good evening. This is a snippet of an Avro schema field that I need to
build data for and populate in PHP: {"name": "meta__kvpairs", "type":
["null", { "type": "map", "values": ["null", "string"] } ] }.

I would appreciate it if someone could give me an example of how the data
should look in PHP to satisfy the meta__kvpairs Avro rule so that it can be
serialized without an exception.


Thanks so much.


Re: Map avro schema example in php

2014-09-10 Thread Vadim Keylis
Please disregard.


On Tue, Sep 9, 2014 at 11:26 PM, Vadim Keylis vkeylis2...@gmail.com wrote:

 Good evening. This is snippet of  avro schema field that I need to build data 
 for and populate in php *{name: meta__kvpairs, type: [ null, { 
 type: map, values: [null, string] } ] }*.

  Would appreciate if someone gives me an example of how data would look in 
 php to satisfy meta__kvpairs avro rule so that it can be serialized without 
 exception.


 Thanks so much.




Encoding Avro schema as binary

2014-09-08 Thread Roger Hoover
Hi,

I wanted to see how much overhead would be involved if we ship an avro
schema along with message in a messaging context.  Seems like it might
simplify things to not always need a schema registry available with all
schema versions.

I found an old thread (
http://search-hadoop.com/m/zmrzAWDkbt1/noble+paul/v=threaded) referencing
this issue. (https://issues.apache.org/jira/browse/AVRO-251)

A couple of questions:
1) Any particular reason the patch was never merged?  Can anyone foresee
major issues with this approach?
2) I applied this patch to the 1.5 branch thinking that I'd have the best
luck there (both are from 2011).  I'm having trouble building anything that
depends on the avro-maven-plugin (see errors below).  Any help would be
appreciated testing this out.

Thanks!

I'm getting this error trying to use the avro-maven-plugin to compile the
ipc module. Any ideas?

[ERROR] Failed to execute goal
org.apache.avro:avro-maven-plugin:1.5.5-SNAPSHOT:schema (schemas) on
project avro-ipc: Execution schemas of goal
org.apache.avro:avro-maven-plugin:1.5.5-SNAPSHOT:schema failed: An API
incompatibility was encountered while executing
org.apache.avro:avro-maven-plugin:1.5.5-SNAPSHOT:schema:
java.lang.ExceptionInInitializerError: null
[ERROR] -
[ERROR] realm = pluginorg.apache.avro:avro-maven-plugin:1.5.5-SNAPSHOT
[ERROR] strategy =
org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
[ERROR] urls[0] =
file:/Users/rhoover/.m2/repository/org/apache/avro/avro-maven-plugin/1.5.5-SNAPSHOT/avro-maven-plugin-1.5.5-SNAPSHOT.jar
[ERROR] urls[1] =
file:/Users/rhoover/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.1/plexus-interpolation-1.1.jar
[ERROR] urls[2] =
file:/Users/rhoover/.m2/repository/org/codehaus/plexus/plexus-utils/1.5.5/plexus-utils-1.5.5.jar
[ERROR] urls[3] =
file:/Users/rhoover/.m2/repository/junit/junit/3.8.1/junit-3.8.1.jar
[ERROR] urls[4] =
file:/Users/rhoover/.m2/repository/org/apache/maven/shared/file-management/1.2.1/file-management-1.2.1.jar
[ERROR] urls[5] =
file:/Users/rhoover/.m2/repository/org/apache/maven/shared/maven-shared-io/1.1/maven-shared-io-1.1.jar
[ERROR] urls[6] =
file:/Users/rhoover/.m2/repository/org/apache/avro/avro-compiler/1.5.5-SNAPSHOT/avro-compiler-1.5.5-SNAPSHOT.jar
[ERROR] urls[7] =
file:/Users/rhoover/.m2/repository/org/apache/avro/avro/1.5.5-SNAPSHOT/avro-1.5.5-SNAPSHOT.jar
[ERROR] urls[8] =
file:/Users/rhoover/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.7.3/jackson-mapper-asl-1.7.3.jar
[ERROR] urls[9] =
file:/Users/rhoover/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.7.3/jackson-core-asl-1.7.3.jar
[ERROR] urls[10] =
file:/Users/rhoover/.m2/repository/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar
[ERROR] urls[11] =
file:/Users/rhoover/.m2/repository/org/xerial/snappy/snappy-java/1.0.3.2/snappy-java-1.0.3.2.jar
[ERROR] urls[12] =
file:/Users/rhoover/.m2/repository/commons-lang/commons-lang/2.5/commons-lang-2.5.jar
[ERROR] urls[13] =
file:/Users/rhoover/.m2/repository/org/apache/velocity/velocity/1.7/velocity-1.7.jar
[ERROR] urls[14] =
file:/Users/rhoover/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar
[ERROR] urls[15] =
file:/Users/rhoover/.m2/repository/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.jar
[ERROR] urls[16] =
file:/Users/rhoover/.m2

Re: Avro schema in Ruby API

2014-02-18 Thread Tomas Svarovsky
Hey Harsh,

thanks. I can confirm that the first one works. Let me try the second one.

Tomas


On Sun, Feb 16, 2014 at 8:07 AM, Harsh J ha...@cloudera.com wrote:

 Hi,

 For (1) I believe you could do a Schema.parse meta['avro.schema'] to
 obtain the schema as an object from the meta entry of the file.

 For (2), as defined in the spec at
 http://avro.apache.org/docs/current/spec.html#Object+Container+Files,
 since the schema is stored only in the header of the file, using a
 simple initialised reader will be efficient in reading just that. The
 file's data blocks are read only upon enumerating over the reader.

 On Sun, Feb 16, 2014 at 4:52 AM, Tomas Svarovsky
 svarovsky.to...@gmail.com wrote:
  Hey,
 
  I wanted to ask couple of questions.
 
  1) Let's assume I have 2 avro files. I would like to grab schemas of
 both.
  Compare them and decide what to do. The only way I found to get to the
  schema in a reader is through
 
  dr = Avro::DataFile::Reader.new(file, Avro::IO::DatumReader.new)
  dr.meta
 
  and that is still stringified JSON. Is this the only way or even is this
 use
  case something supported or should I do it differently?
 
  2) Also is ti possible to read just the schema? Sometimes it is useful to
  look at a file without actually reading the whole file let's say from s3.
 
  Regards Tomas



 --
 Harsh J



Avro schema in Ruby API

2014-02-15 Thread Tomas Svarovsky
Hey,

I wanted to ask couple of questions.

1) Let's assume I have 2 avro files. I would like to grab schemas of both.
Compare them and decide what to do. The only way I found to get to the
schema in a reader is through

dr = Avro::DataFile::Reader.new(file, Avro::IO::DatumReader.new)
dr.meta

and that is still stringified JSON. Is this the only way, or is this use
case even something supported, or should I do it differently?

2) Also, is it possible to read just the schema? Sometimes it is useful to
look at a file without actually reading the whole file, let's say from S3.

Regards Tomas


Avro schema

2013-08-01 Thread Lior Schachter
Hi all,

When writing an Avro schema to a data file, what will be the expected
behavior if the file is used as M/R input? How do the second/third/...
splits get the schema (the schema is always written to the first split)?

Thanks,
Lior


Re: Avro schema

2013-08-01 Thread Harsh J
We read it from the top of the file at start (just the schema bytes)
and then initialize the reader.

On Thu, Aug 1, 2013 at 8:32 PM, Lior Schachter lior...@gmail.com wrote:
 Hi all,

 When writing Avro schema to the a data file, what will be the expected
 behavior if the file is used as M/R input. How does the second/third/...
 splits get the schema (the schema is always written to the first split) ?

 Thanks,
 Lior





-- 
Harsh J
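
In code terms, with the Java API, getting only the schema costs just a header
read; a minimal sketch:

    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class HeaderSchemaSketch {
        public static Schema readSchemaOnly(File avroFile) throws Exception {
            try (DataFileReader<GenericRecord> reader =
                    new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>())) {
                // Opening the file consumes only the header (magic, metadata including
                // the schema, and the sync marker); data blocks are read on iteration.
                return reader.getSchema();
            }
        }
    }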


Re: Avro schema

2013-08-01 Thread Harsh J
Yes, we seek to 0 and we read the header then seek back to the split offset.
On Aug 1, 2013 11:16 PM, Lior Schachter lior...@gmail.com wrote:

 Hi Harsh,
 So for each split you first read the header of the file directly from HDFS
 ?

 Thanks,
 Lior




 On Thu, Aug 1, 2013 at 7:36 PM, Harsh J ha...@cloudera.com wrote:

 We read it from the top of the file at start (just the schema bytes)
 and then initialize the reader.

 On Thu, Aug 1, 2013 at 8:32 PM, Lior Schachter lior...@gmail.com wrote:
  Hi all,
 
  When writing Avro schema to the a data file, what will be the expected
  behavior if the file is used as M/R input. How does the second/third/...
  splits get the schema (the schema is always written to the first split)
 ?
 
  Thanks,
  Lior
 
 



 --
 Harsh J





Re: Avro Schema to SQL

2013-06-28 Thread Scott Carey
Not all Avro schemas can be converted to SQL.  Primarily, Unions can pose
challenges, as well as recursive references.

Nested types are a mixed bag: some SQL-related systems have rich support
for nested types and/or JSON (e.g. PostgreSQL), which can make this easier,
while others are more crude (MySQL, Hive).

With Unions, in some cases a union field can be expanded/flattened into
multiple fields, of which only one is not null.  Recursive types can be
transformed into key references.

In general, all of these transformation strategies require decisions by the
user and potentially custom work depending on what database is involved.

Traversing an Avro Schema in Java is done via the Schema API; the Javadoc
explains it and there are many examples in the Avro source code.  The type
of schema must be checked, and for each nested type a different descent into
its contained types can occur.
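
A rough sketch of that kind of descent (printing stands in for whatever SQL
generation would happen; recursive record types would additionally need cycle
detection, which is omitted here):

    import org.apache.avro.Schema;
    import org.apache.avro.Schema.Field;

    public class SchemaWalker {
        // Walk a schema depth-first, dispatching on the type of each node.
        public static void walk(Schema schema, String path) {
            switch (schema.getType()) {
                case RECORD:
                    for (Field f : schema.getFields()) {
                        walk(f.schema(), path + "/" + f.name());
                    }
                    break;
                case UNION:
                    for (Schema branch : schema.getTypes()) {
                        walk(branch, path + "[" + branch.getType() + "]");
                    }
                    break;
                case ARRAY:
                    walk(schema.getElementType(), path + "[]");
                    break;
                case MAP:
                    walk(schema.getValueType(), path + "{}");
                    break;
                default:
                    // Primitive, enum or fixed: a leaf that would map to a column type.
                    System.out.println(path + " : " + schema.getType());
            }
        }
    }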

From:  Avinash Dongre dongre.avin...@gmail.com
Reply-To:  user@avro.apache.org user@avro.apache.org
Date:  Wednesday, June 19, 2013 2:31 AM
To:  user@avro.apache.org user@avro.apache.org
Subject:  Avro Schema to SQL

Is there a known tool/framework available to convert an Avro Schema into SQL?
If not, how do I iterate over the schema to find out what records and enums
are there? I can think of how to achieve this with a simple Schema, but I am
not able to figure out a way for nested schemas.



Thanks
Avinash





Re: Avro Schema to SQL

2013-06-25 Thread Mason

Might be worth looking at Sqoop's source.

On 6/19/13 02:31 AM, Avinash Dongre wrote:

Is there know tool/framework available to convert Avro Schema into SQL.
If now , How Do i iterate over the schema to find out what records, 
enums are there. I can think of how to achieve this with simple 
Schema, but I am not able to figure out a way for nested schemas.




Thanks
Avinash





How to process different types of avro schema

2013-03-18 Thread sourabh chaki
Hi All,

In my application I am getting a stream of avro events. This stream
contains different types of avro events belonging to different schemas. I
was wondering what is the right way to process this data and do analytics
on top of this. Can I use hive? I did study the avro serde that could be
used to decode avro data and I’m thinking I need to transform the input
stream into (multiple) entries belonging to different tables. For this I’m
considering using a mapper job that would extract these events type by type
and then we could use hive on top of these separate schemas. I’m wondering
if anyone has dealt with such scenario before and if this approach would
work with decent performance?

Alternative way is to use all the logic in M-R code for the analytics that
we want to do on top of this data. Please advise.

Thanks in advance.

Sourabh


Re: AvroStorage/Avro Schema Question

2012-04-17 Thread Russell Jurney
:
},
{
name:address,
type:[null,string],
doc:
}
]
}

]
}
],
doc:
}
]
}

On Tue, Apr 10, 2012 at 2:36 AM, Russell Jurney russell.jur...@gmail.com 
wrote:
Hmm, unable to get this to work:

{
namespace: agile.data.avro,
name: Email,
type: record,
fields: [
{name:message_id, type: [string, null]},
{name:froms,type: [{type:record, name:from, fields: 
[{type:array, items:string}, null]}, null]},
{name:tos,type: [{type:record, name:to, fields: 
[{type:array, items:string}, null]}, null]},
{name:ccs,type: [{type:record, name:cc, fields: 
[{type:array, items:string}, null]}, null]},
{name:bccs,type: [{type:record, name:bcc, fields: 
[{type:array, items:string}, null]}, null]},
{name:reply_tos,type: [{type:record, name:reply_to, 
fields: [{type:array, items:string}, null]}, null]},
{name:in_reply_to, type: [{type:array, items:string}, 
null]},
{name:subject, type: [string, null]},
{name:body, type: [string, null]},
{name:date, type: [string, null]}
]
}

On Tue, Apr 10, 2012 at 2:26 AM, Russell Jurney russell.jur...@gmail.com 
wrote:
In thinking about it more... it seems that unfortunately, the only thing I can 
really do is to change the schema for all email address fields:

{name:from,type: [{type:array, items:string}, null]},
to:
{name:froms,type: [{type:record, name:from, fields: 
[{type:array, items:string}, null]}, null]},

That is, to pluralize everything and then individually name array elements. I 
will try running this through my stack.


On Mon, Apr 2, 2012 at 9:13 AM, Scott Carey scottca...@apache.org wrote:
It appears as though the Avro to PigStorage schema translation names (in pig) 
all arrays ARRAY_ELEM.  The nullable wrapper is 'visible' and the field name is 
not moved onto the bag name.   

About a year and a half ago I started
https://issues.apache.org/jira/browse/AVRO-592

but before finishing it AvroStorage was written elsewhere.  I don't recall 
exactly what I did with the schema translation there, but I recall the mapping 
from an Avro schema to pig tried to hide the nullable wrappers more.


In Avro, arrays are unnamed types, so I see two things you could probably do 
without any code changes:

* Add a line in the pig script to project / rename the fields to what you want 
(unfortunate and clumsy, but I think it will work — I think you want
from::PIG_WRAPPER::ARRAY_ELEM as from  or 
FLATTEN(from::PIG_WRAPPER)::ARRAY_ELEM as from something like that.
* Add a record wrapper to your schema (which may inject more messiness in the 
pig schema view):
{
namespace: agile.data.avro,
name: Email,
type: record,
fields: [
{name:message_id, type: [string, null]},
{name:from,type: [{type:record, name:From, fields: 
[[{type:array, items:string},null]], null]},
   …
]
}

But that is very awkward — requiring a named record for each field that is an 
unnamed type.


Ideally PigStorage would treat any union of null and one other thing as a 
simple pig type with no wrapper, and project the name of a field or record into 
the name of the thing inside a bag.


-Scott

On 3/29/12 6:05 PM, Russell Jurney russell.jur...@gmail.com wrote:

Is it possible to name string elements in the schema of an array?  
Specifically, below I want to name the email addresses in the 
from/to/cc/bcc/reply_to fields, so they don't get auto-named ARRAY_ELEM by 
Pig's AvroStorage.  I know I can probably fix this in Java in the Pig 
AvroStorage UDF, but I'm hoping I can also fix it more easily in the schema.  
Last time I read Avro's array docs in this context, my hit-points dropped by a 
third, so pardom me if I've not rtfm this time :)

Complete description of what I'm doing follows:

Avro schema for my emails:

{
    "namespace": "agile.data.avro",
    "name": "Email",
    "type": "record",
    "fields": [
        {"name": "message_id", "type": ["string", "null"]},
        {"name": "from", "type": [{"type": "array", "items": "string"}, "null"]},
        {"name": "to", "type": [{"type": "array", "items": "string"}, "null"]},
        {"name": "cc", "type": [{"type": "array", "items": "string"}, "null"]},
        {"name": "bcc", "type": [{"type": "array", "items": "string"}, "null"]},
        {"name": "reply_to", "type": [{"type": "array", "items": "string"}, "null"]},
        {"name": "in_reply_to", "type": [{"type": "array", "items": "string"}, "null"]},
        {"name": "subject", "type": ["string", "null"]},
        {"name": "body", "type": ["string", "null"]},
        {"name": "date", "type": ["string", "null"]}
    ]
}

Pig to publish my Avros:

grunt emails = load '/me/tmp/emails' using AvroStorage();
grunt describe emails

emails: {message_id: chararray,from: {PIG_WRAPPER: (ARRAY_ELEM: chararray)},to: 
{PIG_WRAPPER

AVRO Schema Validator

2012-01-13 Thread Jason Rutherglen
Is there a command line way to validate an AVRO schema?


Re: Exposing a constant in an Avro Schema

2011-11-15 Thread Andrew Kenworthy
Hi Scott,

it's the latter I need; simply the ability to pass meta-data with my schema, so 
the user property is just what I need.

Thanks for your help!

Andrew




From: Scott Carey scottca...@apache.org
To: user@avro.apache.org user@avro.apache.org; Andrew Kenworthy 
adwkenwor...@yahoo.com
Sent: Monday, November 14, 2011 9:09 PM
Subject: Re: Exposing a constant in an Avro Schema


Named types (records, fields, fixed, enum) can store arbitrary user properties 
attached to the schema ( similar to doc but no special meaning).


Do you want this constant to be in every instance of your data object?  If so, 
the enum is one way to do it.  
If you simply want to push metadata along with the schema, use the schema 
properties, they are name-value pairs.  For example you can have myVersion 
attached to your schema for a record:


{"type": "record", "name": "bar.baz.FooRecord", "myVersion": "1.1", "fields": [
    {"name": "field1", "type": "int"},
    …
  ]
}
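
On the Java side, the same property can be attached and read back roughly like
this (a sketch; addProp/getProp handle string-valued properties):

    import org.apache.avro.Schema;

    public class SchemaPropSketch {
        public static void main(String[] args) {
            Schema record = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"bar.baz.FooRecord\","
                    + "\"fields\":[{\"name\":\"field1\",\"type\":\"int\"}]}");
            record.addProp("myVersion", "1.1");              // attach schema-level metadata
            System.out.println(record.getProp("myVersion")); // prints 1.1
        }
    }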

On 11/14/11 8:03 AM, Andrew Kenworthy adwkenwor...@yahoo.com wrote:


Hi,


I would like to embed a schema version number in the schema that I use for 
writing data: it would be read-only so that I can determine later on which 
version of my avro schema was used. The best I could come up with is to 
(ab)use an enum with a single value like this, as I couldn't find any way to 
define a constant:


{"type":"enum","name":"version_1_1","doc":"enum indicating avro write schema
version 1.1","symbols":["VERSION_1_1"]}


Is there a better way to register a constant value that has no meaning within 
the avro data file, other than to expose some kind of meta information?


Thanks,


Andrew Kenworthy




