Re: [ANNOUNCE] Apache Avro 1.10.1 released

2020-12-04 Thread Driesprong, Fokko
Awesome! Thanks for running the release. Great to see some continuity in
the Avro releases.

Cheers, Fokko

On Fri, 4 Dec 2020 at 21:19, Ryan Skraba wrote:

> Please note: I mistakenly sent this same message earlier today with the
> wrong subject!  It is, in fact, 1.10.1 that was released.  My apologies!
>
> The Apache Avro community is pleased to announce the release of Avro
> 1.10.1!
>
> All signed release artifacts, signatures and verification instructions
> can be found here: https://avro.apache.org/releases.html
>
> This release includes 33 Jira issues, including some interesting features:
>
> C#: AVRO-2750 Support for enum defaults
> C++: AVRO-2891 Expose last sync offset written on DataFileWriter
> Java: AVRO-2924 SpecificCompiler add 'LocalDateTime' logical type
> Java: AVRO-2937 Expose some missing flags in SpecificCompilerTool
> PHP: AVRO-2096 Fixes to missing functions
> Ruby: AVRO-2907 Ruby schema.single_object_schema_fingerprint is reversed
>
> Migration notes:
> Java: AVRO-2817 Turn off validateDefaults when reading legacy Avro files
> Python: AVRO-2656 avro-python package is now the preferred python3 library
>   and avro-python3 is prepared to be deprecated
>
> And of course upgraded dependencies to latest versions, CVE fixes and more:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20AVRO%20AND%20fixVersion%20%3D%201.10.1
>
> The link to all fixed JIRA issues and a brief summary can be found at:
> https://github.com/apache/avro/releases/tag/release-1.10.1
>
> In addition, language-specific release artifacts are available:
>
> * C#: https://www.nuget.org/packages/Apache.Avro/1.10.1
> * Java: from Maven Central,
> * Javascript: https://www.npmjs.com/package/avro-js/v/1.10.1
> * Python 2: https://pypi.org/project/avro/1.10.1/
> * Python 3: https://pypi.org/project/avro-python3/1.10.1/
> * Ruby: https://rubygems.org/gems/avro/versions/1.10.1
>
> Thanks to everyone for contributing!
>


Re: [Announce] Please welcome Ryan Skraba to the Apache Avro PMC

2020-09-17 Thread Driesprong, Fokko
Awesome, well deserved Ryan!

Cheers, Fokko

On Thu, 17 Sep 2020 at 18:06, Micah Kornfield wrote:

> Congratulations!
>
> On Mon, Sep 14, 2020 at 10:07 AM Sean Busbey  wrote:
>
> > Hi folks!
> >
> > On behalf of the Apache Avro PMC I am pleased to announce that Ryan
> > Skraba has accepted our invitation to become a PMC member. We
> > appreciate Ryan stepping up to take more responsibility in the
> > project.
> >
> > Please join me in welcoming Ryan to the Avro PMC!
> >
> > As a reminder, if anyone would like to nominate another person as a
> > committer or PMC member, even if you are not currently a committer or
> > PMC member, you can always drop a note to priv...@avro.apache.org to
> > let us know.
> >
> > --
> > busbey
> >
>


Re: Schema Evolution Removing fields

2020-08-19 Thread Driesprong, Fokko
Hi Kishore,

This isn't possible. The fields should have been optional in the first
place. I always use the compatibility table from Confluent:
https://docs.confluent.io/current/schema-registry/avro.html

The way to stay compatible in both directions is to only add or delete
optional fields.
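To illustrate why the change in the question is compatible in one direction only, here is a minimal Python sketch of the field resolution rule. It is a deliberately simplified model and not the Avro library: `can_read`, `type_matches` and the dict layout are made up for this example, and only plain type names and unions are handled (no type promotion).

```python
def type_matches(writer_type, reader_type):
    """Simplified check: can data written as writer_type be read as reader_type?"""
    writer_branches = writer_type if isinstance(writer_type, list) else [writer_type]
    reader_branches = reader_type if isinstance(reader_type, list) else [reader_type]
    # Every branch the writer may produce must exist on the reader side.
    return all(branch in reader_branches for branch in writer_branches)


def can_read(reader_fields, writer_fields):
    """Return True if a reader with reader_fields can decode anything written with writer_fields."""
    for name, reader_field in reader_fields.items():
        writer_field = writer_fields.get(name)
        if writer_field is None:
            # Field is missing from the written data: the reader needs a default.
            if "default" not in reader_field:
                return False
        elif not type_matches(writer_field["type"], reader_field["type"]):
            return False
    # Writer fields that the reader does not know are simply skipped.
    return True


required = {"field1": {"type": "long"}}                             # old schema
optional = {"field1": {"type": ["null", "long"], "default": None}}  # new schema
removed = {}                                                        # field dropped

print(can_read(optional, required))  # True: new schema reads old data
print(can_read(required, optional))  # False: old schema may meet a null
print(can_read(removed, optional))   # True: unknown writer field is skipped
print(can_read(optional, removed))   # True: the default fills the gap
```

Under these assumptions, once a field is optional with a default on both sides it can be added or dropped freely, while a required field cannot be removed without breaking one direction.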

Hope this helps.

Kind regards, Fokko

On Tue, 7 Jul 2020 at 20:57, KV 59 wrote:

> Hi,
>
> I'm trying to remove a field which is required in the old schema (which is
> by default)
>
> long field1;
>
>
> To prepare for removing it, in the updated schema I made it optional and
> nullable:
>
> union {null, long} field1 = null;
>
>
> Now the new schema can read the objects from the old schema (Backward
> compatible) but not the other way round (Forward compatible).
>
> What is the best strategy to have both?
>
> Thanks
> Kishore
>
>


Re: does anyone use IDL ?

2020-07-03 Thread Driesprong, Fokko
Hi Rog,

I know a few companies that use IDL, but it isn't as popular as plain AVSC.
It depends on your use case. If you have a very hierarchical schema, then
IDL might be nice to use. If you have a simple message that you want to pass
around, then just an AVSC might do the trick.
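For a simple message, the AVSC route can be as small as the following Python sketch, which builds the JSON schema text by hand. The record, namespace and field names are made up for the example; the enum-typed field with a default is included because that is one of the cases discussed in this thread.

```python
import json

# Hypothetical record; all names here are invented for illustration.
schema = {
    "type": "record",
    "name": "Card",
    "namespace": "example",
    "fields": [
        {"name": "rank", "type": "int"},
        {
            "name": "suit",
            "type": {
                "type": "enum",
                "name": "Suit",
                "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
            },
            # In AVSC, the default of an enum-typed field is one of its symbols.
            "default": "SPADES",
        },
    ],
}

avsc_text = json.dumps(schema, indent=2)
print(avsc_text)
```

The printed text can be saved as a .avsc file and handed to whichever Avro implementation is in use.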

Cheers, Fokko

On Fri, 19 Jun 2020 at 13:00, roger peppe wrote:

> Hi,
>
> Because the syntax is so much more convenient, we thought it would be a
> good idea to switch from using the raw JSON AVSC format to using IDL (avdl
> files).
>
> Almost immediately, I discovered three quite significant bugs:
>
> 1. you can't use default values for record-typed fields:
>    https://issues.apache.org/jira/browse/AVRO-2867
> 2. you can't specify default values on enum-typed fields:
>    https://issues.apache.org/jira/browse/AVRO-2866
> 3. you can't specify arrays, maps or unions that contain some of the
>    logical types mentioned in the specification:
>    https://issues.apache.org/jira/projects/AVRO/issues/AVRO-2864
>
> I couldn't find any existing issues, and the first two in particular feel
> basic enough that surely if there were a significant number of people using
> IDL, they'd have at least reported them...
>
> The second issue in particular is a real problem for us: it *can* be
> worked around by using a union with null, but that has significant
> usability and performance implications.
>
> So my underlying question is: is it a bad idea for us to be using IDL?
> Perhaps there are other, more significant issues that we haven't
> encountered yet that mean that people haven't decided to use it (and hence
> haven't encountered bugs like these). If so, it would be good to know!
>
>   cheers,
> rog.
>
>


Re: 1.10.0 Release?

2020-04-22 Thread Driesprong, Fokko
Thanks Corey,

Please give the 1.10-SNAPSHOT a try, and if you see any issues, let me
know. This will help us catch bugs beforehand.

Cheers, Fokko

On Tue, 14 Apr 2020 at 23:59, Corey Fritz wrote:

> Thanks! Will keep an eye out...
>
> *Corey Fritz | Architect*
> corey.fr...@snagajob.com
> *office* | 866.227.0466
>
>
> On Tue, Apr 14, 2020 at 5:32 PM Ismaël Mejía  wrote:
>
>> So far we have discussed cutting the branch for 1.10.0 and starting
>> the release next month (May 2020).
>> I will send a reminder soon to Avro's dev@ mailing list so we can start
>> triaging and preparing the release.
>>
>>
>> https://lists.apache.org/thread.html/rb9693e90a8141b2c9f0f9c901c488a079fa6245b2e4d475e022ab1e8%40%3Cdev.avro.apache.org%3E
>>
>>
>>
>> On Tue, Apr 14, 2020 at 10:44 PM Corey Fritz 
>> wrote:
>>
>>> Any estimate available on when 1.10.0 will be released? We have a strong
>>> desire to use the C#  POCO serializers added in this ticket:
>>>
>>> https://issues.apache.org/jira/browse/AVRO-2389
>>>
>>> *Corey Fritz | Architect*
>>> corey.fr...@snagajob.com
>>> *office* | 866.227.0466
>>>
>>>
>>>
>>> The largest platform for hourly work.
>>>
>>
>
>
> The largest platform for hourly work.
>


Re: Avro Spec: encoding unions

2020-03-29 Thread Driesprong, Fokko
Hi Anh,

It looks like you've found an inconsistency in the docs there. I think
we need to update the docs, and state that an int is being written.
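As far as I understand the spec, int and long share the same variable-length zig-zag coding, so the union index ends up as the same bytes on the wire either way and the difference is one of wording. A minimal Python sketch of that coding follows; the function name `zigzag_varint` and the `bits` parameter are made up for this example, which is not the Avro implementation.

```python
def zigzag_varint(value, bits):
    """Encode a signed integer with zig-zag + varint coding (bits=32 for int, 64 for long)."""
    # Zig-zag maps small negative and positive numbers to small unsigned ones:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    unsigned = (value << 1) ^ (value >> (bits - 1))
    out = bytearray()
    while unsigned > 0x7F:
        out.append((unsigned & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        unsigned >>= 7
    out.append(unsigned)  # last byte, continuation bit clear
    return bytes(out)


# A union index is a small non-negative number, so both widths agree.
for index in range(5):
    assert zigzag_varint(index, 32) == zigzag_varint(index, 64)

print(zigzag_varint(1, 32).hex())  # 02
print(zigzag_varint(1, 64).hex())  # 02
```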

Stay strong!

Cheers, Fokko

On Fri, 20 Mar 2020 at 07:58, Anh Le wrote:

> Hi guys,
>
> I'm reading the current Avro Spec. It states that:
>
> > A union is encoded by first writing a long value indicating the
> zero-based position within the union of the schema of its value. The value
> is then encoded per the indicated schema within the union.
>
> But as I dive through the code base, for example:
> https://github.com/rdblue/avro-java/blob/master/avro/src/main/java/org/apache/avro/generic/GenericDatumWriter.java#L123-L125,
> I see there's no long value here. We've got an Int instead.
>
> Would you please tell me if there's any misunderstanding here.
>
> Thank you (and be strong)!
>


Re: Best-practice for integrating Jackson DTO's and Avro DTO's?

2020-02-17 Thread Driesprong, Fokko
Hi Marko,

You can easily convert Avro to JSON:
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/io/JsonDecoder.java
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/io/JsonEncoder.java

However, if you use union values, the structure is a bit awkward. For
example, it's very common for a JSON-encoded value to allow a value that's
either null or string. In Avro, that's trivially expressed as the union type
["null", "string"]. With conventional JSON, a string value "foo" would be
encoded just as "foo", which is easily distinguished from null when
decoding. However, when using the Avro JSON format it must be encoded
as {"string": "foo"}.
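The wrapping described above can be sketched in a few lines of Python. This only models the shape of the output for a union value; `avro_json_union` is a made-up helper for the example, not a function of any Avro library.

```python
import json


def avro_json_union(value, branch_name):
    """Wrap a union value the way the Avro JSON encoding does (null stays bare)."""
    if value is None:
        return None
    return {branch_name: value}


print(json.dumps(avro_json_union("foo", "string")))  # {"string": "foo"}
print(json.dumps(avro_json_union(None, "null")))     # null
```

A Jackson DTO with a plain nullable String field would expect "foo" here, which is why the two JSON shapes do not line up directly.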

You could also define the whole protocol in Avro IDL (an .avdl file) and
generate POJO's from there:
https://avro.apache.org/docs/current/idl.html

Hope this helps,
Cheers, Fokko


On Tue, 18 Feb 2020 at 00:21, marko wrote:

> Our group has a number of services that interact with REST API’s.   Those
> REST API documents are built by converting Domain Objects to Document
> DTO’s, then the DTO’s are serialized to JSON via Jackson.
>
> A (made-up) example might be a Car Domain Object that exposes itself and
> its Parts with REST endpoints.   So there is a set of DTO’s that we
> maintain to assemble various REST Doc types (Engine, Wheel, Assembly,
> DamagedParts, etc…).
>
> Now we want to introduce Kafka messaging with structured Avro Schemas and
> generated Java Messages.   The generated Java Avro Message Classes are what
> I'm referring to as "Avro DTO’s" (analogous to the REST
> data-representation of the Domain).
>
> Ideally we could reuse our existing Json-DTO's and Domain<->Json-DTO
> Converters, but I’m not sure if/how that is possible?
>
> Is there any way to stay DRY given we have Domain<->Json-DTO Converters,
> and now it seems like we have to create duplicate Domain<->Avro-DTO
> Converters?
>
> Is there any well-known way to streamline this situation that you know of?
>
> Thank you for your help.
>
>
>
>
> --
> Sent from: http://apache-avro.679487.n3.nabble.com/Avro-Users-f679479.html
>


Re: [ANNOUNCE] Apache Avro 1.9.2 released

2020-02-17 Thread Driesprong, Fokko
Awesome, thanks Ryan for running the release!

Cheers, Fokko

On Thu, 13 Feb 2020 at 16:27, Ryan Skraba wrote:

> The Apache Avro community is pleased to announce the release of Avro 1.9.2!
>
> The link to all fixed JIRA issues and a brief summary can be found at:
> https://github.com/apache/avro/releases/tag/release-1.9.2
>
> This release includes 73 Jira issues:
>
> https://jira.apache.org/jira/issues/?jql=project%20%3D%20AVRO%20AND%20fixVersion%20%3D%201.9.2
>
> Some bug fixes:
> * C#: AVRO-2606 handle multidimensional arrays of custom types
> * Java: AVRO-2592 Avro decimal fails on some conditions
> * Java: AVRO-2641 Generated code results in java.lang.ClassCastException
> * Java: AVRO-2663 Projection on nested records does not work
> * Python: AVRO-2429 unknown logical types should fall back
> Improvements:
> * Java: AVRO-2247 Improve Java reading performance with a new reader
> * Python: AVRO-2104 Schema normalisation and fingerprint support for
> Python 3
> Work to unify Python2 and Python3 APIs in preparation for sunset.
> Improved tests
> Improved, more reliable builds.
> Improved readability
> Upgraded dependencies to latest versions, including CVE fixes.
> And more...
>
> This release can be downloaded from:
> https://www.apache.org/dyn/closer.cgi/avro/
>
> The released artifacts are available:
> * C#: https://www.nuget.org/packages/Apache.Avro/1.9.2
> * Java: from Maven Central,
> * Javascript: https://www.npmjs.com/package/avro-js/v/1.9.2
> * Python 2: https://pypi.org/project/avro/1.9.2/
> * Python 3: https://pypi.org/project/avro-python3/1.9.2.1/
>   - See https://issues.apache.org/jira/browse/AVRO-2737
> * Ruby: https://rubygems.org/gems/avro/versions/1.9.2
>
> Thanks to everyone for contributing!
>
> Ryan Skraba
>


Re: avro-tools illegal reflective access warnings

2020-01-20 Thread Driesprong, Fokko
>> "fix"
>> >> >>> these module errors, and deactivate the NativeCodeLoader logs with an
>> >> >>> explicit log4j.properties:
>> >> >>>
>> >> >>> java -Dlog4j.configuration=file:///tmp/log4j.properties --add-opens
>> >> >>> java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar
>> >> >>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar
>> >> >>> tojson x.out
>> >> >>
>> >> >>
>> >> >> Thanks for that suggestion! I'm afraid I'm not familiar with log4j
>> >> >> properties files though. What do I need to put in /tmp/log4j.properties
>> >> >> to make this work?
>> >> >>
>> >> >>> None of that is particularly satisfactory, but it could be a
>> >> >>> workaround for your immediate use.
>> >> >>
>> >> >>
>> >> >> Yeah, not ideal, because if something goes wrong, stdout will be
>> >> >> corrupted, but at least some noise should go away :)
>> >> >>
>> >> >>> I'd also like to see a more unified experience with the CLI tool for
>> >> >>> documentation and usage.  The current state requires a bit of Avro
>> >> >>> expertise to use, but it has some functions that would be pretty
>> >> >>> useful for a user working with Avro data.  I raised
>> >> >>> https://issues.apache.org/jira/browse/AVRO-2688 as an improvement.
>> >> >>>
>> >> >>> In my opinion, a schema compatibility tool would be a useful and
>> >> >>> welcome feature!
>> >> >>
>> >> >>
>> >> >> That would indeed be nice, but in the meantime, is there really
>> >> >> nothing in the avro-tools commands that uses a chosen schema to read a
>> >> >> data file written with some other schema? That would give me what I'm
>> >> >> after currently.
>> >> >>
>> >> >> Thanks again for the helpful response.
>> >> >>
>> >> >>   cheers,
>> >> >>     rog.
>> >> >>
>> >> >>>
>> >> >>> Best regards, Ryan
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Thu, Jan 16, 2020 at 12:25 PM roger peppe wrote:
>> >> >>> >
>> >> >>> > Hi Fokko,
>> >> >>> >
>> >> >>> > Thanks for your swift response!
>> >> >>> >
>> >> >>> > Stdout and stderr definitely seem to be merged on this platform at
>> >> >>> > least. Here's a sample:
>> >> >>> >
>> >> >>> > % avrotool random --count 1 --schema '"int"'  x.out
>> >> >>> > % avrotool tojson x.out > x.json
>> >> >>> > % cat x.json
>> >> >>> > 125140891
>> >> >>> > WARNING: An illegal reflective access operation has occurred
>> >> >>> > WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/rog/other/avro-tools-1.9.1.jar) to method sun.security.krb5.Config.getInstance()
>> >> >>> > WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
>> >> >>> > WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
>> >> >>> > WARNING: All illegal access operations will be denied in a future release
>> >> >>> > 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> >> >>> > %
>> >> >>> >
>> >> >>> > I've just verified that it's not a problem with the java executable
>> >> >>> > itself (I ran a program that printed to System.err and the text
>> >> >>> > correctly goes to the standard error).
>> >> >>> >
>> >> >>> > > Regarding the documentation, the CLI itself contains info on all
>> >> >>> > > the available commands. Also, there are excellent online resources:
>> >> >>> > > https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
>> >> >>> > > Is there anythi

Re: avro-tools illegal reflective access warnings

2020-01-17 Thread Driesprong, Fokko
Hi Roger,

We also have Java 11 in our CI, but it might be that there are still some
issues with it. At least, I haven't battle-tested Avro with Java 11 myself.
To skip the tests, you can pass a flag to Maven:

# Make sure that you're in the Java project
cd lang/java/
mvn clean install -DskipTests

Let me know if this works for you.

Cheers, Fokko


On Thu, 16 Jan 2020 at 18:48, roger peppe wrote:

> On Thu, 16 Jan 2020 at 17:21, Ryan Skraba  wrote:
>
>> Hello!  For a simple, silent log4j, I use:
>>
>> $ cat /tmp/log4j.properties
>> log4j.rootLogger=off
>>
>
> Apparently passing those flags has sorted my stdout/stderr issue as well as
> suppressing the warnings. I wonder what was going on there. Thanks very
> much!
>
>
>> I didn't find anything currently in the avro-tools that uses both
>> reader and writer schemas while deserializing data...  It should be a
>> pretty easy feature to add as an option to the DataFileReadTool
>> (a.k.a. tojson)!
>>
>> You are correct about running ./build.sh dist in the java directory --
>> it fails with JDK 11 (likely fixable:
>> https://issues.apache.org/jira/browse/MJAVADOC-562).
>>
>> You should probably do a simple mvn clean install instead and find the
>> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
>> should work with JDK11 without any problem (well-tested in the build).
>>
>
> I tried that (I ran it in the lang/java directory) and I still get a
> failure: https://gist.github.com/rogpeppe/e7f199c6fefb9c05eedad9e9841de14f
>
> Maybe there's a way to build without running the tests, perhaps? Please
> pardon my ignorance here.
>
>   cheers,
>


Re: avro-tools illegal reflective access warnings

2020-01-16 Thread Driesprong, Fokko
Hi Rog,

This is actually a warning produced by the Hadoop library that we're
using. Please note that this isn't part of the stdout:

$ find /tmp/tmp
/tmp/tmp
/tmp/tmp/._SUCCESS.crc
/tmp/tmp/part-0-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
/tmp/tmp/.part-0-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
/tmp/tmp/_SUCCESS

$ avro-tools tojson /tmp/tmp/part-0-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{"line_of_text":{"string":"Hello"}}
{"line_of_text":{"string":"World"}}

$ avro-tools tojson /tmp/tmp/part-0-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro > /tmp/tmp/data.json
20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

$ cat /tmp/tmp/data.json
{"line_of_text":{"string":"Hello"}}
{"line_of_text":{"string":"World"}}

So when you pipe the data, it doesn't include the warnings.
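To check the same thing from a script, the two streams can be captured separately. The following Python sketch uses a small stand-in child process instead of avro-tools, so it runs anywhere; the child program and its messages are made up for the example, and a real check would put the actual avro-tools command line in its place.

```python
import subprocess
import sys

# Stand-in for a tool such as avro-tools: data on stdout, a warning on stderr.
child_code = (
    "import sys\n"
    "print('{\"line_of_text\": \"Hello\"}')\n"
    "print('WARN util.NativeCodeLoader: ...', file=sys.stderr)\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,  # collect stdout and stderr in separate buffers
    text=True,
    check=True,
)

print("stdout:", result.stdout.strip())
print("stderr:", result.stderr.strip())
```

If a warning shows up in the stdout buffer for the real tool, then it is being written to standard output and a shell redirect will capture it too.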

Regarding the documentation, the CLI itself contains info on all the
available commands. Also, there are excellent online resources:
https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
Is there anything specific that you're missing?

Hope this helps.

Cheers, Fokko

On Thu, 16 Jan 2020 at 09:30, roger peppe wrote:

> Hi,
>
> I've been trying to use avro-tools to verify Avro implementations, and
> I've come across an issue. Perhaps someone here might be able to help?
>
> When I run avro-tools with some subcommands, it prints a bunch of warnings
> (see below) to the standard output. Does anyone know a way to disable this?
> I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools 1.9.1.
>
> The warnings are somewhat annoying because they can corrupt output of
> tools that print to the standard output, such as recodec.
>
> Aside: is there any documentation for the commands in avro-tools? Some
> seem to have some command-line help (though unfortunately there doesn't
> seem to be a standard way of showing it), but that help often doesn't
> describe what the command actually does.
>
> Here's the output that I see:
>
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by
> org.apache.hadoop.security.authentication.util.KerberosUtil
> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
> sun.security.krb5.Config.getInstance()
> WARNING: Please consider reporting this to the maintainers of
> org.apache.hadoop.security.authentication.util.KerberosUtil
> WARNING: Use --illegal-access=warn to enable warnings of further illegal
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
>
>   cheers,
> rog.
>
>


Re: More idiomatic JSON encoding for unions

2020-01-09 Thread Driesprong, Fokko
Thanks for chipping in Zoltan and Sean. I did not plan to change the
current JSON encoder. My initial suggestion would make this an option that
the user can set. The default will be the current situation, so nothing
should change when upgrading to a newer version of Avro.
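A rough Python sketch of how such an opt-in could behave, based on the rule proposed earlier in this thread (omit the type name only when every union member maps to a distinct JSON token type). All names here, including the `natural` flag, are hypothetical and not part of any Avro API; the token-type table is my reading of the JSON encoding section of the spec.

```python
# JSON token type produced by each Avro type in the JSON encoding.
JSON_TOKEN_TYPE = {
    "null": "null", "boolean": "boolean",
    "int": "number", "long": "number", "float": "number", "double": "number",
    "string": "string", "bytes": "string", "enum": "string", "fixed": "string",
    "record": "object", "map": "object", "array": "array",
}


def can_omit_type_name(union_branches):
    """True if every branch of the union maps to a different JSON token type."""
    tokens = [JSON_TOKEN_TYPE[branch] for branch in union_branches]
    return len(tokens) == len(set(tokens))


def encode_union(value, branch, union_branches, natural=False):
    """Encode a union value; natural=False keeps the current Avro JSON format."""
    if value is None:
        return None
    if natural and can_omit_type_name(union_branches):
        return value  # unambiguous, so the bare value is enough
    return {branch: value}


print(encode_union("foo", "string", ["null", "string"]))                # {'string': 'foo'}
print(encode_union("foo", "string", ["null", "string"], natural=True))  # foo
print(encode_union(1, "int", ["int", "double"], natural=True))          # {'int': 1}
```

Keeping the flag off by default matches the point above: existing consumers keep seeing the current structure unless they opt in.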

Cheers, Fokko

On Wed, 8 Jan 2020 at 21:39, Sean Busbey wrote:

> I agree with Zoltan here. We have a really long history of maintaining
> compatibility for encoders.
>
> On Tue, Jan 7, 2020 at 10:06 AM Zoltan Farkas 
> wrote:
>
>> Fokko,
>>
>> I am not sure we should be changing the existing json encoder,
>> I think we should just add another encoder, and devs can use either one
>> of them based on their use case… and stay backward compatible.
>>
>> we should maybe standardize the content types for them… I have seen
>> application/avro being used for binary, we could have for json:
>> application/avro+json for the current format, application/avro.2+json for
>> the new format….
>>
>> At some point in the future we could deprecate the old one…
>>
>> —Z
>>
>>
>> On Jan 7, 2020, at 2:41 AM, Driesprong, Fokko 
>> wrote:
>>
>> I would be a great fan of this as well. This also bothered me. The tricky
>> part here is to see when to release this because it will break the existing
>> JSON structure. We could make this configurable as well.
>>
>> Cheers, Fokko
>>
>> On Mon, 6 Jan 2020 at 22:36, roger peppe wrote:
>>
>>> That's great, thanks! I thought this would probably have come up before.
>>>
>>> Have you written down your changes in a somewhat more formal
>>> specification document, by any chance?
>>>
>>>   cheers,
>>> rog.
>>>
>>>
>>> On Mon, 6 Jan 2020, 18:50 zoly farkas,  wrote:
>>>
>>>> I think there is consensus that this should be implemented, see [AVRO-1582]
>>>> Json serialization of nullable fileds and fields with default values
>>>> improvement. - ASF JIRA
>>>> <https://issues.apache.org/jira/browse/AVRO-1582>
>>>>
>>>> Here is a live example to get some sample data in avro json:
>>>> https://demo.spf4j.org/example/records/1?_Accept=application/avro%2Bjson
>>>> and the "Natural"
>>>> https://demo.spf4j.org/example/records/1?_Accept=application/json using
>>>> the encoder suggested as implementation in the jira.
>>>>
>>>> Somebody needs to find the time to do the work to integrate this...
>>>>
>>>> --Z
>>>>
>>>>
>>>>
>>>>
>>>> On Monday, January 6, 2020, 12:36:44 PM EST, roger peppe <
>>>> rogpe...@gmail.com> wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> The JSON encoding in the specification
>>>> <https://avro.apache.org/docs/current/spec.html#json_encoding> includes
>>>> an explicit type name for all kinds of object other than null. This means
>>>> that a JSON-encoded Avro value with a union is very rarely directly
>>>> compatible with normal JSON formats.
>>>>
>>>> For example, it's very common for a JSON-encoded value to allow a value
>>>> that's either null or string. In Avro, that's trivially expressed as the
>>>> union type ["null", "string"]. With conventional JSON, a string value
>>>> "foo" would be encoded just as "foo", which is easily distinguished
>>>> from null when decoding. However when using the Avro JSON format it
>>>> must be encoded as {"string": "foo"}.
>>>>
>>>> This means that Avro JSON-encoded values don't interchange easily with
>>>> other JSON-encoded values.
>>>>
>>>> AFAICS the main reason that the type name is always required in
>>>> JSON-encoded unions is to avoid ambiguity. This particularly applies to
>>>> record and map types, where it's not possible in general to tell which
>>>> member of the union has been specified by looking at the data itself.
>>>>
>>>> However, that reasoning doesn't apply if all the members of the union
>>>> can be distinguished from their JSON token type.
>>>>
>>>> I am considering using a JSON encoding that omits the type name when
>>>> all the members of the union encode to distinct JSON token types (the JSON
>>>> token types being: null, boolean, string, number, object and array).
>>>>
>>>> For example, JSON-encoded values using the Avro schema ["null",
>>>> "string", "int"] would encode as the literal values themselves (e.g.
>>>> null, "foo", 999), but JSON-encoded values using the Avro schema ["int",
>>>> "double"] would require the type name because the JSON lexeme doesn't
>>>> distinguish between different kinds of number.
>>>>
>>>> This would mean that it would be possible to represent a significant
>>>> subset of "normal" JSON schemas with Avro. It seems to me that would
>>>> potentially be very useful.
>>>>
>>>> Thoughts? Is this a really bad idea to be contemplating? :)
>>>>
>>>>   cheers,
>>>> rog.
>>>>
>>>>
>>>>
>>


Re: More idiomatic JSON encoding for unions

2020-01-06 Thread Driesprong, Fokko
I would be a great fan of this as well. This also bothered me. The tricky
part here is to see when to release this because it will break the existing
JSON structure. We could make this configurable as well.

Cheers, Fokko

On Mon, 6 Jan 2020 at 22:36, roger peppe wrote:

> That's great, thanks! I thought this would probably have come up before.
>
> Have you written down your changes in a somewhat more formal specification
> document, by any chance?
>
>   cheers,
> rog.
>
>
> On Mon, 6 Jan 2020, 18:50 zoly farkas,  wrote:
>
>> I think there is consensus that this should be implemented, see [AVRO-1582]
>> Json serialization of nullable fileds and fields with default values
>> improvement. - ASF JIRA 
>>
>> Here is a live example to get some sample data in avro json:
>> https://demo.spf4j.org/example/records/1?_Accept=application/avro%2Bjson
>> and the "Natural"
>> https://demo.spf4j.org/example/records/1?_Accept=application/json using
>> the encoder suggested as implementation in the jira.
>>
>> Somebody needs to find the time to do the work to integrate this...
>>
>> --Z
>>
>>
>>
>>
>> On Monday, January 6, 2020, 12:36:44 PM EST, roger peppe <
>> rogpe...@gmail.com> wrote:
>>
>>
>> Hi,
>>
>> The JSON encoding in the specification includes
>> an explicit type name for all kinds of object other than null. This means
>> that a JSON-encoded Avro value with a union is very rarely directly
>> compatible with normal JSON formats.
>>
>> For example, it's very common for a JSON-encoded value to allow a value
>> that's either null or string. In Avro, that's trivially expressed as the
>> union type ["null", "string"]. With conventional JSON, a string value
>> "foo" would be encoded just as "foo", which is easily distinguished from
>> null when decoding. However when using the Avro JSON format it must be
>> encoded as {"string": "foo"}.
>>
>> This means that Avro JSON-encoded values don't interchange easily with
>> other JSON-encoded values.
>>
>> AFAICS the main reason that the type name is always required in
>> JSON-encoded unions is to avoid ambiguity. This particularly applies to
>> record and map types, where it's not possible in general to tell which
>> member of the union has been specified by looking at the data itself.
>>
>> However, that reasoning doesn't apply if all the members of the union can
>> be distinguished from their JSON token type.
>>
>> I am considering using a JSON encoding that omits the type name when all
>> the members of the union encode to distinct JSON token types (the JSON
>> token types being: null, boolean, string, number, object and array).
>>
>> For example, JSON-encoded values using the Avro schema ["null",
>> "string", "int"] would encode as the literal values themselves (e.g. null,
>> "foo", 999), but JSON-encoded values using the Avro schema ["int",
>> "double"] would require the type name because the JSON lexeme doesn't
>> distinguish between different kinds of number.
>>
>> This would mean that it would be possible to represent a significant
>> subset of "normal" JSON schemas with Avro. It seems to me that would
>> potentially be very useful.
>>
>> Thoughts? Is this a really bad idea to be contemplating? :)
>>
>>   cheers,
>> rog.
>>
>>
>>


New Committer: Ryan Skraba

2019-12-17 Thread Driesprong, Fokko
Folks,

The Project Management Committee (PMC) for Apache Avro has invited Ryan
Skraba to become a committer and we are pleased to announce that he has
accepted. Ryan is actively fixing bugs by providing patches and reviewing
pull requests by others. We're very happy to have him on board.

Being a committer enables easier contribution to the project since there is
no need to go via the patch submission process. This should enable better
productivity.

Please join me in congratulating Ryan on his recognition of great work thus
far in our community.

Cheers, Fokko


Re: Announcement: Avro Schema Viewer

2019-09-22 Thread Driesprong, Fokko
Awesome work Robin, thanks for sharing!

Cheers, Fokko

On Sat, 21 Sep 2019 at 19:57, Robin Trietsch wrote:

> Hey Brian,
>
> Thanks! At the moment it only supports one top level schema with different
> versions. But feel free to implement it :)
>
> Regards,
> Robin
>
> On 21 Sep 2019, at 18:39, Brian Lachniet  wrote:
>
> This is really cool, thank you for sharing! I can't wait to try this out
> on our schemas at work.
>
> Does this support multiple, top-level schemas? Or does it support only
> multiple versions of one top-level schema?
>
> On Fri, Sep 20, 2019 at 4:42 AM Robin Trietsch 
> wrote:
>
>> Hi Avro users!
>>
>> We'd like to introduce you to a tool that we built at bol.com (largest
>> online retailer of the Netherlands), that can be used to visualize Avro
>> schemas (in *.avsc* format). Below is a screenshot of an example
>> deployment.
>>
>> 
>> Try it out yourself at: https://bolcom.github.io/avro-schema-viewer
>> Learn more and view the source at:
>> https://github.com/bolcom/avro-schema-viewer
>>
>> Regards,
>> Mike Junger
>> Robin Trietsch
>>
>
>
> --
> Brian Lachniet
> Software Engineer
> E: blachn...@gmail.com | blachniet.com 
>  
>
>
>


Re: Release notes for Avro 1.9.1

2019-09-13 Thread Driesprong, Fokko
Thank you for the question Javier,

The 1.9.1 release patches some regressions that were discovered in
1.9.0. The release notes are here:
https://github.com/apache/avro/releases/tag/release-1.9.1

Cheers, Fokko Driesprong


On Wed, 11 Sep 2019 at 17:46, Javier Holguera <
javier.holgu...@gmail.com> wrote:

> Hi,
>
> Seems like there is a new version of Avro. Are there release notes
> anywhere? Or at least a way to see what has changed?
>
> Avro 1.9.0 was a big release (in the works for many years) with lots of
> new features / bug fixes. That probably introduced a few regressions in the
> process. Does 1.9.1 address all of those regressions, or are there any big,
> blocking bugs that are still pending? I'm asking in the context of the Java
> implementation.
>
> Thanks!
>


Re: [Announce] Please welcome Nándor Kollár to the Apache Avro PMC

2019-08-31 Thread Driesprong, Fokko
Welcome Nándor, great to have you on the PMC! 

On Sat, 31 Aug 2019 at 09:20, Niels Basjes wrote:

> Welcome!
>
> On Fri, Aug 30, 2019, 23:39 Brian Lachniet  wrote:
>
> > Congratulations, Nándor!
> >
> > On Fri, Aug 30, 2019, 5:37 PM Sean Busbey  wrote:
> >
> >> Hi folks!
> >>
> >> On behalf of the Apache Avro PMC I am pleased to announce that Nándor
> >> Kollár has accepted our invitation to become a PMC member. We
> >> appreciate Nándor stepping up to take more responsibility in the
> >> project.
> >>
> >> Please join me in welcoming Nándor to the Avro PMC!
> >>
> >> As a reminder, if anyone would like to nominate another person as a
> >> committer or PMC member, even if you are not currently a committer or
> >> PMC member, you can always drop a note to priv...@avro.apache.org to
> >> let us know.
> >>
> >
>


Re: Should a Schema be serializable in Java?

2019-07-18 Thread Driesprong, Fokko
Thank you Ryan, I have a few comments on Github. Looks good to me.

Cheers, Fokko

On Thu, Jul 18, 2019 at 11:58 Ryan Skraba wrote:

> Hello!  I'm motivated to see this happen :D
>
> +Zoltan, the original author.  I created a PR against apache/avro master
> here: https://github.com/apache/avro/pull/589
>
> I cherry-picked the commit from your fork, and reapplied
> spotless/checkstyle.  I hope this is the correct way to preserve authorship
> and that I'm not stepping on any toes!
>
> Can someone take a look at the above PR?
>
> Best regards,
>
> Ryan
>
> On Tue, Jul 16, 2019 at 11:58 AM Ismaël Mejía  wrote:
>
>> Yes probably it is overkill to warn given the examples you mention.
>> Also your argument towards reusing the mature (and battle tested)
>> combination of Schema.Parser + String serialization makes sense.
>>
>> Adding this to 1.9.1 will be an extra selling point for projects
>> wanting to migrate to the latest version of Avro so it sounds good to
>> me but you should add it to master and then we can cherry pick it from
>> there.
>>
>>
>> On Tue, Jul 16, 2019 at 11:16 AM Ryan Skraba  wrote:
>> >
>> > Hello!  Thanks for the reference to AVRO-1852. It's exactly what I was
>> looking for.
>> >
>> > I agree that Java serialization shouldn't be used for anything
>> cross-platform, or (in my opinion) used for any data persistence at all.
>> Especially not for an Avro container file or sending binary data through a
>> messaging system...
>> >
>> > But Java serialization is definitely useful and used for sending
>> instances of "distributed work" implemented in Java from node to node in a
>> cluster.  I'm not too worried about existing connectors -- we can see that
>> each framework has "solved" the problem one at a time.  In addition to
>> Flink, there's
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroUtils.java#L29
>> and
>> https://github.com/apache/spark/blob/3663dbe541826949cecf5e1ea205fe35c163d147/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOutputWriterFactory.scala#L35
>> .
>> >
>> > Specifically, I see the advantage for user-defined distributed
>> functions that happen to carry along an Avro Schema -- and I can personally
>> say that I've encountered this a lot in our code!
>> >
>> > That being said, I think it's probably overkill to warn the user about
>> the perils of Java serialization (not being cross-language and requiring
>> consistent JDKs and libraries across JVMs).  If an error occurs for one of
>> those reasons, there's a larger problem for the dev to address, and it's
>> just as likely to occur for any Java library in the job if the environment
>> is bad.  Related, we've encountered similar issues with logical types
>> existing in Avro 1.8 in the driver but not in Avro 1.7 on the cluster...
>> the solution is "make sure you don't do that".  (Looking at you, guava and
>> jackson!)
>> >
>> > The patch in question delegates serialization to the string form of the
>> schema, so it's basically doing what all of the above Avro "holders" are
>> doing -- I wouldn't object to having a sample schema available that fully
>> exercises what a schema can hold, but I also think that Schema.Parser (used
>> underneath) is currently pretty well tested and mature!
>> >
>> > Do you think this could be a candidate for 1.9.1 as a minor
>> improvement?  I can't think of any reason that this wouldn't be backwards
>> compatible.
>> >
>> > Ryan
>> >
>> > side note: I wrote java.lang.Serializable earlier, which probably
>> didn't help my search for prior discussion... :/
>> >
>> > On Tue, Jul 16, 2019 at 9:59 AM Ismaël Mejía  wrote:
>> >>
>> >> This is a good idea even if it may have some issues that we should
>> >> probably document and warn users about:
>> >>
>> >> 1. Java based serialization is really practical for JVM based systems,
>> >> but we should probably add a warning or documentation because Java
>> >> serialization is not deterministic between JVMs so this could be a
>> >> source for issues (usually companies use the same version of the JVM
>> >> so this is less critical, but this still can happen especially now with
>> >> all the different versions of Java and OpenJDK based flavors).
>> >>
>> >> 2. This is not cross language compatible, the String based
>> >> representation (or ev
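
The approach discussed in this thread, delegating Java serialization to the
string form of the schema and re-parsing it on the receiving side, can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the code from the PR: FakeSchema is a hypothetical stand-in
for a non-Serializable schema class (its parse() and toString() merely mimic
the roles of Schema.Parser and Schema.toString()), and SchemaHolder and
roundTrip are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SchemaHolderDemo {

    // Hypothetical stand-in for a schema type that is NOT Serializable.
    // It is not the Avro API; it only models "parse from JSON text" and
    // "print back as JSON text".
    static final class FakeSchema {
        private final String json;

        private FakeSchema(String json) {
            this.json = json;
        }

        static FakeSchema parse(String json) {
            return new FakeSchema(json.trim());
        }

        @Override
        public String toString() {
            return json;
        }
    }

    // Serializable wrapper: only the string form travels over the wire.
    static final class SchemaHolder implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: default Java serialization skips this field.
        private transient FakeSchema schema;

        SchemaHolder(FakeSchema schema) {
            this.schema = schema;
        }

        FakeSchema get() {
            return schema;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeObject(schema.toString()); // write the string form
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            schema = FakeSchema.parse((String) in.readObject()); // re-parse
        }
    }

    // Serializes the holder to bytes and reads it back, as a cluster
    // framework would when shipping work from node to node.
    static SchemaHolder roundTrip(SchemaHolder holder)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(holder);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SchemaHolder) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeSchema schema = FakeSchema.parse("{\"type\": \"string\"}");
        SchemaHolder copy = roundTrip(new SchemaHolder(schema));
        System.out.println(copy.get());
    }
}
```

Because only the string form is written, the receiving JVM rebuilds the
schema with its own parser, which is the same thing the framework-specific
"holders" mentioned above do by hand.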

Re: [ANNOUNCE] Please welcome Ismaël Mejía to the Apache Avro PMC

2019-06-10 Thread Driesprong, Fokko
Congrats, welcome Ismaël!

On Tue, Jun 11, 2019 at 07:02 Niels Basjes wrote:

> Welcome!
>
> On Tue, 11 Jun 2019, 00:01 Brian Lachniet,  wrote:
>
>> Congratulations, Ismaël!
>>
>> On Mon, Jun 10, 2019 at 5:48 PM Jesse Anderson 
>> wrote:
>>
>>> Congrats!
>>>
>>> On Mon, Jun 10, 2019, 4:41 PM Sean Busbey  wrote:
>>>
>>>> Hi folks!
>>>>
>>>> On behalf of the Apache Avro PMC I am pleased to announce that Ismaël
>>>> Mejía has accepted our invitation to become a PMC member. We
>>>> appreciate Ismaël stepping up to take more responsibility in the
>>>> project.
>>>>
>>>> Please join me in welcoming Ismaël to the Avro PMC!
>>>>
>>>> As a reminder, if anyone would like to nominate another person as a
>>>> committer or PMC member, even if you are not currently a committer or
>>>> PMC member, you can always drop a note to priv...@avro.apache.org to
>>>> let us know.
>>>>
>>>> -busbey
>>>>
>>>
>>
>> --
>>
>> Brian Lachniet
>>
>> Software Engineer
>>
>> E: blachn...@gmail.com | blachniet.com 
>>
>>  
>>
>


[Announce] Release of Apache Avro 1.9.0

2019-05-24 Thread Driesprong, Fokko
The last release of Apache Avro, 1.8.2, was on May 31, 2017. Two years
later, I'm thrilled to announce the release of Avro 1.9.0!

Changes are listed at: https://s.apache.org/avro190
Highlights of the new version: https://blog.godatadriven.com/apache-avro-1-9-release

This release can be downloaded from:
https://www.apache.org/dyn/closer.cgi/avro/

Java artifacts are available from Maven Central, Ruby artifacts are
at RubyGems, Python is at PyPI, JS at NPM.

Thanks to everyone for contributing and helping out.

Fokko Driesprong