Manageable Avro schema evolution in Java

2022-06-27 Thread Niels Basjes
Hi,

Back in 2019 I spoke at the DataWorks Summit conference about using Avro for
schema evolution in streaming scenarios: https://youtu.be/QOdhaEHbSZM

Recently a few people asked me how to actually do this in a practical way.

To facilitate this I have created an "as clean as possible" demonstrator
project that shows how I think this can be done. Note that this is only
intended to show a _possible_ way of doing this.

https://github.com/nielsbasjes/avro-schema-example

Note that the commit history is also part of the demonstration!

I would love to hear your feedback, comments, improvement suggestions, etc.

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


Re: New website

2021-10-28 Thread Niels Basjes
Hi,

To me this already looks a lot better than the default website, especially
because it now also supports mobile devices.
The exact look and feel of sites like this is always open for discussion;
as a first step, I don't have any input on that right now.

What I am thinking about are things like:
Where do we want to host this one?
- On the existing Apache infrastructure?
- Using GitHub Pages? This would make it possible to automatically
regenerate the site on a push to the master/main branch.
- Somewhere else, like Netlify where this demo is hosted?

Also:
Do we want this to be a separate repository?
Or do we want this to be part of the main code repository?

Niels



On Thu, Oct 28, 2021 at 10:44 AM Martin Grigorov 
wrote:

> Hi all,
>
> Please check the new candidate for Apache Avro website:
> https://avro-website.netlify.app/
>
> It is based on Hugo and uses Docsy theme.
> Its source code and build instructions can be found at
> https://github.com/martin-g/avro-website.
> The JIRA ticket is: https://issues.apache.org/jira/browse/AVRO-2175
>
> I am not a web designer, so some things may look unfinished.
> I've just copied the HTML content from the old site (
> https://avro.apache.org/) and converted it to Markdown for Hugo.
>
> Any feedback is welcome! Feedback in the form of pull requests would be awesome!
>
> Regards,
> Martin
>




Re: Companies using Apache Avro

2021-01-27 Thread Niels Basjes
Hi,

I work at bol.com (biggest online retailer in the Netherlands/Belgium area)
and we heavily use Avro.

Here is a talk about why we use Avro:
https://www.youtube.com/watch?v=QOdhaEHbSZM

Niels Basjes

On Mon, Jan 25, 2021 at 9:28 PM Juan Cruz Viotti  wrote:

> Hey there!
>
> Do you know where I can find a list of relatively well-known companies
> that make use of Apache Avro? I'm trying to collect a small list for
> research purposes and my search is not yielding many results apart from
> Facebook.
>
> Thanks in advance,
>
> --
> Juan Cruz Viotti
> Software Engineer
> https://www.jviotti.com
>




Re: New plugin for JetBrains (IntelliJ / PyCharm / ...)

2021-01-13 Thread Niels Basjes
Nice, I'm going to try it out soon!

Niels

On Mon, Jan 11, 2021 at 9:38 AM Oscar Westra van Holthe - Kind <
os...@westravanholthe.nl> wrote:

> Hello everyone,
>
> Does anyone use IntelliJ, PyCharm or another JetBrains IDE, and would like
> to edit Avro schemas with it?
> I've built a plugin that recognizes Avro schema and protocol files, and I
> would appreciate feedback and constructive criticism.
>
> It's an early version that supports:
> - .avsc schema files: recognized as JSON 'dialect', uses a JSON schema to
>   supply code completion and semantic checks
> - .avpr protocol files: recognized as JSON 'dialect', uses a JSON schema to
>   supply code completion and semantic checks
> - .avdl protocol files: provides syntax highlighting, correct formatting,
>   code completion, semantic checks, and named schema navigation
>
> You can find it in the plugin marketplace in your IDE (search for "avro
> idl"), or here:
> https://plugins.jetbrains.com/plugin/15728-apache-avro-idl-schema-support
>
> Feedback, bugs, ideas (pull requests), etc. are most welcome via github:
> https://github.com/opwvhk/avro-schema-support
>
>
> Kind regards,
> Oscar
>
> --
>
> ✉️ Oscar Westra van Holthe - Kind 
>  https://plugins.jetbrains.com/plugin/15728-apache-avro-idl-schema-support
>
>
>



Re: New Committer: Ryan Skraba

2019-12-17 Thread Niels Basjes
Welcome!

On Tue, Dec 17, 2019 at 5:20 PM Ryan Skraba  wrote:

> Thanks so much!  I'm super impressed with the quality of the work and
> advancement I've seen here, and I'm pretty excited and grateful to be
> able to contribute!
>
> Ryan
>
>
> On Tue, Dec 17, 2019 at 1:21 PM Austin Cawley-Edwards
>  wrote:
> >
> > Congrats Ryan, thanks for the help so far!
> >
> >
> > Austin
> >
> > On Tue, Dec 17, 2019 at 7:13 AM Michael Burr  wrote:
> >>
> >> unsubscribe
> >>
> >> On Tue, Dec 17, 2019 at 4:43 AM Driesprong, Fokko 
> wrote:
> >>>
> >>> Folks,
> >>>
> >>> The Project Management Committee (PMC) for Apache Avro has invited
> Ryan Skraba to become a committer and we are pleased to announce that he
> has accepted. Ryan is actively fixing bugs by providing patches and
> reviewing pull requests by others. We're very happy to have him on board.
> >>>
> >>> Being a committer enables easier contribution to the project since
> there is no need to go via the patch submission process. This should enable
> better productivity.
> >>>
> >>> Please join me in congratulating Ryan on his recognition of great work
> thus far in our community.
> >>>
> >>> Cheers, Fokko
>




Re: [Announce] Please welcome Nándor Kollár to the Apache Avro PMC

2019-08-31 Thread Niels Basjes
Welcome!

On Fri, Aug 30, 2019, 23:39 Brian Lachniet  wrote:

> Congratulations, Nándor!
>
> On Fri, Aug 30, 2019, 5:37 PM Sean Busbey  wrote:
>
>> Hi folks!
>>
>> On behalf of the Apache Avro PMC I am pleased to announce that Nándor
>> Kollár has accepted our invitation to become a PMC member. We
>> appreciate Nándor stepping up to take more responsibility in the
>> project.
>>
>> Please join me in welcoming Nándor to the Avro PMC!
>>
>> As a reminder, if anyone would like to nominate another person as a
>> committer or PMC member, even if you are not currently a committer or
>> PMC member, you can always drop a note to priv...@avro.apache.org to
>> let us know.
>>
>


Re: [ANNOUNCE] Please welcome Ismaël Mejía to the Apache Avro PMC

2019-06-10 Thread Niels Basjes
Welcome!

On Tue, 11 Jun 2019, 00:01 Brian Lachniet,  wrote:

> Congratulations, Ismaël!
>
> On Mon, Jun 10, 2019 at 5:48 PM Jesse Anderson 
> wrote:
>
>> Congrats!
>>
>> On Mon, Jun 10, 2019, 4:41 PM Sean Busbey  wrote:
>>
>>> Hi folks!
>>>
>>> On behalf of the Apache Avro PMC I am pleased to announce that Ismaël
>>> Mejía has accepted our invitation to become a PMC member. We
>>> appreciate Ismaël stepping up to take more responsibility in the
>>> project.
>>>
>>> Please join me in welcoming Ismaël to the Avro PMC!
>>>
>>> As a reminder, if anyone would like to nominate another person as a
>>> committer or PMC member, even if you are not currently a committer or
>>> PMC member, you can always drop a note to priv...@avro.apache.org to
>>> let us know.
>>>
>>> -busbey
>>>
>>
>
> --
>
> Brian Lachniet
>
> Software Engineer
>
> E: blachn...@gmail.com | blachniet.com 
>


Re: [ANNOUNCE] Please welcome Fokko Driesprong to the Apache Avro PMC

2019-05-14 Thread Niels Basjes
Welcome aboard!

On Tue, May 14, 2019, 09:28 Sean Busbey  wrote:

> Hi folks!
>
> On behalf of the Apache Avro PMC I am pleased to announce that Fokko
> Driesprong has accepted our invitation to become a PMC member on the
> Avro project. We appreciate Fokko stepping up to take more
> responsibility in the project.
>
> Please join me in welcoming Fokko to the Avro PMC!
>
> As a reminder, if anyone would like to nominate another person as a
> committer or PMC member, even if you are not currently a committer or
> PMC member, you can always drop a note to priv...@avro.apache.org to
> let us know.
>
> -busbey
>


Re: Difference between Avro message vs Avro Object Container files?

2017-05-16 Thread Niels Basjes
Hi,

A key thing with Avro is that in order to deserialize a record from the
byte array back into a usable form you need the schema that was used to
create the bytes in the first place.
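To make this concrete, here is a minimal sketch of Avro's binary encoding for a record with a string and an int field, written against the encoding rules in the Avro specification (zig-zag variable-length integers; strings as a length followed by UTF-8 bytes) using only the JDK. The class name `AvroBinarySketch` and the `Person` schema are hypothetical, and this is not the org.apache.avro implementation. The output contains no field names or type information, which is why a reader needs the writer's schema.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of Avro's binary encoding for two primitive types, following the
 * Avro specification. Hypothetical helper class, not the Avro library.
 */
public class AvroBinarySketch {

    /** Writes a long as a zig-zag encoded variable-length integer. */
    static void writeLong(ByteArrayOutputStream out, long value) {
        long n = (value << 1) ^ (value >> 63); // zig-zag: small magnitudes -> small numbers
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80)); // 7 payload bits + continuation bit
            n >>>= 7;
        }
        out.write((int) n);
    }

    /** Writes a string as its UTF-8 byte length followed by the bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /**
     * Encodes a record with the hypothetical schema
     * {"type":"record","name":"Person","fields":[
     *   {"name":"name","type":"string"},{"name":"age","type":"int"}]}.
     * A record is just its field values in schema order: no names, no type tags.
     */
    static byte[] encodePerson(String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, age); // int uses the same zig-zag varint encoding as long
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodePerson("Niels", 42);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Running `main` prints `0a 4e 69 65 6c 73 54`: the length 5 (zig-zag 10 = 0x0a), the bytes of "Niels", and 42 (zig-zag 84 = 0x54). Nothing in these seven bytes says that the first value is a string or that the last byte is an int; only the schema does.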

An Avro file (an Object Container File) is essentially a (large) set of
records that all adhere to the same schema.
In such a file you will find the complete schema in the header, followed by
the binary representation of each record (grouped into blocks).
This is one possible way of storing records for batch processing, and because
the schema is part of the file, you can always read all records in that file.
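The file layout can be sketched with only the JDK, following the Object Container File section of the Avro specification: a header with the magic bytes `Obj` plus version 1, a metadata map holding `avro.schema` (and here `avro.codec` = `null`, i.e. no compression), and a random 16-byte sync marker, followed by blocks that each hold a record count, a byte size, the encoded records and the sync marker again. This simplified sketch writes a single uncompressed block and assumes the records are already binary-encoded; the class name is hypothetical, and in practice you would use `DataFileWriter` from the Avro library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.List;

/** Hypothetical sketch of an uncompressed Avro Object Container File with one block. */
public class ContainerFileSketch {

    static void writeLong(ByteArrayOutputStream out, long value) {
        long n = (value << 1) ^ (value >> 63); // zig-zag varint, as in the Avro spec
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
    }

    static void writeBytes(ByteArrayOutputStream out, byte[] data) {
        writeLong(out, data.length); // strings and bytes are length-prefixed
        out.write(data, 0, data.length);
    }

    /** Writes header + one data block; records must already be encoded with schemaJson. */
    static byte[] write(String schemaJson, List<byte[]> encodedRecords) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] sync = new byte[16];
        new SecureRandom().nextBytes(sync); // random marker separating the blocks

        // Header: magic, then a map with one block of 2 entries, terminated by 0.
        out.write(new byte[] {(byte) 'O', (byte) 'b', (byte) 'j', 1}, 0, 4);
        writeLong(out, 2);
        writeBytes(out, "avro.schema".getBytes(StandardCharsets.UTF_8));
        writeBytes(out, schemaJson.getBytes(StandardCharsets.UTF_8));
        writeBytes(out, "avro.codec".getBytes(StandardCharsets.UTF_8));
        writeBytes(out, "null".getBytes(StandardCharsets.UTF_8));
        writeLong(out, 0); // end of metadata map
        out.write(sync, 0, sync.length);

        // One data block: record count, byte size, records, sync marker.
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] record : encodedRecords) {
            block.write(record, 0, record.length);
        }
        writeLong(out, encodedRecords.size());
        writeLong(out, block.size());
        out.write(block.toByteArray(), 0, block.size());
        out.write(sync, 0, sync.length);
        return out.toByteArray();
    }
}
```

Because the schema JSON travels in the header, a reader can decode every record in the file without any outside information; the sync markers let a reader find block boundaries when starting in the middle of a file.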

The Avro message format was created for the streaming use case.
If you want to stream records into Kafka (where they will persist until the
retention period (TTL) expires), then you need a way to know the schema ...
for each record.
Because a schema may change over time, we need to know the schema of EACH
record.
Because the schema can be quite big (several KiB is common), you do not want
to store the same schema with every message.
So in the message format you will find an identifier of the schema (a 64-bit
fingerprint) in conjunction with the actual record.
The API includes a hook (the SchemaStore interface) behind which you can put a
database of all versions of all your schemas.
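A sketch of this message ("single-object") encoding, following the Avro specification: a two-byte marker `C3 01`, the 8-byte little-endian CRC-64-AVRO fingerprint of the writer schema, then the record's binary encoding. The fingerprint function follows the algorithm given in the spec; the in-memory map stands in for the schema database and, like the class and method names, is hypothetical. It assumes schema strings are already in Parsing Canonical Form, which is what the spec fingerprints. In the Java library this role is played by `BinaryMessageEncoder`/`BinaryMessageDecoder` together with a `SchemaStore`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of single-object encoding plus a toy schema store. */
public class SingleObjectSketch {
    static final long EMPTY = 0xc15d213aa4d7a795L;
    static final long[] FP_TABLE = new long[256];

    static {
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            FP_TABLE[i] = fp;
        }
    }

    /** CRC-64-AVRO (Rabin) fingerprint, per the algorithm in the Avro spec. */
    static long fingerprint64(byte[] buf) {
        long fp = EMPTY;
        for (byte b : buf) {
            fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    // Toy "schema database": fingerprint -> every schema version ever used.
    private final Map<Long, String> schemasByFingerprint = new HashMap<>();

    /** Registers a schema (assumed canonical) and returns its fingerprint. */
    long register(String canonicalSchema) {
        long fp = fingerprint64(canonicalSchema.getBytes(StandardCharsets.UTF_8));
        schemasByFingerprint.put(fp, canonicalSchema);
        return fp;
    }

    /** Prefixes already-encoded record bytes with the marker and fingerprint. */
    static byte[] encode(long fingerprint, byte[] recordBytes) {
        ByteBuffer buf = ByteBuffer.allocate(10 + recordBytes.length);
        buf.put((byte) 0xC3).put((byte) 0x01);
        buf.order(ByteOrder.LITTLE_ENDIAN).putLong(fingerprint);
        buf.put(recordBytes);
        return buf.array();
    }

    /** Finds the writer schema of a message; a real decoder would then resolve it. */
    String writerSchemaOf(byte[] message) {
        if (message.length < 10 || message[0] != (byte) 0xC3 || message[1] != 0x01) {
            throw new IllegalArgumentException("not a single-object encoded message");
        }
        long fp = ByteBuffer.wrap(message, 2, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
        String schema = schemasByFingerprint.get(fp);
        if (schema == null) {
            throw new IllegalStateException("unknown schema fingerprint " + Long.toHexString(fp));
        }
        return schema;
    }

    /** Returns the record bytes that follow the 10-byte header. */
    static byte[] payloadOf(byte[] message) {
        return Arrays.copyOfRange(message, 10, message.length);
    }
}
```

A consumer reads the fingerprint, fetches the matching writer schema from the store and then resolves it against its own reader schema; that is how messages written with old and new schema versions can coexist on one topic.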

Does this clarify it for you?

Niels Basjes


On Tue, May 16, 2017 at 8:30 AM, kant kodali <kanth...@gmail.com> wrote:

> Hi All,
>
> I am new to Avro, so I was wondering: what is the difference between Avro
> messages and Avro Object Container files? Are they related at all? What are
> the use cases for each?
>
> Thanks!
>





Re: Issues with Oracle JDK 1.8

2016-05-24 Thread Niels Basjes
Hi,

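The only Avro code I can spot in this stacktrace is ByteBufferInputStream,
which simply signals that it ran out of data (EOFException); the reading
itself is driven by Gora's StringSerialization and Hadoop's Text.readString.
Why do you think this is a problem in Avro?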

Niels Basjes

On Tue, May 24, 2016 at 3:59 PM, lewis john mcgibbney <lewi...@apache.org>
wrote:

> Hi user@,
> Are there any known issues using Avro with Oracle JDK 1.8? Specifically, I
> am using the following
>
> lmcgibbn@LMC-032857 /usr/local(NUTCH-2089) $ java -version
> java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
>
> When I use the above JDK to compile Gora and run our test suite I get a
> failure which I haven't really investigated as of yet. I have printed the
> trace below. The test in question can be located at
> Thanks in advance for any replies.
> Lewis
>
>
>
> ---
> Test set: org.apache.gora.query.impl.TestQueryBase
>
> ---
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.711 sec <<< FAILURE!
> testReadWrite(org.apache.gora.query.impl.TestQueryBase)  Time elapsed: 0.309 sec  <<< ERROR!
> java.io.EOFException
> at org.apache.avro.util.ByteBufferInputStream.getBuffer(ByteBufferInputStream.java:86)
> at org.apache.avro.util.ByteBufferInputStream.read(ByteBufferInputStream.java:48)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at org.apache.hadoop.io.Text.readString(Text.java:466)
> at org.apache.hadoop.io.Text.readString(Text.java:457)
> at org.apache.gora.mapreduce.StringSerialization$1.deserialize(StringSerialization.java:55)
> at org.apache.gora.mapreduce.StringSerialization$1.deserialize(StringSerialization.java:40)
> at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:224)
> at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:227)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
> at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:224)
> at org.apache.gora.util.TestIOUtils.testSerializeDeserialize(TestIOUtils.java:122)
> at org.apache.gora.query.impl.TestQueryBase.testReadWrite(TestQueryBase.java:50)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
> at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
> at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> at org.apache.maven.surefire.booter.ProviderFactory$

Re: Avro divides large JSON schema string into parts - is this intentional?

2015-08-10 Thread Niels Basjes
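Have a look at this:
https://issues.apache.org/jira/browse/AVRO-1316

This is the bug that required this change. A string constant in a Java class
file can hold at most 65535 bytes, so the schema of a very large record no
longer fits in a single literal and the generated code splits it. Code
generated that way needs an Avro runtime that has
Schema.Parser.parse(String s, String... more), which 1.7.4 does not.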
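A sketch of such a split, assuming (as the class-file format dictates) that each string constant may hold at most 65535 bytes of modified UTF-8. `splitForLiterals` is a hypothetical helper, not Avro's code generator, and its exact split points will differ from what the generator emits. It only shows that the parts, concatenated again, give back the original JSON, which as far as I know is what `Schema.Parser.parse(String s, String... more)` does before parsing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: split a long schema string into literal-sized parts. */
public class SchemaStringSplitter {
    // A CONSTANT_Utf8 entry in a class file has a 16-bit length field.
    static final int MAX_CONSTANT_BYTES = 65535;

    /** Bytes a char takes in the class file's modified UTF-8 (U+0000 takes 2). */
    static int modifiedUtf8Length(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2;
        return 3;
    }

    /** Splits s into parts whose modified UTF-8 size is at most maxBytes each. */
    static List<String> splitForLiterals(String s, int maxBytes) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            // Keep surrogate pairs together so no part ends in half a character.
            int unitLength = (Character.isHighSurrogate(s.charAt(i))
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) ? 2 : 1;
            int unitBytes = 0;
            for (int k = i; k < i + unitLength; k++) {
                unitBytes += modifiedUtf8Length(s.charAt(k));
            }
            if (bytes + unitBytes > maxBytes && i > start) {
                parts.add(s.substring(start, i)); // close the current part
                start = i;
                bytes = 0;
            }
            bytes += unitBytes;
            i += unitLength;
        }
        parts.add(s.substring(start));
        return parts;
    }
}
```

Generated code would pass the first part and the remaining parts to `new Schema.Parser().parse(...)`; on a runtime without that overload the call fails with exactly the NoSuchMethodError reported in the question.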

Niels Basjes

On Mon, 10 Aug 2015 11:05 mark manwoodv...@googlemail.com wrote:

> I am using Avro v1.7.7 in development, and Avro version 1.7.4 on my Hadoop
> cluster.
>
> I have a fairly large .avdl schema - a record with about 100 fields. When
> running locally under test there were no issues with this schema;
> everything would serialize and deserialize without issue.
>
> When running on Hadoop, however, I was getting this error:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.avro.Schema$Parser.parse(Ljava/lang/String;[Ljava/lang/String;)
>
> The reason was that the JSON schema embedded in the compiled Java class
> was being split into two:
>
> public class SomeType extends org.apache.avro.specific.SpecificRecordBase
>     implements org.apache.avro.specific.SpecificRecord {
>   public static final org.apache.avro.Schema SCHEMA$ = new
>     org.apache.avro.Schema.Parser().parse("<long schema string part 1>",
>                                           "<long schema string part 2>");
>
> Now, version 1.7.7 has this method signature:
>
> public Schema parse(String s, String... more)
>
> So the split schema string works fine locally, but version 1.7.4 does not
> have this method, hence the exception when running compiled classes on Hadoop.
>
> Is this intentional or a bug?
> If intentional, what are the rules determining when Avro breaks up a
> schema string?
> Where is this behaviour documented?
> Why does it do it at all?
>
> Thanks