RE: Apicurio Avro format proposal

2023-11-23 Thread David Radley
Hi Ryan,
I am reasonably new to this, but here is my understanding.

If we use pure Avro and Flink SQL, then when we create a table, the shape of 
that table is the shape we expect the event to have. This falls down when we 
evolve the schema, i.e. create new versions of the schema. The new versions 
need to be compatible with the old ones (see the schema resolution section of 
https://avro.apache.org/docs/1.11.1/specification for more details).

So if we want a topic to carry events for one schema as it evolves, we need to 
be able to read messages that were written at different schema versions. Each 
message therefore needs to identify the schema version it was written with, 
using an identifier held in a schema registry.

In the Confluent registry, each message starts with a magic byte followed by 
a 4-byte schema id, which the Confluent schema registry can map to a schema 
version. Using the Confluent Avro format, the serialisers and deserialisers 
use the schema id to look up the writer schema, and use the reader schema 
(e.g. in Flink, the shape of the table definition) to convert the message 
appropriately.
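
To make the framing concrete, here is a minimal Python sketch that splits a 
message into its schema id and Avro body, assuming the Confluent wire format 
of one magic byte (0) followed by a 4-byte big-endian schema id. The function 
name split_confluent_message and the sample bytes are hypothetical.

```python
import struct

CONFLUENT_MAGIC_BYTE = 0


def split_confluent_message(message: bytes):
    """Return (schema_id, avro_body) for a Confluent-framed message."""
    if len(message) < 5:
        raise ValueError("message too short for Confluent framing")
    # ">bI": big-endian, 1 signed byte (magic) + 4-byte unsigned int (id)
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != CONFLUENT_MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, message[5:]


# Sample message: magic byte, schema id 1234, then a one-byte Avro body.
framed = bytes([0]) + (1234).to_bytes(4, "big") + b"\x54"
schema_id, avro_body = split_confluent_message(framed)
```

The schema id is then used to fetch the writer schema from the registry 
before the body is decoded against the reader schema.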

In Apicurio, there is a ‘global id’ that identifies the schema version in the 
Apicurio registry. See 
https://www.apicur.io/registry/docs/apicurio-registry/2.4.x/getting-started/assembly-configuring-kafka-client-serdes.html#registry-serdes-types-avro_registry
You will notice that the global id can be in a header or in the message 
payload. It can also be 8 bytes or the legacy 4 bytes. Apicurio also has an 
option to work with Confluent-style messages (with the magic byte), using the 
option ENABLE_CONFLUENT_ID_HANDLER.
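
The id placement options could be handled roughly as in the Python sketch 
below. It rests on assumptions that should be checked against the Apicurio 
documentation: that the payload form is a magic byte (0) followed by a 
big-endian id, and that the header carries the id as big-endian bytes. The 
function read_global_id and the default header name are illustrative only.

```python
def read_global_id(payload: bytes, headers: dict, id_width: int = 8,
                   header_name: str = "apicurio.value.globalId"):
    """Return (global_id, avro_body).

    headers: dict of header name -> raw bytes (header_name is an assumption).
    id_width: 8 for the default id size, 4 for the legacy size.
    """
    if header_name in headers:
        # Id travels in a header, so the payload is the plain Avro body.
        return int.from_bytes(headers[header_name], "big"), payload
    if id_width not in (4, 8):
        raise ValueError("id_width must be 4 or 8")
    if len(payload) < 1 + id_width or payload[0] != 0:
        raise ValueError("payload does not start with magic byte and id")
    global_id = int.from_bytes(payload[1:1 + id_width], "big")
    return global_id, payload[1 + id_width:]


# The same global id (7) carried three ways, each with a one-byte Avro body.
in_header = read_global_id(b"\x54", {"apicurio.value.globalId": (7).to_bytes(8, "big")})
in_payload_8 = read_global_id(b"\x00" + (7).to_bytes(8, "big") + b"\x54", {})
in_payload_4 = read_global_id(b"\x00" + (7).to_bytes(4, "big") + b"\x54", {}, id_width=4)
```

A Flink format would need the id location and id width as options, since the 
bytes alone do not say which variant the producer used.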

In terms of https://github.com/apache/flink/pull/21805, it seems to be a 
change to tolerate the presence of bytes from Apicurio (when 
ENABLE_CONFLUENT_ID_HANDLER is specified?). My suggestion is that we close 
this issue and PR, then explicitly add support for Apicurio as a new format, 
and document its options in a new FLIP that the community agrees on.

In terms of a common base, it looks like we already have 
RegistryAvroDeserializationSchema and RegistryAvroSerializationSchema as a 
common base. There might be some refactoring we can do when we create the 
second implementation.

You ask "Outside of configuration options, are there different features?" 
They are both schema registries that support schema evolution; I think this 
is the main feature they share that is relevant to Flink.

Does this help? If I have misrepresented anything, please let me know.

I am investigating further so I can create a well-described FLIP for the 
proposed change.
Kind regards, David.



Re: Apicurio Avro format proposal

2023-11-23 Thread Ryan Skraba
Pardon me, I forgot to include that I'd seen this before as
FLINK-26654.  There's a linked JIRA with an open PR [1] that kind of
*plugs in* 8-byte ids.  I haven't had the chance to check out Apicurio
yet, but I'm interested in schema registries in general.

All my best, Ryan

[1]: https://github.com/apache/flink/pull/21805
"[FLINK-30721][avro-confluent-registry] Enable 8byte schema id"



Re: Apicurio Avro format proposal

2023-11-23 Thread Ryan Skraba
Hello David!

In the FLIP, I'd be interested in knowing how the avro-apicurio and
avro-confluent formats would differ!  Outside of configuration
options, are there different features?  Would the two schema registry
formats have a lot of common base that we could take advantage of?

All my best, Ryan



RE: Apicurio Avro format proposal

2023-11-23 Thread David Radley
Hi Martijn,
Ok will do,
  Kind regards, David.


Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: Apicurio Avro format proposal

2023-11-22 Thread Martijn Visser
Hi David,

Can you create a small FLIP for this?

Best regards,

Martijn

On Wed, Nov 22, 2023 at 6:46 PM David Radley  wrote:
>
> Hi,
> I would like to propose a new Apicurio Avro format.
> The Apicurio Avro Schema Registry (avro-apicurio) format would allow you to 
> read records that were serialized by the 
> io.apicurio.registry.serde.avro.AvroKafkaSerializer and to write records that 
> can in turn be read by the 
> io.apicurio.registry.serde.avro.AvroKafkaDeserializer.
>
> With format options including:
>
>   *   Apicurio Registry URL
>   *   Artifact resolver strategy
>   *   ID location
>   *   ID encoding
>   *   Avro datum provider
>   *   Avro encoding
>
>
>
> For more details see 
> https://www.apicur.io/registry/docs/apicurio-registry/2.4.x/getting-started/assembly-configuring-kafka-client-serdes.html#registry-serdes-types-avro_registry
>
> I am happy to work on this,
>   Kind regards, David.