Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

SiNan Liu Wed, 01 Mar 2023 18:21:22 -0800

Hello Enrico. Thanks for your suggestion, according to my understanding of
what you said "flag".
How about we add a configuration in the next release:


protoBufNativeSchemaValidatorClassName=org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl

Use the previous implementation if the configuration is empty (check only
the name of the root message). If there is a better third-party or official
solution in the future, develop a new "
ProtobufNativeSchemaBreakValidatorImpl " to give users a choice.
What do you think of this design? If there is a better third party or
official solution in the future, do you think the current pr implementation
should be retained or deleted?


Thanks,
sinan



Enrico Olivelli <eolive...@gmail.com> 于 2023年3月2日周四 上午12:47写道：

> (I apologise for top posting)
>
> Would it be possible to add a flag to fallback to the previous behaviour ?
> I know that adding such flags is a burden but if the upgrade breaks
> some workflows then users won't be able to upgrade.
> We can add the flag in the next release and drop it in the next major
> release
>
> Enrico
>
> Il giorno mer 1 mar 2023 alle ore 15:33 SiNan Liu
> <liusinan1...@gmail.com> ha scritto:
> >
> > >
> > > Can you please explain how a Protobuf Schema descriptor can be
> validated
> > > for backward compatibility check using Avro based compatibility rules?
> > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > descriptor?
> > > Is there some translation happening?
> >
> >
> > 1. *You can take a quick look at the previous design, the PROTOBUF uses
> > avro struct to store.*
> > https://github.com/apache/pulsar/pull/1954
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> >
> > 2. *On the broker side, protobuf and avro both use `SchemaData` converted
> > to `org.apache.avro.Schema`.*
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> >
> >
> >
> > I'm sorry - I don't understand.
> > > I understand the different compatibility check strategies.
> > > If you just spell them out here, then as you say, just translate the
> > > Protobuf Descriptor into an Avro schema and run the Avro
> > > compatibility validation, no?
> > > I believe the answer is no, since you may want to verify different
> things
> > > when it comes to Protobuf, which are different then Avro.
> >
> >
> > 1.
> > *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > avro-protobuf.*
> >
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > official or third party implementation. So this PIP does not use
> > avro-protobuf for protobuf compatibility checking.*
> >
> > 2. *By the way, this is implemented in much the same way that Apache avro
> > does compatibility checking.*
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > `canReadStrategy`，`canBeReadStrategy`，`mutualReadStrategy`
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > writtenSchema and readSchema. We only need to change the order of
> arguments
> > we pass to `canRead()`.*
> > ```java
> > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > Descriptors.Descriptor fromDescriptor) throws
> ProtoBufCanReadCheckException
> > {
> > switch (strategy) {
> > case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > case CanBeReadMutualStrategy -> {
> > canRead(toValidate, fromDescriptor);
> > canRead(fromDescriptor, toValidate);
> > }
> > }
> > }
> >
> > private void canRead(Descriptors.Descriptor writtenSchema,
> > Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> >
> ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema,
> > readSchema);
> > }
> > ```
> >
> >
> > Thanks,
> > sinan
> >
> >
> >
> > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月1日周三 21:19写道：
> > >
> > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1...@gmail.com>
> wrote:
> > >
> > > > >
> > > > > I read it and they look identical. What's the difference between
> them?
> > > >
> > > > Current avro,json, and protobuf schemas are all implemented based on
> > AVRO.
> > > > > What do you mean, they are all implemented based on Avro? You mean
> the
> > > > > protobuf schema is converted into an Avro Schema, and then you use
> > Avro
> > > > > compatibility validation?
> > > >
> > > >
> > > >
> >
> `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > >
> `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > >
> `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > They all extends `AvroSchemaBasedCompatibilityCheck`, the
> > > > `checkCompatible()` is the same implementation with `AVRO`.
> > > >
> > >
> > > Can you please explain how a Protobuf Schema descriptor can be
> validated
> > > for backward compatibility check using Avro based compatibility rules?
> > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > descriptor?
> > > Is there some translation happening?
> > >
> > >
> > >
> > > >
> > > >
> > > > I think you should structure the validation rules differently:
> > > >
> > > >
> > > > The Compatibility check strategy is described on the website
> > > >
> > > >
> >
> https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can
> > process
> > > > data written by producers using the last schema version V2. So V2 is
> > > > "writtenSchema" and V3 is "readSchema".
> > > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last
> schema
> > > > version V2 can process data written by producers using a new schema
> V3,
> > > > even though they may not be able to use the full capabilities of the
> new
> > > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and
> forward
> > > > compatible.
> > > > Schema can evolve. The old version schema and the new version schema
> > should
> > > > be well understood.
> > > >
> > > >
> > > I'm sorry - I don't understand.
> > > I understand the different compatibility check strategies.
> > > If you just spell them out here, then as you say, just translate the
> > > Protobuf Descriptor into an Avro schema and run the Avro
> > > compatibility validation, no?
> > > I believe the answer is no, since you may want to verify different
> things
> > > when it comes to Protobuf, which are different then Avro.
> > >
> > > At the current state, I can't understand your design at all. Please
> help
> > > clarify that.
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > So each strategy should have its own section.
> > > >
> > > >
> > > > The arguments of `canRead()` are writtenSchema and readSchema. As
> we've
> > > > just described, we only need to change the order of arguments we
> pass to
> > > > `canRead()`.
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > sinan
> > > >
> > > >
> > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月27日周一 20:49写道：
> > > >
> > > > > >
> > > > > > And you can see the difference between ProtoBuf and
> ProtoBufNative:
> > > > > >
> > > > > >
> > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > >
> > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > >
> > > > >  I read it and they look identical. What's the difference between
> > them?
> > > > >
> > > > > Current avro,json, and protobuf schemas are all implemented based
> on
> > > > AVRO.
> > > > >
> > > > > What do you mean, they are all implemented based on Avro? You mean
> the
> > > > > protobuf schema is converted into an Avro Schema, and then you use
> > Avro
> > > > > compatibility validation?
> > > > >
> > > > >
> > > > > > *Here are the basic compatibility rules we've defined:*
> > > > >
> > > > >
> > > > > I think you should structure the validation rules differently:
> > > > >
> > > > > * Backward checks
> > > > > ** List down rules, where use newSchema (the schema used by
> producer
> > or
> > > > > consumer) and existingSchema (last schema used)
> > > > > * Forward
> > > > > ** List down rules, where use newSchema (the schema used by
> producer
> > or
> > > > > consumer) and existingSchema (last schema used)
> > > > >
> > > > > So each strategy should have its own section.
> > > > >
> > > > > I'm saying this since you used "writttenSchema" word but it
> represents
> > > > > something completely different if it's backward or forward check.
> > > > >
> > > > > Once you'll have that structure like that, I personally will be
> able
> > to
> > > > > read and understand it.
> > > > >
> > > > >
> > > > > The motivation and problem statement are now good - thanks for
> > improving
> > > > > it.
> > > > >
> > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi! I updated the PIP issue again. This time I've added some
> > background
> > > > > and
> > > > > > some explanations.
> > > > > >
> > > > > > The compatibility check rules are already written in the
> > > > Implementation.
> > > > > > ProtoBufNative implements the same canRead method as Apache Avro.
> > > > > > It does this by checking whether the schema for writing and
> reading
> > is
> > > > > > compatible. I also indicate whether the writtenSchema and
> > readSchema of
> > > > > the
> > > > > > Backward, Forward, and Full strategies are the old or the new
> > version
> > > > of
> > > > > > the schema.
> > > > > >
> > > > > > Thanks,
> > > > > > sinan
> > > > > >
> > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月26日周日 23:24写道：
> > > > > >
> > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so
> > you
> > > > > need
> > > > > > to
> > > > > > > add IMO for people to understand it. You don't need to explain
> the
> > > > > entire
> > > > > > > pulsar in this PIP, but at the very least a few paragraphs
> > detailing
> > > > > all
> > > > > > > you need to know, to put you in context:
> > > > > > >
> > > > > > >
> > > > > > >    - Start by saying Pulsar as a built-in schema registry
> inside
> > > > Pulsar
> > > > > > >    broker.
> > > > > > >       - Every time the client updates the schema, it uploads
> it to
> > > > the
> > > > > > >       broker. When that happens, it has a feature which
> validates
> > if
> > > > > the
> > > > > > > new
> > > > > > >       schema version is compatible with the previous versions.
> > There
> > > > > > > are 4 types
> > > > > > >       of compatibility: Full, ... (complete and explain each
> one
> > > > > briefly)
> > > > > > >    - Also explain Pulsar Schema registry supports various
> schema
> > > > > > >    protocols:  Avro, protobuf native, ... (complete the rest),
> > each
> > > > > > > protocol
> > > > > > >    has a schema which dictates how to serialize and deserialize
> > the
> > > > > > message
> > > > > > >    content into typed object.
> > > > > > >    - Explain in short what is protobuf native (compare protobuf
> > > > > > non-native)
> > > > > > >    - Please don't paste code instead of explaining.
> > > > > > >       - Explain that protobuf native current validation check
> is
> > only
> > > > > > >       composed of checking the root message name is the same
> > between
> > > > > > > the current
> > > > > > >       schema version and the new version.
> > > > > > >          - Explain briefly what is a root message and its name.
> > > > > > >       - Explain the problem (list scenarios) that we have
> because
> > > > > > protobuf
> > > > > > >       native schema only supports FULL compatibility
> validation.
> > > > > > >
> > > > > > >
> > > > > > > Regarding high level design - as in what you plan to do.
> > > > > > > I suggest you add "High Level Design" and in it detail how you
> > plan
> > > > to
> > > > > > > validate, per protobuf version, per compatibility check
> (backward,
> > > > > > forward,
> > > > > > > full,...).
> > > > > > > I tried reading the implementation - for me , it's all over the
> > > > place.
> > > > > > Can
> > > > > > > you please list in order what I wrote above, and list the
> > validation
> > > > > > rules
> > > > > > > with a good explanation why you validate it like that?
> > > > > > >
> > > > > > > Lastly, one you have all the validation rules clearly stated,
> you
> > can
> > > > > use
> > > > > > > it to document it properly so users can know what validation to
> > > > expect.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Asaf
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <
> liusinan1...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Sorry, my mistake. I removed the code and described the
> design
> > to
> > > > > > improve
> > > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can
> have a
> > > > look.
> > > > > >
> > > > > > > >
> > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月22日周三 21:16写道：
> > > > > > > >
> > > > > > > > > I read it but you're almost directly diving into the code
> - it
> > > > will
> > > > > > > take
> > > > > > > > me
> > > > > > > > > hours just to reverse engineer your design.
> > > > > > > > >
> > > > > > > > > Can you please include a "High Level Design" section in
> which
> > you
> > > > > > > explain
> > > > > > > > > how you plan to tackle any issue?
> > > > > > > > > If I can read that section and explain to someone else how
> > this
> > > > > will
> > > > > > > > work,
> > > > > > > > > it means the section is complete.
> > > > > > > > >
> > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <
> > > > liusinan1...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I made a PIP to discuss:
> > > > > > > https://github.com/apache/pulsar/issues/19565
> > > > > > > > .
> > > > > > > > > >
> > > > > > > > > > We can talk about the current design here. Especially for
> > the
> > > > > field
> > > > > > > > type
> > > > > > > > > > change check rules, please give your valuable advice.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Sinan
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Reply via email to