(I apologise for top posting)

Would it be possible to add a flag to fallback to the previous behaviour ?
I know that adding such flags is a burden but if the upgrade breaks
some workflows then users won't be able to upgrade.
We can add the flag in the next release and drop it in the next major release

Enrico

Il giorno mer 1 mar 2023 alle ore 15:33 SiNan Liu
<liusinan1...@gmail.com> ha scritto:
>
> >
> > Can you please explain how a Protobuf Schema descriptor can be validated
> > for backward compatibility check using Avro based compatibility rules?
> > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > descriptor?
> > Is there some translation happening?
>
>
> 1. *You can take a quick look at the previous design, the PROTOBUF uses
> avro struct to store.*
> https://github.com/apache/pulsar/pull/1954
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
>
> 2. *On the broker side, protobuf and avro both use `SchemaData` converted
> to `org.apache.avro.Schema`.*
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
>
>
>
> I'm sorry - I don't understand.
> > I understand the different compatibility check strategies.
> > If you just spell them out here, then as you say, just translate the
> > Protobuf Descriptor into an Avro schema and run the Avro
> > compatibility validation, no?
> > I believe the answer is no, since you may want to verify different things
> > when it comes to Protobuf, which are different then Avro.
>
>
> 1.
> *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> avro-protobuf.*
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> *ProtobufNativeSchema needs a native compatibility check, but there is no
> official or third party implementation. So this PIP does not use
> avro-protobuf for protobuf compatibility checking.*
>
> 2. *By the way, this is implemented in much the same way that Apache avro
> does compatibility checking.*
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> writtenSchema and readSchema. We only need to change the order of arguments
> we pass to `canRead()`.*
> ```java
> private void validateWithStrategy(Descriptors.Descriptor toValidate,
> Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException
> {
> switch (strategy) {
> case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> case CanBeReadMutualStrategy -> {
> canRead(toValidate, fromDescriptor);
> canRead(fromDescriptor, toValidate);
> }
> }
> }
>
> private void canRead(Descriptors.Descriptor writtenSchema,
> Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema,
> readSchema);
> }
> ```
>
>
> Thanks,
> sinan
>
>
>
> Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月1日周三 21:19写道:
> >
> > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1...@gmail.com> wrote:
> >
> > > >
> > > > I read it and they look identical. What's the difference between them?
> > >
> > > Current avro,json, and protobuf schemas are all implemented based on
> AVRO.
> > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > protobuf schema is converted into an Avro Schema, and then you use
> Avro
> > > > compatibility validation?
> > >
> > >
> > >
> `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > They all extends `AvroSchemaBasedCompatibilityCheck`, the
> > > `checkCompatible()` is the same implementation with `AVRO`.
> > >
> >
> > Can you please explain how a Protobuf Schema descriptor can be validated
> > for backward compatibility check using Avro based compatibility rules?
> > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > descriptor?
> > Is there some translation happening?
> >
> >
> >
> > >
> > >
> > > I think you should structure the validation rules differently:
> > >
> > >
> > > The Compatibility check strategy is described on the website
> > >
> > >
> https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can
> process
> > > data written by producers using the last schema version V2. So V2 is
> > > "writtenSchema" and V3 is "readSchema".
> > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last schema
> > > version V2 can process data written by producers using a new schema V3,
> > > even though they may not be able to use the full capabilities of the new
> > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and forward
> > > compatible.
> > > Schema can evolve. The old version schema and the new version schema
> should
> > > be well understood.
> > >
> > >
> > I'm sorry - I don't understand.
> > I understand the different compatibility check strategies.
> > If you just spell them out here, then as you say, just translate the
> > Protobuf Descriptor into an Avro schema and run the Avro
> > compatibility validation, no?
> > I believe the answer is no, since you may want to verify different things
> > when it comes to Protobuf, which are different then Avro.
> >
> > At the current state, I can't understand your design at all. Please help
> > clarify that.
> >
> >
> >
> >
> >
> > >
> > > So each strategy should have its own section.
> > >
> > >
> > > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > > just described, we only need to change the order of arguments we pass to
> > > `canRead()`.
> > >
> > >
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月27日周一 20:49写道:
> > >
> > > > >
> > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > >
> > > > >
> https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > >
> > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > >
> > > >  I read it and they look identical. What's the difference between
> them?
> > > >
> > > > Current avro,json, and protobuf schemas are all implemented based on
> > > AVRO.
> > > >
> > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > protobuf schema is converted into an Avro Schema, and then you use
> Avro
> > > > compatibility validation?
> > > >
> > > >
> > > > > *Here are the basic compatibility rules we've defined:*
> > > >
> > > >
> > > > I think you should structure the validation rules differently:
> > > >
> > > > * Backward checks
> > > > ** List down rules, where use newSchema (the schema used by producer
> or
> > > > consumer) and existingSchema (last schema used)
> > > > * Forward
> > > > ** List down rules, where use newSchema (the schema used by producer
> or
> > > > consumer) and existingSchema (last schema used)
> > > >
> > > > So each strategy should have its own section.
> > > >
> > > > I'm saying this since you used "writttenSchema" word but it represents
> > > > something completely different if it's backward or forward check.
> > > >
> > > > Once you'll have that structure like that, I personally will be able
> to
> > > > read and understand it.
> > > >
> > > >
> > > > The motivation and problem statement are now good - thanks for
> improving
> > > > it.
> > > >
> > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi! I updated the PIP issue again. This time I've added some
> background
> > > > and
> > > > > some explanations.
> > > > >
> > > > > The compatibility check rules are already written in the
> > > Implementation.
> > > > > ProtoBufNative implements the same canRead method as Apache Avro.
> > > > > It does this by checking whether the schema for writing and reading
> is
> > > > > compatible. I also indicate whether the writtenSchema and
> readSchema of
> > > > the
> > > > > Backward, Forward, and Full strategies are the old or the new
> version
> > > of
> > > > > the schema.
> > > > >
> > > > > Thanks,
> > > > > sinan
> > > > >
> > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月26日周日 23:24写道:
> > > > >
> > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so
> you
> > > > need
> > > > > to
> > > > > > add IMO for people to understand it. You don't need to explain the
> > > > entire
> > > > > > pulsar in this PIP, but at the very least a few paragraphs
> detailing
> > > > all
> > > > > > you need to know, to put you in context:
> > > > > >
> > > > > >
> > > > > >    - Start by saying Pulsar as a built-in schema registry inside
> > > Pulsar
> > > > > >    broker.
> > > > > >       - Every time the client updates the schema, it uploads it to
> > > the
> > > > > >       broker. When that happens, it has a feature which validates
> if
> > > > the
> > > > > > new
> > > > > >       schema version is compatible with the previous versions.
> There
> > > > > > are 4 types
> > > > > >       of compatibility: Full, ... (complete and explain each one
> > > > briefly)
> > > > > >    - Also explain Pulsar Schema registry supports various schema
> > > > > >    protocols:  Avro, protobuf native, ... (complete the rest),
> each
> > > > > > protocol
> > > > > >    has a schema which dictates how to serialize and deserialize
> the
> > > > > message
> > > > > >    content into typed object.
> > > > > >    - Explain in short what is protobuf native (compare protobuf
> > > > > non-native)
> > > > > >    - Please don't paste code instead of explaining.
> > > > > >       - Explain that protobuf native current validation check is
> only
> > > > > >       composed of checking the root message name is the same
> between
> > > > > > the current
> > > > > >       schema version and the new version.
> > > > > >          - Explain briefly what is a root message and its name.
> > > > > >       - Explain the problem (list scenarios) that we have because
> > > > > protobuf
> > > > > >       native schema only supports FULL compatibility validation.
> > > > > >
> > > > > >
> > > > > > Regarding high level design - as in what you plan to do.
> > > > > > I suggest you add "High Level Design" and in it detail how you
> plan
> > > to
> > > > > > validate, per protobuf version, per compatibility check (backward,
> > > > > forward,
> > > > > > full,...).
> > > > > > I tried reading the implementation - for me , it's all over the
> > > place.
> > > > > Can
> > > > > > you please list in order what I wrote above, and list the
> validation
> > > > > rules
> > > > > > with a good explanation why you validate it like that?
> > > > > >
> > > > > > Lastly, one you have all the validation rules clearly stated, you
> can
> > > > use
> > > > > > it to document it properly so users can know what validation to
> > > expect.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Asaf
> > > > > >
> > > > > >
> > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Sorry, my mistake. I removed the code and described the design
> to
> > > > > improve
> > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can have a
> > > look.
> > > > >
> > > > > > >
> > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月22日周三 21:16写道:
> > > > > > >
> > > > > > > > I read it but you're almost directly diving into the code - it
> > > will
> > > > > > take
> > > > > > > me
> > > > > > > > hours just to reverse engineer your design.
> > > > > > > >
> > > > > > > > Can you please include a "High Level Design" section in which
> you
> > > > > > explain
> > > > > > > > how you plan to tackle any issue?
> > > > > > > > If I can read that section and explain to someone else how
> this
> > > > will
> > > > > > > work,
> > > > > > > > it means the section is complete.
> > > > > > > >
> > > > > > > > Let's leave the code to the PRs.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <
> > > liusinan1...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I made a PIP to discuss:
> > > > > > https://github.com/apache/pulsar/issues/19565
> > > > > > > .
> > > > > > > > >
> > > > > > > > > We can talk about the current design here. Especially for
> the
> > > > field
> > > > > > > type
> > > > > > > > > change check rules, please give your valuable advice.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Sinan
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >

Reply via email to