Can you convert the code block which is actually a quote in the
beginning of the PIP to something which doesn't require to scroll
horizontally so much?
Use
https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text

Let's improve the clarity of what you wrote:

"the PROTOBUF uses avro struct to store."
-->
When Schema type PROTOBUF is used, Pulsar Client assumes the object given
to it as message data is an auto-generated POJO containing the annotations
encoding the schema. The client is using a converter, which converts a
Protobuf schema descriptor into an Avro schema and sends that as the Schema
of the producer/consumer.

"On the broker side, protobuf and avro both use SchemaData converted to
org.apache.avro.Schema."
-->
Since the schema is an Avro schema, the implementation of compatibility
check on the broker side is to simply re-use the compatibility check of the
AVRO schema type.

"ProtobufSchema is different from ProtobufNativeSchema in schema
compatibility check it uses avro-protobuf.
https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
But the current implementation of ProtobufNative schema compatibility
check only
checked if the root message name is changed."

-->
PROTOBUF_NATIVE schema type is different.
The client is actually using Protobuf Descriptor as the schema, as opposed
to Avro schema of PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE
compatibility check actually hasn't implemented any rule, besides one:
checking if the root message name has changed.



>    1. For now, there is no official or third-party solution for ProtoBuf
>    compatibility. If in the future have better solutions of a third party or
>    the official, we develop new ProtobufNativeSchemaValidator and use, so
>    add a flag.
>
> Who do you need to make that configurable? Once you found a third party,
just switch to it? Who knows, maybe you never will. Introduce it when you
find it, not now.


We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD, FORWARD
> these strategies. As with the AVRO implementation, protobuf compatibility
> checking need implementing the canRead method. *This will check that
> the writtenschema can be read by readSchema.*


I completely disagree.
Avro implementation is confusing for our use case. Don't copy that.

You have

public void checkCompatible(SchemaData from, SchemaData to,
SchemaCompatibilityStrategy strategy)
        throws IncompatibleSchemaException {
    Descriptor fromDescriptor =
ProtobufNativeSchemaUtils.deserialize(from.getData());
    Descriptor toDescriptor =
ProtobufNativeSchemaUtils.deserialize(to.getData());
    switch (strategy) {
        case BACKWARD_TRANSITIVE:
        case BACKWARD:
        case FORWARD_TRANSITIVE:
        case FORWARD:
        case FULL_TRANSITIVE:
        case FULL:
            checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
            return;
        case ALWAYS_COMPATIBLE:
            return;
        default:
            throw new IncompatibleSchemaException("Unknown
SchemaCompatibilityStrategy.");
    }
}

I would rename :
from --> currentSchema
to --> newSchema

Use that switch case and have a method for each like:
validateBackwardsCompatibility(currentSchema, newSchema)

I dislike canRead and usage of writtenSchema, since you have two completely
different use cases: from the producing side and the consumer side.

schemaValidatorBuilder
>
> I dislike this proposal. IMO Avro implementation is way too complicated.
Why not have a simple function for validation for each switch case above?
Why do we need strategy and builder, and all this complexity?


*Here are the basic compatibility rules we've defined:*


IMO it's impossible to read the validation rules as you described them.
I wrote how they should be structured numerous times above.
I can't validate them.


IMO, the current design is very hard to read.
Please try to avoid jumping into code sections.
Write a high level design section, in which you describe in words what you
plan to do.
Write the validation rules in the structure that is easy to understand:
rules per each compatibility check, and use proper words (current schema,
new schema), since new schema can be once used for read and once used for
write.

In its current form it takes too much time to understand the design, and it
shouldn't be the case.

Thanks,

Asaf


>



On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1...@gmail.com> wrote:

> Hi! I updated the explanation of some things in the PIP issue. And also
> added a new “flag” in the conf is used as the different
> ProtobufNativeSchemaValidator implementation, also set
> ProtobufNativeSchemaValidator default only check whether the name of the
> root message is the same.
>
>
> Thanks,
> sinan
>
>
> Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月5日周日 20:21写道:
>
> > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1...@gmail.com> wrote:
> >
> > > >
> > > > Can you please explain how a Protobuf Schema descriptor can be
> > validated
> > > > for backward compatibility check using Avro based compatibility
> rules?
> > > > Doesn't it expect the schema to be Avro, but it is actually a
> Protobuf
> > > > descriptor?
> > > > Is there some translation happening?
> > >
> > >
> > > 1. *You can take a quick look at the previous design, the PROTOBUF uses
> > > avro struct to store.*
> > > https://github.com/apache/pulsar/pull/1954
> > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> >
> >
> > Ok. So to summarize your code (easier to write it than send links):
> > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > Protobuf descriptor into an Avro Schema (using code found inside Avro
> > library) and saves that Avro schema as the schema. It's not saving the
> > protobuf descriptor at all. Very confusing I have to add - never expected
> > that.
> > This explains why In the ProtobufSchemaCompatibilityCheck they just
> extend
> > the Avro without doing any translation.
> >
> > Thanks for that.
> >
> > Now thatI finally understand this, I can say that: you *must* explain
> that
> > in the motivation part in your PIP.
> >
> >
> >
> > >
> > >
> > > 2. *On the broker side, protobuf and avro both use `SchemaData`
> converted
> > > to `org.apache.avro.Schema`.*
> > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> >
> >
> > Actually those links don't really help.
> > The main link that helps is:
> >
> >
> https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> >
> >
> > >
> > >
> > >
> > >
> > > I'm sorry - I don't understand.
> > > > I understand the different compatibility check strategies.
> > > > If you just spell them out here, then as you say, just translate the
> > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > compatibility validation, no?
> > > > I believe the answer is no, since you may want to verify different
> > things
> > > > when it comes to Protobuf, which are different then Avro.
> > >
> > >
> > > 1.
> > > *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > > avro-protobuf.*
> > >
> > >
> >
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > *ProtobufNativeSchema needs a native compatibility check, but there is
> no
> > > official or third party implementation. So this PIP does not use
> > > avro-protobuf for protobuf compatibility checking.*
> > >
> > > 2. *By the way, this is implemented in much the same way that Apache
> avro
> > > does compatibility checking.*
> > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
> > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > writtenSchema and readSchema. We only need to change the order of
> > arguments
> > > we pass to `canRead()`.*
> > > ```java
> > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > Descriptors.Descriptor fromDescriptor) throws
> > ProtoBufCanReadCheckException
> > > {
> > > switch (strategy) {
> > > case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > case CanBeReadByExistingStrategy -> canRead(toValidate,
> fromDescriptor);
> > > case CanBeReadMutualStrategy -> {
> > > canRead(toValidate, fromDescriptor);
> > > canRead(fromDescriptor, toValidate);
> > > }
> > > }
> > > }
> > >
> > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > Descriptors.Descriptor readSchema) throws
> ProtoBufCanReadCheckException {
> > >
> >
> ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema,
> > > readSchema);
> > > }
> > > ```
> > >
> > >
> > I get that you want to take inspiration from the existing Avro Schema
> > compatibility check, to do your code design.
> > I also understand you *won't* use any existing avro code for that.
> > I also understand, you have to write the validation check on your own,
> > since there is no 3rd party to explain that.
> >
> > The only thing I can't understand are the actual rules as I wrote before,
> > since they are written confusingly.
> > So, I repeat what I asked before:
> >
> > I think you should structure the validation rules differently:
> >
> > * Backward checks
> > ** List down rules, where use newSchema (the schema used by producer or
> > consumer) and existingSchema (last schema used)
> > * Forward
> > ** List down rules, where use newSchema (the schema used by producer or
> > consumer) and existingSchema (last schema used)
> >
> > Once that's accomplished I will be able to understand the different
> > validation rules for each compatibility check.
> >
> >
> >
> >
> >
> >
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > >
> > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月1日周三 21:19写道:
> > > >
> > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1...@gmail.com>
> > > wrote:
> > > >
> > > > > >
> > > > > > I read it and they look identical. What's the difference between
> > > them?
> > > > >
> > > > > Current avro,json, and protobuf schemas are all implemented based
> on
> > > AVRO.
> > > > > > What do you mean, they are all implemented based on Avro? You
> mean
> > > the
> > > > > > protobuf schema is converted into an Avro Schema, and then you
> use
> > > Avro
> > > > > > compatibility validation?
> > > > >
> > > > >
> > > > >
> > >
> >
> `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > >
> > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > >
> > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > They all extends `AvroSchemaBasedCompatibilityCheck`, the
> > > > > `checkCompatible()` is the same implementation with `AVRO`.
> > > > >
> > > >
> > > > Can you please explain how a Protobuf Schema descriptor can be
> > validated
> > > > for backward compatibility check using Avro based compatibility
> rules?
> > > > Doesn't it expect the schema to be Avro, but it is actually a
> Protobuf
> > > > descriptor?
> > > > Is there some translation happening?
> > > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > I think you should structure the validation rules differently:
> > > > >
> > > > >
> > > > > The Compatibility check strategy is described on the website
> > > > >
> > > > >
> > >
> > >
> >
> https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can
> > > process
> > > > > data written by producers using the last schema version V2. So V2
> is
> > > > > "writtenSchema" and V3 is "readSchema".
> > > > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last
> > > schema
> > > > > version V2 can process data written by producers using a new schema
> > V3,
> > > > > even though they may not be able to use the full capabilities of
> the
> > > new
> > > > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and
> > forward
> > > > > compatible.
> > > > > Schema can evolve. The old version schema and the new version
> schema
> > > should
> > > > > be well understood.
> > > > >
> > > > >
> > > > I'm sorry - I don't understand.
> > > > I understand the different compatibility check strategies.
> > > > If you just spell them out here, then as you say, just translate the
> > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > compatibility validation, no?
> > > > I believe the answer is no, since you may want to verify different
> > things
> > > > when it comes to Protobuf, which are different then Avro.
> > > >
> > > > At the current state, I can't understand your design at all. Please
> > help
> > > > clarify that.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > So each strategy should have its own section.
> > > > >
> > > > >
> > > > > The arguments of `canRead()` are writtenSchema and readSchema. As
> > we've
> > > > > just described, we only need to change the order of arguments we
> pass
> > > to
> > > > > `canRead()`.
> > > > >
> > > > >
> > > > >
> > > > > Thanks,
> > > > > sinan
> > > > >
> > > > >
> > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月27日周一 20:49写道:
> > > > >
> > > > > > >
> > > > > > > And you can see the difference between ProtoBuf and
> > ProtoBufNative:
> > > > > > >
> > > > > > >
> > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > >
> > > > > > >
> https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > >
> > > > > >  I read it and they look identical. What's the difference between
> > > them?
> > > > > >
> > > > > > Current avro,json, and protobuf schemas are all implemented based
> > on
> > > > > AVRO.
> > > > > >
> > > > > > What do you mean, they are all implemented based on Avro? You
> mean
> > > the
> > > > > > protobuf schema is converted into an Avro Schema, and then you
> use
> > > Avro
> > > > > > compatibility validation?
> > > > > >
> > > > > >
> > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > >
> > > > > >
> > > > > > I think you should structure the validation rules differently:
> > > > > >
> > > > > > * Backward checks
> > > > > > ** List down rules, where use newSchema (the schema used by
> > producer
> > > or
> > > > > > consumer) and existingSchema (last schema used)
> > > > > > * Forward
> > > > > > ** List down rules, where use newSchema (the schema used by
> > producer
> > > or
> > > > > > consumer) and existingSchema (last schema used)
> > > > > >
> > > > > > So each strategy should have its own section.
> > > > > >
> > > > > > I'm saying this since you used "writttenSchema" word but it
> > > represents
> > > > > > something completely different if it's backward or forward check.
> > > > > >
> > > > > > Once you'll have that structure like that, I personally will be
> > able
> > > to
> > > > > > read and understand it.
> > > > > >
> > > > > >
> > > > > > The motivation and problem statement are now good - thanks for
> > > improving
> > > > > > it.
> > > > > >
> > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <
> liusinan1...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi! I updated the PIP issue again. This time I've added some
> > > background
> > > > > > and
> > > > > > > some explanations.
> > > > > > >
> > > > > > > The compatibility check rules are already written in the
> > > > > Implementation.
> > > > > > > ProtoBufNative implements the same canRead method as Apache
> Avro.
> > > > > > > It does this by checking whether the schema for writing and
> > reading
> > > is
> > > > > > > compatible. I also indicate whether the writtenSchema and
> > > readSchema of
> > > > > > the
> > > > > > > Backward, Forward, and Full strategies are the old or the new
> > > version
> > > > > of
> > > > > > > the schema.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > sinan
> > > > > > >
> > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月26日周日 23:24写道:
> > > > > > >
> > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge,
> so
> > > you
> > > > > > need
> > > > > > > to
> > > > > > > > add IMO for people to understand it. You don't need to
> explain
> > > the
> > > > > > entire
> > > > > > > > pulsar in this PIP, but at the very least a few paragraphs
> > > detailing
> > > > > > all
> > > > > > > > you need to know, to put you in context:
> > > > > > > >
> > > > > > > >
> > > > > > > >    - Start by saying Pulsar as a built-in schema registry
> > inside
> > > > > Pulsar
> > > > > > > >    broker.
> > > > > > > >       - Every time the client updates the schema, it uploads
> it
> > > to
> > > > > the
> > > > > > > >       broker. When that happens, it has a feature which
> > validates
> > > if
> > > > > > the
> > > > > > > > new
> > > > > > > >       schema version is compatible with the previous
> versions.
> > > There
> > > > > > > > are 4 types
> > > > > > > >       of compatibility: Full, ... (complete and explain each
> > one
> > > > > > briefly)
> > > > > > > >    - Also explain Pulsar Schema registry supports various
> > schema
> > > > > > > >    protocols:  Avro, protobuf native, ... (complete the
> rest),
> > > each
> > > > > > > > protocol
> > > > > > > >    has a schema which dictates how to serialize and
> deserialize
> > > the
> > > > > > > message
> > > > > > > >    content into typed object.
> > > > > > > >    - Explain in short what is protobuf native (compare
> protobuf
> > > > > > > non-native)
> > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > >       - Explain that protobuf native current validation check
> > is
> > > only
> > > > > > > >       composed of checking the root message name is the same
> > > between
> > > > > > > > the current
> > > > > > > >       schema version and the new version.
> > > > > > > >          - Explain briefly what is a root message and its
> name.
> > > > > > > >       - Explain the problem (list scenarios) that we have
> > because
> > > > > > > protobuf
> > > > > > > >       native schema only supports FULL compatibility
> > validation.
> > > > > > > >
> > > > > > > >
> > > > > > > > Regarding high level design - as in what you plan to do.
> > > > > > > > I suggest you add "High Level Design" and in it detail how
> you
> > > plan
> > > > > to
> > > > > > > > validate, per protobuf version, per compatibility check
> > > (backward,
> > > > > > > forward,
> > > > > > > > full,...).
> > > > > > > > I tried reading the implementation - for me , it's all over
> the
> > > > > place.
> > > > > > > Can
> > > > > > > > you please list in order what I wrote above, and list the
> > > validation
> > > > > > > rules
> > > > > > > > with a good explanation why you validate it like that?
> > > > > > > >
> > > > > > > > Lastly, one you have all the validation rules clearly stated,
> > you
> > > can
> > > > > > use
> > > > > > > > it to document it properly so users can know what validation
> to
> > > > > expect.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Asaf
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <
> > > liusinan1...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Sorry, my mistake. I removed the code and described the
> > design
> > > to
> > > > > > > improve
> > > > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can
> > have a
> > > > > look.
> > > > > > >
> > > > > > > > >
> > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月22日周三 21:16写道:
> > > > > > > > >
> > > > > > > > > > I read it but you're almost directly diving into the
> code -
> > > it
> > > > > will
> > > > > > > > take
> > > > > > > > > me
> > > > > > > > > > hours just to reverse engineer your design.
> > > > > > > > > >
> > > > > > > > > > Can you please include a "High Level Design" section in
> > which
> > > you
> > > > > > > > explain
> > > > > > > > > > how you plan to tackle any issue?
> > > > > > > > > > If I can read that section and explain to someone else
> how
> > > this
> > > > > > will
> > > > > > > > > work,
> > > > > > > > > > it means the section is complete.
> > > > > > > > > >
> > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <
> > > > > liusinan1...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > I made a PIP to discuss:
> > > > > > > > https://github.com/apache/pulsar/issues/19565
> > > > > > > > > .
> > > > > > > > > > >
> > > > > > > > > > > We can talk about the current design here. Especially
> for
> > > the
> > > > > > field
> > > > > > > > > type
> > > > > > > > > > > change check rules, please give your valuable advice.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Sinan
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>

Reply via email to