>
> Can you please explain how a Protobuf Schema descriptor can be validated
> for backward compatibility check using Avro based compatibility rules?
> Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> descriptor?
> Is there some translation happening?


1. *You can take a quick look at the previous design, the PROTOBUF uses
avro struct to store.*
https://github.com/apache/pulsar/pull/1954
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115

2. *On the broker side, protobuf and avro both use `SchemaData` converted
to `org.apache.avro.Schema`.*
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70



I'm sorry - I don't understand.
> I understand the different compatibility check strategies.
> If you just spell them out here, then as you say, just translate the
> Protobuf Descriptor into an Avro schema and run the Avro
> compatibility validation, no?
> I believe the answer is no, since you may want to verify different things
> when it comes to Protobuf, which are different then Avro.


1.
*ProtobufSchema is different from ProtobufNativeSchema in that it uses
avro-protobuf.*
https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
*ProtobufNativeSchema needs a native compatibility check, but there is no
official or third party implementation. So this PIP does not use
avro-protobuf for protobuf compatibility checking.*

2. *By the way, this is implemented in much the same way that Apache avro
does compatibility checking.*
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
`canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
*In `ValidateMutualRead.java`, the arguments of `canRead()` are
writtenSchema and readSchema. We only need to change the order of arguments
we pass to `canRead()`.*
```java
private void validateWithStrategy(Descriptors.Descriptor toValidate,
Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException
{
switch (strategy) {
case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
case CanBeReadMutualStrategy -> {
canRead(toValidate, fromDescriptor);
canRead(fromDescriptor, toValidate);
}
}
}

private void canRead(Descriptors.Descriptor writtenSchema,
Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema,
readSchema);
}
```


Thanks,
sinan



Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月1日周三 21:19写道:
>
> On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1...@gmail.com> wrote:
>
> > >
> > > I read it and they look identical. What's the difference between them?
> >
> > Current avro,json, and protobuf schemas are all implemented based on
AVRO.
> > > What do you mean, they are all implemented based on Avro? You mean the
> > > protobuf schema is converted into an Avro Schema, and then you use
Avro
> > > compatibility validation?
> >
> >
> >
`org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > They all extends `AvroSchemaBasedCompatibilityCheck`, the
> > `checkCompatible()` is the same implementation with `AVRO`.
> >
>
> Can you please explain how a Protobuf Schema descriptor can be validated
> for backward compatibility check using Avro based compatibility rules?
> Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> descriptor?
> Is there some translation happening?
>
>
>
> >
> >
> > I think you should structure the validation rules differently:
> >
> >
> > The Compatibility check strategy is described on the website
> >
> >
https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can
process
> > data written by producers using the last schema version V2. So V2 is
> > "writtenSchema" and V3 is "readSchema".
> > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last schema
> > version V2 can process data written by producers using a new schema V3,
> > even though they may not be able to use the full capabilities of the new
> > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and forward
> > compatible.
> > Schema can evolve. The old version schema and the new version schema
should
> > be well understood.
> >
> >
> I'm sorry - I don't understand.
> I understand the different compatibility check strategies.
> If you just spell them out here, then as you say, just translate the
> Protobuf Descriptor into an Avro schema and run the Avro
> compatibility validation, no?
> I believe the answer is no, since you may want to verify different things
> when it comes to Protobuf, which are different then Avro.
>
> At the current state, I can't understand your design at all. Please help
> clarify that.
>
>
>
>
>
> >
> > So each strategy should have its own section.
> >
> >
> > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > just described, we only need to change the order of arguments we pass to
> > `canRead()`.
> >
> >
> >
> > Thanks,
> > sinan
> >
> >
> > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月27日周一 20:49写道:
> >
> > > >
> > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > >
> > > >
https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > >
> > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > >
> > >  I read it and they look identical. What's the difference between
them?
> > >
> > > Current avro,json, and protobuf schemas are all implemented based on
> > AVRO.
> > >
> > > What do you mean, they are all implemented based on Avro? You mean the
> > > protobuf schema is converted into an Avro Schema, and then you use
Avro
> > > compatibility validation?
> > >
> > >
> > > > *Here are the basic compatibility rules we've defined:*
> > >
> > >
> > > I think you should structure the validation rules differently:
> > >
> > > * Backward checks
> > > ** List down rules, where use newSchema (the schema used by producer
or
> > > consumer) and existingSchema (last schema used)
> > > * Forward
> > > ** List down rules, where use newSchema (the schema used by producer
or
> > > consumer) and existingSchema (last schema used)
> > >
> > > So each strategy should have its own section.
> > >
> > > I'm saying this since you used "writttenSchema" word but it represents
> > > something completely different if it's backward or forward check.
> > >
> > > Once you'll have that structure like that, I personally will be able
to
> > > read and understand it.
> > >
> > >
> > > The motivation and problem statement are now good - thanks for
improving
> > > it.
> > >
> > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1...@gmail.com>
> > wrote:
> > >
> > > > Hi! I updated the PIP issue again. This time I've added some
background
> > > and
> > > > some explanations.
> > > >
> > > > The compatibility check rules are already written in the
> > Implementation.
> > > > ProtoBufNative implements the same canRead method as Apache Avro.
> > > > It does this by checking whether the schema for writing and reading
is
> > > > compatible. I also indicate whether the writtenSchema and
readSchema of
> > > the
> > > > Backward, Forward, and Full strategies are the old or the new
version
> > of
> > > > the schema.
> > > >
> > > > Thanks,
> > > > sinan
> > > >
> > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月26日周日 23:24写道:
> > > >
> > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so
you
> > > need
> > > > to
> > > > > add IMO for people to understand it. You don't need to explain the
> > > entire
> > > > > pulsar in this PIP, but at the very least a few paragraphs
detailing
> > > all
> > > > > you need to know, to put you in context:
> > > > >
> > > > >
> > > > >    - Start by saying Pulsar as a built-in schema registry inside
> > Pulsar
> > > > >    broker.
> > > > >       - Every time the client updates the schema, it uploads it to
> > the
> > > > >       broker. When that happens, it has a feature which validates
if
> > > the
> > > > > new
> > > > >       schema version is compatible with the previous versions.
There
> > > > > are 4 types
> > > > >       of compatibility: Full, ... (complete and explain each one
> > > briefly)
> > > > >    - Also explain Pulsar Schema registry supports various schema
> > > > >    protocols:  Avro, protobuf native, ... (complete the rest),
each
> > > > > protocol
> > > > >    has a schema which dictates how to serialize and deserialize
the
> > > > message
> > > > >    content into typed object.
> > > > >    - Explain in short what is protobuf native (compare protobuf
> > > > non-native)
> > > > >    - Please don't paste code instead of explaining.
> > > > >       - Explain that protobuf native current validation check is
only
> > > > >       composed of checking the root message name is the same
between
> > > > > the current
> > > > >       schema version and the new version.
> > > > >          - Explain briefly what is a root message and its name.
> > > > >       - Explain the problem (list scenarios) that we have because
> > > > protobuf
> > > > >       native schema only supports FULL compatibility validation.
> > > > >
> > > > >
> > > > > Regarding high level design - as in what you plan to do.
> > > > > I suggest you add "High Level Design" and in it detail how you
plan
> > to
> > > > > validate, per protobuf version, per compatibility check (backward,
> > > > forward,
> > > > > full,...).
> > > > > I tried reading the implementation - for me , it's all over the
> > place.
> > > > Can
> > > > > you please list in order what I wrote above, and list the
validation
> > > > rules
> > > > > with a good explanation why you validate it like that?
> > > > >
> > > > > Lastly, one you have all the validation rules clearly stated, you
can
> > > use
> > > > > it to document it properly so users can know what validation to
> > expect.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Asaf
> > > > >
> > > > >
> > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Sorry, my mistake. I removed the code and described the design
to
> > > > improve
> > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can have a
> > look.
> > > >
> > > > > >
> > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月22日周三 21:16写道:
> > > > > >
> > > > > > > I read it but you're almost directly diving into the code - it
> > will
> > > > > take
> > > > > > me
> > > > > > > hours just to reverse engineer your design.
> > > > > > >
> > > > > > > Can you please include a "High Level Design" section in which
you
> > > > > explain
> > > > > > > how you plan to tackle any issue?
> > > > > > > If I can read that section and explain to someone else how
this
> > > will
> > > > > > work,
> > > > > > > it means the section is complete.
> > > > > > >
> > > > > > > Let's leave the code to the PRs.
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <
> > liusinan1...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I made a PIP to discuss:
> > > > > https://github.com/apache/pulsar/issues/19565
> > > > > > .
> > > > > > > >
> > > > > > > > We can talk about the current design here. Especially for
the
> > > field
> > > > > > type
> > > > > > > > change check rules, please give your valuable advice.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Sinan
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >

Reply via email to