> > Can you please explain how a Protobuf Schema descriptor can be validated > for backward compatibility check using Avro based compatibility rules? > Doesn't it expect the schema to be Avro, but it is actually a Protobuf > descriptor? > Is there some translation happening?
1. *You can take a quick look at the previous design, the PROTOBUF uses avro struct to store.* https://github.com/apache/pulsar/pull/1954 https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61 https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115 2. *On the broker side, protobuf and avro both use `SchemaData` converted to `org.apache.avro.Schema`.* https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293 https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31 https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70 I'm sorry - I don't understand. > I understand the different compatibility check strategies. > If you just spell them out here, then as you say, just translate the > Protobuf Descriptor into an Avro schema and run the Avro > compatibility validation, no? > I believe the answer is no, since you may want to verify different things > when it comes to Protobuf, which are different then Avro. 1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses avro-protobuf.* https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview *ProtobufNativeSchema needs a native compatibility check, but there is no official or third party implementation. So this PIP does not use avro-protobuf for protobuf compatibility checking.* 2. *By the way, this is implemented in much the same way that Apache avro does compatibility checking.* https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy` https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java *In `ValidateMutualRead.java`, the arguments of `canRead()` are writtenSchema and readSchema. We only need to change the order of arguments we pass to `canRead()`.* ```java private void validateWithStrategy(Descriptors.Descriptor toValidate, Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException { switch (strategy) { case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate); case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor); case CanBeReadMutualStrategy -> { canRead(toValidate, fromDescriptor); canRead(fromDescriptor, toValidate); } } } private void canRead(Descriptors.Descriptor writtenSchema, Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException { ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema); } ``` Thanks, sinan Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月1日周三 21:19写道: > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1...@gmail.com> wrote: > > > > > > > I read it and they look identical. What's the difference between them? > > > > Current avro,json, and protobuf schemas are all implemented based on AVRO. > > > What do you mean, they are all implemented based on Avro? You mean the > > > protobuf schema is converted into an Avro Schema, and then you use Avro > > > compatibility validation? > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck` > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck` > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck` > > They all extends `AvroSchemaBasedCompatibilityCheck`, the > > `checkCompatible()` is the same implementation with `AVRO`. > > > > Can you please explain how a Protobuf Schema descriptor can be validated > for backward compatibility check using Avro based compatibility rules? > Doesn't it expect the schema to be Avro, but it is actually a Protobuf > descriptor? > Is there some translation happening? > > > > > > > > > I think you should structure the validation rules differently: > > > > > > The Compatibility check strategy is described on the website > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can process > > data written by producers using the last schema version V2. So V2 is > > "writtenSchema" and V3 is "readSchema". > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last schema > > version V2 can process data written by producers using a new schema V3, > > even though they may not be able to use the full capabilities of the new > > schema. So V3 is "writtenSchema" and V2 is "readSchema". > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and forward > > compatible. > > Schema can evolve. The old version schema and the new version schema should > > be well understood. > > > > > I'm sorry - I don't understand. > I understand the different compatibility check strategies. > If you just spell them out here, then as you say, just translate the > Protobuf Descriptor into an Avro schema and run the Avro > compatibility validation, no? > I believe the answer is no, since you may want to verify different things > when it comes to Protobuf, which are different then Avro. > > At the current state, I can't understand your design at all. Please help > clarify that. > > > > > > > > > So each strategy should have its own section. > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've > > just described, we only need to change the order of arguments we pass to > > `canRead()`. > > > > > > > > Thanks, > > sinan > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月27日周一 20:49写道: > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative: > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf > > > > > > > I read it and they look identical. What's the difference between them? > > > > > > Current avro,json, and protobuf schemas are all implemented based on > > AVRO. > > > > > > What do you mean, they are all implemented based on Avro? You mean the > > > protobuf schema is converted into an Avro Schema, and then you use Avro > > > compatibility validation? > > > > > > > > > > *Here are the basic compatibility rules we've defined:* > > > > > > > > > I think you should structure the validation rules differently: > > > > > > * Backward checks > > > ** List down rules, where use newSchema (the schema used by producer or > > > consumer) and existingSchema (last schema used) > > > * Forward > > > ** List down rules, where use newSchema (the schema used by producer or > > > consumer) and existingSchema (last schema used) > > > > > > So each strategy should have its own section. > > > > > > I'm saying this since you used "writttenSchema" word but it represents > > > something completely different if it's backward or forward check. > > > > > > Once you'll have that structure like that, I personally will be able to > > > read and understand it. > > > > > > > > > The motivation and problem statement are now good - thanks for improving > > > it. > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1...@gmail.com> > > wrote: > > > > > > > Hi! I updated the PIP issue again. This time I've added some background > > > and > > > > some explanations. > > > > > > > > The compatibility check rules are already written in the > > Implementation. > > > > ProtoBufNative implements the same canRead method as Apache Avro. > > > > It does this by checking whether the schema for writing and reading is > > > > compatible. I also indicate whether the writtenSchema and readSchema of > > > the > > > > Backward, Forward, and Full strategies are the old or the new version > > of > > > > the schema. > > > > > > > > Thanks, > > > > sinan > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月26日周日 23:24写道: > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so you > > > need > > > > to > > > > > add IMO for people to understand it. You don't need to explain the > > > entire > > > > > pulsar in this PIP, but at the very least a few paragraphs detailing > > > all > > > > > you need to know, to put you in context: > > > > > > > > > > > > > > > - Start by saying Pulsar as a built-in schema registry inside > > Pulsar > > > > > broker. > > > > > - Every time the client updates the schema, it uploads it to > > the > > > > > broker. When that happens, it has a feature which validates if > > > the > > > > > new > > > > > schema version is compatible with the previous versions. There > > > > > are 4 types > > > > > of compatibility: Full, ... (complete and explain each one > > > briefly) > > > > > - Also explain Pulsar Schema registry supports various schema > > > > > protocols: Avro, protobuf native, ... (complete the rest), each > > > > > protocol > > > > > has a schema which dictates how to serialize and deserialize the > > > > message > > > > > content into typed object. > > > > > - Explain in short what is protobuf native (compare protobuf > > > > non-native) > > > > > - Please don't paste code instead of explaining. > > > > > - Explain that protobuf native current validation check is only > > > > > composed of checking the root message name is the same between > > > > > the current > > > > > schema version and the new version. > > > > > - Explain briefly what is a root message and its name. > > > > > - Explain the problem (list scenarios) that we have because > > > > protobuf > > > > > native schema only supports FULL compatibility validation. > > > > > > > > > > > > > > > Regarding high level design - as in what you plan to do. > > > > > I suggest you add "High Level Design" and in it detail how you plan > > to > > > > > validate, per protobuf version, per compatibility check (backward, > > > > forward, > > > > > full,...). > > > > > I tried reading the implementation - for me , it's all over the > > place. > > > > Can > > > > > you please list in order what I wrote above, and list the validation > > > > rules > > > > > with a good explanation why you validate it like that? > > > > > > > > > > Lastly, one you have all the validation rules clearly stated, you can > > > use > > > > > it to document it properly so users can know what validation to > > expect. > > > > > > > > > > Thanks, > > > > > > > > > > Asaf > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1...@gmail.com> > > > > wrote: > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to > > > > improve > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can have a > > look. > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月22日周三 21:16写道: > > > > > > > > > > > > > I read it but you're almost directly diving into the code - it > > will > > > > > take > > > > > > me > > > > > > > hours just to reverse engineer your design. > > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you > > > > > explain > > > > > > > how you plan to tackle any issue? > > > > > > > If I can read that section and explain to someone else how this > > > will > > > > > > work, > > > > > > > it means the section is complete. > > > > > > > > > > > > > > Let's leave the code to the PRs. > > > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu < > > liusinan1...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > I made a PIP to discuss: > > > > > https://github.com/apache/pulsar/issues/19565 > > > > > > . > > > > > > > > > > > > > > > > We can talk about the current design here. Especially for the > > > field > > > > > > type > > > > > > > > change check rules, please give your valuable advice. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Sinan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >