Also:

(2) PROTOBUF_NATIVE was designed so that not use avro-protobuf for protobuf
> schema compatibility checking.

The root message name is the class name we pass in when we create the
> producer or consumer. ProtoBuf has many nested messages or dependencies.
> The current implementation only checks if the passed class name is the
> same. It does not check if the fields in the file change in a way that is
> compatible with older versions of the schema.


You missed the most important thing there: PROTOBUF_NATIVE uses Protobuf
Descriptor when persisting the schema. It's not using Avro Schema
definition. As I wrote you in previous email:

PROTOBUF_NATIVE was created to fix that shortcoming, by actually persisting
> the Protobuf Descriptor and using Protobuf for encoding.
>





On Wed, Mar 15, 2023 at 5:46 PM SiNan Liu <liusinan1...@gmail.com> wrote:

> 1.
>
> > > Why? The root message name is not written over the wire to the best of
> my
> > > knowledge. I haven't found it written in the official doc.
>
>
> The name of the root message check is the rules in the previous
> `ProtobufNativeSchemaCompatibilityCheck`. Because if the root message has a
> different name, there is no need to check its contents. "Same" schema,
> their names must be the same.
>
>
> 2.
>
> > >The writtenSchema can not change the field number of any field in
> > readSchema (the > field name is the same, but the field number is
> > different).
> > >You have to take into account field type as well when comparing.
>
>
> The first sentence on the website says that the number of fields cannot be
> changed.
>
> > Don’t change the field numbers for any existing fields.
>
>
>
> 3.
>
> > > - The writtenSchema cannot add required fields, but optional or
> > > duplicate fields can be added (The field number must be new).
> > >
> > > That's not true.
> > You can have a required field in writeSchema, and not have that field in
> > readSchema (based on tag number).
>
>
> The required field must exist. In your case, where does readSchema go to
> read its required fields? It's not in writtenSchema.
> The second and third sentences in the official website say, add and delete
> do not operate required field!
>
>
> 4.
>
> > (4) The writtenSchema can not change the field name of any field in
> > > readSchema (the field number is the same, but the field name is
> > > different).
> > This is incorrect.
> > Fields names are encoded into the wire. I don't see this in any best
> > practice.
>
>
> The third sentence on the website:
>
> > You may want to rename the field instead, perhaps adding the prefix
> > “OBSOLETE_”, or make the field number reserved, so that future users of
> > your .proto can’t accidentally reuse the number.
>
> If you want to rename a field, or add a new field. To delete with the new
> number!
>
>
> 5.
>
> > The writtenSchema does not change the field name and number, but it does
> > change the field type.
> > > Small correction: for the same field number you are not allowed to
> change
> > types. Name is irrelevant.
>
>
> Why doesn't the name irrelevant?
> Here is the change in type, which is the rule stated in Alternatives. There
> is no check here, just a warning to the user.
> Another change is that the name of enum is changed, or the name of MESSAGE
> is changed, which is the same as the root message name check in 1, is the
> change still the same? This is not allowed to change!
>
> *Here is example:*
> readSchema(
>
> https://github.com/apache/pulsar/pull/19566/files#diff-a7006d73502e6064a80af02822f3a3072be498d8b677c4b838b0dafaea32dea4
> )
> writtenSchema(
>
> https://github.com/apache/pulsar/pull/19566/files#diff-e3e7543624edaf1e0a4fd47947a2cad6e4b816b93843f71a367042ba6c3ec53f
> )
>
>
> 6.
>
> > (6) The writtenSchema removes fields that do not have default values in
> > > readSchema. Then the schema is incompatible.
> > Protobuf gives you its own default if you don't supply one. This is
> > incorrect.
>
>
> (1) This rule only applies if proto2 does not set the default value. If
> proto3 does not check, the default value will always be there.
>
> (2) In PIP issue:
>
> > Proto3 canceled the required field, so there is no need to check the
> > required field. We get the syntax(proto2 or proto3) of proto in the code,
> > and skip the check of the required field if it is proto3. All other
> > checking rules also apply to proto3.
>
>
> *I made a mistake here. This default value check is not need in proto3. I
> will modify the rules later according to your suggestion.*
>
> > I would remove the proto2/proto3 sections, since they only differ in 1
> > rule, and just mention that distinction inside that rule (less work for
> the
> > reade).
>
>
> (3) And add rules that look like they should be checked:
>
> > Rules that you don't have in the doc, but should IMO*
> > .......
>
> There can't be a field in writerSchema, that exists in readerSchema (tag
> > number based), which in writerSchema its type is scalar, but in
> readSchema
> > its type is scalar, it's repeated but with packed=true.
>
>
> But I don't think rule number three needs to be added.
>
>
> Thanks,
> sinan
>
>
>
> Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月14日周二 22:33写道:
>
> > Hi Sinan,
> >
> > The doc looks much better!
> >
> > I have a few additional comments:
> >
> > Pasting comment from previous emails:
> >
> > Can you convert the code block which is actually a quote in the
> > beginning of the PIP to something which doesn't require to scroll
> > horizontally so much?
> > Use
> >
> >
> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> >
> > *Validation Rules*
> >
> > (1) If the root message names of writtenSchema and readSchema are
> > > different, then incompatible.
> >
> > Why? The root message name is not written over the wire to the best of my
> > knowledge. I haven't found it written in the official doc.
> >
> >
> > >    - The writtenSchema cannot add required fields, but optional or
> > >    duplicate fields can be added (The field number must be new).
> > >
> > > That's not true.
> > You can have a required field in writeSchema, and not have that field in
> > readSchema (based on tag number).
> >
> > The writtenSchema can not change the field number of any field in
> > readSchema (the
> > > field name is the same, but the field number is different).
> >
> > You have to take into account field type as well when comparing.
> >
> > If I have in readSchema
> > int32 justAnID = 1
> > int32 customerId = 2
> >
> > and in writeSchema I have
> > int32 justAnID = 1
> > string customerId = 3
> >
> > This is valid.
> >
> > (4) The writtenSchema can not change the field name of any field in
> > > readSchema (the field number is the same, but the field name is
> > > different).
> >
> > This is incorrect.
> > Fields names are encoded into the wire. I don't see this in any best
> > practice.
> >
> > ) The writtenSchema does not change the field name and number, but it
> does
> > > change the field type.
> > >
> > >    - If the field type is ENUM or MESSAGE, the schema is not compatible
> > >    when the type name is changed
> > >    - If the type of the field is another type. The schemas under this
> > >    rule are not incompatible, but warn the user.(There is another way
> of
> > >    testing in PIP issue Alternatives)
> > >
> > > Small correction: for the same field number you are not allowed to
> change
> > types. Name is irrelevant.
> >
> > (6) The writtenSchema removes fields that do not have default values in
> > > readSchema. Then the schema is incompatible.
> >
> > Protobuf gives you its own default if you don't supply one. This is
> > incorrect.
> >
> >
> > *Rules that you don't have in the doc, but should IMO*
> > * There can not be a field which exists both in readSchema and
> writeSchema,
> > with same tag number, having different default values
> > * There can't be a field in writerSchema, that exists in readerSchema
> (tag
> > number based), which in writerSchema is repeated and its type is Scalar (
> > https://protobuf.dev/programming-guides/proto/#scalar) but in readSchema
> > it
> > is not repeated anymore.
> > * There can't be a field in writerSchema, that exists in readerSchema
> (tag
> > number based), which in writerSchema its type is scalar, but in
> readSchema
> > its type is scalar, it's repeated but with packed=true.
> >
> > *Rules you have , but I would phrase a bit differently*
> >
> > I would remove the proto2/proto3 sections, since they only differ in 1
> > rule, and just mention that distinction inside that rule (less work for
> the
> > reade).
> >
> > * readSchema has a field which doesn't exist in writerSchema (based on
> tag
> > number).
> >    * Proto v2:
> >       * That field must be `optional` or `repeated` (must not be
> > `required`)
> >    * Proto v3:
> >       * No problem.
> > * There can not be a field which exists both in readSchema and
> writeSchema,
> > with the same tag number, but having different types.
> >
> > *Motivation*
> >
> > Basically in the motivation section you want people to understand the
> > following:
> >
> > Pulsar has built-in support for typed messages. It allows specifying an
> > encoding scheme and its matching schema.
> > For example, it supports Avro. You specify a schema for a given topic,
> > using Avro Schema Definition (i.e. a JSON describing the schema).
> Everytime
> > you produce a message, you first declare the schema definition you wish
> to
> > use for your messages. The message data should be an avro-encoded binary
> > data (which the client in some SDKs helps encode a given
> > data-structure/object).
> > The same applies when you consume a message. You first specify the schema
> > you use to read the messages, and the client in some SDKs helps by
> decoding
> > the message binary data into an object/data-structure.
> >
> > Each time you specify a schema to be used, either by a producer or a
> > consumer, the schema is persisted in Pulsar and given an increasing
> version
> > number. If the schema was the same as the previous version, it is not
> > saved. When the message is persisted, the version number is encoded in
> the
> > message headers.
> >
> > Pulsar provides a very useful feature named Schema Evolution
> > <
> https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution
> > >.
> > It allows us to check if a new schema version is compatible with previous
> > versions or versions. When you configure the schema for the topic you
> > decide the strategy to use for doing the validation check. The strategies
> > validate the following:
> >
> >    - BACKWARD strategy
> >       - A consumer with newSchema can read a message written using
> >       existingSchema
> >    - BACKWARD_TRANSITIVE strategy
> >       - A consumer with newSchema can read messages written using all
> >       existingSchema
> >    - FORWARD
> >       - A consumer with existingSchema can read messages written using
> >       newSchema
> >    - FORWARD_TRANSITIVE
> >       - A consumer defined with any of the existingSchema can read
> messages
> >       written using newSchema
> >    - FULL
> >       - A consumer defined with newSchema can read messages written using
> >       existingSchema
> >       - A consumer defined with existingSchema can read messages written
> >       using newSchema
> >    - FULL_TRANSITIVE
> >       - A consumer defined with newSchema can read messages written using
> >       any of the existingSchema
> >       - A consumer defined with any of the existingSchema can read
> messages
> >       written using newSchema
> >
> >
> > Aside from Avro, Pulsar also has two additional supported encodings:
> > PROTOBUF and PROTOBUF_NATIVE.
> >
> > PROTOBUF is a bit strange. It encodes the messages using Protobuf
> encoding,
> > but the schema that is persisted to Pulsar is *not* Protobuf Descriptor
> as
> > you would have expected. The saved schema is a translation of the
> Protobuf
> > Descriptor to an Avro Schema, so in fact an Avro schema definition is
> saved
> > as the schema.
> >
> > PROTOBUF_NATIVE was created to fix that shortcoming, by actually
> persisting
> > the Protobuf Descriptor and using Protobuf for encoding.
> > The problem is that the authors of PROTOBUF_NATIVE haven't completed it
> > fully, and the backward compatibility validation code almost does not
> > exist: It only checks if the root message name is the same between
> > versions.
> >
> > GOALS
> > The goal of this PIP is to amend PROTOBUF_NATIVE by adding a fully
> > functional validation for any of the defined Schema Compatibility
> > Strategies.
> > A secondary goal is to allow the user to choose between different
> > implementations: The new fully functional validation or the existing
> > barebones validation.
> >
> > -------- END
> >
> > I'm ok with having links in the Motivation , as *further reading*.
> > I'm against stacking up work for the reader to go read 5-6 different
> links
> > just to understand the motivation and background knowledge required to
> > understand the feature.
> >
> > I'm against putting code in the Motivation. Especially if it is supposed
> to
> > replace description in plain English making it easy to understand the
> > design.
> > Leave the code to the motivation.
> > Paste code only if you absolutely can't use plain old descriptions to
> > explain.
> >
> >
> >
> > On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <liusinan1...@gmail.com>
> wrote:
> >
> > > *I guess that's right, too! *
> > >
> > > But the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> better,
> > > because we don't know whether the future will have V2, V3. The official
> > > solution can be called
> `ProtobufNativeOfficialSchemaCompatibilityCheck`,
> > or
> > > is a good `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck` third-party
> > > solution.
> > >
> > > I've updated my design in PIP issue.
> > > 1. A new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather than a
> > > ProtobufNativeSchemaCompatibilityCheck different validator
> > implementation.
> > > 2. Remove the 'builder'
> > > 3. Clarify the relationship between newSchema, existingSchema, and
> > > writtenSchema in canRead.
> > >
> > > Help to see if the description is comprehensive and what changes and
> > > improvements need to be made.
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > >
> > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月9日周四 17:35写道:
> > >
> > > > I like Bo's suggestion - I haven't realized each schema type
> > > > compatibility check is actually a plugin.
> > > > It makes sense for any schema type checks to evolve, sometimes in a
> > > > non-backward compatible way hence having two plugins like
> > > > protobufNativeSchemaCompatabilityCheckV1 and then
> > > > protobufNativeSchemaCompatabilityCheckV2 and then
> > > > protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
> > > >
> > > >
> > > >
> > > > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bog...@apache.org> wrote:
> > > >
> > > > >  Hi siNan:
> > > > >
> > > > > From my point of view, it is just a plug-in. I don't think it is
> > > > > necessary to add configuration for the plugin.
> > > > > This is meaningless, and it will increase the difficulty of use for
> > > > users.
> > > > >
> > > > >
> > > > > SiNan Liu <liusinan1...@gmail.com> 于2023年3月8日周三 15:54写道:
> > > > > >
> > > > > > Hi, bo.
> > > > > >
> > > > > > 1. I understand what you say, to develop a new
> > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather than
> > > changing
> > > > > > existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a
> > few
> > > > > small
> > > > > > problems:
> > > > > >
> > > > > > (1)ProtobufNativeAdvancedSchemaCompatibilityCheck and
> > > > > > ProtobufNativeSchemaCompatibilityCheck schemaType is
> > PROTOBUF_NATIVE.
> > > > It
> > > > > > looks like both checkers are PROTOBUF not using AVRO-PROTOBUF's
> > > > "native"
> > > > > > implementation, which leads to some problems or "unreasonable"
> and
> > > > gives
> > > > > me
> > > > > > some extended thinking and questions.
> > > > > >
> > > > > `CompatibilityCheck ` its only a plugin.
> > > > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave
> > the
> > > > > stage, when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > > > > stable, we can make it the default Checker.
> > > > >
> > > > > It is just a plug-in, users can change it at will and ensure that
> it
> > > > > is used correctly
> > > > > > (2)In broker.conf
> > > > > >
> > > > > > `schemaRegistryCompatibilityCheckers`. If
> > > > > > ProtobufNativeSchemaCompatibilityCheck and
> > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck all set. This is
> > going
> > > > to
> > > > > > overwrite each other. Because this is a map:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
> > > > > >
> > > > > > ```java
> > > > > >
> > > > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new
> > HashMap<>();
> > > > > >
> > > > > > for (String className : checkerClasses) {
> > > > > >
> > > > > > SchemaCompatibilityCheck schemaCompatibilityCheck =
> > > > > > Reflections.createInstance(className,
> > > > > >
> > > > > > SchemaCompatibilityCheck.class,
> > > > > > Thread.currentThread().getContextClassLoader());
> > > > > >
> > > > > > checkers.put(schemaCompatibilityCheck.getSchemaType(),
> > > > > > schemaCompatibilityCheck);
> > > > > >
> > > > > > ```
> > > > > >
> > > > > > Is this a big problem or a small one? Is it possible or
> > unnecessary?
> > > > > Maybe
> > > > > > we can write in the documentation that protobufNative checkers
> can
> > > only
> > > > > > choose one of the two? Why are there two Checkers for different
> > > > > > implementations of the same schemaType? Why not the checker to
> > create
> > > > > > different validator, so we don not have to change
> > > > > > schemaRegistryCompatibilityCheckers.
> > > > >
> > > > > users can only use one, not two, which will bring complexity to
> users
> > > > >
> > > > > >
> > > > > > (3)And after the update to
> > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck.
> > > > > > Existing topics previously only checked the name of the root
> > message,
> > > > not
> > > > > > the content of protobuf.
> > > > > >
> > > > > > What if the user wants both Checkers?
> > > > > >
> > > > > > Set to ProtobufNativeAdvancedSchemaCompatibilityCheck, affect the
> > > topic
> > > > > of
> > > > > > the existing schema?
> > > > > >
> > > > > > Older topics still use the old checker, and newer topics or
> certain
> > > > older
> > > > > > topics use the new advancedchecker.
> > > > > >
> > > > > when `ProtobufNativeAdvancedSchemaCompatibilityCheck` stable,
> > > > > users will not choose `ProtobufNativeSchemaCompatibilityCheck`.
> > > > > because it not a complete checker.
> > > > > > (4)So should we have one schemaType for a checker?
> > > > protobufNativeChecker
> > > > > > can have as many different implementation classes as possible.
> This
> > > > > > classname configuration in PIP, let's see if it can be set at the
> > > topic
> > > > > > level. In the current PIP design I just load this parameter into
> > the
> > > > > > checker when the broker is started and the checkers map is set
> up.
> > > Can
> > > > I
> > > > > do
> > > > > > this in the new normal pr if I want to support topic level? Or
> > > perfect
> > > > it
> > > > > > here?
> > > > > >
> > > > > > Add a call PROTOBUF_NATIVE_ADVANCE schemaType corresponding
> > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck? (Seems to be more
> > > > > trouble).
> > > > > >
> > > > > > Sorry I can not use the computer and network in the company, I
> use
> > my
> > > > > > mobile phone to reply to the email, the format may be a bit
> messy.
> > > > Please
> > > > > > understand.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > sinan
> > > > > >
> > > > > >
> > > > > > 丛搏 <bog...@apache.org> 于 2023年3月7日周二 下午11:39写道:
> > > > > >
> > > > > > > SiNan Liu <liusinan1...@gmail.com> 于2023年3月7日周二 13:22写道:
> > > > > > > >
> > > > > > > > Great to see your comment, bo!
> > > > > > > >
> > > > > > > > 1. The first way. The protobuf website has a description of
> the
> > > > > rules,
> > > > > > > but
> > > > > > > > no plans to implement them.
> > > > > > > > https://protobuf.dev/programming-guides/proto/#updating
> > > > > > >
> > > > > > > https://groups.google.com/g/protobuf
> > > > > > > maybe ask here
> > > > > > >
> > > > > > > >
> > > > > > > > 2. I think this PIP can be divided into two parts.
> > > > > > > > (1) Add a flag(`ValidatorClassName`), load it into
> > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` when the broker
> > starts.
> > > > > > > > ValidatorClassName is empty by default, and the
> implementation
> > > > > continues
> > > > > > > as
> > > > > > > > before, with no change for the user.
> > > > > > >
> > > > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in
> > > `broker.conf`
> > > > > > > ```
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > > > > > > ```
> > > > > > > I do not recommend that we directly modify this plugin and
> > continue
> > > > to
> > > > > > > add configuration items, which will cause trouble for users.
> > > > > > > We have a lot of configs and it's getting very unwieldy.
> > > > > > > in my opinion, we don't change
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`,
> > > > > > > it is a simple implementation, it doesn't go wrong very often,
> > most
> > > > > > > users will use it. we can add another ProtobufNativeCheck named
> > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck ` or other. in
> > this
> > > > > > > way, we don't need to add this flag. There is no need to
> consider
> > > > > > > compatibility, it is just a plug-in and will not affect current
> > > > logic.
> > > > > > > If the user needs it, just change the plugin to the new
> > > > implementation
> > > > > > >
> > > > > > > > ```java
> > > > > > > >     ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors,
> > > > > > > toDescriptor)
> > > > > > > > -> {
> > > > > > > >         for (Descriptors.Descriptor fromDescriptor :
> > > > > fromDescriptors) {
> > > > > > > >             // The default implementation only checks if the
> > root
> > > > > message
> > > > > > > > has changed.
> > > > > > > >             if
> > > > > > > >
> > > (!fromDescriptor.getFullName().equals(toDescriptor.getFullName()))
> > > > {
> > > > > > > >                 throw new
> > ProtoBufCanReadCheckException("Protobuf
> > > > > root
> > > > > > > > message isn't allow change!");
> > > > > > > >             }
> > > > > > > >         }
> > > > > > > >     };
> > > > > > > > ```
> > > > > > > > `ValidatorClassName` value also can be set to the current
> > > > > implementation
> > > > > > > of
> > > > > > > > PIP add
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > > > > > > >
> > > > > > > > (2) Recoding the `ProtobufNativeSchemaCompatibilityCheck`.
> > > Through
> > > > > the
> > > > > > > flag
> > > > > > > > (`ValidatorClassName`) to build different
> > > > > > > `ProtobufNativeSchemaValidator`.
> > > > > > > > Isn't it just a plug-in? The user can develop and choose a
> > > > different
> > > > > > > > `ProtobufNativeSchemaValidator`. I think it didn't change the
> > > > logic,
> > > > > it
> > > > > > > > just allowed him to expand it.
> > > > > > > >
> > > > > > > >
> > > > > > > > I think this PIP should be an enhancement and supplement to
> the
> > > > > function,
> > > > > > > > and there is no such thing as unnecessary and meaningless.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > sinan
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 丛搏 <bog...@apache.org> 于2023年3月7日周二 11:53写道:
> > > > > > > >
> > > > > > > > > I think we have two ways to do that.
> > > > > > > > >
> > > > > > > > > First way: We need to advance the improvement of java in
> > > > protobuf.
> > > > > Ask
> > > > > > > > > if they have plans to improve.
> > > > > > > > >
> > > > > > > > > Second way: the new PROTOBUF_NATIVE
> > `SchemaCompatibilityCheck`
> > > > > should
> > > > > > > > > be implemented as a plugin, don't change any existing
> plugin
> > > > logic
> > > > > > > > > (it's simple and already used). I don't recommend adding
> > flags
> > > > for
> > > > > > > > > rollback, it adds configuration and makes little sense.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Bo
> > > > > > > > >
> > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月6日周一 23:00写道:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Can you convert the code block which is actually a quote
> in
> > > the
> > > > > > > > > > beginning of the PIP to something which doesn't require
> to
> > > > scroll
> > > > > > > > > > horizontally so much?
> > > > > > > > > > Use
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > > > > > > >
> > > > > > > > > > Let's improve the clarity of what you wrote:
> > > > > > > > > >
> > > > > > > > > > "the PROTOBUF uses avro struct to store."
> > > > > > > > > > -->
> > > > > > > > > > When Schema type PROTOBUF is used, Pulsar Client assumes
> > the
> > > > > object
> > > > > > > given
> > > > > > > > > > to it as message data is an auto-generated POJO
> containing
> > > the
> > > > > > > > > annotations
> > > > > > > > > > encoding the schema. The client is using a converter,
> which
> > > > > converts
> > > > > > > a
> > > > > > > > > > Protobuf schema descriptor into an Avro schema and sends
> > that
> > > > as
> > > > > the
> > > > > > > > > Schema
> > > > > > > > > > of the producer/consumer.
> > > > > > > > > >
> > > > > > > > > > "On the broker side, protobuf and avro both use
> SchemaData
> > > > > converted
> > > > > > > to
> > > > > > > > > > org.apache.avro.Schema."
> > > > > > > > > > -->
> > > > > > > > > > Since the schema is an Avro schema, the implementation of
> > > > > > > compatibility
> > > > > > > > > > check on the broker side is to simply re-use the
> > > compatibility
> > > > > check
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > AVRO schema type.
> > > > > > > > > >
> > > > > > > > > > "ProtobufSchema is different from ProtobufNativeSchema in
> > > > schema
> > > > > > > > > > compatibility check it uses avro-protobuf.
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > But the current implementation of ProtobufNative schema
> > > > > compatibility
> > > > > > > > > > check only
> > > > > > > > > > checked if the root message name is changed."
> > > > > > > > > >
> > > > > > > > > > -->
> > > > > > > > > > PROTOBUF_NATIVE schema type is different.
> > > > > > > > > > The client is actually using Protobuf Descriptor as the
> > > schema,
> > > > > as
> > > > > > > > > opposed
> > > > > > > > > > to Avro schema of PROTOBUF schema type. In the broker,
> the
> > > > > > > > > PROTOBUF_NATIVE
> > > > > > > > > > compatibility check actually hasn't implemented any rule,
> > > > besides
> > > > > > > one:
> > > > > > > > > > checking if the root message name has changed.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >    1. For now, there is no official or third-party
> > solution
> > > > for
> > > > > > > > > ProtoBuf
> > > > > > > > > > >    compatibility. If in the future have better
> solutions
> > > of a
> > > > > third
> > > > > > > > > party or
> > > > > > > > > > >    the official, we develop new
> > > ProtobufNativeSchemaValidator
> > > > > and
> > > > > > > use,
> > > > > > > > > so
> > > > > > > > > > >    add a flag.
> > > > > > > > > > >
> > > > > > > > > > > Who do you need to make that configurable? Once you
> > found a
> > > > > third
> > > > > > > > > party,
> > > > > > > > > > just switch to it? Who knows, maybe you never will.
> > Introduce
> > > > it
> > > > > > > when you
> > > > > > > > > > find it, not now.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > We improve in ProtobufNativeSchemaCompatibilityCheck
> > > BACKWARD,
> > > > > > > FORWARD
> > > > > > > > > > > these strategies. As with the AVRO implementation,
> > protobuf
> > > > > > > > > compatibility
> > > > > > > > > > > checking need implementing the canRead method. *This
> will
> > > > check
> > > > > > > that
> > > > > > > > > > > the writtenschema can be read by readSchema.*
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I completely disagree.
> > > > > > > > > > Avro implementation is confusing for our use case. Don't
> > copy
> > > > > that.
> > > > > > > > > >
> > > > > > > > > > You have
> > > > > > > > > >
> > > > > > > > > > public void checkCompatible(SchemaData from, SchemaData
> to,
> > > > > > > > > > SchemaCompatibilityStrategy strategy)
> > > > > > > > > >         throws IncompatibleSchemaException {
> > > > > > > > > >     Descriptor fromDescriptor =
> > > > > > > > > > ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > > > > > > >     Descriptor toDescriptor =
> > > > > > > > > > ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > > > > > > >     switch (strategy) {
> > > > > > > > > >         case BACKWARD_TRANSITIVE:
> > > > > > > > > >         case BACKWARD:
> > > > > > > > > >         case FORWARD_TRANSITIVE:
> > > > > > > > > >         case FORWARD:
> > > > > > > > > >         case FULL_TRANSITIVE:
> > > > > > > > > >         case FULL:
> > > > > > > > > >             checkRootMessageChange(fromDescriptor,
> > > > toDescriptor,
> > > > > > > > > strategy);
> > > > > > > > > >             return;
> > > > > > > > > >         case ALWAYS_COMPATIBLE:
> > > > > > > > > >             return;
> > > > > > > > > >         default:
> > > > > > > > > >             throw new
> IncompatibleSchemaException("Unknown
> > > > > > > > > > SchemaCompatibilityStrategy.");
> > > > > > > > > >     }
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > I would rename :
> > > > > > > > > > from --> currentSchema
> > > > > > > > > > to --> newSchema
> > > > > > > > > >
> > > > > > > > > > Use that switch case and have a method for each like:
> > > > > > > > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > > > > > > > >
> > > > > > > > > > I dislike canRead and usage of writtenSchema, since you
> > have
> > > > two
> > > > > > > > > completely
> > > > > > > > > > different use cases: from the producing side and the
> > consumer
> > > > > side.
> > > > > > > > > >
> > > > > > > > > > schemaValidatorBuilder
> > > > > > > > > > >
> > > > > > > > > > > I dislike this proposal. IMO Avro implementation is way
> > too
> > > > > > > > > complicated.
> > > > > > > > > > Why not have a simple function for validation for each
> > switch
> > > > > case
> > > > > > > above?
> > > > > > > > > > Why do we need strategy and builder, and all this
> > complexity?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > IMO it's impossible to read the validation rules as you
> > > > described
> > > > > > > them.
> > > > > > > > > > I wrote how they should be structured numerous times
> above.
> > > > > > > > > > I can't validate them.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > IMO, the current design is very hard to read.
> > > > > > > > > > Please try to avoid jumping into code sections.
> > > > > > > > > > Write a high level design section, in which you describe
> in
> > > > words
> > > > > > > what
> > > > > > > > > you
> > > > > > > > > > plan to do.
> > > > > > > > > > Write the validation rules in the structure that is easy
> to
> > > > > > > understand:
> > > > > > > > > > rules per each compatibility check, and use proper words
> > > > (current
> > > > > > > schema,
> > > > > > > > > > new schema), since new schema can be once used for read
> and
> > > > once
> > > > > > > used for
> > > > > > > > > > write.
> > > > > > > > > >
> > > > > > > > > > In its current form it takes too much time to understand
> > the
> > > > > design,
> > > > > > > and
> > > > > > > > > it
> > > > > > > > > > shouldn't be the case.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Asaf
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <
> > > > liusinan1...@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi! I updated the explanation of some things in the PIP
> > > > issue.
> > > > > And
> > > > > > > also
> > > > > > > > > > > added a new “flag” in the conf is used as the different
> > > > > > > > > > > ProtobufNativeSchemaValidator implementation, also set
> > > > > > > > > > > ProtobufNativeSchemaValidator default only check
> whether
> > > the
> > > > > name
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > > root message is the same.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > sinan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月5日周日
> > 20:21写道:
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <
> > > > > liusinan1...@gmail.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you please explain how a Protobuf Schema
> > > descriptor
> > > > > can
> > > > > > > be
> > > > > > > > > > > > validated
> > > > > > > > > > > > > > for backward compatibility check using Avro based
> > > > > > > compatibility
> > > > > > > > > > > rules?
> > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it
> is
> > > > > actually a
> > > > > > > > > > > Protobuf
> > > > > > > > > > > > > > descriptor?
> > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. *You can take a quick look at the previous
> design,
> > > the
> > > > > > > PROTOBUF
> > > > > > > > > uses
> > > > > > > > > > > > > avro struct to store.*
> > > > > > > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Ok. So to summarize your code (easier to write it
> than
> > > send
> > > > > > > links):
> > > > > > > > > > > > * Pulsar Client, when used with Protobuf Schema,
> > actually
> > > > > > > converts
> > > > > > > > > the
> > > > > > > > > > > > Protobuf descriptor into an Avro Schema (using code
> > found
> > > > > inside
> > > > > > > Avro
> > > > > > > > > > > > library) and saves that Avro schema as the schema.
> It's
> > > not
> > > > > > > saving
> > > > > > > > > the
> > > > > > > > > > > > protobuf descriptor at all. Very confusing I have to
> > add
> > > -
> > > > > never
> > > > > > > > > expected
> > > > > > > > > > > > that.
> > > > > > > > > > > > This explains why In the
> > ProtobufSchemaCompatibilityCheck
> > > > > they
> > > > > > > just
> > > > > > > > > > > extend
> > > > > > > > > > > > the Avro without doing any translation.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for that.
> > > > > > > > > > > >
> > > > > > > > > > > > Now thatI finally understand this, I can say that:
> you
> > > > *must*
> > > > > > > explain
> > > > > > > > > > > that
> > > > > > > > > > > > in the motivation part in your PIP.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. *On the broker side, protobuf and avro both use
> > > > > `SchemaData`
> > > > > > > > > > > converted
> > > > > > > > > > > > > to `org.apache.avro.Schema`.*
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Actually those links don't really help.
> > > > > > > > > > > > The main link that helps is:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > > I understand the different compatibility check
> > > > > strategies.
> > > > > > > > > > > > > > If you just spell them out here, then as you say,
> > > just
> > > > > > > translate
> > > > > > > > > the
> > > > > > > > > > > > > > Protobuf Descriptor into an Avro schema and run
> the
> > > > Avro
> > > > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > > > I believe the answer is no, since you may want to
> > > > verify
> > > > > > > > > different
> > > > > > > > > > > > things
> > > > > > > > > > > > > > when it comes to Protobuf, which are different
> then
> > > > Avro.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1.
> > > > > > > > > > > > > *ProtobufSchema is different from
> > ProtobufNativeSchema
> > > in
> > > > > that
> > > > > > > it
> > > > > > > > > uses
> > > > > > > > > > > > > avro-protobuf.*
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > > > *ProtobufNativeSchema needs a native compatibility
> > > check,
> > > > > but
> > > > > > > > > there is
> > > > > > > > > > > no
> > > > > > > > > > > > > official or third party implementation. So this PIP
> > > does
> > > > > not
> > > > > > > use
> > > > > > > > > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. *By the way, this is implemented in much the
> same
> > > way
> > > > > that
> > > > > > > > > Apache
> > > > > > > > > > > avro
> > > > > > > > > > > > > does compatibility checking.*
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > > > > > >
> > > > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of
> > > > > `canRead()` are
> > > > > > > > > > > > > writtenSchema and readSchema. We only need to
> change
> > > the
> > > > > order
> > > > > > > of
> > > > > > > > > > > > arguments
> > > > > > > > > > > > > we pass to `canRead()`.*
> > > > > > > > > > > > > ```java
> > > > > > > > > > > > > private void
> > > validateWithStrategy(Descriptors.Descriptor
> > > > > > > > > toValidate,
> > > > > > > > > > > > > Descriptors.Descriptor fromDescriptor) throws
> > > > > > > > > > > > ProtoBufCanReadCheckException
> > > > > > > > > > > > > {
> > > > > > > > > > > > > switch (strategy) {
> > > > > > > > > > > > > case CanReadExistingStrategy ->
> > canRead(fromDescriptor,
> > > > > > > > > toValidate);
> > > > > > > > > > > > > case CanBeReadByExistingStrategy ->
> > canRead(toValidate,
> > > > > > > > > > > fromDescriptor);
> > > > > > > > > > > > > case CanBeReadMutualStrategy -> {
> > > > > > > > > > > > > canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > > canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > > }
> > > > > > > > > > > > > }
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > private void canRead(Descriptors.Descriptor
> > > > writtenSchema,
> > > > > > > > > > > > > Descriptors.Descriptor readSchema) throws
> > > > > > > > > > > ProtoBufCanReadCheckException {
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema,
> > > > > > > > > > > > > readSchema);
> > > > > > > > > > > > > }
> > > > > > > > > > > > > ```
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > I get that you want to take inspiration from the
> > existing
> > > > > Avro
> > > > > > > Schema
> > > > > > > > > > > > compatibility check, to do your code design.
> > > > > > > > > > > > I also understand you *won't* use any existing avro
> > code
> > > > for
> > > > > > > that.
> > > > > > > > > > > > I also understand, you have to write the validation
> > check
> > > > on
> > > > > your
> > > > > > > > > own,
> > > > > > > > > > > > since there is no 3rd party to explain that.
> > > > > > > > > > > >
> > > > > > > > > > > > The only thing I can't understand are the actual
> rules
> > > as I
> > > > > wrote
> > > > > > > > > before,
> > > > > > > > > > > > since they are written confusingly.
> > > > > > > > > > > > So, I repeat what I asked before:
> > > > > > > > > > > >
> > > > > > > > > > > > I think you should structure the validation rules
> > > > > differently:
> > > > > > > > > > > >
> > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > ** List down rules, where use newSchema (the schema
> > used
> > > by
> > > > > > > producer
> > > > > > > > > or
> > > > > > > > > > > > consumer) and existingSchema (last schema used)
> > > > > > > > > > > > * Forward
> > > > > > > > > > > > ** List down rules, where use newSchema (the schema
> > used
> > > by
> > > > > > > producer
> > > > > > > > > or
> > > > > > > > > > > > consumer) and existingSchema (last schema used)
> > > > > > > > > > > >
> > > > > > > > > > > > Once that's accomplished I will be able to understand
> > the
> > > > > > > different
> > > > > > > > > > > > validation rules for each compatibility check.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > sinan
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月1日周三
> > > > 21:19写道:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <
> > > > > > > > > liusinan1...@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I read it and they look identical. What's the
> > > > > difference
> > > > > > > > > between
> > > > > > > > > > > > > them?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Current avro,json, and protobuf schemas are all
> > > > > implemented
> > > > > > > > > based
> > > > > > > > > > > on
> > > > > > > > > > > > > AVRO.
> > > > > > > > > > > > > > > > What do you mean, they are all implemented
> > based
> > > on
> > > > > > > Avro? You
> > > > > > > > > > > mean
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > protobuf schema is converted into an Avro
> > Schema,
> > > > and
> > > > > > > then
> > > > > > > > > you
> > > > > > > > > > > use
> > > > > > > > > > > > > Avro
> > > > > > > > > > > > > > > > compatibility validation?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > >
> > > > >
> > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > >
> > > > >
> > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > They all extends
> > > `AvroSchemaBasedCompatibilityCheck`,
> > > > > the
> > > > > > > > > > > > > > > `checkCompatible()` is the same implementation
> > with
> > > > > `AVRO`.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you please explain how a Protobuf Schema
> > > descriptor
> > > > > can
> > > > > > > be
> > > > > > > > > > > > validated
> > > > > > > > > > > > > > for backward compatibility check using Avro based
> > > > > > > compatibility
> > > > > > > > > > > rules?
> > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it
> is
> > > > > actually a
> > > > > > > > > > > Protobuf
> > > > > > > > > > > > > > descriptor?
> > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think you should structure the validation
> rules
> > > > > > > differently:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The Compatibility check strategy is described
> on
> > > the
> > > > > > > website
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > > > > > > 1. BACKWARD(CanReadExistingStrategy): Consumers
> > > using
> > > > > > > schema
> > > > > > > > > V3 can
> > > > > > > > > > > > > process
> > > > > > > > > > > > > > > data written by producers using the last schema
> > > > version
> > > > > > > V2. So
> > > > > > > > > V2
> > > > > > > > > > > is
> > > > > > > > > > > > > > > "writtenSchema" and V3 is "readSchema".
> > > > > > > > > > > > > > > 2. FORWARD(CanBeReadByExistingStrategy):
> > Consumers
> > > > > using
> > > > > > > the
> > > > > > > > > last
> > > > > > > > > > > > > schema
> > > > > > > > > > > > > > > version V2 can process data written by
> producers
> > > > using
> > > > > a
> > > > > > > new
> > > > > > > > > schema
> > > > > > > > > > > > V3,
> > > > > > > > > > > > > > > even though they may not be able to use the
> full
> > > > > > > capabilities
> > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > > > new
> > > > > > > > > > > > > > > schema. So V3 is "writtenSchema" and V2 is
> > > > > "readSchema".
> > > > > > > > > > > > > > > 3. FULL(CanBeReadMutualStrategy): Schemas are
> > both
> > > > > > > backward and
> > > > > > > > > > > > forward
> > > > > > > > > > > > > > > compatible.
> > > > > > > > > > > > > > > Schema can evolve. The old version schema and
> the
> > > new
> > > > > > > version
> > > > > > > > > > > schema
> > > > > > > > > > > > > should
> > > > > > > > > > > > > > > be well understood.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > > I understand the different compatibility check
> > > > > strategies.
> > > > > > > > > > > > > > If you just spell them out here, then as you say,
> > > just
> > > > > > > translate
> > > > > > > > > the
> > > > > > > > > > > > > > Protobuf Descriptor into an Avro schema and run
> the
> > > > Avro
> > > > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > > > I believe the answer is no, since you may want to
> > > > verify
> > > > > > > > > different
> > > > > > > > > > > > things
> > > > > > > > > > > > > > when it comes to Protobuf, which are different
> then
> > > > Avro.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > At the current state, I can't understand your
> > design
> > > at
> > > > > all.
> > > > > > > > > Please
> > > > > > > > > > > > help
> > > > > > > > > > > > > > clarify that.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The arguments of `canRead()` are writtenSchema
> > and
> > > > > > > readSchema.
> > > > > > > > > As
> > > > > > > > > > > > we've
> > > > > > > > > > > > > > > just described, we only need to change the
> order
> > of
> > > > > > > arguments
> > > > > > > > > we
> > > > > > > > > > > pass
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > `canRead()`.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com>
> > 于2023年2月27日周一
> > > > > 20:49写道:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > And you can see the difference between
> > ProtoBuf
> > > > and
> > > > > > > > > > > > ProtoBufNative:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > >
> > > > >
> > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  I read it and they look identical. What's
> the
> > > > > difference
> > > > > > > > > between
> > > > > > > > > > > > > them?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Current avro,json, and protobuf schemas are
> all
> > > > > > > implemented
> > > > > > > > > based
> > > > > > > > > > > > on
> > > > > > > > > > > > > > > AVRO.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > What do you mean, they are all implemented
> > based
> > > on
> > > > > > > Avro? You
> > > > > > > > > > > mean
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > protobuf schema is converted into an Avro
> > Schema,
> > > > and
> > > > > > > then
> > > > > > > > > you
> > > > > > > > > > > use
> > > > > > > > > > > > > Avro
> > > > > > > > > > > > > > > > compatibility validation?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > *Here are the basic compatibility rules
> we've
> > > > > defined:*
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think you should structure the validation
> > rules
> > > > > > > > > differently:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > > > ** List down rules, where use newSchema (the
> > > schema
> > > > > used
> > > > > > > by
> > > > > > > > > > > > producer
> > > > > > > > > > > > > or
> > > > > > > > > > > > > > > > consumer) and existingSchema (last schema
> used)
> > > > > > > > > > > > > > > > * Forward
> > > > > > > > > > > > > > > > ** List down rules, where use newSchema (the
> > > schema
> > > > > used
> > > > > > > by
> > > > > > > > > > > > producer
> > > > > > > > > > > > > or
> > > > > > > > > > > > > > > > consumer) and existingSchema (last schema
> used)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm saying this since you used
> "writttenSchema"
> > > > word
> > > > > but
> > > > > > > it
> > > > > > > > > > > > > represents
> > > > > > > > > > > > > > > > something completely different if it's
> backward
> > > or
> > > > > > > forward
> > > > > > > > > check.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Once you'll have that structure like that, I
> > > > > personally
> > > > > > > will
> > > > > > > > > be
> > > > > > > > > > > > able
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > read and understand it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The motivation and problem statement are now
> > > good -
> > > > > > > thanks
> > > > > > > > > for
> > > > > > > > > > > > > improving
> > > > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <
> > > > > > > > > > > liusinan1...@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This
> time
> > > I've
> > > > > added
> > > > > > > > > some
> > > > > > > > > > > > > background
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > some explanations.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The compatibility check rules are already
> > > written
> > > > > in
> > > > > > > the
> > > > > > > > > > > > > > > Implementation.
> > > > > > > > > > > > > > > > > ProtoBufNative implements the same canRead
> > > method
> > > > > as
> > > > > > > Apache
> > > > > > > > > > > Avro.
> > > > > > > > > > > > > > > > > It does this by checking whether the schema
> > for
> > > > > > > writing and
> > > > > > > > > > > > reading
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > compatible. I also indicate whether the
> > > > > writtenSchema
> > > > > > > and
> > > > > > > > > > > > > readSchema of
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > Backward, Forward, and Full strategies are
> > the
> > > > old
> > > > > or
> > > > > > > the
> > > > > > > > > new
> > > > > > > > > > > > > version
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > the schema.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com>
> > > > 于2023年2月26日周日
> > > > > > > 23:24写道:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of
> > > > background
> > > > > > > > > knowledge,
> > > > > > > > > > > so
> > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > add IMO for people to understand it. You
> > > don't
> > > > > need
> > > > > > > to
> > > > > > > > > > > explain
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > entire
> > > > > > > > > > > > > > > > > > pulsar in this PIP, but at the very
> least a
> > > few
> > > > > > > > > paragraphs
> > > > > > > > > > > > > detailing
> > > > > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > > > you need to know, to put you in context:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >    - Start by saying Pulsar as a built-in
> > > > schema
> > > > > > > registry
> > > > > > > > > > > > inside
> > > > > > > > > > > > > > > Pulsar
> > > > > > > > > > > > > > > > > >    broker.
> > > > > > > > > > > > > > > > > >       - Every time the client updates the
> > > > > schema, it
> > > > > > > > > uploads
> > > > > > > > > > > it
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >       broker. When that happens, it has a
> > > > feature
> > > > > > > which
> > > > > > > > > > > > validates
> > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > > >       schema version is compatible with
> the
> > > > > previous
> > > > > > > > > > > versions.
> > > > > > > > > > > > > There
> > > > > > > > > > > > > > > > > > are 4 types
> > > > > > > > > > > > > > > > > >       of compatibility: Full, ...
> (complete
> > > and
> > > > > > > explain
> > > > > > > > > each
> > > > > > > > > > > > one
> > > > > > > > > > > > > > > > briefly)
> > > > > > > > > > > > > > > > > >    - Also explain Pulsar Schema registry
> > > > supports
> > > > > > > various
> > > > > > > > > > > > schema
> > > > > > > > > > > > > > > > > >    protocols:  Avro, protobuf native, ...
> > > > > (complete
> > > > > > > the
> > > > > > > > > > > rest),
> > > > > > > > > > > > > each
> > > > > > > > > > > > > > > > > > protocol
> > > > > > > > > > > > > > > > > >    has a schema which dictates how to
> > > serialize
> > > > > and
> > > > > > > > > > > deserialize
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > message
> > > > > > > > > > > > > > > > > >    content into typed object.
> > > > > > > > > > > > > > > > > >    - Explain in short what is protobuf
> > native
> > > > > > > (compare
> > > > > > > > > > > protobuf
> > > > > > > > > > > > > > > > > non-native)
> > > > > > > > > > > > > > > > > >    - Please don't paste code instead of
> > > > > explaining.
> > > > > > > > > > > > > > > > > >       - Explain that protobuf native
> > current
> > > > > > > validation
> > > > > > > > > check
> > > > > > > > > > > > is
> > > > > > > > > > > > > only
> > > > > > > > > > > > > > > > > >       composed of checking the root
> message
> > > > name
> > > > > is
> > > > > > > the
> > > > > > > > > same
> > > > > > > > > > > > > between
> > > > > > > > > > > > > > > > > > the current
> > > > > > > > > > > > > > > > > >       schema version and the new version.
> > > > > > > > > > > > > > > > > >          - Explain briefly what is a root
> > > > message
> > > > > > > and its
> > > > > > > > > > > name.
> > > > > > > > > > > > > > > > > >       - Explain the problem (list
> > scenarios)
> > > > > that we
> > > > > > > have
> > > > > > > > > > > > because
> > > > > > > > > > > > > > > > > protobuf
> > > > > > > > > > > > > > > > > >       native schema only supports FULL
> > > > > compatibility
> > > > > > > > > > > > validation.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Regarding high level design - as in what
> > you
> > > > > plan to
> > > > > > > do.
> > > > > > > > > > > > > > > > > > I suggest you add "High Level Design" and
> > in
> > > it
> > > > > > > detail
> > > > > > > > > how
> > > > > > > > > > > you
> > > > > > > > > > > > > plan
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > validate, per protobuf version, per
> > > > compatibility
> > > > > > > check
> > > > > > > > > > > > > (backward,
> > > > > > > > > > > > > > > > > forward,
> > > > > > > > > > > > > > > > > > full,...).
> > > > > > > > > > > > > > > > > > I tried reading the implementation - for
> > me ,
> > > > > it's
> > > > > > > all
> > > > > > > > > over
> > > > > > > > > > > the
> > > > > > > > > > > > > > > place.
> > > > > > > > > > > > > > > > > Can
> > > > > > > > > > > > > > > > > > you please list in order what I wrote
> > above,
> > > > and
> > > > > > > list the
> > > > > > > > > > > > > validation
> > > > > > > > > > > > > > > > > rules
> > > > > > > > > > > > > > > > > > with a good explanation why you validate
> it
> > > > like
> > > > > > > that?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Lastly, one you have all the validation
> > rules
> > > > > clearly
> > > > > > > > > stated,
> > > > > > > > > > > > you
> > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > it to document it properly so users can
> > know
> > > > what
> > > > > > > > > validation
> > > > > > > > > > > to
> > > > > > > > > > > > > > > expect.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Asaf
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan
> Liu <
> > > > > > > > > > > > > liusinan1...@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code
> and
> > > > > > > described the
> > > > > > > > > > > > design
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > improve
> > > > > > > > > > > > > > > > > > > the PROTOBUF_NATIVE schema
> compatibility
> > > > > checks.
> > > > > > > You
> > > > > > > > > can
> > > > > > > > > > > > have a
> > > > > > > > > > > > > > > look.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com>
> > > > > 于2023年2月22日周三
> > > > > > > > > 21:16写道:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I read it but you're almost directly
> > > diving
> > > > > into
> > > > > > > the
> > > > > > > > > > > code -
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > take
> > > > > > > > > > > > > > > > > > > me
> > > > > > > > > > > > > > > > > > > > hours just to reverse engineer your
> > > design.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Can you please include a "High Level
> > > > Design"
> > > > > > > section
> > > > > > > > > in
> > > > > > > > > > > > which
> > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > explain
> > > > > > > > > > > > > > > > > > > > how you plan to tackle any issue?
> > > > > > > > > > > > > > > > > > > > If I can read that section and
> explain
> > to
> > > > > someone
> > > > > > > > > else
> > > > > > > > > > > how
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > work,
> > > > > > > > > > > > > > > > > > > > it means the section is complete.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan
> > > Liu <
> > > > > > > > > > > > > > > liusinan1...@gmail.com>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss:
> > > > > > > > > > > > > > > > > >
> > > https://github.com/apache/pulsar/issues/19565
> > > > > > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > We can talk about the current
> design
> > > > here.
> > > > > > > > > Especially
> > > > > > > > > > > for
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > field
> > > > > > > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > > > > > change check rules, please give
> your
> > > > > valuable
> > > > > > > > > advice.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to