> Will it register a new schema?

Only if the new schema passes the check for the topic's schema
compatibility strategy. BTW, the existing schema compatibility checker
does not check the order of fields, even though the order matters.
IMO, that's a bug in the broker.

I just checked this thread and found I hadn't pasted this issue:
https://github.com/apache/pulsar-client-python/issues/108. There, the
schema compatibility strategy is FORWARD, and the sorted schema from
the Java client overwrote the unsorted schema from the Python client.
However, the Python consumer that uses the old schema then failed to
decode messages written with the new schema.
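
To illustrate why field order matters, here is a minimal sketch. It
assumes the Avro binary encoding rules (record fields are written in
schema order with no field names, ints as zig-zag varints, strings as a
length followed by UTF-8 bytes). The helpers are hand-rolled and
hypothetical, not the Pulsar or Avro library code, and the reader
decodes with only its own schema, which is my assumption about the
failing consumer.

```python
def zigzag_encode(n):
    """Encode an int the Avro way: zig-zag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag_decode(buf, pos):
    """Decode one zig-zag varint starting at pos; return (value, new_pos)."""
    shift = 0
    z = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_record(fields, value):
    """Write field values in schema order; no field names are written."""
    out = bytearray()
    for name, avro_type in fields:
        if avro_type == "int":
            out += zigzag_encode(value[name])
        elif avro_type == "string":
            data = value[name].encode("utf-8")
            out += zigzag_encode(len(data)) + data
    return bytes(out)


def decode_record(fields, buf):
    """Read field values in the reader's schema order."""
    pos = 0
    result = {}
    for name, avro_type in fields:
        if avro_type == "int":
            result[name], pos = zigzag_decode(buf, pos)
        elif avro_type == "string":
            length, pos = zigzag_decode(buf, pos)
            if length < 0 or pos + length > len(buf):
                raise ValueError("invalid string length")
            result[name] = buf[pos:pos + length].decode("utf-8")
            pos += length
    return result


declared = [("name", "string"), ("age", "int")]  # writer's field order
payload = encode_record(declared, {"name": "Al", "age": 30})

# Same order on both sides: the record round-trips.
round_trip = decode_record(declared, payload)

# Reader uses the alphabetically sorted order: decoding breaks.
try:
    decode_record(sorted(declared), payload)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

A reader that also has the writer's schema could resolve fields by
name, so this only shows the case where the bytes are read with a
schema whose field order differs from the one used to write them.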

My goal is to make the Python client behave the same as the Java client
starting with the next official release. How the broker handles this
is, I think, a separate issue to fix.
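
As a rough sketch of what "the same as the Java client" would mean for
the generated definition, the snippet below builds a simplified Avro
record schema under both sets of defaults discussed in this thread.
`record_schema` and its parameters are hypothetical names for
illustration only, and real generated schemas may carry more
attributes than shown here.

```python
import json


# Hypothetical helper, not part of the Pulsar Python client.
def record_schema(name, fields, sorted_fields, required_non_string):
    """Build a simplified Avro record schema from (field_name, type) pairs."""
    if sorted_fields:
        fields = sorted(fields)  # alphabetical by field name
    avro_fields = []
    for field_name, avro_type in fields:
        required = required_non_string and avro_type != "string"
        avro_fields.append({
            "name": field_name,
            # Optional fields become a union with "null".
            "type": avro_type if required else ["null", avro_type],
        })
    return {"type": "record", "name": name, "fields": avro_fields}


user_fields = [("name", "string"), ("age", "int"), ("score", "double")]

# Python defaults described in this thread: unsorted, nothing required.
python_defaults = record_schema(
    "User", user_fields, sorted_fields=False, required_non_string=False)

# Proposed defaults: sorted, required for everything except strings.
java_like = record_schema(
    "User", user_fields, sorted_fields=True, required_non_string=True)

print(json.dumps(python_defaults, indent=2))
print(json.dumps(java_like, indent=2))
```

The two definitions differ both in field order and in whether `age`
and `score` are unions with `null`.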

Thanks,
Yunze

On Thu, Mar 30, 2023 at 8:42 PM 丛搏 <congbobo...@gmail.com> wrote:
>
> Hi, Yunze:
>
> > Regarding the 1st question, yes, that's why I open this thread to
> > discuss. If we change these default values, the behavior of new Python
> > clients will be like the Java client. In addition, it actually reverts
> > the breaking change brought in #12232.
>
> I also kind of forget why #12232 changed the default behavior.
> Maybe the field ordering rule differs between Python 2 and Python 3.
>
> If we change the default order, every topic that uses the Python
> client will register a new schema. Will it register a new
> schema? Maybe we should add special logic in the broker to
> check the Python client version so that it does not register
> a new schema. Otherwise, the impact may still be quite large.
>
> Thanks,
> Bo
> >
> > Regarding the 2nd question, yes, they are both sorted in alphabetical
> > order. I don't know the behavior of the .NET clients. The C++, Golang,
> > and Node.js clients do not support generating a schema definition
> > from a DTO.
> >
> > Thanks,
> > Yunze
> >
> > On Thu, Mar 30, 2023 at 10:14 AM 丛搏 <congbobo...@gmail.com> wrote:
> > >
> > > Hi, Yunze :
> > >
> > > 1. The changes may cause some compatibility issues.
> > > How do we solve them? It may be a
> > > breaking change.
> > >
> > > 2. If sorting is enabled by default, is the sorting
> > > rule the same as in the Java client and other clients?
> > >
> > > Putting aside the above two problems, I think it is
> > > good to be consistent with other clients.
> > >
> > > Thanks,
> > > Bo
> > >
> > > On Wed, Mar 29, 2023 at 10:42 PM Eric Hare <eric.h...@datastax.com> wrote:
> > > >
> > > > +1 - I think keeping the `_sorted_fields` and `_required` defaults
> > > > consistent between the clients is the way to go.
> > > >
> > > > > On Mar 29, 2023, at 7:09 AM, Yunze Xu <y...@streamnative.io.INVALID> wrote:
> > > > >
> > > > > I found the Python client has two options to control the behavior:
> > > > > 1. Set `_sorted_fields`. It's false by default in the Python client,
> > > > > but it's true in the Java client. i.e. the Java client sorts all
> > > > > fields by default.
> > > > > 2. Set `_required`. It's false by default for all types in the Python
> > > > > client, but it's only false for the string type in the Java client.
> > > > >
> > > > > For example, given the following Java class:
> > > > >
> > > > > ```java
> > > > > class User {
> > > > >    String name;
> > > > >    int age;
> > > > >    double score;
> > > > > }
> > > > > ```
> > > > >
> > > > > We have to give the following definition in Python:
> > > > >
> > > > > ```python
> > > > > class User(Record):
> > > > >    _sorted_fields = True
> > > > >    name = String()
> > > > >    age = Integer(required=True)
> > > > >    score = Double(required=True)
> > > > > ```
> > > > >
> > > > > I see https://github.com/apache/pulsar/pull/12232 adds the
> > > > > `_sorted_fields` field and disables the field sort by default. It
> > > > > breaks compatibility with the Java client.
> > > > >
> > > > > IMO, we should make `_sorted_fields` true by default and `_required`
> > > > > true for all types other than `String` by default.
> > > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > > On Wed, Mar 29, 2023 at 4:00 PM Yunze Xu <y...@streamnative.io> wrote:
> > > > >>
> > > > >> Hi all,
> > > > >>
> > > > > >> Recently I found the default generated schema definition in the
> > > > > >> Python client is different from the Java client, which leads to
> > > > > >> some unexpected behavior.
> > > > >>
> > > > >> For example, given the following class definition in Python:
> > > > >>
> > > > >> ```python
> > > > >> class Data(Record):
> > > > >>    i = Integer()
> > > > >> ```
> > > > >>
> > > > > >> The type of the `i` field is a union: "type": ["null", "int"]
> > > > >>
> > > > >> While given the following class definition in Java:
> > > > >>
> > > > >> ```java
> > > > >> class Data {
> > > > >>    private final int i;
> > > > >>    /* ... */
> > > > >> }
> > > > >> ```
> > > > >>
> > > > > >> The type of the `i` field is an integer: "type": "int"
> > > > >>
> > > > > >> This causes a problem: if a Python consumer subscribes to a topic
> > > > > >> with the schema defined above, a Java producer will fail to be
> > > > > >> created because of the schema incompatibility.
> > > > >>
> > > > >> Currently, the workaround is to change the schema compatibility
> > > > >> strategy to FORWARD.
> > > > >>
> > > > >> Should we change the way to generate schema definition in the Python
> > > > >> client to be compatible with the Java client? It could bring breaking
> > > > >> changes to old Python clients, but it could guarantee compatibility
> > > > >> with the Java client.
> > > > >>
> > > > >> If not, we still have to introduce an extra configuration to make
> > > > >> Python schema compatible with Java schema. But it requires code
> > > > >> changes. e.g. here is a possible solution:
> > > > >>
> > > > >> ```python
> > > > >> class Data(Record):
> > > > > >>    # NOTE: Users might have to add this extra field to control how to
> > > > > >>    # generate the schema
> > > > >>    __java_compatible = True
> > > > >>    i = Integer()
> > > > >> ```
> > > > >>
> > > > >> Thanks,
> > > > >> Yunze
> > > >
