> Why can't we upload negative schema types?

I want to avoid the changes to existing methods like
Commands#getSchemaType, which converts all negative schema types to
NONE:

```java
private static Schema.Type getSchemaType(SchemaType type) {
    if (type.getValue() < 0) {
        return Schema.Type.None;
    } else {
        return Schema.Type.valueOf(type.getValue());
    }
}
```

I guess the above code was written because:
1. NONE schema type means it's not uploaded into the registry. (See #3940 [1])
2. There is no existing schema that uses NONE as its schema type, i.e.
NONE schema is used as something special.

> every different language client will code the special logic.

If other clients follow the behavior of the Java client, they should
also convert negative schemas to NONE currently. Therefore, changes
cannot be avoided. No matter if the semantic of `setSchemaType` is
changed, they should follow the Java implementation as well.

> This will change the meaning of the schema data field

The existing definition only defines its meaning to the AVRO and JSON
schema. But from a more general view, the schema data should be
something associated with the current schema. Giving it more meaning
for other schema types is acceptable IMO. For example, the schema data
field represents the serialized Protobuf descriptor in Protobuf Native
schema, see `ProtobufNativeSchema#of`:

```java
.schema(ProtobufNativeSchemaUtils.serialize(descriptor))
```

[1] https://github.com/apache/pulsar/pull/3940

Thanks,
Yunze

On Wed, Jan 4, 2023 at 8:27 PM 丛搏 <bog...@apache.org> wrote:
>
> > It does not affect the public API so it can be cherry-picked into old
> > branches. The main difference with this proposal is that my solution
> > carries the identity info (i.e. `AUTO_CONSUME`) in the schema data,
> > which is a byte array. The negative schema types should not be exposed
> > to users. Adding a field to the subscribe request might be okay but it
> > could be unnecessary to cover such a corner case.
>
> This will change the meaning of the schema data field and couple the
> schema type and schema data. `schema type = NONE` and `schema data =
> "AUTO_CONSUME" ` represent `AUTO_ CONSUME`, I think it's weird. Why
> can't we upload negative schema types?
>
> > It does not affect the public API
> upload negative schema types only changes the proto, if using `schema
> type = NONE` and `schema data = "AUTO_CONSUME" `, every different
> language client will code the special logic. This special logic can
> easily be ignored.
>
> Thanks,
> Bo
>
> Yunze Xu <y...@streamnative.io.invalid> 于2023年1月4日周三 17:02写道:
> >
> > I opened a PR to fix this issue: https://github.com/apache/pulsar/pull/19128
> >
> > It does not affect the public API so it can be cherry-picked into old
> > branches. The main difference with this proposal is that my solution
> > carries the identity info (i.e. `AUTO_CONSUME`) in the schema data,
> > which is a byte array. The negative schema types should not be exposed
> > to users. Adding a field to the subscribe request might be okay but it
> > could be unnecessary to cover such a corner case.
> >
> > It might be controversial if schema data should be used in such a way,
> > because the original purpose is to represent the AVRO or JSON
> > definition. However, this semantic is defined just for AVRO or JSON
> > schema. IMO, the data field of other schemas is never used well.
> >
> > Another solution is to make use of the name field of schema, which
> > might be more natural. I think we can continue the discussion in my
> > PR.
> >
> > Thanks,
> > Yunze
> >
> > On Wed, Jan 4, 2023 at 11:07 AM Yunze Xu <y...@streamnative.io> wrote:
> > >
> > > Modifying the subscribe request is better than exposing AUTO_CONSUME
> > > schema type IMO. The negative value of a schema type, like BYTES,
> > > AUTO_PRODUCE, means this schema type should only be used internally.
> > > Adding the negative enum value to the Schema definition in
> > > PulsarApi.proto looks very weird.
> > >
> > > But I'm still wondering if we can avoid the API changes. I will look
> > > deeper into this issue.
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Wed, Jan 4, 2023 at 12:12 AM Enrico Olivelli <eolive...@gmail.com> 
> > > wrote:
> > > >
> > > > Il Mar 3 Gen 2023, 14:37 Yunze Xu <y...@streamnative.io.invalid> ha 
> > > > scritto:
> > > >
> > > > > Hi Bo,
> > > > >
> > > > > I got it now. The PIP title sounds ambiguous. Using the term "Upload
> > > > > xxx SchemaType" sounds like uploading the schema into the registry.
> > > > > Instead, it should be "carrying schema in the request when subscribing
> > > > > with AUTO_CONSUME schema".
> > > > >
> > > >
> > > >
> > > > I agree that we should change the naming and we should probably not use 
> > > > a
> > > > new Schema type but add an optional field in the subscribe request (and 
> > > > do
> > > > not send it if the broker is an old version)
> > > >
> > > >
> > > > Enrico
> > > >
> > > >
> > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > > On Tue, Jan 3, 2023 at 4:56 PM 丛搏 <bog...@apache.org> wrote:
> > > > > >
> > > > > > Hi, Yunze
> > > > > > > What I am concerned about is that if the old clients with other
> > > > > > > schemas (i.e. schema is neither null nor AUTO_CONSUME) subscribe 
> > > > > > > to
> > > > > > > the topic with AUTO_CONSUME schema, what will happen?
> > > > > >
> > > > > > AUTO_CONSUME schema will not store in `SchemaRegistryServiceImpl`, 
> > > > > > it
> > > > > > only represents one consumer with AUTO_CONSUME schema to subscribe 
> > > > > > to
> > > > > > a topic. If old clients with other schemas subscribe to this topic,
> > > > > > Its behavior will not be changed by this PIP.
> > > > > >
> > > > > > > What's the schema compatibility check rule on a topic with
> > > > > AUTO_CONSUME schema?
> > > > > >
> > > > > > it's only the consumer schema compatibility check, not on topic. if 
> > > > > > a
> > > > > > consume with AUTO_CONSUME schema will do any compatibility check
> > > > > >
> > > > > > Thanks,
> > > > > > Bo
> > > > > >
> > > > > > Yunze Xu <y...@streamnative.io.invalid> 于2023年1月3日周二 10:16写道:
> > > > > > >
> > > > > > > What I am concerned about is that if the old clients with other
> > > > > > > schemas (i.e. schema is neither null nor AUTO_CONSUME) subscribe 
> > > > > > > to
> > > > > > > the topic with AUTO_CONSUME schema, what will happen? What's the
> > > > > > > schema compatibility check rule on a topic with AUTO_CONSUME 
> > > > > > > schema?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Yunze
> > > > > > >
> > > > > > > On Mon, Jan 2, 2023 at 12:38 AM SiNan Liu <liusinan1...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > 1.Schema.Type and org.apache.pulsar.common.schema.SchemaType 
> > > > > > > > value
> > > > > should
> > > > > > > > be the same.
> > > > > > > > 2.These changes do not affect produce and are only affect 
> > > > > > > > consumer
> > > > > > > > subscribe behavior.
> > > > > > > > 3.backward compatibility:
> > > > > > > > (1)In 
> > > > > > > > org.apache.pulsar.broker.service.ServerCnx#handleSubscribe.
> > > > > > > > if (schema != null && schema.getType() != 
> > > > > > > > SchemaType.AUTO_CONSUME) {
> > > > > > > > return topic.addSchemaIfIdleOrCheckCompatible(schema)
> > > > > > > > .thenCompose(v -> topic.subscribe(option));
> > > > > > > > } else {
> > > > > > > > return topic.subscribe(option);
> > > > > > > > }
> > > > > > > > For the older pulsar client, the schema is null if AUTO_CONSUME
> > > > > consumer
> > > > > > > > subscribe to the Topic.
> > > > > > > > For the new pulsar client, if AUTO_CONSUME consumer subscribe 
> > > > > > > > the
> > > > > Topic,
> > > > > > > > then schema is not null and schema.getType() =
> > > > > SchemaType.AUTO_CONSUME.
> > > > > > > > Both new and old pulsar clients consume the topic, will return 
> > > > > > > > topic.
> > > > > > > > subscribe(option).
> > > > > > > >
> > > > > > > > (2)In 
> > > > > > > > org.apache.pulsar.broker.service.persistent.PersistentTopic
> > > > > > > > #addSchemaIfIdleOrCheckCompatible.
> > > > > > > > @Override
> > > > > > > > public CompletableFuture<Void>
> > > > > addSchemaIfIdleOrCheckCompatible(SchemaData
> > > > > > > > schema) {
> > > > > > > > return hasSchema().thenCompose((hasSchema) -> {
> > > > > > > > int numActiveConsumersWithoutAutoSchema =
> > > > > subscriptions.values().stream()
> > > > > > > > .mapToInt(subscription -> subscription.getConsumers().stream()
> > > > > > > > .filter(consumer -> consumer.getSchemaType() !=
> > > > > SchemaType.AUTO_CONSUME)
> > > > > > > > .toList().size())
> > > > > > > > .sum();
> > > > > > > > if (hasSchema
> > > > > > > > || (!producers.isEmpty())
> > > > > > > > || (numActiveConsumersWithoutAutoSchema != 0)
> > > > > > > > || (ledger.getTotalSize() != 0)) {
> > > > > > > > return checkSchemaCompatibleForConsumer(schema);
> > > > > > > > } else {
> > > > > > > > return addSchema(schema).thenCompose(schemaVersion ->
> > > > > > > > CompletableFuture.completedFuture(null));
> > > > > > > > }
> > > > > > > > });
> > > > > > > > }
> > > > > > > > Only in one case will there be a bug.
> > > > > > > > First, the old pulsar client consume the empty topic, the 
> > > > > > > > consumer
> > > > > schema
> > > > > > > > is AUTO_CONSUME, and then whether the new or old pulsar client
> > > > > consume(i.e.
> > > > > > > > schema is AVRO) the topic.
> > > > > > > > The broker will return the error message as
> > > > > IncompatibleSchemaException ("
> > > > > > > > Topic does not have a schema to check "). The bug at issue17354 
> > > > > > > > is
> > > > > not
> > > > > > > > fixed in this case.
> > > > > > > > All the other cases will be normal.
> > > > > > > >
> > > > > > > > Yunze Xu <y...@streamnative.io.invalid> 于2022年12月31日周六 20:23写道:
> > > > > > > >
> > > > > > > > > Defining `AutoConsume` as -3 is somehow strange. Could you 
> > > > > > > > > clarify
> > > > > if
> > > > > > > > > backward compatibility is guaranteed? i.e. if the new Pulsar 
> > > > > > > > > client
> > > > > > > > > uploaded the AUTO_CONSUME schema to the broker, can the old 
> > > > > > > > > Pulsar
> > > > > > > > > clients produce or consume the same topic anymore?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Yunze
> > > > > > > > >
> > > > > > > > > On Fri, Dec 30, 2022 at 11:32 PM 思楠刘 <liusinan1...@gmail.com>
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I made a PIP to discuss:
> > > > > https://github.com/apache/pulsar/issues/19113.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Sinan
> > > > > > > > >
> > > > >

Reply via email to