sijie commented on issue #3741: POJO AvroSchema always allowNull
URL: https://github.com/apache/pulsar/issues/3741#issuecomment-469483661

> I don't understand why you say we are not respecting the rules of AVRO
> The only flag that could be settable is the allow null flag that gets set in Pulsar's Avro schema. We can expose that flag to users.

@jerrypeng That is exactly the problem I am reporting in this issue. Pulsar's Avro schema always sets AllowNull, and Pulsar users have no way to bypass it. My point is that Pulsar's AVRO schema should not attach AllowNull on its own.

> If people want null to all the fields then they can add it at the avro schema.

@skyrocknroll Exactly.

> I don't think the current API is the problem, I think we just need to separate API to specify a custom schema.

Jerry, that's a different feature. The point here is that, for a given POJO, Pulsar should maintain a basic contract: the schema generated by Pulsar's AVRO schema should be the same as, or compatible with, the schema other tools generate. Pulsar AVRO should not automatically attach AllowNull to every field. Otherwise we shouldn't call it an AVRO schema; we should call it AlwaysAllowNullAvroSchema.

> This seems to be a problem only with Avro generated code, in which there was a schema defined and we're extracting the wrong schema from the generated POJO
> This shouldn't be an issue when one defines an ad-hoc POJO (without starting from Avro schema), because annotations can be used to request nullable/non-nullable.

@merlimat That's not the *only* case. People can use ReflectData to generate a schema and write events in a different system (call it the source). When they migrate data from the source to Pulsar using the same POJO, the two schemas are not compatible, so the events cannot be written to Pulsar.

> The easiest solution would be to detect that a POJO is generated by Avro and use the getClassSchema() instead of doing the reflection based approach.

That's a very limited solution that covers only one use case.

---

In summary, the question is: "Should Pulsar automatically attach AllowNull to schemas generated from POJOs?" Attaching AllowNull changes the default behavior of how AVRO generates schemas from POJOs and makes Pulsar inconsistent with how other systems use AVRO. If the team decides that AllowNull should be the default behavior in AvroSchema, this should be well documented and highlighted to all Avro schema users, and a separate AvroSchema or a flag to disable AllowNull should also be provided so users can opt out of it. I don't think detecting whether a POJO is Avro-generated is the right approach. We should simply remove AllowNull when constructing AvroSchema (or at least provide the ability to disable it).
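
To make the mismatch concrete, here is a minimal Java sketch of reflection-based schema generation with an allow-null switch. It is a simplified stand-in, not Avro's ReflectData or Pulsar's AvroSchema: the class, method, and POJO names are hypothetical, only a few field types are mapped, and the output is plain JSON text. The `allowNull` parameter plays the role of the opt-out flag discussed above. When it is off, each field keeps its plain type. When it is on, every field type T becomes the union ["null", T].

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ReflectSchemaSketch {

    // Hypothetical POJO shared by the "source" system and the Pulsar producer.
    public static class User {
        public String name;
        public Long id;
    }

    // Simplified mapping from Java field types to Avro primitive type names.
    private static final Map<Class<?>, String> AVRO_TYPES = Map.of(
            String.class, "string",
            Integer.class, "int", int.class, "int",
            Long.class, "long", long.class, "long",
            Boolean.class, "boolean", boolean.class, "boolean",
            Double.class, "double", double.class, "double");

    // Builds a simplified Avro record schema (as JSON text) for a POJO.
    // When allowNull is true, every field type T becomes the union ["null", T].
    // Simplification: primitives are wrapped too; real Avro tooling may differ.
    public static String schemaFor(Class<?> pojo, boolean allowNull) {
        List<String> fields = new ArrayList<>();
        for (Field f : pojo.getDeclaredFields()) {
            // Skip static and compiler-generated fields; they are not record data.
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            String avroType = AVRO_TYPES.get(f.getType());
            if (avroType == null) {
                throw new IllegalArgumentException("unsupported field type: " + f.getType());
            }
            String type = "\"" + avroType + "\"";
            if (allowNull) {
                type = "[\"null\"," + type + "]";
            }
            fields.add("{\"name\":\"" + f.getName() + "\",\"type\":" + type + "}");
        }
        return "{\"type\":\"record\",\"name\":\"" + pojo.getSimpleName()
                + "\",\"fields\":[" + String.join(",", fields) + "]}";
    }

    public static void main(String[] args) {
        String plain = schemaFor(User.class, false);    // roughly what plain reflection might emit
        String nullable = schemaFor(User.class, true);  // what always-allow-null emits
        System.out.println("plain:      " + plain);
        System.out.println("allow-null: " + nullable);
        System.out.println("same schema? " + plain.equals(nullable));
    }
}
```

For the same `User` class, the two calls produce different field types ("long" versus ["null","long"]). This is the migration problem described above: a source system that generated its schema without AllowNull and a Pulsar producer that always attaches it describe the same POJO differently. Exposing the switch, rather than detecting Avro-generated classes, covers both generated and hand-written POJOs.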

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services