sijie commented on issue #3741: POJO AvroSchema always allowNull
URL: https://github.com/apache/pulsar/issues/3741#issuecomment-469483661
 
 
   > I don't understand why you say we are not respecting the rules of AVRO
   > The only flag that could be settable is the allow null flag that gets set 
in Pulsar's Avro schema. We can expose that flag to users.
   
   @jerrypeng 
   
   That is the problem I am reporting in this issue: Pulsar's Avro schema 
always sets AllowNull, and Pulsar users have no way to bypass it. My point is 
that Pulsar's Avro schema should not attach AllowNull on its own.
   
   > If people want null to all the fields then they can add it at the avro 
schema.
   
   @skyrocknroll exactly.
   
   > I don't think the current API is the problem, I think we just need to 
separate API to specify a custom schema. 
   
   Jerry, that's a different feature. 
   
   The point here is that, for a given POJO, Pulsar should maintain a basic 
contract: the schema generated by Pulsar's Avro support should be the same as, 
or compatible with, what other tools generate. Pulsar should not automatically 
attach AllowNull to all fields; otherwise we shouldn't call it an Avro schema, 
we should call it AlwaysAllowNullAvroSchema.
   
   > This seems to be a problem only with Avro generated code, in which there 
was a schema defined and we're extracting the wrong schema from the generated 
POJO
   > This shouldn't be an issue when one defines an ad-hoc POJO (without 
starting from Avro schema), because annotations can be used to request 
nullable/non-nullable.
   
   @merlimat : that's not the *only* case. People can use ReflectData to 
generate a schema and write events in a different system (let's call it the 
source). When they migrate data from the source to Pulsar using the same POJO, 
the two schemas are not compatible, so the events cannot be written to 
Pulsar.
   
   > The easiest solution would be to detect that a POJO is generated by Avro 
and use the getClassSchema() instead of doing the reflection based approach.
   
   That's a very limited solution to cover only one use case. 
   
   ---
   
   In summary, the question is: "shall Pulsar automatically attach AllowNull 
to POJO-generated schemas?" Attaching AllowNull changes the default behavior 
of how Avro generates a schema from a POJO and makes Pulsar inconsistent with 
how other systems use Avro.
   
   If the team decides that AllowNull should be the default behavior in 
AvroSchema, it should be well documented and highlighted to all Avro schema 
users, and a separate AvroSchema or a flag to disable AllowNull should also be 
provided for users who do not want AllowNull.
   
   I don't think detecting whether a POJO is Avro-generated is the right 
approach. We should just remove AllowNull (or provide the ability to disable 
it) when constructing AvroSchema.
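   
   To make the difference concrete, here is a minimal stand-in sketch. It does 
not call Avro or Pulsar; `AllowNullSketch`, `User`, `avroType` and `fieldTypes` 
are hypothetical names made up for illustration. It assumes that AllowNull 
wraps each non-primitive field type in a union with "null" and leaves 
primitives alone, which is my understanding of what Avro's 
ReflectData.AllowNull does.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

public class AllowNullSketch {

    // Hypothetical POJO, used only for illustration.
    public static class User {
        String name;
        int age;
    }

    // Maps a Java field type to an Avro-style type name.
    // Only the two types used by User are handled in this sketch.
    static String avroType(Class<?> javaType) {
        if (javaType == String.class) return "\"string\"";
        if (javaType == int.class) return "\"int\"";
        throw new IllegalArgumentException("unsupported in this sketch: " + javaType);
    }

    // Returns field name -> field type, sorted by field name.
    // With allowNull set, non-primitive fields become a union with "null"
    // (assumed to mirror AllowNull; primitives are left unchanged).
    static Map<String, String> fieldTypes(Class<?> pojo, boolean allowNull) {
        Map<String, String> out = new TreeMap<>();
        for (Field f : pojo.getDeclaredFields()) {
            if (f.isSynthetic()) continue; // skip compiler/tool-generated fields
            String type = avroType(f.getType());
            if (allowNull && !f.getType().isPrimitive()) {
                type = "[\"null\"," + type + "]";
            }
            out.put(f.getName(), type);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("strict:    " + fieldTypes(User.class, false));
        System.out.println("allowNull: " + fieldTypes(User.class, true));
    }
}
```

   With the flag off, `name` is a plain "string"; with it on, `name` becomes 
the union ["null","string"], while `age` stays "int" in both cases. The same 
POJO therefore yields two different schemas depending on that one switch, 
which is why I think the switch should be in the user's hands.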

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
