Re: [PR] fix #226: Index-Out-Of-Bounds panic when using #[serde(skip_serializing_if=..)] [avro-rs]

via GitHub Thu, 24 Jul 2025 08:29:10 -0700


jdarais commented on PR #227:
URL: https://github.com/apache/avro-rs/pull/227#issuecomment-3113920339


   Sorry about the delayed response, I've been traveling.  On this question:
   
   > For full roundtrip we will need a Serde-driven deserialization too.
   ser.rs and de.rs are Schema-driven impls.
   ser_schema.rs is Serde-driven serialization. We will need de_schema.rs (or 
some better names) for Serde-driven impl.
   
   I was looking into creating a serde-driven deserializer, but in the end it 
seemed easier to first deserialize into an avro `Value`, which provides some 
flexibility that helps with schema resolution (if the reader schema is 
different from the writer schema,) and then use the existing `de.rs` 
implementation to convert from `Value` to the final type.
   
   I feel like this approach should still work for a round-trip of 
serialization and deserialization.  If it doesn't then I think that would mean 
that either the serde-driven serialization or the schema-driven deserialization 
has a bug, or simply that the default value used for serialization, (provided 
by the schema,) is different from the default value used for deserialization, 
(which as far as I can tell uses `Default::default` and bypasses 
deserialization logic altogether.)
   
   In general, the `skip_serializing_if` attribute seems to be at odds with 
serialization to Avro, since Avro structs give no accommodation for simply 
skipping serialization of a field: all fields of a struct must be present.  
(Using the default value for the field provided by the schema is a reasonable 
workaround, since that simulates omiting the field, though per the 
[spec](https://avro.apache.org/docs/1.12.0/specification/#schema-record), the 
default value is "only used when reading instances that lack the field for 
schema evolution purposes. The presence of a default value does not make the 
field optional at encoding time."  Also, with the changes from this PR, if the 
default value is not present, then the struct will be serialized incorrectly.)
   
   Using the `skip` attribute would be easier to use with Avro since you can 
just ensure that the skipped field isn't included in the Avro schema used to 
write.
   
   Now that I think about it, I'm guessing that `skip_serializing_if` is mostly 
useful for removing attributes that are just `null` or the default value so 
they don't take up unnecessary space in the serialized payload.  It might work 
to just have the `skip_field` implementation do the exact same thing as 
`serialize_field`, effectively always including the field, since you're not 
allowed to omit the field anyway.
   
   On a different topic: I see from the PR that there was some logic in there 
before that ensured that struct fields were written in the correct order, (they 
must be written in the order in which they appear in the schema,) even if the 
order of the fields in the rust struct is different from what's in the schema.  
It'd be nice to get that back in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@avro.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] fix #226: Index-Out-Of-Bounds panic when using #[serde(skip_serializing_if=..)] [avro-rs]

Reply via email to