Re: [I] Schema resolution lacks named type checking and allows invalid coercions [avro-rs]

via GitHub Tue, 13 Jan 2026 01:59:22 -0800


KitFieldhouse commented on issue #383:
URL: https://github.com/apache/avro-rs/issues/383#issuecomment-3743361391


   Thanks for the discussion. I apologize for not responding earlier, I've been 
out on vacation.
   
   > @KitFieldhouse Would that make something possible that is currently 
impossible/broken?
   
   What I had in mind is exactly what Kriskras demonstrates in their above post.
   
   As for storing name in the value, as Kriskras mentions:
   
   > We can also store the name in the value, but that would cause a lot of 
extra string clones.
   
   I agree that we want to avoid making a bunch of string clones. For the fork 
I have been working on, copying names is not too much of an issue, as names are 
(usually) passed via `Arc` references, so for values created by reading a 
binary stream with a schema, we can just store this reference. For user created 
values, the user supplied string literal only needs to live long enough for us 
to verify the value against a schema, at which point we can again use the 
reference. 
   
   I do think optionally accepting names in value makes sense. For instance, 
for the schema 
   
   ```
   [{"type": "record", "name": "A", "fields": []},{"type": "record", "name": 
"B", "fields": []}]
   ```
   
   With the current state of `Value`, to specify our record as being record 
"B", we would need to construct our value like so
   
   ```
   Value::Union(1,Box::new(Value::Record(vec![])));
   ```
   
   which, in my opinion is a roundabout way of saying what we really want to 
say which is that we are creating a record with name "B". I think this can lead 
to a footgun where if a user changes the order of the union in the schema, the 
"name" of the record may change or we may fail to validate all together.
   
   > I think this is best solved by changing Value::resolve(self, schema: 
&Schema) -> Value to Value::resolve(self, reader_schema: &Schema, 
writer_schema: &Schema) -> Value
   
   A few thoughts, 
   
   - Would we check `Value` against `writer_schema` first and then, if this 
passes, perform our resolution against 
   `reader_schema`?
   
   - What do we do when the specific type written in `Value` disagrees with 
what is specified in `writer_schema`? 
   Should we reject this case, or coerce according to what makes sense? For 
instance, lets say we have a `writer_schema`  of `{"type": "long"}`, a 
`reader_schema` of `{"type": "double"}` and a value of `Value::Int(123)`? 
Should we reject this since `Value` does not agree with `writer_schema` even 
though there is a clear coercion here? The only situation I can think of where 
this disagreement between writer schema and value would occur is if the 
provided value was constructed by the user and not read from a binary stream.
   
   Having `TypedValue` represent a type-level promise that its wrapped value 
and schema have been checked against each other and any necessary coercions 
have been performed coupled with appropriately limiting the ways in which a 
user can create a `TypedValue` provides a way of eliminating the situation 
where we provide a `Value` and a `writer_schema` that are not mutually 
compatible. Then, if we write functions that take `TypedValue` instead of 
`Value` we can assume the fast path that no checking between `writer_schema` 
and `Value` is needed.
   
   > The fact that resolving is more lenient than allowed by the specification 
is also a valid concern, but is not something I think needs to be fixed.
   
   In so far as this lenience doesn't cause incorrect behavior (like the 
example in your post) I agree with this argument to a degree. To play devil's 
advocate here, I see an argument that since resolution is unambiguously defined 
by the specification, it should be expected that implementations follow those 
rules and only those rules to ensure consistent behavior between 
implementations. I could see someone being confused on why this crate can 
resolve between a reader and writer schema where the Java implementation can 
not. Though, maybe this is fixable via a note in the documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] Schema resolution lacks named type checking and allows invalid coercions [avro-rs]

Reply via email to