KitFieldhouse commented on PR #467:
URL: https://github.com/apache/avro-rs/pull/467#issuecomment-4014871627

   A few concerns about this change:
   
   **1. `default` field in map and array example was removed in the spec**
   
   [This jira issue ](https://issues.apache.org/jira/browse/AVRO-3155?) points 
out that the `default` field was added to map and array examples in the 
specification starting from 1.10. The [associated 
PR](https://github.com/apache/avro/pull/3046) indicates that this is a mistake 
in the specification and fixes it for versions starting at 1.12.1.
   
   **2. Interoperability regression**
   
   The Avro specification states "Attributes not defined in this document are 
permitted as metadata, but must not affect the format of serialized data". The 
other implementations I have tested (Java, Python) treat `"default"` on array 
and map type definitions as an unrecognized attribute — they store it as opaque 
metadata and do not validate it against the schema.
   
   This PR promotes `"default"` to a validated, parsed field on `ArraySchema` 
and `MapSchema`. Because it validates the value against the item/value schema, 
schemata that parse successfully in other implementations will now fail to 
parse in Rust. For example, in the python implementation, the below parses 
successfully and shows "default" as being considered an "other_prop".
   
   ```python
   import avro.schema
   import json
   
   # Array with a "default" that wouldn't even validate against the items schema
   array_schema = avro.schema.parse(json.dumps({
       "type": "array",
       "items": "string",
       "default": 12345  # intentionally invalid — not an array of strings
   }))
   
   print("Array props:      ", dict(array_schema.props))
   print("Array other_props: ", dict(array_schema.other_props))
   
   # Map with a default
   map_schema = avro.schema.parse(json.dumps({
       "type": "map",
       "values": "long",
       "default": {"key": "not-a-long"}  # also intentionally invalid
   }))
   
   print("Map props:         ", dict(map_schema.props))
   print("Map other_props:   ", dict(map_schema.other_props))
   
   ```
   
   I believe this is an edge case, but it does cause a schism between this 
implementation and the others.
   
   **3. No semantic effect**
   
   This PR does not appear to modify the encoding or decoding paths and has no 
effect on data serialization or schema resolution. A user who sets a default on 
an array type might reasonably expect it to behave like record field defaults 
during schema evolution, but it does not.
   
   The spec has never defined semantics for type-level defaults on arrays or 
maps. The `"default"` examples that appeared in the 1.10.0–1.12.0 spec were, as 
mentioned above, identified as a documentation error (AVRO-3155) and removed.
   
   **Suggested alternative**
   
   I believe the pre-PR version actually was the correct implementation given 
the behavior of other implementations and the spec, so I would vote to revert 
this change, or at minimum stop validating the default value against the schema 
so that it passes through.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to