I think the behavior when encoding that would be to produce the map. I would expect that because I'm assuming Python uses the first path that appears to match. When it's ambiguous which way an in-memory representation maps to a schema, it's up to the implementation to choose.
Whatever python chooses, the actual encoding is deterministic. Either the map or the record will be chosen and the bytes produced will always deserialize to that representation if you read it in another language implementation. On Thu, Mar 4, 2021 at 5:30 PM Spencer Nelson <s...@spencerwnelson.com> wrote: > Suppose a schema like this - a union of a map and a record: > > [ > {"type": "map", "values": "int"}, > {"type": "record", "name": "Record", fields: [{"name": "field", > "type": "int"}]} > ] > > In Python, unserialized maps and records are both represented as > dictionaries. So, if an Avro Python library were asked to encode this > message: > > {"field": 1} > > What should it do? Should it describe the value as the map type, or > the record type, when encoding the union? > > Similarly, I wonder about cases where multiple records are in a union. > I think it's easy to imagine the ambiguous cases without spelling it > all out. > > Maybe this ambiguity is specific to the Python implementation, I'm not > sure. > -- Ryan Blue Software Engineer Netflix