It turns out that the Python implementation takes the last path that matches.
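
To make "first path" versus "last path" concrete, here is a minimal sketch of the union from the original question. It is a toy encoder written only for this example, not the avro library's API: the names `matches`, `pick_branch`, `encode`, and the `strategy` argument are all hypothetical. It follows the binary encoding rules in the Avro spec (zigzag varint for int/long, a union written as branch index followed by the value, a map written as counted blocks ending in a zero count).

```python
# Toy model for illustration only; none of these names come from a real Avro library.
SCHEMA = [
    {"type": "map", "values": "int"},
    {"type": "record", "name": "Record",
     "fields": [{"name": "field", "type": "int"}]},
]


def encode_long(n):
    """Zigzag + variable-length encoding used by Avro for int and long."""
    n = (n << 1) ^ (n >> 63)
    out = bytearray()
    while n & ~0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def matches(branch, datum):
    """True if a Python dict could plausibly be written as this union branch."""
    if not isinstance(datum, dict):
        return False
    if not all(isinstance(v, int) for v in datum.values()):
        return False
    if branch["type"] == "map":
        return all(isinstance(k, str) for k in datum)
    if branch["type"] == "record":
        return set(datum) == {f["name"] for f in branch["fields"]}
    return False


def pick_branch(schema, datum, strategy):
    """Choose a union branch; 'first' and 'last' are the two hypothetical policies."""
    candidates = [i for i, b in enumerate(schema) if matches(b, datum)]
    if not candidates:
        raise ValueError("datum matches no branch of the union")
    return candidates[0] if strategy == "first" else candidates[-1]


def encode(datum, strategy, schema=SCHEMA):
    """Write the union as: branch index, then the value per that branch."""
    index = pick_branch(schema, datum, strategy)
    branch = schema[index]
    out = bytearray(encode_long(index))
    if branch["type"] == "map":
        if datum:
            out += encode_long(len(datum))      # one block holding every entry
            for key, value in datum.items():
                raw = key.encode("utf-8")
                out += encode_long(len(raw)) + raw + encode_long(value)
        out += encode_long(0)                   # zero count ends the map
    else:
        for field in branch["fields"]:          # record: fields in schema order
            out += encode_long(datum[field["name"]])
    return bytes(out)


datum = {"field": 1}
as_map = encode(datum, "first")     # what a first-match writer would produce
as_record = encode(datum, "last")   # what a last-match writer would produce
print(as_map.hex(), as_record.hex())
```

Both byte strings decode to the same Python dict, so a last-match writer that reads the first-match bytes and writes the value back out produces `as_record`, which is shorter than and different from `as_map`.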
Agreed that it's deterministic within a language, but it might round-trip inconsistently. For example, suppose Java takes the first path that matches, and Python takes the last path that matches. Then, if I 1. serialize with Java, 2. deserialize with Python, and 3. reserialize with Python, the encoded bytes will be different after step 1 vs. step 3. Certainly, each implementation will be able to read the encoded data, but its binary representation has changed.

Maybe that's okay. It's a little unfortunate for writing tests, and violates some expectations - you'd think that decoding and re-encoding data, without changing anything in it, would not change its bytes on disk.

On Fri, Mar 5, 2021 at 8:53 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> I think the behavior when encoding that would be to produce the map. I
> would expect that because I'm assuming Python uses the first path that
> appears to match. When it's ambiguous which way an in-memory representation
> maps to a schema, it's up to the implementation to choose.
>
> Whatever Python chooses, the actual encoding is deterministic. Either the
> map or the record will be chosen and the bytes produced will always
> deserialize to that representation if you read it in another language
> implementation.
>
> On Thu, Mar 4, 2021 at 5:30 PM Spencer Nelson <s...@spencerwnelson.com>
> wrote:
>
> > Suppose a schema like this - a union of a map and a record:
> >
> > [
> >   {"type": "map", "values": "int"},
> >   {"type": "record", "name": "Record", "fields": [{"name": "field",
> >     "type": "int"}]}
> > ]
> >
> > In Python, unserialized maps and records are both represented as
> > dictionaries. So, if an Avro Python library were asked to encode this
> > message:
> >
> > {"field": 1}
> >
> > What should it do? Should it describe the value as the map type, or
> > the record type, when encoding the union?
> >
> > Similarly, I wonder about cases where multiple records are in a union.
> > I think it's easy to imagine the ambiguous cases without spelling it
> > all out.
> >
> > Maybe this ambiguity is specific to the Python implementation, I'm not
> > sure.
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>