I think the behavior when encoding that would be to produce the map. I
would expect that because I'm assuming Python uses the first path that
appears to match. When it's ambiguous which way an in-memory representation
maps to a schema, it's up to the implementation to choose.

Whatever python chooses, the actual encoding is deterministic. Either the
map or the record will be chosen and the bytes produced will always
deserialize to that representation if you read it in another language
implementation.

On Thu, Mar 4, 2021 at 5:30 PM Spencer Nelson <s...@spencerwnelson.com> wrote:

> Suppose a schema like this - a union of a map and a record:
>
> [
>     {"type": "map", "values": "int"},
>     {"type": "record", "name": "Record", fields: [{"name": "field",
> "type": "int"}]}
> ]
>
> In Python, unserialized maps and records are both represented as
> dictionaries. So, if an Avro Python library were asked to encode this
> message:
>
>     {"field": 1}
>
> What should it do? Should it describe the value as the map type, or
> the record type, when encoding the union?
>
> Similarly, I wonder about cases where multiple records are in a union.
> I think it's easy to imagine the ambiguous cases without spelling it
> all out.
>
> Maybe this ambiguity is specific to the Python implementation, I'm not
> sure.
>


-- 
Ryan Blue
Software Engineer
Netflix

Reply via email to