[ https://issues.apache.org/jira/browse/AVRO-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839982#action_12839982 ]

Thiruvalluvan M. G. commented on AVRO-435:
------------------------------------------

+1 for "unique":"true".

An array differs from a set in two ways:
   - Arrays guarantee the order of their elements and sets don't. This should 
not be a problem: an array can implement a set by simply ignoring the order.
   - Arrays do not guarantee uniqueness. Using arrays for sets would work if we 
can define _equality_ in the spec. Unfortunately, we cannot say "two entities 
are equal if their schemas are equal and their bit representations in Avro are 
equal". The trouble comes from maps (and sets, once we have them), because Avro 
does not enforce an order on their elements. But we can define _equality_ 
something like this:
      - Two primitive entities are equal if and only if their schemas and Avro 
binary representations are equal.
      - Two complex entities (other than maps and sets) are equal if and only if 
their schemas are equal and their contents are equal. For example, two unions 
are equal if and only if their union indexes are equal and the union members 
are equal.
      - Two maps (sets) are equal if and only if their schemas are equal and 
their elements are equal, disregarding order.
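As a rough illustration of these equality rules, here is a minimal Python sketch. It assumes a simplified, hypothetical data model: schemas are written like Avro JSON schemas (primitive names as strings, complex types as dicts, unions as lists), a union datum is a (branch_index, value) pair, and a set is an array schema carrying the proposed "unique": "true" attribute. Both datums are assumed to share the given schema, which covers the "schemas are equal" part of each rule. The name avro_equal is made up for this example and is not part of any Avro library.

```python
import struct
from typing import Any

# Hypothetical, simplified model of Avro data (illustration only):
#   - schemas look like Avro JSON schemas: "int", {"type": "map", ...}, ...
#   - a union schema is a list; a union datum is a (branch_index, value) pair
#   - a "set" is an array schema with the proposed "unique": "true" attribute


def _is_unique(schema: dict) -> bool:
    # Accept both the JSON string "true" (as proposed) and a boolean True.
    return schema.get("unique") in (True, "true")


def _primitive_equal(name: str, a: Any, b: Any) -> bool:
    # Floating-point values compare by their Avro binary encoding
    # (little-endian IEEE 754), so NaN equals NaN but 0.0 != -0.0.
    if name == "double":
        return struct.pack("<d", a) == struct.pack("<d", b)
    if name == "float":
        return struct.pack("<f", a) == struct.pack("<f", b)
    # For the other primitives (and enum/fixed), equal values have equal
    # binary encodings, so plain value equality is enough.
    return a == b


def _unordered_equal(items: Any, a: list, b: list) -> bool:
    # Multiset comparison ignoring order. Greedy matching is enough because
    # avro_equal is an equivalence relation. The caller checks lengths.
    unmatched = list(b)
    for x in a:
        for i, y in enumerate(unmatched):
            if avro_equal(items, x, y):
                del unmatched[i]
                break
        else:
            return False
    return True


def avro_equal(schema: Any, a: Any, b: Any) -> bool:
    """Compare two datums that are both assumed to be written with `schema`."""
    if isinstance(schema, list):  # union: same branch, equal members
        (ia, va), (ib, vb) = a, b
        return ia == ib and avro_equal(schema[ia], va, vb)
    if isinstance(schema, str):  # primitive given by name
        return _primitive_equal(schema, a, b)
    kind = schema["type"]
    if kind == "record":  # field-by-field comparison
        return all(avro_equal(f["type"], a[f["name"]], b[f["name"]])
                   for f in schema["fields"])
    if kind == "map":  # unordered: same keys, equal value per key
        return a.keys() == b.keys() and all(
            avro_equal(schema["values"], a[k], b[k]) for k in a)
    if kind == "array":
        if len(a) != len(b):
            return False
        if _is_unique(schema):  # set: compare ignoring order
            return _unordered_equal(schema["items"], a, b)
        return all(avro_equal(schema["items"], x, y) for x, y in zip(a, b))
    # enum, fixed, or a primitive written as {"type": "..."}
    return _primitive_equal(kind, a, b)
```

For example, avro_equal({"type": "array", "items": "string", "unique": "true"}, ["a", "b"], ["b", "a"]) is true, while the same call without the "unique" attribute is false.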

Another question to answer is: is it the responsibility of the Avro library to 
ensure uniqueness? My answer is no. In Avro, we interpret the contents assuming 
that the schema we have is indeed the schema the writer used, so we can trust 
"unique":"true" as well.

That leads us to the next question: what are the resolution rules between 
arrays and sets? My answer is: sets written can always be read as arrays, and 
arrays written can be read as sets as long as the uniqueness constraint is not 
violated. This extends our schema resolution philosophy: we try to be as 
lenient as possible unless we encounter data violations.


> Support Set containers
> ----------------------
>
>                 Key: AVRO-435
>                 URL: https://issues.apache.org/jira/browse/AVRO-435
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Cassandra uses Set as a return type for some methods.  It would be nice to 
> not have to use a List as a workaround.

-- 
This message is automatically generated by JIRA.