[ 
https://issues.apache.org/jira/browse/AVRO-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840001#action_12840001
 ] 

Scott Carey commented on AVRO-435:
----------------------------------

An ordered set is easily compared to an array, so perhaps sets should be ordered.

If things are unordered, comparison gets more complicated.  For example, the 
simple way to compare would be to build both sets up in memory -- this would 
not work for large sets or lists.

I propose that a set is ordered by default.  A client can achieve this either 
by sorting when writing, or by using a data structure like a linked set.  A 
client can choose to disregard order and accept that set equivalence is invalid 
if they wish.
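
To illustrate why ordering helps, here is a minimal Java sketch (not part of 
Avro; the class and method names are made up for this example).  It assumes 
both sides were written in sorted order, so they can be compared one element 
at a time without building either set in memory.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

public class OrderedCompare {
    // Hypothetical helper: walks two ordered sequences in lockstep, holding
    // only one element from each side at a time.
    static <T extends Comparable<T>> int compareOrdered(Iterator<T> a, Iterator<T> b) {
        while (a.hasNext() && b.hasNext()) {
            int c = a.next().compareTo(b.next());
            if (c != 0) {
                return c; // first differing element decides the result
            }
        }
        if (a.hasNext()) {
            return 1;  // a has extra elements, so it sorts after b
        }
        if (b.hasNext()) {
            return -1; // b has extra elements, so it sorts after a
        }
        return 0;      // same elements in the same order
    }

    public static void main(String[] args) {
        // TreeSet stands in for "sorting when writing".
        Set<String> x = new TreeSet<>(Arrays.asList("b", "a", "c"));
        Set<String> y = new TreeSet<>(Arrays.asList("c", "b", "a"));
        System.out.println(compareOrdered(x.iterator(), y.iterator()) == 0); // true
    }
}
```

Without an ordering guarantee, the same check would need at least one side 
fully materialized (for example in a HashSet) before any answer is possible.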

We could consider an "ordered": true property as well.
"unique": false, "ordered": true
would then be the implicit default.  An array or set that is not ordered is 
always unequal to another array or set.  A reader with an 'ordered' schema 
reading an 'unordered' serialization would be in a tough spot.  That might not 
be a supported promotion.

{quote}
That leads us to the next question: What are the resolution rules between 
arrays and sets? My answer to this is: sets written can always be read as 
arrays. Arrays written can be read as sets as long as the uniqueness constraint 
is not violated.{quote}

One can always read an array as a set, even with duplicates.  The duplicates 
get eliminated in the process of creating the set.  Interestingly, one can go 
in either direction, but not back and forth: an array read as a set and then 
read back as an array may no longer match the original.
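
A small Java sketch of that resolution behaviour, using plain collections 
rather than any Avro API (the class and method names are assumptions made for 
illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ArraySetResolution {
    // Reading an array as a set: duplicates collapse, first-seen order is kept.
    static <T> Set<T> readAsSet(List<T> written) {
        return new LinkedHashSet<>(written);
    }

    // Reading a set as an array: every element appears exactly once.
    static <T> List<T> readAsArray(Set<T> written) {
        return new ArrayList<>(written);
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("a", "b", "a");
        Set<String> asSet = readAsSet(original);
        System.out.println(asSet);                                // [a, b]
        // The round trip is lossy: the duplicate "a" is gone.
        System.out.println(readAsArray(asSet).equals(original));  // false
    }
}
```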

I think in the short run, Doug's version is appropriate.  The above would be 
valuable, but it would take a while to sort out the details of what works best 
across languages, and it is a spec change rather than a Java API extension.  
Besides, it should be possible to specify what object to use as an array 
container regardless.

Taken in combination with AVRO-436, ordered maps, these two things are 
potentially significant changes for something that clients can emulate on their 
own.  Ordered maps can be emulated by an array of {key, value} tuples.
The simplest option other than Doug's Java Reflect API version is to require 
that Avro sets are ordered for the purposes of equivalence and comparison; 
if a client wants to compare two objects for equality or sort order, they must 
guarantee order on writing (this restriction already applies when serializing 
sets as arrays).  
"unique": (true|false) is then just a reserved keyword hint for languages to 
construct Set-like APIs for data access.
We can consider adding the more difficult support for unordered sets/lists 
incrementally.
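
For reference, the client-side emulation of an ordered map as an array of 
{key, value} tuples mentioned above could look roughly like this in Java.  
This is only a sketch with plain collections; the class and method names are 
hypothetical and nothing here is an Avro API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderedMapAsArray {
    // Flattens a map into a list of (key, value) pairs sorted by key, so the
    // result can be written as an ordinary array with a deterministic order.
    static <V> List<Map.Entry<String, V>> toSortedPairs(Map<String, V> map) {
        List<Map.Entry<String, V>> pairs = new ArrayList<>();
        for (Map.Entry<String, V> e : map.entrySet()) {
            pairs.add(new SimpleEntry<>(e.getKey(), e.getValue()));
        }
        pairs.sort(Map.Entry.comparingByKey());
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("b", 2);
        m.put("a", 1);
        System.out.println(toSortedPairs(m)); // [a=1, b=2]
    }
}
```

Because the writer guarantees the order, two such arrays can be compared for 
equality or sort order element by element.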

> Support Set containers
> ----------------------
>
>                 Key: AVRO-435
>                 URL: https://issues.apache.org/jira/browse/AVRO-435
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Cassandra uses Set as a return type for some methods.  It would be nice to 
> not have to use a List as a workaround.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
