[ 
https://issues.apache.org/jira/browse/THRIFT-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Duxbury updated THRIFT-1630:
----------------------------------

    Description: 
There's a subtle issue with comparing the serialized bytes of Thrift objects 
that contain maps or sets in Java. Even though the objects that go into sets 
(or serve as map keys) have consistent hashcodes, inserting them in a 
different order can give the collection a different iteration order. Since 
serialization occurs in iteration order, objects that are .equals() in memory 
can serialize to different bytes.
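
A minimal sketch of the effect, using plain java.util collections rather than 
Thrift-generated classes. The class name IterationOrderDemo and the serialize() 
helper are hypothetical, and the byte layout is a stand-in, not Thrift's wire 
format. It relies on "Aa" and "BB" having the same String.hashCode(), so they 
share a bucket and OpenJDK's HashMap iterates them in an order that depends on 
insertion order. That ordering is implementation behavior, not something the 
HashSet contract specifies.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class IterationOrderDemo {
    // Builds a HashSet by adding the elements in the order given.
    static Set<String> hashSetOf(String... elements) {
        Set<String> set = new HashSet<>();
        for (String e : elements) {
            set.add(e);
        }
        return set;
    }

    // Stand-in serializer (NOT Thrift's format): for each element, in
    // iteration order, writes one length byte followed by the UTF-8 bytes.
    static byte[] serialize(Set<String> set) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String e : set) {
            byte[] utf8 = e.getBytes(StandardCharsets.UTF_8);
            out.write(utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Same two elements, opposite insertion order.
        Set<String> a = hashSetOf("Aa", "BB");
        Set<String> b = hashSetOf("BB", "Aa");

        System.out.println("equal in memory:       " + a.equals(b));
        System.out.println("equal when serialized: "
                + Arrays.equals(serialize(a), serialize(b)));
    }
}
```

On an OpenJDK runtime the first line reports true and the second false, which 
is exactly the mismatch a raw byte comparator would see.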

In most cases this isn't an issue. However, in cases where the user is doing 
raw byte comparison (e.g., in Hadoop), it is a big issue.

One solution is to just switch the internal Map/Set implementations to their 
sorted versions (TreeMap/TreeSet). However, these implementations are about 3x slower 
than their Hash counterparts, and I can certainly foresee situations in which 
that would upset a lot of users. I propose we add a compiler switch that 
toggles the Map/Set implementation between sorted and unsorted so that users 
can select which they prefer.
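
As a rough illustration of what such a switch would buy, the sketch below uses 
a boolean to choose between TreeSet and HashSet. The names SortedToggleDemo, 
newSet and serialize are hypothetical, the boolean merely stands in for the 
proposed compiler option, and the byte layout is not Thrift's wire format. With 
the sorted implementation, iteration order depends only on the elements, so 
equal sets produce identical bytes regardless of insertion order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class SortedToggleDemo {
    // Hypothetical stand-in for the proposed switch: picks the backing
    // implementation that generated code would instantiate.
    static Set<String> newSet(boolean sorted) {
        if (sorted) {
            return new TreeSet<>();
        }
        return new HashSet<>();
    }

    // Fills a set of the chosen kind, adding elements in the order given.
    static Set<String> build(boolean sorted, String... elements) {
        Set<String> set = newSet(sorted);
        for (String e : elements) {
            set.add(e);
        }
        return set;
    }

    // Stand-in serializer (NOT Thrift's format): for each element, in
    // iteration order, writes one length byte followed by the UTF-8 bytes.
    static byte[] serialize(Set<String> set) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String e : set) {
            byte[] utf8 = e.getBytes(StandardCharsets.UTF_8);
            out.write(utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Sorted sets: opposite insertion order, identical bytes.
        byte[] first = serialize(build(true, "Aa", "BB"));
        byte[] second = serialize(build(true, "BB", "Aa"));
        System.out.println("sorted, bytes equal: " + Arrays.equals(first, second));
    }
}
```

The trade-off described above still applies: the sorted collections cost more 
per operation, which is why the choice is left to the user.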


  was: allow users to indicate that they'd like sorted sets/maps in their 
types, meaning, for example, that they'd be backed by TreeSet/TreeMap in Java.

        Summary: Equivalent objects that contain sets and maps can serialize 
differently  (was: add support for sorted sets/maps)
    
> Equivalent objects that contain sets and maps can serialize differently
> -----------------------------------------------------------------------
>
>                 Key: THRIFT-1630
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1630
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Java - Compiler
>            Reporter: Chris Mullins
>            Assignee: Bryan Duxbury
