[This affects GeoServer too, but I'd rather avoid a cross-list post.]

Synopsis:
- HashMap has platform-dependent iteration order.
- Use LinkedHashMap to make iteration order consistent across platforms.
- Same rule applies for HashSet: replace with LinkedHashSet.

HashMap uses key hashCodes to put entries into bins. Because hashCodes 
are typically based on the underlying machine pointer address, they 
differ across platform and configuration (Unix/Windows, 32-bit/64-bit, 
JVM implementation, GC options, ...). Iteration over a HashMap is 
performed by walking the bins in order, which relies on hashCodes, and 
thus varies across platforms.

In a nutshell: HashMap iteration order == chaos (across platforms).

It is my experience that many developers (including myself) infer 
software behaviour by observation, as a workaround for incomplete 
documentation, lack of explicit design-by-contract, and sheer software 
complexity. One example of this, and I am as much as fault as anyone, 
are unit tests that assume the behaviour of software on my platform is 
the correct one. The results are unjustified assumptions encoded into 
unit tests, which then fail on other platforms.

Platform iteration chaos is also exposed to end-users. For example, have 
a look at GeoServer WFS namespace order, which varies across platforms. 
There are many other examples in the code base.

I propose that wherever we might be tempted to use a HashMap in a manner 
that will expose its iteration order, we instead use a LinkedHashMap. 
Furthermore, I also propose that we retrofit all existing uses of 
HashMap that expose iteration order with LinkedHashMap.

A HashSet is just a thin wrapper around a HashMap with null values; all 
the behaviour is determined by the keys. As a consequence, a HashSet 
suffers from the same iteration order problems as a HashMap. The 
solution is similar: LinkedHashSet.

The main performance difference between Hash{Map,Set} and 
LinkedHash{Map,Set} is that the latter has slower insertions but faster 
iteration:
http://www.artima.com/weblogs/viewpost.jsp?thread=122295

No doubt those with experience in high-performance computing will be 
able to advise us on the potential impacts of migrating to 
LinkedHash{Map,Set}. (Yes, Andrea, this is your cue.)

References:

http://java.sun.com/j2se/1.5.0/docs/api/java/util/LinkedHashMap.html

*** begin quote ***

Hash table and linked list implementation of the Map interface, with 
predictable iteration order. This implementation differs from HashMap in 
that it maintains a doubly-linked list running through all of its 
entries. This linked list defines the iteration ordering, which is 
normally the order in which keys were inserted into the map 
(insertion-order). Note that insertion order is not affected if a key is 
re-inserted into the map. (A key k is reinserted into a map m if 
m.put(k, v) is invoked when m.containsKey(k) would return true 
immediately prior to the invocation.)

This implementation spares its clients from the unspecified, generally 
chaotic ordering provided by HashMap (and Hashtable), without incurring 
the increased cost associated with TreeMap.

*** end quote ***

See also:
http://java.sun.com/j2se/1.5.0/docs/api/java/util/LinkedHashSet.html

Kind regards,

-- 
Ben Caradoc-Davies <ben.caradoc-dav...@csiro.au>
Software Engineering Team Leader
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre

------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
_______________________________________________
Geotools-devel mailing list
Geotools-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel

Reply via email to