[This affects GeoServer too, but I'd rather avoid a cross-list post.]
Synopsis:
- HashMap has platform-dependent iteration order.
- Use LinkedHashMap to make iteration order consistent across platforms.
- Same rule applies for HashSet: replace with LinkedHashSet.
HashMap uses key hashCodes to put entries into bins. Because hashCodes
are typically based on the underlying machine pointer address, they
differ across platform and configuration (Unix/Windows, 32-bit/64-bit,
JVM implementation, GC options, ...). Iteration over a HashMap is
performed by walking the bins in order, which relies on hashCodes, and
thus varies across platforms.
In a nutshell: HashMap iteration order == chaos (across platforms).
It is my experience that many developers (including myself) infer
software behaviour by observation, as a workaround for incomplete
documentation, lack of explicit design-by-contract, and sheer software
complexity. One example of this, and I am as much as fault as anyone,
are unit tests that assume the behaviour of software on my platform is
the correct one. The results are unjustified assumptions encoded into
unit tests, which then fail on other platforms.
Platform iteration chaos is also exposed to end-users. For example, have
a look at GeoServer WFS namespace order, which varies across platforms.
There are many other examples in the code base.
I propose that wherever we might be tempted to use a HashMap in a manner
that will expose its iteration order, we instead use a LinkedHashMap.
Furthermore, I also propose that we retrofit all existing uses of
HashMap that expose iteration order with LinkedHashMap.
A HashSet is just a thin wrapper around a HashMap with null values; all
the behaviour is determined by the keys. As a consequence, a HashSet
suffers from the same iteration order problems as a HashMap. The
solution is similar: LinkedHashSet.
The main performance difference between Hash{Map,Set} and
LinkedHash{Map,Set} is that the latter has slower insertions but faster
iteration:
http://www.artima.com/weblogs/viewpost.jsp?thread=122295
No doubt those with experience in high-performance computing will be
able to advise us on the potential impacts of migrating to
LinkedHash{Map,Set}. (Yes, Andrea, this is your cue.)
References:
http://java.sun.com/j2se/1.5.0/docs/api/java/util/LinkedHashMap.html
*** begin quote ***
Hash table and linked list implementation of the Map interface, with
predictable iteration order. This implementation differs from HashMap in
that it maintains a doubly-linked list running through all of its
entries. This linked list defines the iteration ordering, which is
normally the order in which keys were inserted into the map
(insertion-order). Note that insertion order is not affected if a key is
re-inserted into the map. (A key k is reinserted into a map m if
m.put(k, v) is invoked when m.containsKey(k) would return true
immediately prior to the invocation.)
This implementation spares its clients from the unspecified, generally
chaotic ordering provided by HashMap (and Hashtable), without incurring
the increased cost associated with TreeMap.
*** end quote ***
See also:
http://java.sun.com/j2se/1.5.0/docs/api/java/util/LinkedHashSet.html
Kind regards,
--
Ben Caradoc-Davies <[email protected]>
Software Engineering Team Leader
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre
------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
lucky parental unit. See the prize list and enter to win:
http://p.sf.net/sfu/thinkgeek-promo
_______________________________________________
Geotools-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geotools-devel