[This affects GeoServer too, but I'd rather avoid a cross-list post.] Synopsis: - HashMap has platform-dependent iteration order. - Use LinkedHashMap to make iteration order consistent across platforms. - Same rule applies for HashSet: replace with LinkedHashSet.
HashMap uses key hashCodes to put entries into bins. Because hashCodes are typically based on the underlying machine pointer address, they differ across platform and configuration (Unix/Windows, 32-bit/64-bit, JVM implementation, GC options, ...). Iteration over a HashMap is performed by walking the bins in order, which relies on hashCodes, and thus varies across platforms. In a nutshell: HashMap iteration order == chaos (across platforms). It is my experience that many developers (including myself) infer software behaviour by observation, as a workaround for incomplete documentation, lack of explicit design-by-contract, and sheer software complexity. One example of this, and I am as much as fault as anyone, are unit tests that assume the behaviour of software on my platform is the correct one. The results are unjustified assumptions encoded into unit tests, which then fail on other platforms. Platform iteration chaos is also exposed to end-users. For example, have a look at GeoServer WFS namespace order, which varies across platforms. There are many other examples in the code base. I propose that wherever we might be tempted to use a HashMap in a manner that will expose its iteration order, we instead use a LinkedHashMap. Furthermore, I also propose that we retrofit all existing uses of HashMap that expose iteration order with LinkedHashMap. A HashSet is just a thin wrapper around a HashMap with null values; all the behaviour is determined by the keys. As a consequence, a HashSet suffers from the same iteration order problems as a HashMap. The solution is similar: LinkedHashSet. The main performance difference between Hash{Map,Set} and LinkedHash{Map,Set} is that the latter has slower insertions but faster iteration: http://www.artima.com/weblogs/viewpost.jsp?thread=122295 No doubt those with experience in high-performance computing will be able to advise us on the potential impacts of migrating to LinkedHash{Map,Set}. (Yes, Andrea, this is your cue.) References: http://java.sun.com/j2se/1.5.0/docs/api/java/util/LinkedHashMap.html *** begin quote *** Hash table and linked list implementation of the Map interface, with predictable iteration order. This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Note that insertion order is not affected if a key is re-inserted into the map. (A key k is reinserted into a map m if m.put(k, v) is invoked when m.containsKey(k) would return true immediately prior to the invocation.) This implementation spares its clients from the unspecified, generally chaotic ordering provided by HashMap (and Hashtable), without incurring the increased cost associated with TreeMap. *** end quote *** See also: http://java.sun.com/j2se/1.5.0/docs/api/java/util/LinkedHashSet.html Kind regards, -- Ben Caradoc-Davies <ben.caradoc-dav...@csiro.au> Software Engineering Team Leader CSIRO Earth Science and Resource Engineering Australian Resources Research Centre ------------------------------------------------------------------------------ ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo _______________________________________________ Geotools-devel mailing list Geotools-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/geotools-devel