On Wed, 22 Jun 2022 16:11:33 GMT, Roger Riggs <rri...@openjdk.org> wrote:

>> This PR improves the performance of deduplication done by
>> ResourceBundleGenerator.
>>
>> The original implementation compared every pair of values, requiring O(n^2)
>> time. The new implementation uses a HashMap to find duplicates, trading off
>> some extra memory consumption for O(n) computational complexity. In practice
>> the time to generate the jdk.localedata files on my Linux VM dropped from
>> 14 to 8 seconds.
>>
>> The resulting files (under build/support/gensrc/java.base and
>> jdk.localedata) have different contents; map iteration order depends on the
>> insertion order, and the insertion order of the new implementation is
>> different from the original.
>> The files generated before and after this change have the same size.
>
> make/jdk/src/classes/build/tools/cldrconverter/ResourceBundleGenerator.java
> line 146:
>
>> 144: // generic reduction of duplicated values
>> 145: Map<String, Object> newMap = new HashMap<>(map);
>> 146: Map<BundleEntryValue, BundleEntryValue> dedup = new HashMap<>(map.size());
>
> LinkedHashMap could be used to retain the iteration order.
> Or TreeMap if some deterministic order was desirable.

True. Which raises the question: do we need any particular order? The original
code used a HashMap too. It preserved the original order only when no
duplicates were detected.

-------------

PR: https://git.openjdk.org/jdk/pull/9243
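
For illustration only, the hash-based deduplication discussed in this thread could be sketched roughly as below. This is not the code from the PR: the class name `DedupSketch`, the `ALIAS_PREFIX` marker, and the choice to replace a duplicate with an alias to the first key are all assumptions made for the example. It also assumes the values implement `equals` and `hashCode` sensibly. The point is only to show a single pass with expected O(1) lookups, combined with the `LinkedHashMap` suggestion for keeping the input's iteration order.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class DedupSketch {
    // Hypothetical marker meaning "same value as this other key"; not taken from the JDK sources.
    static final String ALIAS_PREFIX = "@alias:";

    /**
     * Single pass over the entries: remember the first key seen for each
     * distinct value and replace later occurrences with an alias to that key.
     */
    static Map<String, Object> dedup(Map<String, Object> map) {
        // Copying into a LinkedHashMap keeps the iteration order of the input map;
        // overwriting an existing key later does not change that order.
        Map<String, Object> result = new LinkedHashMap<>(map);
        // value -> first key that carried it; each lookup is expected O(1).
        Map<Object, String> firstKeyByValue = new HashMap<>(map.size());
        for (Map.Entry<String, Object> e : map.entrySet()) {
            String firstKey = firstKeyByValue.putIfAbsent(e.getValue(), e.getKey());
            if (firstKey != null) {
                // Duplicate value: point at the first key instead of repeating it.
                result.put(e.getKey(), ALIAS_PREFIX + firstKey);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> in = new LinkedHashMap<>();
        in.put("a", "x");
        in.put("b", "y");
        in.put("c", "x");
        in.put("d", "y");
        System.out.println(dedup(in)); // prints {a=x, b=y, c=@alias:a, d=@alias:b}
    }
}
```

Swapping the `LinkedHashMap` for a `TreeMap` would give sorted-by-key output instead, and a plain `HashMap` would leave the order unspecified, which is the trade-off raised above.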