While reviewing code, I noticed that core/src/main/java/org/apache/accumulo/core/clientImpl/TabletLocator.java uses a WeakHashMap to deduplicate some strings.
This code can probably be removed in favor of one of the following two options:

1. Just explicitly use String.intern(). As of Java 7, interned strings are no longer stored in the separate, fixed-size PermGen space; they live in the main heap, so they are no longer constrained to a limited-size pool. These strings are still subject to garbage collection. The intern pool is implemented natively as a hash table with a default of more than 60K buckets, which is plenty for the interning that TabletLocator is doing, and the user can change it with a JVM flag (-XX:StringTableSize) if it's not. Interning should use less memory than a WeakHashMap and perform similarly, as long as the table has enough buckets.

2. Just use the -XX:+UseStringDeduplication JVM flag. As of Java 9, G1 is the default Java garbage collector. G1 can optionally attempt to deduplicate all strings behind the scenes by pointing equal strings at a single shared backing array (so it should not affect == comparisons, because the String object references themselves don't change, unlike option 1). This is more passive than option 1, but it applies to the entire JVM. G1 also implements some heuristics to prevent too much overhead.

With both options, it's possible to output statistics.

If I remove the WeakHashMap used for string deduplication in TabletLocator, does anybody have an opinion on which option I should replace it with? I'm leaning towards option 2 (adding the flag to assemble/conf/accumulo-env.sh as one of the defaults).
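For comparison, here is a minimal sketch of what swapping a WeakHashMap-based deduplicator for option 1 might look like. WeakDedup is a hypothetical stand-in I wrote for illustration, not the actual TabletLocator code, and the host:port value is made up.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class StringDedupSketch {

  // Hypothetical stand-in for a WeakHashMap-based deduplicator
  // (an assumption for illustration, not the code in TabletLocator).
  static final class WeakDedup {
    private final Map<String, WeakReference<String>> cache = new WeakHashMap<>();

    synchronized String dedup(String s) {
      WeakReference<String> ref = cache.get(s);
      String canonical = (ref == null) ? null : ref.get();
      if (canonical == null) {
        // First time we see this value: it becomes the canonical instance.
        cache.put(s, new WeakReference<>(s));
        canonical = s;
      }
      return canonical;
    }
  }

  // Option 1: let the JVM's intern pool pick the canonical instance.
  static String internDedup(String s) {
    return s.intern();
  }

  public static void main(String[] args) {
    // new String(...) forces two distinct objects with equal contents.
    String a = new String("host1:9997");
    String b = new String("host1:9997");

    WeakDedup weak = new WeakDedup();
    System.out.println("distinct before dedup: " + (a != b));
    System.out.println("WeakHashMap dedup shares instance: "
        + (weak.dedup(a) == weak.dedup(b)));
    System.out.println("intern() shares instance: "
        + (internDedup(a) == internDedup(b)));
  }
}
```

Both approaches hand back one shared instance for equal strings, so callers that rely on the returned reference would behave the same. Option 2 needs no code at all, only the JVM flag, but it leaves the String objects themselves distinct.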
