On Tue, 27 Apr 2021 22:47:06 GMT, Coleen Phillimore <cole...@openjdk.org> wrote:
>> Please review this change to the String Deduplication facility. It is being >> changed to use OopStorage to hold weak references to relevant objects, >> rather than bespoke data structures requiring dedicated processing phases by >> supporting GCs. >> >> (The Shenandoah update was contributed by Zhengyu Gu.) >> >> This change significantly simplifies the interface between a given GC and >> the String Deduplication facility, which should make it much easier for >> other GCs to opt in. However, this change does not alter the set of GCs >> providing support; currently only G1 and Shenandoah support string >> deduplication. Adding support by other GCs will be in followup RFEs. >> >> Reviewing via the diffs might not be all that useful for some parts, as >> several files have been essentially completely replaced, and a number of >> files have been added or eliminated. The full webrev might be a better >> place to look. >> >> The major changes are in gc/shared/stringdedup/* and in the supporting >> collectors, but there are also some smaller changes in other places, most >> notably classfile/{stringTable,javaClasses}. >> >> This change is additionally labeled for review by core-libs, although there >> are no source changes there. This change injects a byte field of bits into >> java.lang.String, using one of the previously unused padding bytes in that >> class. (There were two unused bytes, now only one.) >> >> Testing: >> mach5 tier1-2 with and without -XX:+UseStringDeduplication >> >> Locally (linux-x64) ran all of the existing tests that use string >> deduplication with both G1 and Shenandoah. Note that >> TestStringDeduplicationInterned.java is disabled for shenandoah, as it >> currently fails, for reasons I haven't figured out but suspect are test >> related. >> >> - gc/stringdedup/ -- these used to be in gc/g1 >> - runtime/cds/SharedStringsDedup.java >> - runtime/cds/appcds/cacheObject/DifferentHeapSizes.java >> - runtime/cds/appcds/sharedStrings/* >> >> shenandoah-only: >> - gc/shenandoah/jvmti/TestHeapDump.java >> - gc/shenandoah/TestStringDedup.java >> - gc/shenandoah/TestStringDedupStress.java >> >> Performance tested baseline, baseline + stringdedup, new with stringdedup, >> finding no significant differences. > > src/hotspot/share/classfile/stringTable.cpp line 784: > >> 782: SharedStringIterator iter(f); >> 783: _shared_table.iterate(&iter); >> 784: } > > So with this change (somewhere below) do you no longer deduplicate strings > from the shared string table? ie > > // The CDS archive does not include the string deduplication table. Only the > string > // table is saved in the archive. The shared strings from CDS archive need to > be > // added to the string deduplication table before deduplication occurs. That > is > // done in the beginning of the StringDedupThread (see > StringDedupThread::do_deduplication()). > void StringDedupThread::deduplicate_shared_strings(StringDedupStat* stat) { > StringDedupSharedClosure sharedStringDedup(stat); > StringTable::shared_oops_do(&sharedStringDedup); > } > Asking @iklam to have a look then. If I understand correctly, we no longer need to add the entire set of shared strings into the dedup table. +// As a space optimization, when shared StringTable entries exist the shared +// part of the StringTable is also used as a source for byte arrays. This +// permits deduplication of strings against those shared entries without +// recording them in this table too. +class StringDedup::Table : AllStatic { ... +void StringDedup::Table::deduplicate(oop java_string) { + assert(java_lang_String::is_instance(java_string), "precondition"); + _cur_stat.inc_inspected(); + if ((StringTable::shared_entry_count() > 0) && + try_deduplicate_shared(java_string)) { + return; // Done if deduplicated against shared StringTable. ------------- PR: https://git.openjdk.java.net/jdk/pull/3662