On Wed, 7 Feb 2024 20:50:57 GMT, Stuart Marks <sma...@openjdk.org> wrote:
>> Joshua Cao has updated the pull request incrementally with one additional >> commit since the last revision: >> >> extract msize variable > > I think we should step back from benchmarks a bit and examine the various > tradeoffs occurring here. Certainly we can speed things up by pre-resizing > the map to be larger. However, this can result in a map that is too large for > the number of mappings it contains, in the case where this map and the > argument map have keys in common. In other words, it might waste space. We > really have little idea of how frequently this occurs. It's easy to imagine > scenarios where there is no commonality (which this change will speed up) and > also where there is 100% commonality (where this change will result in wasted > space). What's the right tradeoff? > > I've linked some older bugs to the bug report for some historical > perspective. In particular, see [this > comment](https://bugs.openjdk.org/browse/JDK-4710319?focusedId=12240692&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-12240692) > from Josh Bloch on JDK-4710319: > >> The conservatism of the resizing heuristic for putAll is intentional. It can >> cause at most one extra resizing, and can result in substantial footprint >> savings. This too should be documented in the code. > > The comment he added to putAll() for this change is still visible > [here](https://github.com/openjdk/jdk/blob/e1b3c5b5ba5cfb8243d760e99887bbe1015a9d69/jdk/src/share/classes/java/util/HashMap.java#L1271), > but unfortunately it was removed by a later refactoring. The comment is: > > > /* > * Expand the map if the map if the number of mappings to be added > * is greater than or equal to threshold. This is conservative; the > * obvious condition is (m.size() + size) >= threshold, but this > * condition could result in a map with twice the appropriate > capacity, > * if the keys to be added overlap with the keys already in this map. > * By using the conservative calculation, we subject ourself > * to at most one extra resize. > */ > > > Note that this comment addresses fairly directly the essence of the change > being discussed here. I think it's still applicable; the current code in > putMapEntries() compares `m.size()` to `threshold` in deciding whether to > resize immediately. We're not constrained by a 20-year-old comment, though. > We can change the resizing policy if we have good reason to do so. > > However, I think the concern about space wastage is still relevant, even > after 20 years of increased memory capacity and decreased memory cost. Memory > is cheap but not free. Larger memory cons... @stuart-marks Thanks for reviewing. I think there is some history that is hard to find right now. I'd suggest the following: 1. We don't need to close this PR. We can rename it to "Add comments for HashMap::putAll conservative expanding" 2. Undo the aggressive expanding in ConcurrentHashMap::putAll in a separate PR @simonis what are your thoughts? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17544#issuecomment-1934554352