On Wed, 7 Feb 2024 20:50:57 GMT, Stuart Marks <sma...@openjdk.org> wrote:

>> Joshua Cao has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   extract msize variable
>
> I think we should step back from benchmarks a bit and examine the various 
> tradeoffs occurring here. Certainly we can speed things up by pre-resizing 
> the map to be larger. However, this can result in a map that is too large for 
> the number of mappings it contains, in the case where this map and the 
> argument map have keys in common. In other words, it might waste space. We 
> really have little idea of how frequently this occurs. It's easy to imagine 
> scenarios where there is no commonality (which this change will speed up) and 
> also where there is 100% commonality (where this change will result in wasted 
> space). What's the right tradeoff?
> 
> I've linked some older bugs to the bug report for some historical 
> perspective. In particular, see [this 
> comment](https://bugs.openjdk.org/browse/JDK-4710319?focusedId=12240692&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-12240692)
>  from Josh Bloch on JDK-4710319:
> 
>> The conservatism of the resizing heuristic for putAll is intentional. It can 
>> cause at most one extra resizing, and can result in substantial footprint 
>> savings. This too should be documented in the code.
> 
> The comment he added to putAll() for this change is still visible 
> [here](https://github.com/openjdk/jdk/blob/e1b3c5b5ba5cfb8243d760e99887bbe1015a9d69/jdk/src/share/classes/java/util/HashMap.java#L1271),
>  but unfortunately it was removed by a later refactoring. The comment is:
> 
> 
>         /*
>          * Expand the map if the map if the number of mappings to be added
>          * is greater than or equal to threshold.  This is conservative; the
>          * obvious condition is (m.size() + size) >= threshold, but this
>          * condition could result in a map with twice the appropriate 
> capacity,
>          * if the keys to be added overlap with the keys already in this map.
>          * By using the conservative calculation, we subject ourself
>          * to at most one extra resize.
>          */
> 
> 
> Note that this comment addresses fairly directly the essence of the change 
> being discussed here. I think it's still applicable; the current code in 
> putMapEntries() compares `m.size()` to `threshold` in deciding whether to 
> resize immediately. We're not constrained by a 20-year-old comment, though. 
> We can change the resizing policy if we have good reason to do so. 
> 
> However, I think the concern about space wastage is still relevant, even 
> after 20 years of increased memory capacity and decreased memory cost. Memory 
> is cheap but not free. Larger memory cons...

@stuart-marks Thanks for reviewing. I think there is some history that is hard 
to find right now. I'd suggest the following:

1. We don't need to close this PR. We can rename it to "Add comments for 
HashMap::putAll conservative expanding"
2. Undo the aggressive expanding in ConcurrentHashMap::putAll in a separate PR

@simonis what are your thoughts?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17544#issuecomment-1934554352

Reply via email to