Re: RFR: 8186958: Need method to create pre-sized HashMap [v7]

XenoAmess Wed, 30 Mar 2022 16:21:56 -0700

On Wed, 30 Mar 2022 00:12:59 GMT, Stuart Marks <sma...@openjdk.org> wrote:


>> XenoAmess has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - update jmh
>>  - refine javadoc; refine implement when expectedSize < 0
>
> OK, finally got some time to look at this. Here's a rewrite of the spec 
> words, at least for HashMap::newHashMap. If this settles down, I'll write the 
> CSR for this and LHM and WHM.
> 
>     /**
>      * Creates a new, empty HashMap suitable for the expected number of 
> mappings.
>      * The returned map uses the default load factor of 0.75, and its initial 
> capacity is
>      * generally large enough so that the expected number of mappings can be 
> added
>      * without resizing the map.
>      *
>      * @param numMappings the expected number of mappings
>      * @param <K>         the type of keys maintained by this map
>      * @param <V>         the type of mapped values
>      * @return the newly created map
>      * @throws IllegalArgumentException if numMappings is negative
>      * @since 19
>      */
> 
> The original wording was taken from CHM, which generally is a reasonable 
> thing to do. Unfortunately, it's wrong. :-) "Table size" isn't accurate; 
> HashMap uses "capacity" as the term for the number of buckets (== length of 
> the internal table array). "Size" means "number of mappings" so its use of 
> "table size" confuses these two concepts. The CHM wording also uses 
> "elements" which applies to linear collections; the things inside a map are 
> usually referred to as "mappings" or "entries". (I guess we should fix up CHM 
> at some point too.)
> 
> While "expectedSize" isn't inaccurate, it's not tied to the main text, so 
> I've renamed it to numMappings.
> 
> There are a couple other javadoc style rules operating here. The first 
> sentence is generally a sentence fragment that is short and descriptive, as 
> it will be pulled into a summary table. (It's often written as if it were a 
> sentence that begins "This method..." but those words are elided.) Details 
> are in the rest of the first paragraph. The text for `@param`, `@return`, and 
> `@throws` are generally also sentence fragments, and they have no initial 
> capital letter nor trailing period.
> 
> --
> 
> On performance and benchmarking: this is a distraction from the main point of 
> this effort, which is to add an API that allows callers a correct and 
> convenient way to create a properly-sized HashMap. Any work spent on 
> optimizing performance is almost certainly wasted.
> 
> First, the benchmark: it's entirely unclear what this is measuring. It's 
> performing the operation 2^31 times, but it sends the result to a black hole 
> so that the JIT doesn't eliminate the computation. One of the actual results 
> is 0.170 ops/sec. This includes both the operation and the black hole, so we 
> don't actually have any idea what that result represents. _Maybe_ it's 
> possible to infer some idea of relative performance of the different 
> operations by comparing the results. It's certainly plausible that the 
> integer code is faster than the float or double code. But the benchmark 
> doesn't tell us how long the actual computation takes.
> 
> Second, how significant is the cost of the computation? I'll assert that it's 
> insignificant. The table length is computed once at HashMap creation time, 
> and it's amortized over the addition of all the initial entries and 
> subsequent queries and updates to the HashMap. Any of the computations 
> (whether integer or floating point) are a handful of nanoseconds. This will 
> be swamped by the first hashCode computation that causes a cache miss.
> 
> Third: I'll stipulate that the integer computation is probably a few ns 
> faster than the floating-point computation. But the computation is entirely 
> non-obvious. To make up for it being non-obvious, there's a big comment that 
> explains it all. It's not worth doing something that increases performance by 
> an insignificant amount that also requires a big explanation.
> 
> Finally, note that most of the callers are already doing a floating-point 
> computation to compute the desired capacity, and it doesn't seem to be a 
> problem.
> 
> Sorry, you probably spent a bunch of time on this already, but trying to 
> optimize the performance here just isn't worthwhile. Let's please just stick 
> with our good old `(int) Math.ceil(numMappings / 0.75)`.
> 
> --
> 
> There should be regression tests added for the three new methods. I haven't 
> thought much about this. It might be possible to reuse some of the 
> infrastructure in the WhiteBoxResizeTest we worked on previously.
> 
> --
> 
> I think it would be good to include updates to some of the use sites in this 
> PR. It's often useful to put new APIs into practice. One usually learns 
> something from the effort. Even though this is a really simple API, looking 
> at use sites can illuminating, to see how the code reads. This might affect 
> the method name, for example.
> 
> You don't need to update all of the use sites in the JDK, but it would be 
> good to choose a reasonable sample. Maybe the ones from a single package, or 
> a handful (like java.lang or java.util). Maybe include 
> Class::enumConstantDirectory. If this use site is updated, then maybe it will 
> allow us to get rid of that ConstantDirectoryOptimalCapacity test that we 
> problem-listed in the previous PR.

@stuart-marks codes and javadocs done.
usages and tests are on the way.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7928

Re: RFR: 8186958: Need method to create pre-sized HashMap [v7]

Reply via email to