On Wed, 30 Mar 2022 00:12:59 GMT, Stuart Marks <sma...@openjdk.org> wrote:
>> XenoAmess has updated the pull request incrementally with two additional >> commits since the last revision: >> >> - update jmh >> - refine javadoc; refine implement when expectedSize < 0 > > OK, finally got some time to look at this. Here's a rewrite of the spec > words, at least for HashMap::newHashMap. If this settles down, I'll write the > CSR for this and LHM and WHM. > > /** > * Creates a new, empty HashMap suitable for the expected number of > mappings. > * The returned map uses the default load factor of 0.75, and its initial > capacity is > * generally large enough so that the expected number of mappings can be > added > * without resizing the map. > * > * @param numMappings the expected number of mappings > * @param <K> the type of keys maintained by this map > * @param <V> the type of mapped values > * @return the newly created map > * @throws IllegalArgumentException if numMappings is negative > * @since 19 > */ > > The original wording was taken from CHM, which generally is a reasonable > thing to do. Unfortunately, it's wrong. :-) "Table size" isn't accurate; > HashMap uses "capacity" as the term for the number of buckets (== length of > the internal table array). "Size" means "number of mappings" so its use of > "table size" confuses these two concepts. The CHM wording also uses > "elements" which applies to linear collections; the things inside a map are > usually referred to as "mappings" or "entries". (I guess we should fix up CHM > at some point too.) > > While "expectedSize" isn't inaccurate, it's not tied to the main text, so > I've renamed it to numMappings. > > There are a couple other javadoc style rules operating here. The first > sentence is generally a sentence fragment that is short and descriptive, as > it will be pulled into a summary table. (It's often written as if it were a > sentence that begins "This method..." but those words are elided.) Details > are in the rest of the first paragraph. The text for `@param`, `@return`, and > `@throws` are generally also sentence fragments, and they have no initial > capital letter nor trailing period. > > -- > > On performance and benchmarking: this is a distraction from the main point of > this effort, which is to add an API that allows callers a correct and > convenient way to create a properly-sized HashMap. Any work spent on > optimizing performance is almost certainly wasted. > > First, the benchmark: it's entirely unclear what this is measuring. It's > performing the operation 2^31 times, but it sends the result to a black hole > so that the JIT doesn't eliminate the computation. One of the actual results > is 0.170 ops/sec. This includes both the operation and the black hole, so we > don't actually have any idea what that result represents. _Maybe_ it's > possible to infer some idea of relative performance of the different > operations by comparing the results. It's certainly plausible that the > integer code is faster than the float or double code. But the benchmark > doesn't tell us how long the actual computation takes. > > Second, how significant is the cost of the computation? I'll assert that it's > insignificant. The table length is computed once at HashMap creation time, > and it's amortized over the addition of all the initial entries and > subsequent queries and updates to the HashMap. Any of the computations > (whether integer or floating point) are a handful of nanoseconds. This will > be swamped by the first hashCode computation that causes a cache miss. > > Third: I'll stipulate that the integer computation is probably a few ns > faster than the floating-point computation. But the computation is entirely > non-obvious. To make up for it being non-obvious, there's a big comment that > explains it all. It's not worth doing something that increases performance by > an insignificant amount that also requires a big explanation. > > Finally, note that most of the callers are already doing a floating-point > computation to compute the desired capacity, and it doesn't seem to be a > problem. > > Sorry, you probably spent a bunch of time on this already, but trying to > optimize the performance here just isn't worthwhile. Let's please just stick > with our good old `(int) Math.ceil(numMappings / 0.75)`. > > -- > > There should be regression tests added for the three new methods. I haven't > thought much about this. It might be possible to reuse some of the > infrastructure in the WhiteBoxResizeTest we worked on previously. > > -- > > I think it would be good to include updates to some of the use sites in this > PR. It's often useful to put new APIs into practice. One usually learns > something from the effort. Even though this is a really simple API, looking > at use sites can illuminating, to see how the code reads. This might affect > the method name, for example. > > You don't need to update all of the use sites in the JDK, but it would be > good to choose a reasonable sample. Maybe the ones from a single package, or > a handful (like java.lang or java.util). Maybe include > Class::enumConstantDirectory. If this use site is updated, then maybe it will > allow us to get rid of that ConstantDirectoryOptimalCapacity test that we > problem-listed in the previous PR. @stuart-marks codes and javadocs done. usages and tests are on the way. ------------- PR: https://git.openjdk.java.net/jdk/pull/7928