On 6/5/19 7:27 AM, Claes Redestad wrote:
On 2019-06-05 15:49, Tagir Valeev wrote:
In particular, it's never mentioned in the HashSet, HashMap, or Collection spec that toArray implements fail-fast behavior; this is said only about the iterator() method.

that's not entirely true, since the @implSpec for toArray on
AbstractCollection states it is equivalent to iterating over the
collection[1].

So since the implementations you're changing inherit their current
implementation from AbstractCollection, and the iterators of HashMap are
specified to be fail-fast, that behavior can be argued to be specified
also for the affected toArray methods.

I'm not fundamentally objecting to the behavior change, but I do think
it needs careful review and a CSR (or at least plenty of reviewers
agreeing that one isn't needed).

OK, let's slow down here a bit.

The fail-fast, CME-throwing behavior is primarily of interest when iteration is external, that is, it's driven by the application making a succession of calls on the Iterator. The main concern arises when the application makes other modifications to the collection being iterated, between calls to the Iterator.
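
For instance, the scenario of interest looks roughly like this (a minimal sketch, using HashSet; the class name and values are just illustrative):

    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;

    public class FailFastDemo {
        public static void main(String[] args) {
            Set<String> set = new HashSet<>(Set.of("a", "b", "c"));
            Iterator<String> it = set.iterator();
            it.next();      // external iteration has started
            set.add("d");   // the application modifies the set between Iterator calls
            it.next();      // throws ConcurrentModificationException (fail-fast)
        }
    }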

There is always the possibility of modification by some other thread, and this is mentioned obliquely in the spec near "fail-fast", but given memory visibility issues it's pretty much impossible to make any concrete statements about what happens if modifications are made by another thread.

In the case of these methods, the iteration occurs entirely within the context of a single method call. There's no possibility of the application making a concurrent modification on the same thread. So, while the fail-fast behavior is being changed, strictly speaking, I consider it "de minimis", as it's pretty difficult for an application to observe this change in behavior.

Regarding the @implSpec, that applies only to the implementation in AbstractCollection, and @implSpec is not inherited. Again, it is a change in behavior since this method is being changed from being inherited to being overridden, but it's not a specification issue.

In any case I don't think the concurrent modification behavior change is an issue to be concerned about.

**

Now, regarding visible changes in behavior, it's quite easy to observe this change in behavior, at least from a subclass of HashSet. A HashSet subclass could override the iterator() method and detect that it was called when toArray() is called. With this change, toArray() is overridden, and so iterator() would no longer be called in this circumstance.
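
For example, something along these lines (a hypothetical subclass, just to make the observable difference concrete):

    import java.util.HashSet;
    import java.util.Iterator;

    public class CountingHashSet<E> extends HashSet<E> {
        int iteratorCalls;

        @Override
        public Iterator<E> iterator() {
            iteratorCalls++;              // record each call to iterator()
            return super.iterator();
        }

        public static void main(String[] args) {
            CountingHashSet<String> set = new CountingHashSet<>();
            set.add("a");
            set.toArray();
            // With toArray() inherited from AbstractCollection this prints 1;
            // once HashSet overrides toArray(), iterator() is no longer called
            // here and it prints 0.
            System.out.println(set.iteratorCalls);
        }
    }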

This kind of change is generally allowable, but we have had complaints about the pattern of self-calls changing from release to release. This isn't a reason NOT to make the change, but it does change visible behavior, and that potentially affects actual code. (Code that I'd consider poorly written, but still.)

I talked this over with Joe Darcy (CSR chair), and we felt that it would be prudent to file a CSR request to document the behavior change.

**

Some comments on the code.

Overall I think the changes are going in the right direction. It's amazing that after all this time, there are still cases in core collections that are inheriting the slow Abstract* implementations.

In HashMap, I think the factoring between keysToArray() and prepareArray() can be improved. The prepareArray() method itself is sensible, in that it implements the weird toArray(T[]) semantics: it allocates a new array if the given one is too short, puts a 'null' at the right place if the given array is too long, and returns the possibly reallocated array.
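
For reference, those semantics amount to something like the following (a sketch only, written here as a static helper that takes the size explicitly; the method in the patch is an instance method and may differ in details):

    import java.lang.reflect.Array;

    final class ToArraySupport {
        @SuppressWarnings("unchecked")
        static <T> T[] prepareArray(T[] a, int size) {
            if (a.length < size) {
                // too short: allocate a new array of the same component type
                return (T[]) Array.newInstance(a.getClass().getComponentType(), size);
            }
            if (a.length > size) {
                // too long: store a null just past the last element
                a[size] = null;
            }
            return a;
        }
    }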

The first thing that keysToArray() does is to call prepareArray() and use its return value. What concerns me is that the no-arg toArray() method does this:

    return keysToArray(new Object[size]);

so the array is created with exactly the right size; yet the first thing keysToArray() does is to call prepareArray(), which checks the size again and determines that nothing need be done.

It would be a small optimization to shift things around so that keysToArray() takes an array that is already known to be "prepared", remove its call to prepareArray(), and require the caller to call prepareArray() if necessary. Then we'd have this:

    public Object[] toArray() { return keysToArray(new Object[size]); }
    public <T> T[] toArray(T[] a) { return keysToArray(prepareArray(a)); }
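
And keysToArray() itself would then just fill the array it's given, along the lines of the following (a rough sketch using HashMap's internal table/Node fields, so not standalone code, and the actual patch may differ):

    <T> T[] keysToArray(T[] a) {
        Object[] r = a;
        Node<K,V>[] tab;
        int idx = 0;
        if (size > 0 && (tab = table) != null) {
            for (Node<K,V> e : tab) {
                for (; e != null; e = e.next) {
                    // the array is already "prepared", so no size checks here
                    r[idx++] = e.key;
                }
            }
        }
        return a;
    }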

I'm not terribly concerned about avoiding an extra method call. But other code in this file, like the following:

    Node<K,V>[] tab;
    int idx = 0;
    if (size > 0 && (tab = table) != null) {

is written with an embedded assignment to 'tab', probably in order to avoid a useless load of the 'table' field if the size is zero. (This style occurs in several places.) So, if this code is already micro-optimizing at this level, let's not waste that effort by making unnecessary extra method calls. (I don't know whether this will impact the benchmarks, though.)

Aside from performance, refactoring keysToArray/prepareArray this way makes more sense to me anyway.

Similar adjustments would need to be made to the call sites in HashSet and LinkedHashMap.
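
For HashSet that would presumably look something like this, delegating to the backing map (again only a sketch; details may differ in the actual patch):

    public Object[] toArray() {
        return map.keysToArray(new Object[map.size()]);
    }

    public <T> T[] toArray(T[] a) {
        return map.keysToArray(map.prepareArray(a));
    }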

I'd recommend adding doc comments to HashMap's prepareArray() and keysToArray(), since they are called or overridden from outside this file. They needn't be full-on spec comments, just enough to make it clear that other things depend on these internal interfaces.
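
Something short would do, e.g. for prepareArray() (the wording below is only a suggestion):

    /**
     * Prepares the given array for a toArray(T[]) implementation: if it is
     * shorter than this map's size, a new array of the same component type
     * is allocated; if it is longer, a null is stored just past the last
     * element. Note: also used from outside this class (e.g. HashSet).
     */

A similar comment on keysToArray() could note that it is overridden outside this file.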

Thanks,

s'marks
